Marzhill Musings

...

go-html-transform: an HTML transformation and scraping library

Published On: 2013-02-26 17:05:00

http://code.google.com/p/go-html-transform is my HTML transformation library for Go. I use it as an HTML templating language and scraping library. It's not the typical approach to HTML templating, but it's one I've really come to enjoy. HTML templating approaches fall into roughly three categories.

  1. Templating languages.
  2. HTML DSLs.
  3. Functional transforms.

go-html-transform is an example of the last one. The basic theory is that an HTML template is just data. No logic lives in the template. All the logic is in the functions that operate on the template and any input data. Using the input data, you transform a template and then render the transformed AST back into HTML. This has a number of benefits.

  • Your template transforms are context aware.
  • Multipass templating is just another transform.
  • All your logic is expressed in real, honest-to-goodness code, not a limited templating language. In the case of go-html-transform, your templating logic is actually type-checked by the Go compiler.
  • It's impossible to generate malformed HTML, because the output is always rendered from a parsed tree.
  • Your mocks are your templates.
  • You can combine an HTML DSL with this approach as well, provided the DSL outputs the same AST.
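
To make the "template is just data" idea concrete, here is a minimal sketch that uses only the Go standard library. It is not go-html-transform's API: the Node, Attr, Transform, Apply, ReplaceText, MapAttr, Render, and demo names are all hypothetical, and the tree is a toy stand-in for a real parsed document. It only illustrates the shape of the approach: build a tree, run plain functions over it, then render.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// Attr and Node form a toy tree. They are hypothetical types for this
// sketch, not the AST that go-html-transform uses.
type Attr struct{ Key, Val string }

type Node struct {
	Tag      string // empty for text nodes
	Attrs    []Attr
	Text     string // used only when Tag is empty
	Children []*Node
}

// Transform is any function that edits a node in place.
type Transform func(*Node)

// Apply runs f on every element in the tree whose tag matches.
func Apply(n *Node, tag string, f Transform) {
	if n.Tag == tag {
		f(n)
	}
	for _, c := range n.Children {
		Apply(c, tag, f)
	}
}

// ReplaceText swaps a node's children for a single text node.
func ReplaceText(s string) Transform {
	return func(n *Node) { n.Children = []*Node{{Text: s}} }
}

// MapAttr rewrites the value of every attribute named key.
func MapAttr(key string, f func(string) string) Transform {
	return func(n *Node) {
		for i := range n.Attrs {
			if n.Attrs[i].Key == key {
				n.Attrs[i].Val = f(n.Attrs[i].Val)
			}
		}
	}
}

// Render serializes the tree, escaping text and attribute values.
func Render(n *Node) string {
	if n.Tag == "" {
		return html.EscapeString(n.Text)
	}
	var b strings.Builder
	b.WriteString("<" + n.Tag)
	for _, a := range n.Attrs {
		b.WriteString(" " + a.Key + `="` + html.EscapeString(a.Val) + `"`)
	}
	b.WriteString(">")
	for _, c := range n.Children {
		b.WriteString(Render(c))
	}
	b.WriteString("</" + n.Tag + ">")
	return b.String()
}

// demo builds a small template, transforms it, and renders it.
func demo() string {
	tmpl := &Node{Tag: "p", Children: []*Node{
		{Tag: "a", Attrs: []Attr{{Key: "href", Val: "http://example.com/"}},
			Children: []*Node{{Text: "link"}}},
		{Tag: "span", Children: []*Node{{Text: "placeholder"}}},
	}}
	Apply(tmpl, "span", ReplaceText("1 < 2"))
	Apply(tmpl, "a", MapAttr("href", func(u string) string {
		return strings.Replace(u, "http:", "https:", 1)
	}))
	return Render(tmpl)
}

func main() {
	fmt.Println(demo())
}
```

Because the text only reaches the output through Render, the "<" in the replacement text is escaped without the transform having to think about it. That is the sense in which transforms are context aware in this sketch.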

Example usage.

    package main

    import (
      "os"
      "strings"

      "code.google.com/p/go-html-transform/h5"
      "code.google.com/p/go-html-transform/html/transform"
    )

    // toSSL rewrites the first "http:" in url to "https:".
    func toSSL(url string) string {
      return strings.Replace(url, "http:", "https:", 1)
    }

    func main() {
      // os.Open does not expand "~", so use a real path here.
      f, err := os.Open("file.html")
      if err != nil { return } // handle errors here.
      defer f.Close() // only defer the close once the open has succeeded.
      tree, err := transform.NewDocFromReader(f)
      if err != nil { return } // handle errors here.
      t := transform.NewTransformer(tree)
      t.ApplyAll(
        // replace every span tag's contents with foo
        transform.Trans(transform.ReplaceChildren(h5.Text("foo")), "span"),
        // turn every link and img into an ssl link
        transform.Trans(transform.TransformAttrib("href", toSSL), "a"),
        transform.Trans(transform.TransformAttrib("src", toSSL), "img"),
      )

      t.Render(os.Stdout) // render html to stdout.
    }
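
One caveat with a helper like toSSL that relies on strings.Replace: it rewrites the first "http:" found anywhere in the value, including inside a query string. A stricter variant might upgrade only a leading scheme. The sketch below uses just the standard library. The name toSSLPrefix is hypothetical, and it deliberately ignores cases such as an uppercase "HTTP://".

```go
package main

import (
	"fmt"
	"strings"
)

// toSSLPrefix is a hypothetical stricter variant of toSSL. It upgrades the
// scheme only when the value starts with "http://" and leaves everything
// else untouched.
func toSSLPrefix(u string) string {
	if strings.HasPrefix(u, "http://") {
		return "https://" + strings.TrimPrefix(u, "http://")
	}
	return u
}

func main() {
	urls := []string{
		"http://example.com/a.png",
		"https://example.com/",
		"/redirect?to=http://example.com/",
	}
	for _, u := range urls {
		fmt.Println(toSSLPrefix(u))
	}
}
```

Since it has the same func(string) string shape as toSSL, it could be passed wherever toSSL is passed in the example above.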


A long overdue update

Published On: 2013-02-13 22:24:00

Hello there,

I haven't updated this blog in over two years. That's a long time for something on the internet. I still get traffic for various articles, though, so I figured I should probably get back into the swing of things.

I've been working at Google, as some of you may already know. I also still do a lot of hacking in my spare time. Lately most of it has been in Go, but I do the occasional thing in JavaScript, Erlang, and Haskell as well.

So let's take stock, shall we?

Latest projects:

  • Revamped this blog's HTML and CSS. I do hope the colors and typography are better. I'll be continuing to tweak it over the next few weeks.
  • Teaching some classes in downtown Chicago for friends.

I hope to write some articles about Go soon as well. The first will probably be about go-html-transform.

I've also been heavily active in church. We attend FBC Lansing now. It's a healthy church, and I now serve on the deacon board as well as teach Sunday School and Youth Bible study.


...