goquery

goquery: jQuery-style HTML manipulation in Go

Writing HTML-handling code directly against a parser is pretty hard. Even plain DOM tree walking can turn your code into spaghetti. That is why libraries like jQuery are popular: they make it easy to query and manipulate HTML documents.

goquery is like jQuery, but in Go.

How goquery works

goquery is built on the Go net/html package (this isn’t a standard library package, but an external package officially maintained by the Go authors). It also uses cascadia, a CSS selector library.

You pass goquery a document, and then query it using selectors, similar to how you would use jQuery.

Using goquery

Install goquery with go get:

go get github.com/PuerkitoBio/goquery

Here’s an example:

package main

import (
        "fmt"
        "log"

        "github.com/PuerkitoBio/goquery"
)

func main() {
        doc, err := goquery.NewDocument("https://blog.golang.org")
        if err != nil {
                log.Fatal(err)
        }

        doc.Find(".article").Each(func(i int, s *goquery.Selection) {
                title := s.Find("h3").Text()
                link, _ := s.Find("h3 a").Attr("href")
                fmt.Printf("%d) %s - %s\n", i+1, title, link)
        })
}

This example prints article titles and the corresponding links from the Go blog:

1) Errors are values - /errors-are-values
2) GothamGo: gophers in the big apple - /gothamgo
3) The Gopher Gala is the first worldwide Go hackathon - /gophergala
4) Generating code - /generate
5) Go 1.4 is released - /go1.4

Let me explain the code. First we create a new document with goquery.NewDocument. As you can see, we pass a URL into this function, and goquery fetches the document for us. Nice! You can also create a document from a Reader (NewDocumentFromReader), from an HTML node (NewDocumentFromNode), or from an HTTP response (NewDocumentFromResponse).

Then we use Find to query this document: we ask it to find everything with the article class, then call Each on the results (as you can see, we can chain functions just like in jQuery). Each takes a function that tells it what to do with every selection it found. In our case, we use Find again to locate the title and extract its text with Text, and then find the link and extract its href with Attr. The second value returned by Attr, which we ignore with _, indicates whether the attribute exists.

That was easy! If you ever need to extract some data from an HTML document, use goquery to save time.

Source code and license

GitHub: https://github.com/PuerkitoBio/goquery
Documentation: https://godoc.org/github.com/PuerkitoBio/goquery
Author: Martin Angers (@PuerkitoBio)
License: 3-clause BSD

