CWYAlpha


Thought this was cool: Playing with Go: Embarrassingly Parallel Scripts // Collective Idea




URL: http://collectiveidea.com/blog/archives/2012/12/03/playing-with-go-embarrassingly-parallel-scripts/

I recently needed to take a list of domains and find which ones point to a specific IP address. For a small list, say fewer than 10, manually running dig in the console would work fine, but this list had almost 800 domains, so I needed a script. Because each domain lookup is a network request and therefore slow, running the lookups in parallel made sense. I could have done this in Ruby, my language du jour, but I’ve done this type of thread work before and, frankly, it can be tedious to set up and fragile, and it still wouldn’t have access to all of my system’s resources because of the GVL. I’ve been keeping an eye on Google’s Go for some time now and decided to see how it handled this problem.

I’ve been intrigued by Go since it was originally announced about three years ago. Here was a compiled, fast, lightweight, low-level language with many of the features we take for granted these days, such as garbage collection, plus a sophisticated concurrency model similar to what’s found in Erlang: very lightweight internal processes managed by the runtime. It sounded like a perfect fit for my requirements.

The code I ended up with is here: https://gist.github.com/4170926. For the sake of comparison, I built a sequential version of the script as well as the parallel version, and added timings for running both scripts against the full list of domains.

Running these scripts for yourself is a one-liner: go run [script.go]. The input file domains.txt needs to be a newline-delimited list of domains. I’ll go over the more confusing parts of the two scripts to help with understanding what’s really going on here.
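As a rough illustration of the input format, here is a minimal sketch of loading a newline-delimited domain list. The helper name parseDomains is hypothetical and is not taken from the gist; only the file name domains.txt comes from the post. It assumes Go 1.16 or later for os.ReadFile.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// parseDomains (hypothetical helper) splits newline-delimited text into
// trimmed domain names and skips blank lines.
func parseDomains(text string) []string {
	var domains []string
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		if line := strings.TrimSpace(scanner.Text()); line != "" {
			domains = append(domains, line)
		}
	}
	return domains
}

func main() {
	// domains.txt is the input file named in the post.
	data, err := os.ReadFile("domains.txt")
	if err != nil {
		fmt.Println("could not read domains.txt:", err)
		return
	}
	for _, domain := range parseDomains(string(data)) {
		fmt.Println(domain)
	}
}
```

Keeping the parsing separate from the file read makes it easy to try the function on a literal string before pointing it at a real list.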

Objects?

Go’s object model is very close to C’s: structs that hold data, and methods that operate on those structs. Both scripts use only a small, two-field struct, DomainMap, to keep track of the IP address found for a given domain. I use the short form to initialize new instances of DomainMap, in which the order of the values maps directly to the order of the fields defined at the top of the scripts.

type DomainMap struct {
	Domain    string
	IpMapping string
}

// Short-form (positional) initialization: values follow the field order.
object := DomainMap{domain, ipAddress}
// Now object.Domain == domain
// and object.IpMapping == ipAddress
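For comparison, here is a small sketch that puts the positional form next to Go's named-field form. The describe function and the sample values are made up for illustration (192.0.2.10 is a documentation-only address); only the DomainMap struct comes from the post.

```go
package main

import "fmt"

// DomainMap mirrors the two-field struct from the post.
type DomainMap struct {
	Domain    string
	IpMapping string
}

// describe (hypothetical helper) formats a DomainMap for printing.
func describe(m DomainMap) string {
	return m.Domain + " -> " + m.IpMapping
}

func main() {
	// Positional form: values must follow the declared field order.
	positional := DomainMap{"example.com", "192.0.2.10"}

	// Named form: order no longer matters, and omitted fields get zero values.
	named := DomainMap{IpMapping: "192.0.2.10", Domain: "example.com"}

	fmt.Println(describe(positional)) // prints "example.com -> 192.0.2.10"
	fmt.Println(positional == named)  // prints "true"
}
```

The named form is a little more verbose, but it keeps working if fields are later reordered or added.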

Error handling

Go handles errors by returning multiple values from a function; by convention, the last return value is of type error. You can discard it by assigning it to the blank identifier _.

rawIpAddresses, _ := net.LookupIP(domain)
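If the error should be checked rather than discarded, one possible shape is sketched below. The names lookupFunc, firstIP, and stubLookup are hypothetical. The stub is an offline stand-in so the example runs without a network; passing net.LookupIP instead would perform real queries.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// lookupFunc has the same shape as net.LookupIP.
type lookupFunc func(host string) ([]net.IP, error)

// Compile-time check that net.LookupIP can be used as a lookupFunc.
var _ lookupFunc = net.LookupIP

// firstIP returns the first address found for domain, or an error if the
// lookup fails or comes back empty.
func firstIP(lookup lookupFunc, domain string) (string, error) {
	ips, err := lookup(domain)
	if err != nil {
		return "", fmt.Errorf("lookup %s: %w", domain, err)
	}
	if len(ips) == 0 {
		return "", errors.New("lookup " + domain + ": no addresses")
	}
	return ips[0].String(), nil
}

// stubLookup is a hypothetical offline stand-in for a DNS query.
func stubLookup(host string) ([]net.IP, error) {
	if host == "example.com" {
		return []net.IP{net.ParseIP("192.0.2.10")}, nil
	}
	return nil, errors.New("no such host")
}

func main() {
	for _, domain := range []string{"example.com", "missing.invalid"} {
		ip, err := firstIP(stubLookup, domain)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(domain, "resolves to", ip)
	}
}
```

For a throwaway script, discarding the error is a reasonable shortcut; checking it mainly helps tell "no match" apart from "lookup failed".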

Parallelism

The parallel version of the script has some new concepts that need explaining, particularly goroutines, channels, and channel communication.

A goroutine is a very lightweight process, sort of like a Ruby Fiber. Creating one is simple:

go domainLookup(responseChannel, domain)

Go takes the function call after the go keyword and runs it concurrently with the calling code, in a new goroutine. However, because the call no longer runs in the main goroutine, we can’t just return a value from the function to the caller. We need a different way to get the result back. This is where channels come in.

responseChannel := make(chan DomainMap)

As Go is a statically typed language, we need to declare the type of value the channel carries, and the channel only accepts values of that type. Communication through channels is done with the reverse-stabby operator <-, which should be read as “the data on the right side flows to the left side”:

// Write into a channel
returnChannel <- DomainMap{domain, ipAddress}
// Read from the channel
domainMap := <- responseChannel
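Put together, the goroutine and the channel form a complete round trip. The sketch below is an assumption about how the pieces fit, not a copy of the gist: domainLookup here sends a hard-coded placeholder address instead of doing a DNS query, and it assumes returnChannel is the parameter name inside the function while responseChannel is the variable in main.

```go
package main

import "fmt"

// DomainMap mirrors the two-field struct from the post.
type DomainMap struct {
	Domain    string
	IpMapping string
}

// domainLookup stands in for the post's function of the same name. The
// hard-coded address is a placeholder, not a real DNS result.
func domainLookup(returnChannel chan<- DomainMap, domain string) {
	// Write into the channel; this blocks until main is ready to read.
	returnChannel <- DomainMap{domain, "192.0.2.10"}
}

func main() {
	responseChannel := make(chan DomainMap)

	// Start the lookup in its own goroutine.
	go domainLookup(responseChannel, "example.com")

	// Read from the channel; this blocks until the goroutine has written.
	domainMap := <-responseChannel
	fmt.Println(domainMap.Domain, domainMap.IpMapping) // prints "example.com 192.0.2.10"
}
```

Because the channel is unbuffered, the read in main also acts as the synchronization point: main cannot exit before the goroutine has delivered its value.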

And that’s all the special syntax. The only real difference between the parallel and sequential scripts is the map-reduce-esque setup to wait for all the goroutines to finish. I didn’t need to worry about thread pooling, system capabilities, or thread safety. Go makes it so easy to write truly parallel code that there’s no excuse not to anymore. I was able to run almost 800 goroutines (one per domain) all throwing out DNS queries and coming back in less than 10 seconds, in a script that doesn’t even look like it’s running in parallel.
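The wait-for-everything setup can be sketched roughly as follows: start one goroutine per domain, then read exactly as many results as goroutines were started. This is an illustration under assumptions rather than the gist's code. The resolver type, the domainsPointingAt function, and the in-memory lookup table are hypothetical and replace real DNS queries so the example runs offline.

```go
package main

import (
	"fmt"
	"sort"
)

// DomainMap mirrors the two-field struct from the post.
type DomainMap struct {
	Domain    string
	IpMapping string
}

// resolver is a hypothetical stand-in for a DNS query: domain in, IP out.
type resolver func(domain string) string

// domainLookup resolves one domain and sends the result on the channel.
func domainLookup(returnChannel chan<- DomainMap, resolve resolver, domain string) {
	returnChannel <- DomainMap{domain, resolve(domain)}
}

// domainsPointingAt starts one goroutine per domain, collects exactly
// len(domains) results, and keeps the domains whose IP equals target.
func domainsPointingAt(target string, domains []string, resolve resolver) []string {
	responseChannel := make(chan DomainMap)
	for _, domain := range domains {
		go domainLookup(responseChannel, resolve, domain)
	}

	var matches []string
	for range domains {
		if result := <-responseChannel; result.IpMapping == target {
			matches = append(matches, result.Domain)
		}
	}
	sort.Strings(matches) // results arrive in no particular order
	return matches
}

func main() {
	// Made-up lookup table used in place of real DNS.
	table := map[string]string{
		"a.example": "192.0.2.10",
		"b.example": "192.0.2.20",
		"c.example": "192.0.2.10",
	}
	resolve := func(domain string) string { return table[domain] }

	domains := []string{"a.example", "b.example", "c.example"}
	fmt.Println(domainsPointingAt("192.0.2.10", domains, resolve)) // prints "[a.example c.example]"
}
```

Counting the reads against the number of goroutines started is what lets the collecting loop know when every lookup has finished, with no explicit locks or thread pool.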

Now that Go 1.0 stable is out, it’s a great time to get familiar with this language. I highly recommend checking out the Tour of Go for a basic introduction to every major feature of the language, and there’s a ton of documentation on the main website, golang.org. From the little time I’ve spent playing with Go, I see a very bright future for this language.

GVL: Global VM Lock. More about Ruby’s concurrency here: http://www.engineyard.com/blog/2011/ruby-concurrency-and-you/


from Hacker News 50: http://collectiveidea.com/blog/archives/2012/12/03/playing-with-go-embarrassingly-parallel-scripts/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+hacker-news-feed-50+%28Hacker+News+50%29

Written by cwyalpha

December 4, 2012 at 5:53 AM

Posted in Uncategorized
