Monday, March 12, 2007 at 1:48 PM
I love data. That was no small factor in my decision to join Google. I started a month ago and have already learned a lot, but I wanted to try to put together many of the components I've learned about to solve an interesting problem. So, last week I made one up: map the world, based on how often its locations are mentioned in books.
We've all seen views of the Earth from space, where the numerous pinpoints of light on the ground combine to yield a speckled map of the world. I wanted to show the Earth viewed from books, where individual mentions of locations in books combine to yield another interpretation of the globe. The intensity of each pixel is proportional to the number of times the location at a given set of coordinates is mentioned across all of the books in Google Book Search.
Fortunately, the hard part was already done: someone had already written the code to get place names and map coordinates from books. As he explained in a previous post, books like this guide to Boston now show a Google Map with the locations mentioned in the book marked on the map. So, I wrote a little program using MapReduce, Bigtable and other cool Google stuff. Running it on my desktop would have taken days, but thanks to the wonder that is the Google infrastructure, I had a map in forty-five minutes.
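To give a feel for the counting step, here is a minimal Python sketch of the idea. It is not the actual program: it runs in a single process rather than on MapReduce, and the function names, the one-pixel-per-degree image size, the equirectangular projection and the sample coordinates are all assumptions made up for illustration.

```python
from collections import Counter

WIDTH, HEIGHT = 360, 180  # assumed image size: one pixel per degree


def to_pixel(lat, lon, width=WIDTH, height=HEIGHT):
    """Map latitude/longitude to (x, y) with an equirectangular projection."""
    x = int((lon + 180.0) / 360.0 * width)
    y = int((90.0 - lat) / 180.0 * height)
    # Clamp the edge cases lon == 180 and lat == -90 into the image.
    return min(x, width - 1), min(y, height - 1)


def map_phase(mentions):
    """Emit one ((x, y), 1) pair per location mention."""
    for lat, lon in mentions:
        yield to_pixel(lat, lon), 1


def reduce_phase(pairs):
    """Sum the emitted counts for each pixel."""
    counts = Counter()
    for pixel, n in pairs:
        counts[pixel] += n
    return counts


def intensities(counts):
    """Scale counts so the most-mentioned pixel has intensity 1.0."""
    peak = max(counts.values(), default=0)
    return {p: c / peak for p, c in counts.items()} if peak else {}


# Made-up sample data: two mentions near Boston, one near London.
sample = [(42.36, -71.06), (42.36, -71.06), (51.51, -0.13)]
counts = reduce_phase(map_phase(sample))
image = intensities(counts)
```

Here `image` maps each pixel to a brightness between 0 and 1, proportional to its mention count. In a real MapReduce job the map and reduce phases would run in parallel across many machines, which is where the speedup comes from.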
Naturally, counting mentions this way yields some biases towards cities, but it's still interesting. And there's a lot of additional analysis that's fun to do. If you filter the map by publication date, you can see global patterns like the growth and westward expansion of the United States in the 19th century.
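Filtering by publication date could look something like the sketch below. The records, the place names and the `counts_for_period` helper are all made up for illustration; the real data and code are not shown in this post.

```python
from collections import Counter

# Made-up records: (publication_year, place_name).
mentions = [
    (1805, "Boston"),
    (1848, "St. Louis"),
    (1849, "San Francisco"),
    (1851, "Boston"),
    (1890, "San Francisco"),
]


def counts_for_period(records, start_year, end_year):
    """Count mentions per place for books published in [start_year, end_year)."""
    return Counter(
        place for year, place in records if start_year <= year < end_year
    )


early = counts_for_period(mentions, 1800, 1850)
late = counts_for_period(mentions, 1850, 1900)
```

Rendering one map per period from counts like `early` and `late` is one way to compare how the picture changes over time.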
Wow, data is fun.