FourFootCar: Cluster Cities

Building Dopplr

We’re sitting on the grass in the sunshine with a bunch of early Dopplr users, including Stowe Boyd and Stephanie Booth, when Stephanie becomes the first to voice something we’ve heard a lot from Dopplr users since: “make my trips more ‘fuzzy’”.

By this, she and others meant that they would like to see coincidences in the ‘social spacetime’ surrounding their trip, i.e. “show me if there are going to be people I know near the stated destination of my trip when I’m going to be there, as I’d probably like to change my plans a little to see them.”

This is a cornerstone of our goal to help optimise travel for Dopplr users - surfacing information about such near coincidences to let them judge whether to alter their plans to make their trip more worthwhile.

We’re going to be releasing a lot of functionality to exploit fuzzy, social spacetime through the early part of 2008, but the first part of it has leaked out into the journal.

Cluster Cities are the way we’ve made this happen. To explain them, here’s Matt B. with the science bit!

To make the database queries perform well enough to implement this feature, we needed to classify cities in densely populated areas into groups. By considering groups of cities as one, we cut down the work the database has to do when calculating who is affected by someone arriving in their area. We decided that these groups should be small enough that a traveller could reasonably expect to travel between any two cities in a group within a day.
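To illustrate why grouping helps, here is a minimal Python sketch. It is not Dopplr's actual code: the names `city_to_cluster`, `build_index` and `users_affected_by_arrival` are made up for this example, and the city-to-cluster mapping is purely hypothetical. The point is that once every city carries a cluster label, "who is affected by this arrival?" becomes a single lookup rather than a distance calculation against every other city.

```python
from collections import defaultdict


def build_index(city_to_cluster, home_city_by_user):
    """Group users by the cluster their home city belongs to.

    A city missing from city_to_cluster is treated as its own
    one-city cluster (an assumption made for this sketch).
    """
    users_by_cluster = defaultdict(set)
    for user, city in home_city_by_user.items():
        cluster = city_to_cluster.get(city, city)
        users_by_cluster[cluster].add(user)
    return users_by_cluster


def users_affected_by_arrival(destination, city_to_cluster, users_by_cluster):
    """Return everyone based in the same cluster as the destination."""
    cluster = city_to_cluster.get(destination, destination)
    return set(users_by_cluster.get(cluster, set()))


# Hypothetical data, for illustration only.
city_to_cluster = {
    "San Francisco": "San Francisco",
    "San Jose": "San Francisco",
    "Palo Alto": "San Francisco",
}
home_city_by_user = {"alice": "San Jose", "bob": "London", "carol": "San Francisco"}

index = build_index(city_to_cluster, home_city_by_user)
nearby = users_affected_by_arrival("Palo Alto", city_to_cluster, index)
```

In a real database the same idea would presumably be a cluster id column with an index on it, so the query matches on equality instead of computing distances.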

Algorithms to cluster a spatial dataset are well known and not hard to implement. Unfortunately, they take a bit of tuning and experimentation to achieve satisfying results. Intuitively we expect cities like London, Tokyo and San Francisco to be at the centres of their clusters. In reality it’s rather hard to teach the cultural/social/economic conditions that cause this to an algorithm that’s only looking at latitude and longitude.

After some initially disappointing results, I stopped looking solely at the geographical data and considered what I could do if I incorporated the historical trip data that Dopplr has built up over our first year. I quickly came to the obvious conclusion: weight the clustering by the popularity of trip destination and let our travellers decide whether San Francisco or San Jose is the gravitational centre of Silicon Valley.

In analysing the top 2000 destinations I discovered that many of the top cities are very close together: for example, Glasgow and Edinburgh are only 40 miles apart. Again I used our trip data to eliminate overlaps. Within any 50-mile radius, only the more popular of two popular cities gets to be the cluster centre. This decision is one reason for the beauty of our central Raumzeitgeist visualisation. The layout has an appealing rhythm to it because the points in popular areas are a natural and fairly efficient circle packing.
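The overlap rule can be sketched as a greedy pass over cities sorted by popularity. This is one plausible reading of the description above rather than the code Dopplr runs. The function names are invented for the example, the trip counts are made-up numbers (not real Dopplr data), and distance is approximated with the haversine great-circle formula.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def pick_cluster_centres(cities, min_separation_miles=50.0):
    """Choose centres greedily, most popular first.

    cities is a list of (name, (lat, lon), trip_count) tuples. A city
    becomes a centre only if no already-chosen (more popular) centre
    lies within min_separation_miles of it.
    """
    centres = []
    for name, coords, trips in sorted(cities, key=lambda c: c[2], reverse=True):
        if all(haversine_miles(coords, c[1]) >= min_separation_miles for c in centres):
            centres.append((name, coords, trips))
    return centres


# Trip counts below are invented purely to exercise the rule.
cities = [
    ("Glasgow", (55.8642, -4.2518), 80),
    ("Edinburgh", (55.9533, -3.1883), 120),
    ("London", (51.5074, -0.1278), 900),
]
centre_names = [name for name, _, _ in pick_cluster_centres(cities)]
```

With these invented counts, Edinburgh is picked before Glasgow, and Glasgow is then skipped because it falls inside Edinburgh's 50-mile radius. Swap the counts and Glasgow would win instead, which is the sense in which the travellers decide.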


Friday, May 09, 2008
