The inside story on powering Highlights
Last week I released my latest iOS app, "Highlights", which is designed to show you the best places in your area or anywhere in the world based on crowd sourced data from Gowalla and Foursquare. In this article, I want to explain how the recommendations engine works and also how I managed to solve some of the biggest challenges along the way.
Search by recommendations
During SXSW 2011, I made the decision to move away from doing development on apps based around Gowalla items and instead to apps based on travelling and discovery (mainly due to an excellent talk by Gowalla CEO Josh Williams on the subject of check-in apps). It was in Austin that I had the idea for Highlights as I was constantly trying to find the best places to eat, drink, or visit as a tourist and finding it hard to choose amongst the variety of locations on offer. I initially decided to do some prototyping based on the Gowalla API to see if I could come up with a way to sort places based on popularity but the API only lets you sort by creation date or number of checkins. In my opinion, the number of people at a place does not form a recommendation - more people probably go to McDonald's than the local burger joint yet that does not necessarily make it a better choice for dinner. What I needed was a way to filter based on a recently added Gowalla feature; Highlights (from which the name of the app descends). At the time there was no API call to do this (e.g. searching for the "steak" highlight in Austin) so I posted a request on the Gowalla developer boards - The feature was added as part of the "advanced spot search API" by Andrew Dupont just under a month later.
Whilst toying with the idea of searching by Highlights, I realised that there was a big problem with the way I was going to approach recommendations - whilst Highlights were a definitive recommendation, there just weren't enough of them being created to power an app of the scale I imagined. It would work great in places where Gowalla use is pervasive (e.g. Austin) but even in a capital city like London searching by Highlights just doesn't return enough places. What I needed was a way to give a location a score based on a number of criteria such as number of users, highlights, comments, photos, artwork, etc - basically all the data that was available for a spot. The only way that was going to happen was with a huge API change from Gowalla or if I could somehow get a hold of all the data they had...
PubSubHubbub to the rescue
Back in 2010, Gowalla launched a PubSubHubbub hub powered by Superfeedr that allowed for realtime notifications of checkins either by user or spot. Whereas before you'd have to subscribe to an RSS feed and check it every so often for updates, PubSubHubbub allows for you to subscribe to that feed and have them push the data to you as soon as an update is available. It's very similar to the difference between checking mail on your iPhone manually or having it pushed to you automatically.
At the time, I was working on some funky Gowalla item stuff and so I wanted to use realtime notifications so that I could track items around the world. The biggest issue was that I'd need to manually subscribe to each feed which was tricky as there were around 3 million to follow at the time (it would have taken a month for all the subscriptions to go through if I looped through them one at a time and that doesn't include the work to follow new spots as they're created). I had been chatting to Julien Genestoux (CEO and Founder of Superfeedr) about getting access to the Track API which gets around that problem by letting you subscribe to feeds in a much more powerful way (e.g. by keyword, location, or hub). With access to Track, I'd be able to subscribe to every notification, in realtime, that Gowalla pushed out.
It was whilst chatting to Julien after SXSW that I realised that PubSubHubbub could solve my problems. By using the Track API, my server would be hit with data every time somebody checked in anywhere in the world with Gowalla. I could save that information each time and basically build up my own version of Gowalla's database that I could search with a custom ranking engine. With the theory solved, it was time to get practical and put everything to the test.
The Walrus and The Carpenter
After a few nights I was able to put my first test version together codenamed "The Walrus" (purely because it uses feeds and I like the way the oysters say "feed" during the story of "The Walrus and The Carpenter" in Alice in Wonderland). Its task would be to get the spot information from the data Superfeedr were sending and work out if it was a spot I wanted to keep - I automatically remove a lot of nonsense spots (e.g. Airport Gates, Houses, McDonalds) so there was no sense saving what wasn't needed. Only a certain amount of data is sent through such as the name and location of the spot and some of its imagery, but I would need a lot more than that so when a spot was saved to the database for the first time, a note was placed on it to tell me to go and get more information.
At the same time as The Walrus was churning through the data was being sent, a script on the server (codenamed "The Carpenter") would run once a minute and look through the database for any spots with missing data. When it found one, it would go to the Gowalla API and the Foursquare API and get all of the data it needed and then update its record before marking it as suitable for use within the Highlights app.
It took a few revisions (and a massive server upgrade) until everything was running smoothly but it enabled me to process the thousands of checkins that occur every day in realtime. With thousands of checkins coming in every day, I was ready to write an algorithm that would let me search spots based on popularity.
Search by recommendations: Attempt 2
Whilst I originally started by ranking based on checkins and highlights, I was able to do much more than that with the huge dataset I'd amassed. Every spot is now searched based on a huge range of standard criteria (i.e. number of photos taken at a place) as well as time sensitive information that I have such as frequency between checkins and times of day. This means that a new spot in the area can still be ranked higher than an old place with a huge number of visitors as I can see that it is gaining traction faster and will overtake it at some point in the future.
It wasn't until June 2011 when the first version of the app was in beta and I was visiting my parents in Devon that I realised a second big problem. Places with few active Gowalla or Foursquare users were showing very few spots. This wasn't because of a lack of places or data, but just because nobody had checked in anywhere in the last month meant that no alert for me to get the spot data had been sent to "The Carpenter" script. Essentially, whilst my dataset was vast and comprised of millions of active spots, it wasn't showing those niche places (which are often better) that don't get checked into as often. Of course, these places would get added the next time someone checked in but I didn't want to have gaps when the app launched.
To fix this oversight, if the app finds less than 50 results in the area, it will go and do a manual check with Gowalla and get a basic listing of spots with the rubbish ones removed by the same filter "The Walrus" uses. It adds these to the browse page sorted by checkin but it crucially adds them as "pending" to my database so that "The Carpenter" can check them and rank them for next time. Basically, every time you browse an area, the app learns!
Reviews and Photos
Rather than just listing places, I wanted to show more about them when you went through to a detail page. I was able to show a map, address, telephone numbers, and various other bits but the two key areas of information come from Reviews and Photos. With Reviews, a call is made to both the Gowalla and Foursquare APIs in order to get Highlights and Tips from each service and merge them together with the most recent shown first. This gives an excellent and balanced overview of a place that I think works really well. I wanted to add Yelp recommendations as well but unfortunately their API is very poor for matching places or for pulling out recommendations (it limits you to 3 teasers) - I hope to add them in future if their API improves. With Photos, a similar system was used but this time it pulls images from Gowalla, Foursquare, and Flickr thanks to their geolocation API. This again adds a brilliantly balanced view which should always return some results.
The Future
My database currently has over 1 million locations in hundreds of countries with over 7 million checkins recorded, yet this increases every second of the day as people check-in around the world as well as browsing with the Highlights app. I'm constantly tinkering with the algorithm to make sure it gives the best results as well as finding ways to remove duplicate spots or rogue locations but version 1.1 of the app will make it possible to flag up any issues so they can be cleared up immediately.
I wrote Highlights out of a desire to find the best places and I am its biggest user as well as its creator - it has helped me find some great bars as well as finding a dinosaur-themed crazy golf course less than 5km from my house that I knew nothing about! The unique way in which it finds and sorts places means it should be able to find the best places anywhere in the world. In short, I'm confident it will help you whether you are finding places to eat on holiday or are just planning a pub-crawl at university. Give it a try and let me know how you get on with it.