Getting Xbox Live Achievements Data: Part 1 - The PHP Problems
Those of you with an Xbox 360 (or indeed some "Games for Windows" titles) will know all too well about the achievements system prevalent in every game. For those that don't know, every gamer has a profile with a gamerscore. This score goes up by completing certain tasks within each game as laid down by the developer. These could be things you would do anyway, such as "finish the game", or something random, such as "destroy 10 cars in 10 seconds". Every full game can give out 1000 gamerpoints (1250 with expansion packs) and an Xbox Arcade title can give out 200. These points are somewhat of a geek badge of honour for most Xbox gamers, who will try everything to get the full 1000 in each of their games. There are also those who want to increase the number as quickly as possible, so you can find numerous guides online for the easiest ways to earn points (it seems Avatar is still the best, giving you the full 1000 with about 3 minutes of gameplay!)
When I was trying to compete with my ex-boss over the number of gamerpoints we each had (I lost, by the way), I found that there was no public API from Microsoft to allow you to get at the Xbox Live data. There was, however, an internal API, and one Microsoft associate had set up a RESTful API so that you could publicly call the internal one. This worked well enough for the basic site I put together to compare two gamerscores, but I've been wanting to do more with the API for some time.
My overall idea is that I'll be able to type in my userid and then have my server poll Xbox Live at a certain time and then update my Facebook Wall when I unlock new achievements. The message would be something along the lines of "Ben just completed the 'Fuzz Off' achievement in Banjo Kazooie: N&B and earned 20G" and would have the correct 32x32px image for the achievement. I initially thought that this would be fairly easy but I was unfortunately very wrong! In this series, I'm going to show you the problems I encountered as well as the final (rather complex) workaround I'm creating in order to get it all to work! If you've got any questions, please leave a comment or get in touch.
Attempt #1: Using the API
When I first sat down to work on this project, my initial thought was "I can just reuse the public API I used for my gamerscore comparison site - there's bound to be an achievement section in the returned data". After eagerly re-downloading all the code, I discovered that although there was some achievement data, it was nowhere near as detailed as the information I would need. The problem was that the API only shows your recently played games, how many achievements you have unlocked in each one, and the overall number of points you have earned for that game. Theoretically, I could check the API every few minutes and compare the number of points with a local copy in order to work out when a new achievement had been unlocked, but I'd only be able to say that an achievement worth a certain number of points had been unlocked in a certain game. To make things even trickier, if I unlocked more than one achievement within the timeframe of the API check, the results would be wrong (e.g. it might say I'd unlocked one achievement worth 45G when in fact I'd done two: one for 20G and one for 25G). This would become even more complex if I unlocked an achievement, then switched games and unlocked one there before the API had been called again. In short, the public API, useful though it can be, was not going to work for this.
Attempt #2: Screen Scraping
So now we move to option two: screen scraping. This is the process of getting the server to request pages from a website as if it were a browser and then just ripping the content out of the HTML. It's messier than an API as it relies on the website's HTML not changing, and it's also a lot more processor intensive (as you're parsing an entire XHTML page - possibly marked up invalidly - rather than a nice small XML or JSON file). I've done lots of screen scraping in the past, both for my Tube Updates API and for the Packrat Market Tracker (a tracking system for a Facebook game), so I didn't think it would be too much hassle. But then I hadn't banked on Microsoft...
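To give a rough idea of what this looks like in PHP, a basic scrape is only a few lines - note that the URL, user agent, and markup below are illustrative placeholders, not the real Xbox Live pages:

```php
<?php
// A minimal scraping sketch - the URL and the XPath query are
// illustrative placeholders, not the real Xbox Live markup.
$ch = curl_init('http://example.com/profile');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return HTML as a string
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0'); // pretend to be a browser
$html = curl_exec($ch);
curl_close($ch);

// The markup may well be invalid XHTML, so suppress parser warnings.
$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Rip out whatever nodes we care about (hypothetical class name).
foreach ($xpath->query("//div[@class='achievement']") as $node) {
    echo trim($node->textContent) . "\n";
}
?>
```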
The first hurdle is that although my Xbox Live data is set to be shown publicly, you still have to be logged in with a Windows Live account to view it. This is annoying because it means my script is going to have to log in to Windows Live in order to get the HTML of my achievements listings. The second hurdle is that there is no single page listing my latest unlocked achievements. The main profile page shows my last played game (and its last unlocked achievements), but that's no good as they are not in order, and it might be that I've switched games after unlocking something, so the last achievement on the profile page may not be the last achievement I've unlocked. This isn't such a big problem as there is a page for each game, so I'll just have to crawl the page for each of my recently played games and get the achievements from each one. It's slightly more hassle than having one page of latest achievements, though, as it means making several requests, increasing bandwidth and script run time.
Logging In to Windows Live
Generally, logging in to a site is quite easy using cURL. You need to work out where the form is posting to, put all of the data to be posted in an array, and then make a cURL request that sends that array to that URL. You will also need to enable both a cookie file and a cookie jar (a basic text file that is used for all of the cookies during the requests), as you will probably only want to log in once and then have each future request know that you are already logged in - this will save on overall requests per execution of the task.
The Windows Live login, on the other hand, is an entirely different kettle of fish! The URL you are posting to changes on each request, as do the variables that you are posting. This means we need to make a request to the login page first of all and extract all of the data from the hidden input fields, as well as the action attribute of the form. We can then go about posting that data (along with our email address and password) to the URL we just extracted. This POST goes through an HTTPS connection, though, so we need to modify our cURL request further to ensure that SSL certificates are just accepted without question. Our overall cURL request, with all of these options, will look roughly like this (the credential field names below are placeholders - the real names, like everything else, have to be scraped from the form itself):
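```php
<?php
// First request: grab the login page so we can pull out the dynamic
// form action and hidden input values (they change on every request).
$ch = curl_init('https://login.live.com/login.srf');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // accept SSL certs blindly
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_COOKIEFILE, '/tmp/cookies.txt'); // read cookies
curl_setopt($ch, CURLOPT_COOKIEJAR, '/tmp/cookies.txt');  // store cookies
$page = curl_exec($ch);

// Extract the form action and every hidden input (the names vary per
// request, so we match them all rather than hard-coding anything).
preg_match('/action="([^"]+)"/', $page, $m);
$action = $m[1];
$fields = array();
preg_match_all('/<input type="hidden" name="([^"]+)" value="([^"]*)"/', $page, $inputs, PREG_SET_ORDER);
foreach ($inputs as $input) {
    $fields[$input[1]] = $input[2];
}

// Add our credentials - these field names are placeholders, as the real
// ones have to be read from the form itself.
$fields['login'] = 'me@example.com';
$fields['passwd'] = 'my-password';

// Second request: POST the lot to the extracted action URL over HTTPS.
curl_setopt($ch, CURLOPT_URL, $action);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$response = curl_exec($ch);
curl_close($ch);
?>
```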
I had thought that this would be the end of it and that the returned data would be the first page after logging in to Windows Live. Instead, I got nothing. Absolutely nothing. No matter what settings I tinkered with or parts of the code I changed, it just returned a blank response. It was then that I noticed the rather unpleasant JavaScript files on the page and some suspicious <noscript> code at the top. If you load the login page without JavaScript in a normal browser, the code in the <noscript> section gets read; it contains a meta redirect that sends you to a page telling you that you must have JavaScript enabled! I hadn't noticed this previously because my cURL request doesn't interpret HTML - it just returns it as one big lump - so I was able to extract all of the variables without being redirected as I would be in a normal browser.
I didn't think too much of this as the page obviously worked without JavaScript - it must just be a rudimentary way to make people upgrade their browser (although it didn't actually give you any advice - very bad usability!). But no, the login really does require JavaScript: when you submit the form, a huge amount of obfuscated code does some crazy code-fu to the POST request and encrypts it all before sending, making JavaScript a hard requirement for logging in to Windows Live. To my mind, this has obviously been done to prevent people from screen scraping sites such as Hotmail, but it really is a pain!
The AppleScript Idea
It was about 3am by the time I realised that screen scraping wasn't going to work, and I'd been playing with the code for around 5-6 hours, so I was pretty annoyed with it. So today I sat down and listed all of the obstacles in order to work out a way around them:
- The data from the API wasn't detailed enough, so it couldn't be used
- Although I could screen scrape the Xbox Live profile page and game pages, I couldn't get to them as I needed to be logged in to Windows Live
- I couldn't log in to Windows Live without JavaScript
After writing this down and having a think, I realised that I have a static IP address and a Mac Mini that is always turned on and connected to the internet. I also realised that all my server needed to parse the Xbox Live pages was the HTML itself - it didn't necessarily have to come from a cURL request or even from my server. After this 'mini' enlightenment, I set about writing a plan that would get around the Windows Live login using a combination of a server running some PHP and cURL requests and a Mac Mini running some AppleScript. It will work roughly like this...
The server will store a record of all of my game achievements in a MySQL database. It will therefore know my gamerscore and be able to compare it to the gamerscore found using the API. Every five minutes it will check this, and if it notices a difference between the numbers, it will know that I have earned an achievement and thus needs the HTML that eluded me yesterday. It knows the URL it needs, so it will log this in a text file on the server that will be publicly available via a URL.
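Sketched out in PHP, that cron job might look something like this - everything here (table names, file paths, the API URL and its response fields) is a placeholder until I actually build the thing:

```php
<?php
// Hypothetical cron script - table, paths, and the API URL are
// placeholders until the real system is built in part two.
$db = new mysqli('localhost', 'user', 'pass', 'xbox');

// Gamerscore according to our local achievements database.
$row = $db->query('SELECT SUM(points) AS score FROM achievements')->fetch_assoc();
$localScore = (int) $row['score'];

// Gamerscore according to the public API (hypothetical endpoint/fields).
$api = json_decode(file_get_contents('http://example.com/xbox-api/profile/mygamertag'), true);
$liveScore = (int) $api['gamerscore'];

// If the numbers differ, an achievement has been unlocked somewhere -
// queue up the game page URLs for the Mac Mini to fetch.
if ($liveScore !== $localScore) {
    $urls = '';
    foreach ($api['recentgames'] as $game) {
        $urls .= $game['achievementsurl'] . "\n"; // hypothetical field
    }
    file_put_contents('/var/www/public/urls.txt', $urls);
}
?>
```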
Meanwhile, the Mac Mini will use AppleScript to check the URL list on the server every five minutes. If it finds a URL, it knows that the server needs some HTML, so it will oblige by loading the URL in Safari (which will be permanently logged in to Windows Live thanks to authenticating once and choosing "save my email address and password", which stores a cookie) and then grabbing the source of the page and dumping it in a text file on the Mac Mini.
The text file on the Mac Mini (with the HTML we need) will be available to my server thanks to my static IP, so when the next CRON job on the server runs, it will see that it wanted some HTML (based on there being some URLs in its own text file) and will check the text file on the Mac Mini, thus getting the HTML it needs. It can then parse this, work out the new achievements, and log them in the database accordingly. It will then clear the URL list (so that the Mac Mini doesn't try to do an update when it doesn't need to) and continue on its cycle by checking whether the gamerscore is equal to the (now updated) database.
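The second half of the cycle, again only a sketch with placeholder hostnames, paths, and markup:

```php
<?php
// Second half of the hypothetical cron job - the hostname, paths, and
// markup are all placeholders at this point.
$db = new mysqli('localhost', 'user', 'pass', 'xbox');

// Only do anything if the last run queued up some URLs.
if (trim(file_get_contents('/var/www/public/urls.txt')) !== '') {

    // Fetch the HTML the Mac Mini dumped behind my static IP.
    $html = file_get_contents('http://my.static.ip/html.txt');

    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from invalid markup
    $xpath = new DOMXPath($doc);

    // Log each newly unlocked achievement (hypothetical markup).
    foreach ($xpath->query("//div[@class='unlocked']") as $node) {
        $name = $db->real_escape_string(trim($node->textContent));
        $db->query("INSERT INTO achievements (name) VALUES ('$name')");
    }

    // Clear the queue so the Mac Mini doesn't fetch again needlessly.
    file_put_contents('/var/www/public/urls.txt', '');
}
?>
```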
The Next Step
So, after a failed evening's development, I have now come up with a solid plan to get around several key hurdles. I'll be posting part two of this series shortly, once I have built the application, and it will include all of the source code for those of you who want to replicate a similar system. In the meantime, I hope this post shows that problems do pop up in application development and that they can be resolved by writing out a list of each hurdle before formulating a plan to get around them.
Update: Part two of this tutorial is now available.