I think by now everyone's heard of the hard time the guys up in Queensland, Australia are having because of the recent floods. People have lost their lives and many others have lost their homes and businesses, the product of years of hard work.
As a result of this massive disaster the Queensland Government decided to run a telethon to encourage donations to help the flood victims. The Telethon aired last Sunday, 09/01/11, on Channel 9 and lasted for 2 hours.
The problem was the existing donations system that the Government had been using so far: it was simply not designed to handle the load we were expecting on Sunday.
That's when my employer, ThoughtWorks, kindly offered a hand to Smart Services Queensland in an attempt to make sure they could receive all the donations that were likely to come through the web.
After that, on the Thursday afternoon before the event, Phillip Calçado, Ben Barnard and I set off on a mission against the clock: we had a little over 48 hours to develop, test and deploy an application that was expected to handle thousands of users. Not only that but an application that, should it fail, would prevent millions of dollars from reaching the people in need in Queensland. This was a great responsibility but we knew we could do it.
Given the time constraints, Ruby on Rails was the obvious choice for this app, both because of the productivity it's known for and because we had the knowledge right there. With that out of the way, we had to decide how and where we would deploy this thing. We thought a little about it and came down to 2 options: Amazon EC2 or Heroku (which is powered by Amazon EC2 under the hood). I pushed hard for Heroku and that's what we ended up going with.
Now it was time to get down and dirty and start coding the app. In principle it was fairly simple. It needed a form where a potential donor would fill out their information, with the option to receive the tax receipt by email or regular mail - more on that later. Upon clicking submit, users would be taken to the secure payment gateway website where they could enter their credit card number and finish the payment, after which they would be taken back to our app with a success - or an error - message and a transaction number.
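To make the last step of that flow concrete, here is a minimal sketch of reading the result when the gateway sends the donor back to the app. It is not the real gateway's contract: the URL and the parameter names (status, txn, reason) are made up for illustration.

```ruby
require "uri"

# Hypothetical result object; the fields are illustrative only.
PaymentResult = Struct.new(:success, :transaction_number, :message)

# Parse the query string of the URL the gateway redirects the donor back to.
# The "status", "txn" and "reason" parameter names are assumptions.
def parse_gateway_return(url)
  params = URI.decode_www_form(URI.parse(url).query.to_s).to_h
  success = params["status"] == "approved"
  message =
    if success
      "Thank you for your donation!"
    else
      "Payment failed: #{params.fetch("reason", "unknown error")}"
    end
  PaymentResult.new(success, params["txn"], message)
end

approved = parse_gateway_return("https://donations.example/return?status=approved&txn=ABC123")
declined = parse_gateway_return("https://donations.example/return?status=declined&txn=ABC124&reason=card+declined")
puts approved.message
puts declined.message
```

Whatever the real parameters look like, the idea is the same: the app never sees the card number, only the outcome and a transaction number it can show to the donor.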
Now this workflow has a couple of implications. First, all emails would have to be sent in the background so as not to interfere with the website's performance. We were expecting to send thousands of them - workers, anyone?
Second, the payment gateway integration would have to be developed and tested from scratch. Up until now the Queensland Government integrated with it in a different manner that could not be reused in this case.
And most important of all: although the app was simple in concept, we had no idea what load we should be preparing for. There was just no data from previous telethons, so we decided to prepare for the maximum we possibly could.
As we developed the application we deployed continuously to Heroku in order to test the payment gateway integration, benchmark the app using ApacheBench (ab), set up cache headers - Heroku uses Varnish - and find bottlenecks.
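For the cache headers part, the idea can be sketched with a bare Rack-style app in plain Ruby. This is an illustration rather than our actual Rails code: the paths, the 300 second lifetime and the response bodies are assumptions.

```ruby
# Hypothetical lifetime for cacheable pages, in seconds.
FORM_TTL = 300

# Rack-style app: takes the request env and returns [status, headers, body].
DONATION_APP = lambda do |env|
  if env["REQUEST_METHOD"] == "GET" && env["PATH_INFO"] == "/"
    # The empty donation form is identical for every visitor,
    # so a shared cache in front of the app is allowed to keep it.
    headers = { "Content-Type" => "text/html",
                "Cache-Control" => "public, max-age=#{FORM_TTL}" }
    [200, headers, ["<form>donation form</form>"]]
  else
    # Anything carrying donor data must reach the app every time.
    headers = { "Content-Type" => "text/html",
                "Cache-Control" => "private, no-store" }
    [200, headers, ["donor-specific response"]]
  end
end

status, headers, _body = DONATION_APP.call({ "REQUEST_METHOD" => "GET", "PATH_INFO" => "/" })
puts "#{status} #{headers["Cache-Control"]}"
```

With a header like the first one, a cache such as Varnish can answer repeat requests for the form itself, leaving the app instances free for the requests that actually record a donation.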
Email was one of the bottlenecks we found, and that's why we decided to handle it in the background using Delayed::Job.
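The pattern itself is easy to show with nothing but Ruby's standard library. The sketch below is a stand-in, not Delayed::Job's API: Delayed::Job stores its jobs in the database and runs them in a separate worker process, while this toy version keeps them in an in-memory queue served by a thread. The class name, the email addresses and the fake 50 ms delivery delay are all made up.

```ruby
# Toy background queue for receipt emails (illustrative stand-in only).
class ReceiptQueue
  attr_reader :delivered

  def initialize
    @jobs = Queue.new
    @delivered = []
    @worker = Thread.new { work }
  end

  # Called from the request path: just records the job and returns.
  def enqueue(email)
    @jobs << email
  end

  # Ask the worker to finish the outstanding jobs, then wait for it.
  def shutdown
    @jobs << :stop
    @worker.join
  end

  private

  # Worker loop: pops jobs one at a time until it sees the stop marker.
  def work
    while (email = @jobs.pop) != :stop
      deliver(email)
    end
  end

  # Stand-in for a slow call to a mail server.
  def deliver(email)
    sleep 0.05
    @delivered << email
  end
end

queue = ReceiptQueue.new
started = Time.now
3.times { |i| queue.enqueue("donor#{i}@example.com") }
puts "requests answered in #{((Time.now - started) * 1000).round} ms"
queue.shutdown
puts "receipts sent: #{queue.delivered.size}"
```

The donor gets a response as soon as the job is queued; the slow part happens later, on the worker, which is exactly the job the 6th Heroku instance ended up doing for us.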
Since the first deployment, we also tweaked a couple of things at Heroku, such as migrating from their free PostgreSQL offering to a dedicated instance that we believed would both take the load and have plenty of room for all the data - as I write this post, we are already well over the 5MB limit they offer for free.
Long story short, by Saturday evening the website was up and running on 5 app instances, with a 6th instance running background jobs - sending emails - and a dedicated PostgreSQL database server.
As Heroku is outside the Government network, their SMTP server was a no-go in the short term, so we also integrated the app with SendGrid, an email delivery service that fit our needs perfectly - although the site got so much traction that we went over our monthly quota with them. But the nice guys from SendGrid increased our limit after I opened a ticket explaining the situation!
As for performance, we used New Relic to monitor the application, which Heroku also makes a breeze to integrate with.
We all went home to rest and get ready for Sunday, the day of the Telethon, when we would be monitoring the app throughout the day. We were all excited and when the show went live, we started seeing all those beautiful access charts moving like crazy, spiking over 720 requests per minute while staying rock solid, with flat and fast response times throughout the night.
In about two hours we had over AUD$2,000,000.00 (two million) donated through our website.
Since then the number of transactions has dropped but stayed steady, and as of today we've received AUD$25,438,518.32 (over 25 million dollars) that will go to the flood victims in Queensland.
Oh, and the site is still up and going strong so move your fingers and go help: telethon.smartservice.qld.gov.au - there will be heaps of people grateful for your donation.