A quick post-mortem of Friday's world record attempt. We didn’t get enough people into the game to break the record. Our peak was 997 concurrent players; we needed 4,076, and we were architected, designed for and expecting somewhere between 50,000 and 70,000.
The technology held up wonderfully. All our servers performed flawlessly, the diagnostic information system and all our data recording and logging were accurate, and the UI delivered a smooth and engaging experience. I know it is easy to say that when we were only running at 2% load, but there are a lot of moving parts to the system and, while we have been doing automated testing of the game, this was its first time with more than 100 real users.
We were told to expect there would be 2,000,000 people trying to join the game. To protect the game servers, we built a gating and throttling mechanism (called Zuul) that would allow us to control the number of players in each data centre and the rate at which we let them in. Once a player has been admitted to the game, we pick a game server (Gaia) based on how busy each one is. Initially the aim was not to spread the load, but to bunch the players together to ensure there were always people to shoot at ;-) The default behaviour is to prioritise servers with fewer than 200 players on them, then to choose the one that is furthest from having 200.
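For anyone who wants to see that default rule written down, here is a minimal sketch of it in Python. It is not the real Gaia selection code: the `GameServer` type and `pick_server` function are made-up names for illustration, and the fallback for when every server already has 200 or more players is purely an assumption, since that case isn't covered above.

```python
from dataclasses import dataclass

TARGET_PLAYERS = 200  # the threshold quoted in the post


@dataclass
class GameServer:
    # Hypothetical stand-in for a Gaia game server.
    name: str
    players: int


def pick_server(servers, target=TARGET_PLAYERS):
    """Pick a server for a newly admitted player.

    Rule as described: prioritise servers with fewer than `target`
    players, then choose the one furthest from having `target`.
    """
    if not servers:
        raise ValueError("no game servers available")

    under_target = [s for s in servers if s.players < target]

    # Assumption (not from the post): if every server is at or above
    # the target, fall back to considering all of them.
    candidates = under_target or list(servers)

    # Among servers below the target, "furthest from the target" is the
    # one with the fewest players. In the fallback case this simply
    # picks the least busy server.
    return min(candidates, key=lambda s: s.players)


if __name__ == "__main__":
    servers = [
        GameServer("gaia-1", 150),
        GameServer("gaia-2", 20),
        GameServer("gaia-3", 250),
    ]
    print(pick_server(servers).name)
```

Read this way, a server with only a handful of players on it always wins over one that is nearly at 200.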
On Friday, when we made the game live, we gave Zuul 2,000 user tokens per data centre, an initial capacity of 12,000 players: timid compared with what we expect to do, but still 3x the world record. Then, they didn’t come! As people arrived on a sparsely populated server, they started flying off to try and find other players. Some of them will have flown into areas of space that other servers were responsible for, causing them to transition to the new server. This in turn caused the code selecting the server for a new player to try and fill more and more servers. It meant that we never really got critical mass on any one server, the gameplay never reached the levels of excitement we had designed for, and it was time-consuming to find an opponent.
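The token allocation above is easy to picture as a simple counter per data centre. The sketch below is only an illustration, not how Zuul is actually built: the `AdmissionGate` class and the data centre names are hypothetical, the count of six data centres is inferred from 12,000 / 2,000, and the rate-limiting half of Zuul is left out entirely.

```python
class AdmissionGate:
    """Toy per-data-centre token gate (hypothetical, not the real Zuul)."""

    def __init__(self, tokens_per_datacentre, datacentres):
        self._limit = tokens_per_datacentre
        self._free = {dc: tokens_per_datacentre for dc in datacentres}

    def try_admit(self, datacentre):
        # Admit a player only if that data centre still has a free token.
        if self._free.get(datacentre, 0) <= 0:
            return False
        self._free[datacentre] -= 1
        return True

    def release(self, datacentre):
        # Return a token when a player leaves, never exceeding the limit.
        if datacentre in self._free:
            self._free[datacentre] = min(self._limit, self._free[datacentre] + 1)

    def free_tokens(self):
        # Total number of players that could still be admitted.
        return sum(self._free.values())


if __name__ == "__main__":
    # Assumption: six data centres, inferred from 12,000 / 2,000.
    names = [f"dc{i}" for i in range(1, 7)]  # made-up names
    gate = AdmissionGate(2000, names)
    print(gate.free_tokens())      # initial capacity across all data centres
    print(gate.try_admit("dc1"))   # first player into dc1
```

With a peak of 997 players against that capacity, a gate like this would never have had to turn anyone away on the day.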
Now, I don’t believe for a second that a change to that stacking algorithm would have given us the number of people we had designed the system for, but it might have reduced the churn of players enough to allow us to break the world record.
This leaves the questions: “were we just too ambitious?” and “should you kick yourself for following the spec if hindsight shows it was based on some poor assumptions?”. I know the answer to both of those is no, but I’ve still got to ask ;-)
My huge thanks to Marcus Tillett, Matt Warren and Stuart Lodge for all their hard work over the last couple of months. The quality of the code you built and the ownership you took of your parts of the project are humbling.
More details of how the services work together and an explanation for our esoteric naming strategy to follow.
Age of Ascent has taken her maiden flight. I can’t wait to take her out again to see what she can really do!