Downtime Update | March 11th 2021
Just after midnight UTC on March 10, all of our services went down. We attempted to log onto the servers within minutes to investigate the problem, but unfortunately one of the database servers that powers the site was not responding. Our provider were reporting "a disturbance" which was being looked into, and an increasing number of services at their location were going offline.
A few hours later, around 3am, it was reported that our server provider had a major fire in one of their locations. As the situation unfolded it became clear that this fire had completely destroyed one datacentre at the location, and severely damaged another. Thankfully nobody was injured in the fire, but we had multiple servers hosted in those buildings. Those have been irrecoverably destroyed.
Naturally this has caused disruption to our website and app, which were down for approximately 37 hours whilst we moved to a new provider and completely rebuilt our infrastructure there. It wasn't just us - it seems like up to 3.6 million websites and services were also affected by the outage.
We've now restored everything from the most recent backups we could obtain, but we regret to say that in some cases this has led to missing data on the website. We've been unable to retrieve our most recent backups, which were on a server that's powered off due to the fire situation. As this won't be available until next week at the earliest (if at all), we've had to make do.
The missing data includes, but is not limited to, trades, reports, support tickets, chat messages and user accounts. We appreciate this may be a confusing and frustrating situation, but we've done everything we can to load the most recent data available to us. In some cases data is available from yesterday morning, but in most cases it will be around a week out of date.
To prevent confusion we have removed all existing trades from the site. While we wanted to keep them, we felt that the percentage of trades that were no longer valid would be too high.
Although this incident was entirely out of our hands, we have identified a number of places where we should be doing better. While we do perform backup restoration tests, our backups from the past few days were not stored far enough off-site to escape the fire. Even if they had been available, that data would still have been up to 15 hours out of date due to our backup schedule. Likewise, while our file backups appeared okay, the archives have only been partially recoverable. We thought everything was okay, but in retrospect we should have been doing more. It's a painful lesson, but also a very useful one.
We're reviewing the situation and will keep you updated as we decide what to do going forwards. We had planned on moving our infrastructure to a different provider in the near future anyway, but this incident has forced us to pick on the spot without as much thought as we'd like. More downtime may be required as we adjust to our new home, but we'll keep this to a minimum. Follow us on Twitter for the latest updates.
We hope you understand the trouble we've been through, and we do apologize for the inconvenience. A huge thanks needs to go on record to vicegold and our new "server guy" Toby who went above and beyond yesterday, working for almost 18 hours straight to allow this to be possible. It was a stressful day, to say the least. Plans were set back by the backup situation, the sheer size of our backup files, and then some connectivity issues with supporting services later in the day. All of this combined to mean we couldn't relaunch last night in a way that we were confident enough in.
A lot of you have asked about the cause of the fire - that's not yet known. Articles on TechRadar and the MalwareBytes Blog cover it well, and there are plenty more out there should you wish to read up. OVH are a major player in the server space, so this situation may be in the news for a while as they recover. Questions also have to be asked about how things got this bad, but for now we're just glad that nobody was hurt in the fire.
PS: While we understand that not being able to trade is frustrating, this situation was out of our control. A lot of users rated our app with 1 star due to the downtime. If you've rated us with 1 star, please reconsider your rating as it would help us a lot. If you haven't rated us yet, you can do so on the Apple App Store and on Google Play.
No, the data held on the servers in SBG2 will most likely not be recoverable.
But if you use the #OVH Backup FTP, your backups are in another datacenter. (The photos are from the firefighters) https://t.co/tRonks8qIw pic.twitter.com/EL5BcMjbc7