Issue #9 – Long-term Maintenance – The FB outage
Projects that have managed to stay alive for years go through many changes, which is easy to understand given how many new technologies and practices are released every year.
Developers dedicate a lot of time to developing and architecting a project to be scalable, but often forget about the processes and frameworks needed to keep that standard of scalability.
Software is typically meant to last for 20 years, and even with all the new advancements, that can still be achieved with the proper framework. Maintenance management, if properly executed, can be an essential component of well-functioning and long-lasting software.
Facebook experienced a week of outages and lost billions in market value, and maybe users, in the process. But in internet time, that was ages ago. For those more interested in the technical side, we will take a deeper dive into what occurred.
It is no secret that Facebook runs a vast number of machines across the data centres that make up the behemoth of a social network. The team refers to the network connecting those data centres as the backbone in a blog post published shortly after the incident.
Let us first look at DNS, which is popularly known as the address book of the internet: it translates domain names into the IP addresses machines use to reach each other. Then there is BGP (Border Gateway Protocol), the protocol used by the routers that connect networks such as the backbone to the rest of the internet. It is responsible for exchanging routing information between networks, so that each one knows how to reach the others.
We refer back to the article, as the team does an excellent job of explaining how its network works and what systems are in place to handle smaller outages. In short, a faulty configuration change issued during a maintenance run took down the connections on the backbone, cutting off all of their data centres. The routes that BGP normally advertises for Facebook were withdrawn as a result, effectively disconnecting FB from the internet.
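To make the failure mode more concrete, here is a small toy model in Python. It is only a sketch and not how Facebook's network or real BGP works: the class ToyInternet, the domain social.example and the addresses (taken from the documentation ranges) are all made up for illustration. It shows that a DNS server can be perfectly healthy and still be useless once the route to it has been withdrawn.

```python
import ipaddress

# Hypothetical data for illustration only.
DNS_RECORDS = {"social.example": "192.0.2.10"}   # name -> IP held by the DNS server
DNS_SERVER_IP = "198.51.100.53"                  # where the DNS server itself lives


class ToyInternet:
    """A toy routing table: which address prefixes are currently announced."""

    def __init__(self):
        self.routes = {}  # ip_network -> name of the network announcing it

    def announce(self, prefix, origin):
        """Advertise a prefix, roughly what a BGP announcement achieves."""
        self.routes[ipaddress.ip_network(prefix)] = origin

    def withdraw(self, prefix):
        """Stop advertising a prefix, roughly what a BGP withdrawal achieves."""
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def reachable(self, ip):
        """An address is reachable only if some announced prefix contains it."""
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in self.routes)

    def resolve(self, name):
        """Ask the DNS server for a name; fails if there is no route to the server."""
        if not self.reachable(DNS_SERVER_IP):
            raise LookupError("no route to DNS server")
        return DNS_RECORDS[name]


if __name__ == "__main__":
    net = ToyInternet()
    net.announce("198.51.100.0/24", "AS-EXAMPLE")
    print(net.resolve("social.example"))

    # The DNS records are untouched, but nobody can reach the server any more.
    net.withdraw("198.51.100.0/24")
    try:
        net.resolve("social.example")
    except LookupError as err:
        print("lookup failed:", err)
```

The point of the sketch is that nothing in DNS_RECORDS changes between the two lookups; only the route disappears.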
Incidents like this probably occur all the time on a smaller scale, so Facebook has to be prepared for significant outages like this one. Now your systems may not be as extensive as Facebook’s, but you too can set up services, rules and practices on your network layer that help prevent similar things from happening to you.
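One such practice is a pre-flight check that refuses a change if it would take away too much at once. The snippet below is a minimal sketch of that idea in Python. The function validate_change and its parameters are hypothetical names invented for this example, not part of any real tool, and a real check would look at far more than a list of route names.

```python
def validate_change(current_routes, routes_to_withdraw, min_remaining=1):
    """Decide whether a planned withdrawal is safe to apply.

    Returns a (ok, reason) tuple. All names here are illustrative.
    """
    current = set(current_routes)
    planned = set(routes_to_withdraw)

    # Refuse changes that mention routes we do not know about.
    unknown = planned - current
    if unknown:
        return False, f"unknown routes: {sorted(unknown)}"

    # Refuse changes that would leave too few routes announced.
    remaining = current - planned
    if len(remaining) < min_remaining:
        return False, (
            f"change would leave {len(remaining)} route(s); "
            f"at least {min_remaining} required"
        )

    return True, "ok"


if __name__ == "__main__":
    routes = ["198.51.100.0/24", "203.0.113.0/24"]
    print(validate_change(routes, ["203.0.113.0/24"]))
    print(validate_change(routes, routes))
```

A check like this is only as good as its own testing, so it is worth running it against deliberately bad changes to confirm it actually blocks them.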
We’ve already delved into a few networking concepts and briefly mentioned DNS’s role in making up the internet. Amazon Route 53 is a service built to provide DNS in the cloud, allowing developers to manage traffic between applications set up in their internal cloud environment and applications sitting externally.
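As a rough idea of what working with Route 53 looks like from code, the sketch below builds the payload for creating or updating an A record and passes it to boto3, the AWS SDK for Python. The helper names build_upsert_batch and apply_batch, the domain app.example.com, the IP address and the hosted zone ID are placeholders chosen for this example. It assumes boto3 is installed and AWS credentials are configured, so treat it as a starting point and check the current boto3 documentation before relying on it.

```python
def build_upsert_batch(name, ip, ttl=300):
    """Build a change batch that creates or updates an A record."""
    return {
        "Comment": f"Point {name} at {ip}",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or update it if it exists
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": ip}],
                },
            }
        ],
    }


def apply_batch(hosted_zone_id, batch):
    """Send the change batch to Route 53 (needs boto3 and AWS credentials)."""
    import boto3  # third-party AWS SDK, imported here so the builder works without it

    client = boto3.client("route53")
    return client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=batch,
    )


if __name__ == "__main__":
    batch = build_upsert_batch("app.example.com.", "192.0.2.10")
    print(batch)
    # apply_batch("ZONE_ID_PLACEHOLDER", batch)  # uncomment with a real hosted zone ID
```

Keeping the payload builder separate from the API call makes the change easy to review, or to run through a check, before anything touches live DNS.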
The event where the tech giant announces new products, designs, and upgrades took place this week. Expect a flood of memes and news around the event throughout the week.
An open-source desktop application for managing your Kubernetes instances and containers without the need for a registry.
A pick close to home, Paystack has made a few updates to their subscription feature, allowing developers to provide their users with more control.