On Monday, 4 October 2021, at around 8:45 pm IST, the interlinked social media platforms Facebook, Facebook Messenger, WhatsApp and Instagram suffered a global outage that lasted more than 6 hours. Through much of the day, users complained of errors that eventually turned into a complete worldwide shutdown of these apps, and the stock price of the Silicon Valley firm that owns them fell by about 5%. Standard Media Index, an ad measurement firm, estimated that Facebook lost over 545,000 USD per hour in revenue during the outage.
When more than 10.6 million users reported the breakdown (they were unable to send or receive messages or media of any kind), Facebook officials took to Twitter to apologise for the inconvenience. They stated that the company was facing networking problems and that its teams were working together to fix the issue and restore the apps as early as possible.
After a wait of about 6 hours, the Facebook-powered services were finally restored and responsive. Billions of users across the world were affected by the outage across all 4 platforms. Downdetector, a firm that monitors global internet issues, described it as one of the longest outages in the history of the Facebook-connected platforms.
What was the actual reason behind the outage?
According to media reports, users experienced the fault as a ‘Domain Name System’ (DNS) error when trying to load Facebook’s pages. The underlying cause, however, was the withdrawal of Facebook’s BGP (Border Gateway Protocol) routes, which affected all the connected apps. BGP is not part of DNS: it is the protocol that networks use to announce to one another which IP address ranges they can reach, and so it determines which way internet traffic should proceed. DNS’s job is to convert a domain name, e.g. ‘facebook.com’, into the IP addresses of the servers behind it. On Monday, DNS was unable to do that for Facebook. Once the routes were withdrawn, Facebook’s own DNS servers could no longer be reached, so its DNS data was suddenly unavailable, rendering the apps and web addresses useless. Experts believe that the root issue was a misconfiguration (a code push) on Facebook’s routers, which cut its custom-made network off from the global internet.
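The dependency between BGP routes and DNS can be illustrated with a small toy model. This is only a sketch under stated assumptions, not a reconstruction of Facebook's network: the domain `example-social.test`, the addresses (taken from reserved documentation ranges) and the helper names `is_reachable` and `resolve` are all hypothetical. It shows that a name can only be resolved while a route to its nameserver is still announced.

```python
import ipaddress

# Hypothetical prefix announced via BGP; it contains the DNS servers.
ANNOUNCED_ROUTES = {ipaddress.ip_network("192.0.2.0/24")}

# Hypothetical authoritative nameserver for the domain (inside the prefix above).
NAMESERVERS = {"example-social.test": ipaddress.ip_address("192.0.2.53")}

# Hypothetical DNS record the nameserver would hand out if it could be reached.
ZONE = {"example-social.test": "198.51.100.10"}


def is_reachable(address, routes):
    """Return True if any announced prefix covers the address."""
    return any(address in network for network in routes)


def resolve(name, routes):
    """Resolve a name, but only if its nameserver is reachable via the routes."""
    nameserver = NAMESERVERS[name]
    if not is_reachable(nameserver, routes):
        # The record still exists, but nobody can reach the server to ask for it.
        raise LookupError(f"no route to nameserver {nameserver} for {name}")
    return ZONE[name]


if __name__ == "__main__":
    routes = set(ANNOUNCED_ROUTES)
    print("before withdrawal:", resolve("example-social.test", routes))

    routes.clear()  # simulate the BGP routes being withdrawn
    try:
        resolve("example-social.test", routes)
    except LookupError as error:
        print("after withdrawal:", error)
```

In this model the DNS data itself is never deleted; it simply becomes unreachable once the routes disappear, which is why the outage looked like a DNS failure from the outside.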
How did the Facebook team finally resolve it?
During the outage, the company’s internal tools (including its own email system and security badge applications) were also down, which is why staff could not immediately work on the issue. Employees were reportedly unable to enter or leave the company’s headquarters until the crisis was resolved. The teams took about 6 hours to debug the issue, after which the platforms were slowly and carefully restored to normal. The fault that hit the Facebook-powered services proved to be complex, which is why the firm took so long to bring them back. At the time of writing, Facebook had not yet issued an official report giving full details of the incident.