whitehatwiz.com

CrowdStrike IT Outage Affected 8.5 Million

On Friday, July 19, 2024, about 8.5 million Windows devices crashed because of a botched software update. That is less than 1% of all Windows devices worldwide, yet it disrupted businesses and services across the globe. Airlines, banks, hospitals and emergency call centers were all affected, showing just how much damage one update can do to our digital world.

What Caused the Microsoft Systems Outage?

The global outage was traced back to a faulty update from CrowdStrike, one of the largest independent cybersecurity companies, with 30,000+ customers worldwide. The update, a content update for CrowdStrike’s Falcon sensor on Windows, caused affected systems to crash with a BSOD (Blue Screen of Death) and get stuck in a reboot loop: every restart triggered the same error, so the machines could never finish booting.

CrowdStrike’s CEO George Kurtz posted on X (formerly Twitter) that the company was “actively working with customers impacted by a defect found in a single content update for Windows hosts.” He stressed that this was not a security incident or cyberattack, but a technical issue that had been identified, isolated and fixed.

The Aftermath of the Incident

Although the CrowdStrike incident affected less than 1% of all Windows devices, the impact was huge. Air travel was among the hardest-hit industries: on the day of the outage, more than 3,300 flights were cancelled globally. In the US, Delta, American and United had to ground flights for several hours, creating a huge backlog of travelers. Airports in Tokyo, Amsterdam, Delhi and many other cities were also severely disrupted.

Banks were also affected, with outages hitting ATMs, mobile banking apps and call centers. The ripple effect extended to essential services such as hospitals and 911 dispatch centers. Massachusetts General Hospital issued a statement about the impact on its operations and had to cancel all non-urgent surgeries, procedures and medical visits.

Resolving the Issue

CrowdStrike is on top of it. As Kurtz said in his statement, the issue is fixed on CrowdStrike’s end, and the company is working with customers to get it fully resolved. But in an interview on the TODAY show, he cautioned: “It could be some time for some systems that just automatically won’t recover.”

IT pros say it could take days for larger organizations to get back to normal. The BSODs are what make recovery so hard: CrowdStrike has pushed out a corrected update, but many affected machines crash before they finish booting, so they can’t stay up long enough to receive it.

For those affected, CrowdStrike has published manual remediation steps for IT admins. These involve booting Windows into “safe mode” (or the Windows Recovery Environment), deleting the faulty CrowdStrike file from the system’s drivers folder and then rebooting normally. CrowdStrike says it is confident in the permanent fix, but recovery still requires IT admins to get hands-on access to every affected Windows machine, including remote servers.
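CrowdStrike’s published workaround boiled down to four steps: boot into Safe Mode or the Windows Recovery Environment, go to the %WINDIR%\System32\drivers\CrowdStrike folder, delete the file matching C-00000291*.sys, and boot normally. The Python sketch below only illustrates the file-matching and deletion logic. In a real recovery, admins typed the equivalent `del` command at the recovery prompt (where Python is usually not available), and BitLocker-encrypted drives first needed their recovery key. The function names and the `dry_run` flag are hypothetical and not part of any CrowdStrike tool.

```python
import glob
import os

# File-name pattern from CrowdStrike's published workaround for the July 19, 2024 incident.
FAULTY_CHANNEL_FILE_PATTERN = "C-00000291*.sys"


def find_faulty_channel_files(driver_dir: str) -> list[str]:
    """Return the files in driver_dir whose names match the faulty channel-file pattern."""
    # glob.escape stops characters such as '[' in the directory path acting as wildcards.
    pattern = os.path.join(glob.escape(driver_dir), FAULTY_CHANNEL_FILE_PATTERN)
    return sorted(glob.glob(pattern))


def remove_faulty_channel_files(driver_dir: str, dry_run: bool = True) -> list[str]:
    """Delete the matching files, or only report them when dry_run is True."""
    matches = find_faulty_channel_files(driver_dir)
    if not dry_run:
        for path in matches:
            os.remove(path)
    return matches


if __name__ == "__main__":
    # Hypothetical usage: list what would be deleted without touching anything.
    windir = os.environ.get("WINDIR", r"C:\Windows")
    target = os.path.join(windir, "System32", "drivers", "CrowdStrike")
    for path in remove_faulty_channel_files(target, dry_run=True):
        print("would delete:", path)
```

Defaulting to a dry run mirrors good practice for any destructive recovery step: confirm exactly which files match before deleting anything.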

Economic Impact and Future Implications

The economic impact is huge, with early estimates already putting total losses at $1 billion. The incident shows how fragile our tech ecosystem is when so many organizations depend on the same software. It also raises questions about automatic updates and whether the processes in place can handle a global outage.

In short, the CrowdStrike IT outage is a wake-up call about the importance of cybersecurity and sound update processes. As we become ever more dependent on technology, we need to be more vigilant and proactive about cybersecurity. The lessons learned from this incident will shape how cybersecurity tools and system updates are tested and delivered, so that it doesn’t happen again.

What to Do During Major IT Outages

When a major IT outage like the CrowdStrike incident happens, businesses and individuals should already have a plan in place to limit the damage. Here’s what to do:

1. Back up Systems and Data

Back up all critical data regularly so you don’t lose it. Use both local and cloud-based backups for redundancy.
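A backup is only useful if the copy actually matches the original. The minimal sketch below copies a folder and checks each copy against its source with a SHA-256 checksum, assuming the destination is a local path such as an external drive or a mounted cloud-sync folder. The function names are hypothetical; dedicated backup tools add scheduling, versioning and encryption on top of this basic idea.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute a file's SHA-256 digest, reading it in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source_dir: Path, dest_dir: Path) -> list[Path]:
    """Copy every file under source_dir into dest_dir and verify each copy.

    Returns the relative paths that were copied and verified.
    Raises RuntimeError if any copy differs from its original.
    """
    copied = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        dst = dest_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        if sha256_of(src) != sha256_of(dst):
            raise RuntimeError(f"checksum mismatch for {rel}")
        copied.append(rel)
    return copied
```

Failing loudly on a mismatch is deliberate: a silent, corrupt backup is often discovered only during the outage it was meant to cover.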

2. Have a Disaster Recovery Plan

Have a detailed disaster recovery plan that outlines what to do in case of a system failure. Test it regularly to make sure it works.

3. Keep Offline Copies

Keep offline copies of important documents and resources so you can keep operating, at least partially, even when digital systems are down.

4. Use Multi-Factor Authentication

Add an extra layer of security with multi-factor authentication (MFA), which makes it much harder for unauthorized users to get into your systems.
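Many authenticator apps generate their six-digit codes with TOTP, the time-based one-time password algorithm from RFC 6238: the server and the app share a secret, and both derive a short code from it and the current time. The sketch below uses only Python’s standard library and assumes the common defaults (HMAC-SHA1, 30-second steps, 6 digits). The function names are hypothetical; in production you would rely on a vetted library and store secrets securely.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, for_time: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Return the RFC 6238 time-based one-time password (HMAC-SHA1 variant)."""
    if for_time is None:
        for_time = time.time()
    # Number of whole time steps since the Unix epoch, as an 8-byte big-endian counter.
    counter = int(for_time) // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low 4 bits of the last byte pick a 4-byte window.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, submitted: str, for_time: Optional[float] = None,
                step: int = 30, digits: int = 6, window: int = 1) -> bool:
    """Accept a code from the current step or up to `window` steps either side (clock drift)."""
    if for_time is None:
        for_time = time.time()
    for drift in range(-window, window + 1):
        expected = totp(secret, for_time + drift * step, step, digits)
        # Constant-time comparison avoids leaking how many characters matched.
        if hmac.compare_digest(expected, submitted):
            return True
    return False
```

As a sanity check against the RFC 6238 test vectors, `totp(b"12345678901234567890", for_time=59, digits=8)` returns `"94287082"`.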

5. Monitor System Health

Use monitoring tools to watch system performance and health. Detecting anomalies early can prevent an outage or limit its impact.
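One simple, widely used health signal is a heartbeat: each machine checks in periodically, and any machine that goes quiet gets flagged. A host stuck in a reboot loop, as in this incident, stops sending heartbeats. The sketch below assumes you already collect a “last seen” timestamp per host; the 300-second silence limit, the 5% fleet-wide alert threshold and the function names are illustrative assumptions, not recommendations from any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass
class HealthReport:
    silent_hosts: list[str]   # hosts that have not reported within max_silence seconds
    silent_fraction: float    # share of the fleet that has gone quiet
    fleet_alert: bool         # True when the silent share suggests a widespread problem


def check_heartbeats(last_seen: dict[str, float], now: float,
                     max_silence: float = 300.0,
                     fleet_alert_fraction: float = 0.05) -> HealthReport:
    """Flag hosts whose last heartbeat is older than max_silence seconds.

    last_seen maps each host name to the Unix timestamp of its most recent heartbeat.
    """
    silent = sorted(host for host, ts in last_seen.items() if now - ts > max_silence)
    fraction = len(silent) / len(last_seen) if last_seen else 0.0
    return HealthReport(silent, fraction, fraction >= fleet_alert_fraction)
```

For example, `check_heartbeats({"web-1": 1000.0, "web-2": 1290.0, "db-1": 500.0}, now=1300.0)` flags only `db-1`, and because that is a third of the fleet it also raises the fleet-wide alert. Tracking the silent fraction, not just individual hosts, helps tell one broken machine apart from a widespread event.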

6. Have Clear Communication Channels

Make sure all employees and stakeholders know how to communicate during an outage. This includes having backup communication methods like mobile phones or alternative email systems.

7. Train Employees

Train employees on how to respond to IT outages. This includes not panicking, following the disaster recovery plan and knowing who to call for help.

8. Keep Software Up to Date

Although this outage was caused by an update, keeping software up to date is still essential to protect against known vulnerabilities. Test updates in a controlled environment before deploying them across your organization.
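One common way to test updates before wide deployment is a staged (“ring”) rollout: install the update on a small group first, check that those machines are still healthy, and only then move on to the next, larger group. The sketch below illustrates that gating logic. The `Ring` type, the `deploy` and `is_healthy` callbacks and the zero-failure threshold are hypothetical placeholders for your own tooling; this is not a description of CrowdStrike’s internal process.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Ring:
    name: str          # e.g. "test lab", "early adopters", "everyone else"
    hosts: list[str]


def staged_rollout(rings: list[Ring],
                   deploy: Callable[[str], None],
                   is_healthy: Callable[[str], bool],
                   max_failure_rate: float = 0.0) -> tuple[list[str], bool]:
    """Deploy an update ring by ring, halting if a ring's failure rate is too high.

    Returns (names of rings that received the update, whether the rollout halted).
    """
    completed: list[str] = []
    for ring in rings:
        for host in ring.hosts:
            deploy(host)
        # In practice you would let the ring "soak" for a while before checking health.
        failures = sum(1 for host in ring.hosts if not is_healthy(host))
        failure_rate = failures / len(ring.hosts) if ring.hosts else 0.0
        completed.append(ring.name)
        if failure_rate > max_failure_rate:
            return completed, True  # later rings never receive the bad update
    return completed, False
```

The point of the design is blast-radius control: a bad update that crashes machines in an early ring stops there instead of reaching every host at once.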

9. Isolate

If an outage does happen, isolate the affected systems to stop the problem from spreading. This helps contain the damage and speeds up recovery.

10. Talk to IT and Vendors

Stay in touch with your IT team and vendors. They will be able to give you updates and guidance on what to do.

11. Document

Keep a record of the outage, what happened, how you fixed it and what you learned. This will be useful for next time and for improving your response.

By following these steps, businesses and individuals can handle major IT outages better and reduce their impact.

Conclusion

The CrowdStrike IT outage is a stark reminder of how important cybersecurity and patching are to our infrastructure. Because it hit so many sectors at once, it showed how dependent we are on technology working seamlessly. CrowdStrike fixed the issue quickly, but the economic impact and operational disruption show how important testing and planning are. This will no doubt influence how we approach cybersecurity and system updates going forward, helping us harden our defences against similar disruptions and make our connected world more stable.