AWS went down again and took down a whole host of websites – now we know why

Cloud computing service AWS has suffered another significant outage, taking down a large number of popular websites, and now we know the cause.

The outage began around 7am PT/3pm GMT, with thousands of incident reports flagged on tracker site DownDetector.

Although it ultimately lasted just over an hour and a half, the outage affected multiple high-profile customer sites, with users across the globe reporting issues. The outage is now more or less over, and the culprit appears to be another mistake (it is unclear whether automated or human) in the system that handles network loads. In any case, you can relive it all in our live blog below…

It’s happened again – we’re seeing multiple reports of AWS going down, causing issues across a number of high-profile sites.

As mentioned, thousands of complaints have landed on DownDetector, with users across the US, Europe and Asia all reporting AWS issues.

This has led to a knock-on effect for other popular websites that are hosted on AWS services, which also appear to have gone offline.

According to DownDetector, the likes of Hulu, Intuit QuickBooks and DoorDash have all seen issues, as has

Downdetector services hit by AWS outage

(Image credit: Future / DownDetector)

Video game services appear to be particularly affected, with PlayStation Network, Twitch, League of Legends, Valorant, Apex Legends and Halo all seeing problems.

The official AWS service status dashboard isn’t showing any major issues as yet, but the site itself is very slow to load, possibly indicating something is going wrong.

The only issues currently displayed concern “AWS Internet Connectivity” in Northern California and Oregon, home to the AWS US-WEST-1 and US-WEST-2 regions respectively.

AWS says it is “investigating Internet connectivity issues to the US-WEST-1 Region.”


Not exactly the “happiest place on Earth” at the moment, it seems….

It seems the issues are affecting both the US-WEST-1 and US-WEST-2 AWS regions – two major regions for the company, and home to a huge number of customers.

This could be why a large number of sites and tools are currently down – DownDetector is showing other services such as Zoom, Okta, Salesforce and Crunchyroll also affected.

AWS says it may have the issue in hand – the latest update on the AWS Status Dashboard notes:

“We have identified the root cause of the Internet connectivity to the US-WEST-1 Region and have taken steps to restore connectivity. We have seen some improvement to Internet connectivity in the last few minutes but continue to work towards full recovery.”

Downdetector outage reports on AWS services

(Image credit: DownDetector)

Outage reports are starting to fall on DownDetector – could things be on the mend and getting back to normal?

Big update – AWS says the issue with the US-WEST-1 region in Northern California is now fixed!

“We have resolved the issue affecting Internet connectivity to the US-WEST-1 Region,” the AWS status page reports. “Connectivity within the region was not affected by this event. The issue has been resolved and the service is operating normally.”

The US-WEST-2 region in Oregon is still under investigation, but DownDetector reports are falling fast, so fingers crossed it should be resolved soon too…

And there you have it – the Oregon region is resolved too.

“We have resolved the issue affecting Internet connectivity to the US-WEST-2 Region,” says AWS. “Connectivity within the region was not affected by this event. The issue has been resolved and the service is operating normally.”

Well, that was a wild ride, wasn’t it?

In case you’re just joining us – two major AWS regions, US-WEST-1 and US-WEST-2, both suffered “internet connectivity” issues.

This affected a whole host of sites running on AWS services, with the likes of Hulu, PlayStation Network and even seeing problems.

AWS says that the issues have now been fixed, so fingers crossed that’s the end of the updates from us – thanks for reading TechRadar Pro!

With all systems now green, at least according to the AWS dashboard, AWS added a bit of context to the second major outage in as many weeks. The US-WEST-1 and US-WEST-2 regions were impacted by identical issues. We’ll let them explain it:

“Between 7:14 AM PST and 7:59 AM PST, customers experienced elevated network packet loss that impacted connectivity to a subset of Internet destinations. Traffic within AWS Regions, between AWS Regions, and to other destinations on the Internet was not impacted. 

“The issue was caused by network congestion between parts of the AWS Backbone and a subset of Internet Service Providers, which was triggered by AWS traffic engineering, executed in response to congestion outside of our network. 

“This traffic engineering incorrectly moved more traffic than expected to parts of the AWS Backbone that affected connectivity to a subset of Internet destinations. The issue has been resolved, and we do not expect a recurrence.”

It sounds like the trouble started with AWS traffic engineering, which responded to congestion outside the AWS network but then made the wrong call and moved more traffic than expected onto parts of the AWS Backbone. The resulting congestion between the backbone and some Internet Service Providers got in the way of connectivity to some of your favorite destinations.

By now, things should be working smoothly in most of your AWS-backed systems, but we’ve still seen a handful of reports on Twitter of intermittent, extended outages (Oculus VR Headset connectivity, anyone?). Maybe all will be fully resolved by the morning.
