Amazon Web Services powers a significant portion of the internet. When AWS goes down, the impact cascades across thousands of businesses, from startups running on a single EC2 instance to Fortune 500 companies with multi-service deployments. Understanding the history of AWS outages is not just an academic exercise — it is the foundation for building resilient systems.
Major AWS Outages by Year
2020: The Year US-EAST-1 Became a Household Name
In November 2020, AWS experienced one of its most significant outages when the Kinesis front-end fleet in US-EAST-1 failed after a capacity change. The incident cascaded rapidly: CloudWatch metrics were delayed or missing, Lambda invocations saw elevated error rates, and AWS was unable to post timely updates to its own Service Health Dashboard, leaving operators without reliable status information for hours.
The root cause was traced to a relatively small addition of capacity to the Kinesis front-end fleet, which pushed every server in the fleet past the maximum thread count permitted by an operating system configuration. This single event highlighted a critical lesson: tightly coupled services in a shared region can create failure modes that no single team anticipates.
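To see why a small capacity addition can affect every server at once, the sketch below models a fleet in which each server keeps one thread per peer, which is how AWS's public summary of the event describes the Kinesis front end. The function names, the baseline thread count, and the limit are all invented for illustration and are not AWS's real values.

```python
def threads_per_server(fleet_size: int, baseline_threads: int) -> int:
    """Threads on one server when it keeps one thread per peer in the fleet."""
    return baseline_threads + (fleet_size - 1)


def exceeds_limit(fleet_size: int, baseline_threads: int, thread_limit: int) -> bool:
    """True when every server in the fleet would pass the OS thread limit."""
    return threads_per_server(fleet_size, baseline_threads) > thread_limit


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the threshold effect.
    BASELINE = 200
    LIMIT = 1000
    for fleet in (790, 800, 810):
        used = threads_per_server(fleet, BASELINE)
        status = "over limit" if exceeds_limit(fleet, BASELINE, LIMIT) else "ok"
        print(f"fleet={fleet} threads_per_server={used} {status}")
```

Because the per-server thread count grows with fleet size, adding servers raises the load on the servers that were already there, so the whole fleet crosses the limit together rather than one machine at a time.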
2021: DynamoDB, Lambda, and the API Gateway Chain Reaction
December 2021 brought another major US-EAST-1 disruption. An automated activity to scale DynamoDB capacity triggered an impairment of the networking devices that connect DynamoDB storage nodes. The failure propagated to services that depend on DynamoDB internally — including Lambda, EventBridge, SQS, and API Gateway.
For operators, this outage was notable because it exposed hidden dependencies. Teams that believed they had no direct DynamoDB usage discovered their applications depended on AWS services that themselves depended on DynamoDB. The incident lasted roughly five hours and prompted many organizations to re-evaluate their single-region strategies.
2022 – 2023: Stabilization with Persistent Weak Spots
AWS invested heavily in infrastructure resilience after the high-profile 2020–2021 failures. Outage frequency decreased, and the average resolution time shortened. However, several service-specific incidents kept operators alert. S3 experienced brief availability issues, CloudFront saw intermittent edge failures, and US-EAST-1 continued to produce more incidents than any other region.
A pattern emerged during this period: while full regional failures became rarer, partial degradations — where a single service operates below normal performance — became the more common failure mode. These are harder to detect with simple uptime checks and often require deep observability tooling to identify.
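The difference between an uptime check and a degradation check can be sketched as follows. This is a minimal illustration, assuming you already collect probe results somewhere; the `Probe` type, the `classify` function, and both thresholds are hypothetical and would need tuning for a real service.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Probe:
    ok: bool           # did the request succeed?
    latency_ms: float  # how long did it take?


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def classify(probes: Sequence[Probe], p95_budget_ms: float, max_error_rate: float) -> str:
    """Return 'down', 'degraded', 'healthy', or 'unknown' for a window of probes."""
    if not probes:
        return "unknown"
    failures = sum(1 for p in probes if not p.ok)
    if failures == len(probes):
        return "down"
    error_rate = failures / len(probes)
    p95 = percentile([p.latency_ms for p in probes if p.ok], 95)
    if error_rate > max_error_rate or p95 > p95_budget_ms:
        return "degraded"
    return "healthy"


if __name__ == "__main__":
    # Every probe succeeds, so a plain up/down check reports "up",
    # but two slow responses push the p95 latency past the budget.
    window = [Probe(True, 100.0)] * 18 + [Probe(True, 900.0)] * 2
    print(classify(window, p95_budget_ms=500.0, max_error_rate=0.05))
```

A window in which every request succeeds can still be classified as degraded here, which is the case a simple uptime check misses.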
2024 – 2026: The Current Landscape
Recent AWS incidents show a shift toward shorter but more frequent disruptions. Configuration changes and deployment rollouts continue to be the leading trigger categories. AWS has improved its transparency, with faster status page updates and more detailed post-incident summaries, but the fundamental challenge remains: any system at sufficient scale will experience failures.
IncidentHub data from the past 90 days shows that AWS maintains a strong reliability score overall, but specific services — particularly in compute and networking — account for a disproportionate share of incidents. The reliability rankings at /reliability provide a current comparison across all major cloud providers.
Common Patterns in AWS Outages
After analyzing years of AWS incident data, several recurring themes stand out:
- US-EAST-1 concentration: This region consistently produces more incidents than others, partly because it is the oldest and most heavily utilized AWS region.
- Cascading failures: A problem in one foundational service (networking, IAM, DynamoDB) often cascades to dozens of dependent services within minutes.
- Configuration and deployment triggers: Automated scaling events and configuration changes, not hardware failures, are the most common trigger category.
- Status page lag: AWS status updates often trail real-world impact by 15 to 30 minutes, making independent monitoring essential.
- Recovery in waves: Services rarely recover all at once. A regional outage might show partial recovery for hours before full resolution.
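Because recovery tends to arrive in waves, a monitor that declares "resolved" on the first successful check will flap. One common way to handle this, sketched below under the assumption that you run periodic health checks of your own, is to require several consecutive successes before reporting a service as healthy again. The `RecoveryTracker` class and its default of three successes are illustrative choices, not a recommendation from AWS or IncidentHub.

```python
class RecoveryTracker:
    """Tracks health state and only confirms recovery after a success streak."""

    def __init__(self, required_successes: int = 3) -> None:
        self.required = required_successes
        self.streak = 0
        self.state = "healthy"

    def observe(self, ok: bool) -> str:
        """Record one health-check result and return the current state."""
        if not ok:
            # Any failure resets the streak and marks the service impaired.
            self.streak = 0
            self.state = "impaired"
        elif self.state != "healthy":
            # Successes after a failure count toward confirming recovery.
            self.streak += 1
            self.state = "healthy" if self.streak >= self.required else "recovering"
        return self.state


if __name__ == "__main__":
    tracker = RecoveryTracker(required_successes=3)
    checks = [True, False, True, True, False, True, True, True]
    print([tracker.observe(ok) for ok in checks])
```

In the sample sequence, the two successes after the first failure are reported as "recovering" rather than "healthy", and the second failure resets the count.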
Business Impact of AWS Outages
When AWS experiences a significant outage, the ripple effects are immediate and widespread. E-commerce platforms lose transaction capability. SaaS products become unavailable. CI/CD pipelines stall. Internal tools that teams rely on for communication and coordination may themselves be hosted on the affected infrastructure.
The most dangerous assumption in cloud architecture is that your provider's outage will not affect you because you only use 'simple' services.
— A recurring observation from post-incident reviews
Industry estimates suggest that major cloud outages cost affected businesses anywhere from thousands to millions of dollars per hour, depending on the nature of the disruption and the organization's dependency on the affected services.
How to Prepare for the Next AWS Outage
No cloud provider offers a guarantee of zero downtime. The question is not whether AWS will experience another outage, but how your team will respond when it happens. Here are practical steps to improve your readiness:
- Deploy across multiple regions: Critical workloads should not depend on a single availability zone or region. Multi-region architectures provide the strongest protection against regional failures.
- Monitor independently: Do not rely solely on the provider's status page. Use independent monitoring tools and set up outage alerts through services like IncidentHub to get notified within minutes of a detected issue.
- Map your dependencies: Understand which AWS services your application depends on — including transitive dependencies through other AWS services. Document these in a dependency map that your incident response team can reference.
- Build and test runbooks: Create step-by-step runbooks for common failure scenarios. Practice failover procedures before you need them in an actual outage.
- Track outage history: Use historical data from IncidentHub's AWS outage tracker (/aws-outages) and reliability rankings (/reliability) to identify recurring patterns and make informed infrastructure decisions.
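The dependency-mapping step above can be sketched as a small graph walk. The application names and the graph below are hypothetical; the edges from Lambda, SQS, and API Gateway to DynamoDB simply mirror the 2021 incident as described earlier and are not an authoritative map of AWS internals.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

# Hypothetical dependency map: each key depends on the services in its list.
DEPENDS_ON: Dict[str, List[str]] = {
    "checkout-api": ["API Gateway", "Lambda"],
    "report-worker": ["SQS", "S3"],
    "static-site": ["CloudFront", "S3"],
    "API Gateway": ["DynamoDB"],
    "Lambda": ["DynamoDB"],
    "SQS": ["DynamoDB"],
}


def transitive_dependencies(graph: Dict[str, List[str]], root: str) -> Set[str]:
    """Everything `root` depends on, directly or through other services."""
    seen: Set[str] = set()
    queue = deque(graph.get(root, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen


def impacted_apps(graph: Dict[str, List[str]], apps: Iterable[str], failed: str) -> List[str]:
    """Applications whose dependency closure contains the failed service."""
    return sorted(app for app in apps if failed in transitive_dependencies(graph, app))


if __name__ == "__main__":
    apps = ["checkout-api", "report-worker", "static-site"]
    print(impacted_apps(DEPENDS_ON, apps, "DynamoDB"))
```

In this toy map, neither `checkout-api` nor `report-worker` lists DynamoDB directly, yet both show up as impacted when it fails, which is the kind of hidden dependency an incident response team needs to see in advance.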
Looking Ahead
AWS continues to invest in reliability, and the trend line shows improvement. But as cloud adoption grows and architectures become more complex, the surface area for potential failures grows with it. Teams that treat outage preparedness as an ongoing practice — not a one-time project — are the ones that weather these events with minimal impact.
IncidentHub tracks every AWS incident in real time and maintains a complete historical record. Bookmark the AWS outage page (/aws-outages), explore the reliability rankings (/reliability), and set up alerts so your team is never caught off guard.