Periodic network interruption communicating with multiple Azure regions
Summary
Our monitoring has shown Azure’s fix to be stable since our last update. This incident has been resolved.
Impact
minor
Timeline
[investigating] Degraded network capacity in an Azure datacenter is causing network packet loss and increased latency when communicating with AWS in eastern US regions. Customers may experience communications failures trying to submit data from agents running in AWS and may experience delayed data from AWS integrations. Microsoft has identified the root cause and is working on mitigations.
via statuspage[monitoring] The root cause of the issue has been identified and Microsoft has implemented mitigations, we are monitoring network traffic to confirm.
via statuspage[monitoring] Azure has temporarily mitigated the network capacity issues which have caused episodic packet loss for customers who are hosted in Azure data centers and are using Datadog’s US1 region (accessible via https://app.datadoghq.com). Azure engineers are continuing to work to fully resolve this issue. Until they fully resolve the issue, customers with Datadog agents running in Azure data centers may see brief periods of delayed ingestion of data from agents and from Azure integrations. We don’t expect a noticeable impact thanks to agent buffering but cannot exclude the possibility of spurious alerts due to temporarily delayed data. We are continuing to monitor the situation in conjunction with Azure, and will do so throughout the weekend. We will post status page updates as soon as the situation improves, and at least every 24 hours. We thank you for your patience throughout this incident.
via statuspage[monitoring] Azure has implemented a permanent fix to the network issue. Both Azure and Datadog engineers are continuing to monitor overnight and will provide an update tomorrow.
via statuspage[resolved] Our monitoring has shown Azure’s fix to be stable since our last update. This incident has been resolved.
via statuspageLessons Learned
⚠Datadog has experienced 33 incidents in the past year. This frequency suggests systemic reliability challenges that may warrant additional monitoring.
📊Incidents related to network have occurred 48 times across all providers in the past year. This is one of the most common failure categories in cloud infrastructure.
💡This incident is categorized as: Network / Routing. Consider implementing preventive measures specific to this failure category.
Similar Incidents
Ubuntu/Debian Package Mirror Failure
DigitalOcean · Mar 9, 2026
HTTP 522 Error on App Platform
DigitalOcean · Mar 6, 2026
Intermittent connectivity issues in Ashburn, VA
Cloudflare · Mar 6, 2026
Elevated TCP three-way handshake failures on api.anthropic.com
Anthropic · Mar 6, 2026
Disruption with some GitHub services
GitHub · Mar 5, 2026