Atlassian's cross product user search service is currently degraded.

low | Atlassian | Dec 18, 2023 15:04 | Duration: 12h 24m
compute
Capacity Issue

Summary

On Dec 18, 2023, between 12:29 p.m. and 3:35 p.m. UTC, Atlassian's cloud customers using Atlas, Bitbucket Cloud, Compass, Confluence Cloud, Jira Service Management, Jira Software, Jira Work Management, and Jira Product Discovery were unable to search for users or use the "@mention" functionality. User searches failed or were delayed because the Atlassian service that returns user search results was degraded in several regions. The incident originated from a computationally intensive operation that was triggered multiple times in rapid succession.

Impact

none

Timeline

Dec 18, 2023 15:04

[investigating] We are investigating reports of intermittent errors for Atlassian, Confluence, Jira Work Management, Jira Service Management, Jira Software, Atlassian Bitbucket, Jira Align, Jira Product Discovery, Atlas, and Compass Cloud customers. We will provide more details once we identify the root cause.

via statuspage
+57m
Dec 18, 2023 16:01

[investigating] Atlassian's cross product user search service is recovering. Searches for users within Atlassian products are returning to normal.

via statuspage
+1m
Dec 18, 2023 16:02

[investigating] Atlassian's cross product user search service is recovering. Searches for users within Atlassian products are returning to normal.

via statuspage
+4m
Dec 18, 2023 16:06

[investigating] Atlassian's cross product user search service is recovering. Searches for users within Atlassian products are returning to normal.

via statuspage
+18m
Dec 18, 2023 16:23

[investigating] Atlassian's cross product user search service is currently healthy. Searches for users within Atlassian products are working as expected. We are in the process of investigating the root cause of this incident.

via statuspage
+11h 5m
Dec 19, 2023 03:28

[resolved] It has been resolved. Atlassian's cross product user search is working.

via statuspage
+215h 15m
Dec 28, 2023 02:43

[postmortem]

### **SUMMARY**

On Dec 18, 2023, between 12:29 p.m. and 3:35 p.m. UTC, Atlassian's cloud customers using Atlas, Bitbucket Cloud, Compass, Confluence Cloud, Jira Service Management, Jira Software, Jira Work Management, and Jira Product Discovery were unable to search for users or use the "@mention" functionality. User searches failed or were delayed because the Atlassian service that returns user search results was degraded in several regions.

The incident originated from a computationally intensive operation that was triggered multiple times in rapid succession, which degraded the performance of Atlassian's user search service across several regions. Customers in the EU west region were most affected. The incident was detected within 2 minutes by automated monitoring, and our team responded by recovering unhealthy systems and temporarily scaling up the service's infrastructure. The incident was resolved in 3 hours and 6 minutes.

### **IMPACT**

The impact lasted from 12:29 p.m. UTC to 3:35 p.m. UTC on Dec 18, 2023. The incident caused service disruption to cloud customers worldwide. Customers experienced delayed or failed user searches when using the following Atlassian cloud products:

* Atlas
* Bitbucket Cloud
* Compass
* Confluence Cloud
* Jira Service Management
* Jira Software
* Jira Work Management
* Jira Product Discovery

### **ROOT CAUSE**

The incident stemmed from Atlassian's user search service receiving commands to process multiple computationally intensive operations in rapid succession. These operations were directed at the same customer data set and therefore overloaded resources within a clustered database system, leading to memory exhaustion and subsequent unresponsiveness to user search requests.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

To prevent a recurrence of such incidents, we are implementing the following measures:

* Implement a mechanism to queue computationally intensive operations, so that they are processed without overloading system resources or affecting the customer experience.
* Fine-tune our clustered database settings to mitigate the impact of resource exhaustion on the overall system.

We apologize to customers whose services were affected during this incident; we are taking immediate steps to improve the service's resiliency.

Thanks,
Atlassian Customer Support

via statuspage

Lessons Learned

📊 Incidents related to compute have occurred 26 times across all providers in the past year. This is one of the most common failure categories in cloud infrastructure.

💡 This incident is categorized as: Capacity Issue. Consider implementing preventive measures specific to this failure category.
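
The postmortem's first remedial action, queuing computationally intensive operations so they cannot overload shared resources, can be illustrated with a minimal sketch. This is not Atlassian's implementation: the class `HeavyOpQueue`, its `submit` method, and the `dataset_id` and `max_concurrent` parameters are hypothetical names chosen for illustration. The sketch assumes two simple rules: operations aimed at the same data set run one at a time, and only a small fixed number of expensive operations run at once overall.

```python
import asyncio
from collections import defaultdict


class HeavyOpQueue:
    """Hypothetical gate for expensive operations (illustrative only).

    Operations that target the same data set are serialized, and at most
    `max_concurrent` operations run at once across all data sets.
    """

    def __init__(self, max_concurrent: int = 2) -> None:
        self._global_slots = asyncio.Semaphore(max_concurrent)
        # One lock per data set; a real service would also evict idle locks.
        self._per_dataset = defaultdict(asyncio.Lock)
        self._running = 0
        self.peak_running = 0  # highest concurrency observed, for inspection

    async def submit(self, dataset_id, operation):
        # Wait behind earlier operations on the same data set first,
        # then wait for a free global slot.
        async with self._per_dataset[dataset_id]:
            async with self._global_slots:
                self._running += 1
                self.peak_running = max(self.peak_running, self._running)
                try:
                    return await operation()
                finally:
                    self._running -= 1


async def demo():
    queue = HeavyOpQueue(max_concurrent=2)
    order = []

    async def reindex(name):
        order.append(f"start {name}")
        await asyncio.sleep(0.01)  # stand-in for expensive work
        order.append(f"end {name}")
        return name

    # Three requests hit the same data set in rapid succession,
    # while a fourth targets a different one.
    results = await asyncio.gather(
        queue.submit("tenant-a", lambda: reindex("a1")),
        queue.submit("tenant-a", lambda: reindex("a2")),
        queue.submit("tenant-a", lambda: reindex("a3")),
        queue.submit("tenant-b", lambda: reindex("b1")),
    )
    return results, order, queue.peak_running


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the three `tenant-a` operations never overlap, so a burst against one data set becomes sequential work instead of simultaneous load. The global limit is a separate knob that bounds total pressure on the backing database, and in practice it would need to be sized against measured memory headroom.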