Microsoft Data Centre Issue - Australia East | September 2023 | History

Resolved

All Data Centre operations have now returned to normal. Thank you for your patience. If you are still experiencing issues, please contact our support team for troubleshooting via your defined support pathway.

08:14 AEST - 4 September 2023

Updated

Having successfully recovered 99% of storage services and 99% of impacted Virtual Machines, Microsoft are actively investigating individual downstream services to confirm their recovery status and mitigate remaining issues. At this stage, they believe that most downstream services still experiencing impact depend on one of three services, for which investigations are ongoing. Firstly, the Microsoft Storage team are making progress with the final storage scale unit that is still experiencing isolated issues, and have engaged the onsite datacenter team to replace drives as needed. Secondly, the Microsoft SQL team are working to mitigate one final cluster that is experiencing a capacity issue because several Service Fabric nodes have not fully recovered; capacity is being rebalanced to mitigate this. Finally, the Microsoft Cosmos DB team continue to investigate why some services have not yet fully recovered. While the majority of customers and services are already mitigated, further updates on these remaining investigations will be provided in 60 minutes, or as events warrant.

11:03 AEST - 31 August 2023

Updated

We have made significant progress in restoring core services, and expect that the vast majority of remaining services should be back online in the next 1-2 hours. After restoring power and stabilizing temperatures, all network infrastructure and 95% of storage services are back online. All premium disk storage has fully recovered, and we continue to work towards mitigating the final remaining storage devices. The majority of underlying compute services are back online, with more than 85% of Virtual Machines (VMs) that were impacted now back online and healthy. For the remaining VMs, we are investigating potential issues in connecting to their corresponding storage services.

While many customers have already recovered, we continue to work with downstream impacted services to ensure that they are coming back online in the next 1-2 hours as expected. Further updates will be provided in 60 minutes, or as events warrant.

06:21 AEST - 31 August 2023

Updated

Mitigation efforts are continuing. We have made significant progress in restoring core services, and we expect that the vast majority of remaining services should be back online in the next 2-3 hours. After restoring power and stabilizing temperatures, all network infrastructure and 95% of storage services are back online. All premium disk storage has fully recovered, and we continue to work towards mitigating the final remaining storage devices. The majority of underlying compute services are back online, with more than 85% of Virtual Machines that were impacted now back online and healthy. As a result, many customers of these services have already recovered, but we continue to work with downstream impacted services to ensure that they are coming back online in the next 2-3 hours as expected. Further updates will be provided in 60 minutes, or as events warrant.

05:01 AEST - 31 August 2023

Recovering

Starting at approximately 08:30 UTC on 30 August 2023, a utility power surge in the Australia East region tripped a subset of the cooling units offline in one datacenter, within one of the Availability Zones. While we were working to restore cooling, temperatures in the datacenter increased, so we proactively powered down a small subset of selected compute and storage scale units to avoid damage to hardware.

Multiple downstream services were impacted, with targeted communications being distributed via Azure Service Health. Impact to services is limited to Australia East, except for Azure Kubernetes Service (AKS), which is impacted in both Australia East and Australia Southeast due to a dependency in the former. If your workloads are protected by Azure Site Recovery or Azure Backup, and you need critical services back online before all services in this datacenter are fully recovered, we recommend either initiating a failover to the recovery region or recovering using Cross Region Restore. Note that any new allocation requests for the Australia East region will automatically avoid the impacted scale units.

05:01 AEST - 31 August 2023
