Front was unavailable for 10 minutes
Incident Report for Front
Postmortem

On May 14th at 14:10 UTC, our engineering team deployed a configuration change to our infrastructure. The change contained an error that caused some processes to restart continuously, resulting in high CPU usage. The error had not manifested during an earlier partial rollout.

The team was notified immediately and reverted the configuration change. However, because of the high CPU usage, some servers were slow to respond, and the rollback took 11 minutes to complete instead of the expected 1 minute.

Our team has since added a new sanity check to our global configuration to prevent this from happening again. We are also reviewing our deployment system to make it faster under degraded conditions.

We are very sorry about this incident; we understand that even relatively brief incidents are very disruptive to our customers.

Posted May 15, 2018 - 14:43 UTC

Resolved
This incident has been resolved.
Posted May 14, 2018 - 14:49 UTC
Monitoring
Front was unavailable for 10 minutes, between 7:10am and 7:20am PDT (14:10 to 14:20 UTC).
We are monitoring the situation, but everything should be back to normal.
Posted May 14, 2018 - 14:27 UTC