Elevated rendering errors
Incident Report for imgix
Postmortem

What happened?

On June 10, 2022, at 07:16 UTC, the imgix service experienced elevated rendering errors. A fix was implemented at 08:35 UTC, which returned error rates to normal levels.

How were customers impacted?

Between 07:16 UTC and 08:32 UTC, some customers received errors when making requests through the Rendering API. Previously cached assets continued to be served successfully, but some uncached files returned a 502 or 503 error. At its peak, the error rate reached 6% of requests to the Rendering API.

What went wrong during the incident?

Erratic network behavior at our upstream network provider caused error rates to rise in our backend services. As errors grew, a system we designed to automatically remediate backend failures did not trigger, allowing the errors to surface through the Rendering API.

While our team was identifying remediations, we were slow to post a public status update. Once a fix was pushed, service was restored immediately.

What will imgix do to prevent this in the future?

We are investigating the network behavior detected at our upstream provider so that we can update our configurations accordingly. We expect these changes to prevent a similar incident from occurring. We will also fix our automated tooling so that elevated error rates are remediated before they impact the rendering service.

Lastly, we will be revisiting our policies for status updates to ensure that incidents are communicated in a timely manner.

Posted Jun 13, 2022 - 16:22 PDT

Resolved
Service has been completely restored.
Posted Jun 10, 2022 - 02:04 PDT
Monitoring
Our engineering team has applied a fix, restoring services to normal. We are currently monitoring the situation.
Posted Jun 10, 2022 - 01:48 PDT
Investigating
We are currently investigating elevated render error rates for uncached derivative images. We will provide an update once we have more information.

Previously cached derivatives are not impacted.
Posted Jun 10, 2022 - 01:29 PDT
This incident affected: Rendering Infrastructure.