On June 10, 2022, at 07:16 UTC, the imgix service began experiencing elevated rendering errors. A fix was implemented at 08:35 UTC, which restored the service to normal error levels.
Between 07:16 UTC and 08:32 UTC, some customers received errors when making requests through the Rendering API. Previously cached assets continued to be served successfully, but requests for some uncached files returned a 502 or 503 error. At the peak, the error rate reached 6% of requests to the Rendering API.
Erratic network behavior from our upstream network provider increased error rates on requests to our backend services. As errors grew, one of the systems we designed to automatically remediate backend failures did not trigger, allowing errors to surface through the Rendering API.
While we worked to identify a remediation, we were delayed in posting a public status update. Once a fix was pushed, service was restored immediately.
We are investigating the network behavior detected at our upstream provider in order to update our configurations. We expect these changes to prevent a similar incident from occurring. We will also fix our automated tooling so that backend errors are remediated before they affect the rendering service.
Lastly, we will be revisiting our policies for status updates to ensure that incidents are communicated in a timely manner.