Intermittent rendering errors
Incident Report for imgix
Postmortem

What happened?

On October 23, 2023, between 21:43 UTC and 23:14 UTC, imgix experienced a partial outage affecting images served from the Rendering API. During this time, a small percentage (<0.45% on average) of non-cached requests returned a server error.

A fix was implemented at 23:02 UTC, which allowed the service to fully recover by 23:14 UTC.

How were customers impacted?

Between 21:46 UTC and 23:14 UTC, a small percentage of requests to the Rendering API returned a server error. At the height of the incident, 0.65% of all requests to our CDN returned an error.

Additionally, Sources returned an unknown status between 21:06 UTC and 21:09 UTC. During this period, customers reported being unable to create Sources.

What went wrong during the incident?

Our Rendering API experienced an unexpected interaction that caused a dramatic increase in server load. As the network slowly became overloaded, error rates increased. The error rate fluctuated between 0.07% and 0.65% until we resolved the issue.

To restore the service, our engineers reconfigured our network traffic to handle the unexpected Rendering behavior.

During the incident, a separate issue (unrelated to rendering) impacted our Source data. This led to a delay in investigating the cause of the rendering errors.

What will imgix do to prevent this in the future?

We have taken the following steps to prevent this issue from recurring:

  • Fixed the misconfigured server interaction 
  • We will put an alert system in place to notify us when a misconfigured source interaction causes traffic congestion.
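
As a rough illustration of the kind of error-rate alert described in the list above, the sketch below checks a sliding window of request counts against a threshold. It is not imgix's actual alerting system: the class name, the 0.05% threshold, and the window size are all hypothetical values chosen for the example.

```python
from collections import deque


class ErrorRateAlert:
    """Sliding-window error-rate check (illustrative only, not imgix's system)."""

    def __init__(self, threshold_pct: float, window: int) -> None:
        self.threshold_pct = threshold_pct   # hypothetical alert threshold, in percent
        self.samples = deque(maxlen=window)  # most recent (errors, total) samples

    def record(self, errors: int, total: int) -> bool:
        """Add one sample; return True if the windowed error rate exceeds the threshold."""
        self.samples.append((errors, total))
        return self.error_rate_pct() > self.threshold_pct

    def error_rate_pct(self) -> float:
        """Error rate across the whole window, as a percentage of requests."""
        errors = sum(e for e, _ in self.samples)
        total = sum(t for _, t in self.samples)
        return 100.0 * errors / total if total else 0.0


# Usage with made-up per-interval counts of (errors, total requests).
alert = ErrorRateAlert(threshold_pct=0.05, window=5)
for errors, total in [(1, 10_000), (65, 10_000)]:
    if alert.record(errors, total):
        print(f"ALERT: error rate {alert.error_rate_pct():.2f}% exceeds threshold")
```

Averaging over a window rather than alerting on a single sample is one way to catch a slow climb in errors while ignoring isolated spikes. The right threshold and window length would depend on normal baseline traffic.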

We are in the process of implementing the following:

  • Conducting a review of our current tooling to improve our traffic and network configuration capabilities.
  • Reviewing our current configuration to limit the affected services should a similar incident happen.
Posted Nov 03, 2023 - 15:36 PDT

Resolved
This incident has been resolved.
Posted Oct 23, 2023 - 16:32 PDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Oct 23, 2023 - 16:15 PDT
Investigating
We are investigating an issue affecting a small percentage of renders.
Posted Oct 23, 2023 - 15:34 PDT
This incident affected: Rendering Infrastructure.