Elevated rendering errors
Incident Report for imgix
Postmortem

What happened?

On May 23, 2024, at 19:23 UTC, we detected increased load on the rendering infrastructure and scaled out our system to handle the additional traffic. The incident was resolved at 19:36 UTC.

How were customers impacted?

During the incident, customers experienced increased error rates for recent renders, more intermittent errors within our system, and slower response times for requests.

What went wrong during the incident?

During the incident, our team implemented a service change that caused assets to be dropped. The dropped assets increased the number of requests to our system, and that additional load resulted in 429 and 5XX errors.

What will imgix do to prevent this in the future?

To prevent similar incidents, we will:

  • Improve procedures for pre-scaling instances during critical updates.
  • Conduct impact assessments before issuing significant changes.
  • Enhance monitoring and alerting systems to better predict and manage load increases.
Posted May 24, 2024 - 13:50 PDT

Resolved
This incident has been resolved.
Posted May 23, 2024 - 12:50 PDT
Monitoring
On May 23 at 19:19 UTC, we identified an issue affecting our rendering services due to a caching problem. This caused elevated rendering times and intermittent failures for some users. Our engineering team quickly diagnosed the issue and implemented a fix at 19:36 UTC.

We are monitoring the system closely to ensure stability and confirm that the issue has been fully resolved. We appreciate your patience and understanding during this time.
Posted May 23, 2024 - 12:44 PDT
This incident affected: Rendering Infrastructure.