Elevated rendering errors
Incident Report for imgix

What happened?

On February 19, 2021, at 17:20 UTC, the imgix rendering service experienced a major outage affecting uncached image renders. Once our engineers were alerted, we immediately began implementing mitigations, and service was fully restored at 17:50 UTC.

How were customers impacted?

During this incident, some requests for uncached derivative images received a 503 response. Approximately 3% of all requests to the imgix service returned a 503 error.

What went wrong during the incident?

Our service experienced an unexpected issue retrieving assets from origins behind certain CDNs. Although the issue was not initially severe enough to cause a service disruption on its own, it exposed gaps in our monitoring tools that prevented alerts from reaching our site reliability team. By the time the alarm was raised manually, the issue had escalated to the point where it was affecting a broader range of traffic coming into the service.

What will imgix do to prevent this in the future?

We will be correcting our monitoring patterns to ensure that similar retrieval issues alert our engineers. We will also be modifying our retrieval behavior to place limits on the conditions that caused this outage.
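For readers who want a concrete picture of what a limit on retrieval behavior can look like, the following is a minimal sketch of a per-origin circuit breaker. It is purely illustrative and is not imgix's implementation: the class name `OriginBreaker`, its thresholds, and the idea of using the return value of `record` as an alert hook are all assumptions made for this example.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class OriginBreaker:
    """Hypothetical per-origin circuit breaker (illustrative only)."""

    max_failures: int = 5            # consecutive failures before the breaker opens
    cooldown_seconds: float = 30.0   # how long fetches are rejected once open
    clock: Callable[[], float] = time.monotonic
    _failures: Dict[str, int] = field(default_factory=dict)
    _opened_at: Dict[str, float] = field(default_factory=dict)

    def allow(self, origin: str) -> bool:
        """Return True if a fetch to this origin may proceed."""
        opened = self._opened_at.get(origin)
        if opened is None:
            return True
        if self.clock() - opened >= self.cooldown_seconds:
            # Cooldown elapsed: close the breaker so traffic can probe the origin again.
            del self._opened_at[origin]
            self._failures[origin] = 0
            return True
        return False

    def record(self, origin: str, ok: bool) -> bool:
        """Record a fetch outcome; return True if this call opened the breaker."""
        if ok:
            self._failures[origin] = 0
            return False
        count = self._failures.get(origin, 0) + 1
        self._failures[origin] = count
        if count >= self.max_failures and origin not in self._opened_at:
            self._opened_at[origin] = self.clock()
            return True  # a natural place to notify an on-call engineer
        return False
```

In this sketch, a caller would check `allow(origin)` before each origin fetch and pass the outcome to `record(origin, ok)`. Failures against one origin are contained without blocking fetches to healthy origins, and the moment the breaker opens provides an explicit signal that can be wired to alerting.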

Posted Feb 25, 2021 - 07:47 PST

Service has been completely restored.
Posted Feb 19, 2021 - 10:21 PST
Service has been restored. We are currently monitoring the situation.
Posted Feb 19, 2021 - 10:09 PST
The issue has been identified and mitigations are in place. Errors are trending down, though we are continuing investigations into the issue.
Posted Feb 19, 2021 - 09:58 PST
We are currently investigating elevated render error rates for uncached derivative images. We will post an update when we have more information.

Previously cached derivatives are not impacted.
Posted Feb 19, 2021 - 09:40 PST
This incident affected: Rendering Infrastructure.