Elevated rendering errors
Incident Report for imgix
Postmortem

What happened?

On June 20, 2019 at 15:11 UTC, the imgix service saw elevated origin latency when retrieving images. This caused an increase in latency and error rates for newly rendered derivative images. The incident lasted approximately two hours, with the issue subsiding by 17:20 UTC.

How were customers impacted?

Two to three percent of all traffic saw sustained errors for approximately two hours. Most requests were served properly, but new renders experienced added latency. Previously rendered and cached derivative images were not affected.

What went wrong during the incident?

Our service monitoring identified the elevated error rates and we began remediation efforts immediately. The initial mitigation helped, but not as completely as we had hoped. Subsequent efforts resolved the issue.

What will imgix do to prevent this in the future?

We have identified work that will aid fault isolation when latency increases while requesting content from origins. Several of these changes are already in place and work is progressing on the others. Additional work is scheduled to provide advance warning when origin latency increases, so that we can better isolate faults and deploy mitigating measures.
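
As an illustration only, advance warning on origin latency can be sketched as a rolling-window percentile check. This is a minimal sketch and not a description of imgix's actual tooling: the class name `OriginLatencyMonitor`, the window size, and the 1000 ms threshold are all assumptions chosen for the example.

```python
from collections import deque
from typing import Deque, Optional


class OriginLatencyMonitor:
    """Hypothetical rolling-window check on origin fetch latency."""

    def __init__(self, window: int = 100, warn_p95_ms: float = 1000.0) -> None:
        # Keep only the most recent `window` samples (assumed size).
        self.samples: Deque[float] = deque(maxlen=window)
        # Assumed threshold; a real value would come from observed baselines.
        self.warn_p95_ms = warn_p95_ms

    def record(self, latency_ms: float) -> None:
        """Store one origin fetch latency, in milliseconds."""
        self.samples.append(latency_ms)

    def p95(self) -> Optional[float]:
        """Nearest-rank 95th percentile of the window, or None if empty."""
        if not self.samples:
            return None
        ordered = sorted(self.samples)
        # Integer ceiling of 0.95 * n avoids floating-point rounding.
        rank = max(1, (95 * len(ordered) + 99) // 100)
        return ordered[rank - 1]

    def should_warn(self) -> bool:
        """True when the window's p95 latency exceeds the threshold."""
        value = self.p95()
        return value is not None and value > self.warn_p95_ms
```

Using a percentile over a window, rather than a single slow request, means one outlier does not trigger a warning while a sustained rise in origin latency does.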

Posted Jun 25, 2019 - 20:55 PDT

Resolved
Rendering performance has returned to normal.
Posted Jun 20, 2019 - 11:46 PDT
Monitoring
We have identified the issue and have implemented a fix. We will continue monitoring the situation.
Posted Jun 20, 2019 - 10:22 PDT
Investigating
We are investigating elevated rendering errors on uncached derivative images. We will update once we have more information.
Posted Jun 20, 2019 - 08:32 PDT
This incident affected: Rendering Infrastructure.