At 16:09 UTC the imgix rendering service began experiencing elevated error rates when rendering some customer images. Normal service was fully restored at 18:12 UTC. imgix engineers continued to monitor the service behavior until 18:35 UTC before closing the incident.
Requests for images that had already been cached were not impacted by this incident.
Between 16:09 UTC and 17:55 UTC: Approximately 30% of rendering requests failed, partially impacting many imgix customers.
Between 17:55 UTC and 18:12 UTC: 100% of rendering requests failed, partially impacting all imgix customers. During this period, previously rendered images continued to be served.
imgix engineers were able to quickly identify the cause of service degradation, but encountered difficulties in implementing the necessary remediation due to internal tooling and monitoring issues.
These difficulties both slowed the time to resolution and contributed to the further elevation of render error rates between 17:55 UTC and 18:12 UTC.
As a result of this incident, imgix engineering has identified revised procedures that are expected, in the immediate term, to reduce the severity and duration of any similar future incidents.
The team is also continuing to work on future iterations of the imgix service architecture and internal tooling. These changes are focused solely on reducing the likelihood and severity of similar incidents.