On September 9, 2021, at 14:02 UTC, an improper configuration prevented imgix servers from connecting to some Web Folder and Web Proxy origins. As a result, non-cached derivative image requests for the affected customer origins returned a 503 error.
The impact of this incident was isolated to some Web Folder and Web Proxy customers sharing a common configuration setting.
Between 14:02 UTC and 18:56 UTC, affected Web Folder and Web Proxy customers experienced a variable increase in errors for non-cached derivative images.
At the height of the incident, a small percentage of Web Folder and Web Proxy requests returned a 503 error, which amounted to 0.16% of all imgix requests.
At 18:56 UTC, a fix was applied, allowing the service to be completely restored.
At 14:20 UTC, our team was alerted to a small increase in fetch errors to some Web Folder and Web Proxy origins. Because our monitoring service reported only a small number of errors, it was unclear whether some customer origins were misbehaving or whether our service was having trouble fetching images.
Eventually, our engineering team traced the change to a specific service provider and correlated it with the increase in errors for some Web Folder and Web Proxy customers.
As our team looked into solutions, several external factors severely slowed remediation efforts:
Ultimately, the imgix team deployed a fix that enabled our servers to connect successfully to all Web Folder and Web Proxy origins.
We will be updating our configurations for fetching assets from customer origins to prevent similar issues from occurring, along with updating our service runbooks to include rolling restarts for some types of configuration updates.
We will also be migrating some of our database tooling to mitigate connectivity limitations, along with updating our internal processes to address cases where communication outages occur.
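
To illustrate what a rolling restart in a runbook might involve, here is a minimal Python sketch. It is not imgix's actual tooling: the function name `rolling_restart` and the `restart` and `is_healthy` callables are hypothetical placeholders for whatever restart command and health probe a given service uses. The idea is to restart hosts in small batches and stop as soon as a batch fails its health check, so that a bad configuration cannot spread to the whole fleet.

```python
from typing import Callable, Iterable, List


def rolling_restart(
    hosts: Iterable[str],
    restart: Callable[[str], None],      # hypothetical: restarts one host
    is_healthy: Callable[[str], bool],   # hypothetical: probes one host
    batch_size: int = 1,
) -> List[str]:
    """Restart hosts in small batches, halting at the first unhealthy batch.

    Returns the hosts that were restarted and passed the health check.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    pending = list(hosts)
    done: List[str] = []
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        # Restart every host in the current batch before checking health.
        for host in batch:
            restart(host)
        # Stop the rollout if any host in the batch did not come back healthy.
        failed = [host for host in batch if not is_healthy(host)]
        if failed:
            raise RuntimeError(
                f"halting rollout; unhealthy after restart: {failed}"
            )
        done.extend(batch)
    return done
```

A smaller `batch_size` limits how many hosts are affected if a restart goes wrong, at the cost of a slower rollout.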