Anthropic – Elevated errors for requests to Anthropic API

Dec 1, 18:30 PST
Resolved – This incident has been resolved and we are observing fewer 529 errors.

Dec 1, 07:23 PST
Monitoring – Requests to the Anthropic API were returning HTTP 529 errors at an elevated rate. At this time, success rates have normalized and we are continuing to monitor.

Nov 30, 22:51 PST
Investigating – We are currently investigating this issue.
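
As context for the 529 responses noted above (HTTP 529 indicates the service is temporarily overloaded and the request can be retried), the sketch below shows one way a client might retry with exponential backoff. The endpoint, headers, and retry limits are illustrative assumptions, not official Anthropic client behavior; the example uses the generic Python requests library.

    import time
    import requests  # third-party HTTP client, used here purely for illustration

    def post_with_backoff(url, payload, headers, max_attempts=5):
        """Retry on HTTP 529 (overloaded) with exponential backoff."""
        for attempt in range(max_attempts):
            response = requests.post(url, json=payload, headers=headers, timeout=30)
            if response.status_code != 529:
                return response
            if attempt < max_attempts - 1:
                time.sleep(min(2 ** attempt, 30))  # 1s, 2s, 4s, ... capped at 30s
        return response  # still 529 after all attempts; caller decides how to surface it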

GitHub – Disruption with some GitHub services

Dec 2, 01:05 UTC
Resolved – Between Dec 1 12:20 UTC and Dec 2 1:05 UTC, availability of large hosted runners for Actions was degraded due to failures in background VM provisioning jobs. Affected users saw workflows queued while waiting for a runner. On average, 8% of workflows requiring large runners were affected during the incident window, peaking at 37.5% of requests. There was also intermittent queuing at lower levels on Dec 1, beginning around 3:00 UTC. Standard and Mac runners were not affected.

The job failures were caused by timeouts to a dependent service in the VM provisioning flow and by gaps in the jobs’ resilience to those timeouts. The incident was mitigated by bypassing the dependency, which was not in the critical path of VM provisioning.

We are making several immediate improvements in response. First, we are addressing the causes of the failed calls to improve the availability of that backend service. Even with those failures, the critical path of large VM provisioning should not have been affected, so we are also updating the client behavior to fail fast and circuit-break non-critical calls. Finally, the alerting for this service was not adequate to ensure a fast response by our team in this scenario, so we are improving our automated detection to reduce time to detection and mitigation of issues like this in the future.
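
As a rough illustration of the fail-fast and circuit-breaking behavior described above, the sketch below keeps a non-critical dependency out of the critical provisioning path: the call gets a short timeout, and after repeated failures the breaker opens and the call is skipped entirely. All names (CircuitBreaker, fetch_optional_metadata, provision_vm) are hypothetical and are not GitHub's actual provisioning code.

    import time

    class CircuitBreaker:
        """Opens after max_failures consecutive failures; probes again after reset_after seconds."""

        def __init__(self, max_failures=3, reset_after=30.0):
            self.max_failures = max_failures
            self.reset_after = reset_after
            self.failures = 0
            self.opened_at = None

        def allow(self):
            if self.opened_at is None:
                return True
            # Half-open: allow a probe call once the reset window has passed.
            return time.monotonic() - self.opened_at >= self.reset_after

        def record_success(self):
            self.failures = 0
            self.opened_at = None

        def record_failure(self):
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()

    breaker = CircuitBreaker()

    def fetch_optional_metadata(timeout):
        """Hypothetical non-critical call to the dependent service; assumed to time out here."""
        raise TimeoutError("dependency timed out")

    def provision_vm():
        """Hypothetical critical path: provisioning succeeds even if the optional call fails."""
        metadata = None
        if breaker.allow():
            try:
                metadata = fetch_optional_metadata(timeout=1.0)  # fail fast with a short timeout
                breaker.record_success()
            except TimeoutError:
                breaker.record_failure()  # after enough failures the call is skipped entirely
        return {"vm": "provisioned", "metadata": metadata}

Because the dependency's result is optional, provisioning returns successfully whether or not the call succeeds, which is the property the write-up above calls out.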

Dec 2, 00:57 UTC
Update – We’ve applied a mitigation for the issues with large runner job processing. We are seeing improvements in telemetry and are monitoring for full recovery.

Dec 2, 00:14 UTC
Update – We continue to investigate large hosted runners not picking up jobs.

Dec 1, 23:43 UTC
Update – We continue to investigate issues with large runners.

Dec 1, 23:24 UTC
Update – We’re seeing issues related to large runners not picking up jobs and are investigating.

Dec 1, 23:18 UTC
Investigating – We are currently investigating this issue.

Vercel – Issue with .ai Domain Purchases & Transfers

Dec 1, 19:47 UTC
Resolved – This incident has been resolved.

Dec 1, 09:02 UTC
Update – We are continuing to work on a fix for this issue.

Nov 30, 17:55 UTC
Identified – We’re experiencing intermittent issues with the purchase of .ai domains on our platform. Our team is aware of the issue and is actively working to resolve it.