December 2024 | Page 8 of 27 | Down Log Status Tracker v0.23

Firebase – UPDATE: Firebase Console incorrectly lists custom domain status as “pending” even after completion

December 17, 2024 by Down Log

Incident began at 2024-11-21 00:00 (all times are US/Pacific).

The App Hosting console is reporting that custom domains are in a pending state, even though setup has completed. This is caused by a communication failure inside our systems that we’re working to fix. The console will show the correct status once the fix is in place. The domains are working fine; this should have no impact on end users.

Affected products: App Hosting

Anthropic – Unauthorized post from @AnthropicAI X.com account

December 17, 2024 (updated December 24, 2024) by Down Log

Dec 17, 13:16 PST
Resolved – We have identified and addressed the root cause of unauthorized posts on @AnthropicAI, our official X account. No Anthropic systems or services were affected in this incident.

Dec 17, 09:08 PST
Monitoring – We have regained secure access to the @AnthropicAI account and will be continuing our investigation, with the support of X, into how these unauthorized posts were made.

Dec 17, 08:43 PST
Identified – We are aware of a second unauthorized post on @AnthropicAI on X.com, and are continuing to work to regain access to the impacted account. There are no impacts to other Anthropic services.

Dec 17, 08:04 PST
Investigating – We are aware of an unauthorized post originating from our official X.com account, @AnthropicAI. At this time the post has been removed, and we are investigating the issue.

Zapier – Stripe Auth Expiration and Reconnection Errors

December 17, 2024 (updated December 24, 2024) by Down Log

Dec 17, 11:38 PST
Resolved – This incident has been resolved.

On Dec 16th at 4:27 PM UTC, Stripe connections began failing, and attempts to reconnect returned the following error:

‘Invalid auth connection’

Our team implemented a fix at 5:30 PM UTC on the same day.

Due to these authentication errors, some Zaps using a Stripe trigger were turned off. As of 9:30 PM UTC on Dec 16th, our team unpaused these Zaps. During the downtime, we continued to receive hooks for paused Zaps, and any failed Zap runs were replayed by our team as of Dec 17th at 6:01 PM UTC. No data was lost.

We appreciate your patience during this incident and sincerely apologize for any inconvenience caused. If you have any further questions, please don’t hesitate to reach out to our support team here: https://zapier.com/app/get-help.
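The resolution note above quotes its milestones in UTC, while the entry timestamps on this page are in PST. The sketch below converts the quoted UTC times to US/Pacific so the two can be read side by side. The helper name `utc_to_pacific` and the event labels are illustrative, not part of any Zapier tooling, and the year 2024 is assumed from the post date.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")


def utc_to_pacific(year, month, day, hour, minute):
    """Return the US/Pacific wall-clock time for a UTC timestamp."""
    utc_time = datetime(year, month, day, hour, minute, tzinfo=timezone.utc)
    return utc_time.astimezone(PACIFIC)


# UTC times quoted in the resolution note (labels are paraphrases).
events = {
    "connections began failing": (2024, 12, 16, 16, 27),
    "fix implemented": (2024, 12, 16, 17, 30),
    "Zaps unpaused": (2024, 12, 16, 21, 30),
    "failed runs replayed": (2024, 12, 17, 18, 1),
}

for label, parts in events.items():
    local = utc_to_pacific(*parts)
    print(f"{label}: {local:%b %d, %H:%M %Z}")
```

Under this conversion the first failures fall at 08:27 PST, about half an hour before the first "Investigating" entry at 09:00 PST, and the replay of failed runs at 10:01 PST precedes the "Resolved" entry at 11:38 PST.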

Dec 16, 10:48 PST
Monitoring – We are currently looking into an issue where users experienced errors when trying to establish a connection with Stripe. The specific error message was ‘Invalid auth connection.’

We are pleased to report that we have implemented a fix and users should now be able to reconnect/enable any Zaps using Stripe without encountering the earlier problem.

If any further issues arise or you do have questions, please do not hesitate to contact our dedicated Support Team, which can be reached via this link: https://zapier.com/app/get-help

We do apologize for any inconvenience that this may have caused and appreciate your understanding as we worked to resolve the issue. We will continue to monitor the situation closely.

Dec 16, 09:00 PST
Investigating – We’re currently investigating an issue where Stripe auths are expiring, and attempts to reconnect return the following error:

‘Invalid auth connection.’

We’ll update this page with more information as it becomes available. If you have any questions, please contact our support team at https://zapier.com/app/get-help.

Bubble – Issues with Main Bubble Cluster

December 17, 2024 (updated December 24, 2024) by Down Log

Dec 17, 13:09 EST
Resolved – Our systems are functional and we are closing out this incident.

Dec 17, 12:53 EST
Investigating – We are investigating reports of issues with our systems.

Cloudflare – LAX (Los Angeles) on 2024-12-17

December 17, 2024 by Down Log

Dec 13, 21:44 UTC
Update – We will be performing scheduled maintenance in LAX (Los Angeles) datacenter on 2024-12-17 between 17:00 and 22:00 UTC.

Traffic might be re-routed from this location, hence there is a possibility of a slight increase in latency during this maintenance window for end-users in the affected region. For PNI / CNI customers connecting with us in this location, please make sure you are expecting this traffic to fail over elsewhere during this maintenance window as network interfaces in this datacenter may become temporarily unavailable.

You can now subscribe to these notifications via Cloudflare dashboard and receive these updates directly via email, PagerDuty and webhooks (based on your plan): https://developers.cloudflare.com/notifications/notification-available/#cloudflare-status.

THIS IS A SCHEDULED EVENT Dec 17, 17:00 – 22:00 UTC

Dec 10, 21:48 UTC
Scheduled – We will be performing scheduled maintenance in LAX (Los Angeles) datacenter on 2024-12-17 between 09:00 and 12:00 UTC.

GitHub – Live updates on pages not loading reliably

December 17, 2024 (updated December 24, 2024) by Down Log

Dec 17, 16:00 UTC
Resolved – On December 17th, 2024, between 14:33 UTC and 14:50 UTC, many users experienced intermittent errors and timeouts when accessing github.com. The error rate was 8.5% on average and peaked at 44.3% of requests. The increased error rate caused a broad impact across our services, such as the inability to log in, view a repository, open a pull request, and comment on issues. The errors were caused by our web servers being overloaded as a result of planned maintenance that unintentionally caused our live updates service to fail to start. As a result of the live updates service being down, clients reconnected aggressively and overloaded our servers.

We only marked Issues as affected during this incident despite the broad impact. This oversight was due to a gap in our alerting while our web servers were overloaded. The engineering team’s focus on restoring functionality led us to not identify the broad scope of the impact to customers until the incident had already been mitigated.

We mitigated the incident by rolling back the changes from the planned maintenance to the live updates service and scaling up the service to handle the influx of traffic from WebSocket clients.

We are working to reduce the impact of the live updates service’s availability on github.com to prevent issues like this one in the future. We are also working to improve our alerting to better detect the scope of impact from incidents like this.
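The postmortem above attributes the overload to clients reconnecting aggressively once the live updates service was down. A common, generic way to soften that kind of reconnect storm is capped exponential backoff with random jitter on the client side. The sketch below illustrates the idea only; it is not GitHub's client code or the fix they describe, and the function names and the `base` and `cap` values are assumptions chosen for the example.

```python
import random


def reconnect_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Seconds to wait before reconnect attempt `attempt` (0-based).

    Uses "full jitter": the delay is drawn uniformly between 0 and an
    exponentially growing ceiling, which is itself capped at `cap`.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def reconnect_schedule(max_attempts, base=1.0, cap=60.0, seed=None):
    """Return the list of delays for `max_attempts` consecutive retries."""
    rng = random.Random(seed)
    return [reconnect_delay(n, base, cap, rng) for n in range(max_attempts)]


# Illustrative run; the seed only makes the printout repeatable.
for n, delay in enumerate(reconnect_schedule(8, seed=42)):
    print(f"attempt {n}: wait {delay:.2f}s")
```

Because each client draws its own random delay, reconnects spread out over time instead of arriving in synchronized waves, and the cap keeps any single client from waiting indefinitely.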

Dec 17, 15:32 UTC
Update – Issues is operating normally.

Dec 17, 15:29 UTC
Update – We have taken some mitigation steps and are continuing to investigate the issue. There was a period of wider impact on many GitHub services such as user logins and page loads which should now be mitigated.

Dec 17, 15:05 UTC
Update – Issues is experiencing degraded availability. We are continuing to investigate.

Dec 17, 14:53 UTC
Update – We are currently seeing live updates on some pages not working. This can impact features such as status checks and the merge button for PRs.

Current mitigation is to refresh pages manually to see latest details.

We are working to mitigate this and will continue to provide updates as the team makes progress.

Dec 17, 14:51 UTC
Investigating – We are investigating reports of degraded performance for Issues.

Firebase – RESOLVED: Vertex AI In Firebase – Elevated RESOURCE_EXHAUSTED in asia-northeast1

December 17, 2024 by Down Log

Incident began at 2024-11-23 13:50 and ended at 2024-11-23 14:50 (all times are US/Pacific).

The problem occurred only in asia-northeast1 and was caused by an issue in the upstream Vertex AI service.

Affected products: Vertex AI for Firebase

Vercel – Vercel Dashboard and API Functionality Degraded

December 17, 2024 (updated December 24, 2024) by Down Log

Dec 17, 10:13 UTC
Resolved – This incident has been resolved.

Dec 17, 07:41 UTC
Monitoring – A fix has been implemented and we are monitoring the results.

Dec 17, 07:12 UTC
Investigating – We are still investigating the issue affecting Vercel Dashboard and API access from bom1, iad1, cle1, and fra1. This is not affecting existing customer deployments on Vercel.

Dec 17, 05:13 UTC
Update – We are continuing to monitor for any further issues.

Dec 17, 05:04 UTC
Monitoring – A fix has been implemented and we are monitoring the results.

Dec 17, 04:06 UTC
Investigating – We have identified an issue affecting a subset of Vercel Dashboard features related to Teams. We are currently investigating and will provide additional information as this progresses.

Cloudflare – SOF (Sofia) on 2024-12-17

December 16, 2024 by Down Log

Dec 9, 09:12 UTC
Scheduled – We will be performing scheduled maintenance in SOF (Sofia) datacenter on 2024-12-17 between 01:00 and 06:00 UTC.

THIS IS A SCHEDULED EVENT Dec 17, 01:00 – 06:00 UTC

OpenAI – High error rate for fine-tuning API

December 16, 2024 (updated December 24, 2024) by Down Log

Dec 16, 14:58 PST
Resolved – This incident has been resolved.

Dec 16, 14:54 PST
Monitoring – A fix has been deployed and the Fine-tuning API endpoints are no longer returning 500 responses.

Dec 16, 14:47 PST
Update – We are continuing to work on a fix for this issue.

Dec 16, 14:47 PST
Identified – Fine-tuning API endpoints (`/v1/fine_tuning/jobs/*`) are returning high rates of 500 responses. The issue has been identified and a fix is being rolled out.
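For readers who call the affected endpoints, a "high rate of 500 responses" can be measured on the client side as the share of 5xx status codes in a recent sample of responses. The sketch below is a minimal illustration of that calculation; the function names, the 5% alert threshold, and the sample data are all made up for the example and do not come from OpenAI.

```python
def server_error_rate(status_codes):
    """Fraction of responses with a 5xx status; 0.0 for an empty sample."""
    codes = list(status_codes)
    if not codes:
        return 0.0
    errors = sum(1 for code in codes if 500 <= code <= 599)
    return errors / len(codes)


def should_alert(status_codes, threshold=0.05):
    """True when the 5xx share exceeds `threshold` (an assumed value)."""
    return server_error_rate(status_codes) > threshold


# Made-up sample of recent response codes, for illustration only.
sample = [200, 200, 500, 200, 500, 200, 200, 200, 200, 200]
print(f"5xx rate: {server_error_rate(sample):.0%}")
print("alert:", should_alert(sample))
```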