Down Log Status Tracker v0.23

Zapier – Stripe Auth Expiration and Reconnection Errors

December 17, 2024 (updated December 22, 2024) by Down Log

Dec 17, 11:38 PST
Resolved – This incident has been resolved.

On Dec 16th at 4:27 PM UTC, Stripe connections began failing, and attempts to reconnect returned the following error:

‘Invalid auth connection’

Our team implemented a fix at 5:30 PM UTC on the same day.

Due to these authentication errors, some Zaps using a Stripe trigger were turned off. As of 9:30 PM UTC on Dec 16th, our team unpaused these Zaps. During the downtime, we continued to receive hooks for paused Zaps, and any failed Zap runs were replayed by our team as of Dec 17th at 6:01 PM UTC. No data was lost.

We appreciate your patience during this incident and sincerely apologize for any inconvenience caused. If you have any further questions, please don’t hesitate to reach out to our support team here: https://zapier.com/app/get-help.
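For readers wondering how hooks can keep arriving for a paused workflow without being lost, the following is a minimal sketch of a hold-and-replay buffer. It is not Zapier's implementation. The `HookBuffer` class, its method names, and the `evt_*` event IDs are hypothetical and exist only to illustrate the idea described in the update above: events received while paused are kept, and replay is a separate step from unpausing.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class HookBuffer:
    """Hypothetical buffer that holds incoming hooks while a workflow is paused."""

    def __init__(self, handler: Callable[[Dict], None]) -> None:
        self.handler = handler
        self.paused = False
        self.pending: Deque[Dict] = deque()

    def receive(self, event: Dict) -> None:
        # While paused, keep the event instead of dropping it.
        if self.paused:
            self.pending.append(event)
        else:
            self.handler(event)

    def pause(self) -> None:
        self.paused = True

    def unpause(self) -> None:
        # New events are delivered directly again; held events stay queued.
        self.paused = False

    def replay(self) -> int:
        # Deliver held events in arrival order and report how many were replayed.
        replayed = 0
        while self.pending:
            self.handler(self.pending.popleft())
            replayed += 1
        return replayed


processed: List[str] = []
hooks = HookBuffer(lambda event: processed.append(event["id"]))

hooks.receive({"id": "evt_1"})   # delivered immediately
hooks.pause()
hooks.receive({"id": "evt_2"})   # held
hooks.receive({"id": "evt_3"})   # held
hooks.unpause()
replayed_count = hooks.replay()  # evt_2 and evt_3 are delivered now
```

Keeping `unpause` and `replay` separate mirrors the timeline in the update, where Zaps were unpaused on Dec 16th and failed runs were replayed on Dec 17th.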

Dec 16, 10:48 PST
Monitoring – We are currently looking into an issue where users experienced errors when trying to establish a connection with Stripe. The specific error message was ‘Invalid auth connection.’

We are pleased to report that we have implemented a fix and users should now be able to reconnect/enable any Zaps using Stripe without encountering the earlier problem.

If any further issues arise or you have questions, please do not hesitate to contact our Support Team via this link: https://zapier.com/app/get-help

We do apologize for any inconvenience that this may have caused and appreciate your understanding as we worked to resolve the issue. We will continue to monitor the situation closely.

Dec 16, 09:00 PST
Investigating – We’re currently investigating an issue where Stripe auths are expiring, and attempts to reconnect return the following error:

‘Invalid auth connection.’

We’ll update this page with more information as it becomes available. If you have any questions, please contact our support team at https://zapier.com/app/get-help.

Bubble – Issues with Main Bubble Cluster

December 17, 2024 (updated December 22, 2024) by Down Log

Dec 17, 13:09 EST
Resolved – Our systems are functional and we are closing out this incident.

Dec 17, 12:53 EST
Investigating – We are investigating reports of issues with our systems.

Cloudflare – LAX (Los Angeles) on 2024-12-17

December 17, 2024 by Down Log

Dec 13, 21:44 UTC
Update – We will be performing scheduled maintenance in LAX (Los Angeles) datacenter on 2024-12-17 between 17:00 and 22:00 UTC.

Traffic might be re-routed from this location, hence there is a possibility of a slight increase in latency during this maintenance window for end-users in the affected region. For PNI / CNI customers connecting with us in this location, please make sure you are expecting this traffic to fail over elsewhere during this maintenance window as network interfaces in this datacenter may become temporarily unavailable.

You can now subscribe to these notifications via Cloudflare dashboard and receive these updates directly via email, PagerDuty and webhooks (based on your plan): https://developers.cloudflare.com/notifications/notification-available/#cloudflare-status.

THIS IS A SCHEDULED EVENT Dec 17, 17:00 – 22:00 UTC

Dec 10, 21:48 UTC
Scheduled – We will be performing scheduled maintenance in LAX (Los Angeles) datacenter on 2024-12-17 between 09:00 and 12:00 UTC.
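Because the window was moved between the two updates, it can help to check timestamps against the revised window (17:00 to 22:00 UTC on 2024-12-17) rather than the original one. The snippet below is a small sketch of that check using only the Python standard library. The function name `in_maintenance_window` and the fixed UTC-8 offset used as a stand-in for Pacific Standard Time are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Revised window from the Dec 13 update (UTC).
WINDOW_START = datetime(2024, 12, 17, 17, 0, tzinfo=timezone.utc)
WINDOW_END = datetime(2024, 12, 17, 22, 0, tzinfo=timezone.utc)


def in_maintenance_window(moment: datetime) -> bool:
    """Return True if a timezone-aware datetime falls in [start, end)."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    return WINDOW_START <= moment < WINDOW_END


# Assumption: a fixed UTC-8 offset stands in for PST (no DST handling).
PST = timezone(timedelta(hours=-8))
local_start = WINDOW_START.astimezone(PST)
local_end = WINDOW_END.astimezone(PST)

inside = in_maintenance_window(datetime(2024, 12, 17, 18, 0, tzinfo=timezone.utc))
before = in_maintenance_window(datetime(2024, 12, 17, 16, 59, tzinfo=timezone.utc))
```

Comparing timezone-aware values avoids mixing up the UTC window with local clock time, which matters for users in Los Angeles, where the window is eight hours earlier on the clock.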

GitHub – Live updates on pages not loading reliably

December 17, 2024 (updated December 22, 2024) by Down Log

Dec 17, 16:00 UTC
Resolved – On December 17th, 2024, between 14:33 UTC and 14:50 UTC, many users experienced intermittent errors and timeouts when accessing github.com. The error rate was 8.5% on average and peaked at 44.3% of requests. The increased error rate caused a broad impact across our services, such as the inability to log in, view a repository, open a pull request, and comment on issues. The errors were caused by our web servers being overloaded as a result of planned maintenance that unintentionally caused our live updates service to fail to start. As a result of the live updates service being down, clients reconnected aggressively and overloaded our servers.

We only marked Issues as affected during this incident despite the broad impact. This oversight was due to a gap in our alerting while our web servers were overloaded. The engineering team’s focus on restoring functionality led us to not identify the broad scope of the impact to customers until the incident had already been mitigated.

We mitigated the incident by rolling back the changes from the planned maintenance to the live updates service and scaling up the service to handle the influx of traffic from WebSocket clients.

We are working to reduce the impact of the live updates service’s availability on github.com to prevent issues like this one in the future. We are also working to improve our alerting to better detect the scope of impact from incidents like this.
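The failure mode described above, where clients reconnect aggressively after a service drops, is commonly softened on the client side with exponential backoff plus random jitter. The sketch below illustrates that general technique. It is not GitHub's client code, and the function name `backoff_delays` and its `base` and `cap` defaults are assumptions chosen for the example.

```python
import random
from typing import List, Optional


def backoff_delays(
    attempts: int,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return one randomized wait time (seconds) per reconnect attempt.

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)],
    so the upper bound doubles per attempt until it reaches the cap.
    """
    if rng is None:
        rng = random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0.0, ceiling))
    return delays


# A seeded generator makes the example repeatable.
delays = backoff_delays(8, rng=random.Random(42))
```

A client would sleep for `delays[i]` before its i-th reconnect attempt. The random spread keeps many clients from retrying at the same instant, which is the pattern that can overload servers after an outage.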

Dec 17, 15:32 UTC
Update – Issues is operating normally.

Dec 17, 15:29 UTC
Update – We have taken some mitigation steps and are continuing to investigate the issue. There was a period of wider impact on many GitHub services such as user logins and page loads which should now be mitigated.

Dec 17, 15:05 UTC
Update – Issues is experiencing degraded availability. We are continuing to investigate.

Dec 17, 14:53 UTC
Update – We are currently seeing live updates on some pages not working. This can impact features such as status checks and the merge button for PRs.

Current mitigation is to refresh pages manually to see latest details.

We are working to mitigate this and will continue to provide updates as the team makes progress.

Dec 17, 14:51 UTC
Investigating – We are investigating reports of degraded performance for Issues.

Firebase – RESOLVED: Vertex AI In Firebase – Elevated RESOURCE_EXHAUSTED in asia-northeast1

December 17, 2024 by Down Log

Incident began at 2024-11-23 13:50 and ended at 2024-11-23 14:50 (all times are US/Pacific).

The problem was observed only in asia-northeast1 and was caused by an issue in the upstream Vertex AI service.

Affected products: Vertex AI for Firebase

Vercel – Vercel Dashboard and API Functionality Degraded

December 17, 2024 (updated December 22, 2024) by Down Log

Dec 17, 10:13 UTC
Resolved – This incident has been resolved.

Dec 17, 07:41 UTC
Monitoring – A fix has been implemented and we are monitoring the results.

Dec 17, 07:12 UTC
Investigating – We are still investigating the issue affecting Vercel Dashboard and API access from bom1, iad1, cle1, and fra1. This is not affecting existing customer deployments on Vercel.

Dec 17, 05:13 UTC
Update – We are continuing to monitor for any further issues.

Dec 17, 05:04 UTC
Monitoring – A fix has been implemented and we are monitoring the results.

Dec 17, 04:06 UTC
Investigating – We have identified an issue affecting a subset of Vercel Dashboard features related to Teams. We are currently investigating and will provide additional information as this progresses.

Cloudflare – SOF (Sofia) on 2024-12-17

December 16, 2024 by Down Log

Dec 9, 09:12 UTC
Scheduled – We will be performing scheduled maintenance in SOF (Sofia) datacenter on 2024-12-17 between 01:00 and 06:00 UTC.

THIS IS A SCHEDULED EVENT Dec 17, 01:00 – 06:00 UTC

OpenAI – High error rate for fine-tuning API

December 16, 2024 (updated December 22, 2024) by Down Log

Dec 16, 14:58 PST
Resolved – This incident has been resolved.

Dec 16, 14:54 PST
Monitoring – A fix has been deployed and the Fine-tuning API endpoints are no longer returning 500 responses.

Dec 16, 14:47 PST
Update – We are continuing to work on a fix for this issue.

Dec 16, 14:47 PST
Identified – Fine-tuning API endpoints (`/v1/fine_tuning/jobs/*`) are returning high rates of 500 responses. The issue has been identified and a fix is being rolled out.

Vercel – ‘No Production Domain’ Message in Project Overview

December 16, 2024 (updated December 22, 2024) by Down Log

Dec 16, 18:28 UTC
Resolved – This incident has been resolved.

Dec 16, 15:45 UTC
Monitoring – A fix has been implemented and new production domains will not have a problem. We’re currently working on applying the fix to existing domains. Please continue to follow our status page for updates on this issue.

Dec 16, 15:11 UTC
Identified – We are erroneously skipping domain assignment for deployments that are intended to be production deployments.

Supabase – Supavisor and Storage connectivity issues in ap-southeast-1 (Singapore)

December 16, 2024 (updated December 22, 2024) by Down Log

Dec 16, 17:23 UTC
Resolved – This incident has been resolved.

Dec 16, 14:42 UTC
Monitoring – A fix has been implemented and we are monitoring the results.

Dec 16, 13:21 UTC
Update – Our engineers have identified the root cause of the issue and some connectivity has improved. We are now working on resolving the issue fully.

Dec 16, 11:20 UTC
Identified – We have identified a Supavisor connectivity issue in ap-southeast-1. This issue is affecting Supavisor and our Storage functionality. Engineers are working on resolving the issue.

Dec 16, 11:09 UTC
Investigating – We are currently investigating this issue.