Google Ads – RESOLVED: We’re investigating reports of an issue with Google Ad Manager

Incident began at 2024-12-10 23:10 and ended at 2024-12-11 06:00 (times are in Coordinated Universal Time (UTC)).

The problem with Google Ad Manager has been resolved. We apologize for the inconvenience and thank you for your patience and continued support. During the incident, affected users were able to access Google Ad Manager but saw error messages, high latency, and/or other unexpected behavior.

The Ad Manager Bidders UI is now loading the list of “Bidders” for all available demand channels (Authorized Buyers, Open Bidding, SDK Bidding, Header Bidding).


Affected products: Google Ad Manager

DigitalOcean – Network maintenance in AMS3

Dec 11, 01:30 UTC
Completed – The scheduled maintenance has been completed.

Dec 10, 19:30 UTC
In progress – Scheduled maintenance is currently in progress. We will provide updates as necessary.

Dec 10, 19:26 UTC
Scheduled – We are reaching out again to inform you that the Network maintenance in the AMS3 region, which was previously scheduled to start on 2024-12-10 at 12:00 UTC, has been rescheduled to the following window:

Start: 2024-12-10 19:30 UTC
End: 2024-12-11 01:00 UTC

We apologize for any inconvenience this short notice causes and thank you for your understanding. You may find the initial maintenance notice, along with a description of any expected impact related to this work, at the bottom of this message.

Expected impact:

During the maintenance window, users may experience brief delays or failures in event processing on Droplets and Droplet-based services, including Managed Kubernetes, Load Balancers, Container Registry, and App Platform. We will endeavor to keep this to a minimum for the duration of the change.

If you have any questions related to this maintenance, please send us a ticket from your cloud support page: https://cloudsupport.digitalocean.com/s/createticket

DigitalOcean – Managed Databases CRUD Operations

Dec 11, 00:56 UTC
Resolved – From 16:10 UTC to 22:46 UTC, users may have experienced issues while executing Managed Database CRUD Operations.

Our Engineering team has confirmed the full resolution of the issue impacting Managed Database CRUD Operations, and all systems are now operating normally.
Users may safely resume operations, including upgrades, resizes, forking, and ad-hoc maintenance patches.

If you continue to experience problems, please open a ticket with our support team. We apologize for any inconvenience.

Dec 10, 22:54 UTC
Monitoring – Our Engineering team has implemented a fix for the issue impacting Managed Database CRUD Operations. The team is monitoring the situation, and we will share an update once this is fully resolved.

Dec 10, 22:44 UTC
Identified – Our Engineering team has identified the cause of the issue that is impacting Managed Database CRUD Operations and is actively working on a fix.

To avoid potential downtime, we continue to ask users to refrain from performing operations that trigger node rotations, such as upgrades, resizes, forking, and ad-hoc maintenance patches.
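
For context, a cluster resize is one example of a node-rotation-triggering operation referred to here. Below is a minimal sketch of such a call, assuming the DigitalOcean API v2 database cluster resize endpoint; the token variable, cluster UUID, size slug, and node count are placeholders for illustration, not values from this incident.

```python
# Illustrative only: a managed-database cluster resize is the kind of
# node-rotation-triggering operation the notice asks users to defer.
# Assumes the API v2 resize endpoint; the env var, cluster UUID, size
# slug, and node count below are placeholders.
import os
import requests

API_TOKEN = os.environ["DIGITALOCEAN_TOKEN"]
CLUSTER_UUID = "00000000-0000-0000-0000-000000000000"  # placeholder cluster ID

resp = requests.put(
    f"https://api.digitalocean.com/v2/databases/{CLUSTER_UUID}/resize",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"size": "db-s-4vcpu-8gb", "num_nodes": 3},  # provisions new nodes, rotating the cluster
    timeout=30,
)
resp.raise_for_status()  # a 2xx response means the node rotation has started
```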

Existing database clusters remain unaffected by the underlying DNS issue as long as no node rotation occurs, and all other services are functioning as expected.

We apologize for any inconvenience caused and appreciate your patience as we work diligently to address the situation. Further updates will be shared as soon as they become available.

Dec 10, 20:11 UTC
Update – During our investigation, we identified that operations triggering node rotation, such as upgrades, resizes, forking and ad-hoc maintenance patches, may also be impacted. To prevent potential downtime, we recommend avoiding these operations until the issue is fully resolved.

We apologize for the inconvenience and appreciate your patience as we work to address the situation.

Dec 10, 19:22 UTC
Investigating – Our Engineering team is investigating an issue causing Managed Database clusters to take longer than usual to be created. Our team is currently assessing the root cause and working to resolve the issue as quickly as possible.

Existing database clusters are not affected, and all other services are operating normally.

We apologize for any inconvenience this may cause and will share an update as soon as we have more information.

Atlassian Analytics – Jira issue query errors

Dec 10, 23:15 UTC
Resolved – The incident is resolved.

Dec 10, 22:42 UTC
Monitoring – A fix has been made and we’re monitoring its stability.

Dec 10, 21:33 UTC
Identified – We’ve identified the issue and are working to deploy a fix.

Dec 10, 20:41 UTC
Investigating – We are currently investigating an issue with querying Jira data in Atlassian Analytics.

Instructure – Our support phone provider has run into an issue that is preventing us from receiving users’ calls. We are working with them to resolve this. For now, please contact support via email or our online chat.

Dec 10, 14:30 MST
Resolved – This incident has been resolved.

Dec 10, 13:21 MST
Monitoring – A fix has been implemented and we are monitoring the results.

Dec 10, 13:05 MST
Identified – The issue has been identified and a fix is being implemented.

Google Cloud – RESOLVED: We experienced elevated errors with BigQuery accessing Google Drive.

Incident Report

Summary

Starting on 4 December 2024 at 14:30 US/Pacific, Google BigQuery experienced elevated invalid value and internal system errors globally for traffic related to BigQuery and Google Drive integration for 3 hours and 25 minutes. The incident affected users and tasks attempting to export data to Google Drive, resulting in failed export jobs.

Incident began at 2024-12-04 14:30 and ended at 2024-12-04 18:28 (all times are US/Pacific).

To our BigQuery customers whose business analytics were impacted during this disruption, we sincerely apologize. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability.

The impacted users would have encountered “API key not valid” and “Failed to read the spreadsheet” errors for export jobs when accessing Google Drive. This resulted in service unavailability or failing jobs for the duration of this disruption for Google BigQuery.

Root Cause

This disruption of data export functionality was triggered by the deletion of an internal project containing essential API keys. This deletion was an unintended consequence of several contributing factors:

  • Policy non-compliance flag: An internally used API key was flagged for Google policy non-compliance and deemed no longer in use, which led to the deletion of the API key.
  • Unclear internal Google project ownership: The project ownership was not clearly recorded, and the project was thus incorrectly associated with a deprecated service.
  • Outdated information: The combination of the perceived lack of recent activity and the incorrect ownership led to the project being mistakenly classified as abandoned and deprecated.

Remediation and Prevention

Google engineers were alerted to the service degradation via a support case on 4 December 2024 at approximately 14:21 US/Pacific, when users began experiencing failures in data export operations, as well as through internal monitoring systems. Upon investigation, the deletion of the project was identified as the root cause.

To mitigate the impact, the project was restored at approximately 15:45 US/Pacific. This action successfully recovered the API keys and over time restored the data export functionality for all affected users. The final error related to this incident was observed at approximately 17:55 US/Pacific, indicating full service recovery.

Google is committed to preventing a repeat of this issue in the future and is completing the following actions:

  • Remove dependency on API keys for BigQuery integrations with other Google services: This will eliminate the entire failure mode.

  • Implement accidental deletion protection for critical internal resources: Use mechanisms like project liens to ensure that a critical resource cannot be deleted accidentally (see the sketch at the end of this report).

  • Strengthen internal processes and access controls: We are strengthening our processes for deprecating and deleting projects, including mandatory reviews, impact assessments, and stakeholder approvals. This will prevent accidental deletion of critical projects and ensure that all potential impacts are thoroughly evaluated before any action is taken. We are also strengthening access controls for project deletion, ensuring that only authorized personnel with appropriate approvals can perform this action. This will add an additional layer of protection against unintended project deletion.

  • Enhance project metadata: We are implementing a process for regular review and validation of project ownership and metadata. This will ensure that critical information about project usage and status is accurate and up-to-date, reducing the risk of incorrect assumptions about project status.

Google is committed to continually improving our technology and operations to prevent service disruptions. We apologize for any inconvenience this incident may have caused and appreciate your understanding.

Detailed Description of Impact

Starting on 4 December 2024, Google BigQuery experienced elevated error rates for data export operations to Google Drive globally. Between approximately 14:21 and 18:04 US/Pacific, users attempting to export data from BigQuery to Google Drive encountered failures, resulting in service disruption for this specific functionality.

Google BigQuery

This disruption specifically impacted users and automated tasks relying on the BigQuery to Google Drive export functionality. Export jobs initiated during this period failed to complete, preventing data transfer and potentially impacting downstream processes and workflows dependent on this data.

The incident affected all regions, and impacted users encountered errors such as “API key not valid,” “Failed to read the spreadsheet,” or “[Error: 80324028]”. Internal error messages further specified the issue as “Dremel returned third-party error from GDRIVE: FAILED_PRECONDITION: Encountered an error while creating temporary directory” with an underlying status of “Http(400) Bad Request, API key not valid. Please pass a valid API key.”

Affected products: Google BigQuery

Affected locations: Johannesburg (africa-south1), Taiwan (asia-east1), Hong Kong (asia-east2), Tokyo (asia-northeast1), Osaka (asia-northeast2), Seoul (asia-northeast3), Mumbai (asia-south1), Delhi (asia-south2), Singapore (asia-southeast1), Jakarta (asia-southeast2), Sydney (australia-southeast1), Melbourne (australia-southeast2), Warsaw (europe-central2), Finland (europe-north1), Madrid (europe-southwest1), Berlin (europe-west10), Turin (europe-west12), London (europe-west2), Frankfurt (europe-west3), Netherlands (europe-west4), Zurich (europe-west6), Milan (europe-west8), Paris (europe-west9), Doha (me-central1), Dammam (me-central2), Tel Aviv (me-west1), Montréal (northamerica-northeast1), Toronto (northamerica-northeast2), São Paulo (southamerica-east1), Santiago (southamerica-west1), Iowa (us-central1), South Carolina (us-east1), Northern Virginia (us-east4), Columbus (us-east5), Dallas (us-south1), Oregon (us-west1), Los Angeles (us-west2), Salt Lake City (us-west3), Las Vegas (us-west4)
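
As a concrete illustration of the project-lien mechanism mentioned in the remediation actions above, here is a minimal sketch using the Cloud Resource Manager v3 API via the Python googleapiclient library. The project number, origin, and reason strings are placeholder assumptions; this is not a description of Google's internal tooling.

```python
# Sketch: place a lien on a project so it cannot be deleted accidentally.
# Uses the Cloud Resource Manager v3 liens.create method; the project
# number, origin, and reason below are placeholders.
from googleapiclient import discovery

crm = discovery.build("cloudresourcemanager", "v3")

lien = crm.liens().create(
    body={
        "parent": "projects/123456789012",                    # project number (placeholder)
        "restrictions": ["resourcemanager.projects.delete"],  # block project deletion
        "origin": "critical-infra-protection",                # free-form identifier (placeholder)
        "reason": "Holds API keys used by production BigQuery export integrations.",
    }
).execute()

print("Created lien:", lien["name"])  # project deletion now fails until the lien is removed
```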

Google Cloud – RESOLVED: We are investigating elevated error rates and latency for streaming ingestion into BigQuery

Incident began at 2024-12-09 09:33 and ended at 2024-12-09 11:50 (all times are US/Pacific).

Mini Incident Report

We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support or to Google Workspace Support using help article https://support.google.com/a/answer/1047213.

(All Times US/Pacific)

Incident Start: 9 December 2024 09:24

Incident End: 9 December 2024 11:40

Duration: 2 hours, 24 minutes

Affected Services and Features:

Google BigQuery
Cloud Dataflow

Regions/Zones:

  • Google BigQuery – US multi-region
  • Cloud Dataflow – us-west1, us-east1, us-east4, us-west2, and us-central1 were the most impacted, but all Dataflow pipelines writing to the BigQuery US multi-region have likely been impacted as well.

Description:

Google BigQuery experienced increased latency and elevated error rates in the US multi-region for a duration of 2 hours, 24 minutes. Cloud Dataflow customers also observed elevated latency in their streaming jobs to the BigQuery US multi-region. Preliminary analysis indicates that the root cause of the issue was a sudden burst of traffic, which overloaded and slowed the backend in one availability zone. This led to aggressive retries and overloaded the frontend service. The incident was mitigated by rate-limiting requests and by evacuating the slow availability zone.

Google will complete a full incident report in the following days that will provide a full root cause.

Customer Impact:

Google BigQuery

  • During the incident, customers calling the google.cloud.bigquery.v2.TableDataService.InsertAll API method may have experienced transient failures with 5XX status codes, which should have succeeded after retries (see the retry sketch at the end of this report).
  • Customers using google.cloud.bigquery.storage.v1.AppendRows may have experienced increased latency during this incident.

Cloud Dataflow

  • Customers would have experienced increased latency for streaming jobs in the us-east1, us-east4, us-west1, us-west2, and us-central1 regions.

Affected products: Google BigQuery, Google Cloud Dataflow

Affected locations: Multi-region: us, Iowa (us-central1), South Carolina (us-east1), Northern Virginia (us-east4), Oregon (us-west1), Los Angeles (us-west2)
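
The BigQuery impact note above states that transient InsertAll failures should have succeeded after retries. The sketch below shows what client-side retrying can look like with the Python BigQuery client; the table reference, row payload, and retry settings are placeholder assumptions rather than recommended values.

```python
# Sketch: streaming insert (tabledata.insertAll) with an explicit retry
# policy so transient 5XX errors are retried with exponential backoff.
# The table reference and row payload are placeholders.
from google.api_core.retry import Retry
from google.cloud import bigquery

client = bigquery.Client()

rows = [{"event_id": "abc-123", "event_ts": "2024-12-09T09:40:00Z"}]

errors = client.insert_rows_json(
    "my-project.my_dataset.events",  # placeholder table
    rows,
    retry=Retry(initial=1.0, maximum=30.0, multiplier=2.0, timeout=120.0),
)
if errors:
    # Per-row insert errors are returned rather than raised.
    raise RuntimeError(f"Streaming insert failed: {errors}")
```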

Firebase – RESOLVED: Firebase App Hosting may intermittently serve incorrect content.

Incident began at 2024-12-09 23:00 and ended at 2024-12-10 09:20 (all times are US/Pacific).

The issue where Firebase App Hosting intermittently served incorrect content has been resolved. We identified the root cause and have deployed a fix.

If you continue to experience any issues, please contact Firebase Support: https://firebase.google.com/support

Affected products: App Hosting