Google Ads – RESOLVED: We’re investigating reports of an issue with Google Ad Manager

Incident began at 2024-12-10 23:10 and ended at 2024-12-11 06:00 (times are in Coordinated Universal Time (UTC)).

The problem with Google Ad Manager has been resolved. We apologize for the inconvenience and thank you for your patience and continued support. During the incident, affected users were able to access Google Ad Manager but saw error messages, high latency, and/or other unexpected behavior.

The Ad Manager Bidders UI is now loading the list of “Bidders” for all of the available demand channels (Authorized Buyers, Open Bidding, SDK Bidding, Header Bidding).


Affected products: Google Ad Manager

Google Cloud – RESOLVED: We experienced elevated errors with BigQuery accessing Google Drive.

Incident Report

Summary

Starting on 4 December 2024 at 14:30 US/Pacific, Google BigQuery experienced elevated invalid-value and internal system errors globally for 3 hours and 25 minutes for traffic related to the BigQuery and Google Drive integration. The incident affected users and tasks attempting to export data to Google Drive, resulting in failed export jobs.

Incident began at 2024-12-04 14:30 and ended at 2024-12-04 18:28 (all times are US/Pacific).

To our BigQuery customers whose business analytics were impacted during this disruption, we sincerely apologize. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability.

Impacted users would have encountered “API key not valid” and “Failed to read the spreadsheet” errors for export jobs accessing Google Drive, resulting in service unavailability or failing jobs for Google BigQuery for the duration of this disruption.

Root Cause

This disruption of data export functionality was triggered by the deletion of an internal project containing essential API keys. This deletion was an unintended consequence of several contributing factors:

  • An internally used API key was flagged for non-compliance with a Google policy and deemed no longer in use, which led to the deletion of the API key.
  • Unclear Internal Google Project Ownership: The project ownership was not clearly recorded, and the project was thus incorrectly associated with a deprecated service.
  • Outdated Information: The combination of the perceived lack of recent activity and the incorrect ownership led to the project being mistakenly classified as abandoned and deprecated.

Remediation and Prevention

Google engineers were alerted to the service degradation through internal monitoring systems and a support case on 4 December 2024 at approximately 14:21 US/Pacific, when users began experiencing failures in data export operations. Upon investigation, the deletion of the project was identified as the root cause.

To mitigate the impact, the project was restored at approximately 15:45 US/Pacific. This action recovered the API keys and, over time, restored data export functionality for all affected users. The final error related to this incident was observed at approximately 17:55 US/Pacific, indicating full service recovery.

Google is committed to preventing a repeat of this issue in the future and is completing the following actions:

  • Remove dependency on API keys for BigQuery integrations with other Google services: This will eliminate the entire failure mode.

  • Enhance Project Metadata: We are implementing a process for regular review and validation of project ownership and metadata. This will ensure that critical information about project usage and status is accurate and up-to-date, reducing the risk of incorrect assumptions about project status.

  • Strengthen Internal Processes and Access Controls: We are strengthening our processes for deprecating and deleting projects, including mandatory reviews, impact assessments, and stakeholder approvals. This will prevent accidental deletion of critical projects and ensure that all potential impacts are thoroughly evaluated before any action is taken. We are also strengthening access controls for project deletion, ensuring that only authorized personnel with appropriate approvals can perform this action. This will add an additional layer of protection against unintended project deletion.

  • Implement accidental deletion protection for critical internal resources: Use mechanisms like project liens to ensure that a critical resource cannot be deleted accidentally (a sketch of this mechanism follows below).

Google is committed to continually improving our technology and operations to prevent service disruptions. We apologize for any inconvenience this incident may have caused and appreciate your understanding.
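As an illustration of the deletion-protection action above, here is a minimal sketch of placing a lien on a project so it cannot be deleted until the lien is removed. It assumes the Cloud Resource Manager API (which exposes liens) is enabled and application-default credentials are available; the project number, origin, and reason are placeholders, and this is not Google's internal tooling.

```python
# Minimal sketch: attach a lien to a project so it cannot be deleted until
# the lien is removed. All identifiers below are placeholders.
from googleapiclient import discovery

crm = discovery.build("cloudresourcemanager", "v3")

lien = crm.liens().create(
    body={
        "parent": "projects/123456789012",                     # placeholder project number
        "restrictions": ["resourcemanager.projects.delete"],   # block project deletion
        "origin": "bigquery-drive-integration-oncall",         # who placed the lien
        "reason": "Project holds API keys used by the BigQuery/Drive export path.",
    }
).execute()

print("Created lien:", lien["name"])  # e.g. liens/<generated-id>
```

With such a lien in place, an attempt to delete the project fails until the lien is explicitly removed, which is the kind of guard this action item describes.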

Detailed Description of Impact

Starting on 4 December 2024, Google BigQuery experienced elevated error rates for data export operations to Google Drive globally. Between approximately 14:21 and 18:04 US/Pacific, users attempting to export data from BigQuery to Google Drive encountered failures, resulting in service disruption for this specific functionality.

Google BigQuery

This disruption specifically impacted users and automated tasks relying on the BigQuery to Google Drive export functionality. Export jobs initiated during this period failed to complete, preventing data transfer and potentially impacting downstream processes and workflows dependent on this data.

The incident affected all regions. Impacted users encountered errors such as “API key not valid,” “Failed to read the spreadsheet,” or “[Error: 80324028]”. Internal error messages further specified the issue as “Dremel returned third-party error from GDRIVE: FAILED_PRECONDITION: Encountered an error while creating temporary directory” with an underlying status of “Http(400) Bad Request, API key not valid. Please pass a valid API key.”
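For context on how these errors surfaced to callers, below is a minimal sketch using the google-cloud-bigquery Python client against a hypothetical external table backed by a Google Sheets file on Drive; during the incident, such requests failed with the messages quoted above. The project, dataset, and table names are placeholders.

```python
# Minimal sketch: querying a hypothetical external table whose source is a
# Google Sheets file on Drive. During this incident, such requests surfaced
# errors like "API key not valid" / "Failed to read the spreadsheet".
from google.api_core import exceptions
from google.cloud import bigquery

client = bigquery.Client()

sql = "SELECT * FROM `my_project.my_dataset.drive_backed_table` LIMIT 100"  # placeholder table

try:
    rows = list(client.query(sql).result())
    print(f"Fetched {len(rows)} rows")
except exceptions.GoogleAPICallError as err:
    # 400-class errors during the incident carried the "API key not valid" text.
    print("Drive-backed query failed:", err)
```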

Affected locations: Johannesburg (africa-south1), Taiwan (asia-east1), Hong Kong (asia-east2), Tokyo (asia-northeast1), Osaka (asia-northeast2), Seoul (asia-northeast3), Mumbai (asia-south1), Delhi (asia-south2), Singapore (asia-southeast1), Jakarta (asia-southeast2), Sydney (australia-southeast1), Melbourne (australia-southeast2), Warsaw (europe-central2), Finland (europe-north1), Madrid (europe-southwest1), Berlin (europe-west10), Turin (europe-west12), London (europe-west2), Frankfurt (europe-west3), Netherlands (europe-west4), Zurich (europe-west6), Milan (europe-west8), Paris (europe-west9), Doha (me-central1), Dammam (me-central2), Tel Aviv (me-west1), Montréal (northamerica-northeast1), Toronto (northamerica-northeast2), São Paulo (southamerica-east1), Santiago (southamerica-west1), Iowa (us-central1), South Carolina (us-east1), Northern Virginia (us-east4), Columbus (us-east5), Dallas (us-south1), Oregon (us-west1), Los Angeles (us-west2), Salt Lake City (us-west3), Las Vegas (us-west4)


Affected products: Google BigQuery

Google Cloud – RESOLVED: We are investigating elevated error rates and latency for streaming ingestion into BigQuery

Mini Incident Report

We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support or to Google Workspace Support using help article https://support.google.com/a/answer/1047213.

Incident began at 2024-12-09 09:33 and ended at 2024-12-09 11:50 (all times are US/Pacific).

(All Times US/Pacific)

Incident Start: 9 December 2024 09:24

Incident End: 9 December 2024 11:40

Duration: 2 hours, 24 minutes

Affected Services and Features:

Google BigQuery
Cloud Dataflow

Regions/Zones:

• Google BigQuery – US multi-region
• Cloud Dataflow – us-west1, us-east1, us-east4, us-west2 & us-central1 were the most impacted, but all Dataflow pipelines writing to the BigQuery US multi-region have likely been impacted as well.

Description:

Google BigQuery experienced increased latency and elevated error rates in the US multi-region for a duration of 2 hours, 24 minutes. Cloud Dataflow customers also observed elevated latency in their streaming jobs writing to the BigQuery US multi-region. Preliminary analysis indicates that the root cause of the issue was a sudden burst of traffic, which overloaded and slowed the backend in one availability zone. This led to aggressive retries, which overloaded the frontend service. The incident was mitigated by rate-limiting requests and by evacuating the slow availability zone.

Google will complete a full IR in the following days that will provide a full root cause.

Customer Impact:

Google BigQuery

• During the incident, customers calling the google.cloud.bigquery.v2.TableDataService.InsertAll API method may have experienced transient failures with a 5XX status code, which should have succeeded after retries (a client-side retry sketch follows this Customer Impact section).
• Customers using google.cloud.bigquery.storage.v1.AppendRows may have experienced increased latency during this incident.

Cloud Dataflow

• Customers would have experienced increased latency for streaming jobs in the us-east1, us-east4, us-west1, us-west2, and us-central1 regions.
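As noted above, the 5XX failures on tabledata.insertAll were transient and should have succeeded after retries. The following is a minimal, hypothetical sketch of such a client-side retry with exponential backoff using the google-cloud-bigquery Python client; the table ID and row payload are placeholders, and this is an illustration rather than guidance from the incident report.

```python
# Hypothetical sketch: retrying legacy streaming inserts (tabledata.insertAll)
# with exponential backoff, since the 5XX errors seen during this incident
# were transient. Table ID and row payload are placeholders.
import time

from google.api_core import exceptions
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.events"      # placeholder table
rows = [{"event_id": "abc-123", "value": 42}]  # placeholder rows

delay = 1.0
for attempt in range(5):
    try:
        # insert_rows_json calls the tabledata.insertAll API under the hood.
        errors = client.insert_rows_json(table_id, rows)
        if not errors:
            print("Insert succeeded")
        else:
            print("Row-level insert errors:", errors)
        break
    except exceptions.ServerError as err:      # 5XX: transient, back off and retry
        print(f"Transient failure ({err.code}); retrying in {delay:.0f}s")
        time.sleep(delay)
        delay *= 2
```

Depending on client library version, a retry policy passed to the call can achieve a similar effect; the explicit loop simply makes the behavior described in the bullet visible.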

Affected products: Google BigQuery, Google Cloud Dataflow

Affected locations: Multi-region: us, Iowa (us-central1), South Carolina (us-east1), Northern Virginia (us-east4), Oregon (us-west1), Los Angeles (us-west2)

Firebase – RESOLVED: Firebase App Hosting may intermittently serve incorrect content.

Incident began at 2024-12-09 23:00 and ended at 2024-12-10 09:20 (all times are US/Pacific).

The issue where Firebase App Hosting intermittently served incorrect content has been resolved. We identified the root cause and have deployed a fix.

If you continue to experience any issues, please contact Firebase Support: https://firebase.google.com/support


Affected products: App Hosting

Google Cloud – RESOLVED: Recommendation AI & Vertex AI Search for Retail are experiencing increased delays in data indexing (p90 delay was up to ~10 hours).

The issue with Vertex AI Search and Recommendation AI has been resolved for all affected users as of Monday, 2024-12-09 21:48 US/Pacific. We thank you for your patience while we worked on resolving the issue.

Incident began at 2024-12-09 09:30 and ended at 2024-12-09 21:48 (all times are US/Pacific).


Affected products: Cloud Machine Learning, Recommendation AI, Vertex AI Search

Affected locations: Multi-region: eu, Global, Multi-region: us

Google Cloud – RESOLVED: Latency in Dataflow pipelines observed on Apigee Edge Public Cloud, Apigee X and Apigee Hybrid

The issue with Apigee Edge Public Cloud, Apigee Hybrid & Apigee has been resolved for all affected projects as of Monday, 2024-12-09 16:45 US/Pacific. We thank you for your patience while we worked on resolving the issue.

Incident began at 2024-12-09 13:55 and ended at 2024-12-09 17:01 (all times are US/Pacific).


Affected products: Apigee, Apigee Edge Public Cloud, Apigee Hybrid

Affected locations: Global, Iowa (us-central1), South Carolina (us-east1), Northern Virginia (us-east4), Columbus (us-east5), Dallas (us-south1), Oregon (us-west1), Los Angeles (us-west2), Salt Lake City (us-west3), Las Vegas (us-west4)