
How to set up job dependencies in google bigquery?

Posted on 2020-03-29 12:48:08

I have a few jobs. One loads a text file from a Google Cloud Storage bucket into a BigQuery table; another is a scheduled query that copies data from one table to another with some transformation. I want the second job to run only if the first one succeeds. How do we achieve this in BigQuery, if it is possible at all?

Many thanks.

Best regards,

Questioner: JJZ
Viewed: 55

Answered by Pentium10 on 2020-01-31 19:25

Right now a developer needs to put together the chain of operations themselves. It can be done either using Cloud Functions (which supports Node.js, Go, and Python) or via a Cloud Run container (which supports the gcloud API and any programming language).

Basically you need to (see the sketch after this list):

  1. issue a job
  2. get the job id
  3. poll the job status using the job id
  4. when the job finishes, trigger the next steps
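
A minimal sketch of that issue/poll/trigger loop using the Python client library. The bucket, dataset, and table names are hypothetical; `result()` does the polling for you by blocking until the job completes:

```python
from google.cloud import bigquery

client = bigquery.Client()

# 1. issue a job (here, a load job; all names below are hypothetical)
load_job = client.load_table_from_uri(
    "gs://my-bucket/data.csv",
    "my_project.my_dataset.staging_table",
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
    ),
)

# 2. get the job id
print("started job:", load_job.job_id)

# 3. poll for completion -- result() blocks and raises on failure
load_job.result()

# 4. the load succeeded, so trigger the next step, e.g. the
#    transformation query that copies data into the final table
query_job = client.query(
    "INSERT INTO `my_project.my_dataset.final_table` "
    "SELECT * FROM `my_project.my_dataset.staging_table`"
)
query_job.result()
```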

If using Cloud Functions (a sketch follows the list):

  1. place the file into a dedicated GCS bucket
  2. set up a Cloud Function that monitors that bucket; when a new file is uploaded, it executes a function that imports the file into BigQuery and waits until the operation ends
  3. at the end of the function you can trigger other functions for the next step
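
As a sketch, a background Cloud Function wired to that bucket's finalize event might look like this; the destination table and the `load-finished` Pub/Sub topic are hypothetical:

```python
from google.cloud import bigquery, pubsub_v1

def on_file_uploaded(event, context):
    """Triggered by a google.storage.object.finalize event on the bucket."""
    client = bigquery.Client()
    uri = f"gs://{event['bucket']}/{event['name']}"

    # Import the new file into BigQuery (destination is hypothetical).
    load_job = client.load_table_from_uri(
        uri,
        "my_project.my_dataset.staging_table",
        job_config=bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,
            autodetect=True,
        ),
    )
    load_job.result()  # wait until the operation ends; raises on failure

    # The load succeeded: trigger the next step, e.g. by publishing to a
    # Pub/Sub topic (name hypothetical) that the next function listens on.
    publisher = pubsub_v1.PublisherClient()
    topic = publisher.topic_path("my-project", "load-finished")
    publisher.publish(topic, b"", table="my_project.my_dataset.staging_table")
```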

Another use case with Cloud Functions, split across two functions (a sketch of the poller follows the list):

Function 1:

A: a trigger starts the function
B: the function executes the query (copying data to another table)
C: it gets a job id and fires the second function with a bit of delay

Function 2:

I: the function receives the job id
J: it polls whether the job is done
K: if not done, it fires itself again with a bit of delay
L: if done, it triggers the next step, which could be a dedicated function or a parameterized one
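
A rough sketch of the polling function (steps I through L), assuming the job id arrives as a Pub/Sub message attribute. The project and topic names are hypothetical, and re-publishing to the same topic only approximates the "bit of delay"; Cloud Tasks with a schedule_time gives precise delays:

```python
from google.cloud import bigquery, pubsub_v1

PROJECT = "my-project"       # hypothetical project id
POLL_TOPIC = "poll-bq-job"   # hypothetical topic this function listens on

def poll_job(event, context):
    """Pub/Sub-triggered Cloud Function implementing steps I-L."""
    job_id = event["attributes"]["job_id"]     # I: receive the job id
    job = bigquery.Client().get_job(job_id)    # J: poll the job state

    publisher = pubsub_v1.PublisherClient()
    if job.state != "DONE":
        # K: not done yet -- publish back to the same topic so this
        # function fires itself again.
        publisher.publish(
            publisher.topic_path(PROJECT, POLL_TOPIC), b"", job_id=job_id
        )
        return

    if job.error_result:
        raise RuntimeError(f"job {job_id} failed: {job.error_result}")

    # L: done -- trigger the next step via another topic (hypothetical).
    publisher.publish(
        publisher.topic_path(PROJECT, "job-finished"), b"", job_id=job_id
    )
```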