I have a pipeline that reads 20 files from storage, extracts the path of each file, and loads it into a table. Ideally the record count should be 20, but when I execute the pipeline, the same record flows through again and again, making the total record count increase indefinitely. I am wondering if I am making a mistake somewhere.
I just replicated the issue. My guess is that you are inserting one record into BigQuery for each record in the files. If you choose, for example, the Blob format, you will get only one record per file.
I am not reading the file contents; the files I am reading are DICOM files with a .dcm extension. I just want to capture the path of each file. Even when there is only one file, the pipeline loops indefinitely and repeats the same data again and again.
How is the pipeline configured? What source and transformations are you using to take the file and insert it into the table?
The source is GCS. I gave it a bucket path (which holds 20 .dcm images), and the output schema has path and body. The transformation is the JavaScript plugin (where I want to pick only the path), and the sink is the HTTP plugin, where I am posting the data.
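For reference, a minimal sketch of what that JavaScript transform stage could look like, assuming the CDAP JavaScript transform contract (`transform(input, emitter, context)`) and the `path`/`body` field names from the GCS source schema described above. The small harness at the bottom is only there so the function can be exercised outside the pipeline:

```javascript
// CDAP-style JavaScript transform: emit a record containing only the
// file path, dropping the binary DICOM body. Exactly one emit per input
// record, so one input file should produce one output record.
function transform(input, emitter, context) {
  emitter.emit({ path: input.path });
}

// --- local harness (not part of the plugin) ---
// Mock emitter that collects emitted records into an array.
const out = [];
const mockEmitter = { emit: (rec) => out.push(rec) };

// Simulate one GCS record: path plus (stand-in) binary body.
transform({ path: 'gs://bucket/image1.dcm', body: '<binary>' }, mockEmitter, null);
console.log(out);
```

If this stage emits once per input but the sink still receives duplicates, the repetition is happening outside the transform.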
In the JavaScript transform, add a log to check whether you are receiving each file path just once. In addition, check the HTTP return code at the POST endpoint; the records could be repeating because of HTTP retries.