Note: This article is reproduced from serverfault.com.

Connecting to BigQuery from RStudio running on a Dataproc cluster

Posted on 2020-11-27 17:49:12

I created a Dataproc cluster and launched RStudio Server successfully using the instructions below: https://cloud.google.com/solutions/running-rstudio-server-on-a-cloud-dataproc-cluster

I also installed sparklyr and created a Spark instance successfully.

sc <- spark_connect(master = "local")

However, I am not sure how to connect to BigQuery. There is a sparkbq library, but I do not know how to pass the BigQuery connector jar at runtime, as described here: https://cloud.google.com/dataproc/docs/tutorials/bigquery-connector-spark-example

Questioner: denim
Gaurangi Saxena 2020-12-01 02:42:36

You can use Dataproc initialization actions to install the spark-bigquery connector on every node of your cluster: https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/connectors

You may have to recreate the cluster with the updated initialization actions and launch RStudio Server again. If you would rather not do that and your cluster is small, you could instead SSH into each node and download the spark-bigquery connector jar manually.
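
For the sparklyr side of the question, here is a minimal sketch of how the jar might be passed when the connection is created, once the jar is reachable from the cluster. It assumes the `sparklyr.connect.jars` config entry (older sparklyr releases call it `sparklyr.jars.default`) and the connector's `bigquery` data source name. The helper names `connect_with_bigquery` and `read_bigquery_table` are hypothetical, the jar path in the usage comment is a placeholder, and none of this has been verified on the cluster from the question.

```r
# Sketch only: helper names are hypothetical, not part of sparklyr or sparkbq.

# Open a Spark connection with the connector jar added at launch time.
# `jar` is the path to the spark-bigquery connector jar (assumption: it is
# readable from the node where RStudio Server runs).
connect_with_bigquery <- function(jar, master = "local") {
  config <- sparklyr::spark_config()
  # Assumption: sparklyr forwards this entry to spark-submit as extra jars.
  config[["sparklyr.connect.jars"]] <- jar
  sparklyr::spark_connect(master = master, config = config)
}

# Read one BigQuery table through the connector's DataFrameReader and
# register it as a Spark table that dplyr verbs can use.
# `table` is a fully qualified name: "project.dataset.table".
read_bigquery_table <- function(sc, table, name) {
  reader <- sparklyr::invoke(sparklyr::spark_session(sc), "read")
  reader <- sparklyr::invoke(reader, "format", "bigquery")
  reader <- sparklyr::invoke(reader, "option", "table", table)
  df <- sparklyr::invoke(reader, "load")
  sparklyr::sdf_register(df, name)
}

# Usage (placeholder jar path, public sample table):
# sc <- connect_with_bigquery("/path/to/spark-bigquery-connector.jar")
# shakespeare <- read_bigquery_table(
#   sc, "bigquery-public-data.samples.shakespeare", "shakespeare"
# )
```

If the initialization action has already placed the connector on Spark's classpath on every node, the extra jar setting may be unnecessary and a plain `spark_connect()` call could be enough.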