Azure Databricks file system

DATABRICKS DBFS

Published on 2020-04-11 11:41:44

I need some clarity on Databricks DBFS.

In simple basic terms, what is it, what is the purpose of it and what does it allow me to do?

The Databricks documentation says, to this effect:

"Files in DBFS persist to Azure Blob storage, so you won’t lose data even after you terminate a cluster."

Any insight will be helpful; I haven't been able to find documentation that goes into the details of it from an architecture and usage perspective.

Questioner: Billy B
Viewed: 82

Answered by Eva on 2019-02-25 20:57:

I have experience with DBFS; it is a great storage layer that holds data you can upload from your local computer using the DBFS CLI! The CLI setup is a bit tricky, but once you get it working, you can easily move whole folders around in this environment (remember to use --overwrite!). With it you can do the following (see the example commands after this list):

  1. create folders
  2. upload files
  3. modify, remove files and folders
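
As a minimal sketch of those three operations with the databricks CLI: the folder and file names (dbfs:/foldername, test.csv, dbfs:/archive) are placeholders, and the commands assume you have already authenticated with databricks configure --token.

  # 1. create a folder in DBFS
  databricks fs mkdirs dbfs:/foldername

  # 2. upload a local file (--overwrite replaces an existing copy)
  databricks fs cp ./test.csv dbfs:/foldername/test.csv --overwrite

  # 3. inspect, move, and remove files and folders
  databricks fs ls dbfs:/foldername
  databricks fs mv dbfs:/foldername/test.csv dbfs:/archive/test.csv
  databricks fs rm -r dbfs:/foldername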

With Scala, you can easily pull in the data you store there with code like this:

val df1 = spark
      .read
      .format("csv")
      .option("header", "true")        // first row holds column names
      .option("inferSchema", "true")   // let Spark guess column types
      .load("dbfs:/foldername/test.csv")
      .select("some_column_name")      // replace with a real column name

Or read in the whole folder to process all the CSV files available:

val df1 = spark
      .read
      .format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("dbfs:/foldername/*.csv")  // glob pattern picks up every CSV in the folder
      .select("some_column_name")
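
And because files in DBFS persist to Azure Blob storage, anything you write back survives cluster termination. As a minimal sketch (the output path dbfs:/foldername/output is a placeholder), you could save a processed DataFrame like this:

// write the DataFrame back to DBFS as Parquet;
// the data remains available even after the cluster is terminated
df1.write
      .format("parquet")
      .mode("overwrite")
      .save("dbfs:/foldername/output")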

I think it is easy to use and learn. I hope you find this info helpful!