azure-cosmosdb azure-databricks pyspark pyspark-sql

How to overwrite/update a collection in Azure Cosmos DB from Databrick/PySpark

Posted on 2020-04-11 22:36:38

I have the following PySpark code, written in a Databricks notebook, that successfully saves the results of a Spark SQL query to Azure Cosmos DB with this line:

df.write.format("com.microsoft.azure.cosmosdb.spark").mode("overwrite").options(**writeConfig3).save()

The complete code is as follows:

test = spark.sql("""SELECT
  Sales.CustomerID AS pattersonID1
 ,Sales.InvoiceNumber AS myinvoicenr1
FROM Sales
limit 4""")


## my personal cosmos DB
writeConfig3 = {
    "Endpoint": "https://<cosmosdb-account>.documents.azure.com:443/",
    "Masterkey": "<key>==",
    "Database": "mydatabase",
    "Collection": "mycontainer",
    "Upsert": "true"
}

df = test.coalesce(1)

df.write.format("com.microsoft.azure.cosmosdb.spark").mode("overwrite").options(**writeConfig3).save()
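One detail worth noting here (an observation, not something stated in the question): with `"Upsert": "true"`, a write only replaces an existing document when the new document carries the same `id`. A common approach is to derive a deterministic `id` from the record's business keys, so re-running the job replaces rather than appends. Below is a minimal sketch; the helper is plain Python with no Spark dependency, and the column names (`pattersonID1`, `myinvoicenr1`) follow the query above. The UDF wiring in the trailing comments is illustrative.

```python
import hashlib

def make_doc_id(*business_keys: str) -> str:
    """Build a stable document id from the record's natural keys.

    The same inputs always hash to the same id, so an upsert with this
    id replaces the earlier document instead of creating a new one.
    """
    joined = "|".join(business_keys)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

# In the notebook, this could be attached as a column before writing, e.g.:
#
#   from pyspark.sql.functions import udf
#   id_udf = udf(make_doc_id)
#   df = test.withColumn("id", id_udf("pattersonID1", "myinvoicenr1"))
#   df.write.format("com.microsoft.azure.cosmosdb.spark") \
#       .mode("overwrite").options(**writeConfig3).save()
```

Note that even with a deterministic `id`, documents produced by a *renamed* query (as in the question below) still differ, so this only helps when the business keys stay the same between runs.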

Using the above code I have successfully written to my Cosmos DB database (mydatabase) and collection (mycontainer).

When I try to overwrite the container by changing the Spark SQL query as follows (just renaming pattersonID1 to pattersonID2, and myinvoicenr1 to myinvoicenr2):

test = spark.sql("""SELECT
  Sales.CustomerID AS pattersonID2
 ,Sales.InvoiceNumber AS myinvoicenr2
FROM Sales
limit 4""")

Instead of overwriting/updating the collection with the results of the new query, Cosmos DB appends the new documents to the container, and still leaves the documents from the original query in place.

Is there a way to completely overwrite or update the Cosmos DB collection?

Questioner: Carltonp
Viewed: 89
David Makogon 2020-02-02 23:06

Your issue is that documents have a unique id (something you never specified, so it is auto-generated for you as a GUID). When you write your new document, you've just renamed one of the non-id, non-unique properties, pattersonID1, to pattersonID2, so it creates a new document, as expected. There's no way to know that this new document is related to the original, since it's a completely new document with its own set of properties.

You can update existing documents by querying (or reading) them, modifying them, and then replacing them. Or you can query for old documents and delete them (one by one, or transactionally as a batch of deletes within a partition, via a stored procedure). Lastly, you can delete and re-create the container, which removes all documents currently stored in it.