My goal is to create columns from another MapType
column. The names of the columns being the keys of the Map and their associated values.
Below my starting dataframe:
+-----------+---------------------------+
|id | mapColumn |
+-----------+---------------------------+
| 1 |Map(keyA -> 0, keyB -> 1) |
| 2 |Map(keyA -> 4, keyB -> 2) |
+-----------+---------------------------+
Below the desired output:
+-----------+----+----+
|id |keyA|keyB|
+-----------+----+----+
| 1 | 0| 1|
| 2 | 4| 2|
+-----------+----+----+
I found a solution whith a Foldleft with accumulators (work but extremely slow):
val colsToAdd = startDF.collect()(0)(1).asInstanceOf[Map[String,Integer]].map(x => x._1).toSeq
res1: Seq[String] = List(keyA, keyB)
val endDF = colsToAdd.foldLeft(startDF)((startDF, key) => startDF.withColumn(key, lit(0)))
//(lit(0) for testing)
The real starting dataframe being enormous, I need optimization.
You could simply use explode
function to explode the map type column and then use pivot
to get each key as new column. Something like this:
val df = Seq((1,Map("keyA" -> 0, "keyB" -> 1)), (2,Map("keyA" -> 4, "keyB" -> 2))
).toDF("id", "mapColumn")
df.select($"id", explode($"mapColumn"))
.groupBy($"id")
.pivot($"key")
.agg(first($"value"))
.show()
Gives:
+---+----+----+
| id|keyA|keyB|
+---+----+----+
| 1| 0| 1|
| 2| 4| 2|
+---+----+----+
nice, I forgot explode will create keys/values col by default