How to create multiples columns from a MapType columns efficiently (without foldleft)

发布于 2020-04-05 23:39:17

My goal is to create columns from another MapType column. The names of the columns being the keys of the Map and their associated values.

Below my starting dataframe:

+-----------+---------------------------+
|id         |         mapColumn         |
+-----------+---------------------------+
| 1         |Map(keyA -> 0, keyB -> 1)  |
| 2         |Map(keyA -> 4, keyB -> 2)  |
+-----------+---------------------------+

Below the desired output:

+-----------+----+----+
|id         |keyA|keyB|
+-----------+----+----+
| 1         |   0|   1|
| 2         |   4|   2|
+-----------+----+----+

I found a solution whith a Foldleft with accumulators (work but extremely slow):

val colsToAdd = startDF.collect()(0)(1).asInstanceOf[Map[String,Integer]].map(x => x._1).toSeq
res1: Seq[String] = List(keyA, keyB)

val endDF = colsToAdd.foldLeft(startDF)((startDF, key) => startDF.withColumn(key, lit(0)))

//(lit(0) for testing)

The real starting dataframe being enormous, I need optimization.

Questioner

Jalil Mankouri

Viewed

Chinese

Original

val df = Seq((1,Map("keyA" -> 0, "keyB" -> 1)), (2,Map("keyA" -> 4, "keyB" -> 2)) ).toDF("id", "mapColumn") df.select($"id", explode($"mapColumn")) .groupBy($"id") .pivot($"key") .agg(first($"value")) .show()

How to create multiples columns from a MapType columns efficiently (without foldleft)

Related issues