This article is reproduced from stackoverflow.com.
apache-spark scala rdd

How to read from a csv file to create a scala Map object?

Posted on 2020-03-27 10:28:06

I have a path to a CSV file I'd like to read. The CSV has three columns, "Topic", "Key", and "Value", and I am using Spark to read it as a CSV file. The file (lookupFile.csv) looks like this:

Topic,Key,Value
fruit,aaa,apple
fruit,bbb,orange
animal,ccc,cat
animal,ddd,dog

// I'm reading the file as follows (spark is a SparkSession instance; the option name is "delimiter")
val lookup = spark.read.option("delimiter", ",").option("header", "true").csv(lookupFile)
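If the lookup file is small enough to read on a single machine, it can also be parsed without Spark. The sketch below is only an illustration under stated assumptions: `LookupCsv`, `parseLines`, and `readFile` are hypothetical names, and it assumes a plain comma-separated file with a header row and no quoted fields or embedded commas (Spark's CSV reader handles those cases; this sketch does not).

```scala
import scala.io.Source

// Hypothetical helper: parses a small, unquoted "Topic,Key,Value" CSV
// into (topic, key, value) tuples. It does NOT handle quoted fields
// or embedded commas; Spark's CSV reader does.
object LookupCsv {
  def parseLines(lines: Seq[String]): Seq[(String, String, String)] =
    lines
      .drop(1)              // skip the "Topic,Key,Value" header row
      .map(_.trim)
      .filter(_.nonEmpty)   // ignore blank lines
      .map { line =>
        line.split(",", -1).map(_.trim) match {
          case Array(topic, key, value) => (topic, key, value)
          case other =>
            throw new IllegalArgumentException(
              s"Expected 3 columns, got ${other.length}: $line")
        }
      }

  // Reads the whole file on the local machine (no Spark involved).
  def readFile(path: String): Seq[(String, String, String)] = {
    val source = Source.fromFile(path)
    try parseLines(source.getLines().toList)
    finally source.close()
  }
}
```

For example, `LookupCsv.readFile("lookupFile.csv")` would return the four (topic, key, value) rows shown above.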

I'd like to take what I just read and return a map that has the following properties:

  • The map uses the topic as a key
  • The value for each topic is a map from the "Key" column to the "Value" column

I'd like the result to look like this:

val result = Map("fruit" -> Map("aaa" -> "apple", "bbb" -> "orange"),
                 "animal" -> Map("ccc" -> "cat", "ddd" -> "dog"))

Any ideas on how I can do this?

Questioner: jjaguirre394
Viewed: 292

Answer by mikeL, 2019-07-03 23:21

Read in your data:

import spark.implicits._  // needed for $"col" and .as[...]
val df1 = spark.read.format("csv").option("inferSchema", "true").option("header", "true").load(path)

First, put each Key and Value pair into an array and group by Topic, so that every topic is paired with the list of its [Key, Value] arrays:

import org.apache.spark.sql.functions.{array, collect_list}
val df2 = df1.groupBy("Topic").agg(collect_list(array($"Key", $"Value")).as("arr"))
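To make the intermediate shape concrete, here is a plain-Scala analogue of the groupBy/collect_list step, run on local tuples rather than a DataFrame. `GroupStep` and `groupPairs` are hypothetical names used only for illustration. One difference to keep in mind: Scala's `groupBy` keeps input order within each group, whereas Spark does not guarantee the order of elements inside a `collect_list`.

```scala
// Plain-Scala analogue (no Spark) of:
//   df.groupBy("Topic").agg(collect_list(array($"Key", $"Value")))
// Each topic is paired with a list of two-element [Key, Value] sequences.
object GroupStep {
  def groupPairs(rows: Seq[(String, String, String)]): Map[String, Seq[Seq[String]]] =
    rows
      .groupBy { case (topic, _, _) => topic }
      .map { case (topic, topicRows) =>
        topic -> topicRows.map { case (_, key, value) => Seq(key, value) }
      }
}
```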

Now convert it to a typed Dataset of (Topic, list of [Key, Value]) tuples:

val ds = df2.as[(String, Seq[Seq[String]])]

Turn each [Key, Value] array into a (Key, Value) tuple, build the inner Map for each topic, and collect the results to the driver:

val ds1 = ds.map(x => (x._1, x._2.map(y => (y(0), y(1))).toMap)).collect()
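The `y(0)`, `y(1)` indexing assumes every collected array has exactly two elements: a shorter one throws an IndexOutOfBoundsException and a longer one is silently truncated. A slightly more defensive variant of the inner conversion, sketched below with a hypothetical `InnerMap.pairsToMap` helper, pattern-matches each array and fails with a descriptive error otherwise. Inside the Spark job it could stand in for `x._2.map(y => (y(0), y(1))).toMap`.

```scala
// Hypothetical helper: turns the collected [Key, Value] arrays for one topic
// into Map(Key -> Value). The pattern match rejects any array that does not
// have exactly two elements, with a message naming the bad input.
object InnerMap {
  def pairsToMap(pairs: Seq[Seq[String]]): Map[String, String] =
    pairs.map {
      case Seq(key, value) => key -> value
      case other =>
        throw new IllegalArgumentException(s"Expected [Key, Value], got $other")
    }.toMap
}
```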

The result is now an array of (Topic, Map(Key -> Value)) pairs, so calling toMap on it gives the final nested map:

ds1.toMap

Map(animal -> Map(ccc -> cat, ddd -> dog), fruit -> Map(aaa -> apple, bbb -> orange))
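One caveat with `toMap`: when the same key appears more than once, the last value wins. Topics are unique after the groupBy, so the outer `toMap` is safe, but if the CSV repeats a Key under one Topic with different Values, the inner map keeps whichever value came last, and because `collect_list` does not guarantee element order, that choice may differ between runs. The hypothetical `DuplicateCheck.duplicateKeys` helper below is one way to detect such keys in a topic's (Key, Value) pairs before building the map.

```scala
// toMap keeps the LAST value seen for a repeated key, e.g.
//   Seq("a" -> "1", "a" -> "2").toMap == Map("a" -> "2")
// Hypothetical helper: returns the keys that map to more than one
// distinct value, so conflicts can be handled explicitly.
object DuplicateCheck {
  def duplicateKeys(pairs: Seq[(String, String)]): Set[String] =
    pairs
      .groupBy { case (key, _) => key }
      .collect { case (key, entries) if entries.map(_._2).distinct.size > 1 => key }
      .toSet
}
```

Keys repeated with the same value are not reported, since they produce the same map either way.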