Warm tip: This article is reproduced from serverfault.com, please click

scala project export files

发布于 2020-11-28 23:50:06

I have the below query

val CubeData = spark.sql (""" SELECT gender, department, count(bibno) AS count FROM borrowersTable, loansTable  WHERE borrowersTable.bid = loansTable.bid GROUP BY gender,department WITH CUBE ORDER BY gender,department """) 

And i want to export 4 files with specific data and names.

File1 consist of gender and departments and the name of file is geneder_departments File2 gender,null name of file is gender_null File3 departments,null name of file is departments_null File4 null,null name of file is null_null theses files are results from sql query (with cube)

i try the below

val df1 = CubeData.withColumn("combination",concat(col("gender") ,lit(","), col("department")))
df1.coalesce(1).write.partitionBy("combination").format("csv").option("header", "true").mode("overwrite").save("final")

but i took more than 4 files - combination of gender - departments. Also names of those files are random. Is it possible to choose the name of those files?

Questioner
nubie
Viewed
0
mck 2020-11-29 15:46:07

Perhaps it's a bug in Spark, I don't see any problem in your query, but the query below seems to work. You don't need to specify table names if they are unique columns.

val CubeData = spark.sql ("""
SELECT gender, department, count(bibno) AS count
FROM borrowersTable
JOIN loansTable USING(bid)
GROUP BY gender, department WITH CUBE
ORDER BY gender, department
""")

But there seems to be some problems in your file parsing, try this instead:

val borrowersDF = spark.read.format("csv").option("delimiter", "|").option("header", "True").option("inferSchema", "True").load("BORROWERS.txt")
borrowersDF.createOrReplaceTempView("borrowersTable")
val loansDF = spark.read.format("csv").option("delimiter", "|").option("header", "True").option("inferSchema", "True").load("LOANS.txt")
loansDF.createOrReplaceTempView("loansTable")

val CubeData = spark.sql ("""
SELECT gender, department, count(bibno) AS count
FROM borrowersTable
JOIN loansTable USING(bid)
GROUP BY gender, department WITH CUBE
ORDER BY gender, department
""")