
apache spark - Convert pyspark dataframe into list of python dictionaries

Posted on 2020-11-29 12:19:06

Hi, I'm new to pyspark and I'm trying to convert a pyspark.sql.dataframe into a list of dictionaries.

Below is my dataframe, which has type <class 'pyspark.sql.dataframe.DataFrame'>:

+------------------+----------+------------------------+
|             title|imdb_score|Worldwide_Gross(dollars)|
+------------------+----------+------------------------+
| The Eight Hundred|       7.2|               460699653|
| Bad Boys for Life|       6.6|               426505244|
|             Tenet|       7.8|               334000000|
|Sonic the Hedgehog|       6.5|               308439401|
|          Dolittle|       5.6|               245229088|
+------------------+----------+------------------------+

I want to convert it to:

[{"title":"The Eight Hundred", "imdb_score":7.2, "Worldwide_Gross(dollars)":460699653},
 {"title":"Bad Boys for Life", "imdb_score":6.6, "Worldwide_Gross(dollars)":426505244},
 {"title":"Tenet", "imdb_score":7.8, "Worldwide_Gross(dollars)":334000000},
 {"title":"Sonic the Hedgehog", "imdb_score":6.5, "Worldwide_Gross(dollars)":308439401},
 {"title":"Dolittle", "imdb_score":5.6, "Worldwide_Gross(dollars)":245229088}]

How should I do this? Thanks in advance!

Questioner
Lydia Wen
mck 2020-11-29 20:38:33

You can map each row to a dictionary and collect the results:

df.rdd.map(lambda row: row.asDict()).collect()
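
For a fuller picture, here is a minimal sketch of the same idea wrapped in a helper. The names dataframe_to_dicts and demo are hypothetical, chosen only for illustration. It calls collect() first and then Row.asDict() on the driver instead of going through df.rdd, which should give the same list for a DataFrame like the one in the question. The demo assumes a local Spark installation and rebuilds only two of the rows shown above.

```python
def dataframe_to_dicts(df, recursive=False):
    """Return the rows of a PySpark DataFrame as a list of plain dicts.

    Hypothetical helper for illustration. collect() brings every row to
    the driver, so this only suits results that fit in driver memory.
    """
    # Row.asDict(recursive=True) would also convert nested Row values.
    return [row.asDict(recursive) for row in df.collect()]


def demo():
    """Rebuild part of the question's DataFrame locally and convert it."""
    # Imported here so the helper above can be defined without Spark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")          # assumption: single-core local session
        .appName("rows-to-dicts")
        .getOrCreate()
    )
    df = spark.createDataFrame(
        [
            ("The Eight Hundred", 7.2, 460699653),
            ("Tenet", 7.8, 334000000),
        ],
        ["title", "imdb_score", "Worldwide_Gross(dollars)"],
    )
    records = dataframe_to_dicts(df)
    spark.stop()
    return records
```

Calling demo() returns one dictionary per row, keyed by column name. Because everything is collected onto the driver, this approach is best kept for small result sets such as the five-row table in the question.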