
Combining Spark schemas without duplicates using PySpark?

Posted on 2020-11-27 21:01:06

I am unable to combine the following three schemas into one final schema without duplicate columns.
Here is the code:

from pyspark.sql.types import StructType, StructField, StringType

schema1 = StructType([StructField("A", StringType(), True),
                      StructField("B", StringType(), True)])
schema2 = StructType([StructField("c", StringType(), True),
                      StructField("B", StringType(), True)])
schema3 = StructType([StructField("D", StringType(), True),
                      StructField("A", StringType(), True)])
# This line fails: `++` and `.distinct` are Scala idioms, not valid PySpark
final = (schema1 ++ schema2 ++ schema3).distinct
print(final)
Questioner: john
Answer from mck, 2020-11-28 15:13:50:
from pyspark.sql.types import StructType, StructField, StringType

schema1 = StructType([StructField("A", StringType(), True),
                      StructField("B", StringType(), True)])
schema2 = StructType([StructField("c", StringType(), True),
                      StructField("B", StringType(), True)])
schema3 = StructType([StructField("D", StringType(), True),
                      StructField("A", StringType(), True)])
# Concatenate the field lists and drop exact duplicates with set();
# StructField is hashable, so identical fields collapse to one entry.
final = StructType(list(set(schema1.fields + schema2.fields + schema3.fields)))
print(final)

StructType(List(StructField(B,StringType,true),StructField(D,StringType,true),StructField(c,StringType,true),StructField(A,StringType,true)))
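
Note that set() discards a field only when it matches another field exactly (name, type, and nullability), and it does not preserve field order, which is why the printed schema lists B, D, c, A in an arbitrary order. If a stable, first-seen order matters, a minimal sketch along these lines should work; the merge_schemas helper name is my own, and here deduplication is by field name, keeping the first definition encountered:

from pyspark.sql.types import StructType, StructField, StringType

def merge_schemas(*schemas):
    # Hypothetical helper: keep the first field seen for each name,
    # preserving the order in which names first appear.
    seen = set()
    merged = []
    for schema in schemas:
        for field in schema.fields:
            if field.name not in seen:
                seen.add(field.name)
                merged.append(field)
    return StructType(merged)

schema1 = StructType([StructField("A", StringType(), True),
                      StructField("B", StringType(), True)])
schema2 = StructType([StructField("c", StringType(), True),
                      StructField("B", StringType(), True)])
schema3 = StructType([StructField("D", StringType(), True),
                      StructField("A", StringType(), True)])

# Fields come out in first-seen order: A, B, c, D
print(merge_schemas(schema1, schema2, schema3))

Deduplicating by name also avoids ending up with two fields that share a name but differ in type or nullability, which the set()-based approach would keep as separate entries.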