Note: this post is translated from stackoverflow.com. Original question: scala - How to compute cumulative sum on multiple float columns?

apache-spark apache-spark-sql scala

scala - How to compute cumulative sum on multiple float columns?

Posted on 2020-04-09 10:01:19

I have a DataFrame with 100 float columns, ordered by date.

ID   Date         C1       C2 ....... C100
1     02/06/2019   32.09  45.06         99
1     02/04/2019   32.09  45.06         99
2     02/03/2019   32.09  45.06         99
2     05/07/2019   32.09  45.06         99

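I need to compute the cumulative sum of C1 through C100, grouped by ID and ordered by date.

The target DataFrame should look like this: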

ID   Date         C1       C2 ....... C100
1     02/04/2019   32.09  45.06         99
1     02/06/2019   64.18  90.12         198
2     02/03/2019   32.09  45.06         99
2     05/07/2019   64.18  90.12         198

I want to achieve this without looping over C1 through C100.

Initial code for a single column:

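import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, sum}

// running total of C1 within each ID, ordered by date
val DF1 = DF.withColumn("CumSum_c1",
  sum("C1").over(Window.partitionBy("ID").orderBy(col("Date").asc)))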

I found a similar question here, but it handles the two columns manually: Cumulative sum in Spark

Asked by Vikrant

blackbishop 2020-01-31 22:38

Here is another way, using a simple select expression:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, sum}

// running-total frame: from the first row of each ID partition up to the current row
val w = Window.partitionBy(col("ID")).orderBy(col("Date").asc).rowsBetween(Window.unboundedPreceding, Window.currentRow)

// get columns you want to sum
val columnsToSum = df.drop("ID", "Date").columns

// map over those columns and create new sum columns
val selectExpr = Seq(col("ID"), col("Date")) ++ columnsToSum.map(c => sum(col(c)).over(w).alias(c)).toSeq

df.select(selectExpr: _*).show()

Output:

+---+----------+-----+-----+----+
| ID|      Date|   C1|   C2|C100|
+---+----------+-----+-----+----+
|  1|02/04/2019|32.09|45.06|  99|
|  1|02/06/2019|64.18|90.12| 198|
|  2|02/03/2019|32.09|45.06|  99|
|  2|05/07/2019|64.18|90.12| 198|
+---+----------+-----+-----+----+
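
One caveat with the approach above: Date is a string here, so ordering by it is lexicographic. That matches chronological order for the sample rows, but it would not hold across years (for example "01/05/2020" sorts before "02/04/2019"). The following is a minimal sketch, not the answerer's code, that parses the date before ordering. It assumes the dates are in MM/dd/yyyy format, which the question does not state, and the names CumulativeSums and withCumulativeSums are hypothetical helpers introduced for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, sum, to_date}

object CumulativeSums {

  // Hypothetical helper: replaces every column other than idCol and dateCol
  // with its running total per idCol, ordered chronologically by dateCol.
  // Assumption: dateCol holds strings in the given dateFormat.
  def withCumulativeSums(df: DataFrame, idCol: String, dateCol: String,
                         dateFormat: String): DataFrame = {
    val w = Window
      .partitionBy(col(idCol))
      .orderBy(to_date(col(dateCol), dateFormat).asc)
      .rowsBetween(Window.unboundedPreceding, Window.currentRow)

    val keyColumns = Seq(idCol, dateCol)
    val columnsToSum = df.columns.filterNot(keyColumns.contains)

    // one select with one windowed sum per value column
    val selectExpr = keyColumns.map(col) ++
      columnsToSum.map(c => sum(col(c)).over(w).alias(c))
    df.select(selectExpr: _*)
  }

  def main(args: Array[String]): Unit = {
    // local session for illustration only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cumulative-sum-sketch")
      .getOrCreate()
    import spark.implicits._

    // sample rows from the question, keeping three of the value columns
    val df = Seq(
      (1, "02/06/2019", 32.09, 45.06, 99.0),
      (1, "02/04/2019", 32.09, 45.06, 99.0),
      (2, "02/03/2019", 32.09, 45.06, 99.0),
      (2, "05/07/2019", 32.09, 45.06, 99.0)
    ).toDF("ID", "Date", "C1", "C2", "C100")

    withCumulativeSums(df, "ID", "Date", "MM/dd/yyyy").show()
    spark.stop()
  }
}
```

Because every windowed sum shares the same window specification, Spark can typically evaluate all of them in a single window stage rather than one per column. If the dates are actually dd/MM/yyyy, only the format string passed to the helper needs to change.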
