Note: This article is translated from stackoverflow.com. Original question: python - pandas column transformation to get cumulative dollar amount
Tags: dataframe, math, pandas, python

python - pandas column transformation to get cumulative dollar amount

Posted on 2020-03-29 12:50:24

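I have a dataset like the one below. I'm grouping it by district to find the total amount for each product. I'd like to extend the calculation to also get the cumulative sales within each district and the district's total sales.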

Dataset:

district      item       salesAmount
Arba          pen        10
Arba          pen        20
Arba          pencil     30
Arba          laptop     10000
Arba          coil       100
Arba          coil       200
Cebu          pen        100
Cebu          pen        20
Cebu          laptop     20000
Cebu          laptop     20000
Cebu          fruit      800
Cebu          oil        300
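The code in this thread assumes `df` already exists. A minimal sketch that builds it from the table above (the values are typed in by hand from the question, so treat this as an assumption rather than the asker's actual data source):

```python
import pandas as pd

# Sample data copied by hand from the question's table
df = pd.DataFrame({
    'district': ['Arba'] * 6 + ['Cebu'] * 6,
    'item': ['pen', 'pen', 'pencil', 'laptop', 'coil', 'coil',
             'pen', 'pen', 'laptop', 'laptop', 'fruit', 'oil'],
    'salesAmount': [10, 20, 30, 10000, 100, 200,
                    100, 20, 20000, 20000, 800, 300],
})

# district and item hold strings, salesAmount holds integers
print(df.dtypes)
```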

I can group by district and find the total amount for each product like this:

df.groupby(['district', 'item']).agg({'salesAmount': 'sum'})

The result looks like this:

district      item       salesAmount
Arba          laptop     10000
Arba          coil       300
Arba          pencil     30
Arba          pen        30
Cebu          laptop     40000
Cebu          fruit      800
Cebu          oil        300
Cebu          pen        120

I want to sort the rows within each district from the highest amount to the lowest.

Then add the following cumulative and total sales columns (per district):

district    item    salesAmount cumsalesAmount  totaldistrictAmount
Arba        laptop  10000       10000           10360
Arba        coil    300         10300           10360
Arba        pencil  30          10330           10360
Arba        pen     30          10360           10360
Cebu        laptop  40000       40000           41220
Cebu        fruit   800         40800           41220
Cebu        oil     300         41100           41220
Cebu        pen     120         41220           41220
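The two requested columns amount to a running total that restarts in each district, plus the final value of that running total. A plain-Python sketch of the arithmetic, assuming the per-item sums are already sorted from highest to lowest (the `sorted_sums` dict is a hypothetical name, with values copied from the table above):

```python
from itertools import accumulate

# Per-item totals for each district, already sorted from highest to lowest
sorted_sums = {
    'Arba': [10000, 300, 30, 30],
    'Cebu': [40000, 800, 300, 120],
}

results = {}
for district, amounts in sorted_sums.items():
    running = list(accumulate(amounts))  # running total within the district
    total = running[-1]                  # last running value is the district total
    results[district] = (running, total)
    print(district, running, total)

# Arba [10000, 10300, 10330, 10360] 10360
# Cebu [40000, 40800, 41100, 41220] 41220
```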

Thanks.


Asked by: Lilly
Views: 96
Answer by jezrael, 2020-01-31 17:49

First aggregate with sum by both columns:

print (df.dtypes)
district       object
item           object
salesAmount     int64
dtype: object

df1 = df.groupby(['district', 'item'], as_index=False)['salesAmount'].sum()

Or:

df1 = df.groupby(['district', 'item'], as_index=False).agg({'salesAmount': 'sum'})
print (df1)
  district    item  salesAmount
0     Arba    coil          300
1     Arba  laptop        10000
2     Arba     pen           30
3     Arba  pencil           30
4     Cebu   fruit          800
5     Cebu  laptop        40000
6     Cebu     oil          300
7     Cebu     pen          120
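A self-contained check that the two spellings return the same frame (the sample `df` is rebuilt by hand from the question's table). It also shows why a separate sort is needed: `groupby` sorts the group keys by default (`sort=True`), so items come out alphabetically within each district rather than by amount.

```python
import pandas as pd

df = pd.DataFrame({
    'district': ['Arba'] * 6 + ['Cebu'] * 6,
    'item': ['pen', 'pen', 'pencil', 'laptop', 'coil', 'coil',
             'pen', 'pen', 'laptop', 'laptop', 'fruit', 'oil'],
    'salesAmount': [10, 20, 30, 10000, 100, 200,
                    100, 20, 20000, 20000, 800, 300],
})

# Two equivalent ways to sum salesAmount per (district, item)
a = df.groupby(['district', 'item'], as_index=False)['salesAmount'].sum()
b = df.groupby(['district', 'item'], as_index=False).agg({'salesAmount': 'sum'})

# Raises AssertionError if the frames differ in values, dtypes or index
pd.testing.assert_frame_equal(a, b)

# Group keys are sorted, so items are alphabetical within each district
print(a['item'].tolist())
```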

Then sort by both columns with DataFrame.sort_values, add the running total with GroupBy.cumsum, and finally add the district total with GroupBy.transform and sum:

df1 = df1.sort_values(['district','salesAmount'], ascending=[True, False])
df1['cumsalesAmount'] = df1.groupby('district')['salesAmount'].cumsum()
df1['totaldistrictAmount'] = df1.groupby('district')['salesAmount'].transform('sum')
# alternative: the last running total in each district equals the district total
# df1['totaldistrictAmount'] = df1.groupby('district')['cumsalesAmount'].transform('last')
print (df1)
  district    item  salesAmount  cumsalesAmount  totaldistrictAmount
1     Arba  laptop        10000           10000                10360
0     Arba    coil          300           10300                10360
2     Arba     pen           30           10330                10360
3     Arba  pencil           30           10360                10360
5     Cebu  laptop        40000           40000                41220
4     Cebu   fruit          800           40800                41220
6     Cebu     oil          300           41100                41220
7     Cebu     pen          120           41220                41220
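The same steps can also be chained with `DataFrame.assign`. This is a sketch rather than the answer's exact code: it rebuilds the sample `df` by hand from the question, and it checks that the commented-out `transform('last')` alternative gives the same district totals. Because `cumsum` follows the current row order, the sort has to come before the running total.

```python
import pandas as pd

df = pd.DataFrame({
    'district': ['Arba'] * 6 + ['Cebu'] * 6,
    'item': ['pen', 'pen', 'pencil', 'laptop', 'coil', 'coil',
             'pen', 'pen', 'laptop', 'laptop', 'fruit', 'oil'],
    'salesAmount': [10, 20, 30, 10000, 100, 200,
                    100, 20, 20000, 20000, 800, 300],
})

out = (
    df.groupby(['district', 'item'], as_index=False)['salesAmount'].sum()
      # district ascending, amount descending within each district
      .sort_values(['district', 'salesAmount'], ascending=[True, False])
      .assign(
          # running total in the sorted order, restarting per district
          cumsalesAmount=lambda d: d.groupby('district')['salesAmount'].cumsum(),
          # district total broadcast back to every row of that district
          totaldistrictAmount=lambda d: d.groupby('district')['salesAmount'].transform('sum'),
      )
      .reset_index(drop=True)
)

# The alternative: take the last running total of each district
last_cum = out.groupby('district')['cumsalesAmount'].transform('last')
print((last_cum == out['totaldistrictAmount']).all())

print(out)
```

The `lambda` form lets each new column be computed from the already sorted frame inside one chain; the answer's step-by-step assignments produce the same values.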