Warm tip: This article is reproduced from stackoverflow.com, please click
dataframe math pandas python

pandas column transformation to get cumulative dollar amount

发布于 2020-03-29 12:45:55

I have dataset as below. I am trying to group them by district and find the total amount for each product. I want to extend my calculations to find the cumulative sales amount as well as adding total sales amount in that district.

Dataset:

district      item       salesAmount
Arba          pen        10
Arba          pen        20
Arba          pencil     30
Arba          laptop     10000
Arba          coil       100
Arba          coil       200
Cebu          pen        100
Cebu          pen        20
Cebu          laptop     20000
Cebu          laptop     20000
Cebu          fruit      800
Cebu          oil        300

I could group by district and find the total amount for each product as below

df.groupby(['district', 'item']).agg({'salesAmount': 'sum'}) Results as below:

district      item       salesAmount
Arba          laptop     10000
Arba          coil       300
Arba          pencil     30
Arba          pen        30
Cebu          laptop     40000
Cebu          fruit      800
Cebu          oil        300
Cebu          pen        120

I want to order from highest amount to lowest amount for each district first.

Then add cumulative and total sales amount column as below:(per district)

district    item    salesAmount cumsalesAmount  totaldistrictAmount
Arba        laptop  10000       10000           10360
Arba        coil    300         10300           10360
Arba        pencil  30          10330           10360
Arba        pen     30          10360           10360
Cebu        laptop  40000       40000           41220
Cebu        fruit   800         40800           41220
Cebu        oil     300         41100           41220
Cebu        pen     120         41220           41220

Thanks.

Questioner
Lilly
Viewed
51
jezrael 2020-01-31 17:49

First aggregate sum per both columns:

print (df.dtypes)
district       object
item           object
salesAmount     int64
dtype: object

df1 = df.groupby(['district', 'item'], as_index=False)['salesAmount'].sum()

Or:

df1 = df.groupby(['district', 'item'], as_index=False).agg({'salesAmount': 'sum'})
print (df1)
  district    item  salesAmount
0     Arba    coil          300
1     Arba  laptop        10000
2     Arba     pen           30
3     Arba  pencil           30
4     Cebu   fruit          800
5     Cebu  laptop        40000
6     Cebu     oil          300
7     Cebu     pen          120

Then sort by both columns with DataFrame.sort_values, use GroupBy.cumsum and last GroupBy.transform with sum:

df1 = df1.sort_values(['district','salesAmount'], ascending=[True, False])
df1['cumsalesAmount'] = df1.groupby('district')['salesAmount'].cumsum()
df1['totaldistrictAmount'] = df1.groupby('district')['salesAmount'].transform('sum')
 #alternative
 #df1['totaldistrictAmount'] = df1.groupby('district')['cumsalesAmount'].transform('last')
print (df1)
  district    item  salesAmount  cumsalesAmount  totaldistrictAmount
1     Arba  laptop        10000           10000                10360
0     Arba    coil          300           10300                10360
2     Arba     pen           30           10330                10360
3     Arba  pencil           30           10360                10360
5     Cebu  laptop        40000           40000                41220
4     Cebu   fruit          800           40800                41220
6     Cebu     oil          300           41100                41220
7     Cebu     pen          120           41220                41220