I have a dataset as below. I am trying to group it by district and find the total amount for each item. I want to extend my calculation to add the cumulative sales amount as well as the total sales amount per district.
Dataset:
district item salesAmount
Arba pen 10
Arba pen 20
Arba pencil 30
Arba laptop 10000
Arba coil 100
Arba coil 200
Cebu pen 100
Cebu pen 20
Cebu laptop 20000
Cebu laptop 20000
Cebu fruit 800
Cebu oil 300
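For anyone who wants to reproduce this, the table above can be reconstructed as a DataFrame (column names taken from the table header):

```python
import pandas as pd

# Reconstruction of the sample data shown above
df = pd.DataFrame({
    'district': ['Arba'] * 6 + ['Cebu'] * 6,
    'item': ['pen', 'pen', 'pencil', 'laptop', 'coil', 'coil',
             'pen', 'pen', 'laptop', 'laptop', 'fruit', 'oil'],
    'salesAmount': [10, 20, 30, 10000, 100, 200,
                    100, 20, 20000, 20000, 800, 300],
})
```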
I can group by district and find the total amount for each item as below:
df.groupby(['district', 'item']).agg({'salesAmount': 'sum'})
Results as below:
district item salesAmount
Arba laptop 10000
Arba coil 300
Arba pencil 30
Arba pen 30
Cebu laptop 40000
Cebu fruit 800
Cebu oil 300
Cebu pen 120
I want to order the rows from highest amount to lowest within each district first, then add cumulative and total sales amount columns per district, as below:
district item salesAmount cumsalesAmount totaldistrictAmount
Arba laptop 10000 10000 10360
Arba coil 300 10300 10360
Arba pencil 30 10330 10360
Arba pen 30 10360 10360
Cebu laptop 40000 40000 41220
Cebu fruit 800 40800 41220
Cebu oil 300 41100 41220
Cebu pen 120 41220 41220
Thanks.
First, aggregate the sum over both columns:
print (df.dtypes)
district object
item object
salesAmount int64
dtype: object
df1 = df.groupby(['district', 'item'], as_index=False)['salesAmount'].sum()
Or:
df1 = df.groupby(['district', 'item'], as_index=False).agg({'salesAmount': 'sum'})
print (df1)
district item salesAmount
0 Arba coil 300
1 Arba laptop 10000
2 Arba pen 30
3 Arba pencil 30
4 Cebu fruit 800
5 Cebu laptop 40000
6 Cebu oil 300
7 Cebu pen 120
Then sort by both columns with DataFrame.sort_values, use GroupBy.cumsum, and finally GroupBy.transform with 'sum':
df1 = df1.sort_values(['district','salesAmount'], ascending=[True, False])
df1['cumsalesAmount'] = df1.groupby('district')['salesAmount'].cumsum()
df1['totaldistrictAmount'] = df1.groupby('district')['salesAmount'].transform('sum')
#alternative
#df1['totaldistrictAmount'] = df1.groupby('district')['cumsalesAmount'].transform('last')
print (df1)
district item salesAmount cumsalesAmount totaldistrictAmount
1 Arba laptop 10000 10000 10360
0 Arba coil 300 10300 10360
2 Arba pen 30 10330 10360
3 Arba pencil 30 10360 10360
5 Cebu laptop 40000 40000 41220
4 Cebu fruit 800 40800 41220
6 Cebu oil 300 41100 41220
7 Cebu pen 120 41220 41220
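For completeness, the steps above can be combined into one runnable sketch (the DataFrame construction is a reconstruction of the question's sample data):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'district': ['Arba'] * 6 + ['Cebu'] * 6,
    'item': ['pen', 'pen', 'pencil', 'laptop', 'coil', 'coil',
             'pen', 'pen', 'laptop', 'laptop', 'fruit', 'oil'],
    'salesAmount': [10, 20, 30, 10000, 100, 200,
                    100, 20, 20000, 20000, 800, 300],
})

# Aggregate per district/item, then sort descending within each district
df1 = (df.groupby(['district', 'item'], as_index=False)['salesAmount'].sum()
         .sort_values(['district', 'salesAmount'], ascending=[True, False]))

# Running total within each district (order matters, hence the sort first)
df1['cumsalesAmount'] = df1.groupby('district')['salesAmount'].cumsum()

# Broadcast each district's grand total back onto every row of that district
df1['totaldistrictAmount'] = df1.groupby('district')['salesAmount'].transform('sum')
print(df1)
```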
Thanks. Your first line of code gives me an error as below:
TypeError: groupby() got an unexpected keyword argument 'level'
@Lilly - What is the output of
print (df.info())
before my solution?
district       19551 non-null object
item           19551 non-null object
salesAmount    19551 non-null object
@Lilly - I think
df1 = df.groupby(['district', 'item']).agg({'salesAmount': 'sum'})
needs to be changed to
df1 = df.groupby(['district', 'item'], as_index=False).agg({'salesAmount': 'sum'})
Shoot... Actually my mistake. I am working with a PySpark DataFrame and used Koalas to convert it to a Koalas DataFrame so I could use pandas functions on it. The issue is that the
transform
function is not available in Koalas. Hence I converted my PySpark DataFrame to pandas, and now your code works. One should be careful when using Koalas. Thanks for all your inputs; I really appreciate your time.
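As a side note: where transform is unavailable, the per-district total can also be attached with a plain groupby followed by a merge. A minimal pandas sketch, using a small stand-in for the aggregated frame:

```python
import pandas as pd

# Small stand-in for the aggregated frame df1
df1 = pd.DataFrame({
    'district': ['Arba', 'Arba', 'Cebu'],
    'item': ['laptop', 'coil', 'laptop'],
    'salesAmount': [10000, 300, 40000],
})

# Compute one total per district, then merge it back onto every row
totals = (df1.groupby('district', as_index=False)['salesAmount'].sum()
             .rename(columns={'salesAmount': 'totaldistrictAmount'}))
df1 = df1.merge(totals, on='district', how='left')
print(df1)
```

This avoids transform entirely, at the cost of an extra intermediate frame.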