I want to calculate the percentage for each row. Below is an example DataFrame:
KEY DESCR counts
0 2 to A 1
1 2 to B 1
2 20 to C 1
3 35 to D 2
4 110 to E 4
5 110 to F 1
6 110 to G 1
The percentage formula is: (counts / sum of counts for the same KEY) * 100
Example: for the first row (KEY 2), (1 / 2) * 100 = 50
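To make the denominator concrete, here is a small sketch that rebuilds the sample data from the printout above (the constructor is an assumption reconstructed from that table) and computes the per-KEY totals with `groupby(...).sum()`, then applies the formula to the first row by hand:

```python
import pandas as pd

# Sample data reconstructed from the table in the question
df = pd.DataFrame({
    'KEY': [2, 2, 20, 35, 110, 110, 110],
    'DESCR': ['to A', 'to B', 'to C', 'to D', 'to E', 'to F', 'to G'],
    'counts': [1, 1, 1, 2, 4, 1, 1],
})

# Denominator of the formula: total counts for each KEY
totals = df.groupby('KEY')['counts'].sum()
print(totals)

# Worked example for row 0 (KEY 2): (1 / 2) * 100
first_pct = df.loc[0, 'counts'] / totals.loc[df.loc[0, 'KEY']] * 100
print(first_pct)  # 50.0
```

The totals are 2 for KEY 2, 1 for KEY 20, 2 for KEY 35 and 6 for KEY 110, which is why every KEY 110 row is divided by 6.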
Below is the code I'm stuck on; I have tried many times but could not get it to work:
percentage = []
for i in range(len(df)):
percentage.append((df['counts'][i] / ...............) * 100)
df['PERCENTAGE'] = percentage
df
Expected output is:
KEY DESCR counts PERCENTAGE
0 2 to A 1 50
1 2 to B 1 50
2 20 to C 1 100
3 35 to D 2 100
4 110 to E 4 67
5 110 to F 1 16
6 110 to G 1 16
Can anyone help me solve this? Thank you.
If performance is important, use GroupBy.transform with sum to get each group's total on every row, divide the original column by it with Series.div, and finally multiply by 100 with Series.mul:
df['PERCENTAGE'] = df['counts'].div(df.groupby('KEY')['counts'].transform('sum')).mul(100)
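A self-contained version of the line above, using the sample data reconstructed from the question, makes the intermediate step visible: `transform('sum')` returns a Series aligned with the original index, so each row carries its own group total and the division is a single vectorized operation.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'KEY': [2, 2, 20, 35, 110, 110, 110],
    'DESCR': ['to A', 'to B', 'to C', 'to D', 'to E', 'to F', 'to G'],
    'counts': [1, 1, 1, 2, 4, 1, 1],
})

# One value per row: the total of that row's KEY group (2, 2, 1, 2, 6, 6, 6)
group_totals = df.groupby('KEY')['counts'].transform('sum')

# Element-wise division and scaling, no Python-level loop over rows
df['PERCENTAGE'] = df['counts'].div(group_totals).mul(100)
print(df)

# Sanity check: percentages within each KEY add up to 100
print(df.groupby('KEY')['PERCENTAGE'].sum())
```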
Alternatively, you can divide each value within its group, but this is less efficient for a large DataFrame or many groups:
df['PERCENTAGE'] = df.groupby('KEY')['counts'].transform(lambda x: x / x.sum()).mul(100)
print(df)
KEY DESCR counts PERCENTAGE
0 2 to A 1 50.000000
1 2 to B 1 50.000000
2 20 to C 1 100.000000
3 35 to D 2 100.000000
4 110 to E 4 66.666667
5 110 to F 1 16.666667
6 110 to G 1 16.666667
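To confirm that both answers compute the same values, here is a short sketch (again assuming the sample data reconstructed from the question). The difference is how the work is done: the first line runs one vectorized division, while the lambda version calls a Python function once per group, which is where the extra cost comes from on many groups.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'KEY': [2, 2, 20, 35, 110, 110, 110],
    'counts': [1, 1, 1, 2, 4, 1, 1],
})
g = df.groupby('KEY')['counts']

# Approach 1: divide by group totals broadcast back to every row
fast = df['counts'].div(g.transform('sum')).mul(100)

# Approach 2: a Python lambda invoked once per group
per_group = g.transform(lambda x: x / x.sum()).mul(100)

print(np.allclose(fast, per_group))  # True
```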
What about .apply(math.floor)?
@koalaok Or:
np.floor(df.groupby('KEY')['counts'].transform(lambda x: x / x.sum()).mul(100))
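Here is a sketch of both floor variants side by side, assuming the sample data reconstructed from the question and assuming integer output is wanted (the `.astype(int)` cast is an addition, since `np.floor` keeps a float dtype):

```python
import math

import numpy as np
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'KEY': [2, 2, 20, 35, 110, 110, 110],
    'counts': [1, 1, 1, 2, 4, 1, 1],
})
pct = df.groupby('KEY')['counts'].transform(lambda x: x / x.sum()).mul(100)

# Vectorized: np.floor processes the whole Series at once (result is float)
floored_np = np.floor(pct).astype(int)

# Element-wise: Series.apply calls math.floor once per value in Python
floored_apply = pct.apply(math.floor)

print(floored_np.tolist())  # [50, 50, 100, 100, 66, 16, 16]
```

Note that flooring gives 66 for the `to E` row, while rounding (`pct.round().astype(int)`) gives 67 there but 17 for the two rows below it; the expected output in the question mixes both, so pick whichever rule is actually intended.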
Is np faster, more Pythonic, or just an alternative?
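@koalaok - it is faster, because np.floor is vectorized and runs in compiled NumPy code, whereas .apply(math.floor) calls a Python function once per value, like a pure Python loop.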
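To check the speed claim on your own machine, a rough `timeit` sketch like the one below can help. The input is hypothetical (random values, not data from the question), and the absolute timings depend on hardware and pandas/NumPy versions, so no specific numbers are claimed here.

```python
import math
import timeit

import numpy as np
import pandas as pd

# Hypothetical input: 200,000 random percentages between 0 and 100
rng = np.random.default_rng(0)
s = pd.Series(rng.uniform(0, 100, size=200_000))

# Python-level call per element vs. one vectorized NumPy call
t_apply = timeit.timeit(lambda: s.apply(math.floor), number=3)
t_numpy = timeit.timeit(lambda: np.floor(s), number=3)

print(f"apply(math.floor): {t_apply:.3f} s")
print(f"np.floor:          {t_numpy:.3f} s")
```

Both calls produce the same values; only the dtype differs (`apply(math.floor)` returns integers, `np.floor` returns floats).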