Note: this article is reproduced from stackoverflow.com.

dataframe pandas python

calculate percentage based on specific column value

Posted on 2020-03-31 22:59:50

I want to calculate a percentage for each row. Below is an example DataFrame:

    KEY  DESCR  counts
0   2    to A   1
1   2    to B   1
2   20   to C   1
3   35   to D   2
4   110  to E   4
5   110  to F   1
6   110  to G   1

The percentage formula is: (counts / sum of counts over all rows with the same KEY) * 100
Example: for KEY 2, (1 / 2) * 100 = 50

Below is the code I am stuck on. I have tried many times without success.

percentage = []

for i in range(len(df)):
    percentage.append((df['counts'][i] / ...............) * 100) 

df['PERCENTAGE'] = percentage 
df

Expected output is:

    KEY  DESCR  counts  PERCENTAGE
0   2    to A   1       50
1   2    to B   1       50
2   20   to C   1       100
3   35   to D   2       100
4   110  to E   4       67
5   110  to F   1       16
6   110  to G   1       16

Can anyone help me solve this? Thank you.

Questioner: bieha

jezrael 2020-01-31 20:00

If performance is important, use GroupBy.transform with 'sum' to get each row's group total, divide the original column by it with Series.div, and finally multiply by 100 with Series.mul:

df['PERCENTAGE'] = df['counts'].div(df.groupby('KEY')['counts'].transform('sum')).mul(100)

You can also divide each value by its group sum inside a lambda, but for a large DataFrame or many groups this is less efficient:

df['PERCENTAGE'] = df.groupby('KEY')['counts'].transform(lambda x: x / x.sum()).mul(100)

print (df)
   KEY DESCR  counts  PERCENTAGE
0    2  to A       1   50.000000
1    2  to B       1   50.000000
2   20  to C       1  100.000000
3   35  to D       2  100.000000
4  110  to E       4   66.666667
5  110  to F       1   16.666667
6  110  to G       1   16.666667
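
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the example DataFrame from the question and applies both approaches. The intermediate names `group_total` and `pct_lambda` are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "KEY": [2, 2, 20, 35, 110, 110, 110],
    "DESCR": ["to A", "to B", "to C", "to D", "to E", "to F", "to G"],
    "counts": [1, 1, 1, 2, 4, 1, 1],
})

# transform('sum') returns a Series aligned with df's index, holding the
# total of 'counts' for the KEY group each row belongs to.
group_total = df.groupby("KEY")["counts"].transform("sum")

# Vectorized approach: element-wise division, then scale to percent.
df["PERCENTAGE"] = df["counts"].div(group_total).mul(100)

# Lambda approach, kept in a separate Series for comparison.
pct_lambda = df.groupby("KEY")["counts"].transform(lambda x: x / x.sum()).mul(100)

print(df)
```

Both approaches should give the same values. The difference is that the lambda is called once per group in Python, while the first version does a single vectorized division.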

koalaok 2020-01-31 20:32:54

.apply(math.floor) ?

jezrael 2020-01-31 20:33:34

@koalaok or np.floor(df.groupby('KEY')['counts'].transform(lambda x: x / x.sum()).mul(100))
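
A small sketch of how flooring and rounding differ on this data, assuming whole-number percentages are wanted. The names `pct`, `floored` and `rounded` are illustrative. Note that the expected output in the question mixes 67 and 16, so neither option reproduces it exactly.

```python
import numpy as np
import pandas as pd

# KEY and counts columns from the question's example.
df = pd.DataFrame({
    "KEY": [2, 2, 20, 35, 110, 110, 110],
    "counts": [1, 1, 1, 2, 4, 1, 1],
})

pct = df["counts"].div(df.groupby("KEY")["counts"].transform("sum")).mul(100)

# np.floor always rounds down: 66.67 -> 66, 16.67 -> 16.
floored = np.floor(pct).astype(int)

# Series.round rounds to the nearest integer: 66.67 -> 67, 16.67 -> 17.
rounded = pct.round().astype(int)

print(pd.DataFrame({"pct": pct, "floored": floored, "rounded": rounded}))
```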

koalaok 2020-01-31 20:55:17

Is np faster, more Pythonic, or just an alternative?

jezrael 2020-01-31 20:57:42

@koalaok - it is faster, because NumPy's vectorized operations are faster than pure Python loops
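
The speed claim can be checked with a rough timing sketch like the one below. The test data is hypothetical (random values, not from the original post), and the timings depend on machine and library versions, so no figures are quoted here.

```python
import math
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 200,000 random percentages (not from the original post).
rng = np.random.default_rng(0)
pct = pd.Series(rng.uniform(0, 100, size=200_000))

# Vectorized: one call processes the whole array; result stays float64.
floored_np = np.floor(pct)

# Element-wise: math.floor is called once per value from Python; result is int.
floored_apply = pct.apply(math.floor)

t_np = timeit.timeit(lambda: np.floor(pct), number=5)
t_apply = timeit.timeit(lambda: pct.apply(math.floor), number=5)
print(f"np.floor: {t_np:.4f}s  apply(math.floor): {t_apply:.4f}s")
```

The two results hold the same values. Only the dtype differs (float from `np.floor`, integer from `math.floor`).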
