I want to calculate a percentage for each row. Here is a sample DataFrame:
   KEY DESCR  counts
0    2  to A       1
1    2  to B       1
2   20  to C       1
3   35  to D       2
4  110  to E       4
5  110  to F       1
6  110  to G       1
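The percentage formula is: (counts / sum of counts over all rows with the same KEY) * 100
Example: (1/2) * 100
Below is the code where I got stuck; I tried many times but could not get it to work.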
percentage = []
for i in range(len(df)):
percentage.append((df['counts'][i] / ...............) * 100)
df['PERCENTAGE'] = percentage
df
The expected output is:
   KEY DESCR  counts  PERCENTAGE
0    2  to A       1          50
1    2  to B       1          50
2   20  to C       1         100
3   35  to A       2         100
4  110  to E       4          67
5  110  to C       1          16
6  110  to G       1          16
Can anyone help me solve this? Thanks.
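If performance is important, use GroupBy.transform with sum to get each row's group total, divide the original column by it with Series.div, and finally multiply by 100 with Series.mul: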
df['PERCENTAGE'] = df['counts'].div(df.groupby('KEY')['counts'].transform('sum')).mul(100)
You can also divide each value inside each group, but this performs poorly on a large DataFrame or with many groups:
df['PERCENTAGE'] = df.groupby('KEY')['counts'].transform(lambda x: x / x.sum()).mul(100)
print(df)
   KEY DESCR  counts  PERCENTAGE
0    2  to A       1   50.000000
1    2  to B       1   50.000000
2   20  to C       1  100.000000
3   35  to D       2  100.000000
4  110  to E       4   66.666667
5  110  to F       1   16.666667
6  110  to G       1   16.666667
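For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample DataFrame from the question by hand and applies both variants; the intermediate names group_total and via_lambda are only illustrative and do not appear in the original code.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "KEY": [2, 2, 20, 35, 110, 110, 110],
    "DESCR": ["to A", "to B", "to C", "to D", "to E", "to F", "to G"],
    "counts": [1, 1, 1, 2, 4, 1, 1],
})

# Sum of counts per KEY, broadcast back to one value per original row.
group_total = df.groupby("KEY")["counts"].transform("sum")

# Vectorized variant: divide each row by its group total, then scale to percent.
df["PERCENTAGE"] = df["counts"].div(group_total).mul(100)

# Lambda variant: same values, but the function is called once per group in Python.
via_lambda = df.groupby("KEY")["counts"].transform(lambda x: x / x.sum()).mul(100)

print(df)
```

Both variants should give the same numbers; the difference is only in how much work happens in Python versus inside pandas.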
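What about .apply(math.floor)?
@koalaok Or:
np.floor(df.groupby('KEY')['counts'].transform(lambda x: x / x.sum()).mul(100))
Is np faster, more Pythonic, or just an alternative?
@koalaok - It is faster, because numpy works on the whole array at once, while .apply(math.floor) behaves like a pure Python loop.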
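To make the two rounding options from these comments concrete, here is a small sketch under the same sample data. The variable names are illustrative only. One thing to note: flooring gives 66 for the 4-out-of-6 row, whereas the expected output in the question shows 67 there, so neither floor nor round reproduces that table exactly.

```python
import math

import numpy as np
import pandas as pd

# Same counts and keys as in the question, kept as plain Series.
counts = pd.Series([1, 1, 1, 2, 4, 1, 1])
key = pd.Series([2, 2, 20, 35, 110, 110, 110])

# Percentage of each row within its KEY group.
pct = counts.div(counts.groupby(key).transform("sum")).mul(100)

# numpy floors the whole Series in one call; the result stays float64.
floored_np = np.floor(pct)

# apply calls math.floor once per element and returns Python ints.
floored_py = pct.apply(math.floor)

# Cast the numpy result if integer values are wanted.
floored_int = floored_np.astype(int)

print(floored_int.tolist())
```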