Note: This article is reproduced from serverfault.com.

python - Display a column when a desired value is missing while grouping in Pandas dataframe

Posted on 2020-11-27 23:26:27

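Good morning. I have a dataframe containing regions, customers, and some deliveries. There is a purchaseType column in which each customer's first and last purchases are marked 'FIRST' and 'LAST', and sometimes there are deliveries in between, marked 'DELIVERY'. I need to flag every customer and region combination that has no in-between deliveries at all, as shown in the noDeliveryFlag column of the desired output. Flagging the individual in-between delivery rows is not hard, but I need the flag to apply to all rows of the customer.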

    import pandas as pd  
    data = [['NY', 'A','FIRST', 10], ['NY', 'A','DELIVERY', 20], ['NY', 'A','DELIVERY', 30], ['NY', 'A','LAST', 25],
           ['NY', 'B','FIRST', 15], ['NY', 'B','DELIVERY', 10], ['NY', 'B','LAST', 20],
           ['FL', 'A','FIRST', 15], ['FL', 'A','DELIVERY', 10], ['FL', 'A','DELIVERY', 12], ['FL', 'A','DELIVERY', 25], ['FL', 'A','LAST', 20],
           ['FL', 'C','FIRST', 15], ['FL', 'C','LAST', 10],
           ['FL', 'D','FIRST', 10], ['FL', 'D','DELIVERY', 20], ['FL', 'D','LAST', 30],
           ['FL', 'E','FIRST', 20], ['FL', 'E','LAST', 20]
           ] 
      
    # Create the pandas DataFrame 
    df = pd.DataFrame(data, columns = ['region', 'customer', 'purchaseType', 'price']) 
      
    # print dataframe. 
    df

Output:

   region customer purchaseType  price
0      NY        A        FIRST     10
1      NY        A     DELIVERY     20
2      NY        A     DELIVERY     30
3      NY        A         LAST     25
4      NY        B        FIRST     15
5      NY        B     DELIVERY     10
6      NY        B         LAST     20
7      FL        A        FIRST     15
8      FL        A     DELIVERY     10
9      FL        A     DELIVERY     12
10     FL        A     DELIVERY     25
11     FL        A         LAST     20
12     FL        C        FIRST     15
13     FL        C         LAST     10
14     FL        D        FIRST     10
15     FL        D     DELIVERY     20
16     FL        D         LAST     30
17     FL        E        FIRST     20
18     FL        E         LAST     20

Desired output:

   region customer purchaseType  price noDeliveryFlag
0      NY        A        FIRST     10              0
1      NY        A     DELIVERY     20              0
2      NY        A     DELIVERY     30              0
3      NY        A         LAST     25              0
4      NY        B        FIRST     15              0
5      NY        B     DELIVERY     10              0
6      NY        B         LAST     20              0
7      FL        A        FIRST     15              0
8      FL        A     DELIVERY     10              0
9      FL        A     DELIVERY     12              0
10     FL        A     DELIVERY     25              0
11     FL        A         LAST     20              0
12     FL        C        FIRST     15              1
13     FL        C         LAST     10              1
14     FL        D        FIRST     10              0
15     FL        D     DELIVERY     20              0
16     FL        D         LAST     30              0
17     FL        E        FIRST     20              1
18     FL        E         LAST     20              1

Thank you very much!

Questioner: Mauro Del Nook
Mauro Del Nook 2020-11-28 08:02:46

I think I figured it out:

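    # Mark rows that are not in-between deliveries (True for FIRST and LAST rows)
    df['noDeliveryFlag'] = df['purchaseType'] != 'DELIVERY'
    # The group minimum stays True only if no row in the (region, customer) group is a DELIVERY
    df['noDeliveryFlag'] = df.groupby(['region','customer'])['noDeliveryFlag'].transform('min').astype(int)
    print(df)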

If anyone has a more efficient approach, I would appreciate it.
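
One possible variation, offered as a sketch rather than a benchmarked improvement: test directly for the presence of a DELIVERY row in each group with `transform('any')` and then negate the result, which avoids writing the intermediate boolean into the dataframe. The shortened sample data and the helper names `is_delivery` and `has_delivery` are assumptions made for illustration; they do not come from the original post.

```python
import pandas as pd

# Shortened sample in the same shape as the question's data (illustrative only)
data = [
    ['NY', 'A', 'FIRST', 10], ['NY', 'A', 'DELIVERY', 20], ['NY', 'A', 'LAST', 25],
    ['FL', 'C', 'FIRST', 15], ['FL', 'C', 'LAST', 10],
    ['FL', 'D', 'FIRST', 10], ['FL', 'D', 'DELIVERY', 20], ['FL', 'D', 'LAST', 30],
]
df = pd.DataFrame(data, columns=['region', 'customer', 'purchaseType', 'price'])

# True on rows that are in-between deliveries
is_delivery = df['purchaseType'].eq('DELIVERY')

# For each (region, customer) group, broadcast whether any row is a DELIVERY
has_delivery = is_delivery.groupby([df['region'], df['customer']]).transform('any')

# Flag every row of a group that has no DELIVERY row at all
df['noDeliveryFlag'] = (~has_delivery).astype(int)
print(df)
```

Both versions do one grouped pass over the data, so any speed difference is likely to be small. The main difference is readability: `any` followed by a negation states the condition "no DELIVERY in the group" directly, whereas `min` over the inverted booleans expresses the same thing indirectly.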