Condition Shift in Pandas

Posted on 2020-12-09 07:08:41

I am trying to increment a counter whenever a sequence changes sign.

The counter should increase by 1 only when the values change from the negative to the positive range.

Here is the code:


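import pandas as pd

data = {'a': [-1, -1, -2, -3, 4, 5, 6, -2, -2, -3, 6, 3, 6, 7, -1, -5, -7, 1, 34, 5]}
df = pd.DataFrame(data)

# True where the value is positive
df['p'] = df.a > 0
# Increments on every sign change, in either direction
df['g'] = (df['p'] != df['p'].shift()).cumsum()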

This is the output (first 15 rows):

0  -1  False  1
1  -1  False  1
2  -2  False  1
3  -3  False  1
4   4   True  2
5   5   True  2
6   6   True  2
7  -2  False  3
8  -2  False  3
9  -3  False  3
10  6   True  4
11  3   True  4
12  6   True  4
13  7   True  4
14 -1  False  5

I need an output that looks like this:

0  -1  False  1
1  -1  False  1
2  -2  False  1
3  -3  False  1
4   4   True  2
5   5   True  2
6   6   True  2
7  -2  False  2
8  -2  False  2
9  -3  False  2
10  6   True  3
11  3   True  3
12  6   True  3
13  7   True  3
14 -1  False  3

Anybody got an idea?

Asked by Christian Piazzi
Answered by jezrael, 2020-12-09 15:29:10

You can combine the change mask with column p using & (bitwise AND), so that only changes into the positive range are counted:

df['p'] = df.a > 0
df['g'] = ((df['p'] != df['p'].shift()) & df['p']).cumsum() + 1
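
To see why the AND works, the one-liner above can be split into its intermediate masks. This is only a sketch on the question's data; the names positive, changed and rising are illustrative and do not appear in the original answer.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({'a': [-1, -1, -2, -3, 4, 5, 6, -2, -2, -3,
                         6, 3, 6, 7, -1, -5, -7, 1, 34, 5]})

positive = df['a'] > 0                   # True where the value is positive
changed = positive != positive.shift()   # True on every sign flip (and on row 0)
rising = changed & positive              # keep only flips that land on a positive value

# Each negative-to-positive flip bumps the counter; +1 makes the first group 1.
df['g'] = rising.cumsum() + 1

print(rising[rising].index.tolist())     # rows where a new group starts
print(df['g'].tolist())
```

On this data the counter changes at rows 4, 10 and 17, which are exactly the rows where the values move from negative to positive.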

Another idea is to filter the change mask by column p, take the cumulative sum, forward fill the resulting missing values, replace the leading NaN values with 0 for the first group, and add 1:

df['p'] = df.a > 0
df['g'] = ((df['p'] != df['p'].shift()))[df['p']].cumsum()
df['g'] = df['g'].ffill().fillna(0).astype(int) + 1
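
The intermediate steps of this variant are easier to follow on a shorter series. The sketch below uses the first ten values from the question; the names changed and partial are illustrative only.

```python
import pandas as pd

# First ten values of the question's data.
df = pd.DataFrame({'a': [-1, -1, -2, -3, 4, 5, 6, -2, -2, -3]})
df['p'] = df['a'] > 0

changed = df['p'] != df['p'].shift()

# Boolean indexing keeps only the positive rows, so the cumsum runs over those alone.
partial = changed[df['p']].cumsum()
print(partial.to_dict())

# Assigning back aligns on the index; the negative rows become NaN.
df['g'] = partial

# ffill carries the last count through later negative rows,
# fillna(0) covers the negative rows before the first positive run.
df['g'] = df['g'].ffill().fillna(0).astype(int) + 1
print(df['g'].tolist())
```

Only rows 4 to 6 survive the filter, so the counter exists only there until ffill and fillna fill in the rest.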

Solution using differences, without the helper column p:

df['g'] = df.a.gt(0).astype('i1').diff().gt(0).cumsum().add(1)

print(df)
     a      p  g
0   -1  False  1
1   -1  False  1
2   -2  False  1
3   -3  False  1
4    4   True  2
5    5   True  2
6    6   True  2
7   -2  False  2
8   -2  False  2
9   -3  False  2
10   6   True  3
11   3   True  3
12   6   True  3
13   7   True  3
14  -1  False  3
15  -5  False  3
16  -7  False  3
17   1   True  4
18  34   True  4
19   5   True  4
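
As a quick sanity check, the three variants can be compared side by side. This is a sketch, not part of the original answer: the names g_and, g_fill and g_diff are illustrative, the fill variant uses reindex instead of column assignment so it can work on a bare Series, and the difference variant converts with astype.

```python
import pandas as pd

a = pd.Series([-1, -1, -2, -3, 4, 5, 6, -2, -2, -3,
               6, 3, 6, 7, -1, -5, -7, 1, 34, 5])
p = a > 0
changed = p != p.shift()

# Variant 1: count only changes that land on a positive value.
g_and = (changed & p).cumsum() + 1

# Variant 2: cumsum over positive rows only, then fill the gaps.
g_fill = changed[p].cumsum().reindex(a.index).ffill().fillna(0).astype(int) + 1

# Variant 3: a positive difference of the 0/1 flag marks a negative-to-positive flip.
g_diff = a.gt(0).astype('i1').diff().gt(0).cumsum().add(1)

print(g_and.tolist() == g_fill.tolist() == g_diff.tolist())

# Edge case: a series that starts with a positive value.
b = pd.Series([3, -1, 2])
pb = b > 0
b_and = ((pb != pb.shift()) & pb).cumsum() + 1
b_diff = b.gt(0).astype('i1').diff().gt(0).cumsum().add(1)
print(b_and.tolist(), b_diff.tolist())
```

All three agree on the question's data. When the series starts with a positive value, however, the shift-based variants treat row 0 as a change and begin numbering at 2, while the difference variant begins at 1, so it is worth checking which behaviour is wanted.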