pandas groupby rolling mean/median with dropping missing values

发布于 2020-03-27 10:22:54

How can get in pandas groupby rolling mean/median with dropping missing values? I.e. the output should drop missing values before calculating mean/median instead of giving me NaN if a missing value is present.

import pandas as pd
t = pd.DataFrame(data={v.date:[0,0,0,0,1,1,1,1,2,2,2,2],
                         'i0':[0,1,2,3,0,1,2,3,0,1,2,3],
                         'i1':['A']*12,
                         'x':[10.,20.,30.,np.nan,np.nan,21.,np.nan,41.,np.nan,np.nan,32.,42.]})
t.set_index([v.date,'i0','i1'], inplace=True)
t.sort_index(inplace=True)

print(t)
print(t.groupby('date').apply(lambda x: x.rolling(window=2).mean()))

gives

               x
date i0 i1      
0    0  A   10.0
     1  A   20.0
     2  A   30.0
     3  A    NaN
1    0  A    NaN
     1  A   21.0
     2  A    NaN
     3  A   41.0
2    0  A    NaN
     1  A    NaN
     2  A   32.0
     3  A   42.0

               x
date i0 i1      
0    0  A    NaN
     1  A   15.0
     2  A   25.0
     3  A    NaN
1    0  A    NaN
     1  A    NaN
     2  A    NaN
     3  A    NaN
2    0  A    NaN
     1  A    NaN
     2  A    NaN
     3  A   37.0

while I want the following for this example:

               x
date i0 i1      
0    0  A   10.0
     1  A   15.0
     2  A   25.0
     3  A   30.0
1    0  A    NaN
     1  A   21.0
     2  A   21.0
     3  A   41.0
2    0  A    NaN
     1  A    NaN
     2  A   32.0
     3  A   37.0

what I tried

t.groupby('date').apply(lambda x: x.rolling(window=2).dropna().median())

and

t.groupby('date').apply(lambda x: x.rolling(window=2).median(dropna=True))

(both raise exceptions, but maybe there exists something along the lines)

Thank you for your help!

Questioner

S.V

Viewed

126

Chinese

Original

t.groupby('date', group_keys=False).rolling(window=2, min_periods=1).mean() x date i0 i1 0 0 A 10.0 1 A 15.0 2 A 25.0 3 A 30.0 1 0 A NaN 1 A 21.0 2 A 21.0 3 A 41.0 2 0 A NaN 1 A NaN 2 A 32.0 3 A 37.0

YOBEN_S 2019-07-03 22:33:04

I feel like stack's server is acting up today , can not see the question in time :-(

cs95 2019-07-03 22:34:41

@WeNYoBen Yeah same. Btw just sae your comment. Is this a dupe? I feel like it is close but the use case for min periods is slightly different here.

YOBEN_S 2019-07-03 22:35:42

I do not think that is dup , since he have two problem , forget adding the min_periods and also rolling should not with apply

S.V 2019-07-03 22:39:22

@cs95 I use apply since there is a bug in pandas, which will only be fixed in pandas-0.25. Just look how messed up your output is if apply is not used (two 'date' index levels).

cs95 2019-07-03 22:40:41

@S.V Didn't notice that, but fixed ;)

pandas groupby rolling mean/median with dropping missing values

Related issues