pandas python

Pandas groupby with aggregation

Published on 2020-03-30 21:16:29

Starting from my previous question: Get grouped informations from an array with Pandas

I have a dataset structured like the one below, and I want to get the following with pandas: for each day (that is, grouped by date), the second value of "Open", the second-to-last value of "Close", the highest value of "High", the lowest value of "Low", and the sum of "Volume".

"Date","Time","Open","High","Low","Close","Up","Down","Volume"
01/03/2000,00:05,1481.50,1481.50,1481.00,1481.00,2,0,0.00
01/03/2000,00:10,1480.75,1480.75,1480.75,1480.75,1,0,1.00
01/03/2000,00:20,1480.50,1480.50,1480.50,1480.50,1,0,1.00
[...]
03/01/2018,11:05,2717.25,2718.00,2708.50,2709.25,9935,15371,25306.00
03/01/2018,11:10,2709.25,2711.75,2706.50,2709.50,8388,8234,16622.00
03/01/2018,11:15,2709.25,2711.50,2708.25,2709.50,4738,4703,9441.00
03/01/2018,11:20,2709.25,2709.50,2706.00,2707.25,3609,4685,8294.00

In my previous question, a user suggested that I use this:

df.groupby('Date').agg({
    'Close': 'last',
    'Open': 'first',
    'High': 'max',
    'Low': 'min',
    'Volume': 'sum'
})

But now I want to take the second element for Open and second-last for Close. How can I do this?

Questioner
Steve
jezrael 2020-01-31 19:47

You can create custom functions. You only need to decide what to return when a group has no second value, e.g. NaN or the first value, x.iat[0]:

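import numpy as np

def second(x):
    # Second value of the group, or NaN if the group has fewer than two rows
    return x.iat[1] if len(x) > 1 else np.nan

def secondLast(x):
    # Second-to-last value of the group, or NaN if the group has fewer than two rows
    return x.iat[-2] if len(x) > 1 else np.nan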

df1 = df.groupby('Date').agg({
    'Close': secondLast,
    'Open': second,
    'High': 'max',
    'Low': 'min',
    'Volume': 'sum'
})

print(df1)
              Close     Open    High     Low   Volume
Date                                                 
01/03/2000  1480.75  1480.75  1481.5  1480.5      2.0
03/01/2018  2709.50  2709.25  2718.0  2706.0  59663.0
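
For completeness, here is a minimal self-contained sketch of the other option mentioned above: falling back to the only available value instead of NaN when a day has a single row. The helper names `second_or_first` and `second_last_or_last` are made up for this illustration, and the data is just the seven sample rows shown in the question (the elided part of the file is left out). It assumes the rows are already sorted by time within each date, since the helpers pick values by position.

```python
import io

import pandas as pd

# Sample rows copied from the question; the elided middle of the file is omitted.
CSV = '''"Date","Time","Open","High","Low","Close","Up","Down","Volume"
01/03/2000,00:05,1481.50,1481.50,1481.00,1481.00,2,0,0.00
01/03/2000,00:10,1480.75,1480.75,1480.75,1480.75,1,0,1.00
01/03/2000,00:20,1480.50,1480.50,1480.50,1480.50,1,0,1.00
03/01/2018,11:05,2717.25,2718.00,2708.50,2709.25,9935,15371,25306.00
03/01/2018,11:10,2709.25,2711.75,2706.50,2709.50,8388,8234,16622.00
03/01/2018,11:15,2709.25,2711.50,2708.25,2709.50,4738,4703,9441.00
03/01/2018,11:20,2709.25,2709.50,2706.00,2707.25,3609,4685,8294.00
'''

# "Date" is left as a plain string column, as in the original answer's output.
df = pd.read_csv(io.StringIO(CSV))


def second_or_first(x):
    # Hypothetical helper: second value of the group,
    # or the first (only) value if the group has a single row.
    return x.iat[1] if len(x) > 1 else x.iat[0]


def second_last_or_last(x):
    # Hypothetical helper: second-to-last value of the group,
    # or the last (only) value if the group has a single row.
    return x.iat[-2] if len(x) > 1 else x.iat[-1]


df2 = df.groupby('Date').agg({
    'Close': second_last_or_last,
    'Open': second_or_first,
    'High': 'max',
    'Low': 'min',
    'Volume': 'sum',
})

print(df2)
```

On this sample every day has at least two rows, so the result matches df1 above; the fallback only changes the outcome for days with a single row.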