Resample Time Series in pandas

Posted on 2020-11-28 16:26:19

I have a data frame that contains temperature records taken at regular intervals, specifically every 4 minutes. I want to plot the time series so that the multiple values recorded within each day are grouped by day. However, the plot draws every value individually instead of grouping the values per day, as I want.

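import pandas as pd
# Use the parsed "Date/Time" column as the index, as in the output below
df = pd.read_csv("my_file.csv", index_col="Date/Time", parse_dates=True)
print(df.head())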

Output

                       Temperature
Date/Time
2015-07-01 00:00:47        25.21
2015-07-01 00:01:48        25.23
2015-07-01 00:02:48        25.33
2015-07-01 00:03:47        25.22
2015-07-01 00:04:48        25.32

When I plot with seaborn I get this:

import matplotlib.pyplot as plt
import seaborn as sns

df = df.reset_index()
sns.relplot(x="Date/Time", y="Temperature", data=df, kind="line")
plt.show()


This is not what I want to plot; I want something like this example:

[image]

I believe that I have to resample the data, but when I do, I get only the average for each period. That leaves a single value per period instead of the multiple values recorded during it.

df = df.resample("H").mean()
print (df.head())

Output:

                      Temperature
Date/Time
2015-07-01 00:00:00    25.264167
2015-07-01 01:00:00    25.267167
2015-07-01 02:00:00    25.272000
2015-07-01 03:00:00    25.290167
2015-07-01 04:00:00    25.307333

This is not what I need. Can you help me?

Questioner: Geomario

Answered by Diziet Asahi, 2020-11-29 20:08:58

There must be a better way to bin the timestamps, but I'm drawing a blank right now.

Here is one way to do it: create a new column where you drop part of the date/time information, so that all rows that fall in the same timeframe share the same value.

For example, if you want to bin by hours:

df['Binned time'] = pd.to_datetime(df.index.strftime('%Y-%m-%d %H:00:00'))

or by days:

df['Binned time'] = pd.to_datetime(df.index.strftime('%Y-%m-%d 00:00:00'))
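
A possibly tidier way to get the same bins, sketched on made-up data rather than the asker's file: `DatetimeIndex.floor` and `DatetimeIndex.normalize` truncate the timestamps directly, without the round trip through strings. The synthetic index and the random temperatures below are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data (an assumption, not the real file):
# one reading per minute for two days.
idx = pd.date_range("2015-07-01 00:00:47", periods=2 * 24 * 60,
                    freq="min", name="Date/Time")
rng = np.random.default_rng(0)
df = pd.DataFrame({"Temperature": 25 + rng.normal(0, 0.1, len(idx))}, index=idx)

# The strftime round trip used above, binning by hour ...
via_strftime = pd.to_datetime(df.index.strftime("%Y-%m-%d %H:00:00"))

# ... and the same hourly bins computed directly on the DatetimeIndex.
# ("h" is the hourly alias; older pandas releases also spell it "H".)
via_floor = df.index.floor("h")

# Daily bins: normalize() sets the time of day to midnight.
by_day = df.index.normalize()

print((via_floor == via_strftime).all())
print(by_day.unique())
```

Either result can be assigned to `df['Binned time']` exactly as in the snippets above.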

then use lineplot:

sns.lineplot(data=df, x='Binned time', y='Temperature')
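
The reason this works, as far as I understand it, is that `lineplot` aggregates all rows sharing an x value (by default to their mean, with a band around it), so the raw readings are kept and summarised per bin instead of being collapsed beforehand. The sketch below uses synthetic data (an assumption, not the asker's file) and computes a comparable per-day summary with plain pandas, which is a handy way to check the numbers behind the plot.

```python
import numpy as np
import pandas as pd

# Synthetic data, NOT the asker's file: one reading per minute over three days,
# with a daily cycle plus some noise.
idx = pd.date_range("2015-07-01", periods=3 * 24 * 60, freq="min", name="Date/Time")
rng = np.random.default_rng(42)
minutes = np.arange(len(idx))
temperature = (25 + np.sin(2 * np.pi * minutes / (24 * 60))
               + rng.normal(0, 0.1, len(idx)))
df = pd.DataFrame({"Temperature": temperature}, index=idx)

# Keep every row, but label each one with the day it belongs to.
df["Binned time"] = pd.to_datetime(df.index.strftime("%Y-%m-%d 00:00:00"))

# Per-day summary of the raw readings: a centre value and the spread around it.
summary = (
    df.groupby("Binned time")["Temperature"]
      .agg(["count", "mean", "std", "min", "max"])
)
print(summary)
```

Compared with `resample(...).mean()`, which returns only one number per bin, the data frame here still holds every reading, so the spread within each day remains available to the plot.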
