Warm tip: This article is reproduced from stackoverflow.com, please click
dataframe pandas time-series multi-index

Select last row from each column of multi-index Pandas DataFrame based on time, when columns are une

发布于 2020-03-30 21:12:32

I have the following Pandas multi-index DataFrame with the top level index being a group ID and the second level index being when, in ISO 8601 time format (shown here without the time):

                                     value      weight
                         when                     
5e33c4bb4265514aab106a1a 2011-05-12   1.34       0.79
                         2011-05-07   1.22       0.83
                         2011-05-03   2.94       0.25
                         2011-04-28   1.78       0.89
                         2011-04-22   1.35       0.92
...                                    ...        ...
5e33c514392b77d517961f06 2009-01-31  30.75       0.12
                         2009-01-24  30.50       0.21
                         2009-01-23  29.50       0.96
                         2009-01-10  28.50       0.98
                         2008-12-08  28.50       0.65

when is currently defined as an index but this is not a requirement.

Assertions

  1. when may be non-unique.
  2. Columns may be of unequal length across groups
  3. Within groups when, value and weight will always be of equal length (for each when there will always be a value and a weight

Question

Using the parameter index_time, how do you retrieve:

  1. The most recent past value and weight from each group relative to index_time along with the difference (in seconds) between index_time and when.
  2. index_time may be a time in the past such that only entries where when <= index_time are selected.
  3. The result should be indexed in some way so that the group id of each result can be deduced

Example

From the above, if the index_time was 2011-05-10 then the result should be:

                          value     weight      age                     
5e33c4bb4265514aab106a1a   1.22      0.83      259200
5e33c514392b77d517961f06  30.75      0.12    72576000
Questioner
Tintin
Viewed
28
Tintin 2020-02-01 11:58

Where original DataFrame given in the question is df:

import pandas as pd

df.sort_index(inplace=True)
result = df.loc[pd.IndexSlice[:, :when], :].groupby('id').tail(1)
result['age'] =  when - result.index.get_level_values(level=1)