Warm tip: This article is reproduced from stackoverflow.com, please click
dataframe pandas python-3.x

Pandas creating and filling new columns based on other columns

发布于 2020-03-27 10:18:27

I have a pandas dataframe with Time and values columns. I am trying to create two new columns 'START_TIME" and 'END_TIME'. It is medication related data and it is stored poorly in the database so I am trying to transform the table. In this case, the medication for a patient started at 2018-11-07 23:59:32 with a dose value of 80.o so, I want to capture that as the start time of the medication and end time is the first zero after the last value. That would be one round of medication. Whenever a new value starts it is considered as the second round of medication and I'd like to capture the start time and end time in the following way as explained earlier.

Time                          Values
2018-11-07 23:59:32            80.0
2018-11-08 04:35:09            80.0
2018-11-08 05:31:24            40.0
2018-11-24 18:29:30             0.0
2018-11-24 18:33:14             0.0
2018-11-26 17:39:31            20.0
2018-11-26 18:51:07            20.0
2018-11-26 21:04:35             0.0
2018-11-26 21:05:20             0.0
2018-11-26 21:13:44             0.0
2018-11-26 21:25:57             0.0
2018-11-29 02:19:57             7.0
2018-12-09 16:02:06             5.0
2018-12-09 16:33:03             2.5
2018-12-09 21:02:10             0.0

I believe it cannot be done with a simple for and if loop as I started with a simple step and it failed

for i in df['Values']:
    if i+1 != 0:
        df['START_TIME'] = df['TIME'].copy()

Expected DataFrame:

Time                     Values   START_TIME              END_TIME
2018-11-07 23:59:32       80.0    2018-11-07 23:59:32 
2018-11-08 04:35:09       80.0
2018-11-08 05:31:24       40.0
2018-11-24 18:29:30        0.0                            2018-11-24 18:29:30
2018-11-24 18:33:14        0.0
2018-11-26 17:39:31       20.0    2018-11-26 17:39:31
2018-11-26 18:51:07       20.0
2018-11-26 21:04:35        0.0                            2018-11-26 21:04:35
2018-11-26 21:05:20        0.0
2018-11-26 21:13:44        0.0
2018-11-26 21:25:57        0.0
2018-11-29 02:19:57        7.0    2018-11-29 02:19:57
2018-12-09 16:02:06        5.0
2018-12-09 16:33:03        2.5
2018-12-09 21:02:10        0.0                            2018-12-09 21:02:10

I'd really appreciate if I can get some help.

Questioner
akash bachu
Viewed
96
YOBEN_S 2019-07-03 22:09

IIUC create the condition by using diff, then the value equal to -1 and 1 will be the end and start point

s=df.Values.eq(0).astype(int).diff().fillna(-1)
df.loc[s==-1,'START_TIME']=df.Time
df.loc[s==1,'END_TIME']=df.Time
df
Out[334]: 
                  Time  Values          START_TIME            END_TIME
0  2018-11-07 23:59:32    80.0 2018-11-07 23:59:32                 NaT
1  2018-11-08 04:35:09    80.0                 NaT                 NaT
2  2018-11-08 05:31:24    40.0                 NaT                 NaT
3  2018-11-24 18:29:30     0.0                 NaT 2018-11-24 18:29:30
4  2018-11-24 18:33:14     0.0                 NaT                 NaT
5  2018-11-26 17:39:31    20.0 2018-11-26 17:39:31                 NaT
6  2018-11-26 18:51:07    20.0                 NaT                 NaT
7  2018-11-26 21:04:35     0.0                 NaT 2018-11-26 21:04:35
8  2018-11-26 21:05:20     0.0                 NaT                 NaT
9  2018-11-26 21:13:44     0.0                 NaT                 NaT
10 2018-11-26 21:25:57     0.0                 NaT                 NaT
11 2018-11-29 02:19:57     7.0 2018-11-29 02:19:57                 NaT
12 2018-12-09 16:02:06     5.0                 NaT                 NaT
13 2018-12-09 16:33:03     2.5                 NaT                 NaT
14 2018-12-09 21:02:10     0.0                 NaT 2018-12-09 21:02:10