This article is reproduced from stackoverflow.com.
datetime for-loop pandas python

Compute a date feature from two features in pandas

Posted on 2020-04-05 00:24:40

Hello, I'd like to compute a new feature, duration, from date_start and date_end. If the contract has not yet ended, I compute it using today's date. My problem is that my for loop has been running for an hour, and I only have 200K rows. What might be wrong with my code? Is there a simpler way to do this?

dftopyear['duration'] = ''
for x in dftopyear.Date_resil:
    if x == pd.isnull(np.datetime64('NaT')): # this mean contract not yet ended
        dftopyear['duration'] = dt.datetime.today().strftime("%Y-%m-%d") - dftopyear['date_start'] 
    else: # this mean contact ended 
        dftopyear['duration'] = dftopyear['Date_end'] - dftopyear['date_start']

Questioner: abdoulsn
Viewed: 87
Tony Yun 2020-01-31 23:11

The major problem is that when you subtract dftopyear['date_start'], you are subtracting the entire column, not a single value, so every pass through the loop recomputes and overwrites the whole 'duration' column.
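To make the difference concrete, here is a minimal sketch on a made-up three-row frame (the name `df` and the dates are hypothetical; only the column names come from the question). It contrasts subtracting whole columns with subtracting two single cells:

```python
import pandas as pd

# Hypothetical three-row frame standing in for dftopyear.
df = pd.DataFrame({
    "date_start": pd.to_datetime(["2019-01-01", "2019-06-01", "2020-01-01"]),
    "Date_end": pd.to_datetime(["2019-03-01", None, "2020-01-15"]),
})

# Column minus column yields a whole Series: one Timedelta per row.
whole_column = df["Date_end"] - df["date_start"]

# Selecting one cell on each side first yields a single Timedelta.
single_value = df.loc[0, "Date_end"] - df.loc[0, "date_start"]

print(type(whole_column).__name__, len(whole_column))
print(type(single_value).__name__)
```

Doing the first kind of subtraction inside a loop over 200K rows repeats a 200K-row computation 200K times, which would explain the running time.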

You need an index locator to point to a single value, rather than an entire Series:

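# .iloc accepts integer positions only, so use .loc with the row label and column name
# (this assumes the index labels are unique).
today = pd.Timestamp.today().normalize()  # a Timestamp, not a string, so subtraction works
dftopyear['duration'] = pd.Series(pd.NaT, index=dftopyear.index, dtype='timedelta64[ns]')
for idx, x in dftopyear['Date_resil'].items():
    if pd.isnull(x):  # contract not yet ended
        dftopyear.loc[idx, 'duration'] = today - dftopyear.loc[idx, 'date_start']
    else:  # contract ended
        dftopyear.loc[idx, 'duration'] = dftopyear.loc[idx, 'Date_end'] - dftopyear.loc[idx, 'date_start']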

Or, in a more Pythonic way:

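today = pd.Timestamp.today().normalize()  # a Timestamp, not a string
dftopyear['duration'] = pd.Series(pd.NaT, index=dftopyear.index, dtype='timedelta64[ns]')
for idx, x in dftopyear['Date_resil'].items():
    # use today's date while the contract is still open
    end_day = today if pd.isnull(x) else dftopyear.loc[idx, 'Date_end']
    dftopyear.loc[idx, 'duration'] = end_day - dftopyear.loc[idx, 'date_start']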
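
Since the question also asks for a simpler way, a loop-free version is worth considering. The sketch below is not part of the original answer. It assumes that an open contract shows up as a missing value (NaT) in Date_end, which the question does not confirm, since it tests Date_resil instead. The helper name `add_duration`, its `today` parameter, and the sample data are hypothetical.

```python
import pandas as pd

def add_duration(df, today=None):
    """Return a copy of df with a 'duration' column (end date minus start date).

    Assumption: a contract that is still open has NaT in 'Date_end'.
    """
    # Default to today's date at midnight; allow a fixed date for reproducibility.
    today = pd.Timestamp.today().normalize() if today is None else pd.Timestamp(today)
    out = df.copy()
    # Replace missing end dates with today's date, for all rows at once.
    end = out["Date_end"].fillna(today)
    # One vectorized subtraction instead of one per row.
    out["duration"] = end - out["date_start"]
    return out

# Hypothetical sample data; the middle contract is still open.
contracts = pd.DataFrame({
    "date_start": pd.to_datetime(["2019-01-01", "2019-06-01", "2020-01-01"]),
    "Date_end": pd.to_datetime(["2019-03-01", None, "2020-01-15"]),
})

result = add_duration(contracts, today="2020-02-01")
print(result)
```

If a number of days is wanted rather than a Timedelta, `result["duration"].dt.days` gives it as integers.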