Hello, I'd like to compute a new feature, duration, from date_start and date_end. If the contract has not yet ended, I compute it using today's date. My problem is that my for loop has been running for an hour, and I only have 200K rows.
What's wrong (maybe) with my code? Is there a simpler way to do this?
dftopyear['duration'] = ''
for x in dftopyear.Date_resil:
    if x == pd.isnull(np.datetime64('NaT')):  # this means the contract has not yet ended
        dftopyear['duration'] = dt.datetime.today().strftime("%Y-%m-%d") - dftopyear['date_start']
    else:  # this means the contract ended
        dftopyear['duration'] = dftopyear['Date_end'] - dftopyear['date_start']
The major problem here is that when you subtract dftopyear['date_start'], you are subtracting the entire column, and the result is assigned to the whole duration column on every iteration.
You need an index locator to point to a single value rather than an entire Series:
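dftopyear['duration'] = ''
# .iloc accepts only integer positions, so look up the column positions first
dur, start, end = (dftopyear.columns.get_loc(c) for c in ('duration', 'date_start', 'Date_end'))
for i, x in enumerate(dftopyear.Date_resil):
    if pd.isnull(x):  # contract not yet ended: measure up to today
        dftopyear.iloc[i, dur] = dt.datetime.today() - dftopyear.iloc[i, start]
    else:  # contract ended
        dftopyear.iloc[i, dur] = dftopyear.iloc[i, end] - dftopyear.iloc[i, start]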
Or, in a more Pythonic way:
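dftopyear['duration'] = ''
# .iloc accepts only integer positions, so look up the column positions first
dur, start, end = (dftopyear.columns.get_loc(c) for c in ('duration', 'date_start', 'Date_end'))
for i, x in enumerate(dftopyear.Date_resil):
    end_day = dt.datetime.today() if pd.isnull(x) else dftopyear.iloc[i, end]
    dftopyear.iloc[i, dur] = end_day - dftopyear.iloc[i, start]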
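Since the question also asks for a simpler way, here is a minimal sketch of a loop-free version. It assumes date_start and Date_end are already datetime columns and that a missing Date_end means the contract is still open; the sample rows are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data that reuses the column names from the question
dftopyear = pd.DataFrame({
    "date_start": pd.to_datetime(["2020-01-01", "2021-06-15", "2022-03-10"]),
    "Date_end": pd.to_datetime(["2020-12-31", None, "2022-03-20"]),
})

# Today's date at midnight, used as the end day for open contracts
today = pd.Timestamp.today().normalize()

# Replace missing end dates with today, then subtract whole columns at once
end_day = dftopyear["Date_end"].fillna(today)
dftopyear["duration"] = end_day - dftopyear["date_start"]

print(dftopyear)
```

Because the subtraction runs on whole columns, there is no Python-level loop over the rows, which is usually much faster than assigning one cell at a time.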
Thanks, let me try it. Mine finished running, but with NaT values in the entries.
I got this error:
ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types
Try my updated answer. I think the first version I published had a typo that was corrected later.
I see the problem and have fixed it now: iloc takes only integers, so I replaced the column name with its index on axis 1. Thanks!
You are right. Actually, only dt.datetime.today() is needed if your date column is already datetime. I'll update the answer.
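To illustrate the point about dt.datetime.today(), here is a small sketch. The start date is a made-up value, and it assumes the column holds pandas Timestamp values.

```python
import datetime as dt
import pandas as pd

# Hypothetical start date standing in for one value of date_start
start = pd.Timestamp("2020-01-01")
today = dt.datetime.today()

# A datetime minus a Timestamp gives a timedelta-like duration
elapsed = today - start
print(elapsed.days)

# A date formatted with strftime is a plain string and cannot be subtracted
string_subtraction_failed = False
try:
    today.strftime("%Y-%m-%d") - start
except TypeError as exc:
    string_subtraction_failed = True
    print("TypeError:", exc)
```

If the column is stored as strings instead, converting it with pd.to_datetime first should make the same subtraction work.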