# pandas error creating TimeDeltas from Datetime operation

I have looked at several other related questions here, here, and here, and none of them have come across quite the same problem as me.

I am using Pandas version 0.16.2. I have several columns in a Pandas dataframe, of dtype datetime64[ns]:

In [6]: date_list = ["SubmittedDate","PolicyStartDate", "PaidUpDate", "MaturityDate", "DraftDate", "CurrentValuationDate", "DOB", "InForceDate"] In [11]: data[date_list].head() Out[11]: SubmittedDate PolicyStartDate PaidUpDate MaturityDate DraftDate \ 0 NaT 2002-11-18 NaT 2041-03-04 NaT 1 NaT 2015-01-13 NaT NaT NaT 2 NaT 2014-10-15 NaT NaT NaT 3 NaT 2009-08-27 NaT NaT NaT 4 NaT 2007-04-19 NaT 2013-10-01 NaT CurrentValuationDate DOB InForceDate 0 2015-04-30 1976-03-04 2002-11-18 1 NaT 1949-09-27 2015-01-13 2 NaT 1947-06-15 2014-10-15 3 2015-07-30 1960-06-07 2009-08-27 4 2010-04-21 1950-10-01 2007-04-19

These were originally in string format (e.g. '1976-03-04') which I converted to datetime objects using:

In [7]: for datecol in date_list: ...: data[datecol] = pd.to_datetime(data[datecol], coerce=True, errors = 'raise')

Here are the dtypes for each of these columns:

In [8]: for datecol in date_list: print data[datecol].dtypes

returns:

datetime64[ns] datetime64[ns] datetime64[ns] datetime64[ns] datetime64[ns] datetime64[ns] datetime64[ns] datetime64[ns]

So far, so good. But what I want to do is create a new column for each of these columns that gives the age in days (as an integer) from a certain date.

In [13]: current_date = pd.to_datetime("2015-07-31")

I first ran this:

In [14]: for i in date_list: ....: data[i+"InDays"] = data[i].apply(lambda x: current_date - x)

However, when I check the dtype of the returned columns:

In [15]: for datecol in date_list: ....: print data[datecol + "InDays"].dtypes

I get these:

object timedelta64[ns] object timedelta64[ns] object timedelta64[ns] timedelta64[ns] timedelta64[ns]

I don't know why three of them are objects, when they should be timedeltas. What I want to do next is:

In [16]: for i in date_list: ....: data[i+"InDays"] = data[i+"InDays"].dt.days

This approach works fine for the timedelta columns. However, since three of the columns are not timedeltas, I get this error:

AttributeError: Can only use .dt accessor with datetimelike values

I suspect that there are some values in those three columns that are preventing Pandas from converting them to timedeltas. I can't figure out how to work out what those values might be.

## Answers

The issue occurs because you have three columns with only NaT values, which is causing those columns to be treated as objects when you do apply your condition on it.

You should put some kind of condition in your apply part, to default to some timedelta in case of NaT. Example -

for i in date_list: data[i+"InDays"] = data[i].apply(lambda x: current_date - x if x is not pd.NaT else pd.Timedelta(0))

Or if you cannot do the above, you should put a condition where you want to do - data[i+"InDays"] = data[i+"InDays"].dt.days , to take it only if the dtype of the series allows it.

Or a simpler way to change the apply part to directly get what you want would be -

for i in date_list: data[i+"InDays"] = data[i].apply(lambda x: (current_date - x).days if x is not pd.NaT else x)

This would output -

In [110]: data Out[110]: SubmittedDate PolicyStartDate PaidUpDate MaturityDate DraftDate \ 0 NaT 2002-11-18 NaT 2041-03-04 NaT 1 NaT 2015-01-13 NaT NaT NaT 2 NaT 2014-10-15 NaT NaT NaT 3 NaT 2009-08-27 NaT NaT NaT 4 NaT 2007-04-19 NaT 2013-10-01 NaT CurrentValuationDate DOB InForceDate SubmittedDateInDays \ 0 2015-04-30 1976-03-04 2002-11-18 NaT 1 NaT 1949-09-27 2015-01-13 NaT 2 NaT 1947-06-15 2014-10-15 NaT 3 2015-07-30 1960-06-07 2009-08-27 NaT 4 2010-04-21 1950-10-01 2007-04-19 NaT PolicyStartDateInDays PaidUpDateInDays MaturityDateInDays DraftDateInDays \ 0 4638 NaT -9348 NaT 1 199 NaT NaN NaT 2 289 NaT NaN NaT 3 2164 NaT NaN NaT 4 3025 NaT 668 NaT CurrentValuationDateInDays DOBInDays InForceDateInDays 0 92 14393 4638 1 NaN 24048 199 2 NaN 24883 289 3 1 20142 2164 4 1927 23679 3025

If you want your NaT to be changed to NaN you can use -

for i in date_list: data[i+"InDays"] = data[i].apply(lambda x: (current_date - x).days if x is not pd.NaT else np.NaN)

Example/Demo -

In [114]: for i in date_list: .....: data[i+"InDays"] = data[i].apply(lambda x: (current_date - x).days if x is not pd.NaT else np.NaN) .....: In [115]: data Out[115]: SubmittedDate PolicyStartDate PaidUpDate MaturityDate DraftDate \ 0 NaT 2002-11-18 NaT 2041-03-04 NaT 1 NaT 2015-01-13 NaT NaT NaT 2 NaT 2014-10-15 NaT NaT NaT 3 NaT 2009-08-27 NaT NaT NaT 4 NaT 2007-04-19 NaT 2013-10-01 NaT CurrentValuationDate DOB InForceDate SubmittedDateInDays \ 0 2015-04-30 1976-03-04 2002-11-18 NaN 1 NaT 1949-09-27 2015-01-13 NaN 2 NaT 1947-06-15 2014-10-15 NaN 3 2015-07-30 1960-06-07 2009-08-27 NaN 4 2010-04-21 1950-10-01 2007-04-19 NaN PolicyStartDateInDays PaidUpDateInDays MaturityDateInDays \ 0 4638 NaN -9348 1 199 NaN NaN 2 289 NaN NaN 3 2164 NaN NaN 4 3025 NaN 668 DraftDateInDays CurrentValuationDateInDays DOBInDays InForceDateInDays 0 NaN 92 14393 4638 1 NaN NaN 24048 199 2 NaN NaN 24883 289 3 NaN 1 20142 2164 4 NaN 1927 23679 3025