I have a data of savings accounts of individuals and I observe the amount of agreement, as well as opening and closing dates. Here is the savings data of a consumer:
amount <- c(1004, 1004, 1240, 1039, 1240, 1039, 1039, 1240, 1040, 1040)
opening <- as.Date(c('2012-11-19', '2013-05-20', '2014-06-13', '2015-05-26',
'2015-06-13', '2015-11-26', '2016-05-26', '2016-06-13', '2016-11-26',
'2017-05-26'))
closing <- as.Date(c('2013-05-20', '2013-11-20', '2015-06-13', '2015-11-26',
'2016-06-13', '2016-05-26', '2016-11-26', '2017-06-13', '2017-05-26',
'2017-07-10'))
dt <- data.frame(amount, opening, closing)
amount opening closing
1004 2012-11-19 2013-05-20
1004 2013-05-20 2013-11-20
1240 2014-06-13 2015-06-13
1039 2015-05-26 2015-11-26
1240 2015-06-13 2016-06-13
1039 2015-11-26 2016-05-26
1039 2016-05-26 2016-11-26
1240 2016-06-13 2017-06-13
1040 2016-11-26 2017-05-26
1040 2017-05-26 2017-07-10
My task is the following: I want to identify all the accounts that have been rolled over. In other words, I want to track all the savings amounts through time and see whether the consumer closed the account and reopened it on the same day (automatic renewal of a savings account). For example, on 2015-05-26 the consumer opened an account of 1039$, then rolled it over on 2015-11-26 and then again on 2016-05-26, then on 2016-11-26 (1040$) and finally on 2017-05-26 (1040$).
I can identify those accounts with ifelse(dt$opening %in% dt$closing, 1, 0)
, but this obviously is not enough. I am not sure how to proceed and what the usual methodology is in such cases (I wonder if replicating the entire data set would be a good start).
The final goal is to find out if someone has contributed to the savings amount or decreased it when rolling over the account.
Hope this is clear enough. Any help is very much appreciated!
You can identify rows whose closing
date equals the opening
date of another row with the same amount
value using a self-join. In the output below they will be the rows with non-missing rollover_opening
. To answer the actual question you're asking the data would need to contain more info.
library(data.table)
setDT(dt)
dt[dt, on = .(amount, closing = opening), rollover_opening := i.opening]
dt
# amount opening closing rollover_opening
# 1: 1004 2012-11-19 2013-05-20 2013-05-20
# 2: 1004 2013-05-20 2013-11-20 <NA>
# 3: 1240 2014-06-13 2015-06-13 2015-06-13
# 4: 1039 2015-05-26 2015-11-26 2015-11-26
# 5: 1240 2015-06-13 2016-06-13 2016-06-13
# 6: 1039 2015-11-26 2016-05-26 2016-05-26
# 7: 1039 2016-05-26 2016-11-26 <NA>
# 8: 1240 2016-06-13 2017-06-13 <NA>
# 9: 1040 2016-11-26 2017-05-26 2017-05-26
# 10: 1040 2017-05-26 2017-07-10 <NA>
Another option:
dt[dt, on = .(closing = opening), rollover_amount := i.amount][]
# amount opening closing rollover_amount
# 1: 1004 2012-11-19 2013-05-20 1004
# 2: 1004 2013-05-20 2013-11-20 NA
# 3: 1240 2014-06-13 2015-06-13 1240
# 4: 1039 2015-05-26 2015-11-26 1039
# 5: 1240 2015-06-13 2016-06-13 1240
# 6: 1039 2015-11-26 2016-05-26 1039
# 7: 1039 2016-05-26 2016-11-26 1040
# 8: 1240 2016-06-13 2017-06-13 NA
# 9: 1040 2016-11-26 2017-05-26 1040
# 10: 1040 2017-05-26 2017-07-10 NA
Thank you very much @IceCreamToucan, and I apologize for the late reaction to this. Could you explain how to use the code by group? When I try to use the
by
option, it gives an error (i.amount not found). I wonder if I should use this together withdplyr
. My groups areid
andCurrency
(as I want to identify roll-overs by currency).You can join on additional variables if you want the matching which determines the new column to consider those variables. For example, instead of
on = .(amount, closing = opening)
you could doon = .(amount, closing = opening, id, Currency)
.Great! Thank you so much for your time, @IceCreamToucan!