Warm tip: This article is reproduced from stackoverflow.com, please click
r

compare values of a column with multiple values in another column by group in R

发布于 2020-03-27 10:25:36

I have a data of savings accounts of individuals and I observe the amount of agreement, as well as opening and closing dates. Here is the savings data of a consumer:

amount <- c(1004, 1004, 1240, 1039, 1240, 1039, 1039, 1240, 1040, 1040)  
opening <- as.Date(c('2012-11-19', '2013-05-20', '2014-06-13', '2015-05-26',
    '2015-06-13', '2015-11-26', '2016-05-26', '2016-06-13', '2016-11-26',
    '2017-05-26'))  
closing <- as.Date(c('2013-05-20', '2013-11-20', '2015-06-13', '2015-11-26',
    '2016-06-13', '2016-05-26', '2016-11-26', '2017-06-13', '2017-05-26',
    '2017-07-10'))

dt <- data.frame(amount, opening, closing) 
   amount    opening    closing
     1004 2012-11-19 2013-05-20
     1004 2013-05-20 2013-11-20
     1240 2014-06-13 2015-06-13
     1039 2015-05-26 2015-11-26
     1240 2015-06-13 2016-06-13
     1039 2015-11-26 2016-05-26
     1039 2016-05-26 2016-11-26
     1240 2016-06-13 2017-06-13
     1040 2016-11-26 2017-05-26
     1040 2017-05-26 2017-07-10

My task is the following: I want to identify all the accounts that have been rolled over. In other words, I want to track all the savings amounts through time and see whether the consumer closed the account and reopened it on the same day (automatic renewal of a savings account). For example, on 2015-05-26 the consumer opened an account of 1039$, then rolled it over on 2015-11-26 and then again on 2016-05-26, then on 2016-11-26 (1040$) and finally on 2017-05-26 (1040$).

I can identify those accounts with ifelse(dt$opening %in% dt$closing, 1, 0), but this obviously is not enough. I am not sure how to proceed and what the usual methodology is in such cases (I wonder if replicating the entire data set would be a good start).

The final goal is to find out if someone has contributed to the savings amount or decreased it when rolling over the account.

Hope this is clear enough. Any help is very much appreciated!

Questioner
Shirkhanyan
Viewed
105
IceCreamToucan 2019-07-03 23:17

You can identify rows whose closing date equals the opening date of another row with the same amount value using a self-join. In the output below they will be the rows with non-missing rollover_opening. To answer the actual question you're asking the data would need to contain more info.

library(data.table)
setDT(dt)

dt[dt, on = .(amount, closing = opening), rollover_opening := i.opening]

dt
#     amount    opening    closing rollover_opening
#  1:   1004 2012-11-19 2013-05-20       2013-05-20
#  2:   1004 2013-05-20 2013-11-20             <NA>
#  3:   1240 2014-06-13 2015-06-13       2015-06-13
#  4:   1039 2015-05-26 2015-11-26       2015-11-26
#  5:   1240 2015-06-13 2016-06-13       2016-06-13
#  6:   1039 2015-11-26 2016-05-26       2016-05-26
#  7:   1039 2016-05-26 2016-11-26             <NA>
#  8:   1240 2016-06-13 2017-06-13             <NA>
#  9:   1040 2016-11-26 2017-05-26       2017-05-26
# 10:   1040 2017-05-26 2017-07-10             <NA>

Another option:

dt[dt, on = .(closing = opening), rollover_amount := i.amount][]
#     amount    opening    closing rollover_amount
#  1:   1004 2012-11-19 2013-05-20            1004
#  2:   1004 2013-05-20 2013-11-20              NA
#  3:   1240 2014-06-13 2015-06-13            1240
#  4:   1039 2015-05-26 2015-11-26            1039
#  5:   1240 2015-06-13 2016-06-13            1240
#  6:   1039 2015-11-26 2016-05-26            1039
#  7:   1039 2016-05-26 2016-11-26            1040
#  8:   1240 2016-06-13 2017-06-13              NA
#  9:   1040 2016-11-26 2017-05-26            1040
# 10:   1040 2017-05-26 2017-07-10              NA