I am working with a longitudinal data set that has multiple rows per ID in a single unit of time. I have never seen a case like this before and couldn't find any similar questions.
In this example groups borrow money. Each group is formed by multiple clients and each credit may appear during multiple months (the data is longitudinal). In case a single group has more than one credit, I would like to display whether it's their first, second or third credit with the lender.
In the following example, I would like to declare the column Iteration
.
Let Group 1, which is formed by clients 1 and 2 get two loans: Credit_ID 100 on January 2018 and Credit_ID 233 on March .
> dt
Client Group Credit_ID Crop File_origin Iteration
1 1 100 2018-01 2018-01 1
2 1 100 2018-01 2018-01 1
1 1 100 2018-01 2018-02 1
2 1 100 2018-01 2018-02 1
1 1 233 2018-03 2018-03 2
2 1 233 2018-03 2018-03 2
How can I define the Iteration
column? I think the key is to focus on each time Group and Credit_ID change.
I tried:
library(data.table)
dt[, 1:.N, by = list(Group, Credit_ID)]
But this enumerates the number of rows per Group and Credit_ID.
One dplyr
possibility could be:
df %>%
group_by(Group, Client) %>%
mutate(Res = cumsum(!duplicated(Credit_ID)))
Client Group Credit_ID Crop File_origin Iteration Res
<int> <int> <int> <chr> <chr> <int> <int>
1 1 1 100 2018-01 2018-01 1 1
2 2 1 100 2018-01 2018-01 1 1
3 1 1 100 2018-01 2018-02 1 1
4 2 1 100 2018-01 2018-02 1 1
5 1 1 233 2018-03 2018-03 2 2
6 2 1 233 2018-03 2018-03 2 2
Or the same with base R
:
with(df, ave(Credit_ID, Group, Client, FUN = function(x) cumsum(!duplicated(x))))