Warm tip: This article is reproduced from stackoverflow.com, please click
grouping r

Display how many rows appear by each ID when data is not a panel (R)

发布于 2020-03-27 10:22:33

I am working with a longitudinal data set that has multiple rows per ID in a single unit of time. I have never seen a case like this before and couldn't find any similar questions.

In this example groups borrow money. Each group is formed by multiple clients and each credit may appear during multiple months (the data is longitudinal). In case a single group has more than one credit, I would like to display whether it's their first, second or third credit with the lender.

In the following example, I would like to declare the column Iteration. Let Group 1, which is formed by clients 1 and 2 get two loans: Credit_ID 100 on January 2018 and Credit_ID 233 on March .

> dt
Client  Group  Credit_ID     Crop  File_origin  Iteration
     1      1        100  2018-01      2018-01          1
     2      1        100  2018-01      2018-01          1
     1      1        100  2018-01      2018-02          1
     2      1        100  2018-01      2018-02          1
     1      1        233  2018-03      2018-03          2
     2      1        233  2018-03      2018-03          2

How can I define the Iteration column? I think the key is to focus on each time Group and Credit_ID change.

I tried:

    library(data.table)
    dt[, 1:.N, by = list(Group, Credit_ID)]

But this enumerates the number of rows per Group and Credit_ID.

Questioner
Arturo Sbr
Viewed
131
tmfmnk 2019-07-03 22:22

One dplyr possibility could be:

df %>%
 group_by(Group, Client) %>%
 mutate(Res = cumsum(!duplicated(Credit_ID)))

  Client Group Credit_ID Crop    File_origin Iteration   Res
   <int> <int>     <int> <chr>   <chr>           <int> <int>
1      1     1       100 2018-01 2018-01             1     1
2      2     1       100 2018-01 2018-01             1     1
3      1     1       100 2018-01 2018-02             1     1
4      2     1       100 2018-01 2018-02             1     1
5      1     1       233 2018-03 2018-03             2     2
6      2     1       233 2018-03 2018-03             2     2

Or the same with base R:

with(df, ave(Credit_ID, Group, Client, FUN = function(x) cumsum(!duplicated(x))))