I have data in r that has over 6000 observations and 96 variables.
The data relates to groups of individuals and their activities etc. If a group returned the Group ID number was recorded again and a new observation was made. I need to merge the rows by ID so that the # of individuals take the highest number recorded, but the activities etc are a combination of both observations.
The data contains, #of individuals, activities, impacts, time of arrival etc. The issue is that some of the observations were split across 2 lines, so there may be activities which were recorded for the same group in another line. The Group ID for both observations is the same, but one may have the #of individuals recorded and some activity records or impacts, but the second may be incomplete and only have Group ID and then Impacts (which are additional to those in the 1st record). The #of individuals in the group never changes, so I need some way to combine them so that activities are additive, but #visitors takes the value that is highest, time of arrival needs to be the earliest recorded and time of departure needs to be the later of the 2 observations.
Does anyone know how to merge observations based on Group ID but vary the merging protocol based on the variable.
I'm not sure if this actually is what you want, but to combine rows of a data frame based on multiple conditions you can use the dplyr
package and its summarise()
function. I generated some data to use in R directly, you would have to modify the code according to your needs.
# generate data
ID<-rep(1:20,2)
visitors<-sample(1:50, 40, replace=TRUE)
impact<-sample(rep(c("a", "b", "c", "d", "e"), 8))
arrival<-sample(rep(8:15, 5))
departure <- sample(rep(16:23, 5))
df<-data.frame(ID, visitors, impact, arrival, departure)
df$impact<-as.character(df$impact)
# summarise rows with identical ID
df_summary <- df %>%
group_by(ID) %>%
summarise(visitors = max(visitors), arrival = min(arrival),
departure = max(departure), impact = paste0(impact, collapse =", "))
Hope this helps!
This is exactly what I wanted and worked perfectly for my data. I really really appreciate the help.
Glad I could help! Could you maybe accept the answer if you're happy with it? Thanks!
Sorry this was the 1st question I have ever asked, so I didnt realise I had to accept the answer. That is accepted now.