I am trying to join two dataframe. The condition of the join is not ColumnA=ColumnB but ColumnA=ColumnB*Function. With the function merge, I dont see how i can handle it
There a exemple,
df1 <- data.frame(ID=c(5,4,3,2), CASE=c("A","B","C","D"))
df2 <- data.frame(ID=c(6,5,4,3), RESULT=c("ResultA","ResultB","ResultC","ResultD"))
I would like to join df1 and df2 with somethng like df1$ID = df2$ID - 1, to have the result:
df_result<- data.frame(ID_df1=c(5,4,3,2), CASE=c("A","B","C","D"), RESULT=c("Result5","Result4","Result3","Result2"))
I have tried to delete the quotation marks in the join, but it does not work:
df_result <- merge ( x = df1, y = df2, by.x = ID , by.y = ID - 1 , all.x = TRUE)
Could some one helps me? : )
Thank you !
A tidyverse
solution to reproduce your expected output would be
library(tidyverse)
left_join(df1, df2 %>% mutate(ID = ID - 1)) %>%
mutate(RESULT = str_replace(RESULT, "^(.+)[A-Z]$", paste0("\\1", ID)))
#Joining, by = "ID"
# ID CASE RESULT
#1 5 A Result5
#2 4 B Result4
#3 3 C Result3
#4 2 D Result2
Explanation: If you only want to merge by ID
and ID - 1
a simple
left_join(df1, df2 %>% mutate(ID = ID - 1))
# ID CASE RESULT
#1 5 A ResultA
#2 4 B ResultB
#3 3 C ResultC
#4 2 D ResultD
is sufficient. The additional mutate
takes care of renaming RESULT
according to your expected output.
Or a base R option would start from
merge(df1, transform(df2, ID = ID - 1), by = "ID")
# ID CASE RESULT
#1 2 D ResultD
#2 3 C ResultC
#3 4 B ResultB
#4 5 A ResultA
and including renaming RESULT
transform(
merge(df1, transform(df2, ID = ID - 1), by = "ID"),
RESULT = paste0(substr(RESULT, 1, nchar(as.character(RESULT)) - 1), ID))
# ID CASE RESULT
#1 2 D Result2
#2 3 C Result3
#3 4 B Result4
#4 5 A Result5
reproducing your expected output (with a slightly different row ordering).