Warm tip: This article is reproduced from stackoverflow.com, please click
r

How to join with ColumnA = (ColumnB

发布于 2020-03-27 10:19:39

I am trying to join two dataframe. The condition of the join is not ColumnA=ColumnB but ColumnA=ColumnB*Function. With the function merge, I dont see how i can handle it

There a exemple,

df1 <- data.frame(ID=c(5,4,3,2), CASE=c("A","B","C","D"))
df2 <- data.frame(ID=c(6,5,4,3), RESULT=c("ResultA","ResultB","ResultC","ResultD"))

I would like to join df1 and df2 with somethng like df1$ID = df2$ID - 1, to have the result:

df_result<- data.frame(ID_df1=c(5,4,3,2), CASE=c("A","B","C","D"), RESULT=c("Result5","Result4","Result3","Result2"))

I have tried to delete the quotation marks in the join, but it does not work:

df_result <- merge ( x = df1, y = df2, by.x = ID , by.y = ID - 1 , all.x = TRUE)

Could some one helps me? : )

Thank you !

Questioner
Karibuu
Viewed
104
Maurits Evers 2019-07-03 21:12

A tidyverse solution to reproduce your expected output would be

library(tidyverse)
left_join(df1, df2 %>% mutate(ID = ID - 1)) %>%
    mutate(RESULT = str_replace(RESULT, "^(.+)[A-Z]$", paste0("\\1", ID)))
#Joining, by = "ID"
#  ID CASE  RESULT
#1  5    A Result5
#2  4    B Result4
#3  3    C Result3
#4  2    D Result2

Explanation: If you only want to merge by ID and ID - 1 a simple

left_join(df1, df2 %>% mutate(ID = ID - 1))
#  ID CASE  RESULT
#1  5    A ResultA
#2  4    B ResultB
#3  3    C ResultC
#4  2    D ResultD

is sufficient. The additional mutate takes care of renaming RESULT according to your expected output.


Or a base R option would start from

merge(df1, transform(df2, ID = ID - 1), by = "ID")
#  ID CASE  RESULT
#1  2    D ResultD
#2  3    C ResultC
#3  4    B ResultB
#4  5    A ResultA

and including renaming RESULT

transform(
    merge(df1, transform(df2, ID = ID - 1), by = "ID"),
    RESULT = paste0(substr(RESULT, 1, nchar(as.character(RESULT)) - 1), ID))
#  ID CASE  RESULT
#1  2    D Result2
#2  3    C Result3
#3  4    B Result4
#4  5    A Result5

reproducing your expected output (with a slightly different row ordering).