ethnicity_col_names <- c("surname", "first_name", "surname.match", "white", "black",
                         "hispanic", "asian", "other")
colnames(ethnicity_sample) <- ethnicity_col_names
ethnicity_sample$try <- pmax(ethnicity_sample$white, ethnicity_sample$black, ethnicity_sample$hispanic,
                             ethnicity_sample$asian, ethnicity_sample$other)
Each of the ethnicity columns holds the percentage likelihood that the person belongs to that ethnicity. When I use the pmax function, it returns the highest percentage itself (a number). I want it to return the name of the column with the highest percentage match instead.
We can use max.col to return the index of the column with the max value for each row, and then use that index to pick out the corresponding column name:
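To see what max.col returns on its own, here is a minimal sketch on a small hypothetical matrix (the values and the name m are made up for illustration). It shows that max.col gives column positions, not values, and how the ties.method argument decides which column wins when a row has a tie.

```r
# Hypothetical 2 x 3 matrix of match probabilities; row 2 has a tie
m <- matrix(c(0.1, 0.7, 0.2,
              0.4, 0.4, 0.2),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("white", "black", "hispanic")))

# Column position of each row's maximum
first_idx <- max.col(m, ties.method = "first")  # ties go to the leftmost column
last_idx  <- max.col(m, ties.method = "last")   # ties go to the rightmost column

print(first_idx)               # 2 1
print(last_idx)                # 2 2
print(colnames(m)[first_idx])  # "black" "white"
```

The default ties.method is "random", so passing "first" (or "last") makes the result reproducible when two categories share the same top percentage.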
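nm1 <- c("white", "black", "hispanic", "asian", "other")
# max.col() gives each row's column position of the maximum; 'first' breaks ties by taking the leftmost column
ethnicity_sample$try <- nm1[max.col(ethnicity_sample[nm1], 'first')]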
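As a quick check of the answer, here is a sketch that runs the same expression on a hypothetical three-row stand-in for ethnicity_sample (the surnames and percentages are invented, and only the columns needed for the lookup are included). max.col converts the selected data frame columns to a matrix internally, so ethnicity_sample[nm1] can be passed directly.

```r
# Hypothetical toy data standing in for ethnicity_sample
ethnicity_sample <- data.frame(
  surname  = c("SMITH", "GARCIA", "NGUYEN"),
  white    = c(0.73, 0.05, 0.02),
  black    = c(0.22, 0.01, 0.00),
  hispanic = c(0.02, 0.92, 0.01),
  asian    = c(0.01, 0.01, 0.96),
  other    = c(0.02, 0.01, 0.01)
)

nm1 <- c("white", "black", "hispanic", "asian", "other")

# Index of the highest-probability column per row, then map it to its name
idx <- max.col(ethnicity_sample[nm1], ties.method = "first")
ethnicity_sample$try <- nm1[idx]

print(ethnicity_sample[c("surname", "try")])
```

Each row's try value is the name of the ethnicity column holding that row's largest percentage ("white", "hispanic", "asian" here), rather than the percentage that pmax returns.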
Incredible, thank you so much!!!