I have a column A in a data frame whose cells contain either a single natural number or a vector of natural numbers. For the cells that hold a vector, I want to calculate the mean of that vector and store the result in a new column B.
I tried the following:
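library(stringr)  # provides str_split()
# split each cell on commas and average the pieces that parse as numbers
Val <- unlist(lapply(str_split(data$A, ","),
                     function(x) mean(as.numeric(x), na.rm = TRUE)))
# replace the last value with the mean of all the other values
Val[length(Val)] <- mean(Val[-length(Val)], na.rm = TRUE)
data$B <- Val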
However, this doesn't work correctly: the result is not the mean of the vector, and it returns NaN when the vector has only two elements. Below is an example of what the data look like.
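A likely cause, assuming the vector cells reach the splitting step as text such as "c(1, 2)" (which is how `as.character()` renders a numeric vector stored in a list cell): splitting on "," leaves pieces like "c(1" and " 2)" that `as.numeric()` cannot parse. The sketch below uses hypothetical cell values and base `strsplit()` instead of `stringr::str_split()` so it has no dependencies.

```r
# Hypothetical cell contents, assumed to arrive as deparsed text
cells <- c("3", "c(1, 2)", "c(1, 2, 4)")

# Same logic as the question's code: split on commas, parse, average
pieces <- strsplit(cells, ",")
vals <- sapply(pieces, function(x) {
  # "c(1", " 2)" and " 4)" become NA; " 2" parses because whitespace is trimmed
  mean(suppressWarnings(as.numeric(x)), na.rm = TRUE)
})
print(vals)
# vals is 3, NaN, 2: the two-element cell has no parseable piece left (NaN),
# and the three-element cell averages only its middle piece instead of 7/3
```

So the NaN for two-element vectors and the wrong mean for longer ones both come from the leftover "c(" and ")" characters.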
If you have column A as text, another way is to remove the extra characters from the column using gsub, split on commas, and then take the mean. Using @zx8754's data:
# drop every 'c', '(' and ')', split on commas, then average each cell
sapply(strsplit(gsub('[c()]', '', df1$A), ","), function(x) mean(as.numeric(x)))
#[1] 1.000 2.000 3.000 2.000 3.000 2.333 3.000 3.000 2.500
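If A is instead a genuine list column whose cells are numeric vectors (an assumption; the answer above covers the text case), no string parsing is needed at all. A minimal sketch with hypothetical data `df`:

```r
# Hypothetical data: A is a list column, each cell a numeric vector
df <- data.frame(id = 1:4)
df$A <- list(1, c(2, 4), 3, c(1, 2, 4))

# mean() works on each cell directly; vapply() enforces one number per row
df$B <- vapply(df$A, mean, numeric(1))
print(df$B)
```

Working on the list directly also avoids relying on how R deparses a vector to text (for example, an integer sequence such as `1:3` deparses as "1:3", which the gsub pattern does not handle).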