Warm tip: This article is reproduced from serverfault.com, please click

Why isn't my mixed model loop working? (RStudio, Crossover design)

发布于 2021-05-31 13:10:26

I can't figure out why my loop isn't working.

I have a database (36rows x 51columns, its name is "Seleccio") consisting of 3 factors (first 3 columns: Animal (12 animals), Diet (3 diets) and Period (3 periods)) and 48 variables (many clinical parameters) with 36 observations per column. It is a 3x3 crossover design so I want to implement a mixed model to include the Animal random effect and also Period and Diet fixed effects and the interaction between them.

A sample of the data (but with less rows and columns):

  Animal Diet  Period  Var1  Var2  Var3  Var4  Var5  Var6
  <chr>  <chr> <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A      A     A         11    55   1.2 0.023    22     3
2 B      A     A         13    34   1.6 0.04     23     4
3 C      B     A         15    13   1.1 0.052    22     2
4 A      B     B         10    22   1.5 0.067    27     4
5 B      C     B          9    45   1.4 0.012    24     2
6 C      C     B         13    32   1.5 0.014    23     3

> dput(sample[1:9,])
structure(list(Animal = c("A", "B", "C", "A", "B", "C", NA, NA, 
NA), Diet = c("A", "A", "B", "B", "C", "C", NA, NA, NA), Period = c("A", 
"A", "A", "B", "B", "B", NA, NA, NA), Var1 = c(11, 13, 15, 10, 
9, 13, NA, NA, NA), Var2 = c(55, 34, 13, 22, 45, 32, NA, NA, 
NA), Var3 = c(1.2, 1.6, 1.1, 1.5, 1.4, 1.5, NA, NA, NA), Var4 = c(0.023, 
0.04, 0.052, 0.067, 0.012, 0.014, NA, NA, NA), Var5 = c(22, 23, 
22, 27, 24, 23, NA, NA, NA), Var6 = c(3, 4, 2, 4, 2, 3, NA, NA, 
NA)), row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame"
))

I want to make descriptive analysis (normality testing and checking for outliers) of each variable sorted by Diet (which is the treatment) and also run a mixed model and make an ANOVA and a Tukey test for the fixed effects.

I can do the analysis one by one, but it takes a lot of time, I have tried several times to create a for loop to automate the analysis for all the variables but it didn't work (I'm pretty new to R).

What I got so far:

sink("output.txt") # to store the output into a file, as it would be to large to be shown in the console
vars <-as.data.frame(Seleccio[,c(4:51)])
fact <-Seleccio$Diet
dim(vars)
for (i in 1:length(vars)) { 
  variable <- vars[,i]
  lme_cer <- lme(variable ~ Period*Diet, random = ~1 | Animal, data = Seleccio) # the model
  cat("\n---------\n\n")
  cat(colnames(Seleccio)[i]) # the name of each variable, so I don't get lost in the text file
  cat("\n")
  print(boxplot(vars[,i]~fact)$out) #checking for outliers
  print(summary(lme_cer))
  print(anova(lme_cer)) 
  print(lsmeans(lme_cer, pairwise~Diet, adjust="tukey"))
}
sink()

This code runs but doesn't do the job, as it gives me wrong results for each variable because they are different from the results that I get when I analyse each variable one by one. I would also like to add to the loop this normality testing sorted by Diet (Treatment) code. I wonder if it would be possible.

aggregate(formula = VARIABLENAME ~ Diet,
          data = Seleccio,
          FUN = function(x) {y <- shapiro.test(x); c(y$statistic, y$p.value)})

Thank you very much in advance to all of those who are willing to lend me a hand, any help will be greatly appreciated

Questioner
Miquel
Viewed
0
Ricardo Semião e Castro 2021-05-31 23:44:54

I don't think i can run the model with only 6 observations, so i couldn't find why would your loop doesn't return the same as doing it one by one. Maybe the problem is with cat(colnames(Seleccio)[i]): you only want the Var names, and for i=1, 2 and 3, that code will return "Animal", "Diet" and "Period", thus messing up how you're comparing the results. Using cat(colnames(vars)[i]) might correct that. If you find a way to include more observations of Seleccio i might be able to help more.

I would suggest you to create a list to store the output:

vars <- as.data.frame(Seleccio[,c(4:51)])
fact <- Seleccio$Diet
dim(vars)
output = list() #Create empty list
for (i in 1:length(vars)) {
  var = colnames(vars)[i] 
  output[[var]] = list() #Create one entry for each variable
  variable <- vars[,i]
  lme_cer <- lme(variable ~ Period*Diet, random = ~1 | Animal, data = Seleccio) # the model
  #Fill that entry with each statistics:
  output[[var]]$boxplot = boxplot(vars[,i]~fact)$out #checking for outliers
  output[[var]]$summary = summary(lme_cer)
  output[[var]]$anova = anova(lme_cer)
  output[[var]]$lsmeans = lsmeans(lme_cer, pairwise~Diet, adjust="tukey")
  output[[var]]$shapiro = aggregate(formula = variable ~ Diet, data = Seleccio,
            FUN = function(x) {y <- shapiro.test(x); c(y$statistic, y$p.value)})
}

This way you have the results in you R enviroment, and have better visualisation options: do output$Var1 and get all the results for Var1, which should fit in the console; do for(i in output){print(i$summary)} to get all of the summaries; etc.