environment variables - Recommended way of creating reusable objects within an R function

Allan Cameron 2020-12-02 02:45:02

You should certainly avoid writing an object to the global environment. If you find that you have to repeat the same computationally expensive task at the top of a number of different functions, it means you are carrying out the computationally expensive task too late.

For example, you could create an S3 class that holds the necessary components to produce a "cheap" plot and a "cheap" extraction of the coefficients. It even has the benefits of generic dispatch:

add_ten <- function(model) model$model + 10

lm_tens <- function(formula, data)
{
  model <- if(missing(data)) lm(formula) else lm(formula, data = data)
  
  structure(list(data = data.frame(add_ten(model)), model = model),
            class = "tens")
}

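plot.tens <- function(tens) {
  # formula() on a data frame gives response ~ predictors, so the
  # response name comes first and the predictor name second
  vars <- all.vars(formula(tens$data))
  # .data[[...]] looks the columns up by name; plain aes(x = x, y = y)
  # only works when the columns happen to be called x and y
  ggplot2::ggplot(tens$data, ggplot2::aes(x = .data[[vars[2]]],
                                          y = .data[[vars[1]]])) + 
    ggplot2::geom_point() + 
    ggplot2::geom_smooth() +
    ggplot2::labs(x = vars[2], y = vars[1])
}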

coef.tens = function(tens) {
  coef(lm(formula(tens$model), data = tens$data))
}

So now we just need to do:

set.seed(21)
y = rnorm(100)
x = .5*y + rnorm(100, 0, sqrt(.75))

mod <- lm_tens(y ~ x)
coef(mod)
#> (Intercept)           x 
#>   4.3269914   0.5775404
plot(mod)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Note that we only need to call add_ten once here.

dfife 2020-12-01 19:27:17

I've seen functions do something like this (e.g., in the mice and norm packages) and always found the two-stage process a little frustrating. But I think a good alternative is similar to what you propose: do not require the lm_tens function, but use its result if the user has called it (otherwise, repeat add_ten).

Allan Cameron 2020-12-01 19:34:42

@dfife it depends on what possible uses you have for the object. I came across an example today from the eulerr package. The euler class contains a few different lightweight fields, which made it convenient to bundle them up into a class, but most of the work is done later; the plot.euler function is expensive, so the structures needed to draw the plot are only generated when plot is called. On the other hand, most regression functions do the computationally expensive part at the outset, and you can pass the model around knowing that any later work on it is going to be cheap.
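To make the two designs in this comment concrete, here is a minimal sketch in base R. All names (slow_step, new_eager, new_lazy and the "eager" and "lazy" classes) are hypothetical and invented for illustration; slow_step is only a stand-in for whatever the expensive computation is, and none of this is taken from eulerr or any regression package.

```r
# Hypothetical stand-in for an expensive computation
slow_step <- function(x) sort(x)

# Eager design: pay the cost once, in the constructor
new_eager <- function(x) {
  structure(list(x = x, result = slow_step(x)), class = "eager")
}

# Lazy design: store only the inputs, compute when a method needs them
new_lazy <- function(x) {
  structure(list(x = x), class = "lazy")
}

# Cheap: just returns the stored result
summary.eager <- function(object, ...) object$result

# Expensive: runs slow_step() on every call
summary.lazy <- function(object, ...) slow_step(object$x)

eager_obj <- new_eager(c(3, 1, 2))
lazy_obj  <- new_lazy(c(3, 1, 2))
summary(eager_obj)
summary(lazy_obj)
```

Both objects give the same answer; they differ only in when the cost is paid. The eager form suits objects that are queried many times, while the lazy form suits objects whose expensive step may never be needed.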

dfife 2020-12-01 19:39:13

Let me give a bit more detail (just in case I'm overlooking some obvious solution). My package accepts models fitted with the lavaan package (lavaan is NOT my package). Some of my functions require computing standard errors, which are estimated through multiple imputation and are therefore computationally intensive. I can't attach these standard errors to the already-estimated lavaan model, so I have been computing them inside each function. But the same standard error computations might then happen multiple times across different functions. Hence the question about placing them in the global environment :)

Allan Cameron 2020-12-01 20:25:56

@dfife so why not create a class that wraps the lavaan object and holds it as a member, but also holds the standard errors? Say your class is called "dfife_class" and contains a member "model", which is a lavaan object, and a member "SE", which has the computed standard errors. At the head of each function, check whether it has been passed a lavaan object or a "dfife_class" object. If it's a lavaan object, calculate the SE and turn the lavaan object into a "dfife_class" object. Then write your function to handle "dfife_class" objects.
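A rough sketch of that wrapper pattern might look like the following. To keep it runnable without lavaan, an lm fit stands in for the lavaan object, and compute_se is a hypothetical placeholder for the expensive multiple-imputation step; as_dfife_class and se_summary are likewise invented names, not part of any package.

```r
# Hypothetical placeholder for the expensive standard error computation
compute_se <- function(model) {
  sqrt(diag(vcov(model)))
}

# Wrap a fitted model together with its standard errors.
# An object that is already wrapped is returned unchanged,
# so the cached SE member is reused rather than recomputed.
as_dfife_class <- function(x) {
  if (inherits(x, "dfife_class")) return(x)
  structure(list(model = x, SE = compute_se(x)), class = "dfife_class")
}

# Example user-facing function: accepts a raw model or a wrapper
se_summary <- function(x) {
  x <- as_dfife_class(x)   # head-of-function check and conversion
  data.frame(estimate = coef(x$model), se = x$SE)
}

fit <- lm(mpg ~ wt, data = mtcars)   # stand-in for a lavaan fit
wrapped <- as_dfife_class(fit)       # expensive step runs once here
se_summary(wrapped)                  # reuses wrapped$SE
se_summary(fit)                      # still works, but recomputes the SE
```

Users who wrap the model once pay for the standard errors once; users who pass the raw model still get correct results, just with the computation repeated on each call.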

dfife 2020-12-01 21:18:26

I see what you're saying. Yes, that's a good idea and will save an extra step (assuming users fit with lm_tens instead of lm). Thanks!
