r-Minimum absolute difference between vector pairs (greedy algorithm)

Joseph Wood 2020-12-01 06:41:30

TL;DR

No need for combn or the like, simply:

min(abs(diff(sort(v))))

The Nitty Gritty

Finding the difference between every possible combinations is O(n^2). So when we get to vectors of length 1e5, the task is burdensome both computationally and memory-wise.

We need a different approach.

How about sorting and taking the difference only with its neighbor?

By first sorting, for any element v_j, the difference min |v_j - v_{j -/+ 1}| will be the smallest such difference involving v_j. For example, given the sorted vector v:

v = -9 -8 -6 -4 -2  3  8

The smallest distance from -2 is given by:

|-2 - 3| = 5
|-4 - -2| = 2

There is no need in checking any other elements.

This is easily implemented in base R as follows:

getAbsMin <- function(v) min(abs(diff(sort(v))))

I'm not going to use rpois as with any reasonably sized vector, duplicates will be produces, which will trivially give 0 as an answer. A more sensible test would be with runif or sample (minimumAbsoluteDifference2 is from the answer provided by @RuiBarradas):

set.seed(1729)
randUnif100 <- lapply(1:100, function(x) {
    runif(1e3, -100, 100)
})

randInts100 <- lapply(1:100, function(x) {
    sample(-(1e9):(1e9), 1e3)
})

head(sapply(randInts100, getAbsMin))
[1]  586 3860 2243 2511 5186 3047

identical(sapply(randInts100, minimumAbsoluteDifference2),
          sapply(randInts100, getAbsMin))
[1] TRUE

options(scipen = 99)
head(sapply(randUnif100, getAbsMin))
[1] 0.00018277206 0.00020549633 0.00009834766 0.00008395873 0.00005299225 0.00009313226

identical(sapply(randUnif100, minimumAbsoluteDifference2),
          sapply(randUnif100, getAbsMin))
[1] TRUE

It's very fast as well:

library(microbenchmark)
microbenchmark(a = getAbsMin(randInts100[[50]]),
               b = minimumAbsoluteDifference2(randInts100[[50]]),
               times = 25, unit = "relative")
Unit: relative
expr      min       lq     mean   median       uq      max neval
   a   1.0000   1.0000   1.0000   1.0000  1.00000  1.00000    25
   b 117.9799 113.2221 105.5144 107.6901 98.55391 81.05468    25

Even for very large vectors, the result is instantaneous;

set.seed(321)
largeTest <- sample(-(1e12):(1e12), 1e6)

system.time(print(getAbsMin(largeTest)))
[1] 3
 user  system elapsed 
0.083   0.003   0.087

Minimum absolute difference between vector pairs (greedy algorithm)

TL;DR

The Nitty Gritty

热门帖子

热门github