I have a set of data which consists of periodically collected values. I want to calculate a median using 2 left and right neighbors of a current value for each element of set.
For example, the set is:
21
22
23
-10
20
22
19
21
100
20
For the first value we pick 21
, 22
, 23
which median is 22
. So for 21
we have 22
. For -10
we have 22
, 23
, -10
, 20
, 22
. Median is 22
.
I use this method to get rid of "deviant" values which are abnormal for this set.
I guess I should somehow use median
analytic function. Something like that:
SELECT (SELECT median(d.value)
FROM my_set d
WHERE d.key_val = s.key_val
AND d.order_value BETWEEN s.order_value - 2 AND s.order_value + 2) median_val
,s.key_val
,s.order_value
FROM my_set s
I would be happy to see any other approaches or some improved approaches to solve this question.
You did not specify anything about your table structure so I'm just guessing from your SQL what fields there are and what they're supposed to mean, but consider an attempt like this one:
SELECT s1.key_val, s1.order_value, s1.value, MEDIAN(s2.value) as med
FROM my_set s1
LEFT OUTER JOIN my_set s2
ON s2.key_val = s1.key_val
AND (s1.order_value - 2) <= s2.order_value
AND s2.order_value <= (s1.order_value + 2)
GROUP BY s1.key_val, s1.order_value, s1.value
Looks like it would be the simpliest way without doing analytics