I have a table that records a row every time a location's score changes.
score_history:
I do it this way for efficiency. It makes retrieving the list of changes for a given location simple, and it serves that purpose well.
I'm trying to output the data in a very redundant format so it can be loaded into a rigid external system. The external system expects one row for every location × date combination. The goal is to report the last score value for each location on each date. So if the score changed 3 times on a given date, only the score closest to midnight would be considered that location's closing score for the day. I imagine this is similar to the challenge of building a close-of-business inventory-level fact table.
I have a handy star-schema-style date dimension table with a row for every date, fully covering this sample period and well into the future.
That table looks like this:
dw_dim_date:
So, if I had only 3 records in the score_history table...
1, 2019-01-01:10:13:01, 100, 5.0
2, 2019-01-05:20:00:01, 100, 5.8
3, 2019-01-05:23:01:22, 100, 6.2
The desired output would be:
2019-01-01, 100, 5.0
2019-01-02, 100, 5.0
2019-01-03, 100, 5.0
2019-01-04, 100, 5.0
2019-01-05, 100, 6.2
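To pin down the intended semantics, here is a minimal Python sketch, assuming each history row is (id, happened_at, location_id, score) in the same order and "YYYY-MM-DD:HH:MM:SS" timestamp layout as the three sample records. The function name daily_closing_scores is hypothetical. The last change on a date wins, and that value is carried forward until the next change, over each location's first-to-last change range as in the desired output.

```python
from datetime import datetime, timedelta

# Timestamp layout used by the sample records above (assumption).
TS_FORMAT = "%Y-%m-%d:%H:%M:%S"

def daily_closing_scores(rows):
    """Expand change rows (id, happened_at, location_id, score) into one row
    per location per day, from that location's first to its last change date.

    A day's value is its last change (closest to midnight); days without a
    change carry the previous value forward.
    """
    parsed = sorted(
        (datetime.strptime(ts, TS_FORMAT), loc, score) for _id, ts, loc, score in rows
    )
    # Rows are in time order, so later changes on the same day overwrite earlier ones.
    closing = {}
    for ts, loc, score in parsed:
        closing[(loc, ts.date())] = score

    out = []
    for loc in sorted({loc for loc, _ in closing}):
        days = sorted(day for lc, day in closing if lc == loc)
        day, current = days[0], None
        while day <= days[-1]:
            current = closing.get((loc, day), current)  # carry forward
            out.append((day, loc, current))
            day += timedelta(days=1)
    return out

history = [
    (1, "2019-01-01:10:13:01", 100, 5.0),
    (2, "2019-01-05:20:00:01", 100, 5.8),
    (3, "2019-01-05:23:01:22", 100, 6.2),
]
for day, loc, score in daily_closing_scores(history):
    print(f"{day}, {loc}, {score}")
```

Run on the three sample records, this prints the five desired rows: 2019-01-02 through 2019-01-04 inherit 5.0, and 5.8 is superseded by the later 6.2 on 2019-01-05.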
3 Requirements:
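I've been chasing my tail with subqueries and window functions.
Since I'm reluctant to post without anything to show, I'll share this train wreck, which produces output but makes no sense...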
SELECT dw_dim_date.date,
(SELECT score
FROM score_history
WHERE score_history.happened_at::DATE < dw_dim_date.date
OR score_history.happened_at::DATE = dw_dim_date.date
ORDER BY score_history.id desc limit 1) as last_score
FROM dw_dim_date
WHERE dw_dim_date.date > '2019-06-01'
Thanks for any guidance, or for pointers to other questions worth reading.
You can achieve this by using correlated subqueries and LATERAL:
SELECT sub.date, sub.location_id, score
FROM (SELECT * FROM dw_dim_date
CROSS JOIN (SELECT DISTINCT location_id FROM score_history) s
WHERE date >= '2019-01-01'::date) sub
,LATERAL(SELECT score FROM score_history sc
WHERE sc.happened_at::date <= sub.date
AND sc.location_id = sub.location_id
ORDER BY happened_at DESC LIMIT 1) l
,LATERAL(SELECT MIN(happened_at::date) m1, MAX(happened_at::date) m2
FROM score_history sc
WHERE sc.location_id = sub.location_id) lm
WHERE sub.date BETWEEN lm.m1 AND lm.m2
ORDER BY location_id, date;
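For a quick runnable check of the same idea, here is a sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. SQLite has no LATERAL, so this swaps in a correlated scalar subquery for l and a grouped derived table joined on location_id for lm; it approximates the query above rather than reproducing it. The small dw_dim_date range and the "YYYY-MM-DD HH:MM:SS" timestamps are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE score_history (
    id INTEGER PRIMARY KEY, happened_at TEXT, location_id INTEGER, score REAL);
CREATE TABLE dw_dim_date (date TEXT PRIMARY KEY);
INSERT INTO score_history VALUES
    (1, '2019-01-01 10:13:01', 100, 5.0),
    (2, '2019-01-05 20:00:01', 100, 5.8),
    (3, '2019-01-05 23:01:22', 100, 6.2);
-- Fill a small date dimension (assumed range) with a recursive CTE.
WITH RECURSIVE d(x) AS (
    SELECT '2018-12-25'
    UNION ALL
    SELECT date(x, '+1 day') FROM d WHERE x < '2019-01-15'
)
INSERT INTO dw_dim_date SELECT x FROM d;
""")

rows = conn.execute("""
SELECT sub.date, sub.location_id,
       -- stands in for LATERAL l: latest score on or before sub.date
       (SELECT sc.score FROM score_history sc
         WHERE date(sc.happened_at) <= sub.date
           AND sc.location_id = sub.location_id
         ORDER BY sc.happened_at DESC LIMIT 1) AS score
FROM (SELECT d.date AS date, s.location_id AS location_id
        FROM dw_dim_date d
        CROSS JOIN (SELECT DISTINCT location_id FROM score_history) s
       WHERE d.date >= '2019-01-01') sub
-- stands in for LATERAL lm: first/last change date per location
JOIN (SELECT location_id,
             MIN(date(happened_at)) AS m1, MAX(date(happened_at)) AS m2
        FROM score_history GROUP BY location_id) lm
  ON lm.location_id = sub.location_id
WHERE sub.date BETWEEN lm.m1 AND lm.m2
ORDER BY sub.location_id, sub.date
""").fetchall()

for row in rows:
    print(row)
```

The grouped join for lm computes the same per-location MIN/MAX as the LATERAL version; in PostgreSQL either form should work.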
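How it works:
1) s: the distinct location_ids, cross-joined with every date in dw_dim_date to form sub
2) l: for each date, picks the location's most recent score on or before that date
3) lm: picks each location's first and last change date, used for filtering
4) WHERE: keeps only dates within that available range; relax this condition if you need dates outside it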
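The four steps can also be traced in plain Python. This is a hypothetical illustration (closing_scores is not part of the answer), assuming history rows of (happened_at, location_id, score) with datetime timestamps and a list of calendar dates standing in for dw_dim_date. Here bisect_right on the time-ordered change dates plays the role of l's ORDER BY happened_at DESC LIMIT 1.

```python
from bisect import bisect_right
from datetime import date, datetime, timedelta

def closing_scores(history, dates, start):
    """Mirror the answer's steps; history rows are (happened_at, location_id, score)."""
    # Changes per location in time order (the rows the l subquery scans).
    by_loc = {}
    for happened_at, loc, score in sorted(history):
        by_loc.setdefault(loc, []).append((happened_at.date(), score))

    out = []
    for loc in sorted(by_loc):  # s: every distinct location ...
        change_days = [d for d, _ in by_loc[loc]]
        m1, m2 = change_days[0], change_days[-1]  # lm: first/last change date
        for day in sorted(dates):  # ... crossed with every date (sub)
            # sub keeps date >= start; WHERE keeps m1 <= date <= m2
            if day < start or not m1 <= day <= m2:
                continue
            # l: index just past the last change on or before `day`
            i = bisect_right(change_days, day)
            out.append((day, loc, by_loc[loc][i - 1][1]))
    return out

history = [
    (datetime(2019, 1, 1, 10, 13, 1), 100, 5.0),
    (datetime(2019, 1, 5, 20, 0, 1), 100, 5.8),
    (datetime(2019, 1, 5, 23, 1, 22), 100, 6.2),
]
calendar = [date(2018, 12, 25) + timedelta(days=k) for k in range(30)]
result = closing_scores(history, calendar, start=date(2019, 1, 1))
for day, loc, score in result:
    print(f"{day}, {loc}, {score}")
```

Because change_days is sorted, bisect_right lands after all changes on the same day, so the latest one (6.2 on 2019-01-05) is selected.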
Thanks, this is a very interesting introduction to LATERAL, and it also performs well on a larger dataset.
@Nick Great to hear :)