I have a MySQL table of 10 million rows and 3 columns, in following format:
id time num
ca65e871-d758-437e-b76f-175234760e7b 2020-11-14 23:08:05.553770 11112222222
...
For running the first query below, I indexed the table on (num, time)
and it works very fast (<5 milliseconds on 10 million rows table):
SELECT COUNT(*)
FROM TABLE_NAME
WHERE time >= '2020-11-14 23:08:05.553752' AND num = 11112222222
However I also need to execute count(distinct)
on the same table with between
clause, something like this:
SELECT COUNT(DISTINCT num)
FROM TABLE_NAME
WHERE time >= '2020-11-14 23:08:05.553752'
AND num BETWEEN (11112222222 - 30)
AND (11112222222 + 30)
This turns out to be significantly slower, around 200 milliseconds. Is there a way to speed the execution time of the second query on the same table?
If your MySQl is 8+ then try:
WITH RECURSIVE
cte AS ( SELECT 11112222222 - 30 num
UNION ALL
SELECT num + 1 FROM cte WHERE num < 11112222222 + 30 )
SELECT COUNT(*)
FROM cte
WHERE EXISTS ( SELECT NULL
FROM TABLE_NAME
WHERE TABLE_NAME.num = cte.num
AND time >= '2020-11-14 23:08:05.553752' )
If you'll often execute such query then I'd suggest to create service table with the numbers from -30 to 30 and use it instead of recursive CTE.
Absolutely amazing! This makes the query runs for less than 5 milliseconds. Thank you very much! Can you give me a brief explanation of what you did?
@Makaroni I generate the
num
s list within the range then simply test does there exists a row with thisnum
within the date range. WHERE EXISTS checks the presence - i.e. it executes for each row but for eachnum
it aborts after it find one value (whereas your query needs to find all matched rows).Fantastic! I will read this explanation several times in order to really figure it out. :)) Thanks again!
BTW, can you help me with similar problem of using
count(distinct)
? Can I applyrecursive cte
in this case also? stackoverflow.com/questions/65005378/…