This article is reproduced from serverfault.com.

datetime mysql sql query-optimization where-clause

How to speed up count(distinct) with Between clause in MySQL

Posted on 2020-11-24 17:55:27

I have a MySQL table of 10 million rows and 3 columns, in the following format:

id                                    time                        num
ca65e871-d758-437e-b76f-175234760e7b  2020-11-14 23:08:05.553770  11112222222
...

To run the first query below, I indexed the table on (num, time), and it runs very fast (under 5 milliseconds on the 10-million-row table):
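For reference, a composite index like the one described above could be created as in the following sketch. The index name idx_num_time is an assumption for illustration; the original post does not show the actual statement.

```sql
-- Composite index with the equality column (num) first and the range column (time) second.
-- The index name idx_num_time is hypothetical.
CREATE INDEX idx_num_time ON TABLE_NAME (num, time);
```

With this column order, a lookup on a single num can seek directly to the first entry with time at or after the given timestamp.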

SELECT COUNT(*) 
FROM TABLE_NAME 
WHERE time >= '2020-11-14 23:08:05.553752' AND num = 11112222222

However, I also need to run COUNT(DISTINCT) on the same table with a BETWEEN clause, something like this:

SELECT COUNT(DISTINCT num) 
FROM TABLE_NAME 
WHERE time >= '2020-11-14 23:08:05.553752'
  AND num BETWEEN (11112222222 - 30)
              AND (11112222222 + 30)

This turns out to be significantly slower, around 200 milliseconds. Is there a way to reduce the execution time of the second query on the same table?

Questioner

Makaroni

Akina 2020-11-25 02:21:55

If your MySQL version is 8.0 or later, try:

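-- Generate every candidate num in the range [base - 30, base + 30]
WITH RECURSIVE
cte AS ( SELECT 11112222222 - 30 num
         UNION ALL
         SELECT num + 1 FROM cte WHERE num < 11112222222 + 30 )
-- Count the candidates that have at least one matching row
SELECT COUNT(*)
FROM cte
WHERE EXISTS ( SELECT NULL
               FROM TABLE_NAME 
               WHERE TABLE_NAME.num = cte.num
                 AND time >= '2020-11-14 23:08:05.553752' );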

If you will run this kind of query often, I'd suggest creating a service table with the numbers from -30 to 30 and using it instead of the recursive CTE.
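A minimal sketch of that service-table idea follows. The table name num_offsets and the column name delta are assumptions for illustration, not something the answer specifies; the recursive CTE is used only once, to fill the table, so MySQL 8.0+ is still assumed for that step.

```sql
-- Hypothetical helper table holding the offsets -30 .. 30 (names are assumptions).
CREATE TABLE num_offsets (
    delta INT NOT NULL PRIMARY KEY
);

-- One-time fill: 61 rows, from -30 to 30 inclusive.
INSERT INTO num_offsets (delta)
WITH RECURSIVE seq AS (
    SELECT -30 AS delta
    UNION ALL
    SELECT delta + 1 FROM seq WHERE delta < 30
)
SELECT delta FROM seq;

-- Same counting logic as the CTE version, driven by the helper table.
SELECT COUNT(*)
FROM num_offsets o
WHERE EXISTS ( SELECT NULL
               FROM TABLE_NAME t
               WHERE t.num = 11112222222 + o.delta
                 AND t.time >= '2020-11-14 23:08:05.553752' );
```

Because the offsets are stored relative to zero, the same helper table should work for any base num; only the literal in the final query changes.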

Makaroni 2020-11-24 18:29:46

Absolutely amazing! This makes the query run in less than 5 milliseconds. Thank you very much! Can you give me a brief explanation of what you did?

Akina 2020-11-24 18:32:21

@Makaroni I generate the list of num values within the range, then simply test whether a row with that num exists within the date range. WHERE EXISTS only checks for presence: it is evaluated for each generated num, but for each num it stops as soon as it finds one matching row (whereas your query has to find all matching rows).
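As a rough illustration of that explanation, the check made for a single candidate num behaves much like the probe below. This is a conceptual sketch, not what the optimizer literally executes; it assumes the (num, time) index from the question is in place.

```sql
-- Conceptual per-num probe: stop at the first matching row instead of reading them all.
SELECT 1
FROM TABLE_NAME
WHERE num = 11112222222
  AND time >= '2020-11-14 23:08:05.553752'
LIMIT 1;
```

The rewritten query performs at most 61 such probes (one per num from -30 to +30), each of which can use the index, rather than scanning every row in the range to deduplicate num afterwards.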

Makaroni 2020-11-24 18:35:43

Fantastic! I will read this explanation several times in order to really figure it out. :)) Thanks again!

Makaroni 2020-11-25 13:38:24

BTW, can you help me with a similar problem involving COUNT(DISTINCT)? Can I apply a recursive CTE in that case as well? stackoverflow.com/questions/65005378/…
