Note: this post is translated from stackoverflow.com. Original question: python - How to make this search and count much faster?

list python search

python - How to make this search and count much faster?

Posted on 2020-04-12 00:13:20

def count_occurrences(string):
    count = 0
    for text in GENERIC_TEXT_STORE:
        count += text.count(string)
    return count

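GENERIC_TEXT_STORE is a list of strings. For example:

GENERIC_TEXT_STORE = ['this is good', 'this is a test', "that's not a test"]

Given a string "text", I want to find how many times that text (e.g. "this") occurs in GENERIC_TEXT_STORE. If my GENERIC_TEXT_STORE is huge, this is very slow. What are some ways to make this search and count much faster? For instance, if I split the big GENERIC_TEXT_STORE list into multiple smaller lists, would that be faster?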

If the multiprocessing module is useful here, how can it be used for this purpose?

Asked by

ling


Ch3steR 2020-02-02 17:39

You can use re.

In [2]: GENERIC_TEXT_STORE = ['this is good', 'this is a test', 'that\'s not a test']

In [3]: def count_occurrences(string):
   ...:     count = 0
   ...:     for text in GENERIC_TEXT_STORE:
   ...:         count += text.count(string)
   ...:     return count

In [6]: import re

In [7]: def count(_str):
   ...:     return len(re.findall(_str,''.join(GENERIC_TEXT_STORE)))
   ...:
In [28]: def count1(_str):
    ...:     return ' '.join(GENERIC_TEXT_STORE).count(_str)
    ...:
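Two caveats apply to the joined variants above, sketched here under stated assumptions. re.findall treats _str as a regular expression, so a query such as 'a.b' is not matched literally unless it is passed through re.escape. Joining with '' can also create matches that straddle two neighbouring strings, and joining with ' ' can do the same for queries that contain a space. The sketch below joins with a newline instead, assuming the query never contains a newline. The names SEPARATOR, count_joined and count_regex_literal are illustrative and do not come from the original answer.

```python
import re

GENERIC_TEXT_STORE = ['this is good', 'this is a test', "that's not a test"]

# Assumption: the query never contains this character, so no match can
# span two neighbouring strings after joining.
SEPARATOR = '\n'


def _check_query(query):
    """Reject queries for which the joined count would be ill-defined."""
    if not query or SEPARATOR in query:
        raise ValueError('query must be non-empty and must not contain the separator')


def count_joined(query, texts=GENERIC_TEXT_STORE):
    """Count non-overlapping literal occurrences with a single str.count call."""
    _check_query(query)
    return SEPARATOR.join(texts).count(query)


def count_regex_literal(query, texts=GENERIC_TEXT_STORE):
    """Same count via re, escaping the query so it is matched literally."""
    _check_query(query)
    return len(re.findall(re.escape(query), SEPARATOR.join(texts)))


print(count_joined('this'))
print(count_regex_literal('a test'))
```

With these guards both functions should return the same totals as the original per-string loop, since str.count and re.findall both count non-overlapping matches.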

Now use timeit to analyze the execution time.

When the size of GENERIC_TEXT_STORE is 3.

In [9]: timeit count('this')
1.27 µs ± 57.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [10]: timeit count_occurrences('this')
697 ns ± 25.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [33]: timeit count1('this')
385 ns ± 22.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

When the size of GENERIC_TEXT_STORE is 15000.

In [17]: timeit count('this')
1.07 ms ± 118 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [18]: timeit count_occurrences('this')
3.35 ms ± 279 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [37]: timeit count1('this')
275 µs ± 18.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

When the size of GENERIC_TEXT_STORE is 150000.

In [20]: timeit count('this')
5.7 ms ± 2.39 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [21]: timeit count_occurrences('this')
33 ms ± 3.26 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [40]: timeit count1('this')
3.98 ms ± 211 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

When the size of GENERIC_TEXT_STORE is in the millions (1500000).

In [23]: timeit count('this')
50.3 ms ± 7.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [24]: timeit count_occurrences('this')
283 ms ± 12.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [43]: timeit count1('this')
40.7 ms ± 1.09 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
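The measurements above come from IPython's timeit magic. Outside IPython, a comparable measurement could be scripted with the standard timeit module, as in the sketch below. It assumes the larger lists are built by repeating the three sample strings, which the original answer does not state, and it reports the best time per call rather than a mean and standard deviation, so its numbers are not directly comparable to the ones above. The benchmark helper is hypothetical.

```python
import re
import timeit

# Assumption: synthetic data built by repeating the three sample strings;
# the original answer does not say how its larger lists were generated.
SAMPLE = ['this is good', 'this is a test', "that's not a test"]


def count_occurrences(string, texts):
    count = 0
    for text in texts:
        count += text.count(string)
    return count


def count(_str, texts):
    return len(re.findall(_str, ''.join(texts)))


def count1(_str, texts):
    return ' '.join(texts).count(_str)


def benchmark(size, repeats=3, number=5):
    """Return the best per-call time in seconds for each function."""
    texts = (SAMPLE * (size // len(SAMPLE) + 1))[:size]
    results = {}
    for func in (count_occurrences, count, count1):
        timer = timeit.Timer(lambda: func('this', texts))
        best = min(timer.repeat(repeat=repeats, number=number))
        results[func.__name__] = best / number
    return results


if __name__ == '__main__':
    for size in (3, 15000, 150000):
        timings = benchmark(size)
        print(size, {name: f'{sec * 1e6:.1f} us' for name, sec in timings.items()})
```

Unlike the session above, the functions here take the list as a parameter so that each size can be timed without rebinding a global.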

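For large lists, ordered by execution time: count1 < count < count_occurrences

When GENERIC_TEXT_STORE is large, count and count1 are several times faster than count_occurrences (roughly 5 to 8 times in the timings above for the two largest sizes). For the 3-element list the regex version count is the slowest of the three.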
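On the multiprocessing part of the question, which the answer above does not measure: splitting the list does not by itself reduce the total amount of text scanned, but the chunks can be counted in separate processes. The following is a minimal sketch, assuming the list is large enough to outweigh the cost of starting workers and pickling the chunks. The helper names and the default of 4 workers are illustrative, not part of the original post.

```python
from functools import partial
from multiprocessing import Pool


def count_in_chunk(query, chunk):
    """Count occurrences of query in one chunk (runs inside a worker process)."""
    return sum(text.count(query) for text in chunk)


def split_into_chunks(texts, n_chunks):
    """Split texts into at most n_chunks contiguous slices of similar size."""
    size = max(1, -(-len(texts) // n_chunks))  # ceiling division
    return [texts[i:i + size] for i in range(0, len(texts), size)]


def count_parallel(query, texts, workers=4):
    """Sum per-chunk counts computed by a pool of worker processes."""
    if workers <= 1 or len(texts) < workers:
        # Too little data to justify starting processes; count serially.
        return count_in_chunk(query, texts)
    chunks = split_into_chunks(texts, workers)
    with Pool(processes=workers) as pool:
        return sum(pool.map(partial(count_in_chunk, query), chunks))
```

A call such as count_parallel('this', GENERIC_TEXT_STORE) should be placed under an if __name__ == '__main__': guard, which is required on platforms that start workers by spawning a fresh interpreter. Whether this beats a single ' '.join(GENERIC_TEXT_STORE).count(...) call is not established by the timings above and would need to be measured on the actual data.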
