Note: This article is reproduced from serverfault.com.

dataframe pandas python string match

How to test if a string contains one of the substrings in a list, in pandas?

Posted on 2014-10-26 20:23:37

Is there any function that would be the equivalent of a combination of df.isin() and df[col].str.contains()?

For example, say I have the series s = pd.Series(['cat','hat','dog','fog','pet']) and I want to find all places where s contains any of ['og', 'at']. I would want to get everything but 'pet'.

I have a solution, but it's rather inelegant:

searchfor = ['og', 'at']
found = [s.str.contains(x) for x in searchfor]
result = pd.DataFrame(found)
result.any()

Is there a better way to do this?

Questioner

ari


Alex Riley 2017-03-01 04:42:50

One option is to use the regex | character (alternation) to match any of the substrings against each word in your Series s (still using str.contains).

You can construct the regex by joining the words in searchfor with |:

>>> searchfor = ['og', 'at']
>>> s[s.str.contains('|'.join(searchfor))]
0    cat
1    hat
2    dog
3    fog
dtype: object

As @AndyHayden noted in the comments below, take care if your substrings have special characters such as $ and ^ which you want to match literally. These characters have specific meanings in the context of regular expressions and will affect the matching.

You can make your list of substrings safer by escaping non-alphanumeric characters with re.escape:

>>> import re
>>> matches = ['$money', 'x^y']
>>> safe_matches = [re.escape(m) for m in matches]
>>> safe_matches
['\\$money', 'x\\^y']

The strings in this new list will match each character literally when used with str.contains.
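Putting the two steps together, here is a minimal sketch of escaping the substrings and then joining them with | before passing the pattern to str.contains. The sample Series and the names pattern, mask and result are made up for illustration and are not part of the original answer.

```python
import re
import pandas as pd

# Illustrative data: two entries contain the literal substrings, three do not.
s = pd.Series(['$money', 'x^y', 'money', 'xy', 'pet'])
matches = ['$money', 'x^y']

# Escape each substring so that $ and ^ are matched literally,
# then join the escaped pieces with | to build one alternation pattern.
pattern = '|'.join(re.escape(m) for m in matches)

# Boolean mask: True where any of the literal substrings occurs.
mask = s.str.contains(pattern)
result = s[mask]
```

Without the escaping step, $ and ^ would be read as regex anchors rather than as literal characters, so the pattern would not look for the literal text '$money' or 'x^y'.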

goofd 2014-10-26 21:19:09

maybe good to add this link pandas.pydata.org/pandas-docs/stable/… too. Starting from pandas 0.15, the string operations are even easier

Andy Hayden 2014-10-26 21:24:56

one thing you have to take care with is if a string in searchfor has special regex characters (you can map with re.escape).

Alex Riley 2014-10-26 21:42:47

@AndyHayden Thank you, I've improved my answer to take this complication into account.

Doo Hyun Shin 2019-02-17 12:59:35

I don't know why your method doesn't work with "str.startswith('|'.join(searchfor))"
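A likely explanation, offered as a sketch rather than a definitive answer: str.startswith treats its argument as a literal string, not as a regular expression, so the joined text is searched for verbatim, including the | character. One alternative is str.match, which applies a regex anchored at the start of each string. The prefixes list and the variable names below are hypothetical.

```python
import re
import pandas as pd

s = pd.Series(['cat', 'hat', 'dog', 'fog', 'pet'])
prefixes = ['do', 'ha']  # hypothetical prefixes to test for

# str.startswith looks for the literal text 'do|ha', which no entry starts with.
literal_mask = s.str.startswith('|'.join(prefixes))

# str.match anchors the regex at the beginning of each string,
# so the alternation behaves like "starts with any of these".
pattern = '|'.join(re.escape(p) for p in prefixes)
regex_mask = s.str.match(pattern)
result = s[regex_mask]
```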

The Dan 2021-02-11 23:31:56

in this case I understand we use "|" for OR, how could we use AND??
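One possible way to express AND, sketched here under the assumption that every substring must appear somewhere in the string in any order: build one str.contains mask per substring and combine the masks with &. The sample data and the names required, mask and result are illustrative only.

```python
import pandas as pd

s = pd.Series(['cat', 'hat', 'chat', 'dog'])
required = ['c', 'at']  # every one of these must be present

# Start from an all-True mask and AND in one literal contains() test per substring.
mask = pd.Series(True, index=s.index)
for sub in required:
    mask &= s.str.contains(sub, regex=False)

result = s[mask]
```

Passing regex=False makes each test a plain substring search, so no escaping is needed in this variant.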
