Note: this article is translated from stackoverflow.com. Original title: Ruby: Testing a ruby string for a substring fails (substring is not recognized)

ruby string character

Ruby: Testing a Ruby string for a substring fails (substring is not recognized)

Posted on 2020-03-29 22:05:14

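I'm trying to clean up spam manually with Ruby. Why does the following test return false when it should return true? The tested string is the original one, so you can literally copy/paste the whole thing into a Ruby console to verify this example: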

irb(main):053:0> "Веautiful women fоr sеx in yоur town АU: https://links.wtf/qLFs".include? "sex"
=> false

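Hint: if you replace the word "sex" in the full string by typing it yourself, the test returns true as expected. So somehow the two "sex" strings are not identical, but at what level? And how do I test for this correctly?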

Edit:

I narrowed it down to this (copy/paste to test!):

irb(main):073:0> "е" == "e"
=> false

Asked by TomDogg


Patrick Taylor 2020-01-31 20:40

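JavaScript's charCodeAt method tells me that these two characters have different Unicode values, and Ruby's .ord method tells me the same thing. You could check for those Unicode values more precisely in Ruby, but I'd suggest finding a way to normalize your data rather than adding endless conditions for unusual characters. According to a Unicode lookup table I found online, е appears to be 0x0435 (decimal 1077), CYRILLIC SMALL LETTER IE.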
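One way to read "normalize your data" is to fold look-alike letters back to Latin before matching. The sketch below is only an illustration of that idea: the `HOMOGLYPHS` table and the `fold_homoglyphs` helper are hypothetical names, and the table is deliberately incomplete. Note that the standard Unicode normalization forms (for example via `String#unicode_normalize`) do not merge Cyrillic and Latin letters, so an explicit mapping of some kind is needed.

```ruby
# Hypothetical lookup table: a few Cyrillic letters that resemble Latin ones.
# It is deliberately incomplete; extend it for your own data.
HOMOGLYPHS = {
  "\u0410" => "A", # CYRILLIC CAPITAL LETTER A
  "\u0412" => "B", # CYRILLIC CAPITAL LETTER VE
  "\u0430" => "a", # CYRILLIC SMALL LETTER A
  "\u0435" => "e", # CYRILLIC SMALL LETTER IE
  "\u043E" => "o", # CYRILLIC SMALL LETTER O
  "\u0440" => "p", # CYRILLIC SMALL LETTER ER
  "\u0441" => "c", # CYRILLIC SMALL LETTER ES
}.freeze

# One regexp that matches any key of the table.
HOMOGLYPH_PATTERN = Regexp.union(HOMOGLYPHS.keys)

# Replace every listed look-alike with its Latin counterpart.
def fold_homoglyphs(text)
  text.gsub(HOMOGLYPH_PATTERN, HOMOGLYPHS)
end

spam = "s\u0435x in y\u043Eur town"
puts spam.include?("sex")                  # false
puts fold_homoglyphs(spam).include?("sex") # true
```

Folding is only appropriate when the input is expected to be Latin text; it would damage genuine Cyrillic content.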

Also, here is a way you could ban all Cyrillic letters. I used a range of excluded characters, so you can add further exclusions as needed.

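#!/usr/bin/env ruby

# Code points of the basic Cyrillic block, U+0400..U+04FF (decimal 1024..1279).
CYRILLIC_UNICODE_DECIMALS = (1024..1279).freeze

# Print every Cyrillic character found in each command-line argument.
ARGV.each do |arg|
  arg.each_char do |char|
    p char if CYRILLIC_UNICODE_DECIMALS.include?(char.ord)
  end
end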
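As an alternative to a numeric range, Ruby regular expressions support Unicode script properties, so the same check can be sketched with `\p{Cyrillic}`. This is a different technique from the script above, not the answerer's code, and the `cyrillic_chars` helper name is made up for illustration.

```ruby
# Return every character of the text that belongs to the Cyrillic script.
def cyrillic_chars(text)
  text.scan(/\p{Cyrillic}/)
end

sample = "s\u0435x in y\u043Eur town"
p cyrillic_chars(sample).map(&:ord) # [1077, 1086]
p cyrillic_chars("plain text")      # []
```

The script property also covers Cyrillic letters outside the U+0400..U+04FF block, so the two checks are not strictly equivalent.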

For reference, here are the .ord and .charCodeAt calls I ran against your example. I started with JavaScript because it is an easy test in the browser console.

2.6.3 :005 > 'е'.ord
 => 1077
2.6.3 :006 > 'e'.ord
 => 101

'"е" == "e"'.charCodeAt(1)
1077
'"e" == "e"'.charCodeAt(1)
101
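The same `.ord` check can be applied to a whole string to see which characters are not plain ASCII. This is a minimal sketch; `non_ascii_code_points` is a hypothetical helper name, not something from the answer.

```ruby
# List every non-ASCII character of the text together with its code point.
def non_ascii_code_points(text)
  text.each_char
      .reject(&:ascii_only?)
      .map { |char| format("%s U+%04X", char, char.ord) }
end

# Prints two lines, one for U+0435 and one for U+043E.
puts non_ascii_code_points("s\u0435x f\u043Er")
```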

TomDogg 2020-02-02 03:30:03

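The simplest approach to this problem is to scan the string/text in question with the gem "unicode-scripts" from github.com/janlelis/unicode-scripts. Normal text should then return an array of at most the following 2 elements: ["Common", "Latin"]. If it contains any other elements, as in ["Common", "Cyrillic", "Latin"], there is a good chance that the string/text has been "obfuscated" as spam.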
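The mixed-script idea can be approximated without the gem by using the script properties built into Ruby's regular expressions. The sketch below does not use unicode-scripts; `SCRIPTS`, `EXPECTED_SCRIPTS`, `scripts_in` and `suspicious?` are hypothetical names, and the script list is intentionally short, so treat it as an illustration of the check rather than a replacement for the gem.

```ruby
# Scripts to look for; a hypothetical, deliberately short list.
SCRIPTS = {
  "Common"   => /\p{Common}/,
  "Cyrillic" => /\p{Cyrillic}/,
  "Greek"    => /\p{Greek}/,
  "Latin"    => /\p{Latin}/,
}.freeze

# Scripts that ordinary English text is expected to contain.
EXPECTED_SCRIPTS = %w[Common Latin].freeze

# Names of the listed scripts that occur at least once in the text.
def scripts_in(text)
  SCRIPTS.select { |_name, pattern| text.match?(pattern) }.keys
end

# True when the text contains a listed script other than the expected ones.
def suspicious?(text)
  !(scripts_in(text) - EXPECTED_SCRIPTS).empty?
end

p scripts_in("sex in your town")            # ["Common", "Latin"]
p scripts_in("s\u0435x in y\u043Eur town")  # ["Common", "Cyrillic", "Latin"]
p suspicious?("s\u0435x in y\u043Eur town") # true
```

Legitimate multilingual text would also be flagged, so the result is better treated as a signal to review than as proof of spam.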
