How to match characters from all languages in Ruby, excluding special characters

Question

3

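I have a display name field that needs to be validated with a Ruby regular expression. It must match characters from all languages, such as French, Arabic, Chinese, German, Spanish, and English, but not special characters such as *()!@#$%^&. I don't know how to match the non-Latin characters.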

- dexterdeng

Maybe if you provided some strings showing sample input and output, people would be able to help you better. Your question is very unclear. - Geo

Do you mean all letters from all alphabets? - BoltClock

@Bolt If you have the right tools, that's perfectly feasible. - NullUserException

3 Answers

1

Starting from Ruby 1.9, the String and Regexp classes are Unicode-aware. You can safely use the regex word character selector \w

"可口可樂!?!".gsub /\w/, 'Ha'
#=> "HaHaHaHa!?!"

- edgerunner

Because \w is defined as [0-9A-Za-z_]. - Michael Kohl

Yes, I know that, but the reasoning behind it escapes me. (By the way, \w matches more than just those characters in Unicode.) - edgerunner

Unfortunately, it seems that newer versions of Ruby (tried with 2.2.2) no longer match e.g. 可口可乐 äåö as word characters (\w) :( - Trond Hatlen

1

I guess it also matches '_' because in programming languages "my_variable" is considered a single word. - nitsas

1

In Ruby 1.9.1 and above (possibly earlier), you can use \p{L} to match word characters in all languages (without the Oniguruma gem described in a previous answer).
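
A minimal sketch of that idea for the display-name case. The helper name `letters_only?` is hypothetical, and allowing spaces is an assumption (the question does not say whether they are permitted). The pattern is anchored with `\A` and `\z` so that every character, not just one of them, must be a letter or a space:

```ruby
# Hypothetical helper, not part of the original answer.
# \p{L} matches a letter in any script; \A and \z anchor the whole string.
def letters_only?(name)
  !!(name =~ /\A[\p{L} ]+\z/)
end

puts letters_only?("可口可樂")          # Chinese letters
puts letters_only?("M\u00FCller")     # German, with a precomposed "ü"
puts letters_only?("Bob!")            # contains a special character
```

The first two calls should print `true` and the last `false`. Digits, hyphens, and apostrophes are rejected by this pattern; if names may contain them, they would have to be added to the character class.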

- Trond Hatlen

- NullUserException · Accepted Answer

There are two possibilities:

Create a regex with a negated character class containing every symbol you don't want to match:
```
# Add every unwanted symbol to the class; if the whole name matches, you are good.
valid = !!(name =~ /\A[^*!@%\^]+\z/)
```
This solution may not be feasible, since there is a massive number of symbols you'd have to list, even if you were to include only the most common ones.

Use Oniguruma (see also: Oniguruma for Ruby main). It supports Unicode and its character properties, so all letters can be matched using:
```
# \p{L} = any letter, \p{M} = any combining mark; Ruby requires the braces.
valid = !!(name =~ /\A[\p{L}\p{M}]+\z/)
```
You can see what these are all about here: Unicode Regular Expressions
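
To illustrate why it helps to allow marks as well as letters: an accented letter can be stored as a base letter followed by a combining mark, and the mark is not itself a letter. A small sketch, assuming Ruby 1.9 or later with UTF-8 strings; the sample strings and variable names are made up for illustration:

```ruby
composed   = "Jos\u00E9"    # "é" as a single code point (U+00E9)
decomposed = "Jose\u0301"   # "e" followed by COMBINING ACUTE ACCENT (U+0301)

letters_only      = /\A\p{L}+\z/
letters_and_marks = /\A[\p{L}\p{M}]+\z/

puts !!(composed   =~ letters_only)       # true
puts !!(decomposed =~ letters_only)       # false: U+0301 is a mark, not a letter
puts !!(decomposed =~ letters_and_marks)  # true
```

Both strings render identically, so a letters-only check could reject a name depending on how the input happened to be encoded.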