How to use a regular expression in PHP to match a Telegram username and delete the whole line

Question

php regex telegram-bot php-telegram-bot

8

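I want to match Telegram usernames in a message text and delete the whole line. I tried this pattern, but the problem is that it also matches email addresses: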

.*(@(?=.{5,64}(?:\s|$))(?![_])(?!.*[_]{2})[a-zA-Z0-9_]+(?<![_.])).*

The pattern should match all of these lines:

Hi @username how are you?

Hi @username.how are you?

@username.

and it should not match email addresses like this one:

Hi email to something@domain.com
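
To make the problem reproducible, here is a minimal PHP sketch that runs the pattern above against the four sample lines. The / delimiters, the variable names and the printf layout are assumptions added for illustration; only the pattern body comes from the question.

```php
<?php
// The pattern from the question, wrapped in / delimiters (delimiters are an assumption).
$pattern = '/.*(@(?=.{5,64}(?:\s|$))(?![_])(?!.*[_]{2})[a-zA-Z0-9_]+(?<![_.])).*/';

$lines = [
    'Hi @username how are you?',
    'Hi @username.how are you?',
    '@username.',
    'Hi email to something@domain.com', // should NOT match, but currently does
];

// Record for every sample line whether the pattern finds a match.
$results = [];
foreach ($lines as $line) {
    $results[$line] = preg_match($pattern, $line) === 1;
    printf("%-36s %s\n", $line, $results[$line] ? 'match' : 'no match');
}
```

All four lines are reported as a match, including the email line, because nothing in the pattern looks at the character in front of the @ sign.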

- Ali Raghebi

There may be one or more emoji right before the @ sign. - Ali Raghebi

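I thought of .*[^a-zA-Z]@ ... but that is far from perfect. Then I looked at http://emailregex.com/ and thought that it might help. You could get the match first and then use a second regular expression to check whether the "username" really is a username or an email address. - Reed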

Is it only emoji, or any non-word character? Can '@' occur more than once in the string? - The fourth bird

3 Answers

1

.*[\W](@(?=.{5,64}(?:\s|$))(?![_])(?!.*[_]{2})[a-zA-Z0-9_]+(?<![_.])).*

I added [\W], a non-word character, in front of the @ sign. You can check the result here: https://regex101.com/r/yFGegO/1
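
As a rough check of this idea, the sketch below runs the pattern against the sample lines from the question. Because [\W] has to consume a character, a username at the very start of a line (the "@username." sample) is not matched. The second pattern, $variant, is a hypothetical alternative that is not part of this answer: it swaps [\W] for the lookbehind (?<!\w), which only forbids a word character before @. Delimiters and variable names are assumptions.

```php
<?php
// Pattern from this answer: a mandatory non-word character before "@".
$answer = '/.*[\W](@(?=.{5,64}(?:\s|$))(?![_])(?!.*[_]{2})[a-zA-Z0-9_]+(?<![_.])).*/';

// Hypothetical variant (an assumption, not from the answer): a lookbehind that
// only rejects a word character before "@", so the start of a line is accepted.
$variant = '/.*(?<!\w)(@(?=.{5,64}(?:\s|$))(?![_])(?!.*[_]{2})[a-zA-Z0-9_]+(?<![_.])).*/';

$lines = [
    'Hi @username how are you?',
    'Hi @username.how are you?',
    '@username.',
    'Hi email to something@domain.com',
];

// Collect the match result of both patterns for each sample line.
$table = [];
foreach ($lines as $line) {
    $table[$line] = [
        'answer'  => preg_match($answer, $line) === 1,
        'variant' => preg_match($variant, $line) === 1,
    ];
}
var_export($table);
```

Both patterns reject the email line; they differ only on the line that begins with @.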

- Bugaloo

1

Nothing new under the sun, but the other patterns can basically be reduced to:

.*?\B@\w{5}.*

demo

Or, in the end:

.*?\B@\w{5,64}\b.*

if you want to be more precise, but is that really necessary?

Note: if you also want to remove the newline sequence, you can add \R? at the end of the pattern.
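
A minimal sketch of that note, using the first pattern above with \R? appended and preg_replace to drop the matching lines. The sample text, the extra "ping @abc" line (a name shorter than five characters) and the variable names are assumptions for illustration.

```php
<?php
// First pattern of this answer plus \R? so the line break is removed as well.
$pattern = '/.*?\B@\w{5}.*\R?/';

$text = "first line\n"
      . "Hi @username how are you?\n"
      . "Hi email to something@domain.com\n"
      . "@username.\n"
      . "ping @abc\n"
      . "last line";

// Replace every matching line (including its newline) with an empty string.
$cleaned = preg_replace($pattern, '', $text);

echo $cleaned, "\n";
```

The two username lines disappear completely, while the email line and the too-short "@abc" line stay in place.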

- Casimir et Hippolyte

- Ryszard Czech · Accepted Answer

Use

.*\B@(?=\w{5,32}\b)[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)*.*

See the proof.

The \B before @ means that @ must be preceded by a non-word character or by the start of the string.
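
The effect of \B can be checked in isolation. The sketch below tests only the fragment \B@ against three strings; the sample strings and variable names are assumptions for illustration.

```php
<?php
$subjects = [
    '@username.',            // "@" at the start of the string
    'Hi @username',          // "@" after a space (non-word character)
    'something@domain.com',  // "@" after a letter (word character)
];

// \B succeeds where there is NO word boundary. Since "@" itself is a non-word
// character, \B@ only matches when the previous character is also a non-word
// character or when "@" is the first character of the string.
$found = [];
foreach ($subjects as $subject) {
    $found[$subject] = preg_match('/\B@/', $subject) === 1;
}
var_export($found);
```

Only the email-style string fails, because the letter before its @ creates a word boundary at that position.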

Explanation

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  \B                       the boundary between two word chars (\w)
                           or two non-word chars (\W)
--------------------------------------------------------------------------------
  @                        '@'
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    \w{5,32}                 word characters (a-z, A-Z, 0-9, _)
                             (between 5 and 32 times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  [a-zA-Z0-9]+             any character of: 'a' to 'z', 'A' to 'Z',
                           '0' to '9' (1 or more times (matching the
                           most amount possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    _                        '_'
--------------------------------------------------------------------------------
    [a-zA-Z0-9]+             any character of: 'a' to 'z', 'A' to
                             'Z', '0' to '9' (1 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
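
Putting the pieces together, deleting the matching lines from a multi-line message could look roughly like the sketch below. The function name removeUsernameLines, the ^ anchor with the m flag, and the trailing \R? (to drop the line break too) are assumptions added for this example and not part of the pattern above.

```php
<?php
/**
 * Hypothetical helper: removes every line that contains a Telegram-style
 * username, using the pattern from this answer.
 */
function removeUsernameLines(string $text): string
{
    // ^ with the m flag anchors each attempt at a line start;
    // \R? also consumes the line break that follows the matched line.
    $pattern = '/^.*\B@(?=\w{5,32}\b)[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)*.*\R?/m';

    // preg_replace returns null on a PCRE error; fall back to the input then.
    return preg_replace($pattern, '', $text) ?? $text;
}

$message = "Hello\n"
         . "Hi @username how are you?\n"
         . "Hi @username.how are you?\n"
         . "@username.\n"
         . "Hi email to something@domain.com\n"
         . "Bye";

$cleaned = removeUsernameLines($message);
echo $cleaned, "\n";
```

With the sample message, the three username lines are removed and the email line is kept.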