How can I use pattern matching to find a duplicated string?

8

I have a string that looks like this:

[13:41:25] [100:Devnull]: 01:41:20, 13:41:21> |Hunit:Player-3693-07420299:DevnullYour [Chimaera Shot] hit |Hunit:Creature-0-3693-1116-3-87318-0000881AC4:Dungeoneer's Training DummyDungeoneer's Training Dummy 33265 Nature. 

In case you're curious, it comes from World of Warcraft.

I want to end up with this:

[13:41:25] [100:Devnull]: 01:41:20, 13:41:21> Your [Chimaera Shot] hit Dungeoneer's Training Dummy 33265 Nature. 

As you may notice, "Dungeoneer's Training Dummy" is printed twice. I have managed to get rid of the first "|Hunit" part with something like this:
str = "[13:41:25] [100:Devnull]: 01:41:20, 13:41:21> |Hunit:Player-3693-07420299:DevnullYour [Chimaera Shot] hit |Hunit:Creature-0-3693-1116-3-87318-0000881AC4:Dungeoneer's Training DummyDungeoneer's Training Dummy 33265 Nature."
str = string.gsub(str, "|Hunit:.*:.*Your", "Your")

This returns:

print(str)    -- => [13:41:25] [100:Devnull]: 01:41:20, 13:41:21> Your [Chimaera Shot] hit |Hunit:Creature-0-3693-1116-3-87318-0000881AC4:Dungeoneer's Training DummyDungeoneer's Training Dummy 33265 Nature.

I then add a second gsub:

str = string.gsub(str, "|Hunit:.*:", "")
print(str) -- => [13:41:25] [100:Devnull]: 01:41:20, 13:41:21> Your [Chimaera Shot] hit Dungeoneer's Training DummyDungeoneer's Training Dummy 33265 Nature.

But the "Dungeoneer's Training Dummy" string is still duplicated, obviously.

How can I get rid of the duplicated string? It could be anything: in this case it is "Dungeoneer's Training Dummy", but it could be the name of any other target.

1 Answer

5
You can try something like this:
str = "[13:41:25] [100:Devnull]: 01:41:20, 13:41:21> Your [Chimaera Shot] hit Dungeoneer's Training DummyDungeoneer's Training Dummy 33265 Nature."
-- Find a substring that starts with "hit", continues with one or more
-- non-digit characters, and ends with one or more digits followed by
-- the rest of the line.
-- These three parts are "captured" and passed to the replacement function;
-- whatever the function returns replaces the matched text in the string.
str = str:gsub("(hit%s+)([^%d]+)(%d+.+)", function(s1, s2, s3)
    local s = s2:gsub("%s+$","") -- drop trailing spaces
    if #s % 2 == 0 -- has an even number of characters
    and s:sub(1, #s / 2) -- first half (Lua strings are 1-indexed)
    == -- is the same
    s:sub(#s / 2 + 1) -- as the second half
    then -- keep only one copy of the name
      return s1..s:sub(#s / 2 + 1)..' '..s3
    else -- not a duplicate: leave the match unchanged
      return s1..s2..s3
    end
  end)
print(str)

This prints: [13:41:25] [100:Devnull]: 01:41:20, 13:41:21> Your [Chimaera Shot] hit Dungeoneer's Training Dummy 33265 Nature.

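The code tries to extract the target's name and checks whether that name is an exact duplicate of itself. If the check fails, the original string is returned unchanged.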
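
The half-comparison at the core of the code above can also be pulled out into a small helper, which makes it easier to try on other inputs. This is only a sketch: the names collapse_doubled and dedupe_target are made up for illustration, and it assumes, as the sample line suggests, that the target name contains no digits and is followed by the damage number.

```lua
-- Hypothetical helper (not part of the original answer):
-- if `s` is some string written twice in a row, return one copy;
-- otherwise return `s` unchanged.
local function collapse_doubled(s)
  local half = math.floor(#s / 2)
  if #s > 0 and #s % 2 == 0 and s:sub(1, half) == s:sub(half + 1) then
    return s:sub(1, half)
  end
  return s
end

-- Hypothetical wrapper: applies the helper to the text between "hit"
-- and the first digit. Assumes the name itself contains no digits.
local function dedupe_target(line)
  -- the outer parentheses drop gsub's second return value (the match count)
  return (line:gsub("(hit%s+)([^%d]+)(%d+.+)", function(prefix, name, rest)
    local trimmed = name:gsub("%s+$", "")   -- name without trailing spaces
    local trailing = name:sub(#trimmed + 1) -- the spaces that were trimmed
    return prefix .. collapse_doubled(trimmed) .. trailing .. rest
  end))
end

local line = "[13:41:25] [100:Devnull]: 01:41:20, 13:41:21> Your [Chimaera Shot] hit Dungeoneer's Training DummyDungeoneer's Training Dummy 33265 Nature."
print(dedupe_target(line))
```

One caveat of this approach: a name that happens to be a doubled string on its own would be halved if it reached the helper only once, so it is best applied only to lines known to carry the duplicated form.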


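I still need the "33265 Nature." at the end, though. Would you mind explaining what is going on in the function you used, if it's not too much trouble? - dev404
After removing "33265 Nature", the function checks whether the remaining string can be split into two halves and whether those two halves are identical. I'll add more comments... - Paul Kulchenko
Updated the solution to keep "33265 Nature" in it. - Paul Kulchenko
Oh, I see. An even number of characters is a hint that the string is duplicated. Clever. Thanks a lot! - dev404
Right; I thought [^] would be useful to show in case someone isn't familiar with it, but %D is definitely shorter. - Paul Kulchenko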
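
For readers unfamiliar with the two spellings discussed in the last comment: in Lua patterns %D is the complement of %d, so it matches the same characters as [^%d]. A minimal sketch showing that swapping one for the other in the answer's pattern yields the same captures (the sample string is shortened here purely for illustration):

```lua
-- Shortened, made-up sample in the same shape as the log line
local sample = "hit Training DummyTraining Dummy 33265 Nature."

-- string.match returns the captures of the first match
local a1, a2, a3 = sample:match("(hit%s+)([^%d]+)(%d+.+)")
local b1, b2, b3 = sample:match("(hit%s+)(%D+)(%d+.+)")

-- Both patterns split the sample at the same places
print(a1 == b1, a2 == b2, a3 == b3)
```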
