Regex to remove noise spikes from a bit signal

Question

5

I'm working with RF signals that sometimes contain noise spikes.

The input data looks like this:

00000001111100011110001111100001110000001000001111000000111001111000

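Before parsing the signal data, I need to remove the spike bits, i.e. runs of consecutive 0s or 1s that are shorter than 3 bits (in this example).

So basically I need to match the parenthesized parts of 0000000111110001111000111110000111000000(1)000001111000000111(00)1111000.
After matching, I replace each spike with the bit that precedes it, so the clean signal looks like this: 00000001111100011110001111100001110000000000001111000000111111111000

So far I have done this with two separate regular expressions: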

self.re_one_spikes = re.compile("(?:[^1])(?P<spike>1{1,%d})(?=[^1])" % (self._SHORTEST_BIT_LEN - 1))
self.re_zero_spikes = re.compile("(?:[^0])(?P<spike>0{1,%d})(?=[^0])" % (self._SHORTEST_BIT_LEN - 1))

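Then I iterate over the matches and replace them.

How can I do this with a single regex? And can a regex substitution handle matches of different sizes?
I tried the following, but it did not work: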

re.compile("(?![\1])([01]{1,2})(?![\1])")

- joaoricardo000

2

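So you basically want to replace any isolated single or double 1 or 0 with 0 or 1, respectively? - tobias_k

Also, what is wrong with using two regexes? If you don't like the code duplication (fair enough), you could use a single template string and substitute 0 and 1 into it. - tobias_k

What should be replaced if the string is 00000110011001100111111? - tobias_k

1

@tobias_k ... with my code, after 3 passes, the result is '00000000000111111111111' :P - Joran Beasley

1

In @tobias_k's example the signal is probably already corrupted :) - joaoricardo000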
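
The template-string idea from tobias_k's comment could look roughly like the sketch below. The names `TEMPLATE`, `build_spike_regex`, and `remove_spikes` are made up for illustration, and `SHORTEST_BIT_LEN` only mirrors the `_SHORTEST_BIT_LEN` attribute from the question. It uses lookbehind/lookahead instead of the question's consuming `[^1]` group, so that is an assumption too.

```python
import re

SHORTEST_BIT_LEN = 3  # assumed value, mirrors _SHORTEST_BIT_LEN in the question

# One template for both polarities: {bit} is the spike bit, {other} the bit around it.
# The doubled braces produce a literal {1,n} quantifier after str.format().
TEMPLATE = r"(?<={other}){bit}{{1,{n}}}(?={other})"


def build_spike_regex(bit, other, shortest=SHORTEST_BIT_LEN):
    """Compile the spike pattern for one polarity from the shared template."""
    return re.compile(TEMPLATE.format(bit=bit, other=other, n=shortest - 1))


RE_ONE_SPIKES = build_spike_regex("1", "0")
RE_ZERO_SPIKES = build_spike_regex("0", "1")


def remove_spikes(signal):
    """Overwrite short runs of 1s with 0s, then short runs of 0s with 1s."""
    signal = RE_ONE_SPIKES.sub(lambda m: "0" * len(m.group(0)), signal)
    signal = RE_ZERO_SPIKES.sub(lambda m: "1" * len(m.group(0)), signal)
    return signal


raw = "00000001111100011110001111100001110000001000001111000000111001111000"
print(remove_spikes(raw))
```

Because the two substitutions run one after the other, the result can differ from a single simultaneous pass on heavily corrupted input; on the sample signal from the question both give the same clean string.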

4 Answers

1

To match both cases [01] in a single regex, just use this: (?<=([01]))(?:(?!\1)[01]){1,2}(?=\1) In expanded form:

 (?<=                 # Lookbehind for 0 or 1
      ( [01] )             # (1), Capture behind 0 or 1
 )
 (?:                  # Match spike, one to %d times in length
      (?! \1 )             # Cannot be the 0 or 1 from lookbehind
      [01] 
 ){1,2}
 (?= \1 )             # Lookahead, can only be 0 or 1 from capture (1)

Replace each match with $1 repeated once per matched character (that is, the length of group 0).

Matches:

 **  Grp 0 -  ( pos 40 , len 1 ) 
1  
 **  Grp 1 -  ( pos 39 , len 1 ) 
0  

----------------------------------------

 **  Grp 0 -  ( pos 59 , len 2 ) 
00  
 **  Grp 1 -  ( pos 58 , len 1 ) 
1
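
The replacement rule described above (group 1 repeated for the length of group 0) cannot be written as a plain replacement string, but in Python it can be sketched with a callable passed to `sub`. This is a minimal sketch that assumes Python's `re` module; `build_spike_regex` and `remove_spikes` are illustrative names, not part of the answer.

```python
import re


def build_spike_regex(shortest_bit_len=3):
    """Compile the single-regex spike pattern for runs shorter than shortest_bit_len."""
    return re.compile(
        r"(?<=([01]))(?:(?!\1)[01]){1,%d}(?=\1)" % (shortest_bit_len - 1)
    )


def remove_spikes(signal, shortest_bit_len=3):
    """Replace every spike with the preceding bit, repeated to the spike's length."""
    spike_re = build_spike_regex(shortest_bit_len)
    return spike_re.sub(lambda m: m.group(1) * len(m.group(0)), signal)


raw = "00000001111100011110001111100001110000001000001111000000111001111000"
for m in build_spike_regex().finditer(raw):
    print(m.start(), m.group(0), m.group(1))  # spike position, spike bits, preceding bit
print(remove_spikes(raw))
```

On the sample signal this finds the same two spikes listed above (positions 40 and 59) and produces the clean signal given in the question.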

Benchmark

Regex1:   (?<=([01]))(?:(?!\1)[01]){1,2}(?=\1)
Options:  < none >
Completed iterations:   50  /  50     ( x 1000 )
Matches found per iteration:   2
Elapsed Time:    2.06 s,   2058.02 ms,   2058018 µs


50,000 iterations * 2 matches/iteration = 100,000 matches 

100,000 matches / 2 sec's  =  50,000 matches per second

- user557597

0

Another approach is to skip regex and use replace() instead (in case someone finds it useful in the future):

>>> my_signal = '00000001111100011110001111100001110000001000001111000000111001111000'
>>> my_threshold = 3
>>> for i in range(my_threshold):
...     my_signal = my_signal.replace('0{}0'.format('1'*(i+1)), '0{}0'.format('0'*(i+1)))
... 
>>> my_signal
'00000001111100011110001111100000000000000000001111000000000001111000'
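
Note that the loop above overwrites runs of 1s up to and including `my_threshold` bits and leaves short runs of 0s untouched, which is why the output differs from the clean signal in the question. If the goal is the question's rule (runs strictly shorter than 3, either polarity), one regex-free alternative is `itertools.groupby`. This is a different technique from `replace()`, sketched under the assumption that the first and last runs should be left alone because they have no neighbor on one side; `remove_spikes` and `min_len` are illustrative names.

```python
from itertools import groupby


def remove_spikes(signal, min_len=3):
    """Flip interior runs shorter than min_len so they match their neighbors."""
    # Collapse the signal into (bit, run_length) pairs, e.g. "0011" -> [("0", 2), ("1", 2)].
    runs = [(bit, len(list(group))) for bit, group in groupby(signal)]
    out = []
    for i, (bit, length) in enumerate(runs):
        interior = 0 < i < len(runs) - 1
        if interior and length < min_len:
            # In a binary signal the preceding bit is always the opposite bit.
            bit = "1" if bit == "0" else "0"
        out.append(bit * length)
    return "".join(out)


my_signal = "00000001111100011110001111100001110000001000001111000000111001111000"
print(remove_spikes(my_signal))
```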

- Moinuddin Quadri

0

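import re

def fix_noise(s, noise_thold=3):
    # <before> is the bit preceding the spike; <noise> is a run of the opposite bit,
    # shorter than noise_thold, that is enclosed by <before> bits on both sides.
    pattern=re.compile(r'(?P<before>1|0)(?P<noise>(?<=0)1{1,%d}(?=0)|(?<=1)0{1,%d}(?=1))' % (noise_thold-1, noise_thold-1))
    result = s
    for noise_match in pattern.finditer(s):
        # Each replacement keeps the length unchanged, so offsets found in s stay valid in result.
        beginning = result[:noise_match.start()+1]
        end = result[noise_match.end():]
        replaced = noise_match.group('before')*len(noise_match.group('noise'))
        result = beginning + replaced + end
    return result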

Joran's int(items[0]) indexing idea is really great!

- Verbal_Kint

- Joran Beasley · Accepted Answer

import re
THRESHOLD=3

def fixer(match):
    ones = match.group(0)
    if len(ones) < THRESHOLD: return "0"*len(ones)
    return ones

my_string = '00000001111100011110001111100001110000001000001111000000111001111000'
print(re.sub("(1+)",fixer,my_string))

If you also want to remove the "zero spikes":

def fixer(match):
    items = match.group(0)
    if len(items) < THRESHOLD: return "10"[int(items[0])]*len(items)
    return items

print(re.sub("(1+)|(0+)",fixer,my_string))
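
For heavily corrupted input such as the string tobias_k raised in the comments, one pass can leave new short runs behind, which is why several passes were mentioned there. A minimal sketch of repeating the substitution until the signal stops changing follows. The names `flip_short_run` and `despike_until_stable` and the `max_passes` cap are assumptions added here; the cap exists because some inputs (for example "01") flip back and forth forever.

```python
import re

THRESHOLD = 3


def flip_short_run(match):
    """Invert a run of identical bits if it is shorter than THRESHOLD."""
    run = match.group(0)
    if len(run) < THRESHOLD:
        return "10"[int(run[0])] * len(run)
    return run


def despike_until_stable(signal, max_passes=10):
    """Apply the substitution repeatedly; return (signal, passes that changed it)."""
    for passes in range(max_passes):
        cleaned = re.sub(r"1+|0+", flip_short_run, signal)
        if cleaned == signal:
            return cleaned, passes
        signal = cleaned
    return signal, max_passes


print(despike_until_stable("00000110011001100111111"))
```

With that input the signal stabilizes after three passes at '00000000000111111111111', matching the result quoted in the comments.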