Python regex to search for hex bytes

Question

3

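I'm trying to search a binary file for a sequence of hex values, but I've run into a few problems I can't solve. (1) I'm not sure how to search the whole file and return all matches. Right now I only have f.seek, which just jumps to a place where the value might be, and that is no use. (2) I'd like to return the offset of each match, in either decimal or hex, but I get 0 every time, so I'm not sure what I'm doing wrong.

example.bin: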

AA BB CC DD EE FF AB AC AD AE AF BA BB BC BD BE
BF CA CB CC CD CE CF DA DB DC DD DE DF EA EB EC

Code:

# coding: utf-8
import struct
import re

with open("example.bin", "rb") as f:
    f.seek(30)
    num, = struct.unpack(">H", f.read(2))
hexaPattern = re.compile(r'(0xebec)?')
m = re.search(hexaPattern, hex(num))
if m:
   print "found a match:", m.group(1)
   print " match offset:", m.start()

Is there perhaps a better way to do all of this?

- DIF

How big is the file? - Urban48

The file size can range from 100 KB to 10 MB. - DIF

2 Answers

1

Try:

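import re

with open("example.bin", "rb") as f:
    # Read the whole file and search it for the byte sequence EB EC.
    f1 = re.search(b'\xEB\xEC', f.read())

# re.search returns None when the pattern is not found.
if f1:
    print "found a match:", f1.group()
    print " match offset:", f1.start()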

- Urban48

Thanks, that's almost perfect. Is there a way to make f1.group() display as hex? - DIF

To print f1.group() as hex: print(f1.group().hex()). - qff
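A small sketch of the hex display suggested in the comment above. Note that bytes.hex() exists only in Python 3.5 and later, while the answer's code uses Python 2 print statements; binascii.hexlify is shown as an alternative that is also available on older interpreters. The sample data and variable names are made up for illustration.

```python
import binascii
import re

# Hypothetical sample data standing in for the contents of example.bin.
data = b"\xAA\xBB\xCC\xEB\xEC"

match = re.search(b"\xEB\xEC", data)
if match is not None:
    matched = match.group()
    # bytes.hex() is available in Python 3.5+.
    hex_text = matched.hex()
    # binascii.hexlify returns bytes, so decode it to get a str.
    hex_text_alt = binascii.hexlify(matched).decode("ascii")
    print("found a match:", hex_text)
    print(" match offset:", hex(match.start()))
```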

- 7stud · Accepted Answer

I'm not sure how to search the whole file and return all matches.

I'd like to return the offset in either decimal or hex.

import re

f = open('data.txt', 'wb')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.write('\xAA\xBB\xEB\xEC')
f.close()

f = open('data.txt', 'rb')
data = f.read()
f.close()

pattern = "\xEB\xEC"
regex = re.compile(pattern)

for match_obj in regex.finditer(data):
    offset = match_obj.start()
    print "decimal: {}".format(offset)
    print "hex(): " + hex(offset)
    print 'formatted hex: {:02X} \n'.format(offset)

--output:--
decimal: 2
hex(): 0x2
formatted hex: 02 

decimal: 6
hex(): 0x6
formatted hex: 06 

decimal: 10
hex(): 0xa
formatted hex: 0A 

decimal: 14
hex(): 0xe
formatted hex: 0E 

decimal: 18
hex(): 0x12
formatted hex: 12 

decimal: 22
hex(): 0x16
formatted hex: 16 

decimal: 26
hex(): 0x1a
formatted hex: 1A

Positions in a file use 0-based indexing, just like a list.
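To illustrate the 0-based indexing, the sketch below writes a small file, finds a pattern, and then seeks to the reported offset to read the same bytes back. The temporary file and its contents are assumptions made for the example, and the code targets Python 3 rather than the Python 2 used in the answer.

```python
import os
import re
import tempfile

# Hypothetical file contents: the pattern starts at the third byte (index 2).
payload = b"\xAA\xBB\xEB\xEC\xCC"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    with open(path, "rb") as f:
        data = f.read()
        offset = re.search(b"\xEB\xEC", data).start()
        # The match offset can be passed straight to seek().
        f.seek(offset)
        reread = f.read(2)
finally:
    os.remove(path)

print(offset, reread)
```

Because the offset reported by the match and the position used by seek() count from the same origin, no adjustment is needed between the two.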

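re.finditer(pattern, string, flags=0)
Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found.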
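Two details of finditer matter when searching raw bytes; the following Python 3 sketch shows both with made-up data. Matches never overlap, and byte values that happen to be regex metacharacters (0x2E is ".", for instance) should be passed through re.escape if they are meant literally.

```python
import re

# Non-overlapping: four identical bytes hold only two matches of a 2-byte pattern.
data = b"\xAA\xAA\xAA\xAA"
offsets = [m.start() for m in re.finditer(b"\xAA\xAA", data)]
print(offsets)  # [0, 2]

# 0x2E is the "." metacharacter, so an unescaped pattern matches any byte there.
data2 = b"\x01\x41\x01\x2E"
needle = b"\x01\x2E"
unescaped = [m.start() for m in re.finditer(needle, data2)]
escaped = [m.start() for m in re.finditer(re.escape(needle), data2)]
print(unescaped)  # [0, 2]
print(escaped)    # [2]
```

The pattern in the answer (EB EC) contains no metacharacters, so it behaves the same with or without escaping; the escape step only matters for arbitrary byte sequences.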

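Match objects support the following methods and attributes:
start([group])
end([group])
Return the indices of the start and end of the substring matched by group; group defaults to zero (meaning the whole matched substring).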
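As a brief illustration of the group argument, the sketch below uses a capture group so that start() and end() can be asked about either the whole match or just the captured part. The data and the pattern are hypothetical, and the code is written for Python 3.

```python
import re

data = b"\x00\xAA\xBB\xEB\xEC\xFF"
# Group 1 captures only the trailing two bytes of the four-byte match.
m = re.search(b"\xAA\xBB(\xEB\xEC)", data)

whole = (m.start(), m.end())     # span of the entire match (group 0)
inner = (m.start(1), m.end(1))   # span of capture group 1
print(whole, inner)

# The end index is exclusive, so slicing reproduces the matched bytes.
assert data[m.start(1):m.end(1)] == m.group(1)
```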

https://docs.python.org/2/library/re.html