Python equivalent of the Unix "strings" utility

Question

17

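I'm trying to write a script that extracts strings from an executable binary and saves them in a file. Since the strings themselves may contain newlines, a newline-separated file is not an option. That also means the Unix "strings" utility isn't an option, because it just prints all the strings separated by newlines, so there is no way to tell from its output which strings contained newlines. So I'm looking for a Python function or library that does the same job as "strings" but gives me the strings as variables, so that I can avoid the newline issue. Thanks!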

- joshlf

Possible duplicate of https://dev59.com/kWbWa4cB1Zd3GeqPWWU9 - Fredrik Pihl

3

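@FredrikPihl, that question is about converting between binary and text representations. This one is about extracting strings from binary executables. The terminology overlaps in a confusing way, but they are two different problems. Thanks for the heads-up, though; it would have been nice if it were a duplicate. - joshlf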

You're right, that's the third question I've misread today; I need some sleep :-) - Fredrik Pihl

4 Answers

6

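To quote man strings:

STRINGS(1)                   GNU Development Tools                  STRINGS(1)
NAME
       strings - print the strings of printable characters in files.
[...]
DESCRIPTION
       For each file given, GNU strings prints the printable character sequences that are at least 4 characters long (or the number given with the options below) and are followed by an unprintable character. By default, it only prints the strings from the initialized and loaded sections of object files; for other types of files, it prints the strings from the whole file.

You can achieve a similar result with a regular expression that matches runs of at least 4 printable characters. For example: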

>>> import re

>>> content = "hello,\x02World\x88!"
>>> re.findall("[^\x00-\x1F\x7F-\xFF]{4,}", content)
['hello,', 'World']

Note that this solution requires loading the entire file contents into memory.
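One way to soften the memory requirement is to memory-map the file and run a bytes pattern over the mapping, so the operating system pages data in on demand. The following is only a sketch under a few assumptions: the helper name `iter_strings_mmap` is made up for illustration, the character class `[ -~]` treats only ASCII space through tilde as printable, and the file is non-empty (mapping an empty file raises `ValueError`).

```python
import mmap
import re

# Runs of at least 4 printable ASCII bytes (space through tilde).
PRINTABLE_RUN = re.compile(rb"[ -~]{4,}")

def iter_strings_mmap(path):
    """Yield printable ASCII runs from the file at `path` as bytes objects."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; an empty file cannot be mapped.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for match in PRINTABLE_RUN.finditer(mm):
                yield match.group()
```

Because each match comes back as a separate bytes object, the results can be stored in a list or written out in any format without relying on newlines as delimiters.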

- Sylvain Leroux

2

You can also get the same result with [ -~]{4,}. - user1438233

0

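With the strings command you can change the output separator via --output-separator, so you can use a custom string (one you don't expect to find in the binary) instead of the newline character, and you can use --include-all-whitespace to include newlines in the extracted strings: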

$ strings --include-all-whitespace --output-separator="YOURSEPARATOR" test.bin
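To get the results into Python variables, the command above could be wrapped roughly as follows. This is a sketch, not a tested recipe: it assumes a GNU strings recent enough to support both options is on the PATH, and the helper names `split_strings_output` and `gnu_strings` as well as the separator `<<<SEP>>>` are hypothetical choices for illustration.

```python
import subprocess

def split_strings_output(output: bytes, separator: bytes) -> list:
    """Split raw strings output on the separator, dropping empty pieces."""
    return [piece for piece in output.split(separator) if piece]

def gnu_strings(path: str, separator: str = "<<<SEP>>>", min_len: int = 4) -> list:
    """Run GNU strings on `path` and return the extracted strings as bytes."""
    result = subprocess.run(
        ["strings", "--include-all-whitespace",
         "--output-separator=" + separator, "-n", str(min_len), path],
        stdout=subprocess.PIPE, check=True)
    # The separator must not occur inside any extracted string,
    # otherwise that string would be split in two here.
    return split_strings_output(result.stdout, separator.encode("ascii"))
```

Splitting on the custom separator rather than on newlines is what lets strings containing embedded newlines survive intact.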

- Pablo Rincon Crespo

-1

You can also call strings directly, for example like this:

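import subprocess

def strings(bytestring: bytes, min: int = 10) -> str:
    # Pipe the data through the external strings utility.
    process = subprocess.Popen(
        ["strings", "-n", str(min)],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, stdin=subprocess.PIPE)
    # communicate() writes the input and closes stdin, which avoids a
    # deadlock when the input or output is larger than the pipe buffer.
    output = process.communicate(input=bytestring)[0]
    return output.decode("ascii")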

- TheCharlatan

1

This doesn't really solve the newline problem. - joshlf

Content provided by Stack Overflow.

- Zero Piraeus · Accepted Answer

Here's a generator that yields all the strings of printable characters of length >= min (4 by default) that it finds in filename:

import string

def strings(filename, min=4):
    with open(filename, errors="ignore") as f:  # Python 3.x
    # with open(filename, "rb") as f:           # Python 2.x
        result = ""
        for c in f.read():
            if c in string.printable:
                result += c
                continue
            if len(result) >= min:
                yield result
            result = ""
        if len(result) >= min:  # catch result at EOF
            yield result

You can iterate over it:

for s in strings("something.bin"):
    print(s)  # do something with s

... or store the results in a list:

sl = list(strings("something.bin"))

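I've tested this only briefly, and it seems to give the same output as the Unix strings command for the arbitrary binary file I chose. However, it's pretty naive (for a start, it reads the whole file into memory at once, which may be expensive for large files), and it is very unlikely to approach the performance of the Unix strings command.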
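A possible variation that avoids reading the whole file at once is to open it in binary mode and process it in fixed-size chunks. Binary mode also sidesteps a caveat of text mode with errors="ignore": undecodable bytes are dropped silently, so two runs separated only by such bytes can end up joined. The sketch below is Python 3 only, and the names `strings_chunked`, `min_len`, and `chunk_size` are illustrative assumptions rather than part of the answer above.

```python
import string

# Byte values treated as printable; string.printable also includes
# whitespace such as "\n", so extracted strings may span lines.
PRINTABLE = frozenset(string.printable.encode("ascii"))

def strings_chunked(filename, min_len=4, chunk_size=64 * 1024):
    """Yield printable runs of at least `min_len` bytes, decoded as ASCII."""
    run = bytearray()
    with open(filename, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            for byte in chunk:  # iterating over bytes yields ints
                if byte in PRINTABLE:
                    run.append(byte)
                    continue
                if len(run) >= min_len:
                    yield run.decode("ascii")
                run.clear()
    if len(run) >= min_len:  # flush a run that reaches end of file
        yield run.decode("ascii")
```

The current run is kept across chunk boundaries, so a string that straddles two chunks is still returned in one piece. The per-byte loop remains slow compared with the strings command.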