Replace a sequence of bytes in a binary file

16

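What is the best way to replace a sequence of bytes in a binary file with other bytes of the same length? The binary files are fairly large, about 50 MB, and should not be loaded into memory all at once.

Update: I do not know the location of the bytes that need to be replaced. I need to find them first.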


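Open the file in write mode, seek to the position of the old bytes, and write the new bytes. - Marc B
How do you know the exact position of the bytes to modify? Is it a fixed offset? - Dirk Vollmar
4 Answers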

24

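Assuming you're trying to replace a known section of the file:

  • Open a FileStream with read/write access
  • Seek to the right position
  • Overwrite the existing data

Sample code: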

public static void ReplaceData(string filename, int position, byte[] data)
{
    using (Stream stream = File.Open(filename, FileMode.Open))
    {
        stream.Position = position;
        stream.Write(data, 0, data.Length);
    }
}
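If you want to do the binary equivalent of string.Replace (e.g. "always replace the bytes { 51, 20, 34 } with { 20, 35, 15 }"), that is rather harder. A rough outline:
  • Allocate a buffer at least as large as the data you're looking for
  • Read repeatedly into the buffer, scanning it for the target data
  • If you find a match, seek back to the right position (e.g. stream.Position -= buffer.Length - indexWithinBuffer;) and overwrite the data
That sounds simple... but the tricky part is when the target data starts near the end of the buffer. You need to remember every possible partial match and how far each one has matched so far, so that if the next buffer continues the match, you can detect it.
There are probably ways to avoid this trickiness, but I wouldn't like to try to come up with them offhand :)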
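One well-known way to carry "how far have I matched so far" across buffer reads is the Knuth-Morris-Pratt failure table. This is a swapped-in technique, not something the answer above spells out, and the names StreamingSearch, FindAll and bufferSize are made up for illustration. A minimal sketch, assuming a seekable stream and a non-empty pattern:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class StreamingSearch
{
    // fail[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    private static int[] BuildFailureTable(byte[] pattern)
    {
        int[] fail = new int[pattern.Length];
        int k = 0;
        for (int i = 1; i < pattern.Length; i++)
        {
            while (k > 0 && pattern[i] != pattern[k]) k = fail[k - 1];
            if (pattern[i] == pattern[k]) k++;
            fail[i] = k;
        }
        return fail;
    }

    // Yields the absolute offset of every occurrence of pattern,
    // including overlapping ones, reading bufferSize bytes at a time.
    public static IEnumerable<long> FindAll(Stream stream, byte[] pattern, int bufferSize = 64 * 1024)
    {
        if (pattern.Length == 0)
            throw new ArgumentException("Pattern must not be empty.", nameof(pattern));

        int[] fail = BuildFailureTable(pattern);
        byte[] buffer = new byte[bufferSize];
        int matched = 0;                   // pattern bytes matched so far; survives across reads
        long position = stream.Position;   // file offset of the byte being examined
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++, position++)
            {
                byte b = buffer[i];
                // On a mismatch, fall back to the longest prefix that still fits.
                while (matched > 0 && b != pattern[matched]) matched = fail[matched - 1];
                if (b == pattern[matched]) matched++;
                if (matched == pattern.Length)
                {
                    yield return position - pattern.Length + 1;
                    matched = fail[matched - 1];
                }
            }
        }
    }
}
```

Because the only state kept between reads is the matched counter, a match that straddles two buffers needs no special handling. If you then overwrite the matches in the same stream, collect the offsets into a list first, since seeking while the search is still enumerating would disturb its read position.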
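Edit: Okay, I have an idea that might help...
  • Keep a buffer that is at least twice as large as you need
  • Repeatedly:
    • Copy the second half of the buffer to the first half
    • Fill the second half of the buffer from the file
    • Search through the whole buffer for the data you're looking for
That way, if the data is present, at some point it will lie completely within the buffer. You need to keep track of where the stream is in order to get back to the right position, but I think this should work. It gets trickier if you're trying to find all matches, but at least finding the first match should be reasonably simple...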
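The two-halves idea could be sketched roughly as follows. This is only an illustration under some assumptions: the names InPlaceReplace, FindFirst, ReplaceFirst and halfSize are hypothetical, the pattern is non-empty, the stream is seekable, and the replacement has the same length as the data being replaced.

```csharp
using System;
using System.IO;

public static class InPlaceReplace
{
    // Returns the absolute offset of the first occurrence of pattern, or -1.
    public static long FindFirst(Stream stream, byte[] pattern, int halfSize = 64 * 1024)
    {
        if (pattern.Length == 0)
            throw new ArgumentException("Pattern must not be empty.", nameof(pattern));

        int half = Math.Max(halfSize, pattern.Length);  // each half must hold a whole pattern
        byte[] buffer = new byte[2 * half];
        int carry = 0;                        // bytes kept from the previous round
        long baseOffset = stream.Position;    // file offset of buffer[0]

        while (true)
        {
            // Fill the space after the carried-over bytes; Read may return less than asked.
            int read = 0;
            while (read < half)
            {
                int n = stream.Read(buffer, carry + read, half - read);
                if (n == 0) break;
                read += n;
            }
            int total = carry + read;

            int index = IndexOf(buffer, total, pattern);
            if (index >= 0) return baseOffset + index;
            if (read == 0) return -1;         // end of file, nothing found

            // Move the most recent bytes to the front so that a match
            // straddling the boundary is seen whole in the next round.
            int keep = Math.Min(total, half);
            Array.Copy(buffer, total - keep, buffer, 0, keep);
            baseOffset += total - keep;
            carry = keep;
        }
    }

    // Naive search within the first 'length' bytes of buffer.
    private static int IndexOf(byte[] buffer, int length, byte[] pattern)
    {
        for (int i = 0; i + pattern.Length <= length; i++)
        {
            int j = 0;
            while (j < pattern.Length && buffer[i + j] == pattern[j]) j++;
            if (j == pattern.Length) return i;
        }
        return -1;
    }

    // Overwrites the first occurrence of oldBytes with newBytes (same length) in place.
    public static bool ReplaceFirst(string path, byte[] oldBytes, byte[] newBytes)
    {
        if (oldBytes.Length != newBytes.Length)
            throw new ArgumentException("Replacement must have the same length.");

        using (FileStream stream = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            long offset = FindFirst(stream, oldBytes);
            if (offset < 0) return false;
            stream.Position = offset;
            stream.Write(newBytes, 0, newBytes.Length);
            return true;
        }
    }
}
```

Tracking baseOffset, the file offset of the start of the buffer, is what makes it possible to seek back to the match without doing arithmetic on stream.Position after a short read.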

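@Tomas: Okay, I've given a rough description of what you should do. - Jon Skeet
@Tomas: As I said, by remembering the possible matches, i.e. that you have detected that the end of the block contains the start of the data you're looking for. - Jon Skeet
@Tomas: See my next edit. I've thought of a scheme that shouldn't be too hard to implement. - Jon Skeet
@Jon Thanks, I get the point. I will test it and mark the question as answered. - Tomas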
Thanks again for the idea. I have built this approach; you can find it at http://stackoverflow.com/questions/6536400/replace-sequence-of-strings-in-binary-file. - Tomas

6
My solution:
    /// <summary>
    /// Copy data from one file to another, replacing the search term, ignoring case.
    /// </summary>
    /// <param name="originalFile"></param>
    /// <param name="outputFile"></param>
    /// <param name="searchTerm"></param>
    /// <param name="replaceTerm"></param>
    private static void ReplaceTextInBinaryFile(string originalFile, string outputFile, string searchTerm, string replaceTerm)
    {
        byte b;
        //UpperCase bytes to search
        byte[] searchBytes = Encoding.UTF8.GetBytes(searchTerm.ToUpper());
        //LowerCase bytes to search
        byte[] searchBytesLower = Encoding.UTF8.GetBytes(searchTerm.ToLower());
        //Temporary bytes during found loop
        byte[] bytesToAdd = new byte[searchBytes.Length];
        //Search length
        int searchBytesLength = searchBytes.Length;
        //First Upper char
        byte searchByte0 = searchBytes[0];
        //First Lower char
        byte searchByte0Lower = searchBytesLower[0];
        //Replace with bytes
        byte[] replaceBytes = Encoding.UTF8.GetBytes(replaceTerm);
        int counter = 0;
        using (FileStream inputStream = File.OpenRead(originalFile)) {
            //input length
            long srcLength = inputStream.Length;
            using (BinaryReader inputReader = new BinaryReader(inputStream)) {
                // File.Create truncates an existing output file; File.OpenWrite would leave stale trailing bytes.
                using (FileStream outputStream = File.Create(outputFile)) {
                    using (BinaryWriter outputWriter = new BinaryWriter(outputStream)) {
                        for (int nSrc = 0; nSrc < srcLength; ++nSrc)
                            //first byte
                            if ((b = inputReader.ReadByte()) == searchByte0
                                || b == searchByte0Lower) {
                                bytesToAdd[0] = b;
                                int nSearch = 1;
                                //next bytes
                                for (; nSearch < searchBytesLength; ++nSearch)
                                    //get byte, save it and test
                                    if ((b = bytesToAdd[nSearch] = inputReader.ReadByte()) != searchBytes[nSearch]
                                        && b != searchBytesLower[nSearch]) {
                                        break;//fail
                                    }
                                    //Avoid overflow. No need, in my case, because no chance to see searchTerm at the end.
                                    //else if (nSrc + nSearch >= srcLength)
                                    //    break;

                                if (nSearch == searchBytesLength) {
                                    //success
                                    ++counter;
                                    outputWriter.Write(replaceBytes);
                                    nSrc += nSearch - 1;
                                }
                                else {
                                    //failed, add saved bytes
                                    outputWriter.Write(bytesToAdd, 0, nSearch + 1);
                                    nSrc += nSearch;
                                }
                            }
                            else
                                outputWriter.Write(b);
                    }
                }
            }
        }
        Console.WriteLine("ReplaceTextInBinaryFile.counter = " + counter);
    }

Works well for me. - quilkin

5
You can use my BinaryUtility to search for and replace one or more byte sequences without loading the whole file into memory, like this:
var searchAndReplace = new List<Tuple<byte[], byte[]>>() 
{
    Tuple.Create(
        BitConverter.GetBytes((UInt32)0xDEADBEEF),
        BitConverter.GetBytes((UInt32)0x01234567)),
    Tuple.Create(
        BitConverter.GetBytes((UInt32)0xAABBCCDD),
        BitConverter.GetBytes((UInt16)0xAFFE)),
};
using(var reader =
    new BinaryReader(new FileStream(@"C:\temp\data.bin", FileMode.Open)))
{
    using(var writer =
        new BinaryWriter(new FileStream(@"C:\temp\result.bin", FileMode.Create)))
    {
        BinaryUtility.Replace(reader, writer, searchAndReplace);
    }
}

BinaryUtility code:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class BinaryUtility
{
    public static IEnumerable<byte> GetByteStream(BinaryReader reader)
    {
        const int bufferSize = 1024;
        byte[] buffer;
        do
        {
            buffer = reader.ReadBytes(bufferSize);
            foreach (var d in buffer) { yield return d; }
        } while (bufferSize == buffer.Length);
    }

    public static void Replace(BinaryReader reader, BinaryWriter writer, IEnumerable<Tuple<byte[], byte[]>> searchAndReplace)
    {
        foreach (byte d in Replace(GetByteStream(reader), searchAndReplace)) { writer.Write(d); }
    }

    public static IEnumerable<byte> Replace(IEnumerable<byte> source, IEnumerable<Tuple<byte[], byte[]>> searchAndReplace)
    {
        foreach (var s in searchAndReplace)
        {
            source = Replace(source, s.Item1, s.Item2);
        }
        return source;
    }

    public static IEnumerable<byte> Replace(IEnumerable<byte> input, IEnumerable<byte> from, IEnumerable<byte> to)
    {
        var fromEnumerator = from.GetEnumerator();
        fromEnumerator.MoveNext();
        int match = 0;
        foreach (var data in input)
        {
            if (data == fromEnumerator.Current)
            {
                match++;
                if (fromEnumerator.MoveNext()) { continue; }
                foreach (byte d in to) { yield return d; }
                match = 0;
                fromEnumerator.Reset();
                fromEnumerator.MoveNext();
                continue;
            }
            if (0 != match)
            {
                foreach (byte d in from.Take(match)) { yield return d; }
                match = 0;
                fromEnumerator.Reset();
                fromEnumerator.MoveNext();
            }
            yield return data;
        }
        if (0 != match)
        {
            foreach (byte d in from.Take(match)) { yield return d; }
        }
    }
}

2
    public static void BinaryReplace(string sourceFile, byte[] sourceSeq, string targetFile, byte[] targetSeq)
    {
        FileStream sourceStream = File.OpenRead(sourceFile);
        FileStream targetStream = File.Create(targetFile);

        try
        {
            int b;
            long foundSeqOffset = -1;
            int searchByteCursor = 0;

            while ((b=sourceStream.ReadByte()) != -1)
            {
                if (sourceSeq[searchByteCursor] == b)
                {
                    if (searchByteCursor == sourceSeq.Length - 1)
                    {
                        targetStream.Write(targetSeq, 0, targetSeq.Length);
                        searchByteCursor = 0;
                        foundSeqOffset = -1;
                    }
                    else 
                    {
                        if (searchByteCursor == 0)
                        {
                            foundSeqOffset = sourceStream.Position - 1;
                        }

                        ++searchByteCursor;
                    }
                }
                else
                {
                    if (searchByteCursor == 0)
                    {
                        targetStream.WriteByte((byte) b);
                    }
                    else
                    {
                        targetStream.WriteByte(sourceSeq[0]);
                        sourceStream.Position = foundSeqOffset + 1;
                        searchByteCursor = 0;
                        foundSeqOffset = -1;
                    }
                }
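            }

            // If the file ends in the middle of a partial match, the held-back
            // bytes have not been written yet, so copy them through unchanged.
            if (searchByteCursor > 0)
            {
                targetStream.Write(sourceSeq, 0, searchByteCursor);
            }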
        }
        finally
        {
            sourceStream.Dispose();
            targetStream.Dispose();
        }
    }
