Copying part of a byte[] array into a PdfReader

Question


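This is a continuation of my ongoing effort to reduce the memory load described in "How do you refill a byte array using SqlDataReader?".

So I have a byte array of a fixed size, for example new byte[400000]. Into this array I will place PDF files of varying sizes (all smaller than 400000 bytes).

The pseudocode is: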

public void Run()
{
    byte[] fileRetrievedFromDatabase = new byte[400000];
    foreach (var document in documentArray)
    {
        // Refill the file with data from the database
        var currentDocumentSize = PopulateFileWithPDFDataFromDatabase(fileRetrievedFromDatabase);

        var reader = new iTextSharp.text.pdf.PdfReader(fileRetrievedFromDatabase.Take(currentDocumentSize).ToArray());
        int pageCount = reader.NumberOfPages;
        // DO ADDITIONAL WORK
    } 
}

private int PopulateFileWithPDFDataFromDatabase(byte[] fileRetrievedFromDatabase)
{
    // DataAccessCode Goes here
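    int documentSize = 0;
    long startIndex = 0;                    // Offset of the next read within the BLOB.
    long retval;                            // Number of bytes returned by GetBytes.
    int bufferSize = 100;                   // Size of the BLOB buffer.
    byte[] outbyte = new byte[bufferSize];  // The BLOB byte[] buffer to be filled by GetBytes.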

    myReader = logoCMD.ExecuteReader(CommandBehavior.SequentialAccess);

    Array.Clear(fileRetrievedFromDatabase, 0, fileRetrievedFromDatabase.Length);

    if (myReader == null)
    {
        return 0;  // The method returns int, so report that nothing was read.
    }

    while (myReader.Read())
    {
        // Passing a null buffer makes GetBytes return the total length of the field (as a long).
        documentSize = (int)myReader.GetBytes(0, 0, null, 0, 0);

        // Reset the starting byte for the new BLOB.
        startIndex = 0;

        // Read the bytes into outbyte[] and retain the number of bytes returned.
        retval = myReader.GetBytes(0, startIndex, outbyte, 0, bufferSize);

        // Continue reading and writing while there are bytes beyond the size of the buffer.
        while (retval == bufferSize)
        {
            Array.Copy(outbyte, 0, fileRetrievedFromDatabase, startIndex, retval);

            // Reposition the start index to the end of the last buffer and fill the buffer.
            startIndex += retval;
            retval = myReader.GetBytes(0, startIndex, outbyte, 0, bufferSize);
        }
    }

    return documentSize;
}

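The problem with the above code is that I keep getting a "Rebuild trailer not found. Original Error: PDF startxref not found" error when I try to access the PDF reader. I believe this is because the byte array is too long and has trailing zeros. But I am using the fixed byte array precisely so that I don't keep building new objects on the LOH, so I need to keep it.

So how do I get just the piece of the array that I need and send it to the PdfReader?

Update:

So I looked at the source and realized that some variable names left over from my actual code were confusing. In each iteration of the loop I am basically reusing the fileRetrievedFromDatabase object. Since it is passed by reference, it gets cleared (set to all zeros) and then filled in PopulateFileWithPDFDataFromDatabase. This object is then used to create a new PDF.

If I didn't do it this way, a new large byte array would be created in every iteration, the Large Object Heap would fill up, and eventually an OutOfMemory exception would be thrown.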

- Cyfer13

2 Answers

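Apparently, the way the while loop is currently structured, the data from the last iteration was never copied: the loop only runs while retval == bufferSize, so the final, shorter chunk is read but skipped. The following needed to be added after the loop: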

if (outbyte != null && outbyte.Length > 0 && retval > 0)
{
    Array.Copy(outbyte, 0, currentDocument.Data, startIndex, retval);
}

Now it works, though I definitely need to refactor it.
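One possible shape for that refactoring, as a minimal sketch rather than the poster's actual code: copy after every read and loop until a read returns nothing, so the final partial chunk needs no special case. The names BlobCopy and FillBuffer are hypothetical, and a Stream stands in for the SqlDataReader so the sketch runs without a database.

```csharp
using System;
using System.IO;

public static class BlobCopy
{
    // Hypothetical helper: copies all of source into the reusable destination
    // buffer in chunkSize pieces and returns the number of bytes copied.
    // Assumption: a Stream replaces SqlDataReader.GetBytes for illustration.
    public static int FillBuffer(Stream source, byte[] destination, int chunkSize = 100)
    {
        byte[] chunk = new byte[chunkSize];
        int written = 0;
        int read;

        // Loop on "read > 0" instead of "read == chunkSize", so the last,
        // shorter chunk is copied like every other chunk.
        while ((read = source.Read(chunk, 0, chunk.Length)) > 0)
        {
            if (written + read > destination.Length)
            {
                throw new InvalidOperationException("Document is larger than the reusable buffer.");
            }

            Array.Copy(chunk, 0, destination, written, read);
            written += read;
        }

        // Zero only the stale tail left behind by a previous, longer document.
        Array.Clear(destination, written, destination.Length - written);
        return written;
    }
}
```

With SqlDataReader the same loop condition would apply to the value returned by GetBytes; the returned count is what tells the caller how much of the buffer is valid.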

- Cyfer13

And as a bonus, PdfReader ignores zeros at the end of the byte array, if there are any. - Cyfer13
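If relying on the trailing zeros being ignored feels fragile, one alternative is to expose only the valid prefix of the reusable buffer as a stream, which avoids the extra array that Take(...).ToArray() allocates. The following is a sketch under assumptions: BufferSlice and AsReadOnlyStream are hypothetical names, and the commented usage assumes the PdfReader constructor that accepts a Stream. Whether PdfReader copies the stream contents internally is not verified here.

```csharp
using System;
using System.IO;

public static class BufferSlice
{
    // Hypothetical helper: wraps the first `count` bytes of `buffer` in a
    // read-only MemoryStream. The stream shares the array; nothing is copied.
    public static MemoryStream AsReadOnlyStream(byte[] buffer, int count)
    {
        return new MemoryStream(buffer, 0, count, writable: false);
    }
}

// Assumed usage with the names from the question:
//   var reader = new iTextSharp.text.pdf.PdfReader(
//       BufferSlice.AsReadOnlyStream(fileRetrievedFromDatabase, currentDocumentSize));
```

The stream's Length equals count, so a consumer reading to the end never sees the padding beyond the document.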


- Kiril · Accepted Answer

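You have at least two options:

1. Treat the buffer as a circular buffer with a start and an end position. You need the index of the last byte written to outbyte, and you must stop reading when you reach that index.
2. Simply read the same number of bytes as there are in your data array, so that you avoid reading the "unknown" parts of the buffer that do not belong to the same file.

In other words, instead of passing bufferSize as the last parameter, use data.Length.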

// Read the bytes into outbyte[] and retain the number of bytes returned.
retval = myReader.GetBytes(0, startIndex, outbyte, 0, data.Length);

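If data has a length of 10 and your outbyte buffer holds 15, you should read only data.Length bytes rather than bufferSize.

However, I still don't see how you are reusing the outbyte "buffer", if that is what you are doing... Based on what you have provided in your answer, I can't work it out. Maybe you can clarify what exactly is being reused.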