如何使用C#中的迭代器反向读取文本文件

102

我需要处理一个大文件,大约有400K行和200M大小。但有时我需要从底部向上处理。如何在这里使用迭代器(yield return)?基本上,我不想把所有内容都加载到内存中,我知道在.NET中使用迭代器更有效率。


一个可能的解决方案是从末尾读取足够大的数据,然后使用String.LastIndexOf向后搜索"\r\n"。 - Sam Hobbs
请查看我在重复问题中的评论:https://dev59.com/YnRC5IYBdhLWcg3wJNgN#33907602 - Xan-Kun Clark-Davis
11个回答

153

如果你使用固定大小的编码(例如ASCII),阅读文本文件的倒叙并不难。但是,如果你使用可变大小的编码(如UTF-8),每次获取数据时都要检查自己是否在字符中间,这将会很棘手。

框架中没有内置此功能,我怀疑你需要为每种可变宽度编码单独进行硬编码。

编辑:这已经被“某种程度上”测试过了-但这并不意味着它没有一些微妙的错误。它使用了MiscUtil中的StreamUtil,但我在底部只包含了必要的(新)方法。哦,还需要重构-你会看到一个相当沉重的方法:

using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Text;

namespace MiscUtil.IO
{
    /// <summary>
    /// Takes an encoding (defaulting to UTF-8) and a function which produces a seekable stream
    /// (or a filename for convenience) and yields lines from the end of the stream backwards.
    /// Only single byte encodings, and UTF-8 and Unicode, are supported. The stream
    /// returned by the function must be seekable.
    /// </summary>
    public sealed class ReverseLineReader : IEnumerable<string>
    {
        /// <summary>
        /// Buffer size to use by default. Classes with internal access can specify
        /// a different buffer size - this is useful for testing.
        /// </summary>
        private const int DefaultBufferSize = 4096;

        /// <summary>
        /// Means of creating a Stream to read from.
        /// </summary>
        private readonly Func<Stream> streamSource;

        /// <summary>
        /// Encoding to use when converting bytes to text
        /// </summary>
        private readonly Encoding encoding;

        /// <summary>
        /// Size of buffer (in bytes) to read each time we read from the
        /// stream. This must be at least as big as the maximum number of
        /// bytes for a single character.
        /// </summary>
        private readonly int bufferSize;

        /// <summary>
        /// Function which, when given a position within a file and a byte, states whether
        /// or not the byte represents the start of a character.
        /// </summary>
        private Func<long,byte,bool> characterStartDetector;

        /// <summary>
        /// Creates a LineReader from a stream source. The delegate is only
        /// called when the enumerator is fetched. UTF-8 is used to decode
        /// the stream into text.
        /// </summary>
        /// <param name="streamSource">Data source</param>
        public ReverseLineReader(Func<Stream> streamSource)
            : this(streamSource, Encoding.UTF8)
        {
        }

        /// <summary>
        /// Creates a LineReader from a filename. The file is only opened
        /// (or even checked for existence) when the enumerator is fetched.
        /// UTF8 is used to decode the file into text.
        /// </summary>
        /// <param name="filename">File to read from</param>
        public ReverseLineReader(string filename)
            : this(filename, Encoding.UTF8)
        {
        }

        /// <summary>
        /// Creates a LineReader from a filename. The file is only opened
        /// (or even checked for existence) when the enumerator is fetched.
        /// </summary>
        /// <param name="filename">File to read from</param>
        /// <param name="encoding">Encoding to use to decode the file into text</param>
        public ReverseLineReader(string filename, Encoding encoding)
            : this(() => File.OpenRead(filename), encoding)
        {
        }

        /// <summary>
        /// Creates a LineReader from a stream source. The delegate is only
        /// called when the enumerator is fetched.
        /// </summary>
        /// <param name="streamSource">Data source</param>
        /// <param name="encoding">Encoding to use to decode the stream into text</param>
        public ReverseLineReader(Func<Stream> streamSource, Encoding encoding)
            : this(streamSource, encoding, DefaultBufferSize)
        {
        }

        internal ReverseLineReader(Func<Stream> streamSource, Encoding encoding, int bufferSize)
        {
            this.streamSource = streamSource;
            this.encoding = encoding;
            this.bufferSize = bufferSize;
            if (encoding.IsSingleByte)
            {
                // For a single byte encoding, every byte is the start (and end) of a character
                characterStartDetector = (pos, data) => true;
            }
            else if (encoding is UnicodeEncoding)
            {
                // For UTF-16, even-numbered positions are the start of a character.
                // TODO: This assumes no surrogate pairs. More work required
                // to handle that.
                characterStartDetector = (pos, data) => (pos & 1) == 0;
            }
            else if (encoding is UTF8Encoding)
            {
                // For UTF-8, bytes with the top bit clear or the second bit set are the start of a character
                // See http://www.cl.cam.ac.uk/~mgk25/unicode.html
                characterStartDetector = (pos, data) => (data & 0x80) == 0 || (data & 0x40) != 0;
            }
            else
            {
                throw new ArgumentException("Only single byte, UTF-8 and Unicode encodings are permitted");
            }
        }

        /// <summary>
        /// Returns the enumerator reading strings backwards. If this method discovers that
        /// the returned stream is either unreadable or unseekable, a NotSupportedException is thrown.
        /// </summary>
        public IEnumerator<string> GetEnumerator()
        {
            Stream stream = streamSource();
            if (!stream.CanSeek)
            {
                stream.Dispose();
                throw new NotSupportedException("Unable to seek within stream");
            }
            if (!stream.CanRead)
            {
                stream.Dispose();
                throw new NotSupportedException("Unable to read within stream");
            }
            return GetEnumeratorImpl(stream);
        }

        private IEnumerator<string> GetEnumeratorImpl(Stream stream)
        {
            try
            {
                long position = stream.Length;

                if (encoding is UnicodeEncoding && (position & 1) != 0)
                {
                    throw new InvalidDataException("UTF-16 encoding provided, but stream has odd length.");
                }

                // Allow up to two bytes for data from the start of the previous
                // read which didn't quite make it as full characters
                byte[] buffer = new byte[bufferSize + 2];
                char[] charBuffer = new char[encoding.GetMaxCharCount(buffer.Length)];
                int leftOverData = 0;
                String previousEnd = null;
                // TextReader doesn't return an empty string if there's line break at the end
                // of the data. Therefore we don't return an empty string if it's our *first*
                // return.
                bool firstYield = true;

                // A line-feed at the start of the previous buffer means we need to swallow
                // the carriage-return at the end of this buffer - hence this needs declaring
                // way up here!
                bool swallowCarriageReturn = false;

                while (position > 0)
                {
                    int bytesToRead = Math.Min(position > int.MaxValue ? bufferSize : (int)position, bufferSize);

                    position -= bytesToRead;
                    stream.Position = position;
                    StreamUtil.ReadExactly(stream, buffer, bytesToRead);
                    // If we haven't read a full buffer, but we had bytes left
                    // over from before, copy them to the end of the buffer
                    if (leftOverData > 0 && bytesToRead != bufferSize)
                    {
                        // Buffer.BlockCopy doesn't document its behaviour with respect
                        // to overlapping data: we *might* just have read 7 bytes instead of
                        // 8, and have two bytes to copy...
                        Array.Copy(buffer, bufferSize, buffer, bytesToRead, leftOverData);
                    }
                    // We've now *effectively* read this much data.
                    bytesToRead += leftOverData;

                    int firstCharPosition = 0;
                    while (!characterStartDetector(position + firstCharPosition, buffer[firstCharPosition]))
                    {
                        firstCharPosition++;
                        // Bad UTF-8 sequences could trigger this. For UTF-8 we should always
                        // see a valid character start in every 3 bytes, and if this is the start of the file
                        // so we've done a short read, we should have the character start
                        // somewhere in the usable buffer.
                        if (firstCharPosition == 3 || firstCharPosition == bytesToRead)
                        {
                            throw new InvalidDataException("Invalid UTF-8 data");
                        }
                    }
                    leftOverData = firstCharPosition;

                    int charsRead = encoding.GetChars(buffer, firstCharPosition, bytesToRead - firstCharPosition, charBuffer, 0);
                    int endExclusive = charsRead;

                    for (int i = charsRead - 1; i >= 0; i--)
                    {
                        char lookingAt = charBuffer[i];
                        if (swallowCarriageReturn)
                        {
                            swallowCarriageReturn = false;
                            if (lookingAt == '\r')
                            {
                                endExclusive--;
                                continue;
                            }
                        }
                        // Anything non-line-breaking, just keep looking backwards
                        if (lookingAt != '\n' && lookingAt != '\r')
                        {
                            continue;
                        }
                        // End of CRLF? Swallow the preceding CR
                        if (lookingAt == '\n')
                        {
                            swallowCarriageReturn = true;
                        }
                        int start = i + 1;
                        string bufferContents = new string(charBuffer, start, endExclusive - start);
                        endExclusive = i;
                        string stringToYield = previousEnd == null ? bufferContents : bufferContents + previousEnd;
                        if (!firstYield || stringToYield.Length != 0)
                        {
                            yield return stringToYield;
                        }
                        firstYield = false;
                        previousEnd = null;
                    }

                    previousEnd = endExclusive == 0 ? null : (new string(charBuffer, 0, endExclusive) + previousEnd);

                    // If we didn't decode the start of the array, put it at the end for next time
                    if (leftOverData != 0)
                    {
                        Buffer.BlockCopy(buffer, 0, buffer, bufferSize, leftOverData);
                    }
                }
                if (leftOverData != 0)
                {
                    // At the start of the final buffer, we had the end of another character.
                    throw new InvalidDataException("Invalid UTF-8 data at start of stream");
                }
                if (firstYield && string.IsNullOrEmpty(previousEnd))
                {
                    yield break;
                }
                yield return previousEnd ?? "";
            }
            finally
            {
                stream.Dispose();
            }
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }
}


// StreamUtil.cs:
public static class StreamUtil
{
    public static void ReadExactly(Stream input, byte[] buffer, int bytesToRead)
    {
        int index = 0;
        while (index < bytesToRead)
        {
            int read = input.Read(buffer, index, bytesToRead - index);
            if (read == 0)
            {
                throw new EndOfStreamException
                    (String.Format("End of stream reached with {0} byte{1} left to read.",
                                   bytesToRead - index,
                                   bytesToRead - index == 1 ? "s" : ""));
            }
            index += read;
        }
    }
}

非常欢迎反馈。这很有趣 :)


6
哇!我知道这段代码已经超过三年了,但是它太棒了!谢谢!(附言:我只是用File.Open(filename,FileMode.Open,FileAccess.Read,FileShare.ReadWrite)替换了File.OpenRead(filename),以便迭代器可以读取已经打开的文件)。 - Stefano
1
@GrimaceofDespair:更多的是“因为那样我就必须为继承进行设计,这将在设计时间和未来的灵活性方面增加非常显著的成本”。通常甚至不清楚如何合理地使用继承 - 在找到这种明确性之前最好禁止它,以我的看法。 - Jon Skeet
3
虽然我喜欢将问题分解为正交方面。一旦你弄清楚了如何在你的情况下打开文件,我期望我的代码对你来说只需工作。 - Jon Skeet
1
据我所知,这两者是相同的。这是BOM(字节顺序标记)字符(不是字节),是的,我目前没有明确尝试删除它。对我来说,什么时候IO例程应该删除它从来不清楚 - 毕竟它作为一个字符存在于文件中。我建议你现在要么确保你的文件不以BOM开头,要么自己替换它(line = line.Replace("\ufeff", ""))。 - Jon Skeet
1
如果任何人想要能够与另一个进程共享文件,例如当您想要读取父进程正在写入的日志文件时,只需将以下代码:File.OpenRead(filename)替换为:new FileStream(filename, FileMode.Open, FileAccess.ReadWrite, FileShare.ReadWrite) - mostlydev
显示剩余32条评论

8

注意:这种方法不起作用(在编辑部分有解释)

您可以使用File.ReadLines获取行迭代器

foreach (var line in File.ReadLines(@"C:\temp\ReverseRead.txt").Reverse())
{
    if (noNeedToReadFurther)
        break;

    // process line here
    Console.WriteLine(line);
}

编辑:

阅读applejacks01的评论后,我进行了一些测试,看起来.Reverse()确实会加载整个文件。

我使用File.ReadLines()打印40MB文件的第一行,控制台应用程序的内存使用量为5MB。然后,使用File.ReadLines().Reverse()打印同一文件的最后一行 - 内存使用量为95MB

结论

无论Reverse()正在做什么,对于读取大文件的底部,它都不是一个好选择


3
我想知道Reverse方法是否确实将整个文件加载到内存中。难道不需要先确定Enumerable的结束点吗?也就是说,在内部,Enumerable会完全枚举文件以创建一个临时数组,然后对该数组执行Reverse操作,再使用yield关键字逐个枚举,从而创建一个新的Enumerable,以相反的顺序迭代。 - applejacks01
3
原回答是错误的,但我将编辑后的答案保留在这里,因为它可以防止其他人使用这种方法。 - Roman Gudkov

8

处理大文件的快速解决方案:在C#中使用PowerShell的Get-Content命令并带上Tail参数。

using System.Management.Automation;

using (PowerShell powerShell = PowerShell.Create())
{
    string lastLine = powerShell.AddCommand("Get-Content")
        .AddParameter("Path", @"c:\a.txt")
        .AddParameter("Tail", 1)
        .Invoke().FirstOrDefault()?.ToString();
}

需要引用:'System.Management.Automation.dll' - 可能在类似'C:\Program Files (x86)\Reference Assemblies\Microsoft\WindowsPowerShell\3.0'的位置

使用PowerShell会产生一定的开销,但对于大文件而言是值得的。


请注意,这只会从文件中获取最后一行,而不是迭代。 - mklement0

4

我也加入了我的解决方案。在阅读了一些答案后,没有一个真正适合我的情况。

我从后往前一字节一字节地读取,直到找到换行符为止,然后将收集的字节作为字符串返回,不使用缓冲

用法:

var reader = new ReverseTextReader(path);
while (!reader.EndOfStream)
{
    Console.WriteLine(reader.ReadLine());  
}

实现:

public class ReverseTextReader
{
    private const int LineFeedLf = 10;
    private const int LineFeedCr = 13;
    private readonly Stream _stream;
    private readonly Encoding _encoding;

    public bool EndOfStream => _stream.Position == 0;

    public ReverseTextReader(Stream stream, Encoding encoding)
    {
        _stream = stream;
        _encoding = encoding;
        _stream.Position = _stream.Length;
    }

    public string ReadLine()
    {
        if (_stream.Position == 0) return null;

        var line = new List<byte>();
        var endOfLine = false;
        while (!endOfLine)
        {
            var b = _stream.ReadByteFromBehind();

            if (b == -1 || b == LineFeedLf)
            {
                endOfLine = true;
            } 
            line.Add(Convert.ToByte(b));
        }

        line.Reverse();
        return _encoding.GetString(line.ToArray());
    }
}

public static class StreamExtensions
{
    public static int ReadByteFromBehind(this Stream stream)
    {
        if (stream.Position == 0) return -1;

        stream.Position = stream.Position - 1;
        var value = stream.ReadByte();
        stream.Position = stream.Position - 1;
        return value;
    }
}

3
要创建一个文件迭代器,您可以这样做:

编辑:

这是我修复后的固定宽度反向文件阅读器版本:

public static IEnumerable<string> readFile()
{
    using (FileStream reader = new FileStream(@"c:\test.txt",FileMode.Open,FileAccess.Read))
    {
        int i=0;
        StringBuilder lineBuffer = new StringBuilder();
        int byteRead;
        while (-i < reader.Length)
        {
            reader.Seek(--i, SeekOrigin.End);
            byteRead = reader.ReadByte();
            if (byteRead == 10 && lineBuffer.Length > 0)
            {
                yield return Reverse(lineBuffer.ToString());
                lineBuffer.Remove(0, lineBuffer.Length);
            }
            lineBuffer.Append((char)byteRead);
        }
        yield return Reverse(lineBuffer.ToString());
        reader.Close();
    }
}

public static string Reverse(string str)
{
    char[] arr = new char[str.Length];
    for (int i = 0; i < str.Length; i++)
        arr[i] = str[str.Length - 1 - i];
    return new string(arr);
}

这对于ISO-8859-1来说现在已经接近正确了,但对于任何其他编码都不是。编码使得这变得非常棘手:( - Jon Skeet
“close to being correct for ISO-8859-1”是什么意思?还缺少什么? - Igor Zelaya
处理方式不太正确,无法匹配"\r" "\n"和"\r\n",后者只被计算为一个换行符。 - Jon Skeet
1
它也从不产生空行 - "a\n\nb" 应该产生 "a", "", "b"。 - Jon Skeet
仅当我找到 '\n' (ASCII 10) 时,才会使 lineBuffer 掉电。你是对的,我没有考虑 '\r'。 - Igor Zelaya
我不确定关于产生空行的问题。当调用StreamReader类的ReadLine()方法时,这是默认行为吗? - Igor Zelaya

2

我将文件逐行放入列表中,然后使用List.Reverse();

        StreamReader objReader = new StreamReader(filename);
        string sLine = "";
        ArrayList arrText = new ArrayList();

        while (sLine != null)
        {
            sLine = objReader.ReadLine();
            if (sLine != null)
                arrText.Add(sLine);
        }
        objReader.Close();


        arrText.Reverse();

        foreach (string sOutput in arrText)
        {

...


5
不适合处理大文件,因为需要将整个文件加载到内存中。而且原帖明确指出,不想完全加载该文件。 - CodesInChaos

1

我知道这篇文章很老,但是因为我找不到如何使用最受欢迎的解决方案,最终我找到了这个: 以下是我在VB和C#中找到的内存成本最低的最佳答案

http://www.blakepell.com/2010-11-29-backward-file-reader-vb-csharp-source

希望我能帮助其他人,因为我花了几个小时才最终找到这篇文章!
[编辑]
以下是C#代码:
//*********************************************************************************************************************************
//
//             Class:  BackwardReader
//      Initial Date:  11/29/2010
//     Last Modified:  11/29/2010
//     Programmer(s):  Original C# Source - the_real_herminator
//                     http://social.msdn.microsoft.com/forums/en-US/csharpgeneral/thread/9acdde1a-03cd-4018-9f87-6e201d8f5d09
//                     VB Converstion - Blake Pell
//
//*********************************************************************************************************************************

using System.Text;
using System.IO;
public class BackwardReader
{
    private string path;
    private FileStream fs = null;
    public BackwardReader(string path)
    {
        this.path = path;
        fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
        fs.Seek(0, SeekOrigin.End);
    }
    public string Readline()
    {
        byte[] line;
        byte[] text = new byte[1];
        long position = 0;
        int count;
        fs.Seek(0, SeekOrigin.Current);
        position = fs.Position;
        //do we have trailing rn?
        if (fs.Length > 1)
        {
            byte[] vagnretur = new byte[2];
            fs.Seek(-2, SeekOrigin.Current);
            fs.Read(vagnretur, 0, 2);
            if (ASCIIEncoding.ASCII.GetString(vagnretur).Equals("rn"))
            {
                //move it back
                fs.Seek(-2, SeekOrigin.Current);
                position = fs.Position;
            }
        }
        while (fs.Position > 0)
        {
            text.Initialize();
            //read one char
            fs.Read(text, 0, 1);
            string asciiText = ASCIIEncoding.ASCII.GetString(text);
            //moveback to the charachter before
            fs.Seek(-2, SeekOrigin.Current);
            if (asciiText.Equals("n"))
            {
                fs.Read(text, 0, 1);
                asciiText = ASCIIEncoding.ASCII.GetString(text);
                if (asciiText.Equals("r"))
                {
                    fs.Seek(1, SeekOrigin.Current);
                    break;
                }
            }
        }
        count = int.Parse((position - fs.Position).ToString());
        line = new byte[count];
        fs.Read(line, 0, count);
        fs.Seek(-count, SeekOrigin.Current);
        return ASCIIEncoding.ASCII.GetString(line);
    }
    public bool SOF
    {
        get
        {
            return fs.Position == 0;
        }
    }
    public void Close()
    {
        fs.Close();
    }
}

你应该在回答中包括链接中的相关部分,并仅添加用于参考的链接,以便即使链接更改,你的回答仍然具有价值。 - Thomas Flinkow
如果您有私有的 IDisposable 字段,您应该也要实现 IDisposable 接口,并正确地处理这些字段的释放。 - thomasb
要使这段代码工作,应该用 "\n" 和 "\r" 替换 "n" 和 "r"。不幸的是,即使修复后这段代码依然运行缓慢,甚至对于较小的文件也是如此,请查看 Jon Person 的解决方案。 - Dima Sherba

1

这里已经有了一些好的答案,这是另一个可以使用的与LINQ兼容的类,它专注于性能和对大文件的支持。它假定"\r\n"为行终止符。

用法:

var reader = new ReverseTextReader(@"C:\Temp\ReverseTest.txt");
while (!reader.EndOfStream)
    Console.WriteLine(reader.ReadLine());

ReverseTextReader类:

/// <summary>
/// Reads a text file backwards, line-by-line.
/// </summary>
/// <remarks>This class uses file seeking to read a text file of any size in reverse order.  This
/// is useful for needs such as reading a log file newest-entries first.</remarks>
public sealed class ReverseTextReader : IEnumerable<string>
{
    private const int BufferSize = 16384;   // The number of bytes read from the uderlying stream.
    private readonly Stream _stream;        // Stores the stream feeding data into this reader
    private readonly Encoding _encoding;    // Stores the encoding used to process the file
    private byte[] _leftoverBuffer;         // Stores the leftover partial line after processing a buffer
    private readonly Queue<string> _lines;  // Stores the lines parsed from the buffer

    #region Constructors

    /// <summary>
    /// Creates a reader for the specified file.
    /// </summary>
    /// <param name="filePath"></param>
    public ReverseTextReader(string filePath)
        : this(new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read), Encoding.Default)
    { }

    /// <summary>
    /// Creates a reader using the specified stream.
    /// </summary>
    /// <param name="stream"></param>
    public ReverseTextReader(Stream stream)
        : this(stream, Encoding.Default)
    { }

    /// <summary>
    /// Creates a reader using the specified path and encoding.
    /// </summary>
    /// <param name="filePath"></param>
    /// <param name="encoding"></param>
    public ReverseTextReader(string filePath, Encoding encoding)
        : this(new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read), encoding)
    { }

    /// <summary>
    /// Creates a reader using the specified stream and encoding.
    /// </summary>
    /// <param name="stream"></param>
    /// <param name="encoding"></param>
    public ReverseTextReader(Stream stream, Encoding encoding)
    {          
        _stream = stream;
        _encoding = encoding;
        _lines = new Queue<string>(128);            
        // The stream needs to support seeking for this to work
        if(!_stream.CanSeek)
            throw new InvalidOperationException("The specified stream needs to support seeking to be read backwards.");
        if (!_stream.CanRead)
            throw new InvalidOperationException("The specified stream needs to support reading to be read backwards.");
        // Set the current position to the end of the file
        _stream.Position = _stream.Length;
        _leftoverBuffer = new byte[0];
    }

    #endregion

    #region Overrides

    /// <summary>
    /// Reads the next previous line from the underlying stream.
    /// </summary>
    /// <returns></returns>
    public string ReadLine()
    {
        // Are there lines left to read? If so, return the next one
        if (_lines.Count != 0) return _lines.Dequeue();
        // Are we at the beginning of the stream? If so, we're done
        if (_stream.Position == 0) return null;

        #region Read and Process the Next Chunk

        // Remember the current position
        var currentPosition = _stream.Position;
        var newPosition = currentPosition - BufferSize;
        // Are we before the beginning of the stream?
        if (newPosition < 0) newPosition = 0;
        // Calculate the buffer size to read
        var count = (int)(currentPosition - newPosition);
        // Set the new position
        _stream.Position = newPosition;
        // Make a new buffer but append the previous leftovers
        var buffer = new byte[count + _leftoverBuffer.Length];
        // Read the next buffer
        _stream.Read(buffer, 0, count);
        // Move the position of the stream back
        _stream.Position = newPosition;
        // And copy in the leftovers from the last buffer
        if (_leftoverBuffer.Length != 0)
            Array.Copy(_leftoverBuffer, 0, buffer, count, _leftoverBuffer.Length);
        // Look for CrLf delimiters
        var end = buffer.Length - 1;
        var start = buffer.Length - 2;
        // Search backwards for a line feed
        while (start >= 0)
        {
            // Is it a line feed?
            if (buffer[start] == 10)
            {
                // Yes.  Extract a line and queue it (but exclude the \r\n)
                _lines.Enqueue(_encoding.GetString(buffer, start + 1, end - start - 2));
                // And reset the end
                end = start;
            }
            // Move to the previous character
            start--;
        }
        // What's left over is a portion of a line. Save it for later.
        _leftoverBuffer = new byte[end + 1];
        Array.Copy(buffer, 0, _leftoverBuffer, 0, end + 1);
        // Are we at the beginning of the stream?
        if (_stream.Position == 0)
            // Yes.  Add the last line.
            _lines.Enqueue(_encoding.GetString(_leftoverBuffer, 0, end - 1));

        #endregion

        // If we have something in the queue, return it
        return _lines.Count == 0 ? null : _lines.Dequeue();
    }

    #endregion

    #region IEnumerator<string> Interface

    public IEnumerator<string> GetEnumerator()
    {
        string line;
        // So long as the next line isn't null...
        while ((line = ReadLine()) != null)
            // Read and return it.
            yield return line;
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        throw new NotImplementedException();
    }

    #endregion
}

这是一篇旧文章,但我花了很长时间才让它能够向后读取。这个新的代码实际上可以工作,并且速度很快。我所做的一个小改变是将其实现为IDisposable以实现更安全的执行。 - Dima Sherba

1

你可以逐个字符地向后读取文件,并缓存所有字符,直到遇到回车符和/或换行符。

然后,你将反转收集到的字符串,并将其作为一行输出。


4
逐个字符倒序读取文件很困难 - 因为你必须能够识别字符的开头。这取决于编码方式,这样做有多简单。 - Jon Skeet

0
如果有其他人遇到这个问题,我用以下 PowerShell 脚本解决了它,可以轻松地修改为 C# 脚本,只需花费少量的精力即可。
[System.IO.FileStream]$fileStream = [System.IO.File]::Open("C:\Name_of_very_large_file.log", [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read, [System.IO.FileShare]::ReadWrite)
[System.IO.BufferedStream]$bs = New-Object System.IO.BufferedStream $fileStream;
[System.IO.StreamReader]$sr = New-Object System.IO.StreamReader $bs;


$buff = New-Object char[] 20;
$seek = $bs.Seek($fileStream.Length - 10000, [System.IO.SeekOrigin]::Begin);

while(($line = $sr.ReadLine()) -ne $null)
{
     $line;
}

这基本上从文件的最后10,000个字符开始读取,输出每一行。


1
这将从最后的10,000个字节向前读取,而不是从末尾向开头反向读取。另外,为什么不只是使用 .Seek(-10000, [System.IO.SeekOrigin]::End); - AKX

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接