How can I read a line from a StreamReader without consuming it?

28

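Is there a way to read ahead one line, to test whether the next line contains specific tag data?

I'm dealing with a format that has a start tag but no end tag.

I would like to read a line and add it to a structure, then test the next line to make sure it is not a new "node". If it isn't, keep adding; if it is, close off that structure and create a new one.

The only solution I can think of is to run two stream readers at the same time, moving them along in lockstep, but that seems wasteful (and it may not even work).

I need something like Peek, but for a whole line: a PeekLine.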


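I don't think a PeekLine method is a good way to handle the "no end tag" problem, because you always have to peek at the next line and test whether a new structure starts. I would rather set the stream position back to the previous line, so that the next ReadLine returns the line you have already read. - Gqqnbig
4 Answers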

31
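The problem is that the underlying stream may not even be seekable. If you look at the StreamReader implementation, it uses a buffer, which is how it can implement TextReader.Peek() even when the stream does not support seeking.
You could write a simple adapter that reads the next line and buffers it internally, something like this: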
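    // Requires: using System.Collections.Generic; using System.IO;
    public class PeekableStreamReaderAdapter
    {
        private readonly StreamReader Underlying;
        private readonly Queue<string> BufferedLines;

        public PeekableStreamReaderAdapter(StreamReader underlying)
        {
            Underlying = underlying;
            BufferedLines = new Queue<string>();
        }

        // Reads one more line ahead and buffers it. Each call looks one line
        // further ahead; it does not return the same line twice.
        public string PeekLine()
        {
            string line = Underlying.ReadLine();
            if (line == null)
                return null;
            BufferedLines.Enqueue(line);
            return line;
        }

        // Returns buffered lines first (in order), then reads from the underlying reader.
        public string ReadLine()
        {
            if (BufferedLines.Count > 0)
                return BufferedLines.Dequeue();
            return Underlying.ReadLine();
        }
    }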
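
For comparison, here is a minimal sketch of a variant in which PeekLine can be called repeatedly and keeps returning the same upcoming line until ReadLine consumes it. This is not the answer's original code: the class name PeekableLineReader and its fields are made up for illustration, and it assumes wrapping any TextReader is acceptable.

```csharp
using System;
using System.IO;

// Hypothetical helper (not from the original answer): buffers at most one line.
public sealed class PeekableLineReader
{
    private readonly TextReader _underlying;
    private string _peeked;   // the buffered line, meaningful only when _hasPeeked is true
    private bool _hasPeeked;

    public PeekableLineReader(TextReader underlying)
    {
        _underlying = underlying ?? throw new ArgumentNullException(nameof(underlying));
    }

    // Returns the next line without consuming it; null at end of input.
    public string PeekLine()
    {
        if (!_hasPeeked)
        {
            _peeked = _underlying.ReadLine();
            _hasPeeked = true;
        }
        return _peeked;
    }

    // Returns the next line and consumes it; null at end of input.
    public string ReadLine()
    {
        if (_hasPeeked)
        {
            _hasPeeked = false;
            string line = _peeked;
            _peeked = null;
            return line;
        }
        return _underlying.ReadLine();
    }
}
```

With this shape, a parsing loop can read a line, add it to the current structure, and then check PeekLine() for a start tag before deciding whether to close the structure.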

2
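I would initialize BufferedLines before using it :) Also, I would use a different name for PeekLine(), since that name implies it always returns the same line (the next line after the position of the last ReadLine). Voted +1 already. - tofi9
1
Thanks, added the initializer. I never even compiled the code. Maybe something like LookAheadReadLine() would be more appropriate. - Nic Strong
7
I expanded on this a little so that the class inherits from TextReader: https://gist.github.com/1317325 - Andy Edinborough
1
@AndyEdinborough Really loving PeekableTextReader. - Chris Marisic
@AndyEdinborough You just saved me two hours. Excellent work, thank you! - tempy
@AndyEdinborough Nice (+1 to both of you). But having written good reusable code, wouldn't you want to implement Seek() as well? Also, I don't quite understand why you use a MemoryStream as the "buffer". Won't it grow as large as the entire file? At that point, wouldn't it be simpler to load the whole file up front? - Jon Coombs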

4
You could store the position by reading StreamReader.BaseStream.Position, then read the next line, test it, and seek back to the position you had before reading that line:
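            // Peek at the next line (requires: using System.IO;)
            long peekPos = reader.BaseStream.Position;
            string line = reader.ReadLine();

            if (line != null && line.StartsWith("<tag start>"))
            {
                // This is a new tag, so we reset the position
                reader.BaseStream.Seek(peekPos, SeekOrigin.Begin);
                // Drop the reader's internal buffer so it re-reads from the new position
                reader.DiscardBufferedData();
            }
            else
            {
                // This is part of the same node.
            }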

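This requires a lot of seeking and re-reading of the same lines. With a little logic, you can avoid it entirely: when you see a new tag start, close out the existing structure and start a new one. Here is a basic algorithm: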

        SomeStructure myStructure = null;
        while (!reader.EndOfStream)
        {
            string currentLine = reader.ReadLine();
            if (currentLine.StartsWith("<tag start>"))
            {
                // Close out existing structure.
                if (myStructure != null)
                {
                    // Close out the existing structure.
                }

                // Create a new structure and add this line.
                myStructure = new SomeStructure();
                // Append to myStructure.
            }
            else
            {
                // Add to the existing structure.
                if (myStructure != null)
                {
                    // Append to existing myStructure
                }
                else
                {
                    // This means the first line was not part of a structure.
                    // Either handle this case, or throw an exception.
                }
            }
        }
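
The outline above can be made concrete as a small, self-contained sketch. The names here (NodeGrouper, Group, the startTag parameter) are illustrative assumptions and not part of the answer; each "structure" is represented simply as a list of lines, and the "first line was not part of a structure" case is handled by throwing.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper illustrating the single-pass approach.
public static class NodeGrouper
{
    // Splits the input into groups of lines. Each group begins at a line that
    // starts with startTag; a line before the first start tag is rejected.
    public static List<List<string>> Group(TextReader reader, string startTag)
    {
        var groups = new List<List<string>>();
        List<string> current = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.StartsWith(startTag, StringComparison.Ordinal))
            {
                // A new node starts: the previous group is already stored,
                // so just open a new one.
                current = new List<string>();
                groups.Add(current);
            }
            else if (current == null)
            {
                throw new InvalidDataException("Line found before the first start tag: " + line);
            }
            current.Add(line);
        }
        return groups;
    }
}
```

No look-ahead or seeking is needed, because the decision to close a structure is made when the next start tag is read rather than before it.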

2
See here: the position of the underlying stream does not always match the StreamReader's position: https://dev59.com/CErSa4cB1Zd3GeqPVVN2 - Casebash
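
A small sketch of why the positions can differ: StreamReader fills an internal buffer from the stream, so after one ReadLine the base stream may already be well past the end of that line. The class and method names below are made up for illustration, and the exact position depends on the reader's buffer size; for a small input the whole stream is typically consumed in the first read.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo (not from the thread).
public static class PositionDemo
{
    // Returns the base stream position observed after reading only the first line.
    public static long PositionAfterFirstLine(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        using (var reader = new StreamReader(new MemoryStream(bytes), Encoding.UTF8))
        {
            reader.ReadLine();
            // The reader has buffered ahead, so this is usually not the
            // offset just after the first line.
            return reader.BaseStream.Position;
        }
    }
}
```

For an input such as "first\nsecond\nthird\n", the first line ends at byte offset 6, yet the reported position is normally larger, which is why a position saved this way cannot be trusted as the start of the next line.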

1

Why is this so hard? Just read the next line regardless. Check whether it is a new node; if not, add it to the current struct. If it is, create a new struct.

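// Assumes a TextReader named reader, an IsNode(string) helper, and a Struct
// type with an AddLine method; none of these are defined in this answer.
var structs = new List<Struct>();
Struct current = null;
string line;
while ((line = reader.ReadLine()) != null)
{
    if (IsNode(line))
    {
        // A new node starts: store the finished struct and begin a new one.
        if (current != null) structs.Add(current);
        current = new Struct();
        continue;
    }
    // Whatever processing you need to do
    if (current != null) current.AddLine(line);
}
if (current != null) structs.Add(current); // Add the last one to the collection

// Use your structures here
foreach (var s in structs)
{
}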

0

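Here is what I have so far. I ended up going the split route rather than reading line by line with a stream reader.

I'm sure there are places that could be more elegant, but for now it seems to be working.

Please let me know what you think.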

struct INDI
    {
        public string ID;
        public string Name;
        public string Sex;
        public string BirthDay;
        public bool Dead;


    }
    struct FAM
    {
        public string FamID;
        public string type;
        public string IndiID;
    }
    List<INDI> Individuals = new List<INDI>();
    List<FAM> Family = new List<FAM>();
    private void button1_Click(object sender, EventArgs e)
    {
        string path = @"C:\mostrecent.ged";
        ParseGedcom(path);
    }

    private void ParseGedcom(string path)
    {
        //Open the GED file, read the entire contents, then split on "0 @" to get
        //individuals and families (no other info is needed for this instance)
        string[] Holder;
        using (StreamReader SR = new StreamReader(path))
        {
            Holder = SR.ReadToEnd().Replace("0 @", "\u0646").Split('\u0646');
        }

        //For each new cell in the holder array look for individuals and families
        foreach (string Node in Holder)
        {

            //Sub Split the string on the returns to get a true block of info
            string[] SubNode = Node.Replace("\r\n", "\r").Split('\r');
            //If an individual is found
            if (SubNode[0].Contains("INDI"))
            {
                //Create new Structure
                INDI I = new INDI();
                //Add the ID number and remove extra formatting
                I.ID = SubNode[0].Replace("@", "").Replace(" INDI", "").Trim();
                //Find the name and remove extra formatting for the last name
                I.Name = SubNode[FindIndexinArray(SubNode, "NAME")].Replace("1 NAME", "").Replace("/", "").Trim();
                //Find Sex and remove extra formatting
                I.Sex = SubNode[FindIndexinArray(SubNode, "SEX")].Replace("1 SEX ", "").Trim();

                //Determine if there is a birthday; -1 means no
                if (FindIndexinArray(SubNode, "1 BIRT ") != -1)
                {
                    // add birthday to Struct 
                    I.BirthDay = SubNode[FindIndexinArray(SubNode, "1 BIRT ") + 1].Replace("2 DATE ", "").Trim();
                }

                // Determine if there is a death tag; returns -1 if not found
                if (FindIndexinArray(SubNode, "1 DEAT ") != -1)
                {
                    //Convert Y or N to true or false (defaults to false, so no need to change unless Y is found)
                    if (SubNode[FindIndexinArray(SubNode, "1 DEAT ")].Replace("1 DEAT ", "").Trim() == "Y")
                    {
                        //set death
                        I.Dead = true;
                    }
                }
                //add the Struct to the list for later use
                Individuals.Add(I);
            }

            // Start Family section
            else if (SubNode[0].Contains("FAM"))
            {
                //grab Fam id from node early on to keep from doing it over and over
                string FamID = SubNode[0].Replace("@ FAM", "");

                // Multiple children can exist for each family, so this section had to be a bit more dynamic

                // Look at each line of node
                foreach (string Line in SubNode)
                {
                    // If node is HUSB
                    if (Line.Contains("1 HUSB "))
                    {

                        FAM F = new FAM();
                        F.FamID = FamID;
                        F.type = "PAR";
                        F.IndiID = Line.Replace("1 HUSB ", "").Replace("@","").Trim();
                        Family.Add(F);
                    }
                    //If node for Wife
                    else if (Line.Contains("1 WIFE "))
                    {
                        FAM F = new FAM();
                        F.FamID = FamID;
                        F.type = "PAR";
                        F.IndiID = Line.Replace("1 WIFE ", "").Replace("@", "").Trim();
                        Family.Add(F);
                    }
                    //If node for one of possibly several children
                    else if (Line.Contains("1 CHIL "))
                    {
                        FAM F = new FAM();
                        F.FamID = FamID;
                        F.type = "CHIL";
                        F.IndiID = Line.Replace("1 CHIL ", "").Replace("@", "").Trim();
                        Family.Add(F);
                    }
                }
            }
        }
    }

    // Returns the index of the last element containing search, or -1 if none does.
    private int FindIndexinArray(string[] Arr, string search)
    {
        int Val = -1;
        for (int i = 0; i < Arr.Length; i++)
        {
            if (Arr[i].Contains(search))
            {
                Val = i;
            }
        }
        return Val;
    }

1
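FAM and INDI are horrible names for those structs (if anyone else ever needs to read or use your code). - Josh Smeaton
Those are the tag names; I thought they were easy enough to understand. - Crash893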
