如何在Delphi 7中加载只包含CR分隔符的文本文件时解决问题?

6
我有一个大的文本文件(约100MB),每一行都是由CR字符分隔的,而不是CRLF。
我尝试使用TStringList.LoadFromFile()或ReadLn(F,..)逐行读取这个文本文件,但这两种方法都要求行之间由CRLF分隔。
您有没有任何高效快速的方法来读取这种类型的文本文件?
谢谢。
PS:我正在使用Delphi 7。
4个回答

8

这应该可以做到。 将文本文件读入内存流中。 然后用内容填充字符串列表。 textList.Text 接受任何组合的 CRLFCRLF 来形成一行。

function MemoryStreamToString( M : TMemoryStream) : string;
begin
  SetString( Result,PChar(M.Memory),M.Size div SizeOf(Char)); // Works in all Delphi versions
end;

var
  memStream : TMemoryStream;
  textList  : TStringList;
begin
  textList := TStringList.Create; 
  try
    memStream:= TMemoryStream.Create;
    try
      memStream.LoadFromFile('mytextfile.txt');
      textList.Text := MemoryStreamToString( memStream);  // any combination of CR,LF,CRLF interprets as a line
    finally
      memStream.Free;
    end;
    // do something with textList

  finally
    textList.Free;
  end;

end;

我在 SetString(Result, M.Memory, M.Size) 上遇到了编译错误 "不兼容的类型"。 - Kawaii-Hachii
我改成了SetString(Result, PChar(M.Memory), Integer(M.Size)); 然后编译成功了。但是,如果我检查textList.count,结果是1。所以,转换不正确。文本文件包含多行,每行以CR结尾。你的函数将整个行转换为一个字符串,这是错误的。 - Kawaii-Hachii
1
奇怪的是,这来自帮助文件: 当设置文本时,该值将被解析为子字符串,每当遇到回车符或换行符时就会分隔。(两者不需要成对出现)。 这至少从Delphi5开始就是一样的。 - LU RD
我在Delphi5中进行了一个测试,在文本文件中随机插入了#13。毫无疑问,文档是正确的,上述代码也能正常工作。(在M.Memory周围添加了PChar)。 - LU RD
没错,我测试了其他小文件,它是可以工作的。我认为,我的测试文件存在其他问题。它包含很多#0字符。 - Kawaii-Hachii
显示剩余2条评论

4

我一直想要解决这个问题,所以我写了一个解决方案,它是JvCsvDataSet的一部分。我的问题如下:

  1. 我想读取可能具有CR、CR+LF或仅LF结尾的文件。
  2. 我想要像ReadLn一样的东西,但对于点#1非常灵活,并且不具有ReadLn的众所周知的问题。因此,古老的Pascal拥有Textfile类型和ReadLn过程。现代Class等效项是必需的。
  3. 我希望它成为类似流的对象,以便我可以逐行读取,并且不会将整个3.7 GB兆字节的文件加载到内存中。此外,我希望Position是Int64类型,并且我希望能够处理非常大的文件(> 2 GB)。
  4. 我希望它在Delphi 7中工作,也在Delphi XE2中工作,以及两者之间的所有内容。
  5. 我希望它非常非常非常快。因此,我花了一些时间优化块读取性能和解析。

因此,如果您想要做到这一点,您可以编写以下内容:

procedure TForm1.Button1Click(Sender: TObject);
var
ts:TTextStream;
s:String;
begin
 ts := TTextStream.Create('c:\temp\test.txt', fm_OpenReadShared);
 try
 while not ts.Eof do begin
   s := ts.ReadLine;
   doSomethingWith(s);
 end;
 finally
    ts.Free;
 end;
end;

好的。看起来很容易,是吧?确实如此。而且它甚至还有一个文件模式标志(注意那里的read-shared选项?)。现在你所需要的就是TTextStream的代码,它们在这里:

unit textStreamUnit;
{$M+}


{$R-}

{
  textStreamUnit

  This code is based on some of the content of the JvCsvDataSet written by Warren Postma, and others,
  licensed under MOZILLA Public License.
 }

interface

uses
  Windows,
  Classes,
  SysUtils;


const
  cQuote = #34;
  cLf    = #10;
  cCR    = #13;

 { File stream mode flags used in TTextStream }

  { Significant 16 bits are reserved for standard file stream mode bits. }
  { Standard system values like fmOpenReadWrite are in SysUtils. }
  fm_APPEND_FLAG  = $20000;
  fm_REWRITE_FLAG = $10000;

  { combined Friendly mode flag values }
  fm_Append          = fmOpenReadWrite or fm_APPEND_FLAG;
  fm_OpenReadShared  = fmOpenRead      or fmShareDenyWrite;
  fm_OpenRewrite     = fmOpenReadWrite or fm_REWRITE_FLAG;
  fm_Truncate        = fmCreate        or fm_REWRITE_FLAG;
  fm_Rewrite         = fmCreate        or fm_REWRITE_FLAG;

  TextStreamReadChunkSize = 8192; // 8k chunk reads.

resourcestring
    RsECannotReadFile = 'Cannot read file %';


type
  ETextStreamException = class(Exception);

{$ifndef UNICODE}
  RawByteString=AnsiString;
{$endif}

  TTextStream = class(TObject)
  private
    FStream: TFileStream; // Tried TJclFileStream also but it was too slow! Do NOT use JCL streams here. -wpostma.
    FFilename: string;
    FStreamBuffer: PAnsiChar;
    FStreamIndex: Integer;
    FStreamSize: Integer;
    FLastReadFlag: Boolean;

    procedure _StreamReadBufInit;
  public
    function ReadLine: RawByteString;   { read a string, one per line, wow. Text files. Cool eh?}

    procedure Append;
    procedure Rewrite;

    procedure Write(const s: RawByteString);        {write a string. wow, eh? }
    procedure WriteLine(const s: RawByteString);    {write string followed by Cr+Lf }

    procedure WriteChar(c: AnsiChar);

    procedure WriteCrLf;
    //procedure Write(const s: string);

    function Eof: Boolean; {is at end of file? }

    { MODE is typically a fm_xxx constant thatimplies a default set of stream mode bits plus some extended bit flags that are specific to this stream type.}
    constructor Create(const FileName: string; Mode: DWORD = fm_OpenReadShared; Rights: Cardinal = 0); reintroduce; virtual;
    destructor Destroy; override;

    function Size: Int64; //override;   // sanity

    { read-only properties at runtime}
    property Filename: string read FFilename;
    property Stream: TFileStream read FStream; { Get at the underlying stream object}
  end;

implementation





// 2 gigabyte file limit workaround:
function GetFileSizeEx(h: HFILE; FileSize: PULargeInteger): BOOL; stdcall;  external Kernel32;

procedure TTextStream.Append; 
begin
  Stream.Seek(0, soFromEnd);
end;

constructor TTextStream.Create(const FileName: string; Mode: DWORD; Rights: Cardinal);
var
  IsAppend: Boolean;
  IsRewrite: Boolean;
begin
  inherited Create;
  FFilename := FileName;

  FLastReadFlag := False;
  IsAppend := (Mode and fm_APPEND_FLAG) <> 0;
  IsRewrite := (Mode and fm_REWRITE_FLAG) <> 0;

  FStream := TFileStream.Create(Filename, {16 lower bits only}Word(Mode), Rights);

  //Stream := FStream; { this makes everything in the base class actually work if we inherited from Easy Stream}

  if IsAppend then
    Self.Append  // seek to the end.
  else
    Stream.Position := 0;

  if IsRewrite then
    Rewrite;

  _StreamReadBufInit;
end;

destructor TTextStream.Destroy;
begin
  if Assigned(FStream) then
    FStream.Position := 0; // avoid nukage
  FreeAndNil(FStream);
  FreeMem(FStreamBuffer); // Buffered reads for speed.
  inherited Destroy;
end;

function TTextStream.Eof: Boolean;
begin
  if not Assigned(FStream) then
    Result := False
    //Result := True
  else
    Result := FLastReadFlag and (FStreamIndex >= FStreamSize);
    //Result := FStream.Position >= FStream.Size;
end;

{ TTextStream.ReadLine:
  This reads a line of text, normally terminated by carriage return and/or linefeed
  but it is a bit special, and adapted for CSV usage because CR/LF characters
  inside quotes are read as a single line.

  This is a VERY PERFORMANCE CRITICAL function. We loop tightly inside here.
  So there should be as few procedure-calls inside the repeat loop as possible.


}
function TTextStream.ReadLine: RawByteString;
var
  Buf: array of AnsiChar;
  n: Integer;
  QuoteFlag: Boolean;
  LStreamBuffer: PAnsiChar;
  LStreamSize: Integer;
  LStreamIndex: Integer;

  procedure FillStreamBuffer;
  begin
    FStreamSize := Stream.Read(LStreamBuffer[0], TextStreamReadChunkSize);
    LStreamSize := FStreamSize;
    if LStreamSize = 0 then
    begin
      if FStream.Position >= FStream.Size then
        FLastReadFlag := True
      else
        raise ETextStreamException.CreateResFmt(@RsECannotReadFile, [FFilename]);
    end
    else
    if LStreamSize < TextStreamReadChunkSize then
      FLastReadFlag := True;
    FStreamIndex := 0;
    LStreamIndex := 0;
  end;

begin
  { Ignore linefeeds, read until carriage return, strip carriage return, and return it }
  SetLength(Buf, 150);

  n := 0;
  QuoteFlag := False;

  LStreamBuffer := FStreamBuffer;
  LStreamSize := FStreamSize;
  LStreamIndex := FStreamIndex;
  while True do
  begin
    if n >= Length(Buf) then
      SetLength(Buf, n + 100);

    if LStreamIndex >= LStreamSize then
      FillStreamBuffer;

    if LStreamIndex >= LStreamSize then
      Break;

    Buf[n] := LStreamBuffer[LStreamIndex];
    Inc(LStreamIndex);

    case Buf[n] of
      cQuote: {34} // quote
        QuoteFlag := not QuoteFlag;
      cLf: {10} // linefeed
        if not QuoteFlag then
          Break;
      cCR: {13} // carriage return
        begin
          if not QuoteFlag then
          begin
            { If it is a CRLF we must skip the LF. Otherwise the next call to ReadLine
              would return an empty line. }
            if LStreamIndex >= LStreamSize then
              FillStreamBuffer;
            if LStreamBuffer[LStreamIndex] = cLf then
              Inc(LStreamIndex);

            Break;
          end;
        end
    end;
    Inc(n);
  end;
  FStreamIndex := LStreamIndex;

  SetString(Result, PAnsiChar(@Buf[0]), n);
end;

procedure TTextStream.Rewrite;
begin
  if Assigned(FStream) then
    FStream.Size := 0;// truncate!
end;

function TTextStream.Size: Int64; { Get file size }
begin
  if Assigned(FStream) then
    GetFileSizeEx(FStream.Handle, PULargeInteger(@Result)) {int64 Result}
  else
    Result := 0;
end;

{ Look at this. A stream that can handle a string parameter. What will they think of next? }
procedure TTextStream.Write(const s: RawByteString);
begin
  Stream.Write(s[1], Length(s)); {The author of TStreams would like you not to be able to just write Stream.Write(s).  Weird. }
end;

procedure TTextStream.WriteChar(c: AnsiChar);
begin
  Stream.Write(c, SizeOf(AnsiChar));
end;

procedure TTextStream.WriteCrLf;
begin
  WriteChar(#13);
  WriteChar(#10);
end;

procedure TTextStream.WriteLine(const s: RawByteString);
begin
  Write(s);
  WriteCrLf;
end;

procedure TTextStream._StreamReadBufInit;
begin
  if not Assigned(FStreamBuffer) then
  begin
    //FStreamBuffer := AllocMem(TextStreamReadChunkSize);
    GetMem(FStreamBuffer, TextStreamReadChunkSize);
  end;
end;

end.

这个在其他答案失败的情况下在 Delphi XE2 中完美地工作了。感谢 Warren。跨越时空的问候 :) - Hein du Plessis
太棒了。我喜欢这个小工具类。 - Warren P

0

看看这个能否帮到你:

https://dev59.com/YXE85IYBdhLWcg3wXCMv#2957614

乍一看,似乎可以在代码中将EOL字符更改为#13#10之外的其他内容。

它还逐行解析(您可以将其用作缓冲区),而不是将整个文件加载到内存中(对于100MB+文件可能会有问题)。


1
从文件中加载TStrings内容与加载其Text属性一样占用内存。 - OnTheFly

0

如果我没记错的话,在从文件中读取文本之前,您需要设置字符串列表的LineBreak属性。

....
const 
  CR = #13;
  LF = #10;
  LFCR = #10#13;
begin
  MyStringList.LineBreak:= CR;
  MyStringList.LoadFromFile(.....

请参阅:http://docwiki.embarcadero.com/VCL/XE2/en/Classes.TStrings.LineBreak

不确定 Delphi 7 是否支持此功能(刚刚检查了一下,D2007 支持,因此我认为 D7 也应该支持)。


不幸的是,Delphi 7没有这个功能。 - Kawaii-Hachii

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接