Read a large file from the end

Question

php file-io

14

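Can I read a file from the end in PHP? For example, I want to read only the last 10-20 lines.

If the file is larger than 10 MB, I start getting errors.

How can I prevent these errors?

To read a normal file, we use the following code: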

if ($handle) {
    while (($buffer = fgets($handle, 4096)) !== false) {
    $i1++;
    $content[$i1]=$buffer;
    }
    if (!feof($handle)) {
        echo "Error: unexpected fgets() fail\n";
    }
    fclose($handle);
}

My file may be larger than 10 MB, but I only need to read the last few lines. How do I do that?

Thanks.

- kritya

Possible duplicate of: PHP - reading from the end of a text file - hippietrail

11 Answers

7

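It depends on how you interpret "can".

If you are wondering whether a PHP function can do this directly, without reading all the preceding lines, the answer is no, you can't.

A newline is an interpretation of the data, and you only know where the newlines are once you have actually read the data.

If the file is really large, though, I would not read all of it. The best approach is to scan the file from the end, reading it block by block and working backwards.

Update

Here is a PHP-only way to read the last n lines of a file without reading the whole file: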

function last_lines($path, $line_count, $block_size = 512){
    $lines = array();

    // we will always have a fragment of a non-complete line
    // keep this in here till we have our next entire line.
    $leftover = "";

    $fh = fopen($path, 'r');
    // go to the end of the file
    fseek($fh, 0, SEEK_END);
    do{
        // need to know whether we can actually go back
        // $block_size bytes
        $can_read = $block_size;
        if(ftell($fh) < $block_size){
            $can_read = ftell($fh);
        }

        // go back as many bytes as we can
        // read them to $data and then move the file pointer
        // back to where we were.
        fseek($fh, -$can_read, SEEK_CUR);
        $data = fread($fh, $can_read);
        $data .= $leftover;
        fseek($fh, -$can_read, SEEK_CUR);

        // split lines by \n. Then reverse them,
        // now the last line is most likely not a complete
        // line which is why we do not directly add it, but
        // append it to the data read the next time.
        $split_data = array_reverse(explode("\n", $data));
        $new_lines = array_slice($split_data, 0, -1);
        $lines = array_merge($lines, $new_lines);
        $leftover = $split_data[count($split_data) - 1];
    }
    while(count($lines) < $line_count && ftell($fh) != 0);
    if(ftell($fh) == 0){
        $lines[] = $leftover;
    }
    fclose($fh);
    // Usually, we will read too many lines, correct that here.
    return array_slice($lines, 0, $line_count);
}

- phant0m

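You can do it without reading all the preceding lines at all, just as you suggest in your last sentence. :) - awgy

@awgy: By "directly" I meant with a PHP function or with help from the operating system; maybe my wording wasn't the best :) - phant0m

@kritya, @awgy: I have added an implementation of what I described. - phant0m

Could this code be considered GPLv2+ compatible? :) I would like to use it in a WordPress plugin; the official repository has that licensing requirement, and the CC-wiki license that SO uses is incompatible. :( - Rarst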

1

@Rarst: Sure, you can use it under that license. (Is it enough for me to say so like this?) - phant0m

7

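This is not pure PHP, but a common solution is to use the tac command, which is the reverse of cat and outputs the file in reverse line order. Run it on the server with exec() or passthru(), then read the result. For example: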

<?php
$myfile = 'myfile.txt';
// tac writes the lines in reverse order; escapeshellarg() protects the file name
$command = "tac " . escapeshellarg($myfile) . " > /tmp/myfilereversed.txt";
exec($command);
$currentRow = 0;
$numRows = 20;  // stops after this number of rows
$handle = fopen("/tmp/myfilereversed.txt", "r");
// stop at $numRows rows or at end of file, whichever comes first
while ($currentRow < $numRows && ($buffer = fgets($handle, 4096)) !== false) {
   $currentRow++;
   echo $buffer."<br>";
}
fclose($handle);
?>

- Eran Galperin

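But does it affect the actual file, or does it just run the command virtually? - kritya

It does not affect the actual file, but it creates a new file, /tmp/myfilereversed.txt, so you need to delete it once you are done. - Greenisha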

6

The following snippet worked for me:

$file = popen("tac $filename",'r');

while ($line = fgets($file)) {
   echo $line;
}

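Reference: http://laughingmeme.org/2008/02/28/reading-a-file-backwards-in-php/ The article explains how to read a file backwards in PHP: seek to the end of the file with fseek(), then use fgetc() to walk back through the file one character at a time. Wrapping this in a function keeps the code clear.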
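The fseek()/fgetc() idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from the linked article: the function name last_lines_fgetc() is made up here, and it assumes "\n" line endings (a trailing "\r" is stripped) and ignores a single newline at the very end of the file.

```php
<?php
// Hypothetical helper: returns the last $count lines of $path, oldest first.
function last_lines_fgetc(string $path, int $count): array
{
    if ($count < 1) {
        return [];
    }
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        return [];
    }
    fseek($fh, 0, SEEK_END);
    $pos = ftell($fh);
    if ($pos === 0) {           // empty file: nothing to return
        fclose($fh);
        return [];
    }

    $lines = [];
    $current = '';
    $atLastByte = true;         // true only while looking at the final byte

    // Walk backwards one byte at a time, collecting characters into $current.
    while ($pos > 0 && count($lines) < $count) {
        $pos--;
        fseek($fh, $pos, SEEK_SET);
        $char = fgetc($fh);
        if ($char === "\n") {
            if (!$atLastByte) { // a newline at the very end closes no new line
                $lines[] = rtrim($current, "\r");
                $current = '';
            }
        } else {
            $current = $char . $current;
        }
        $atLastByte = false;
    }

    // Reached the start of the file: what is left is the first line.
    if ($pos === 0 && count($lines) < $count) {
        $lines[] = rtrim($current, "\r");
    }
    fclose($fh);
    return array_reverse($lines);
}
```

Reading byte by byte keeps memory use tiny, at the cost of one fseek() and one fgetc() call per character, so it is best suited to a small number of lines.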

- Sukhjinder Singh

@Lenin Yes, I tested it with 1G. - Sukhjinder Singh

Just wanted to point out that this snippet actually relies on an external command (tac), which may or may not be available. - Goozak

3

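If your code is not working and reports an error, you should include the error in your post!

The reason you are getting an error is that you are trying to store the entire contents of the file in PHP's memory space.

The most efficient way to solve the problem is, as Greenisha suggests, to seek to the end of the file and then go back a bit. But Greenisha's mechanism for going back is not very efficient.

Consider instead the method for getting the last few lines from a stream (i.e. where you can't seek):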

while (($buffer = fgets($handle, 4096)) !== false) {
    $i1++;
    $content[$i1]=$buffer;
    unset($content[$i1-$lines_to_keep]);
}

If you know the maximum line length is 4096, then:

if (4096*$lines_to_keep<filesize($input_file)) {
   fseek($handle, -4096*$lines_to_keep, SEEK_END);
}

Then apply the loop I described earlier.
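Put together, the seek and the rolling buffer might look like the sketch below. The function name tail_with_max_line_length() and its parameters are illustrative only, and the approach is correct only under the stated assumption that no line (including its newline) is longer than $maxLineLength bytes. Lines are returned with their trailing newline, as fgets() delivers them.

```php
<?php
// Hypothetical helper: last $linesToKeep lines, assuming a known maximum line length.
function tail_with_max_line_length(string $path, int $linesToKeep, int $maxLineLength = 4096): array
{
    if ($linesToKeep < 1) {
        return [];
    }
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return [];
    }

    // Only jump if the window is smaller than the file; otherwise read from the start.
    $window = $maxLineLength * $linesToKeep;
    $size = fstat($handle)['size'];
    if ($window < $size) {
        fseek($handle, -$window, SEEK_END);
    }

    // Rolling buffer: never holds more than $linesToKeep lines.
    // A partial first line after the seek is pushed out by the complete lines that follow.
    $content = [];
    $i = 0;
    while (($buffer = fgets($handle)) !== false) {
        $i++;
        $content[$i] = $buffer;
        unset($content[$i - $linesToKeep]);
    }
    fclose($handle);
    return array_values($content);
}
```

Because the window is at least as large as the last $linesToKeep lines combined, any truncated line at the seek position is always older than the lines that are kept.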

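Since C has more efficient ways of handling byte streams, the fastest solution on POSIX/Unix/Linux/BSD systems is simply:

// exec() fills $last_lines with one array element per line of output
exec("tail -n " . (int)$lines_to_keep . " filename", $last_lines);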

- symcbean

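It would be great if you could explain a bit more. +1 for the unset idea. - kritya

Your solution also iterates through the entire file, only much more slowly because of the added overhead of fgets and fseek. - Stephane Gosselin

@stefgosselin: No, read it again. It only iterates through a block at the end of the file that is the same size as, or larger than, the data to be extracted. - symcbean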

3

On Linux, you can do the following:

$linesToRead = 10;
exec("tail -n{$linesToRead} {$myFileName}" , $content);

You will get an array of lines in the $content variable.

Pure PHP solution:

$f = fopen($myFileName, 'r');

$maxLineLength = 1000;  // Real maximum length of your records
$linesToRead = 10;
fseek($f, -$maxLineLength*$linesToRead, SEEK_END);  // Moves cursor back from the end of file
$res = array();
while (($buffer = fgets($f, $maxLineLength)) !== false) {
    $res[] = $buffer;
}

$content = array_slice($res, -$linesToRead);

- Victor

3

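If you know roughly how long the lines are, you can avoid a lot of the black magic and just grab a chunk from the end of the file.

I needed the last 15 lines from a very large log file, and altogether they came to about 3000 characters. So, to be safe, I grab only roughly the last 8000 bytes, read the file as normal from there, and take what I need from the end.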

    $fh = fopen($file, "r");
    fseek($fh, -8192, SEEK_END);
    $lines = array();
    while($lines[] = fgets($fh)) {}

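This is probably more efficient than the top-rated answer, which reads the file character by character, compares each character, and splits on newlines.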
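A slightly more defensive variant of the same idea is sketched below. The name tail_from_chunk() is hypothetical, and the approach only works under the assumption stated above: $chunkSize must comfortably exceed the combined length of the lines you want. The sketch also copes with files smaller than the chunk and drops the first, possibly truncated, line.

```php
<?php
// Hypothetical helper: read one chunk from the end of the file and split it into lines.
function tail_from_chunk(string $path, int $lineCount, int $chunkSize = 8192): array
{
    if ($lineCount < 1) {
        return [];
    }
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        return [];
    }

    // Start $chunkSize bytes before the end, or at the beginning for small files.
    $size = fstat($fh)['size'];
    $start = max(0, $size - $chunkSize);
    fseek($fh, $start, SEEK_SET);
    $data = stream_get_contents($fh);
    fclose($fh);
    if ($data === false || $data === '') {
        return [];
    }

    // A single trailing newline would otherwise produce an empty last element.
    if (substr($data, -1) === "\n") {
        $data = (string) substr($data, 0, -1);
    }
    $lines = explode("\n", $data);

    // If we started mid-file, the first element may be a partial line: discard it.
    if ($start > 0) {
        array_shift($lines);
    }
    return array_slice($lines, -$lineCount);
}
```

If the function returns fewer lines than requested on a large file, the chunk was too small for the actual line lengths and $chunkSize needs to be raised.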

- felwithe

2

Here is another solution. It has no line-length control in fgets(), but you can add it.

/* Read file from end line by line */
$fp = fopen( dirname(__FILE__) . '\\some_file.txt', 'r');
$lines_read = 0;
$lines_to_read = 1000;
fseek($fp, 0, SEEK_END); //goto EOF
$eol_size = 2; // for windows is 2, rest is 1
$eol_char = "\r\n"; // mac=\r, unix=\n
while ($lines_read < $lines_to_read) {
    if (ftell($fp)==0) break; //break on BOF (beginning...)
    do {
        fseek($fp, -1, SEEK_CUR); //seek 1 by 1 char from EOF
        $eol = fgetc($fp) . fgetc($fp); //search for EOL (remove 1 fgetc if needed)
        fseek($fp, -$eol_size, SEEK_CUR); //go back for EOL
    } while ($eol != $eol_char && ftell($fp)>0 ); //check EOL and BOF

    $position = ftell($fp); //save current position
    if ($position != 0) fseek($fp, $eol_size, SEEK_CUR); //move for EOL
    echo fgets($fp); //read LINE or do whatever is needed
    fseek($fp, $position, SEEK_SET); //set current position
    $lines_read++;
}
fclose($fp);

- Stritof

1

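While searching for the same thing, I came across the following and thought it might be useful to others as well, so I am sharing it here:

/* Read a file line by line from the end */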

function tail_custom($filepath, $lines = 1, $adaptive = true) {
        // Open file
        $f = @fopen($filepath, "rb");
        if ($f === false) return false;

        // Sets buffer size, according to the number of lines to retrieve.
        // This gives a performance boost when reading a few lines from the file.
        if (!$adaptive) $buffer = 4096;
        else $buffer = ($lines < 2 ? 64 : ($lines < 10 ? 512 : 4096));

        // Jump to last character
        fseek($f, -1, SEEK_END);

        // Read it and adjust line number if necessary
        // (Otherwise the result would be wrong if file doesn't end with a blank line)
        if (fread($f, 1) != "\n") $lines -= 1;

        // Start reading
        $output = '';
        $chunk = '';

        // While we would like more
        while (ftell($f) > 0 && $lines >= 0) {

            // Figure out how far back we should jump
            $seek = min(ftell($f), $buffer);

            // Do the jump (backwards, relative to where we are)
            fseek($f, -$seek, SEEK_CUR);

            // Read a chunk and prepend it to our output
            $output = ($chunk = fread($f, $seek)) . $output;

            // Jump back to where we started reading
            fseek($f, -mb_strlen($chunk, '8bit'), SEEK_CUR);

            // Decrease our line counter
            $lines -= substr_count($chunk, "\n");

        }

        // While we have too many lines
        // (Because of buffer size we might have read too many)
        while ($lines++ < 0) {
            // Find first newline and remove all text before that
            $output = substr($output, strpos($output, "\n") + 1);
        }

        // Close file and return
        fclose($f);     
        return trim($output);

    }

- Just_Do_It

0

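Here is a more complete example of the "tail" suggestion. It seems like a simple and efficient approach, thanks. Very large files should not be a problem, and no temporary file is needed.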

$buf = array();
$ret = null;

// capture the last 30 lines of the log file into a buffer
exec('tail -30 ' . $weatherLog, $buf, $ret);

if ( $ret == 0 ) {

  // process the captured lines one at a time
  foreach ($buf as $line) {
    $n = sscanf($line, "%s temperature %f", $dt, $t);
    if ( $n > 0 ) $temperature = $t;
    $n = sscanf($line, "%s humidity %f", $dt, $h);
    if ( $n > 0 ) $humidity = $h;
  }
  printf("<tr><th>Temperature</th><td>%0.1f</td></tr>\n", 
          $temperature);
  printf("<tr><th>Humidity</th><td>%0.1f</td></tr>\n", $humidity);
}
else {
  // something bad happened
}

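In the example above, the code reads 30 lines of text and displays the last temperature and humidity readings in the file (which is why the printf calls are outside the loop, in case you were wondering). The file is populated by an ESP32, which appends to it every few minutes even when the sensor reports only nan. So 30 lines yield plenty of readings, and it should not fail. Each reading includes the date and time, so in the final version the output will include the time the reading was taken.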

- madnordski

- Greenisha · Accepted Answer

19

You can use fopen and fseek to navigate backwards from the end of the file. For example:

$fp = @fopen($file, "r");
$pos = -2;
while (fgetc($fp) != "\n") {
    fseek($fp, $pos, SEEK_END);
    $pos = $pos - 1;
}
$lastline = fgets($fp);

- Greenisha

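By using fseek with a negative offset and SEEK_END, you set the position indicator to $offset bytes before the end of the file, so you don't need to read from the beginning of the file. - Greenisha

If the file ends with a newline, this code will return only the newline. Also, I think $pos should be initialized to -1 before the loop starts. - awgy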

5

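A small update: it seems fseek uses integers internally, which prevents you from setting a position beyond 2147483647 on 32-bit setups. This stopped me from using it on a log file of about 4.8 GB. - Kickstart

How would you read the last 40 lines using this example? - Avión

@Avión $pos = -40; instead of $pos = -2; - Greenisha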
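The 32-bit limitation mentioned in the comments can be checked up front. The sketch below is only a heuristic: the helper name can_seek_from_end() is made up for illustration, PHP_INT_SIZE and PHP_INT_MAX are built-in constants, and on a 32-bit build filesize() itself may return unexpected values for files over 2 GB, so treat the result as a hint rather than a guarantee.

```php
<?php
// Hypothetical guard: can fseek() offsets plausibly address every byte of this file?
function can_seek_from_end(string $path): bool
{
    // 64-bit builds use 8-byte integers, far beyond any realistic file size.
    if (PHP_INT_SIZE >= 8) {
        return true;
    }
    // On 32-bit builds, a negative or failed size suggests the file is too large.
    $size = filesize($path);
    return $size !== false && $size >= 0 && $size < PHP_INT_MAX;
}
```

When the check fails, shelling out to tail, as shown in other answers, is one possible fallback.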
