PHP: Getting lines from the end of a large text file

Question

php

3

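I have been searching for quite a while but haven't found the right way to do this.

I have a log file of about 100 MB, roughly 140,000 lines of text. Using PHP, I want to get the last 500 lines of the file.

How can I get those 500 lines? Most functions read the whole file into memory, which isn't feasible in this case. I would prefer to avoid executing system commands.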

- hexacyanide

See https://dev59.com/dnI_5IYBdhLWcg3wJPdu#1510248 - Felix

This answer will help you - dan-lee

3 Answers

4

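I wrote this function, and I think it works quite well. It returns an array of lines, just like file(). If you want it to return a string, like file_get_contents(), just change the return statement to return implode('', array_reverse($lines));: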

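function file_get_tail($filename, $num_lines = 10){

    $file = fopen($filename, "r");
    if ($file === false)
        return array();

    // Start at the last byte; fseek() fails (returns -1) on an empty file.
    if (fseek($file, -1, SEEK_END) !== 0) {
        fclose($file);
        return array();
    }

    // Walk backwards one character at a time, collecting each line in reverse.
    for ($line = 0, $lines = array(); $line < $num_lines && false !== ($char = fgetc($file));) {
        if($char === "\n"){
            // A newline marks the start of the line collected so far and is kept
            // as its leading character. Blank lines are skipped.
            if(isset($lines[$line])){
                $lines[$line][] = $char;
                $lines[$line] = implode('', array_reverse($lines[$line]));
                $line++;
            }
        }else
            $lines[$line][] = $char;
        // Step back over the character just read plus one more. Stop at the start
        // of the file; otherwise the loop never ends and memory keeps growing.
        if (fseek($file, -2, SEEK_CUR) !== 0)
            break;
    }
    fclose($file);

    // Finish the line still being collected when the start of the file was reached.
    if(isset($lines[$line]))
        $lines[$line] = implode('', array_reverse($lines[$line]));

    return array_reverse($lines);
}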

Example:

file_get_tail('filename.txt', 500);

- Paul

1

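There is a small typo in your code (a missing $): $file = fopen("filename", "r"); -> $file = fopen("$filename", "r"); Other than that, it works great. What if I want to access a log file on a remote server? - Mohammed Joraid

This doesn't work; it throws a memory limit exception. - undefined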

4

If you want to do this in PHP:

<?php
/**
  Read last N lines from file.

  @param $filename string  path to file. must support seeking
  @param $n        int     number of lines to get.

  @return array            up to $n lines of text
*/
function tail($filename, $n)
{
  $buffer_size = 1024;

  $fp = fopen($filename, 'r');
  if (!$fp) return array();

  fseek($fp, 0, SEEK_END);
  $pos = ftell($fp);

  $input = '';
  $line_count = 0;

  // stop once enough lines are buffered or the start of the file is reached
  // (checking $pos first also avoids a zero-length fread() on an empty file)
  while ($pos > 0 && $line_count < $n + 1)
  {
    // read the previous block of input
    $read_size = $pos >= $buffer_size ? $buffer_size : $pos;
    fseek($fp, $pos - $read_size, SEEK_SET);

    // prepend the current block, and count the new lines
    $input = fread($fp, $read_size).$input;
    $line_count = substr_count(ltrim($input), "\n");

    // move back to the start of the block just read
    $pos -= $read_size;
  }

  fclose($fp);

  // an empty file has no lines
  if ($input === '') return array();

  // return the last $n lines found
  return array_slice(explode("\n", rtrim($input)), -$n);
}

var_dump(tail('/var/log/syslog', 50));

This is mostly untested, but it should be enough to get you to a fully working solution.

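The buffer size is 1024, but it can be made larger or smaller. (You could even set it dynamically, based on $n * an estimate of the line length.) This should be better than seeking character by character, although it means we need to use substr_count() to look for newlines.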
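The dynamic buffer size mentioned in the parenthetical might look something like the following. This is only a minimal sketch: the helper name estimate_buffer_size(), the 120-byte average line length and the clamping bounds are all assumptions made for illustration, not part of the answer above.

```php
<?php
/**
 * Hypothetical helper: pick a read-buffer size from the number of lines
 * wanted and a guessed average line length (both defaults are assumptions).
 */
function estimate_buffer_size(int $n, int $avg_line_length = 120,
                              int $min = 1024, int $max = 1048576): int
{
    // Enough bytes to hold roughly $n lines in a single read.
    $estimate = $n * $avg_line_length;

    // Clamp so tiny requests still read a sensible block and huge
    // requests do not pull an unbounded amount into memory at once.
    return max($min, min($max, $estimate));
}

// 500 lines at a guessed 120 bytes each suggests a 60000-byte buffer.
echo estimate_buffer_size(500), "\n";
```

In tail(), the line $buffer_size = 1024; could then become $buffer_size = estimate_buffer_size($n);. A bad guess only costs extra loop iterations, since the loop keeps reading blocks until it has enough newlines.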

- Matthew

For an untested solution, it works remarkably well out of the box. Thanks, my stuff finally works. - rezizter

11 years later :) This works really well, thanks for sharing it back in 2012 :) - undefined

- Chris Trahey · Accepted Answer

6

If you are on a *nix machine, you should be able to use shell escaping and the 'tail' tool. It has been a while, but something like this:

$file = escapeshellarg($filename); // $filename: path to the log file
$lastLines = `tail -n 500 $file`;

Note the use of backticks, which execute the string in BASH or a similar environment and return the result.

- Chris Trahey

The backtick operator is disabled when safe mode is enabled or shell_exec() is disabled. This matters a lot on shared hosting. - Bailey Parker
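A script can check for that situation before reaching for the backtick operator. The sketch below is one way to do it; the helper name shell_exec_available() is hypothetical, and it only looks at the disable_functions setting (safe mode itself was removed in PHP 5.4).

```php
<?php
/**
 * Hypothetical helper: report whether shell_exec() (and therefore the
 * backtick operator) looks usable in the current PHP configuration.
 */
function shell_exec_available(): bool
{
    // Depending on the PHP version, a disabled function may be reported
    // as missing, so check function_exists() as well as the ini setting.
    if (!function_exists('shell_exec')) {
        return false;
    }

    // disable_functions is a comma-separated list of function names.
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));

    return !in_array('shell_exec', $disabled, true);
}

var_dump(shell_exec_available());
```

When this returns false, a pure PHP approach based on fopen() and fseek() is the fallback.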

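If shell execution is not available, the solution is probably a quick and tricky little algorithm using fopen, fseek and a loop; basically reading the file backwards until you have 500 lines... - Chris Trahey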

2

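@hexacyanide, why do you need to run it five times a second? Maybe you would be better off using proc_open() and just listening to the output of tail -f. (Or maybe there is no need to use PHP for whatever you are doing.) - Matthew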

1

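It is worth exploring a more stream-based approach, because the approach discussed here has two drawbacks: 1. when changes are rare, the file is checked far too often; 2. when changes are frequent, it may not be checked often enough. Any combination of line count and frequency is just a compromise between these two extremes, when there may well be a solution whose work is proportional to the actual amount of change (i.e. it "hangs" until there is an actual change in the log). If you want to explore that option, it should at least be done in a separate SO post. - Chris Trahey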
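One pure PHP step in that direction is to remember a byte offset and read only what has been appended since the last check, so no lines are missed however often the log changes. This is a sketch under assumptions, not the stream-based solution the comment has in mind: the helper name read_new_lines() is hypothetical, and the caller still has to poll rather than block until a change arrives.

```php
<?php
/**
 * Hypothetical helper: return the complete lines appended to $filename
 * since the byte offset in $offset, and advance $offset past them.
 */
function read_new_lines(string $filename, int &$offset): array
{
    clearstatcache(true, $filename);   // filesize() results are cached
    $size = filesize($filename);
    if ($size === false) {
        return array();
    }
    if ($size < $offset) {
        $offset = 0;                   // file was truncated or rotated
    }
    if ($size === $offset) {
        return array();                // nothing new since the last call
    }

    $fp = fopen($filename, 'r');
    if (!$fp) {
        return array();
    }
    fseek($fp, $offset);
    $chunk = (string) stream_get_contents($fp, $size - $offset);
    fclose($fp);

    // Consume complete lines only; a partial last line waits for the next call.
    $last_newline = strrpos($chunk, "\n");
    if ($last_newline === false) {
        return array();
    }
    $offset += $last_newline + 1;

    return explode("\n", substr($chunk, 0, $last_newline));
}
```

Called in a loop with a short sleep() between calls, each check costs one filesize() lookup when nothing has changed, and reads only the new bytes when something has.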

1

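After some experimenting, this seems to be the simplest and most reliable way to follow a file continuously: $stream = popen("tail -n 500 -f $filename", "r"); You can safely use fgets() without worrying about it hogging the CPU. - Matthew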
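A read loop around that idea could be sketched as follows. Note that it swaps popen() for proc_open(), as suggested in the earlier comment, so that the tail process can be stopped explicitly: tail -f never exits on its own, and pclose() would wait for it. The helper name follow_file() and its callback convention are hypothetical, and the sketch assumes a *nix system with tail on the PATH and PHP 7.4 or later (for the array form of the command).

```php
<?php
/**
 * Hypothetical helper: follow a file with `tail -f` and hand each line to
 * a callback. Stops as soon as the callback returns false.
 */
function follow_file(string $filename, callable $on_line, int $start_lines = 500): void
{
    // Array form (PHP 7.4+): the command runs directly, without a shell,
    // so the filename needs no escaping.
    $cmd = array('tail', '-n', (string) $start_lines, '-f', $filename);

    // Only stdout is needed; descriptor 1 becomes a pipe we can read from.
    $process = proc_open($cmd, array(1 => array('pipe', 'w')), $pipes);
    if (!is_resource($process)) {
        return;
    }

    // fgets() blocks until tail writes a line, so the loop does not spin.
    while (($line = fgets($pipes[1])) !== false) {
        if ($on_line(rtrim($line, "\n")) === false) {
            break;
        }
    }

    fclose($pipes[1]);
    proc_terminate($process);   // stop tail explicitly
    proc_close($process);
}
```

For example, follow_file('/var/log/syslog', function ($line) { echo $line, "\n"; }); would print the last 500 lines and then each new line as it is written, until the script is stopped.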
