How to print a certain line of a file with PowerShell?

Question

How to print a certain line of a file with PowerShell?

powershell

42

I don't have a decent text editor on this server, but I need to see what is causing an error on line 10 of a certain file. I do have PowerShell, though...

- northben

5

What about (get-content myfile.txt)[9]? - CB.

2

Yes. The problem is that this is very slow with big files, because the whole file has to be read before [index] is returned. - CB.

I tried using (get-content myfile.txt)[9] in Windows PowerShell. - Steve Staple

7 Answers

23

This will show the 10th line of myfile.txt:

get-content myfile.txt | select -first 1 -skip 9

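Both -first and -skip are optional parameters, and -last may be useful in similar situations (as may -context, which is a parameter of Select-String rather than Select-Object).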

- northben

4

This works for small files. Unless something has changed, Get-Content reads the entire file into memory, which does not always work with large files. - lit

12

You can use the -TotalCount parameter of the Get-Content cmdlet to read the first n lines, then use Select-Object to return only the nth line:

Get-Content file.txt -TotalCount 9 | Select-Object -Last 1;

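Per the comment from @C.B., this should improve performance by reading only up to the nth line rather than the entire file. Note that you can use the aliases -First or -Head in place of -TotalCount.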
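
As a self-contained illustration (not part of the original answer), the sketch below runs the pipeline-based approaches from this thread against a throwaway file and shows that they select the same line. The file name getline-demo.txt, its contents, and the variable names are assumptions made for the demo.

```powershell
# Build a small sample file (hypothetical content) in the temp directory.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'getline-demo.txt'
1..20 | ForEach-Object { "line $_" } | Set-Content -Path $path

$n = 10   # 1-based number of the line to print

# Read only the first $n lines, then keep the last of them.
$viaTotalCount = Get-Content -Path $path -TotalCount $n | Select-Object -Last 1

# Skip the first $n - 1 lines and take the next one.
$viaSkip = Get-Content -Path $path | Select-Object -Skip ($n - 1) -First 1

# Select by zero-based index.
$viaIndex = Get-Content -Path $path | Select-Object -Index ($n - 1)

$viaTotalCount
$viaSkip
$viaIndex

# Clean up the sample file.
Remove-Item -Path $path
```

Note the off-by-one between the styles: -TotalCount and -Skip/-First count lines from 1, while -Index and [ ] indexing count from 0.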

- Lance U. Matthews

7

Just for fun, here are some tests:

# Added at @Graimer's request ;) (run on a different computer, one with a
# slightly faster hard disk)
> measure-command { Get-Content ita\ita.txt -TotalCount 260000 | Select-Object -Last 1 }


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 28
Milliseconds      : 893
Ticks             : 288932649
TotalDays         : 0,000334412788194444
TotalHours        : 0,00802590691666667
TotalMinutes      : 0,481554415
TotalSeconds      : 28,8932649
TotalMilliseconds : 28893,2649


> measure-command { (gc "c:\ps\ita\ita.txt")[260000] }


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 9
Milliseconds      : 257
Ticks             : 92572893
TotalDays         : 0,000107144552083333
TotalHours        : 0,00257146925
TotalMinutes      : 0,154288155
TotalSeconds      : 9,2572893
TotalMilliseconds : 9257,2893


> measure-command { ([System.IO.File]::ReadAllLines("c:\ps\ita\ita.txt"))[260000] }


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 234
Ticks             : 2348059
TotalDays         : 2,71766087962963E-06
TotalHours        : 6,52238611111111E-05
TotalMinutes      : 0,00391343166666667
TotalSeconds      : 0,2348059
TotalMilliseconds : 234,8059



> measure-command {get-content .\ita\ita.txt | select -index 260000}


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 36
Milliseconds      : 591
Ticks             : 365912596
TotalDays         : 0,000423509949074074
TotalHours        : 0,0101642387777778
TotalMinutes      : 0,609854326666667
TotalSeconds      : 36,5912596
TotalMilliseconds : 36591,2596

And the winner is: ([System.IO.File]::ReadAllLines( path ))[index].

- CB.

What about @Bacon's answer? Since you already have a sample file :-) - Frode F.

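@Graimer Added :). All of these tests look up a large index in a large file; I think the results could differ for small index values. Each test was run in a fresh PowerShell session to avoid the effect of hard-disk precaching. - CB.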

2

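I'm really surprised that ReadAllLines() is not just faster, but so much faster than both usages of Get-Content. As its name suggests, it reads the whole file too. In any case, I have posted another approach if you want to try it. Also, whenever I use Measure-Command to benchmark code, I usually run it like this: 1..10 | % { Measure-Command { ... } } | Measure-Object TotalMilliseconds -Average -Min -Max -Sum; that way I get more accurate numbers from multiple test runs. - Lance U. Matthews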
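
A minimal sketch of that repeated-measurement pattern, spelled out with full parameter names. The sample file measure-demo.txt, the choice of five runs, and the command being timed are assumptions for illustration, not the setup used for the numbers above.

```powershell
# Build a throwaway sample file (hypothetical content) in the temp directory.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'measure-demo.txt'
1..1000 | ForEach-Object { "line $_" } | Set-Content -Path $path

# Time the same command five times and aggregate the elapsed milliseconds.
$stats = 1..5 |
    ForEach-Object { Measure-Command { (Get-Content -Path $path)[499] } } |
    Measure-Object -Property TotalMilliseconds -Average -Minimum -Maximum -Sum

$stats | Format-List Count, Average, Minimum, Maximum, Sum

# Clean up the sample file.
Remove-Item -Path $path
```

Repeated runs in one session are likely to benefit from file caching after the first pass, so the minimum and maximum are worth reading alongside the average.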

I got this error: Exception calling "ReadAllLines" with "1" argument(s): "Array dimensions exceeded supported range." At line:1 char:1 - Nate Anderson

7

Here is a function that uses the .NET System.IO classes directly:

function GetLineAt([String] $path, [Int32] $index)
{
    [System.IO.FileMode] $mode = [System.IO.FileMode]::Open;
    [System.IO.FileAccess] $access = [System.IO.FileAccess]::Read;
    [System.IO.FileShare] $share = [System.IO.FileShare]::Read;
    [Int32] $bufferSize = 16 * 1024;
    [System.IO.FileOptions] $options = [System.IO.FileOptions]::SequentialScan;
    [System.Text.Encoding] $defaultEncoding = [System.Text.Encoding]::UTF8;
    # FileStream(String, FileMode, FileAccess, FileShare, Int32, FileOptions) constructor
    # http://msdn.microsoft.com/library/d0y914c5.aspx
    # Named $stream rather than $input, which is a PowerShell automatic variable
    [System.IO.FileStream] $stream = New-Object `
        -TypeName 'System.IO.FileStream' `
        -ArgumentList ($path, $mode, $access, $share, $bufferSize, $options);
    # StreamReader(Stream, Encoding, Boolean, Int32) constructor
    # http://msdn.microsoft.com/library/ms143458.aspx
    [System.IO.StreamReader] $reader = New-Object `
        -TypeName 'System.IO.StreamReader' `
        -ArgumentList ($stream, $defaultEncoding, $true, $bufferSize);
    # Left untyped: a [String]-constrained variable converts $null to ''
    $line = $null;
    [Int32] $currentIndex = 0;

    try
    {
        while (($line = $reader.ReadLine()) -ne $null)
        {
            if ($currentIndex++ -eq $index)
            {
                return $line;
            }
        }
    }
    finally
    {
        # Closing $reader also closes the underlying $stream
        $reader.Close();
    }

    # There are fewer than ($index + 1) lines in the file
    return $null;
}

GetLineAt 'file.txt' 9;

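Tweaking the $bufferSize variable may affect performance. A more concise version that uses the default buffer size and provides no optimization hints could look like this: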

function GetLineAt([String] $path, [Int32] $index)
{
    # StreamReader(String, Boolean) constructor
    # http://msdn.microsoft.com/library/9y86s1a9.aspx
    [System.IO.StreamReader] $reader = New-Object `
        -TypeName 'System.IO.StreamReader' `
        -ArgumentList ($path, $true);
    # Left untyped: a [String]-constrained variable converts $null to ''
    $line = $null;
    [Int32] $currentIndex = 0;

    try
    {
        while (($line = $reader.ReadLine()) -ne $null)
        {
            if ($currentIndex++ -eq $index)
            {
                return $line;
            }
        }
    }
    finally
    {
        $reader.Close();
    }

    # There are fewer than ($index + 1) lines in the file
    return $null;
}

GetLineAt 'file.txt' 9;

- Lance U. Matthews

1

Overengineering: see BACON's solution on SO for quickly reading a text file. :) - northben

5

I stumbled on this question while looking for a way to handle a large file, and it is exactly what I needed. - Tao

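@Tao Thanks. Glad "someone" found this useful. Sometimes the built-in PowerShell cmdlets don't give you the control or efficiency you need, especially, as you said, when working with large files. - Lance U. Matthews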

+1 to northben for the (funny) definition of overengineering. +1 to Bacon for his effort. - prabhakaran

3

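To reduce memory consumption and speed up the search, you can use the -ReadCount option of the Get-Content cmdlet (https://technet.microsoft.com/ru-ru/library/hh849787.aspx). This may save hours when you work with large files. Here is an example: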

$n = 60699010
$src = 'hugefile.csv'
$batch = 100
$timer = [Diagnostics.Stopwatch]::StartNew()

$count = 0
Get-Content $src -ReadCount $batch -TotalCount $n | %  { 
    $count += $_.Length
    if ($count -ge $n ) {
        $_[($n - $count + $_.Length - 1)]
    }
}

$timer.Stop()
$timer.Elapsed

This prints line $n and the elapsed time.

- ganisimov

1

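I know this is an old question, but although it is one of the most viewed questions on this topic, none of the answers fully satisfies me. Get-Content is easy to use, but it shows its limitations with really large text files (e.g. >5 GB).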

I found a solution that does not need the whole file to be loaded into main memory and that is much faster than Get-Content (almost as fast as sed on Linux):

[Linq.Enumerable]::ElementAt([System.IO.File]::ReadLines("<path_to_file>"), <index>)

On my machine this takes about 4 seconds to find a line in the middle of a ~4.5 GB file, whereas (Get-Content -Path <path_to_file> -TotalCount <index>)[-1] takes about 35 seconds.
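
The same streaming lookup can be wrapped in a small helper. This is only a sketch: the function name Get-LineAt, its parameters, and the sample file are hypothetical. It resolves the path first because .NET methods resolve relative paths against the process working directory, which can differ from the current PowerShell location.

```powershell
function Get-LineAt {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [int] $Index   # zero-based line index
    )

    # Turn a possibly relative PowerShell path into a full file system path.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    # ReadLines enumerates the file lazily, so reading stops at the requested line.
    # ElementAt throws if the file has fewer than $Index + 1 lines.
    [System.Linq.Enumerable]::ElementAt([System.IO.File]::ReadLines($fullPath), $Index)
}

# Usage on a throwaway sample file (hypothetical content).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'readlines-demo.txt'
1..20 | ForEach-Object { "line $_" } | Set-Content -Path $path

$tenthLine = Get-LineAt -Path $path -Index 9
$tenthLine

Remove-Item -Path $path
```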

- gscaparrotti


- Frode F. · Accepted Answer

41

It's as easy as using select:

Get-Content file.txt | Select-Object -Index (line - 1)

For example, to get line 5:

Get-Content file.txt | Select-Object -Index 4

Or you can use:

(Get-Content file.txt)[4]

- Frode F.

4

Doesn't this retrieve the entire content into memory first? That's bad. - Stacker

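Refer to C.B.'s answer for performance statistics (time, not memory). This approach is inefficient if you are working with large files. If you only work with a few small files, performance matters much less, and sometimes clear code is more important. This answer is also 5 years old, and PowerShell has changed since then. - Frode F.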

@Stacker, @Lance's answer is the right one (as long as the line you need is near the start of the file). It is basically the equivalent of *nix's head -n. - Hashbrown

1

Still so useful ten years later, thank you. - Sabuncu