Sort CSV based on a certain column?

Question

perl sorting

4

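I'm sure I've done this before and I'm forgetting some small detail, but how can I sort a CSV file on a certain column? I'm interested in answers both with and without third-party Perl modules. Mainly methods without, since I'm not always able to install additional modules.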

Example data:

name,25,female
name,24,male
name,27,female
name,21,male

The desired end result after sorting on the second (numeric) column:

name,21,male
name,24,male
name,25,female
name,27,female

- ckarl787

7 Answers

8

In the spirit of there always being another way to do it, bear in mind that plain old GNU sort might be enough.

$ sort -t, -k2 -n unsorted.txt
name,21,male
name,24,male
name,25,female
name,27,female

Where the command line arguments are:

-t, # use comma as the field separator
-k2 # sort on the key that starts at the second field of each line
-n  # sort using numerical comparison (like using <=> instead of cmp in perl)

If you want a Perl solution, you can wrap it in qx().
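For what it's worth, wrapping sort from Perl might look roughly like the sketch below. It swaps qx() for the list form of a piped open, so the file name never passes through the shell. The name sort_csv_by_column is made up for illustration, and the sketch assumes a sort(1) that understands -t and -k is on the PATH. The key is written as -k2,2n so that only the second field is compared.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return the lines of $file sorted numerically on a
# 1-based column by delegating to the system sort command.
sub sort_csv_by_column {
    my ($file, $column) = @_;
    die "column must be a positive integer\n" unless $column =~ /^[1-9]\d*$/;

    # -t , sets the field separator; -k N,Nn limits the key to field N
    # and compares it numerically.
    my @cmd = ('sort', '-t', ',', '-k', "$column,${column}n", $file);

    # List-form piped open: no shell is involved, so no quoting issues.
    open my $fh, '-|', @cmd or die "Cannot run sort: $!";
    my @lines = <$fh>;
    close $fh or die "sort exited with status $?\n";
    return @lines;
}

# Only runs when a file name is given on the command line.
print sort_csv_by_column($ARGV[0], 2) if @ARGV;
```

Saved under some name of your choosing and run with unsorted.txt as its argument, this should print the same four lines as the sort command above.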

- Simon Whitaker

6

There is also DBD::CSV:

#!/usr/bin/perl

use strict; use warnings;
use DBI;

my $dbh = DBI->connect('dbi:CSV:', undef, undef, {
    RaiseError => 1,
    f_ext => '.csv',
    csv_tables => { test => { col_names => [qw' name age sex '] } },
});

my $sth = $dbh->prepare(q{
    SELECT name, age, sex FROM test ORDER BY age
});

$sth->execute;

while ( my @row = $sth->fetchrow_array ) {
    print join(',' => @row), "\n";
}

$sth->finish;
$dbh->disconnect;

Output:

name,21,male
name,24,male
name,25,female
name,27,female

- Sinan Ünür

3

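The original poster asked for no third-party modules (which I take to mean nothing from CPAN). While that restriction will horribly limit your ability to write good modern Perl code, in this instance you can use the (core) Text::ParseWords module in place of the (non-core) Text::CSV. So, borrowing heavily from Alan's example, we get: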

#!/usr/bin/env perl

use strict;
use warnings;

use Text::ParseWords;

my @rows;

while (<DATA>) {
    push @rows, [ parse_line(',', 0, $_) ];
}

@rows = sort { $a->[1] <=> $b->[1] } @rows;

foreach (@rows) {
    print join ',', @$_;
}

__DATA__
name,25,female
name,24,male
name,27,female
name,21,male

- Dave Cross

0

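When you provide your own comparison code, you can sort on anything. Just extract the desired element with a regex, or probably a split in this case, and then compare on that. If you have a lot of elements, I would parse the data into a list of lists so that the comparison code can access it without parsing. That avoids parsing the same row over and over each time it is compared with another row.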
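A rough sketch of the parse-once idea, using no modules at all. The helper name sort_lines_by_column is made up, the column index is 0-based, and the sketch assumes simple data: no quoted fields, no embedded commas, and a numeric value in the chosen column.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: sort simple CSV lines numerically on a 0-based column.
# Assumes plain comma-separated fields with no quoting.
sub sort_lines_by_column {
    my ($column, @lines) = @_;
    chomp @lines;

    # Parse every line exactly once into a list of lists.
    # The -1 limit keeps trailing empty fields.
    my @rows = map { [ split /,/, $_, -1 ] } @lines;

    # The comparison block only does array lookups, never any parsing.
    my @sorted = sort { $a->[$column] <=> $b->[$column] } @rows;

    # Turn each row back into a CSV line (without a trailing newline).
    return map { join ',', @$_ } @sorted;
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print "$_\n" for sort_lines_by_column(1, <$in>);
}
```

For string columns you would swap <=> for cmp in the comparison block.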

- JOTN

0

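Using Raku (formerly known as Perl 6)

Here is a fairly quick-and-dirty solution, mainly intended for "hand-rolled" CSV. The code works as long as there is only one (1) age per row: read the lines into $a, comb for 1 to 3 <digit> characters surrounded by commas and assign the result to @b, derive the sorting index $c, then use $c to reorder the lines in $a: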

~$ raku -e 'my $a=lines();  my @b=$a.comb(/ \, <(\d**1..3)> \, /).pairs;  my $c=@b.sort(*.values)>>.keys.flat;  $a[$c.flat]>>.put;' sort_age.txt
name,21,male
name,24,male
name,25,female
name,27,female

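I added a few dummy lines to the OP's input file to see how the code above handles the following cases: 1) a blank age field, 2) an empty string "" as the age, 3) a bogus "9999" as the age, and 4) a bogus "NA" as the age. The code above fails miserably. To fix this, you have to write a ternary that inserts a numeric placeholder (e.g. zero) whenever the regex fails to match a line.

Below is a longer but more robust solution. Note that I use a placeholder value of 999 to move lines with blank or invalid ages to the bottom: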

~$ raku -e 'my @a=lines(); my @b = do for @a {if $_ ~~ m/ \, <(\d**1..3)> \, / -> { +$/ } else { 999 }; }; my $c=@b.pairs.sort(*.values)>>.keys.flat;  @a[$c.flat]>>.put;' sort_age.txt
name,21,male
name,24,male
name,25,female
name,27,female
name,,male
name,"",female
name,9999,male
name,NA,male

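To sort in reverse, add .reverse to the end of the method chain that creates $c. Likewise, change the else placeholder value to move lines without a valid age to the top or to the bottom. As an alternative, the creation of @b above can be written with the ternary operator: my @b = do for @a {(m/ \, <(\d**1..3)> \, /) ?? +$/ !! 999 };

Here is the unsorted input file: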

$ cat sort_age.txt
name,,male
name,"",female
name,9999,male
name,NA,male
name,25,female
name,24,male
name,27,female
name,21,male

HTH.

https://raku.org/

- jubilatious1

-1

I would do it like this:

#!/usr/bin/perl
use warnings;
use strict;

my @rows = map { chomp; [split /[,\s]+/, $_] } <DATA>; #read each row into an array
my @sorted = sort { $a->[1] <=> $b->[1] } @rows; # sort the rows (numerically) by second column

for (@sorted) {
  print join(', ', @$_) . "\n"; # print them out as CSV
}

__DATA__
name,25,female
name,24,male
name,27,female
name,21,male

- speedarius

2

Fine, as long as none of your names is "John Doe, Esq.". - reinierpost

1

There's a reason we have CSV-parsing modules like Text::CSV. In the general case, simply splitting on commas isn't enough. - Dave Cross
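To make the point in the two comments above concrete, here is a small sketch that feeds one made-up row with a quoted, comma-containing name to both a plain split and the core Text::ParseWords parse_line. The sample row and the two helper names are invented for illustration. Bear in mind that parse_line follows shell-like quoting rules rather than the CSV format, so treat it as a stopgap and not as a full CSV parser.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Made-up sample row: the name field contains a comma inside quotes.
my $line = '"John Doe, Esq.",25,male';

# Naive approach: split on every comma, quoted or not.
sub fields_by_split {
    my ($text) = @_;
    my @fields = split /,/, $text, -1;
    return @fields;
}

# Quote-aware approach: parse_line keeps a quoted field together and,
# with the second argument set to 0, strips the quotes.
sub fields_by_parse_line {
    my ($text) = @_;
    my @fields = parse_line(',', 0, $text);
    return @fields;
}

my @naive  = fields_by_split($line);
my @quoted = fields_by_parse_line($line);

printf "split:      %d fields, second field is [%s]\n", scalar @naive,  $naive[1];
printf "parse_line: %d fields, second field is [%s]\n", scalar @quoted, $quoted[1];
```

With split, the row comes back as four fields and the "age" column holds the tail of the name, so a numeric sort on it compares garbage. With parse_line it comes back as three fields with 25 in the second position.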

- Alan Haggai Alavi · Accepted Answer

Since CSV is a fairly complex format, it is better to use a module that does the work for us.

Here is an example using the Text::CSV module:

#!/usr/bin/env perl

use strict;
use warnings;

use constant AGE => 1;

use Text::CSV;

my $csv = Text::CSV->new();

my @rows;
while ( my $row_ref = $csv->getline( \*DATA ) ) {
    push @rows, $row_ref;
}

@rows = sort { $a->[AGE] <=> $b->[AGE] } @rows;

for my $row_ref (@rows) {
    $csv->combine(@$row_ref);
    print $csv->string(), "\n";
}

__DATA__
name,25,female
name,24,male
name,27,female
name,21,male