How do I merge multiple CSV files into one file with PowerShell / Perl?

Question

6

I have the following CSV files and want to merge them into a single CSV file.

01.csv

apples,48,12,7
pear,17,16,2
orange,22,6,1

02.csv

apples,51,8,6
grape,87,42,12
pear,22,3,7

03.csv

apples,11,12,13
grape,81,5,8
pear,11,5,6

04.csv

apples,14,12,8
orange,5,7,9

Desired output:

apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,87,42,12,81,5,8,,,
pear,17,16,2,22,3,7,11,5,6,,,
orange,22,6,1,,,,,,5,7,9

Can anyone give me some guidance on how to achieve this? Preferably with PowerShell, but I am open to alternatives such as Perl if that is easier.

Thanks Pantik, the output of your code is close to what I want:

apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,87,42,12,81,5,8
orange,22,6,1,5,7,9
pear,17,16,2,22,3,7,11,5,6

Unfortunately, I need "placeholder" commas wherever a CSV file has no entry for an item, e.g. orange,22,6,1,,,,,,5,7,9 instead of orange,22,6,1,5,7,9.

Update: I would like the files to be parsed in file-name order, e.g.:

$myFiles = @(gci *.csv) | sort Name
foreach ($file in $myFiles){

Regards, Ted

- ted

1

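It looks like you want the data ordered by file name. For example, orange has empty records for 2.csv and 3.csv. If that is a requirement, you should state it in the question. - TLP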

6 Answers

2

Second PowerShell solution (as requested)

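$produce = @()          # item names, in first-seen order
$produce_hash = @{}     # item name -> merged output row
$file_count = -1        # zero-based index of the file being processed
$myFiles = @(gci 0*.csv) | sort Name
foreach ($file in $myFiles) {
    $file_count++
    $file_hash = @{}    # rows contributed by the current file
    get-content $file | foreach-object {
        $line = $_.split(",")

        if ($produce -contains $line[0]) {
            # known item: just record this file's values
            $file_hash[$line[0]] += $line[1..3]
        }
        else {
            # new item: name, then three empty fields per earlier file, then the values
            $produce += $line[0]
            $file_hash[$line[0]] = @(,$line[0]) + (@($null) * 3 * $file_count) + $line[1..3]
        }
    }
    $produce | foreach-object {
        if ($file_hash[$_]) { $produce_hash[$_] += $file_hash[$_] }
        else { $produce_hash[$_] += @(,$null) * 3 }   # item missing from this file
    }
}

$ofs = ","   # separator used when an array is cast to [string]
$out = @()
$produce_hash.keys | foreach-object {
    $out += [string]$produce_hash[$_]
}

$out | out-file "outputfile.csv"

gc outputfile.csv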
apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,,87,42,12,81,5,8,,,
orange,22,6,1,,,,,,,5,7,9
pear,17,16,2,22,3,7,11,5,6,,,

- mjolinor

2

OK, gangabass's solution works and is cooler than mine, but I'll add mine anyway. It is a bit stricter and keeps a data structure that you can reuse. So, enjoy. ;)

use strict;
use warnings;

opendir my $dir, '.' or die $!;
my @csv = grep (/^\d+\.csv$/i, readdir $dir);
closedir $dir;
# sorting numerically based on leading digits in filename
@csv = sort {($a=~/^(\d+)/)[0] <=> ($b=~/^(\d+)/)[0]} @csv;

my %data;

# To print empty records we first need to know all the names
for my $file (@csv) {
    open my $fh, '<', $file or die $!;
    while (<$fh>) {
        if (m/^([^,]+),/) {
            @{ $data{$1} } = ();
        }
    }
    close $fh;
}

# Now we can fill in values
for my $file (@csv) {
    open my $fh, '<', $file or die $!;
    my %tmp;
    while (<$fh>) {
        chomp;
        next if (/^\s*$/);
        my ($tag,@values) = split (/,/);
        $tmp{$tag} = \@values;
    }
    for my $key (keys %data) {
        unless (defined $tmp{$key}) {
            # Fill in empty values
            @{$tmp{$key}} = ("","","");
        }
        push @{ $data{$key} }, @{ $tmp{$key} };
    }
}

&myreport; 

sub myreport {
    for my $key (sort keys %data) {
        print "$key," . (join ',', @{$data{$key}}), "\n";
    }
}

- TLP

2

PowerShell:

$produce = "apples","grape","orange","pear"
$produce_hash = @{}
$produce | foreach-object {$produce_hash[$_] = @(,$_)}

$myFiles = @(gci *.csv) | sort Name
 foreach ($file in $myFiles){ 
    $file_hash = @{}
    $produce | foreach-object {$file_hash[$_] = @($null,$null,$null)}
        get-content $file | foreach-object{
            $line = $_.split(",")
            $file_hash[$line[0]] = $line[1..3]
            }
    $produce | foreach-object {
        $produce_hash[$_] += $file_hash[$_]
        }
  }

$ofs = ","
$out = @()
$produce | foreach-object {
 $out += [string]$produce_hash[$_]
 }

$out | out-file "outputfile.csv" 

gc outputfile.csv

apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,,87,42,12,81,5,8,,,
orange,22,6,1,,,,,,,5,7,9
pear,17,16,2,22,3,7,11,5,6,,,

This should be easy to adapt to other items. Just add them to the $produce array.

- mjolinor

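Thanks mjolinor. Could this be modified so that the items don't have to be entered into the $produce array by hand? It may not be known in advance which items there will be... - ted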

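There are probably two ways to do that: 1- Read the data twice, using the first pass to collect the unique values of the first element and build the $produce array. 2- Set a counter and increment it as each file is processed, so that you know how many $null arrays to add before that item's first set of values. Which one is best probably depends on how many data files you have and how large they are. - mjolinor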
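For illustration, the second idea (a per-file counter) could be sketched roughly as follows. This is a Perl sketch rather than PowerShell, and it makes several assumptions that are not from the thread: the helper name merge_single_pass is hypothetical, the lines are passed in memory instead of being read from disk, every row carries a fixed number of values (3 here), and each name appears at most once per file.

```perl
use strict;
use warnings;

# Hypothetical helper: merge rows from several files in a single pass.
# $files: array ref; each element is an array ref of chomped CSV lines of one file.
# $width: assumed number of values per row (3 in the question's data).
sub merge_single_pass {
    my ($files, $width) = @_;
    my (%row, @order);      # merged values per name; names in first-seen order
    my $seen_files = 0;     # counter: files fully processed so far

    for my $lines (@$files) {
        my %in_this_file;
        for my $line (@$lines) {
            next if $line =~ /^\s*$/;              # skip blank lines
            my ($name, @values) = split /,/, $line;
            if (!exists $row{$name}) {
                # First sighting: back-fill one empty group per earlier file.
                push @order, $name;
                $row{$name} = [ ('') x ($width * $seen_files) ];
            }
            push @{ $row{$name} }, @values;
            $in_this_file{$name} = 1;
        }
        # Pad every known name that this file did not mention.
        for my $name (@order) {
            push @{ $row{$name} }, ('') x $width unless $in_this_file{$name};
        }
        $seen_files++;
    }
    return map { join ',', $_, @{ $row{$_} } } @order;
}

# Usage with the question's four files inlined as data.
my @files = (
    [ 'apples,48,12,7', 'pear,17,16,2', 'orange,22,6,1' ],
    [ 'apples,51,8,6', 'grape,87,42,12', 'pear,22,3,7' ],
    [ 'apples,11,12,13', 'grape,81,5,8', 'pear,11,5,6' ],
    [ 'apples,14,12,8', 'orange,5,7,9' ],
);
print "$_\n" for merge_single_pass(\@files, 3);
```

Rows come out in first-seen order (apples, pear, orange, grape) rather than alphabetically; sort the returned list if a different order is needed.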

Posted a second solution that populates $produce automatically. - mjolinor

1

You need to parse the files; I don't see an easier way to do it.

Here is a solution using PowerShell:

Update: OK, I tweaked it a bit - hopefully it is easier to understand now.

$items = @{}
$colCount = 0 # total amount of columns
# loop through all files
foreach ($file in (gci *.csv | sort Name))
{
    $content = Get-Content $file
    $itemsToAdd = 0; # columns added by this file
    foreach ($line in $content)
    {
        if ($line -match "^(?<group>\w+),(?<value>.*)") 
        { 
            $group = $matches["group"]
            if (-not $items.ContainsKey($group)) 
            {   # if the row doesn't exist yet, add it and fill it with empty columns
                $items.Add($group, @()) 
                for($i = 0; $i -lt $colCount; $i++) { $items[$group] += "" }
            }

            # add new values to correct row
            $matches["value"].Split(",") | foreach { $items[$group] += $_ }
            $itemsToAdd = ($matches["value"].Split(",") | measure).Count # saves col count
        } 
    }

    # in case that file didn't contain some row, add empty cols for those rows
    $colCount += $itemsToAdd
    $toAddEmpty = @()
    $items.Keys | ? { (($items[$_] | measure).Count -lt $colCount) } | foreach { $toAddEmpty += $_ }
    foreach ($key in $toAddEmpty) 
    {   
        for($i = 0; $i -lt $itemsToAdd; $i++) { $items[$key] += "" }
    }
}

# output
Remove-Item "output.csv" -ea 0
foreach ($key in $items.Keys)
{
    "$key,{0}" -f [string]::Join(",", $items[$key]) | Add-Content "output.csv"
}

Output:

apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,,87,42,12,81,5,8,,,
orange,22,6,1,,,,,,,5,7,9
pear,17,16,2,22,3,7,11,5,6,,,

- Tomas Panik

Thanks PantikT for the effort, much appreciated - please see my update to the question for feedback, as this doesn't quite produce the output I want. - ted

0

Here is a more concise way to do it. However, it still does not add the commas when an item is missing.

Get-ChildItem D:\temp\a\ *.csv | 
    Get-Content |
    ForEach-Object -begin { $result=@{} } -process {
        $name, $otherCols = $_ -split '(?<=\w+),'
        if (!$result[$name]) { $result[$name] = @() }
        $result[$name] += $otherCols
    } -end {
        $result.GetEnumerator() | % {
            "{0},{1}" -f $_.Key, ($_.Value -join ",")
        }
    } | Sort

- stej

Page content provided by Stack Overflow.

- gangabass · Accepted Answer

Here is my Perl version:

use strict;
use warnings;

my $filenum = 0;

my ( %fruits, %data );
foreach my $file ( sort glob("*.csv") ) {

    $filenum++;
    open my $fh, "<", $file or die $!;

    while ( my $line = <$fh> ) {

        chomp $line;

        my ( $fruit, @values ) = split /,/, $line;

        $fruits{$fruit} = 1;

        $data{$filenum}{$fruit} = \@values;
    }

    close $fh;
}
foreach my $fruit ( sort keys %fruits ) {

    print $fruit, ",", join( ",", map { $data{$_}{$fruit} ? @{ $data{$_}{$fruit} } : ",," } 1 .. $filenum ), "\n";
}

This gives me:

apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,,87,42,12,81,5,8,,,
orange,22,6,1,,,,,,,5,7,9
pear,17,16,2,22,3,7,11,5,6,,,

So, did you make a typo on the grape line, or have I misunderstood something?
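One thing to note about the code above: the ",," placeholder assumes exactly three values per row. If the number of values can differ between files, a variant along these lines might work. It is only a sketch under assumptions that are not from the thread: the name merge_with_computed_padding is hypothetical, the lines are passed in memory rather than read with glob, and the placeholder width for each file is taken to be that file's widest row.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the answer above.
# $files: array ref; each element is an array ref of chomped CSV lines of one file.
sub merge_with_computed_padding {
    my ($files) = @_;
    my (%fruits, @data, @width);

    for my $i (0 .. $#$files) {
        $width[$i] = 0;
        for my $line (@{ $files->[$i] }) {
            next unless length $line;
            # The -1 limit keeps trailing empty fields that split would otherwise drop.
            my ($fruit, @values) = split /,/, $line, -1;
            $fruits{$fruit} = 1;
            $data[$i]{$fruit} = \@values;
            $width[$i] = @values if @values > $width[$i];
        }
    }

    my @rows;
    for my $fruit (sort keys %fruits) {
        my @cells = map {
            my $values = $data[$_]{$fruit} || [];
            # Pad short or missing entries up to this file's widest row.
            (@$values, ('') x ($width[$_] - @$values));
        } 0 .. $#$files;
        push @rows, join ',', $fruit, @cells;
    }
    return @rows;
}

# Usage with the first two files from the question.
my @rows = merge_with_computed_padding([
    [ 'apples,48,12,7', 'pear,17,16,2', 'orange,22,6,1' ],
    [ 'apples,51,8,6', 'grape,87,42,12', 'pear,22,3,7' ],
]);
print "$_\n" for @rows;
```

Computing the width per file keeps the columns aligned even when one file carries more values than another, at the cost of holding all parsed rows in memory before printing.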