Nested while loops to calculate distances between multiple destinations

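I've made a real mess of things. I want to write a script with strict and warnings enabled (still a challenge for me ;)). But now I'm completely lost. I've looked at so many examples that I'm thoroughly confused. I'm trying to calculate the distance between 2 points using lat/lon. I think I've got that part solved with GIS::Distance. The problem is that I'm trying to find destinations that are within 5000 meters of each other (skipping the case where the destination is the same). So when it finds a destination within 5000 meters of another one, I want it placed after the last element of the line from the first file.
Both input files are identical, and this is what they look like. Both files have about 45k lines.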
Europe;3;France;23;Parijs;42545;48,856555;2,350976
Europe;3;France;23;Parisot;84459;44,264381;1,857827
Europe;3;France;23;Parlan;11337;44,828976;2,172435
Europe;3;France;23;Parnac;35670;46,4533;1,4425
Europe;3;France;23;Parnans;22065;45,1097;5,1456

Suppose two of these destinations are close to each other; then I want the output to look like this:
Europe;3;France;23;Parijs;42545;48,856555;2,350976;Parlan;11337;200
Europe;3;France;23;Parisot;84459;44,264381;1,857827;
Europe;3;France;23;Parlan;11337;44,828976;2,172435;
Europe;3;France;23;Parnac;35670;46,4533;1,4425;Parisot;84459;2000;Parnans;22065;350
Europe;3;France;23;Parnans;22065;45,1097;5,1456;

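The real results will of course have more than 2 matches. In the output file, the matched destination, its destination ID and the calculated distance are appended. Each destination may well have multiple matches. It's really hard to explain, haha. As I said, I'm using strict & warnings and have narrowed the errors down to a minimum, but it's still not fully solved. These are the errors: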

Global symbol "$infile1" requires explicit package name at E:\etc.pl line 17.
Execution of E:\etc.pl aborted due to compilation errors.

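This is my code so far. My declarations have me confused, and I need help.

Can anyone help me? (Maybe this isn't the most efficient approach, but for now it helps me understand Perl step by step.)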

use strict;
use warnings;
use GIS::Distance::Lite qw(distance);

my $inputfile1 = shift || die "Give input!\n";
my $inputfile2 = shift || die "Give more input!\n";
my $outputfile = shift || die "Give output!\n";

open my $INFILE1, '<', $inputfile1  or die "In use/Not found :$!\n";
open my $INFILE2, '<', $inputfile2  or die "In use/Not found :$!\n";
open my $OUTFILE, '>', $outputfile  or die "In use/Not found :$!\n";

my $maxdist = 5000;
my $mindist = 0.0001;

while ( my @infile1 ){ 
    my @elements = split(";",$infile1);

    my $lat1 = $elements[6];
    my $lon1 = $elements[7];

    $lat1 =~ s/,/./g;
    $lon1 =~ s/,/./g;

    seek my $infile2, 0, 0;

    print "1. $lat1\n";
    print "2. $lon1\n";

    while ( my @infile2 ){
        my @loopelements = split(";",$infile2);

        my $lat2 = $loopelements[6];
        my $lon2 = $loopelements[7];

        $lat2 =~ s/,/./g;
        $lon2 =~ s/,/./g;

        print "3. $lat1\n";
        print "4. $lon1\n";

        my $distance = distance($lat1, $lon1 => $lat2, $lon2);      # Afstand berekenen tussen latlon1 and latlon2

        print "5. $distance\n";

        my $afstand = sprintf("%.4f",$distance);

        print "6. $afstand\n";

        if (($afstand < $maxdist) and (!($elements[4] == $loopelements[4]))){ 
            push (@elements, $afstand,$loopelements[4],$loopelements[5]);
            print "7. $afstand\n";
            } else {
                next;
                }
        }

  @elements = join(";",@elements);  # add ';' to all elements
  print OUTFILE "@elements";
  #if ($i == 10) {last;}
  }
close(INFILE1);
close(INFILE2);
close(OUTFILE);

--------------- EDIT --------------

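Hi, it's me again. I looked at your updated code, which is a rather more sophisticated version of mine, haha. To be honest, I only understood about half of it. It's still very useful! I decided to keep my original script design and adopt your improvements, but it still doesn't run correctly. If you don't mind, I have a few questions:

I made some adjustments to the script. The first is that it now skips lat/lons that are zero, because those would give useless results. The same line also skips empty cells, which are useless as well. I made this adjustment for both input files.

Oh, and when I said elements[4] I meant elements[5], so it should be numeric. So I replaced ne with !=, if I'm not mistaken. But I think I've created another infinite loop, because it doesn't loop through the second file. I know I may seem stubborn, but I want to understand my original script first before I start using your version. The seek function doesn't seem to work properly. Here is the current script.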

use strict;
use warnings;
use GIS::Distance::Lite qw(distance);

my $inputfile1 = shift || die "Give input!\n";
my $inputfile2 = shift || die "Give more input!\n";
my $outputfile = shift || die "Give output!\n";

open my $INFILE1, '<', $inputfile1  or die "In use/Not found :$!\n";
open my $INFILE2, '<', $inputfile2  or die "In use/Not found :$!\n";
open my $OUTFILE, '>', $outputfile  or die "In use/Not found :$!\n";

my $maxdist = 3000;
my $mindist = 0.0001;

while (my $infile1 = <$INFILE1> ){
  chomp $infile1;
  my @elements = split(";",$infile1);

  print "1. $elements[6]\n";
  print "2. $elements[7]\n";

  my $lat1 = $elements[6];
  my $lon1 = $elements[7];

if ((($lat1 and $lon1) ne '0') and (!($lat1 and $lon1) eq "")){
        $lat1 =~ s/,/./;
        $lon1 =~ s/,/./;
        print "lat1: $lat1\n";
        print "lon1: $lon1\n";  
        } else {
            next;
            }

  print "3. $lat1\n";
  print "4. $lon1\n";

  seek $INFILE2, 0, 0;

  while ( my $infile2 = <$INFILE2> ){
    chomp $infile2;
    my @loopelements = split(";",$infile2);

print "5. $elements[6]\n";
print "6. $elements[7]\n";

    my $lat2 = $loopelements[6];
    my $lon2 = $loopelements[7];

if ((($lat2 and $lon2) ne '0') and (!($lat2 and $lon2) eq "")){
        $lat2 =~ s/,/./;
        $lon2 =~ s/,/./;
        print "lat2: $lat1\n";
        print "lon2: $lon1\n";  
        } else {
            next;
            }

my $distance = distance($lat1, $lon1 => $lat2, $lon2);      # Afstand berekenen tussen latlon1 and latlon2

print "7. $distance\n";

my $afstand = sprintf("%.4f",$distance);

print "8. $afstand\n";

if ($afstand < $maxdist && $elements[4] != $loopelements[4]){ 
  push (@elements, $afstand, $loopelements[4],$loopelements[5]);
  print "9. $afstand\n";
    } else {
        next;
        }
  }
print $OUTFILE join(";",@elements), "\n";
}

close($INFILE1);
close($INFILE2);
close($OUTFILE);
1 Answer

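You've already done well. Now let's look at your error message.
Global symbol "$infile1" requires explicit package name at E:\etc.pl line 17.
This one is simple. In Perl, all variable names are case-sensitive. At the top, you created a lexical variable $INFILE1. I'll talk about lexicals in more detail later.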
open my $INFILE1, '<', $inputfile1  or die "In use/Not found :$!\n";

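Here you wrote it in all capitals, which is fine. If that helps you remember that it is a lexical file handle (file handles used to be global and were named like INFILE1), you can do that. But later (line 17) you use $infile1: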
my @elements = split(";",$infile1);

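You never declared that variable (with my), so it throws this error. But that's not all.
I think you are trying to read from that file handle. But it doesn't work like this. I'll explain the problem step by step.
    while ( my @infile1 ){ 

This while loop never reads anything from the file. my @infile1 only declares a new, empty array, and an empty array counts as false in a boolean test, so the loop body would not run at all even once the compile error is fixed.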

  • I guess you're trying to read the file line by line. So let's see how we can do that:

    while (my $infile1 = <$INFILE1> ){ 
      my @elements = split(";",$infile1);
    

    You need to read from the file like this. Now the assignment in the while loop's head will be true only as long as there is a line returned from the file handle. Once it is at the end of the file, it will return undef, thus ending the loop. Yay. Also note how your $infile1 in the next line with split now is correct.

    You also need to add chomp to the mix, because each line read from the file still ends with a newline character:

    while (my $infile1 = <$INFILE1> ){ 
      chomp $infile1;
      my @elements = split(";",$infile1);
    
  • Next is the seek line. This looks like you want to read the second file from the beginning for each line of the first file. That makes sense in a way, but is very inefficient. I'll talk about that later. You do need to change the my though. You don't have to create a new variable here. Also, use the correct name:

    seek $INFILE2, 0, 0;
    
  • Next, let's fix the second while loop in the same way:

    while (my $infile2 = <$INFILE2>){
      chomp $infile2;
      my @loopelements = split(";",$infile2);
    
  • The next thing I noticed was in line 42:

    my $distance = distance($lat1, $lon1 => $lat2, $lon2);
    

    Don't worry, there is nothing wrong here. I'd just like to note that the => is another way to write a comma (,). It's called the fat comma sometimes and it makes it easier to read, for example, hash assignments.

  • In line 50 you've already got the distance.

    if (($afstand < $maxdist) and (!($elements[4] == $loopelements[4]))){     
    

    and has very low precedence and is mostly used for flow control such as error checking. See the perldoc (perlop) as to why. You should use && instead. The comparison operators bind more tightly than &&, so you can leave out the parentheses. You can also change your !($a == $b) construct to use the != operator instead. But since element 4 holds the city name, which is a string and not a number, you need to use ne, which is the opposite of eq. So this line now becomes:

    if ($afstand < $maxdist && $elements[4] ne $loopelements[4]){
    

    It's a lot better to read, isn't it?

  • In line 58 you join your array @elements and assign it to itself. That is rather strange. It will replace the array with a new array that has only one element - the joined string. Let's leave that line until the next bullet and look at it then.

  • In line 59 you've got a print statement, but you are now using a global file handle OUTFILE that you have never created. Instead, you need to use your lexical file handle from the top, $OUTFILE. If we now add the join from the line above directly to the print statement and also add a \n new line character at the end, the line becomes:

    print $OUTFILE join(";",@elements), "\n";
    
  • Now only the last part remains: You need to close the file handles, but again you're using global ones. Use your lexical ones instead:

    close($INFILE1);
    close($INFILE2);
    close($OUTFILE);
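
To illustrate the == versus ne point from the comparison bullet above, here is a minimal sketch. The variable names are made up for the example; the two city names come from the sample data. It assumes nothing beyond core Perl.

```perl
use strict;
use warnings;

my ($city1, $city2) = ('Parijs', 'Parlan');

# String comparison: the names differ, so ne is true.
my $differs_as_strings = ($city1 ne $city2) ? 1 : 0;

# Numeric comparison: neither string looks like a number, so both are
# treated as 0 and != is false. The 'no warnings' line only silences
# the "isn't numeric" warning that use warnings would otherwise print.
my $differs_as_numbers;
{
    no warnings 'numeric';
    $differs_as_numbers = ($city1 != $city2) ? 1 : 0;
}

print "ne: $differs_as_strings, !=: $differs_as_numbers\n";
```

With == or != on city names, every pair of names looks equal, so no match would ever be pushed. If the numeric destination ID in element 5 is compared instead, != is the appropriate operator.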
    

The complete code now looks like this:

use strict;
use warnings;
use GIS::Distance::Lite qw(distance);

my $inputfile1 = shift || die "Give input!\n";
my $inputfile2 = shift || die "Give more input!\n";
my $outputfile = shift || die "Give output!\n";

open my $INFILE1, '<', $inputfile1  or die "In use/Not found :$!\n";
open my $INFILE2, '<', $inputfile2  or die "In use/Not found :$!\n";
open my $OUTFILE, '>', $outputfile  or die "In use/Not found :$!\n";

my $maxdist = 5000;
my $mindist = 0.0001;

while (my $infile1 = <$INFILE1> ){
  chomp $infile1;
  my @elements = split(";",$infile1);

  my $lat1 = $elements[6];
  my $lon1 = $elements[7];

  $lat1 =~ s/,/./g;
  $lon1 =~ s/,/./g;

  print "1. $lat1\n";
  print "2. $lon1\n";

  seek $INFILE2, 0, 0;

  while ( my $infile2 = <$INFILE2> ){
    chomp $infile2;
    my @loopelements = split(";",$infile2);

    my $lat2 = $loopelements[6];
    my $lon2 = $loopelements[7];

    $lat2 =~ s/,/./g;
    $lon2 =~ s/,/./g;

    print "3. $lat1\n";
    print "4. $lon1\n";

    my $distance = distance($lat1, $lon1 => $lat2, $lon2);      # Afstand berekenen tussen latlon1 and latlon2

    print "5. $distance\n";

    my $afstand = sprintf("%.4f",$distance);

    print "6. $afstand\n";

    if ($afstand < $maxdist && $elements[4] ne $loopelements[4]){ 
      push (@elements, $afstand,$loopelements[4],$loopelements[5]);
      print "7. $afstand\n";
    } else {
      next;
    }
  }

  print $OUTFILE join(";",@elements), "\n";
}

close($INFILE1);
close($INFILE2);
close($OUTFILE);

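Now about your algorithm: it is more efficient to read the second file completely first and then compare against it in each iteration over the first file. That way the second file only has to be read once.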
use strict;
use warnings;
use GIS::Distance::Lite qw(distance);
use feature qw(say);

my $inputfile1 = shift || die "first file missing";
my $inputfile2 = shift || die "second file missing";
my $outputfile = shift || die "output file missing!";

# Read the second file first
my @file2; # save the lines of INFILE2 as array refs
open my $INFILE2, '<', $inputfile2  or die "In use/Not found :$!";
while ( my $infile2 = <$INFILE2> ){ 
  chomp $infile2;
  my @loopelements = split(/;/, $infile2);

  $loopelements[6] =~ y/,/./;
  $loopelements[7] =~ y/,/./;

  push @file2, \@loopelements;
}
close($INFILE2);

open my $INFILE1, '<', $inputfile1  or die "In use/Not found :$!";
open my $OUTFILE, '>', $outputfile  or die "In use/Not found :$!";

my $maxdist = 5000;
my $mindist = 0.0001;

while (my $infile1 = <$INFILE1> ){
  chomp $infile1;
  my @elements = split(";",$infile1);

  my $lat1 = $elements[6];
  my $lon1 = $elements[7];

  $lat1 =~ y/,/./;
  $lon1 =~ y/,/./;

  say "1. $lat1";
  say "2. $lon1";

  FILE2: foreach my $loopelements ( @file2 ){
    my ($lat2, $lon2) = @$loopelements[6, 7];

    say "3. $lat2";
    say "4. $lon2";

    my $distance = distance($lat1, $lon1 => $lat2, $lon2);      # Afstand berekenen tussen latlon1 and latlon2

    say "5. $distance";

    my $afstand = sprintf("%.4f",$distance);

    say "6. $afstand";

    if ($afstand < $maxdist && $elements[4] ne $$loopelements[4]){ 
      push (@elements, $afstand, $$loopelements[4], $$loopelements[5]);
      say "7. $afstand";
    } else {
      next FILE2;
    }
  }

  say $OUTFILE join(";",@elements);
}

close($INFILE1);
close($OUTFILE);

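Now, let's look at what I changed.
  • First, I added use feature qw(say) at the top. say is the same as print, but appends a newline. That saves some typing. See also feature for more information.
  • I removed the "\n" characters from all die statements. If you put a newline there, the line number is dropped from the output. If that was intentional, ignore this suggestion. Here is what perldoc says about it:

    If the last element of LIST does not end in a newline, the current script line number and input line number (if any) are also printed, and a newline is supplied.

  • The most important part is the change I made to the algorithm. I moved the while loop for the second file to the top of the program, outside the other while loop. The file is read into the array @file2. Each element is an array reference holding the fields of one line. The commas have already been changed to periods.

    I changed the s/// substitution operator to the y/// transliteration operator (a synonym for tr///). Since only a single character is being changed, that is sufficient. It is also faster. Even if you keep the regex substitution, you don't need the /g modifier: a floating point number has only one comma, so there is no need for multiple substitutions.

    All of this now happens only once for file 2. Compared with doing it 40k+ times, that saves quite a lot of computing time.

  • I changed the wording of the error messages to make them easier to understand. That is personal preference. You don't have to do it.

  • I changed the second while into a foreach loop that iterates over the elements of the new @file2 array. I did keep the $lat2 and $lon2 variables for clarity. You could leave them out and use the array (reference) elements directly. In the assignment, I used an array slice to fit it on one line.

  • Since $loopelements replaces @loopelements and is an array reference, we now need to access the data stored in it with $$loopelements[$index].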
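
Since the array references are the least familiar part of the list above, here is a small standalone sketch of the same idea. The names @rows, $first and so on are made up for this example; the two input lines are taken from the sample data.

```perl
use strict;
use warnings;

my @rows;
for my $line ('Europe;3;France;23;Parlan;11337;44,828976;2,172435',
              'Europe;3;France;23;Parnac;35670;46,4533;1,4425') {
    my @fields = split /;/, $line;
    tr/,/./ for @fields[6, 7];   # decimal comma -> decimal point
    push @rows, \@fields;        # store a reference to this row's array
}

my $first = $rows[0];            # $first is an array reference

my $name_a      = $$first[4];         # element access, as in the answer
my $name_b      = $first->[4];        # the same element, arrow notation
my ($lat, $lon) = @$first[6, 7];      # slice through the reference
my $field_count = scalar @$first;     # number of fields in the row

print "$name_a at $lat, $lon ($field_count fields)\n";
```

$$first[4] and $first->[4] are two spellings of the same access; which one to use is a matter of taste. Because @fields is declared inside the loop, each pass creates a fresh array, so every stored reference points at its own row.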

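I hope this helps you understand why I made certain improvements.

Keep in mind that in Perl there is more than one way to do it, and that is a good thing. There is rarely one "right way", but there are usually many ways to reach the goal. Some are more efficient, others are easier to maintain. The trick is to find the balance between the two.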


Update:

Here are the input files I used. You'll need them to compare results.

file1.csv:

Europe;3;France;23;Parijs;42545;48,856555;2,350976
Europe;3;France;23;Parisot;84459;44,264381;1,857827
Europe;3;France;23;Parlan;11337;44,828976;2,172435
Europe;3;France;23;Parnac;35670;46,4533;1,4425
Europe;3;France;23;Parnans;22065;45,1097;5,1456

file2.csv:

Europe;3;France;23;Parlan;11337;44,828976;2,172435
Europe;3;France;23;Parnac;35670;46,4533;1,4425
Europe;3;France;23;Parnans;22065;45,1097;5,1456
Europe;3;France;23;Parijs;42545;48,856555;2,350976
Europe;3;France;23;Parisot;84459;44,264381;1,857827
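
As a rough illustration of the in-memory approach using the sample rows above, the following sketch keeps all rows in one array, uses labelled loops, and emits only rows that have at least one match. Several things here are assumptions and not part of the answer's code: haversine_m is a hand-written helper standing in for GIS::Distance::Lite (so the sketch runs without the module, and its results may differ slightly), parse_rows and nearby_lines are hypothetical names, and the 100 km threshold is chosen only because none of the five sample towns lie within 5000 m of each other.

```perl
use strict;
use warnings;

# Hypothetical helper: great-circle distance in metres using the
# haversine formula on a sphere of radius 6,371,000 m.
sub haversine_m {
    my ($lat1, $lon1, $lat2, $lon2) = @_;
    my $rad  = atan2(1, 1) / 45;    # degrees -> radians (pi / 180)
    my $dlat = ($lat2 - $lat1) * $rad;
    my $dlon = ($lon2 - $lon1) * $rad;
    my $h = sin($dlat / 2) ** 2
          + cos($lat1 * $rad) * cos($lat2 * $rad) * sin($dlon / 2) ** 2;
    return 2 * 6_371_000 * atan2(sqrt($h), sqrt(1 - $h));
}

# Split each line into fields and convert the decimal commas.
sub parse_rows {
    my @rows;
    for my $line (@_) {
        my @fields = split /;/, $line;
        tr/,/./ for @fields[6, 7];
        push @rows, \@fields;
    }
    return @rows;
}

# Build output lines for rows with at least one neighbour closer than
# $maxdist metres; rows without any match are left out.
sub nearby_lines {
    my ($maxdist, @rows) = @_;
    my @out;
    ROW: for my $row (@rows) {
        my @matches;
        OTHER: for my $other (@rows) {
            next OTHER if $row->[5] == $other->[5];   # same destination ID
            my $dist = haversine_m(@$row[6, 7], @$other[6, 7]);
            next OTHER if $dist >= $maxdist;
            push @matches, $other->[4], $other->[5], sprintf('%.0f', $dist);
        }
        next ROW unless @matches;                     # nothing close enough
        push @out, join ';', @$row, @matches;
    }
    return @out;
}

my @rows = parse_rows(
    'Europe;3;France;23;Parijs;42545;48,856555;2,350976',
    'Europe;3;France;23;Parisot;84459;44,264381;1,857827',
    'Europe;3;France;23;Parlan;11337;44,828976;2,172435',
    'Europe;3;France;23;Parnac;35670;46,4533;1,4425',
    'Europe;3;France;23;Parnans;22065;45,1097;5,1456',
);

print "$_\n" for nearby_lines(100_000, @rows);
```

The labels ROW and OTHER are real Perl syntax: next OTHER jumps to the next pass of the loop carrying that label, which makes the target explicit when loops are nested. Comparing every row with every other row is still quadratic, so for about 45k lines this stays slow; it only removes the repeated file reading.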

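I'll add suggestions to this. - simbabque
I'm still fixing glitches. I'll get to it right away. Thanks @cjm. - simbabque
I'll add more later. - simbabque
Just read it once, store it in a data structure like I did in my edit, and iterate over that data structure in both the outer and the inner loop. Don't try to mess with the file handles. Keeping it in memory is more efficient. You shouldn't need seek anyway. It is for moving to a byte position in the file so you can read from there again. - simbabque
@simbabque, it finally works :) Thanks for all your help and patience ;) I started with my original script so that I could understand it (sort of) and then moved on to your version. Most of it was easy to follow, except the array references. I find those difficult. I think I need to study them more. Also, the "FILE2" in front of the foreach and after the next in the if: is that there for clarification, or is it actual code? I've never seen anything like it before. Edit: I forgot to ask whether there is a way to output only the lines that have results pushed onto the end? - Jan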