以下的 'RenameFilesWithAccentedAndDiacriticalLatinChars.pl' PERL 脚本可以重命名带有重音和变音符号的拉丁字符文件:
- 此 PERL 脚本从给定参数的文件夹开始,否则从当前文件夹开始。
- 它递归搜索 CP 1250、CP 1252、CP 1254 和 CP 1257 中属于 80 - FF 范围内的字符(大多是带重音的拉丁字符)或带有变音符号的拉丁字符的文件。
- 它通过仅从拉丁字符中删除重音和变音符号来计算新文件名(例如,Été --> Ete)。
- 它显示所有建议的重命名和可能的冲突,并要求用户全局批准。
- 如果用户已经批准,它将重命名所有没有冲突的文件。
选项 '--batch' 可以避免交互式问题。请谨慎使用。
选项 '--' 可以避免下一个参数被解释为选项。
特别警告:
- 此脚本最初是以UTF-8编码的,应该保持不变。
- 此脚本可能会重命名许多文件。
- 文件名理论上只使用UTF-8编码。但是有些文件名可能包含一些具有传统编码的字符。
- 作者已经努力进行了一致性检查、鲁棒性、冲突检测和适当编码的使用。因此,此脚本应该只通过删除拉丁字符的重音和变音符号来重命名文件。
- 但是,此脚本仅在有限数量的操作系统(Windows、Mac OS X、Linux)和有限数量的终端编码(CP 850、ISO-8859-1、UTF-8)下进行了测试。
- 因此,在奇怪的情况下,此脚本可能会将许多文件重命名为随机名称。
- 因此,应谨慎使用此脚本,并极其小心地修改它(注意内部字符串、输入、输出和命令的编码)。
use 5.008_000;
use warnings;
use strict;
use Encode;
$| = 1;
sub ucRemoveEolUnderscoreDash
{
local $_ = uc($_[0]);
chomp;
tr/_\-//d;
$_;
}
my $Encoding_Western = 'ISO-8859-1';
my $Encoding_Central = 'ISO-8859-2';
my $Encoding_Baltic = 'ISO-8859-4';
my $Encoding_Turkish = 'ISO-8859-9';
my $Encoding_W_Euro = 'ISO-8859-15';
my $Code_Page_OldWest = 850;
my $Code_Page_Central = 1250;
my $Code_Page_Western = 1252;
my $Code_Page_Turkish = 1254;
my $Code_Page_Baltic = 1257;
my $Code_Page_UTF8 = 65001;
my $HighBitSetChars = pack('C*', 0x80..0xFF);
my %SuperEncodings =
( &ucRemoveEolUnderscoreDash($Encoding_Western), 'cp'.$Code_Page_Western,
&ucRemoveEolUnderscoreDash($Encoding_Central), 'cp'.$Code_Page_Central,
&ucRemoveEolUnderscoreDash($Encoding_Baltic), 'cp'.$Code_Page_Baltic,
&ucRemoveEolUnderscoreDash($Encoding_Turkish), 'cp'.$Code_Page_Turkish,
&ucRemoveEolUnderscoreDash($Encoding_W_Euro), 'cp'.$Code_Page_Western,
&ucRemoveEolUnderscoreDash('cp'.$Code_Page_OldWest),
'cp'.$Code_Page_Western );
my %EncodingNames = ( 'cp'.$Code_Page_Central, 'Central European',
'cp'.$Code_Page_Western, 'Western European',
'cp'.$Code_Page_Turkish, ' Turkish ',
'cp'.$Code_Page_Baltic, ' Baltic ' );
my %NonAccenChars = (
'cp'.$Code_Page_Central,
'E_,_,.++_%S_STZZ_````.--_Ts_stzz'.
'_``LoAlS`CS_--RZ`+,l`uP.,as_L~lz'.
'RAAAALCCCEEEEIIDDNNOOOOxRUUUUYTS'.
'raaaalccceeeeiiddnnoooo%ruuuuyt`',
'cp'.$Code_Page_Western,
'E_,f,.++^%S_O_Z__````.--~Ts_o_zY'.
'_!cLoYlS`Ca_--R-`+23`uP.,10_qh3_'.
'AAAAAAACEEEEIIIIDNOOOOOxOUUUUYTS'.
'aaaaaaaceeeeiiiidnooooo%ouuuuyty',
'cp'.$Code_Page_Turkish,
'E_,f,.++^%S_O____````.--~Ts_o__Y'.
'_!cLoYlS`Ca_--R-`+23`uP.,10_qh3_'.
'AAAAAAACEEEEIIIIGNOOOOOxOUUUUISS'.
'aaaaaaaceeeeiiiignooooo%ouuuuisy',
'cp'.$Code_Page_Baltic,
'E_,_,.++_%___``,_````.--_T___-,_'.
'__cLo_lSOCR_--RA`+23`uP.o1r_qh3a'.
'AIACAAEECEZEGKILSNNOOOOxULSUUZZS'.
'aiacaaeecezegkilsnnoooo%ulsuuzz`' );
my %AccentedChars;
my $AccentedChars = '';
my $NonAccenChars = '';
for ( $Code_Page_Central, $Code_Page_Western,
$Code_Page_Turkish, $Code_Page_Baltic )
{
$AccentedChars{'cp'.$_} = decode('cp'.$_, $HighBitSetChars);
$AccentedChars .= $AccentedChars{'cp'.$_};
$NonAccenChars .= $NonAccenChars{'cp'.$_};
}
my $QuotedMetaNonAccenChars = quotemeta($NonAccenChars);
my $DiacriticalChars = '';
for ( 0x0300..0x036F, 0x1DC0..0x1DFF )
{ $DiacriticalChars .= chr($_) }
my $b_Help = 0;
my $b_Interactive = 1;
my $b_UTF8 = 0;
my $b_Parameter = 0;
my $Folder;
for ( @ARGV )
{
if ( lc($_) eq '--' )
{ $b_Parameter = 1 }
elsif ( (not $b_Parameter) and (lc($_) eq '--batch') )
{ $b_Interactive = 0 }
elsif ( (not $b_Parameter) and (lc($_) eq '--utf8') )
{ $b_UTF8 = 1 }
elsif ( $b_Parameter or (substr($_, 0, 1) ne '-') )
{
if ( defined($Folder) )
{ die "$0 accepts only 1 parameter\n" }
else
{ $Folder = $_ }
}
else
{ $b_Help = 1 }
}
if ( $b_Help )
{
die << "END_OF_HELP"
$0 [--help] [--batch] [--] [folder]
This script renames files with accented and diacritical Latin characters :
- This PERL script starts from the folder given in parameter, or else from
the current folder.
- It recursively searches for files with characters belonging to 80 - FF of
CP 1250, CP 1252, CP 1254 and CP 1257 (mostly accented Latin characters)
or Latin characters having diacritical marks.
- It calculates new file names by removing the accents and diacritical marks
only from Latin characters (For example, Été --> Ete).
- It displays all proposed renaming and perhaps conflicts, and asks the user
for global approval.
- If the user has approved, it renames all files having no conflict.
Option '--batch' avoids interactive questions. Use with care.
Option '--' avoids the next parameter to be interpreted as option.
SPECIAL WARNING :
- This script was originally encoded in UTF-8, and should stay so.
- This script may rename a lot of files.
- Files names are theoretically all encoded only with UTF-8. But some file
names may be found to contain also some characters having legacy encoding.
- The author has applied efforts for consistency checks, robustness, conflict
detection and use of appropriate encoding.
So this script should only rename files by removing accents and diacritical
marks from Latin characters.
- But this script has been tested only under a limited number of OS
(Windows, Mac OS X, Linux) and a limited number of terminal encodings
(CP 850, ISO-8859-1, UTF-8).
- So, under weird circumstances, this script could rename many files with
random names.
- Therefore, this script should be used with care, and modified with extreme
care (beware encoding of internal strings, inputs, outputs and commands)
END_OF_HELP
}
if ( defined($Folder) )
{ chdir($Folder) or die "Can NOT set '$Folder' as current folder\n" }
utf8::decode($AccentedChars);
$_ = $AccentedChars;
eval "tr/$AccentedChars/$QuotedMetaNonAccenChars/";
if ( $@ ) { warn $@ }
if ( $@ or ($_ ne $NonAccenChars) )
{ die "$0: Consistency check on 'tr' FAILED :\n\n",
"Translated Accented Chars : ", length($_), ' : ', $_, "\n\n",
" Non Accented Chars : ", length($NonAccenChars), ' : ',
$NonAccenChars, "\n" }
my $b_Windows = ( defined($ENV{'OS'}) and ($ENV{'OS'} eq 'Windows_NT') );
my ($Q, $sep, $sep2, $HOME, $Find, @List, $cwd, @Move);
if ( $b_Windows )
{
$Q = '"';
$sep = '\\';
$sep2 = '\\\\';
$HOME = $ENV{'USERPROFILE'};
$Find = 'dir /b /s';
@List = ( ( (`ver 2>&1` =~ m/version\s+([0-9]+)/i) and ($1 >= 6) ) ?
('icacls') :
( 'cacls') );
$cwd = `cd`; chomp $cwd; $cwd = quotemeta($cwd);
@Move = ('move');
}
else
{
$Q = "'";
$sep = '/';
$sep2 = '/';
$HOME = $ENV{'HOME'};
$Find = 'find .';
@List = ('ls', '-d', '--');
@Move = ('mv', '--');
if ( -w '/bin' ) { die "$0: For safety reasons, ",
"usage is BLOCKED to administrators.\n"}
}
my $Encoding;
my $ucEncoding;
my $InputPipe = '-|';
my $Code_Page_Original;
my $Code_Page_Active;
if ( $b_Windows )
{
$_ = `chcp`;
m/([0-9]+)$/ or die "Non numeric Windows code page : ", $_;
$Code_Page_Original = $1;
print 'Windows Original Code Page = ', $Code_Page_Original,
( $Code_Page_Original == $Code_Page_UTF8 ?
' = UTF-8, display is perhaps correct with a true type font.' :
'' ), "\n\n";
$Code_Page_Active = $Code_Page_Original ;
$Encoding = ( $Code_Page_Active == $Code_Page_UTF8 ?
'utf8' :
'cp'.$Code_Page_Active ) ;
$InputPipe .= ":encoding($Encoding)";
print "InputPipe = '$InputPipe'\n\n";
if ( $Code_Page_Original != $Code_Page_UTF8 )
{
no warnings 'unopened';
@_ = stat(STDOUT);
use warnings;
if ( scalar(@_) and ($_[0] == 1) )
{ binmode(STDOUT, ":encoding(cp$Code_Page_Original)") }
else
{ binmode(STDOUT, ":encoding($Encoding)") }
}
}
elsif ( defined($ENV{'LANG'}) and ($ENV{'LANG'} =~ m/\.([^\@]+)$/i) )
{
$Encoding = $1;
my $Kernel = `uname -s`;
chomp $Kernel;
my $ucEncoding = &ucRemoveEolUnderscoreDash($Encoding);
if ( (lc($Kernel) ne 'darwin') and not grep {$_ eq $ucEncoding}
( map { ($_, &ucRemoveEolUnderscoreDash($_)) }
`locale -m` ) )
{ die "Encoding = '$Encoding' or '$ucEncoding' NOT supported ".
"by the OS\n" }
my $ucLocale = &ucRemoveEolUnderscoreDash($ENV{'LANG'});
if ( not grep {$_ eq $ucLocale}
( map { ($_, &ucRemoveEolUnderscoreDash($_)) }
`locale -a` ) )
{ die "Locale = '$ENV{LANG}' or '$ucLocale' NOT supported ".
"by the OS\n" }
if ( not defined(Encode::find_encoding($Encoding)) )
{ die "Encoding = '$Encoding' or '$ucEncoding' NOT supported ".
"by PERL\n" }
print "Encoding = '$Encoding' is supported by the OS and PERL\n\n";
binmode(STDOUT, ":encoding($Encoding)");
}
undef $_;
if ( defined($Encoding) )
{
$ucEncoding = &ucRemoveEolUnderscoreDash($Encoding);
if ( defined($SuperEncodings{$ucEncoding}) )
{ $_ = substr($AccentedChars{$SuperEncodings{$ucEncoding}},
0x20, 0x60) }
elsif ( defined($AccentedChars{$Encoding}) )
{ $_ = $AccentedChars{$Encoding} }
elsif ( $Encoding =~ m/^utf-?8$/i )
{ $_ = $AccentedChars }
}
if ( not defined($_) )
{ $_ = decode('cp'.$Code_Page_Central,
pack('C*', 0xC9, 0xD3, 0xD7, 0xDC,
0xE9, 0xF3, 0xF7, 0xFC)) }
utf8::decode($_);
my @EchoCommand = ( $b_Windows ?
"echo $_" :
('echo', $_) );
open(ECHO, $InputPipe, @EchoCommand) or die 'echo $_: ', $!;
my $Output = join('', <ECHO>);
close(ECHO);
chomp $Output;
utf8::decode($Output);
if ( $Output ne $_ )
{
warn "$0: Consistency check between parameter ",
"of 'echo' and output of 'echo' FAILED :\n\n",
"Parameter of 'echo' : ", length($_), ' : ', $_, "\n\n",
" Output of 'echo' : ", length($Output), ' : ', $Output, "\n";
exit 1;
}
if ( defined($Encoding) )
{
undef $_;
$ucEncoding = &ucRemoveEolUnderscoreDash($Encoding);
if ( defined($SuperEncodings{$ucEncoding}) )
{
$_ = $SuperEncodings{$ucEncoding};
print "--------- $EncodingNames{$_} ---------\n",
' ', substr($AccentedChars{$_}, 0x20, 0x20), "\n",
'--> ', substr($NonAccenChars{$_}, 0x20, 0x20), "\n\n",
' ', substr($AccentedChars{$_}, 0x40, 0x20), "\n",
'--> ', substr($NonAccenChars{$_}, 0x40, 0x20), "\n\n",
' ', substr($AccentedChars{$_}, 0x60, 0x20), "\n",
'--> ', substr($NonAccenChars{$_}, 0x60, 0x20), "\n\n" }
else
{
for ( 'cp'.$Code_Page_Central, 'cp'.$Code_Page_Western,
'cp'.$Code_Page_Turkish, 'cp'.$Code_Page_Baltic )
{
if ( ('cp'.$Encoding eq $_) or ($Encoding =~ m/^utf-?8$/i) )
{ print "--------- $EncodingNames{$_} ---------\n",
' ', substr($AccentedChars{$_}, 0, 0x20), "\n",
'--> ', substr($NonAccenChars{$_}, 0, 0x20), "\n\n",
' ', substr($AccentedChars{$_}, 0x20, 0x20), "\n",
'--> ', substr($NonAccenChars{$_}, 0x20, 0x20), "\n\n",
' ', substr($AccentedChars{$_}, 0x40, 0x20), "\n",
'--> ', substr($NonAccenChars{$_}, 0x40, 0x20), "\n\n",
' ', substr($AccentedChars{$_}, 0x60, 0x20), "\n",
'--> ', substr($NonAccenChars{$_}, 0x60, 0x20), "\n\n" }
}
}
}
my $UnisonFile = $HOME.$sep.'.unison'.$sep.'common.unison';
my @Ignores;
if ( open(UnisonFile, '<', $UnisonFile) )
{
print "\nUnison File '", $UnisonFile, "'\n";
while ( <UnisonFile> )
{
if ( m/^\s*ignore\s*=\s*Name\s*(.+)/ )
{
$_ = $1 ;
if ( m/[$AccentedChars]/ )
{ push(@Ignores, $_) }
}
}
close(UnisonFile);
}
print map(" Ignore: ".$_."\n", @Ignores);
sub OutputAndErrorFromCommand
{
local $_;
my @Command = @_;
if ( defined($ENV{'OS'}) and ($ENV{'OS'} eq 'Windows_NT') )
{
for ( @Command )
{ s/^((-|.*(\s|')).*)$/$Q$1$Q/ }
my $Command = join(' ', @Command);
open(COMMAND, $InputPipe, "$Command 2>&1") or die '$Command: ', $!;
}
else
{
my $pid = open(COMMAND, $InputPipe);
defined($pid) or die "Can't fork: $!";
if ( $pid == 0 )
{
open STDERR, '>&=STDOUT';
exec @Command;
die "Can't @Command";
}
}
$_ = join('', <COMMAND>);
close COMMAND;
chomp;
utf8::decode($_);
$_;
}
my %Olds;
my $Old;
my $Dir;
my $Command;
my $ErrorMessage;
my $New;
my %News;
print "\n\nFiles with accented name and the corresponding non-accented name ",
":\n";
open(FIND, $InputPipe, $Find) or die $Find, ': ', $!;
FILE:
while ( <FIND> )
{
chomp;
utf8::decode($_);
if ( $b_Windows )
{ s/^$cwd$sep2// }
else
{ s/^\.$sep2// }
push(@{$Olds{$_}}, $_);
if ( m/([^$sep2]+)$/ and
($1 =~ m/[$AccentedChars]|([\ -\~][$DiacriticalChars])/) )
{
if ( $b_Windows and m/$Q/ )
{
print "\n $Q$_$Q\n*** contains quotes.\n";
next;
}
for my $Ignore ( @Ignores )
{
if ( m/$Ignore$/ )
{ next FILE }
}
$Old = $_ ;
m/^(.*$sep2)?([^$sep2]+)$/;
$Dir = ( defined($1) ? $1 : '');
$_ = $2;
print "\n $Q$Old$Q\n";
$ErrorMessage = &OutputAndErrorFromCommand(@List, $Old);
if ( $? != 0 )
{ print "*** $ErrorMessage\n" }
else
{
eval "tr/$AccentedChars/$QuotedMetaNonAccenChars/";
s/([\ -\~])[$DiacriticalChars]+/$1/g;
if ( $@ )
{ warn $@ }
else
{
$New = $Dir.$_;
if ( $b_Windows or (not utf8::is_utf8($Dir)) )
{ utf8::decode($New) }
$News{$Old} = $New;
push(@{$Olds{$New}}, $Old);
}
print "--> $Q$Dir$_$Q\n";
}
}
}
close(FIND);
my $b_NoDuplicate = 1;
for my $New ( sort keys %Olds )
{
if ( scalar(@{$Olds{$New}}) > 1 )
{
if ( $b_NoDuplicate )
{
print "\n\nFollowing files would have same non-accented name ",
":\n";
$b_NoDuplicate = 0;
}
print "\n", map(' '.$_."\n", @{$Olds{$New}}), '--> ', $New, "\n";
for ( @{$Olds{$New}} )
{ delete $News{$_} };
}
}
my $Number = scalar(keys %News);
print "\n\n";
if ( $Number < 1 )
{
print "There are NO file to rename\n";
exit;
}
if ( $b_Interactive )
{
print "In order to really rename the ", $Number,
" files which can safely be renamed, type 'rename' : ";
$_ = <STDIN>;
sleep 1;
if ( not m/^rename$/i )
{ exit 1 }
}
else
{ print $Number, " files will be renamed\n\n" }
$Number = 0;
my $Move = join(' ', @Move);
for ( sort {length($b) <=> length($a)} keys %News )
{
$ErrorMessage = &OutputAndErrorFromCommand(@Move, $_, $News{$_});
if ( $? == 0 )
{ $Number++ }
else
{ print "\n$Move $Q$_$Q\n", (' ' x length($Move)),
" $Q$News{$_}$Q\n", ('*' x length($Move)), " $ErrorMessage\n" }
}
print "\n$Number files have been successfully renamed\n";
__END__