

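I'm having problems reading (and possibly writing) files on a Linux system from Java. My application complains that it cannot read some audio files, and when I looked at the system I noticed that `ls -l` cannot read these files either. Every problem file contains accented characters such as é; files without such characters are fine.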

[root@N1-0247 Georges Bizet- Suites from Carmen & L'arlésienne]# pwd
/mnt/disk1/share/import/all/MusicUnmatched/WAV/Yan Pascal Tortelier/Georges Bizet- Suites from Carmen & L'arlésienne
[root@N1-0247 Georges Bizet- Suites from Carmen & L'arlésienne]# ls -l
ls: cannot access 20 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Farandole.WAV: No such file or directory
ls: cannot access 19 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Minuetto.WAV: No such file or directory
ls: cannot access 18 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Intermezzo.WAV: No such file or directory
ls: cannot access 17 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Pastorale.WAV: No such file or directory
ls: cannot access 16 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Carillon.WAV: No such file or directory
ls: cannot access 15 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Adagietto.WAV: No such file or directory
ls: cannot access 14 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Minuetto.WAV: No such file or directory
ls: cannot access 13 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Prélude.WAV: No such file or directory
ls: cannot access 08 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Chanson Du Toréador (Act II).WAV: No such file or directory
ls: cannot access 07 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Dans Bohème (Gypsy Song, Act II).WAV: No such file or directory
ls: cannot access 05 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Seguédille (Act I).WAV: No such file or directory
ls: cannot access 04 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Habeñera (Act I).WAV: No such file or directory
ls: cannot access 02 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Prélude (Prelude To Act I).WAV: No such file or directory
ls: cannot access 01 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Les Toréadors (Introduction To Act I).WAV: No such file or directory
total 192148
?????????? ? ?    ?           ?            ? 01 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Les Toréadors (Introduction To Act I).WAV
?????????? ? ?    ?           ?            ? 02 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Prélude (Prelude To Act I).WAV
-rw-rw-rw- 1 root root 36681194 Feb 21  2017 03 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- La Grade Montante (Street Urchins' Chorus, Act I).WAV
?????????? ? ?    ?           ?            ? 04 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Habeñera (Act I).WAV
?????????? ? ?    ?           ?            ? 05 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Seguédille (Act I).WAV
-rw-rw-rw- 1 root root 16455464 Feb 21  2017 06 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Les Dragons D'Alcala (Entr'acte, Act II).WAV
?????????? ? ?    ?           ?            ? 07 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Dans Bohème (Gypsy Song, Act II).WAV
?????????? ? ?    ?           ?            ? 08 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Chanson Du Toréador (Act II).WAV
-rw-rw-rw- 1 root root 27743402 Feb 21  2017 09 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Intermezzo (Entr'acte, Act III).WAV
-rw-rw-rw- 1 root root 39886886 Feb 21  2017 10 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Marche Des Contrebandiers (Introduction To Act III).WAV
-rw-rw-rw- 1 root root 52822606 Feb 21  2017 11 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Nocturne (Micaela's Aria, Act III).WAV
-rw-rw-rw- 1 root root 23100378 Feb 21  2017 12 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Argonaise (Entr'acte, Act IV).WAV
?????????? ? ?    ?           ?            ? 13 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Prélude.WAV
?????????? ? ?    ?           ?            ? 14 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Minuetto.WAV
?????????? ? ?    ?           ?            ? 15 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Adagietto.WAV
?????????? ? ?    ?           ?            ? 16 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Carillon.WAV
?????????? ? ?    ?           ?            ? 17 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Pastorale.WAV
?????????? ? ?    ?           ?            ? 18 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Intermezzo.WAV
?????????? ? ?    ?           ?            ? 19 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Minuetto.WAV
?????????? ? ?    ?           ?            ? 20 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Farandole.WAV

    export LANG=en_US.UTF-8
    export LC_ALL=en_US.UTF-8
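These `export` lines only affect a Java process started from that shell after they are set. The JVM reads the locale once at startup and, on OpenJDK, uses it to choose the encoding for converting file names to and from bytes. Here is a minimal check, assuming an OpenJDK-based JVM, that prints what the JVM picked up. `sun.jnu.encoding` is an internal OpenJDK property rather than a documented API, and the class name is hypothetical. If it reports something other than UTF-8 (for example `ANSI_X3.4-1968`), names containing é may fail to convert.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic: print the locale variables and the encodings
// the JVM derived from them at startup.
class JvmEncodingCheck {
    static String fileNameEncoding() {
        // Internal OpenJDK property (not a public API) used when turning
        // file names into bytes and back.
        return System.getProperty("sun.jnu.encoding");
    }

    public static void main(String[] args) {
        System.out.println("LANG             = " + System.getenv("LANG"));
        System.out.println("LC_ALL           = " + System.getenv("LC_ALL"));
        System.out.println("sun.jnu.encoding = " + fileNameEncoding());
        System.out.println("default charset  = " + Charset.defaultCharset());
    }
}
```

Run it with the same user, service wrapper and environment as the real application, since a service manager may not inherit the shell's `export`s.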

I have not seen this problem on other Linux systems, or on Windows, macOS, etc.

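What file system are the files stored on (ext4, FAT, etc.)? - Mikel Rychliski
Where is it mounted from? Which file system is it using? - Tarun Lalwani
The file system is xfs. It is a local file system on the Linux box, and the Java application runs directly on that box. - Paul Taylor
Could you list the files with the following command, to help us see which (octal) bytes are used in (one of) the problematic file names? `LC_ALL=C ls` For example, on my system it prints `'test1__'$'\303\251''__.txt'` instead of `test1__é__.txt`. - JohannesB
The `LC_ALL=C ls` output is `20 - L' Arl??sienne, suite for orchestra No. 1, from the incidental music- Farandole.WAV` instead of `20 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Farandole.WAV`. - Paul Taylor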
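The octal escapes that `LC_ALL=C ls` prints are the raw bytes of the name. The sketch below (hypothetical class name) encodes a name as UTF-8 and writes every non-ASCII byte as a `\ooo` escape, showing the same byte values without reproducing `ls`'s exact quoting. The two `?` in `Arl??sienne` fit the same picture: `ls` appears to print one `?` per non-printable byte, and é takes two bytes in UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: show the UTF-8 bytes of a file name, with every
// non-ASCII byte written as a backslash plus three octal digits.
class OctalName {
    static String escape(String name) {
        StringBuilder sb = new StringBuilder();
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned 0..255
            if (v < 0x80) {
                sb.append((char) v); // plain ASCII byte, printed as-is
            } else {
                sb.append(String.format("\\%03o", v)); // e.g. 0xC3 -> \303
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("test1__\u00e9__.txt")); // test1__\303\251__.txt
    }
}
```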




I tried copying files with `java.nio.file.Files` in Java 11 on Debian 10, with XFS and bash (for `ls`), but could not reproduce the problem with files whose names contain é.
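A reproduction attempt along those lines might look like the sketch below. The file names and class name are hypothetical, and it assumes the JVM runs with a UTF-8 locale. Otherwise `dir.resolve(...)` can throw `InvalidPathException` before anything reaches the disk.

```java
import java.io.IOException;
import java.nio.file.*;

// Hypothetical reproduction: copy a file to a name containing é with
// java.nio.file.Files, then check that the copy can be stat'ed and read back.
class CopyRepro {
    static boolean copyAndCheck(Path dir) throws IOException {
        Path src = Files.write(dir.resolve("13 - Prelude.WAV"), new byte[] {1, 2, 3});
        Path dst = dir.resolve("13 - Pr\u00e9lude.WAV"); // "Prélude.WAV"
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        // Roughly what `ls -l` needs: the entry has a size and its data is readable
        return Files.size(dst) == 3 && Files.readAllBytes(dst).length == 3;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("copy-repro");
        System.out.println("copy readable: " + copyAndCheck(dir));
    }
}
```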




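Actually, I'm not using `Files` everywhere. I'm doing the renaming with Apache Commons. Could that be the problem? - Paul Taylor
Not sure yet. Please answer my comment and run the following command as a test: `LC_ALL=C ls`, so we can get closer to reproducing the underlying problem. - JohannesB
I've tested some new files on the same machine where the problem occurred, but I can't make it happen again. I had my application take file names that can be handled as ASCII and then rename them to names that require UTF-8, and they could still be accessed afterwards. But because the machine is a black-box appliance and my application is the only one on it that renames files, I'm worried that my application caused the problem. - Paul Taylor
It's crazy trying to fix a problem you can't reproduce, but I've learned a lot about Unicode over the past day. You can't win them all :-) - JohannesB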
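The rename test described in the comments can be sketched with the JDK's `Files.move` in place of Apache Commons. This is a substitution, since the application's actual code isn't shown. The file names and class name are hypothetical, and the sketch assumes the JVM runs with a UTF-8 locale.

```java
import java.io.IOException;
import java.nio.file.*;

// Hypothetical rename test: create a file under an ASCII-only name, rename it
// to a name that needs UTF-8, then check it can still be found and read.
class RenameRepro {
    static boolean renameAndCheck(Path dir) throws IOException {
        Path ascii = Files.write(dir.resolve("04 - Habanera.WAV"), new byte[] {42});
        Path utf8 = dir.resolve("04 - Habe\u00f1era.WAV"); // "Habeñera"
        // Within one directory this is a plain rename; ATOMIC_MOVE makes that explicit
        Files.move(ascii, utf8, StandardCopyOption.ATOMIC_MOVE);
        return Files.notExists(ascii)
                && Files.isReadable(utf8)
                && Files.readAllBytes(utf8)[0] == 42;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rename-repro");
        System.out.println("still accessible: " + renameAndCheck(dir));
    }
}
```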


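Your file system is corrupted. This is not an application-level problem. What is stored on the physical disk is invalid when the file system driver translates it into file names and data. Check which device holds your file system: the `mount` command shows which device is mounted on which directory, and it will probably be something like /dev/sda1. Remount it read-only (which can be tricky if this is your root file system) and run `fsck /dev/sda1` (or whatever your device is) to repair it. Note that for XFS, `fsck` is effectively a no-op. The XFS repair tool is `xfs_repair`, which is normally run on the unmounted device. There is no 100% guarantee that you can recover those files.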
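Before taking the file system offline, it can help to list exactly which entries are affected. In the `ls -l` output above, they are names that the directory listing returns but that cannot then be stat'ed (`??????????`, "No such file or directory"). This sketch does the same check from Java; the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: list a directory, try to stat every entry, and
// collect the names that appear in the listing but cannot be stat'ed.
class BrokenEntryScan {
    static List<String> unreadableEntries(Path dir) throws IOException {
        List<String> broken = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                try {
                    Files.readAttributes(p, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
                } catch (IOException e) {
                    // e.g. NoSuchFileException for an entry like the ones ls -l shows as ??????????
                    broken.add(p.getFileName() + " -> " + e);
                }
            }
        }
        return broken;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> broken = unreadableEntries(dir);
        broken.forEach(System.out::println);
        System.out.println(broken.size() + " entries could not be stat'ed");
    }
}
```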





-rw-rw-rw- 1 root root 52822606 Feb 21  2017 11 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Nocturne (Micaela's Aria, Act III).WAV

So the problem is the Unicode character é.

This is the character at https://www.compart.com/en/unicode/U+00E9, so it consists of a null byte followed by E9.
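Which bytes é turns into depends on the encoding. The sketch below (hypothetical class name) prints both. In UTF-16 (big-endian) it is `00 E9`, the null byte described here. In UTF-8, which is what a UTF-8 locale uses for Linux file names, it is `C3 A9`, matching the `\303\251` that `LC_ALL=C ls` showed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: show how U+00E9 (é) is encoded in UTF-16 versus UTF-8.
class EAcuteBytes {
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String e = "\u00e9"; // é
        System.out.println("UTF-16BE: " + hex(e, StandardCharsets.UTF_16BE)); // 00 E9
        System.out.println("UTF-8:    " + hex(e, StandardCharsets.UTF_8));    // C3 A9
    }
}
```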

The problem is that POSIX file systems like xfs do not allow null bytes in file names (see "What are all the illegal characters in the XFS filesystem?").
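The Java side of this rule can be checked directly. This minimal sketch (hypothetical class name) asks the JDK to build a `Path` containing a NUL character. On OpenJDK this is expected to throw `InvalidPathException`, so a Java program should not be able to pass such a name to xfs at all.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;

// Hypothetical check: will the JDK build a Path whose name contains a NUL character?
class NulInPath {
    static boolean acceptsNul() {
        try {
            Paths.get("20 - Farandole\0.WAV"); // name with an embedded NUL
            return true;
        } catch (InvalidPathException e) {
            // On Linux the JDK reports a reason along the lines of "Nul character not allowed"
            System.out.println("Rejected: " + e.getReason());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("NUL accepted in a path: " + acceptsNul());
    }
}
```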



For example, this page lists file systems and indicates which ones allow Unicode in their file names:


(Incidentally, Apple's HFS+ is on that list, but interestingly it has been replaced by the Apple File System, APFS, which does not allow Unicode in file names - https://developer.apple.com/library/archive/documentation/FileManagement/Conceptual/APFS_Guide/FAQ/FAQ.html)

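As a workaround, you could replace the character in the file name:

    // Replace é with a plain e; replace() matches literally, unlike the regex-based replaceAll()
    String safeFilename = filename.replace("é", "e");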

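Or, with a Unicode escape so the result does not depend on the encoding of the Java source file:

    // U+00E9 is é; the compiler decodes the escape
    String safeFilename = filename.replace("\u00e9", "e");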
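Replacing é alone doesn't cover the other affected names in the listing, which also contain è (`Bohème`) and ñ (`Habeñera`). A broader variant, which is a different technique from the answer's, uses `java.text.Normalizer`. It decomposes each accented letter into its base letter plus a combining mark, then drops the marks. The class name is hypothetical. The mapping is lossy, so two different names can end up with the same ASCII name.

```java
import java.text.Normalizer;

// Hypothetical helper: strip accents by decomposing to NFD
// (e.g. é -> e + U+0301) and removing all combining marks (\p{M}).
class AsciiFileName {
    static String stripAccents(String filename) {
        String decomposed = Normalizer.normalize(filename, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // "04 - Habeñera (Act I).WAV"
        System.out.println(stripAccents("04 - Habe\u00f1era (Act I).WAV")); // 04 - Habenera (Act I).WAV
    }
}
```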

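Hi, that's interesting, but U+00E9 is the UTF-16 value. Shouldn't it be written to the file name as its UTF-8 value (i.e. 0xC3 0xA9)? - Paul Taylor
Besides, the file system would not let a Java program put a null in a path name. At that level it is (or should be) encoding-agnostic. - Stephen C
Awarding you the bounty because you provided the information that POSIX file systems such as xfs don't allow null bytes. Although I haven't solved the problem yet, I think it may have been caused by file system corruption or by something going wrong when the files were copied onto the system. - Paul Taylor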

