Byte order mark causes problems when reading a file in Java

Question

127

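I'm trying to read CSV files using Java. Some of the files may have a byte order mark at the beginning, but not all of them do. When the byte order mark is present, it gets read along with the rest of the first line, which causes problems with string comparisons.

Is there an easy way to skip the byte order mark when it is present?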

- Tom

Maybe you could try the following link for handling UTF-8 files with a BOM: http://www.rgagnon.com/javadetails/java-handle-utf8-file-with-bom.html - Chris

11 Answers

- user8514159 · Answer 1

Notepad++ is a good tool for converting a file from UTF-8 encoding to UTF-8 with BOM encoding.

https://notepad-plus-plus.org/downloads/
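
If Notepad++ is not at hand, a BOM-prefixed test file can also be produced from Java itself. This is only a minimal sketch: the class name BomFileWriter, the helper withBom, and the sample CSV content are made up for illustration, and the output path test.txt is chosen to match the tester below.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original answer.
class BomFileWriter {

    // The UTF-8 encoding of U+FEFF, the byte order mark.
    static final byte[] UTF8_BOM = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF };

    // Returns the BOM followed by the UTF-8 bytes of the given text.
    static byte[] withBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[UTF8_BOM.length + body.length];
        System.arraycopy(UTF8_BOM, 0, out, 0, UTF8_BOM.length);
        System.arraycopy(body, 0, out, UTF8_BOM.length, body.length);
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Sample content is an assumption; any CSV text works.
        Files.write(Paths.get("test.txt"), withBom("id,name\n1,Tom\n"));
    }
}
```

Dropping the UTF8_BOM prefix from the written bytes gives a plain UTF-8 file, so both cases from the question can be tested.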

UTF8BOMTester.java

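import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class UTF8BOMTester {

    public static void main(String[] args) throws IOException {
        File file = new File("test.txt");
        // True only when the file starts with the UTF-8 BOM (EF BB BF).
        boolean same = UTF8BOMInputStream.isSameEncodingType(file);
        System.out.println(same);
        if (same) {
            // The stream has already consumed the BOM, so readLine() starts at the real content.
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(new UTF8BOMInputStream(file), StandardCharsets.UTF_8))) {
                System.out.println(br.readLine());
            }
        }
    }

    // Debug helper: prints each byte in hexadecimal.
    static void bytesPrint(byte[] b) {
        for (byte a : b)
            System.out.printf("%x ", a);
    }
}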

UTF8BOMInputStream.java

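import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.Arrays;

public class UTF8BOMInputStream extends InputStream {

    // The three bytes that encode the byte order mark in UTF-8.
    static final byte[] SYMBLE_BOM = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF };
    private final PushbackInputStream in;
    final boolean isSameEncodingType;

    public UTF8BOMInputStream(File file) throws IOException {
        in = new PushbackInputStream(new FileInputStream(file), SYMBLE_BOM.length);
        byte[] symble = new byte[SYMBLE_BOM.length];
        // A single read() may return fewer bytes than requested, so loop until full or EOF.
        int n = 0;
        while (n < symble.length) {
            int r = in.read(symble, n, symble.length - n);
            if (r < 0)
                break;
            n += r;
        }
        isSameEncodingType = n == symble.length && isSameEncodingType(symble);
        // No BOM: push the bytes back so the stream still starts at the first byte of the file.
        if (!isSameEncodingType && n > 0)
            in.unread(symble, 0, n);
    }

    @Override
    public int read() throws IOException {
        return in.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return in.read(b, off, len);
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    static boolean isSameEncodingType(byte[] symble) {
        return Arrays.equals(symble, SYMBLE_BOM);
    }

    public static boolean isSameEncodingType(File file) throws IOException {
        // Close the probe stream so the file handle is not leaked.
        try (UTF8BOMInputStream probe = new UTF8BOMInputStream(file)) {
            return probe.isSameEncodingType;
        }
    }
}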
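
For the case in the question, where only some files carry a BOM, a lighter option than a custom stream may be to decode the file as UTF-8 and drop a leading U+FEFF from the first line. Java's UTF-8 decoder does not strip the BOM; it surfaces as the character U+FEFF at the start of the text. The sketch below relies on that; the class name BomStripper and the method stripBom are hypothetical names for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original answer.
class BomStripper {

    // Returns the line without a leading U+FEFF; any other input is returned unchanged.
    static String stripBom(String line) {
        if (line != null && !line.isEmpty() && line.charAt(0) == '\uFEFF') {
            return line.substring(1);
        }
        return line;
    }

    public static void main(String[] args) throws IOException {
        // "test.txt" is a placeholder path, as in the tester above.
        try (BufferedReader br = Files.newBufferedReader(Paths.get("test.txt"), StandardCharsets.UTF_8)) {
            // Only the first line of a file can carry the BOM.
            String first = stripBom(br.readLine());
            System.out.println(first);
        }
    }
}
```

Because stripBom leaves BOM-free input untouched, the same code path can be used for every file, with or without a BOM.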