How to convert a Reader to an InputStream and a Writer to an OutputStream?

100

Is there an easy way to do this that avoids dealing with text encoding problems?

13 Answers

1
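Be careful when using WriterOutputStream: it does not always handle writing binary data to a file correctly, or in the same way a regular output stream would. I ran into this problem and it took me a while to track down the cause.
If you can, I'd recommend using an output stream as the base and, when you need to write strings, wrapping it in an OutputStreamWriter. Converting text to bytes is far more reliable than going the other way, which is probably why WriterOutputStream is not part of the Java standard library.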
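A minimal sketch of that recommendation, using only JDK classes. The class name `OutputStreamFirst`, the helper `writeMixed`, and the sample data are hypothetical and chosen purely for illustration; a `ByteArrayOutputStream` stands in for a file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class OutputStreamFirst {
    // Hypothetical helper: writes raw bytes, then text, to the same byte sink.
    static byte[] writeMixed(byte[] header, String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(header);                        // binary data goes straight to the stream
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(text);                       // text is encoded with an explicit charset
        writer.flush();                           // push the encoded bytes into the stream
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = writeMixed(new byte[] { (byte) 0xCA, (byte) 0xFE }, "h\u00e9llo");
        System.out.println(bytes.length);         // 2 header bytes + 6 bytes of UTF-8 text
    }
}
```

Because the stream stays the base object, binary data never passes through a character decoder; only the text takes the encoding step, and the charset is named explicitly.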

0
Here is the source code for a simple WriterOutputStream and ReaderInputStream that hard-code UTF-8 encoding. They tested fine.
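    // https://www.woolha.com/tutorials/deno-utf-8-encoding-decoding-examples
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.io.Reader;
    import java.io.Writer;
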
    public class WriterOutputStream extends OutputStream {
        final Writer    writer;

        int             count       = 0;
        int             codepoint   = 0;

        public WriterOutputStream(Writer writer) {
            this.writer = writer;
        }

        @Override
        public void write(int b) throws IOException {
            b &= 0xFF;
            switch (b >> 4) {
            case 0b0000:
            case 0b0001:
            case 0b0010:
            case 0b0011:
            case 0b0100:
            case 0b0101:
            case 0b0110:
            case 0b0111:
                count = 1;
                codepoint = b;
                break;

            case 0b1000:
            case 0b1001:
            case 0b1010:
            case 0b1011:
                codepoint <<= 6;
                codepoint |= b & 0b0011_1111;
                break;

            case 0b1100:
            case 0b1101:
                count = 2;
                codepoint = b & 0b0001_1111;
                break;

            case 0b1110:
                count = 3;
                codepoint = b & 0b0000_1111;
                break;

            case 0b1111:
                count = 4;
                codepoint = b & 0b0000_0111;
                break;
            }
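            if (--count == 0) {
                // writer.write(int) keeps only the low 16 bits, so code points above
                // 0xFFFF (4-byte sequences) are written as a surrogate pair instead.
                // Character.toChars throws IllegalArgumentException above 0x10FFFF.
                writer.write(Character.toChars(codepoint));
            }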
        }
    }

    public class ReaderInputStream extends InputStream {
        final Reader    reader;
        int             count   = 0;
        int             codepoint;

        public ReaderInputStream(Reader reader) {
            this.reader = reader;
        }

        @Override
        public int read() throws IOException {
            if (count-- > 0) {
                int r = codepoint >> (count * 6);
                r &= 0b0011_1111;
                r |= 0b1000_0000;
                return r;
            }

            codepoint = reader.read();
            if (codepoint < 0)
                return -1;
            if (codepoint > 0xFFFF)
                return 0;

            if (codepoint < 0x80)
                return codepoint;

            if (codepoint < 0x800) {
                count = 1;
                int v = (codepoint >> 6) | 0b1100_0000;
                return v;
            }
            count = 2;
            int v = (codepoint >> 12) | 0b1110_0000;
            return v;
        }
    }

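The test case verifies that each of the 65536 char values is encoded and decoded correctly, and that the result matches Java's own encoding. Surrogates (characters encoded as 2 chars) are skipped in that comparison, since Java handles them itself.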

    @Test
    public void testAll() throws IOException {
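        // An int counter with <= covers all 65536 values; a char counter could never exceed 0xFFFF.
        for (int i = 0; i <= 0xFFFF; i++) {
            CharArrayReader car = new CharArrayReader(new char[] { (char) i });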
            ReaderInputStream rtoi = new ReaderInputStream(car);
            byte[] data = IO.read(rtoi);

            CharArrayWriter caw = new CharArrayWriter();
            try (WriterOutputStream wtoo = new WriterOutputStream(caw)) {
                wtoo.write(data);
                char[] translated = caw.toCharArray();
                assertThat(translated.length).isEqualTo(1);
                assertThat((int) translated[0]).isEqualTo(i);

                if (!Character.isSurrogate((char) i)) {
                    try (InputStream stream = new ByteArrayInputStream(data)) {
                        caw = new CharArrayWriter();
                        IO.copy(data, caw);
                        translated = caw.toCharArray();
                        assertThat(translated.length).isEqualTo(1);
                        assertThat((int) translated[0]).isEqualTo(i);
                    }
                }
            }
        }
    }
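The reason surrogates are left out of the comparison with Java's encoder can be seen with the JDK alone. This is a small illustrative sketch (the class name `SurrogateDemo` and the helper `utf8Length` are hypothetical): Java's UTF-8 encoder combines a valid surrogate pair into one 4-byte sequence and substitutes a replacement byte for an unpaired surrogate, whereas the ReaderInputStream above encodes every char on its own as 3 bytes.

```java
import java.nio.charset.StandardCharsets;

public class SurrogateDemo {
    // Number of bytes Java's own UTF-8 encoder produces for a string.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String pair = "\uD83D\uDE00";   // U+1F600, stored as two chars in Java
        String lone = "\uD83D";         // an unpaired high surrogate
        System.out.println(utf8Length(pair));   // one 4-byte sequence
        System.out.println(utf8Length(lone));   // replaced by a single '?' byte
    }
}
```

So a per-char round trip and Java's encoder can only be expected to agree on non-surrogate chars.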


-1

Use what Java provides to read a string from a stream.

    InputStream s = new BufferedInputStream( new ReaderInputStream( new StringReader("a string")));

6
ReaderInputStream is a class in Apache Commons IO. - Will Beason
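If the text is already in memory as a String, the JDK alone is enough and no extra library is needed. A minimal sketch under that assumption (the class name `StringToStream` and the helpers `toStream` and `drain` are hypothetical, and UTF-8 is an arbitrary choice of charset):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StringToStream {
    // Encodes the text up front, then exposes the bytes as an InputStream.
    static InputStream toStream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    // Drains a stream into a byte array, one byte at a time for simplicity.
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = drain(toStream("a string"));
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

This holds the whole encoded text in memory, so it suits short strings rather than an arbitrary, possibly large Reader.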
