Java clipboard: pasting HTML from Firefox on Linux

14

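I'm running into a strange problem when copying HTML from Firefox into a Java 6 application on Linux (the problem occurs on Linux only). Here is a minimal example: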

import java.awt.Toolkit;
import java.awt.datatransfer.Clipboard;
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.Transferable;
import java.io.Reader;
import java.nio.ByteBuffer;

class ClipboardPrinter {
    public static void main( String args[] ) throws Exception
    {
        Clipboard systemClipboard = Toolkit.getDefaultToolkit()
                .getSystemClipboard();
        Transferable transferData = systemClipboard.getContents(null);
        if (transferData == null) {
            System.out.println("no content");
            return;
        }

//      final DataFlavor htmlFlavorString = new DataFlavor("text/html;class=java.lang.String");
//      String html = (String)transferData.getTransferData(htmlFlavorString);
//      System.out.println("html = '" + html + "'");

        final DataFlavor htmlFlavor = new DataFlavor("text/html;class=java.nio.ByteBuffer;charset=US-ASCII");
        if (!transferData.isDataFlavorSupported(htmlFlavor)) {
            System.out.println("no text/html reader content");
            return;
        }

        ByteBuffer bb = (ByteBuffer)transferData.getTransferData(htmlFlavor);
        byte[] bytes = bb.array();
        for (byte b: bytes)
        {
            System.out.format("%02x", b);
        }
        System.out.println();
        final int cutoff = 2;
        byte[] bytes2 = new byte[bytes.length - cutoff];
        for (int i = cutoff; i < bytes.length; i++)
            bytes2[i-cutoff] = bytes[i];
        final String htmlContent = new String(bytes2, "UTF-16LE");


        System.out.println("htmlContent = '" + htmlContent + "'");
    }
}

First I tried new DataFlavor("text/html;class=java.lang.String"), but that yields an unusable string that starts with two characters of value 65533 (removing those two characters does not help either).

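Next I used a ByteBuffer data flavor with charset=US-ASCII (ASCII on purpose!): charset=UTF-16LE (or UTF-16, or UTF-16BE) does not work at all. With the charset=US-ASCII approach above (together with new String(bytes2, "UTF-16LE")), 7-bit characters work, but umlauts, for example, do not: a "?" is printed instead.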

I cut off two bytes because there seems to be a two-byte BOM at the start (I'm not sure; it could be something else).

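I get similar results with charset=UTF-8 and cutoff = 6 (the data starts with two three-byte "replacement characters" 0xEFBFBD, and the umlaut is encoded as two wrong characters). In both cases I used new String(bytes2, "UTF-16LE").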
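One explanation that would be consistent with these symptoms (an assumption, not something confirmed in this thread) is that the clipboard actually holds UTF-16LE bytes with a BOM, and that they are decoded as UTF-8 before being re-encoded in the requested charset. The sketch below simulates only that decode/re-encode step on a hand-made byte array; the class name MisdecodeDemo and the helper transcode are hypothetical, and no clipboard access is involved.

```java
import java.nio.charset.Charset;

class MisdecodeDemo {
    // Formats bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    // Decodes raw bytes with a (possibly wrong) assumed charset,
    // then re-encodes the resulting string in the requested charset.
    static byte[] transcode(byte[] raw, String assumed, String requested) {
        String decoded = new String(raw, Charset.forName(assumed));
        return decoded.getBytes(Charset.forName(requested));
    }

    public static void main(String[] args) {
        // "hi" as UTF-16LE, preceded by the UTF-16LE BOM (FF FE).
        byte[] raw = {(byte) 0xFF, (byte) 0xFE, 'h', 0, 'i', 0};

        // Misreading the BOM as UTF-8 turns each BOM byte into U+FFFD.
        String misread = new String(raw, Charset.forName("UTF-8"));
        System.out.println((int) misread.charAt(0) + " " + (int) misread.charAt(1));
        // prints: 65533 65533

        // Requesting US-ASCII: each U+FFFD becomes '?' (3F), so two bytes precede the text.
        System.out.println(hex(transcode(raw, "UTF-8", "US-ASCII")));
        // prints: 3F 3F 68 00 69 00

        // Requesting UTF-8: each U+FFFD becomes EF BF BD, so six bytes precede the text.
        System.out.println(hex(transcode(raw, "UTF-8", "UTF-8")));
        // prints: EF BF BD EF BF BD 68 00 69 00
    }
}
```

Under this assumption, the cutoffs of 2 and 6 are simply the re-encoded BOM, and the remaining bytes still look like UTF-16LE for 7-bit characters. Any byte that is not valid UTF-8 on its own (such as the F6 of "ö" in UTF-16LE) would already be replaced before the application sees the data, which would explain why umlauts cannot be recovered afterwards.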

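Do you have any suggestions on how to:

  • support non-ASCII characters with this solution (or find a better solution)?
  • determine whether the data is UTF-16LE or UTF-16BE?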
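For the second point, a common approach is to inspect the leading bytes for a byte order mark. The following is a minimal sketch, assuming the raw, untranscoded clipboard bytes are available as a byte array (which the code above does not guarantee); the class BomSniffer and its methods are hypothetical names used only for illustration.

```java
import java.nio.charset.Charset;

class BomSniffer {
    // Returns the charset implied by a leading BOM, or null if none is found.
    static Charset detectBom(byte[] data) {
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            return Charset.forName("UTF-8");
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) {
            return Charset.forName("UTF-16LE");
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) {
            return Charset.forName("UTF-16BE");
        }
        return null;
    }

    // Strips the BOM (if any) and decodes; falls back to the given charset otherwise.
    static String decode(byte[] data, Charset fallback) {
        Charset detected = detectBom(data);
        if (detected == null) {
            return new String(data, fallback);
        }
        int bomLength = detected.name().equals("UTF-8") ? 3 : 2;
        return new String(data, bomLength, data.length - bomLength, detected);
    }

    public static void main(String[] args) {
        // "hi" in UTF-16LE with BOM, built by hand for the example.
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 'h', 0, 'i', 0};
        System.out.println(detectBom(sample));                          // prints: UTF-16LE
        System.out.println(decode(sample, Charset.forName("UTF-8")));   // prints: hi
    }
}
```

This only helps if the bytes reach the application unmodified. Data without a BOM falls through to the fallback charset, and the sketch does not distinguish a UTF-32LE BOM (FF FE 00 00) from a UTF-16LE one.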

Thanks! Any hints are welcome!

By the way, these are the data flavors supported on my (Linux) system (from transferable.getTransferDataFlavors()):

[java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.io.Reader]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.lang.String]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.nio.CharBuffer]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=[C]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.io.InputStream;charset=UTF-16]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.nio.ByteBuffer;charset=UTF-16]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=[B;charset=UTF-16]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.io.InputStream;charset=UTF-8]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.nio.ByteBuffer;charset=UTF-8]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=[B;charset=UTF-8]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.io.InputStream;charset=UTF-16BE]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.nio.ByteBuffer;charset=UTF-16BE]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=[B;charset=UTF-16BE]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.io.InputStream;charset=UTF-16LE]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.nio.ByteBuffer;charset=UTF-16LE]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=[B;charset=UTF-16LE]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.io.InputStream;charset=ISO-8859-1]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.nio.ByteBuffer;charset=ISO-8859-1]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=[B;charset=ISO-8859-1]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.io.InputStream;charset=US-ASCII]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.nio.ByteBuffer;charset=US-ASCII]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=[B;charset=US-ASCII]
java.awt.datatransfer.DataFlavor[mimetype=application/x-java-serialized-object;representationclass=java.lang.String]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.io.Reader]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.lang.String]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.nio.CharBuffer]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=[C]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.io.InputStream;charset=unicode]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.nio.ByteBuffer;charset=UTF-16]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=[B;charset=UTF-16]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.io.InputStream;charset=UTF-8]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.nio.ByteBuffer;charset=UTF-8]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=[B;charset=UTF-8]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.io.InputStream;charset=UTF-16BE]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.nio.ByteBuffer;charset=UTF-16BE]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=[B;charset=UTF-16BE]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.io.InputStream;charset=UTF-16LE]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.nio.ByteBuffer;charset=UTF-16LE]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=[B;charset=UTF-16LE]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.io.InputStream;charset=ISO-8859-1]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.nio.ByteBuffer;charset=ISO-8859-1]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=[B;charset=ISO-8859-1]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.io.InputStream;charset=US-ASCII]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.nio.ByteBuffer;charset=US-ASCII]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=[B;charset=US-ASCII]
java.awt.datatransfer.DataFlavor[mimetype=text/x-moz-url-priv;representationclass=java.io.InputStream]
java.awt.datatransfer.DataFlavor[mimetype=text/_moz_htmlinfo;representationclass=java.io.InputStream]
java.awt.datatransfer.DataFlavor[mimetype=text/_moz_htmlcontext;representationclass=java.io.InputStream]
java.awt.datatransfer.DataFlavor[mimetype=text/x-moz-url-priv;representationclass=java.nio.ByteBuffer]
java.awt.datatransfer.DataFlavor[mimetype=text/_moz_htmlinfo;representationclass=java.nio.ByteBuffer]
java.awt.datatransfer.DataFlavor[mimetype=text/_moz_htmlcontext;representationclass=java.nio.ByteBuffer]
java.awt.datatransfer.DataFlavor[mimetype=text/x-moz-url-priv;representationclass=[B]
java.awt.datatransfer.DataFlavor[mimetype=text/_moz_htmlinfo;representationclass=[B]
java.awt.datatransfer.DataFlavor[mimetype=text/_moz_htmlcontext;representationclass=[B]]

Possible duplicate of Java Drag and Drop Text via DropTargetListener. - Paul Sweatte
2 Answers

1
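I think the problem comes from reading the clipboard as US-ASCII, then converting to Unicode and expecting the German umlauts to be preserved. US-ASCII is a 7-bit character set, so German umlauts are not part of it and are already lost once the clipboard has been read as US-ASCII.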
public class CharsetDemo {
    public static void main(String[] args) throws Exception {
        byte[] bytes;

        // convert the German umlaut to bytes in US-ASCII charset
        bytes = "ö".getBytes("US-ASCII");
        System.out.println("US-ASCII");
        System.out.println("bytes : " + asHexString(bytes));
        System.out.println("string: " + new String(bytes, "US-ASCII"));
        System.out.println();

        // create a unicode string from the US-ASCII bytes
        String utf8String = new String(bytes, "UTF-8");
        bytes = utf8String.getBytes("UTF-8");
        System.out.println("UTF-8");
        System.out.println("bytes : " + asHexString(bytes));
        System.out.println("string: " + utf8String);
        System.out.println();

        // convert the German umlaut to bytes in ISO-8859-1 charset
        bytes = "ö".getBytes("ISO-8859-1");
        System.out.println("ISO 8859-1");
        System.out.println("bytes : " + asHexString(bytes));
        System.out.println("string: " + new String(bytes, "ISO-8859-1"));
        System.out.println();

        // create a unicode string from the ISO-8859-1 bytes
        utf8String = new String(bytes, "UTF-8");
        bytes = utf8String.getBytes("UTF-8");
        System.out.println("UTF-8");
        System.out.println("bytes : " + asHexString(bytes));
        System.out.println("string: " + utf8String);
        System.out.println();

        // bytes of the "REPLACEMENT CHARACTER"
        System.out.println("replacement character bytes: " 
            + asHexString("\uFFFD".getBytes("UTF-8")));

    }

    static String asHexString(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%X ", b));
        }
        return sb.toString();
    }
}

output

US-ASCII
bytes : 3F 
string: ?  <--- the question mark (3F) is what the US-ASCII encoder substitutes for the unmappable "ö"

UTF-8
bytes : 3F 
string: ?

ISO 8859-1
bytes : F6 
string: ö

UTF-8
bytes : EF BF BD  <-- the "REPLACEMENT CHARACTER", as "F6" on its own is not a valid UTF-8 byte sequence
string: �

replacement character bytes: EF BF BD 

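Thanks for the reply. I agree that requesting ASCII makes no sense when 8-bit characters are allowed. However, that does not solve the clipboard problem described above. - Felix Natter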

0

The problem still exists in Java 7. In Java 8 it is even worse: pasting HTML from Firefox produces garbage (rather than just plain text, as in Java 7). - Felix Natter
