How do I read escape characters as text in Java?

Question

public List<String> readRSS(String feedUrl, String openTag, String closeTag)
        throws IOException, MalformedURLException {

    URL url = new URL(feedUrl);
    BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()));

    String currentLine;
    List<String> tempList = new ArrayList<String>();
    while ((currentLine = reader.readLine()) != null) {
        Integer tagEndIndex = 0;
        Integer tagStartIndex = 0;
        while (tagStartIndex >= 0) {
            tagStartIndex = currentLine.indexOf(openTag, tagEndIndex);
            if (tagStartIndex >= 0) {
                tagEndIndex = currentLine.indexOf(closeTag, tagStartIndex);
                tempList.add(currentLine.substring(tagStartIndex + openTag.length(), tagEndIndex) + "\n");
            }
        }
    }
    if (tempList.size() > 0) {
        if (openTag.contains("title")) {
            tempList.remove(0);
            tempList.remove(0);
        } else if (openTag.contains("desc")) {
            tempList.remove(0);
        }
    }
    return tempList;
}

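I wrote some code to read an RSS feed. Everything works fine, but it breaks when the parser encounters an escaped character in the feed. It cannot find the closing tag because the XML has been escaped.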

I don't know how to fix this in my code. Can anyone help me solve it?

- Sander bakker

So you want to read the escape characters as text and then (maybe) skip them, right? - progyammer

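@progyammer Yes, I want to skip them. What happens now is that the RSS reader sees one and then stops reading, so it never reaches the </title> tag and crashes. I have updated the OP and added an image to make this clearer. - Sander bakker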

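Right. It is a parser, so it does what it is supposed to do when it encounters an escape sequence. You need to override that rule somehow and read everything as text; the post-processing of your input will only grow a little. - progyammer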

@progyammer I understand the problem :). Anyway, do you have any idea how I could implement a fix? - Sander bakker

@tima, here is the link you need: http://www.ad.nl/home/rss.xml - Sander bakker

1 Answer

- tima · Accepted Answer

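The problem is that the special character is a line break, so your opening and closing tags end up on different lines. Because your code reads the feed line by line and expects both tags on the same line, it cannot handle those elements.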
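The failure can be reproduced in isolation. The sketch below mirrors the extraction step from the question for a single line; the class name `MissingCloseTagDemo` and the method `extract` are hypothetical names used only for illustration. When the closing tag is not on the line, `indexOf` returns -1 and `substring` throws a `StringIndexOutOfBoundsException`.

```java
class MissingCloseTagDemo {
    // Hypothetical helper: the same indexOf/substring steps as in the question,
    // applied to one line.
    static String extract(String line, String openTag, String closeTag) {
        int start = line.indexOf(openTag);
        // indexOf returns -1 when the close tag is on a later line
        int end = line.indexOf(closeTag, start);
        // substring throws when end is -1 (end index before begin index)
        return line.substring(start + openTag.length(), end);
    }

    public static void main(String[] args) {
        // Both tags on one line: extraction works.
        System.out.println(extract("<title>another title 1</title>", "<title>", "</title>"));

        // Close tag on the next line: substring fails.
        try {
            extract("<title>another &#xD;", "<title>", "</title>");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("close tag not on this line: " + e.getMessage());
        }
    }
}
```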

You could try something like the following:

StringBuilder fullLine = new StringBuilder();

while ((currentLine = reader.readLine()) != null) {
    int tagStartIndex = currentLine.indexOf(openTag, 0);
    // a negative fromIndex is treated as 0, so this is safe when the open tag is absent
    int tagEndIndex = currentLine.indexOf(closeTag, tagStartIndex);

    if (tagStartIndex != -1 && tagEndIndex != -1) {
        // both tags on the same line: process the whole line
        tempList.add(currentLine);
        fullLine.setLength(0);
    } else if (tagStartIndex == -1 && tagEndIndex == -1 && fullLine.length() > 0) {
        // no tags on this line but the buffer has been started:
        // the line is part of a larger element, so add it to the buffer
        fullLine.append(currentLine);
    } else if (tagStartIndex != -1 && tagEndIndex == -1) {
        // start tag is on this line but there is no end tag:
        // start a new buffer with this line
        fullLine = new StringBuilder(currentLine);
    } else if (tagEndIndex != -1 && tagStartIndex == -1) {
        // end tag is on this line but there is no start tag:
        // add the line to the current buffer and then process the buffer
        fullLine.append(currentLine);
        tempList.add(fullLine.toString());
        fullLine.setLength(0);
    }
}

Given the following sample input:

<title>another &#xD;
title 0</title>
<title>another title 1</title>
<title>another title 2</title>
<title>another title 3</title>
<desc>description 0</desc>
<desc>another &#xD;
description 1</desc>
<title>another title 4</title>
<title>another &#xD;
another line in between &#xD;
title 5</title>

the complete lines in tempList for title are:

<title>another &#xD;title 0</title>
<title>another title 1</title>
<title>another title 2</title>
<title>another title 3</title>
<title>another title 4</title>
<title>another &#xD;another line in between &#xD;title 5</title>

And for desc:

<desc>description 0</desc>
<desc>another &#xD;description 1</desc>
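To check the approach against the sample, the loop can be wrapped in a small self-contained class. This is only a sketch: the class name `MultiLineTagReader`, its method names, and the `SAMPLE` constant are made up for illustration, and the input is read from a string instead of a URL.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class MultiLineTagReader {
    // The sample input from the answer, one physical line per entry.
    static final String SAMPLE = String.join("\n",
            "<title>another &#xD;",
            "title 0</title>",
            "<title>another title 1</title>",
            "<title>another title 2</title>",
            "<title>another title 3</title>",
            "<desc>description 0</desc>",
            "<desc>another &#xD;",
            "description 1</desc>",
            "<title>another title 4</title>",
            "<title>another &#xD;",
            "another line in between &#xD;",
            "title 5</title>");

    // Collects every line that holds a complete element, joining elements
    // that span several physical lines into a single entry.
    static List<String> collect(BufferedReader reader, String openTag, String closeTag)
            throws IOException {
        List<String> result = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            int start = line.indexOf(openTag);
            int end = line.indexOf(closeTag, Math.max(start, 0));
            if (start != -1 && end != -1) {
                result.add(line);                 // both tags on this line
                buffer.setLength(0);
            } else if (start != -1) {
                buffer = new StringBuilder(line); // element starts here
            } else if (end != -1) {
                buffer.append(line);              // element ends here
                result.add(buffer.toString());
                buffer.setLength(0);
            } else if (buffer.length() > 0) {
                buffer.append(line);              // middle of an element
            }
        }
        return result;
    }

    // Convenience wrapper for in-memory text.
    static List<String> collectFromText(String text, String openTag, String closeTag) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return collect(reader, openTag, closeTag);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        collectFromText(SAMPLE, "<title>", "</title>").forEach(System.out::println);
        collectFromText(SAMPLE, "<desc>", "</desc>").forEach(System.out::println);
    }
}
```

Like the loop above, this keeps the tags in each entry and assumes at most one element per physical line; a feed that packs several elements onto one line would need extra handling.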

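You should test how this approach performs on the full RSS feed. Also note that the special characters are not decoded: a sequence such as &#xD; stays in the collected lines as literal text.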
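If the collected text should contain the actual characters instead of sequences such as `&#xD;`, one option is to decode numeric character references afterwards. The following is a minimal sketch, not part of the original answer; the class name `NumericEntityDecoder` and the method `decode` are hypothetical, and named entities such as `&amp;` are deliberately left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumericEntityDecoder {
    // Matches hexadecimal (&#xD;) and decimal (&#13;) numeric character references.
    // The digit limits keep Integer.parseInt from overflowing.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9A-Fa-f]{1,6})|([0-9]{1,7}));");

    static String decode(String text) {
        Matcher m = NUMERIC_REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            // Leave the reference as it is when it is not a valid code point.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("another &#xD;title 0");
        // The reference has been replaced by a carriage return character.
        System.out.println(decoded.contains("\r"));
    }
}
```

To drop the sequence rather than decode it, as asked in the comments, something as simple as `line.replace("&#xD;", "")` may be enough.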