Where can I find a Java Servlet filter that applies regular expressions to the output?


I'm hoping someone has already written this:

A servlet filter that can be configured with regular-expression search/replace patterns and applies them to the HTML output.

Does such a thing exist?


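What exactly do you want to change: the request URL or the response body? Tuckey's UrlRewriteFilter is excellent, but it is designed to rewrite URLs (much like the well-known RewriteRule of Apache HTTPD). To change the response body, you need to be more specific about the functional requirements. I can't think of such a filter offhand, but this looks a lot like sanitizing user-controlled input to prevent XSS. In that case, regex is definitely not the right tool for the job. - BalusC
Sorry for being unclear. I've edited the question to make clear that I want to modify the HTML output. - Jeremy Stein
What exactly in the HTML output? Since parsing and modifying HTML with regex is an extremely bad practice, nobody has ever written such a filter. Please be more specific about the functional requirements. Why do you need such a filter? Why not just make the change on the view side? Etc. - BalusC
We want to integrate a vendor's JSP-based web application into our own application using frames. We need to remove every target="_parent" from their output. They only gave us compiled JSP files. I figured the easiest approach would be to add a filter that modifies the output. - Jeremy Stein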
3 Answers


I couldn't find a ready-made one, so I wrote my own:

RegexFilter.java

package com.example;

import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

/**
 * Applies search and replace patterns. To initialize this filter, the
 * param-names should be "search1", "replace1", "search2", "replace2", etc.
 */
public final class RegexFilter implements Filter {
    private List<Pattern> searchPatterns;
    private List<String> replaceStrings;

    /**
     * Finds the search and replace strings in the configuration file. Looks for
     * matching searchX and replaceX parameters.
     */
    public void init(FilterConfig filterConfig) {
        Map<String, String> patternMap = new HashMap<String, String>();

        // Walk through the parameters to find those whose names start with
        // search
        Enumeration<String> names = (Enumeration<String>) filterConfig.getInitParameterNames();
        while (names.hasMoreElements()) {
            String name = names.nextElement();
            if (name.startsWith("search")) {
                patternMap.put(name.substring(6), filterConfig.getInitParameter(name));
            }
        }
        this.searchPatterns = new ArrayList<Pattern>(patternMap.size());
        this.replaceStrings = new ArrayList<String>(patternMap.size());

        // Walk through the parameters again to find the matching replace params
        names = (Enumeration<String>) filterConfig.getInitParameterNames();
        while (names.hasMoreElements()) {
            String name = names.nextElement();
            if (name.startsWith("replace")) {
                String searchString = patternMap.get(name.substring(7));
                if (searchString != null) {
                    this.searchPatterns.add(Pattern.compile(searchString));
                    this.replaceStrings.add(filterConfig.getInitParameter(name));
                }
            }
        }
    }

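    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
        // Wrap the response so we can get at the text after calling the next filter.
        // Note: the wrapper only captures output written through getWriter();
        // anything written through getOutputStream() goes straight to the client.
        CharResponseWrapper wrapper = new CharResponseWrapper((HttpServletResponse) response);
        chain.doFilter(request, wrapper);

        // Extract the text from the completed servlet and apply the regexes
        String modifiedHtml = wrapper.toString();
        for (int i = 0; i < this.searchPatterns.size(); i++) {
            modifiedHtml = this.searchPatterns.get(i).matcher(modifiedHtml).replaceAll(this.replaceStrings.get(i));
        }

        // Content-Length is a byte count, so encode with the response's charset
        // rather than the JVM default; a mismatch produces "Didn't meet stated
        // Content-Length" errors as soon as the page contains non-ASCII text.
        String encoding = response.getCharacterEncoding();
        if (encoding == null) {
            encoding = "ISO-8859-1"; // the servlet spec's default
        }
        response.setContentLength(modifiedHtml.getBytes(encoding).length);

        // Fetch the real writer only now: calling getWriter() before the chain
        // runs would lock in the encoding before a JSP's contentType can set it.
        PrintWriter out = response.getWriter();
        out.write(modifiedHtml);
        out.flush();
        out.close();
    }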

    public void destroy() {
        this.searchPatterns = null;
        this.replaceStrings = null;
    }
}
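The searchN/replaceN pairing done in init() can be exercised without a servlet container. The sketch below is a hypothetical `RegexRules` class in which a plain `Map` stands in for the `FilterConfig` init parameters. One deliberate difference: it iterates the search suffixes through a `TreeMap`, so the rule order is deterministic (string order, i.e. "1", "10", "2"), whereas in the filter above the order follows the container's parameter enumeration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Hypothetical, container-free version of RegexFilter.init() and the replace loop. */
final class RegexRules {
    private final List<Pattern> searchPatterns = new ArrayList<>();
    private final List<String> replaceStrings = new ArrayList<>();

    RegexRules(Map<String, String> initParams) {
        // Collect searchN values keyed by their suffix N, in sorted (string) order
        Map<String, String> searches = new TreeMap<>();
        for (Map.Entry<String, String> e : initParams.entrySet()) {
            if (e.getKey().startsWith("search")) {
                searches.put(e.getKey().substring("search".length()), e.getValue());
            }
        }
        // Keep only the searchN entries that have a matching replaceN
        for (Map.Entry<String, String> e : searches.entrySet()) {
            String replacement = initParams.get("replace" + e.getKey());
            if (replacement != null) {
                searchPatterns.add(Pattern.compile(e.getValue()));
                replaceStrings.add(replacement);
            }
        }
    }

    int size() {
        return searchPatterns.size();
    }

    /** Applies every rule in turn, each one to the output of the previous. */
    String apply(String text) {
        for (int i = 0; i < searchPatterns.size(); i++) {
            text = searchPatterns.get(i).matcher(text).replaceAll(replaceStrings.get(i));
        }
        return text;
    }

    public static void main(String[] args) {
        Map<String, String> params = new TreeMap<>();
        params.put("search1", "foo");
        params.put("replace1", "bar");
        params.put("search2", "orphan"); // no replace2, so this rule is dropped
        RegexRules rules = new RegexRules(params);
        System.out.println(rules.size());            // 1
        System.out.println(rules.apply("foo food")); // bar bard
    }
}
```

Unpaired parameters are silently ignored, just as in the filter, so a typo such as "replce1" simply disables that rule.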

CharResponseWrapper.java

package com.example;

import java.io.CharArrayWriter;
import java.io.PrintWriter;

import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;

/**
 * Wraps the response object to capture the text written to it.
 */
public class CharResponseWrapper extends HttpServletResponseWrapper {
    private CharArrayWriter output;

    public CharResponseWrapper(HttpServletResponse response) {
        super(response);
        this.output = new CharArrayWriter();
    }

    public String toString() {
        return output.toString();
    }

    public PrintWriter getWriter() {
        return new PrintWriter(output);
    }
}
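The wrapper works because every PrintWriter it hands out writes into the same CharArrayWriter. A minimal sketch of that capture idea without the Servlet API, using a hypothetical `CaptureDemo` class, assuming (as in the JDK's implementation) that `new PrintWriter(Writer)` adds no buffer of its own, so nothing is lost even when a writer is never flushed:

```java
import java.io.CharArrayWriter;
import java.io.PrintWriter;

/** Hypothetical stand-in for CharResponseWrapper, without the Servlet API. */
final class CaptureDemo {
    private final CharArrayWriter output = new CharArrayWriter();

    // Like CharResponseWrapper.getWriter(): a fresh PrintWriter per call,
    // all of them appending to the same in-memory buffer.
    PrintWriter getWriter() {
        return new PrintWriter(output);
    }

    @Override
    public String toString() {
        return output.toString();
    }

    public static void main(String[] args) {
        CaptureDemo capture = new CaptureDemo();
        capture.getWriter().print("<a href='x' ");
        capture.getWriter().print("target='_parent'>x</a>");
        // Both fragments end up in the one buffer
        System.out.println(capture); // <a href='x' target='_parent'>x</a>
    }
}
```

Output written through getOutputStream() is not captured this way, because the wrapper does not override that method.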

Sample web.xml

<web-app>
    <filter>
      <filter-name>RegexFilter</filter-name>
      <filter-class>com.example.RegexFilter</filter-class>
      <init-param><param-name>search1</param-name><param-value><![CDATA[(<\s*a\s[^>]*)(?<=\s)target\s*=\s*(?:'_parent'|"_parent"|_parent|'_top'|"_top"|_top)]]></param-value></init-param>
      <init-param><param-name>replace1</param-name><param-value>$1</param-value></init-param>
    </filter>
    <filter-mapping>
      <filter-name>RegexFilter</filter-name>
      <url-pattern>/*</url-pattern>
    </filter-mapping>
</web-app>
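The search1/replace1 pair can be checked outside a container with plain java.util.regex. The hypothetical `TargetRegexCheck` class below compiles the same pattern (re-escaped as a Java string literal) and applies the `$1` replacement, which keeps everything captured before the `target` attribute and drops the attribute itself:

```java
import java.util.regex.Pattern;

/** Hypothetical harness for the search1/replace1 values in the sample web.xml. */
final class TargetRegexCheck {
    // search1, re-escaped for a Java string literal
    static final Pattern SEARCH = Pattern.compile(
        "(<\\s*a\\s[^>]*)(?<=\\s)target\\s*=\\s*"
        + "(?:'_parent'|\"_parent\"|_parent|'_top'|\"_top\"|_top)");
    // replace1: keep group 1 (the tag up to the attribute)
    static final String REPLACE = "$1";

    static String strip(String html) {
        return SEARCH.matcher(html).replaceAll(REPLACE);
    }

    public static void main(String[] args) {
        // The attribute is removed; the whitespace before it stays
        System.out.println(strip("<a href=\"page.jsp\" target=\"_parent\">Go</a>"));
        // Not changed: _blank is not in the pattern
        System.out.println(strip("<a href=\"new.jsp\" target=\"_blank\">New</a>"));
        // Not changed: only <a> tags are matched
        System.out.println(strip("<form target=\"_parent\">"));
    }
}
```

The pattern is case-sensitive, so `<A TARGET="_parent">` would pass through unchanged; prefixing the search value with `(?i)` is one way to handle that if the vendor's markup mixes cases.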

Great, I just used this to help solve a similar problem! - Aaron Silverman
I'd suggest calling out.flush() before out.close() to avoid errors like: java.net.ProtocolException: Didn't meet stated Content-Length, wrote: '27026' bytes instead of stated: '27023' bytes. - rudolfv
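A Content-Length mismatch of a few bytes is what you would expect if the length is computed with one charset and the body is written with another: the original `modifiedHtml.getBytes()` uses the JVM's default charset, while the response writer encodes with the response's charset. A minimal sketch (hypothetical `ContentLengthDemo` class and `contentLength` helper) showing how the byte count of the same string depends on the charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: Content-Length counts encoded bytes, not Java chars. */
final class ContentLengthDemo {
    static int contentLength(String body, Charset charset) {
        return body.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String html = "<p>caf\u00e9</p>"; // 11 chars; the e-acute needs 2 bytes in UTF-8
        System.out.println(html.length());                                    // 11
        System.out.println(contentLength(html, StandardCharsets.UTF_8));      // 12
        System.out.println(contentLength(html, StandardCharsets.ISO_8859_1)); // 11
    }
}
```

Computing the length with the same charset the writer will use, for example the name returned by `response.getCharacterEncoding()`, keeps the header and the body in agreement.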

I'm not sure whether this is what you need, but there is a URL rewrite filter that supports regular expressions. See here.
Hope this helps.

Besides rewriting incoming URLs, this library can also modify the links on HTML pages: http://urlrewritefilter.googlecode.com/svn/trunk/src/doc/manual/4.0/index.html#outbound-rule Nice. - rwitzel


Content provided by Stack Overflow.