Remove extra whitespace from generated HTML in MVC

31

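I have an MVC application view that generates a very large HTML table of values (> 20MB).

I am compressing the view in the controller using a compression filter.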

 internal class CompressFilter : ActionFilterAttribute
 {
     public override void OnActionExecuting(ActionExecutingContext filterContext)
     {
         HttpRequestBase request = filterContext.HttpContext.Request;
         string acceptEncoding = request.Headers["Accept-Encoding"];
         if (string.IsNullOrEmpty(acceptEncoding))
             return;
         acceptEncoding = acceptEncoding.ToUpperInvariant();
         HttpResponseBase response = filterContext.HttpContext.Response;
         if (acceptEncoding.Contains("GZIP"))
         {
             response.AppendHeader("Content-encoding", "gzip");
             response.Filter = new GZipStream(response.Filter, CompressionMode.Compress);
         }
         else if (acceptEncoding.Contains("DEFLATE"))
         {
             response.AppendHeader("Content-encoding", "deflate");
             response.Filter = new DeflateStream(response.Filter, CompressionMode.Compress);
         }
     }
 }

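Before the compression filter runs, is there a way to eliminate the (considerable amount of) redundant whitespace generated in the view, to reduce both the compression work and the size?

EDIT: I used the WhiteSpaceFilter technique suggested by Womp below, and it now works.

For interest, here are the results as analysed by Firebug:

1) No compression, whitespace not stripped - 21MB, 2.59 minutes
2) GZIP compression enabled, whitespace not stripped - 2MB, 17.59 seconds
3) GZIP compression enabled, whitespace stripped - 558kB, 12.77 seconds

So it is definitely worth doing.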
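A comparison like the one above can be approximated offline before wiring anything into the response pipeline. The following is only a minimal sketch: the sample table built by `BuildSampleTable`, the `StripBetweenTags` helper and its `>\s+<` pattern are assumptions made for illustration, not the filter used in the question, and the numbers it prints depend entirely on the input.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Text.RegularExpressions;

public static class GzipSizeDemo
{
    // Returns the number of bytes produced by gzip-compressing the UTF-8 encoding of text.
    public static long GzipSize(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            // leaveOpen: true so the MemoryStream can still be inspected after gzip is disposed.
            using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return output.Length;
        }
    }

    // Hypothetical test data: a table whose rows are heavily indented.
    public static string BuildSampleTable(int rows)
    {
        var sb = new StringBuilder("<table>\r\n");
        for (int i = 0; i < rows; i++)
        {
            sb.Append("        <tr>\r\n            <td>").Append(i).Append("</td>\r\n        </tr>\r\n");
        }
        return sb.Append("</table>").ToString();
    }

    // Removes whitespace runs that sit between a closing '>' and the next '<'.
    public static string StripBetweenTags(string html)
    {
        return Regex.Replace(html, @">\s+<", "><");
    }

    public static void Main()
    {
        string original = BuildSampleTable(10000);
        string stripped = StripBetweenTags(original);
        Console.WriteLine("raw:  {0} -> {1} chars", original.Length, stripped.Length);
        Console.WriteLine("gzip: {0} -> {1} bytes", GzipSize(original), GzipSize(stripped));
    }
}
```

Running it against a saved copy of the real page, rather than the synthetic table, would give a better idea of how much stripping helps for a particular view.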


1
Interesting results, thanks for posting them. - womp
3
I know this is old, but would you be interested in posting the full code? - ilivewithian
8 Answers

20

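This guy wrote a very neat whitespace compactor that copies the bytes quickly through a regular expression and strips out the blocks of whitespace. He wrote it as an HTTP module, but you could extract the 7 lines of core code and embed them in your own function.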


7
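A heads-up for everyone: if you read the comments at that link, you will see that the solution has some flaws. - Chris Haines
Regular expressions are not "fast". See my measurements at https://dev59.com/snRA5IYBdhLWcg3wtwSe#15014794 - Eric J.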

7

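@womp has already suggested a nice approach, but that module is outdated. I had been using it, but it turned out not to be the optimal way. Here is the question I asked about it:

Remove whitespace from entire HTML with regular expressions, but preserve whitespace inside pre tags

This is how I did it: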

public class RemoveWhitespacesAttribute : ActionFilterAttribute {

    public override void OnActionExecuted(ActionExecutedContext filterContext) {

        var response = filterContext.HttpContext.Response;

        //Temp fix. I am not sure what causes this but ContentType is coming as text/html
        if (filterContext.HttpContext.Request.RawUrl != "/sitemap.xml") {

            if (response.ContentType == "text/html" && response.Filter != null) {
                response.Filter = new HelperClass(response.Filter);
            }
        }
    }

    private class HelperClass : Stream {

        private System.IO.Stream Base;

        public HelperClass(System.IO.Stream ResponseStream) {

            if (ResponseStream == null)
                throw new ArgumentNullException("ResponseStream");
            this.Base = ResponseStream;
        }

        // Built once and reused; constructing a Regex on every Write call is wasteful.
        //Thanks to Qtax
        //https://stackoverflow.com/questions/8762993/remove-white-space-from-entire-html-but-inside-pre-with-regular-expressions
        private static readonly Regex reg = new Regex(@"(?<=\s)\s+(?![^<>]*</pre>)", RegexOptions.Compiled);

        public override void Write(byte[] buffer, int offset, int count) {

            // Each chunk is decoded on its own, which assumes that no multi-byte
            // UTF-8 sequence is split across two Write calls.
            string HTML = Encoding.UTF8.GetString(buffer, offset, count);
            HTML = reg.Replace(HTML, string.Empty);

            buffer = System.Text.Encoding.UTF8.GetBytes(HTML);
            this.Base.Write(buffer, 0, buffer.Length);
        }

        #region Other Members

        public override int Read(byte[] buffer, int offset, int count) {

            throw new NotSupportedException();
        }

        public override bool CanRead{ get { return false; } }

        public override bool CanSeek{ get { return false; } }

        public override bool CanWrite{ get { return true; } }

        public override long Length{ get { throw new NotSupportedException(); } }

        public override long Position {

            get { throw new NotSupportedException(); }
            set { throw new NotSupportedException(); }
        }

        public override void Flush() {

            Base.Flush();
        }

        public override long Seek(long offset, SeekOrigin origin) {

            throw new NotSupportedException();
        }

        public override void SetLength(long value) {

            throw new NotSupportedException();
        }

        #endregion
    }

}
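To see what the Qtax pattern in the filter above keeps and what it removes, it can be run on a small string outside of MVC. This is just a sketch; the `PreAwareDemo` class and the sample markup are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PreAwareDemo
{
    // Same pattern as in the filter: drop whitespace that follows other whitespace,
    // unless the text up to the next tag is immediately followed by </pre>.
    private static readonly Regex Extra = new Regex(@"(?<=\s)\s+(?![^<>]*</pre>)");

    public static string Strip(string html)
    {
        return Extra.Replace(html, string.Empty);
    }

    public static void Main()
    {
        string input = "<div>\n    <p>a</p>\n<pre>x\n    y</pre>\n</div>";
        // The indentation before <p> is removed; the indentation inside <pre> is kept.
        Console.WriteLine(Strip(input));
    }
}
```

Note that the lookahead only inspects the text up to the next `<` or `>`, so whitespace that precedes a nested tag inside a `<pre>` block (for example before a `<b>`) would not be protected by this pattern.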

1
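Ran a benchmark this morning. On my setup, a fairly large HTML file of about 78KB takes 250 ms to process with this filter (on a high-end i7 setup). I also noticed that for large files the filter is called multiple times... which means this could fail badly if the file is sliced in the wrong place. In theory there is a better way... removing the extra whitespace at compile time, but so far the only solution I have seen failed in my environment. https://github.com/meleze/Meleze.Web - Eric J.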
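The slicing problem mentioned in the comment above can be avoided by carrying a little state from one chunk to the next. The sketch below is not the regex filter from the answer but a simpler swapped-in technique: it collapses every whitespace run to its first character and remembers whether the previous chunk ended inside such a run. The class name `ChunkedWhitespaceCollapser` is hypothetical, and the sketch makes no attempt to preserve `<pre>` content.

```csharp
using System;
using System.Text;

// Collapses each run of whitespace to its first character, carrying state between chunks.
public sealed class ChunkedWhitespaceCollapser
{
    // True when the last character seen (possibly in an earlier chunk) was whitespace.
    private bool _lastWasWhitespace;

    public string Process(string chunk)
    {
        var sb = new StringBuilder(chunk.Length);
        foreach (char c in chunk)
        {
            bool isWhitespace = char.IsWhiteSpace(c);
            // Skip a whitespace character only when it continues an existing run.
            if (!(isWhitespace && _lastWasWhitespace))
            {
                sb.Append(c);
            }
            _lastWasWhitespace = isWhitespace;
        }
        return sb.ToString();
    }
}

public static class ChunkedDemo
{
    public static void Main()
    {
        var whole = new ChunkedWhitespaceCollapser();
        var split = new ChunkedWhitespaceCollapser();
        string a = whole.Process("<td>1</td>  \n   <td>2</td>");
        // The same text, cut in the middle of the whitespace run.
        string b = split.Process("<td>1</td>  \n") + split.Process("   <td>2</td>");
        Console.WriteLine(a == b); // the cut position does not change the result
    }
}
```

Decoding bytes has the same boundary problem. A `System.Text.Decoder` obtained from `Encoding.UTF8.GetDecoder()` keeps incomplete byte sequences between calls, so it could be combined with a collapser like this one inside a stream's `Write` method.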

4
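By extending Razor, whitespace can be removed at compile time, which eliminates the runtime cost of stripping whitespace from the generated HTML (a cost that was quite significant in my tests). Trimming a 100KB document with RegEx-based code takes up to 88 ms on a high-end i7. An implementation of the compile-time solution for MVC 3 and MVC 4 is available: Meleze.Web. The approach is described in detail here: http://cestdumeleze.net/blog/2011/minifying-the-html-with-asp-net-mvc-and-razor/ (but use the GitHub code or the NuGet DLL, because the code in the blog post only covers MVC 3).

@MarcinHabuszewski: The first link is not dead. I just clicked it and it loaded. Maybe a temporary problem at github.com? - Eric J.
I see you misread my comment. I wrote that the second link is dead. As for the first link, my point was that even if it were dead (which it is not), it would still have some value because it contains the name of the tool you recommend, although that is the only place in this answer where the name appears. - jahu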

2
#region Stream filter
class StringFilterStream : Stream
{
  private Stream _sink;
  private Func<string, string> _filter;

  public StringFilterStream(Stream sink, Func<string, string> filter) {
    _sink = sink;
    _filter = filter;
  }

  #region Mixin Properties/Methods
  public override bool CanRead { get { return true; } }
  public override bool CanSeek { get { return true; } }
  public override bool CanWrite { get { return true; } }
  public override void Flush() { _sink.Flush(); }
  public override long Length { get { return 0; } }
  private long _position;
  public override long Position {
    get { return _position; }
    set { _position = value; }
  }
  public override int Read(byte[] buffer, int offset, int count) {
    return _sink.Read(buffer, offset, count);
  }
  public override long Seek(long offset, SeekOrigin origin) {
    return _sink.Seek(offset, origin);
  }
  public override void SetLength(long value) {
    _sink.SetLength(value);
  }
  public override void Close() {
    _sink.Close();
  }
  #endregion

  public override void Write(byte[] buffer, int offset, int count) {
    // intercept the data and convert to string; decode only the
    // offset/count slice handed to this call, not the whole buffer
    string s = Encoding.Default.GetString(buffer, offset, count);

    // apply the filter
    s = _filter(s);

    // write the data back to stream
    byte[] outdata = Encoding.Default.GetBytes(s);
    _sink.Write(outdata, 0, outdata.Length);
  }
}
#endregion

public enum WebWhitespaceFilterContentType
{
  Xml = 0, Css = 1, Javascript = 2
}
public class WebWhitespaceFilterAttribute : ActionFilterAttribute
{
  private WebWhitespaceFilterContentType _contentType;

  public WebWhitespaceFilterAttribute() {
    _contentType = WebWhitespaceFilterContentType.Xml;
  }
  public WebWhitespaceFilterAttribute(WebWhitespaceFilterContentType contentType) {
    _contentType = contentType;
  }

  public override void OnActionExecuting(ActionExecutingContext filterContext) {

    var request = filterContext.HttpContext.Request;
    var response = filterContext.HttpContext.Response;

    switch (_contentType) {
      case WebWhitespaceFilterContentType.Xml:

        response.Filter = new StringFilterStream(response.Filter, s => {
          s = Regex.Replace(s, @"\s+", " ");
          s = Regex.Replace(s, @"\s*\n\s*", "\n");
          s = Regex.Replace(s, @"\s*\>\s*\<\s*", "><");
          // single-line doctype must be preserved
          var firstEndBracketPosition = s.IndexOf(">");
          if (firstEndBracketPosition >= 0) {
            s = s.Remove(firstEndBracketPosition, 1);
            s = s.Insert(firstEndBracketPosition, ">\n");
          }
          return s;
        });
        break;

      case WebWhitespaceFilterContentType.Css:
      case WebWhitespaceFilterContentType.Javascript:

        response.Filter = new StringFilterStream(response.Filter, s => {
          s = Regex.Replace(s, @"/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/", "");
          s = Regex.Replace(s, @"\s+", " ");
          s = Regex.Replace(s, @"\s*{\s*", "{");
          s = Regex.Replace(s, @"\s*}\s*", "}");
          s = Regex.Replace(s, @"\s*;\s*", ";");
          return s;
        });
        break;
    }
  }
}

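Some low-quality code. 1) \s+ already matches newlines, so why \s*\n\s* on the next line? 2) It should work in OnResultExecuted rather than OnActionExecuting, because only then do we know the content type set by the controller. 3) It breaks <pre> elements, and JavaScript in <script> blocks if line comments // are used. 4) I found no requirement in HTML to keep the doctype on a separate line. - insp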
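Points 1 and 3 of the comment above are easy to check in isolation. The following sketch reproduces only the first replacement of the Xml branch; the `XmlFilterCheck` class and the sample input are assumptions made for the demonstration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class XmlFilterCheck
{
    // First step of the Xml branch: every whitespace run becomes a single space.
    public static string CollapseAll(string s)
    {
        return Regex.Replace(s, @"\s+", " ");
    }

    public static void Main()
    {
        string input = "<pre>line 1\n    line 2</pre>\n<p>text</p>";
        string collapsed = CollapseAll(input);

        // No newline survives the first step, so the later \s*\n\s* replacement
        // has nothing left to match.
        Console.WriteLine(Regex.IsMatch(collapsed, @"\s*\n\s*")); // False

        // The line break inside <pre> has been flattened as well.
        Console.WriteLine(collapsed); // <pre>line 1 line 2</pre> <p>text</p>
    }
}
```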

2

I think that if your view generates more than 20MB of data, you may want to explore different ways of displaying it, such as paging?


2
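Due to the special nature of the application, that is unfortunately not possible. - WOPR
1
Won't the browser choke on parsing something that huge? - Tom Anderson
No, it seems to be fine under IE6/7/8, Safari and Firefox. - WOPR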

1

Here is a VB.NET version of a whitespace filter attribute that I use in my projects:

#Region "Imports"

    Imports System.IO
    Imports System.Text
    Imports System.Text.RegularExpressions
    Imports System.Web.Mvc

#End Region

Namespace MyCompany.Web.Mvc.Extensions.ActionFilters

    ''' <summary>
    ''' WhitespaceFilter attribute
    ''' </summary>
    Public NotInheritable Class WhitespaceFilterAttribute
        Inherits ActionFilterAttribute

        ''' <summary>
        ''' Called when action executing.   
        ''' </summary>
        ''' <param name="filterContext">The filter context.</param>
        ''' <remarks></remarks>
        Public Overrides Sub OnActionExecuting(filterContext As ActionExecutingContext)

                filterContext.HttpContext.Response.Filter = New WhitespaceFilterStream(filterContext.HttpContext.Response.Filter)

        End Sub

    #Region "Whitespace stream filter"

            ''' <summary>
            ''' Whitespace stream filter
            ''' </summary>
            Private Class WhitespaceFilterStream
                Inherits Stream

    #Region "Declarations"

                ' Member vars.
                Private Shared regexPattern As New Regex("(?<=[^])\t{2,}|(?<=[>])\s{2,}(?=[<])|(?<=[>])\s{2,11}(?=[<])|(?=[\n])\s{2,}")
                ' Property vars.
                Private sinkStreamValue As Stream
                Private positionValue As Long

    #End Region

    #Region "Constructor(s)"

                ''' <summary>
                ''' Constructor to create a new object.
                ''' </summary>
                ''' <param name="sink"></param>
                ''' <remarks></remarks>
                Public Sub New(sink As Stream)

                    Me.sinkStreamValue = sink

                End Sub

    #End Region

    #Region "Properties"

                ''' <summary>
                ''' Gets the CanRead value.
                ''' </summary>
                ''' <value></value>
                ''' <returns></returns>
                ''' <remarks></remarks>
                Public Overrides ReadOnly Property CanRead() As Boolean
                    Get
                        Return True
                    End Get
                End Property

                ''' <summary>
                ''' Gets the CanSeek value.
                ''' </summary>
                ''' <value></value>
                ''' <returns></returns>
                ''' <remarks></remarks>
                Public Overrides ReadOnly Property CanSeek() As Boolean
                    Get
                        Return True
                    End Get
                End Property

                ''' <summary>
                ''' Gets the CanWrite value.
                ''' </summary>
                ''' <value></value>
                ''' <returns></returns>
                ''' <remarks></remarks>
                Public Overrides ReadOnly Property CanWrite() As Boolean
                    Get
                        Return True
                    End Get
                End Property

                ''' <summary>
                ''' Get Length value.
                ''' </summary>
                ''' <value></value>
                ''' <returns></returns>
                ''' <remarks></remarks>
                Public Overrides ReadOnly Property Length() As Long
                    Get
                        Return 0
                    End Get
                End Property

                ''' <summary>
                ''' Get or sets Position value.
                ''' </summary>
                ''' <value></value>
                ''' <returns></returns>
                ''' <remarks></remarks>
                Public Overrides Property Position() As Long
                    Get
                        Return Me.positionValue
                    End Get
                    Set(value As Long)
                        Me.positionValue = value
                    End Set
                End Property

    #End Region

    #Region "Stream Overrides Methods"

                ''' <summary>
                ''' Stream object Close method.
                ''' </summary>
                ''' <remarks></remarks>
                Public Overrides Sub Close()

                    Me.sinkStreamValue.Close()

                End Sub

                ''' <summary>
                ''' Stream object Flush method.
                ''' </summary>
                ''' <remarks></remarks>
                Public Overrides Sub Flush()

                    Me.sinkStreamValue.Flush()

                End Sub

                ''' <summary>
                ''' Stream object Read method.
                ''' </summary>
                ''' <param name="buffer"></param>
                ''' <param name="offset"></param>
                ''' <param name="count"></param>
                ''' <returns></returns>
                ''' <remarks></remarks>
                Public Overrides Function Read(buffer As Byte(), offset As Integer, count As Integer) As Integer

                    Return Me.sinkStreamValue.Read(buffer, offset, count)

                End Function

                ''' <summary>
                ''' Stream object Seek method.
                ''' </summary>
                ''' <param name="offset"></param>
                ''' <param name="origin"></param>
                ''' <returns></returns>
                ''' <remarks></remarks>
                Public Overrides Function Seek(offset As Long, origin As SeekOrigin) As Long

                    Return Me.sinkStreamValue.Seek(offset, origin)

                End Function

                ''' <summary>
                ''' Stream object SetLength method.
                ''' </summary>
                ''' <param name="value"></param>
                ''' <remarks></remarks>
                Public Overrides Sub SetLength(value As Long)

                    Me.sinkStreamValue.SetLength(value)

                End Sub

                ''' <summary>
                ''' Stream object Write method.
                ''' </summary>
                ''' <param name="bufferBytes"></param>
                ''' <param name="offset"></param>
                ''' <param name="count"></param>
                ''' <remarks></remarks>
                Public Overrides Sub Write(bufferBytes As Byte(), offset As Integer, count As Integer)

                    ' Decode only the bytes passed to this call (offset/count), not the whole buffer.
                    Dim html As String = Encoding.Default.GetString(bufferBytes, offset, count)

                    html = regexPattern.Replace(html, String.Empty)

                    ' Encode once and reuse the result for both the data and its length.
                    Dim outBytes As Byte() = Encoding.Default.GetBytes(html)
                    Me.sinkStreamValue.Write(outBytes, 0, outBytes.Length)

                End Sub

    #End Region

            End Class

    #End Region

        End Class

    End Namespace

And in the Global.asax.vb file:

Shared Sub RegisterGlobalFilters(ByVal filters As GlobalFilterCollection)

    With filters
        ' Standard MVC filters
        .Add(New HandleErrorAttribute())
        ' MyCompany MVC filters
        .Add(New CompressionFilterAttribute)
        .Add(New WhitespaceFilterAttribute)
    End With

End Sub

1
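I searched the internet for hours looking for an answer that explains how to compress and remove whitespace at the same time, and this is the only one that really works! I also learned that adding these attributes as global filters handles every page without using a BaseController. Thanks @Ed Degagne. One thing I changed is the Write method: var html = Encoding.UTF8.GetString(buffer, offset, count); var reg = new Regex(@"(?<=\s)\s+(?![^<>]*</pre>)"); html = reg.Replace(html, string.Empty); buffer = Encoding.UTF8.GetBytes(html); _base.Write(buffer, 0, buffer.Length); - David Létourneau
@DavidLétourneau Can you explain why you made the change? Thanks! - Ed DeGagne
1
Sure, I ran into some HTML rendering problems. For example, input buttons and some other HTML components were not displayed correctly. That is the only reason =) - David Létourneau
1
Amazing. . . . . - user4573148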

0

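Whitespace compresses quite well; I don't think removing it will save much space.

If possible, I would suggest trying to offload some of the HTML to the client, using JavaScript to rebuild the repeated content.


I completely agree... I would think GZIP takes care of the whitespace. I would be surprised if another layer of "whitespace removal" on top of GZIP had any impact on size. - Matt Kocaj

-3
If you return JSON from the view, it is already minified and should not contain any whitespace or CR/LF. You should use paging to avoid sending too much data to the browser at once.

That depends on the JSON library you use and how it is configured. - ryandenki
This is more relevant as a comment than as an answer. - Oybek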
