Stripping out select query string attribute/value pairs so Varnish does not vary its cache by them

Question

12

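My goal is to "whitelist" certain query string attributes and their values so that Varnish does not vary its cache between URLs that differ only in those attributes.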

Example:

Url 1: http://foo.com/someproduct.html?utm_code=google&type=hello  
Url 2: http://foo.com/someproduct.html?utm_code=yahoo&type=hello  
Url 3: http://foo.com/someproduct.html?utm_code=yahoo&type=goodbye

In the example above I want to whitelist "utm_code" but not "type". So after the first URL has been hit, I want Varnish to serve the cached content for the second URL.

In the case of the third URL, however, the value of the "type" attribute is different, so that should be a Varnish cache miss.
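To make the desired behavior concrete, here is a minimal Python sketch (not VCL) of the normalization being asked for. The names `IGNORED` and `cache_key` are hypothetical and exist only for this illustration; in Varnish the equivalent step would be a rewrite of req.url in vcl_recv.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of parameters that should not influence the cache key.
IGNORED = {"utm_code"}

def cache_key(url: str) -> str:
    """Return the URL with ignored query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in IGNORED
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

u1 = cache_key("http://foo.com/someproduct.html?utm_code=google&type=hello")
u2 = cache_key("http://foo.com/someproduct.html?utm_code=yahoo&type=hello")
u3 = cache_key("http://foo.com/someproduct.html?utm_code=yahoo&type=goodbye")
print(u1 == u2)  # URLs 1 and 2 should share one cache entry
print(u1 == u3)  # URL 3 should be a separate entry
```

URLs 1 and 2 collapse to the same key, while URL 3 keeps its own because "type" differs.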

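I tried the following two approaches (found in a Drupal help article that I can no longer locate), but neither seemed to work. It may be that my regex is wrong.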

# 1. strip out certain querystring values so varnish does not vary cache.
set req.url = regsuball(req.url, "([\?|&])utm_(campaign|content|medium|source|term)=[^&\s]*&?", "\1");
# get rid of trailing & or ?
set req.url = regsuball(req.url, "[\?|&]+$", "");

# 2. strip out certain querystring values so varnish does not vary cache.
set req.url = regsuball(req.url, "([\?|&])utm_campaign=[^&\s]*&?", "\1");
set req.url = regsuball(req.url, "([\?|&])foo_bar=[^&\s]*&?", "\1");
set req.url = regsuball(req.url, "([\?|&])bar_baz=[^&\s]*&?", "\1");
# get rid of trailing & or ?
set req.url = regsuball(req.url, "[\?|&]+$", "");

- runamok

7 Answers

3

There is something wrong with the regular expressions.
I changed the regular expressions used in both regsub calls:

sub normalize_req_url {
    # Clean up root URL
    if (req.url ~ "^/(?:\?.*)?$") {
        set req.url = "/";
    }

    # Strip out Google Analytics campaign variables
    # They are only needed by the javascript running on the page
    # utm_source, utm_medium, utm_campaign, gclid, ...
    if (req.url ~ "(\?|&)(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-z]+)=") {
        set req.url = regsuball(req.url, "(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-z]+)=[%\._A-z0-9-]+&?", "");
    }
    set req.url = regsub(req.url, "(\?&|\?|&)$", "");
}

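The first change is the "[%._A-z0-9-]" part: in the original, the dash acted as a range operator, so I moved it to the end of the character class. I also escaped the dot, which is harmless but not strictly required inside a character class.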

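The second change is to remove not only a leftover trailing question mark from the URL, but also a trailing ampersand, or a question mark followed by an ampersand.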
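The effect of the dash position can be checked outside Varnish. The sketch below uses Python's re module as a stand-in for Varnish's PCRE, on the assumption that the two behave the same for patterns this simple. `strip_params` is a hypothetical helper that approximates regsuball(req.url, pattern, ""), and the sample URL is made up.

```python
import re

# Parameter names copied from the VCL in this thread.
NAMES = r"(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-z]+)"

# Original class: ".-_" is a range from "." (0x2E) to "_" (0x5F),
# so a literal "-" (0x2D) is NOT part of the class.
DASH_AS_RANGE = NAMES + r"=[%.-_A-z0-9]+&?"

# Changed class: the dash is last, so it is a literal dash.
DASH_LITERAL = NAMES + r"=[%\._A-z0-9-]+&?"

def strip_params(pattern: str, url: str) -> str:
    """Rough stand-in for regsuball(req.url, pattern, "")."""
    return re.sub(pattern, "", url)

url = "/p?utm_campaign=some-value&x=1"
broken = strip_params(DASH_AS_RANGE, url)  # value match stops at the dash
fixed = strip_params(DASH_LITERAL, url)    # the whole pair is removed
print(broken)
print(fixed)
```

With the range version the match ends at the dash, so part of the value is left behind in the URL. With the literal dash the whole attribute/value pair is removed.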

- kipusoep

1

From https://github.com/mattiasgeniar/varnish-4.0-configuration-templates:

# Some generic URL manipulation, useful for all templates that follow
# First remove the Google Analytics added parameters, useless for our backend
if (req.url ~ "(\?|&)(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=") {
  set req.url = regsuball(req.url, "&(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=([A-z0-9_\-\.%25]+)", "");
  set req.url = regsuball(req.url, "\?(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=([A-z0-9_\-\.%25]+)", "?");
  set req.url = regsub(req.url, "\?&", "?");
  set req.url = regsub(req.url, "\?$", "");
}

- Pere

For the older 3.x versions the syntax is the same: https://github.com/mattiasgeniar/varnish-3.0-configuration-templates/blob/master/production.vcl#L70 - Pere

0

You want to strip out utm_code, but it is not covered by either of the regular expressions you are using.

Try this:

# Strip out specific utm_ values from request URL query parameters
set req.url = regsuball(req.url, "([\?|&])utm_(campaign|content|medium|source|term|code)=[^&\s]*&?", "\1");
# get rid of trailing & or ?
set req.url = regsuball(req.url, "[\?|&]+$", "");

Or, if you want to strip out all URL parameters that start with utm_, you could use the following:

# Strip out ALL utm_ values from request URL query parameters
set req.url = regsuball(req.url, "([\?|&])utm_(\w+)=[^&\s]*&?", "\1");
# get rid of trailing & or ?
set req.url = regsuball(req.url, "[\?|&]+$", "");

- Ketola

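Sorry, I meant to explain that my code did not seem to work for utm_campaign, utm_content, and so on. utm_code was just a "generic example" I made up. I did eventually find something that works and added it to the original post as an edit... thanks for the suggestion! - runamok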

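Actually you almost have it. But it fails when one utm_ parameter directly follows another, because the greedy & match at the end consumes the separator and the next parameter then no longer matches. It needs to be: ([?|&])utm_(\w+)=[^&\s]* - dalore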

0

This is a copy of runamok's answer, but my parameters use a plus sign (+) instead of %20, so I added it to my regex.

sub vcl_recv {
    # strip out certain querystring params that varnish should not vary cache by
    call normalize_req_url;
    # snip a bunch of other code
}
sub normalize_req_url {
    # Strip out Google Analytics campaign variables.
    # I also strip fb_local, which is only used by Facebook's JavaScript.
    # They are only needed by the JavaScript running on the page
    # utm_source, utm_medium, utm_campaign, gclid, ...
    if(req.url ~ "(\?|&)(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|fb_local|mr:[A-z]+)=") {
        set req.url = regsuball(req.url, "(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|fb_local|mr:[A-z]+)=[%.+-_A-z0-9]+&?", "");
    }
    set req.url = regsub(req.url, "(\?&?)$", "");
}

- Evaldnet

0

Have you tried this? https://github.com/Dridi/libvmod-querystring Example: set req.url = querystring.regfilter(req.url, "utm_.*");

- user2965205

0

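I built on runamok's answer, adding support for parameters without a value and sorting the remaining parameters. Here is the complete vtc file I wrote to verify correctness.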

varnishtest "Test for URL normalization - Varnish 4"

server s1 {
  rxreq
  txresp -hdr "Backend: up" -body "Some content"
} -repeat 11 -start

varnish v1 -vcl+backend {
  import std;

  sub vcl_recv {
    # Strip out marketing variables. They are only needed by
    # the javascript running on the page.
    if (req.url ~ "(\?|&)(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-z]+)(=|&|$)") {
      # Process params with value.
      set req.url = regsuball(req.url, "(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-z]+)=[%.\-_A-z0-9]+&?", "");
      # Process params without value.
      set req.url = regsuball(req.url, "(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-z]+)=?(&|$)", "");
    }
    # Remove trailing '?', '?&'
    set req.url = regsub(req.url, "(\?&?)$", "");
    # Sort query params, also removes trailing '&'
    set req.url = std.querysort(req.url);
  }

  sub vcl_deliver {
    set resp.http.X-Normalized-URL = req.url;
  }
} -start

client c1 {
  # Basic, no params.
  txreq -url "/test/some-url"
  rxresp
  expect resp.http.X-Normalized-URL == "/test/some-url"

  # One blacklisted param.
  txreq -url "/test/some-url?utm_campaign=1"
  rxresp
  expect resp.http.X-Normalized-URL == "/test/some-url"

  # One blacklisted param, without value.
  txreq -url "/test/some-url?utm_campaign"
  rxresp
  expect resp.http.X-Normalized-URL == "/test/some-url"

  # Two blacklisted params.
  txreq -url "/test/some-url?utm_campaign=1&origin=hpg"
  rxresp
  expect resp.http.X-Normalized-URL == "/test/some-url"

  # Two blacklisted params, one without value
  txreq -url "/test/some-url?utm_campaign&origin=123-abc%20"
  rxresp
  expect resp.http.X-Normalized-URL == "/test/some-url"

  # Two blacklisted params, both without value
  txreq -url "/test/some-url?utm_campaign&origin="
  rxresp
  expect resp.http.X-Normalized-URL == "/test/some-url"

  # Three blacklisted params.
  txreq -url "/test/some-url?utm_campaign=ABC&origin=hpg&siteurl=br2"
  rxresp
  expect resp.http.X-Normalized-URL == "/test/some-url"

  # Three blacklisted params, two without value
  txreq -url "/test/some-url?utm_campaign=1&origin=&siteurl"
  rxresp
  expect resp.http.X-Normalized-URL == "/test/some-url"

  # Three blacklisted params; one param to keep, with space encoded as +.
  txreq -url "/test/some-url?qss=hello+one&utm_campaign=some-value&origin=hpg&siteurl=br2"
  rxresp
  expect resp.http.X-Normalized-URL == "/test/some-url?qss=hello+one"

  # Three blacklisted params; one param to keep, with space encoded as %20, passed in-between blacklisted ones.
  txreq -url "/test/some-url?utm_campaign=1&qss=hello%20one&origin=hpg&siteurl=br2"
  rxresp
  expect resp.http.X-Normalized-URL == "/test/some-url?qss=hello%20one"

  # Three blacklisted params; three params to keep.
  txreq -url "/test/some-url?utm_campaign=a-value&qss=hello+one&origin=hpg&siteurl=br2&keep2=abc&keep1"
  rxresp
  expect resp.http.X-Normalized-URL == "/test/some-url?keep1&keep2=abc&qss=hello+one"
} -run

varnish v1 -expect client_req == 11

- Jedihe

- runamok · Accepted Answer

I figured this out and wanted to share. I found this code, which does what I need.

sub vcl_recv {

    # strip out certain querystring params that varnish should not vary cache by
    call normalize_req_url;

    # snip a bunch of other code
}

sub normalize_req_url {

    # Strip out Google Analytics campaign variables. They are only needed
    # by the javascript running on the page
    # utm_source, utm_medium, utm_campaign, gclid, ...
    if(req.url ~ "(\?|&)(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-z]+)=") {
        set req.url = regsuball(req.url, "(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-z]+)=[%.-_A-z0-9]+&?", "");
    }
    set req.url = regsub(req.url, "(\?&?)$", "");
}
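One caveat worth checking before reusing this: the regsuball pattern is not anchored to "?" or "&", so a short name such as "ie" can also match the tail of a longer parameter name. The Python sketch below (re standing in for Varnish's PCRE) illustrates this with a made-up URL. The `ANCHORED` variant is a hypothetical alternative in the style of the patterns from the question, not part of this answer.

```python
import re

# Parameter names copied from the VCL above.
NAMES = r"(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-z]+)"

# Pattern as used in the regsuball call above (no leading "?" or "&").
UNANCHORED = NAMES + r"=[%.-_A-z0-9]+&?"

# Hypothetical variant: require a leading separator and put it back.
ANCHORED = r"([?&])" + NAMES + r"=[%.-_A-z0-9]+"

# "utm_source" is present, so the VCL's if-guard would pass for this URL.
url = "/p?movie=1&utm_source=x"

# "ie=1&" inside "movie=1&" matches as well, which truncates the parameter.
unanchored_result = re.sub(UNANCHORED, "", url)

# Only "&utm_source=x" matches; trailing separators are trimmed afterwards.
anchored_result = re.sub(r"[?&]+$", "", re.sub(ANCHORED, r"\1", url))

print(unanchored_result)
print(anchored_result)
```

Whether this matters depends on the parameter names a site actually uses. If none of them end in one of the listed names, the unanchored pattern behaves as intended.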