DoS attack from a Google IP range

4
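I think I am being attacked from a Google IP range (66.249.65.*; possibly IP spoofing?) with a flood of requests (about 5 per second, all day). The requests carry the Googlebot signature (Googlebot/2.1; +http://www.google.com/bot.html), but they try to fetch an old URL that I have since disabled because it consumed a lot of CPU/$. If I blacklist this IP range, I will also block the legitimate Googlebot :(.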

More ironic still: my application (http://expoonews.com) is hosted on Google App Engine!

How can I stop this behavior without blocking Googlebot?

Here is a sample of my logs for a better understanding.

 A 2014-11-25 19:41:19.145 404 234 B 10ms /AddPageAction?url=http%3A%2F%2Flincoln.pioneer.kohalibrary.com%2Fcgi-bin%2Fkoha%2Fopac-search.pl%3Fidx%3Disbn%26q%3D1842172131%26do%3DSearch
66.249.65.82 - - [25/Nov/2014:13:41:19 -0800] "GET /AddPageAction?url=http%3A%2F%2Flincoln.pioneer.kohalibrary.com%2Fcgi-bin%2Fkoha%2Fopac-search.pl%3Fidx%3Disbn%26q%3D1842172131%26do%3DSearch HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=10 cpu_ms=0 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16

 A 2014-11-25 19:41:19.550 404 234 B 11ms /AddPageAction?url=http%3A%2F%2Fwww.dnevniavaz.ba%2Fkultura%2Ffilm%2Fprica-o-hapsenju-ratnog-zlocinca
66.249.65.86 - - [25/Nov/2014:13:41:19 -0800] "GET /AddPageAction?url=http%3A%2F%2Fwww.dnevniavaz.ba%2Fkultura%2Ffilm%2Fprica-o-hapsenju-ratnog-zlocinca HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=11 cpu_ms=23 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16

 A 2014-11-25 19:41:19.956 404 234 B 12ms /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FNewcastle_Local_Municipality
66.249.65.78 - - [25/Nov/2014:13:41:19 -0800] "GET /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FNewcastle_Local_Municipality HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=12 cpu_ms=0 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16

 A 2014-11-25 19:41:20.426 404 234 B 10ms /AddPageAction?url=http%3A%2F%2Ftools.wmflabs.org%2Fgeohack%2Fgeohack.php%3Fpagename%3DRio_Grande_County%252C_Colorado%26params%3D37.61_N_-106.39_E_type%3Aadm2nd_region%3AUS-CO_source%3AUScensus1990
66.249.65.86 - - [25/Nov/2014:13:41:20 -0800] "GET /AddPageAction?url=http%3A%2F%2Ftools.wmflabs.org%2Fgeohack%2Fgeohack.php%3Fpagename%3DRio_Grande_County%252C_Colorado%26params%3D37.61_N_-106.39_E_type%3Aadm2nd_region%3AUS-CO_source%3AUScensus1990 HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=10 cpu_ms=23 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16

 A 2014-11-25 19:41:20.763 404 234 B 11ms /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2F%23cite_ref-Istanbul_43-1
66.249.65.86 - - [25/Nov/2014:13:41:20 -0800] "GET /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2F%23cite_ref-Istanbul_43-1 HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=11 cpu_ms=0 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16

 A 2014-11-25 19:41:21.166 404 234 B 10ms /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Findex.php%3Ftitle%3DHMAS%2520Pirie%26action%3Dhistory
66.249.65.86 - - [25/Nov/2014:13:41:21 -0800] "GET /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Findex.php%3Ftitle%3DHMAS%2520Pirie%26action%3Dhistory HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=10 cpu_ms=0 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16

 A 2014-11-25 19:41:21.571 404 234 B 11ms /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Findex.php%3Ftitle%3DUniversity_of_Engineering_and_Technology_Taxila_Chakwal_Campus_University_of_Engineering_and_Technology_Taxila_Chakwal_Campus%26action%3Dedit%26redlink%3D1
66.249.65.78 - - [25/Nov/2014:13:41:21 -0800] "GET /AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Findex.php%3Ftitle%3DUniversity_of_Engineering_and_Technology_Taxila_Chakwal_Campus_University_of_Engineering_and_Technology_Taxila_Chakwal_Campus%26action%3Dedit%26redlink%3D1 HTTP/1.1" 404 234 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "expoonews.com" ms=11 cpu_ms=23 cpm_usd=0.000026 instance=00c61b117c8ad4ca005d37349157867d41adaf app_engine_release=1.9.16 

This pattern might also indicate that a security scan is being run; see https://cloud.google.com/security-scanner/using-the-scanner - Dan Cornilescu
6 Answers

1
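It looks like Googlebot is picking up injected links, either stored on your own site or hard-coded by an attacker into another site, so that Googlebot itself is used to launch the attack. A web application firewall can be a good solution: it can detect these signatures and explicitly reject such requests. Look up Apache ModSecurity or Nginx NAXSI on Google!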

0
A dos.yaml file in your application's root directory (alongside app.yaml) configures the DoS Protection Service blacklist. Here is an example dos.yaml file:
blacklist:
- subnet: 1.2.3.4
  description: a single IP address
- subnet: 1.2.3.4/24
  description: an IPv4 subnet
- subnet: abcd::123:4567
  description: an IPv6 address
- subnet: abcd::123:4567/48
  description: an IPv6 subnet

https://cloud.google.com/appengine/docs/python/config/dos
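
To see why blacklisting the whole range from the question is too blunt, the following minimal sketch checks an address against subnet entries written in the same notation as dos.yaml. It assumes the entries follow ordinary CIDR semantics; the helper name is_blacklisted is hypothetical and is not part of App Engine.

```python
import ipaddress


def is_blacklisted(ip, subnets):
    """Return True if ip falls inside any subnet written in dos.yaml notation."""
    addr = ipaddress.ip_address(ip)
    for subnet in subnets:
        # strict=False accepts entries such as 1.2.3.4/24 whose host bits are set.
        network = ipaddress.ip_network(subnet, strict=False)
        if addr.version == network.version and addr in network:
            return True
    return False


# A /24 entry matches every crawler address in the range, genuine or not.
print(is_blacklisted("66.249.65.82", ["66.249.65.0/24"]))
# A single-address entry only matches that one address.
print(is_blacklisted("66.249.65.78", ["66.249.65.82"]))
```

A subnet entry cannot tell a genuine crawler from an impostor that shares the range, which is the concern raised in the comment below.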


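But how can I do this without blocking real Googlebot requests? - Fulvius
A) Determine whether the requests are genuine. B) If they are not, block them. If they are genuine and you want to be indexed, do not block them. - Paul Collingwood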
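
One way to approach step A is the reverse-then-forward DNS check that Google describes for verifying its crawlers: resolve the IP to a host name, confirm the name ends in googlebot.com or google.com, then resolve that name back and confirm it returns the same IP. The sketch below is only an illustration of that idea; the function name is_real_googlebot and the injectable resolver parameters are assumptions made so the logic can be exercised without network access.

```python
import socket

# Domains that genuine Googlebot host names are expected to end with.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip,
                      reverse_lookup=socket.gethostbyaddr,
                      forward_lookup=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the domain, then forward-resolve to confirm."""
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:
        # No PTR record or DNS failure: treat as unverified.
        return False
    if not hostname.rstrip(".").lower().endswith(GOOGLE_SUFFIXES):
        return False
    try:
        # gethostbyname_ex returns (hostname, aliases, ipv4_addresses).
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False
    # The forward lookup must lead back to the address that made the request.
    return ip in addresses
```

Calling is_real_googlebot("66.249.65.82") with the default resolvers performs two DNS queries, so on a busy handler the result would normally be cached per IP. The default forward lookup only covers IPv4.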

0

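You should at least write a robots.txt that keeps the real Googlebot away from the old URLs. It will keep retrying URLs it has already indexed until they return 404 or are otherwise marked as removed.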
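
As a rough illustration, a rule for the /AddPageAction path seen in the logs can be checked locally with Python's standard urllib.robotparser before it is deployed. The robots.txt content and the helper name is_allowed below are assumptions for this sketch, not the asker's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks all crawlers to stay away from the old endpoint.
ROBOTS_TXT = """\
User-agent: *
Disallow: /AddPageAction
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


old_url = "http://expoonews.com/AddPageAction?url=http%3A%2F%2Fen.wikipedia.org%2F"
print(is_allowed(ROBOTS_TXT, "Googlebot", old_url))
print(is_allowed(ROBOTS_TXT, "Googlebot", "http://expoonews.com/"))
```

Only well-behaved crawlers honor robots.txt, so this helps with the real Googlebot but not with a client that merely copies its user-agent string.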

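I'm not sure it is really a fake bot, because Googlebot itself can behave like a spammer, hitting too many pages in a short time.

To reduce the number of hits from Googlebot (fake or real), consider something like this: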

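from google.appengine.api import memcache

# Allow roughly 100 requests per minute per client IP.
# bot_ip is assumed to hold the client address, e.g. self.request.remote_addr.
dos_n = memcache.get(key=bot_ip)
if dos_n is not None:
    if dos_n > 100:
        # Over the limit: reject until the 60-second key expires.
        self.abort(400)
    dos_n = memcache.incr(bot_ip)
else:
    # No counter yet: start one that expires after 60 seconds.
    memcache.add(key=bot_ip, value=0, time=60)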
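
The same fixed-window idea can be tried outside App Engine. The following sketch keeps the counters in a plain dictionary and takes the clock as a parameter so the expiry can be simulated; the class name FixedWindowLimiter and its parameters are hypothetical, and it is a single-process illustration rather than a replacement for the shared memcache counter.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per key within each `window` seconds."""

    def __init__(self, limit=100, window=60, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._counters = {}  # key -> (window_start, request_count)

    def allow(self, key):
        now = self.clock()
        start, count = self._counters.get(key, (now, 0))
        if now - start >= self.window:
            # The previous window has expired: start counting again.
            start, count = now, 0
        count += 1
        self._counters[key] = (start, count)
        return count <= self.limit


limiter = FixedWindowLimiter(limit=100, window=60)
if not limiter.allow("66.249.65.82"):
    print("too many requests from this address")
```

A handler would call allow() with the client IP on every request and reject the request when it returns False.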

FYI, if the site is not hosted on GAE, you can change the crawl rate in Webmaster Tools: https://www.google.com/webmasters/tools/


0

OK, but that is not the problem. I think this may be an attack launched through Google's infrastructure. - Fulvius

0

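I think I solved the problem by removing the URL that accepts a parameter (a url pointing to another page).

My guess is that the bot was trying to find out which web URLs are open so it could fake visits to some site (perhaps to inflate its traffic). My URL was evidently exposed: it performed a GET request on whatever address was passed to it.

But thanks to everyone for the answers anyway.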


-1

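This suspicious activity is related to the Googlebot web crawler. If you have recently added or changed pages on your site, you can use the Fetch as Google tool to ask Google to re-index them.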

