Unable to parse an RSS feed.

I am trying to parse an RSS feed from a URL using feedparser in Python.
>>> import feedparser 
>>> d = feedparser.parse('http://www.shop.inonit.in/RSSFeedDetails.aspx?PID=801')
>>> d
{'feed': {'summary': u'<span><h1>Server Error in \'/mobile\' Application.<hr color="silver" size="1" width="100%" /></h1>\n\n            
<h2> <i>Attempted to divide by zero.</i> </h2></span>\n\n            <font face="Arial, Helvetica, Geneva, SunSans-Regular, sans-serif ">\n\n            <b> Description: </b>An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.\n\n            <br /><br />\n\n            <b> Exception Details: </b>System.DivideByZeroException: Attempted to divide by zero.<br /><br />\n\n            
<b>Source Error:</b> <br /><br />\n\n            <table bgcolor="#ffffcc" width="100%">\n               <tr>\n                  <td>\n                      <code>\n\nAn unhandled exception was generated during the execution of the current web request. Information regarding the origin and location of the exception can be identified using the exception stack trace below.</code>\n\n                  </td>\n               </tr>\n            </table>\n\n            <br />\n\n            <b>Stack Trace:</b> <br /><br />\n\n            <table bgcolor="#ffffcc" width="100%">\n               <tr>\n                  <td>\n                      <code><pre>\n\n[DivideByZeroException: Attempted to divide by zero.]\n   System.Decimal.FCallDivide(Decimal&amp; d1, Decimal&amp; d2) +0\n   System.Decimal.Divide(Decimal d1, Decimal d2) +17\n   Martjack.CMS.PageControlsModelComp.GetPluginDataEnt(PageControlEnt objPageControlEnt, MerchantENT MerchantEnt, PageControlModel&amp; objPageControlModel, ProductEnt_RE ProductEnt, String MobileVersion) +2324\n   
Martjack.CMS.PageControlsModelComp.GetPageControlOutputData(PageModel pagemodel, PageControlEnt objPageControlEnt, MerchantENT MerchantEnt, String seocid, String combiType, String MobileVersion, ProductEnt_RE ProductEnt, String siteurl) +694\n   Martjack.CMS.PageControlsModelComp.GetPageControlModels(PageModel Pagemodel, MerchantENT MerchantEnt, String seocid, String combiType, String MobileVersion, DNDPageControlViewCollection objDNDPageControlViewCollection, Boolean isdndrequest, Int64 pgcontrolid, String siteurl) +919\n   Martjack.CMS.PageModelComp.GetPageModel(MerchantENT MerchantEnt, Int32 predefinedPageId, Boolean isPredefined, ChannelType channel, String seocid, String Bid, String combiType, String MobileVersion, Boolean isDndRequest, 
DNDPageControlViewCollection ObjDNDPageControlViewCollection, Boolean ControlsInfo, Int64 pgcontrolid) +1717\n   MartJack.Facade.CMSFacade.GetPageModel(MerchantENT MerchantEnt, Int32 PageId, Boolean isPredefined, ChannelType channel, String seocid, String bid, String combitype, String mobileversion, Boolean isDndRequest, DNDPageControlViewCollection ObjDNDPageControlViewCollection, Boolean ControlsInfo, Int64 pgcontrolid) +119\n   MobileECommerce.MobileECommerce.ProductsController.GetPageModelByRequest(String seoid, String bid) +227\n   MobileECommerce.MobileECommerce.ProductsController.Index(String id, String seobrand, String category, String categoryparent) +54\n   lambda_method(Closure , ControllerBase , Object[] ) +272\n   
System.Web.Mvc.ActionMethodDispatcher.Execute(ControllerBase controller, Object[] parameters) +17\n   System.Web.Mvc.ReflectedActionDescriptor.Execute(ControllerContext controllerContext, IDictionary`2 parameters) +212\n   System.Web.Mvc.ControllerActionInvoker.InvokeActionMethod(ControllerContext controllerContext, ActionDescriptor actionDescriptor, IDictionary`2 parameters) +239\n   System.Web.Mvc.&lt;&gt;c__DisplayClass15.&lt;InvokeActionMethodWithFilters&gt;b__12() +56\n   System.Web.Mvc.ControllerActionInvoker.InvokeActionMethodFilter(IActionFilter filter, ActionExecutingContext preContext, Func`1 continuation) +282\n   System.Web.Mvc.&lt;&gt;c__DisplayClass17.&lt;InvokeActionMethodWithFilters&gt;b__14() +20\n   System.Web.Mvc.ControllerActionInvoker.InvokeActionMethodWithFilters(ControllerContext controllerContext, IList`1 filters, ActionDescriptor actionDescriptor, IDictionary`2 parameters) +201\n   System.Web.Mvc.ControllerActionInvoker.InvokeAction(ControllerContext controllerContext, String actionName) +351\n   System.Web.Mvc.Controller.ExecuteCore() +99\n   System.Web.Mvc.ControllerBase.Execute(RequestContext requestContext) +94\n   System.Web.Mvc.ControllerBase.System.Web.Mvc.IController.Execute(RequestContext requestContext) +10\n   
System.Web.Mvc.&lt;&gt;c__DisplayClassb.&lt;BeginProcessRequest&gt;b__5() +43\n   System.Web.Mvc.Async.&lt;&gt;c__DisplayClass1.&lt;MakeVoidDelegate&gt;b__0() +21\n   System.Web.Mvc.Async.&lt;&gt;c__DisplayClass8`1.&lt;BeginSynchronous&gt;b__7(IAsyncResult _) +12\n   System.Web.Mvc.Async.WrappedAsyncResult`1.End() +53\n   System.Web.Mvc.Async.AsyncResultWrapper.End(IAsyncResult asyncResult, Object tag) +28\n   System.Web.Mvc.Async.AsyncResultWrapper.End(IAsyncResult asyncResult, Object tag) +15\n   System.Web.Mvc.&lt;&gt;c__DisplayClasse.&lt;EndProcessRequest&gt;b__d() +34\n   System.Web.Mvc.SecurityUtil.&lt;GetCallInAppTrustThunk&gt;b__0(Action f) +7\n   System.Web.Mvc.SecurityUtil.ProcessInApplicationTrust(Action action) +23\n   System.Web.Mvc.MvcHandler.EndProcessRequest(IAsyncResult asyncResult) +68\n   
System.Web.Mvc.MvcHandler.System.Web.IHttpAsyncHandler.EndProcessRequest(IAsyncResult result) +9\n   System.Web.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() +714\n   System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean&amp; completedSynchronously) +240\n</pre></code>\n\n                  </td>\n               </tr>\n            </table>\n\n            <br />\n\n            
<hr color="silver" size="1" width="100%" />\n\n            <b>Version Information:</b>\xa0Microsoft .NET Framework Version:4.0.30319; ASP.NET Version:4.0.30319.272\n\n            </font>'}, 'status': 302, 'version': u'', 'encoding': u'utf-8', 'bozo': 1, 'headers': {'content-length': '11348', 'x-powered-by': 'ASP.NET', 'set-cookie': 'SERVERID=HAS14; path=/', 'originserver': 'HAS14', 'server': 'Microsoft-IIS/7.5', 'connection': 'close', 'cache-control': 'private', 'date': 'Tue, 16 Apr 2013 08:03:59 GMT', 'content-type': 'text/html; charset=utf-8', 'x-aspnet-version': '4.0.30319'}, 'href': 
u'http://www.shop.inonit.in/mobile/Products//NA/NA/0', 'namespaces': {}, 'entries': [], 'bozo_exception': SAXParseException('not well-formed (invalid token)',)}

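When I try to crawl a single page of this site with scrapy, it appears to redirect me to a URL that does not exist. As a result the output contains nothing useful, yet if you open the link (http://www.shop.inonit.in/RSSFeedDetails.aspx?PID=801) in a browser, it shows plenty of content!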

Any help would be much appreciated. Thanks!


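What do you mean by "nothing in the output"? >>> len(d['feed']['summary']) 5601, and there is a nice "divide by zero" message in there. - Steven Almeroth
Ah, sorry. By "nothing" I meant nothing relevant: elements such as the title, price, and so on cannot be read from the feed. But if you open the link, you can see all of the data. - user_2000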
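For what it's worth, the result printed in the question already carries several warning signs: 'bozo': 1, 'status': 302, a 'text/html' content type, and an empty 'entries' list. Below is a minimal sketch of checking for those signs before trusting a parsed feed. The helper name diagnose_feed and the sample dict are hypothetical; the sample only mirrors the fields shown in the question, and the function relies on nothing beyond dict-style access, which feedparser results support.

```python
def diagnose_feed(result):
    """Return a list of warnings for a feedparser-style result (hypothetical helper)."""
    problems = []
    # feedparser sets 'bozo' to a truthy value when the document is not well-formed.
    if result.get("bozo"):
        problems.append("bozo flag set: %r" % (result.get("bozo_exception"),))
    # A 3xx status means the request was redirected; 'href' holds the final URL.
    status = result.get("status")
    if status is not None and 300 <= status < 400:
        problems.append("HTTP %d redirect to %s" % (status, result.get("href")))
    # An HTML content type suggests an error page rather than an XML feed.
    content_type = result.get("headers", {}).get("content-type", "")
    if "html" in content_type:
        problems.append("server returned %s, not an XML feed" % content_type)
    if not result.get("entries"):
        problems.append("no entries parsed")
    return problems


# Sample values copied from the output in the question (not a live request).
sample = {
    "bozo": 1,
    "status": 302,
    "href": "http://www.shop.inonit.in/mobile/Products//NA/NA/0",
    "headers": {"content-type": "text/html; charset=utf-8"},
    "entries": [],
    "bozo_exception": "not well-formed (invalid token)",
}

for warning in diagnose_feed(sample):
    print(warning)
```

With a real request, the same function could be called on the value returned by feedparser.parse(...).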
1 Answer


Are you using a proxy? If so, you can do the following:

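import urllib2, feedparser
# "proxy:port" is a placeholder: substitute your proxy's host and port.
proxy = urllib2.ProxyHandler({"http": "proxy:port"})
# Pass the handler so feedparser routes the HTTP request through the proxy.
d = feedparser.parse('http://www.shop.inonit.in/RSSFeedDetails.aspx?PID=801', handlers=[proxy])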
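The snippet above targets Python 2, where ProxyHandler lives in urllib2. On Python 3 the same class is in urllib.request. Here is a rough sketch of the equivalent, assuming a feedparser release whose parse() still accepts the handlers argument (5.x and 6.0.x do, as far as I know). The function names and the proxy address "proxy.example:8080" are made up for illustration.

```python
import urllib.request


def build_proxy_handler(proxy_address):
    """Create a handler that sends plain-HTTP requests through proxy_address."""
    return urllib.request.ProxyHandler({"http": proxy_address})


def parse_via_proxy(url, proxy_address):
    """Parse a feed through a proxy (hypothetical wrapper around feedparser.parse)."""
    import feedparser  # third-party; imported here so the helper above works without it

    return feedparser.parse(url, handlers=[build_proxy_handler(proxy_address)])


# Placeholder address: replace with your real proxy host and port.
handler = build_proxy_handler("proxy.example:8080")
print(handler.proxies)
```

Calling parse_via_proxy('http://www.shop.inonit.in/RSSFeedDetails.aspx?PID=801', 'proxy.example:8080') would then mirror the original answer.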
