I can't get past a Python RuntimeError (maximum recursion depth exceeded) when using BeautifulSoup.
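I'm trying to recursively process nested code blocks and extract their content. The prettified HTML looks like this (don't ask why it looks like this :)):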
```html
<div><code><code><code><code>Code in here</code></code></code></code></div>
```
The function I'm passing my soup object to is:
```python
import sys

from bs4 import BeautifulSoup, NavigableString


def _strip_descendent_code(self, soup):
    sys.setrecursionlimit(2000)
    # soup = BeautifulSoup(html, 'lxml')
    for code in soup.findAll('code'):
        s = ""
        for c in code.descendents:
            if not isinstance(c, NavigableString):
                if c.name != code.name:
                    continue
                elif c.name == code.name:
                    if isinstance(c, NavigableString):
                        s += str(c)
                    else:
                        continue
        code.append(s)
    return str(soup)
```
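As you can see, I'm trying to raise the default recursion limit, but that isn't a solution. I've raised it until I hit the machine's memory limit, and the function above still never works.
Any help getting it to work, and pointing out the mistakes/problems, would be greatly appreciated.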
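For comparison, here is a minimal sketch of one way to do the collapsing without walking descendants by hand. It assumes the goal is to keep only the outermost `<code>` tag and replace its nested children with their concatenated text; the function name `strip_descendant_code` and the `html.parser` backend are illustrative assumptions, not taken from the question.

```python
from bs4 import BeautifulSoup


def strip_descendant_code(html):
    """Hypothetical helper: collapse nested <code> tags into the outermost one.

    Assumption: the outermost <code> should end up holding only the
    concatenated text of everything that was nested inside it.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Keep only <code> tags that have no <code> ancestor; the list is
    # built before any modification, so editing the tree below is safe.
    outermost = [c for c in soup.find_all("code") if c.find_parent("code") is None]
    for code in outermost:
        # get_text() joins every string nested inside the tag.
        text = code.get_text()
        # Assigning .string clears the tag's children and inserts one string.
        code.string = text
    return str(soup)


print(strip_descendant_code(
    "<div><code><code><code><code>Code in here</code></code></code></code></div>"))
# → <div><code>Code in here</code></div>
```

Building the list of outermost tags first avoids mutating the tree while a generator is still iterating over it, which is what the original loop does when it calls `code.append(s)` on tags it is traversing.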
The stack trace repeatedly shows the following:
```
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1234, in find
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1255, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 529, in _find_all
    i = next(generator)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1269, in descendants
    stopNode = self._last_descendant().next_element
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 284, in _last_descendant
    if is_initialized and self.next_sibling:
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 997, in __getattr__
    return self.find(tag)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1234, in find
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1255, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 529, in _find_all
    i = next(generator)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1269, in descendants
    stopNode = self._last_descendant().next_element
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 284, in _last_descendant
    if is_initialized and self.next_sibling:
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 997, in __getattr__
    return self.find(tag)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1234, in find
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1255, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 512, in _find_all
    strainer = SoupStrainer(name, attrs, text, **kwargs)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1548, in __init__
    self.text = self._normalize_search_value(text)
  File "/Users/almccann/.virtualenvs/evernoteghost/lib/python3.4/site-packages/bs4/element.py", line 1553, in _normalize_search_value
    if (isinstance(value, str) or isinstance(value, collections.Callable) or hasattr(value, 'match')
RuntimeError: maximum recursion depth exceeded while calling a Python object
```
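The `__getattr__` → `return self.find(tag)` frames in the trace are how bs4 treats attribute names it doesn't recognise: `tag.foo` is shorthand for `tag.find('foo')`. The generator property is spelled `descendants`, so `code.descendents` is a `find()` call for a tag named `descendents`, not the descendants generator. A minimal check on a small, freshly parsed tree (the input is an assumption, and this does not reproduce the recursion itself):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><code><code>x</code></code></div>", "html.parser")
outer = soup.code  # shorthand for soup.find("code"): the outermost <code>

# Misspelled name: resolved as outer.find("descendents"), which matches nothing.
misspelled = outer.descendents
print(misspelled)  # → None

# Correct spelling: a generator over every nested tag and string.
kinds = [type(d).__name__ for d in outer.descendants]
print(kinds)  # → ['Tag', 'NavigableString']
```

In a tree like this one, `for c in code.descendents:` would therefore iterate over `None` and fail with a `TypeError` rather than visiting the nested tags.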