from bs4 import BeautifulSoup
import urllib
import re
soup = urllib.urlopen("http://atlanta.craigslist.org/cto/")
soup = BeautifulSoup(soup)
souped = soup.p
print souped
m = re.search("\\$.",souped)
print m.group(0)
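I can download and print the HTML successfully, but as soon as I add the last two lines, the script fails every time.
This is the error I get: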
Traceback (most recent call last):
File "C:\Python27\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 323, in RunScript
debugger.run(codeObject, __main__.__dict__, start_stepping=0)
File "C:\Python27\Lib\site-packages\pythonwin\pywin\debugger\__init__.py", line 60, in run
_GetCurrentDebugger().run(cmd, globals,locals, start_stepping)
File "C:\Python27\Lib\site-packages\pythonwin\pywin\debugger\debugger.py", line 655, in run
exec cmd in globals, locals
File "C:\Users\Zack\Documents\Scripto.py", line 1, in <module>
from bs4 import BeautifulSoup
File "C:\Python27\lib\re.py", line 142, in search
return _compile(pattern, flags).search(string)
TypeError: expected string or buffer
Thanks a lot!
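BeautifulSoup objects have a __str__() method that converts them to strings so they can be printed nicely (print calls it automatically), but they are not actually strings, and re.search() needs a string. So you have to convert the HTML to a string explicitly before you can search it. - kindall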
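A minimal sketch of what the comment describes, written so it runs without a network call or bs4 installed. `FakeTag` is a hypothetical stand-in for the object that `soup.p` returns (it is not part of BeautifulSoup), and the listing text is made-up sample data. The only point is that an object can print fine through `__str__()` and still be rejected by `re.search()` until it is wrapped in `str()`.

```python
import re


class FakeTag(object):
    """Hypothetical stand-in for a bs4 tag: printable, but not a str."""

    def __init__(self, html):
        self.html = html

    def __str__(self):
        # print() calls this automatically, which is why printing works.
        return self.html


# Made-up sample markup, standing in for the first <p> of the page.
souped = FakeTag("<p>2004 Honda Civic - $3500</p>")
print(souped)

# Passing the object itself reproduces the TypeError from the question.
try:
    re.search(r"\$.", souped)
except TypeError as exc:
    error = str(exc)
    print(error)

# Converting explicitly gives re.search() the string it needs.
m = re.search(r"\$.", str(souped))
print(m.group(0))  # "$3": the dollar sign plus the one character after it
```

In the script from the question, the equivalent change would presumably be `m = re.search("\\$.", str(souped))`. Note that `re.search()` returns `None` when nothing matches, so it is worth checking `m` before calling `m.group(0)`.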