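I'm trying to use an Angular.js client with webapp2 on Google App Engine.
To solve the SEO problem, the idea is to use a headless browser to run the JavaScript on the server side and serve the resulting HTML to crawlers.
Is there a Python headless browser that works on Google App Engine?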
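This can now be done on App Engine Flex with a custom runtime, so I'm adding this answer because this question is the first result when searching for this on Google.
I built this custom runtime based on my other GAE Flex microservice, which uses the prebuilt Python runtime.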
Project structure:
webdrivers/
- geckodriver
app.yaml
Dockerfile
main.py
requirements.txt
app.yaml:
service: my-app-engine-service-name
runtime: custom
env: flex
entrypoint: gunicorn -b :$PORT main:app --timeout 180
Dockerfile:
FROM gcr.io/google-appengine/python
RUN apt-get update
RUN apt-get install -y xvfb
RUN apt-get install -y firefox
LABEL python_version=python
RUN virtualenv --no-download /env -p python
ENV VIRTUAL_ENV /env
ENV PATH /env/bin:$PATH
ADD requirements.txt /app/
RUN pip install -r requirements.txt
ADD . /app/
CMD exec gunicorn -b :$PORT main:app --timeout 180
requirements.txt:
Flask==0.12.2
gunicorn==19.7.1
selenium==3.13.0
pyvirtualdisplay==0.2.1
main.py:
import os
import traceback

from flask import Flask, jsonify
from selenium import webdriver
from pyvirtualdisplay import Display

app = Flask(__name__)

# Add the webdrivers directory to PATH so Selenium can find geckodriver
os.environ['PATH'] += ':' + os.path.dirname(os.path.realpath(__file__)) + "/webdrivers"


@app.route('/')
def hello():
    return 'Hello!!'


@app.route('/test/', methods=['GET'])
def go_headless():
    try:
        # Start a virtual X display (Xvfb) for Firefox to render into
        display = Display(visible=0, size=(1024, 768))
        display.start()
        d = webdriver.Firefox()
        d.get("http://www.python.org")
        page_source = d.page_source.encode("utf-8")
        d.close()
        display.stop()
        # Return only the first 500 bytes of the rendered page
        return jsonify({'success': True, "result": page_source[:500]})
    except Exception as e:
        # print() with a single argument works on both Python 2 and Python 3
        print(traceback.format_exc())
        return jsonify({'success': False, 'msg': str(e)})


if __name__ == '__main__':
    app.run(host='127.0.0.1', port=8080, debug=True)
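Since the original goal was to serve rendered HTML only to crawlers, the app needs some way to tell crawlers apart from regular visitors. The answer does not show this step, so the following is only a minimal sketch: the helper name `is_crawler` and the token list `CRAWLER_TOKENS` are assumptions for illustration, not part of the original code, and the list is far from exhaustive.

```python
# Hypothetical list of User-Agent substrings (an assumption, not exhaustive).
CRAWLER_TOKENS = ("googlebot", "bingbot", "yandex", "baiduspider", "duckduckbot")


def is_crawler(user_agent):
    """Return True if the User-Agent header looks like a known crawler."""
    if not user_agent:
        return False
    ua = user_agent.lower()
    # Case-insensitive substring match against the token list
    return any(token in ua for token in CRAWLER_TOKENS)
```

Inside a Flask view this could be called as `is_crawler(request.headers.get("User-Agent"))`, rendering through the headless browser only when it returns True.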
Download geckodriver (Linux 64-bit) from here:
https://github.com/mozilla/geckodriver/releases
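A missing or non-executable geckodriver binary is an easy mistake to make after downloading it into webdrivers/. As a rough sketch, the check below extends PATH the same way main.py does and then looks the binary up. The helper names `add_to_path` and `find_executable` are hypothetical and not part of Selenium or the original answer.

```python
import os


def add_to_path(directory, environ=None):
    """Append directory to PATH (if not already present) and return the new PATH."""
    env = os.environ if environ is None else environ
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        parts.append(directory)
    env["PATH"] = os.pathsep.join(parts)
    return env["PATH"]


def find_executable(name, environ=None):
    """Return the full path of the first executable called name on PATH, or None."""
    env = os.environ if environ is None else environ
    for directory in env.get("PATH", "").split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

For example, calling `add_to_path(os.path.join(os.path.dirname(os.path.realpath(__file__)), "webdrivers"))` followed by `find_executable("geckodriver")` at startup would return None if the binary is missing or lacks the executable bit.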
Other notes:
- The following error is related to the setting DesiredCapabilities().FIREFOX["marionette"] = False (see https://github.com/SeleniumHQ/selenium/issues/5106):
  WebDriverException: Message: Can't load the profile. Possible firefox version mismatch. You must use GeckoDriver instead for Firefox 48+. Profile Dir: /tmp/tmp 48P If you specified a log_file in the FirefoxBinary constructor, check it for details.
- display = Display(visible=0, size=(1024, 768)) is needed to fix this error: "How to fix Selenium WebDriverException: The browser appears to have exited before we could connect?"

Local testing:
docker build . -t my-docker-image-tag
docker run -p 8080:8080 --name=my-docker-container-name my-docker-image-tag
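Once the container is running, the /test/ route can be checked from a small script instead of a browser. This is only a sketch: the URL http://localhost:8080/test/ is assumed from the port mapping above and the route in main.py, and the helper names `check_body` and `smoke_test` are hypothetical.

```python
import json

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def check_body(body):
    """Return (ok, detail) for a JSON body shaped like the /test/ response."""
    data = json.loads(body)
    if data.get("success"):
        return True, data.get("result", "")
    return False, data.get("msg", "unknown error")


def smoke_test(url="http://localhost:8080/test/", timeout=180):
    """Fetch the test route and report whether the headless run succeeded."""
    response = urlopen(url, timeout=timeout)
    try:
        return check_body(response.read().decode("utf-8"))
    finally:
        response.close()
```

The generous timeout mirrors the gunicorn --timeout 180 setting, since starting Firefox inside the container can be slow.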
Deploy to App Engine:
gcloud app deploy app.yaml --version dev --project my-app-engine-project-id