I would like to run spark-shell with external packages from behind a corporate proxy. Unfortunately, external packages passed via the --packages
option are not resolved.
For example, when running the following command:
bin/spark-shell --packages datastax:spark-cassandra-connector:1.5.0-s_2.10
the Cassandra connector package is not resolved (it hangs at the last line):
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/opt/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
datastax#spark-cassandra-connector added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
After some time the connection times out with the following error message:
:::: ERRORS
Server access error at url https://repo1.maven.org/maven2/datastax/spark-cassandra-connector/1.5.0-s_2.10/spark-cassandra-connector-1.5.0-s_2.10.pom (java.net.ConnectException: Connection timed out)
When I disconnect from the VPN behind the corporate proxy, the package is resolved and downloaded immediately.
What I have tried so far:
Exposing the proxy via environment variables:
export http_proxy=<proxyHost>:<proxyPort>
export https_proxy=<proxyHost>:<proxyPort>
export JAVA_OPTS="-Dhttp.proxyHost=<proxyHost> -Dhttp.proxyPort=<proxyPort>"
export ANT_OPTS="-Dhttp.proxyHost=<proxyHost> -Dhttp.proxyPort=<proxyPort>"
Running spark-shell with extra Java options:
bin/spark-shell --conf "spark.driver.extraJavaOptions=-Dhttp.proxyHost=<proxyHost> -Dhttp.proxyPort=<proxyPort>" --conf "spark.executor.extraJavaOptions=-Dhttp.proxyHost=<proxyHost> -Dhttp.proxyPort=<proxyPort>" --packages datastax:spark-cassandra-connector:1.6.0-M1-s_2.10
Is there another configuration possibility I am missing?
Have you also tried https.proxyHost and its port? – Thomas Decaux
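Following that comment, one variant worth trying is to pass the HTTPS proxy properties alongside the HTTP ones, since the failing URL (https://repo1.maven.org/...) is served over HTTPS and the plain `http.proxy*` properties do not cover HTTPS connections in the JVM. A sketch, reusing the `<proxyHost>`/`<proxyPort>` placeholders from the question (assuming the same proxy handles both protocols):

```shell
# Pass both HTTP and HTTPS proxy settings to the driver JVM,
# which is where Ivy resolves --packages dependencies.
bin/spark-shell \
  --conf "spark.driver.extraJavaOptions=-Dhttp.proxyHost=<proxyHost> -Dhttp.proxyPort=<proxyPort> -Dhttps.proxyHost=<proxyHost> -Dhttps.proxyPort=<proxyPort>" \
  --packages datastax:spark-cassandra-connector:1.6.0-M1-s_2.10
```

If the proxy requires authentication, the JVM also recognizes `-Dhttp.proxyUser`/`-Dhttp.proxyPassword` style properties, though support can vary by client library.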