Top-voted 'apache-spark-mllib' questions - Page 5

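16 votes, 1 answer

What is rank in the ALS machine learning algorithm in Apache Spark MLlib?

I want to try out an example of the ALS machine learning algorithm. My code runs fine, but I don't understand the rank parameter used by the algorithm. Here is my Java code: // Build the recommendation model using ALS int rank = 10; int numIterati...

algorithm apache-spark machine-learning apache-spark-mllib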
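
In ALS, rank is the number of latent factors: each user and each item is represented by a vector of that length, and a predicted rating is the dot product of the two. A minimal plain-Python sketch of that idea follows. The factor values and the names user_factors, item_factors, and predict are made up for illustration and are not Spark's API.

```python
# Illustrative only: hand-picked factor vectors, not output of Spark's ALS.
rank = 2  # number of latent factors per user and per item

# One vector of length `rank` per user and per item (hypothetical values).
user_factors = {"alice": [1.0, 0.5], "bob": [0.2, 2.0]}
item_factors = {"movie_a": [4.0, 0.0], "movie_b": [1.0, 2.0]}


def predict(user, item):
    """Predicted rating = dot product of the user's and item's factor vectors."""
    u = user_factors[user]
    v = item_factors[item]
    assert len(u) == len(v) == rank
    return sum(a * b for a, b in zip(u, v))


print(predict("alice", "movie_a"))
print(predict("bob", "movie_b"))
```

A larger rank gives the model more capacity to fit the ratings, at the cost of more memory and computation and a higher risk of overfitting.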

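16 votes, 1 answer

Spark ML indexer cannot resolve a DataFrame column name containing dots?

I have a DataFrame with a column named a.b. When I specify a.b as the input column name for a StringIndexer, an AnalysisException is thrown with the message "cannot resolve 'a.b' given input columns a.b". I am using Spark 1.6.0...

java apache-spark apache-spark-mllib apache-spark-ml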
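
One workaround sometimes used for dotted column names is to rename the columns before they reach the indexer. The sketch below only computes the rename mapping in plain Python. safe_column_names is a hypothetical helper, and applying the mapping (for example with DataFrame.withColumnRenamed) is left to the caller. Whether the rename is needed on a given Spark version is an assumption to verify.

```python
def safe_column_names(columns, replacement="_"):
    """Map each column name containing a dot to a dot-free name.

    Hypothetical helper: raises if a renamed column would collide
    with an existing or already-renamed column.
    """
    mapping = {}
    taken = {c for c in columns if "." not in c}
    for name in columns:
        if "." not in name:
            continue
        new_name = name.replace(".", replacement)
        if new_name in taken:
            raise ValueError(f"renaming {name!r} would collide with {new_name!r}")
        taken.add(new_name)
        mapping[name] = new_name
    return mapping


print(safe_column_names(["a.b", "label"]))
```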

16 votes, 1 answer

How to convert ArrayType to DenseVector in a PySpark DataFrame?

I get the following error while building a machine learning Pipeline: pyspark.sql.utils.IllegalArgumentException: 'requirement failed: Column features must be of type org.apache.spark.ml.lin...

python apache-spark pyspark apache-spark-mllib apache-spark-ml

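16 votes, 4 answers

Computing correlation in PySpark

I want to use the pyspark.mllib.stat.Statistics.corr function to compute the correlation between two columns of a pyspark.sql.dataframe.DataFrame object. The corr function expects an rdd of Vectors objects. How do I convert the df['some_name'] column into V...

python apache-spark pyspark apache-spark-sql apache-spark-mllib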
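
For reference, the default method of Statistics.corr is Pearson correlation. The plain-Python sketch below shows what that number is for two columns already collected as lists. It does not use Spark, and the function name pearson is chosen here for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

For two DataFrame columns, DataFrame.stat.corr(col1, col2) may avoid the conversion to an RDD of vectors altogether.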

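16 votes, 1 answer

Why doesn't spark.ml implement any of the spark.mllib algorithms?

According to the Spark MLlib Guide, Spark has two machine learning libraries: spark.mllib (RDD-based) and spark.ml (DataFrame-based). According to this and this question on StackOverflow, DataFrames are better (and newer) than RDDs and should be used whenever possible. The problem is that I want to use...

machine-learning apache-spark pyspark apache-spark-mllib apache-spark-ml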

15 votes, 2 answers

From DataFrame to RDD[LabeledPoint]

I am trying to implement a document classifier with Apache Spark MLlib, but I am having some trouble representing the data. My code is as follows: import org.apache.spark.sql.{Row, SQLContext} import org.apache.spark.sql.types.{StringT...

scala apache-spark apache-spark-mllib

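15 votes, 1 answer

In Apache Spark Word2Vec, how are the number of iterations and the number of partitions related?

According to the mllib.feature.Word2Vec - spark 1.3.1 documentation [1]: def setNumIterations(numIterations: Int): Word2Vec.this.type Sets the number of iterations (default: 1), which should be smaller than or equal to the number of partitions. def s...

apache-spark apache-spark-mllib word2vec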

15 votes, 2 answers

How to update a Spark ALS MatrixFactorizationModel

I built a simple recommender system for the MovieLens database using the approach described at https://databricks-training.s3.amazonaws.com/movie-recommendation-with-mllib.html. I also ran into a problem with explicit training, just like Apache Spar...

apache-spark machine-learning apache-spark-mllib collaborative-filtering

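15 votes, 1 answer

How to save and load an MLlib model in Apache Spark?

I trained a classification model in Apache Spark using pyspark. I stored the model in an object (LogisticRegressionModel). Now I want to make predictions on new data. I would like to store the model and read it back in a new program to make predictions. Is there a way to store this model? I considered using pickle, but with Pyt...

python apache-spark pyspark apache-spark-mllib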

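15 votes, 1 answer

Spark LDA consumes too much memory

I want to use Spark MLlib LDA to summarize my document corpus. My setup is as follows: - about 100,000 documents - about 400,000 unique words - 100 clusters. I have 16 servers (each with 20 cores and 128 GB of memory). When I run LDA with OnlineLDAOptimizer, it...

apache-spark apache-spark-mllib lda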
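
A rough back-of-the-envelope estimate for the setup described in this question: assuming the topic-word matrix is held as a dense array of 8-byte doubles (an assumption, not something the question states), one copy takes vocabulary size times number of topics times 8 bytes. The helper name dense_matrix_bytes is hypothetical.

```python
def dense_matrix_bytes(rows, cols, bytes_per_value=8):
    """Size in bytes of a dense rows x cols matrix of fixed-width values."""
    return rows * cols * bytes_per_value


vocab_size = 400_000  # unique words, from the question
num_topics = 100      # the question's "100 clusters", read here as topics

size = dense_matrix_bytes(vocab_size, num_topics)
print(f"{size} bytes, about {size / 1024 ** 2:.0f} MiB per copy")
```

This counts a single copy only. Intermediate copies, serialization, and per-partition buffers are not included, so actual usage could be several times higher.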