Factor analysis in sklearn: explained variance

Question

8

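In scikit-learn, PCA has an attribute called "explained_variance_" that captures the variance explained by each component. However, I don't see anything similar for FactorAnalysis in scikit-learn. How can I compute the variance explained by each component?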

- vkmv

2 Answers

3

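In the usual FA/PCA terminology, the components_ output by scikit-learn are what other sources call loadings. For example, the FactorAnalyzer package outputs an equivalent loadings_ once you change its settings to match scikit-learn (i.e. set rotation=None, set method='ml', and make sure the data you pass to the scikit-learn function is standardized, because FactorAnalyzer standardizes the data internally).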

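Unlike the components_ of scikit-learn's PCA, which are unit-length eigenvectors, the FA components are already scaled, so the explained variance can be extracted by summing their squares. Note that the proportion of variance explained here is expressed relative to the total variance of the original variables, not relative to the variance of the factors as in @Gaurav's answer.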

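import numpy as np
from sklearn.decomposition import FactorAnalysis

# X_in: array of shape (n_samples, n_features), ideally standardized
k_fa = 3   # number of factors, e.g.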

fa_k = FactorAnalysis(n_components=k_fa).fit(X_in)

fa_loadings = fa_k.components_.T    # loadings

# variance explained
total_var = X_in.var(axis=0).sum()  # total variance of original variables,
                                    # equal to no. of vars if they are standardized

var_exp = np.sum(fa_loadings**2, axis=0)
prop_var_exp = var_exp/total_var
cum_prop_var_exp = np.cumsum(var_exp/total_var)

print(f"variance explained: {var_exp.round(2)}")
print(f"proportion of variance explained: {prop_var_exp.round(3)}")
print(f"cumulative proportion of variance explained: {cum_prop_var_exp.round(3)}")

# e.g. output (sample run with two factors):
#   variance explained: [3.51 0.73]
#   proportion of variance explained: [0.351 0.073]
#   cumulative proportion of variance explained: [0.351 0.425]
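
The contrast drawn above between PCA and FA components can be checked directly. The following is a minimal sketch on synthetic data; `X_demo`, the two-factor structure, and the noise level are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

# Hypothetical data: 6 observed variables driven by 2 latent factors plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
X_demo = latent @ mixing + 0.5 * rng.normal(size=(500, 6))
X_demo = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)  # standardize

pca = PCA(n_components=2).fit(X_demo)
fa = FactorAnalysis(n_components=2).fit(X_demo)

# Each row of components_ is one component / factor
pca_norms = np.linalg.norm(pca.components_, axis=1)   # unit length for PCA
fa_norms_sq = np.sum(fa.components_ ** 2, axis=1)     # scaled loadings for FA

print("PCA row norms:", pca_norms.round(3))
print("FA sum of squared loadings per factor:", fa_norms_sq.round(3))
```

The PCA rows should all have norm 1, while the FA rows carry the scale of the variance each factor accounts for, which is why summing their squares works for FA but not for PCA.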

- SpinUp __ A Davis

If I set n_components to n_features for my dataset X_in, I would expect cum_prop_var_exp to reach 100%, but it only gets into the high 90s. - P4L

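@P4L No, that expectation is incorrect. Even if you set n_components to n_features, some variance may still be explained by the private noise component. cum_prop_var_exp captures only the shared variance, even when n_components = n_features. - alireza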
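
The point in this comment can be illustrated with a small sketch. The data below is synthetic and the names `X_full` and `fa_full` are hypothetical; the only claim is that the shared variance stays below the total when the fitted noise variances are non-zero.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical data: 5 observed variables, 2 latent factors, sizeable private noise
rng = np.random.default_rng(1)
latent = rng.normal(size=(1000, 2))
mixing = rng.normal(size=(2, 5))
X_full = latent @ mixing + rng.normal(size=(1000, 5))
X_full = (X_full - X_full.mean(axis=0)) / X_full.std(axis=0)  # standardize

n_features = X_full.shape[1]
fa_full = FactorAnalysis(n_components=n_features, svd_method="lapack").fit(X_full)

total_var = X_full.var(axis=0).sum()               # equals n_features here
shared_var = np.sum(fa_full.components_ ** 2)      # variance carried by the factors
noise_var = fa_full.noise_variance_.sum()          # private (per-variable) noise

print(f"shared / total: {shared_var / total_var:.3f}")
print(f"noise  / total: {noise_var / total_var:.3f}")
```

On a converged fit, the two shares would be expected to add up to roughly 1, so the gap P4L observed corresponds to the noise_variance_ term rather than to missing factors.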

- Gaurav Dhama · Accepted Answer

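Here are the steps you can follow:

First, after running the factor analysis, get the components matrix and the noise variance. Let fa be your fitted model.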

import numpy as np

m = fa.components_      # shape (n_components, n_features)
n = fa.noise_variance_  # per-feature noise variance

Square this matrix element-wise:

m1 = m**2

Sum each row of m1 (components_ has one row per factor, so axis=1 gives one total per factor):

m2 = np.sum(m1,axis=1)

The percentage of variance explained by the first factor is then:

pvar1 = (100*m2[0])/np.sum(m2)

Similarly, for the second factor:

pvar2 = (100*m2[1])/np.sum(m2)

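However, the noise component also contributes to the variance. If you want to account for it when computing the explained variance, calculate: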

pvar1_with_noise = (100*m2[0])/(np.sum(m2)+np.sum(n))
pvar2_with_noise = (100*m2[1])/(np.sum(m2)+np.sum(n))

And so on for the remaining factors. Hope this helps.
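
The steps above can be collected into one helper. This is only a sketch: the function name `factor_variance_pct`, its `include_noise` flag, and the synthetic data are assumptions for illustration and are not part of scikit-learn.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis


def factor_variance_pct(fa, include_noise=False):
    """Percentage of variance per factor (hypothetical helper, not a sklearn API).

    Without noise, percentages are relative to the variance shared by the factors;
    with noise, the summed noise_variance_ is added to the denominator.
    """
    per_factor = np.sum(fa.components_ ** 2, axis=1)  # one value per factor
    denom = per_factor.sum()
    if include_noise:
        denom = denom + fa.noise_variance_.sum()
    return 100 * per_factor / denom


# Hypothetical data: 6 observed variables, 2 latent factors plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
X_demo = latent @ mixing + 0.5 * rng.normal(size=(500, 6))
X_demo = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

fa = FactorAnalysis(n_components=2).fit(X_demo)
pct = factor_variance_pct(fa)
pct_with_noise = factor_variance_pct(fa, include_noise=True)

print("percent of shared variance:", pct.round(1))
print("percent including noise:   ", pct_with_noise.round(1))
```

Without the noise term the percentages sum to 100 by construction, so they describe how the shared variance splits across factors, not how much of the original data's variance the factors capture.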