How to generate many interaction terms in Pandas?

Question

python pandas scikit-learn statsmodels

10

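I would like to estimate an IV (instrumental variables) regression model that uses many interactions with dummy variables for year, demographics, and so on. I can't find an explicit way to do this in Pandas and am curious whether anyone has tips.

I'm thinking of trying scikit-learn and this function: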

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html

- pdevar

3

Use patsy formulas: http://statsmodels.sourceforge.net/devel/examples/notebooks/generated/formulas.html - Josef

I added a Wikipedia link to explain the abbreviation IV. - Wolf

As an aside: statsmodels has IV estimators (IV2SLS and IVGMM) in the sandbox. - Josef
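For readers who want to try the sandbox estimator mentioned in the comment above, here is a minimal sketch on simulated data. The import path `statsmodels.sandbox.regression.gmm` is where IV2SLS lives in the statsmodels releases I know of, and the data-generating process, the variable names and the true coefficient of 2.0 are all made up for illustration.

```python
import numpy as np
from statsmodels.sandbox.regression.gmm import IV2SLS

# Simulated data (hypothetical): x is endogenous because it shares the
# unobserved shock u with y; z is a valid instrument for x.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + 0.5 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u

const = np.ones(n)
exog = np.column_stack([const, x])        # regressors, incl. the endogenous one
instrument = np.column_stack([const, z])  # instruments, incl. exogenous regressors

res = IV2SLS(y, exog, instrument=instrument).fit()
print(res.params)  # [intercept, slope on x]
```

Interaction columns built with any of the approaches in the answers can be stacked into `exog` and `instrument` in the same way.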

2 Answers

6

You can use sklearn's PolynomialFeatures class. Here is an example:

Suppose this is your design (i.e. feature) matrix:

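import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Design (feature) matrix: 5 samples, 3 features
x = np.array([[ 3, 20, 11],
              [ 6,  2,  7],
              [18,  2, 17],
              [11, 12, 19],
              [ 7, 20,  6]])

# Degree-2 terms, pairwise products only (no squares), no constant column
x_t = PolynomialFeatures(2, interaction_only=True, include_bias=False).fit_transform(x)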

Here is the result:

array([[   3.,   20.,   11.,   60.,   33.,  220.],
       [   6.,    2.,    7.,   12.,   42.,   14.],
       [  18.,    2.,   17.,   36.,  306.,   34.],
       [  11.,   12.,   19.,  132.,  209.,  228.],
       [   7.,   20.,    6.,  140.,   42.,  120.]])

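The first three columns are the original features, and the next three are their pairwise interactions (columns 1×2, 1×3 and 2×3, in that order).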

- motam79


- Marcus V. · Accepted Answer

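I recently faced a similar problem: I needed a flexible way to create specific interactions, so I searched StackOverflow. I followed the tip in @user333700's comment above, which led me to patsy (http://patsy.readthedocs.io/en/latest/overview.html), and a Google search then turned up patsylearn, a scikit-learn integration for it (https://github.com/amueller/patsylearn).

So, using @motam79's example, the following is possible: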

import numpy as np
import pandas as pd
from patsylearn import PatsyModel, PatsyTransformer
x = np.array([[ 3, 20, 11],
   [ 6,  2,  7],
   [18,  2, 17],
   [11, 12, 19],
   [ 7, 20,  6]])
df = pd.DataFrame(x, columns=["a", "b", "c"])
x_t = PatsyTransformer("a:b + a:c + b:c", return_type="dataframe").fit_transform(df)

This returns the following:

     a:b    a:c    b:c
0   60.0   33.0  220.0
1   12.0   42.0   14.0
2   36.0  306.0   34.0
3  132.0  209.0  228.0
4  140.0   42.0  120.0

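I answered a similar question here, with another example that uses categorical variables: How can I create an interaction design matrix from categorical variables?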
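For the dummy-variable interactions the question asks about, patsy can also be called directly, without patsylearn. This is a minimal sketch; the `year` and `income` columns and their values are made up for illustration.

```python
import pandas as pd
from patsy import dmatrix

# Hypothetical toy data: a categorical "year" and a numeric "income"
df = pd.DataFrame({
    "year": [2000, 2000, 2001, 2001, 2002],
    "income": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# C(year) treats year as categorical, ":" requests only the interaction,
# and "- 1" drops the intercept column.
interactions = dmatrix("C(year):income - 1", df, return_type="dataframe")
print(interactions)
```

Each row carries its `income` value in the column for its own year and zeros elsewhere, so there is one interaction column per year level.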