Using rpy2 with pandas DataFrames


I would like to apply some R functions to a pandas DataFrame:

df = pd.DataFrame( np.random.randn(5,2), # 5 rows, 2 columns
               columns = ["A","B"], # name of columns
               index = ["Max", "Nathy", "Tom", "Joe", "Kathy"] )

How can I call R's summary function on it?

I have the following code so far:

import numpy as np
import pandas as pd

import rpy2
# print(rpy2.__version__) ## 2.9.4

from rpy2.rinterface import R_VERSION_BUILD
# print(R_VERSION_BUILD) ## ('3', '5.1', '', 74947)

from rpy2.robjects.packages import importr
# import R's "base" package
base = importr('base')
1 Answer


You are almost there. To run R functions, you need to convert the pandas DataFrame into an R data frame. Once you have the R object, you can call the function on it as shown below.

import numpy as np
import pandas as pd

from rpy2.robjects.packages import importr
base = importr('base')  # import R's "base" package

# install any missing dependency if you get a "module not found" error
from rpy2.robjects import pandas2ri
pandas2ri.activate()  # enable pandas <-> R conversion

# Create the pandas DataFrame
df = pd.DataFrame(np.random.randn(5, 2),  # 5 rows, 2 columns
                  columns=["A", "B"],     # column names
                  index=["Max", "Nathy", "Tom", "Joe", "Kathy"])

# Convert the pandas DataFrame to an R data.frame
# (rpy2 2.9.x API; rpy2 3.x renamed py2ri to py2rpy)
r_df = pandas2ri.py2ri(df)
print(type(r_df))

# Call summary() from R's base package
print(base.summary(r_df))
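
As a sanity check on the R output, it can help to compute the same statistics on the pandas side. The sketch below is not part of the rpy2 workflow. It assumes the same 5x2 frame as above (seeded here so it is reproducible) and uses `DataFrame.describe()` to pick out the six values that R's `summary()` reports for numeric columns. The R-style row labels are assigned by hand for illustration. Both sides should use linear interpolation for quartiles by default, so the numbers are expected to agree up to R's display rounding.

```python
import numpy as np
import pandas as pd

# Same shape and labels as the frame in the answer; the seed is an
# assumption added here so the example is reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((5, 2)),
                  columns=["A", "B"],
                  index=["Max", "Nathy", "Tom", "Joe", "Kathy"])

# describe() also returns count and std; keep only the rows that
# R's summary() shows for numeric columns, in R's order.
stats = df.describe().loc[["min", "25%", "50%", "mean", "75%", "max"]]

# Relabel the rows with R-style names (hand-written for illustration).
stats.index = ["Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max."]

print(stats)
```

If the values printed by `base.summary(r_df)` differ from these by more than rounding, the conversion step is the first place to look.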
