How to fill pandas DataFrame columns with random dictionary values

Question

12

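I'm new to Pandas and want to experiment with random text data. I'm trying to add 2 new columns to a DataFrame df, each filled from a randomly chosen dictionary entry: the key goes into the first new column (newcol1) and its value into the second (newcol2).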

countries = {'Africa':'Ghana','Europe':'France','Europe':'Greece','Asia':'Vietnam','Europe':'Lithuania'}

My df already has 2 columns, and I'd like to end up with something like this:

    Year Approved Continent    Country
0   2016      Yes    Africa      Ghana
1   2016      Yes    Europe  Lithuania
2   2017       No    Europe     Greece

I could certainly fill df['Continent'] and df['Country'] with a for or while loop, but I have a feeling that .apply() and np.random.choice might offer a simpler, more "pandorable" solution.

- ozaarm

2 Answers

0

You could also try DataFrame.sample():

df.join(
    pd.DataFrame(list(countries.items()), columns=["continent", "country"])
    .sample(len(df), replace=True)
    .reset_index(drop=True)
)

This would be faster if your continent/country mapping were already a DataFrame.
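As a minimal sketch of that idea, assuming the continent/country pairs are stored as a DataFrame from the start: the name `mapping` and the sample `df` below are hypothetical, built from the data in the question. Keeping the pairs as rows also preserves all three European countries, which a dict with repeated 'Europe' keys cannot do.

```python
import pandas as pd

# Sample frame matching the two existing columns in the question.
df = pd.DataFrame({"Year": [2016, 2016, 2017], "Approved": ["Yes", "Yes", "No"]})

# Hypothetical: the continent/country pairs kept directly as a DataFrame.
mapping = pd.DataFrame(
    [("Africa", "Ghana"), ("Europe", "France"), ("Europe", "Greece"),
     ("Asia", "Vietnam"), ("Europe", "Lithuania")],
    columns=["Continent", "Country"],
)

# Draw len(df) rows with replacement, then reset the index so that
# join() lines the sampled rows up with df row by row.
picked = mapping.sample(len(df), replace=True).reset_index(drop=True)
result = df.join(picked)
print(result)
```

Without reset_index(drop=True), the sampled rows would keep their original index labels and join() would align on those instead of on position.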

If you're on Python 3.6 or later, another option is random.choices():

from random import choices

df.join(
    pd.DataFrame(choices([*countries.items()], k=len(df)), columns=["continent", "country"])
)

random.choices() is similar to numpy.random.choice(), except that you can pass it a list of key-value tuples, whereas numpy.random.choice() only accepts a 1-D array.
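A self-contained sketch of that difference follows. It assumes the pairs are held in a plain list called `pairs` (a hypothetical name) and uses a sample `df` built from the question's data. The NumPy variant is one possible workaround, not something taken from the answer: it samples integer positions, which are 1-D, and then indexes the list.

```python
import random

import numpy as np
import pandas as pd

df = pd.DataFrame({"Year": [2016, 2016, 2017], "Approved": ["Yes", "Yes", "No"]})

# Hypothetical list of (continent, country) pairs; a list keeps every pair,
# including the repeated 'Europe' entries.
pairs = [("Africa", "Ghana"), ("Europe", "France"), ("Europe", "Greece"),
         ("Asia", "Vietnam"), ("Europe", "Lithuania")]

# random.choices() samples whole tuples, with replacement.
picked = random.choices(pairs, k=len(df))
result = df.join(pd.DataFrame(picked, columns=["Continent", "Country"]))
print(result)

# np.random.choice() rejects the list of tuples (it becomes a 2-D array),
# so sample row positions instead and look the pairs up afterwards.
positions = np.random.choice(len(pairs), size=len(df))
picked_np = [pairs[i] for i in positions]
result_np = df.join(pd.DataFrame(picked_np, columns=["Continent", "Country"]))
print(result_np)
```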

- eugenhu

- cs95 · Accepted Answer

Yes, you're right. You can use np.random.choice together with map:

df

    Year Approved
0   2016      Yes
1   2016      Yes
2   2017       No

df['Continent'] = np.random.choice(list(countries), len(df))
df['Country'] = df['Continent'].map(countries)

df

    Year Approved Continent    Country
0   2016      Yes    Africa      Ghana
1   2016      Yes      Asia    Vietnam
2   2017       No    Europe  Lithuania

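You pick len(df) keys at random from the list of keys in the countries dictionary, then use the countries dictionary as a mapper to look up the country that corresponds to each chosen key.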
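For reference, here is a runnable version of the steps above, with the sample `df` rebuilt from the question's data. One thing worth noting: because the dict literal in the question repeats the key 'Europe', Python keeps only the last value for it, so this approach can only ever pair Europe with Lithuania.

```python
import numpy as np
import pandas as pd

# Dict copied from the question; the repeated 'Europe' key means only
# the last assignment ('Lithuania') survives.
countries = {'Africa': 'Ghana', 'Europe': 'France', 'Europe': 'Greece',
             'Asia': 'Vietnam', 'Europe': 'Lithuania'}
print(len(countries))  # 3 keys remain, not 5

df = pd.DataFrame({"Year": [2016, 2016, 2017], "Approved": ["Yes", "Yes", "No"]})

# Pick one key per row, with replacement.
df["Continent"] = np.random.choice(list(countries), len(df))
# Look up each chosen key in the dict to get the matching country.
df["Country"] = df["Continent"].map(countries)
print(df)
```

If France and Greece should also be possible, the pairs need to be stored in a structure that allows repeated continents, such as a list of tuples.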