Replace the first n elements of a pandas DataFrame column


I want to replace the first n elements of a column in a DataFrame with another pd.Series that I have saved. For example:

        category   price    store  testscore
0       Cleaning   11.42  Walmart        NaN
1       Cleaning   23.50      Dia        NaN
2  Entertainment   19.99  Walmart        NaN
3  Entertainment   15.95     Fnac        NaN
4           Tech   55.75      Dia        NaN
5           Tech  111.55  Walmart        NaN

I want to replace the first three NaN values in testscore with new strings.

Suppose I have a variable:

cats = pd.Series(df['category'][0:3])  # first 3 categories

Can I then put it into the testscore column to get this?

        category   price    store      testscore
0       Cleaning   11.42  Walmart       Cleaning
1       Cleaning   23.50      Dia       Cleaning
2  Entertainment   19.99  Walmart  Entertainment
3  Entertainment   15.95     Fnac            NaN
4           Tech   55.75      Dia            NaN
5           Tech  111.55  Walmart            NaN

But whenever I try to do this, it doesn't work.
Code to create this fake dataset:
import pandas as pd
import numpy as np

df = pd.DataFrame({'category': ['Cleaning', 'Cleaning', 'Entertainment', 'Entertainment', 'Tech', 'Tech'],
                        'store': ['Walmart', 'Dia', 'Walmart', 'Fnac', 'Dia','Walmart'],
                        'price':[11.42, 23.50, 19.99, 15.95, 55.75, 111.55],
                        'testscore': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]})

print(df)

df2 = pd.DataFrame({'category': ['Cleaning', 'Cleaning', 'Entertainment', 'Entertainment', 'Tech', 'Tech'],
                        'store': ['Walmart', 'Dia', 'Walmart', 'Fnac', 'Dia','Walmart'],
                        'price':[11.42, 23.50, 19.99, 15.95, 55.75, 111.55],
                        'testscore': ['Cleaning', 'Cleaning', 'Entertainment', np.nan, np.nan, np.nan]})

print(df2)

"Whenever I try to do this, it doesn't work." What error or result are you getting? - Evan
@Evan I'm not getting an error; it just doesn't fill in the elements. - conv3d
2 Answers
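The asker's exact assignment isn't shown, so this is only one plausible cause of "no error, but nothing is filled in". When the right-hand side of a `.loc` assignment is a Series, pandas aligns it on index labels rather than on position. The sketch assumes a hypothetical saved Series `saved` whose labels (10, 11, 12) don't match the target rows. It also creates `testscore` as object dtype, an assumption made so strings can be stored without relying on a float column being upcast.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"category": ["Cleaning", "Cleaning", "Entertainment",
                                "Entertainment", "Tech", "Tech"]})
# Assumption: testscore is object dtype so it can hold strings directly.
df["testscore"] = pd.Series(np.nan, index=df.index, dtype=object)

# Hypothetical saved Series whose index labels (10, 11, 12) do not
# match the rows being assigned to (labels 0, 1, 2).
saved = pd.Series(["Cleaning", "Cleaning", "Entertainment"], index=[10, 11, 12])

# Series assignment aligns on labels: rows 0-2 find no matching label,
# so they receive NaN and nothing visible changes.
df.loc[0:2, "testscore"] = saved
aligned = df["testscore"].copy()

# Stripping the index with .to_numpy() makes the assignment positional.
df.loc[0:2, "testscore"] = saved.to_numpy()
print(df)
```

Dropping the index with `.to_numpy()` (or `.values` on older pandas) writes the values by position. The number of values then has to match the number of selected rows exactly.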


Just use df.loc:

import pandas as pd
import numpy as np

df = pd.DataFrame({'category': ['Cleaning', 'Cleaning', 'Entertainment', 'Entertainment', 'Tech', 'Tech'],
                        'store': ['Walmart', 'Dia', 'Walmart', 'Fnac', 'Dia','Walmart'],
                        'price':[11.42, 23.50, 19.99, 15.95, 55.75, 111.55],
                        'testscore': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]})


cats = pd.Series(df['category'][:3]) # 3 elements

df.loc[:2,'testscore'] = cats # Assign first 3 (labels 0-2; .loc includes the end label)

print(df)

Then you get:

        category   price    store      testscore
0       Cleaning   11.42  Walmart       Cleaning
1       Cleaning   23.50      Dia       Cleaning
2  Entertainment   19.99  Walmart  Entertainment
3  Entertainment   15.95     Fnac            NaN
4           Tech   55.75      Dia            NaN
5           Tech  111.55  Walmart            NaN
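`.loc` slices by label and includes the end label, so with the default `RangeIndex`, `df.loc[:3]` covers four rows (0 through 3) while `df.iloc[:3]` covers three. If a `.loc` slice like `:3` is paired with a Series of three values, the extra row is filled with NaN by index alignment. The following minimal positional sketch assumes a default `RangeIndex` and an object-dtype `testscore` column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"category": ["Cleaning", "Cleaning", "Entertainment",
                                "Entertainment", "Tech", "Tech"]})
# Assumption: testscore is object dtype so it can hold strings directly.
df["testscore"] = pd.Series(np.nan, index=df.index, dtype=object)

n = 3
# .loc slices by label and includes the end label; .iloc is positional
# and excludes the end, like ordinary Python slicing.
loc_end_inclusive = df.loc[:n].index.tolist()   # labels 0..3, four rows
loc_rows = df.loc[:n - 1].index.tolist()        # labels 0, 1, 2
iloc_rows = df.iloc[:n].index.tolist()          # positions 0, 1, 2

# Positional version: write the first n categories into the first n rows
# of testscore, independent of what the index labels are.
col = df.columns.get_loc("testscore")
df.iloc[:n, col] = df["category"].iloc[:n].to_numpy()
print(df)
```

The `.iloc` form keeps working if the index is not 0..n-1 (for example, after filtering or sorting). The `.loc[:n - 1]` form only means "first n rows" under a default integer index.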

I get the error "ValueError: Must have equal len keys and value when setting with an iterable". - conv3d
Did you copy my code exactly? Do you have the latest pandas version? - Anton vBR
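The commenter's code isn't shown, so this is only a guess at the cause. A ValueError like this is what pandas raises when a plain list or array, which has no index to align on, is assigned to a selection with a different number of rows. Depending on the pandas version and the frame's internal layout, the wording may differ. The sketch reproduces the error with a hypothetical 3-element list and `df.loc[:3, ...]`, which selects four labels. As above, `testscore` is assumed to be object dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"category": ["Cleaning", "Cleaning", "Entertainment",
                                "Entertainment", "Tech", "Tech"]})
# Assumption: testscore is object dtype so it can hold strings directly.
df["testscore"] = pd.Series(np.nan, index=df.index, dtype=object)

values = ["Cleaning", "Cleaning", "Entertainment"]  # plain list, no index

# df.loc[:3] selects labels 0-3 (4 rows), but only 3 values are given,
# so pandas refuses the assignment instead of aligning.
try:
    df.loc[:3, "testscore"] = values
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

# Matching the lengths (labels 0-2 inclusive -> 3 rows) succeeds.
df.loc[:2, "testscore"] = values
print(mismatch_error)
print(df)
```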


Use fillna with the limit parameter:

df['testscore'] = df.testscore.fillna(df.category, limit=3)
print(df)

Output:

        category   price    store      testscore
0       Cleaning   11.42  Walmart       Cleaning
1       Cleaning   23.50      Dia       Cleaning
2  Entertainment   19.99  Walmart  Entertainment
3  Entertainment   15.95     Fnac            NaN
4           Tech   55.75      Dia            NaN
5           Tech  111.55  Walmart            NaN
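When a value is given, `fillna` with `limit` fills at most `limit` missing entries, counted from the top. Each replacement is taken from the same index label in the Series passed as the value. That matches "the first three rows" only when those rows are also the first three NaNs. The sketch below uses a hypothetical variant of the data in which row 0 already holds a score (`'done'`, invented for illustration). It shows how `fillna(limit=3)` differs from positional assignment with `iloc`.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's data: row 0 already has a score,
# so "first three rows" and "first three NaNs" are different rows.
df = pd.DataFrame({
    "category": ["Cleaning", "Cleaning", "Entertainment",
                 "Entertainment", "Tech", "Tech"],
    "testscore": pd.Series(["done", np.nan, np.nan, np.nan, np.nan, np.nan],
                           dtype=object),
})

# fillna(limit=3) fills at most 3 missing entries, taking each replacement
# from the same index label in df["category"].
by_nan = df["testscore"].fillna(df["category"], limit=3)

# Positional assignment overwrites rows 0-2 whether or not they were missing.
by_pos = df["testscore"].copy()
by_pos.iloc[:3] = df["category"].iloc[:3].to_numpy()

print(by_nan.tolist())
print(by_pos.tolist())
```

Which version is right depends on the intent. `fillna` never overwrites existing values, whereas the `iloc` assignment always targets the first three rows.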
