Update a row in a pandas DataFrame using a dictionary

Question

8

I've run into a behavior in pandas DataFrames that I don't understand.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 10, (3, 3)), index=['one', 'one', 'two'], columns=['col1', 'col2', 'col3'])
new_data = pd.Series({'col1': 'new', 'col2': 'new', 'col3': 'new'})
df.iloc[0] = new_data
# resulting df looks like:

#       col1    col2    col3
#one    new     new     new
#one    9       6       1
#two    8       3       7

But if I try to assign a dictionary instead, I get this:

new_data = {'col1': 'new', 'col2': 'new', 'col3': 'new'}
df.iloc[0] = new_data
#
#         col1  col2    col3
#one      col2  col3    col1
#one      2     1       7
#two      5     8       6

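Why does this happen? While writing this question, I realized that df.loc is most likely taking only the keys from new_data, which would also explain why the values are out of order. But why is that so? If I create a DataFrame from the dictionary instead, the keys are treated as columns: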

pd.DataFrame([new_data])

#    col1   col2    col3
#0  new     new     new

Why isn't that the default behavior of df.loc?

- J Jones

4 Answers

3

How to do it

Here is a concise way to accomplish your task. I dropped the index from your df because "one" appears twice, so the index would not be unique.

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randint(1, 10, (3, 3)), columns=['col1', 'col2', 'col3'])
>>> new_data = {'col1': 'new', 'col2': 'new', 'col3': 'new'}
>>> 
>>> df
   col1  col2  col3
0     1     6     1
1     4     2     3
2     6     2     3
>>> new_data
{'col1': 'new', 'col2': 'new', 'col3': 'new'}
>>> 
>>> df.loc[0, new_data.keys()] = new_data.values()
>>> df
  col1 col2 col3
0  new  new  new
1    4    2    3
2    6    2    3

- Markus Dutschke

1

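Compared with the variant df.loc[0] = pd.Series(new_data), this approach also works when the keys of new_data don't match the columns of df; new columns are added if needed. - rouckas

For me, this didn't work unless I wrapped .keys() and .values() in list(). - Joe Flack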

0

For me, on Python 3.9 with pandas 1.5.3, this works: df.loc[INDEX, list(MY_DICT.keys())] = list(MY_DICT.values())

- Joe Flack
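A minimal sketch of the list-wrapped form, assuming a small hand-built frame instead of random data. The partial dictionary and the `dtype=object` choice are illustrative assumptions, not part of the answers above; object dtype is used only so that mixing strings and integers does not trigger dtype-upcasting warnings in newer pandas versions.

```python
import pandas as pd

# Hypothetical example frame; object dtype lets strings and ints coexist.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    columns=["col1", "col2", "col3"],
    dtype=object,
)

# Hypothetical partial update: only two of the three columns are given.
new_data = {"col1": "new1", "col3": "new3"}

# Wrap keys and values in list() so pandas receives plain list-likes.
df.loc[0, list(new_data.keys())] = list(new_data.values())

updated_row = df.loc[0].tolist()
print(updated_row)
```

Only the columns named in the dictionary are touched; col2 in row 0 keeps its original value.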

0

A concise way

Use an intermediate conversion to pd.Series

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randint(1, 10, (3, 3)), columns=['col1', 'col2', 'col3'])
>>> new_data = {'col1': 'new1', 'col2': 'new2', 'col3': 'new3'}
>>> 
>>> df
   col1  col2  col3
0     5     7     9
1     8     7     8
2     5     3     3
>>> new_data
{'col1': 'new1', 'col2': 'new2', 'col3': 'new3'}
>>> 
>>> df.loc[0] = pd.Series(new_data)
>>> df
   col1  col2  col3
0  new1  new2  new3
1     8     7     8
2     5     3     3

- Markus Dutschke

- piRSquared · Accepted Answer

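This comes down to the difference between how dictionaries iterate and how pandas Series are handled. When you assign a pandas Series to a row, pandas aligns the Series' index with the columns; when you assign it to a column, pandas aligns the index with the rows. It then assigns the values that correspond to the matching index or column labels.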
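The alignment described here can be sketched as follows. The frame and the values are made up for illustration; the Series labels are deliberately given in a different order than the frame's labels to show that matching happens by label, not by position.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    columns=["col1", "col2", "col3"],
)

# Row assignment: the Series index is matched against the column labels.
row = pd.Series({"col3": 30, "col1": 10, "col2": 20})
df.loc[0] = row
row_result = df.loc[0].tolist()

# Column assignment: the Series index is matched against the row labels.
col = pd.Series({2: 300, 0: 100, 1: 200})
df["col2"] = col
col_result = df["col2"].tolist()

print(row_result)  # [10, 20, 30]
print(col_result)  # [100, 200, 300]
```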

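If the object is not a pandas object with an index to align on, pandas iterates over it. Iterating over a dictionary yields its keys, which is why you see the dictionary keys in that row's slots. Before Python 3.7, dictionaries did not guarantee any ordering, which is why the keys appear shuffled in that row.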
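A short sketch of the iteration point, using made-up data. It shows that iterating a dictionary produces only its keys, and that passing the values explicitly as a list assigns them by position. It deliberately does not assign the dictionary itself, since how pandas treats a raw dict in this position may vary between versions.

```python
import pandas as pd

new_data = {"col1": "new1", "col2": "new2", "col3": "new3"}

# Iterating a dict yields its keys, not its values.
keys_seen = list(new_data)
print(keys_seen)  # ['col1', 'col2', 'col3']

# Object dtype (an assumption for this sketch) avoids dtype-upcasting warnings.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    columns=["col1", "col2", "col3"],
    dtype=object,
)

# A plain list has no labels, so its items are assigned by position.
df.iloc[0] = list(new_data.values())
positional_row = df.iloc[0].tolist()
print(positional_row)  # ['new1', 'new2', 'new3']
```

Positional assignment only gives the intended result when the dictionary's order matches the column order; converting to pd.Series first lets pandas match by label instead.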