Pandas Series as a namedtuple

Question

python pandas

5

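I am accessing rows of a pandas DataFrame and getting a pandas Series as the result. My parsing routine accepts a namedtuple. Is it possible to convert a pandas Series to a namedtuple?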

- Mateusz

1

Note that df.itertuples() returns an iterator that, by default, yields a namedtuple object for each row... - juanpa.arrivillaga

1

Please provide some sample input and (expected) sample output. - Willem Van Onsem

hit = master.data_frame.loc[invoice.Id] gives a Series. Now I want to convert hit to a namedtuple. - Mateusz

@Mateusz, in the future, please provide a [mcve] in the question itself. - juanpa.arrivillaga

3 Answers

5

You can try using df.itertuples for this:

In [5]: df
Out[5]:
     c0    c1    c2    c3    c4    c5    c6    c7    c8    c9
0   8.0   2.0   1.0   4.0   4.0   3.0   1.0  19.0   5.0   9.0
1   7.0   7.0   0.0   4.0  14.0   7.0   9.0   0.0   0.0   9.0
2  19.0  10.0   6.0  13.0  12.0  11.0   8.0   4.0  11.0  13.0
3  14.0   0.0  16.0  19.0   3.0   8.0   8.0   9.0  17.0  13.0
4  18.0  16.0  10.0   8.0  15.0   9.0  18.0   9.0   5.0  10.0
5  15.0   7.0  16.0   3.0  18.0  14.0   3.0   6.0   0.0   9.0
6  14.0  14.0  18.0   4.0   4.0   0.0   8.0  15.0   8.0  12.0
7  19.0  16.0  15.0  16.0   1.0  12.0  14.0   1.0  10.0  15.0
8   8.0  17.0  10.0  18.0   7.0  13.0  13.0  12.0   6.0  11.0
9  15.0  13.0  13.0  17.0   2.0   0.0   6.0  10.0   5.0   5.0

In [6]: rows = df.itertuples(name='Row')

In [7]: r0 = next(rows)

In [8]: r0
Out[8]: Row(Index=0, c0=8.0, c1=2.0, c2=1.0, c3=4.0, c4=4.0, c5=3.0, c6=1.0, c7=19.0, c8=5.0, c9=9.0)

In [9]: r0.c0
Out[9]: 8.0
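
For the case in the question, where only one row is selected by label, one possible sketch is to select with a list of labels so the result stays a one-row DataFrame, and then take the first tuple from itertuples. The frame, the labels, and the variable names below are hypothetical stand-ins for master.data_frame and invoice.Id.

```python
import pandas as pd

# Hypothetical frame and label, standing in for master.data_frame and invoice.Id.
df = pd.DataFrame(
    {"c0": [8.0, 7.0], "c1": [2.0, 7.0]},
    index=["inv-1", "inv-2"],
)
label = "inv-2"

# df.loc[label] would return a Series; df.loc[[label]] keeps a one-row
# DataFrame, so itertuples can still build the namedtuple.
row = next(df.loc[[label]].itertuples(name="Row"))

print(row.Index, row.c0, row.c1)
```

This keeps the Index field that itertuples adds, at the cost of building a temporary one-row DataFrame.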

Otherwise, you will have to do it yourself, along these lines:

In [10]: from collections import namedtuple

In [11]: df.columns
Out[11]: Index(['c0', 'c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8', 'c9'], dtype='object')

In [12]: Row = namedtuple('Row', df.columns)

In [13]: df.iloc[0]
Out[13]:
c0     8.0
c1     2.0
c2     1.0
c3     4.0
c4     4.0
c5     3.0
c6     1.0
c7    19.0
c8     5.0
c9     9.0
Name: 0, dtype: float64

In [14]: Row(*df.iloc[0])
Out[14]: Row(c0=8.0, c1=2.0, c2=1.0, c3=4.0, c4=4.0, c5=3.0, c6=1.0, c7=19.0, c8=5.0, c9=9.0)

Note that this version has no Index field...
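
If the row label is needed as well, a minimal sketch is to prepend a field for it and fill it from the Series name, which holds the row label when the Series was taken from a DataFrame row. The field name Index and the class name RowWithIndex are choices made here for illustration, mirroring what itertuples produces.

```python
from collections import namedtuple

import pandas as pd

df = pd.DataFrame({"c0": [8.0, 7.0], "c1": [2.0, 7.0]})
s = df.iloc[0]

# Prepend an "Index" field (an assumed name, chosen to match itertuples)
# and fill it from s.name, the label of the row the Series came from.
RowWithIndex = namedtuple("Row", ["Index", *s.index])
r = RowWithIndex(s.name, *s)

print(r.Index, r.c0, r.c1)
```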

- juanpa.arrivillaga

I would like to do it this way, but the problem is that I get a Series (one row): master.data_frame.loc[invoice.Id] is a Series. I want to convert it to a namedtuple. - Mateusz

@Mateusz OK, added an example of a quick-and-dirty way to do it. - juanpa.arrivillaga

1

If you already have a pandas Series at hand and are using it as input to a function, another option is to unpack the Series as is.

>>> df = pd.DataFrame({'name': ['John', 'Sally'], 'date': ['2020-01-01', '2020-02-01'], 'value': ['A', 'B']})
>>> df
    name        date value
0   John  2020-01-01     A
1  Sally  2020-02-01     B
>>> row = df.iloc[0]
>>> type(row)
<class 'pandas.core.series.Series'>
>>> print({**row})  # unpacks as a dictionary
{'name': 'John', 'date': '2020-01-01', 'value': 'A'}
>>> myfunc(**row)   # ergo, unpacks as keyword args

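This works because a pandas Series is already a mapping-like object with labeled fields, much like a namedtuple. (Note that df.iterrows yields Series objects, whereas df.itertuples yields actual namedtuples.) In any case, for the problem I was trying to solve, I only needed to fetch a specific row from the DataFrame rather than iterate over the whole DataFrame, so I did not need to convert to a namedtuple.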
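
As a self-contained sketch of the keyword-argument unpacking described above: myfunc here is a hypothetical stand-in for the function in the answer, assumed to take one parameter per column. The approach relies on every column label being a string that is also a valid parameter name.

```python
import pandas as pd


def myfunc(name, date, value):
    # Hypothetical function; its parameters match the column labels.
    return f"{name} {date} {value}"


df = pd.DataFrame(
    {
        "name": ["John", "Sally"],
        "date": ["2020-01-01", "2020-02-01"],
        "value": ["A", "B"],
    }
)
row = df.iloc[0]

# ** unpacking uses the Series labels as keyword names.
result = myfunc(**row)
print(result)
```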

- kjekk


- piRSquared · Accepted Answer

A general-purpose function that converts any Series to a namedtuple:

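from collections import namedtuple
import pandas as pd

def namedtuple_me(s, name='S'):
    # Field names come from the Series index, values from the Series itself.
    return namedtuple(name, s.index)(*s)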

namedtuple_me(pd.Series([1, 2, 3], list('abc')))
S(a=1, b=2, c=3)

A better implementation, thanks to @juanpa.arrivillaga:

import functools
from collections import namedtuple

@functools.lru_cache(maxsize=None)  # add memoization to increase speed
def _get_class(fieldnames, name):
    """Create a new namedtuple class."""
    return namedtuple(name, fieldnames)

def namedtuple_me(series, name='S'):
    """Convert the series to a namedtuple."""
    klass = _get_class(tuple(series.index), name)
    return klass._make(series)
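
A short usage sketch of the memoized version, repeated here so the block runs on its own. It illustrates the effect of the cache: two Series with the same index labels and the same class name share a single namedtuple class instead of creating a new one per call. The sample Series are made up for illustration.

```python
import functools
from collections import namedtuple

import pandas as pd


@functools.lru_cache(maxsize=None)
def _get_class(fieldnames, name):
    """Create (or fetch from the cache) a namedtuple class."""
    return namedtuple(name, fieldnames)


def namedtuple_me(series, name="S"):
    """Convert the series to a namedtuple."""
    # The index is converted to a tuple so it is hashable for lru_cache.
    klass = _get_class(tuple(series.index), name)
    return klass._make(series)


a = namedtuple_me(pd.Series([1, 2, 3], list("abc")))
b = namedtuple_me(pd.Series([4, 5, 6], list("abc")))

# Same field names and class name, so the cached class is reused.
same_class = type(a) is type(b)
print(a.a, b.c, same_class)
```

Reusing the class matters mostly when converting many rows, since creating a namedtuple class is much more expensive than instantiating one.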