Convert a Pandas dataframe to a structured array with a Boolean series

Question

I have a Pandas dataframe that I want to convert to a NumPy record array or structured array. I am using Python 3.6 / Pandas 0.19.2 / NumPy 1.11.3.

import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[True, 1, 2], [False, 10, 20]], columns=['a', 'b', 'c'])

print(df.dtypes)

a     bool
b    int64
c    int64
dtype: object

My attempts are below:

# record array
res1 = df.to_records(index=False)

# structured array
s = df.dtypes
res2 = np.array([tuple(x) for x in df.values], dtype=list(zip(s.index, s)))

However, the Boolean type does not appear to be evident in the dtype attribute of these results:

print(res1.dtype)

(numpy.record, [('a', '?'), ('b', '<i8'), ('c', '<i8')])

print(res2.dtype)

[('a', '?'), ('b', '<i8'), ('c', '<i8')]

Why is this? More generally, is this the only exception, or do I have to check manually each time that the dtype conversion has been handled as expected?

Edit: On the other hand, the conversion itself appears to be correct:

print(res1.a.dtype)     # bool
print(res2['a'].dtype)  # bool

Is this just a display issue?

- jpp

'?' is the Boolean type in numpy: https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html. I don't understand. - bobrobbob

1 Answer

- jpp · Accepted Answer

Interestingly, NumPy chooses to represent Boolean values with ?. From Data type objects (dtype):

'?' boolean
'b' (signed) byte
'B' unsigned byte
'i' (signed) integer
'u' unsigned integer
'f' floating-point
'c' complex-floating point
'm' timedelta
'M' datetime
'O' (Python) objects
'S', 'a'    zero-terminated bytes (not recommended)
'U' Unicode string
'V' raw data (void)
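
As a quick check of the table above, here is a minimal sketch that uses NumPy only. It assumes a structured dtype hand-written to mirror the columns in the question (the names `dt`, `arr` and the other variables are illustrative, not from the original post). It suggests that '?' is simply the character code for the Boolean dtype, while 'b' in this table means a signed byte:

```python
import numpy as np

# Structured dtype mirroring the columns in the question (written by hand here).
dt = np.dtype([('a', '?'), ('b', '<i8'), ('c', '<i8')])
arr = np.array([(True, 1, 2), (False, 10, 20)], dtype=dt)

bool_field = dt['a']               # dtype of field 'a'
is_bool = bool_field == np.bool_   # '?' and bool name the same dtype
char_code = bool_field.char        # one-character code shown in the dtype repr
byte_code_dtype = np.dtype('b')    # in this table 'b' is a signed byte, not Boolean

print(bool_field, char_code, is_bool)
print(byte_code_dtype)
print(arr['a'])
```

Comparing a field's dtype against `np.bool_` in this way is one option for checking a conversion programmatically instead of reading the printed dtype.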

Confusingly, the NumPy array interface, which is used for access from C extensions, uses a different mapping:

t   Bit field (following integer gives the number of bits in the bit field).
b   Boolean (integer type where all values are only True or False)
i   Integer
u   Unsigned integer
f   Floating point
c   Complex floating point
m   Timedelta
M   Datetime
O   Object (i.e. the memory contains a pointer to PyObject)
S   String (fixed-length sequence of char)
U   Unicode (fixed-length sequence of Py_UNICODE)
V   Other (void * – each item is a fixed-size chunk of memory)
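
The two mappings can be compared side by side on a Boolean array. The following is a small sketch, not taken from the original answer, and the variable names are illustrative. It reads the array-interface type string through `__array_interface__` and the kind and character codes through the dtype:

```python
import numpy as np

flags = np.array([True, False])

typestr = flags.__array_interface__['typestr']  # array-interface type string
kind = flags.dtype.kind                          # one-letter kind code
char = flags.dtype.char                          # one-letter dtype character code
round_trip = np.dtype(typestr)                   # type string parsed back into a dtype

print(typestr, kind, char, round_trip)
```

Here the array interface reports the Boolean array with the letter b, while the dtype character code for the same array is '?'. Both describe the same one-byte Boolean type.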

Thanks to @bobrobbob for finding this in the docs.