Pandas: What is dtype = <U64, and how do I convert it to a string?

Question

4

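I have a table in which one column is loaded from a CSV file as np.str. But the dtype shows up as a strange U64 (I guess it means unsigned int, 64 bit?), and converting it with astype doesn't work.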

stringIDs = extractedBatch.ID.astype(np.str)

After the astype call, the dtype becomes 'object'.

- user1581390

1

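'U64' is a Unicode string dtype that holds up to 64 characters. In Py3 numpy this is the normal string type. Why do you think you need to convert it? astype(np.str) and astype(object) are the same thing. The conversion produces an object dtype array that contains Python strings. - hpaulj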

When I add the value to a "real" string, I get the following error: ufunc 'add' did not contain a loop with signature matching types dtype('<U64') dtype('<U64') dtype('<U64'). - user1581390

2 Answers

0

Pandas doesn't use a str dtype; it uses object (even when the underlying values are str):

In [11]: s = pd.Series(['a'], dtype='U64')

In [12]: type(s[0])
Out[12]: str

- Andy Hayden

When I add the object to a string, I get the following error: ValueError: operation parameter must be str. - user1581390

@user1581390 Oh, I misread, it's Unicode 64 (i.e. it's already a string). - Andy Hayden

- hpaulj · Accepted Answer

In [313]: arr = np.array(['one','twenty'])                                                               
In [314]: arr                                                                                            
Out[314]: array(['one', 'twenty'], dtype='<U6')
In [315]: arr.astype(object)                                                                             
Out[315]: array(['one', 'twenty'], dtype=object)
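To make the meaning of the '<U6' label above concrete, here is a minimal sketch that inspects the dtype. It assumes only NumPy itself; the variable name width is introduced here for illustration. The 4-bytes-per-character figure reflects how NumPy stores fixed-width Unicode strings.

```python
import numpy as np

arr = np.array(['one', 'twenty'])

# kind 'U' means fixed-width Unicode text, not an unsigned integer
print(arr.dtype.kind)                  # U

# NumPy stores each character in 4 bytes, so itemsize // 4 is the
# maximum number of characters each element can hold
width = arr.dtype.itemsize // 4
print(width)                           # 6, the length of 'twenty'

# Elements come back as np.str_, which is a subclass of Python's str
print(isinstance(arr[0], str))         # True

# Unsigned 64-bit integers are spelled with a lowercase 'u' and a byte count
print(np.dtype('u8') == np.dtype(np.uint64))   # True
```

So a column reported as '<U64' already holds strings of up to 64 characters each; no conversion is needed just to treat the values as text.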

np.char applies string methods to the elements of string dtype arrays:

In [316]: np.char.add(arr, ' foo')                                                                       
Out[316]: array(['one foo', 'twenty foo'], dtype='<U10')

add is not defined for numpy string dtypes:

In [317]: np.add(arr, ' foo')                                                                            
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-317-eff87c160b77> in <module>
----> 1 np.add(arr, ' foo')

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U6') dtype('<U6') dtype('<U6')

Here np.add has converted the ' foo' string to an array before trying to use it. It is trying to add a 'U6' string to a 'U6' string.

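When applied to an object dtype array, np.add delegates the action to the corresponding method of the elements. Since Python strings define an __add__ method, it works.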

In [318]: np.add(arr.astype(object), ' foo')                                                             
Out[318]: array(['one foo', 'twenty foo'], dtype=object)

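This pattern holds for all numpy ufuncs. They are defined for specific dtypes. If given object dtype arrays, they delegate to the elements, which may or may not work, depending on the methods the elements have.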
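The delegation described above can be sketched as follows. This is an illustration rather than part of the original session: it shows two ufuncs that succeed on an object array of strings because str supports the operation, and one (np.sqrt, chosen here as an arbitrary example) that fails because str has no matching method. The exception type raised in the failing case has differed between NumPy versions, so both likely types are caught.

```python
import numpy as np

obj = np.array(['one', 'twenty'], dtype=object)

# np.add uses each element's own + operator (str.__add__ here)
joined = np.add(obj, ' foo')
print(joined.tolist())      # ['one foo', 'twenty foo']

# np.multiply likewise falls back to str * int
doubled = np.multiply(obj, 2)
print(doubled.tolist())     # ['oneone', 'twentytwenty']

# np.sqrt needs a sqrt method on each element; str has none, so this fails
try:
    np.sqrt(obj)
    sqrt_failed = False
except (TypeError, AttributeError) as exc:
    sqrt_failed = True
    print('np.sqrt failed with', type(exc).__name__)
```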

Both the object approach and the np.char approach do something like a list comprehension, and run at about the same speed:

In [324]: [i+' foo' for i in arr]                                                                        
Out[324]: ['one foo', 'twenty foo']
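As a quick check that the three approaches give the same values, here is a small sketch placing them side by side. The names via_char, via_object, and via_list are made up for this example; it compares results only and makes no timing measurement.

```python
import numpy as np

arr = np.array(['one', 'twenty'])

via_char = np.char.add(arr, ' foo')          # result stays a fixed-width string array
via_object = arr.astype(object) + ' foo'     # result is an object array of Python str
via_list = [s + ' foo' for s in arr]         # result is a plain Python list

# The values agree; only the container and dtype differ
print(via_char.dtype.kind, via_object.dtype.kind)   # U O
same = via_char.tolist() == via_object.tolist() == via_list
print(same)                                          # True
```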

An example using string replication, *:

In [319]: arr*2                                                                                          
TypeError: ufunc 'multiply' did not contain a loop with signature matching types dtype('<U6') dtype('<U6') dtype('<U6')

In [320]: arr.astype(object)*2                                                                           
Out[320]: array(['oneone', 'twentytwenty'], dtype=object)

In [322]: np.char.multiply(arr,2)                                                                        
Out[322]: array(['oneone', 'twentytwenty'], dtype='<U12')
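If a fixed-width string array is wanted again after working in object dtype, the result can be converted back. The sketch below is an assumption about what the asker may want, not something stated in the answer. It uses the builtin str rather than np.str, because np.str was only an alias for the builtin, was deprecated in NumPy 1.20, and was removed in 1.24.

```python
import numpy as np

arr = np.array(['one', 'twenty'])

# Do the string operation on an object array of Python strings
doubled = arr.astype(object) * 2
print(doubled.dtype)                  # object

# Convert back; NumPy picks a width that fits the longest element
fixed = doubled.astype(str)
print(fixed.dtype.kind)               # U
print(fixed.dtype.itemsize // 4)      # 12, the length of 'twentytwenty'
```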