Python numpy: converting a string to a numpy array

5

I have the following string:

v1fColor = '2,4,14,5,0,0,0,0,0,0,0,0,0,0,12,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15,6,0,0,0,0,1,0,0,0,0,0,0,0,0,0,20,9,0,0,0,2,2,0,0,0,0,0,0,0,0,0,13,6,0,0,0,1,0,0,0,0,0,0,0,0,0,0,10,8,0,0,0,1,2,0,0,0,0,0,0,0,0,0,17,17,0,0,0,3,6,0,0,0,0,0,0,0,0,0,7,5,0,0,0,2,0,0,0,0,0,0,0,0,0,0,4,3,0,0,0,1,1,0,0,0,0,0,0,0,0,0,6,6,0,0,0,2,3'

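I am treating it as a vector; in short, it holds the foreground colors of an image histogram.

I have the following lambda function to compute the cosine similarity of two images, so I tried to convert this string to a numpy.array, but failed.

Here is my lambda function: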

import numpy as NP
import numpy.linalg as LA
cx = lambda a, b : round(NP.inner(a, b)/(LA.norm(a)*LA.norm(b)), 3)

So I tried the following to convert this string to a NumPy array:

v1fColor = NP.array([float(v1fColor)], dtype=NP.uint8)

But I ended up with the following error:
    v1fColor = NP.array([float(v1fColor)], dtype=NP.uint8)
ValueError: invalid literal for float(): 2,4,14,5,0,0,0,0,0,0,0,0,0,0,12,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15,6,0,0,0,0,1,0,0,0,0,0,0,0,0,0,20,9,0,0,0,2,2,0,0,0,0,0,0,0,0,0,13,6,0,0,0,1,0,0,0,0,0,0,0,0,0,0,10,8,0,0,0,1,2,0,0,0,0,0,0,0,0,0,17,17,
4 Answers

11
You have to split the string on its commas first:
NP.array(v1fColor.split(","), dtype=NP.uint8)
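
As a small sketch of why the split is needed (the short string `s` and the variable names are illustrative, not from the question): `float()` parses a single number, so it rejects the whole comma-separated string, while splitting first hands NumPy one token per element.

```python
import numpy as np

s = "2,4,14,5,0"  # illustrative short input in the same format as the question

# float() parses a single number, so the whole comma-separated string is rejected.
try:
    float(s)
    whole_string_ok = True
except ValueError:
    whole_string_ok = False

# Splitting first yields one token per element; NumPy converts the string tokens.
arr = np.array(s.split(","), dtype=np.uint8)

# Equivalent explicit conversion, spelling out the int() step.
arr_explicit = np.array([int(tok) for tok in s.split(",")], dtype=np.uint8)
```

Keep in mind that uint8 only holds values from 0 to 255, so a wider dtype may be the safer choice if the array is going to be used in arithmetic.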

4
I only learned a few weeks ago that numpy can do the string conversion implicitly, and I think it's the coolest thing. - mgilson

7

You can do this with numpy.fromstring, without using Python string methods:

>>> numpy.fromstring(v1fColor, dtype='uint8', sep=',')
array([ 2,  4, 14,  5,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 12,  4,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 15,  6,  0,  0,
        0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0, 20,  9,  0,  0,  0,
        2,  2,  0,  0,  0,  0,  0,  0,  0,  0,  0, 13,  6,  0,  0,  0,  1,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 10,  8,  0,  0,  0,  1,  2,
        0,  0,  0,  0,  0,  0,  0,  0,  0, 17, 17,  0,  0,  0,  3,  6,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  7,  5,  0,  0,  0,  2,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  4,  3,  0,  0,  0,  1,  1,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  6,  6,  0,  0,  0,  2,  3], dtype=uint8)
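
A minimal sketch of the same call on a shorter string (the input `s` and the names `small` and `wide` are illustrative). It also shows that the dtype argument can be swapped for a wider integer type, which is an assumption about what the later arithmetic needs rather than something this answer states:

```python
import numpy as np

s = "2,4,14,5,0"  # illustrative short input

# Text-mode parsing: sep="," tells NumPy the string is comma-separated text.
small = np.fromstring(s, dtype=np.uint8, sep=",")

# Same parse with a wider integer type, leaving headroom for later arithmetic.
wide = np.fromstring(s, dtype=np.int64, sep=",")
```

Both arrays hold the same values here; they differ only in how large a result they can store once you start multiplying and summing.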

I think this is the way to go. (I only learned about fromstring today, through another question.) - mgilson

6
You can do this:
lst = v1fColor.split(',')  #create a list of strings, splitting on the commas.
v1fColor = NP.array( lst, dtype=NP.uint8 ) #numpy converts the strings.  Nifty!

Or, more concisely:
v1fColor = NP.array( v1fColor.split(','), dtype=NP.uint8 )

Note that it is more idiomatic to write

import numpy as np

than import numpy as NP.

Edit

Today I just learned about a function called numpy.fromstring, which can also be used to solve this problem:

NP.fromstring( "1,2,3" , sep="," , dtype=NP.uint8 )

1
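I hear that a lot. Go with @mgilson's; he knew about the implicit conversion to float. - David Robinson
@mgilson knows full well that accepted answers don't count toward the 200 reputation cap. He's trying to trick you. - David Robinson
1
@DavidRobinson *sigh* You should have just gone along with it... but thanks anyway. - mgilson
@mgilson, I am getting some strange values. Please update your answer to include the earlier version that used float or long; I think that was your original answer, before David pointed out this version. Thanks. - add-semi-colons
1
@Null-Hypothesis -- What's wrong? Looking at the edit history, I didn't change anything. The version DavidRobinson posted was probably np.array(map(int,v1fColor.split(','))), but that should be equivalent to what I posted. You could always change the dtype from np.uint8 to int or np.int32... (np.uint8 should be limited to the range 0-255). - mgilson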

0

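I am writing this answer for future reference. I am not sure what the right solution is in this case, but I think the answer @David Robinson originally posted is the correct one, for one reason: a cosine similarity value cannot be greater than 1, and when I use the NP.array(v1fColor.split(","), dtype=NP.uint8) option I get strange values above 1.0 for the cosine similarity between two vectors.

So I wrote a simple piece of sample code to try it out: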

import numpy as np
import numpy.linalg as LA

def testFunction():
    value1 = '2,3,0,80,125,15,5,0,0,0,0,0,0,0,0,0,0,0,0,0,2,4,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0'
    value2 = '2,137,0,4,96,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0'
    cx = lambda a, b : round(np.inner(a, b)/(LA.norm(a)*LA.norm(b)), 3)
    #v1fColor = np.array(list(map(int, value1.split(','))))
    #v2fColor = np.array(list(map(int, value2.split(','))))
    v1fColor = np.array( value1.split(','), dtype=np.uint8 )
    v2fColor = np.array( value2.split(','), dtype=np.uint8 )
    print(v1fColor)
    print(v2fColor)
    cosineValue = cx(v1fColor, v2fColor)
    print(cosineValue)

if __name__ == '__main__':
    testFunction()

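If you run this code, you should get the following output: [image: script output with dtype=np.uint8]

Now let's uncomment these two lines and run the code with David's initial solution: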
v1fColor = np.array(list(map(int, value1.split(','))))
v2fColor = np.array(list(map(int, value2.split(','))))

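Note that, as shown above, the cosine similarity value exceeded 1.0, but when we use the map function with the int conversion we get the following correct value:

[image: script output with the map/int conversion]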

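Luckily I was plotting the values I originally got, and some of the cosine values exceeded 1.0. I typed the output of those vectors into the Python console by hand and sent it through my lambda function, which gave the correct answer, so I was very confused. I then wrote the test script to see what was going on, and I am glad I found the problem. I am not a Python expert and cannot say exactly why the two approaches give two different answers, so I will leave that to @David Robinson or @mgilson.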
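
One way to make the result independent of how the string was parsed, offered as a hedged sketch rather than as the fix either answerer proposed: cast both vectors to float64 before computing the similarity. The helper name `cosine_similarity` and the sample vectors are illustrative.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity of two 1-D sequences, computed in float64."""
    # Casting up front means the inner product and the norms are computed
    # in floating point, whatever integer dtype the inputs arrived in.
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return round(float(np.inner(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))), 3)

# Illustrative uint8 vectors whose products do not fit in 0-255.
u = np.array([200, 100, 0], dtype=np.uint8)
v = np.array([100, 200, 0], dtype=np.uint8)

sim_uv = cosine_similarity(u, v)
sim_uu = cosine_similarity(u, u)
```

With the cast in place, a vector compared with itself comes out at 1.0 and the result stays within the expected range for these inputs.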


1
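Quick comment. In the "David" solution, if you do something like np.array(map(int,v1fColor.split(',')),dtype=np.uint8), I'm guessing you'll get the same result as with my solution. I'm guessing the problem is the data type (which we just carried over from your original question). What is probably happening is that when you take the inner product of the two arrays, you get numbers greater than 255 -- for example, (2*128) could result in 0. - mgilson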
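
The wraparound described in this comment can be checked directly. This is a minimal sketch using the comment's own 2*128 example; the variable names are illustrative:

```python
import numpy as np

a = np.array([2], dtype=np.uint8)
b = np.array([128], dtype=np.uint8)

# uint8 array arithmetic wraps modulo 256, so 2 * 128 = 256 is stored as 0.
wrapped = a * b

# Converting to a wider integer type first keeps the exact product.
exact = a.astype(np.int64) * b.astype(np.int64)

# The same applies to the inner product used in the cosine similarity lambda.
exact_inner = int(np.inner(a.astype(np.int64), b.astype(np.int64)))
```

NumPy does not raise an error for this kind of array overflow, which is why the wrong values only showed up once the results were plotted.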
