Loading mixed data from a CSV file into a NumPy array in Python

Question

4

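I have a CSV file with 10 columns and 6 rows that I want to convert to a NumPy array. The file does load, but I can't work with the data yet; I think I'm missing a step.

My current code is as follows: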

import numpy as np

filename = "test_ioc.csv"

# open file
f=open(filename)

# initialize this 
myfile = [] 

# Convert to numpy array
mat = np.vstack([signal for signal in f.readlines()])
print(mat)

I also tried this:

import numpy as np

filename = "test_ioc.csv"

# open file
f=open(filename)

# initialize this 
myfile = [] # empty nested list that is the big container, this contains all the rows

# f.readlines, and for each line,
for line in f.readlines():

    # create a list for each row
    row = [] # empty list for row items, each row has 2 lists

    # line.strip, line.split, and for each i in this:
    for eye in line.strip().split():

        # convert elements into floats

        row.append(eye) # append each item to list 'row'


    # append all the parts row to the list myfile that you created
    myfile.append(row) #append list to my file

print(myfile)

# now that you have your gigantic list myfile, convert to it to numpy array
a = np.array(myfile) #convert the list into a numpy array

# slice accordingly!
x = a[:,0] #first column
y = a[:,1] #second column
f.close()

The first one gives me the following output:

print a
[['2043l0.wav,0.115,0.169,0.222,0.23,2043l0.wav,0.21,0.169,0.238,0.23']
 ['dn2001l0.wav,0.105,0.161,0.242,0.222,dn2001l0.wav,0.153,0.176,0.207,0.207']
 ['2694l0.wav,0.13,0.192,0.33,0.314,2694l0.wav,0.192,0.184,0.207,0.238']
 ['2641l0.wav,0.123,0.146,0,0.407,2641l0.wav,0.199,0.199,0.199,0.176']
 ['2622l0.wav,0.284,0.353,0.582,0.582,2622l0.wav,0.268,0.161,0.176,0.184']
 ['dn2047l0.wav,0.12,0.23,0.368,0.322,dn2047l0.wav,0.369,0.169,0.207,0.222']]

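I need to split each row further into two groups of four and convert every number in the row to a float, but I'm very new to Python and just want to be able to do some basic operations on my data and plot it with Matplotlib. Thanks for your help!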

- Diana

2 Answers

2

OK, I solved my problem with Pandas.

import pandas as pd
filename = 'test_ioc.csv'
# column names must be unique, so the two filename columns are numbered
headings = ['filename1', 'l0', 'l1', 'l2', 'l3', 'filename2', 'r0', 'r1', 'r2', 'r3']

#data
data = pd.read_csv(filename, names=headings)

print(data)
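
For the "two groups of four floats" part of the question, a minimal sketch of getting from the DataFrame back to NumPy arrays might look like the following. It assumes unique column names (`filename1`/`filename2` are illustrative) and uses two rows copied from the question in an in-memory buffer instead of the real `test_ioc.csv`.

```python
import io

import pandas as pd

# Two rows copied from the question; io.StringIO stands in for the real file.
sample = (
    "2043l0.wav,0.115,0.169,0.222,0.23,2043l0.wav,0.21,0.169,0.238,0.23\n"
    "dn2001l0.wav,0.105,0.161,0.242,0.222,dn2001l0.wav,0.153,0.176,0.207,0.207\n"
)

# Assumed column names; the file itself has no header row.
headings = ["filename1", "l0", "l1", "l2", "l3",
            "filename2", "r0", "r1", "r2", "r3"]

data = pd.read_csv(io.StringIO(sample), names=headings)

# read_csv has already parsed the numeric columns as floats,
# so each group of four converts directly to a float array.
left = data[["l0", "l1", "l2", "l3"]].to_numpy()
right = data[["r0", "r1", "r2", "r3"]].to_numpy()

print(left.shape, right.shape)
print(left.dtype)
```

Here `left` and `right` are ordinary 2-D float arrays, so they can be sliced or passed to Matplotlib like any other NumPy data.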

- Diana

- Padraic Cunningham · Accepted Answer

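The problem is that you are reading each whole line without splitting the data on commas, so each row ends up as a single string. You need to split on the delimiter to break each line into separate elements: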

# strip() drops the trailing newline before splitting on commas
mat = np.vstack([signal.strip().split(",") for signal in f])

Or let the csv library do the parsing:

import csv
# list() because np.vstack expects a sequence, not an iterator
mat = np.vstack(list(csv.reader(f)))
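
Either way, every field is still a string at this point. As a rough sketch of the remaining step the question asks about (floats, in two groups of four), assuming the column layout shown in the question and an in-memory sample in place of the real file:

```python
import csv
import io

import numpy as np

# Two rows copied from the question; io.StringIO stands in for the real file.
sample = io.StringIO(
    "2043l0.wav,0.115,0.169,0.222,0.23,2043l0.wav,0.21,0.169,0.238,0.23\n"
    "dn2001l0.wav,0.105,0.161,0.242,0.222,dn2001l0.wav,0.153,0.176,0.207,0.207\n"
)

rows = list(csv.reader(sample))      # list of lists of strings
mat = np.array(rows)                 # 2-D array of strings, shape (2, 10)

names = mat[:, 0]                    # file names from the first column
left = mat[:, 1:5].astype(float)     # first group of four numbers
right = mat[:, 6:10].astype(float)   # second group of four numbers

print(left.shape, right.shape)
print(left.mean(axis=1))             # per-row mean of the first group
```

Slicing before calling `astype(float)` keeps the file-name columns out of the conversion, which would otherwise raise a `ValueError`.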

However, NumPy has its own way to read data from a file, np.loadtxt:

import numpy as np

arr = np.loadtxt("in.csv", delimiter=",", dtype=object)

print(arr)

This gives you an array of arrays, with every field still a string:

[['2043l0.wav' '0.115' '0.169' '0.222' '0.23' '2043l0.wav' '0.21' '0.169'
  '0.238' '0.23']
 ['dn2001l0.wav' '0.105' '0.161' '0.242' '0.222' 'dn2001l0.wav' '0.153'
  '0.176' '0.207' '0.207']
 ['2694l0.wav' '0.13' '0.192' '0.33' '0.314' '2694l0.wav' '0.192' '0.184'
  '0.207' '0.238']
 ['2641l0.wav' '0.123' '0.146' '0' '0.407' '2641l0.wav' '0.199' '0.199'
  '0.199' '0.176']
 ['2622l0.wav' '0.284' '0.353' '0.582' '0.582' '2622l0.wav' '0.268' '0.161'
  '0.176' '0.184']
 ['dn2047l0.wav' '0.12' '0.23' '0.368' '0.322' 'dn2047l0.wav' '0.369'
  '0.169' '0.207' '0.222']]
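
If only the numbers are needed, one option is to let `np.loadtxt` skip the file-name columns through its `usecols` parameter, so the result is a float array straight away. This is a sketch under the assumption that the columns are laid out as in the question; the sample string replaces the real file.

```python
import io

import numpy as np

# Two rows copied from the question, used instead of the real CSV file.
sample = (
    "2043l0.wav,0.115,0.169,0.222,0.23,2043l0.wav,0.21,0.169,0.238,0.23\n"
    "dn2001l0.wav,0.105,0.161,0.242,0.222,dn2001l0.wav,0.153,0.176,0.207,0.207\n"
)

# Read each numeric group separately; the default dtype is float.
left = np.loadtxt(io.StringIO(sample), delimiter=",", usecols=(1, 2, 3, 4))
right = np.loadtxt(io.StringIO(sample), delimiter=",", usecols=(6, 7, 8, 9))

# The file names can be read on their own as strings.
names = np.loadtxt(io.StringIO(sample), delimiter=",", usecols=(0,), dtype=str)

print(left)
print(right)
print(names)
```

The file is parsed three times here, which is fine for a six-row CSV but not something to do with large inputs.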

There is also genfromtxt, which offers more options, including creating a structured array.

import numpy as np

headings = [('filename1', "|S20"), ('l0', float), ('l1', float), ('l2', float), ('l3', float),
            ('filename2', "|S10"), ('r0', float), ('r1', float), ('r2', float), ('r3', float)]

arr = np.genfromtxt("in.csv", delimiter=",", dtype=headings)

print(arr)
[ ('2043l0.wav', 0.115, 0.169, 0.222, 0.23, '2043l0.wav', 0.21, 0.169, 0.238, 0.23)
 ('dn2001l0.wav', 0.105, 0.161, 0.242, 0.222, 'dn2001l0.w', 0.153, 0.176, 0.207, 0.207)
 ('2694l0.wav', 0.13, 0.192, 0.33, 0.314, '2694l0.wav', 0.192, 0.184, 0.207, 0.238)
 ('2641l0.wav', 0.123, 0.146, 0.0, 0.407, '2641l0.wav', 0.199, 0.199, 0.199, 0.176)
 ('2622l0.wav', 0.284, 0.353, 0.582, 0.582, '2622l0.wav', 0.268, 0.161, 0.176, 0.184)
 ('dn2047l0.wav', 0.12, 0.23, 0.368, 0.322, 'dn2047l0.w', 0.369, 0.169, 0.207, 0.222)]

You can look up columns by name, much as in Pandas, e.g. arr["filename1"].
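
A short sketch of that field access, and of turning the named float fields back into a plain 2-D array. It makes a few assumptions that are not in the answer above: `"U20"` (unicode) is used for both file-name fields so that names come back as `str` on Python 3 and are not truncated, `encoding="utf-8"` is passed explicitly, and an in-memory sample replaces `in.csv`.

```python
import io

import numpy as np

# Two rows copied from the question, used instead of the real CSV file.
sample = (
    "2043l0.wav,0.115,0.169,0.222,0.23,2043l0.wav,0.21,0.169,0.238,0.23\n"
    "dn2001l0.wav,0.105,0.161,0.242,0.222,dn2001l0.wav,0.153,0.176,0.207,0.207\n"
)

# Field names follow the answer; "U20" is an assumption for Python 3 strings.
headings = [("filename1", "U20"), ("l0", float), ("l1", float),
            ("l2", float), ("l3", float),
            ("filename2", "U20"), ("r0", float), ("r1", float),
            ("r2", float), ("r3", float)]

arr = np.genfromtxt(io.StringIO(sample), delimiter=",",
                    dtype=headings, encoding="utf-8")

print(arr["filename1"])   # one field across all rows
print(arr["l0"].mean())   # float fields behave like normal 1-D arrays

# Stack the four named fields of each group into an ordinary 2-D float array.
left = np.column_stack([arr[name] for name in ("l0", "l1", "l2", "l3")])
right = np.column_stack([arr[name] for name in ("r0", "r1", "r2", "r3")])
print(left.shape, right.shape)
```

The structured array keeps names and numbers together in one object, while `left` and `right` are convenient for arithmetic or plotting.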