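I need to trim down a massive CSV file for machine learning. I have found a way to reduce the file to the two columns I need, but I have run into a problem.
Basically, my file is structured like this: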
"David", "Red"
"David", "Ford"
"David", "Blue"
"David", "Aspergers"
"Steve", "Red"
"Steve", "Vauxhall"
And I need the data to look more like this:
"David", "Red", "Ford", "Blue", "Aspergers"
"Steve", "Red", "Vauxhall"
This is what I currently have to strip down the CSV file:
import csv

cr = csv.reader(open("traits.csv", "rb"), delimiter=',', lineterminator='\n')
cr.next()  # skip the header line; no point in removing it, as I need to standardise data manipulation

# Print out the id of species and trait values
print 'Stripping input'
vals = [(row[1], row[4]) for row in cr]
print str(vals) + '\n'

# Write the two extracted columns to a new file
with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(vals)
print 'Successfully written to file output.csv'

#for row in cr:
    #print row
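
For reference, the reshaping described above (one output row per name, followed by every value seen for that name) could be sketched roughly as follows. This is only a minimal sketch under assumptions: it targets Python 3.7+ rather than the Python 2 used in the snippet above, the helper name `group_rows` and the in-memory sample are made up for illustration, and `skipinitialspace=True` is passed because the sample has a space after each comma.

```python
import csv
import io


def group_rows(rows):
    """Collect the second field of each row under its first field.

    Hypothetical helper: assumes every row has exactly two fields.
    Relies on dicts preserving insertion order (Python 3.7+).
    """
    grouped = {}
    for name, trait in rows:
        # setdefault creates the list the first time a name is seen
        grouped.setdefault(name, []).append(trait)
    # One output row per name: the name followed by all of its values
    return [[name] + traits for name, traits in grouped.items()]


# In-memory stand-in for the two-column file shown in the question
sample = io.StringIO(
    '"David", "Red"\n'
    '"David", "Ford"\n'
    '"David", "Blue"\n'
    '"David", "Aspergers"\n'
    '"Steve", "Red"\n'
    '"Steve", "Vauxhall"\n'
)

merged = group_rows(csv.reader(sample, skipinitialspace=True))
```

The resulting list of rows can then be handed to `csv.writer(...).writerows(merged)` in the same way `vals` is written above.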
AttributeError: 'list' object has no attribute 'setdefault' - KeironO
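
As a hedged illustration of how this message can come about: `setdefault` is a dict method, so the error appears when the name it is called on refers to a list at that point, for example because the same name was reused. The variable names below are hypothetical and the snippet assumes Python 3.

```python
# The name starts out as a dict, which does have setdefault ...
grouped = {}
# ... but is later rebound to a list (the assumed mistake)
grouped = []

try:
    grouped.setdefault("David", []).append("Red")
except AttributeError as exc:
    error_message = str(exc)

# Keeping the dict and the list under separate names avoids the clash
rows = [("David", "Red"), ("David", "Ford")]
by_name = {}
for name, trait in rows:
    by_name.setdefault(name, []).append(trait)
```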
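... the list d (and define it after d={}), so I changed its name! Please try the edited answer! - Mazdak
"David", "Orange", "Purple", "Red"
I need to ignore the second and third, while also doing what I said in the original post. - KeironO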
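
One possible way to handle rows like that, sketched under assumptions: pick the key and value fields by index while grouping, so the fields in between are never read. The function name `group_selected`, its default indices (first and fourth field), and every sample row after the first are invented for illustration; the code assumes Python 3.7+.

```python
import csv
import io


def group_selected(rows, key_index=0, value_index=3):
    """Group rows by one field and collect another, ignoring all other fields.

    Hypothetical helper: rows too short to contain both indices are skipped.
    """
    grouped = {}
    for row in rows:
        if len(row) <= max(key_index, value_index):
            continue  # blank or truncated row
        grouped.setdefault(row[key_index], []).append(row[value_index])
    return [[key] + values for key, values in grouped.items()]


# Only the first row comes from the comment above; the rest is made-up data
sample = io.StringIO(
    '"David", "Orange", "Purple", "Red"\n'
    '"David", "Green", "Pink", "Ford"\n'
    '"Steve", "Black", "White", "Red"\n'
)

result = group_selected(csv.reader(sample, skipinitialspace=True))
```

For the file in the original post, the indices would presumably be 1 and 4, matching `row[1]` and `row[4]` there.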