我想从慈善机构注册信息抽取数据中提取数据,并将其导出为Excel文件(.csv)。
这个错误让我感到震惊:
它给我留下了深刻的印象:错误:
这个人在Github上发布了他的代码,通过使用import.py文件来实现。
以下是我两种方法的操作:
第一种方法:
我从上面链接下载文件:RegPlusExtract_November_2015.zip,并将其放置在C:\Python27(我也安装了Python的地方)
我打开下面的代码文件(import.py)和之后的一个(bcp.py)在IDLE中运行import.py(使用F5)。我将这两个.py文件放在C:\Python27中。
#!/usr/bin/env python
import bcp
import zipfile
import sys
cc_files = {
"extract_acct_submit": [
"regno",
"submit_date",
"arno",
"fyend"
],
"extract_aoo_ref": [
"aootype",
"aookey",
"aooname",
"aoosort",
"welsh",
"master",
"code"
],
"extract_ar_submit": [
"regno",
"arno",
"submit_date"
],
"extract_charity": [
"regno",
"subno",
"name",
"orgtype",
"gd",
"aob",
"aob_defined",
"nhs",
"ha_no",
"corr",
"add1",
"add2",
"add3",
"add4",
"add5",
"postcode",
"phone",
"fax",
],
"extract_charity_aoo": [
"regno",
"aootype",
"aookey",
"welsh",
"master"
],
"extract_class": [
"regno",
"class"
],
"extract_class_ref": [
"classno",
"classtext",
],
"extract_financial": [
"regno",
"fystart",
"fyend",
"income",
"expend"
],
"extract_main_charity": [
"regno",
"coyno",
"trustees",
"fyend",
"welsh",
"incomedate",
"income",
"grouptype",
"email",
"web"
],
"extract_name": [
"regno",
"subno",
"nameno",
"name"
],
"extract_objects": [
"regno",
"subno",
"seqno",
"object"
],
"extract_partb": [
"regno",
"artype",
"fystart",
"fyend",
"inc_leg",
"inc_end",
"inc_vol",
"inc_fr",
"inc_char",
"inc_invest",
"inc_other",
"inc_total",
"invest_gain",
"asset_gain",
"pension_gain",
"exp_vol",
"exp_trade",
"exp_invest",
"exp_grant",
"exp_charble",
"exp_gov",
"exp_other",
"exp_total",
"exp_support",
"exp_dep",
"reserves",
"asset_open",
"asset_close",
"fixed_assets",
"open_assets",
"invest_assets",
"cash_assets",
"current_assets",
"credit_1",
"credit_long",
"pension_assets",
"total_assets",
"funds_end",
"funds_restrict",
"funds_unrestrict",
"funds_total",
"employees",
"volunteers",
"cons_acc",
"charity_acc"
],
"extract_registration": [
"regno",
"subno",
"regdate",
"remdate",
"remcode"
],
"extract_remove_ref": [
"code",
"text"
],
"extract_trustee": [
"regno",
"trustee"
]
}
def import_zip(zip_file):
zf = zipfile.ZipFile(zip_file, 'r')
print 'Opened zip file: %s' % zip_file
for filename in cc_files:
try:
bcp_filename = filename + '.bcp'
csv_filename = filename + '.csv'
bcpdata = zf.read(bcp_filename)
bcp.convert(bcpdata, csvfilename=csv_filename, col_headers=cc_files[filename])
print 'Converted: %s' % bcp_filename
except KeyError:
print 'ERROR: Did not find %s in zip file' % bcp_filename
def main():
zip_file = sys.argv[1]
import_zip(zip_file)
if __name__ == '__main__':
main()
#!/usr/bin/env python
import sys
import csv
def convert(bcpdata, csvfilename="", lineterminator='*@@*', delimiter='@**@', quote='"', newdelimiter=',', col_headers=None, escapechar='\\', newline='\n'):
bcpdata = bcpdata.replace(escapechar, escapechar + escapechar)
bcpdata = bcpdata.replace(quote, escapechar + quote)
bcpdata = bcpdata.replace(delimiter, quote + newdelimiter + quote)
bcpdata = bcpdata.replace(lineterminator, quote + newline + quote)
if csvfilename=="":
csvfilename = 'converted.csv'
with open(csvfilename, 'wb') as csvfile:
if(col_headers):
writer = csv.writer(csvfile)
writer.writerow(col_headers)
csvfile.write('"')
csvfile.write(bcpdata)
csvfile.write('"')
def main():
bcp_filename = sys.argv[1]
try:
csv_filename = sys.argv[2]
except IndexError:
csv_filename = bcp_filename.replace('.bcp', '.csv')
with open(bcp_filename, 'rb') as bcpfile:
bcpdata = bcpfile.read()
convert(bcpdata, csv_filename)
if __name__ == '__main__':
main()
这个错误让我感到震惊:
>>> ================================ RESTART ================================
>>>
Traceback (most recent call last):
File "C:\Python27\bcp.py", line 31, in <module>
main()
File "C:\Python27\bcp.py", line 21, in main
bcp_filename = sys.argv[1]
IndexError: list index out of range
>>> ================================ RESTART ================================
>>>
Traceback (most recent call last):
File "C:\Python27\import.py", line 175, in <module>
main()
File "C:\Python27\import.py", line 171, in main
zip_file = sys.argv[1]
IndexError: list index out of range
>>>
请问有谁能指出哪里出了问题吗?
第二种方法:
然后我尝试在Windows中使用命令提示符来运行该文件: 首先我将路径设置为保存所有文件的位置(C:\python27) 然后在命令提示符中运行
python import RegPlusExtract_November_2015.zip
它给我留下了深刻的印象:错误:
File"<stdin>", line 1
python import RegPlusExtract_November_2015.zip
请问有人能指出我哪里错了,或者向我展示如何从上面的数据链接中提取csv文件吗?
python import.py RegPlusExtract_February_2015.zip
。 - Hugh Bothwell