Export a BCP file to CSV format

I want to take the data in the Charity Commission register extract and export it as files Excel can open (.csv).

Someone has published code on GitHub that does this through an import.py script.

Here is what I tried, in two different ways:

First approach:

  1. I downloaded RegPlusExtract_November_2015.zip from the link above and put it in C:\Python27 (which is also where I installed Python).

  2. I put the two code files below (import.py, followed by bcp.py) in C:\Python27, opened them in IDLE and ran import.py with F5.

import.py:
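#!/usr/bin/env python
# Python 2.7 script (uses print statements). Usage: python import.py RegPlusExtract_November_2015.zip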
import bcp
import zipfile
import sys

cc_files = {
    "extract_acct_submit": [
      "regno",
      "submit_date",
      "arno",
      "fyend"
    ], 
    "extract_aoo_ref": [
      "aootype",
      "aookey",
      "aooname",
      "aoosort",
      "welsh",
      "master",
      "code"
    ], 
    "extract_ar_submit": [
      "regno",
      "arno",
      "submit_date"
    ], 
    "extract_charity": [
      "regno",
      "subno",
      "name",
      "orgtype",
      "gd",
      "aob",
      "aob_defined",
      "nhs",
      "ha_no",
      "corr",
      "add1",
      "add2",
      "add3",
      "add4",
      "add5",
      "postcode",
      "phone",
      "fax",
    ], 
    "extract_charity_aoo": [
      "regno",
      "aootype",
      "aookey",
      "welsh",
      "master"
    ], 
    "extract_class": [
      "regno",
      "class"
    ], 
    "extract_class_ref": [
      "classno",
      "classtext",
    ], 
    "extract_financial": [
      "regno",
      "fystart",
      "fyend",
      "income",
      "expend"
    ], 
    "extract_main_charity": [
      "regno",
      "coyno",
      "trustees",
      "fyend",
      "welsh",
      "incomedate",
      "income",
      "grouptype",
      "email",
      "web"
    ], 
    "extract_name": [
      "regno",
      "subno",
      "nameno",
      "name"
    ], 
    "extract_objects": [
      "regno",
      "subno",
      "seqno",
      "object"
    ], 
    "extract_partb": [
      "regno",
      "artype",
      "fystart",
      "fyend",
      "inc_leg",
      "inc_end",
      "inc_vol",
      "inc_fr",
      "inc_char",
      "inc_invest",
      "inc_other",
      "inc_total",
      "invest_gain",
      "asset_gain",
      "pension_gain",
      "exp_vol",
      "exp_trade",
      "exp_invest",
      "exp_grant",
      "exp_charble",
      "exp_gov",
      "exp_other",
      "exp_total",
      "exp_support",
      "exp_dep",
      "reserves",
      "asset_open",
      "asset_close",
      "fixed_assets",
      "open_assets",
      "invest_assets",
      "cash_assets",
      "current_assets",
      "credit_1",
      "credit_long",
      "pension_assets",
      "total_assets",
      "funds_end",
      "funds_restrict",
      "funds_unrestrict",
      "funds_total",
      "employees",
      "volunteers",
      "cons_acc",
      "charity_acc"
    ], 
    "extract_registration": [
      "regno",
      "subno",
      "regdate",
      "remdate",
      "remcode"
    ], 
    "extract_remove_ref": [
      "code",
      "text"
    ], 
    "extract_trustee": [
      "regno",
      "trustee"
    ]
}

def import_zip(zip_file):
    zf = zipfile.ZipFile(zip_file, 'r')
    print 'Opened zip file: %s' % zip_file
    for filename in cc_files:
        try:
            bcp_filename = filename + '.bcp'
            csv_filename = filename + '.csv'
            bcpdata = zf.read(bcp_filename)
            bcp.convert(bcpdata, csvfilename=csv_filename, col_headers=cc_files[filename])
            print 'Converted: %s' % bcp_filename
        except KeyError:
            print 'ERROR: Did not find %s in zip file' % bcp_filename

def main():
    zip_file = sys.argv[1]
    import_zip(zip_file)

if __name__ == '__main__':
    main()

bcp.py:

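#!/usr/bin/env python
# Usage: python bcp.py input.bcp [output.csv]  (output defaults to the input name with .bcp replaced by .csv)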
import sys
import csv

def convert(bcpdata, csvfilename="", lineterminator='*@@*', delimiter='@**@', quote='"', newdelimiter=',', col_headers=None, escapechar='\\', newline='\n'):
    # Escape existing backslashes and quotes, then turn every field delimiter into '","'
    # and every row terminator into '"\n"'; the very first and last quotes are written below.
    bcpdata = bcpdata.replace(escapechar, escapechar + escapechar)
    bcpdata = bcpdata.replace(quote, escapechar + quote)
    bcpdata = bcpdata.replace(delimiter, quote + newdelimiter + quote)
    bcpdata = bcpdata.replace(lineterminator, quote + newline + quote)
    if csvfilename=="":
        csvfilename = 'converted.csv'
    with open(csvfilename, 'wb') as csvfile:
        if(col_headers):
            writer = csv.writer(csvfile)
            writer.writerow(col_headers)
        csvfile.write('"')
        csvfile.write(bcpdata)
        csvfile.write('"')

def main():
    bcp_filename = sys.argv[1]
    try:
        csv_filename = sys.argv[2]
    except IndexError:
        csv_filename = bcp_filename.replace('.bcp', '.csv')
    with open(bcp_filename, 'rb') as bcpfile:
        bcpdata = bcpfile.read()
        convert(bcpdata, csv_filename)

if __name__ == '__main__':
    main()
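Both scripts above are Python 2 only: they use `print` statements and treat the zip contents as `str`, which fits the asker's later report that the script only works with 2.7. The following is a hypothetical Python 3 sketch of the same conversion, not the GitHub author's code. It assumes the field delimiter `@**@` and row terminator `*@@*` taken from `bcp.py`'s defaults, and that the extract is Latin-1 encoded (an unverified guess; change `encoding` if the text comes out garbled). Unlike `bcp.py`, which escapes embedded quotes with a backslash, it lets `csv.writer` double them, which is the CSV convention Excel reads.

```python
import csv
import io
import sys
import zipfile

FIELD_DELIM = "@**@"   # field separator (bcp.py's `delimiter` default)
ROW_DELIM = "*@@*"     # row terminator (bcp.py's `lineterminator` default)

# Abbreviated; in practice reuse the full cc_files dictionary from import.py.
CC_FILES = {
    "extract_class_ref": ["classno", "classtext"],
    "extract_remove_ref": ["code", "text"],
}


def bcp_rows(text, field_delim=FIELD_DELIM, row_delim=ROW_DELIM):
    """Split decoded BCP text into a list of rows (lists of field strings)."""
    rows = text.split(row_delim)
    # A terminator after the last row leaves an empty final chunk; drop it.
    if rows and rows[-1].strip() == "":
        rows.pop()
    return [row.split(field_delim) for row in rows]


def convert(bcpdata, col_headers=None, encoding="latin-1"):
    """Return CSV text for raw BCP bytes, optionally with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # quotes fields and doubles embedded quotes as needed
    if col_headers:
        writer.writerow(col_headers)
    writer.writerows(bcp_rows(bcpdata.decode(encoding)))
    return buf.getvalue()


def import_zip(zip_path, tables=CC_FILES):
    """Write <name>.csv for every <name>.bcp in the zip that `tables` lists."""
    with zipfile.ZipFile(zip_path) as zf:
        available = set(zf.namelist())
        for name, headers in tables.items():
            bcp_name = name + ".bcp"
            if bcp_name not in available:
                print("ERROR: Did not find %s in zip file" % bcp_name)
                continue
            csv_text = convert(zf.read(bcp_name), col_headers=headers)
            # newline="" stops Python translating the "\r\n" csv already wrote;
            # utf-8-sig adds a BOM, which helps Excel recognise UTF-8.
            with open(name + ".csv", "w", newline="", encoding="utf-8-sig") as f:
                f.write(csv_text)
            print("Converted: %s" % bcp_name)


if __name__ == "__main__" and len(sys.argv) > 1:
    import_zip(sys.argv[1])
```

Saved under a hypothetical name such as `bcp_py3.py`, it would be run as `python3 bcp_py3.py RegPlusExtract_November_2015.zip` from the folder holding the zip.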

I get these errors:
>>> ================================ RESTART ================================
>>> 

Traceback (most recent call last):
  File "C:\Python27\bcp.py", line 31, in <module>
    main()
  File "C:\Python27\bcp.py", line 21, in main
    bcp_filename = sys.argv[1]
IndexError: list index out of range
>>> ================================ RESTART ================================
>>> 

Traceback (most recent call last):
  File "C:\Python27\import.py", line 175, in <module>
    main()
  File "C:\Python27\import.py", line 171, in main
    zip_file = sys.argv[1]
IndexError: list index out of range
>>> 

Can anyone point out what is going wrong?

Second approach:

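I then tried to run the file from the Windows command prompt. First I set the path to the folder where all the files are saved (C:\Python27), then ran this in the command prompt: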

python import RegPlusExtract_November_2015.zip

It gives me this error:
  File "<stdin>", line 1
    python import RegPlusExtract_November_2015.zip

Can anyone point out where I went wrong, or show me how to get CSV files out of the data at the link above?


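It looks like it expects the file name as a command-line argument, and you are not supplying one. Try python import.py RegPlusExtract_February_2015.zip - Hugh Bothwell
Thanks a lot, it gives me: File "<stdin>", line 1 SyntaxError: invalid syntax - Khan
Hi @pnuts: could you explain in more detail? I don't understand what the rollback here means. - Khan
Hi, thanks a lot. Yes, I mean CSV files. Yes. Thanks! - Khan
Please state the name of the input file (in BCP format) and show how you launch the script. Without that information we cannot help you. - Serge Ballesta
Hi @SergeBallesta, as you suggested, I have edited my post to include more information. Many thanks - Khan
1 Answer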


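When you run a script from IDLE, you cannot pass arguments in sys.argv, so the error is expected in that case. But once F5 has displayed the error, you should be able to call directly: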

zip_file = 'RegPlusExtract_November_2015.zip'
import_zip(zip_file)

That should let you process the data.
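One way to make `import.py` tolerate being run with F5 is to stop indexing `sys.argv[1]` unconditionally. The helper below is hypothetical (it is not in the original script): it returns the zip name from the command line when one was given and otherwise falls back to a default. It uses no Python 3-only syntax, so it should also run on 2.7.

```python
import sys


def zip_path_from_argv(argv=None, default=None):
    """Return the zip file named on the command line, or `default` if none.

    argv[0] is always the script path. When a script is run with F5 in IDLE
    it is the only entry, which is why sys.argv[1] raised IndexError.
    """
    if argv is None:
        argv = sys.argv
    return argv[1] if len(argv) > 1 else default
```

Inside `import.py`'s `main()`, `zip_file = sys.argv[1]` could then become `zip_file = zip_path_from_argv(default='RegPlusExtract_November_2015.zip')`.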

For the second approach, from the command prompt, you must give the exact name of the script file. The command should be:

python import.py RegPlusExtract_November_2015.zip
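If you would rather stay inside IDLE, another option (not part of the answer above) is to fill in `sys.argv` yourself and run the script with `runpy.run_path`, which is available from Python 2.7 onward. `run_script` is a hypothetical helper name; call it from the same interpreter version the target script needs (2.7 for the question's `import.py`).

```python
import runpy
import sys


def run_script(path, *args):
    """Run a script roughly as `python path arg1 arg2 ...` would.

    sys.argv is replaced for the duration of the run and restored afterwards.
    run_name="__main__" makes the script's `if __name__ == '__main__':`
    block execute. Returns the script's global namespace as a dict.
    """
    saved_argv = sys.argv
    sys.argv = [path] + list(args)
    try:
        return runpy.run_path(path, run_name="__main__")
    finally:
        sys.argv = saved_argv
```

For example, `run_script("import.py", "RegPlusExtract_November_2015.zip")` from the IDLE shell; relative paths are resolved against the shell's current directory, so pass absolute paths if in doubt.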

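In any case, putting your own scripts and data files in the Python directory is bad practice. C:\Python27 should contain only the files from the original Python distribution and other general-purpose utilities, not your local scripts. The usual approach is to add C:\Python27 to your PATH environment variable and use a dedicated directory for processing the charity data.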


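Hi, thank you very much. This may sound silly, but what do you mean by "call directly"? Should I add it to the import file or type it in IDLE? I tried both and got this error: NameError: name 'import_zip' is not defined. - Khan
The second approach works, though. The script only works with Python 2.7 (I also had 3.5 installed, and have uninstalled it). I also included the path in the command (C:\python27\import.py ...), and it finally worked, thanks! - Khan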
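To call `import_zip` from the IDLE shell without re-running the script, note that `import` is a reserved word in Python, so `import import` is a syntax error and the usual way of loading `import.py` does not work. A possible workaround (a sketch, not from the thread) is `importlib.import_module`, which takes the module name as a plain string; `load_cc_import` is a hypothetical helper name. Loaded this way the module's `__name__` is `"import"`, so its `if __name__ == '__main__':` block, and with it the `sys.argv[1]` lookup, is skipped.

```python
import importlib
import sys


def load_cc_import(directory):
    """Import <directory>/import.py and return the module object.

    `import import` is a SyntaxError because `import` is a keyword, but
    importlib.import_module accepts the module name as a string.
    """
    if directory not in sys.path:
        sys.path.insert(0, directory)
    # Python 3.3+ caches directory listings; refresh them in case import.py
    # was created after the interpreter started (no such call on 2.7).
    if hasattr(importlib, "invalidate_caches"):
        importlib.invalidate_caches()
    return importlib.import_module("import")
```

With the scripts in a hypothetical folder such as `C:\charity`, the shell session would be `cc = load_cc_import(r"C:\charity")` followed by `cc.import_zip("RegPlusExtract_November_2015.zip")`.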
