The `csv` module seems to recommend using `csv.Sniffer` for this problem.
The documentation gives the following example, which I've adapted to your case.
with open('example.csv', 'rb') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read(1024), delimiters=";,")
    csvfile.seek(0)
    reader = csv.reader(csvfile, dialect)
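To see what the sniffer actually decides, here is a minimal sketch that runs it on in-memory strings and prints the guessed separator. The helper name `detect_delimiter` and the two sample strings are my own, chosen for illustration; the sketch is written for Python 3, where `sniff` takes a `str`.

```python
import csv


def detect_delimiter(sample, candidates=";,"):
    """Return the delimiter that csv.Sniffer guesses for `sample`.

    `detect_delimiter` is a hypothetical helper, not part of the csv module.
    """
    # sniff() returns a Dialect subclass; its `delimiter` attribute
    # holds the separator picked from `candidates`.
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


comma_sample = "test,box,foo\nround,the,bend\n"
semicolon_sample = "round;the;bend\nwho;are;you\n"

print(detect_delimiter(comma_sample))
print(detect_delimiter(semicolon_sample))
```

Restricting `delimiters` to `";,"` keeps the sniffer from picking some other character that happens to occur the same number of times on every line.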
Let's try it out.
[9:13am][wlynch@watermelon /tmp] cat example
import csv

def parse(filename):
    with open(filename, 'rb') as csvfile:
        # Let the sniffer choose between ';' and ',' based on the file contents.
        dialect = csv.Sniffer().sniff(csvfile.read(), delimiters=';,')
        # Rewind so the reader starts from the first line again.
        csvfile.seek(0)
        reader = csv.reader(csvfile, dialect)

        for line in reader:
            print line

def main():
    print 'Comma Version:'
    parse('comma_separated.csv')

    print
    print 'Semicolon Version:'
    parse('semicolon_separated.csv')

    print
    print 'An example from the question (kingdom.csv)'
    parse('kingdom.csv')

if __name__ == '__main__':
    main()
And our sample inputs:
[9:13am][wlynch@watermelon /tmp] cat comma_separated.csv
test,box,foo
round,the,bend
[9:13am][wlynch@watermelon /tmp] cat semicolon_separated.csv
round;the;bend
who;are;you
[9:22am][wlynch@watermelon /tmp] cat kingdom.csv
ReleveAnnee;ReleveMois;NoOrdre;TitreRMC;AdopCSRegleVote;AdopCSAbs;AdoptCSContre;NoCELEX;ProposAnnee;ProposChrono;ProposOrigine;NoUniqueAnnee;NoUniqueType;NoUniqueChrono;PropoSplittee;Suite2LecturePE;Council PATH;Notes
1999;1;1;1999/83/EC: Council Decision of 18 January 1999 authorising the Kingdom of Denmark to apply or to continue to apply reductions in, or exemptions from, excise duties on certain mineral oils used for specific purposes, in accordance with the procedure provided for in Article 8(4) of Directive 92/81/EEC;U;;;31999D0083;1998;577;COM;NULL;CS;NULL;;;;Propos* are missing on Celex document
1999;1;2;1999/81/EC: Council Decision of 18 January 1999 authorising the Kingdom of Spain to apply a measure derogating from Articles 2 and 28a(1) of the Sixth Directive (77/388/EEC) on the harmonisation of the laws of the Member States relating to turnover taxes;U;;;31999D0081;1998;184;COM;NULL;CS;NULL;;;;Propos* are missing on Celex document
If we run the example program:
[9:14am][wlynch@watermelon /tmp] ./example
Comma Version:
['test', 'box', 'foo']
['round', 'the', 'bend']
Semicolon Version:
['round', 'the', 'bend']
['who', 'are', 'you']
An example from the question (kingdom.csv)
['ReleveAnnee', 'ReleveMois', 'NoOrdre', 'TitreRMC', 'AdopCSRegleVote', 'AdopCSAbs', 'AdoptCSContre', 'NoCELEX', 'ProposAnnee', 'ProposChrono', 'ProposOrigine', 'NoUniqueAnnee', 'NoUniqueType', 'NoUniqueChrono', 'PropoSplittee', 'Suite2LecturePE', 'Council PATH', 'Notes']
['1999', '1', '1', '1999/83/EC: Council Decision of 18 January 1999 authorising the Kingdom of Denmark to apply or to continue to apply reductions in, or exemptions from, excise duties on certain mineral oils used for specific purposes, in accordance with the procedure provided for in Article 8(4) of Directive 92/81/EEC', 'U', '', '', '31999D0083', '1998', '577', 'COM', 'NULL', 'CS', 'NULL', '', '', '', 'Propos* are missing on Celex document']
['1999', '1', '2', '1999/81/EC: Council Decision of 18 January 1999 authorising the Kingdom of Spain to apply a measure derogating from Articles 2 and 28a(1) of the Sixth Directive (77/388/EEC) on the harmonisation of the laws of the Member States relating to turnover taxes', 'U', '', '', '31999D0081', '1998', '184', 'COM', 'NULL', 'CS', 'NULL', '', '', '', 'Propos* are missing on Celex document']
It's also worth mentioning which version of Python I'm using.
[9:20am][wlynch@watermelon /tmp] python -V
Python 2.7.2
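The code above is Python 2. If you are on Python 3, the same idea should carry over with two changes: open the file in text mode with `newline=''` instead of `'rb'`, and use `print()` as a function. The sketch below is my own adaptation, not something I ran for this answer. It also adds a fallback that is an assumption on my part: if the sniffer raises `csv.Error` because it cannot determine a delimiter, the file is read as plain comma-separated. The `candidates` and `fallback` parameters are hypothetical names, and the file is decoded with the platform's default encoding.

```python
import csv


def parse(filename, candidates=";,", fallback=","):
    """Return the rows of `filename` as lists of strings (Python 3 sketch)."""
    # Python 3's csv module expects a text-mode file opened with newline=''.
    with open(filename, newline="") as csvfile:
        sample = csvfile.read(1024)
        csvfile.seek(0)
        try:
            dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
        except csv.Error:
            # Assumption: treat the file as comma-separated when sniffing fails.
            return list(csv.reader(csvfile, delimiter=fallback))
        return list(csv.reader(csvfile, dialect))


def main():
    for name in ("comma_separated.csv", "semicolon_separated.csv", "kingdom.csv"):
        print(name)
        for row in parse(name):
            print(row)


if __name__ == "__main__":
    main()
```

Returning the rows instead of printing them inside `parse` makes the function easier to reuse; only the first 1024 characters are sniffed, as in the documentation example.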