I have a CSV file named file.csv in which all the data is enclosed in double quotes. A sample of the file looks like this:
column1,column2,column3,column4,column5,column6, column7, Column8, Column9, Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1, name","890","88","11-OCT-11","12"
"4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455","author2, name","12","455","12-OCT-11","55"
"11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3, name","333","22","13-OCT-11","232"
The 9th field is a date in "DD-MMM-YY" format. I need to convert it to YYYY/MM/DD. I tried the code below, but it did not work.
awk -F, '
BEGIN {
split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
for (i=1; i<=12; i++) mdigit[month[i]]=i
}
{ m=substr($9,4,3)
$9 = sprintf("%02d/%02d/"20"%02d",mdigit[m],substr($9,1,2),substr($9,8,20))
print
}' OFS="," file.csv > temp_file.csv
After running the code above, temp_file.csv looks like this:
column1,column2,column3,column4,column5,column6,column7,Column8,00/00/2000,Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1,00/00/2000,"890","88","11-OCT-11","12"
"4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455",00/00/2002, name","12","455","12-OCT-11","55"
"11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3,00/00/2000,"333","22","13-OCT-11","232"
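As far as I understand, the problem is the commas inside the double quotes, because my code treats them as field separators too... Please answer the following questions: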
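1) Does enclosing every field's value in double quotes make any difference? If it does, how can I remove the double quotes from all values except the strings that contain commas?
2) Does my code need any changes to convert the 9th field from "DD-MMM-YY" to YYYY/MM/DD?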
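One possible way to approach both questions, as a minimal sketch rather than a definitive answer: because every data field is wrapped in double quotes, the three-character sequence `","` appears only between fields, so it can be used as the field separator and the quotes do not need to be removed. This assumes that no field contains `","` itself, that no field spans several lines, that the unquoted header is on line 1, and that all years are 20xx, as in the original attempt. The function name `convert_dates` is hypothetical.

```shell
# Hypothetical helper: rewrite field 9 from DD-MMM-YY to YYYY/MM/DD.
# Assumes every data field is double-quoted and the header is on line 1.
convert_dates() {
  awk '
    BEGIN {
      FS = OFS = "\",\""          # split on the quote-comma-quote between fields
      split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
      for (i = 1; i <= 12; i++) mdigit[month[i]] = i
    }
    NR == 1 { print; next }       # pass the unquoted header through unchanged
    {
      split($9, d, "-")           # d[1]=DD, d[2]=MMM, d[3]=YY
      $9 = sprintf("20%02d/%02d/%02d", d[3], mdigit[toupper(d[2])], d[1])
      print
    }
  ' "$@"
}
```

It could be run as `convert_dates file.csv > temp_file.csv`. With this separator the first field keeps its leading quote and the last field its trailing quote, which is harmless here because only field 9 is modified.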
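... NF-1 - kev
... csvfix. It has date manipulation functions built in. - Jonathan Leffler
... NF-1? I'm new to both Linux and awk. @JonathanLeffler I tried man csvfix, but there is no manual page... Please explain in detail how to get it working... - Dhruuv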
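The NF-1 mentioned in the comments presumably refers to awk's second-to-last field, `$(NF-1)`. In the sample rows the date is the second-to-last field and neither of the last two fields contains a comma, so counting from the end of the line sidesteps the commas embedded in earlier fields even with a plain comma separator. A rough sketch under those assumptions (the function name `convert_dates_from_end` is hypothetical, and years are again assumed to be 20xx):

```shell
# Hypothetical helper: convert the second-to-last field from
# "DD-MMM-YY" to "YYYY/MM/DD", keeping the surrounding double quotes.
convert_dates_from_end() {
  awk '
    BEGIN {
      FS = OFS = ","              # plain comma; earlier fields may be split oddly
      split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
      for (i = 1; i <= 12; i++) mdigit[month[i]] = i
    }
    NR == 1 { print; next }       # leave the header line untouched
    {
      date = $(NF-1)
      gsub(/"/, "", date)         # drop the quotes around the date
      split(date, d, "-")         # d[1]=DD, d[2]=MMM, d[3]=YY
      $(NF-1) = sprintf("\"20%02d/%02d/%02d\"", d[3], mdigit[toupper(d[2])], d[1])
      print
    }
  ' "$@"
}
```

Usage would be `convert_dates_from_end file.csv > temp_file.csv`. Since the input and output separators are both a single comma, the pieces of the earlier quoted strings are joined back exactly as they were.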