How can I force one field in Ruby's CSV output to be wrapped in double quotes?

Question

26

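I'm generating some CSV output using Ruby's built-in CSV. Everything works fine, but the customer wants the name field in the output to be wrapped in double quotes so that the output looks like the input file. For instance, the input looks like this: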

1,1.1.1.1,"Firstname Lastname",more,fields
2,2.2.2.2,"Firstname Lastname, Jr.",more,fields

CSV's output is correct and looks like this:

1,1.1.1.1,Firstname Lastname,more,fields
2,2.2.2.2,"Firstname Lastname, Jr.",more,fields

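I know CSV is doing the right thing by not double-quoting the third field just because it contains embedded blanks, and by wrapping the field in double quotes when it contains an embedded comma. What I'd like is to tell CSV to always double-quote the third field, to help the customer feel warm and fuzzy.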

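I tried wrapping the field in double quotes in my to_a method, which creates a "Firstname Lastname" field that gets passed to CSV, but CSV laughed at my puny attempt and output """Firstname Lastname""". That is the correct thing to do, because it is escaping the double quotes, so that approach didn't work.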

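Then I tried setting CSV's :force_quotes => true in the open method, which wraps every field in double quotes as expected, but the customer didn't like that, which I expected too. So that didn't work either.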
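The three behaviors described so far can be reproduced with CSV.generate_line. This is only a small illustration using the sample rows from the question; the variable names are made up for the example.

```ruby
require 'csv'

plain      = ['1', '1.1.1.1', 'Firstname Lastname', 'more', 'fields']
with_comma = ['2', '2.2.2.2', 'Firstname Lastname, Jr.', 'more', 'fields']
pre_quoted = ['1', '1.1.1.1', '"Firstname Lastname"', 'more', 'fields']

# Default: a field is quoted only when it contains the separator,
# a quote character, or a line break.
puts CSV.generate_line(plain)       # name field left unquoted
puts CSV.generate_line(with_comma)  # name field quoted because of the comma

# Quotes added by hand are data as far as CSV is concerned, so they are
# doubled and the whole field is wrapped again.
puts CSV.generate_line(pre_quoted)

# force_quotes is all-or-nothing: every field gets quoted.
puts CSV.generate_line(plain, force_quotes: true)
```

None of these gives "quote only the third field", which is the gap the answers below try to close.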

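I've looked through the Table and Row docs and found nothing that gives me access to the "generate a String field" method, or a way to set a "for field n always use quoting" flag.

I'm about to dive into the source to see whether there are some super-secret tweaks, or whether there's a way to monkey-patch CSV and bend it to my will, but I wondered if anyone has some special knowledge or has run into this before.

And yes, I know I could roll my own CSV output, but I'm not into reinventing well-tested wheels. Also, I'm aware of FasterCSV; it's now part of Ruby 1.9.2, which I'm using, so explicitly using FasterCSV buys me nothing special. Also, I'm not using Rails and have no intention of rewriting this in Rails, so unless you have a cute way of implementing it with a small subset of Rails, don't bother. I'll downvote any recommendation to use one of those approaches just because you didn't bother to read this far.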

- the Tin Man

This is by far the easiest way I've found: https://superuser.com/a/318421/314320 - Eric Norcross

7 Answers

7

This post is old, but I can't believe no one thought of this.

Why not do:

csv = CSV.generate :quote_char => "\0" do |csv|

where \0 is a null character, then just add quotes to each field where they are needed:

csv << [product.upc, "\"" + product.name + "\"" # ...

Then at the end you can do a

csv.gsub!(/\0/, '')
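Put together, the fragments above might look like the following sketch. The helper names hand_quote and generate_with_name_quoted are hypothetical, the rows are the samples from the question, and the third column is assumed to be the name field.

```ruby
require 'csv'

# Hypothetical helper: wrap a value in literal double quotes, doubling any
# embedded double quotes the way CSV would.
def hand_quote(value)
  '"' + value.to_s.gsub('"', '""') + '"'
end

# Generate CSV with NUL as the quote character, hand-quote the name column,
# then strip the NULs that CSV added around fields containing a comma.
def generate_with_name_quoted(rows)
  raw = CSV.generate(quote_char: "\0") do |csv|
    rows.each do |id, ip, name, *rest|
      csv << [id, ip, hand_quote(name), *rest]
    end
  end
  raw.gsub("\0", '')
end

rows = [
  [1, '1.1.1.1', 'Firstname Lastname', 'more', 'fields'],
  [2, '2.2.2.2', 'Firstname Lastname, Jr.', 'more', 'fields']
]
puts generate_with_name_quoted(rows)
```

One caveat with this approach: any other field that contains a comma or a line break loses its protection once the NULs are removed, so every such field has to be hand-quoted as well.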

- Tom Grushka

3

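Other answers have already suggested this kind of solution, but post-processing the CSV output is a major pain if you're handling huge CSV files and have to buffer or reload them to apply the hack. CSV is an ugly format by itself. It's better to fix the root cause, even if that means touching or overriding some code, because then the flow works correctly. - the Tin Man

However, if you're coming to this from Google years after the fact, don't want to fight the stodgy CSV class (why couldn't they just test that the string length is <= 1 rather than == 1?), and have code that unfortunately already alters every field, this is a workable approach. - Marc Bollinger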

5

I'm not sure whether this will make the customer feel warm and fuzzy after all this time, but it seems to work:

require 'csv'
#prepare a lambda which converts field with index 2 
quote_col2 = lambda do |field, fieldinfo|
  # fieldinfo has a line- ,header- and index-method
  if fieldinfo.index == 2 && !field.start_with?('"') then 
    '"' + field + '"'
  else
    field
  end
end

# specify above lambda as one of the converters
csv =  CSV.read("test1.csv", :converters => [quote_col2])
p csv 
# => [["aaa", "bbb", "\"ccc\"", "ddd"], ["fff", "ggg", "\"hhh\"", "iii"]]
File.open("test1.txt","w"){|out| csv.each{|line|out.puts line.join(",")}}

- steenslag

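You can't apply converters to the generate method, can you? The original question is about writing CSV files. - jwadsack

@jwadsack Added a line that writes the array to a file. - steenslag

But that won't work if a field contains a comma, right? - jwadsack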

5

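CSV has a force_quotes option that forces it to quote all fields (it may not have existed when you originally posted this). I realize this isn't exactly what you were asking for, but it involves less monkey-patching.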

2.1.0 :008 > puts CSV.generate_line [1,'1.1.1.1','Firstname Lastname','more','fields']
1,1.1.1.1,Firstname Lastname,more,fields
2.1.0 :009 > puts CSV.generate_line [1,'1.1.1.1','Firstname Lastname','more','fields'], force_quotes: true
"1","1.1.1.1","Firstname Lastname","more","fields"

The downside is that the first integer value ends up written as a string, which changes things when you import into Excel.
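One way to limit force_quotes to selected columns without patching CSV is to format each field on its own and join the pieces by hand. This is a different technique from the answer above, offered only as a sketch: the helper name line_with_forced_quotes is hypothetical, and it assumes the default "," column separator and "\n" row separator.

```ruby
require 'csv'

# Format every field with CSV.generate_line as a one-element row, turning
# force_quotes on only for the requested column indexes. CSV still handles
# escaping, and fields with commas are still quoted automatically.
def line_with_forced_quotes(row, forced_indexes)
  cells = row.each_with_index.map do |field, index|
    CSV.generate_line([field], force_quotes: forced_indexes.include?(index))
       .chomp # drop the row separator added after each single-field "row"
  end
  cells.join(',') + "\n"
end

print line_with_forced_quotes([1, '1.1.1.1', 'Firstname Lastname', 'more', 'fields'], [2])
print line_with_forced_quotes([2, '2.2.2.2', 'Firstname Lastname, Jr.', 'more', 'fields'], [2])
```

This calls CSV once per field, so it is noticeably slower than a single generate_line per row; for very large files that cost may matter.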

- jwadsack

3

"...the first integer value ends up written as a string..." As you said, that changes the value to a string. Also, I need one specific field to always be quoted, not every field. - the Tin Man

2

It has been a long time, but the CSV library has been patched since then, so this might help anyone who runs into the problem now:

require 'csv'

# puts CSV::VERSION # this should be 3.1.9+

headers = ['id', 'ip', 'name', 'foo', 'bar']
data = [
[1, '1.1.1.1','Firstname Lastname','more','fields'],
[2, '2.2.2.2','Firstname Lastname, Jr.','more','fields']
]

quoter = Proc.new do |field, field_meta|
  # the index starts at zero, that's why the third field would be 2;
  # line 1 is the header row, which is left unquoted
  forced = field_meta.index == 2 && field_meta.line > 1
  # CSV format needs to quote fields containing comma(s): ,
  has_comma = field.is_a?(String) && field.include?(',')
  # wrap only once, so a forced field that also contains a comma is not quoted twice
  field = '"' + field + '"' if forced || has_comma
  field
end

file = CSV.generate(headers: true, quote_char: '', write_converters: quoter) do |csv|
    csv << headers
    data.each { |row| csv << row }
end

puts file

The output will be:

id,ip,name,foo,bar
1,1.1.1.1,"Firstname Lastname",more,fields
2,2.2.2.2,"Firstname Lastname, Jr.",more,fields

- Ali Ghanavatian

1

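It doesn't look like there's any way to do this with the existing CSV implementation short of monkey-patching or rewriting it.

However, assuming you have full control over the source data, you could do this:

For each row, append a custom string that includes a comma (i.e. one that would never occur naturally in the data) to the end of the field in question, e.g. "FORCE_COMMAS,".
Generate the CSV output.
Now that you have CSV output with that field quoted on every row, remove the custom string: csv.gsub!(/FORCE_COMMAS,/, "")
The customer feels warm and fuzzy.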
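The steps above can be sketched as follows. The constant SENTINEL and the helper generate_with_forced_quotes are names invented for this example, and the sketch assumes the marker never occurs in real data.

```ruby
require 'csv'

# Assumed marker: it must contain the column separator (so CSV quotes the
# field) and must never appear in the real data.
SENTINEL = 'FORCE_COMMAS,'

# Append the marker to one column, let CSV quote that field because of the
# comma, then strip the marker from the generated text.
def generate_with_forced_quotes(rows, column)
  tagged = CSV.generate do |csv|
    rows.each do |row|
      row = row.dup # leave the caller's data untouched
      row[column] = "#{row[column]}#{SENTINEL}"
      csv << row
    end
  end
  tagged.gsub(SENTINEL, '')
end

rows = [
  [1, '1.1.1.1', 'Firstname Lastname', 'more', 'fields'],
  [2, '2.2.2.2', 'Firstname Lastname, Jr.', 'more', 'fields']
]
puts generate_with_forced_quotes(rows, 2)
```

Because the cleanup runs over the whole generated string, the full output has to be held in memory, which is the cost raised in the comments on the other post-processing answer.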

- Dylan Markow

The by_col! method may also help once you have the output CSV! - Zabba

That's not exactly elegant, nor is it the Ruby way. I think I'll have to dig into the source some more. - the Tin Man

0

CSV changed in Ruby 2.1, as @jwadsack mentioned, but here is a working version of @the-tin-man's MyCSV. It is slightly modified so that you can set forced_quote_fields through the options.

MyCSV.generate(forced_quote_fields: [1]) do |_csv|...

Modified code

require 'csv'

class MyCSV < CSV

  def <<(row)
    # make sure headers have been assigned
    if header_row? and [Array, String].include? @use_headers.class
      parse_headers  # won't read data for Array or String
      self << @headers if @write_headers
    end

    # handle CSV::Row objects and Hashes
    row = case row
          when self.class::Row then row.fields
          when Hash            then @headers.map { |header| row[header] }
          else                      row
          end

    @headers =  row if header_row?
    @lineno  += 1

    output = row.map.with_index(&@quote).join(@col_sep) + @row_sep  # quote and separate
    if @io.is_a?(StringIO)             and
       output.encoding != (encoding = raw_encoding)
      if @force_encoding
        output = output.encode(encoding)
      elsif (compatible_encoding = Encoding.compatible?(@io.string, output))
        @io.set_encoding(compatible_encoding)
        @io.seek(0, IO::SEEK_END)
      end
    end
    @io << output

    self  # for chaining
  end

  def init_separators(options)
    # store the selected separators
    @col_sep    = options.delete(:col_sep).to_s.encode(@encoding)
    @row_sep    = options.delete(:row_sep)  # encode after resolving :auto
    @quote_char = options.delete(:quote_char).to_s.encode(@encoding)
    @forced_quote_fields = options.delete(:forced_quote_fields) || []

    if @quote_char.length != 1
      raise ArgumentError, ":quote_char has to be a single character String"
    end

    #
    # automatically discover row separator when requested
    # (not fully encoding safe)
    #
    if @row_sep == :auto
      if [ARGF, STDIN, STDOUT, STDERR].include?(@io) or
         (defined?(Zlib) and @io.class == Zlib::GzipWriter)
        @row_sep = $INPUT_RECORD_SEPARATOR
      else
        begin
          #
          # remember where we were (pos() will raise an exception if @io is pipe
          # or not opened for reading)
          #
          saved_pos = @io.pos
          while @row_sep == :auto
            #
            # if we run out of data, it's probably a single line
            # (ensure will set default value)
            #
            break unless sample = @io.gets(nil, 1024)
            # extend sample if we're unsure of the line ending
            if sample.end_with? encode_str("\r")
              sample << (@io.gets(nil, 1) || "")
            end

            # try to find a standard separator
            if sample =~ encode_re("\r\n?|\n")
              @row_sep = $&
              break
            end
          end

          # tricky seek() clone to work around GzipReader's lack of seek()
          @io.rewind
          # reset back to the remembered position
          while saved_pos > 1024  # avoid loading a lot of data into memory
            @io.read(1024)
            saved_pos -= 1024
          end
          @io.read(saved_pos) if saved_pos.nonzero?
        rescue IOError         # not opened for reading
          # do nothing:  ensure will set default
        rescue NoMethodError   # Zlib::GzipWriter doesn't have some IO methods
          # do nothing:  ensure will set default
        rescue SystemCallError # pipe
          # do nothing:  ensure will set default
        ensure
          #
          # set default if we failed to detect
          # (stream not opened for reading, a pipe, or a single line of data)
          #
          @row_sep = $INPUT_RECORD_SEPARATOR if @row_sep == :auto
        end
      end
    end
    @row_sep = @row_sep.to_s.encode(@encoding)

    # establish quoting rules
    @force_quotes   = options.delete(:force_quotes)
    do_quote        = lambda do |field|
      field         = String(field)
      encoded_quote = @quote_char.encode(field.encoding)
      encoded_quote                                +
      field.gsub(encoded_quote, encoded_quote * 2) +
      encoded_quote
    end
    quotable_chars = encode_str("\r\n", @col_sep, @quote_char)

    @quote         = if @force_quotes
      do_quote
    else
      lambda do |field, index|
        if field.nil?  # represent +nil+ fields as empty unquoted fields
          ""
        else
          field = String(field)  # Stringify fields
          # represent empty fields as empty quoted fields
          if field.empty? or
             field.count(quotable_chars).nonzero? or
             @forced_quote_fields.include?(index)
            do_quote.call(field)
          else
            field  # unquoted field
          end
        end
      end
    end
  end
end

- kakoni

- the Tin Man · Accepted Answer

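Well, there's a way to do it, but it wasn't as clean as I'd hoped the CSV code would allow.

I had to subclass CSV, then override the CSV#<< method and add another method, forced_quote_fields=, so I could define the fields I want force-quoted, plus pull two lambdas in from other methods. At least it works for what I want: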

require 'csv'

class MyCSV < CSV
    def <<(row)
      # make sure headers have been assigned
      if header_row? and [Array, String].include? @use_headers.class
        parse_headers  # won't read data for Array or String
        self << @headers if @write_headers
      end

      # handle CSV::Row objects and Hashes
      row = case row
        when self.class::Row then row.fields
        when Hash            then @headers.map { |header| row[header] }
        else                      row
      end

      @headers = row if header_row?
      @lineno  += 1

      @do_quote ||= lambda do |field|
        field         = String(field)
        encoded_quote = @quote_char.encode(field.encoding)
        encoded_quote                                +
        field.gsub(encoded_quote, encoded_quote * 2) +
        encoded_quote
      end

      @quotable_chars      ||= encode_str("\r\n", @col_sep, @quote_char)
      @forced_quote_fields ||= []

      @my_quote_lambda ||= lambda do |field, index|
        if field.nil?  # represent +nil+ fields as empty unquoted fields
          ""
        else
          field = String(field)  # Stringify fields
          # represent empty fields as empty quoted fields
          if (
            field.empty?                          or
            field.count(@quotable_chars).nonzero? or
            @forced_quote_fields.include?(index)
          )
            @do_quote.call(field)
          else
            field  # unquoted field
          end
        end
      end

      output = row.map.with_index(&@my_quote_lambda).join(@col_sep) + @row_sep  # quote and separate
      if (
        @io.is_a?(StringIO)             and
        output.encoding != raw_encoding and
        (compatible_encoding = Encoding.compatible?(@io.string, output))
      )
        @io = StringIO.new(@io.string.force_encoding(compatible_encoding))
        @io.seek(0, IO::SEEK_END)
      end
      @io << output

      self  # for chaining
    end
    alias_method :add_row, :<<
    alias_method :puts,    :<<

    def forced_quote_fields=(indexes=[])
      @forced_quote_fields = indexes
    end
end

That's the code. Calling it:

data = [ 
  %w[1 2 3], 
  [ 2, 'two too',  3 ], 
  [ 3, 'two, too', 3 ] 
]

quote_fields = [1]

puts "Ruby version:   #{ RUBY_VERSION }"
puts "Quoting fields: #{ quote_fields.join(', ') }", "\n"

csv = MyCSV.generate do |_csv|
  _csv.forced_quote_fields = quote_fields
  data.each do |d| 
    _csv << d
  end
end

puts csv

Results in:

# >> Ruby version:   1.9.2
# >> Quoting fields: 1
# >> 
# >> 1,"2",3
# >> 2,"two too",3
# >> 3,"two, too",3