How to pretty-print a CSV file in Python

20

How can I pretty-print a CSV file using Python, without any external tool?

For example, I have this CSV file:

title1|title2|title3|title4
datalongdata|datalongdata|data|data

data|data|data|datalongdatadatalongdatadatalongdatadatalongdatadatalongdata
data|data'data|dat

I would like to transform it visually into a table-like layout. For example, something like this:

+ --------------------------------------------------------------------------------------------------- +
| title1       | title2       | title3 | title4                                                       |
+ --------------------------------------------------------------------------------------------------- +
| datalongdata | datalongdata | data   | data                                                         |
|              |              |        |                                                              |
| data         | data         | data   | datalongdatadatalongdatadatalongdatadatalongdatadatalongdata |
| data         | data'data    | dat    |                                                              |
+ --------------------------------------------------------------------------------------------------- +

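Note: I saw a question from a user in the comments of another post, asking about something similar for Notepad++. I am asking this in the spirit of answering my own question. I hope others find it interesting too! - psxls
2 Answers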

23

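Usage:

pretty_csv.pretty_file(filename, **options)

Reads a CSV file and prints the data visually as a table to a new file. filename is the path to the given CSV file. The optional **options keyword arguments are the union of the Dialects and Formatting Parameters of the csv module in Python's standard library and the following list: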

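  • new_delimiter: the new column separator (default " | ")
  • border: boolean, True if the table border should be printed (default True)
  • border_vertical_left: the left border of the table (default "| ")
  • border_vertical_right: the right border of the table (default " |")
  • border_horizontal: the top and bottom border of the table (default "-")
  • border_corner_tl: the top-left corner of the table (default "+ ")
  • border_corner_tr: the top-right corner of the table (default " +")
  • border_corner_bl: the bottom-left corner of the table (default same as border_corner_tl)
  • border_corner_br: the bottom-right corner of the table (default same as border_corner_tr)
  • header: boolean, True if the first row is a table header (default True)
  • border_header_separator: the border between the header and the table (default same as border_horizontal)
  • border_header_left: the left border of the table header (default same as border_corner_tl)
  • border_header_right: the right border of the table header (default same as border_corner_tr)
  • new_filename: the filename of the new file (default "new_" + filename)
  • newline: defines how the rows of the table are separated (default "\n")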
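
The way the table options above are separated from the csv module's own parameters could look roughly like this. This is a minimal sketch, not the module itself: split_options and TABLE_DEFAULTS are hypothetical names, and only three of the listed options are included.

```python
import csv
import io

# Hypothetical subset of the table options listed above, with their defaults.
TABLE_DEFAULTS = {
    "new_delimiter": " | ",
    "border": True,
    "header": True,
}

def split_options(options):
    """Separate table options from csv formatting parameters."""
    remaining = dict(options)  # copy so the caller's dict is left untouched
    table = {key: remaining.pop(key, default)
             for key, default in TABLE_DEFAULTS.items()}
    return table, remaining

# "border" is a table option; "delimiter" is passed through to csv.reader.
table_opts, csv_opts = split_options({"border": False, "delimiter": "|"})
rows = list(csv.reader(io.StringIO("a|b\nc|d\n"), **csv_opts))
print(table_opts["new_delimiter"].join(rows[0]))
```

Whatever is left after the table options are popped is handed to csv.reader unchanged, which is why delimiter="|" can be mixed freely with options such as border.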

Example:

import pretty_csv
pretty_csv.pretty_file("test.csv", header=False, border=False, delimiter="|")

Python 3:

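This is a Python 2 implementation. For Python 3, change both occurrences of the line with open(filename, "rb") as input: to with open(filename, "r", newline="") as input:, because csv.reader in Python 3 expects the file to be opened in text mode.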
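
As an illustration of the Python 3 form, the sketch below writes a small sample file and reads it back in text mode with newline="". The file contents and the temporary path are made up for the example; the point is that a line break inside a quoted field survives intact.

```python
import csv
import os
import tempfile

# Write a small pipe-delimited sample file (contents are illustrative only).
path = os.path.join(tempfile.mkdtemp(), "test.csv")
with open(path, "w", newline="") as f:
    f.write('title1|title2\r\n"multi\r\nline"|data\r\n')

# Python 3: text mode with newline="" lets the csv module handle line endings.
with open(path, "r", newline="") as f:
    rows = list(csv.reader(f, delimiter="|"))

print(rows)
```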

Module:

import csv
import os

def pretty_file(filename, **options):
    """
    @summary:
        Reads a CSV file and prints visually the data as table to a new file.
    @param filename:
        is the path to the given CSV file.
    @param **options:
        the union of Python's Standard Library csv module Dialects and Formatting Parameters and the following list:
    @param new_delimiter:
        the new column separator (default " | ")
    @param border:
        boolean value if you want to print the border of the table (default True)
    @param border_vertical_left:
        the left border of the table (default "| ")
    @param border_vertical_right:
        the right border of the table (default " |")
    @param border_horizontal:
        the top and bottom border of the table (default "-")
    @param border_corner_tl:
        the top-left corner of the table (default "+ ")
    @param border_corner_tr:
        the top-right corner of the table (default " +")
    @param border_corner_bl:
        the bottom-left corner of the table (default same as border_corner_tl)
    @param border_corner_br:
        the bottom-right corner of the table (default same as border_corner_tr)
    @param header:
        boolean value if the first row is a table header (default True)
    @param border_header_separator:
        the border between the header and the table (default same as border_horizontal)
    @param border_header_left:
        the left border of the table header (default same as border_corner_tl)
    @param border_header_right:
        the right border of the table header (default same as border_corner_tr)
    @param newline:
        defines how the rows of the table will be separated (default "\n")
    @param new_filename:
        the new file's filename (default "new_" + filename)
    """

    #function specific options
    new_delimiter           = options.pop("new_delimiter", " | ")
    border                  = options.pop("border", True)
    border_vertical_left    = options.pop("border_vertical_left", "| ")
    border_vertical_right   = options.pop("border_vertical_right", " |")
    border_horizontal       = options.pop("border_horizontal", "-")
    border_corner_tl        = options.pop("border_corner_tl", "+ ")
    border_corner_tr        = options.pop("border_corner_tr", " +")
    border_corner_bl        = options.pop("border_corner_bl", border_corner_tl)
    border_corner_br        = options.pop("border_corner_br", border_corner_tr)
    header                  = options.pop("header", True)
    border_header_separator = options.pop("border_header_separator", border_horizontal)
    border_header_left      = options.pop("border_header_left", border_corner_tl)
    border_header_right     = options.pop("border_header_right", border_corner_tr)
    newline                 = options.pop("newline", "\n")

    file_path = filename.split(os.sep)
    old_filename = file_path[-1]
    new_filename            = options.pop("new_filename", "new_" + old_filename)

    column_max_width = {} #key:column number, the max width of each column
    num_rows = 0 #the number of rows

    with open(filename, "rb") as input: #parse the file and determine the width of each column
        reader=csv.reader(input, **options)
        for row in reader:
            num_rows += 1
            for col_number, column in enumerate(row):
                width = len(column)
                try:
                    if width > column_max_width[col_number]:
                        column_max_width[col_number] = width
                except KeyError:
                    column_max_width[col_number] = width

    max_columns = max(column_max_width.keys()) + 1 #the max number of columns (having rows with different number of columns is no problem)

    if max_columns > 1:
        total_length = sum(column_max_width.values()) + len(new_delimiter) * (max_columns - 1)
        left = border_vertical_left if border is True else ""
        right = border_vertical_right if border is True else ""
        left_header = border_header_left if border is True else ""
        right_header = border_header_right if border is True else ""

        with open(filename, "rb") as input:
            reader=csv.reader(input, **options)
            with open(new_filename, "w") as output:
                for row_number, row in enumerate(reader):
                    max_index = len(row) - 1
                    for index in range(max_columns):
                        if index > max_index:
                            row.append(' ' * column_max_width[index]) #append empty columns
                        else:
                            diff = column_max_width[index] - len(row[index])
                            row[index] = row[index] + ' ' * diff #append spaces to fit the max width

                    if row_number==0 and border is True: #draw top border
                        output.write(border_corner_tl + border_horizontal * total_length + border_corner_tr + newline)
                    output.write(left + new_delimiter.join(row) + right + newline) #print the new row
                    if row_number==0 and header is True: #draw header's separator
                        output.write(left_header + border_header_separator * total_length + right_header + newline)
                    if row_number==num_rows-1 and border is True: #draw bottom border
                        output.write(border_corner_bl + border_horizontal * total_length + border_corner_br)
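
For readers on Python 3, the core of the approach above (find each column's maximum width, then pad every cell to it) can be sketched more compactly. This is not a replacement for the module: format_table is a hypothetical helper that returns a string, draws no borders, and uses a sample input invented for the example.

```python
import csv
import io

def format_table(rows, sep=" | "):
    """Pad every cell to its column's maximum width and join the rows."""
    rows = [list(row) for row in rows]
    n_cols = max((len(row) for row in rows), default=0)
    # Pad short rows with empty cells so every row has n_cols entries.
    rows = [row + [""] * (n_cols - len(row)) for row in rows]
    widths = [max(len(row[i]) for row in rows) for i in range(n_cols)]
    return "\n".join(
        sep.join(cell.ljust(width) for cell, width in zip(row, widths))
        for row in rows
    )

sample = "title1|title2|title3\ndatalongdata|data\n\ndata|data'data|dat\n"
print(format_table(csv.reader(io.StringIO(sample), delimiter="|")))
```

As in the module, rows with fewer columns (including blank lines) are filled with empty cells, so every output line has the same length.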

1
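Since I am answering my own question, I have made this a community wiki so that I do not gain any reputation from it. Also, I am a Python beginner, so I would love to see people contribute to the wiki, correcting any mistakes (including English grammar!) or improving the code so I can learn best practices. - psxls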
1
One thing I noticed is that the files are not opened properly for reading by the csv module. You need to use binary file mode (in Python 2) or newline="" (in Python 3). Otherwise, some CSV files may cause problems when Python and the csv module both try to handle universal newlines. - Blckknght
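@Blckknght Thank you very much for your comment. Please let me know whether my update fixes the problem, or whether I misunderstood! - psxls
@psxls: Your update should fix the problem for Python 2. To write code that works on both Python 2.6+ and Python 3, you can use io.open() instead of the built-in open(); it takes the same mode argument (accepting 'b' for binary mode). Alternatively, you can write a simple helper function that opens the file in the right way for whichever Python version is in use. For example, see the open_csv() function I provided in this answer. - martineau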

2

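A lot has changed since this question was asked, so I would like to offer an updated alternative. You mentioned "without any external tool", but I think using a pip package is fair. For example, this is exactly the problem that tabulate solves. According to its documentation, you can do the following: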

import csv
from io import StringIO  # Python 3; the Python 2 form was "from StringIO import StringIO"
from tabulate import tabulate
table = list(csv.reader(StringIO("spam, 42\neggs, 451\n")))
print(tabulate(table))

and get a table that is friendly to Markdown users. If you have pandas available, it may be even easier:

import pandas
print(pandas.read_csv(filename).to_markdown(index=False))

and you will get something like this:

| food   |   amount |
|:-------|---------:|
| apple  |       12 |
| pear   |       34 |
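
If installing a package is not an option, a table in the same pipe style can be approximated with the standard library alone. The following is only a rough sketch: to_markdown_table is a hypothetical helper, not part of tabulate or pandas, it assumes the input has at least a header row, and it left-aligns every column instead of right-aligning numbers as in the output above.

```python
import csv
import io

def to_markdown_table(rows):
    """Render rows (first row is the header) as a left-aligned pipe table."""
    rows = [list(row) for row in rows if row]  # skip blank lines
    header, body = rows[0], rows[1:]
    n_cols = len(header)
    # Pad short rows and trim long ones so every row matches the header.
    body = [(row + [""] * n_cols)[:n_cols] for row in body]
    widths = [max(3, *(len(r[i]) for r in [header] + body))
              for i in range(n_cols)]

    def line(cells):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    # ":---" marks a left-aligned column in Markdown pipe tables.
    separator = "|" + "|".join(":" + "-" * (w + 1) for w in widths) + "|"
    return "\n".join([line(header), separator] + [line(r) for r in body])

data = "food,amount\napple,12\npear,34\n"
print(to_markdown_table(csv.reader(io.StringIO(data))))
```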
