Convert ls command output to CSV format

15

How can I convert the output of:

$ find . -ls > /tmp/files.txt

which gives me something like this:

908715       40 -rwxrwxr-x    1 david            staff               16542 Nov 15 14:12 ./dump_info.py
908723        0 drwxr-xr-x    2 david            staff                  68 Nov 20 17:35 ./metadata

into CSV output? It would look like:

908715,40,-rwxrwxr-x,1,david,staff,16542,Nov 15 14:12,./dump_info.py
908723,0,drwxr-xr-x,2,david,staff,68,Nov 20 17:35,./metadata

Here is an example entry with spaces in the filename:

652640,80,-rw-rw-r--,1,david,staff,40036,Nov,6,15:32,./v_all_titles/V Catalog Report 11.5.xlsx
7 Answers

7
If you don't care about the spaces in the date:
$ find . -ls | tr -s ' ' ,

If you do care about those spaces:
$ find . -ls | awk '{printf( "%s,%s,%s,%s,%s,%s,%s,%s %s %s,%s\n", $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11 )}'

Neither of these will work if your filenames contain any whitespace. As a hack to deal with spaces in the filename, you could try:
 ... | sed 's/,/ /8g'

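to replace the 8th and every subsequent comma with a space, keeping only the first 7 commas (assuming your sed supports the nonstandard 8g flag, as GNU sed does). Applied to the tr output, this restores the spaces in the date and the filename but leaves both in a single field. Of course, this does not handle commas in the filename.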

1
What about when there are spaces in the filename? - David542
2
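The tr solution will replace spaces in the filename with commas. The awk solution will not print any part of the filename after the first space. If a filename contains a newline, the output will span multiple lines. Also, if any filename contains a comma, your CSV will be malformed. If these issues are relevant, more work is needed. Perhaps just piping the output to sed and removing the unwanted commas would take care of spaces in the filename... - William Pursell
@WilliamPursell sed might not work either... his filename could be foo bar .t xt... argh!!! I mean with multiple spaces, foo_ _ _ bar.tx _ _t - Kent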
1
Also, you can use ls --time-style=long-iso to avoid spaces in the date. - Hans-Helge

7

It's a bit long to type on the command line, but it properly preserves spaces in the filename (and quotes it, too!)

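find . -ls | python3 -c '
import sys
for line in sys.stdin:
    # Split at most 10 times so the 11th piece is the whole filename
    r = line.strip("\n").split(None, 10)
    fn = r.pop()
    # Quote the filename and double any embedded double quotes
    print(",".join(r) + ",\"" + fn.replace("\"", "\"\"") + "\"")
'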

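Dang, that's really cool (including the replacement of " in the filename at the end). - David542
Is there a way to write multi-line Python in a bash script? Or does it all have to be on one line? - David542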
1
This is a multi-line Python script that is suitable for direct inclusion in a bash script (since it is delimited with '). - nneonneo
1
By the way, if you want to preserve the spaces in the date, you can use print(",".join(r[:7]) + "," + " ".join(r[7:]) + ",\"" + fn.replace... on the last line. - nneonneo
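A variant of the same idea, sketched with Python 3's csv module so that commas, quotes and spaces in the filename are quoted by the writer rather than by hand. It assumes the 11-column layout of find -ls shown in the question and keeps the date as one field; the names ls_line_to_row and convert are made up for this illustration.

```python
import csv
import io

def ls_line_to_row(line):
    """Split one `find -ls` line into 9 fields (date kept as a single field)."""
    # Split at most 10 times so the 11th piece is the whole filename,
    # embedded spaces included.
    parts = line.rstrip("\n").split(None, 10)
    if len(parts) < 11:
        return None  # not a complete find -ls record
    date = " ".join(parts[7:10])  # month, day, time-or-year
    return parts[:7] + [date, parts[10]]

def convert(lines, out):
    """Write every parseable line to `out` as CSV."""
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        row = ls_line_to_row(line)
        if row is not None:
            writer.writerow(row)

# Sample lines taken from the question.
sample = [
    "908715       40 -rwxrwxr-x    1 david            staff               16542 Nov 15 14:12 ./dump_info.py\n",
    "652640       80 -rw-rw-r--    1 david            staff               40036 Nov  6 15:32 ./v_all_titles/V Catalog Report 11.5.xlsx\n",
]
buf = io.StringIO()
convert(sample, buf)
print(buf.getvalue(), end="")
```

To use it in a pipeline, pass sys.stdin and sys.stdout to convert instead of the sample list. Filenames containing newlines would still break this, since the input is read line by line.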

2

Yet another variant. See the "-printf format" section of the find man page to customize it.

$ find . -type f -fprintf /tmp/files.txt "%i,%b,%M,%n,%u,%g,%s,%CY-%Cm-%Cd %CT,%p\n"

Sample output:

$ less /tmp/files.txt

3414558,40,-rw-rw-r--,1,webwurst,webwurst,16542,2014-09-18 15:54:36.9232917780,./dump_info.py
3414559,8,-rw-rw-r--,1,webwurst,webwurst,68,2014-09-18 15:54:51.1752922580,./metadata
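In the same spirit as -printf, the parsing step can be skipped entirely by reading the metadata directly. A minimal Python 3 sketch, assuming a Unix-like system (the pwd and grp modules) and the column order from the question. The function names are invented for this illustration, and it reports the modification time (what ls shows) rather than the %C status-change time used above.

```python
import csv
import grp
import os
import pwd
import stat
import time

def owner_name(uid):
    """User name for a uid, falling back to the number if unknown."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)

def group_name(gid):
    """Group name for a gid, falling back to the number if unknown."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)

def stat_row(path):
    """One CSV row of metadata for `path`, without following symlinks."""
    st = os.lstat(path)
    mtime = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime))
    return [
        st.st_ino,
        st.st_blocks,               # 512-byte blocks, like find's %b
        stat.filemode(st.st_mode),  # e.g. -rw-rw-r--
        st.st_nlink,
        owner_name(st.st_uid),
        group_name(st.st_gid),
        st.st_size,
        mtime,
        path,
    ]

def write_listing(top, out):
    """Walk `top` and write one row per non-directory entry."""
    writer = csv.writer(out, lineterminator="\n")
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in sorted(filenames):
            writer.writerow(stat_row(os.path.join(dirpath, name)))
```

Calling write_listing(".", sys.stdout) (after import sys) prints the listing for the current directory. Because the csv writer does the quoting, spaces and commas in filenames need no special handling.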

1
Here is a Python script I drafted...
#!/opt/app/python/bin/python
# Convert ls output to clean csv    Paolo Villaflores 2015-03-16
#
# Sample usage: ls -l | ls2csv.py
#
# Features:
#   accepts -d argument to change dates to yyyy-mm-dd_hhmm format
#   input is via stdin
#   separate file/directory field
#   handle -dils type input (find -ls) versus -l
#   handle space in filename, by applying quotes around filename
#   handle date - format into something excel can handle correctly, whether it is from current year or not.
#   adds a header
#   handle symlinks - type l



import sys
from datetime import datetime

b0=True

def is_f(s):
  if s == '-':
    return 'f'
  return s

for line in sys.stdin:
    if len(line) < 40:
      continue
    if b0:
      b1=line[0] in ['-', 'd', 'c', 'l'] # c is for devices e.g. /devices/pseudo/pts@0:5, l is for symbolic link
      b0=False
      if b1:  # true when shorter ls -l style 8/9 columns. 9 for symlink
        cols=7
        print("d,perms,#links,owner,group,size,modtime,name,symlink")
      else:
        cols=9
        print("inode,bsize,d,perms,#links,owner,group,size,modtime,name,symlink")
    r = line.strip("\n").split(None, cols+1)
    if len(r) < cols+1:
      continue
    if r[cols-7][0] == 'c':
       continue  # ignore c records: devices
    fn = r.pop()
    if b1:
      c = ''
    else:
      c = ",".join(r[0:2]) + ","
    z = 0
    z = r[cols].find(':')
    if z < 0:
      d = r[cols - 1] + "/" + r[cols - 2] + "/" + r[cols]
    else:
      n = str(datetime.now()  )
      d = ''
      # handle the case where the timestamp has no year field
      tm=datetime.strptime(r[cols-2]+ " " + r[cols-1]+ " " + n[:4] +" " + r[cols], "%b %d %Y %H:%M")
      if (tm-datetime.now()).days > 0:
        d = r[cols - 1] + "/" + r[cols - 2] + "/" + str((datetime.now().year-1)) + " " + r[cols]
        tm=datetime.strptime(r[cols-2]+ " " + r[cols-1]+ " " + str(int(n[:4])-1) +" " + r[cols], "%b %d %Y %H:%M")
      else:
        d = r[cols - 1] + "/" + r[cols - 2] + "/" + " ".join([n[:4], r[cols] ] )
      if len(sys.argv) > 1 and sys.argv[1] == '-d':
        d=tm.strftime("%Y-%m-%d_%H%M")

    y = fn.find(">")
    symlink=''
    if y > 0:
       symlink = ',\"' + fn[y+2:] + '"'
       fn = fn[:y-2]
    if  fn.find( " ") <0:
      if fn.find('"') <0:
        fn2=fn
      else:
        fn2="'" + fn + "'"
    else:
      fn2="'" + fn + "'"
    print(c + is_f(r[cols-7][0]) + ",\"" + r[cols-7][1:] + "\"," + ",".join(
      r[cols-6:cols-2]) + "," + d + "," + fn2 + symlink)
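The least obvious part of the script is the date handling: ls prints a time instead of a year for recent files, so the year has to be inferred. A condensed Python 3 sketch of that idea, assuming English month abbreviations as in the sample output. The name resolve_ls_date is made up here, and the one-day tolerance mirrors the script's `.days > 0` check. Unlike the script, it also skips a candidate year in which the date does not exist (Feb 29).

```python
from datetime import datetime

def resolve_ls_date(month, day, time_or_year, now=None):
    """Turn ls-style date columns into a datetime, inferring a missing year."""
    now = now or datetime.now()
    if ":" not in time_or_year:
        # Older files: ls printed the year itself, and no time of day.
        return datetime.strptime(f"{month} {day} {time_or_year}", "%b %d %Y")
    # Recent files: try the current year first, then the previous one.
    for year in (now.year, now.year - 1):
        try:
            stamp = datetime.strptime(
                f"{month} {day} {year} {time_or_year}", "%b %d %Y %H:%M")
        except ValueError:
            continue  # e.g. Feb 29 in a non-leap year
        if (stamp - now).days <= 0:
            return stamp  # not more than a day in the future
    raise ValueError("date cannot be placed within the last year")
```

The result can then be formatted as in the script's -d mode, for example resolve_ls_date("Nov", "15", "14:12").strftime("%Y-%m-%d_%H%M").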

0

This should do the job

 find . -ls|awk 'BEGIN{OFS=","}$1=$1'

1
Please see my earlier question: what if there are spaces in the filename? (There are.) - David542

0
ls target 

boto3-1.11.3-py2.py3-none-any.whl
engagment-states-batch-rds-loader-0.1.27.whl
mypy_extensions-0.4.3-py2.py3-none-any.whl
mysql_connector_python-8.0.15-cp36-cp36m-macosx_10_13_x86_64.whl
pandas-0.25.3-cp36-cp36m-macosx_10_9_x86_64.whl
retrying-1.3.3-py3-none-any.whl
structlog-19.2.0-py2.py3-none-any.whl
typing-3.7.4.1-py3-none-any.whl


echo $(ls target) | tr ' ' ,

boto3-1.11.3-py2.py3-none-any.whl,engagment-states-batch-rds-loader-0.1.27.whl,mypy_extensions-0.4.3-py2.py3-none-any.whl,mysql_connector_python-8.0.15-cp36-cp36m-macosx_10_13_x86_64.whl,pandas-0.25.3-cp36-cp36m-macosx_10_9_x86_64.whl,retrying-1.3.3-py3-none-any.whl,structlog-19.2.0-py2.py3-none-any.whl,typing-3.7.4.1-py3-none-any.whl

0
You can use sed -r
(
_space_="\ *";
type=".";
perm="[^\ ]*";
hlinks=$perm;
user=$perm;
group=$perm;
size="[0-9]*";
modified=".{12}";
name=".*";
ls -l /etc | sed -r s/"^($type)($perm)$_space_($hlinks)$_space_($user)$_space_($group)$_space_($size)$_space_($modified)$_space_($name)"/'"\1","\2","\3","\4","\5","\6","\7","\8"'/g
)
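The same pattern can be written with named groups, which makes each column easier to check. A rough Python 3 sketch under the same assumptions as the sed version (a 12-character date column, owner and group names without whitespace); LS_L, ls_l_to_row and convert are illustrative names only.

```python
import csv
import re

# One named group per column of `ls -l` output.
LS_L = re.compile(
    r"^(?P<type>.)(?P<perm>\S*)\s+(?P<hlinks>\S+)\s+(?P<user>\S+)\s+"
    r"(?P<group>\S+)\s+(?P<size>[0-9]+)\s+(?P<modified>.{12})\s+(?P<name>.*)$"
)

def ls_l_to_row(line):
    """Return the 8 columns of an `ls -l` line, or None if it does not match."""
    m = LS_L.match(line.rstrip("\n"))
    # Lines such as "total 48" or device entries do not match the pattern.
    return list(m.groups()) if m else None

def convert(lines, out):
    """Write matching lines as CSV with every field quoted, as the sed version does."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for line in lines:
        row = ls_l_to_row(line)
        if row is not None:
            writer.writerow(row)
```

Since the name group is the last one and matches the rest of the line, spaces in filenames are kept intact.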

Content provided by Stack Overflow.