4 Answers

24

As shown in this answer, pyparsing appears to be the right tool for this problem:

inputdata = '(1 ((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))'

from pyparsing import OneOrMore, nestedExpr

data = OneOrMore(nestedExpr()).parseString(inputdata)
print(data)

# [['1', [['st', '8'], ['pitch', '67'], ['dur', '4'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0']], [['st', '12'], ['pitch', '67'], ['dur', '8'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0']]]]
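The parsed values are still strings. As a minimal sketch of one possible next step, each note can be turned into a dictionary with integer values. The helper name `notes_from_parsed` is made up for illustration, and the nested list below is simply the printed result above copied in by hand, so the sketch runs without pyparsing:

```python
# Nested list copied from the printed pyparsing result above.
parsed = [['1',
           [['st', '8'], ['pitch', '67'], ['dur', '4'],
            ['keysig', '1'], ['timesig', '12'], ['fermata', '0']],
           [['st', '12'], ['pitch', '67'], ['dur', '8'],
            ['keysig', '1'], ['timesig', '12'], ['fermata', '0']]]]


def notes_from_parsed(chorale):
    """Hypothetical helper: return (chorale_number, list of note dicts)."""
    number = int(chorale[0])
    # Each note is a list of [key, value] pairs; convert the values to int.
    notes = [{key: int(value) for key, value in note} for note in chorale[1:]]
    return number, notes


number, notes = notes_from_parsed(parsed[0])
print(number, notes[0]['pitch'], notes[1]['dur'])
```

With real pyparsing output, `data.asList()` would take the place of the hand-copied `parsed` list.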

For completeness, here is how to format the result as a table (using texttable):
from texttable import Texttable

tab = Texttable()
for row in data.asList()[0][1:]:
    row = dict(row)  # each note is a list of [key, value] pairs
    tab.header(list(row.keys()))
    tab.add_row(list(row.values()))
print(tab.draw())
The resulting table has these columns and rows:
timesig  keysig  st  pitch  dur  fermata
12       1       8   67     4    0
12       1       12  67     8    0
To convert this data back to a Lisp S-expression, do the following:
def lisp(x):
    return '(%s)' % ' '.join(lisp(y) for y in x) if isinstance(x, list) else x

# lisp() checks for plain lists, so convert the ParseResults with asList() first
d = lisp(data.asList()[0])
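To make the round trip easier to follow, here is a small self-contained sketch of the same idea, written without the one-line conditional. The name `to_sexpr` and the shortened sample data are illustrative, not part of the original answer:

```python
def to_sexpr(x):
    """Hypothetical helper: render nested lists of strings as an S-expression."""
    if isinstance(x, list):
        return '(' + ' '.join(to_sexpr(item) for item in x) + ')'
    return x  # atoms are already strings


# Shortened sample in the same shape as the parsed data: a chorale number
# followed by notes, each note being a list of [key, value] pairs.
nested = ['1', [['st', '8'], ['pitch', '67']], [['st', '12'], ['pitch', '67']]]
text = to_sexpr(nested)
print(text)
```

Note that the output separates every element with a single space, so it may differ in whitespace from the original input, which has no space between the two notes.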

1
This is definitely the right answer, since the OP asked for "a Python module that parses this intelligently". - Bakuriu

2

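If you know the data is correct and uniformly formatted (which seems to be the case at first glance), and you only need this data rather than a solution to the general problem... then why not replace every non-numeric character with a space and then use split?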

import re

with open("chorales.lisp") as f:
    data = f.read().split("\n")
data = [re.sub(r"[^-0-9]+", " ", x) for x in data]
for L in data:
    L = [int(tok) for tok in L.split()]  # a list, so len() and slicing work
    i = 1  # first element is chorale number
    while i < len(L):
        st, pitch, dur, keysig, timesig, fermata = L[i:i+6]
        i += 6
        # ... your processing goes here ...
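The same idea can be tried without the file. The sketch below applies the substitution to the single sample line from the question and groups the numbers six at a time. It assumes, as this answer does, that every note has exactly six fields in a fixed order; the variable names are illustrative:

```python
import re

line = ('(1 ((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))'
        '((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))')

# Replace every run of characters that are not digits or '-' with a space.
numbers = [int(tok) for tok in re.sub(r'[^-0-9]+', ' ', line).split()]
chorale, fields = numbers[0], numbers[1:]

# Group the remaining numbers into (st, pitch, dur, keysig, timesig, fermata).
notes = [tuple(fields[i:i + 6]) for i in range(0, len(fields), 6)]
print(chorale, notes)
```

Because field names are discarded, this only works while the order of fields never changes, and a hyphen inside a name would end up in the number stream and break `int()`.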

1

Use a regular expression to split it into pairs:

In [1]: import re

In [2]: txt = '(((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))'

In [3]: [p.split() for p in re.findall(r'\w+\s+\d+', txt)]
Out[3]: [['st', '8'], ['pitch', '67'], ['dur', '4'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0'], ['st', '12'], ['pitch', '67'], ['dur', '8'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0']]

Then build a dictionary from it:

# data is the list of [key, value] pairs produced by the findall expression above
dct = {}
for p in data:
    if p[0] not in dct:
        dct[p[0]] = [p[1]]
    else:
        dct[p[0]].append(p[1])

The result:

In [10]: dct
Out[10]: {'timesig': ['12', '12'], 'keysig': ['1', '1'], 'st': ['8', '12'], 'pitch': ['67', '67'], 'dur': ['4', '8'], 'fermata': ['0', '0']}
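The grouping step can also be written with `collections.defaultdict`, which removes the explicit membership check. This is an alternative sketch rather than the answer's own code; it additionally converts the values to integers, and the name `columns` is illustrative:

```python
import re
from collections import defaultdict

txt = ('(((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))'
       '((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))')

# Same regular expression as above: a word, whitespace, then digits.
pairs = [p.split() for p in re.findall(r'\w+\s+\d+', txt)]

columns = defaultdict(list)  # missing keys start out as an empty list
for key, value in pairs:
    columns[key].append(int(value))

print(dict(columns))
```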

Printing:

cols = ['st', 'pitch', 'dur', 'keysig', 'timesig', 'fermata']
print('time pitch duration keysig timesig fermata')
for t in range(len(dct['st'])):
    # values are strings, so they can be joined directly
    print(' '.join(dct[c][t] for c in cols))

Proper formatting is left as an exercise for the reader...
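One possible take on that exercise is sketched below: right-align every value to the width of the longest column name. The dictionary is copied from the result shown earlier, and the fixed column order is an assumption made for the sake of the example:

```python
dct = {'timesig': ['12', '12'], 'keysig': ['1', '1'], 'st': ['8', '12'],
       'pitch': ['67', '67'], 'dur': ['4', '8'], 'fermata': ['0', '0']}

columns = ['st', 'pitch', 'dur', 'keysig', 'timesig', 'fermata']
width = max(len(name) for name in columns)  # widest header sets the column width

# Header line first, then one line per note (zip transposes columns to rows).
lines = [' '.join(name.rjust(width) for name in columns)]
for row in zip(*(dct[name] for name in columns)):
    lines.append(' '.join(value.rjust(width) for value in row))

print('\n'.join(lines))
```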


0
Since the data is already in Lisp, you can use Lisp itself to turn it into a well-known format such as CSV or TSV:
    (let ((input '(1 ((ST 8) (PITCH 67) (DUR 4) (KEYSIG 1) (TIMESIG 12) (FERMATA 0))
                     ((ST 12) (PITCH 67) (DUR 8) (KEYSIG 1) (TIMESIG 12) (FERMATA 0)))))
      (let* ((headers (mapcar #'first (cadr input)))
             (rows (cdr input))
             (row-data (mapcar (lambda (row) (mapcar #'second row)) rows))
             (csv (cons headers row-data)))
        (format t "~{~{~A~^,~}~^~%~}" csv)))

ST,PITCH,DUR,KEYSIG,TIMESIG,FERMATA
8,67,4,1,12,0
12,67,8,1,12,0
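For readers who stay in Python, roughly the same CSV can be produced with the standard `csv` module. This is a sketch under the assumption that the notes have already been parsed into lists of (name, value) pairs; the `notes` data below is typed in by hand to match the example, and the headers come out in lowercase rather than the uppercase symbols Lisp prints:

```python
import csv
import io

# Hand-written stand-in for already-parsed notes.
notes = [
    [('st', 8), ('pitch', 67), ('dur', 4), ('keysig', 1), ('timesig', 12), ('fermata', 0)],
    [('st', 12), ('pitch', 67), ('dur', 8), ('keysig', 1), ('timesig', 12), ('fermata', 0)],
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow([name for name, _ in notes[0]])       # header from the first note
for note in notes:
    writer.writerow([value for _, value in note])     # one row per note

csv_text = buffer.getvalue()
print(csv_text, end='')
```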

Content provided by Stack Overflow.