Basically, I have been using a Python cron job to read data from a web page and store it as CSV-style lists in this form:
.....
###1309482902.37
entry1,36,257.21,16.15,16.168
entry2,4,103.97,16.36,16.499
entry3,2,114.83,16.1,16.3
entry4,130.69,15.6737,16.7498
entry5,5.20,14.4,17
$$$
###1309482902.37
entry1,36,257.21,16.15,16.168
entry2,4,103.97,16.36,16.499
entry3,2,114.83,16.1,16.3
entry4,130.69,15.6737,16.7498
entry5,5.20,14.4,17
$$$
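My code basically does a regex search, iterates over every match between ### and $$$, then goes through each match line by line and splits each line on commas. As you can see, some entries split into 4 fields and others into 5. That is because I was dumb and didn't realize that the web source puts a thousands-separator comma in numbers of four or more digits. For example: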
entry1,36,257.21,16.15,16.168
should actually be
entry1,36257.21,16.15,16.168
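I have already collected a lot of data and don't want to rewrite the code, so I came up with a kludgy workaround. Is there a more Pythonic way to handle this?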
===
import re

# ifp is the already-open data file
contents = ifp.read()
# Pull all entries from the market data
for entry in re.finditer(r"###(.*\n)*?\$\$\$", contents):
    dataSet = entry.group(0).split('\n')
    timeStamp = dataSet[0][3:]  # drop the leading '###'
    print(timeStamp)
    # Skip the timestamp line and the closing '$$$' line
    for line in dataSet[1:-1]:
        splits = line.split(',')
        if len(splits) == 5:
            # A thousands comma split the first number in two: rejoin it
            splits[2] = splits[1] + splits[2]
            del splits[1]
        print(splits)
        ## DO SOME USEFUL WORK WITH THE DATA ##
===
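One possible way to tidy this up, offered as a sketch rather than a definitive fix: capture the timestamp and body with named regex groups and move the comma repair into a small helper. It assumes every clean row has exactly four fields and that the stray comma always falls inside the first numeric column, as in the sample above; it also assumes each block holds at least one row. The names `merge_thousands`, `parse_blocks`, and `BLOCK_RE` are hypothetical.

```python
import re

# Assumption: each block looks like "###<timestamp>\n<rows...>\n$$$".
BLOCK_RE = re.compile(r"###(?P<stamp>[^\n]*)\n(?P<body>.*?)\n\$\$\$", re.DOTALL)


def merge_thousands(fields, expected=4):
    """Rejoin a first number that a thousands comma split into pieces."""
    extra = len(fields) - expected
    if extra <= 0:
        return list(fields)
    # fields[1:2 + extra] are the pieces of the first numeric column
    merged = "".join(fields[1:2 + extra])
    return [fields[0], merged] + list(fields[2 + extra:])


def parse_blocks(contents):
    """Yield (timestamp, rows) for every ###...$$$ block in the text."""
    for match in BLOCK_RE.finditer(contents):
        rows = [merge_thousands(line.split(","))
                for line in match.group("body").split("\n") if line]
        yield match.group("stamp"), rows


sample = (
    "###1309482902.37\n"
    "entry1,36,257.21,16.15,16.168\n"
    "entry4,130.69,15.6737,16.7498\n"
    "$$$\n"
)
for stamp, rows in parse_blocks(sample):
    print(stamp, rows)
```

The field count only reveals that a split happened, not where, so this heuristic would silently mis-merge a row if a comma ever appeared in one of the later columns.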
csv. Run. - Ignacio Vazquez-Abrams