循环访问SQLite数据库时,Python脚本被锁定

3
请查看我的解析器代码。它在循环中访问网页并抓取一些统计信息,然后将指定的记录放入SQLite3数据库中。一切都进行得很顺利,直到第87行(SQL语句),此时进程占用了所有的CPU资源,并且实际上被阻塞了。
文件"./parser.py"的第86行是:
while (j < i):
代码开头的数据库文件结构正确,所以问题出现在循环中。主循环for season in season_list:的内部块运行得非常好。这是我的脚本的全部代码:
#!/usr/bin/env python
from bs4 import BeautifulStoneSoup
from urllib2 import urlopen
import re
import sqlite3
from time import gmtime, strftime

# Print start time 
print "We started at ", strftime("%Y-%m-%d %H:%M:%S", gmtime())

# Create DB
print "Trying to create DB"
con = sqlite3.connect('england.db')
cur = con.cursor()
sql = """\
CREATE TABLE english_premier_league (
    id_match INTEGER PRIMARY KEY AUTOINCREMENT,
    season TEXT,
    tour INTEGER,
    date TEXT,
    home TEXT,
    visitor TEXT,
    home_score INTEGER,
    visitor_score INTEGER
    );
"""
try:
    cur.executescript(sql)
except sqlite3.DatabaseError as err:
    print "Error creating database: ", err
else:
    print "Succesfully created your database..."
    con.commit()
    cur.close()
    con.close()

# list of variables
postfix = 2011
threshold = 1999
season_list = []
while postfix >= threshold:
    end = (postfix + 1) % 2000
    if (end >= 10):
        season = str(postfix) + str(end)
    else:
        season = str(postfix) + str(0) + str(end)
    season_list.append(season)
    postfix -= 1
print season_list

# main loop
for season in season_list:
    href = 'http://www.stat-football.com/en/a/eng.php?b=10&d='+season+'&c=51'
    print href
    xml = urlopen(href).read()
    xmlSoup = BeautifulStoneSoup(xml)
    tablet = xmlSoup.find(attrs={"class" : "bd5"})

    #Access DB      
    con = sqlite3.connect('england.db')
    cur = con.cursor()

    #Parse site
    tour = tablet.findAll(attrs = { "class" : re.compile(r"^(s3|cc s3)$") })
    date = tablet.findAll(text = re.compile(r"(0[1-9]|[12][0-9]|3[01])\.(0[1-9]|1[012])\.(19|20)\d\d"))
    home = tablet.findAll(attrs = {"class" : "nw"})
    guest = tablet.findAll(attrs = {"class" : "s1"})
    score = tablet.findAll(attrs = {"class" : "nw pr15"})

    #
    def parse_string(sequence):
        result=[]
        for unit in sequence:
            text = ''.join(unit.findAll(text=True))
            result.append(text.strip())
        return result

    tour_list=parse_string(tour)
    home_list=parse_string(home)
    guest_list=parse_string(guest)
    score_list=parse_string(score)

    #Loop over found records to put them into sqlite3 DB
    i = len(tour_list)
    j = 0
    while (j < i):
        sql_add = 'INSERT INTO english_premier_league (season, tour, date, home, visitor, home_score, visitor_score) VALUES (?, ?, ?, ?, ?, ?, ?)'
        match = (season, int(tour_list[j]), date[j], home_list[j], guest_list[j], int(score_list[j][0:1]), int(score_list[j][2:3]))
        try:
            cur.executemany(sql_add, match)
        except sqlite3.DatabaseError as err:
            print "Error matching the record: ", err
        else:
            con.commit()
        part = float(j)/float(i)*100
        if (part%10 == 0):
            print (int(part)), "%"
        j += 1
    cur.close()
    con.close()

此外,查看strace输出的末尾可能会很有用:
getcwd(“/ home / vitaly / football_forecast / epl”,512)= 35 stat(“/ home / vitaly / football_forecast / epl / england.db”, {st_mode = S_IFREG | 0644,st_size = 24576,...})= 0 打开(“/ home / vitaly / football_forecast / epl / england.db”,O_RDWR | O_CREAT, 0644)= 3fcntl(3,F_GETFD)= 0 fcntl(3, F_SETFD,FD_CLOEXEC)= 0 fstat(3,{st_mode = S_IFREG | 0644, st_size = 24576,...})= 0 lseek(3,0,SEEK_SET)= 0 read(3,“SQLite格式3 \ 0 \ 4 \ 0 \ 1 \ 1 \ 0 @ \ 0 \ 0 \ 1〜\ 0 \ 0 \ 0 \ 30” ...,100)= 100
我在Ubuntu 12.04上运行Python 2.7。非常感谢。

executemany 可能只需要改为 execute。 - Radio-
很遗憾,脚本没有到达这个点。它在 while 循环的开头被阻塞了(第 86-87 行)。 - Vitaly Isaev
你为什么这样认为?使用替换后,脚本在我这里可以正常工作。 - Radio-
非常感谢,我修复了你指出的错误,并使用python2.7运行了脚本(在之前的情况下,我不知为何错用了python2)。 - Vitaly Isaev
1个回答

1

cur.executemany(sql_add, match)替换为cur.execute(sql_add, match)executemany()用于在值的可迭代对象上执行多次相同的操作。例如,如果您有以下内容:

match = [ (season1, tour1, date1, home1, visitor1, home_score1, visitor_score1),
          (season2, tour2, date2, home2, visitor2, home_score2, visitor_score2),
          (season3, tour3, date3, home3, visitor3, home_score3, visitor_score3) ]

cur.executemany(sql_add, match)

如果光标可以在match中迭代元组并对每个元组执行插入操作,那么这将是合适的。


网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接