Please take a look at my parser code. It visits web pages in a loop, scrapes some statistics, and puts the selected records into an SQLite3 database. Everything goes fine until line 87 (the SQL statement), at which point the process eats all available CPU and effectively hangs.
Line 86 of "./parser.py" is:

while (j < i):

The database structure created at the beginning of the script is correct, so the problem arises inside the loop. The inner block of the main loop, for season in season_list:, runs just fine. Here is the full code of my script:
#!/usr/bin/env python
from bs4 import BeautifulStoneSoup
from urllib2 import urlopen
import re
import sqlite3
from time import gmtime, strftime

# Print start time
print "We started at ", strftime("%Y-%m-%d %H:%M:%S", gmtime())

# Create DB
print "Trying to create DB"
con = sqlite3.connect('england.db')
cur = con.cursor()
sql = """\
CREATE TABLE english_premier_league (
    id_match INTEGER PRIMARY KEY AUTOINCREMENT,
    season TEXT,
    tour INTEGER,
    date TEXT,
    home TEXT,
    visitor TEXT,
    home_score INTEGER,
    visitor_score INTEGER
);
"""
try:
    cur.executescript(sql)
except sqlite3.DatabaseError as err:
    print "Error creating database: ", err
else:
    print "Successfully created your database..."
    con.commit()
cur.close()
con.close()

# list of variables
postfix = 2011
threshold = 1999
season_list = []
while postfix >= threshold:
    end = (postfix + 1) % 2000
    if (end >= 10):
        season = str(postfix) + str(end)
    else:
        season = str(postfix) + str(0) + str(end)
    season_list.append(season)
    postfix -= 1
print season_list

# main loop
for season in season_list:
    href = 'http://www.stat-football.com/en/a/eng.php?b=10&d=' + season + '&c=51'
    print href
    xml = urlopen(href).read()
    xmlSoup = BeautifulStoneSoup(xml)
    tablet = xmlSoup.find(attrs={"class": "bd5"})

    # Access DB
    con = sqlite3.connect('england.db')
    cur = con.cursor()

    # Parse site
    tour = tablet.findAll(attrs={"class": re.compile(r"^(s3|cc s3)$")})
    date = tablet.findAll(text=re.compile(r"(0[1-9]|[12][0-9]|3[01])\.(0[1-9]|1[012])\.(19|20)\d\d"))
    home = tablet.findAll(attrs={"class": "nw"})
    guest = tablet.findAll(attrs={"class": "s1"})
    score = tablet.findAll(attrs={"class": "nw pr15"})

    def parse_string(sequence):
        result = []
        for unit in sequence:
            text = ''.join(unit.findAll(text=True))
            result.append(text.strip())
        return result

    tour_list = parse_string(tour)
    home_list = parse_string(home)
    guest_list = parse_string(guest)
    score_list = parse_string(score)

    # Loop over found records to put them into the sqlite3 DB
    i = len(tour_list)
    j = 0
    while (j < i):
        sql_add = 'INSERT INTO english_premier_league (season, tour, date, home, visitor, home_score, visitor_score) VALUES (?, ?, ?, ?, ?, ?, ?)'
        match = (season, int(tour_list[j]), date[j], home_list[j], guest_list[j], int(score_list[j][0:1]), int(score_list[j][2:3]))
        try:
            cur.executemany(sql_add, match)
        except sqlite3.DatabaseError as err:
            print "Error matching the record: ", err
        else:
            con.commit()
        part = float(j) / float(i) * 100
        if (part % 10 == 0):
            print (int(part)), "%"
        j += 1
    cur.close()
    con.close()
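For reference, the while-loop near the top is only meant to build season labels like "201112" down to "199900". An equivalent compact sketch (same output, just for illustration):

```python
# Build season labels "YYYYEE" for 2011/12 down to 1999/00,
# matching what the while-loop in the script produces.
season_list = ["%d%02d" % (year, (year + 1) % 100)
               for year in range(2011, 1998, -1)]
print(season_list)
```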
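In case my understanding of the sqlite3 parameter-binding API is part of the problem, here is a minimal standalone sketch of how I believe execute and executemany are supposed to be called (in-memory database and a dummy table, purely for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = con.cursor()
cur.execute("CREATE TABLE t (a INTEGER, b INTEGER)")

# execute() binds ONE row: a single tuple of parameters.
cur.execute("INSERT INTO t VALUES (?, ?)", (1, 2))

# executemany() binds MANY rows: a sequence of parameter tuples.
cur.executemany("INSERT INTO t VALUES (?, ?)", [(3, 4), (5, 6)])
con.commit()

count = cur.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)
con.close()
```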
Also, it may be useful to look at the end of the strace output:
getcwd("/home/vitaly/football_forecast/epl", 512) = 35
stat("/home/vitaly/football_forecast/epl/england.db", {st_mode=S_IFREG|0644, st_size=24576, ...}) = 0
open("/home/vitaly/football_forecast/epl/england.db", O_RDWR|O_CREAT, 0644) = 3
fcntl(3, F_GETFD) = 0
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=24576, ...}) = 0
lseek(3, 0, SEEK_SET) = 0
read(3, "SQLite format 3\0\4\0\1\1\0@ \0\0\1~\0\0\0\30"..., 100) = 100
I am running Python 2.7 on Ubuntu 12.04. Thanks a lot.