Setting up and inserting into a many-to-many database with Python, SQLAlchemy, and SQLite

Question


python sqlite insert sqlalchemy many-to-many


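I am learning Python, and as a first project I am fetching Twitter RSS feeds, parsing the data, and inserting it into a SQLite database. I have successfully parsed each feed entry into a content variable (e.g., "You should buy low..."), a url variable (e.g., u'http://bit.ly/HbFwL'), and a hashtag list (e.g., [u'#stocks', u'#stockmarket', u'#finance', u'#money', u'#mkt']). I have also managed to insert these three pieces of information into three separate columns of a SQLite "RSSEntries" table, where each row is a different RSS entry/tweet.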
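The parsing step described above can be sketched roughly as follows. This is a minimal illustration, assuming tweets are whitespace-separated, links start with http:// or https://, and hashtags start with #. The function name `split_tweet` and the sample text are hypothetical and not taken from the original script.

```python
def split_tweet(summary):
    """Split a tweet into (content, urls, hashtags). Hypothetical helper."""
    words = summary.split()
    urls = [w for w in words if w.startswith(('http://', 'https://'))]
    hashtags = [w for w in words if w.startswith('#')]
    # Everything that is neither a link nor a hashtag is treated as content.
    content = ' '.join(w for w in words if w not in urls and w not in hashtags)
    return content, urls, hashtags


content, urls, hashtags = split_tweet(
    'You should buy low... http://bit.ly/HbFwL #stocks #finance')
```

Keeping `hashtags` as a real list (rather than converting it to a string) is what makes it possible to store one row per tag later on.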

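However, I want to set up a database with a many-to-many relationship between the individual RSS feed entries (i.e., individual tweets) and the hashtags associated with each entry. So I set up the following tables using SQLAlchemy (the first table just holds the RSS feed URLs of the Twitter users I want to download and parse):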

RSSFeeds = schema.Table('feeds', metadata,
    schema.Column('id', types.Integer, 
        schema.Sequence('feeds_seq_id', optional=True), primary_key=True),
    schema.Column('url', types.VARCHAR(1000), default=u''),
)

RSSEntries = schema.Table('entries', metadata,
    schema.Column('id', types.Integer, 
        schema.Sequence('entries_seq_id', optional=True), primary_key=True),
    schema.Column('feed_id', types.Integer, schema.ForeignKey('feeds.id')),
    schema.Column('short_url', types.VARCHAR(1000), default=u''),
    schema.Column('content', types.Text(), nullable=False),
    schema.Column('hashtags', types.Unicode(255)),
)

tag_table = schema.Table('tag', metadata,
    schema.Column('id', types.Integer,
       schema.Sequence('tag_seq_id', optional=True), primary_key=True),
    schema.Column('tagname', types.Unicode(20), nullable=False, unique=True)
)

entrytag_table = schema.Table('entrytag', metadata,
    schema.Column('id', types.Integer,
        schema.Sequence('entrytag_seq_id', optional=True), primary_key=True),
    schema.Column('entryid', types.Integer, schema.ForeignKey('entries.id')),
    schema.Column('tagid', types.Integer, schema.ForeignKey('tag.id')),
)

So far, I have been able to insert the three main pieces of information into the RSSEntries table with the following code (abbreviated here...):

engine = create_engine('sqlite:///test.sqlite', echo=True)
conn = engine.connect()
.........
conn.execute('INSERT INTO entries (feed_id, short_url, content, hashtags) VALUES '
    '(?,?,?,?)', (id, tinyurl, content, hashtags))

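Now, here is the big question. How do I insert the data into the feedtag and tagname tables when the hashtag variable is a list? This is a real sticking point for me: each feed entry can contain anywhere from 0 to 6 hashtags, and while I know how to insert a whole list into a single column, I do not know how to insert the elements of the list into separate columns (or, in this case, rows). An even bigger sticking point is how to insert the individual hashtags into the tagname table when a hashtag could be used in numerous different feed entries, and then how to have the "associations" show up correctly in the feedtag table.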

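In short, I know exactly what each table should look like when it is all done, but I have no idea how to write the code to get the data into the tagname and feedtag tables. The whole "many-to-many" setup is new to me.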

I could really use your help on this. Thanks in advance for any suggestions.

-Greg

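P.S. (edit): Thanks to Ants Aasma's excellent suggestions, I have been able to get almost the whole thing working. Specifically, the first and second suggested code blocks now work fine, but I am having trouble implementing the third block. I get the following error: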

Traceback (most recent call last):
  File "RSS_sqlalchemy.py", line 242, in <module>
    store_feed_items(id, entries)
  File "RSS_sqlalchemy.py", line 196, in store_feed_items
    [{'feedid': entry_id, 'tagid': tag_ids[tag]} for tag in hashtags2])
NameError: global name 'entry_id' is not defined

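Then, because I could not tell where Ants Aasma got the "entry_id" part from, I tried replacing it with "entries.id", thinking this might insert the "id" from the "entries" table. However, in that case I get this error: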

Traceback (most recent call last):
  File "RSS_sqlalchemy.py", line 242, in <module>
    store_feed_items(id, entries)
  File "RSS_sqlalchemy.py", line 196, in store_feed_items
    [{'feedid': entries.id, 'tagid': tag_ids[tag]} for tag in hashtags2])
AttributeError: 'list' object has no attribute 'id'

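I am not quite sure where the problem is, and I do not fully understand what the "entry_id" part does, so I have pasted all of my relevant "insert" code below. Can somebody help me see what is wrong? Note that I just noticed I had incorrectly called my last table "feedtag_table" instead of "entrytag_table". That did not match my initially stated goal of relating individual feed entries to hashtags, rather than feeds to hashtags. I have corrected the code above.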

feeds = conn.execute('SELECT id, url FROM feeds').fetchall()

def store_feed_items(id, items):
    """ Takes a feed_id and a list of items and stores them in the DB """
    for entry in items:
        conn.execute('SELECT id from entries WHERE short_url=?', (entry.link,))
        s = unicode(entry.summary) 
        test = s.split()
        tinyurl2 = [i for i in test if i.startswith('http://')]
        hashtags2 = [i for i in s.split() if i.startswith('#')]
        content2 = ' '.join(i for i in s.split() if i not in tinyurl2+hashtags2)
        content = unicode(content2)
        tinyurl = unicode(tinyurl2)
        hashtags = unicode (hashtags2)
        date = strftime("%Y-%m-%d %H:%M:%S",entry.updated_parsed)

        conn.execute(RSSEntries.insert(), {'feed_id': id, 'short_url': tinyurl,
            'content': content, 'hashtags': hashtags, 'date': date})    

        tags = tag_table
        tag_id_query = select([tags.c.tagname, tags.c.id], tags.c.tagname.in_(hashtags))
        tag_ids = dict(conn.execute(tag_id_query).fetchall())
        for tag in hashtags:
            if tag not in tag_ids:
                result = conn.execute(tags.insert(), {'tagname': tag})
                tag_ids[tag] = result.last_inserted_ids()[0]

        conn.execute(entrytag_table.insert(),
            [{'feedid': id, 'tagid': tag_ids[tag]} for tag in hashtags2])
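
One thing that may be worth checking in the function above: `hashtags = unicode(hashtags2)` turns the whole list into a single string, so `for tag in hashtags` would walk over individual characters rather than tags. The sketch below reproduces the effect in Python 3, where `str` plays the role of `unicode`. The sample list is made up for illustration.

```python
hashtags2 = ['#stocks', '#finance']

# Converting the whole list yields ONE string holding the list's repr.
as_text = str(hashtags2)
chars = [tag for tag in as_text]  # iterates character by character

# Converting each element instead keeps one entry per tag.
per_tag = [str(tag) for tag in hashtags2]
```

If that is what is happening, looping over `hashtags2` (the list) for the tag lookups and inserts would likely behave as intended.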

- Gregory Saxton

1 Answer


- Ants Aasma · Accepted Answer

First, you should use the SQLAlchemy SQL builder for the inserts, which gives SQLAlchemy more insight into what you are doing.

result = conn.execute(RSSEntries.insert(), {'feed_id': id, 'short_url': tinyurl,
    'content': content, 'hashtags': hashtags, 'date': date})
entry_id = result.last_inserted_ids()[0]
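
To illustrate where `entry_id` comes from, here is a minimal sketch that uses the standard library's `sqlite3` module instead of SQLAlchemy (a swapped-in technique, chosen only so the example runs without extra dependencies). `cursor.lastrowid` plays the role of the inserted-id call above. The simplified `entries` table and the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE entries (id INTEGER PRIMARY KEY, content TEXT NOT NULL)')

# The cursor returned by execute() remembers the id SQLite assigned to the new row.
cursor = conn.execute('INSERT INTO entries (content) VALUES (?)', ('first tweet',))
entry_id = cursor.lastrowid

cursor = conn.execute('INSERT INTO entries (content) VALUES (?)', ('second tweet',))
second_id = cursor.lastrowid
```

The key point is that the id has to be captured from the result of the insert that created the entry, which is why the insert's return value is kept rather than discarded.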

To insert tag associations into your schema, you first need to look up the tag identifiers and create any that do not exist yet:

tags = tag_table
tag_id_query = select([tags.c.tagname, tags.c.id], tags.c.tagname.in_(hashtags))
tag_ids = dict(conn.execute(tag_id_query).fetchall())
for tag in hashtags:
    if tag not in tag_ids:
        result = conn.execute(tags.insert(), {'tagname': tag})
        tag_ids[tag] = result.last_inserted_ids()[0]
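
The same look-up-then-create pattern can be sketched with the standard library's `sqlite3` module (again a substitute for SQLAlchemy so the example is self-contained). The helper name `get_tag_ids` and the pre-seeded tag are hypothetical; the `tag` table mirrors the one in the question.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE tag (id INTEGER PRIMARY KEY, tagname TEXT NOT NULL UNIQUE)')
conn.execute("INSERT INTO tag (tagname) VALUES ('#stocks')")  # a tag that already exists


def get_tag_ids(conn, hashtags):
    """Return {tagname: id}, inserting any tag that is not stored yet."""
    tag_ids = {}
    if hashtags:
        # One placeholder per tag for the IN (...) clause.
        placeholders = ','.join('?' * len(hashtags))
        rows = conn.execute(
            'SELECT tagname, id FROM tag WHERE tagname IN (%s)' % placeholders,
            hashtags)
        tag_ids = dict(rows.fetchall())
    for tag in hashtags:
        if tag not in tag_ids:
            cursor = conn.execute('INSERT INTO tag (tagname) VALUES (?)', (tag,))
            tag_ids[tag] = cursor.lastrowid
    return tag_ids


tag_ids = get_tag_ids(conn, ['#stocks', '#money'])
```

Because existing tags are looked up first, a hashtag shared by many entries ends up as a single row in `tag`, and each entry simply refers to that row's id.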

Then just insert the associated ids into the feedtag_table. You can use the executemany support by passing a list of dicts to the execute method.

conn.execute(feedtag_table.insert(),
    [{'feedid': entry_id, 'tagid': tag_ids[tag]} for tag in hashtags])
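
Putting the three steps together, a rough end-to-end sketch might look like the following. It uses the standard library's `sqlite3` module rather than SQLAlchemy so that it runs on its own, and it follows the question's corrected schema (an `entrytag` table with `entryid` and `tagid` columns). The helper name `store_entry`, the trimmed-down columns, and the sample tweets are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE entries (id INTEGER PRIMARY KEY, content TEXT NOT NULL);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, tagname TEXT NOT NULL UNIQUE);
    CREATE TABLE entrytag (
        id INTEGER PRIMARY KEY,
        entryid INTEGER REFERENCES entries(id),
        tagid INTEGER REFERENCES tag(id)
    );
''')


def store_entry(conn, content, hashtags):
    """Insert one entry, make sure its tags exist, then link them."""
    entry_id = conn.execute(
        'INSERT INTO entries (content) VALUES (?)', (content,)).lastrowid
    links = []
    for tag in hashtags:
        row = conn.execute('SELECT id FROM tag WHERE tagname = ?', (tag,)).fetchone()
        if row:
            tag_id = row[0]
        else:
            tag_id = conn.execute(
                'INSERT INTO tag (tagname) VALUES (?)', (tag,)).lastrowid
        links.append((entry_id, tag_id))
    # One association row per (entry, tag) pair, inserted in a single call.
    conn.executemany('INSERT INTO entrytag (entryid, tagid) VALUES (?, ?)', links)
    return entry_id


store_entry(conn, 'buy low', ['#stocks', '#money'])
store_entry(conn, 'sell high', ['#stocks'])

# Which entries carry the tag '#stocks'?
rows = conn.execute('''
    SELECT e.content FROM entries e
    JOIN entrytag et ON et.entryid = e.id
    JOIN tag t ON t.id = et.tagid
    WHERE t.tagname = ? ORDER BY e.id''', ('#stocks',)).fetchall()
```

An entry with no hashtags simply produces an empty `links` list, so nothing is written to `entrytag` for it.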