如何使用Python进行批量插入Oracle数据库？

Question

如何使用Python进行批量插入Oracle数据库？

pythonoraclepython-2.7cx-oraclebatch-insert

16

我有一些每月的天气数据，想要将它们插入到Oracle数据库表中，但是我想要批量插入相应记录以提高效率。请问如何在Python中实现这个功能？

举个例子，假设我的表有四个字段：一个站点ID、一个日期和两个数值字段。记录通过站点ID和日期字段（组合键）进行唯一标识。每个站点需要插入的数值将保存在一个列表中，其中包含X年的完整数据，例如如果有两年的数值，则数值列表将包含24个数值。

我假设如果我想逐个插入记录，以下是我所要做的：

connection_string = "scott/tiger@testdb"
connection = cx_Oracle.Connection(connection_string)
cursor = cx_Oracle.Cursor(connection)
station_id = 'STATION_1'
start_year = 2000

temps = [ 1, 3, 5, 7, 9, 1, 3, 5, 7, 9, 1, 3 ]
precips = [ 2, 4, 6, 8, 2, 4, 6, 8, 2, 4, 6, 8 ]
number_of_years = len(temps) / 12
for i in range(number_of_years):
    for j in range(12):
        # make a date for the first day of the month
        date_value = datetime.date(start_year + i, j + 1, 1)
        index = (i * 12) + j
        sql_insert = 'insert into my_table (id, date_column, temp, precip) values (%s, %s, %s, %s)', (station_id, date_value, temps[index], precips[index]))
        cursor.execute(sql_insert)
connection.commit()

有没有一种批量插入的方法可以提高效率，类似我上面所做的那样？另外，我的经验是使用Java / JDBC / Hibernate，所以如果有人能给出与Java方法相比较的解释/示例，那就更好了。

编辑：也许我需要使用cursor.executemany()，如此处所述？

非常感谢您提供任何建议、评论等。

- James Adams

2

使用 .executemany() 方法调用来代替？ - Martijn Pieters

是的，Martijn，我刚刚发现了这个（并更新/编辑了问题以包括它），它似乎是解决这个问题的方法。感谢您的建议！ - James Adams

2

.executemany() 不是批量更新;你将执行与你有的游标一样多的DML语句...它可以使你的Python代码更加干净，但对于数据库来说并不更容易处理。您的原始数据是什么样子；它是由用户输入并存储在代码中的变量中，还是在文本文件中等等？ - Ben

是的 @alldayremix，但是 OCIstmtExecute 不支持批处理： "对于非 SELECT 语句，执行此语句的次数等于 iters - rowoff。" 在 c 源代码中，Cursor_ExecuteMany 将行数传递给 Cursor_InternalExecute，然后传递给 OCIStmtExecute。 - Ben

这个页面与该陈述相矛盾：“大量插入操作不需要许多单独的插入，因为Python完全支持使用cx_Oracle.Cursor.executemany方法一次性插入多行。限制执行操作的数量可以显著提高程序性能，而且在编写大量INSERT操作的应用程序时应该是要考虑的第一件事情。”那么哪个是正确的？我无法在任何地方找到cx_Oracle基准测试。 - alldayremix

显示剩余5条评论

5个回答

9

请使用Cursor.prepare()和Cursor.executemany()方法。

根据cx_Oracle文档：

Cursor.prepare(statement[, tag])

在调用execute()之前使用此方法定义将要执行的语句。当这样做时，如果调用execute()时传入None或与语句相同的字符串对象，则不会执行准备阶段。[...]

Cursor.executemany(statement, parameters)

为一条语句准备执行并针对参数映射或序列中的所有参数执行它。该语句的管理方式与execute()方法相同。

因此，使用上述两个函数，您的代码将变为：

connection_string = "scott/tiger@testdb"
connection = cx_Oracle.Connection(connection_string)
cursor = cx_Oracle.Cursor(connection)
station_id = 'STATION_1'
start_year = 2000

temps = [ 1, 3, 5, 7, 9, 1, 3, 5, 7, 9, 1, 3 ]
precips = [ 2, 4, 6, 8, 2, 4, 6, 8, 2, 4, 6, 8 ]
number_of_years = len(temps) / 12

# list comprehension of dates for the first day of the month
date_values = [datetime.date(start_year + i, j + 1, 1) for i in range(number_of_years) for j in range(12)]

# second argument to executemany() should be of the form:
# [{'1': value_a1, '2': value_a2}, {'1': value_b1, '2': value_b2}]
dict_sequence = [{'1': date_values[i], '2': temps[i], '3': precips[i]} for i in range(1, len(temps))]

sql_insert = 'insert into my_table (id, date_column, temp, precip) values (%s, :1, :2, :3)', station_id)
cursor.prepare(sql_insert)
cursor.executemany(None, dict_sequence)
connection.commit()

此外，还可以查看Oracle的掌握Oracle+Python系列文章。

- alldayremix

谢谢您。我尝试了上述方法，但由于executemany()只接受两个参数，所以不起作用。而且似乎仍然需要创建一组行作为序列参数的参数，如这里所述的“一次处理多个”部分：http://www.oracle.com/technetwork/articles/dsl/prez-python-queries-101587.html，以及描述使用executemany()的章节：http://www.oracle.com/technetwork/articles/dsl/prez-python-queries-101587.html - James Adams

另外，使用prepare()方法并不清楚能带来什么好处，因为根据cx_Oracle.Cursor.execute()文档的描述：“游标将保留对该语句的引用。如果再次传入None或相同的字符串对象，则游标将再次执行该语句，而无需执行准备或重新绑定和重新定义。”但是我可能在这里误解了什么... - James Adams

2

使用prepare()并没有什么好处，它只是让你的代码更清晰；无论哪种方法，底层C代码都是相同的。至于你的第一个评论，我更新了我的答案来解决这个问题：executemany()将字典序列作为第二个参数。 - alldayremix

6

作为其中一个评论所说的，可以考虑使用 INSERT ALL。据说这比使用 executemany() 要快得多。

例如：

INSERT ALL
  INTO mytable (column1, column2, column_n) VALUES (expr1, expr2, expr_n)
  INTO mytable (column1, column2, column_n) VALUES (expr1, expr2, expr_n)
  INTO mytable (column1, column2, column_n) VALUES (expr1, expr2, expr_n)
SELECT * FROM dual;

http://www.techonthenet.com/oracle/questions/insert_rows.php

- interestedparty333

1

这是一个正确的答案。多个插入操作可以通过一次往返数据库来处理。 - YoYo

6

我的测试结果如下：

我插入了5000行，每行3列。

运行5000次插入操作，耗时1.24分钟。
使用executemany运行，只需0.125秒。
使用一段全部插入的代码运行，需要4.08分钟。

Python代码设置SQL语句如下：

insert all into t(a,b,c) select :1, :2, :3 from dual union all select :4, :5: :6 from daul...

设置这个长SQL语句的Python代码耗时0.145329秒。

我在一台非常老旧的Sun机器上进行了测试。CPU频率为1415MH。

在第三种情况下，我检查了数据库端，等待事件是“SQL * Net来自客户端的更多数据”，这意味着服务器正在等待来自客户端的更多数据。

对于我来说，第三种方法的结果是不可思议的，如果没有测试的话。

因此，我的短期建议就是只使用executemany。

- zhihuifan

3

我会使用union创建一个大型的SQL插入语句：

insert into mytable(col1, col2, col3)
select a, b, c from dual union
select d, e, f from dual union
select g, h, i from dual

你可以在Python中构建字符串，并将其作为一个语句传递给Oracle执行。

- Derrick

2

这是一个正确的答案。多个插入通过一次往返到数据库来处理。一个问题 - 你可能想要使用UNION ALL而不是普通的UNION。 - YoYo

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- James Adams · Accepted Answer

这是我想出来的方法，看起来效果很好（如果有改进的方法，请评论）：

# build rows for each date and add to a list of rows we'll use to insert as a batch 
rows = [] 
numberOfYears = endYear - startYear + 1
for i in range(numberOfYears):
    for j in range(12):
        # make a date for the first day of the month
        dateValue = datetime.date(startYear + i, j + 1, 1)
        index = (i * 12) + j
        row = (stationId, dateValue, temps[index], precips[index])
        rows.append(row)

# insert all of the rows as a batch and commit
ip = '192.1.2.3' 
port = 1521
SID = 'my_sid'
dsn = cx_Oracle.makedsn(ip, port, SID)
connection = cx_Oracle.connect('username', 'password', dsn)
cursor = cx_Oracle.Cursor(connection)
cursor.prepare('insert into ' + database_table_name + ' (id, record_date, temp, precip) values (:1, :2, :3, :4)')
cursor.executemany(None, rows)
connection.commit()
cursor.close()
connection.close()