How to store MySQL query results in a pandas DataFrame with pymysql?


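I'm trying to store MySQL query results in a pandas DataFrame using pymysql, but I get errors when building the DataFrame. I found similar questions here and here, but it looks like pymysql-specific errors are being thrown: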

import pandas as pd
import datetime
import pymysql

# dummy values 
connection = pymysql.connect(user='username', password='password', database='database_name', host='host')

start_date = datetime.datetime(2017,11,15)
end_date = datetime.datetime(2017,11,16)

try:
    with connection.cursor() as cursor:
        query = "SELECT * FROM orders WHERE date_time BETWEEN %s AND %s"

        cursor.execute(query, (start_date, end_date))

        df = pd.DataFrame(data=cursor.fetchall(), index = None, columns = cursor.keys())
finally:
    connection.close()

Returns: AttributeError: 'Cursor' object has no attribute 'keys'

If I omit the index and columns arguments:

try:
    with connection.cursor() as cursor:
        query = "SELECT * FROM orders WHERE date_time BETWEEN %s AND %s"

        cursor.execute(query, (start_date, end_date))

        df = pd.DataFrame(cursor.fetchall())
finally:
    connection.close()

Returns: ValueError: DataFrame constructor not properly called!

Thanks in advance!
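Both errors come from how pymysql's default cursor reports results. Like any DB-API 2.0 cursor it has no `keys()` method; column names live in `cursor.description`, a sequence of 7-item tuples whose first item is the column name. Its `fetchall()` also returns a tuple of row tuples, which the pandas versions of that time most likely rejected with "DataFrame constructor not properly called!"; wrapping the rows in `list()` sidesteps that. The sketch below combines both fixes in a hypothetical helper, `cursor_to_dataframe`. Because pymysql needs a running server, the usage part drives it with an in-memory `sqlite3` database as a stand-in; the helper only touches DB-API attributes, so it should behave the same on a pymysql cursor after `cursor.execute(query, (start_date, end_date))`.

```python
import sqlite3

import pandas as pd


def cursor_to_dataframe(cursor):
    """Hypothetical helper: build a DataFrame from an already-executed DB-API cursor."""
    # DB-API cursors have no keys(); each entry of cursor.description is a
    # 7-item tuple whose first element is the column name.
    columns = [desc[0] for desc in cursor.description]
    # fetchall() may return a tuple of tuples (pymysql's default cursor does),
    # so convert it to a plain list of rows before handing it to pandas.
    rows = list(cursor.fetchall())
    return pd.DataFrame(rows, columns=columns)


# Usage: an in-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, date_time TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2017-11-14 09:00:00"), (2, "2017-11-15 12:00:00"), (3, "2017-11-15 18:30:00")],
)
cur = conn.cursor()
# sqlite3 uses "?" placeholders; with pymysql the same query would use "%s".
cur.execute(
    "SELECT * FROM orders WHERE date_time BETWEEN ? AND ? ORDER BY id",
    ("2017-11-15 00:00:00", "2017-11-16 00:00:00"),
)
df = cursor_to_dataframe(cur)
conn.close()
print(df)
```

Another option with pymysql is to open the cursor as `connection.cursor(pymysql.cursors.DictCursor)`, which returns each row as a dict, so `pd.DataFrame(cursor.fetchall())` should pick up the column names directly.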

3 Answers


Use pandas.read_sql() for this:

# pymysql uses %s placeholders, not ?
query = "SELECT * FROM orders WHERE date_time BETWEEN %s AND %s"
df = pd.read_sql(query, connection, params=(start_date, end_date))
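`read_sql` does not rewrite the query text; the `params` are bound by the driver, so the placeholder has to match the driver's DB-API `paramstyle`. pymysql declares `pyformat` and accepts positional `%s`, while `?` is the `qmark` style of drivers such as `sqlite3`. The sketch below uses a hypothetical helper, `between_query`, that picks the placeholder from a `paramstyle` string, then runs the result through `pd.read_sql` against an in-memory SQLite database standing in for MySQL (an assumption made only so the example runs without a server).

```python
import sqlite3

import pandas as pd

# Positional placeholder text for the DB-API paramstyles relevant here.
# pymysql.paramstyle is "pyformat" (positional %s works); sqlite3.paramstyle is "qmark".
PLACEHOLDERS = {"qmark": "?", "format": "%s", "pyformat": "%s"}


def between_query(table, column, paramstyle):
    """Hypothetical helper: build a BETWEEN query in the driver's placeholder style.

    Only the table and column names are formatted into the string; the two
    values are still bound by the driver through read_sql's params.
    """
    ph = PLACEHOLDERS[paramstyle]
    return f"SELECT * FROM {table} WHERE {column} BETWEEN {ph} AND {ph}"


# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, date_time TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2017-11-14 09:00:00"), (2, "2017-11-15 12:00:00"), (3, "2017-11-15 18:30:00")],
)

query = between_query("orders", "date_time", sqlite3.paramstyle)
df = pd.read_sql(query, conn, params=("2017-11-15 00:00:00", "2017-11-16 00:00:00"))
conn.close()
print(query)
print(df)
```

With pymysql the call would be `between_query("orders", "date_time", pymysql.paramstyle)`, which yields the `%s` form used in the answer.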

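pandas.read_sql() generally works well. But what if the command executes a stored procedure that updates a table (and has to commit the updated rows)? How do you make sure the commit happens in that case? Does pd.read_sql allow committing? - Nodame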
@Nodame, you can call the stored procedure with SQLAlchemy first and then read the results with pd.read_sql. - MaxU - stand with Ukraine
How does a pymysql connection work with pandas read_sql, given that the pymysql Connection doesn't inherit from any SQLAlchemy class? - Ludvig W
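On the question of how this works without SQLAlchemy: judging from the pandas source, when `con` is neither a SQLAlchemy connectable nor a URI string, `read_sql` falls back to plain DB-API 2.0 calls (`con.cursor()`, `cursor.execute(sql, params)`, `cursor.description`, `cursor.fetchall()`), all of which a pymysql `Connection` provides. pandas only officially supports `sqlite3` on that path, and recent versions (2.x) emit a `UserWarning` recommending SQLAlchemy for any other DB-API object. The sketch below imitates a non-sqlite3 driver with a hypothetical wrapper class, `PlainDBAPIConnection`, around an in-memory SQLite connection, so it runs without a MySQL server while taking the same fallback path a pymysql connection would.

```python
import sqlite3
import warnings

import pandas as pd


class PlainDBAPIConnection:
    """Hypothetical stand-in for a non-sqlite3 DB-API connection such as pymysql's.

    It is not an sqlite3.Connection or a SQLAlchemy object, so pandas has to
    treat it as a generic DB-API connection, just like a pymysql Connection.
    """

    def __init__(self, inner):
        self._inner = inner

    def cursor(self):
        return self._inner.cursor()

    def commit(self):
        self._inner.commit()

    def rollback(self):
        self._inner.rollback()

    def close(self):
        self._inner.close()


raw = sqlite3.connect(":memory:")
raw.execute("CREATE TABLE brands (brand_id INTEGER, name TEXT)")
raw.executemany("INSERT INTO brands VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Record warnings instead of printing them; on pandas 2.x this should capture
# the "pandas only supports SQLAlchemy connectable ..." UserWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df = pd.read_sql(
        "SELECT * FROM brands ORDER BY brand_id",
        PlainDBAPIConnection(raw),
        index_col="brand_id",
    )
raw.close()

print(df)
print([str(w.message) for w in caught])
```

To avoid the warning in real code, pass a SQLAlchemy engine instead, for example one created from a `mysql+pymysql://username:password@host/database_name` URL, which is SQLAlchemy's dialect string for pymysql.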


Try this:

import pandas as pd
import pymysql

mysql_connection = pymysql.connect(host='localhost', user='root', password='', db='test', charset='utf8')
                    
sql = "SELECT * FROM `brands`"
df = pd.read_sql(sql, mysql_connection, index_col='brand_id')
print(df)

Thanks for suggesting pandas.read_sql(). It can execute stored procedures too! I tested it in an MSSQL 2017 environment.
Here is an example (hopefully it helps others):
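import pandas as pd

def database_query_to_df(connection, stored_proc, start_date, end_date):
    # Define a query; stored_proc is concatenated into the SQL text, so it
    # must come from trusted code, never from user input
    query = "SET NOCOUNT ON; EXEC " + stored_proc + " ?, ? " + "; SET NOCOUNT OFF"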

    # Pass the parameters to the query, execute it, and store the results in a data frame
    df = pd.read_sql(query, connection, params=(start_date, end_date))
    return df

Content provided by Stack Overflow.