Reading data from pyodbc into pandas

88

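I am querying a SQL database and want to use pandas to process the data, but I am not sure how to move the data over. Below are my input and output.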

import pyodbc
import pandas
from pandas import DataFrame

cnxn = pyodbc.connect(r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:\users\bartogre\desktop\CorpRentalPivot1.accdb;UID="";PWD="";')
crsr = cnxn.cursor()
for table_name in crsr.tables(tableType='TABLE'):
    print(table_name)
cursor = cnxn.cursor()
sql = "Select sum(CYTM), sum(PYTM), BRAND From data Group By BRAND"
cursor.execute(sql)
for data in cursor.fetchall():
    print(data)

('C:\\users\\bartogre\\desktop\\CorpRentalPivot1.accdb', None, 'Data', 'TABLE', None)
('C:\\users\\bartogre\\desktop\\CorpRentalPivot1.accdb', None, 'SFDB', 'TABLE', None)
(Decimal('78071898.71'), Decimal('82192672.29'), 'A')
(Decimal('12120663.79'), Decimal('13278814.52'), 'B')

1
I just figured it out. See the answer. - polonius11
3 Answers

196

A shorter, more concise answer

import pyodbc
import pandas as pd

cnxn = pyodbc.connect(r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};'
                      r'DBQ=C:\users\bartogre\desktop\data.mdb;')
sql = "Select sum(CYTM), sum(PYTM), BRAND From data Group By BRAND"
data = pd.read_sql(sql,cnxn)  # without parameters [non-prepared statement]

# with a prepared statement, use list/tuple/dictionary of parameters depending on DB
#data = pd.read_sql(sql=sql, con=cnxn, params=query_params) 

This approach needs to be modified: https://dev59.com/eVEG5IYBdhLWcg3wLmAg - Peej1226
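
The commented-out `params` line in the answer above can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the Access file so the snippet runs anywhere, the sample rows and the `cytm_total` / `pytm_total` aliases are invented, and the `WHERE BRAND = ?` filter is added purely to have something to parameterize. pyodbc also uses `?` placeholders, so the same `pd.read_sql(sql, cnxn, params=...)` call shape should carry over to a pyodbc connection.

```python
import sqlite3

import pandas as pd

# Stand-in for the pyodbc connection (assumption: SQLite instead of Access).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (BRAND TEXT, CYTM REAL, PYTM REAL)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [("A", 10.0, 12.0), ("A", 5.0, 3.0), ("B", 7.0, 8.0)],  # made-up sample rows
)

# '?' marks a positional parameter; the value is passed separately
# instead of being concatenated into the SQL string.
sql = (
    "SELECT SUM(CYTM) AS cytm_total, SUM(PYTM) AS pytm_total, BRAND "
    "FROM data WHERE BRAND = ? GROUP BY BRAND"
)
query_params = ("A",)  # one value per '?' placeholder, in order

df = pd.read_sql(sql, conn, params=query_params)
print(df)
```

Giving the aggregate columns explicit aliases keeps the resulting DataFrame column names predictable, since the default names for `SUM(...)` expressions vary by driver.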

15

I was overthinking it!

import pyodbc
import pandas

cnxn = pyodbc.connect(r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:\users\bartogre\desktop\CorpRentalPivot1.accdb;UID="";PWD="";')
crsr = cnxn.cursor()
for table_name in crsr.tables(tableType='TABLE'):
    print(table_name)
cursor = cnxn.cursor()
sql = "Select sum(CYTM), sum(PYTM), BRAND From data Group By BRAND"
cursor.execute(sql)
data = cursor.fetchall()
print(data)
Data = pandas.DataFrame(data)
print(Data)

19
Doesn't this put everything into a single column?! - AER
6
Yes, but this seems to fix it: Data = pandas.DataFrame.from_records(cursor.fetchall(), columns=[col[0] for col in cursor.description]) - Bungler
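
The fix suggested in the comment above can be sketched end to end. As an assumption for illustration, an in-memory SQLite database with made-up rows replaces the Access file; its cursor exposes `description` the same DB-API way, with the column name at index 0 of each entry. The explicit `tuple(row)` conversion is my addition, not part of the comment: pyodbc returns `Row` objects rather than plain tuples, and converting them first avoids depending on how pandas interprets that type.

```python
import sqlite3

import pandas as pd

# Stand-in for the pyodbc connection (assumption: SQLite instead of Access).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (BRAND TEXT, CYTM REAL, PYTM REAL)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [("A", 10.0, 12.0), ("A", 5.0, 3.0), ("B", 7.0, 8.0)],  # made-up sample rows
)

cursor = conn.cursor()
cursor.execute(
    "SELECT SUM(CYTM) AS cytm_total, SUM(PYTM) AS pytm_total, BRAND "
    "FROM data GROUP BY BRAND ORDER BY BRAND"
)

# cursor.description holds one entry per result column; element 0 is its name.
columns = [col[0] for col in cursor.description]

# Convert each fetched row to a plain tuple before handing it to pandas.
rows = [tuple(row) for row in cursor.fetchall()]

df = pd.DataFrame.from_records(rows, columns=columns)
print(df)
```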

12

Another, faster approach. See data = pd.read_sql(sql, cnxn)

import pyodbc
import pandas as pd
from pandas import DataFrame
from pandas import plotting  # pandas.tools.plotting moved to pandas.plotting in pandas 0.20
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

cnxn = pyodbc.connect(r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)}; DBQ=C:\users\bartogre\desktop\data.mdb;UID="";PWD="";')
crsr = cnxn.cursor()
for table_name in crsr.tables(tableType='TABLE'):
    print(table_name)
cursor = cnxn.cursor()
sql = "Select *"
sql = sql + " From data"
print(sql)
cursor.execute(sql)
data = pd.read_sql(sql, cnxn)

5
I don't think cursor.execute(sql) is needed here - Tim Smith
1
From crsr = cnxn.cursor() through cursor.execute(sql), you are executing the query twice. data = pd.read_sql(sql, cnxn) alone is enough. - MERT DOĞAN
