Pandas如何最简单地读取Access表格?

3

我有一个名为DB_IMPORT_2020.accdb的Access数据库,它只包含一个名为DB_IMPORT_2020_PM的表。我一直在尝试将该表导入到Pandas中。

目前为止,我所做的是:

# define components of our connection string
driver = '{Microsoft Access Driver (*.mdb, *.accdb)}'
filepath = r"C:\Users\corra\Desktop\DB_IMPORT_2020.accdb"

# create a connection to the database
cnxn = pyodbc.connect(driver = driver, dbq = filepath, autocommit = True)

crsr = cnxn.cursor()

# define the components of a query
table_name = 'DB_IMPORT_2020_PM'

# define query

query = "SELECT * FROM {}".format(table_name)

# execute the query

crsr.execute(query)

data = crsr.fetchall()

df = pd.DataFrame(data)

接下来我遇到了这样一种情况,我有一个带有单列和每行一个列表的pandas数据框。

0
________________________________________________________
0   [86232, 2019-09-12, INTERNET, ... , N ]
1   [86233, 2019-09-12, INTERNET, ... , M ]
2   [86234, 2019-09-12, MEZZO LIBERO, ...  , Q ]
3   ...

我觉得这不是正确的方法,而且过于复杂。有没有人知道用Pandas在Access表格中读取数据的更简单方法?

这是使用crsr.fetchall()获取到的数据列表。

[(86232, datetime.datetime(2019, 9, 12, 0, 0), 'INTERNET', 'A.M Web', 'Brand_SMX', 0.0, 'gen', '20_FCST', 'OnLine', 'dipendente s', 'Low Rev.', 'STX', 'A.M', 'INTERNET', 'Brand_SMX', 'dipendente s', 'STORICI', 'TIER 1', 1.0, 'TIER 1', 'ALIMENTARI', '04_SRF', 'SMX', 'ALTRI', 'STC', 'Reservation', 'Off + On', 'Online_Res', 'TIER 1', None, None, None, None),
 (86233, datetime.datetime(2019, 9, 12, 0, 0), 'INTERNET', 'A.M Web', 'Brand_SMX', 0.0, 'feb', '20_FCST', 'OnLine', 'dipendente s', 'Low Rev.', 'STC', 'A. M', 'INTERNET', 'Brand_SMX', 'dipendente s', 'STORICI', 'TIER 1', 1.0, 'TIER 1', 'ALIMENTARI', '04_SRF', 'SMX', 'ALTRI', 'STX', 'Reservation', 'Off + On', 'Online_Res', 'TIER 1', None, None, None, None),
 (86234, datetime.datetime(2019, 9, 12, 0, 0), 'MEZZO LIBERO', 'S ITALIA SRL', 'S ELECTRONICS', 0.0, 'gen', '20_FCST', 'OffLine', 'BO / CI', 'Low Rev.', 'STX', 'S Italia Srl', 'MEZZO LIBERO', 'S', 'BEN BOT', 'STORICI', 'INTERCx', 1.0, 'INTERCx', 'INFORMATICA/FOTOGRAFIA', '04_SRF', 'SMX', 'ALTRI', 'STC', 'Reservation', 'Off + On', 'Offline_Res', 'INTX', None, None, None, None),...]

data = crsr.fetchall() 是什么类型? - Let's try
类型是列表。 - coelidonum
我认为可能 df = pd.DataFrame(data[0]) 可以解决问题。否则,请添加 data 的打印输出? - Let's try
不,它不起作用,我已经添加了列表的前两行。 - coelidonum
2个回答

4
使用pandas和Access数据库进行工作的最简单方法是使用我维护的sqlalchemy-access方言。
有没有人知道在Pandas中读取Access表格数据的更简单的方法?
只需使用pandas的read_sql_table方法即可。
import pandas as pd
import sqlalchemy as sa

table_name = 'DB_IMPORT_2020_PM'

engine = sa.create_engine("access+pyodbc://@my_accdb_dsn")
df = pd.read_sql_table(table_name, engine)

3
根据《如何避免成为垃圾信息发布者》(https://stackoverflow.com/help/promotion),您可能需要在此答案中披露所属关系并添加一些更多的内容(例如如何使用它的快速示例)。 尽管答案已经很好了。 - Erik A

0

你的数据是一个由tuples组成的list,在创建数据框时需要按照这里所述添加列:

df = pd.DataFrame(data,columns = ["col1","col2",...,"coln"])

我遇到了这个错误 ValueError: Shape of passed values is (1914390, 1), indices imply (1914390, 33)。我有33列,并添加了df = pd.DataFrame(data,columns = ["col1","col2","col3","col4","col5","col6","col7","col8","col9","col10","col11","col12","col13","col14","col15","col16","col17","col18","col19","col20","col21","col22","col23","col24","col25","col26","col27","col28","col29","col30","col31","col32","col33"])。 - coelidonum
如果我更改为df = pd.DataFrame([data],columns = ["col1","col2","col3","col4","col5","col6","col7","col8","col9","col10","col11","col12","col13","col14","col15","col16","col17","col18","col19","col20","col21","col22","col23","col24","col25","col26","col27","col28","col29","col30","col31","col32","col33"]),我会得到以下错误:ValueError: 传递了33列,但数据中有1914390列。 - coelidonum
实际上,我已经尝试过 pd.DataFrame(data) 并且它可以使用你提供的示例数据。你用 df = pd.DataFrame(data[0]) 时遇到了什么错误?你有任何包含 '" 和逗号之类的字段吗?否则我真的不知道发生了什么。 - Let's try

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接