Appending a Pandas dataframe with timestamps to an SQLite3 database and reading it back

Question

5

I am trying this:

import pandas as pd
import sqlite3
import datetime, pytz

#nowtime=datetime.datetime.now(pytz.utc)
nowtime=datetime.datetime.now()

print(nowtime)
df = pd.DataFrame(columns=list('ABCD'))
df.loc[0]=(3,0.141,"five-nine",nowtime)
df.loc[1]=(1,0.41,"four-two",nowtime)

print(df)

db = sqlite3.connect(':memory:')
c = db.cursor()
c.execute('create table if not exists ABCD ( A integer, B real, C text, D timestamp );')
c.execute('insert into ABCD (A,B,C, D) values (?,?,?,?);',(1,2.2,'4',nowtime))
c.executemany('insert into ABCD (A,B,C, D) values (?,?,?,?);',df.to_records(index=False))

db.commit()

print(pd.read_sql('select * from ABCD;',db))

and I get this result:

 2018-03-07 19:09:58.584953
   A      B          C                          D
0  3  0.141  five-nine 2018-03-07 19:09:58.584953
1  1  0.410   four-two 2018-03-07 19:09:58.584953
   A      B          C                           D
0  1  2.200          4  2018-03-07 19:09:58.584953
1  3  0.141  five-nine    b'\xa8hx?\t\xb9\x19\x15'
2  1  0.410   four-two    b'\xa8hx?\t\xb9\x19\x15'

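Ideally, I would like to push some timestamped data into sqlite3 and recover it interoperably across pandas/python/numpy.

I have seen the question Appending Pandas dataframe to sqlite table by primary key, which covers appending, but I am not sure how to handle datetime.datetime, pandas Timestamp, or numpy.datetime64 values.

There is also How to read datetime back from sqlite as a datetime instead of string in Python?, but I could not work out how to do the same thing in pandas.

I spent a lot of time on https://dev59.com/I2Yr5IYBdhLWcg3waJdT#21916253, where the to_datetime() function confused me.

So, is there a good way to work with times, sqlite3, and pandas together?

I tried the following changes: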

db = sqlite3.connect(':memory:',detect_types=sqlite3.PARSE_DECLTYPES)

#...
for index,row in df.iterrows():
    print(row)
    c.execute('insert into ABCD (A,B,C,D) values (?,?,?,?);',(row.A,row.B,row.C,row.D.to_pydatetime()))


x = pd.read_sql('select *  from ABCD;',db)

print('Type of a pd.read_sql(SQLite3) timestamp  : ',type(x['D'][0]))

x = c.execute('select * from ABCD').fetchall()

print(x)
print('Type of a sqlite.execute(SQLite3) timestamp  : ',type(x[0][3]))

This uses the SQLite3 declared types and tests the returned values:

Type of a pd.read_sql(SQLite3) timestamp  :  <class 'pandas._libs.tslib.Timestamp'>
[(1, 2.2, '4', datetime.datetime(2018, 3, 8, 14, 46, 2, 520333)), (3, 141.0, 'five-nine', datetime.datetime(2018, 3, 8, 14, 46, 2, 520333)), (1, 41.0, 'four-two', datetime.datetime(2018, 3, 8, 14, 46, 2, 520333))]
Type of a sqlite.execute(SQLite3) timestamp  :  <class 'datetime.datetime'>

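I also tried datetime.datetime.now(pytz.utc) to get a UTC-aware time, but it broke a lot of things. Using datetime.datetime.utcnow() worked better, because it returns a naive (non-timezone-aware) object that is unaffected by timezones.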
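A minimal sketch of the naive-UTC approach described above, using only the standard library. It assumes `datetime.timezone.utc` in place of `pytz.utc`, and drops the tzinfo by hand because `datetime.utcnow()` has been deprecated since Python 3.12. The variable names are illustrative only.

```python
from datetime import datetime, timezone

# Take an aware "now" in UTC, then drop the tzinfo so the value is naive,
# like the result of utcnow().
aware = datetime.now(timezone.utc)
naive_utc = aware.replace(tzinfo=None)

# After reading the naive value back, the UTC zone can be reattached,
# provided everything written to the table was UTC to begin with.
restored = naive_utc.replace(tzinfo=timezone.utc)
print(naive_utc.isoformat(" "), restored.tzinfo)
```

This only works if every writer agrees that the stored values are UTC; nothing in the stored value itself records the zone.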

Note also the Python sqlite3 documentation on the detect_types parameter of sqlite3.connect(). Enabling detect_types=sqlite3.PARSE_DECLTYPES|sqlite3.PARSE_COLNAMES lets Python run converters on data as it passes between the two systems.

https://docs.python.org/3/library/sqlite3.html#sqlite3.PARSE_DECLTYPES for create table ... xyzzy timestamp, ... conversions
https://docs.python.org/3/library/sqlite3.html#sqlite3.PARSE_COLNAMES for select ... date as "dateparsed [datetime]"... conversions
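As a rough illustration of the two detect_types flags, here is a sketch that registers its own adapter and converter rather than relying on the sqlite3 module's built-in ones (those are deprecated as of Python 3.12). The table layout is cut down from the one above, and the "latest" alias is an assumption made for the example.

```python
import sqlite3
from datetime import datetime

# Assumption: an explicit adapter/converter pair. Values are written as
# ISO text with a space separator and parsed back with fromisoformat().
sqlite3.register_adapter(datetime, lambda value: value.isoformat(" "))
sqlite3.register_converter(
    "timestamp", lambda raw: datetime.fromisoformat(raw.decode())
)

db = sqlite3.connect(
    ":memory:",
    detect_types=sqlite3.PARSE_DECLTYPES | sqlite3.PARSE_COLNAMES,
)
db.execute("create table ABCD (A integer, D timestamp)")
stamp = datetime(2018, 3, 7, 19, 9, 58, 584953)
db.execute("insert into ABCD (A, D) values (?, ?)", (1, stamp))

# PARSE_DECLTYPES: the converter is chosen from the declared column type.
by_decltype = db.execute("select D from ABCD").fetchone()[0]

# PARSE_COLNAMES: an expression has no declared type, so the converter
# is named inside the column alias instead.
by_colname = db.execute(
    'select max(D) as "latest [timestamp]" from ABCD'
).fetchone()[0]

print(type(by_decltype), type(by_colname))
```

Both reads should come back as datetime.datetime, while the value on disk is still plain text.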

- Dave X

2 Answers

3

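The problem stems from pandas' to_records() method, which converts your datetime field into a numpy datetime64 value, printed as an ISO timestamp with a T separator. sqlite3 has no adapter for that type, which is why the question's output shows raw bytes in column D: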

print(df.to_records(index=False))
# [(3, 0.141, 'five-nine', '2018-03-07T20:40:39.808427000')
#  (1, 0.41 , 'four-two', '2018-03-07T20:40:39.808427000')]

Consider converting the datetime column to strings and then running the cursor's executemany():

df.D = df.D.astype('str')

print(df.to_records(index=False))
# [(3, 0.141, 'five-nine', '2018-03-07 20:40:39.808427')
#  (1, 0.41 , 'four-two', '2018-03-07 20:40:39.808427')]

Altogether:

db = sqlite3.connect(':memory:')
c = db.cursor()
c.execute('create table if not exists ABCD ( A integer, B real, C text, D timestamp );')
c.execute('insert into ABCD (A,B,C, D) values (?,?,?,?);',(1,2.2,'4',nowtime))

df['D'] = df['D'].astype('str')
c.executemany('insert into ABCD (A,B,C, D) values (?,?,?,?);',df.to_records(index=False))

db.commit()
print(pd.read_sql('select * from ABCD;',db))

#    A      B          C                           D
# 0  1  2.200          4  2018-03-07 20:47:15.031130
# 1  3  0.141  five-nine  2018-03-07 20:47:15.031130
# 2  1  0.410   four-two  2018-03-07 20:47:15.031130

- Parfait

1

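Did you check the dtypes after read_sql? They should be derived from the SQLite types. The conversion to string is only for migrating into SQLite. Otherwise, do not use to_records(); look at iterrows instead. Or try the to_sql method with an SQLAlchemy connection. - Parfait

I found that the returned types are all str unless I add a detect_types=sqlite3.PARSE_DECLTYPES argument to sqlite3.connect(). With that argument, c.execute(...).fetch... returns datetime.datetime and pd.read_sql returns pandas._libs.tslib.Timestamp. Unless that argument is set, the SQLite types appear to be ignored entirely. - Dave X

Saving it to a file and inspecting the file with "DB Browser for SQLite" shows that SQLite stored the dates as character strings in every case I tried. - Dave X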

1

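As a lightweight, file-level database, SQLite has very few data types: TEXT, NUMERIC, INTEGER, REAL, BLOB. It has no timestamp type, so the value is probably saved under the closest affinity, which is text. - Parfait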

1

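@DaveX, it is not just stored internally as "TEXT"; SQLite has no concept of the other types. When you read back with PARSE_DECLTYPES and see something like datetime.datetime, that is a conversion happening on the Python side. As you found, the values are always stored as strings in SQLite. There is no solution on the SQLite side; your problem lies in reading from SQLite back into Pandas, so the data type conversion has to happen on the Pandas side. - Davos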
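The to_sql suggestion and the "stored as text" observation from the comments above can be checked with a short sketch. It assumes a plain sqlite3 connection (which pandas also accepts) instead of the SQLAlchemy connection mentioned, and a dataframe built with a real datetime column rather than row by row.

```python
import sqlite3
import pandas as pd

# Assumption: a dataframe whose D column already has a datetime dtype.
df = pd.DataFrame(
    {
        "A": [3, 1],
        "B": [0.141, 0.41],
        "C": ["five-nine", "four-two"],
        "D": pd.to_datetime(["2018-03-07 19:09:58.584953"] * 2),
    }
)

db = sqlite3.connect(":memory:")

# to_sql creates the table if needed and appends the rows.
df.to_sql("ABCD", db, if_exists="append", index=False)

# typeof() reports the storage class SQLite actually used for each value.
storage_class = db.execute("select typeof(D) from ABCD").fetchone()[0]
row_count = db.execute("select count(*) from ABCD").fetchone()[0]
print(storage_class, row_count)
```

If the storage class comes back as text, that matches what the DB browser showed: the type information has to be restored on the pandas side when reading.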

- Davos · Accepted Answer

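The main problem is that SQLite has no datetime data type.

PARSE_DECLTYPES will not help when reading from SQLite, because the declared data type of a column in SQLite will never be datetime.

Since you control the Pandas dataframe, you know the types at the point you save them back to SQLite.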

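The read_sql method you are using...

is a convenience wrapper around read_sql_table and read_sql_query (for backward compatibility), and delegates to the specific function depending on the provided input (database table name or SQL query).

In your example you supply a query, so it delegates to the read_sql_query method https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql_query.html#pandas.read_sql_query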

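This has a parameter parse_dates, which can be:

A dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime(). Especially useful with databases without native Datetime support, such as SQLite.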

Since you know in advance which columns hold dates, you can store that as a dictionary matching the structure parse_dates expects and pass it to the read_sql method.
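A minimal sketch of that idea, assuming the dates sit in SQLite as text in the "YYYY-MM-DD HH:MM:SS.ffffff" layout shown in the question. The date_columns name and the sample rows are illustrative only.

```python
import sqlite3
import pandas as pd

db = sqlite3.connect(":memory:")
db.execute("create table ABCD (A integer, B real, C text, D timestamp)")
db.executemany(
    "insert into ABCD (A, B, C, D) values (?, ?, ?, ?)",
    [
        (3, 0.141, "five-nine", "2018-03-07 19:09:58.584953"),
        (1, 0.41, "four-two", "2018-03-07 19:09:58.584953"),
    ],
)

# {column_name: keyword arguments for pandas.to_datetime()}
date_columns = {"D": {"format": "%Y-%m-%d %H:%M:%S.%f"}}

restored = pd.read_sql_query("select * from ABCD", db, parse_dates=date_columns)
print(restored.dtypes)
```

Passing a plain list such as parse_dates=["D"] should also work when no to_datetime() arguments are needed.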

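In other situations, when I save a pandas df back to csv or another file format, I use something like this to save the schema so it can be reapplied when loading the csv back into pandas. The read_csv method has a dtype parameter that takes exactly this structure.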

def getPandasSchema(df):
    ''' 
    takes a pandas dataframe and returns the dtype dictionary
    useful for applying types when reloading that dataframe from csv etc
    '''
    return dict(zip(df.columns.tolist(),df.dtypes.tolist()))
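One caveat when reusing such a dictionary: read_csv does not accept datetime dtypes through dtype and expects those columns in parse_dates instead. The sketch below works around that with a hypothetical split_schema helper (not part of pandas) and an in-memory buffer standing in for a csv file.

```python
import io
import pandas as pd

def split_schema(df):
    """Hypothetical helper: split a frame's schema into the two
    arguments read_csv needs (dtype dict and parse_dates list)."""
    date_cols = [
        col for col, dtype in df.dtypes.items()
        if pd.api.types.is_datetime64_any_dtype(dtype)
    ]
    dtypes = {
        col: dtype for col, dtype in df.dtypes.items()
        if col not in date_cols
    }
    return dtypes, date_cols

df = pd.DataFrame(
    {
        "A": [3, 1],
        "B": [0.141, 0.41],
        "C": ["five-nine", "four-two"],
        "D": pd.to_datetime(["2018-03-07 19:09:58.584953"] * 2),
    }
)
dtypes, date_cols = split_schema(df)

# Round trip through csv text, then reapply the saved schema on load.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer, dtype=dtypes, parse_dates=date_cols)
print(restored.dtypes)
```

The same split could be used to build the parse_dates argument for read_sql_query.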