有没有一种方法可以将pandas数据框存储到Teradata表中?

3
我是一名能翻译文本的助手。
我已经创建了一个名为“df”的Pandas数据框,正在尝试使用Teradata-SQL助手将其存储在“table”中。
连接字符串 -
conn = pyodbc.connect(
         "DRIVER=Teradata;DBCNAME=tdprod;Authentication=LDAP;UID=" + username + ";PWD=" + password + ";QUIETMODE=YES",
        autocommit=True, unicode_results=True)

cursor = conn.cursor().execute(sql)

尝试使用:df.to_sql('table', con=conn)

这不起作用。

有没有更简单的方法将数据框存储到表中。

任何帮助都将不胜感激。

谢谢。

Traceback (most recent call last):

 File "C:\Users\tripata\PycharmProjects\NLP\venv\lib\site-packages\sqlalchemy\engine\base.py", line 2158, in _wrap_pool_connect
return fn()
 File "C:\Users\tripata\PycharmProjects\NLP\venv\lib\site-packages\sqlalchemy\pool.py", line 410, in connect
return _ConnectionFairy._checkout(self, self._threadconns)
 File "C:\Users\tripata\PycharmProjects\NLP\venv\lib\site-packages\sqlalchemy\pool.py", line 788, in _checkout
fairy = _ConnectionRecord.checkout(pool)
  File "C:\Users\tripata\PycharmProjects\NLP\venv\lib\site-packages\sqlalchemy\pool.py", line 529, in checkout
rec = pool._do_get()
  File "C:\Users\tripata\PycharmProjects\NLP\venv\lib\site-packages\sqlalchemy\pool.py", line 1096, in _do_get
c = self._create_connection()
  File "C:\Users\tripata\PycharmProjects\NLP\venv\lib\site-packages\sqlalchemy\pool.py", line 347, in _create_connection
return _ConnectionRecord(self)
  File "C:\Users\tripata\PycharmProjects\NLP\venv\lib\site-packages\sqlalchemy\pool.py", line 474, in __init__
self.__connect(first_connect_check=True)
 File "C:\Users\tripata\PycharmProjects\NLP\venv\lib\site-packages\sqlalchemy\pool.py", line 671, in __connect
connection = pool._invoke_creator(self)
 File "C:\Users\tripata\PycharmProjects\NLP\venv\lib\site-packages\sqlalchemy\engine\strategies.py", line 106, in connect
 return dialect.connect(*cargs, **cparams)
 File "C:\Users\tripata\PycharmProjects\NLP\venv\lib\site-packages\sqlalchemy\engine\default.py", line 412, in connect
return self.dbapi.connect(*cargs, **cparams)
  File "C:\Users\tripata\PycharmProjects\NLP\venv\lib\site-packages\teradata\tdodbc.py", line 454, in __init__
checkStatus(rc, hDbc=self.hDbc, method="SQLDriverConnectW")
 File "C:\Users\tripata\PycharmProjects\NLP\venv\lib\site-packages\teradata\tdodbc.py", line 231, in checkStatus
raise DatabaseError(i[2], u"[{}] {}".format(i[0], msg), i[0])
teradata.api.DatabaseError: (8017, '[28000] [Teradata][ODBC Teradata Driver][Teradata Database] The UserId, Password or Account is invalid. , [Teradata][ODBC Teradata Driver][Teradata Database] The UserId, Password or Account is invalid. ')
3个回答

2

to_sql的文档中:

Parameters
----------
name : string
    Name of SQL table.
con : sqlalchemy.engine.Engine or sqlite3.Connection
    Using SQLAlchemy makes it possible to use any DB supported by that
    library. Legacy support is provided for sqlite3.Connection objects.

你需要使用sqlalchemy或sqlite3,但不需要pyodbc。

以下是创建Teradata引擎所需的内容:

from sqlalchemy import create_engine

engine = create_engine(f'teradata://{username}:{password}@tdprod:22/')

然后你可以像这样使用它

df.to_sql('table', engine)

解决方案似乎无法运行。它抛出一个错误消息 - sqlalchemy.exc.DatabaseError:(teradata.api.DatabaseError)。由于某种原因,它说用户ID、密码或帐户无效。我交叉检查了我的帐户,它可以正常工作。 - Analyst17
你能否更新问题,提供完整的堆栈跟踪信息?仅有错误名称很难进行调试。 - chthonicdaemon
似乎用户名或密码错误。只需尝试修改 SQLAlchemy 连接字符串,直到它能够正常工作。 - chthonicdaemon
所以,我玩了一下,这似乎是有效的- create_engine('teradata://' +user+':'+ password + '@'+host+':1025/'+'/'+'?authentication=LDAP')。我猜我们需要在连接字符串中同时包含主机名和身份验证。 - Analyst17
1
如何使用df.to_sql将datalab和表名添加在一起? - Jonathan Lam

1

我查了一些资料,这个解决方案可以快速地完成任务 - 使用 Python teradata 模块:

import teradata
import numpy as np
import pandas as pd


num_of_chunks = 100  #breaking the data into chunks is optional - use if you have many rows or would like to view status updates

query = 'insert into SomeDB.SomeTeraDataTable'
df = someDataframe

#set host, user, password params
host,username,password = 'hostName_or_IPaddress','username', 'password'

#connet to DB using UdaExec
udaExec = teradata.UdaExec (appName="IMC", version="1.0", logConsole=False)


with udaExec.connect(method="odbc",system=host, username=username,
                            password=password, driver="Teradata") as connect:


    df_chunks = np.array_split(df, num_of_chunks)

    for i,_ in enumerate(df_chunks):

        data = [tuple(x) for x in df_chunks[i].to_records(index=False)]

        connect.executemany(query, data,batch=True)

基于以下Stack Overflow文章的解决方案


0

create_engine('teradata://' +user+':'+ password + '@'+host+':1025/'+'/'+'?authentication=LDAP') 将主机名和身份验证都添加到连接字符串中对我很有帮助。


网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接