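I am trying to read a .csv file uploaded to Django into a DataFrame.
I am following the file upload instructions on the Django REST Framework page. When I PUT a .csv file to the defined endpoint, I get a Django UploadedFile object, specifically a TemporaryUploadedFile.
I am trying to read this object into a pandas DataFrame with read_csv, but the temporary uploaded file has extra formatting wrapped around the data. How can I read the original .csv file that was uploaded?
Following the DRF documentation, I assigned: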
file_obj = request.data['file']
In the Python debugging console, I see:
ipdb> file_obj
<TemporaryUploadedFile: foobar.csv (multipart/form-data; boundary=--------------------------044608164241682586561733)>
Things I have tried.
Using the original file path, I can read the file into pandas like this:
dataframe = pd.read_csv(open("foobar.csv", "rb"))
However, during the upload, Django adds extra metadata to the original file, so the same call fails on the temporary file:
ipdb> pd.read_csv(open(file_obj.temporary_file_path(), "rb"))
*** pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 5, saw 32
If I try the UploadedFile.read() method instead, I run into the following problem.
ipdb> dataframe = pd.read_csv(file_obj.read())
*** OSError: Expected file path name or file-like object, got <class 'bytes'> type
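The OSError above arises because read_csv expects a path or a file-like object, while read() returns raw bytes. A minimal sketch of one way around that specific error, assuming the bytes already contain clean CSV; the `raw_bytes` value is a made-up stand-in for `file_obj.read()`:

```python
import io

import pandas as pd

# Hypothetical stand-in for file_obj.read(): raw CSV bytes (sample values).
raw_bytes = b"SPID,SA_ID,UOM\n(Blanked),123456789,KWH\n"

# read_csv needs a path or a file-like object, so wrap the bytes in a buffer.
dataframe = pd.read_csv(io.BytesIO(raw_bytes))
print(dataframe.shape)
```

This only addresses the type error. If the bytes still carry the multipart header lines, the tokenizing error would most likely come back.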
Thank you! P.S. The first few lines of the original file look like this:
SPID,SA_ID,UOM,DIR,DATE,RS,NAICS,APCT,1:00,2:00,3:00,4:00,5:00,6:00,7:00,8:00,9:00,10:00,11:00,12:00,13:00,14:00,15:00,16:00,17:00,18:00,19:00,20:00,21:00,22:00,23:00,0:00:00
(Blanked),123456789,KWH,R,5/2/18,H2ETOUAN,,100,0,0,0,0,0,0,0,0.144,1.064,3.07,4.531,4.013,5.205,4.751,4.647,3.142,2.464,1.173,0.023,0,0,0,0,0
(Blanked),123456789,KWH,R,3/10/18,H2ETOUAN,,100,0,0,0,0,0,0,0,0,0.007,0.622,0.179,0.003,0.274,0.167,0.014,0.004,0.028,0.139,0,0,0,0,0,0
When I look at the contents of the temporary file, I see this:
----------------------------789873173211443224653494
Content-Disposition: form-data; name="file"; filename="foobar.csv"
Content-Type: File
SPID,SA_ID,UOM,DIR,DATE,RS,NAICS,APCT,1:00,2:00,3:00,4:00,5:00,6:00,7:00,8:00,9:00,10:00,11:00,12:00,13:00,14:00,15:00,16:00,17:00,18:00,19:00,20:00,21:00,22:00,23:00,0:00:00
(Blanked),123456789,KWH,R,5/2/18,H2ETOUAN,,100,0,0,0,0,0,0,0,0.144,1.064,3.07,4.531,4.013,5.205,4.751,4.647,3.142,2.464,1.173,0.023,0,0,0,0,0
(Blanked),123456789,KWH,R,3/10/18,H2ETOUAN,,100,0,0,0,0,0,0,0,0,0.007,0.622,0.179,0.003,0.274,0.167,0.014,0.004,0.028,0.139,0,0,0,0,0,0
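The lines before the CSV header look like a raw multipart/form-data envelope rather than something added to the file itself. One possible cause, which is an assumption since the view code is not shown, is that the request is sent as multipart/form-data while the view uses DRF's FileUploadParser, which treats the whole request body as the file; in that case, sending the file as the raw body or parsing with MultiPartParser is probably the cleaner fix. As a stopgap, here is a rough sketch of stripping the envelope by hand. The helper name `strip_multipart_envelope` and the sample body are hypothetical, and the sketch assumes a single part with standard CRLF delimiters and a blank line after the part headers:

```python
def strip_multipart_envelope(body: bytes) -> bytes:
    """Return the payload of the first part of a single-file multipart body."""
    # The first line is the boundary delimiter ("--" followed by the boundary).
    delimiter, _, rest = body.partition(b"\r\n")
    # The part headers end at the first blank line.
    _headers, _, payload = rest.partition(b"\r\n\r\n")
    # The payload runs until the next delimiter, which is preceded by CRLF.
    payload, _, _ = payload.partition(b"\r\n" + delimiter)
    return payload


# Hypothetical sample body, shaped like the temporary file contents above.
body = (
    b"--boundary123\r\n"
    b'Content-Disposition: form-data; name="file"; filename="foobar.csv"\r\n'
    b"Content-Type: File\r\n"
    b"\r\n"
    b"SPID,SA_ID,UOM\n"
    b"(Blanked),123456789,KWH\n"
    b"\r\n--boundary123--\r\n"
)

csv_bytes = strip_multipart_envelope(body)
print(csv_bytes.decode("ascii"))
```

The resulting bytes could then be handed to pandas with something like pd.read_csv(io.BytesIO(csv_bytes)).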
You don't need open(); just use pd.read_csv(args). - Umar.H