Importing a .DAT file into a pandas DataFrame

I have a .DAT file containing the following lines:

2016 01 01 00 00 19 348 2.05 7 618.4
2016 01 01 00 01 19 351 2.05 7 618.4
2016 01 01 00 02 18 0 2.05 7 618.4
2016 01 01 00 03 17 353 2.05 7 618.4
2016 01 01 00 04 19 346 2.02 7 618.4
2016 01 01 00 05 20 345 2.00 7 618.4
2016 01 01 00 06 22 348 1.97 7 618.4
.......

The data format is:
year month day hour minute(HST) wind_speed(kts) wind_direction(dec) temperature(C) relative_humidity(%) pressure

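I would like to import the .DAT file into a pandas DataFrame, with year-month-day-hour-minute combined into a single index column and the remaining values as separate columns. Any suggestions?
Thanks!!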

You can use the read_csv function:
import pandas as pd
from io import StringIO  # StringIO comes from the standard io module
import datetime as dt


temp=u"""2016 01 01 00 00 19 348 2.05 7 618.4
2016 01 01 00 01 19 351 2.05 7 618.4
2016 01 01 00 02 18 0 2.05 7 618.4
2016 01 01 00 03 17 353 2.05 7 618.4
2016 01 01 00 04 19 346 2.02 7 618.4
2016 01 01 00 05 20 345 2.00 7 618.4
2016 01 01 00 06 22 348 1.97 7 618.4"""
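# after testing, replace StringIO(temp) with the filename

# parse the space-joined date fields (pd.datetime no longer exists, so use datetime directly)
# note: date_parser and nested parse_dates are deprecated in pandas 2.x
parser = lambda date: dt.datetime.strptime(date, '%Y %m %d %H %M')
df = pd.read_csv(StringIO(temp),
                 sep=r"\s+",                     # whitespace separator
                 index_col=0,                    # use the merged date column as a DatetimeIndex
                 date_parser=parser,             # function for converting dates
                 parse_dates=[[0, 1, 2, 3, 4]],  # merge the first five columns into one datetime
                 header=None)                    # the file has no header row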

The column names have to be set afterwards, because passing the names parameter raises the following error:

NotImplementedError: file structure not yet supported

df.columns = ['wind_speed(kts)', 'wind_direction(dec)', 'temperature(C)', 'relative_humidity(%)', 'pressure'] 
#remove index name
df.index.name = None 

print (df)
                     wind_speed(kts)  wind_direction(dec)  temperature(C)  \
2016-01-01 00:00:00               19                  348            2.05   
2016-01-01 00:01:00               19                  351            2.05   
2016-01-01 00:02:00               18                    0            2.05   
2016-01-01 00:03:00               17                  353            2.05   
2016-01-01 00:04:00               19                  346            2.02   
2016-01-01 00:05:00               20                  345            2.00   
2016-01-01 00:06:00               22                  348            1.97   

                     relative_humidity(%)  pressure  
2016-01-01 00:00:00                     7     618.4  
2016-01-01 00:01:00                     7     618.4  
2016-01-01 00:02:00                     7     618.4  
2016-01-01 00:03:00                     7     618.4  
2016-01-01 00:04:00                     7     618.4  
2016-01-01 00:05:00                     7     618.4  
2016-01-01 00:06:00                     7     618.4  

print (df.dtypes)
wind_speed(kts)           int64
wind_direction(dec)       int64
temperature(C)          float64
relative_humidity(%)      int64
pressure                float64
dtype: object

print (df.index)
DatetimeIndex(['2016-01-01 00:00:00', '2016-01-01 00:01:00',
               '2016-01-01 00:02:00', '2016-01-01 00:03:00',
               '2016-01-01 00:04:00', '2016-01-01 00:05:00',
               '2016-01-01 00:06:00'],
              dtype='datetime64[ns]', freq=None)

Here is a slightly faster version:

In [86]: df = (pd.read_csv(fn, sep=r'\s+', header=None,
    ...:                   parse_dates={'Date':[0,1,2,3,4]},
    ...:                   date_parser=lambda x: pd.to_datetime(x, format='%Y %m %d %H %M'))
    ...:         .set_index('Date'))
    ...:

In [87]: df
Out[87]:
                      5    6     7  8      9
Date
2016-01-01 00:00:00  19  348  2.05  7  618.4
2016-01-01 00:01:00  19  351  2.05  7  618.4
2016-01-01 00:02:00  18    0  2.05  7  618.4
2016-01-01 00:03:00  17  353  2.05  7  618.4
2016-01-01 00:04:00  19  346  2.02  7  618.4
2016-01-01 00:05:00  20  345  2.00  7  618.4
2016-01-01 00:06:00  22  348  1.97  7  618.4

In [88]: cols_str = 'wind_speed(kts) wind_direction(dec) temperature(C) relative_humidity(%) pressure'
    ...: cols = cols_str.split()
    ...:

In [89]: cols
Out[89]:
['wind_speed(kts)',
 'wind_direction(dec)',
 'temperature(C)',
 'relative_humidity(%)',
 'pressure']

In [90]: df.columns = cols

In [91]: df
Out[91]:
                     wind_speed(kts)  wind_direction(dec)  temperature(C)  relative_humidity(%)  pressure
Date
2016-01-01 00:00:00               19                  348            2.05                     7     618.4
2016-01-01 00:01:00               19                  351            2.05                     7     618.4
2016-01-01 00:02:00               18                    0            2.05                     7     618.4
2016-01-01 00:03:00               17                  353            2.05                     7     618.4
2016-01-01 00:04:00               19                  346            2.02                     7     618.4
2016-01-01 00:05:00               20                  345            2.00                     7     618.4
2016-01-01 00:06:00               22                  348            1.97                     7     618.4
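Both answers above rely on `date_parser` and on merging columns through `parse_dates`. In pandas 2.x, `date_parser` is deprecated in favour of `date_format`, and merging several columns via `parse_dates` is deprecated as well. A minimal sketch that avoids both, assuming the same whitespace-separated layout, reads all ten fields as ordinary columns and then builds the index with `pd.to_datetime`, which can assemble timestamps from a DataFrame whose columns are named `year`, `month`, `day`, `hour` and `minute`. The short names `year` through `minute` are an assumption chosen for that reason. Because nothing is merged during parsing, `names=` can be passed directly instead of setting `df.columns` afterwards.

```python
from io import StringIO

import pandas as pd

# three sample rows from the question; replace StringIO(temp) with the filename
temp = """2016 01 01 00 00 19 348 2.05 7 618.4
2016 01 01 00 01 19 351 2.05 7 618.4
2016 01 01 00 02 18 0 2.05 7 618.4"""

# year..minute are assumed names, chosen because pd.to_datetime recognizes them
date_cols = ['year', 'month', 'day', 'hour', 'minute']
value_cols = ['wind_speed(kts)', 'wind_direction(dec)', 'temperature(C)',
              'relative_humidity(%)', 'pressure']

# read all ten fields as plain columns; names= is accepted since nothing is merged
df = pd.read_csv(StringIO(temp), sep=r'\s+', header=None,
                 names=date_cols + value_cols)

# assemble one timestamp per row from the date columns, then drop those columns
df.index = pd.DatetimeIndex(pd.to_datetime(df[date_cols]))
df = df.drop(columns=date_cols)

print(df)
```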

Content provided by Stack Overflow.