I have pipe-separated values like this:
https|clients4.google.com|application/octet-stream|2296|
https|clients4.google.com|text/html; charset=utf-8|0|
....
....
https|clients4.google.com|application/octet-stream|2291|
I need to create a Pandas DataFrame from this data and give each column a name.
Here you go:
>>> import pandas as pd
>>> pd.read_csv('data.csv', sep='|', index_col=False,
...             names=['protocol', 'server', 'type', 'value'])
  protocol               server                      type  value
0    https  clients4.google.com  application/octet-stream   2296
1    https  clients4.google.com  text/html; charset=utf-8      0
2    https  clients4.google.com  application/octet-stream   2291
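If you want to try this without creating data.csv first, here is a minimal self-contained sketch. It assumes the three sample rows from the question, and uses an in-memory buffer (the variable names raw and names are mine) in place of the file. As far as I know, index_col=False is what stops pandas from treating the first field as the index when each line ends with a trailing delimiter.

```python
from io import StringIO

import pandas as pd

# Sample rows copied from the question; each line ends with a trailing '|'.
raw = (
    "https|clients4.google.com|application/octet-stream|2296|\n"
    "https|clients4.google.com|text/html; charset=utf-8|0|\n"
    "https|clients4.google.com|application/octet-stream|2291|\n"
)

names = ["protocol", "server", "type", "value"]

# StringIO(raw) stands in for 'data.csv' here (an assumption for the demo).
# index_col=False tells pandas not to use the first column as the index.
df = pd.read_csv(StringIO(raw), sep="|", index_col=False, names=names)

print(df)
```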
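If the data is already in a string, use StringIO to turn it into a file-like object, which can then be read as CSV. Also, since the data does not appear to have a header row, you can pass header=None so that Pandas does not read the first row of data as the header. You can also use a ready-made method, add_prefix(), to add a prefix to the column names so the column labels look more like labels. The trailing | on each line produces an extra column that is entirely NaN; dropna(how='all', axis=1) removes it.
data = """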
https|clients4.google.com|application/octet-stream|2296|
https|clients4.google.com|text/html; charset=utf-8|0|
https|clients4.google.com|application/octet-stream|2291|
"""
import pandas as pd
from io import StringIO
sio = StringIO(data)
df = pd.read_csv(sio, sep='|', header=None).add_prefix('col_').dropna(how='all', axis=1)
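If you want real column names rather than col_0, col_1, and so on, the two approaches can be combined. The following is only a sketch under my own assumptions: the column names come from the other answer, "trailing" is a made-up placeholder name for the empty field after the last |, and usecols is used to discard that placeholder instead of dropna.

```python
from io import StringIO

import pandas as pd

# Sample rows copied from the question; each line ends with a trailing '|'.
raw = (
    "https|clients4.google.com|application/octet-stream|2296|\n"
    "https|clients4.google.com|text/html; charset=utf-8|0|\n"
    "https|clients4.google.com|application/octet-stream|2291|\n"
)

# Five names for five fields; "trailing" is a hypothetical placeholder
# for the empty field produced by the final '|'.
names = ["protocol", "server", "type", "value", "trailing"]

df = pd.read_csv(
    StringIO(raw),
    sep="|",
    header=None,         # the data has no header row
    names=names,
    usecols=names[:-1],  # keep everything except the placeholder column
)

print(df)
```

Compared with dropna(how='all', axis=1), this only ever discards the last field, so a genuine column that happens to be entirely empty would be kept.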