将多个值读入pandas DataFrame

3

我有一个文本文件,其中的数据格式如下:

{"date":"Jan  6"; "time":"07:00:01"; "ip":"178.41.163.99"; "user":"null"; "country":"Slovakia"; "city":"Miloslavov"; "lat":48.1059; "lon":17.3}
{"date":"Jan  6"; "time":"07:05:26"; "ip":"37.123.163.124"; "user":"postgres"; "country":"Sweden"; "city":"Gothenburg"; "lat":57.7072; "lon":11.9668}
{"date":"Jan  6"; "time":"07:05:26"; "ip":"37.123.163.124"; "user":"null"; "country":"Sweden"; "city":"Gothenburg"; "lat":57.7072; "lon":11.9668}

我需要将它读入pandas DataFrame中,使用键作为列名,值作为项。这是我用来读取数据的代码:
columns = ['date', 'time', 'ip', 'user', 'country', 'city', 'lat', 'lon']
df = pd.read_csv("log.txt", sep=';', header=None, names=columns)

有点沮丧,因为我所做的只有这个:
               date                time  ...             lat              lon
0  {"date":"Jan  6"   "time":"07:00:01"  ...   "lat":48.1059      "lon":17.3}
1  {"date":"Jan  6"   "time":"07:05:26"  ...   "lat":57.7072   "lon":11.9668}
2  {"date":"Jan  6"   "time":"07:05:26"  ...   "lat":57.7072   "lon":11.9668}

我已经从头到尾阅读了文档,但仍然无法达到所需的结果,就像下面这样:

     date       time  ...       lat       lon
0  Jan  6   07:00:01  ...   48.1059      17.3
1  Jan  6   07:05:26  ...   57.7072   11.9668
2  Jan  6   07:05:26  ...   57.7072   11.9668

这是否有可能呢?任何建议都将不胜感激。谢谢。

1个回答

2

如果像看起来的那样,您的字符串值中没有任何 ;,您可以使用字符串替换将其转换为有效的(按行分隔)json:

最初的回答

In [11]: text
Out[11]: '{"date":"Jan  6"; "time":"07:00:01"; "ip":"178.41.163.99"; "user":"null"; "country":"Slovakia"; "city":"Miloslavov"; "lat":48.1059; "lon":17.3}\n{"date":"Jan  6"; "time":"07:05:26"; "ip":"37.123.163.124"; "user":"postgres"; "country":"Sweden"; "city":"Gothenburg"; "lat":57.7072; "lon":11.9668}\n{"date":"Jan  6"; "time":"07:05:26"; "ip":"37.123.163.124"; "user":"null"; "country":"Sweden"; "city":"Gothenburg"; "lat":57.7072; "lon":11.9668}'

In [12]: pd.read_json(text.replace(";", ","), lines=True)
Out[12]:
         city   country    date              ip      lat      lon      time      user
0  Miloslavov  Slovakia  Jan  6   178.41.163.99  48.1059  17.3000  07:00:01      null
1  Gothenburg    Sweden  Jan  6  37.123.163.124  57.7072  11.9668  07:05:26  postgres
2  Gothenburg    Sweden  Jan  6  37.123.163.124  57.7072  11.9668  07:05:26      null

1
值得注意的是,您应该向交付数据的人投诉,并要求他们确保它是有效的JSON。这种分号分隔的准JSON格式很奇怪,他们不应该生成它。 - Andy Hayden
1
哇!我之前尝试过read_json,但是因为没有使用lines=True,所以它对我没起作用。现在它像魔法一样奏效了,因为我自己生成源文件并将;改为, :) 问题是一些城市名字包含,,而read_json无法正确解析它们...不管怎样,非常感谢! - Denis Rasulev

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接