将 JSON 列映射到 Pandas 数据帧列

5
我有以下代码,它可以将JSON输入转换为Pandas dataframe。但是,由于JSON没有一致的模式,所以所有内容都不对齐。(如果一个字段在一个条目中不存在,它会使所有其他字段向左移动)是否有办法明确地定义下面的内容?请注意保留HTML标记。
df.field1 = json.field1

如果我可以通过它们的名称进行定义,那么我可以很好地做到这一点:)
谢谢
output = subprocess.check_output(command, shell=True)

# output of subprocess will be bytes, converting to string.
if isinstance(output, bytes):
    output = output.decode()

output = json.loads(output)
df = pd.DataFrame(output['apps']['app'])
df = df.loc[df['startedTime'] > starttime]
df.to_csv('yarn_output.csv')

示例输入 JSON

{"apps":{"app":[{"id":"application_1589431105417_21534","user":"udsldr","name":"HIVE-61a4ee14-1d26-4c7b-bf0d-1cc2a990557d","queue":"udsldr","state":"FINISHED","finalStatus":"SUCCEEDED","progress":100.0,"trackingUI":"History","trackingUrl":"http://uds-far-mn4.dab.02.net:8088/proxy/application_1589431105417_21534/","diagnostics":"Session stats:submittedDAGs=0, successfulDAGs=0, failedDAGs=0, killedDAGs=0\n","clusterId":1589431105417,"applicationType":"TEZ","applicationTags":"","priority":0,"startedTime":1590294649069,"finishedTime":1590294666011,"elapsedTime":16942,"amContainerLogs":"http://uds-far-dn150.dab.02.net:8042/node/containerlogs/container_e66_1589431105417_21534_01_000001/udsldr","amHostHttpAddress":"uds-far-dn150.dab.02.net:8042","allocatedMB":-1,"allocatedVCores":-1,"runningContainers":-1,"memorySeconds":144531,"vcoreSeconds":17,"queueUsagePercentage":0.0,"clusterUsagePercentage":0.0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0,"logAggregationStatus":"SUCCEEDED","unmanagedApplication":false,"amNodeLabelExpression":""},{"id":"application_1589431105417_21535","user":"nifildr","name":"HIVE-850812d7-9d22-4be8-a225-7b341f6ea980","queue":"default","state":"FINISHED","finalStatus":"SUCCEEDED","progress":100.0,"trackingUI":"History","trackingUrl":"http://uds-far-mn4.dab.02.net:8088/proxy/application_1589431105417_21535/","diagnostics":"Session stats:submittedDAGs=0, successfulDAGs=1, failedDAGs=0, killedDAGs=0\n","clusterId":1589431105417,"applicationType":"TEZ","applicationTags":"","priority":0,"startedTime":1590294664397,"finishedTime":1590294801090,"elapsedTime":136693,"amContainerLogs":"http://uds-far-dn129.dab.02.net:8042/node/containerlogs/container_e66_1589431105417_21535_01_000001/nifildr","amHostHttpAddress":"uds-far-dn129.dab.02.net:8042","allocatedMB":-1,"allocatedVCores":-1,"runningContainers":-1,"memorySeconds":18279340,"vcoreSeconds":4248,"queueUsagePercentage":0.0,"clusterUsagePercentage":0.0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0,"logAggregationStatus":"TIME_OUT","unmanagedApplication":false,"amNodeLabelExpression":""},{"id":"application_1589431105417_21532","user":"udsldr","name":"HIVE-73e0c359-32a5-4334-89da-4a8ae2bb1037","queue":"udsldr","state":"FINISHED","finalStatus":"SUCCEEDED","progress":100.0,"trackingUI":"History","trackingUrl":"http://uds-far-mn4.dab.02.net:8088/proxy/application_1589431105417_21532/","diagnostics":"Session stats:submittedDAGs=0, successfulDAGs=0, failedDAGs=0, killedDAGs=0\n","clusterId":1589431105417,"applicationType":"TEZ","applicationTags":"","priority":0,"startedTime":1590294622244,"finishedTime":1590294643808,"elapsedTime":21564,"amContainerLogs":"http://uds-far-dn35.dab.02.net:8042/node/containerlogs/container_e66_1589431105417_21532_01_000001/udsldr","amHostHttpAddress":"uds-far-dn35.dab.02.net:8042","allocatedMB":-1,"allocatedVCores":-1,"runningContainers":-1,"memorySeconds":182247,"vcoreSeconds":22,"queueUsagePercentage":0.0,"clusterUsagePercentage":0.0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0,"logAggregationStatus":"SUCCEEDED","unmanagedApplication":false,"amNodeLabelExpression":""},{"id":"application_1589431105417_21533","user":"udssupport","name":"tcs.uds.webstats","queue":"udssystem","state":"FINISHED","finalStatus":"SUCCEEDED","progress":100.0,"trackingUI":"History","trackingUrl":"http://uds-far-mn4.dab.02.net:8088/proxy/application_1589431105417_21533/","diagnostics":"","clusterId":1589431105417,"applicationType":"SPARK","applicationTags":"","priority":0,"startedTime":1590294631138,"finishedTime":1590295670552,"elapsedTime":1039414,"amContainerLogs":"http://uds-far-dn148.dab.02.net:8042/node/containerlogs/container_e66_1589431105417_21533_01_000001/udssupport","amHostHttpAddress":"uds-far-dn148.dab.02.net:8042","allocatedMB":-1,"allocatedVCores":-1,"runningContainers":-1,"memorySeconds":4762538052,"vcoreSeconds":775756,"queueUsagePercentage":0.0,"clusterUsagePercentage":0.0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0,"logAggregationStatus":"TIME_OUT","unmanagedApplication":false,"amNodeLabelExpression":""},{"id":"application_1589431105417_21530","user":"nifildr","name":"HIVE-e9a64e12-11f0-4ba8-b069-3be0ce561137","queue":"default","state":"FINISHED","finalStatus":"SUCCEEDED","progress":100.0,"trackingUI":"History","trackingUrl":"http://uds-far-mn4.dab.02.net:8088/proxy/application_1589431105417_21530/","diagnostics":"Session stats:submittedDAGs=0, successfulDAGs=3, failedDAGs=0, killedDAGs=0\n","clusterId":1589431105417,"applicationType":"TEZ","applicationTags":"","priority":0,"startedTime":1590294606965,"finishedTime":1590295033193,"elapsedTime":426228,"amContainerLogs":"http://uds-far-dn75.dab.02.net:8042/node/containerlogs/container_e66_1589431105417_21530_01_000001/nifildr","amHostHttpAddress":"uds-far-dn75.dab.02.net:8042","allocatedMB":-1,"allocatedVCores":-1,"runningContainers":-1,"memorySeconds":114397555,"vcoreSeconds":27175,"queueUsagePercentage":0.0,"clusterUsagePercentage":0.0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0,"logAggregationStatus":"TIME_OUT","unmanagedApplication":false,"amNodeLabelExpression":""},{"id":"application_1589431105417_21531","user":"nifi","name":"HIVE-a063ddd1-5bf8-47b4-8ce3-8497c93b79a5","queue":"default","state":"FINISHED","finalStatus":"SUCCEEDED","progress":100.0,"trackingUI":"History","trackingUrl":"http://uds-far-mn4.dab.02.net:8088/proxy/application_1589431105417_21531/","diagnostics":"Session stats:submittedDAGs=0, successfulDAGs=0, failedDAGs=0, killedDAGs=0\n","clusterId":1589431105417,"applicationType":"TEZ","applicationTags":"","priority":0,"startedTime":1590294613578,"finishedTime":1590294655173,"elapsedTime":41595,"amContainerLogs":"http://uds-far-dn56.dab.02.net:8042/node/containerlogs/container_e66_1589431105417_21531_01_000001/nifi","amHostHttpAddress":"uds-far-dn56.dab.02.net:8042","allocatedMB":-1,"allocatedVCores":-1,"runningContainers":-1,"memorySeconds":345792,"vcoreSeconds":42,"queueUsagePercentage":0.0,"clusterUsagePercentage":0.0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0,"logAggregationStatus":"SUCCEEDED","unmanagedApplication":false,"amNodeLabelExpression":""},{"id":"application_1589431105417_21528","user":"udsldr","name":"com.cardinality.LocationDB","queue":"udsldr","state":"FINISHED","finalStatus":"SUCCEEDED","progress":100.0,"trackingUI":"History","trackingUrl":"http://uds-far-mn4.dab.02.net:8088/proxy/application_1589431105417_21528/","diagnostics":"","clusterId":1589431105417,"applicationType":"SPARK","applicationTags":"5ec9f8480000f1697e683969","priority":0,"startedTime":1590294605875,"finishedTime":1590294782281,"elapsedTime":176406,"amContainerLogs":"http://uds-far-dn167.dab.02.net:8042/node/containerlogs/container_e66_1589431105417_21528_01_000001/udsldr","amHostHttpAddress":"uds-far-dn167.dab.02.net:8042","allocatedMB":-1,"allocatedVCores":-1,"runningContainers":-1,"memorySeconds":43389139,"vcoreSeconds":5239,"queueUsagePercentage":0.0,"clusterUsagePercentage":0.0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0,"logAggregationStatus":"TIME_OUT","unmanagedApplication":false,"amNodeLabelExpression":""},{"id":"application_1589431105417_21529","user":"keenek1","name":"Clean DPI Report","queue":"default","state":"FINISHED","finalStatus":"SUCCEEDED","progress":100.0,"trackingUI":"History","trackingUrl":"http://uds-far-mn4.dab.02.net:8088/proxy/application_1589431105417_21529/","diagnostics":"","clusterId":1589431105417,"applicationType":"SPARK","applicationTags":"","priority":0,"startedTime":1590294607111,"finishedTime":1590295032105,"elapsedTime":424994,"amContainerLogs":"http://uds-far-dn62.dab.02.net:8042/node/containerlogs/container_e66_1589431105417_21529_01_000001/keenek1","amHostHttpAddress":"uds-far-dn62.dab.02.net:8042","allocatedMB":-1,"allocatedVCores":-1,"runningContainers":-1,"memorySeconds":2114077299,"vcoreSeconds":344079,"queueUsagePercentage":0.0,"clusterUsagePercentage":0.0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0,"logAggregationStatus":"TIME_OUT","unmanagedApplication":false,"amNodeLabelExpression":""},{"id":"application_1589431105417_21542","user":"murugaa1","name":"HIVE-a1a5aadb-254c-4289-ad22-e9c7ce5e9814","queue":"default","state":"FINISHED","finalStatus":"SUCCEEDED","progress":100.0,"trackingUI":"History","trackingUrl":"http://uds-far-mn4.dab.02.net:8088/proxy/application_1589431105417_21542/","diagnostics":"Session stats:submittedDAGs=0, successfulDAGs=1, failedDAGs=0, killedDAGs=0\n","clusterId":1589431105417,"applicationType":"TEZ","applicationTags":"","priority":0,"startedTime":1590295275713,"finishedTime":1590295297948,"elapsedTime":22235,"amContainerLogs":"http://uds-far-dn46.dab.02.net:8042/node/containerlogs/container_e66_1589431105417_21542_01_000001/murugaa1","amHostHttpAddress":"uds-far-dn46.dab.02.net:8042","allocatedMB":-1,"allocatedVCores":-1,"runningContainers":-1,"memorySeconds":999465,"vcoreSeconds":217,"queueUsagePercentage":0.0,"clusterUsagePercentage":0.0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0,"logAggregationStatus":"SUCCEEDED","unmanagedApplication":false,"amNodeLabelExpression":""},{"id":"application_1589431105417_21543","user":"murugaa1","name":"HIVE-cdc8a5da-f880-4f8e-9baf-b306095b9efb","queue":"default","state":"FINISHED","finalStatus":"SUCCEEDED","progress":100.0,"trackingUI":"History","trackingUrl":"http://uds-far-mn4.dab.02.net:8088/proxy/application_1589431105417_21543/","diagnostics":"Session stats:submittedDAGs=0, successfulDAGs=1, failedDAGs=0, killedDAGs=0\n","clusterId":1589431105417,"applicationType":"TEZ","applicationTags":"","priority":0,"startedTime":1590295277611,"finishedTime":1590295301515,"elapsedTime":23904,"amContainerLogs":"http://uds-far-dn41.dab.02.net:8042/node/containerlogs/container_e66_1589431105417_21543_01_000001/murugaa1","amHostHttpAddress":"uds-far-dn41.dab.02.net:8042","allocatedMB":-1,"allocatedVCores":-1,"runningContainers":-1,"memorySeconds":1077860,"vcoreSeconds":228,"queueUsagePercentage":0.0,"clusterUsagePercentage":0.0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0,"logAggregationStatus":"SUCCEEDED","unmanagedApplication":false,"amNodeLabelExpression":""}]}}

CSV 输出:

allocatedMB|applicationType|diagnostics|finalStatus|finishedTime|memorySeconds|queue|startedTime|user|vcoreSeconds
-1|TEZ|"Session stats:submittedDAGs=0, successfulDAGs=0, failedDAGs=0, killedDAGs=0
"|SUCCEEDED|1590294666011|144531|udsldr|1590294649069|udsldr|17
-1|TEZ|"Session stats:submittedDAGs=0, successfulDAGs=1, failedDAGs=0, killedDAGs=0
"|SUCCEEDED|1590294801090|18279340|default|1590294664397|nifildr|4248
-1|TEZ|"Session stats:submittedDAGs=0, successfulDAGs=0, failedDAGs=0, killedDAGs=0

请您提供一下output的样例以及期望的结果? - Quang Hoang
谢谢!我已经更新了。期望的结果只是一个表格,其中包含JSON中的所有字段。如果某个条目的字段未填充,则可以为空。 - kikee1222
你只是想要重命名最终数据框中的列吗? - NYC Coder
如果JSON条目不包含例如FinalStatus字段,则之后的所有字段当前都未对齐。我需要它们仍然在正确的列中,并且不受灵活的JSON模式的影响。 - kikee1222
你能解释一下什么是未对齐吗?如果finalStatus不存在,它将为NaN。不确定我是否理解了你的问题。 - NYC Coder
显示剩余4条评论
1个回答

2

尝试在转换为csv时使用不同的分隔符,我认为错位是由于逗号造成的:

df.to_csv('123.csv', sep='|', index=['id'])

谢谢你的帮助!我刚刚添加了上面的内容。您可以在应用程序类型中看到,如果没有分配的MB,我们会成功而不是使用TEZ、Spark或Empty。 - kikee1222
啊,不用理我,我看到它换行了 :) - kikee1222
当我将它转储到数据库表中时,它全部错位了,所以看起来就像在CSV中进行了换行,这会导致数据库表出现错误。因为CSV的第二行显示为DB表的第二行,但实际上应该是第一行的延续。 - kikee1222
好的,只需在生成的 CSV 上使用 read_csv() 确保正确创建了数据框。 - NYC Coder
感谢所有的帮助。我仍然不知道为什么它没有正确地显示在数据库表中,但这似乎是明天要解决的问题 :) - kikee1222
显示剩余3条评论

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接