How can I merge JSON files using jq?

Question

json jq

3

I am using the jq tool (jq-json-processor) in a shell script to parse JSON.

I have two JSON files and want to merge them into a single file.

Here are the contents of the files:

file1:

{"tag_id" : ["t1"], "inst_id" : "s1"}
{"tag_id" : ["t1"], "inst_id" : "s2"}

file2:

{"tag_id" : ["t2"], "inst_id" : "s1"}
{"tag_id" : ["t2"], "inst_id" : "s2"}
{"tag_id" : ["t2"], "inst_id" : "s3"}

Expected result:

{"tag_id" : ["t1","t2"], "inst_id" : "s1"}
{"tag_id" : ["t1","t2"], "inst_id" : "s2"}
{"tag_id" : ["t2"], "inst_id" : "s3"}

- 崇德方

3 Answers

0

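Here is a join-like approach. It assumes your jq has INDEX/2 and supports the --slurpfile command-line option. If your jq lacks these, now would be a good time to upgrade, though there are simple workarounds.

Invocation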

jq -n --slurpfile file1 file1.json -f join.jq file2.json

join.jq

def join(s2; joinField; field):
  INDEX(.[]; joinField) 
  | reduce s2 as $x (.;
      ($x|joinField) as $key
      | if .[$key] then (.[$key]|field) += ($x|field)
        else .[$key] = $x
      end )
  | .[]
  ;

$file1 | join(inputs; .inst_id; .tag_id)
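
For a jq that lacks INDEX/2, one possible workaround is to define it locally. The sketch below uses a definition that should behave like the builtin and leaves the rest of join.jq unchanged. The file name join_compat.jq and the temporary directory are assumptions made for illustration, and --slurpfile is still required.

```shell
# Hypothetical scratch directory holding the sample data from the question.
workdir=$(mktemp -d)
printf '%s\n' '{"tag_id":["t1"],"inst_id":"s1"}' \
              '{"tag_id":["t1"],"inst_id":"s2"}' > "$workdir/file1.json"
printf '%s\n' '{"tag_id":["t2"],"inst_id":"s1"}' \
              '{"tag_id":["t2"],"inst_id":"s2"}' \
              '{"tag_id":["t2"],"inst_id":"s3"}' > "$workdir/file2.json"

cat > "$workdir/join_compat.jq" <<'EOF'
# Local fallback for jq releases without INDEX/2:
# build an object that maps (idx_expr | tostring) to each row of the stream.
def INDEX(stream; idx_expr):
  reduce stream as $row ({}; .[$row | idx_expr | tostring] |= $row);

def join(s2; joinField; field):
  INDEX(.[]; joinField)
  | reduce s2 as $x (.;
      ($x|joinField) as $key
      | if .[$key] then (.[$key]|field) += ($x|field)
        else .[$key] = $x
        end)
  | .[];

$file1 | join(inputs; .inst_id; .tag_id)
EOF

# Same invocation as above, with -c for compact, one-object-per-line output.
result=$(jq -c -n --slurpfile file1 "$workdir/file1.json" \
            -f "$workdir/join_compat.jq" "$workdir/file2.json")
printf '%s\n' "$result"
rm -rf "$workdir"
```

With the sample files this should print the three merged objects from the question, one per line.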

- peak

0

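The following approach is very efficient in that:

(a) it takes advantage of the fact that file1.json and file2.json are streams of objects, thus avoiding the memory required to store these objects as arrays;

(b) it avoids sorting (as would be required, for example, by group_by).

The key concept is key-wise addition of objects. To perform key-wise addition of the objects in a stream, we define the following generic function: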

# s is assumed to be a stream of mutually
# compatible objects in the sense that, given
# any key of any object, the values at that key
# must be compatible w.r.t. `add`
def keywise_add(s):
  reduce s as $x ({};
     reduce ($x|keys_unsorted)[] as $k (.; 
       .[$k] += $x[$k]));
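
To see what key-wise addition does in isolation, here is a small sketch that applies keywise_add to three made-up objects (the keys a, n and b are purely illustrative). Arrays under the same key are concatenated, numbers are summed, and a key seen only once is copied as-is.

```shell
sum=$(jq -c -n '
  def keywise_add(s):
    reduce s as $x ({};
      reduce ($x|keys_unsorted)[] as $k (.;
        .[$k] += $x[$k]));

  # A stream of three illustrative objects, combined key by key.
  keywise_add({"a":[1],"n":1}, {"a":[2],"n":10}, {"b":["x"]})
')
printf '%s\n' "$sum"
# → {"a":[1,2],"n":11,"b":["x"]}
```

Because the accumulator starts as {} and null + x is x in jq, the first value seen for each key is adopted unchanged.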

The task can now be accomplished as follows:

keywise_add(inputs | {(.inst_id): .tag_id} )
| keys_unsorted[] as $k
| {tag_id: .[$k], inst_id: $k}

Invocation

With the above program in add.jq, the invocation:

jq -c -n -f add.jq file1.json file2.json

produces:

{"tag_id":["t1","t2"],"inst_id":"s1"}
{"tag_id":["t1","t2"],"inst_id":"s2"}
{"tag_id":["t2"],"inst_id":"s3"}

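Caveat

The above assumes that inst_id is string-valued. If that is not the case, the approach can still be used as long as there are never any collisions among the values of inst_id | tostring, which holds, for example, if inst_id is always numeric.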
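
As a hedged illustration of this caveat, the sketch below assumes inst_id is always numeric (the sample data is made up). The key is converted with tostring before the key-wise addition and converted back with tonumber on output. The tonumber step is an addition for illustration, not part of the program above.

```shell
merged=$(printf '%s\n' '{"tag_id":["t1"],"inst_id":1}' \
                       '{"tag_id":["t2"],"inst_id":1}' \
                       '{"tag_id":["t2"],"inst_id":2}' |
jq -c -n '
  def keywise_add(s):
    reduce s as $x ({};
      reduce ($x|keys_unsorted)[] as $k (.;
        .[$k] += $x[$k]));

  # Object keys must be strings, so stringify the numeric id first.
  keywise_add(inputs | {(.inst_id|tostring): .tag_id})
  | keys_unsorted[] as $k
  | {tag_id: .[$k], inst_id: ($k|tonumber)}
')
printf '%s\n' "$merged"
```

If inst_id could be either the number 1 or the string "1", both would map to the same key and their tag_id arrays would be merged, which is the kind of collision the caveat warns about.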

- peak

Content provided by Stack Overflow.

- peak · Accepted Answer

One approach is to use group_by:

jq -n --slurpfile file1 file1.json --slurpfile file2 file2.json -f merge.jq

where merge.jq contains:

def sigma(f): reduce f as $x (null; . + $x);

$file1 + $file2
| group_by(.inst_id)[]
| {tag_id: sigma(.[].tag_id), inst_id: .[0].inst_id }
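
A variant of the same idea, sketched under the assumption that both files fit comfortably in memory: the -s (--slurp) option reads all objects from both files into a single array, so --slurpfile and a separate program file are not needed, and the builtin add stands in for sigma. The temporary directory is illustrative only.

```shell
# Hypothetical scratch directory holding the sample data from the question.
workdir=$(mktemp -d)
printf '%s\n' '{"tag_id":["t1"],"inst_id":"s1"}' \
              '{"tag_id":["t1"],"inst_id":"s2"}' > "$workdir/file1.json"
printf '%s\n' '{"tag_id":["t2"],"inst_id":"s1"}' \
              '{"tag_id":["t2"],"inst_id":"s2"}' \
              '{"tag_id":["t2"],"inst_id":"s3"}' > "$workdir/file2.json"

# Slurp every object from both files into one array, then group by inst_id
# and concatenate the tag_id arrays within each group.
grouped=$(jq -c -s '
  group_by(.inst_id)[]
  | {tag_id: (map(.tag_id) | add), inst_id: .[0].inst_id}
' "$workdir/file1.json" "$workdir/file2.json")
printf '%s\n' "$grouped"
rm -rf "$workdir"
```

Since group_by sorts by its key, the output is ordered by inst_id rather than by first appearance in the input.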