I have the following text:
Cluster 7: {4, 15, 21, 28, 33, 35, 43, 47, 53, 57, 59, 66,
69, 70, 74, 86, 87, 88, 90, 114, 136, 148, 201,
202, 212, 220, 227, 250, 252, 253, 259, 262, 267,
270, 282, 296, 318, 319, 323, 326, 341}
Cluster 8: {9, 10, 11, 20, 39, 55, 79, 101, 108, 143, 149,
221, 279, 284, 285, 286, 287, 327, 333, 334, 335,
336}
Cluster 9: {3, 64, 83, 93, 150, 153, 264, 269, 320, 321, 322}
Cluster 10: {94, 123, 147}
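I'd like to extract the numbers in each cluster for further cluster analysis. I tried using a regular expression, but it didn't work well.
This is what I tried: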
regex="(Cluster \d+): \{((\d+)[,\}][\n ]+)+|(?:(\d+),[\n ])"
but the groups don't capture what I expect.
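A likely reason the groups come out wrong: in Python's `re` module, a capturing group inside a repeated quantifier such as `(...)+` only keeps the text from its last iteration, so one group can never hold all the numbers. The minimal sketch below shows this on the last cluster, using a simplified version of the pattern above (the `text` variable is hypothetical, standing in for the input):

```python
import re

# Hypothetical input: just the last cluster from the text above.
text = "Cluster 10: {94, 123, 147}\n"

# (\d+) sits inside a repeated group, so it is overwritten on every
# iteration and only the final number survives.
m = re.search(r"(Cluster \d+): \{(?:(\d+)[,}]\s*)+", text)
print(m.group(1), m.group(2))  # Cluster 10 147
```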
The output I want is:
["Cluster 7", '4', '15', '21', '28', '33', '35', '43', '47', '53', '57', '59', '66', '69', '70', '74', '86', '87', '88', '90', '114', '136', '148', '201', '202', '212', '220', '227', '250', '252', '253', '259', '262', '267', '270', '282', '296', '318', '319', '323', '326', '341', "Cluster 8", '9', '10', '11', '20', '39', '55', '79', '101', '108', '143', '149', '221', '279', '284', '285', '286', '287', '327', '333', '334', '335', '336', "Cluster 9", '3', '64', '83', '93', '150', '153', '264', '269', '320', '321', '322', "Cluster 10", "94", "123", "147"]
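One way to get this flat list is to skip capturing groups entirely and use `re.findall` with an alternation: `Cluster \d+` is tried first, so a label is consumed as one match, and any other run of digits matches `\d+`. This is a minimal sketch assuming the whole text is available as a Python string (the name `text` is an assumption):

```python
import re

# Assumed: the input text loaded into a string named `text`.
text = """Cluster 7: {4, 15, 21, 28, 33, 35, 43, 47, 53, 57, 59, 66,
69, 70, 74, 86, 87, 88, 90, 114, 136, 148, 201,
202, 212, 220, 227, 250, 252, 253, 259, 262, 267,
270, 282, 296, 318, 319, 323, 326, 341}
Cluster 8: {9, 10, 11, 20, 39, 55, 79, 101, 108, 143, 149,
221, 279, 284, 285, 286, 287, 327, 333, 334, 335,
336}
Cluster 9: {3, 64, 83, 93, 150, 153, 264, 269, 320, 321, 322}
Cluster 10: {94, 123, 147}
"""

# With no capturing groups, findall returns each full match as a string.
# The left branch wins at a "Cluster" position, so the label's own digits
# are not reported separately.
result = re.findall(r"Cluster \d+|\d+", text)
print(result)
```

Line breaks inside a cluster don't matter here, because the pattern never tries to match the separators.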
Maybe this isn't the best way to go about it.
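If the numbers are meant for further analysis, a structure keyed by cluster label may be easier to work with than a flat list. A minimal sketch, assuming the same kind of input string: one pattern captures each label and the text between its braces (`[^}]*` also spans line breaks), then a second `findall` pulls out the numbers. The names `block_re` and `clusters` are hypothetical:

```python
import re

# Assumed input: a multi-line cluster plus a single-line one.
text = """Cluster 8: {9, 10, 11, 20, 39, 55, 79, 101, 108, 143, 149,
221, 279, 284, 285, 286, 287, 327, 333, 334, 335,
336}
Cluster 10: {94, 123, 147}
"""

# Group 1: the label; group 2: everything up to the closing brace.
block_re = re.compile(r"(Cluster \d+): \{([^}]*)\}")

# Map each label to its numbers, converted to int for later analysis.
clusters = {
    m.group(1): [int(n) for n in re.findall(r"\d+", m.group(2))]
    for m in block_re.finditer(text)
}
print(clusters["Cluster 10"])  # [94, 123, 147]
```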
Thanks.