I have a set of broken-link checker results stored in a text file:
Getting links from: https://www.foo.com/
├───OK─── http://www.this.com/
├───OK─── http://www.is.com/
├─BROKEN─ http://www.broken.com/
├───OK─── http://www.set.com/
├───OK─── http://www.one.com/
5 links found. 0 excluded. 1 broken.
Getting links from: https://www.bar.com/
├───OK─── http://www.this.com/
├───OK─── http://www.is.com/
├─BROKEN─ http://www.broken.com/
3 links found. 0 excluded. 1 broken.
Getting links from: https://www.boo.com/
├───OK─── http://www.this.com/
├───OK─── http://www.is.com/
2 links found. 0 excluded. 0 broken.
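I am trying to write a script that reads the file and builds a dictionary of lists, where each root link is a key and its child links (including the summary line) are the value.
The output I am trying to achieve looks like this: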
{"Getting links from: https://www.foo.com/": ["├───OK─── http://www.this.com/", "├───OK─── http://www.is.com/", "├─BROKEN─ http://www.broken.com/", "├───OK─── http://www.set.com/", "├───OK─── http://www.one.com/", "5 links found. 0 excluded. 1 broken."],
"Getting links from: https://www.bar.com/": ["├───OK─── http://www.this.com/", "├───OK─── http://www.is.com/", "├─BROKEN─ http://www.broken.com/", "3 links found. 0 excluded. 1 broken."],
"Getting links from: https://www.boo.com/": ["├───OK─── http://www.this.com/", "├───OK─── http://www.is.com/", "2 links found. 0 excluded. 0 broken."] }
Here is what I have so far:
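result_list = []
# The report contains box-drawing characters, so read it as UTF-8.
with open('link_checker_result.txt', 'r', encoding='utf-8') as f:
    temp_list = f.readlines()
for line in temp_list:
    # readlines() keeps the trailing newline; strip it before storing.
    result_list.append(line.rstrip('\n'))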
This gives me the output:
['Getting links from: https://www.foo.com/', '├───OK─── http://www.this.com/', '├───OK─── http://www.is.com/', '├─BROKEN─ http://www.broken.com/', '├───OK─── http://www.set.com/', '├───OK─── http://www.one.com/', '5 links found. 0 excluded. 1 broken.', 'Getting links from: https://www.bar.com/', '├───OK─── http://www.this.com/', '├───OK─── http://www.is.com/', '...' ]
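I recognize that these sets share some common features, for example there is a blank line between them, or each one starts with "Getting…". Should I try to split on that before writing to the dictionary?
I am still fairly new to Python, so I admit I am not even sure I am heading in the right direction. Any help from the experts would be greatly appreciated. Thanks in advance!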
… data lines that start with ├─, and also the summary, which does _not_ start with ├─. How could this be any more structured? - ForceBru
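Following the structure described in the comment above, one possible approach is to start a new dictionary entry whenever a line begins with "Getting links from:" and append every later non-blank line to that entry. This is only a minimal sketch: the names group_by_root, header_prefix, and sample are made up for illustration, and the sample text is a shortened copy of the report from the question with a blank separator line added.

```python
def group_by_root(lines, header_prefix="Getting links from:"):
    """Group report lines under the most recent header line.

    `lines` can be any iterable of strings, e.g. an open file object.
    """
    groups = {}
    current = None  # list belonging to the header seen most recently
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank separator lines
        if line.startswith(header_prefix):
            # New root link: create its list (or reuse it if the header repeats).
            current = groups.setdefault(line, [])
        elif current is not None:
            # Child link or summary line: attach it to the current root link.
            current.append(line)
    return groups


# Hypothetical sample: two sections copied from the report in the question.
sample = """\
Getting links from: https://www.bar.com/
├───OK─── http://www.this.com/
├───OK─── http://www.is.com/
├─BROKEN─ http://www.broken.com/
3 links found. 0 excluded. 1 broken.

Getting links from: https://www.boo.com/
├───OK─── http://www.this.com/
├───OK─── http://www.is.com/
2 links found. 0 excluded. 0 broken.
"""

result = group_by_root(sample.splitlines())
for root, children in result.items():
    print(root, children)
```

To run it on the real file, the open file object can be passed in directly, for example group_by_root(f) inside a with open('link_checker_result.txt', encoding='utf-8') as f: block. Lines that appear before the first header are silently dropped in this sketch, which may or may not be what you want.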