我正在研究在字符串列表中找到共同的字符串部分。如果我们采用样本数据集
private readonly List<string> Xpath = new List<string>()
{
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(1)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(2)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(3)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(4)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(5)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(6)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(7)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(8)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(9)>H2:nth-of-type(1)"
};
根据这个,我希望找出与哪些孩子相似的数据是一组Xpath列表。
我应该能够编程来告诉:
BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV
为了达到这个目的,我做的是像这样。我使用>
将每个项分开,然后为每个原始数据集创建一个项目列表。
接着使用这个列表,找出其中唯一的项目。
private IEnumerable<T> GetCommonItems<T>(IEnumerable<T>[] lists)
{
HashSet<T> hs = new HashSet<T>(lists.First());
for (int i = 1; i < lists.Length; i++)
{
hs.IntersectWith(lists[i]);
}
return hs;
}
能够找出唯一的值并重新创建数据集。但是,如果数据集中存在类似于“Div”在两个地方的情况,并且它也在每个原始数据集中,那么这种方法将只选择一个“Div”。
因此,最终可能会得到以下内容:
BODY>MAIN:nth-of-type(1)>DIV>SECTION
但是我需要的是:
BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of- type(3)>DIV>ARTICLE>DIV>DIV>DIV