在矩阵中查找共同链接并通过共同交集进行分类

Question

在矩阵中查找共同链接并通过共同交集进行分类

rmatrixstatagraph-theoryset-intersection

4

假设我有一个距离成本矩阵，其中目的地成本和起点成本都需要低于某个阈值 - 例如，100美元 - 才能共享链接。我的难点在于通过对这些地点进行分类来实现一组公共集合：A1连接（目的地成本和起点成本均低于阈值）与A2、A3和A4相连；A2与A1和A4相连；A4与A1和A2相连。因此，A1、A2和A4将被归类为同一组，因为它们之间连接频率最高。以下是一个示例矩阵：

    A1  A2  A3  A4  A5  A6  A7
A1  0   90  90  90  100 100 100
A2  80  0   90  90  90  110 100
A3  80  110 0   90  120 110 90
A4  90  90  110 0   90  100 90
A5  110 110 110 110 0   90  80
A6  120 130 135 100 90  0   90
A7  105 110 120 90  90  90  0

我正在使用Stata编程，但我尚未将矩阵以mata的形式放置。列出字母A和数字的那一列是一个包含矩阵行名的变量，其余列用每个地点的名称命名（例如A1等）。

以下代码返回了每个地点之间链接的列表，可能因为匆忙而显得有些 "bruteforcelly"：

    clear all

    set more off

    //inputting matrix

    input A1 A2 A3 A4 A5 A6 A7
    0 90 90 90 100 100 100
    80 0 90 90 90 100 100
    80 110 0 90 120 110 90
    90 90 110 0 90 100 90
    110 110 110 110 0 90 90
    120 130 135 100 90 0 90
    105 110 120 90 90 90 0

    end

    //generate row variable

    gen locality=""

    forv i=1/7{

        replace locality="A`i'" in `i'

    }
    *

    order locality, first


    //generating who gets below the threshold of 100

    forv i=1/7{

        gen r_`i'=0

        replace r_`i'=1 if A`i'<100 & A`i'!=0

    }
    *

    //checking if both ways (origin and destiny below threshold)

    forv i=1/7{

        gen check_`i'=.

    forv j=1/7{

            local v=r_`i'[`j']

            local vv=r_`j'[`i']

            replace check_`i'=`v'+`vv' in `j'

                }

    *
        }
    *

    //creating list of links

    gen locality_x=""

    forv i=1/7{

        preserve

        local name = locality[`i']

        keep if check_`i'==2

        replace locality_x="`name'"

        keep locality locality_x

        save "C:\Users\user\Desktop\temp_`i'", replace

        restore

    }
    *

    use "C:\Users\user\Desktop\temp_1", clear

    forv i=2/7{

        append using "C:\Users\user\Desktop\temp_`i'"
    }
    *

    //now locality_x lists if A.1 has links with A.2, A.3 etc. and so on.
    //the dificulty lies in finding a common intersection between the groups.

这会返回以下列表：

locality_x  locality
A1  A2
A1  A3
A1  A4
A2  A1
A2  A4
A3  A1
A4  A1
A4  A2
A4  A7
A5  A6
A5  A7
A6  A5
A6  A7
A7  A4
A7  A5
A7  A6

我想要熟悉集合交集，但我不知道如何在Stata中做。我想要做的是能够重新编程阈值并找到共同点的东西。如果您可以提供R中的解决方案，那将不胜感激，因为我对它有一点编程经验。

以与@user2957945在下面的回答中提供的方式，在R中获取列表：

structure(c(0L, 80L, 80L, 90L, 110L, 120L, 105L, 90L, 0L, 110L, 
90L, 110L, 130L, 110L, 90L, 90L, 0L, 110L, 110L, 135L, 120L, 
90L, 90L, 90L, 0L, 110L, 100L, 90L, 100L, 90L, 120L, 90L, 0L, 
90L, 90L, 100L, 110L, 110L, 100L, 90L, 0L, 90L, 100L, 100L, 90L, 
90L, 80L, 90L, 0L), .Dim = c(7L, 7L), .Dimnames = list(c("A1", 
"A2", "A3", "A4", "A5", "A6", "A7"), c("A1", "A2", "A3", "A4", 
"A5", "A6", "A7")))

# get values less than threshold
id = m < 100 
# make sure both values are less than threshold, and dont include diagonal
m_new = (id + t(id) == 2) & m !=0 
# melt data and subset to keep TRUE values (TRUE if both less than threshold and not on diagonal)
result  = subset(reshape2::melt(m_new), value)
# reorder to match question results , if needed 
result[order(result[[1]], result[[2]]), 1:2] 

   Var1 Var2
8    A1   A2
15   A1   A3
22   A1   A4
2    A2   A1
23   A2   A4
3    A3   A1
4    A4   A1
11   A4   A2
46   A4   A7
40   A5   A6
47   A5   A7
34   A6   A5
48   A6   A7
28   A7   A4
35   A7   A5
42   A7   A6

我还添加了“图论”标签，因为我认为这不完全是一个交集问题，在这个问题中，我可以将列表转换为向量，并在R中使用intersect函数。代码需要生成一个新的id，其中一些地点必须在同一个新的id（组）内。就像上面的例子一样，如果A.1的集合有A.2和A.4，A.2有A.1和A.4，A.4有A.1和A.2，那么这三个地点必须在同一个id（组）内。换句话说，我需要每个地点的最大交集分组。我理解可能会出现不同矩阵的问题，比如A.1有A.2和A.6，A.2有A.1和A.6，A.6有A.1和A.2（但考虑到上面的第一个例子，A.6没有A.4），在这种情况下，我欢迎添加A.6到分组中的解决方案或其他任意的解决方案，其中代码仅将第一组聚集在一起，从列表中删除A.1、A.2和A.4，并将A.6保留在原来的分组中。

- John Doe

在你的例子中，A4与A7相连，而A7又与A5和A6相连。因此，整个图形是相互连接的。那么，您是否希望将所有A划分为同一组？ - Andrei

整个图是连通的——是的——但我想找到的是地点之间最高的交集集合，就像我讨论的例子一样。 - John Doe

为什么不选A5、A6和A7呢？它们之间也有三条链接。你是在寻找图中的所有完整子图吗？还是我还是漏掉了什么？ - Andrei

A5、A6 和 A7 也将被分组在一起。找到完整的子图可能是解决方案的一部分，但考虑到链接，我希望获得尽可能大的分组。 - John Doe

我仍在努力理解您的确切意思。您是否想要一个链接与节点比例最大的子图？您是否想要一个连接的子图，其中节点数最大？当您说“分组”时，您指的是什么？ - Andrei

我对图论不是很熟悉，请见谅。我会尝试这样表达：对于每个地点（节点），我想找到它的最大完全子图。A1、A2和A4的集合是它们所有的最大子图 - 如果我有任何错误，请纠正我。因此，它们被归类在一起。 - John Doe

2个回答

1

假设你想要的是最大的完整子图，你可以使用igraph包：

# Load necessary libraries
library(igraph)

# Define global parameters
threshold <- 100

# Compute the adjacency matrix
# (distances in both directions need to be smaller than the threshold)
am <- m < threshold & t(m) < threshold

# Make an undirected graph given the adjacency matrix
# (we set diag to FALSE so as not to draw links from a vertex to itself)
gr <- graph_from_adjacency_matrix(am, mode = "undirected", diag = FALSE)

# Find all the largest complete subgraphs
lc <- largest_cliques(gr)

# Output the list of complete subgraphs as a list of vertex names
lapply(lc, (function (e) e$name))

据我所知，Stata中没有类似的功能。但是，如果您正在寻找最大连接子图（在您的情况下是整个图形），则可以在Stata中使用聚类命令（即clustermat）。

- Andrei

我相信你可能已经找到了答案！但是，你能否详细地评论一下你的代码呢？ - John Doe

@John Doe，我已经添加了一些注释。 - Andrei

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- user2957945 · Accepted Answer

R语言中你可以做：

# get values less then threshold
id = m < 100 
# make sure both values are less then threshold, and dont include diagonal
m_new = (id + t(id) == 2) & m !=0 
# melt data and subset to keep TRUE values (TRUE if both less than threshold and not on diagonal)
result  = subset(reshape2::melt(m_new), value)
# reorder to match question results , if needed 
result[order(result[[1]], result[[2]]), 1:2] 

   Var1 Var2
8    A1   A2
15   A1   A3
22   A1   A4
2    A2   A1
23   A2   A4
3    A3   A1
4    A4   A1
11   A4   A2
46   A4   A7
40   A5   A6
47   A5   A7
34   A6   A5
48   A6   A7
28   A7   A4
35   A7   A5
42   A7   A6

.

structure(c(0L, 80L, 80L, 90L, 110L, 120L, 105L, 90L, 0L, 110L, 
90L, 110L, 130L, 110L, 90L, 90L, 0L, 110L, 110L, 135L, 120L, 
90L, 90L, 90L, 0L, 110L, 100L, 90L, 100L, 90L, 120L, 90L, 0L, 
90L, 90L, 100L, 110L, 110L, 100L, 90L, 0L, 90L, 100L, 100L, 90L, 
90L, 80L, 90L, 0L), .Dim = c(7L, 7L), .Dimnames = list(c("A1", 
"A2", "A3", "A4", "A5", "A6", "A7"), c("A1", "A2", "A3", "A4", 
"A5", "A6", "A7")))