How to do a dplyr inner join on col1 > col2

Question

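I'm having trouble with dplyr joins when the join condition is not the standard "col1" = "col2" equality: the joins fail to run. The two examples below show exactly what I mean.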

First:

library(dplyr)

tableA <- data.frame(col1= c("a","b","c","d"),
                     col2 = c(1,2,3,4))

inner_join(tableA, tableA, by = c("col1"!="col1")) %>% 
  select(col1, col2.x) %>% 
  arrange(col1, col2.x)

Error: `by` must be a (named) character vector, list, or NULL for natural joins (not recommended in production code), not logical

When I replicate this code in SQL, I get the following:

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

copy_to(con, tableA)

tbl(con, sql("select a.col1, b.col2
              from 
              tableA as a
              inner join 
              tableA as b
              on a.col1 <> b.col1")) %>% 
  arrange(col1, col2)

Result of the SQL query:

# Source:     SQL [?? x 2]
# Database:   sqlite 3.19.3 [:memory:]
# Ordered by: col1, col2
     col1  col2
     <chr> <dbl>
 1     a     2
 2     a     3
 3     a     4
 4     b     1
 5     b     3
 6     b     4
 7     c     1
 8     c     2
 9     c     4
10     d     1
# ... with more rows

The second case is similar to the first:

inner_join(tableA, tableA, by = c("col1" > "col1")) %>% 
   select(col1, col2.x) %>% 
   arrange(col1, col2.x)

Error: `by` must be a (named) character vector, list, or NULL for natural joins (not recommended in production code), not logical

The SQL equivalent:

tbl(con, sql("select a.col1, b.col2
              from tableA as a
              inner join tableA as b
              on a.col1 > b.col1")) %>% 
   arrange(col1, col2)

Result of the second SQL query:

# Source:     SQL [?? x 2]
# Database:   sqlite 3.19.3 [:memory:]
# Ordered by: col1, col2
   col1  col2
  <chr> <dbl>
1     b     1
2     c     1
3     c     2
4     d     1
5     d     2
6     d     3

Does anyone know how to reproduce these SQL examples with dplyr code?

- Dyfan Jones

2 Answers

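A solution using dplyr and tidyr, which reproduces your second (col1 > col1) case. The idea is to expand the data frame to every col1/col2 combination and then join it back to the original data frame. Next, fill from tidyr, with .direction = "up", replaces each NA in col3 with the next non-missing value below it within its col1 group. Finally, filter out the rows where col2 equals col3 and the rows where col3 is still NA.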

library(dplyr)
library(tidyr)

tableB <- tableA %>%
  complete(col1, col2) %>%
  left_join(tableA %>% mutate(col3 = col2), by = c("col1", "col2")) %>%
  group_by(col1) %>%
  fill(col3, .direction = "up") %>%
  filter(col2 != col3, !is.na(col3)) %>%
  select(-col3) %>%
  ungroup()
tableB
# # A tibble: 6 x 2
#    col1  col2
#   <chr> <dbl>
# 1     b     1
# 2     c     1
# 3     c     2
# 4     d     1
# 5     d     2
# 6     d     3

Data

tableA <- data.frame(col1= c("a","b","c","d"),
                     col2 = c(1,2,3,4), stringsAsFactors = FALSE)

- www

- h3rm4n · Accepted Answer

For your first case:

library(dplyr)
library(tidyr)

expand(tableA, col1, col2) %>% 
  left_join(tableA, by = 'col1') %>% 
  filter(col2.x != col2.y) %>% 
  select(col1, col2 = col2.x)

which gives:

# A tibble: 12 x 2
     col1  col2
   <fctr> <dbl>
 1      a     2
 2      a     3
 3      a     4
 4      b     1
 5      b     3
 6      b     4
 7      c     1
 8      c     2
 9      c     4
10      d     1
11      d     2
12      d     3

For your second case:

expand(tableA, col1, col2) %>% 
  left_join(tableA, by = 'col1') %>% 
  filter(col2.x < col2.y) %>% 
  select(col1, col2 = col2.x)

which gives:

# A tibble: 6 x 2
    col1  col2
  <fctr> <dbl>
1      b     1
2      c     1
3      c     2
4      d     1
5      d     2
6      d     3
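
As a supplementary sketch (not part of either answer above), the same two results can also be reached more directly, assuming dplyr 1.1.0 or later is installed. That version added cross_join() and join_by(). The first approach builds every pairing of rows and then filters on the condition, which mirrors the SQL "on" clause. The second uses join_by() for the ">" case; it compares the numeric col2 instead of col1, which gives the same rows only because col1 and col2 are ordered identically in this example data. The names pairs, case1, case2 and case2_join_by are made up for illustration.

```r
library(dplyr)

tableA <- data.frame(col1 = c("a", "b", "c", "d"),
                     col2 = c(1, 2, 3, 4),
                     stringsAsFactors = FALSE)

# Every pairing of a row of tableA with a row of tableA (4 x 4 = 16 rows).
# Clashing column names receive the default .x / .y suffixes.
pairs <- cross_join(tableA, tableA)

# Case 1: on a.col1 <> b.col1
case1 <- pairs %>%
  filter(col1.x != col1.y) %>%
  select(col1 = col1.x, col2 = col2.y) %>%
  arrange(col1, col2)

# Case 2: on a.col1 > b.col1
case2 <- pairs %>%
  filter(col1.x > col1.y) %>%
  select(col1 = col1.x, col2 = col2.y) %>%
  arrange(col1, col2)

# Case 2 again as an inequality join. Assumption: comparing col2 stands in
# for comparing col1, which holds for this data only.
case2_join_by <- inner_join(tableA, tableA, by = join_by(col2 > col2)) %>%
  select(col1 = col1.x, col2 = col2.y) %>%
  arrange(col1, col2)
```

The cross join materialises all n x n pairs before filtering, so it is only reasonable for small tables. join_by() does not accept "!=", so the first case still needs the cross-join-and-filter form.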