具有大量JOIN条件的SQL查询速度非常慢

3
我继承了一个 SQL Server 2008 R2 项目,其中包含从另一张表格进行表格更新的操作:
  • Table1(大约有150,000行)有三个电话号码字段(Tel1Tel2Tel3
  • Table2(大约有20,000行)也有三个电话号码字段(Phone1Phone2Phone3
当其中任何一个号码匹配时,Table1 应该被更新。
当前代码如下:
UPDATE t1
SET surname = t2.surname, Address1=t2.Address1, DOB=t2.DOB, Tel1=t2.Phone1, Tel2=t2.Phone2, Tel3=t2.Phone3,
FROM Table1 t1 
inner join Table2 t2
on
(t1.Tel1 = t2.Phone1 and t1.Tel1 is not null) or
(t1.Tel1 = t2.Phone2 and t1.Tel1 is not null) or
(t1.Tel1 = t2.Phone3 and t1.Tel1 is not null) or
(t1.Tel2 = t2.Phone1 and t1.Tel2 is not null) or
(t1.Tel2 = t2.Phone2 and t1.Tel2 is not null) or
(t1.Tel2 = t2.Phone3 and t1.Tel2 is not null) or
(t1.Tel3 = t2.Phone1 and t1.Tel3 is not null) or
(t1.Tel3 = t2.Phone2 and t1.Tel3 is not null) or
(t1.Tel3 = t2.Phone3 and t1.Tel3 is not null);

然而,这个查询需要超过30分钟才能运行。
执行计划显示主要瓶颈在于围绕Table1的聚集索引扫描周围的一个Nested Loop。这两个表都在它们的ID列上有聚集索引。
由于我的DBA技能非常有限,谁能建议最好的方法来提高这个查询的性能?将Tel1Tel2Tel3添加到每列中作为索引是否是最好的选择,还是可以改变查询以提高性能?

2
在这两个表的Tel1、Tel2和Tel3上应用非聚集索引。 - Vishwajeet
1
如果一个字段为空,那么 = 就不会返回 true - 你不需要所有这些 and t1.Tel1 is not null。此外,你正在更新查询的字段,这可能会导致一些数据丢失(如果 Tel1 = Phone2Phone1 为空)。首先尝试规范化电话号码(即有一个链接表来保存电话号码)。 - Keith
请问您能否添加一些测试数据吗?(例如在SQLFiddle中) - gbn
已经排序完毕;@Vishwajeet,在按照下面的答案拆分查询后,索引产生了差异。@Keith,谢谢 - 当然,我知道NULL不会被评估,它们只是为了效果而存在 :P - KenD
4个回答

1

初看之下,我建议从查询中消除所有的OR条件。

看看以下代码是否更快(它将您的更新转换为3个不同的更新):

UPDATE t1
SET surname = t2.surname, Address1=t2.Address1, DOB=t2.DOB, Tel1=t2.Phone1, Tel2=t2.Phone2, Tel3=t2.Phone3,
FROM Table1 t1 
inner join Table2 t2
on
(t1.Tel1 is not null AND t1.Tel1 IN (t2.Phone1, t2.Phone2, t2.Phone3);

UPDATE t1
SET surname = t2.surname, Address1=t2.Address1, DOB=t2.DOB, Tel1=t2.Phone1, Tel2=t2.Phone2, Tel3=t2.Phone3,
FROM Table1 t1 
inner join Table2 t2
on
(t1.Tel2 is not null AND t1.Tel2 IN (t2.Phone1, t2.Phone2, t2.Phone3);

UPDATE t1
SET surname = t2.surname, Address1=t2.Address1, DOB=t2.DOB, Tel1=t2.Phone1, Tel2=t2.Phone2, Tel3=t2.Phone3,
FROM Table1 t1 
inner join Table2 t2
on
(t1.Tel3 is not null AND t1.Tel3 IN (t2.Phone1, t2.Phone2, t2.Phone3);

我也考虑过这种方法。它不会产生完全相同的结果,但可能足够接近。与第一个查询不同的是,第二个和第三个查询中任何匹配项都将重新更新,但话说回来,我不理解 OP 查询的工作原理,因为每个“t1”行可能会匹配多个“t2”行。 - paul
我不认为结果会发生冲突。第一个查询仅查看t1.Tel1列,第二个查询仅查看t1.Tel2列,以此类推... - bastos.sergio
当第一个查询中t1.Tel1匹配时,t1.Tel2t1.Tel3也会被更新。然后在第二个和第三个查询中,t1.Tel2t1.Tel3肯定会匹配。 - paul
我猜t1.Tel1,t1.Tel2和t1.Tel3可能是相同的,会生成3个更新(而不是一个),但是OP查询也不是以类似的方式工作吗? - bastos.sergio
@Paul 好的,你说得对。我没有注意到在 SET 子句中更新了 t1.Tel2 和 t1.Tel3。 - bastos.sergio
谢谢大家 - 我已经按照上面的建议将查询拆分为单个的UPDATE。一旦我完成了这个步骤,SSMS的执行计划视图建议我添加一些索引,我已经做到了(正如@Vishwajeet所建议的)。现在一切都变得更快了,感谢你们的帮助!PS我意识到这个查询有点垃圾,可能会更新多次行 - 但显然这是“按设计”:/ - KenD

1

首先,规范化您的表格数据:

insert into Table1Tel 
select primaryKey, Tel1 as 'tel' from Table1 where Tel1 is not null
union select primaryKey, Tel2 from Table1 where Tel2 is not null
union select primaryKey, Tel3 from Table1 where Tel3 is not null

insert into Table2Phone 
select primaryKey, Phone1 as 'phone' from Table2 where Phone1 is not null
union select primaryKey, Phone2 from Table2 where Phone2 is not null
union select primaryKey, Phone3 from Table2 where Phone3 is not null

这些标准化的表格是存储电话号码比添加列更好的方法。
然后,您可以像以下示例一样跨表连接:
update t1
set surname = t2.surname, 
    Address1 = t2.Address1, 
    DOB = t2.DOB
from Table1 t1 
     inner join Table1Tel tel
         on t1.primaryKey = tel.primaryKey
     inner join Table2Phone phone
         on tel.tel = phone.phone
     inner join Table2 t2
         on phone.primaryKey = t2.primaryKey

请注意,这并不能解决数据中重复的根本问题 - 例如,如果您的数据中既有Joe又有Jane Bloggs,并且他们拥有相同的电话号码(即使在不同的字段中),您将更新两条记录使它们相同。

1
请尝试以下查询,并告诉我执行完成需要多长时间。
UPDATE t1
SET surname = t2.surname, Address1=t2.Address1, DOB=t2.DOB, Tel1=t2.Phone1, Tel2=t2.Phone2, Tel3=t2.Phone3,
FROM Table1 t1 
inner join Table2 t2
on (
    '|'+cast(t2.Phone1 as varchar(15)+'|'+cast(t2.Phone1 as varchar(15)+'|'+cast(t2.Phone1 as varchar(15)+'|' LIKE '%|'+cast(t1.Tel1 as varchar(15)+'|%'
    or '|'+cast(t2.Phone1 as varchar(15)+'|'+cast(t2.Phone1 as varchar(15)+'|'+cast(t2.Phone1 as varchar(15)+'|' LIKE '%|'+cast(t1.Tel2 as varchar(15)+'|%'
    or '|'+cast(t2.Phone1 as varchar(15)+'|'+cast(t2.Phone1 as varchar(15)+'|'+cast(t2.Phone1 as varchar(15)+'|' LIKE '%|'+cast(t1.Tel3 as varchar(15)+'|%'
    )

将3个OR替换为1个LIKE应该更快。试一试吧。


1
您可以尝试以下类似的方法,这样就可以避免冗余更新。
UPDATE t1
SET surname = t2.surname,
    Address1=t2.Address1, DOB=t2.DOB, 
    Tel1=t2.Phone1, Tel2=t2.Phone2, Tel3=t2.Phone3
FROM
    Table1 T1
INNER JOIN
(
SELECT
    T1.ID AS T1_ID,
    T2.ID AS T2_ID
FROM
    t1.Tel1 = t2.Phone1

UNION

SELECT
    T1.ID AS T1_ID,
    T2.ID AS T2_ID
FROM
    t1.Tel1 = t2.Phone2

UNION

SELECT
    T1.ID AS T1_ID,
    T2.ID AS T2_ID
FROM
    t1.Tel1 = t2.Phone3

UNION

SELECT
    T1.ID AS T1_ID,
    T2.ID AS T2_ID
FROM
    t1.Tel2 = t2.Phone1

UNION

SELECT
    T1.ID AS T1_ID,
    T2.ID AS T2_ID
FROM
    t1.Tel2 = t2.Phone2

UNION

SELECT
    T1.ID AS T1_ID,
    T2.ID AS T2_ID
FROM
    t1.Tel2 = t2.Phone3

SELECT
    T1.ID AS T1_ID,
    T2.ID AS T2_ID
FROM
    t1.Tel3 = t2.Phone1

UNION

SELECT
    T1.ID AS T1_ID,
    T2.ID AS T2_ID
FROM
    t1.Tel3 = t2.Phone2

UNION

SELECT
    T1.ID AS T1_ID,
    T2.ID AS T2_ID
FROM
    t1.Tel3 = t2.Phone3

) X
ON T1.ID = X.T1_ID
INNER JOIN Table2 T2 ON X.T2_ID = T2.TD

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接