具有大量JOIN条件的SQL查询速度非常慢

Question

具有大量JOIN条件的SQL查询速度非常慢

sqlsql-server-2008

3

我继承了一个 SQL Server 2008 R2 项目，其中包含从另一张表格进行表格更新的操作：

Table1（大约有150,000行）有三个电话号码字段（Tel1、Tel2、Tel3）
Table2（大约有20,000行）也有三个电话号码字段（Phone1、Phone2、Phone3）

当其中任何一个号码匹配时，Table1 应该被更新。

当前代码如下：

UPDATE t1
SET surname = t2.surname, Address1=t2.Address1, DOB=t2.DOB, Tel1=t2.Phone1, Tel2=t2.Phone2, Tel3=t2.Phone3,
FROM Table1 t1 
inner join Table2 t2
on
(t1.Tel1 = t2.Phone1 and t1.Tel1 is not null) or
(t1.Tel1 = t2.Phone2 and t1.Tel1 is not null) or
(t1.Tel1 = t2.Phone3 and t1.Tel1 is not null) or
(t1.Tel2 = t2.Phone1 and t1.Tel2 is not null) or
(t1.Tel2 = t2.Phone2 and t1.Tel2 is not null) or
(t1.Tel2 = t2.Phone3 and t1.Tel2 is not null) or
(t1.Tel3 = t2.Phone1 and t1.Tel3 is not null) or
(t1.Tel3 = t2.Phone2 and t1.Tel3 is not null) or
(t1.Tel3 = t2.Phone3 and t1.Tel3 is not null);

然而，这个查询需要超过30分钟才能运行。

执行计划显示主要瓶颈在于围绕Table1的聚集索引扫描周围的一个Nested Loop。这两个表都在它们的ID列上有聚集索引。

由于我的DBA技能非常有限，谁能建议最好的方法来提高这个查询的性能？将Tel1、Tel2和Tel3添加到每列中作为索引是否是最好的选择，还是可以改变查询以提高性能？

- KenD

2

在这两个表的Tel1、Tel2和Tel3上应用非聚集索引。 - Vishwajeet

1

如果一个字段为空，那么 = 就不会返回 true - 你不需要所有这些 and t1.Tel1 is not null。此外，你正在更新查询的字段，这可能会导致一些数据丢失（如果 Tel1 = Phone2 但 Phone1 为空）。首先尝试规范化电话号码（即有一个链接表来保存电话号码）。 - Keith

请问您能否添加一些测试数据吗？（例如在SQLFiddle中） - gbn

已经排序完毕；@Vishwajeet，在按照下面的答案拆分查询后，索引产生了差异。@Keith，谢谢 - 当然，我知道NULL不会被评估，它们只是为了效果而存在 :P - KenD

4个回答

1

首先，规范化您的表格数据：

insert into Table1Tel 
select primaryKey, Tel1 as 'tel' from Table1 where Tel1 is not null
union select primaryKey, Tel2 from Table1 where Tel2 is not null
union select primaryKey, Tel3 from Table1 where Tel3 is not null

insert into Table2Phone 
select primaryKey, Phone1 as 'phone' from Table2 where Phone1 is not null
union select primaryKey, Phone2 from Table2 where Phone2 is not null
union select primaryKey, Phone3 from Table2 where Phone3 is not null

这些标准化的表格是存储电话号码比添加列更好的方法。

然后，您可以像以下示例一样跨表连接：

update t1
set surname = t2.surname, 
    Address1 = t2.Address1, 
    DOB = t2.DOB
from Table1 t1 
     inner join Table1Tel tel
         on t1.primaryKey = tel.primaryKey
     inner join Table2Phone phone
         on tel.tel = phone.phone
     inner join Table2 t2
         on phone.primaryKey = t2.primaryKey

请注意，这并不能解决数据中重复的根本问题 - 例如，如果您的数据中既有Joe又有Jane Bloggs，并且他们拥有相同的电话号码（即使在不同的字段中），您将更新两条记录使它们相同。

- Keith

1

请尝试以下查询，并告诉我执行完成需要多长时间。

UPDATE t1
SET surname = t2.surname, Address1=t2.Address1, DOB=t2.DOB, Tel1=t2.Phone1, Tel2=t2.Phone2, Tel3=t2.Phone3,
FROM Table1 t1 
inner join Table2 t2
on (
    '|'+cast(t2.Phone1 as varchar(15)+'|'+cast(t2.Phone1 as varchar(15)+'|'+cast(t2.Phone1 as varchar(15)+'|' LIKE '%|'+cast(t1.Tel1 as varchar(15)+'|%'
    or '|'+cast(t2.Phone1 as varchar(15)+'|'+cast(t2.Phone1 as varchar(15)+'|'+cast(t2.Phone1 as varchar(15)+'|' LIKE '%|'+cast(t1.Tel2 as varchar(15)+'|%'
    or '|'+cast(t2.Phone1 as varchar(15)+'|'+cast(t2.Phone1 as varchar(15)+'|'+cast(t2.Phone1 as varchar(15)+'|' LIKE '%|'+cast(t1.Tel3 as varchar(15)+'|%'
    )

将3个OR替换为1个LIKE应该更快。试一试吧。

- Maximus

1

您可以尝试以下类似的方法，这样就可以避免冗余更新。

UPDATE t1
SET surname = t2.surname,
    Address1=t2.Address1, DOB=t2.DOB, 
    Tel1=t2.Phone1, Tel2=t2.Phone2, Tel3=t2.Phone3
FROM
    Table1 T1
INNER JOIN
(
SELECT
    T1.ID AS T1_ID,
    T2.ID AS T2_ID
FROM
    t1.Tel1 = t2.Phone1

UNION

SELECT
    T1.ID AS T1_ID,
    T2.ID AS T2_ID
FROM
    t1.Tel1 = t2.Phone2

UNION

SELECT
    T1.ID AS T1_ID,
    T2.ID AS T2_ID
FROM
    t1.Tel1 = t2.Phone3

UNION

SELECT
    T1.ID AS T1_ID,
    T2.ID AS T2_ID
FROM
    t1.Tel2 = t2.Phone1

UNION

SELECT
    T1.ID AS T1_ID,
    T2.ID AS T2_ID
FROM
    t1.Tel2 = t2.Phone2

UNION

SELECT
    T1.ID AS T1_ID,
    T2.ID AS T2_ID
FROM
    t1.Tel2 = t2.Phone3

SELECT
    T1.ID AS T1_ID,
    T2.ID AS T2_ID
FROM
    t1.Tel3 = t2.Phone1

UNION

SELECT
    T1.ID AS T1_ID,
    T2.ID AS T2_ID
FROM
    t1.Tel3 = t2.Phone2

UNION

SELECT
    T1.ID AS T1_ID,
    T2.ID AS T2_ID
FROM
    t1.Tel3 = t2.Phone3

) X
ON T1.ID = X.T1_ID
INNER JOIN Table2 T2 ON X.T2_ID = T2.TD

- George

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- bastos.sergio · Accepted Answer

初看之下，我建议从查询中消除所有的OR条件。

看看以下代码是否更快（它将您的更新转换为3个不同的更新）：

UPDATE t1
SET surname = t2.surname, Address1=t2.Address1, DOB=t2.DOB, Tel1=t2.Phone1, Tel2=t2.Phone2, Tel3=t2.Phone3,
FROM Table1 t1 
inner join Table2 t2
on
(t1.Tel1 is not null AND t1.Tel1 IN (t2.Phone1, t2.Phone2, t2.Phone3);

UPDATE t1
SET surname = t2.surname, Address1=t2.Address1, DOB=t2.DOB, Tel1=t2.Phone1, Tel2=t2.Phone2, Tel3=t2.Phone3,
FROM Table1 t1 
inner join Table2 t2
on
(t1.Tel2 is not null AND t1.Tel2 IN (t2.Phone1, t2.Phone2, t2.Phone3);

UPDATE t1
SET surname = t2.surname, Address1=t2.Address1, DOB=t2.DOB, Tel1=t2.Phone1, Tel2=t2.Phone2, Tel3=t2.Phone3,
FROM Table1 t1 
inner join Table2 t2
on
(t1.Tel3 is not null AND t1.Tel3 IN (t2.Phone1, t2.Phone2, t2.Phone3);