尝试将日期最早且非空的重复字段更新为null

4

我正在尝试将license_no字段更新为null,如果 license_no重复 并且 具有最早的终止日期 并且 如果该终止日期不为null。以下是我目前拥有的SQL查询。如何在更新语句中确定重复项的最小终止日期?

update driver
set license_no = NULL
where license_no IN (SELECT license_no 
                      FROM   driver 
                      GROUP  BY license_no 
                      HAVING ( Count(license_no) > 1 )) 

如果有两行具有相同的许可证号码,其中一行具有非空日期,而第二行具有空日期。在这种情况下,您想将许可证号码设置为NULL吗?这两个中的哪一个? - Vladimir Baranov
如果有两行具有相同的许可证号码,其中一行具有非空日期,而第二行具有空日期,则我希望在非空记录中将许可证号更新为null。 - Brandon Taylor
3个回答

1
使用窗口函数。尝试这个。
UPDATE A
SET    license_no = NULL
FROM   driver A
       JOIN (SELECT Min(termination_date) termination_date,
                    license_no
             FROM   (SELECT Row_number()OVER(partition BY license_no ORDER BY termination_date desc) rn,
                            *
                     FROM  driver) o
             WHERE  rn > 1
             GROUP  BY license_no) B
         ON a.license_no = b.license_no
            AND a.termination_date = b.termination_date 

如果有两行具有相同的许可证号码,其中一行具有非空日期,而第二行具有空日期,则您的解决方案将不会更新这些行,即使有两个具有相同号码的许可证。从问题中并不清楚这是否是期望的行为,但我的理解是应该更新其中一个许可证(具有非空日期的那个)。 - Vladimir Baranov

0

我将使用一个表变量来存储一些样本数据。我会添加各种许可证号码和日期的组合,以捕获所有可能性。

此外,我将添加并使用一个ID列到您的模式中。在我的解决方案中,我需要单独标识每一行。仅通过许可证号码来标识行是不够的。

DECLARE @TDrivers TABLE (ID int NOT NULL PRIMARY KEY, license_no varchar(50) NULL, termination_date date NULL);

insert into @TDrivers values (1, NULL, '20130101');
insert into @TDrivers values (2, NULL, '20130101');
insert into @TDrivers values (3, NULL, '20130102');
insert into @TDrivers values (4, NULL, '20130103');
insert into @TDrivers values (5, NULL, '20130104');

-- only one license with this number
insert into @TDrivers values (6, '01', '20140101');

-- only one license with this number
insert into @TDrivers values (7, '02', NULL);

-- two licenses with this number, different dates
insert into @TDrivers values (8, '03', '20140301');
insert into @TDrivers values (9, '03', '20140302');

-- two licenses with this number, same dates
insert into @TDrivers values (10, '04', '20140401');
insert into @TDrivers values (11, '04', '20140401');

-- two licenses with this number, dates are not null and null
insert into @TDrivers values (12, '05', '20140501');
insert into @TDrivers values (13, '05', NULL);

-- two licenses with this number, both dates are null
insert into @TDrivers values (14, '06', NULL);
insert into @TDrivers values (15, '06', NULL);

-- three licenses with this number, dates are unique and not null
insert into @TDrivers values (16, '07', '20140701');
insert into @TDrivers values (17, '07', '20140702');
insert into @TDrivers values (18, '07', '20140703');

-- three licenses with this number, dates are not unique and not null
insert into @TDrivers values (19, '08', '20140801');
insert into @TDrivers values (20, '08', '20140801');
insert into @TDrivers values (21, '08', '20140802');

-- dates are unique and have one null
insert into @TDrivers values (22, '09', '20140901');
insert into @TDrivers values (23, '09', '20140902');
insert into @TDrivers values (24, '09', '20140903');
insert into @TDrivers values (25, '09', NULL);

-- dates are unique and have several nulls
insert into @TDrivers values (26, '10', '20141001');
insert into @TDrivers values (27, '10', '20141002');
insert into @TDrivers values (28, '10', '20141003');
insert into @TDrivers values (29, '10', NULL);
insert into @TDrivers values (30, '10', NULL);
insert into @TDrivers values (31, '10', NULL);

-- dates are not unique and have one null
insert into @TDrivers values (32, '11', '20141101');
insert into @TDrivers values (33, '11', '20141101');
insert into @TDrivers values (34, '11', '20141102');
insert into @TDrivers values (35, '11', '20141103');
insert into @TDrivers values (36, '11', NULL);

-- dates are not unique and have several nulls
insert into @TDrivers values (37, '12', '20141201');
insert into @TDrivers values (38, '12', '20141201');
insert into @TDrivers values (39, '12', '20141202');
insert into @TDrivers values (40, '12', '20141203');
insert into @TDrivers values (41, '12', NULL);
insert into @TDrivers values (42, '12', NULL);
insert into @TDrivers values (43, '12', NULL);

SELECT * FROM @TDrivers ORDER BY ID;

首先,我们将找到所有出现超过一次的许可证号码:

WITH
CTE_DuplicatedLicenses
AS
(
    SELECT license_no
    FROM @TDrivers AS d
    WHERE license_no IS NOT NULL
    -- there is no need to update license numbers that are NULL already
    GROUP BY license_no
    HAVING (COUNT(license_no) > 1)
)

然后,对于每个重复的许可证号码,我们正在寻找具有最小终止日期(不为NULL)的一行。当存在多个具有相同终止日期的行时,我使用ORDER BY中的ID列来解决歧义。

,CTE_LicensesWithDates
AS
(
    SELECT
        CTE_Dates.ID
    FROM
        CTE_DuplicatedLicenses
        CROSS APPLY
        (
            SELECT TOP (1) d.ID
            FROM @TDrivers AS d
            WHERE
                d.license_no = CTE_DuplicatedLicenses.license_no
                AND d.termination_date IS NOT NULL
            ORDER BY
                d.termination_date, d.ID
        ) AS CTE_Dates
)

最后的CTE给出了所有应该被更新的IDs
update @TDrivers
set license_no = NULL
where ID IN
(
    SELECT ID
    FROM CTE_LicensesWithDates
);

SELECT * FROM @TDrivers ORDER BY ID;

这是原始表格:

ID   license_no   termination_date
1    NULL         2013-01-01
2    NULL         2013-01-01
3    NULL         2013-01-02
4    NULL         2013-01-03
5    NULL         2013-01-04
6    01           2014-01-01
7    02           NULL
8    03           2014-03-01
9    03           2014-03-02
10   04           2014-04-01
11   04           2014-04-01
12   05           2014-05-01
13   05           NULL
14   06           NULL
15   06           NULL
16   07           2014-07-01
17   07           2014-07-02
18   07           2014-07-03
19   08           2014-08-01
20   08           2014-08-01
21   08           2014-08-02
22   09           2014-09-01
23   09           2014-09-02
24   09           2014-09-03
25   09           NULL
26   10           2014-10-01
27   10           2014-10-02
28   10           2014-10-03
29   10           NULL
30   10           NULL
31   10           NULL
32   11           2014-11-01
33   11           2014-11-01
34   11           2014-11-02
35   11           2014-11-03
36   11           NULL
37   12           2014-12-01
38   12           2014-12-01
39   12           2014-12-02
40   12           2014-12-03
41   12           NULL
42   12           NULL
43   12           NULL

这是更新后的表格:

ID   license_no   termination_date
1    NULL         2013-01-01
2    NULL         2013-01-01
3    NULL         2013-01-02
4    NULL         2013-01-03
5    NULL         2013-01-04
6    01           2014-01-01
7    02           NULL
8    NULL         2014-03-01
9    03           2014-03-02
10   NULL         2014-04-01
11   04           2014-04-01
12   NULL         2014-05-01
13   05           NULL
14   06           NULL
15   06           NULL
16   NULL         2014-07-01
17   07           2014-07-02
18   07           2014-07-03
19   NULL         2014-08-01
20   08           2014-08-01
21   08           2014-08-02
22   NULL         2014-09-01
23   09           2014-09-02
24   09           2014-09-03
25   09           NULL
26   NULL         2014-10-01
27   10           2014-10-02
28   10           2014-10-03
29   10           NULL
30   10           NULL
31   10           NULL
32   NULL         2014-11-01
33   11           2014-11-01
34   11           2014-11-02
35   11           2014-11-03
36   11           NULL
37   NULL         2014-12-01
38   12           2014-12-01
39   12           2014-12-02
40   12           2014-12-03
41   12           NULL
42   12           NULL
43   12           NULL

要运行此示例,只需将所有代码块放在一个脚本中即可。

您可以在结果中看到,具有ID 8、10、12的行已更新。

ID为14、15的行未被更新,因为它们都具有NULL日期。

对于每个其他许可证号码,您可以看到仅将一行设置为NULL,即使有多行具有相同的日期。


谢谢Vlad。我需要多次执行更新来捕捉所有的重复项,但是这个方法可行!感谢你的帮助和详细的解释。 - Brandon Taylor
您IP地址为143.198.54.68,由于运营成本限制,当前对于免费用户的使用频率限制为每个IP每72小时10次对话,如需解除限制,请点击左下角设置图标按钮(手机用户先点击左上角菜单按钮)。 - Vladimir Baranov

0

我会使用一些标准的SQL,比如这样:

UPDATE
  driver d
SET
  license_no = null
WHERE 
  Exists ( SELECT 
             d2.license_no 
           FROM 
             driver d2 
           WHERE 
             d2.license_no = d.license_no 
           GROUP BY 
             d2.license_no 
           HAVING 
             Count(d2.license_no) > 1 
         ) AND
  Exists ( SELECT 
             d3.license_no
           FROM 
             driver d3
           WHERE 
             d3.license_no = d.license_no AND
             d3.termination_date = d.termination_date AND 
             d3.termination_date IS NOT NULL AND
             d3.termination_date IN ( SELECT 
                                        Min(d4.termination_date) 
                                      FROM 
                                        driver d4 
                                      WHERE 
                                         d4.license_no = d.license_no )                            
         )

第一个Exists()函数检查重复项。
第二个Exists()函数检查termination_date约束。
希望有所帮助!

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接