T-SQL - 连接多个表会导致重复行

6

我正在努力从以下示例Fiddle中获得以下结果。

结果:

enter image description here

源表:

enter image description here

Fiddle

该想法是通过查找预算和预测ID的最大计数来为每个RecordID创建单行,以创建其他列(如果该预算或预测ID没有值,则留下列单元格为空)。

我尝试使用PIVOT函数,但结果甚至不理想。

更新: 请参见下面的图像,我尝试解释预期的输出

简述:对于属于RecordID的每个BudgetID,创建单独的列,包括BDate、Result(Records表的Percentage * BAmount)和Status。

在我的例子中,记录ID55在预算表中有两个条目 - 因此需要2x3列来显示每个日期结果状态在它自己的列中在单行中为这个记录ID。由于记录ID77在预算表中有最多(三个)条目,因此用它来创建3x3列覆盖所有行。同样也适用于预测。

output

我希望你能帮助我。

谢谢。


1
请问您能否为此添加预期输出? - Biju jose
对我来说不太清楚。 - paparazzo
请参见更新以获取详细说明。 - ratanmalko
你需要修改我的答案并添加“列名”。你需要哪些列?让我知道你有什么...! - Pugal
SQL没有语法来基于新数据添加新列。你知道每个RecordID最多只有3个Budget IDs吗?如果是的话,你可以编写一个类似于数据透视表的查询来添加Result1、Result2和Result3列。如果不知道你需要多少个Result列直到运行查询时,那么你必须编写动态SQL,也就是读取数据的代码,然后根据数据编写新查询。 - MatBailie
3个回答

6

你好,

注意!我的目标是帮助读者学习,而不是为他的作业提供最终查询解决方案。因此,我将分两步呈现解决方案,并在动态解决方案中添加几个“PRINT”命令,以便读者可以选择检查工作中的中间步骤。

OP提供的DDL+DML:

DROP TABLE IF EXISTS Budget;
CREATE TABLE Budget
    (BudgetID int, RecordID int, BDate date,Percentage int, [Status] varchar(50));
INSERT INTO Budget
    (BudgetID, RecordID, BDate,Percentage,Status)
VALUES
    (1, 55, '2017-01-01', 60, 'ordered'),
    (2, 55, '2017-03-24', 40, 'ordered'),
    (3, 66, '2018-08-15', 100, 'invoiced'),
    (4, 77, '2018-12-02', 25, 'paid'),
    (5, 77, '2018-09-10', 35, 'ordered'),
    (6, 77, '2019-07-13', 40, 'ordered')
GO

DROP TABLE IF EXISTS Forecast;
CREATE TABLE Forecast
    (ForecastID int, RecordID int, FDate date, Percentage int);
INSERT INTO Forecast
    (ForecastID, RecordID, FDate,Percentage)
VALUES
    (1, 55, '2020-12-01', 100),
    (2, 77, '2023-05-17', 25),
    (3, 77, '2024-11-28', 75)
GO

DROP TABLE IF EXISTS Records;
CREATE TABLE Records
    (RecordID int, BAmount int, FAmount int, Name varchar(40), Description varchar(40) )
;
INSERT INTO Records
   (RecordID, BAmount,FAmount,Name, Description)
VALUES
    (55, 15000, 33000, 'Prod1', 'Desc1' ),
     (66, 22000, 17500, 'Prod2', 'Desc2' ),
    (77, 40000, 44000, 'Prod3', 'Desc3' )
GO

select * from Budget
select * from Forecast
select * from Records

首先展示一个简单的静态解决方案

这基于我们知道预算表中每个RecordID最多有三行,而预测表中每个RecordID最多有两行。这将有助于理解我接下来要展示的动态解决方案。

;With CteBudget as (
    select 
        b.BDate, b.BudgetID, b.Percentage, b.RecordID, b.Status
        ,RN = ROW_NUMBER() OVER (partition by b.RecordID order by b.BudgetID)
    from Budget b
),
CteForecast as (
    select 
        f.FDate, f.ForecastID, f.Percentage, f.RecordID
        ,RN = ROW_NUMBER() OVER (partition by f.RecordID order by f.ForecastID)
    from Forecast f
)
select 
    r.RecordID, r.Name, r.Description, 

    b1.BDate BDate1, (b1.Percentage * r.BAmount)/100 BResult1, b1.Status BStatus1, 
    b2.BDate BDate2, (b2.Percentage * r.BAmount)/100 BResult2, b2.Status BStatus2,
    b3.BDate BDate3, (b3.Percentage * r.BAmount)/100 BResult3, b3.Status BStatus3, 

    f1.FDate FDate1, (f1.Percentage * r.BAmount)/100 FResult1,
    f2.FDate FDate2, (f2.Percentage * r.BAmount)/100 FResult2

from Records r
left join CteBudget b1 on r.RecordID = b1.RecordID and b1.RN = 1
left join CteBudget b2 on r.RecordID = b2.RecordID and b2.RN = 2
left join CteBudget b3 on r.RecordID = b3.RecordID and b3.RN = 3
left join CteForecast f1 on r.RecordID = f1.RecordID and f1.RN = 1
left join CteForecast f2 on r.RecordID = f2.RecordID and f2.RN = 2
--where r.RecordID = 77
GO

注意! 对于静态解决方案和没有索引(我会在最后添加),上述解决方案在性能方面非常差,但一旦我们添加正确的索引并作为动态解决方案的基础,这个选项应该适合。

现在我们可以提出动态解决方案。

-- Get number of columns
Declare @NumBudget tinyint
Declare @NumForecast tinyint
SELECT @NumBudget = MAX(C) FROM (
    select COUNT(RecordID) C
    from Budget
    GROUP BY RecordID
) t
SELECT @NumForecast = MAX(C) FROM (
    select COUNT(RecordID) C
    from Forecast
    GROUP BY RecordID
) t
---------------------------------------------
DECLARE @SQLString1 nvarchar(MAX) = '';
DECLARE @SQLString2 nvarchar(MAX) = '';
DECLARE @loop int = 1;
WHILE @loop <= @NumBudget BEGIN 
    SET @SQLString1 = @SQLString1 + N'
    b' + CONVERT(VARCHAR(2),@loop) + '.BDate BDate' + CONVERT(VARCHAR(2),@loop) + ', (b' + CONVERT(VARCHAR(2),@loop) + '.Percentage * r.BAmount)/100 BResult' + CONVERT(VARCHAR(2),@loop) + ', b' + CONVERT(VARCHAR(2),@loop) + '.Status BStatus' + CONVERT(VARCHAR(2),@loop) + ', '

    SET @SQLString2 = @SQLString2 + N'
    left join CteBudget b' + CONVERT(VARCHAR(2),@loop) + ' on r.RecordID = b' + CONVERT(VARCHAR(2),@loop) + '.RecordID and b' + CONVERT(VARCHAR(2),@loop) + '.RN = 1'

    SET @loop = @loop + 1
END
SET @loop = 1

WHILE @loop <= @NumForecast BEGIN  
    SET @SQLString1 = @SQLString1 + N'
    f' + CONVERT(VARCHAR(2),@loop) + '.FDate FDate' + CONVERT(VARCHAR(2),@loop) + ', (f' + CONVERT(VARCHAR(2),@loop) + '.Percentage * r.BAmount)/100 FResult' + CONVERT(VARCHAR(2),@loop) + ','

    SET @SQLString2 = @SQLString2 + N'
    left join CteForecast f' + CONVERT(VARCHAR(2),@loop) + ' on r.RecordID = f' + CONVERT(VARCHAR(2),@loop) + '.RecordID and f' + CONVERT(VARCHAR(2),@loop) + '.RN = 1'

    SET @loop = @loop + 1
END
SET @SQLString1 = STUFF (@SQLString1, LEN(@SQLString1) , 1 , '')  
PRINT '/************************************************/'
PRINT @SQLString1
PRINT @SQLString2
PRINT '/************************************************/'

DECLARE @SQLString nvarchar(MAX);
SET @SQLString = N'
;With CteBudget as (
    select 
        b.BDate, b.BudgetID, b.Percentage, b.RecordID, b.Status
        ,RN = ROW_NUMBER() OVER (partition by b.RecordID order by b.BudgetID)
    from Budget b
),
CteForecast as (
    select 
        f.FDate, f.ForecastID, f.Percentage, f.RecordID
        ,RN = ROW_NUMBER() OVER (partition by f.RecordID order by f.ForecastID)
    from Forecast f
)
select 
    r.RecordID, r.Name, r.Description, 
'
+ @SQLString1
+ N'
from Records r'
+ @SQLString2

print @SQLString

EXECUTE sp_executesql @SQLString
GO

重要提示! 这个解决方案不一定是性能最好的,但可能是最简单易懂的。在生产环境中,一旦我们有了真实的DDL+DML和服务器统计数据,就可以提高性能并选择最适合我们特定情况的解决方案。

索引

注意! 上述解决方案可能导致大量数据排序,因此正确的索引非常重要!测试几种不同的选项并选择最佳选项非常重要。

为了论坛的利益(或者你称之为stackoverflow的什么,但我认为它不是一个讨论论坛界面,而是一个问答界面),我添加了创建聚集索引的查询语句,我假设你在生产环境中拥有这个索引,以及一个可选的非聚集索引,你应该对其进行测试(我没有测试其他选项,这只是我想到的第一个,因此建议使用真实的DDL+DML查找正确的索引)。

-- CLUSTERED INDEX
CREATE CLUSTERED INDEX IX_Budget_BudgetID
    ON dbo.Budget (BudgetID);   
GO 
CREATE CLUSTERED INDEX IX_Forecast_ForecastID
    ON dbo.Forecast (ForecastID);   
GO 
CREATE CLUSTERED INDEX IX_Records_RecordID  
    ON dbo.Records (RecordID);   
GO 

-- NONCLUSTERED INDEX
CREATE NONCLUSTERED INDEX NX_Budget_RecordID_BudgetID
    ON dbo.Budget (RecordID,BudgetID);   
GO 
CREATE NONCLUSTERED INDEX NX_Forecast_RecordID_ForecastID
    ON dbo.Forecast (RecordID,ForecastID);   
GO 
CREATE NONCLUSTERED INDEX NX_Records_RecordID_RecordID  
    ON dbo.Records (RecordID);   
GO 

非常感谢您详细的回答。我已经成功将示例调整为生产数据。 - ratanmalko
不客气@ratanmalko。很高兴听到我能帮忙的消息 :-) - Ronen Ariely

3

我认为我会这样做:

select Max(BudgetId) as BudgetID, RecordID
into #MBudget
from Budget
group by RecordID

select B.* 
into #MaxB
from #MBudget M
inner join Budget B
on M.BudgetID = B.BudgetID
and M.RecordID = B.RecordID

--The above will then only have the "maximum" BudgetID data from the table.

-- You need to then do the same with the Forecast table

select Max(ForecastId) as ForecastID, RecordID
into #MForecast
from Forecast
group by RecordID

select F.* 
into #MaxF
from #MForecast M
inner join Forecast F
on M.ForecastID = F.ForecastID
and M.RecordID = F.RecordID

-- Join them together on the RecordID

select *  -- you will need to pick the required fields
from #MaxF F
inner join #MaxB B 
on F.RecordID = B.RecordID
inner join Record R
on F.RecordID = R.RecordID

我认为这会帮助你,我不是fiddle的忠实粉丝,所以在我的头脑中完成了它。

我有点同意那些人的意见,你本可以更清晰地表达,但希望这能帮到你。


3
最后,我们创建了多个联接的动态查询。 输入表格:
CREATE TABLE #Budget (BudgetID int, RecordID int, BDate date,Percentage int, [Status] varchar(50));
INSERT INTO #Budget(BudgetID, RecordID, BDate,Percentage,Status)
VALUES(1, 55, '2017-01-01', 60, 'ordered'),(2, 55, '2017-03-24', 40, 'ordered'),(3, 66, '2018-08-15', 100, 'invoiced'),(4, 77, '2018-12-02', 25, 'paid'),(5, 77, '2018-09-10', 35, 'ordered'),(6, 77, '2019-07-13', 40, 'ordered')

CREATE TABLE #Forecast(ForecastID int, RecordID int, FDate date, Percentage int)
INSERT INTO #Forecast(ForecastID, RecordID, FDate,Percentage)
VALUES(4, 77, '2018-07-18', 24),(1, 55, '2020-12-01', 100),(2, 77, '2023-05-17', 25),(3, 77, '2024-11-28', 75)

CREATE TABLE #Records(RecordID int, BAmount int, FAmount int, Name varchar(40), Description varchar(40) )
INSERT INTO #Records(RecordID, BAmount,FAmount,Name, Description)
VALUES(55, 15000, 33000, 'Prod1', 'Desc1' ),(66, 22000, 17500, 'Prod2', 'Desc2' ),(77, 40000, 44000, 'Prod3', 'Desc3' )

最终连接查询

 select * from (select r1.RecordID, f1.FDate fdate1, b1.BDate bdate1
 , ROW_NUMBER() over(partition by r1.recordid order by (select NULL)) rn from #Records r1
 join #Forecast f1
 on r1.RecordID = f1.RecordID
 and r1.RecordID = 77
 join #Budget b1
 on r1.RecordID = b1.RecordID
)t1 join (select r2.RecordID, f2.FDate fdate2, b2.BDate bdate2
 , ROW_NUMBER() over(partition by r2.recordid order by (select NULL)) rn from #Records r2
 join #Forecast f2
 on r2.RecordID = f2.RecordID
 and r2.RecordID = 77
 join #Budget b2
 on r2.RecordID = b2.RecordID
)t2 on t1.RecordID = t2.RecordID and t1.rn < t2.rn join (select r3.RecordID, f3.FDate fdate3, b3.BDate bdate3
 , ROW_NUMBER() over(partition by r3.recordid order by (select NULL)) rn from #Records r3
 join #Forecast f3
 on r3.RecordID = f3.RecordID
 and r3.RecordID = 77
 join #Budget b3
 on r3.RecordID = b3.RecordID
)t3 on t2.RecordID = t3.RecordID and t2.rn < t3.rn join (select r4.RecordID, f4.FDate fdate4, b4.BDate bdate4
 , ROW_NUMBER() over(partition by r4.recordid order by (select NULL)) rn from #Records r4
 join #Forecast f4
 on r4.RecordID = f4.RecordID
 and r4.RecordID = 77
 join #Budget b4
 on r4.RecordID = b4.RecordID
)t4 on t3.RecordID = t4.RecordID and t3.rn < t4.rn join (select r5.RecordID, f5.FDate fdate5, b5.BDate bdate5
 , ROW_NUMBER() over(partition by r5.recordid order by (select NULL)) rn from #Records r5
 join #Forecast f5
 on r5.RecordID = f5.RecordID
 and r5.RecordID = 77
 join #Budget b5
 on r5.RecordID = b5.RecordID
)t5 on t4.RecordID = t5.RecordID and t4.rn < t5.rn join (select r6.RecordID, f6.FDate fdate6, b6.BDate bdate6
 , ROW_NUMBER() over(partition by r6.recordid order by (select NULL)) rn from #Records r6
 join #Forecast f6
 on r6.RecordID = f6.RecordID
 and r6.RecordID = 77
 join #Budget b6
 on r6.RecordID = b6.RecordID
)t6 on t5.RecordID = t6.RecordID and t5.rn < t6.rn join (select r7.RecordID, f7.FDate fdate7, b7.BDate bdate7
 , ROW_NUMBER() over(partition by r7.recordid order by (select NULL)) rn from #Records r7
 join #Forecast f7
 on r7.RecordID = f7.RecordID
 and r7.RecordID = 77
 join #Budget b7
 on r7.RecordID = b7.RecordID
)t7 on t6.RecordID = t7.RecordID and t6.rn < t7.rn join (select r8.RecordID, f8.FDate fdate8, b8.BDate bdate8
 , ROW_NUMBER() over(partition by r8.recordid order by (select NULL)) rn from #Records r8
 join #Forecast f8
 on r8.RecordID = f8.RecordID
 and r8.RecordID = 77
 join #Budget b8
 on r8.RecordID = b8.RecordID
)t8 on t7.RecordID = t8.RecordID and t7.rn < t8.rn join (select r9.RecordID, f9.FDate fdate9, b9.BDate bdate9
 , ROW_NUMBER() over(partition by r9.recordid order by (select NULL)) rn from #Records r9
 join #Forecast f9
 on r9.RecordID = f9.RecordID
 and r9.RecordID = 77
 join #Budget b9
 on r9.RecordID = b9.RecordID
)t9 on t8.RecordID = t9.RecordID and t8.rn < t9.rn

首先从IP表中验证OP和上述查询。如果答案被接受,您将使用以下 SP。

SP 1

exec rownumber

create procedure rownumber as                                           --1st sp.
declare @r int = 1, @select nvarchar(100) = 'select * from ', @count int, @finalquery nvarchar(max) = N'', @out nvarchar(max)
begin
 set @count = (select Count(*) from (select r.RecordID, f1.FDate fdate1, b1.BDate bdate1
  , ROW_NUMBER() over(partition by r.recordid order by (select NULL)) rn from #Records r
  join #Forecast f1
  on r.RecordID = f1.RecordID
  and r.RecordID = 77
  join #Budget b1
  on r.RecordID = b1.RecordID
  )t
 )
 if @count > 2
 begin
  while @count >= @r
  begin
   exec multyJoin @r, @out output
   set @finalquery += @out
   set @r = @r + 1
  end
 end
 else begin
  select * from (
   select r.RecordID, f1.FDate fdate1, b1.BDate bdate1
   , ROW_NUMBER() over(partition by r.recordid order by (select NULL)) rn from #Records r
   join #Forecast f1
   on r.RecordID = f1.RecordID
   and r.RecordID = 77
   join #Budget b1
   on r.RecordID = b1.RecordID
  )t
 end
 set @finalquery = stuff(@finalquery,1, 6,@select)
 set @finalquery = REPLACE(@finalquery, 'on t0.RecordID = t1.RecordID and t0.rn < t1.rn ','')
 print @finalquery
 exec sp_executesql @finalquery
end

SP 2

create procedure multyJoin (@r int, @join varchar(max) output) as       --2nd sp. which is called form below sp
declare @cond nvarchar(100), @rvc varchar(3) = convert(int, @r)     ----row number of current in varchar
, @rvp varchar(3) = convert(int, @r-1)      ----row number of previous in varchar
begin

 set @join = ' join (select r'+@rvc+'.RecordID, f'+@rvc+'.FDate fdate'+@rvc+', b'+@rvc+'.BDate bdate'+@rvc+  ---Here add your columns as like as 'Percentage'+@rvc+' * '+@rvc+'BAmount'
  ', ROW_NUMBER() over(partition by r'+@rvc+'.recordid order by (select NULL)) rn from #Records r'+@rvc+'
  join #Forecast f'+@rvc+'
  on r'+@rvc+'.RecordID = f'+@rvc+'.RecordID
  and r'+@rvc+'.RecordID = 77
  join #Budget b'+@rvc+'
  on r'+@rvc+'.RecordID = b'+@rvc+'.RecordID
 )t'+@rvc+
 ' on t'+@rvp+'.RecordID = t'+@rvc+'.RecordID and t'+@rvp+'.rn < t'+@rvc+'.rn'  --Here multiple join has been created. Each table contains one row
end

感谢您的脚本。使用旋转功能的动态查询将非常棒。 - ratanmalko
你解决了吗? - Pugal
我希望能够看到你答案中提到的动态查询的示例。 - ratanmalko

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接