T-SQL查找冗余索引

6

有没有人知道一种T-SQL脚本,可以检测整个数据库中的冗余索引?表中一个冗余索引的例子如下:

Index 1: 'ColumnA', 'ColumnB', 'ColumnC'
Index 2: 'ColumnA', 'ColumnB'

忽略其他考虑因素,比如列宽和覆盖索引,第二个索引是多余的。谢谢。

5
我对index2的冗余性提出异议。如果columnC很宽,对于某些查询,使用index2可能更有效。此外,覆盖索引并不能使所有列顺序相同的非覆盖索引变得冗余。 - Amy B
请查看此链接:www.sql-server-performance.com - an.it.professional
5个回答

13

有些情况下冗余并不适用。例如,假设ColumnC是一个非常大的字段,但有时需要快速检索它。在这种情况下,您的index 1将不需要进行键查找:

select ColumnC from YourTable where ColumnnA = 12

另一方面,索引2要小得多,因此可以在需要索引扫描的查询中在内存中读取:

select * from YourTable where ColumnnA like '%hello%'

所以它们并不是真正的冗余。

如果您不相信我上面的论点,您可以找到像“冗余”索引这样的内容:

;with ind as (
    select  a.object_id
    ,       a.index_id
    ,       cast(col_list.list as varchar(max)) as list
    from    (
            select  distinct object_id
            ,       index_id
            from    sys.index_columns
            ) a
    cross apply
            (
            select  cast(column_id as varchar(16)) + ',' as [text()]
            from    sys.index_columns b
            where   a.object_id = b.object_id
                    and a.index_id = b.index_id
            for xml path(''), type
            ) col_list (list)
)
select  object_name(a.object_id) as TableName
,       asi.name as FatherIndex
,       bsi.name as RedundantIndex
from    ind a
join    sys.sysindexes asi
on      asi.id = a.object_id
        and asi.indid = a.index_id
join    ind b
on      a.object_id = b.object_id
        and a.object_id = b.object_id
        and len(a.list) > len(b.list)
        and left(a.list, LEN(b.list)) = b.list
join    sys.sysindexes bsi
on      bsi.id = b.object_id
        and bsi.indid = b.index_id

在性能“意外”下降的情况下,为你的用户带来蛋糕吧 :-)


1
谢谢 - 但我对你的意见是否认为索引2与索引1重复并不感兴趣。我需要的是一个T-SQL脚本,它可以提醒我某个索引可能与另一个索引重复。 - Randy Minder
4
问题并不在于我是否相信你。我的问题也不是关于何为冗余索引的定义。如果更多人能够按照所提出的问题回答就好了,而不要试图回答未被问到的问题。我只是想要一个查询,让我知道哪些索引可能是冗余的,然后我们将评估每个索引以决定采取什么行动。顺便说一句,这个查询很好! - Randy Minder
10
哇... +1 对 Andomar 的回答感到赞赏,因为不得不面对这种态度... 我非常感激那些比我更懂的人抽出时间指出我可能没有考虑或意识到的问题方面。谢谢 @Andomar。 - Tobias J

3

在受Paul Nielsen启发的情况下,我编写了这个查询来查找/区分:

  • 重复项(忽略包含顺序)
  • 多余的(不同的包含列)
  • 重叠的(不同的索引列)

并记录它们的使用情况 (也许有人想使用is_descending_key,但我不需要。)

WITH IndexColumns AS
(
    SELECT I.object_id AS TableObjectId, OBJECT_SCHEMA_NAME(I.object_id) + '.' + OBJECT_NAME(I.object_id) AS TableName, I.index_id AS IndexId, I.name AS IndexName
        , (IndexUsage.user_seeks + IndexUsage.user_scans + IndexUsage.user_lookups) AS IndexUsage
        , IndexUsage.user_updates AS IndexUpdates

       , (SELECT CASE is_included_column WHEN 1 THEN NULL ELSE column_id END AS [data()]
        FROM sys.index_columns AS IndexColumns
        WHERE IndexColumns.object_id = I.object_id
          AND IndexColumns.index_id = I.index_id
        ORDER BY index_column_id, column_id
        FOR XML PATH('')
       ) AS ConcIndexColumnNrs

       ,(SELECT CASE is_included_column WHEN 1 THEN NULL ELSE COL_NAME(I.object_id, column_id) END AS [data()]
        FROM sys.index_columns AS IndexColumns
        WHERE IndexColumns.object_id = I.object_id
          AND IndexColumns.index_id = I.index_id
        ORDER BY index_column_id, column_id
        FOR XML PATH('')
       ) AS ConcIndexColumnNames

       ,(SELECT CASE is_included_column WHEN 1 THEN column_id ELSE NULL END AS [data()]
        FROM sys.index_columns AS IndexColumns
        WHERE IndexColumns.object_id = I.object_id
        AND IndexColumns.index_id = I.index_id
        ORDER BY column_id
        FOR XML PATH('')
       ) AS ConcIncludeColumnNrs

       ,(SELECT CASE is_included_column WHEN 1 THEN COL_NAME(I.object_id, column_id) ELSE NULL END AS [data()]
        FROM sys.index_columns AS IndexColumns
        WHERE IndexColumns.object_id = I.object_id
          AND IndexColumns.index_id = I.index_id
        ORDER BY column_id
        FOR XML PATH('')
       ) AS ConcIncludeColumnNames
    FROM sys.indexes AS I
       LEFT OUTER JOIN sys.dm_db_index_usage_stats AS IndexUsage
        ON IndexUsage.object_id = I.object_id
          AND IndexUsage.index_id = I.index_id
          AND IndexUsage.Database_id = db_id() 
)
SELECT
  C1.TableName
  , C1.IndexName AS 'Index1'
  , C2.IndexName AS 'Index2'
  , CASE WHEN (C1.ConcIndexColumnNrs = C2.ConcIndexColumnNrs) AND (C1.ConcIncludeColumnNrs = C2.ConcIncludeColumnNrs) THEN 'Exact duplicate'
        WHEN (C1.ConcIndexColumnNrs = C2.ConcIndexColumnNrs) THEN 'Different includes'
        ELSE 'Overlapping columns' END
--  , C1.ConcIndexColumnNrs
--  , C2.ConcIndexColumnNrs
  , C1.ConcIndexColumnNames
  , C2.ConcIndexColumnNames
--  , C1.ConcIncludeColumnNrs
--  , C2.ConcIncludeColumnNrs
  , C1.ConcIncludeColumnNames
  , C2.ConcIncludeColumnNames
  , C1.IndexUsage
  , C2.IndexUsage
  , C1.IndexUpdates
  , C2.IndexUpdates
  , 'DROP INDEX ' + C2.IndexName + ' ON ' + C2.TableName AS Drop2
  , 'DROP INDEX ' + C1.IndexName + ' ON ' + C1.TableName AS Drop1
FROM IndexColumns AS C1
  INNER JOIN IndexColumns AS C2 
    ON (C1.TableObjectId = C2.TableObjectId)
    AND (
         -- exact: show lower IndexId as 1
            (C1.IndexId < C2.IndexId
            AND C1.ConcIndexColumnNrs = C2.ConcIndexColumnNrs
            AND C1.ConcIncludeColumnNrs = C2.ConcIncludeColumnNrs)
         -- different includes: show longer include as 1
         OR (C1.ConcIndexColumnNrs = C2.ConcIndexColumnNrs
            AND LEN(C1.ConcIncludeColumnNrs) > LEN(C2.ConcIncludeColumnNrs))
         -- overlapping: show longer index as 1
         OR (C1.IndexId <> C2.IndexId
            AND C1.ConcIndexColumnNrs <> C2.ConcIndexColumnNrs
            AND C1.ConcIndexColumnNrs like C2.ConcIndexColumnNrs + ' %')
    )
ORDER BY C1.TableName, C1.ConcIndexColumnNrs

您真应该得到一枚奖章! - ˈvɔlə

3

尝试使用下面的脚本来显示未使用的索引,希望能够帮助。

/****************************************************************
Description: Script to show Unused Indexes using DMVs
****************************************************************/

SELECT TOP 100
o.name AS ObjectName
, i.name AS IndexName
, i.index_id AS IndexID  
, dm_ius.user_seeks AS UserSeek
, dm_ius.user_scans AS UserScans
, dm_ius.user_lookups AS UserLookups
, dm_ius.user_updates AS UserUpdates
, p.TableRows
, 'DROP INDEX ' + QUOTENAME(i.name) 
+ ' ON ' + QUOTENAME(s.name) + '.' + QUOTENAME(OBJECT_NAME(dm_ius.object_id)) as 'drop statement'
FROM sys.dm_db_index_usage_stats dm_ius  
INNER JOIN sys.indexes i ON i.index_id = dm_ius.index_id AND dm_ius.object_id = i.object_id   
INNER JOIN sys.objects o on dm_ius.object_id = o.object_id
INNER JOIN sys.schemas s on o.schema_id = s.schema_id
INNER JOIN (SELECT SUM(p.rows) TableRows, p.index_id, p.object_id 
                FROM sys.partitions p GROUP BY p.index_id, p.object_id) p 
        ON p.index_id = dm_ius.index_id AND dm_ius.object_id = p.object_id
WHERE OBJECTPROPERTY(dm_ius.object_id,'IsUserTable') = 1
AND dm_ius.database_id = DB_ID()   
AND i.type_desc = 'nonclustered'
AND i.is_primary_key = 0
AND i.is_unique_constraint = 0
ORDER BY (dm_ius.user_seeks + dm_ius.user_scans + dm_ius.user_lookups) ASC
GO

1

我创建了以下查询,给出了很多有用的信息来识别重复和接近重复的索引。还包括其他信息,如索引占用的内存页数,这使我能够优先考虑较大的索引。它显示哪些列已经建立了索引,哪些列已经被包含在内,因此我可以看到是否存在两个几乎相同的索引,只有包含列略有不同。

WITH IndexSummary AS
(

SELECT DISTINCT sys.objects.name AS [Table Name],
    sys.indexes.name AS [Index Name],
    SUBSTRING((SELECT ', ' +  sys.columns.Name as [text()]
        FROM sys.columns
            INNER JOIN sys.index_columns
                ON sys.index_columns.column_id = sys.columns.column_id
                AND sys.index_columns.object_id = sys.columns.object_id
        WHERE sys.index_columns.index_id = sys.indexes.index_id
            AND sys.index_columns.object_id = sys.indexes.object_id
            AND sys.index_columns.is_included_column = 0
        ORDER BY sys.columns.name
    FOR XML Path('')), 2, 10000) AS [Indexed Column Names],
    ISNULL(SUBSTRING((SELECT ', ' +  sys.columns.Name as [text()]
        FROM sys.columns
            INNER JOIN sys.index_columns
            ON sys.index_columns.column_id = sys.columns.column_id
            AND sys.index_columns.object_id = sys.columns.object_id
        WHERE sys.index_columns.index_id = sys.indexes.index_id
            AND sys.index_columns.object_id = sys.indexes.object_id
            AND sys.index_columns.is_included_column = 1
        ORDER BY sys.columns.name
        FOR XML Path('')), 2, 10000), '') AS [Included Column Names],
    sys.indexes.index_id, sys.indexes.object_id
FROM sys.indexes
    INNER JOIN SYS.index_columns
        ON sys.indexes.index_id = SYS.index_columns.index_id
            AND sys.indexes.object_id = sys.index_columns.object_id
    INNER JOIN sys.objects
        ON sys.OBJECTS.object_id = SYS.indexES.object_id
WHERE sys.objects.type = 'U'
)

SELECT IndexSummary.[Table Name],
    IndexSummary.[Index Name],
    IndexSummary.[Indexed Column Names],
    IndexSummary.[Included Column Names],
    PhysicalStats.page_count as [Page Count],
    CONVERT(decimal(18,2), PhysicalStats.page_count * 8 / 1024.0) AS [Size (MB)],
    CONVERT(decimal(18,2), PhysicalStats.avg_fragmentation_in_percent) AS [Fragment %]
FROM IndexSummary
    INNER JOIN sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL)
       AS PhysicalStats
        ON PhysicalStats.index_id = IndexSummary.index_id
            AND PhysicalStats.object_id = IndexSummary.object_id
WHERE (SELECT COUNT(*) as Computed
        FROM IndexSummary Summary2
        WHERE Summary2.[Table Name] = IndexSummary.[Table Name]
            AND Summary2.[Indexed Column Names] = IndexSummary.[Indexed Column Names]) > 1
ORDER BY [Table Name], [Index Name], [Indexed Column Names], [Included Column Names]

查询结果如下所示:

Results of the query look like this:

Table Name  Index   Indexed Cols    Included Cols   Pages   Size (MB)   Frag %
My_Table    Indx_1     Col1         Col2, Col3       123      0.96       8.94
My_Table    Indx_2     Col1         Col2, Col3       123      0.96       8.94

完整描述

完整解释请参见在SQL Server中识别重复或冗余索引


0

我刚刚在阅读一些MSDN博客时,注意到了一个执行此操作的脚本,并想起了这个问题。

我还没有测试它与Andomar的脚本相比是否有任何特定的好处。

不过,我可能会对两者都进行修改,以考虑评估冗余时两个索引的大小。

编辑:

另请参阅Kimberley Tripp的删除重复索引的帖子。


网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接