Remove duplicate values from a comma-separated string in Oracle

4
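I need your help with the regexp_replace function. I have a table in which one column holds a concatenated string that contains duplicate values. How do I eliminate them?
Example: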
Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha

I need the output to be:
Ian,Beatty,Larry,Neesha

The duplicates are random and appear in no particular order.

Update --

This is what my table looks like:

ID   Name1   Name2   Name3
1    a       b       c
1    c       d       a
2    d       e       a
2    c       d       b

I need one row per ID, containing a comma-separated string of the unique name1, name2, and name3 values.
ID    Name
1     a,c,b,d,c
2     d,c,e,a,b

I tried using listagg with distinct, but I could not get rid of the duplicates.


1
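This is a good reason to use a proper junction table or nested table instead of a comma-separated list. Good luck. - Gordon Linoff
This looks like a duplicate of this question. - Dave
The pattern is different and does not work with my data. The duplicates are still there. - Cindy
5 Answers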

1

I would go with the simplest option:

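SELECT ID, LISTAGG(NAME_LIST, ',') WITHIN GROUP (ORDER BY NAME_LIST) AS NAME
  FROM (-- UNION (not UNION ALL) discards duplicate (ID, name) pairs
        SELECT ID, NAME1 NAME_LIST FROM DATA UNION
        SELECT ID, NAME2 FROM DATA UNION
        SELECT ID, NAME3 FROM DATA
      )
GROUP BY ID;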

Demo.
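
As a possible variation on the UNION query above, here is a minimal sketch that lets LISTAGG remove the duplicates itself. It assumes Oracle 19c or later, where LISTAGG accepts the DISTINCT keyword. The inline `data` CTE and the `names` alias are only stand-ins for the asker's table, filled with the sample rows from the question.

```sql
-- Sample rows from the question; this CTE stands in for the real table (assumption)
WITH data (id, name1, name2, name3) AS (
  SELECT 1, 'a', 'b', 'c' FROM dual UNION ALL
  SELECT 1, 'c', 'd', 'a' FROM dual UNION ALL
  SELECT 2, 'd', 'e', 'a' FROM dual UNION ALL
  SELECT 2, 'c', 'd', 'b' FROM dual
)
SELECT id,
       -- DISTINCT inside LISTAGG needs Oracle 19c or later
       LISTAGG(DISTINCT name_list, ',') WITHIN GROUP (ORDER BY name_list) AS names
FROM (
  -- UNION ALL is enough here because LISTAGG(DISTINCT ...) drops the repeats
  SELECT id, name1 AS name_list FROM data UNION ALL
  SELECT id, name2 FROM data UNION ALL
  SELECT id, name3 FROM data
)
GROUP BY id
ORDER BY id;

-- Expected rows for the sample data:
--   1  a,b,c,d
--   2  a,b,c,d,e
```

On versions before 19c the DISTINCT keyword is not available inside LISTAGG, so the duplicates have to be removed before aggregating, as the UNION query does.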


0

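In this case there is a way to find the duplicates, but removing them becomes a problem if more than one name is duplicated in the string for the same ID. Here is code that handles one duplicate per ID.
Sample data: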

WITH
    tbl AS
        (
            Select 1 "ID", 'a' "NAME_1", 'b' "NAME_2", 'c' "NAME_3" From Dual Union All
            Select 1 "ID", 'c' "NAME_1", 'd' "NAME_2", 'a' "NAME_3" From Dual Union All
            Select 2 "ID", 'd' "NAME_1", 'e' "NAME_2", 'a' "NAME_3" From Dual Union All
            Select 2 "ID", 'c' "NAME_1", 'd' "NAME_2", 'b' "NAME_3" From Dual 
        ),
    lists AS
        (
            Select 1 "ID", 'a,c,b,d,c' "NAME" From Dual Union All
            Select 2 "ID", 'd,c,e,a,b' "NAME" From Dual  
        ),

Create a CTE that compares your LISTAGG string with the original data to find the duplicated values:
  grid AS
    (
        Select DISTINCT l.ID, l.NAME,
            CASE WHEN ( Length(l.NAME || ',') - Length(Replace(l.NAME || ',', t.NAME_1 || ',', '')) ) / Length(t.NAME_1 || ',') > 1 THEN NAME_1 END  "NAME_1",
            CASE WHEN ( Length(l.NAME || ',') - Length(Replace(l.NAME || ',', t.NAME_2 || ',', '')) ) / Length(t.NAME_2 || ',') > 1 THEN NAME_2 END  "NAME_2",
            CASE WHEN ( Length(l.NAME || ',') - Length(Replace(l.NAME || ',', t.NAME_3 || ',', '')) ) / Length(t.NAME_3 || ',') > 1 THEN NAME_3 END  "NAME_3"
        From
            lists l
        Inner Join
            tbl t ON(t.ID = l.ID) 
    )

        ID NAME      NAME_1 NAME_2 NAME_3
---------- --------- ------ ------ ------
         2 d,c,e,a,b                      
         1 a,c,b,d,c c                    
         1 a,c,b,d,c               c     

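The main SQL uses UNION ALL to build a new string for each name column (with the second occurrence removed), then compares it with the old string so that the new one takes its place.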

SELECT DISTINCT l.ID, Nvl(g.NAME, l.NAME) NAME
FROM
    lists l
LEFT JOIN
    (
        SELECT ID,  CASE  WHEN NAME_1 Is Not Null 
                          THEN  REPLACE(NAME, NAME, COALESCE( REPLACE( SubStr(NAME, 1, InStr(NAME, NAME_1, 1, 2) - 1) || SubStr(NAME, InStr(NAME, NAME_1, 1, 2) + Length(NAME_1)), ',,', ','), NULL ) ) 
                    END "NAME"
        FROM grid
        WHERE COALESCE(NAME_1, NAME_2, NAME_3) IS NOT NULL
    UNION ALL
        SELECT ID,  CASE  WHEN NAME_2 Is Not Null 
                          THEN  REPLACE(NAME, NAME, COALESCE( REPLACE( SubStr(NAME, 1, InStr(NAME, NAME_2, 1, 2) - 1) || SubStr(NAME, InStr(NAME, NAME_2, 1, 2) + Length(NAME_2)), ',,', ','), NULL ) ) 
                    END "NAME"
        FROM grid
        WHERE COALESCE(NAME_1, NAME_2, NAME_3) IS NOT NULL
    UNION ALL
        SELECT ID,  CASE  WHEN NAME_3 Is Not Null 
                          THEN  REPLACE(NAME, NAME, COALESCE( REPLACE( SubStr(NAME, 1, InStr(NAME, NAME_3, 1, 2) - 1) || SubStr(NAME, InStr(NAME, NAME_3, 1, 2) + Length(NAME_3)), ',,', ','), NULL ) ) 
                    END "NAME"
        FROM grid
        WHERE COALESCE(NAME_1, NAME_2, NAME_3) IS NOT NULL
    ) g ON(g.ID = l.ID And Length(g.NAME) < Length(l.NAME))

Result:
        ID NAME         
---------- -------------
         2 d,c,e,a,b    
         1 a,c,b,d     

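For more than two occurrences of a name, or for several different duplicated names in one string, some recursion or nesting would be needed to finish the job...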
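
One way to cover any number of duplicates without recursion is a different technique from the string surgery above: split each list into tokens, keep the first position of every token, and aggregate again in that order. The sketch below assumes Oracle 12c or later (for CROSS APPLY). The CTE names `tokens` and `firsts` and the columns `pos` and `first_pos` are made up for illustration; `lists` reuses the sample strings from this answer.

```sql
WITH lists (id, name) AS (
  SELECT 1, 'a,c,b,d,c' FROM dual UNION ALL
  SELECT 2, 'd,c,e,a,b' FROM dual
),
tokens AS (
  -- One row per list element, remembering its position in the string
  SELECT l.id,
         REGEXP_SUBSTR(l.name, '[^,]+', 1, n.pos) AS token,
         n.pos
  FROM lists l
  CROSS APPLY (
    SELECT LEVEL AS pos
    FROM dual
    CONNECT BY LEVEL <= REGEXP_COUNT(l.name, '[^,]+')
  ) n
),
firsts AS (
  -- Collapse repeats, keeping the position of the first occurrence
  SELECT id, token, MIN(pos) AS first_pos
  FROM tokens
  GROUP BY id, token
)
SELECT id,
       LISTAGG(token, ',') WITHIN GROUP (ORDER BY first_pos) AS name
FROM firsts
GROUP BY id
ORDER BY id;

-- Expected rows for the sample data:
--   1  a,c,b,d
--   2  d,c,e,a,b
```

Ordering by `first_pos` keeps the names in their original order, which a plain ORDER BY on the value would not.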


0

So, try this...

([^,]+),(?=.*[A-Za-z],[] ]*\1)

0

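If the repeated values are not adjacent, I don't think you can do it with regexp_replace alone. One approach is to split the values, eliminate the duplicates, and then put them back together.

A common way to tokenize a delimited string is with regexp_substr and a connect by clause. Using a bind variable for your string makes the code a bit clearer: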

var value varchar2(100);
exec :value := 'Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha';

select regexp_substr(:value, '[^,]+', 1, level) as value
from dual
connect by regexp_substr(:value, '[^,]+', 1, level) is not null;

VALUE                        
------------------------------
Ian                           
Beatty                        
Larry                         
Neesha                        
Beatty                        
Neesha                        
Ian                           
Neesha                        

You can use that as a subquery (or CTE), get the distinct values from it, and then reassemble them with listagg:

select listagg(value, ',') within group (order by value) as value
from (
  select distinct value from (
    select regexp_substr(:value, '[^,]+', 1, level) as value
    from dual
    connect by regexp_substr(:value, '[^,]+', 1, level) is not null
  )
);

VALUE                        
------------------------------
Beatty,Ian,Larry,Neesha       

If you are looking at multiple rows from a table, the connect-by syntax gets more complicated, but you can use a non-deterministic reference to avoid loops:

with t42 (id, value) as (
  select 1, 'Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha' from dual
  union all select 2, 'Mary,Joe,Mary,Frank,Joe' from dual
)
select id, listagg(value, ',') within group (order by value) as value
from (
  select distinct id, value from (
    select id, regexp_substr(value, '[^,]+', 1, level) as value
    from t42
    connect by regexp_substr(value, '[^,]+', 1, level) is not null
    and id = prior id
    and prior dbms_random.value is not null
  )
)
group by id;

        ID VALUE                        
---------- ------------------------------
         1 Beatty,Ian,Larry,Neesha       
         2 Frank,Joe,Mary                

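Of course, you would not need to do this if you were storing relational data properly; having a delimited string in a column is not a good idea.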

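I'll try this and let you know... Actually, the data is not stored as a delimited string. It comes from multiple rows per ID, and I have used listagg to concatenate them into one row per ID. - Cindy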
1
@Cindy - So why don't you get the distinct values before calling listagg? - Alex Poole
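
The suggestion in the comment above could look roughly like the following sketch, which removes duplicates from the source rows before listagg ever sees them. It assumes the three-column layout from the question and Oracle 11g or later (for UNPIVOT). The `data` CTE and the names `name_value`, `source_col`, and `names` are illustrative, not taken from the asker's schema.

```sql
-- Sample rows from the question; this CTE stands in for the real table (assumption)
WITH data (id, name1, name2, name3) AS (
  SELECT 1, 'a', 'b', 'c' FROM dual UNION ALL
  SELECT 1, 'c', 'd', 'a' FROM dual UNION ALL
  SELECT 2, 'd', 'e', 'a' FROM dual UNION ALL
  SELECT 2, 'c', 'd', 'b' FROM dual
)
SELECT id,
       LISTAGG(name_value, ',') WITHIN GROUP (ORDER BY name_value) AS names
FROM (
  -- Turn the three name columns into rows, then keep each (id, name) pair once
  SELECT DISTINCT id, name_value
  FROM data
  UNPIVOT (name_value FOR source_col IN (name1, name2, name3))
)
GROUP BY id
ORDER BY id;

-- Expected rows for the sample data:
--   1  a,b,c,d
--   2  a,b,c,d,e
```

Because the DISTINCT is applied in the inner query, this form does not depend on LISTAGG supporting DISTINCT.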

0
Use this; it worked for me.
DECLARE
  input_string VARCHAR2(255);
  merged_users VARCHAR2(4000);
BEGIN
  input_string := 'abc3,abc1,abc2,abc3,abc2,abc4';

  -- Remove leading and trailing commas from input_string
  input_string := TRIM(',' FROM input_string);

  -- Split input_string into individual tokens, keep the distinct ones,
  -- and concatenate them again
  WITH data AS (
    SELECT TRIM(REGEXP_SUBSTR(input_string, '[^,]+', 1, LEVEL)) AS token
    FROM dual
    CONNECT BY LEVEL <= REGEXP_COUNT(input_string, '[^,]+')
  ),
  distinct_data AS (
    SELECT DISTINCT token
    FROM data
  )
  -- ORDER BY token (rather than the constant 1) gives a deterministic order
  SELECT LISTAGG(token, ',') WITHIN GROUP (ORDER BY token)
  INTO merged_users
  FROM distinct_data;

  -- Output is only visible when server output is enabled (SET SERVEROUTPUT ON)
  DBMS_OUTPUT.PUT_LINE(merged_users);
END;
/
