按列分组以确切的单词、相同的前两个单词或相同的前三个单词进行分组。

4

我正在尝试按具有完全相同单词或相同前两个单词或相同前三个单词的列进行分组。

tblsample:

name             
Brown sugar  
Brown sugar Creme
Tiger zoo
Tiger  
Blue red green
Blue red green yellow

预期结果:

name              cntnme
Brown sugar         2
Tiger zoo           1
Tiger               1
Blue red green      2

我仅能按照完全相同的单词进行分组。
Select  name, count(name) as cntnme from tblsample group by name

1
为什么在不计数的情况下使用计数? - Amira Bedhiafi
我想按组计算相同名称 - Shayne David
对不起,我改进了预期的输出。谢谢。 - Shayne David
微软 SQL Server 2017 - Shayne David
为什么“棕糖奶油”没有出现在结果中?为什么是“老虎”? - Gordon Linoff
显示剩余5条评论
3个回答

3
听起来你想使用SOUNDEX(Transact-SQL)运算符,例如:
create table dbo.Names (
  Name nvarchar(50) not null
);
insert dbo.Names values
  ('Brown sugar'),
  ('Brown sugar Creme'),
  ('Tiger zoo'),
  ('Tiger'),
  ('Blue red green'),
  ('Blue red green yellow'),
  ('Blue red green yellow orange');

select Name=min(Name), CntNme=count(1)
from dbo.Names
group by soundex(Name);

Name            CntNme
--------------- ------
Blue red green  3
Brown sugar     2
Tiger           2

1
我根本不理解这些点赞。Soundex(https://learn.microsoft.com/en-us/sql/t-sql/functions/soundex-transact-sql?view=sql-server-ver15)与识别*相同*单词无关。它的目的基本上是相反的。 - Gordon Linoff
这种技术会存在不一致性(例如:尝试插入一个包含“蓝色红色”的行和另一个包含“蓝色参考”的行)。 - George Joseph
在这种技术中,它会被计算两次。 - George Joseph

1
我正在尝试按照相同的单词、相同的前两个单词或相同的前三个单词对列进行分组。
这表明可以采用以下方法:
with words2_3 as (
      select t.*
      from t
      where name like '% %' and name not like '% % % %'
     )
select w.name, count(*)
from t join
     words2_3 w
     on t.name + ' ' like w.name + ' %'
group by w.name;

这里有一个db<>fiddle。

您的结果与问题陈述不一致,因此这些结果与您的陈述不符。


1
以下是如何完成此操作的示例。
create table dbo.t(x varchar(30))

insert into dbo.t values('Brown sugar')
insert into dbo.t values('Brown sugar Creme')
insert into dbo.t values('Tiger zoo')
insert into dbo.t values('Tiger')
insert into dbo.t values('Blue red green')
insert into dbo.t values('Blue red green yellow')
insert into dbo.t values('Blue red green yellow green')

--First  i identify the records that have more than one word using space as a delimiter and then i look them up with words that match using a like condition.

with data
  as (select x, x as parent_word
        from dbo.t
        where charindex(' ',x) <> 0
      union all
      select b.x,a.parent_word
        from data a
        join dbo.t b 
          on a.x <> b.x
         and b.x like concat(a.x,'%')
      )
--In this block i find out the parent_word associated wit each of the words, and use min to identify them out. In this example (Blue,red green yellow),(Blue,red,green,yellow,green) and (Blue,red,gree) would all be segregated by the lowest value of parent which is (Blue,red,green)
    ,data_parent_word
    as (
        select x,min(parent_word) as min_value
          from data
        group by x
        )
 select min_value as word_val, count(x) as cnt
   from data_parent_word
 group by min_value
  union all
  select x,1
    from dbo.t 
   where charindex(' ',x) = 0

最后,我使用union all将“单词列表”中获得的记录与所获得的内容合并。
这将产生以下输出。
+----------------+----------+
|    word_val    |    cnt   |
+----------------+----------+
| Blue red green |        3 |
| Brown sugar    |        2 |
| Tiger zoo      |        1 |
| Tiger          |        1 |
+----------------+----------+

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接