SQL: Splitting a column into individual words to search against user input

10

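I want to compare the words a user types in against the words in a column of my table, one word at a time.

For example, consider these rows in my table: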

ID Name
1  Jack Nicholson
2  Henry Jack Blueberry
3  Pontiac Riddleson Jack
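
Suppose the user input is "Pontiac Jack". I want to assign a weight/rank to each match, so I can't use a simple LIKE (WHERE Name LIKE @SearchString). If Pontiac appears in a row, that row gets 10 points. A match on Jack earns another 10 points, and so on. So row 3 would get 20 points, and rows 1 and 2 would get 10 points each. I have already split the user input into individual words and stored them in a temporary table, @SearchWords(Word). However, I can't come up with a SELECT statement that ties the two together. Maybe I'm going about this the wrong way?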

1
Have you considered using SQL Server full-text search? - Mitch Wheat
Yes, I have - it didn't work well for us, and it is hard to customize to our requirements. - Donnie Thomas
1
+1 for full-text search - not necessarily SQL Server's, e.g. lucene.net. - no_one
5 Answers

1
For SQL Server, try something like this:
SELECT Word, COUNT(Word) * 10 AS WordCount
FROM SourceTable
INNER JOIN SearchWords ON CHARINDEX(SearchWords.Word, SourceTable.Name) > 0
GROUP BY Word

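Nice, elegant solution. I suppose the OP's table should have something that ties the individual words back to the original search phrase - then getting the score for the whole phrase is as simple as joining the phrase to this table and summing the word counts grouped by the whole phrase. By the way, nice username, it's one of my favorite xkcds :) - user377136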
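
For a score per row rather than a count per word, the same join can be grouped by the source row instead. The following is only a minimal sketch, not the answerer's code: it uses Python's built-in sqlite3 module so that it runs standalone, and SQLite's instr() stands in for CHARINDEX (note the argument order is reversed). The table and column names follow the query above, and the sample rows come from the question.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SourceTable (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE SearchWords (Word TEXT);
    INSERT INTO SourceTable (ID, Name) VALUES
        (1, 'Jack Nicholson'),
        (2, 'Henry Jack Blueberry'),
        (3, 'Pontiac Riddleson Jack');
    INSERT INTO SearchWords (Word) VALUES ('Pontiac'), ('Jack');
""")

# instr(haystack, needle) > 0 plays the role of CHARINDEX(needle, haystack) > 0.
# Grouping by the source row yields 10 points per matching search word.
rows = conn.execute("""
    SELECT s.ID, s.Name, COUNT(*) * 10 AS Score
      FROM SourceTable AS s
      JOIN SearchWords AS w ON instr(s.Name, w.Word) > 0
     GROUP BY s.ID, s.Name
     ORDER BY Score DESC, s.ID
""").fetchall()

for row in rows:
    print(row)
```

With the question's data, row 3 comes out at 20 points and rows 1 and 2 at 10 each. Like CHARINDEX, this is a substring match, so "Jack" would also match a name such as "Jackson".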

0

How about this? (It's MySQL syntax; for SQL Server I think you only need to replace CONCAT with + string concatenation.)

SELECT names.id, count(searchwords.word) FROM names, searchwords WHERE names.name LIKE CONCAT('%', searchwords.word, '%') GROUP BY names.id

You then get a result set containing each ID from the names table and the number of search words that matched it.


0

You can compute the weighting with a common table expression (CTE). For example:

--** Set up the example tables and data
DECLARE @Name TABLE (id INT IDENTITY, name VARCHAR(50));
DECLARE @SearchWords TABLE (word VARCHAR(50));

INSERT INTO @Name
        (name)
VALUES  ('Jack Nicholson')
       ,('Henry Jack Blueberry')
       ,('Pontiac Riddleson Jack')
       ,('Fred Bloggs');

INSERT INTO @SearchWords
        (word)
VALUES  ('Jack')
       ,('Pontiac');

--** Example SELECT with @Name selected and ordered by words in @SearchWords
WITH Order_CTE (weighting, id)
AS (
    SELECT COUNT(*) AS weighting
         , id 
      FROM @Name AS n
      JOIN @SearchWords AS sw
        ON n.name LIKE '%' + sw.word + '%' 
     GROUP BY id
)
SELECT n.name
     , cte.weighting
  FROM @Name AS n
  JOIN Order_CTE AS cte
    ON n.id = cte.id
 ORDER BY cte.weighting DESC;

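With this technique you can also assign a value to each search word, so one word can be worth more than another (in the example below, Pontiac is worth 20 and Jack is worth 10). It would look something like this: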
--** Set up the example tables and data
DECLARE @Name TABLE (id INT IDENTITY, name VARCHAR(50));
DECLARE @SearchWords TABLE (word VARCHAR(50), value INT);

INSERT INTO @Name
        (name)
VALUES  ('Jack Nicholson')
       ,('Henry Jack Blueberry')
       ,('Pontiac Riddleson Jack')
       ,('Fred Bloggs');

--** Set up search words with associated value
INSERT INTO @SearchWords
        (word, value)
VALUES  ('Jack',10)
       ,('Pontiac',20)
       ,('Bloggs',40);


--** Example SELECT with @Name selected and ordered by words and values in @SearchWords
WITH Order_CTE (weighting, id)
AS (
    SELECT SUM(sw.value) AS weighting
         , id 
      FROM @Name AS n
      JOIN @SearchWords AS sw
        ON n.name LIKE '%' + sw.word + '%' 
     GROUP BY id
)
SELECT n.name
     , cte.weighting
  FROM @Name AS n
  JOIN Order_CTE AS cte
    ON n.id = cte.id
 ORDER BY cte.weighting DESC;      

0
How about something like this...
Select id, MAX(names.name), count(id)*10 from names
inner join @SearchWords as sw on 
    names.name like '%'+sw.word+'%'
group by id 

This assumes there is a table called "names".

0

In my opinion, the best approach is to maintain a separate table containing all the words. For example:

ID     Word       FK_ID
1      Jack       1
2      Nicholson  1
3      Henry      2
(etc)

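This table would be kept up to date by a trigger, and you would want a nonclustered index on (Word, FK_ID). The SQL to generate the weighting then becomes simple and efficient.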
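
As a rough illustration of how the weighting query might look against such a word table, here is a sketch using Python's built-in sqlite3 module so it can run standalone. The table names Names and NameWords and the index name are hypothetical, and a plain Python insert stands in for the trigger that would populate the word table in a real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Names (ID INTEGER PRIMARY KEY, Name TEXT);
    -- Hypothetical word table: one row per word, pointing back at its name row.
    CREATE TABLE NameWords (
        ID    INTEGER PRIMARY KEY,
        Word  TEXT,
        FK_ID INTEGER REFERENCES Names(ID)
    );
    CREATE INDEX IX_NameWords_Word ON NameWords (Word, FK_ID);
""")

names = [
    (1, "Jack Nicholson"),
    (2, "Henry Jack Blueberry"),
    (3, "Pontiac Riddleson Jack"),
]
conn.executemany("INSERT INTO Names (ID, Name) VALUES (?, ?)", names)

# Stand-in for the trigger: split each name on whitespace and store the words.
conn.executemany(
    "INSERT INTO NameWords (Word, FK_ID) VALUES (?, ?)",
    [(word, name_id) for name_id, name in names for word in name.split()],
)

# Score each name: 10 points per stored word that equals a search word.
search_words = "Pontiac Jack".split()
placeholders = ", ".join("?" for _ in search_words)
scores = conn.execute(
    f"""
    SELECT n.ID, n.Name, COUNT(*) * 10 AS Score
      FROM NameWords AS nw
      JOIN Names AS n ON n.ID = nw.FK_ID
     WHERE nw.Word IN ({placeholders})
     GROUP BY n.ID, n.Name
     ORDER BY Score DESC, n.ID
    """,
    search_words,
).fetchall()

for row in scores:
    print(row)
```

Because the comparison is an equality test on whole words, the index on Word can be used, unlike a LIKE with a leading wildcard. In this sketch a name that contains the same word twice would be counted twice.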


Content provided by Stack Overflow.