SQL解析多个子字符串

4

我有一个非常长且复杂的字符串,其中包含换行符 - 我很难解析。我需要能够创建一个查询语句,每个字段都有一个列。

理想情况下,应该找到换行符 - 对于每一行 - 回到:,冒号前面的所有内容应该是列的名称,冒号和新的换行符之间的所有内容应该是字段中的数据。

所有数据都返回为字符串,因此我只需为下面的每一行构建一个选择语句。我不确定是否可能。

第二种选择是硬编码并说类似于CHARINDEX ('Home Phone:',notes,0),在这里我找到家庭电话字符串,然后在指定字符串后拉出介于新换行符之间的所有内容。

在这种情况下,我的查询中的每个选择项都将说 - 查找字符串“家庭电话”并拉出冒号后面的内容,或查找字符串“学校名称”等。

这就是数据的外观(在一个名为notes的所有字符串中):

Home Phone: 1234567890  
Cell Phone: 1234567890  
Date of Birth: 01/01/1971 
School Name: James Jones High  School 
Address:123 Main Street 
School City: Queens  
School State: PA  
School Zip: 32112 
Years Teaching: 12  
Grade Levels: Middle School  
Total Students: 120  
Subject: Music:   
How did they hear:  Other, provide more info: Former partner teacher in the Middle School 
Type: Public/Charter   
Question 1: aaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaa aaaaaaa aaaa aaa aaaaaaaa aaaaaa aaaaaaaa  aaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaa aaaaaaa aaaaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaa aaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaaaaaaaa aaaaaaaa aaaaaaaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaa aaaaaa aaaaaa aaaaaaa aaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaa aaaaa aaaaaa aaaaaa aaaaaaaaaaaa aaaaaaaaaaaa aaa aaaa aaaaa aaaaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaaa aaaaaaaaaaa aaaaaaaaa aaaaaaaaaaaa.   
Question 2: bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbb bbbbbbbbb bbbbbbb bbbbbb bbbbbb bbbbbbb  bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbbbbbbbbbbb bbbbbbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbb 
Question 3: ccccccccccccccccccccccc cccccccc ccccccccccc cccccccccccccccccccccc ccc ccccccccc cccccccccccccc ccccccccccccccccccccc cccccccccccccccccccccc cccccccccccccccccc ccccccccccc ccccccccccccc ccccccccccccccccc cccccccc

因此,输出结果应该类似于这样(每个字段中也回答了所有长问题)。
Home Phone  Cell Phone  Date of Birth:  …   Type:               Question 1 :                Question 2:    Question 3: 
1234567890  1234567890  1/1/1971            Public/Charter      aaaaaaaa aaaaaaaaaaaaa.     bbb bbbbbbbbbb ccccccccccccccccccccccc 

我不确定这是否有意义,但任何建议都非常感谢。

提取子字符串和换行符的代码 - 但这是硬编码的。我无法弄清如何动态地完成它。

SELECT  ltrim(rtrim(CHARINDEX ( 'Home Phone:' ,notes, 0) + LEN('Home Phone: '))) as 'beggining',
        ltrim(rtrim(CHARINDEX ( CHAR(10) ,notes, 0)))   as 'ending',
        SUBSTRING(notes,(CHARINDEX ( 'Home Phone:' ,notes, 0) + LEN('Home Phone: ')),(LEN('Home Phone: '))) as 'home phone',    
FROM    table a 

谢谢!

我不知道怎么做。 - Elizabeth
你的数据行数是固定的还是会变化?我认为你可能需要使用字符串分割器将其首先分成行,然后再分成列。 - Sean Lange
不要在SQL中进行文本处理,这样做非常痛苦/丑陋。请在客户端完成此操作,并将解析后的数据保存到适当规范化的数据中。 - Marc B
那么答案就是它需要是动态的。当数据点的数量可能会改变时,它需要一个动态解决方案,这样你就不必一遍又一遍地更改代码。这就是编写代码的目的,你不应该总是不断地更改它。 - Sean Lange
如果您能够确定起始位置,并且已经查看了重复项并知道如何查找换行符,则您只需要使用SUBSTRING()函数。 - Tab Alleman
显示剩余9条评论
3个回答

2
很大一部分(90%)的功劳应该归给Alex K,他提供了一个深入的答案,关于如何找到字符的第n个出现

SQL Server - 在字符串中查找第n次出现

我采用了那个答案,调整了你的问题,然后应用了PIVOT将其分成所需的行/列。 只要它们始终具有相同的逻辑(每个问题/答案由换行符分隔),此方法应该能够为您需要的任意数量的唯一问题集创建所需的输出。

--Creates temporary table for testing, ID column and second set of data
--used to ensure query works for each unique set of questions
IF OBJECT_ID('tempdb..#Results') IS NOT NULL
    DROP TABLE #Results

CREATE TABLE #Results 
    (ID INT IDENTITY(1,1) NOT NULL,
    Notes NVARCHAR(4000) NOT NULL)
INSERT INTO #Results
    (Notes)
VALUES
    ('Home Phone: 1234567890  
    Cell Phone: 1234567890  
    Date of Birth: 01/01/1971 
    School Name: James Jones High  School 
    Address:123 Main Street 
    School City: Queens  
    School State: PA  
    School Zip: 32112 
    Years Teaching: 12  
    Grade Levels: Middle School  
    Total Students: 120  
    Subject: Music:   
    How did they hear:  Other, provide more info: Former partner teacher in the Middle School 
    Type: Public/Charter '),
    ('Home Phone: test  
    Cell Phone: test 
    Date of Birth: test
    School Name: test
    Address:test 
    School City: test 
    School State: test  
    School Zip: test 
    Years Teaching: test 
    Grade Levels: test 
    Total Students: test
    Subject: test   
    How did they hear:  test 
    Type: test ');

--Recursive CTE to determine the position of each successive line break
--Used CHARINDEX to search CHAR(13) and CHAR(10) and find line breaks and carriage returns
WITH cte
AS

    (SELECT ID, Notes, 1 AS Starts, CHARINDEX(CHAR(13)+CHAR(10),Notes) AS Pos
    FROM #Results
    UNION ALL
    SELECT ID, Notes, Pos +1, CHARINDEX(CHAR(13)+CHAR(10),Notes,Pos+1) AS Pos
    FROM cte
    WHERE
        pos >0),

--2nd CTE breaks each question set into it's own row
cte2
AS
    (SELECT ID, Notes,Starts, Pos,
        SUBSTRING(Notes, Starts,
            CASE
                WHEN pos > 0 THEN (pos - starts)
                ELSE LEN(notes)
            END) AS Token
    FROM cte),

--3rd CTE cleans up the data, separating the Questions/Answers into separate columns
--REPLACE is used to remove Line Break (CHAR(10)), output was then showing a TAB so used
--double REPLACE and removed CHAR(9) (tab)
--LTRIM removes leading space
cte3
AS
    (SELECT ID, 
        LTRIM(REPLACE(REPLACE(SUBSTRING(Token,CHARINDEX(CHAR(13)+CHAR(10),Token),CHARINDEX(':',Token)),CHAR(10),''),CHAR(9),'')) AS Question, 
        LTRIM(SUBSTRING(Token,CHARINDEX(':',Token)+1,4000)) AS Answer
    FROM cte2)

--Pivot separates each Question/Answer row into it's own column
SELECT *
FROM
    (SELECT ID, Question, Answer
    FROM cte3) AS a
PIVOT
    (MAX(Answer)
    FOR [Question] IN([Address],[Cell Phone],[Date of Birth],[Grade Levels],[Home Phone],[How did they hear],
                        [School City],[School Name],[School State],[School Zip],[Subject],[Total Students],[Type],[Years Teaching])) AS pvt

我在每个部分都加了注释,希望能解释我的逻辑,如果有任何问题,请告诉我。

编辑:动态透视表

可以使用动态 SQL 创建一个透视表,自动选择所有“问题”列并相应地进行调整。我认为不能在一步完成,因为我必须使用多个 CTE。我会采取上述用于创建 CTE、CTE2 和 CTE3(基本上是在透视查询之前的所有步骤)的步骤,并创建这些步骤的 VIEW,然后使用该 VIEW 进行以下操作(对于我的示例,该 VIEW 名称为“Questionaire”)。

DECLARE @columns AS NVARCHAR(MAX)
DECLARE @query AS NVARCHAR(MAX)

SET @columns =  STUFF((SELECT DISTINCT ',' + QUOTENAME(q.question)
        FROM questionaire AS q
        FOR XML PATH(''), TYPE
        ).value('.','NVARCHAR(MAX)')
        ,1,1,'')

SET @query =    'SELECT ID, '+ @columns +' FROM
        (
            SELECT ID, Answer, Question
            FROM questionaire
        ) AS a
        PIVOT
        (
            MAX(Answer)
            FOR Question IN(' +@columns+')
        ) AS p'
EXECUTE(@query)

这很好运作 —— 我想问一下,我稍微添加/修改了逻辑。最后我创建了一个临时表 SELECT distinct ('[' +Question +']') as 'Question',将列标题放入其中,并尝试将其添加到我的最终透视表中。在透视表的 For 语句中,我有 FOR [Question] IN( Select Question from #temp )) AS p,但它似乎不太满意。 - Elizabeth
是的,PIVOT 不允许您像那样使用 SELECT 语句声明列。使用动态 SQL 可能实现,我编辑了我的答案,提供了一个查询,展示我如何实现它。 - Jericho
我的视图必须包含所有三个CTE吗?CREATE VIEW questionaire AS WITH cte1 AS (...), cte2 AS (...), cte3 AS (...) SELECT ... - Elizabeth
还有一个问题 - 如果我创建一个临时表(#questionable)并将cte3的内容放入其中,然后从那里提取查询,这会有什么不利影响吗? - Elizabeth
临时表应该可以正常工作 - 我只在回答 Stack Overflow 的问题时使用临时表,所以我甚至没有考虑过它。 :) - Jericho
太好了,这个完美地运作了(然后我们意识到我们只有在预先知道列的情况下才能构建报告),所以我们将保留这个逻辑供内部使用,然后在数据透视表之前停止,并将非透视数据发送到Visual Studio SSRS包中,在系统中进行透视(这样我们可以使其动态化)。再次感谢您的帮助! - Elizabeth

0

我知道这里有很多人不喜欢这个分隔符,但这是我喜欢的。它只能处理多达8000个输入值,并且分隔符只是一个单一字符。然而,它具有其他一些分隔符没有的好处,除非您有大量的输入,否则几乎可以胜任所有任务。 您可以在此处找到代码。http://www.sqlservercentral.com/articles/Tally+Table/72993/ 评论(需要登录)跨足了很多页,对这个分隔符进行了非常长的讨论。

其他人更喜欢使用透视表这样的东西,但我更喜欢交叉表(也称为条件聚合),因为我发现语法不太晦涩。

我冒昧稍微修改了您的样本数据。我更改了手机的价值,使其与家庭电话不同。我还缩短了问题的回答,因为它们不需要成百上千个字符来演示技术。

declare @SomeValue varchar(8000)

set @SomeValue = 'Home Phone: 1234567890  
Cell Phone: 3344556677
Date of Birth: 01/01/1971 
School Name: James Jones High  School 
Address:123 Main Street 
School City: Queens  
School State: PA  
School Zip: 32112 
Years Teaching: 12  
Grade Levels: Middle School  
Total Students: 120  
Subject: Music:   
How did they hear:  Other, provide more info: Former partner teacher in the Middle School 
Type: Public/Charter   
Question 1: aaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaa.
Question 2: bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb
Question 3: ccccccccccccccccccccccc cccccccc';

select 
    MAX(case when s.ItemNumber = 1 then x.Item end) as HomePhone
    , MAX(case when s.ItemNumber = 2 then x.Item end) as DOB
    , MAX(case when s.ItemNumber = 3 then x.Item end) as DOB
    , MAX(case when s.ItemNumber = 4 then x.Item end) as SchoolName
    , MAX(case when s.ItemNumber = 5 then x.Item end) as SchoolAddress
    , MAX(case when s.ItemNumber = 6 then x.Item end) as SchoolCity
    , MAX(case when s.ItemNumber = 7 then x.Item end) as SchoolState
    , MAX(case when s.ItemNumber = 8 then x.Item end) as SchoolZip
    , MAX(case when s.ItemNumber = 9 then x.Item end) as YearsTeaching
    , MAX(case when s.ItemNumber = 10 then x.Item end) as GradeLevels
    , MAX(case when s.ItemNumber = 11 then x.Item end) as TotalStudents
    , MAX(case when s.ItemNumber = 12 then x.Item end) as Subject
    , MAX(case when s.ItemNumber = 13 then x.Item end) as HowHeard
    , MAX(case when s.ItemNumber = 14 then x.Item end) as SchoolType
    , MAX(case when s.ItemNumber = 15 then x.Item end) as Question1
    , MAX(case when s.ItemNumber = 16 then x.Item end) as Question2
    , MAX(case when s.ItemNumber = 17 then x.Item end) as Question3
from dbo.DelimitedSplit8K(@SomeValue, CHAR(10)) s
cross apply dbo.DelimitedSplit8K(s.Item, ':') x

0
你可以像这样尝试使用xml,但我会在musicprovide more info之后去掉额外的:
DECLARE @string nvarchar(max) = '
Home Phone: 1234567890  
Cell Phone: 1234567890  
Date of Birth: 01/01/1971 
School Name: James Jones High  School 
Address:123 Main Street 
School City: Queens  
School State: PA  
School Zip: 32112 
Years Teaching: 12  
Grade Levels: Middle School  
Total Students: 120  
Subject: Music   
How did they hear:  Other, provide more info, Former partner teacher in the Middle School 
Type: Public/Charter   
Question 1: aaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaa aaaaaaa aaaa aaa aaaaaaaa aaaaaa aaaaaaaa  aaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaa aaaaaaa aaaaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaa aaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaaaaaaaa aaaaaaaa aaaaaaaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaa aaaaaa aaaaaa aaaaaaa aaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaa aaaaa aaaaaa aaaaaa aaaaaaaaaaaa aaaaaaaaaaaa aaa aaaa aaaaa aaaaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaaa aaaaaaaaaaa aaaaaaaaa aaaaaaaaaaaa.   
Question 2: bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbb bbbbbbbbb bbbbbbb bbbbbb bbbbbb bbbbbbb  bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbbbbbbbbbbb bbbbbbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbb 
Question 3: ccccccccccccccccccccccc cccccccc ccccccccccc cccccccccccccccccccccc ccc ccccccccc cccccccccccccc ccccccccccccccccccccc cccccccccccccccccccccc cccccccccccccccccc ccccccccccc ccccccccccccc ccccccccccccccccc cccccccc'
,@xml as xml

SELECT @xml = REPLACE ('<mystring><fieldname id="'+REPLACE(REPLACE(right(@string,LEN(@string)-2),':','" >'),CHAR(10),'</fieldname><fieldname id="')+'</fieldname></mystring>' ,CHAR(13),'')

SELECT
    n.v.value('(fieldname[@id="Home Phone"])[1]','NVARCHAR(11)') AS 'Home Phone',
    n.v.value('(fieldname[@id="Cell Phone"])[1]','NVARCHAR(11)') AS 'Cell Phone',
    n.v.value('(fieldname[@id="Date of Birth"])[1]','NVARCHAR(12)') AS 'Date of Birth',
    n.v.value('(fieldname[@id="School Name"])[1]','NVARCHAR(30)') AS 'School Name',
    n.v.value('(fieldname[@id="Address"])[1]','NVARCHAR(30)') AS 'Address',
    n.v.value('(fieldname[@id="School City"])[1]','NVARCHAR(15)') AS 'School City',
    n.v.value('(fieldname[@id="School State"])[1]','NVARCHAR(10)') AS 'School State',
    n.v.value('(fieldname[@id="School Zip"])[1]','NVARCHAR(6)') AS 'School Zip',
    n.v.value('(fieldname[@id="Years Teaching"])[1]','NVARCHAR(5)') AS 'Years Teaching',
    n.v.value('(fieldname[@id="Grade Levels"])[1]','NVARCHAR(15)') AS 'Grade Levels',
    n.v.value('(fieldname[@id="Total Students"])[1]','NVARCHAR(5)') AS 'Total Students',
    n.v.value('(fieldname[@id="How did they hear"])[1]','NVARCHAR(100)') AS 'How did they hear',
    n.v.value('(fieldname[@id="Type"])[1]','NVARCHAR(25)') AS 'Type',
    n.v.value('(fieldname[@id="Question 1"])[1]','NVARCHAR(128)') AS 'Question 1',
    n.v.value('(fieldname[@id="Question 2"])[1]','NVARCHAR(128)') AS 'Question 2',
    n.v.value('(fieldname[@id="Question 3"])[1]','NVARCHAR(128)') AS 'Question 3'
FROM @xml.nodes('mystring') as n(v);

结果:

    Home Phone  Cell Phone  Date of Birth School Name                    Address                        School City     School State School Zip Years Teaching Grade Levels    Total Students How did they hear                                                                                    Type                      Question 1                                                                                                                       Question 2                                                                                                                       Question 3
----------- ----------- ------------- ------------------------------ ------------------------------ --------------- ------------ ---------- -------------- --------------- -------------- ---------------------------------------------------------------------------------------------------- ------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------------------------
 1234567890  1234567890  01/01/1971    James Jones High  School      123 Main Street                 Queens          PA           32112      12             Middle School   120             Other, provide more info, Former partner teacher in the Middle School                               Public/Charter            aaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaa aaaaaaa aaaa aaa aaaaaaaa aaaaaa aaaaaaaa  aaaaaaaaaaaaaaaaa  bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb   ccccccccccccccccccccccc cccccccc ccccccccccc cccccccccccccccccccccc ccc ccccccccc cccccccccccccc ccccccccccccccccccccc cccccccc

(1 row(s) affected)

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接