How do I convert a CSV string into a SQL table?

5
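Suppose you want to import a CSV-formatted string into a SQL Server table, but you have no file, only the string. In our case, our design does not allow us to use the built-in file import features (that is, we cannot actually access the file; we only have a CSV-formatted string. If you can access the file, see Underhill Julian's reply and save yourself the trouble).
After several hours of Googling, I was surprised to find that SQL Server 2017 has no built-in function for this. I also could not find any posted function that copes with real-world input (anything other than a perfectly formatted string). The problem is that this is not as easy as splitting a string on commas and line breaks. What happens if a column contains a comma or a line break?
After a long and frustrating struggle, I finally have something that works. Below is a truly dynamic procedure that reads any CSV and outputs a dynamic table. See my answer below.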
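To make the problem concrete, here is a minimal sketch of the naive approach. The sample string is made up for illustration, and `STRING_SPLIT` requires database compatibility level 130 or higher.

```sql
-- Hypothetical sample: the second record has a quoted field that contains a comma.
DECLARE @csv NVARCHAR(MAX) = N'id,name' + NCHAR(10) + N'1,"Smith, John"';

-- Naive approach: split into lines on LF, then split each line on commas.
SELECT line.value AS line_text, field.value AS field_text
FROM STRING_SPLIT(@csv, NCHAR(10)) AS line
CROSS APPLY STRING_SPLIT(line.value, N',') AS field;
```

The second line comes back as three pieces (`1`, `"Smith`, and ` John"`) instead of two fields, because the split has no notion of quoting.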

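If I'm reading that procedure correctly, you still have to create the final table manually, right? - Nick.McDermaid
Yes, if you want to create a new table. In our case, we actually want to load the records into an existing table. The procedure was written as a utility so that we can reuse it for CSV strings in other formats. If you call it from another procedure, you can use the table it returns to insert, update, or create new tables, or simply modify this procedure to create an actual table instead of using a temp table. - rphello101
3 Answers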

2
    /*
    
    --- Description ---
    
    CSV text exported from Excel allows cells to contain commas and line breaks, so splitting
    the CSV isn't as simple as splitting on comma and end-of-line. We need to temporarily
    replace any LF (line feed) or comma characters that are in a cell so that we can correctly
    split the CSV string. 
    
    --- Legend ---
            
    CHAR(10) = LF (line feed/end of line)
    CHAR(13) = CR (carriage return/end of line)
    CHAR(128) = Represents line feed within quoted column
    CHAR(129) = Represents comma within quoted column
    CHAR(130) = Represents quoted text within quoted column
    
    --- Instructions ---
    
    1. This assumes you will create this procedure exactly (so you can reuse it) and build a separate import procedure. 
    In your procedure, get the @csv text from the file (or whatever your source is). Ex:
    
    DECLARE @csv NVARCHAR(MAX)
    
    SELECT @csv = <get your CSV string however you need to>
    
    2. In your procedure, create your temporary table based on the CSV data (must match 
    what you expect to be returned) Ex:
    
    CREATE TABLE #import(
        ...
    )
    
    3. In your procedure, call this procedure and insert the results. Ex:
    
    INSERT INTO #import EXEC sp_00_CSVtoTable @csv 

    4. Now you're ready to do whatever you need with the data (insert into permanent table, etc.)     
    */
    
    -- exec sp_00_CSVtoTable
    CREATE PROCEDURE [dbo].[sp_00_CSVtoTable](
        @csv NVARCHAR(MAX)
    )
    AS
    BEGIN
        SET NOCOUNT ON;
        SET ANSI_WARNINGS OFF
    
        /*************************************
        Clean Text
        **************************************/
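        -- Strip a UTF-8 byte-order mark (bytes EF BB BF, which show up as CHAR(239), CHAR(187), CHAR(191)
        -- at the beginning of most CSV strings pulled from Excel) only when it leads the string.
        -- The binary collation keeps accent-insensitive databases from also matching the plain letter "i".
        IF LEFT(@csv, 3) COLLATE Latin1_General_BIN2 = CHAR(239) + CHAR(187) + CHAR(191)
            SELECT @csv = STUFF(@csv, 1, 3, '')

        SELECT @csv = REPLACE(REPLACE(@csv
            , CHAR(13), CHAR(10)) --Windows line ending is CR LF; convert CR to LF for consistency across OS
            , CHAR(10) + CHAR(10), CHAR(10)) --Replace LF LF with LF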
    
        
        /*************************************
        Replace LF and commas in-cell with
        temporary characters
        **************************************/
        
        --Separate lines that are surrounded by quotes
        DECLARE @csvReplaceTextInCell TABLE(
            row INT IDENTITY(1,1)
            , m_text NVARCHAR(MAX)
        )
    
        SELECT @csv = REPLACE(@csv, '""', CHAR(130)) --Double quotes represent an escaped quotation mark (a quotation mark used within a cell)
    
        INSERT INTO @csvReplaceTextInCell SELECT value FROM STRING_SPLIT(@csv, '"') -- Each quoted column will be on a separate row. The rows preceding and following will be unquoted
        
        UPDATE @csvReplaceTextInCell SET m_text = REPLACE(REPLACE(m_text
                                                    , CHAR(10), CHAR(128)) --Replace LF on quoted lines with temp character
                                                    , ',', CHAR(129)) --Replace commas on quoted lines with temp character
        WHERE row % 2 = 0 --Every other row will be a quoted column (i.e. even rows will be quoted columns)
    
        --Stuff and string_agg seem to be very unreliable. Re-concatenate string the old fashioned way
        DECLARE @thisrow INT, @totalrows INT
    
        SELECT @thisrow = 1, @totalrows = COUNT(*), @csv = ''
        FROM @csvReplaceTextInCell
    
        WHILE @thisrow <= @totalrows
        BEGIN
            SELECT @csv = @csv + m_text
            FROM @csvReplaceTextInCell
            WHERE row = @thisrow
            
            SELECT @thisrow = @thisrow + 1
        END
        
        /*************************************
        Split CSV string into rows
        **************************************/
    
        CREATE TABLE  #csvRows (
            attdata NVARCHAR(MAX)
        )
        
        INSERT INTO #csvRows SELECT value FROM STRING_SPLIT(@csv, CHAR(10))
    
        DELETE FROM #csvRows WHERE attdata = '' OR attdata = ',' --remove blank row
    
        ALTER TABLE #csvRows ADD row INT IDENTITY(1,1)
    
        /*************************************
        Create the table
        **************************************/
    
        CREATE TABLE #csvTable(
            row INT
        )
    
        DECLARE @sql NVARCHAR(MAX)= 'ALTER TABLE #csvTable ADD '
    
        SELECT @sql = @sql + 'c' + CONVERT(VARCHAR, ROW_NUMBER() OVER (ORDER BY value)) + ' NVARCHAR(MAX),' 
        FROM STRING_SPLIT((SELECT attdata FROM #csvRows WHERE row = 1), ',')
    
        SELECT @sql = LEFT(@sql, LEN(@sql) - 1)
    
        --PRINT @sql
    
        EXEC (@sql)
    
        ALTER TABLE #csvTable DROP COLUMN row
    
        /*************************************
        Split CSV string into columns for each row
        **************************************/
    
        SELECT @totalrows = COUNT(*), @thisrow = 1 FROM #csvRows
    
        WHILE @thisrow <= @totalrows
        BEGIN
            SELECT @sql = 'INSERT INTO #csvTable SELECT '
    
            SELECT @sql = @sql + '''' 
                + REPLACE(REPLACE(REPLACE(REPLACE(value
                    , '''', '''''')
                    , CHAR(128), CHAR(10)) --Re-add the line break
                    , CHAR(129), ',') --Re-add commas
                    , CHAR(130), '"') --Re-add quotation mark
                + ''','
            FROM STRING_SPLIT((SELECT attdata FROM #csvRows WHERE row = @thisrow), ',')
    
            SELECT @sql = LEFT(@sql, LEN(@sql) - 1)
    
            --PRINT @sql
            
            BEGIN TRY
                EXEC (@sql)
            END TRY
            BEGIN CATCH
                PRINT 'Error on row ' + CONVERT(VARCHAR, @thisrow) + ': ' + ERROR_MESSAGE()
                PRINT @sql
                SELECT * FROM #csvRows
                RETURN
            END CATCH
            
            SET @thisrow = @thisrow + 1
        END
    
        /*************************************
        Final output
        **************************************/
    
        SELECT * FROM #csvTable
        
        DROP TABLE #csvTable
       
    END
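
A usage sketch of the steps in the header comment, assuming the procedure above has been created as `dbo.sp_00_CSVtoTable`. The sample data and the `#import` column names are hypothetical; the procedure returns every column as `NVARCHAR(MAX)` and does not treat the first line as a header, so the temp table needs one column per CSV field.

```sql
-- Hypothetical input: two records, each with a quoted field that contains a comma.
DECLARE @csv NVARCHAR(MAX) =
    N'1,"Smith, John",NY' + NCHAR(10) +
    N'2,"Doe, Jane",CA';

-- One column per CSV field; names are illustrative only.
CREATE TABLE #import (
    id NVARCHAR(MAX),
    full_name NVARCHAR(MAX),
    state_code NVARCHAR(MAX)
);

INSERT INTO #import
EXEC dbo.sp_00_CSVtoTable @csv;

-- From here the rows can be cast and loaded into a permanent table.
SELECT * FROM #import;

DROP TABLE #import;
```

Note that the procedure relies on `STRING_SPLIT` returning pieces in their original order, which SQL Server 2017 does not formally guarantee, so it is worth checking the output on representative data.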

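Thanks for this solution. It's two years old now, but it is a really good one, and my upvote isn't enough to express my gratitude. I had to remove the handling of the ï character (CHAR(239)), because with my collation it stripped every I from my data, but otherwise it works really well. Great work, rphello101! - u07ch
Also consider an @jsonTable NVARCHAR(MAX) OUTPUT parameter, and change the second-to-last line to SET @jsonTable = (SELECT * FROM #csvTable FOR JSON PATH). With JSON output, you may need less up-front configuration of temp tables. - undefined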
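
A minimal sketch of the JSON idea from the comment above. The `#csvTable` here is a stand-in filled with made-up rows, not the procedure's real output, and `@jsonTable` is shown as a local variable rather than an OUTPUT parameter.

```sql
-- Stand-in for the procedure's #csvTable (hypothetical rows).
CREATE TABLE #csvTable (c1 NVARCHAR(MAX), c2 NVARCHAR(MAX));
INSERT INTO #csvTable VALUES (N'1', N'Smith, John'), (N'2', N'Doe, Jane');

-- Serialize the whole table into a single JSON string.
DECLARE @jsonTable NVARCHAR(MAX);
SET @jsonTable = (SELECT * FROM #csvTable FOR JSON PATH);

-- A caller can read the rows back without pre-creating a matching temp table.
SELECT j.c1, j.c2
FROM OPENJSON(@jsonTable)
WITH (
    c1 NVARCHAR(MAX) '$.c1',
    c2 NVARCHAR(MAX) '$.c2'
) AS j;

DROP TABLE #csvTable;
```

The trade-off is that the caller still has to name the columns it wants in the `WITH` clause, but it no longer needs a temp table whose shape matches the CSV exactly.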

2
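Some CSV is very close to JSON, so on SQL Server 2017 and later you can take advantage of the JSON functions (that is, if all strings are quoted). If you know how many columns there are, this version will work. If you need a dynamic number of columns, you can apply dynamic SQL to the openjson results instead of the pivot.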
Declare @newline char(2) = CHAR(13) + CHAR(10)

Declare @csv nvarchar(4000) = 
'1,2,3,"2020-01-01","fred"
4,5,6,"2020-01-03","Sarah"
7,8,9,"2020-01-05","Jo"'

Select pvt.*
from
(
    Select row_no = [rows].[key]
    ,      col_no = [cols].[key]
    ,       cols.[value]
    from (
        Select [json] = '[[' + replace(@csv,@newline,'],[') + ']]'
    ) step1
    cross apply openjson([json]) [rows]
    cross apply openjson([rows].[value]) cols
) base
pivot (
    max(base.[value])
    for base.col_no in ([0],[1],[2],[3],[4])
) pvt
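
As a variation on the query above (not the answer author's method), the pivot can be swapped for an `OPENJSON ... WITH` clause, which also lets each column carry a type. This is a sketch under the same assumption that every string field is quoted; the column names and types are illustrative.

```sql
-- Same sample shape as the answer above: every string field is quoted.
DECLARE @newline CHAR(2) = CHAR(13) + CHAR(10);
DECLARE @csv NVARCHAR(4000) =
    N'1,2,3,"2020-01-01","fred"' + @newline +
    N'4,5,6,"2020-01-03","Sarah"';

-- Wrap the CSV so that each line becomes an inner JSON array.
DECLARE @json NVARCHAR(MAX) = N'[[' + REPLACE(@csv, @newline, N'],[') + N']]';

-- Each element of the outer array is one record; '$[n]' picks the n-th field.
SELECT t.*
FROM OPENJSON(@json)
WITH (
    id1 INT '$[0]',
    id2 INT '$[1]',
    id3 INT '$[2]',
    event_date DATE '$[3]',
    person_name NVARCHAR(100) '$[4]'
) AS t;
```

For a dynamic column count, the `WITH` list is the part that would have to be generated with dynamic SQL.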

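CSV usually contains unquoted text, e.g. 1,2,3,some text here,4,5. Quoting is only required when the text contains a comma or a line break (the field and record separators) and is optional in all other cases. JSON strings cannot contain line breaks, which must be escaped as \n, and the backslash itself must be escaped as \\, so the conversion can get really messy... - wqw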
1
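Yes, good point. This only works for "friendly" CSV where all text is quoted, which the format standard does not require. - James Mc
I used this approach to pass multi-value parameters in SSRS. It's a really clever solution and deserializes complex parameters successfully. Nice work, James. Thanks a lot :) - ViKiNG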
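
The limitation raised in the comments above can be checked up front. As a small sketch with made-up values, `ISJSON` tells you whether the wrapped CSV is valid JSON before you hand it to `OPENJSON`:

```sql
-- The same record, once with the text field quoted and once without.
DECLARE @quoted NVARCHAR(MAX) = N'[[1,2,3,"some text here",4,5]]';
DECLARE @unquoted NVARCHAR(MAX) = N'[[1,2,3,some text here,4,5]]';

-- ISJSON returns 1 for valid JSON and 0 otherwise.
SELECT ISJSON(@quoted) AS quoted_is_json,
       ISJSON(@unquoted) AS unquoted_is_json;
```

The first column is 1 and the second is 0, so a guard such as `IF ISJSON(@json) = 1` can route unfriendly CSV to a different parser instead of failing inside `OPENJSON`.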

1
There are many ways to do this.
1.
bcp dbo.ImportTest in 'C:\ImportData.txt' -T -SserverName\instanceName

2.
IF OBJECT_ID('dbo.SampleCSVTable') IS NOT NULL
    DROP TABLE dbo.SampleCSVTable
GO
 
CREATE TABLE dbo.SampleCSVTable
(
    PersonID INT,
    FullName VARCHAR(512),
    PreferredName VARCHAR(512),
    SearchName VARCHAR(512),
    IsPermittedToLogon BIT,
    LogonName VARCHAR(512)
)
GO

BULK INSERT dbo.SampleCSVTable
FROM 'C:\Sample CSV File.txt'
WITH
(
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
)
GO

3. 
INSERT INTO dbo.ImportTest
SELECT * FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\ImportData.xls', [Sheet1$])

OR 
INSERT INTO dbo.ImportTest
SELECT * FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\ImportData.xls', 'SELECT * FROM [Sheet1$]')

4.
INSERT INTO dbo.ImportTest
SELECT * FROM OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0',
'Data Source=C:\ImportData.xls;Extended Properties=Excel 8.0')...[Sheet1$]

5. 
EXEC sp_addlinkedserver 'ImportData',
   'Jet 4.0', 'Microsoft.Jet.OLEDB.4.0',
   'C:\ImportData.xls',
   NULL,
   'Excel 8.0'
GO
INSERT INTO dbo.ImportTest
SELECT *
FROM OPENQUERY(ImportData, 'SELECT * FROM [Sheet1$]')

6. POWERSHELL
Function CreateStagingTable($location, $file, $extension, $server, $database)
{
    $full = $location + $file + $extension
    $all = Get-Content $full
    $columns = $all[0]
    $columns = $columns.Replace(" ","")
    $columns = $columns.Replace(",","] VARCHAR(100), [")
    $table = "CREATE TABLE " + $file + "([" + $columns + "] VARCHAR(100))"
    $connection = New-Object System.Data.SqlClient.SqlConnection
    $buildTable = New-Object System.Data.SqlClient.SqlCommand
    $connection.ConnectionString = "Data Source=" + $server + ";Database=" + $database + ";integrated security=true"
    $buildTable.CommandText = $table
    $buildTable.Connection = $connection
    $connection.Open()
    $buildTable.ExecuteNonQuery()
    $connection.Close()
}
CreateStagingTable -location "C:\files\" -file "savingsrate" -extension ".txt" -server "OURSERVER\OURINSTANCE" -database "StagingDatabase"

You could also use:

Function AutoImportCommaFlatFiles($location, $file, $extension, $server, $database)
{
    $full = $location + $file + $extension
    $all = Get-Content $full
    $columns = $all[0]
    $columns = $columns.Replace(" ","")
    $columns = $columns.Replace(",","] VARCHAR(100), [")
    $table = "CREATE TABLE " + $file + "([" + $columns + "] VARCHAR(100))"
    $connection = New-Object System.Data.SqlClient.SqlConnection
    $buildTable = New-Object System.Data.SqlClient.SqlCommand
    $insertData = New-Object System.Data.SqlClient.SqlCommand
    $connection.ConnectionString = "Data Source=" + $server + ";Database=" + $database + ";integrated security=true"
    $buildTable.CommandText = $table
    $buildTable.Connection = $connection
    ## Added to function
    $x = 0
    $insertData.CommandText = "EXECUTE stp_CommaBulkInsert @1,@2"
    $insertData.Parameters.AddWithValue("@1", $full) | Out-Null
    $insertData.Parameters.AddWithValue("@2", $file) | Out-Null
    $insertData.Connection = $connection
    $connection.Open()
    $buildTable.ExecuteNonQuery()
    $connection.Close()
    ## Added to function
    $x = 1
    if ($x -eq 1)
    {
        $connection.Open()
        $insertData.ExecuteNonQuery()
        $connection.Close()
    }
}
AutoImportCommaFlatFiles -location "C:\files\" -file "savingsrate" -extension ".txt" -server "OURSERVER\OURINSTANCE" -database "StagingDatabase"



 
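7. SSIS packages - you can find tutorials on them via Google. You can even import the file using SSMS and then save it as a job so it can be scheduled as an SSIS package.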
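
As a follow-up to option 2 above: on SQL Server 2017 and later, `BULK INSERT` has a CSV mode that understands quoted fields. The sketch below reuses the table and file path from option 2; the header-row assumption is mine, and this still requires access to the file.

```sql
BULK INSERT dbo.SampleCSVTable
FROM 'C:\Sample CSV File.txt'
WITH
(
    FORMAT = 'CSV',         -- quote-aware CSV parsing (SQL Server 2017 and later)
    FIELDQUOTE = '"',       -- character that wraps fields containing commas or line breaks
    FIELDTERMINATOR = ',',
    FIRSTROW = 2            -- assumption: the first line of the file is a header
);
```

With `FORMAT = 'CSV'`, a field such as `"Smith, John"` is loaded as one value instead of being split at the embedded comma.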

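Thank you for adding this; it will be useful for others looking for help. I'm well aware that this is possible in most cases (which is probably exactly why I couldn't find what I needed). However, the main limitation of every option above is that you need access to the file, which our team does not have. The files are actually converted to binary strings and stored in a database table. The DBAs don't want to make application changes to give us access to the files, so we have to work with the CSV string stored in the database table. - rphello101
Have you tried this? https://stackoverflow.com/questions/43573668/import-varbinarymax-data-using-sql-server-ssis-data-import-wizard/43606817 - Underhill Julian
Yes, we use that a lot when working in SSMS. In fact, this utility is called by another procedure, which is in turn called by our application. Essentially, we built a tool that lets our users upload CSV files without database access and then load the records into an existing table. The application was already designed to read the CSV file into a binary string and store it in the database. We took advantage of that existing functionality, which unfortunately led to the headache of processing a CSV string manually instead of the file itself. - rphello101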
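
For readers in the same situation, here is a hedged sketch of turning a stored binary value back into a CSV string. The byte literal is made up (a UTF-8 byte-order mark followed by `a,b`), and the result of the conversion depends on the code page of the database's default collation.

```sql
-- Hypothetical stored value: bytes EF BB BF (UTF-8 BOM) followed by "a,b".
DECLARE @bin VARBINARY(MAX) = 0xEFBBBF612C62;

-- Reinterpret the bytes as single-byte characters, then widen to NVARCHAR.
DECLARE @csv NVARCHAR(MAX) = CONVERT(VARCHAR(MAX), @bin);

-- Inspect the leading characters to see whether a BOM came along.
SELECT @csv AS csv_text,
       UNICODE(SUBSTRING(@csv, 1, 1)) AS first_char_code,
       UNICODE(SUBSTRING(@csv, 2, 1)) AS second_char_code,
       UNICODE(SUBSTRING(@csv, 3, 1)) AS third_char_code;
```

On a database whose default collation uses code page 1252, the three leading codes should come back as 239, 187 and 191, which would account for the stray characters that the procedure in the first answer strips from CSV exported by Excel.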

Content provided by Stack Overflow.