在SQL Server中如何规范化Unicode字符串?

10
SQL Server中是否有一个函数可以规范化Unicode字符串?例如:
UPDATE Orders SET Notes = NormalizeString(Notes, 'FormC')

Unicode规范化形式:

  • 组合形式C):将A + ¨转换为Ä
  • 分解形式D):将Ä转换为A + ¨
  • 兼容组合形式KC):A + ¨ + + n变成Ä + f + i + n
  • 兼容分解形式KD):Ä + + n变成A + ¨ + f + i + n

我找不到任何内置函数,所以我认为没有。


理想情况下,如果只能有一个,则我今天需要使用形式C:

Unicode规范化形式C,规范组成。将每个已分解的组合(由基本字符加组合字符组成)转换为规范的预组合等效形式。例如,A + ¨变成Ä。

另请参阅

4个回答

7
抱歉,截至目前(2012测试版本),SQL Server 的任何版本中都没有此类功能。虽然比较可以正确地忽略组合,但是没有将字符组成用法转换为一种常规形式的函数。
未来版本的 ANSI 标准已经提出了建议,在语法下使用 NORMALIZE(string, NFC),但在实际应用中需要很长时间才能实现。如果您想进行规范化操作,现在必须使用具有更好字符串处理功能的适当编程语言,在从数据库中提取字符串或编写 CLR 存储过程时执行。

1

尝试使用这个CLR函数

using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction]
    public static SqlString fnRemoveDiacritics(string text)
    {
        string stringFormD = text.Normalize(System.Text.NormalizationForm.FormD);
        System.Text.StringBuilder retVal = new System.Text.StringBuilder();
        for (int index = 0; index < stringFormD.Length; index++)
        {
            if (System.Globalization.CharUnicodeInfo.GetUnicodeCategory(stringFormD[index]) != System.Globalization.UnicodeCategory.NonSpacingMark)
                retVal.Append(stringFormD[index]);
        }
        return retVal.ToString().Normalize(System.Text.NormalizationForm.FormC);
    }
}

并且在SQL中

SELECT dbo.fnRemoveDiacritics('Äěščřžýáíé')
-- Returns: Aescrzyaie

感谢http://www.dotnetportal.cz/blogy/4/Tomas-Jecha/663/NET-Tip-6-Ciste-odstraneni-diakritiky


1
如果您从 SQL 中硬编码字符串,应该指定 N'Äěščřžýáíé'(对于 nvarchar)。 - David S.
1
@IanBoyd 我不明白你所说的那张图片的意思。我指的不是出于性能原因,而是字符串将会丢失信息:https://i.imgur.com/6QbzEgV.png - David S.
2
@DavidS。啊,我明白你的意思了。 - Ian Boyd

1

如果在SQL Server中没有内置的解决方案,而且您不想编写C#函数,您可以使用translate函数手动执行此操作:

select translate(last_name, 'éêëèàäçïîìôöòûù', 'eeeeaaciiiooouu') from employees

您还可以使用以下解决方案来自: https://coderwall.com/p/a6koxq/how-to-remove-diacritics-in-sql-server

从SQL Server中删除字符串中的变音符号最简单的方法是使用不包括变音符号的字符集对字符串进行排序,例如:

SELECT 'àéêöhello!' Collate SQL_Latin1_General_CP1253_CI_AI

这将输出:

aeeohello!

只有在您不使用Unicode字符串时才能正常工作,因此如果您有一个Unicode字符串,请先将其转换为varchar。


1

我遇到了一些问题,

我在C#中编写了一个新的CRL函数,并像SQL函数一样使用。

我的C#代码(规范化为NFC或NFD)。同时,正确处理空字符串:

using Microsoft.SqlServer.Server;
using System.Data.SqlTypes;
using System.Text;

    public static class CLRUnicodeNormalize
    {
        [SqlFunction]
        public static SqlString UnicodeNormalizeC(SqlString inStr)
        {
            if (inStr.IsNull) return SqlString.Null;
            return inStr.ToString().Normalize(NormalizationForm.FormC);
        }
        public static SqlString UnicodeNormalizeD(SqlString inStr)
        {
            if (inStr.IsNull) return SqlString.Null;
            return inStr.ToString().Normalize(NormalizationForm.FormD);
        }
    }

在 SQL 查询中使用示例(从任何规范化到 C):

UPDATE o SET o.ObjectName=dbo.UnicodeNormalizeC(o.ObjectName)

在使用前安装归一化函数(在这种情况下不需要C#...二进制文件包含可用的.Net代码,请参见上面的源代码):

/* 
GO
DROP FUNCTION [dbo].UnicodeNormalize
GO
DROP ASSEMBLY [CLRUnicodeNormalize]
GO
*/


CREATE ASSEMBLY [CLRUnicodeNormalize]
FROM 0x4D5A90000300000004000000FFFF0000B800000000000000400000000000000000000000000000000000000000000000000000000000000000000000800000000E1FBA0E00B409CD21B8014CCD21546869732070726F6772616D2063616E6E6F742062652072756E20696E20444F53206D6F64652E0D0D0A2400000000000000504500004C010300747F4BC90000000000000000E00022200B013000000A0000000600000000000076280000002000000040000000000010002000000002000004000000000000000600000000000000008000000002000000000000030060850000100000100000000010000010000000000000100000000000000000000000242800004F00000000400000C803000000000000000000000000000000000000006000000C00000070270000380000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200000080000000000000000000000082000004800000000000000000000002E746578740000007C08000000200000000A000000020000000000000000000000000000200000602E72737263000000C80300000040000000040000000C0000000000000000000000000000400000402E72656C6F6300000C000000006000000002000000100000000000000000000000000000400000420000000000000000000000000000000058280000000000004800000002000500A4200000CC0600000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000A20F00281000000A2C067E1100000A2A0F00FE16120000016F1200000A176F1300000A281400000A2AA20F00281000000A2C067E1100000A2A0F00FE16120000016F1200000A186F1300000A281400000A2A000042534A4201000100000000000C00000076342E302E33303331390000000005006C00000014020000237E0000800200000803000023537472696E6773000000008805000004000000235553008C0500001000000023475549440000009C0500003001000023426C6F620000000000000002000001471500000900000000FA0133001600000100000014000000020000000200000002000000140000000F00000001000000020000000000FE0101000000000006003201A50206009F01A5020600510073020F00C5020000060079003A02060015013A020600E1003A02060086013A02060052013A0206006B013A02060090003A020600650086020600430086020600C4003A020600AB00D1010600E90221020A0000014C020A00EB01D4020600F701210206002802FC02000000000100000000000100010081011000BD01000041000100010050200000000096000A003B00010079200000000096001C003B000200000001006D02000001006D02090067020100110067020600190067020A00290067021000310067021000390067021000410067021000490067021000510067021000590067021000610067021500690067021000710067021000790067021000890067020600910016021A0091001C021E008100F50122009900C70126009100F0022C0020007B002A012E000B0042002E0013004B002E001B006A002E00230073002E002B008B002E0033008B002E003B008B002E00430073002E004B0091002E0053008B002E005B008B002E006300A9002E006B00D3002E007300E000048000000100000000000000000000000000BD01000004000000000000000000000032003A000000000004000000000000000000000032002E00000000000000003C4D6F64756C653E00556E69636F64654E6F726D616C697A654300556E69636F64654E6F726D616C697A65440053797374656D2E44617461006D73636F726C696200477569644174747269627574650044656275676761626C6541747472696275746500436F6D56697369626C6541747472696275746500417373656D626C795469746C6541747472696275746500417373656D626C7954726164656D61726B417474726962757465005461726765744672616D65776F726B41747472696275746500417373656D626C7946696C6556657273696F6E41747472696275746500417373656D626C79436F6E66696775726174696F6E4174747269627574650053716C46756E6374696F6E41747472696275746500417373656D626C794465736372697074696F6E41747472696275746500436F6D70696C6174696F6E52656C61786174696F6E7341747472696275746500417373656D626C7950726F6475637441747472696275746500417373656D626C79436F7079726967687441747472696275746500417373656D626C79436F6D70616E794174747269627574650052756E74696D65436F6D7061746962696C69747941747472696275746500434C52556E69636F64654E6F726D616C697A650053797374656D2E52756E74696D652E56657273696F6E696E670053716C537472696E6700546F537472696E6700434C52556E69636F64654E6F726D616C697A652E646C6C006765745F49734E756C6C0053797374656D004E6F726D616C697A6174696F6E466F726D0053797374656D2E5265666C656374696F6E004D6963726F736F66742E53716C5365727665722E536572766572002E63746F7200696E5374720053797374656D2E446961676E6F73746963730053797374656D2E52756E74696D652E496E7465726F7053657276696365730053797374656D2E52756E74696D652E436F6D70696C6572536572766963657300446562756767696E674D6F6465730053797374656D2E446174612E53716C5479706573004F626A656374006F705F496D706C696369740053797374656D2E54657874000000000042B7983A5EC3D848A69F95A94C809E3900042001010803200001052001011111042001010E042001010203200002030611490320000E0520010E115105000111490E08B77A5C561934E089060001114911490801000800000000001E01000100540216577261704E6F6E457863657074696F6E5468726F7773010801000200000000001701001253514C4E6F726D616C697A65537472696E67000005010000000017010012436F7079726967687420C2A920203230313900002901002434303635643437642D343839642D343638332D626233652D32393963316632623536626400000C010007312E302E302E3000004901001A2E4E45544672616D65776F726B2C56657273696F6E3D76342E360100540E144672616D65776F726B446973706C61794E616D65122E4E4554204672616D65776F726B20342E3604010000000000000000A704E49600000000020000007C000000A8270000A809000000000000000000000000000010000000000000000000000000000000525344534526D8C335B5074C819DB6DF9CC26A3701000000433A5C55736572735C537A395C736F757263655C7265706F735C53514C4E6F726D616C697A65537472696E675C53514C4E6F726D616C697A65537472696E675C6F626A5C52656C656173655C434C52556E69636F64654E6F726D616C697A652E706462004C28000000000000000000006628000000200000000000000000000000000000000000000000000058280000000000000000000000005F436F72446C6C4D61696E006D73636F7265652E646C6C0000000000FF250020001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001001000000018000080000000000000000000000000000001000100000030000080000000000000000000000000000001000000000048000000584000006C03000000000000000000006C0334000000560053005F00560045005200530049004F004E005F0049004E0046004F0000000000BD04EFFE00000100000001000000000000000100000000003F000000000000000400000002000000000000000000000000000000440000000100560061007200460069006C00650049006E0066006F00000000002400040000005400720061006E0073006C006100740069006F006E00000000000000B004CC020000010053007400720069006E006700460069006C00650049006E0066006F000000A802000001003000300030003000300034006200300000001A000100010043006F006D006D0065006E007400730000000000000022000100010043006F006D00700061006E0079004E0061006D00650000000000000000004E0013000100460069006C0065004400650073006300720069007000740069006F006E0000000000530051004C004E006F0072006D0061006C0069007A00650053007400720069006E00670000000000300008000100460069006C006500560065007200730069006F006E000000000031002E0030002E0030002E003000000050001800010049006E007400650072006E0061006C004E0061006D006500000043004C00520055006E00690063006F00640065004E006F0072006D0061006C0069007A0065002E0064006C006C0000004800120001004C006500670061006C0043006F007000790072006900670068007400000043006F0070007900720069006700680074002000A90020002000320030003100390000002A00010001004C006500670061006C00540072006100640065006D00610072006B00730000000000000000005800180001004F0072006900670069006E0061006C00460069006C0065006E0061006D006500000043004C00520055006E00690063006F00640065004E006F0072006D0061006C0069007A0065002E0064006C006C000000460013000100500072006F0064007500630074004E0061006D00650000000000530051004C004E006F0072006D0061006C0069007A00650053007400720069006E00670000000000340008000100500072006F006400750063007400560065007200730069006F006E00000031002E0030002E0030002E003000000038000800010041007300730065006D0062006C0079002000560065007200730069006F006E00000031002E0030002E0030002E0030000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002000000C000000783800000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
WITH PERMISSION_SET = UNSAFE
GO

CREATE FUNCTION [dbo].UnicodeNormalizeC(@inStr nvarchar(max)) RETURNS nvarchar(max) 
AS EXTERNAL NAME [CLRUnicodeNormalize].[CLRUnicodeNormalize].[UnicodeNormalizeC]
GO
CREATE FUNCTION [dbo].UnicodeNormalizeD(@inStr nvarchar(max)) RETURNS nvarchar(max) 
AS EXTERNAL NAME [CLRUnicodeNormalize].[CLRUnicodeNormalize].[UnicodeNormalizeD]
GO

编译使用的 .Net 版本:4.6

以上二进制代码的使用责任由用户自行承担!

测试示例:

DECLARE @str_nfd nvarchar(10) = dbo.UnicodeNormalizeD('á');
SELECT LEN(@str_nfd);   -- NFD normalization, return: 2
SELECT LEN(dbo.UnicodeNormalizeC(@str_nfd)); -- back to NFC, return: 1

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接