我想把所有东西都升级到"Latin1_General_CI_AS"。据我所知,它们几乎是相同的排序规则,在这个过程中不会丢失数据...
有人知道这是否正确吗?
以下是更完整的答案:
这些排序规则之间的主要区别在于它们如何应用字符扩展规则。某些拉丁字符可能会扩展为多个字符。当使用非unicode文本时,SQL_xxxx排序规则可能忽略这些字符扩展,但对于unicode文本则应用它们。因此:在使用一种排序规则与另一种排序规则时,连接、排序和比较可能返回不同的结果。
例如:
在 Latin1_General_CI_AS
下,这两个语句将返回相同的记录集,因为 ß
被扩展为 ss
。
SELECT * FROM MyTable3 WHERE Comments = 'strasse'
SELECT * FROM MyTable3 WHERE Comments = 'straße'
使用SQL_Latin1_General_CP1_CI_AS
时,上述语句返回不同的记录,因为ß
被视为不同于ss
的字符。
Regarding data-loss potential:
NVARCHAR
fields are all Unicode, which is a single character set, so there can't be any data loss for these fields (this also covers XML fields which are also stored as UTF-16 Little Endian). Meta-data fields that store the object / column / index / etc names are all NVARCHAR
so no need to worry about those.VARCHAR
fields having different Collations but the same Code Page between the differing Collations will not be a problem since the Code Page is the character set.VARCHAR
fields having different Collations and moving to a different Code Page (when changing Collations) can have data loss if any of the characters being used are not represented in the new Code Page. HOWEVER, this is only an issue when physically changing the Collation of a particular field (described below) and would not happen upon changing the default Collation of a database.Local variables and string literals get their Collation from the Database default. Changing the database default will change the Collation used for both local variables and string literals. But changing the Database's default Collation does not change the Collation used for existing string columns in the tables in that Database. This generally should not cause any problems when comparing or concatenating a column with a literal and/or variable since the literals and variables will take on the Collation of the column due to Collation Precedence. The only potential problem would be Code Page conversions that might occur for characters of values between 128 - 255 that are not available in the Code Page used by the Collation of the column.
If you are expecting a predicate / comparison / sort / concatenation / etc for a column to behave differently upon changing the Database's default Collation, then you will need to explicitly change that column's Collation using the following command:
ALTER TABLE [{table_name}]
ALTER COLUMN [{column_name}]
{same_datatype}
{same_NULL_or_NOT NULL_setting}
COLLATE {name_of_Database_default_Collation};
Be sure to specify the exact same datatype and NULL
/ NOT NULL
setting that are currently being used, else they can revert to the default if not already being the default value. After that, if there are any indexes on any of the string columns that just had their Collation changed, then you need to rebuild those indexes.
Changing the Database's default Collation will change the Collation of certain database-specific meta-data, such as the name
field in both sys.objects
, sys.columns
, sys.indexes
, etc. Filtering these system Views against local variables or string literals won't be a problem since the Collation will be changing on both sides. But, if you JOIN any of the local system Views to temporary tables on string fields, and the Database-level Collation between the local database and tempdb
doesn't match, then you will get the "Collation mismatch" error. This is discussed below along with the remedy.
One difference between these two Collations is in how they sort certain characters for VARCHAR
data (this does not affect NVARCHAR
data). The non-EBCDIC SQL_
Collations use what is called "String Sort" for VARCHAR
data, while all other Collations, and even NVARCHAR
data for the non-EBCDIC SQL_
Collations, use what is called "Word Sort". The difference is that in "Word Sort", the dash -
and apostrophe '
(and maybe a few other characters?) are given a very low weight and are essentially ignored unless there are no other differences in the strings. To see this behavior in action, run the following:
DECLARE @Test TABLE (Col1 VARCHAR(10) NOT NULL);
INSERT INTO @Test VALUES ('aa');
INSERT INTO @Test VALUES ('ac');
INSERT INTO @Test VALUES ('ah');
INSERT INTO @Test VALUES ('am');
INSERT INTO @Test VALUES ('aka');
INSERT INTO @Test VALUES ('akc');
INSERT INTO @Test VALUES ('ar');
INSERT INTO @Test VALUES ('a-f');
INSERT INTO @Test VALUES ('a_e');
INSERT INTO @Test VALUES ('a''kb');
SELECT * FROM @Test ORDER BY [Col1] COLLATE SQL_Latin1_General_CP1_CI_AS;
-- "String Sort" puts all punctuation ahead of letters
SELECT * FROM @Test ORDER BY [Col1] COLLATE Latin1_General_100_CI_AS;
-- "Word Sort" mostly ignores dash and apostrophe
Returns:
String Sort
-----------
a'kb
a-f
a_e
aa
ac
ah
aka
akc
am
ar
and:
Word Sort
---------
a_e
aa
ac
a-f
ah
aka
a'kb
akc
am
ar
While you will "lose" the "String Sort" behavior, I'm not sure that I would call that a "feature". It is a behavior that has been deemed undesirable (as evidenced by the fact that it wasn't brought forward into any of the Windows collations). However, it is a definite difference of behavior between the two collations (again, just for non-EBCDIC VARCHAR
data), and you might have code and/or customer expectations based upon the "String Sort" behavior. This requires testing your code and possibly researching to see if this change in behavior might have any negative impact on users.
Another difference between SQL_Latin1_General_CP1_CI_AS
and Latin1_General_100_CI_AS
is the ability to do Expansions on VARCHAR
data (NVARCHAR
data can already do these for most SQL_
Collations), such as handling æ
as if it were ae
:
IF ('æ' COLLATE SQL_Latin1_General_CP1_CI_AS =
'ae' COLLATE SQL_Latin1_General_CP1_CI_AS)
BEGIN
PRINT 'SQL_Latin1_General_CP1_CI_AS';
END;
IF ('æ' COLLATE Latin1_General_100_CI_AS =
'ae' COLLATE Latin1_General_100_CI_AS)
BEGIN
PRINT 'Latin1_General_100_CI_AS';
END;
Returns:
Latin1_General_100_CI_AS
The only thing you are "losing" here is not being able to do these expansions. Generally speaking, this is another benefit of moving to a Windows Collation. However, just like with the "String Sort" to "Word Sort" move, the same caution applies: it is a definite difference of behavior between the two collations (again, just for VARCHAR
data), and you might have code and/or customer expectations based upon not having these mappings. This requires testing your code and possibly researching to see if this change in behavior might have any negative impact on users.
(first noted in @Zarepheth's answer and expanded on here)
Another difference (that is also a benefit of moving to a Windows Collation) is that filtering a VARCHAR
column that is indexed on NVARCHAR
literal / variable / column you will no longer invalidate the index on the VARCHAR
column. This is due to the Windows Collations using the same Unicode sorting and comparison rules for both VARCHAR
and NVARCHAR
data. Because the sort order is the same between the two types, when the VARCHAR
data gets converted into NVARCHAR
(explicitly or implicitly due to datatype precedence), the order of items in the index is still valid. For more details on this behavior, please see my post: Impact on Indexes When Mixing VARCHAR and NVARCHAR Types.
The server-level Collation is used to set the Collation of the system databases, which includes [model]
. The [model]
database is used as a template to create new databases, which includes [tempdb]
upon each server startup. So, if the Database's default collation does not match the instance's default Collation and you join local tables to temporary tables on string fields, then you will get the Collation-mismatch error. Fortunately there is a somewhat easy way to correct for collation differences between the database that is "current" when CREATE #TempTable
is executed and [tempdb]
. When creating temporary tables, declare a collation (on string columns) using the COLLATE
clause and use either a specific collation (if you know that the DB will always be using that collation), or DATABASE_DEFAULT
(if you don't always know the collation of the DB where this code will execute):
CREATE TABLE #Temp (Col1 NVARCHAR(40) COLLATE DATABASE_DEFAULT);
This is not necessary for table variables since they get their default Collation from the "current" database. However, if you have both table variables and temporary tables and join them on string fields, then you will need to use COLLATE {specific_collation}
or COLLATE DATABASE_DEFAULT
as shown directly above.
The server-level collation also controls local variable names, CURSOR
variable names, and GOTO
labels. While none of these would be impacted by the specific change being dealt with in this Question, it is at least something to be aware of.
It is best to use the most recent version of the desired collation, if multiple versions are available. Starting in SQL Server 2005, a "90" series of collations was introduced, and SQL Server 2008 introduced a "100" series of collations. You can find these collations by using the following queries:
SELECT * FROM sys.fn_helpcollations() WHERE [name] LIKE N'%[_]90[_]%'; -- 476
SELECT * FROM sys.fn_helpcollations() WHERE [name] LIKE N'%[_]100[_]%'; -- 2686
ALSO, while the question asks about case-insensitive Collations, it should be noted that if someone else is looking to make a similar change but is using case-sensitive Collations, then another difference between SQL Server Collations and Windows Collations, for VARCHAR
data only, is which case sorts first. Meaning, if you have both A
and a
, the SQL_
Collations will sort A
before a
, while the non-SQL_
Collations (and the SQL_
Collations when dealing with NVARCHAR
data) will sort a
before A
.
如需有关更改数据库或整个实例的排序规则的更多信息和详细信息,请参阅我的文章:
更改实例、数据库和所有用户数据库中所有列的排序规则:可能会发生什么问题?
有关使用字符串和排序规则的更多信息,请访问:排序规则信息
这个MSDN论坛上有更多信息:
其中提到:
如果排序规则是SQL_Latin1_General_CP1_CI_AS或Latin1_General_CI_AS,你应该不会看到太大的差异,但两者都有快慢之分。
Latin1_General_CI_AS:Latin1-General,不区分大小写,区分重音符号,不区分假名类型,不区分宽度
SQL_Latin1_General_CP1_CI_AS:Latin1-General,不区分大小写,区分重音符号,不区分假名类型,不区分宽度(对于Unicode数据),非Unicode数据在代码页1252上使用SQL Server排序顺序52
因此,在我看来,如果您的数据只包含a-z0-9,则不应该看到任何差异。
SELECT * FROM ::fn_helpcollations()
WHERE name IN (
'SQL_Latin1_General_CP1_CI_AS',
'Latin1_General_CI_AS'
)
Latin1_General_CI_AS: 这是一种Latin1-General的排序规则,它不区分大小写、重音符号、假名类型和宽度。
SQL_Latin1_General_CP1_CI_AS: 这也是一种Latin1-General的排序规则,它不区分大小写、重音符号、假名类型和宽度,但是对于Unicode数据,使用的是SQL Server Sort Order 52,非Unicode数据使用Code Page 1252进行排序。
因此,从中可以推断出所使用的代码页是相同的(Latin1-General => 1252),因此您应该不会遇到任何数据损失 - 如果在转换后发生任何变化,可能是排序顺序 - 这可能是无关紧要的。
COLLATE {desired_collation}
来指定排序规则。我不确定,但您可能可以在定义变量时指定排序规则。由于我是凭记忆回答这个问题的,建议您查阅 SQL Server T-SQL 文档以获取语法和更多详细信息。 - ZarephethVARCHAR
,因为那会导致数据丢失。你需要使用匈牙利语或二进制排序规则,因为它们是唯一不将这些特定字符等同的排序规则。所以尝试使用COLLATE Hungarian_Technical_100_CI_AS_SC
或COLLATE Latin1_General_100_BIN2
。有关更多详细信息,请查看我的答案在这里和在这里。此外,变量(和字符串字面值)始终具有当前数据库默认排序规则的排序规则,并且无法赋予不同的排序规则。 - Solomon Rutzky