SQL地址数据混乱,如何在查询中清理它?

6
我有地址数据存储在SQL Server 2000数据库中,需要提取给定客户代码的所有地址。问题是,有许多拼写错误的地址,一些地址信息缺失等。所以我需要对其进行清理。我需要过滤掉错误的拼写、缺失的地址信息等,并得出“平均”记录。例如,如果在5个记录中有4个正确拼写了“纽约”,那么应返回该值。
我不能修改数据,也不能在输入时对其进行验证或其他任何操作。我只能修改数据的副本或通过查询进行操作。
我在这里找到了部分答案:Addresses stored in SQL server have many small variations(errors),但我需要允许每个代码有多个有效地址。
样例数据:
Code    Name                       Address1                      Address2           City            State          Zip     TimesUsed
10003   AMERICAN NUTRITON INC     2183 BALL STREET                                 OLDEN           Utah           87401     177
10003   AMEICAN NUTRITION INC     2183 BALL STREET              PO BOX 1504        OLDEN           Utah           87402     76
10003   AMERICAN NUTRITION INC    2183 BALL STREET                                 OLDEN           Utah           87402     24
10003   AMERICAN NUTRITION INC    2183 BALL STREET              PO BOX 1504        OLDEN           Utah           87402     17
10003   Samantha Brooks           506 S. Main Street                               Ellensburg      Washington     98296     1
10003   BEMIS COMPANY             1401 W. FOURTH PLAIN BLVD.                       VANCOUVER       Washington     98660     1
10003   CEI                       597 VANDYRE BOULEVARD                            WRIGHTSTOWN     Wisconsin      54180     1
10003   Pacific Pet               28th Avenue                                      OLDEN           Utah           84401     1
10003   PETSMART, INC.            16091 NORTH 25TH STREET                          PHOENA         Arizona        85027      1
10003   THE PET FIRM              16418 NORTH 37TH STREET                          PHOENA         Arizona        85503      1
期望输出:
代码    名称                        地址1                             地址2                城市           州              邮编
10003   美国营养公司                2183 Ball Avenue                                   Olden         犹他州          84401
10003   Samantha Brooks             506 S. Main Street                               Ellensburg     华盛顿州        98296 
10003   BEMIS公司                 1401 W. FOURTH PLAIN BLVD.                       VANCOUVER      华盛顿州        98660
10003   CEI                       975 VANDYKE ROAD                                 WRIGHTSTOWN    威斯康星州      54180
10003   太平洋宠物公司               29th Street                                        OGDEN         犹他州          84401
10003   PETSMART公司              16091 NORTH 25TH AVENUE                          PHOENA        亚利桑那州       85027
10003   宠物公司                    16418 NORTH 37TH STREET                          PHOENA        亚利桑那州       85503

如果字段内的差异很小 - 一个或两个字符,您可以尝试基于Levenshtein距离进行分组,但是首先使事物正确分组以计算这些距离基本上会使您回到原点。 - Marc B
任何解决方案都部分取决于到达数据的数量和频率。这只是“新”数据,还是现有项目可以更新?如果您尝试检索的经过消毒的地址列表不是基于垃圾数据的最新副本,这是否重要? - MartW
@Marc - 你确定要考虑Levenshtein距离吗?The Pet ShopThe Pot Shop可能是你可以得到的除了相等之外最短的距离,它们会是两个不同的公司。 - RichardTheKiwi
1
我看了一下Levenshtein算法,但是我认为我不能在这里使用它。首先,将其转换为SQL将是绝对的噩梦,而且即使最好情况下也无法解决我的大部分挑战。 - MAW74656
@MAW74656 - 如果您自己尝试标准化街道地址,那将是一场噩梦。例如,以第一个地址为例。您可能有一个条目是(2813 WALL AVENUE PO BOX 1234, NULL)(对于Address1、Address),另一个是(2813 WALL AVE. BOX 1234, NULL),另一个是(2813 WALL AVE, #1234),还有一个是(2813 WALL AVENUE, PO BOX 1234)。变化太多了。如果您尝试自己编写代码,那么您会招来克苏鲁,特别是因为您无法更正原始来源。 - Thomas
显示剩余2条评论
6个回答

2

使用 soundex(name) 进行分组,您将得到如下结果。您需要在自己的数据上进行测试,以确定是否适用于您的情况。我无法在 SQL Server 2000 上测试此功能,因此不确定 soundex 是否可用。

declare @T table (Code char(5), Name varchar(50), Address1 varchar(50))
insert into @T values
('10003', 'AMERICAN NUTRITON INC',  '2183 BALL STREET'),
('10003', 'AMEICAN NUTRITION INC',  '2183 BALL STREET'),
('10003', 'AMERICAN NUTRITION INC', '2183 BALL STREET'),
('10003', 'AMERICAN NUTRITION INC', '2183 BALL STREET'),
('10003', 'Samantha Brooks',        '506 S. Main Street'),
('10003', 'BEMIS COMPANY',          '1401 W. FOURTH PLAIN BLVD.'),
('10003', 'CEI',                    '597 VANDYRE BOULEVARD'),
('10003', 'Pacific Pet',            '28th Avenue'),
('10003', 'PETSMART, INC.',         '16091 NORTH 25TH STREET'),
('10003', 'THE PET FIRM',           '16418 NORTH 37TH STREET')

select
  min(Code) as Code,
  min(Name) as Name,
  min(Address1) as Address1
from @T
group by soundex(Name)
________________________________________________________
Code  Name                    Address1
10003 AMEICAN NUTRITION INC   2183 BALL STREET
10003 AMERICAN NUTRITION INC  2183 BALL STREET
10003 BEMIS COMPANY           1401 W. FOURTH PLAIN BLVD.
10003 CEI                     597 VANDYRE BOULEVARD
10003 Pacific Pet             28th Avenue
10003 PETSMART, INC.          16091 NORTH 25TH STREET
10003 Samantha Brooks         506 S. Main Street
10003 THE PET FIRM            16418 NORTH 37TH STREET

1
如果“American”有多个拼写形式怎么办? - MAW74656
@MAW74656 - 在您的示例数据中没有,现在有了,因为问题已经被编辑。使用group by解决了原始问题。 - Mikael Eriksson
@MAW74656 - 你考虑过 soundex 吗?尝试使用 group by soundex(Name) 代替 group by Name - Mikael Eriksson
我不熟悉soundex,它是什么?是的,我编辑了问题,因为事后意识到它缺失了。我的编辑更贴近整个数据集。 - MAW74656
@MAW74656 - 你可以在这里阅读有关Soundex的信息:http://en.wikipedia.org/wiki/Soundex,以及在SQL Server中使用Soundex的相关内容:http://msdn.microsoft.com/en-us/library/ms187384.aspx。我没有SQL Server 2000,所以无法确认是否可用Soundex。 - Mikael Eriksson

2
最佳解决方案是使用经过CASS认证的地址标准化程序或服务来格式化和验证地址。除了美国邮政服务(USPS)提供此类工具外,还有许多第三方程序或服务可以提供此功能。地址解析比你想象的要复杂得多,因此试图编写一些查询来完成它将充满风险。谷歌地理编码(Google's Geocoding)是另一个可参考的地方。显然,谷歌要求您展示结果才能使用他们的地理编码服务。这意味着只能使用专用地址解析器,如USPS或第三方程序。

1
很遗憾,由于他们网站上的这一句话“禁止在地图上显示地理编码结果”,谷歌地理编码不是一个选项。 - MAW74656
1
我不是法律专家。然而,我认为雅虎的Placefinder API提供了类似的功能,并且他们的使用条款似乎没有那么多限制。还有其他第三方地点查找器,收取少量费用,您可以不受数据使用方面的限制。http://developer.yahoo.com/geo/placefinder/ - eoldre
我将使用雅虎地点查找API来检查每个地址,然后将有效的地址保存到一个新的数据库表中。然后我会提供那张表。 - MAW74656
@thomas,你能再说一遍为什么谷歌不是这个问题的正确解决方案吗? - Alex Gordon
@I__。正如MAW74656所提到的,您需要阅读Google Geocoding API使用条款:“Geocoding API只能与Google地图一起使用;禁止在不显示地图的情况下使用地理编码结果。” http://code.google.com/apis/maps/documentation/geocoding/ - Thomas

0

根据您提到的相同答案,可以尝试这样做:

select Code,
       (select top 1 Name from Table1 where Code=X.Code  and Address1=X.Address1
       group by Name order by count(*) desc) Name,
       Address1
from    Table1 X
group by Code,Address1

0

好的,这里是一个开头,哈哈...

-- address, address2, city to uppercase
-- change all ".", "," to " " (space)
-- remove all double spaces from address (in case the previous call added double spaces)
-- trim
UPDATE
    consumers
SET
    address = TRIM(REPLACE(REPLACE(REPLACE(UPPER(address), '.', ' '), ',', ' '), '  ', ' ')),
    address2= TRIM(REPLACE(REPLACE(REPLACE(UPPER(address2), '.', ' '), ',', ' '), '  ', ' ')),
    city    = TRIM(REPLACE(REPLACE(REPLACE(UPPER(city), '.', ' '), ',', ' '), '  ', ' ')),
    state   = TRIM(UPPER(state))
WHERE
    id = _consumer_id;

-- address update all to use abbreviations Street -> St
UPDATE
    consumers
SET
    address = REPLACE(address, 'NORTH','N'),
    address = REPLACE(address, 'EAST','E'),
    address = REPLACE(address, 'SOUTH','S'),
    address = REPLACE(address, 'WEST','W'),

    address = REPLACE(address, 'ALLEY','ALY'),
    address = REPLACE(address, 'ANNEX','ANX'),
    address = REPLACE(address, 'APARTMENT','APT'),
    address = REPLACE(address, 'ARCADE','ARC'),
    address = REPLACE(address, 'AVENUE','AVE'),
    address = REPLACE(address, 'BASEMENT','BSMT'),
    address = REPLACE(address, 'BAYOU','BYU'),
    address = REPLACE(address, 'BEACH','BCH'),
    address = REPLACE(address, 'BEND','BND'),
    address = REPLACE(address, 'BLUFF','BLF'),
    address = REPLACE(address, 'BOTTOM','BTM'),
    address = REPLACE(address, 'BOULEVARD','BLVD'),
    address = REPLACE(address, 'BRANCH','BR'),
    address = REPLACE(address, 'BRIDGE','BRG'),
    address = REPLACE(address, 'BROOK','BRK'),
    address = REPLACE(address, 'BUILDING','BLDG'),
    address = REPLACE(address, 'BURG','BG'),
    address = REPLACE(address, 'BYPASS','BYP'),
    address = REPLACE(address, 'CAMP','CP'),
    address = REPLACE(address, 'CANYON','CYN'),
    address = REPLACE(address, 'CAPE','CPE'),
    address = REPLACE(address, 'CAUSEWAY','CSWY'),
    address = REPLACE(address, 'CENTER','CTR'),
    address = REPLACE(address, 'CIRCLE','CIR'),
    address = REPLACE(address, 'CLIFF','CLFS'),
    address = REPLACE(address, 'CLIFFS','CLFS'),
    address = REPLACE(address, 'CLUB','CLB'),
    address = REPLACE(address, 'CORNER','COR'),
    address = REPLACE(address, 'CORNERS','CORS'),
    address = REPLACE(address, 'COURSE','CRSE'),
    address = REPLACE(address, 'COURT','CT'),
    address = REPLACE(address, 'COURTS','CTS'),
    address = REPLACE(address, 'COVE','CV'),
    address = REPLACE(address, 'CREEK','CRK'),
    address = REPLACE(address, 'CRESCENT','CRES'),
    address = REPLACE(address, 'CROSSING','XING'),
    address = REPLACE(address, 'DALE','DL'),
    address = REPLACE(address, 'DAM','DM'),
    address = REPLACE(address, 'DEPARTMENT','DEPT'),
    address = REPLACE(address, 'DIVIDE','DV'),
    address = REPLACE(address, 'DRIVE','DR'),
    address = REPLACE(address, 'ESTATE','EST'),
    address = REPLACE(address, 'EXPRESSWAY','EXPY'),
    address = REPLACE(address, 'EXTENSION','EXT'),
    address = REPLACE(address, 'FALLS','FLS'),
    address = REPLACE(address, 'FERRY','FRY'),
    address = REPLACE(address, 'FIELD','FLD'),
    address = REPLACE(address, 'FIELDS','FLDS'),
    address = REPLACE(address, 'FLAT','FLT'),
    address = REPLACE(address, 'FLOOR','FL'),
    address = REPLACE(address, 'FORD','FRD'),
    address = REPLACE(address, 'FOREST','FRST'),
    address = REPLACE(address, 'FORGE','FRG'),
    address = REPLACE(address, 'FORK','FRK'),
    address = REPLACE(address, 'FORKS','FRKS'),
    address = REPLACE(address, 'FORT','FT'),
    address = REPLACE(address, 'FREEWAY','FWY'),
    address = REPLACE(address, 'FRONT','FRNT'),
    address = REPLACE(address, 'GARDEN','GDNS'),
    address = REPLACE(address, 'GARDENS','GDNS'),
    address = REPLACE(address, 'GATEWAY','GTWY'),
    address = REPLACE(address, 'GLEN','GLN'),
    address = REPLACE(address, 'GREEN','GRN'),
    address = REPLACE(address, 'GROVE','GRV'),
    address = REPLACE(address, 'HANGER','HNGR'),
    address = REPLACE(address, 'HARBOR','HBR'),
    address = REPLACE(address, 'HAVEN','HVN'),
    address = REPLACE(address, 'HEIGHTS','HTS'),
    address = REPLACE(address, 'HIGHWAY','HWY'),
    address = REPLACE(address, 'HILL','HL'),
    address = REPLACE(address, 'HILLS','HLS'),
    address = REPLACE(address, 'HOLLOW','HOLW'),
    address = REPLACE(address, 'INLET','INLT'),
    address = REPLACE(address, 'ISLAND','IS'),
    address = REPLACE(address, 'ISLANDS','ISS'),
    address = REPLACE(address, 'JUNCTION','JCT'),
    address = REPLACE(address, 'KEY','KY'),
    address = REPLACE(address, 'KNOLL','KNLS'),
    address = REPLACE(address, 'KNOLLS','KNLS'),
    address = REPLACE(address, 'LAKE','LK'),
    address = REPLACE(address, 'LAKES','LKS'),
    address = REPLACE(address, 'LANDING','LNDG'),
    address = REPLACE(address, 'LANE','LN'),
    address = REPLACE(address, 'LIGHT','LGT'),
    address = REPLACE(address, 'LOAF','LF'),
    address = REPLACE(address, 'LOBBY','LBBY'),
    address = REPLACE(address, 'LOCK','LCKS'),
    address = REPLACE(address, 'LOCKS','LCKS'),
    address = REPLACE(address, 'LODGE','LDG'),
    address = REPLACE(address, 'LOWER','LOWR'),
    address = REPLACE(address, 'MANOR','MNR'),
    address = REPLACE(address, 'MEADOW','MDWS'),
    address = REPLACE(address, 'MEADOWS','MDWS'),
    address = REPLACE(address, 'MILL','ML'),
    address = REPLACE(address, 'MILLS','MLS'),
    address = REPLACE(address, 'MISSION','MSN'),
    address = REPLACE(address, 'MOUNT','MT'),
    address = REPLACE(address, 'MOUNTAIN','MTN'),
    address = REPLACE(address, 'NECK','NCK'),
    address = REPLACE(address, 'OFFICE','OFC'),
    address = REPLACE(address, 'ORCHARD','ORCH'),
    address = REPLACE(address, 'PARKWAY','PKWY'),
    address = REPLACE(address, 'PENTHOUSE','PH'),
    address = REPLACE(address, 'PINE','PNES'),
    address = REPLACE(address, 'PINES','PNES'),
    address = REPLACE(address, 'PLACE','PL'),
    address = REPLACE(address, 'PLAIN','PLN'),
    address = REPLACE(address, 'PLAINS','PLNS'),
    address = REPLACE(address, 'PLAZA','PLZ'),
    address = REPLACE(address, 'POINT','PT'),
    address = REPLACE(address, 'PORT','PRT'),
    address = REPLACE(address, 'PRAIRIE','PR'),
    address = REPLACE(address, 'RADIAL','RADL'),
    address = REPLACE(address, 'RANCH','RNCH'),
    address = REPLACE(address, 'RAPID','RPDS'),
    address = REPLACE(address, 'RAPIDS','RPDS'),
    address = REPLACE(address, 'REST','RST'),
    address = REPLACE(address, 'RIDGE','RDG'),
    address = REPLACE(address, 'RIVER','RIV'),
    address = REPLACE(address, 'ROAD','RD'),
    address = REPLACE(address, 'ROOM','RM'),
    address = REPLACE(address, 'SHOAL','SHL'),
    address = REPLACE(address, 'SHOALS','SHLS'),
    address = REPLACE(address, 'SHORE','SHR'),
    address = REPLACE(address, 'SHORES','SHRS'),
    address = REPLACE(address, 'SPACE','SPC'),
    address = REPLACE(address, 'SPRING','SPG'),
    address = REPLACE(address, 'SPRINGS','SPGS'),
    address = REPLACE(address, 'SQUARE','SQ'),
    address = REPLACE(address, 'STATION','STA'),
    address = REPLACE(address, 'STRAVENUE','STRA'),
    address = REPLACE(address, 'STREAM','STRM'),
    address = REPLACE(address, 'STREET','ST'),
    address = REPLACE(address, 'SUITE','STE'),
    address = REPLACE(address, 'SUMMIT','SMT'),
    address = REPLACE(address, 'TERRACE','TER'),
    address = REPLACE(address, 'TRACE','TRCE'),
    address = REPLACE(address, 'TRACK','TRAK'),
    address = REPLACE(address, 'TRAFFICWAY','TRFY'),
    address = REPLACE(address, 'TRAIL','TRL'),
    address = REPLACE(address, 'TRAILER','TRLR'),
    address = REPLACE(address, 'TUNNEL','TUNL'),
    address = REPLACE(address, 'TURNPIKE','TPKE'),
    address = REPLACE(address, 'UNION','UN'),
    address = REPLACE(address, 'UPPER','UPPR'),
    address = REPLACE(address, 'VALLEY','VLY'),
    address = REPLACE(address, 'VIADUCT','VIA'),
    address = REPLACE(address, 'VIEW','VW'),
    address = REPLACE(address, 'VILLAGE','VLG'),
    address = REPLACE(address, 'VILLE','VL'),
    address = REPLACE(address, 'VISTA','VIS'),
    address = REPLACE(address, 'WAY','WAY'),
    address = REPLACE(address, 'WELL','WLS'),
    address = REPLACE(address, 'WELLS','WLS')
WHERE
    id = _consumer_id;


-- do the same for address2 but only if it has a value ( might help with speed? )
UPDATE
    consumers
SET
    address2 = REPLACE(address2, 'NORTH','N'),
    address2 = REPLACE(address2, 'EAST','E'),
    address2 = REPLACE(address2, 'SOUTH','S'),
    address2 = REPLACE(address2, 'WEST','W'),

    address2 = REPLACE(address2, 'ALLEY','ALY'),
    address2 = REPLACE(address2, 'ANNEX','ANX'),
    address2 = REPLACE(address2, 'APARTMENT','APT'),
    address2 = REPLACE(address2, 'ARCADE','ARC'),
    address2 = REPLACE(address2, 'AVENUE','AVE'),
    address2 = REPLACE(address2, 'BASEMENT','BSMT'),
    address2 = REPLACE(address2, 'BAYOU','BYU'),
    address2 = REPLACE(address2, 'BEACH','BCH'),
    address2 = REPLACE(address2, 'BEND','BND'),
    address2 = REPLACE(address2, 'BLUFF','BLF'),
    address2 = REPLACE(address2, 'BOTTOM','BTM'),
    address2 = REPLACE(address2, 'BOULEVARD','BLVD'),
    address2 = REPLACE(address2, 'BRANCH','BR'),
    address2 = REPLACE(address2, 'BRIDGE','BRG'),
    address2 = REPLACE(address2, 'BROOK','BRK'),
    address2 = REPLACE(address2, 'BUILDING','BLDG'),
    address2 = REPLACE(address2, 'BURG','BG'),
    address2 = REPLACE(address2, 'BYPASS','BYP'),
    address2 = REPLACE(address2, 'CAMP','CP'),
    address2 = REPLACE(address2, 'CANYON','CYN'),
    address2 = REPLACE(address2, 'CAPE','CPE'),
    address2 = REPLACE(address2, 'CAUSEWAY','CSWY'),
    address2 = REPLACE(address2, 'CENTER','CTR'),
    address2 = REPLACE(address2, 'CIRCLE','CIR'),
    address2 = REPLACE(address2, 'CLIFF','CLFS'),
    address2 = REPLACE(address2, 'CLIFFS','CLFS'),
    address2 = REPLACE(address2, 'CLUB','CLB'),
    address2 = REPLACE(address2, 'CORNER','COR'),
    address2 = REPLACE(address2, 'CORNERS','CORS'),
    address2 = REPLACE(address2, 'COURSE','CRSE'),
    address2 = REPLACE(address2, 'COURT','CT'),
    address2 = REPLACE(address2, 'COURTS','CTS'),
    address2 = REPLACE(address2, 'COVE','CV'),
    address2 = REPLACE(address2, 'CREEK','CRK'),
    address2 = REPLACE(address2, 'CRESCENT','CRES'),
    address2 = REPLACE(address2, 'CROSSING','XING'),
    address2 = REPLACE(address2, 'DALE','DL'),
    address2 = REPLACE(address2, 'DAM','DM'),
    address2 = REPLACE(address2, 'DEPARTMENT','DEPT'),
    address2 = REPLACE(address2, 'DIVIDE','DV'),
    address2 = REPLACE(address2, 'DRIVE','DR'),
    address2 = REPLACE(address2, 'ESTATE','EST'),
    address2 = REPLACE(address2, 'EXPRESSWAY','EXPY'),
    address2 = REPLACE(address2, 'EXTENSION','EXT'),
    address2 = REPLACE(address2, 'FALLS','FLS'),
    address2 = REPLACE(address2, 'FERRY','FRY'),
    address2 = REPLACE(address2, 'FIELD','FLD'),
    address2 = REPLACE(address2, 'FIELDS','FLDS'),
    address2 = REPLACE(address2, 'FLAT','FLT'),
    address2 = REPLACE(address2, 'FLOOR','FL'),
    address2 = REPLACE(address2, 'FORD','FRD'),
    address2 = REPLACE(address2, 'FOREST','FRST'),
    address2 = REPLACE(address2, 'FORGE','FRG'),
    address2 = REPLACE(address2, 'FORK','FRK'),
    address2 = REPLACE(address2, 'FORKS','FRKS'),
    address2 = REPLACE(address2, 'FORT','FT'),
    address2 = REPLACE(address2, 'FREEWAY','FWY'),
    address2 = REPLACE(address2, 'FRONT','FRNT'),
    address2 = REPLACE(address2, 'GARDEN','GDNS'),
    address2 = REPLACE(address2, 'GARDENS','GDNS'),
    address2 = REPLACE(address2, 'GATEWAY','GTWY'),
    address2 = REPLACE(address2, 'GLEN','GLN'),
    address2 = REPLACE(address2, 'GREEN','GRN'),
    address2 = REPLACE(address2, 'GROVE','GRV'),
    address2 = REPLACE(address2, 'HANGER','HNGR'),
    address2 = REPLACE(address2, 'HARBOR','HBR'),
    address2 = REPLACE(address2, 'HAVEN','HVN'),
    address2 = REPLACE(address2, 'HEIGHTS','HTS'),
    address2 = REPLACE(address2, 'HIGHWAY','HWY'),
    address2 = REPLACE(address2, 'HILL','HL'),
    address2 = REPLACE(address2, 'HILLS','HLS'),
    address2 = REPLACE(address2, 'HOLLOW','HOLW'),
    address2 = REPLACE(address2, 'INLET','INLT'),
    address2 = REPLACE(address2, 'ISLAND','IS'),
    address2 = REPLACE(address2, 'ISLANDS','ISS'),
    address2 = REPLACE(address2, 'JUNCTION','JCT'),
    address2 = REPLACE(address2, 'KEY','KY'),
    address2 = REPLACE(address2, 'KNOLL','KNLS'),
    address2 = REPLACE(address2, 'KNOLLS','KNLS'),
    address2 = REPLACE(address2, 'LAKE','LK'),
    address2 = REPLACE(address2, 'LAKES','LKS'),
    address2 = REPLACE(address2, 'LANDING','LNDG'),
    address2 = REPLACE(address2, 'LANE','LN'),
    address2 = REPLACE(address2, 'LIGHT','LGT'),
    address2 = REPLACE(address2, 'LOAF','LF'),
    address2 = REPLACE(address2, 'LOBBY','LBBY'),
    address2 = REPLACE(address2, 'LOCK','LCKS'),
    address2 = REPLACE(address2, 'LOCKS','LCKS'),
    address2 = REPLACE(address2, 'LODGE','LDG'),
    address2 = REPLACE(address2, 'LOWER','LOWR'),
    address2 = REPLACE(address2, 'MANOR','MNR'),
    address2 = REPLACE(address2, 'MEADOW','MDWS'),
    address2 = REPLACE(address2, 'MEADOWS','MDWS'),
    address2 = REPLACE(address2, 'MILL','ML'),
    address2 = REPLACE(address2, 'MILLS','MLS'),
    address2 = REPLACE(address2, 'MISSION','MSN'),
    address2 = REPLACE(address2, 'MOUNT','MT'),
    address2 = REPLACE(address2, 'MOUNTAIN','MTN'),
    address2 = REPLACE(address2, 'NECK','NCK'),
    address2 = REPLACE(address2, 'OFFICE','OFC'),
    address2 = REPLACE(address2, 'ORCHARD','ORCH'),
    address2 = REPLACE(address2, 'PARKWAY','PKWY'),
    address2 = REPLACE(address2, 'PENTHOUSE','PH'),
    address2 = REPLACE(address2, 'PINE','PNES'),
    address2 = REPLACE(address2, 'PINES','PNES'),
    address2 = REPLACE(address2, 'PLACE','PL'),
    address2 = REPLACE(address2, 'PLAIN','PLN'),
    address2 = REPLACE(address2, 'PLAINS','PLNS'),
    address2 = REPLACE(address2, 'PLAZA','PLZ'),
    address2 = REPLACE(address2, 'POINT','PT'),
    address2 = REPLACE(address2, 'PORT','PRT'),
    address2 = REPLACE(address2, 'PRAIRIE','PR'),
    address2 = REPLACE(address2, 'RADIAL','RADL'),
    address2 = REPLACE(address2, 'RANCH','RNCH'),
    address2 = REPLACE(address2, 'RAPID','RPDS'),
    address2 = REPLACE(address2, 'RAPIDS','RPDS'),
    address2 = REPLACE(address2, 'REST','RST'),
    address2 = REPLACE(address2, 'RIDGE','RDG'),
    address2 = REPLACE(address2, 'RIVER','RIV'),
    address2 = REPLACE(address2, 'ROAD','RD'),
    address2 = REPLACE(address2, 'ROOM','RM'),
    address2 = REPLACE(address2, 'SHOAL','SHL'),
    address2 = REPLACE(address2, 'SHOALS','SHLS'),
    address2 = REPLACE(address2, 'SHORE','SHR'),
    address2 = REPLACE(address2, 'SHORES','SHRS'),
    address2 = REPLACE(address2, 'SPACE','SPC'),
    address2 = REPLACE(address2, 'SPRING','SPG'),
    address2 = REPLACE(address2, 'SPRINGS','SPGS'),
    address2 = REPLACE(address2, 'SQUARE','SQ'),
    address2 = REPLACE(address2, 'STATION','STA'),
    address2 = REPLACE(address2, 'STRAVENUE','STRA'),
    address2 = REPLACE(address2, 'STREAM','STRM'),
    address2 = REPLACE(address2, 'STREET','ST'),
    address2 = REPLACE(address2, 'SUITE','STE'),
    address2 = REPLACE(address2, 'SUMMIT','SMT'),
    address2 = REPLACE(address2, 'TERRACE','TER'),
    address2 = REPLACE(address2, 'TRACE','TRCE'),
    address2 = REPLACE(address2, 'TRACK','TRAK'),
    address2 = REPLACE(address2, 'TRAFFICWAY','TRFY'),
    address2 = REPLACE(address2, 'TRAIL','TRL'),
    address2 = REPLACE(address2, 'TRAILER','TRLR'),
    address2 = REPLACE(address2, 'TUNNEL','TUNL'),
    address2 = REPLACE(address2, 'TURNPIKE','TPKE'),
    address2 = REPLACE(address2, 'UNION','UN'),
    address2 = REPLACE(address2, 'UPPER','UPPR'),
    address2 = REPLACE(address2, 'VALLEY','VLY'),
    address2 = REPLACE(address2, 'VIADUCT','VIA'),
    address2 = REPLACE(address2, 'VIEW','VW'),
    address2 = REPLACE(address2, 'VILLAGE','VLG'),
    address2 = REPLACE(address2, 'VILLE','VL'),
    address2 = REPLACE(address2, 'VISTA','VIS'),
    address2 = REPLACE(address2, 'WAY','WAY'),
    address2 = REPLACE(address2, 'WELL','WLS'),
    address2 = REPLACE(address2, 'WELLS','WLS')
WHERE
    id = _consumer_id
    AND address2  != '';

请注意 - 如果您不检查前后的空格,或者验证要替换的标记是否不是复合词的一部分,则会破坏数据。例如,Grand Forks,Eastern Ave,Buena Vista。这些都是不正确的。 - HungryBeagle

0

OpenRefine可能是您的答案。

但您需要将其导出为csv格式文件,然后在OpenRefine中清理数据,并将其重新导入到SQL Server中。


0

我的工作是帮助编写地址验证软件(为SmartyStreets)。我想回应Thomas的答案,唯一实际有效的解决方案是使用CASS认证的供应商。这非常复杂,但这些服务可以为您完成并且做得很好。

我还要补充说,大多数免费API都有许可限制,禁止将其用于处理地址列表(Google不是唯一一个——即使USPS也有使用其API的限制)。

我建议您使用LiveAddressCASS-Certified Scrubbing等服务来满足您的需求(后者可能更适合现有表格),但我会让您自己进行研究,以便更了解情况。如果您有任何与地址相关的问题,我很乐意亲自帮助您。


网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接