有人知道如何将geonames.org数据导入我的数据库吗? 我正在尝试导入的是http://download.geonames.org/export/dump/DO.zip,我的数据库是MySQL数据库。
有人知道如何将geonames.org数据导入我的数据库吗? 我正在尝试导入的是http://download.geonames.org/export/dump/DO.zip,我的数据库是MySQL数据库。
我通过查看你在链接的zip文件中所提到的名为“主要'GeoName'表具有以下字段:”部分的自述文件找到了以下内容。
首先,在您的MySQL实例上创建数据库和表。每行都给出了上面我引用标题的部分中的字段类型。
CREATE DATABASE DO_test;
CREATE TABLE `DO_test`.`DO_table` (
`geonameid` INT,
`name` varchar(200),
`asciiname` varchar(200),
`alternatenames` varchar(5000),
`latitude` DECIMAL(10,7),
`longitude` DECIMAL(10,7),
`feature class` char(1),
`feature code` varchar(10),
`country code` char(2),
`cc2` char(60),
`admin1 code` varchar(20),
`admin2 code` varchar(80),
`admin3 code` varchar(20),
`admin4 code` varchar(20),
`population` bigint,
`elevation` INT,
`gtopo30` INT,
`timezone` varchar(100),
`modification date` date
)
CHARACTER SET utf8;
在创建完表格后,您可以从文件中导入数据。字段由制表符分隔,行由换行符表示:
LOAD DATA INFILE '/path/to/your/file/DO.txt' INTO TABLE `DO_test`.`DO_table`;
tr -d '\\' < geonames_cities.txt > cities.csv
- baptxmy.cnf
中的[mysqld]
下添加sql-mode =“”
,则无需删除反斜杠。 https://dev59.com/RF4b5IYBdhLWcg3w3k6Y#1dbonYgBc1ULPQZFWUjd - baptx我最近制作了一个shell脚本,从GeoNames网站下载最新数据并将其导入MySQL数据库。它基于GeoNames论坛上的知识,并节省了我很多时间。
它是第一个版本,但完全功能正常。也许它可以帮助你。
你可以在http://codigofuerte.github.com/GeoNames-MySQL-DataImport/访问它。
SET character_set_database=utf8;
LOAD DATA INFILE '/home/data/countryInfo.txt' INTO TABLE _geo_countries IGNORE 51 LINES(ISO2,ISO3,ISO_Numeric,FIPSCode,AsciiName,Capital,Area_SqKm,Population,ContinentCode,TLD,CurrencyCode,CurrencyName,PhoneCodes,PostalCodeFormats,PostalCodeRegex,Languages,GeonameID,Neighbours,EquivalentFIPSCodes);
SET character_set_database=default;
The main 'geoname' table has the following fields :
---------------------------------------------------
geonameid : integer id of record in geonames database
name : name of geographical point (utf8) varchar(200)
asciiname : name of geographical point in plain ascii characters, varchar(200)
alternatenames : alternatenames, comma separated varchar(5000)
latitude : latitude in decimal degrees (wgs84)
longitude : longitude in decimal degrees (wgs84)
feature class : see http://www.geonames.org/export/codes.html, char(1)
feature code : see http://www.geonames.org/export/codes.html, varchar(10)
country code : ISO-3166 2-letter country code, 2 characters
cc2 : alternate country codes, comma separated, ISO-3166 2-letter country code, 60 characters
admin1 code : fipscode (subject to change to iso code), see exceptions below, see file admin1Codes.txt for display names of this code; varchar(20)
admin2 code : code for the second administrative division, a county in the US, see file admin2Codes.txt; varchar(80)
admin3 code : code for third level administrative division, varchar(20)
admin4 code : code for fourth level administrative division, varchar(20)
population : bigint (8 byte int)
elevation : in meters, integer
dem : digital elevation model, srtm3 or gtopo30, average elevation of 3''x3'' (ca 90mx90m) or 30''x30'' (ca 900mx900m) area in meters, integer. srtm processed by cgiar/ciat.
timezone : the timezone id (see file timeZone.txt) varchar(40)
modification date : date of last modification in yyyy-MM-dd format
关于varchar(5000),我们需要知道MySQL 5.0或更高版本中每行的64kb大小: 在MySQL中,VARCHAR(20000)是否有效?
Download desired data file from the official download page. In my case I chose cities1000.zip because it's much smaller in size (21MB) than the all-inclusive allcountries.zip (1.4GB).
Create the following schema and table according to readme.txt on the download page, where the fields are specified below the text "the main 'geoname' table has the following fields".
CREATE SCHEMA geonames DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
CREATE TABLE geonames.cities1000 (
id INT,
name VARCHAR(200),
ascii_name VARCHAR(200),
alternate_names VARCHAR(10000) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
latitude DECIMAL(10, 7),
longitude DECIMAL(10, 7),
feature_class CHAR(1),
feature_code VARCHAR(10),
country_code CHAR(2),
cc2 CHAR(60),
admin1_code VARCHAR(20),
admin2_code VARCHAR(80),
admin3_code VARCHAR(20),
admin4_code VARCHAR(20),
population BIGINT,
elevation INT,
dem INT,
timezone VARCHAR(100),
modification_date DATE
)
CHARACTER SET utf8;
Field names are arbitrary as long as the column size and field types are the same as specified. alternate_names
are specially defined with the character set utf8mb4
because the values for this column in the file contain 4-byte unicode characters which are not supported by the character set utf8
of MySQL.
Check the values of these parameters: character_set_client, character_set_results, character_set_connection. 7
SHOW VARIABLES LIKE '%char%';
If they are not utf8mb4, then change them:
SET character_set_client = utf8mb4;
SET character_set_results = utf8mb4;
SET character_set_connection = utf8mb4;
Import data from file using LOAD DATA INFILE ...
USE geonames;
LOAD DATA INFILE 'C:\\ProgramData\\MySQL\\MySQL Server 5.7\\Uploads\\cities1000.txt' INTO TABLE cities1000
CHARACTER SET utf8mb4 (id, name, ascii_name, alternate_names, latitude, longitude, feature_class, feature_code,
country_code, cc2, admin1_code, admin2_code, admin3_code, admin4_code, population, @val1,
@val2, timezone, modification_date)
SET elevation = if(@val1 = '', NULL, @val1), dem = if(@val2 = '', NULL, @val2);
Explanation for the statement:
The file should be placed under a designated location by MySQL for importing data from files. You can check the location with SHOW VARIABLES LIKE 'secure_file_priv';
. In my case it's C:\ProgramData\MySQL\MySQL Server 5.7\Uploads
. In Windows you need to use double slashes to represent one slash in the path. This error would be shown when the path is not given correctly: [HY000][1290] The MySQL server is running with the --secure-file-priv option so it cannot execute this statement
.
With CHARACTER SET utf8mb4
you're telling MySQL what encoding to expect from the file. When this is not given explicitly, or the column encoding is not utf8mb4, an error prompt like this will be seen: [HY000][1300] Invalid utf8 character string: 'Gorad Safija,SOF,Serdica,Sofi,Sofia,Sofiae,Sofie,Sofii,Sofij,Sof'
. 5 In my case I found it's due to the existence of Gothic letters in the alternate names, such as (id 727011), (id 3464975), and (id 3893894). These letters need to be stored as 4-byte characters (utf8mb4) while my then encoding was utf8 which only supports up to 3-byte characters. 6 You can change column encoding after the table is created:
ALTER TABLE cities1000 MODIFY alternate_names VARCHAR(10000) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
To check the encoding of a column:
SELECT character_set_name, COLLATION_NAME FROM information_schema.COLUMNS WHERE table_schema = 'geonames' AND table_name = 'cities1000' AND column_name = 'alternate_names';
To test if the characters can be stored:
UPDATE cities1000 SET alternate_names = '' WHERE id = 1;
Values for some columns need to be "improved" before they are inserted, such as elevation
and dem
. They are of type INT and values for them from the file could be empty strings, which can't be stored by an INT type column. So you need to convert those empty strings to null for those columns. The latter part of the statement just serves this purpose. This error would be shown when the values are not property converted first: [HY000][1366] Incorrect integer value: '' for column 'elevation' at row 1
. 3, 4