使用 LOAD DATA INFILE 将 CSV 文件导入到 MySQL 数据库中的 PHP 实现

16

我有一个像这样的 .csv 文件数据

Date,Name,Call Type,Number,Duration,Address,PostalCode,City,State,Country,Latitude,Longitude
"Sep-18-2013 01:53:45 PM","Unknown","outgoing call",'123456',"0 Secs","null","null","null","null","null",0.0,0.0,,,
"Sep-18-2013 01:54:14 PM","Unknown","outgoing call",'1234567890',"0 Secs","null","null","null","null","null",0.0,0.0,,,
"Sep-18-2013 01:54:37 PM","Unknown","outgoing call",'14772580369',"1 Secs","null","null","null","null","null",0.0,0.0,,,

我正在使用以下代码将数据插入到数据库中:

$sql = "LOAD DATA INFILE `detection.csv`
              INTO TABLE `calldetections`
              FIELDS TERMINATED BY '".@mysql_escape_string(",").
             "` OPTIONALLY ENCLOSED BY `".@mysql_escape_string("\"").
             "` OPTIONALLY ENCLOSED BY `".@mysql_escape_string("\'").
             "` ESCAPED BY `".@mysql_escape_string("\\").
              "` LINES TERMINATED BY `".",,,\\r\\n".
             "`IGNORE 1 LINES `"

             ."(`date`,`name`,`type`,`number`,`duration`,`addr`,`pin`,`city`,`state`,`country`,`lat`,`log`)";
      $res = @mysql_query($con,$sql); 

但是什么都没有插入;哪里出错了?

2个回答

47

如果在执行查询之前使用echo($sql);,你会发现你的查询语句存在以下问题:

  1. 文件名应该用引号括起来而不是反引号,因为它是字符串字面量而不是标识符。

  2. FIELDS TERMINATED BYENCLOSED BY以及ESCAPED BY从句中指定分隔符时,绝对没有必要调用mysql_escape_string()

  3. 你过度使用了反引号。实际上,在你的情况下,由于没有使用保留字,可以全部弃用。它们只会增加混乱程度。

  4. 在CSV文件的第一行末尾,必须有,,,,因为你将它们用作行分隔符的部分。如果不这样做,你将跳过不仅第一行,还包括第二行的数据。

  5. 不能多次使用ENCLOSED BY从句。你必须以不同的方式处理Number字段。

  6. 根据你的示例行,我认为你不需要ESCAPED BY。但是,如果你觉得需要,可以这样使用:ESCAPED BY '\\'

话虽如此,一个语法正确的语句可能看起来像这样:

LOAD DATA INFILE 'detection.csv'
INTO TABLE calldetections
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"' 
LINES TERMINATED BY ',,,\r\n'
IGNORE 1 LINES 
(date, name, type, number, duration, addr, pin, city, state, country, lat, log)

我认为您需要在加载时转换一些字段:

  1. 如果您表中的 datedatetime 数据类型,则需要进行转换,否则会出现错误:

    Incorrect datetime value: 'Sep-18-2013 01:53:45 PM' for column 'date' at row

  2. 您需要处理 Number 字段中值周围的单引号。

  3. 您可能想要将 "null" 字符串文字更改为实际的 NULL,以用于 addr, pin, city, state, country 列。

  4. 如果持续时间始终以秒为单位,则可以提取整数秒值并以这种方式存储它们,以便稍后轻松聚合持续时间值。

说到这里,一个有用的语句应该长这个样子:

LOAD DATA INFILE 'detection.csv'
INTO TABLE calldetections
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"' 
LINES TERMINATED BY ',,,\r\n'
IGNORE 1 LINES 
(@date, name, type, @number, @duration, @addr, @pin, @city, @state, @country, lat, log)
SET date = STR_TO_DATE(@date, '%b-%d-%Y %h:%i:%s %p'),
    number = TRIM(BOTH '\'' FROM @number),
    duration = 1 * TRIM(TRAILING 'Secs' FROM @duration),
    addr = NULLIF(@addr, 'null'),
    pin  = NULLIF(@pin, 'null'),
    city = NULLIF(@city, 'null'),
    state = NULLIF(@state, 'null'),
    country = NULLIF(@country, 'null') 

以下是在我的计算机上执行查询的结果:

mysql> LOAD DATA INFILE '/tmp/detection.csv'
    -> INTO TABLE calldetections
    -> FIELDS TERMINATED BY ','
    -> OPTIONALLY ENCLOSED BY '"' 
    -> LINES TERMINATED BY ',,,\n'
    -> IGNORE 1 LINES 
    -> (@date, name, type, @number, @duration, @addr, @pin, @city, @state, @country, lat, log)
    -> SET date = STR_TO_DATE(@date, '%b-%d-%Y %h:%i:%s %p'),
    ->     number = TRIM(BOTH '\'' FROM @number),
    ->     duration = 1 * TRIM(TRAILING 'Secs' FROM @duration),
    ->     addr = NULLIF(@addr, 'null'),
    ->     pin  = NULLIF(@pin, 'null'),
    ->     city = NULLIF(@city, 'null'),
    ->     state = NULLIF(@state, 'null'),
    ->     country = NULLIF(@country, 'null');
查询 OK,已影响 3 行(0.00 秒)
记录: 3  删除: 0  跳过: 0  警告: 0
mysql> select * from calldetections; +---------------------+---------+---------------+-------------+----------+------+------+------+-------+---------+------+------+ | date | name | type | number | duration | addr | pin | city | state | country | lat | log | +---------------------+---------+---------------+-------------+----------+------+------+------+-------+---------+------+------+ | 2013-09-18 13:53:45 | Unknown | outgoing call | 123456 | 0 | NULL | NULL | NULL | NULL | NULL | 0.0 | 0.0 | | 2013-09-18 13:54:14 | Unknown | outgoing call | 1234567890 | 0 | NULL | NULL | NULL | NULL | NULL | 0.0 | 0.0 | | 2013-09-18 13:54:37 | Unknown | outgoing call | 14772580369 | 1 | NULL | NULL | NULL | NULL | NULL | 0.0 | 0.0 | +---------------------+---------+---------------+-------------+----------+------+------+------+-------+---------+------+------+ 共 3 行(0.00 秒)

最后,使用PHP将查询字符串分配给$sql变量应该像这样:

$sql = "LOAD DATA INFILE 'detection.csv'
        INTO TABLE calldetections
        FIELDS TERMINATED BY ','
        OPTIONALLY ENCLOSED BY '\"' 
        LINES TERMINATED BY ',,,\\r\\n'
        IGNORE 1 LINES 
        (@date, name, type, @number, @duration, @addr, @pin, @city, @state, @country, lat, log)
        SET date = STR_TO_DATE(@date, '%b-%d-%Y %h:%i:%s %p'),
            number = TRIM(BOTH '\'' FROM @number),
            duration = 1 * TRIM(TRAILING 'Secs' FROM @duration),
            addr = NULLIF(@addr, 'null'),
            pin  = NULLIF(@pin, 'null'),
            city = NULLIF(@city, 'null'),
            state = NULLIF(@state, 'null'),
            country = NULLIF(@country, 'null') ";

2
非常感谢您详细的回答。 - chimbu
1
@chimbu 不用谢。祝你好运 :) 如果需要,随时提出跟进问题。 - peterm
@peterm,多么出色、清晰的回答啊。如果你不是一位薪酬非常丰厚的教师,你应该认真考虑一下。谢谢。 - chris
<?php include '../class/pdo.php'; $pdo = new PDO(); $pdo_connection = $pdo->pdo_connect(); $tt="1cs2cs1cs17409061651452795086two.csv"; $aa=","; $bb='"'; $cc='\n'; echo $query = "LOAD DATA LOCAL INFILE '".$tt."' INTO TABLE data2 FIELDS TERMINATED BY '".$aa."' OPTIONALLY ENCLOSED BY '".$bb."' LINES TERMINATED BY '".$cc."' IGNORE 1 LINES;"; $queryresource = $pdo_connection->query($query); ?> 我尝试使用和建议的相同方法,但只插入了2行,而CSV文件有100万行。 - Abhinav
是的,peterm,我已经在MySQL控制台中回显了查询并直接执行了它,然后它上传了数据,但只有两行而没有任何错误。请建议我哪里做错了。 - Abhinav
显示剩余3条评论

1

在1分钟内向数据库插入超过7000000条记录(带计算的超快速查询)

    mysqli_query($cons, '
    LOAD DATA LOCAL INFILE "'.$file.'"
    INTO TABLE tablename
    FIELDS TERMINATED by \',\'
    LINES TERMINATED BY \'\n\'
    IGNORE 1 LINES
    (isbn10,isbn13,price,discount,free_stock,report,report_date)
     SET RRP = IF(discount = 0.00,price-price * 45/100,IF(discount = 0.01,price,IF(discount != 0.00,price-price * discount/100,@RRP))),
         RRP_nl = RRP * 1.44 + 8,
         RRP_bl = RRP * 1.44 + 8,
         ID = NULL
    ')or die(mysqli_error());
    $affected = (int) (mysqli_affected_rows($cons))-1; 
    $log->lwrite('Inventory.CSV to database:'. $affected.' record inserted successfully.');

RRP、RRP_nl和RRP_bl不在CSV文件中,但我们已经计算出它们并在插入后进行了更新。


欢迎来到Stack Overflow!请不要在多个问题中添加相同的答案。回答最好的一个,并将其余的标记为重复。请参阅是否可以在多个问题中添加重复答案? - FelixSFD

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接