使用PHP自动将HTML表格转换为CSV格式?

22

我只需要使用PHP自动将这个HTML表格转换为CSV。有人能提供任何想法吗?谢谢。

$table = '<table border="1">
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
<tr>
<td>row 1, cell 1</td>
<td>row 1, cell 2</td>
</tr>
<tr>
<td>row 2, cell 1</td>
<td>row 2, cell 2</td>
</tr>
</table>';

大家好,我只需要将$table转换为.csv文件,并且希望可以使用一些PHP函数自动生成。我们可以将该csv文件定义在路径/test/home/path_to_csv中。


我在谷歌上搜索了,但没有找到相关的内容。 - Thompson
我知道这不完全是你想要的,但你可以将它保存为xls文件,Excel会按预期打开它。 - dm03514
你到底想要实现什么?是要解析HTML并提取数据,还是进行一次性分析?如果是一次性的,我建议你查看这里找到的信息(https://dev59.com/XHVC5IYBdhLWcg3wixw0)。对于更通用和自动化的工作,需要解析HTML - 尝试使用简单解析器(http://simplehtmldom.sourceforge.net/)。或者,如果HTML具有固定的格式,则可以尝试使用正则表达式来抓取标签并仅留下数据。(例如,将td替换为某些字段分隔符)然后,CSV输出就是一个微不足道的问题。 - petr
我并不想解析任何东西。我有自己的变量**$table**,其中包含HTML表格数据。现在我想使用PHP将这些数据转换为.csv文件。 - Thompson
你试过这个吗?http://davidvielmetter.com/tricks/howto-convert-an-html-table-to-csv-using-php/ - Ozair Kafray
显示剩余2条评论
8个回答

26
你可以使用str_get_html http://simplehtmldom.sourceforge.net/
include "simple_html_dom.php";
$table = '<table border="1">
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
<tr>
<td>row 1, cell 1</td>
<td>row 1, cell 2</td>
</tr>
<tr>
<td>row 2, cell 1</td>
<td>row 2, cell 2</td>
</tr>
</table>';

$html = str_get_html($table);



header('Content-type: application/ms-excel');
header('Content-Disposition: attachment; filename=sample.csv');

$fp = fopen("php://output", "w");

foreach($html->find('tr') as $element)
{
        $th = array();
        foreach( $element->find('th') as $row)  
        {
            $th [] = $row->plaintext;
        }

        $td = array();
        foreach( $element->find('td') as $row)  
        {
            $td [] = $row->plaintext;
        }
        !empty($th) ? fputcsv($fp, $th) : fputcsv($fp, $td);
}


fclose($fp);

爸爸,我们该怎么定义一些路径,让那个csv文件自动保存到例如**/test/home/path_to_csv**? - Thompson
1
是的,你可以......你需要做的就是删除头文件....并将 "php://output" 更改为你的路径。 - Baba
谢谢你的方法。我会在我的电脑上尝试一下(目前我在用手机),然后告诉你结果。 - Thompson
糟糕!巴巴,我把 php://output 改成了 /mohan/path_to_csv,当我加载页面时,它会给出多个警告!例如 Warning: fopen(/mohan/path_to_csv) [function.fopen]: failed to open stream: No such file or directory in C:\wamp\www\mohan\TableToCsv.php on line 29。第29行是 **$fp = fopen("/mohan/path_to_csv", "w");**。 - Thompson
fputcsv有其他的参数可以供您尝试,确保您过滤掉空数组。 - Baba
显示剩余3条评论

17

您可以将此函数用于单独的 js 文件中:

function exportTableToCSV($table, filename) {

        var $rows = $table.find('tr:has(td)'),

            // Temporary delimiter characters unlikely to be typed by keyboard
            // This is to avoid accidentally splitting the actual contents
            tmpColDelim = String.fromCharCode(11), // vertical tab character
            tmpRowDelim = String.fromCharCode(0), // null character

            // actual delimiter characters for CSV format
            colDelim = '","',
            rowDelim = '"\r\n"',

            // Grab text from table into CSV formatted string
            csv = '"' + $rows.map(function (i, row) {
                var $row = $(row),
                    $cols = $row.find('td');

                return $cols.map(function (j, col) {
                    var $col = $(col),
                        text = $col.text();

                    return text.replace('"', '""'); // escape double quotes

                }).get().join(tmpColDelim);

            }).get().join(tmpRowDelim)
                .split(tmpRowDelim).join(rowDelim)
                .split(tmpColDelim).join(colDelim) + '"',

            // Data URI
            csvData = 'data:application/csv;charset=utf-8,' + encodeURIComponent(csv);

        $(this)
            .attr({
            'download': filename,
                'href': csvData,
                'target': '_blank'
        });
    }

现在,要启动此功能,您可以使用:

$('.getfile').click(
            function() { 
    exportTableToCSV.apply(this, [$('#thetable'), 'filename.csv']);
             });

在您希望添加行动呼叫的按钮上,'getfile' 应该是分配的类。 (单击此按钮会出现下载弹出窗口)

“thetable”应该是分配给要下载的表格的ID。

您还可以在代码中更改自定义文件名以进行下载。


2
这个工作非常出色。我将以您的荣誉献祭100头公牛。 - OllyBarca
1
请更新代码,将HTML<a>元素作为可点击的“按钮”包含在其中。我一开始使用<button>元素,花了几分钟才意识到它必须是<a>,或者至少要提到它应该是<a>元素。谢谢。 - Owen Far

6
你可以使用数组和正则表达式来完成这个操作...请参见下面的代码:
$csv = array();
preg_match('/<table(>| [^>]*>)(.*?)<\/table( |>)/is',$table,$b);
$table = $b[2];
preg_match_all('/<tr(>| [^>]*>)(.*?)<\/tr( |>)/is',$table,$b);
$rows = $b[2];
foreach ($rows as $row) {
    //cycle through each row
    if(preg_match('/<th(>| [^>]*>)(.*?)<\/th( |>)/is',$row)) {
        //match for table headers
        preg_match_all('/<th(>| [^>]*>)(.*?)<\/th( |>)/is',$row,$b);
        $csv[] = strip_tags(implode(',',$b[2]));
    } elseif(preg_match('/<td(>| [^>]*>)(.*?)<\/td( |>)/is',$row)) {
        //match for table cells
        preg_match_all('/<td(>| [^>]*>)(.*?)<\/td( |>)/is',$row,$b);
        $csv[] = strip_tags(implode(',',$b[2]));
    }
}
$csv = implode("\n", $csv);
var_dump($csv);

然后,您可以使用file_put_contents() 将CSV字符串写入文件。


经典答案。如何处理要放入CSV中的HTML标签? - Vishal Kumar Sahu
1
我只是在当时提供的范围内工作,那是5年或更久以前的事了。如果您正确转义字符串,它们应该保持其列结构。 - Jacob Cruz
太好了!在当前情境下也是高效的答案。 :-) - Vishal Kumar Sahu

5
为了扩展已接受的答案,我做了这个,它允许我通过类名忽略列,并处理空行/列。
你可以使用str_get_html http://simplehtmldom.sourceforge.net/。只需包含它,然后就可以开始使用了! :)
$html = str_get_html($html); // give this your HTML string

header('Content-type: application/ms-excel');
header('Content-Disposition: attachment; filename=sample.csv');

$fp = fopen("php://output", "w");

foreach($html->find('tr') as $element) {
  $td = array();
  foreach( $element->find('th') as $row) {
    if (strpos(trim($row->class), 'actions') === false && strpos(trim($row->class), 'checker') === false) {
      $td [] = $row->plaintext;
    }
  }
  if (!empty($td)) {
    fputcsv($fp, $td);
  }

  $td = array();
  foreach( $element->find('td') as $row) {
    if (strpos(trim($row->class), 'actions') === false && strpos(trim($row->class), 'checker') === false) {
      $td [] = $row->plaintext;
    }
  }
  if (!empty($td)) {
    fputcsv($fp, $td);
  }
}

fclose($fp);
exit;

1

Baba的答案包含额外的空格。因此,我将代码更新为:

include "simple_html_dom.php";
$table = '<table border="1">
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
<tr>
<td>row 1, cell 1</td>
<td>row 1, cell 2</td>
</tr>
<tr>
<td>row 2, cell 1</td>
<td>row 2, cell 2</td>
</tr>
</table>';

$html = str_get_html($table);



header('Content-type: application/ms-excel');
header('Content-Disposition: attachment; filename=sample.csv');

$fp = fopen("php://output", "w");

foreach($html->find('tr') as $element)
{
    $td = array();
foreach( $element->find('th') as $row)
{
    $td [] = $row->plaintext;
}

foreach( $element->find('td') as $row)
{
    $td [] = $row->plaintext;
}
fputcsv($fp, $td);
}


fclose($fp);


1
如果有人正在使用巴巴的答案,但是对添加了额外的空白感到困惑,那么这个方法可以解决问题:
include "simple_html_dom.php";
$table = '<table border="1">
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
<tr>
<td>row 1, cell 1</td>
<td>row 1, cell 2</td>
</tr>
<tr>
<td>row 2, cell 1</td>
<td>row 2, cell 2</td>
</tr>
</table>';

$html = str_get_html($table);   

$fileName="export.csv";
header('Content-type: application/ms-excel');
header("Content-Disposition: attachment; filename=$fileName");

$fp = fopen("php://output", "w");
$csvString="";

$html = str_get_html(trim($table));
foreach($html->find('tr') as $element)
{

    $td = array();
    foreach( $element->find('th') as $row)
    {
        $row->plaintext="\"$row->plaintext\"";
        $td [] = $row->plaintext;
    }
    $td=array_filter($td);
    $csvString.=implode(",", $td);

    $td = array();
    foreach( $element->find('td') as $row)
    {
        $row->plaintext="\"$row->plaintext\"";
        $td [] = $row->plaintext;
    }
    $td=array_filter($td);
    $csvString.=implode(",", $td)."\n";
}
echo $csvString;
fclose($fp);
exit;

}


1
我基于此线程上的代码改编了一个简单的类,现在可以处理colspanrowspan。没有经过严格测试,我相信它可以进行优化。
用法:
require_once('table2csv.php');

$table = '<table border="1">
    <tr>
    <th colspan=2>Header 1</th>
    </tr>
    <tr>
    <td>row 1, cell 1</td>
    <td>row 1, cell 2</td>
    </tr>
    <tr>
    <td>row 2, cell 1</td>
    <td>row 2, cell 2</td>
    </tr>
    <tr>
    <td rowspan=2>top left row</td>
    <td>top right row</td>
    </tr>
    <tr>
    <td>bottom right</td>
    </tr>
    </table>';

table2csv($table,"sample.csv",true);

table2csv.php

<?php

    //download @ http://simplehtmldom.sourceforge.net/
    require_once('simple_html_dom.php');
    $repeatContentIntoSpannedCells = false;


    //--------------------------------------------------------------------------------------------------------------------

    function table2csv($rawHTML,$filename,$repeatContent) {

        //get rid of sups - they mess up the wmus
        for ($i=1; $i <= 20; $i++) { 
            $rawHTML = str_replace("<sup>".$i."</sup>", "", $rawHTML);
        }

        global $repeatContentIntoSpannedCells;

        $html = str_get_html(trim($rawHTML));
        $repeatContentIntoSpannedCells = $repeatContent;

        //we need to pre-initialize the array based on the size of the table (how many rows vs how many columns)

        //counting rows is easy
        $rowCount = count($html->find('tr'));

        //column counting is a bit trickier, we have to iterate through the rows and basically pull out the max found
        $colCount = 0;
        foreach ($html->find('tr') as $element) {

            $tempColCount = 0;

            foreach ($element->find('th') as $cell) {
                $tempColCount++;
            }

            if ($tempColCount == 0) {
                foreach ($element->find('td') as $cell) {
                    $tempColCount++;
                }
            }

            if ($tempColCount > $colCount) $colCount = $tempColCount;
        }

        $mdTable = array();

        for ($i=0; $i < $rowCount; $i++) { 
            array_push($mdTable, array_fill(0, $colCount, NULL));
        }

        //////////done predefining array

        $rowPos = 0;
        $fp = fopen($filename, "w");

        foreach ($html->find('tr') as $element) {

            $colPos = 0;

            foreach ($element->find('th') as $cell) {
                if (strpos(trim($cell->class), 'actions') === false && strpos(trim($cell->class), 'checker') === false) {
                    parseCell($cell,$mdTable,$rowPos,$colPos);
                }
                $colPos++;
            }

            foreach ($element->find('td') as $cell) {
                if (strpos(trim($cell->class), 'actions') === false && strpos(trim($cell->class), 'checker') === false) {
                    parseCell($cell,$mdTable,$rowPos,$colPos);
                }
                $colPos++;
            }   

            $rowPos++;
        }


        foreach ($mdTable as $key => $row) {

            //clean the data
            array_walk($row, "cleanCell");
            fputcsv($fp, $row);
        }
    }


    function cleanCell(&$contents,$key) {

        $contents = trim($contents);

        //get rid of pesky &nbsp's (aka: non-breaking spaces)
        $contents = trim($contents,chr(0xC2).chr(0xA0));
        $contents = str_replace("&nbsp;", "", $contents);
    }


    function parseCell(&$cell,&$mdTable,&$rowPos,&$colPos) {

        global $repeatContentIntoSpannedCells;

        //if data has already been set into the cell, skip it
        while (isset($mdTable[$rowPos][$colPos])) {
            $colPos++;
        }

        $mdTable[$rowPos][$colPos] = $cell->plaintext;

        if (isset($cell->rowspan)) {

            for ($i=1; $i <= ($cell->rowspan)-1; $i++) {
                $mdTable[$rowPos+$i][$colPos] = ($repeatContentIntoSpannedCells ? $cell->plaintext : "");
            }
        }

        if (isset($cell->colspan)) {

            for ($i=1; $i <= ($cell->colspan)-1; $i++) {

                $colPos++;
                $mdTable[$rowPos][$colPos] = ($repeatContentIntoSpannedCells ? $cell->plaintext : "");
            }
        }
    }

?>

0

假设out_str包含您的HTML表格数据

$csv = $out_str;
        $csv = str_replace("<table class='gradienttable'>","",$csv);
        $csv = str_replace("</table>","",$csv);
        $csv = str_replace("</td><td>",",",$csv);
        $csv = str_replace("<td>","",$csv);
        $csv = str_replace("</td>","",$csv);
        $csv = str_replace("</font>","",$csv);
        $csv = str_replace("</tr>","",$csv);
        $csv = str_replace("<tr>","\n",$csv);

从文件中删除任何CSS

        $csv = str_replace("<font color='yellow'>","",$csv);
        $csv = str_replace("<font color='red'>","",$csv);
        $csv = str_replace("<font color='green'>","",$csv);
        $csv = str_replace("</th><th>",",",$csv);
        $csv = str_replace("<th>","",$csv);
        $csv = str_replace("</th>","",$csv);

        file_put_contents('currentFile.csv',$csv);

将当前文件 currentFile.csv 输出给用户

希望能有所帮助!


网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接