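I have a website that receives a CSV file via FTP every month. For years it was an ASCII file. Now I receive UTF-8 one month, then UTF-16BE, and UTF-16LE the month after. Maybe next month I will get UTF-32. fgets returns the byte order mark (BOM) at the start of the UTF files. How can I get PHP to recognize the character encoding automatically? I tried mb_detect_encoding, but it returns ASCII regardless of the file type. I changed my code to read the BOM and pass the character encoding explicitly to mb_convert_encoding. That worked until the latest file, which is UTF-16LE. In that file the first line is read correctly, but all subsequent lines come out as question marks ("?"). What am I doing wrong?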
$fhandle = fopen( $file_in, "r" );
if ( $fhandle === false )
{
    echo "<p class=redbold>Error opening file $file_in.</p>";
    die();
}
$i = 0;
while( ( $line = fgets( $fhandle ) ) !== false )
{
    $i++;
    // Detect encoding on first line. Actual text always begins with string "Document"
    if ( $i == 1 )
    {
        $line_start = substr( $line, 0, 4 );
        $line_start_hex = bin2hex( $line_start );
        $utf16_start = 'fffe4400';
        $utf8_start = 'efbbbf44';
        if ( strcmp( $line_start, 'Docu' ) == 0 )
        { $char_encoding = 'ASCII'; }
        elseif ( strcmp( $line_start_hex, 'efbbbf44' ) == 0 )
        {
            $char_encoding = 'UTF-8';
            $line = substr( $line, 3 );
        }
        elseif ( strcmp( $line_start_hex, 'fffe4400' ) == 0 )
        {
            $char_encoding = 'UTF-16LE';
            $line = substr( $line, 2 );
        }
        elseif ( strcmp( $line_start_hex, 'feff4400' ) == 0 )
        {
            $char_encoding = 'UTF-16BE';
            $line = substr( $line, 2 );
        }
        else
        {
            echo "<p class=redbold>Error, unknown character encoding. Line =<br>", $line_start_hex, '</p>';
            require( '../footer.php' );
            die();
        }
        echo "<p>char_encoding = $char_encoding</p>";
    }
    // Convert UTF
    if ( $char_encoding != 'ASCII' )
    {
        $line = mb_convert_encoding( $line, 'ASCII', $char_encoding);
    }
    echo '<p>'; var_dump( $line ); echo '</p>';
}
Output:
char_encoding = UTF-16LE
string(101) "DocumentNumber,RecordedTS,Title,PageCount,City,TransTaxAccountCode,TotalTransferTax,Description,Name
"
string(83) "???????????????????????????????????????????????????????????????????????????????????"
string(88) "????????????????????????????????????????????????????????????????????????????????????????"
string(84) "????????????????????????????????????????????????????????????????????????????????????"
string(80) "????????????????????????????????????????????????????????????????????????????????"
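
A likely explanation for the output above, offered as a sketch rather than a confirmed diagnosis: fgets splits on the single byte 0x0A, but in UTF-16LE a newline is the two bytes 0A 00. The trailing 00 therefore ends up at the start of the next line, every later line is shifted by one byte, and mb_convert_encoding sees code units that have no ASCII equivalent. One way around this is to decode the whole file first and split it into lines afterwards. The function names detect_bom_encoding and decode_to_utf8_lines below are hypothetical helpers, not part of PHP, and the sketch assumes the file is small enough to load into memory and that a file without a BOM is plain ASCII/UTF-8.

```php
<?php
// Hypothetical helper: inspect the leading bytes for a BOM and
// return [encoding name, BOM length in bytes].
function detect_bom_encoding(string $bytes): array
{
    // UTF-32 is tested before UTF-16 because the UTF-32LE BOM
    // (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
    $boms = [
        'UTF-32BE' => "\x00\x00\xFE\xFF",
        'UTF-32LE' => "\xFF\xFE\x00\x00",
        'UTF-8'    => "\xEF\xBB\xBF",
        'UTF-16BE' => "\xFE\xFF",
        'UTF-16LE' => "\xFF\xFE",
    ];
    foreach ($boms as $encoding => $bom) {
        if (strncmp($bytes, $bom, strlen($bom)) === 0) {
            return [$encoding, strlen($bom)];
        }
    }
    // Assumption: no BOM means plain ASCII, which is also valid UTF-8.
    return ['UTF-8', 0];
}

// Hypothetical helper: decode the raw file contents to UTF-8 first,
// then split into lines, so line breaks are found in decoded text
// rather than in raw bytes.
function decode_to_utf8_lines(string $bytes): array
{
    [$encoding, $bomLength] = detect_bom_encoding($bytes);
    $text = mb_convert_encoding(substr($bytes, $bomLength), 'UTF-8', $encoding);
    $lines = preg_split('/\r\n|\r|\n/', $text);
    // Drop the empty element produced by a trailing line break.
    if ($lines !== [] && end($lines) === '') {
        array_pop($lines);
    }
    return $lines;
}
```

With these helpers, the read loop could become `foreach ( decode_to_utf8_lines( file_get_contents( $file_in ) ) as $line ) { ... }`. The sketch converts to UTF-8 rather than ASCII so that any non-ASCII characters in the data survive the conversion instead of turning into "?".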