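I wrote a script that sends chunks of text to Google for translation, but sometimes the text (which is HTML source code) gets split in the middle of an HTML tag, and Google returns broken markup.
I already know how to split a string into an array, but is there a better way to make sure each output string is at most 5000 characters long and is never split inside a tag?
UPDATE: Thanks to the answers, this is the code I ended up using in my project, and it works well.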
function handleTextHtmlSplit($text, $maxSize) {
    // our collection array, starting with one empty group
    $niceHtml = array('');
    // splits on tags, but also includes each tag as an item in the result
    $pieces = preg_split('/(<[^>]*>)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    // the current position of the index
    $currentPiece = 0;
    // start assembling a group until it gets to max size
    foreach ($pieces as $piece) {
        // make sure this piece will not push the group past max size when inserted
        // (strlen counts bytes, so multibyte text hits the limit a little early);
        // a group that is still empty is never skipped, which avoids empty entries
        if ($niceHtml[$currentPiece] !== '' && strlen($niceHtml[$currentPiece] . $piece) > $maxSize) {
            // advance current piece; the overflow goes into the next group
            $currentPiece += 1;
            // create empty string as value for next piece in the index
            $niceHtml[$currentPiece] = '';
        }
        // insert piece into our master array
        // note: a single piece longer than $maxSize still ends up in one oversized group
        $niceHtml[$currentPiece] .= $piece;
    }
    // return array of nicely handled html
    return $niceHtml;
}
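
One case the function above does not cover is a single run of text between two tags that is itself longer than the limit. The following is only a rough sketch of one way to handle that, not part of the code I actually use: `splitHtmlChunks` is a hypothetical name, and it assumes the mbstring extension and PHP 7.4+ (for `mb_str_split`). It counts characters rather than bytes and cuts oversized text runs into smaller parts, while still keeping every tag whole.

```php
<?php
// Hypothetical helper, not from the original post: same idea as
// handleTextHtmlSplit, but an oversized text run is cut into smaller parts.
// Assumes the mbstring extension and PHP 7.4+ (mb_str_split).
function splitHtmlChunks(string $html, int $maxSize): array
{
    $chunks = [''];
    $current = 0;
    // Split on tags and keep each tag as its own piece; drop empty pieces.
    $pieces = preg_split(
        '/(<[^>]*>)/',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    foreach ($pieces as $piece) {
        $isTag = preg_match('/^<[^>]*>$/', $piece) === 1;
        // Tags are never cut; text longer than $maxSize characters is.
        $parts = (!$isTag && mb_strlen($piece) > $maxSize)
            ? mb_str_split($piece, $maxSize)
            : [$piece];

        foreach ($parts as $part) {
            // Start a new chunk only if the current one already holds something.
            if ($chunks[$current] !== '' && mb_strlen($chunks[$current] . $part) > $maxSize) {
                $chunks[++$current] = '';
            }
            $chunks[$current] .= $part;
        }
    }

    return $chunks;
}

// Example call with a deliberately tiny limit.
$chunks = splitHtmlChunks('<p>Hello <b>world</b></p>', 12);
foreach ($chunks as $i => $chunk) {
    echo $i, ': ', $chunk, PHP_EOL;
}
```

A tag that is longer than `$maxSize` on its own still ends up in an oversized chunk, since cutting it would defeat the purpose. Also note that a chunk can end with tags that are only closed in a later chunk, so each chunk is not necessarily balanced HTML by itself.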