How do I turn an array of sentences into another array of the longest possible sentences, each fewer than x characters long?

I have an array of sentences of varying length. Let's say it looks like this:

const sentences = [
   "Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts.",
   "I never thought that would happen!",
   "This one?",
   "No, no, that one.",
   "Okay but please ensure your sentences are long enough to be split when longer than 100 characters, although some could be too short as well.",
   "This is also a random text like all others",
]

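What I need is to build another array of sentences from the first one, with each element as long as possible but no more than 100 characters. Sentences longer than 100 characters, on the other hand, should be split into smaller chunks. So if the original array has 5 sentences with the following lengths: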
[0] => 150
[1] => 10
[2] => 35
[3] => 5
[4] => 70

then the new array should have the following element lengths:
[0] => 100 // Split since longer than 100 chars
[1] => 100 // 50 carried forward from [0] + 10 + 35 + 5
[2] => 70
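The expected lengths above correspond to concatenating all sentences into one stream of characters and cutting it every 100 characters: 150 + 10 + 35 + 5 + 70 = 270 = 100 + 100 + 70. A minimal sketch of just that arithmetic (chunkLengths is a hypothetical helper name; it deliberately ignores the spaces between sentences and the no-split-words rule, which in practice make real chunks somewhat shorter than 100):

```javascript
// Hypothetical helper: given only the lengths of the pieces, compute the chunk
// lengths obtained by concatenating them and cutting every `limit` characters.
// Spaces between sentences and word boundaries are deliberately ignored.
function chunkLengths(lengths, limit) {
  const total = lengths.reduce((sum, n) => sum + n, 0);
  const chunks = [];
  for (let left = total; left > 0; left -= limit) {
    chunks.push(Math.min(limit, left)); // a full chunk, or whatever remains
  }
  return chunks;
}

console.log(chunkLengths([150, 10, 35, 5, 70], 100)); // [ 100, 100, 70 ]
```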

Note that I don't want to split words in the process.
I tried the following:
let para = [];

let index = 0;
let i = 0;
while(nsentences[i]) {
  let bigsentence = nsentences[i];
  let x = i + 1;

  let bs = bigsentence + ' ' + nsentences[x];
  console.log(bs);
  while(bs.length < 140){ // bs never changes inside this loop, so once entered it never exits
    console.log(bs);

  }


  while(x) {
    let bs = bigsentence + ' ' + nsentences[x];
    if(bs.length < 100) {
      bigsentence += ' ' + nsentences[x];
      x++;
      i += x;
    } else {
      para.push(bigsentence);
      break;
    }
  }
}

But, as you might expect, it doesn't work: the snippet just keeps outputting the first two sentences joined together, in an infinite loop!
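Checking the actual lengths shows why the loop never ends, assuming nsentences in the snippet holds the sample sentences shown above: the first sentence is 116 characters, and joined with the second it is 151. Since 151 is not below 140, the while(bs.length < 140) loop is skipped; since it is not below 100 either, the else branch pushes the first sentence and breaks without changing i, so the outer loop starts over with the same sentence and logs the same two-sentence string again.

```javascript
const sentences = [
  "Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts.",
  "I never thought that would happen!",
];

// Length of the first sentence, and of the string bs built on the first pass.
const first = sentences[0];
const joined = sentences[0] + ' ' + sentences[1];

console.log(first.length);  // 116
console.log(joined.length); // 151
```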


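If [1] were actually 40 long, should the output contain the 50 extra characters from [0], the 40 characters from [1] and 10 characters from [2]? Or should output element [1] have length 90, because element [2] is 35 long? - briosheje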
5 Answers

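Join the array of sentences with spaces, then use a regular expression to match up to 100 characters, ending at a position that is followed by a space (or the end of the string), so that the last matched character is at the end of a word: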

const sentences = [
   "Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts.",
   "I never thought that would happen!",
   "This one?",
   "No, no, that one.",
   "Okay but please ensure your sentences are long enough to be split when longer than 100 characters, although some could be too short as well.",
   "This is also a random text like all others",
];

const words = sentences.join(' ');
const output = words.match(/\S.{0,99}(?= |$)/g); // at most 100 chars; {0,99} also matches a lone one-letter word
console.log(output);

The \S at the start of the pattern ensures that the first matched character is not a space.
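A small illustration of what the leading \S changes, on made-up toy strings with a 5-character limit. Without \S, the second match starts with the separating space. The last pair shows a side effect of .{1,...}: every match needs at least two characters, so a one-letter word left on its own is silently dropped, whereas .{0,...} keeps it.

```javascript
// Toy input, limit of 5 characters per chunk.
const text = 'aaa bbb';

// Without \S the second chunk keeps the separating space.
console.log(text.match(/.{1,5}(?= |$)/g));   // [ 'aaa', ' bbb' ]
// With \S every chunk starts at a non-space character.
console.log(text.match(/\S.{0,4}(?= |$)/g)); // [ 'aaa', 'bbb' ]

// With .{1,4} a match needs at least 2 characters, so the lone "I" is lost.
console.log('hello I'.match(/\S.{1,4}(?= |$)/g)); // [ 'hello' ]
console.log('hello I'.match(/\S.{0,4}(?= |$)/g)); // [ 'hello', 'I' ]
```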


Oh, so you're telling me that everything I did with my code can be done with a single regex? What a shame. - briosheje

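Here is a slightly different approach, relying on a generator function.
Since I didn't fully understand how strict your output requirements are, this solution:
- joins everything into a single space-separated string;
- splits that string on spaces;
- yields sentences of at most 100 characters, as close to 100 as possible;
- continues until the string is exhausted.
It might need some review for quality and performance, but it should still do the job correctly. The code below produces an array of four strings with lengths 99, 95, 96 and 70.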

const sentences = [
   "Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts.",
   "I never thought that would happen!",
   "This one?",
   "No, no, that one.",
   "Okay but please ensure your sentences are long enough to be split when longer than 100 characters, although some could be too short as well.",
   "This is also a random text like all others",
];

function* splitToLength(arr, length) {
  // Join the original array of strings and split it into words on spaces.
  const words = arr.join(' ').split(' ');
  let acc = [];       // Words of the sentence currently being built.
  let lineLength = 0; // Length of acc.join(' ').
  for (const word of words) {
    // Length of the sentence if this word (plus a separating space, if needed) were added.
    const newLength = acc.length === 0 ? word.length : lineLength + 1 + word.length;
    if (newLength <= length) {
      // The word still fits: accumulate it.
      acc.push(word);
      lineLength = newLength;
    } else {
      // Otherwise, yield the current sentence (if any) and start a new one with this word.
      if (acc.length > 0) yield acc.join(' ');
      acc = [word];
      lineLength = word.length;
    }
  }
  // Finally, yield the last sentence if it has not been yielded yet.
  if (acc.length > 0) yield acc.join(' ');
}

const res = [...splitToLength(sentences, 100)];
console.log(res);
console.log(res.map(i => i.length));
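Because this answer is built on a generator, lines are only produced when the caller asks for them. A possible variation in the same spirit, sketched under the assumption that the input may be any iterable of sentences (an array, another generator, ...): it wraps words as it reads them instead of first joining everything into one string, and the caller can stop early. The name wrapLazily is hypothetical, and unlike the code above it splits on any run of whitespace.

```javascript
// Hypothetical variant: wraps words as it reads them from any iterable of
// sentences, so it never builds one big joined string first.
function* wrapLazily(sentences, limit) {
  let line = ''; // sentence currently being built
  for (const sentence of sentences) {
    for (const word of sentence.split(/\s+/)) {
      if (word === '') continue; // split can yield '' for leading/trailing whitespace
      if (line === '') {
        line = word; // first word of a line (kept even if longer than limit)
      } else if (line.length + 1 + word.length <= limit) {
        line += ' ' + word; // still fits, including the separating space
      } else {
        yield line; // full: hand it to the caller and start a new line
        line = word;
      }
    }
  }
  if (line !== '') yield line; // last, partially filled line
}

const sentences = [
  "Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts.",
  "I never thought that would happen!",
  "This one?",
  "No, no, that one.",
  "Okay but please ensure your sentences are long enough to be split when longer than 100 characters, although some could be too short as well.",
  "This is also a random text like all others",
];

// Stop after the first two lines; the generator is simply left suspended.
const firstTwo = [];
for (const line of wrapLazily(sentences, 100)) {
  firstTwo.push(line);
  if (firstTwo.length === 2) break;
}
console.log(firstTwo.map(l => l.length)); // [ 99, 95 ]
```

Spreading the whole generator ([...wrapLazily(sentences, 100)]) gives the same four lines as the answer above.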


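I did this with simple loops. The algorithm is:
  1. Create an array of all the words
  2. Take the words one by one, making sure the limit is not exceeded
  3. When the limit is reached, start a new line
  4. When there are no words left, return all the lines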

const sentences = [
   "Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts.",
   "I never thought that would happen!",   
   "This one?",   
   "No, no, that one.",
   "Okay but please ensure your sentences are long enough to be split when longer than 100 characters, although some could be too short as well.",
   "This is also a random text like all others"
];

const lengths = sentences => sentences.map(s => s.length); 

const words = sentences.join(' ').split(' ');

const takeWords = (charlimit,words) => {
  var currlinelength, lines = [], j=0;
  for(let i = 0;  ; i++){
    currlinelength = 0;
    lines[i] = "";
    while(true){
      if (j >= words.length) {
        //remove last space
        return lines.map(l => l.trim());
      }
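      // Only break on a non-empty line: a word longer than charlimit then gets its own line instead of looping forever.
      if (currlinelength > 0 && (currlinelength + words[j].length) > charlimit){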
        break;
      }
      lines[i] += words[j] + " ";
      currlinelength += 1 + words[j].length; 
      j++;
    }
    
  }
};

console.log(lengths(sentences));
const result = takeWords(100, words);
console.log(result);
console.log(lengths(result));

// output
[
  "Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live",
  "the blind texts. I never thought that would happen! This one? No, no, that one. Okay but please",
  "ensure your sentences are long enough to be split when longer than 100 characters, although some",
  "could be too short as well. This is also a random text like all others"
]
// length of each sentence
[
  99,
  95,
  96,
  70
]

You could also try this:

<!DOCTYPE html>
<html><script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
<script>

sentences = [
   "Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts.",
   "I never thought that would happen!",
   "This one?",
   "No, no, that one.",
   "Okay but please ensure your sentences are long enough to be split when longer than 100 characters, although some could be too short as well.",
   "This is also a random text like all others"
]
function calculate(length){
 var returnedArray = [];
 var index = 0;
 var joint = sentences.join(' ');
 while(joint.length > length)
 {
  returnedArray[index] = joint.slice(0, length);
  index++;
  joint = joint.slice(length);
 }
 if(joint.length)
 {
  returnedArray[index] = joint;
 }
 $.each(returnedArray, (key,value)=>{
  console.log(value.length);
 });
}
</script>
<body>
<button onclick="calculate(100)" value="click">Click</button>
</body>
</html>
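The answer above cuts at a fixed character offset, so it can split words in half, which the question rules out. A minimal variation of the same slicing idea, assuming a plain string input and a limit of at least 1: back off to the last space at or before the limit with String.prototype.lastIndexOf, and fall back to a hard cut only when a single word is longer than the limit. The function name chunkAtSpaces is hypothetical.

```javascript
// Hypothetical helper: split text into chunks of at most `limit` characters
// (limit >= 1), cutting at the last space that fits instead of a fixed offset.
function chunkAtSpaces(text, limit) {
  const chunks = [];
  let rest = text.trim();
  while (rest.length > limit) {
    // Index of the last space at or before position `limit`; cutting there
    // leaves a chunk of at most `limit` characters.
    let cut = rest.lastIndexOf(' ', limit);
    // No usable space: the first word alone exceeds the limit, so cut it hard.
    if (cut <= 0) cut = limit;
    chunks.push(rest.slice(0, cut).trimEnd());
    rest = rest.slice(cut).trimStart();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}

const sentences = [
  "Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts.",
  "I never thought that would happen!",
  "This one?",
  "No, no, that one.",
  "Okay but please ensure your sentences are long enough to be split when longer than 100 characters, although some could be too short as well.",
  "This is also a random text like all others",
];

const chunks = chunkAtSpaces(sentences.join(' '), 100);
console.log(chunks.map(c => c.length)); // [ 99, 95, 96, 70 ]
```

On the sample sentences this gives the same four lengths as the word-based answers, while still relying only on string slicing.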


"use strict";
const sentences = [
    'Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts.',
    'I never thought that would happen!',
    'This one?',
    'No, no, that one.',
    'Okay but please ensure your sentences are long enough to be split when longer than 100 characters, although some could be too short as well.',
    'This is also a random text like all others',
];
function lessThan100(arr) {
    const result = [];
    for (const item of arr) {
        if (item.textLength < 100 && item.used != true) {
            result.push(item);
        }
    }
    return result;
}
function perform(sentences) {
    let result = [];
    for (const sentence of sentences) {
        if (sentence.textLength > 100) {
            result.push(new Sentence(sentence.text.slice(0, 100), false, 100));
            const lengthLeft = sentence.textLength - 100;
            const less = lessThan100(sentences);
            let counter = lengthLeft;
            let required = [];
            for (const item of less) {
                if (counter + item.textLength <= 100) {
                    required.push(item);
                    item.setUsed();
                    counter += item.textLength;
                }
            }
            let str = sentence.text.slice(100, sentence.textLength);
            for (const r of required) {
                r.setUsed();
                str += r.text;
            }
            result.push(new Sentence(str, false, str.length));
        }
    }
    for (const item of sentences) {
        if (item.used == false && item.textLength <= 100) {
            result.push(item);
        }
    }
    result = result.sort((a, b) => {
        return b.textLength - a.textLength;
    });
    const resultLeft = result.filter(p => p.textLength < 100);
    if (resultLeft.length >= 2) {
        for (let i = 0; i < resultLeft.length; i++) {
            const sentence = resultLeft[i];
            resultLeft.splice(i, 1);
            const requiredLength = 100 - sentence.textLength;
            const less = lessThan100(resultLeft);
            let counter = sentence.textLength;
            let required = [];
            for (const item of less) {
                if (counter + item.textLength < 100) {
                    required.push(item);
                    item.setUsed();
                    counter += item.textLength;
                }
                else if (counter < 100) {
                    const requiredLength = 100 - counter;
                    required.push(new Sentence(item.text.slice(0, requiredLength), false, requiredLength));
                    item.text = item.text.slice(requiredLength, item.textLength);
                    item.textLength = item.text.length;
                }
            }
            let str = sentence.text;
            for (const r of required) {
                r.setUsed();
                str += r.text;
            }
            const newStr = new Sentence(str, false, str.length);
            const index = result.findIndex(p => p.id === sentence.id);
            result[index] = newStr;
        }
    }
    return result;
}
class Sentence {
    constructor(text, used, textLength) {
        this.id = ++Sentence.Ids;
        this.text = text;
        this.textLength = textLength;
        this.used = used;
    }
    setUsed() {
        this.used = true;
    }
}
Sentence.Ids = 0;
function ToFunctionUseful(arr) {
    const result = [];
    for (const item of arr) {
        result.push(new Sentence(item, false, item.length));
    }
    return result;
}
const result = perform(ToFunctionUseful(sentences));
console.log(result, result.map(p => p.textLength));
console.log(sentences.map(p => p.length));

This was compiled from TypeScript.


Web content provided by Stack Overflow.