How to count empty translations in a .po file with grep (or other LSB tools)?

Question



I can search for empty translations in vim with a command like this:

/""\n\n

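But my task is to count the number of untranslated strings. Any ideas how to do this with standard tools that should be available on every Linux box (please, no separate packages)?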

Here is an example .po file with 2 translated and 2 untranslated strings (a short and a long variant of each):

msgid "translated string"
msgstr "some translation"

msgid "non-translated string"
msgstr ""

msgid ""
"Some long translated string which starts from new line "
"and can last for few lines"
msgstr ""
"Translation of some long string which starts from new line "
"and lasts for few lines"

msgid ""
"Some long NON-translated string which starts from new line "
"and can last for few lines"
msgstr ""

- Sergey P. aka azure

5 answers


I would suggest using the existing gettext tools instead of parsing the .po file directly:

$ msggrep -v -T -e "." test.po 
msgid "non-translated string"
msgstr ""

msgid ""
"Some long NON-translated string which starts from new line and can last for "
"few lines"
msgstr ""

The msggrep flags used are:

-v invert the match
-T apply the next pattern to the msgstr
-e the search pattern

i.e. show any msgstr that does not match /./ (that is, any empty one). Since msggrep has no -c option, counting them in a one-liner looks like this:

 msggrep -v -T -e "." test.po  | grep -c ^msgstr

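msggrep has been part of the gettext package since January 2002. Although LSB Core, i.e. ISO/IEC 23360-1:2006(E), only specifies the gettext and msgfmt binaries, I have yet to see a system without it, so it should meet your requirements.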

- mr.spuratic


There is already an awk solution; here are 4 other ways.

All commands were tested on your sample and on a well-formed .po file.

Using `sed`

sed -ne '/msgstr ""/{N;s/\n$//p}' <poFile | wc -l
2

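Explanation: each time I find msgstr "", I append the next line (N); if I can then remove a newline that is the last character of the pattern space (s/\n$//), meaning the appended line was empty, I print the result (p). Finally, I count the printed lines.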
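One caveat worth checking: with -n, when N runs on the last line of input there is no next line to append, so sed stops without printing. A msgstr "" that is the very last line of the file (no empty line after it) is therefore not counted. Below is a minimal sketch, assuming a POSIX shell and that appending one empty line to the input is acceptable; the function names count_empty_sed_raw and count_empty_sed and the temporary sample file are hypothetical.

```shell
#!/bin/sh
# Sketch: the sed one-liner from this answer, with and without an extra
# empty line appended so that a msgstr "" on the very last line is seen.
# count_empty_sed_raw / count_empty_sed are hypothetical helper names.

po=$(mktemp)
trap 'rm -f "$po"' EXIT

# The question's sample; this file ends directly after the last msgstr "".
cat >"$po" <<'EOF'
msgid "translated string"
msgstr "some translation"

msgid "non-translated string"
msgstr ""

msgid ""
"Some long translated string which starts from new line "
"and can last for few lines"
msgstr ""
"Translation of some long string which starts from new line "
"and lasts for few lines"

msgid ""
"Some long NON-translated string which starts from new line "
"and can last for few lines"
msgstr ""
EOF

# Original form: N on the last input line has nothing to append,
# so with -n the final entry is never printed (and never counted).
count_empty_sed_raw() {
    sed -ne '/msgstr ""/{N;s/\n$//p}' <"$1" | wc -l | tr -d ' '
}

# Padded form: append one empty line so every msgstr "" has a successor.
count_empty_sed() {
    { cat "$1"; echo; } | sed -ne '/msgstr ""/{N;s/\n$//p}' | wc -l | tr -d ' '
}

echo "raw:    $(count_empty_sed_raw "$po")"   # 1
echo "padded: $(count_empty_sed "$po")"       # 2
```

The `tr -d ' '` only strips the padding some wc implementations put in front of the number.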

Bash only

Without using any binary other than Bash:

total=0
while read -r line; do
    if [ "$line" == 'msgstr ""' ]; then
        read -r line
        [ -z "$line" ] && ((total++))
    fi
done <poFile
echo $total
2

Explanation: each time I find msgstr "", I read the next line; if it is empty, I increment the counter.

Or, reading the whole file into an array with mapfile:

mapfile -t line <poFile
count=0
for ((i=${#line[@]};i--;));do
    [ -z "${line[i]}" ] && [ "${line[i-1]}" == 'msgstr ""' ] && ((count++))
done
echo $count
2

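Explanation: read the whole .po file into an array, then walk through the array looking for empty fields whose previous field contains msgstr "", incrementing the counter for each one, and finally print it.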

Perl (command-line mode)

perl -ne '$t++if/^$/&&$l=~/msgstr\s""\s*$/;$l=$_;END{printf"%d\n",$t}' <poFile
2

Explanation: whenever I find an empty line and the previous line (stored in the variable $l) contains msgstr "", I increment the counter.

Dash (not bash!)

count=0
while read line ; do
    [ "$line" = "" ] && [ "$prev" = 'msgstr ""' ] && true $((count=count+1))
    prev="$line"
done <poFile
echo $count
2

Based on the Perl example, this works in both bash and dash.

- F. Hauri - Give Up GitHub


Try:

grep -c '^""$'

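It counts the lines that consist of nothing but two double quotes (""). Edit: from your comment I see that this does not meet your needs. To do a multi-line match, you can use GNU grep like this: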

grep -Pzo '^msgstr ""\n\n' en.po | grep -c msgstr

This was tested with GNU grep 2.14 and found to work. But I don't know whether GNU grep is standard enough for you.

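Explanation of the first grep: -P enables Perl regular expressions. -z treats the input as NUL-terminated records instead of newline-terminated lines, so the pattern can match across newlines. -o prints only the matching part; because -z is used, the whole file would be printed otherwise.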

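Explanation of the second grep: -c counts the matching lines, in this case the lines containing msgstr. This has to be a separate grep command, because -c combined with -z would return 1.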

- imp25

msgstr "" - this is the line of an untranslated string. It is not counted by a grep call like that. - Sergey P. aka azure

grep -Pzo '^msgstr ""\n\n' language/locale/en_US/LC_MESSAGES/messages.po | grep -c msgstr
0
The file contains many strings, but no translations. For example:

msgid "User Name"
msgstr ""

msgid "Password"
msgstr ""

msgid "Forgot Password ?"
msgstr ""

- Sergey P. aka azure

Yes, it prints a message saying that the file matches the pattern. - Sergey P. aka azure

Do you have any other suggestions? - Sergey P. aka azure

grep -n ^msg your.po | grep -v '""' | uniq -D -f1

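This finds the lines starting with msg, drops those containing an empty string (""), and then uses uniq to print adjacent duplicate lines, ignoring the first field (the line number plus msgid/msgstr).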

Sample output for a CUPS file:

$ grep -n ^msg /usr/share/locale/es/cups_es.po | grep -v '""' | uniq -D -f1
3742:msgid "ParamCustominCutInterval"
3743:msgstr "ParamCustominCutInterval"
3745:msgid "ParamCustominTearInterval"
3746:msgstr "ParamCustominTearInterval"
3858:msgid "Quarto"
3859:msgstr "Quarto"
3967:msgid "Stylus Color Series"
3968:msgstr "Stylus Color Series"
3970:msgid "Stylus Photo Series"
3971:msgstr "Stylus Photo Series"
3973:msgid "Super A"
3974:msgstr "Super A"
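To see what this pipeline reports, here is a minimal sketch running it on a shortened version of the question's sample plus one hypothetical entry ("Quarto", mirroring the CUPS output above) whose msgstr is a copy of its msgid. Because grep -v '""' discards every line containing "", the empty msgstr "" lines never reach uniq; uniq -D -f1 (GNU coreutils) then prints only adjacent lines that are identical after the first field. The helper name find_copied and the temporary file are assumptions of the sketch.

```shell
#!/bin/sh
# Sketch: run the grep | grep -v | uniq pipeline on a small sample.
# find_copied is a hypothetical helper; the "Quarto" entry is made up.
po=$(mktemp)
trap 'rm -f "$po"' EXIT

cat >"$po" <<'EOF'
msgid "translated string"
msgstr "some translation"

msgid "non-translated string"
msgstr ""

msgid "Quarto"
msgstr "Quarto"
EOF

# grep -v '""' drops both the empty msgstr line and any msgid "" line,
# so only single-line, non-empty strings are compared by uniq.
find_copied() {
    grep -n ^msg "$1" | grep -v '""' | uniq -D -f1
}

find_copied "$po"
# 7:msgid "Quarto"
# 8:msgstr "Quarto"
```

The untranslated "non-translated string" entry does not appear in the output, so this pipeline lists copied translations rather than counting empty ones.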

- John Kugelman

- Steve · Accepted Answer

Here is one way using awk:

awk '$NF == "msgstr \"\"" { c++ } END { print c+0 }' FS="\n" RS= file

Result:

2

Explanation:

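Setting RS to the empty string puts awk into paragraph mode, and FS="\n" makes each line of a block a separate field. The script then tests the last line ($NF) of each block; if it matches the pattern exactly, the block is counted. At the end of the script, the count is printed. If you later decide you want to count the translated strings instead, just change the == to !=. HTH.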
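Building on the == / != remark, here is a minimal sketch that counts both kinds in a single pass with the same paragraph-mode idea. The helper name count_po and the output wording are hypothetical. One assumption to keep in mind: a real PO file usually starts with a header entry (msgid "" followed by msgstr "" and "Project-Id-Version: ..." lines); its last line is a non-empty string, so this sketch counts it as translated.

```shell
#!/bin/sh
# Sketch: one awk pass over the question's sample, counting blocks whose last
# line is exactly msgstr "" (untranslated) versus all other blocks.
# count_po is a hypothetical helper name.
po=$(mktemp)
trap 'rm -f "$po"' EXIT

cat >"$po" <<'EOF'
msgid "translated string"
msgstr "some translation"

msgid "non-translated string"
msgstr ""

msgid ""
"Some long translated string which starts from new line "
"and can last for few lines"
msgstr ""
"Translation of some long string which starts from new line "
"and lasts for few lines"

msgid ""
"Some long NON-translated string which starts from new line "
"and can last for few lines"
msgstr ""
EOF

count_po() {
    awk 'BEGIN { RS = ""; FS = "\n" }          # paragraph mode, one line per field
         $NF == "msgstr \"\"" { u++; next }     # block ends with an empty msgstr
         { t++ }                                # any other block counts as translated
         END { printf "untranslated: %d, translated: %d\n", u, t }' "$1"
}

count_po "$po"   # untranslated: 2, translated: 2
```

Setting RS and FS in a BEGIN block rather than as command-line assignments is just a stylistic choice here; both forms take effect before the file is read.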

Following the comments below, on handling "blank" lines that contain whitespace:

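You would need to use a regular expression, for example RS="\n{2,}|\n([ \t]*\n)+|\n$" (this can probably be simplified). Note, however, that the ability to use a regular expression as RS is a GNU awk extension; other awks will not be able to handle a multi-character record separator this way. Fortunately, the file format above looks fairly strict, so handling lines that contain only whitespace should not be necessary.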

If you do face separator lines containing whitespace, a quick fix is to pass the file through sed first:

< file sed 's/^ *$//' | awk ...
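A minimal sketch of that quick fix follows, assuming a hypothetical input in which the line between two untranslated entries contains a space, a tab and a space instead of being empty. Whether such a line ends a record in paragraph mode can differ between awk implementations; normalizing whitespace-only lines with sed first makes the count independent of that. The sketch uses [[:blank:]] (space or tab) instead of the " *" above so that tabs are also emptied; that substitution, the helper name count_untranslated and the temporary file are assumptions, not part of the original answer.

```shell
#!/bin/sh
# Sketch of the "empty whitespace-only lines with sed, then count with awk" fix.
# count_untranslated is a hypothetical helper; the input is made up.
po=$(mktemp)
trap 'rm -f "$po"' EXIT

# The separator line between the two entries holds " \t " instead of nothing.
printf 'msgid "non-translated string"\nmsgstr ""\n \t \nmsgid "another one"\nmsgstr ""\n' >"$po"

# The awk program from this answer (paragraph mode), reading stdin.
count_untranslated() {
    awk 'BEGIN { RS = ""; FS = "\n" } $NF == "msgstr \"\"" { c++ } END { print c + 0 }'
}

# Without normalization the result depends on how this awk treats " \t ".
echo "without sed: $(count_untranslated <"$po")"

# With normalization the separator becomes truly empty: two records,
# each ending in msgstr "".
echo "with sed:    $(sed 's/^[[:blank:]]*$//' "$po" | count_untranslated)"   # 2
```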
