我知道的可能方法有:
- Google Translate API - Bing Translator API
问题在于,所有这些服务都有文本长度限制、调用次数限制等,这使得它们使用起来不太方便。
你能推荐哪些服务/方法在这种情况下使用吗?
将您的大文本分解为标记化字符串,然后通过循环将每个标记传递给翻译器。将翻译后的输出存储在数组中,一旦所有标记都被翻译并存储在数组中,将它们重新组合,您将拥有完全翻译的文档。
只是为了证明一点,我把这个东西凑在一起:) 它还有些粗糙,但它可以处理很多文本,并且它的翻译准确度与谷歌相当,因为它使用谷歌API。我用这段代码处理了苹果公司整个2005年的SEC 10-K备案,只需点击一个按钮(大约需要45分钟)。
结果基本上与您逐句复制和粘贴到Google翻译中得到的结果相同。它不是完美的(结束标点符号不准确,我没有按行写入文本文件),但它确实显示了一个概念证明。如果您进一步使用正则表达式,它可能会有更好的标点符号。
Imports System.IO
Imports System.Text.RegularExpressions
Public Class Form1
Dim file As New String("Translate Me.txt")
Dim lineCount As Integer = countLines()
Private Function countLines()
If IO.File.Exists(file) Then
Dim reader As New StreamReader(file)
Dim lineCount As Integer = Split(reader.ReadToEnd.Trim(), Environment.NewLine).Length
reader.Close()
Return lineCount
Else
MsgBox(file + " cannot be found anywhere!", 0, "Oops!")
End If
Return 1
End Function
Private Sub translateText()
Dim lineLoop As Integer = 0
Dim currentLine As String
Dim currentLineSplit() As String
Dim input1 As New StreamReader(file)
Dim input2 As New StreamReader(file)
Dim filePunctuation As Integer = 1
Dim linePunctuation As Integer = 1
Dim delimiters(3) As Char
delimiters(0) = "."
delimiters(1) = "!"
delimiters(2) = "?"
Dim entireFile As String
entireFile = (input1.ReadToEnd)
For i = 1 To Len(entireFile)
If Mid$(entireFile, i, 1) = "." Then filePunctuation += 1
Next
For i = 1 To Len(entireFile)
If Mid$(entireFile, i, 1) = "!" Then filePunctuation += 1
Next
For i = 1 To Len(entireFile)
If Mid$(entireFile, i, 1) = "?" Then filePunctuation += 1
Next
Dim sentenceArraySize = filePunctuation + lineCount
Dim sentenceArrayCount = 0
Dim sentence(sentenceArraySize) As String
Dim sentenceLoop As Integer
While lineLoop < lineCount
linePunctuation = 1
currentLine = (input2.ReadLine)
For i = 1 To Len(currentLine)
If Mid$(currentLine, i, 1) = "." Then linePunctuation += 1
Next
For i = 1 To Len(currentLine)
If Mid$(currentLine, i, 1) = "!" Then linePunctuation += 1
Next
For i = 1 To Len(currentLine)
If Mid$(currentLine, i, 1) = "?" Then linePunctuation += 1
Next
currentLineSplit = currentLine.Split(delimiters)
sentenceLoop = 0
While linePunctuation > 0
Try
Dim trans As New Google.API.Translate.TranslateClient("")
sentence(sentenceArrayCount) = trans.Translate(currentLineSplit(sentenceLoop), Google.API.Translate.Language.English, Google.API.Translate.Language.German, Google.API.Translate.TranslateFormat.Text)
sentenceLoop += 1
linePunctuation -= 1
sentenceArrayCount += 1
Catch ex As Exception
sentenceLoop += 1
linePunctuation -= 1
End Try
End While
lineLoop += 1
End While
Dim newFile As New String("Translated Text.txt")
Dim outputLoopCount As Integer = 0
Using output As StreamWriter = New StreamWriter(newFile)
While outputLoopCount < sentenceArraySize
output.Write(sentence(outputLoopCount) + ". ")
outputLoopCount += 1
End While
End Using
input1.Close()
input2.Close()
End Sub
Private Sub translateButton_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles translateButton.Click
translateText()
End Sub
End Class
免责声明:虽然我确实认为令牌化作为翻译手段有些可疑,但是像后来 ubiquibacon所示 那样按句子拆分可能会产生符合您要求的结果。
我建议他通过将 30 多行字符串处理减少到他在 另一个问题 中请求的一行正则表达式来改进他的代码,但建议并未得到好评。
这里提供了使用 VB.NET 和 C# 中的 Google API for .NET 实现。
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using Google.API.Translate;
namespace TokenizingTranslatorCS
{
internal class Program
{
private static readonly TranslateClient Client =
new TranslateClient("http://code.google.com/p/google-api-for-dotnet/");
private static void Main(string[] args)
{
Language originalLanguage = Language.English;
Language targetLanguage = Language.German;
string filename = args[0];
StringBuilder output = new StringBuilder();
string[] input = File.ReadAllLines(filename);
foreach (string line in input)
{
List<string> translatedSentences = new List<string>();
string[] sentences = Regex.Split(line, "\\b(?<sentence>.*?[\\.!?](?:\\s|$))");
foreach (string sentence in sentences)
{
string sentenceToTranslate = sentence.Trim();
if (!string.IsNullOrEmpty(sentenceToTranslate))
{
translatedSentences.Add(TranslateSentence(sentence, originalLanguage, targetLanguage));
}
}
output.AppendLine(string.Format("{0}{1}", string.Join(" ", translatedSentences.ToArray()),
Environment.NewLine));
}
Console.WriteLine("Translated:{0}{1}{0}", Environment.NewLine, string.Join(Environment.NewLine, input));
Console.WriteLine("To:{0}{1}{0}", Environment.NewLine, output);
Console.WriteLine("{0}Press any key{0}", Environment.NewLine);
Console.ReadKey();
}
private static string TranslateSentence(string sentence, Language originalLanguage, Language targetLanguage)
{
string translatedSentence = Client.Translate(sentence, originalLanguage, targetLanguage);
return translatedSentence;
}
}
}
Imports System.Text.RegularExpressions
Imports System.IO
Imports System.Text
Imports Google.API.Translate
Module Module1
Private Client As TranslateClient = New TranslateClient("http://code.google.com/p/google-api-for-dotnet/")
Sub Main(ByVal args As String())
Dim originalLanguage As Language = Language.English
Dim targetLanguage As Language = Language.German
Dim filename As String = args(0)
Dim output As New StringBuilder
Dim input As String() = File.ReadAllLines(filename)
For Each line As String In input
Dim translatedSentences As New List(Of String)
Dim sentences As String() = Regex.Split(line, "\b(?<sentence>.*?[\.!?](?:\s|$))")
For Each sentence As String In sentences
Dim sentenceToTranslate As String = sentence.Trim
If Not String.IsNullOrEmpty(sentenceToTranslate) Then
translatedSentences.Add(TranslateSentence(sentence, originalLanguage, targetLanguage))
End If
Next
output.AppendLine(String.Format("{0}{1}", String.Join(" ", translatedSentences.ToArray), Environment.NewLine))
Next
Console.WriteLine("Translated:{0}{1}{0}", Environment.NewLine, String.Join(Environment.NewLine, input))
Console.WriteLine("To:{0}{1}{0}", Environment.NewLine, output)
Console.WriteLine("{0}Press any key{0}", Environment.NewLine)
Console.ReadKey()
End Sub
Private Function TranslateSentence(ByVal sentence As String, ByVal originalLanguage As Language, ByVal targetLanguage As Language) As String
Dim translatedSentence As String = Client.Translate(sentence, originalLanguage, targetLanguage)
Return translatedSentence
End Function
End Module
为了证明一点,我把这个东西拼凑在一起 :) 它有些粗糙,但它可以处理大量的文本,并且它的翻译准确度与谷歌相当,因为它使用谷歌API。我用这段代码处理了苹果公司整个2005年的SEC 10-K备案申请,只需点击一个按钮(大约需要45分钟)。结果基本上与您逐句复制和粘贴到Google翻译器中所得到的结果相同。它并不完美(结束标点符号不准确,我没有逐行写入文本文件),但它确实证明了概念。如果您进一步使用正则表达式,它可能会有更好的标点符号。
我们使用了http://www.berlitz.co.uk/translation/。
我们会向他们发送一个包含英文内容和所需语言列表的数据库文件,然后他们会使用各种双语人员提供翻译。他们还使用配音演员为我们的电话界面提供WAV文件。
这显然不如自动翻译快,也不是免费的,但我认为这种服务是确保您的翻译有意义的唯一方法。
这很简单,有几种方法:
以下是一个例子(第二种方法):
方法:
private String TranslateTextEnglishSpanish(String textToTranslate)
{
HttpWebRequest http = WebRequest.Create("http://translate.google.com/") as HttpWebRequest;
http.Method = "POST";
http.ContentType = "application/x-www-form-urlencoded";
http.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2 (.NET CLR 3.5.30729)";
http.Referer = "http://translate.google.com/";
byte[] dataBytes = UTF8Encoding.UTF8.GetBytes(String.Format("js=y&prev=_t&hl=en&ie=UTF-8&layout=1&eotf=1&text={0}+&file=&sl=en&tl=es", textToTranslate);
http.ContentLength = dataBytes.Length;
using (Stream postStream = http.GetRequestStream())
{
postStream.Write(dataBytes, 0, dataBytes.Length);
}
HttpWebResponse httpResponse = http.GetResponse() as HttpWebResponse;
if (httpResponse != null)
{
using (StreamReader reader = new StreamReader(httpResponse.GetResponseStream()))
{
//* Return translated Text
return reader.ReadToEnd();
}
}
return "";
}
方法调用:
String translatedText = TranslateTextEnglishSpanish("hello world");
结果:
translatedText == "hola mundo";
您只需要获取所有语言的参数,并按顺序使用它们以获取所需的翻译。
您可以使用Firefox浏览器的Live Http Headers插件来获取这些值。