import time
start = time.time()
import pandas as pd
from deep_translator import GoogleTranslator
data = pd.read_excel(r"latestdata.xlsx")
translatedata = data['column'].fillna('novalue')
translated = []  # renamed: `list` shadows the built-in
for i in translatedata:
    finaldata = GoogleTranslator(source='auto', target='en').translate(i)
    print(finaldata)
    translated.append(finaldata)
df = pd.DataFrame(translated, columns=['Translated_values'])
df.to_csv(r"jobdone.csv", sep=';')
end = time.time()
print(f"Runtime of the program is {end - start}")
I have a dataset of 220k rows and am trying to translate one column. At first I tried to parallelize the program with a pool, but I got an error saying I cannot hit the API multiple times concurrently. My question is whether there is another way to improve the performance of my existing code.
# 4066.826668739319 seconds with 10000 rows in one go.
# 3809.4675991535187 seconds when I run it in 2 batches of 5000.
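One way to cut the number of API calls without parallelism is to translate each distinct value only once and map the results back to every row, since a 220k-row column often contains many repeats. The sketch below is a minimal illustration of that idea; `fake_translate` is a hypothetical stand-in for the real `GoogleTranslator(source='auto', target='en').translate` call, used here so the logic runs offline.

```python
# De-duplicate before translating: each distinct string is sent to the
# translator once, then cached results are mapped back onto every row.

def fake_translate(text):
    # Placeholder "translation" for illustration; swap in the real
    # GoogleTranslator(...).translate when running against the API.
    return text.upper()

def translate_column(values, translate=fake_translate):
    cache = {}   # distinct input -> translated output
    out = []
    for v in values:
        if v not in cache:
            cache[v] = translate(v)  # one call per unique value only
        out.append(cache[v])
    return out

print(translate_column(["hola", "bonjour", "hola"]))
# → ['HOLA', 'BONJOUR', 'HOLA']  (only two translator calls made)
```

If the column really is mostly unique strings, deep_translator also offers a `translate_batch` method that accepts a list of texts, which avoids constructing a translator object per row; whether it reduces wall-clock time for 220k items would need measuring against the API's rate limits.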