Czech-developed computer translator CUBBITT “reaching quality of human professionals”

Illustrative photo: Štěpánka Budková

A team of scientists from Charles University and the University of Oxford have developed a new computer translation programme called CUBBITT. After testing how the programme compared with its human counterparts, the team published a paper in the prestigious Nature Communications journal claiming that it challenges the long-held view that computers can’t translate as well as humans. I asked the main author of the study, Martin Popel from Charles University’s Faculty of Mathematics and Physics, why CUBBITT is so significant.

“Our study was the first to compare the quality of professional human translation from English to Czech with the quality of machine translation. We tried to make this comparison fair, so we compared the translation quality of whole news articles.

“We showed that the adequacy of the machine translation - that is, how well the meaning of the text is preserved - is significantly higher than that of the professional agency. At the same time, the fluency of the machine translation was slightly lower.

“So there are multiple dimensions of translation quality and, in some, the machine translator is already better on average.”

I understand that CUBBITT works on the principle of machine learning. That is nothing strictly new, so what is different about it?

Martin Popel, photo: archive of Charles University

“The modern type of machine learning is deep learning, based on neural networks with many parameters that need to be trained on powerful hardware for several weeks. However, even this is not new. These techniques are used today by Microsoft’s translator, Google Translate, and other online services. So we had to have the best technology, but also some special know-how.

“Our innovation is a special way of balancing the training sources. We use both Czech and English authentic training data, but we also use monolingual Czech data, which we translate into English. We train the English to Czech translation on this so-called ‘synthetic’ data, but also balance the translation with the Czech data.”
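
As a rough illustration of the idea Popel describes, and not CUBBITT's actual code, the Python sketch below builds "synthetic" sentence pairs by translating Czech text into English and then mixes them with authentic pairs in a chosen proportion. The function names, the `cs_to_en` translator argument and the `synthetic_share` parameter are all hypothetical, and the simple random mixing stands in for whatever balancing scheme the team really uses.

```python
import random


def back_translate(czech_sentences, cs_to_en):
    """Build synthetic (English, Czech) pairs from monolingual Czech text.

    `cs_to_en` is any Czech-to-English translation function (hypothetical here).
    """
    return [(cs_to_en(cs), cs) for cs in czech_sentences]


def balance_sources(authentic, synthetic, synthetic_share=0.5, seed=0):
    """Mix authentic and synthetic pairs into one training set.

    Synthetic pairs are sampled so that they make up roughly
    `synthetic_share` of the result, limited by how many are available.
    """
    if not 0 <= synthetic_share < 1:
        raise ValueError("synthetic_share must be in [0, 1)")
    rng = random.Random(seed)
    # Number of synthetic pairs needed to reach the requested share.
    wanted = round(len(authentic) * synthetic_share / (1 - synthetic_share))
    chosen = rng.sample(synthetic, min(wanted, len(synthetic)))
    mixed = list(authentic) + chosen
    rng.shuffle(mixed)
    return mixed
```

In this sketch an English-to-Czech system would then be trained on the list returned by `balance_sources`, so that neither the authentic nor the synthetic data dominates.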

How much do you expect CUBBITT, or systems based on it, will affect daily life and perhaps certain professions?

“I am not exactly sure. It is already affecting our lives. The quality has improved a lot over the past years as far as publicly available services are concerned and it is still improving further.

“I should stress that we focus on the quality of news translation, not fiction or poetry. I think these fields are definitely more difficult for machines to translate and we still need humans there. That is invaluable.

“However, for some purposes where human translators were needed before, we can now replace humans with machine translators with very good results.”

Are you working on a project to make CUBBITT not just a test programme but also a commercial one?

“We are already offering [CUBBITT] to the public for non-commercial purposes for free on our website, but we are also working with several companies and looking for a commercial use.”