Solving Machine Translation, one step at a time


As a kid, I thought I’d grow up to be a mathematician or a physicist. I understood very early on that I wanted to study and do research, or even become a teacher, in one of those fields. I didn’t know what AI was. In fact, during the first years as an undergrad student in Computer Science, a lot of times I felt that I should switch to mathematics. I’m glad I didn’t.

My grandmother doesn’t really understand what my job is though, because to do so, you have to use the internet. If you don’t, and I tell you that, at Unbabel, we’re making computers do human actions automatically, you’d probably just sit there and stare blankly back at me.

In a way, I didn’t end up in a very different place than I had envisioned as a kid. I mean, this whole machine translation field started with Warren Weaver after the Second World War, not long after Alan Turing, a mathematician, cracked the Enigma code.

The idea is that we can treat language as a code. The difference is that codes are formal, unambiguous; and what makes translation so hard is precisely ambiguity.

The state of machine translation

Some people have some kind of knowledge of what Unbabel does: we translate text in a specific language into a different language. But others don’t even know what Artificial Intelligence is. Some might think all AI does is “robot things”, but that’s not it. What AI is doing is mimicking human behaviour, and in some things it’s even better than humans at it.

Let’s start with the basics: what do machine learning systems do? You present them with a source object, in this case a sentence, and you ask them to predict something, a target sentence.
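That source-to-target framing can be sketched in a few lines. This is a toy illustration with invented data, not Unbabel’s system: the “model” here is just a lookup table built from a couple of parallel sentence pairs, but the interface is the same one the paragraph describes, a source sentence goes in and a predicted target sentence comes out.

```python
# Toy sketch: a translation "model" maps a source sentence to a
# predicted target sentence. Real systems learn this mapping from
# millions of sentence pairs; here it is a hand-written lookup table.

toy_parallel_data = {
    "obrigado": "thank you",
    "bom dia": "good morning",
}

def translate(source: str) -> str:
    """Predict a target sentence for the given source sentence."""
    return toy_parallel_data.get(source.lower(), "<unknown>")
```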

The difficulty with translation is that there is no gold standard. A gold standard is the actual truth. If you’re trying to get a machine to classify images by asking “is this a cat or a dog?”, there is a gold truth, because a specific image is one or the other. In machine translation this doesn’t exist, because you can have 20 different translations that are equally good. It’s a much harder problem to begin with. What is a good translation and what is not? There is also the fact that language is highly ambiguous. Words can mean very different things in different contexts. And so the problem with translation is largely unresolved.
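The “no single gold truth” point can be made concrete. In the sketch below (the sentences are invented for illustration), one source sentence has several equally valid reference translations, so the only honest check is membership in a *set* of references rather than an exact match against one gold answer:

```python
# Several equally valid references for one source sentence
# ("muito obrigado") -- there is no single gold answer.
references = {
    "thank you very much",
    "thanks a lot",
    "many thanks",
}

def is_acceptable(candidate: str) -> bool:
    """Match against the set of references, not one gold standard."""
    return candidate.strip().lower() in references
```

Real evaluation metrics soften this further (partial credit for overlapping words), but the underlying problem stays: the set of good translations is open-ended.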

If you look deeper into machine translation, you’ll see it’s not that much better than it was a few years ago, in spite of what most people think. The output of the older statistical machine translation systems sounded unnatural or robotic. Today’s output might sound more fluent, but it’s often less adequate: the older systems normally got the content right, even when it was harder to understand, whereas today’s systems can fail catastrophically in terms of content and still sound fluent. Overall, though, it’s a better system.

Machine translation has come to a point where one can at least understand the gist of the text. It’s becoming more fluent, despite the models still being very basic and having little knowledge of language. They’re still working mostly on a kind of sentence per sentence level. So anyone who thinks that machine translation is solved, clearly hasn’t used it.

For Unbabel as a company, which sells its multilingual support solutions to major companies that interact with thousands or millions of customers every day, this poses a problem, because most of the time, when you mention machine translation, people immediately think of the mistakes it makes. You can’t just make up stories to make it seem like machine translation is perfect: it is where it is at this point. It still calls for a human in the loop to give it that extra bit of quality.

In chat, for example, there is a person who’s actually talking to the other person, which means you can recover from errors much faster. If you say something that doesn’t make sense, the person on the other end might say “what? I didn’t get that”, and then you’ll retry the translation.

This basically means you’re being your own quality estimation, because, at the end of the day, what you want is a dialogue that works.

The importance of quality estimation

Quality estimation — what we use to evaluate a translation system’s quality without access to reference translations or human intervention — is the secret to machine translation. In fact, some people have claimed it could solve the problem of “which is the correct translation?”, because now we have a system in place that assesses how good or bad a translation is. It doesn’t necessarily mean a translation is the correct one, but it’s a correct translation.

But quality estimation suffers from all the same difficulties as machine translation, which means you can expect the same level of accuracy from it. The biggest problem with machine translation is that it always makes mistakes, because language is very hard to grasp. Whether because computation power keeps the models too simple, or because any machine learning system will make mistakes, the best accuracies are somewhere around ninety percent. That might seem like a lot, but if you think about it, it means that one in every ten sentences is going to be wrong.
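The arithmetic behind that “one in ten” remark is worth spelling out, because sentence-level errors compound over a whole text. A back-of-the-envelope sketch (the 20-sentence ticket and the independence assumption are simplifications for illustration):

```python
# Roughly 90% sentence-level accuracy means about one in every
# ten sentences is wrong, and errors add up across a text.

accuracy = 0.90
error_rate = 1 - accuracy  # ~0.10, i.e. one sentence in ten

# Expected number of wrong sentences in a 20-sentence support ticket:
sentences = 20
expected_errors = sentences * error_rate  # ~2 sentences

# Probability the ticket contains at least one wrong sentence,
# assuming errors are independent (a simplification):
p_any_error = 1 - accuracy ** sentences  # ~0.88
```

So even a 90%-accurate system leaves almost nine out of ten multi-sentence tickets with at least one error somewhere, which is exactly why predicting *which* sentences are wrong matters.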

Quality estimation is trying to predict those wrong sentences, or at least trying to judge whether an error is critical or not. It’s basically going to allow us to use machine translation with a much higher degree of confidence.
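One way to picture that role is as a gate in a human-in-the-loop pipeline: translations the QE model scores below a confidence threshold get routed to a human editor, and the rest ship automatically. The sketch below uses invented names and a stand-in scorer; a real QE model is a trained system that looks at the source/translation pair, not at output length.

```python
from typing import Callable

def route(translation: str,
          qe_score: Callable[[str], float],
          threshold: float = 0.7) -> str:
    """Route a translation based on its estimated quality."""
    return "ship" if qe_score(translation) >= threshold else "human_review"

# Stand-in scorer, purely for the sketch: pretend very short
# outputs are riskier. A real QE model replaces this function.
def toy_score(translation: str) -> float:
    return min(1.0, len(translation.split()) / 5)
```

The threshold is the knob: raising it trades more human work for a higher degree of confidence in what ships automatically.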

At Unbabel, we’ve been dedicating a lot of our time to solving the quality estimation problem. The fundamental AI team has been the one mostly focused on it, discovering new models. Then there’s a lot of work put in by applied AI and production, to answer questions like:

  • How does this run on the pipeline?
  • Is it scalable? Do we need to change the goal?
  • How does it work with our practical data?
  • How do you do the adaptation of these models?

Since fundamental AI works mostly on generic domain data, applied AI has to pick it up and make sure it works in our reality of chats and tickets, and whether it works with different tones or not. There’s the research, and then there’s working its findings into the product.

We are firm believers in our quality estimation systems. We also believe in reproducible and collaborative research, which is why a few months back we built OpenKiwi, an open-source framework that implements the best Quality Estimation systems, making it really easy to experiment and iterate with these models under the same framework, as well as to develop new models.

We were probably one of the first companies that started using quality estimation in production and we’ve been doing research on the topic for a very long time. This means we have better models and a better understanding of the problem than other companies or researchers working on quality estimation.

And the awards go to…

This is why I was very happy that we regained our title of best global Machine Translation Quality Estimation system at the Conference on Machine Translation (WMT) earlier this year. Not only that, but we also won the competition on automatic post-editing.

It was very important for us for two reasons. The first is the impact that quality estimation is having on our production pipeline, the return on investment we’re getting from it. And for that, it doesn’t really matter if we win this or any other competition.

But on the other hand, winning such prestigious awards means recognition for the Unbabel brand, which is essential to getting customers’ and investors’ attention. It’s also an important recognition for the AI team, whose work is sometimes hard to understand and give credit to. AI is very high risk, high reward. You can work for a year and get nowhere. For instance, all the work we did on our human quality estimation didn’t work, because we just didn’t have the right tools for that.

And so these awards are good for recognition, to increase awareness of the Unbabel name in business and in academia, but they’re good for morale as well. Unbabel is a purely AI company. We’re not just using AI, we’re actually building and discovering AI that doesn’t exist yet. And to be publicly acknowledged for that means the world to me. I think my 9-year-old, wannabe mathematician self would be proud.

Source: https://unbabel.com/blog/best-machine-translation-quality-estimation/
