Google Translate needs a grammar device.

Linguistics, Science -- 12 February 2015

I think it was Steve Jobs who said that to connect the dots we should look backwards, not forwards. When it comes to machine translation, I think Google would improve its translator much more if it looked backwards. Last month Google launched an update for Google Translate; it is amazing technology that comes out of the field of machine learning. I tested almost all the languages expecting the translations to have improved, but they had not. This is why Google cannot say it now has a “more powerful translation tool”. A powerful translation tool must translate better than Google Translate (GT) does.


The new update is based on machine learning and artificial intelligence, but the system still rests on a linguistic corpus and statistics. Google feeds the system with words from all the languages the translator works with (the number is now more or less 200 billion) and applies statistics to them; the translations we are used to seeing on Google Translate come from high statistical matches among the languages. The system does not translate, but learns from existing human translations; Google uses a lot of UN documents for this.
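The idea of translating by statistical match can be sketched in a few lines of Python. This is only a toy illustration, not Google's actual system: the parallel corpus, its frequencies, and the names `PARALLEL_CORPUS`, `build_table` and `translate` are all invented for the example. It simply returns the human translation that was seen most often for a given source sentence.

```python
from collections import Counter, defaultdict

# Hypothetical toy parallel corpus: (English sentence, human Czech translation).
# The pairs and how often they occur are invented for illustration only.
PARALLEL_CORPUS = [
    ("you are beautiful", "jste krásná"),
    ("you are beautiful", "jste krásná"),
    ("you are beautiful", "jsi krásný"),
    ("you are beautiful", "jsi krásná"),
]


def build_table(corpus):
    """Count how often each target sentence was paired with each source sentence."""
    table = defaultdict(Counter)
    for source, target in corpus:
        table[source][target] += 1
    return table


def translate(table, source):
    """Return the most frequent match, or None if the source was never seen."""
    matches = table.get(source)
    if not matches:
        return None
    return matches.most_common(1)[0][0]


table = build_table(PARALLEL_CORPUS)
print(translate(table, "you are beautiful"))  # prints: jste krásná
```

Nothing in this sketch knows about person, gender or register: the output is whichever match happened to be most frequent in the corpus.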


Now, according to Jeff Dean (the Google legend behind Google Brain), Google is looking for “a better form of real-time translation” with deep learning and AI; this new model would use “only neural-nets to do end-to-end languages translation”.


Would a model like that produce better machine translation? I think not.


Let me explain like this:



A machine translation cannot be good enough without using a grammar.



Assuming that:

(a) a translation is a technique

(b) the syntactic-semantic relation is the core of any translation


We have:


Hypothesis 1.

The translator (human or machine) operates two processes:

(a) describes a source language (input L)

(b) generates a target language (output L) [a non-native speaker also describes output L]


Hypothesis 2.

The translator must generate output L with grammaticality.

[Grammaticality = understandable by a native speaker]


Combining assumptions (a) and (b) with H1 and H2, it is possible to see that a translator (machine or human) needs a grammar as a device for its/his technique.


Let us take one step further.


The GT video says that, through statistics, the machine discovers grammar by itself from every text in every language it analyzes. But this “grammar” is based on matching and is therefore too weak to describe or to generate languages.



A linguistic corpus is fundamental to any translation process (human or machine).


Accepting that every machine translation is a written machine translation, even when it is “speech to speech”, we have:



A machine can make a standard translation if and only if it has the assistance of a grammar device working like a filter.



Consider two languages, English (E) and Czech (C), and one simple sentence S (“you are beautiful”) translated from E to C in the Google Translate app:

You are beautiful < E (automatically gives) C > Jste krásná

But the translation S=E>C can also be:

C > Jste krásný / Jsi krásná / Jsi krásný / Jsou krásné

For C we have: second person plural, feminine and masculine (formal); second person singular, feminine and masculine (informal); and third person plural, feminine/masculine. In Google’s term “pattern”, one pattern in E corresponds to six patterns in C (with E using three elements and C using two). There is also a gradation in the adjective ‘beautiful’: from E to C it can be translated as “nádhera” or “krásná”. The S in E does not mark gender, whereas in C it can take two (f, m). If we suppose that the personal pronoun in E carries a mark of emphasis, then the S in C will have five more possible “patterns” (ty, vy, oni/ony/ona). All this holds if we take the S without context; in context, the analysis would be even more complex for a machine. Describing or generating this S shows that machine translation needs a grammar device working as a filter, something that a human native speaker has naturally. And if we accept the thesis that native speakers cannot “classify [naturally] any sentence over the vocabulary of their language consistently as either grammatical or ungrammatical” (R.J. Matthews, Are the grammatical sentences of a language a recursive set?, 1979, p. 211), Google Translate will need a grammar device working like a filter even more.
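To make the idea of a “filter” concrete, here is a minimal Python sketch. It is not a proposal for the actual grammar device, only an illustration under assumptions: the feature labels follow the description above, while the `Candidate` type, the `grammar_filter` function and the idea that the constraints arrive from context are hypothetical. Each Czech candidate carries grammatical features, and the filter keeps only the candidates compatible with what is known about the sentence.

```python
from typing import NamedTuple, Optional, Tuple


class Candidate(NamedTuple):
    text: str
    person: int                # 2 or 3
    number: str                # "sg" or "pl"
    genders: Tuple[str, ...]   # genders the form is labelled with
    register: Optional[str]    # "formal", "informal", or None if unlabelled


# Czech candidates for "you are beautiful", labelled as in the text above.
CANDIDATES = [
    Candidate("Jste krásná", 2, "pl", ("f",), "formal"),
    Candidate("Jste krásný", 2, "pl", ("m",), "formal"),
    Candidate("Jsi krásná", 2, "sg", ("f",), "informal"),
    Candidate("Jsi krásný", 2, "sg", ("m",), "informal"),
    Candidate("Jsou krásné", 3, "pl", ("f", "m"), None),
]


def grammar_filter(candidates, person=None, number=None, gender=None, register=None):
    """Keep the candidates compatible with every constraint that is given.

    A constraint left as None is unknown and therefore filters nothing.
    """
    kept = []
    for c in candidates:
        if person is not None and c.person != person:
            continue
        if number is not None and c.number != number:
            continue
        if gender is not None and gender not in c.genders:
            continue
        if register is not None and c.register != register:
            continue
        kept.append(c.text)
    return kept


# Without context nothing can be ruled out.
print(grammar_filter(CANDIDATES))
# With hypothetical context (informal address to a man) one candidate remains.
print(grammar_filter(CANDIDATES, gender="m", register="informal"))  # ['Jsi krásný']
```

The statistical match alone would always return the same candidate; the filter is what lets the output change when the grammatical situation changes.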


From now on my main subject is this grammar device. What should it look like?


PS: So, I am finally writing here again after one year.

3 responses to “Google Translate needs a grammar device.”

  1. Anna Maria Wladyka says:

    Dear Leonardo, I found your contribution via a LinkedIn group. This is definitely food for thought and I read it with great curiosity. I am also in the course of writing my PhD thesis on automation in translation processes. I would like to quote some of your thoughts if you don’t mind. I was wondering what your opinion is on Noam Chomsky’s claim that it is possible to make numerous sentences which are perfectly grammatical (Colourless green ideas sleep furiously), yet that this does not mean they are semantically understandable. How would you relate that to the idea of adding a grammar device to Google? Kindest regards, Anna.

  2. Leonardo says:

    Hi Anna. Thank you for your comment. Sorry for my delay in answering you. About the quotation, I really don’t mind; on the contrary, it is nice to know it will be useful. I’m writing a more complete version of the theorem and I will put it in PDF in the same post. Concerning Chomsky, I am critical of part of his theory and I have been reading nice arguments “against” it; my thesis will deal with his theory. I think that claim can be possible, but not in the way Chomsky has thought, so there is a direct relation between what I call “grammar device” and some part of Chomsky’s theory, but not total agreement. Tomorrow I will put a post here about an anecdote I found out about Chomsky’s theory; it’s a light way to start writing about the way I will deal with it. Are you a computer scientist, a linguist? Nice contact, thanks.

  3. Anna M W says:

    Thank you for your answer, Leonardo! In fact, I noticed it only thanks to a Google search ;) For future reference, you can use my e-mail address directly. I am a linguist interested in computational linguistics. I will follow your posts because I am interested in what kind of model you would like to propose. Cheers!
