I think it was Steve Jobs who said that to connect the dots we should look backwards, not forwards. On machine translation, I think Google would improve its translator much more if it looked backwards. Last month Google launched an update for Google Translate; it is amazing technology that comes out of the machine learning field. I tested almost all the languages expecting the translations to get better, but they didn’t. This is why Google cannot say it now has a “more powerful translation tool”. A powerful translation tool must translate better than Google Translate (GT) does.
The new update is based on machine learning and artificial intelligence. The system is still based on linguistic corpora and statistics. Google feeds the system with words from all the languages the translator works with (the number is now about 200 billion) and applies statistics; the translations we are used to seeing on Google Translate come from high statistical matches among the languages. The system does not translate; it learns from existing human translations, and Google uses a lot of UN documents for this.
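To make the idea of a "high statistical match" concrete, here is a minimal sketch in Python. It is not Google's system: the tiny parallel corpus, its frequencies, and the names `build_phrase_table` and `translate` are all my own assumptions for illustration. It only shows the principle of picking the target phrase that most often appears next to a source phrase in existing human translations.

```python
from collections import Counter, defaultdict

# Toy "parallel corpus": (English phrase, Czech phrase) pairs.
# These pairs and their frequencies are invented for illustration only.
PARALLEL_CORPUS = [
    ("you are beautiful", "jste krásná"),
    ("you are beautiful", "jste krásná"),
    ("you are beautiful", "jste krásná"),
    ("you are beautiful", "jsi krásná"),
    ("you are beautiful", "jsi krásný"),
]

def build_phrase_table(pairs):
    """Count how often each target phrase is paired with each source phrase."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table

def translate(phrase, table):
    """Return the most frequent target phrase, or None if the phrase is unseen."""
    candidates = table.get(phrase)
    if not candidates:
        return None
    target, _count = candidates.most_common(1)[0]
    return target

phrase_table = build_phrase_table(PARALLEL_CORPUS)
best = translate("you are beautiful", phrase_table)
print(best)
```

Note that the sketch never asks whether the winning phrase is the right one for the person being addressed; it only reports the most frequent match.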
Now, according to Jeff Dean (the Google legend behind Google Brain), Google is searching for “a better form of real-time translation” with deep learning and AI; this new model would use “only neural-nets to do end-to-end languages translation”.
Would a model like that build a better machine translation? I think not.
Let me explain like this:
A machine translation cannot be good enough without using a grammar.
(a) a translation is a technique
(b) the syntactic-semantic relation is the core of any translation
The translator (human or machine) operates two processes:
(a) describes a source language (input L)
(b) generates a target language (output L) [a non-native speaker also describes output L]
The translator must generate output L with grammaticality.
[Grammaticality = understandable by a native speaker]
Combining (a) and (b) under H1 and H2, it is possible to see that a translator (machine or human) needs a grammar as a device for its/his technique.
Let us take one step forward.
The GT video says that, through statistics, the machine discovers grammar by itself from every text in every language it analyzes. But this “grammar” is based on matching and is therefore too weak to describe or to generate languages.
A linguistic corpus is fundamental to any translation process (human or machine).
Accepting that every machine translation is a written machine translation, even when it is “speech to speech”, we have:
A machine can make a standard translation if and only if it has the assistance of a grammar device working like a filter.
Consider two languages, English E and Czech C, and the translation of one simple sentence S (“you are beautiful”) from E to C, tested in the Google Translate app:
You are beautiful < E (automatically gives) C > Jste krásná
But the translation S=E>C can also be:
C > Jste krásný / Jsi krásná / Jsi krásný / Jsou krásné
For C > we have: second person plural feminine and masculine (formal); second person singular feminine and masculine (informal); third person plural feminine/masculine. In Google’s term “pattern”, we have one pattern in E against six patterns in C (with E using three elements and C using two). There is a gradation in the adjective ‘beautiful’ at E>C, which can be translated as “nádhera” or “krásná”. The S in E has no gender, whereas in C it can have two (f, m). If we suppose that the personal pronoun in the E sentence carries a mark of emphasis, then the S in C has five more possible “patterns” (ty, vy, oni/ony/ona). All this holds if we take the S without context; in context the analysis would be even more complex for a machine. From describing or generating this S we can see that machine translation needs a grammar device working as a filter, something that a human native speaker has naturally. And if we accept the thesis that native speakers cannot “classify [naturally] any sentence over the vocabulary of their language consistently as either grammatical or ungrammatical” (R.J. Matthews, Are the grammatical sentences of a language a recursive set?, 1979, p. 211), Google Translate will need a grammar device working like a filter even more.
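As a rough illustration of what "a grammar device working as a filter" could mean, here is a small Python sketch. It is only my assumption of one possible shape, not a description of Google Translate or a finished design: the `Candidate` class, the `grammar_filter` function, and the two features (register and gender) are hypothetical, and I limit the candidates to the four second-person forms listed above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    """One possible Czech output for "you are beautiful", with its features."""
    text: str
    register: str  # "formal" or "informal" (hypothetical feature name)
    gender: str    # "f" or "m" (hypothetical feature name)

# The four second-person forms discussed in the post.
CANDIDATES = [
    Candidate("Jste krásná", "formal", "f"),
    Candidate("Jste krásný", "formal", "m"),
    Candidate("Jsi krásná", "informal", "f"),
    Candidate("Jsi krásný", "informal", "m"),
]

def grammar_filter(candidates, **constraints):
    """Keep only the candidates whose features agree with every constraint."""
    return [
        c.text
        for c in candidates
        if all(getattr(c, name) == value for name, value in constraints.items())
    ]

# Without any grammatical information, all four forms survive.
no_context = grammar_filter(CANDIDATES)
# Knowing the register halves the choices.
informal_only = grammar_filter(CANDIDATES, register="informal")
# Knowing register and gender leaves a single form.
resolved = grammar_filter(CANDIDATES, register="informal", gender="m")
print(no_context, informal_only, resolved)
```

The point of the sketch is that the English sentence alone gives the filter nothing to work with; the features have to come from somewhere else, which is exactly the job I am assigning to the grammar device.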
From now on my main subject is this grammar device. What should it look like?
PS: So, after one year, I am finally writing here again.