Google Translate needs a grammar device.

Published under the categories Linguistics, Science on 12 February 2015

I think it was Steve Jobs who said that to connect the dots we should look backwards, not forwards. Regarding machine translation, I think Google would improve its translator much more if it looked backwards. Last month Google launched an update to Google Translate; it is amazing technology that comes out of the field of machine learning. I tested almost all the languages expecting the translations to have improved, but they had not. This is why Google cannot say it now has a “more powerful translation tool”. A powerful translation tool must translate better than Google Translate (GT) currently does.


The new update is based on machine learning and artificial intelligence. The system is still based on linguistic corpora and statistics. Google feeds the system with words from all the languages the translator works with (the number is now roughly 200 billion) and applies statistics; the translations we are used to seeing on Google Translate come from high statistical matches among the languages. The system does not translate, but learns from existing human translations; Google uses a lot of UN documents for this.


Now, according to Jeff Dean (the Google legend behind Google Brain), Google is looking for “a better form of real-time translation” with deep learning and AI; this new model would use “only neural-nets to do end-to-end languages translation”.


Would a model like that produce better machine translation? I think not.


Let me explain it like this:



A machine translation cannot be good enough without using a grammar.



Assuming that:

(a) a translation is a technique

(b) the syntactic-semantic relation is the core of any translation


We have:


Hypothesis 1.

The translator (human or machine) operates two processes:

(a) describes a source language (input L)

(b) generates a target language (output L) [a non-native speaker also describes output L]


Hypothesis 2.

The translator must generate output L with grammaticality.

[Grammaticality = understandable by a native speaker]


Combining (a) and (b) with H1 and H2, it is possible to see that a translator (machine or human) needs a grammar as a device for its/his technique.


Let us take one step forward.


The GT video says that, through statistics, the machine discovers grammar by itself from every text in every language it analyzes. But this “grammar” is based on matching, and therefore it is too weak to describe or to generate languages.



A linguistic corpus is fundamental to any translation process (human or machine).


Accepting that every machine translation is a written machine translation, even using “speech to speech”, we have:



A machine can make a standard translation if and only if it has the assistance of a grammar device working like a filter.



Consider two languages, English E and Czech C, and the translation of one simple sentence S (“you are beautiful”) from E to C, tested in the Google Translate app:

You are beautiful < E (automatically gives) C > Jste krásná

But the translation S=E>C can also be:

C > Jste krásný / Jsi krásná / Jsi krásný / Jsou krásné

For C we have: second person plural, feminine and masculine (formal); second person singular, feminine and masculine (informal); third person plural, feminine/masculine. In Google’s terms, counting “patterns”, we have one pattern in E against six patterns in C (with E using three elements and C using two). There is also a gradation in the adjective ‘beautiful’ from E to C: it can be translated as “nádhera” as well as “krásná”. The S in E has no gender, whereas in C it can have two (f, m). Suppose further that the personal pronoun in the E sentence carries a mark of emphasis; then the S in C will have five more possible “patterns” (ty, vy, oni/ony/ona). And this is if we take S without context; in context the analysis would be even more complex for a machine. Whether we describe or generate this S, we can see that machine translation needs a grammar device working as a filter, something that a human native speaker has naturally. And if we accept the thesis that native speakers cannot “classify [naturally] any sentence over the vocabulary of their language consistently as either grammatical or ungrammatical” (R.J. Matthews, Are the grammatical sentences of a language a recursive set?, 1979, p. 211), Google Translate will need a grammar device working like a filter even more.
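To make the idea of a "filter" concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the names `Candidate` and `grammar_filter` are hypothetical, the feature labels simply follow the classification given in this post, and nothing here reflects how Google Translate actually works. The candidate list stands for what a statistical matcher might propose; the filter keeps only the forms that agree with the grammatical features known from context.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    text: str
    person: int             # 2 or 3, as labelled in the post
    number: str             # "sg" or "pl"
    genders: frozenset      # subset of {"f", "m"}
    formal: Optional[bool]  # None where the formal/informal split is not used


# Czech candidates for "You are beautiful", labelled as in the post (illustrative only).
CANDIDATES = [
    Candidate("Jste krásná", 2, "pl", frozenset({"f"}), True),
    Candidate("Jste krásný", 2, "pl", frozenset({"m"}), True),
    Candidate("Jsi krásná", 2, "sg", frozenset({"f"}), False),
    Candidate("Jsi krásný", 2, "sg", frozenset({"m"}), False),
    Candidate("Jsou krásné", 3, "pl", frozenset({"f", "m"}), None),
]


def grammar_filter(candidates: List[Candidate],
                   person: Optional[int] = None,
                   number: Optional[str] = None,
                   gender: Optional[str] = None,
                   formal: Optional[bool] = None) -> List[str]:
    """Keep only the candidates that agree with every feature that is specified."""
    kept = []
    for c in candidates:
        if person is not None and c.person != person:
            continue
        if number is not None and c.number != number:
            continue
        if gender is not None and gender not in c.genders:
            continue
        if formal is not None and c.formal != formal:
            continue
        kept.append(c.text)
    return kept


# Without any grammatical information every candidate survives;
# with person, gender and register known, only one form is left.
print(grammar_filter(CANDIDATES))
print(grammar_filter(CANDIDATES, person=2, gender="f", formal=False))
```

The point of the sketch is the asymmetry: the matcher can only enumerate forms, while the choice among them requires features (person, number, gender, register) that a grammar has to supply.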


From now on my main subject is this grammar device. What should it be like?


PS: So I have finally started writing here again after one year.


Beyond cultural layer. (in Proti šedi)

Published under the categories Culture, Philosophy on 11 February 2014

This is the article that I wrote for the Czech online cultural magazine Proti šedi (published there in Czech).


Let’s think about ourselves as humans made of layers. This is how the Yoga and Vedānta knowledge traditions analyze us human beings, for example. It seems to me that such an analysis is a good way to understand even other kinds of layers we have, such as superficial ones; in this case, a cultural layer. How much are we influenced by all the cultural things we are exposed to? And how much of it is really valuable? Being a foreigner can give us good answers to these questions, because when we are a little outside our own zone we can see what is genuine and what is not. Not about others, but about ourselves. This “way of seeing” can be learned.


To be a foreigner, in this case, is only a way of naming someone who can observe what reality is, which is not so easy. Of course our cultural layer is part of our reality, but it is also possible to say that it is the last layer we have. This means that we often label ourselves by this cultural layer. But who are we, really? Can we say that we are just some behaviors and habits coming from this layer? It is only when we move away from a place where we are used to living and dealing with certain behaviors and habits that we realize the striking difference between who we really are and all these behaviors and habits. A habit or a way of behaving can be changed according to the culture we adopt, even if we never leave the place of our original culture. But not everything can be changed.


When we are in a position to see what can change and what cannot, when we stand before something greater than ourselves and our conscience begs us not to shed the responsibility we have for knowing what is right and what is not, that is exactly the moment when we realize what really matters; or, to put it differently, what is accidental and what is essential. Every cultural influence we have is accidental: we can change it according to our choice, or to the place or time we are in, and if that is so, the cultural layer is not enough to say who we are. On the other hand, there is character, which is individual and does not depend completely on culture. We can live in a specific culture, with specific habits and behaviors, and still have a character that has nothing to do with that culture. It seems, then, that we can say character is more essential, because it depends on learning to be excellent. It is character, determined by reason, that will often choose the kind of culture (habits and behaviors) we want for our life or want to live in. So, while we introduce ourselves to each other by putting many cultural labels on ourselves to show who we are, it is precisely character, which cannot be seen so easily at first, that will show who we really are. In the end, it is not being a Brazilian or a Czech, a punk, a rock’n’roll guy or a classical one, having this taste or that, traveling to many places, or living in one way or another that determines who we are; it is more about what stays: character. It is like an axle that is fixed (character, essential) so that the wheel can turn without stopping (cultural influences, accidental). And a wheel with an unfixed axle doesn’t turn well.
More and more each day, it seems to me that character has nothing to do with the place we come from, how we were born, the cultures we adopt, our social or physical environment, heredity or anything else, but with our responsibility to become a better person; it is about the way to become excellent.


Postscript:


It is from character that we begin one of the important practices of the Yoga tradition, called svādhyāya, a self-meditation (self-study) on who we really are.


Sanskrit Phonetics and Devanāgarī literacy (a course).

Published under the categories Linguistics, Sanskrit on 23 September 2013


I would like to invite you to the Sanskrit course I will teach this school year at Charles University, Prague. The course is an opportunity for everyone who wants to learn this language. The examples will come mostly from ancient Yoga texts, and the course is very useful for those who practice Yoga or have any interest in Sanskrit culture. It is also an opportunity to see and work with a very scientific alphabet, in the same way that the ancient grammarians and scholars have been teaching it for thousands of years. The course will be taught in English.


The course will start on October 3rd. Every Thursday at 9:10 am, in room 102.

The address:

Charles University, Institute of Comparative Linguistics

Celetná 20

116 42 Prague 1


Here is the course description:


The aim of this course is to teach Sanskrit phonetics in its two varieties, Vedic and Classical, through three traditional works: Taittirīya-prātiśākhya (Vedic), Aṣṭādhyāyī of Pāṇini (5th century BC, Classical) and Harinamāmṛta Vyākaraṇa of Jīva Gosvāmi (16th century AD, Classical). The course will also teach you the traditional alphabet. By the end of the course you will be able to recognize all the phonemes, and to write and pronounce Sanskrit words accurately in the devanāgarī script.


Course plan:

1)    Sounds of Vedic Sanskrit and differences between Vedic and Classical.

2)    The characteristics and differences among the three works.

3)    Simple vowels, in their script (devanāgarī).

4)    Diphthongs.

5)    Consonants, first group.

6)    Consonants, second group.

7)    Consonants, third group.

8)    Consonants, fourth group.

9)    Consonants, fifth group.

10)  Pāṇini’s alphabet.

11)  Peculiarities of the Sanskrit sounds (Phonology and Indo-European studies).

12)  Some euphonic combinations (sandhi technique).



Written activities with one written test at the end.



MACDONELL, A.A. A Vedic Grammar for Students. Delhi, D.K.Printworld’s, 2005 (1916).

DAHIYA, Yajanveer. Panini as a Linguist: ideas and patterns. Delhi, Eastern Book, 1995.

WHITNEY, W.D. The Taittiriya Pratisakhya. Delhi, Motilal Banarsidass, 1973 (1871).

VASU, S.Ch. The Ashtadhyayi of Panini. Delhi, Motilal Banarsidass, 1997 (1891).

GOSVAMI, Jiva. Tr. Swami Purushatraya and Yadu Dasa. Harinamamrta Vyakarana. (unpublished).


Samizdat (second text, Protišedi).

Published under the categories Culture, Literature on 22 July 2013


Here is my second post for the Czech cultural magazine Protišedi:


In English:


In Czech:


Hope you enjoy the reading.




First post at Protišedi.

Published under the category Notices on 21 June 2013


Here is my first text at Protišedi, a Czech cultural magazine:

In English

In Czech

Hned se vrátím… (I’ll be right back…)



Some words about karma.

Published under the categories Philosophy, Vedanta on 2 June 2013


People think that karma is metaphysical, but it is not. Karma is a law of nature, like the law of gravity. Karma is part of physics, of this world. The Vedānta says that you are a soul, not a body. You have a body. The soul (ātman, jīva) is pure in essence; this means that the soul cannot be influenced by anything from nature. Thus, to say that you are suffering from past karma is to say that this karma has influenced you (a soul). Karma means “action”; if we think of it as a law, we can say: for each action there is a reaction. In this world every action has a reaction, and this is neither good nor bad; it is nature. Karma is not fate or fatality. Forget ‘reincarnation’ when talking about karma: first, because the Veda does not speak clearly about it; second, because there is no sense of “re-something” in the Sanskrit language. If karma has nothing to do with metaphysics, it means that there is no “past karma” or “future karma”. What exists is an “ancestral karma”, which is exactly what makes you similar to your parents or family (physical features). When you are born, the body that you accept already has a family, with all the characteristics produced by the action-reaction law. Karma cannot imprison you in this world, because you (the soul) are free in essence. Some people say that karma acts on the conscience. That is wrong. Soul and conscience are the same thing (for the Vedānta, at least). So stop thinking of karma as a metaphysical law. The Vedānta does not put it that way. We could have a calculus to represent karma, and it could be a simple physical law. If you look at the Bhagavad Gītā (3.5), you will see it written almost like a physical law, saying that karma has its original source in nature (prakṛtijaiḥ).



A jewel.

Published under the categories Sanskrit, Science on 26 May 2013

From the Bhagavad Gītā (7.7).


मयि सर्वम् इदं प्रोतं सूत्रे मणिगणा इव

mayi sarvam idaṃ protaṃ sūtre maṇigaṇā iva

On me all this universe is strung like clusters of jewels on a string.

It’s a beautiful line of poetry. A metaphor, of course. Lives inside life.

But perhaps we can even see string theory in it.


Words for 2013.

Published under the categories Philosophy, Sanskrit on 20 December 2012


श्रद्धावाँल् लभते ज्ञानं

śraddhāvām̐l labhate jñānaṃ

Possessing faith, he attains knowledge.



This is the first clause (in the grammatical sense) of verse 39 of chapter 4 of the Bhagavad Gītā.


Although it has already been a beacon for me throughout 2012, in 2013 this beacon will shine even brighter.

In 2013 I will defend my master’s dissertation before the committee and go on, God willing, to a doctorate (in Prague).

I will be totally immersed in the work of Pāṇini, the genius who has captivated my mind ever since I began studying him, in 2000.


Since then, 12 years have passed in which I have not spent a single day without mentally reviewing a sūtra of his grammar (yes, I am a fanatic, ha), so that over these years I have come to know 1150 sūtras by heart.

As I like to say: a fine game for grown-ups.


It is what I wish for everyone else too, because the only thing I know (more and more) is that only Knowledge saves.

And without faith (above all in God) Knowledge cannot be attained.

A Merry Christmas and a 2013 of great achievements to all!



Brazil, still an alley.

Published under the category Literature on 19 October 2012


(My response to the telenovela Avenida Brasil, which ends today)


Not reading Machado

is an axe blow

to a head

that watches telenovelas

and sprawls

along the alley

wanting to be

an avenue,

out goes Capitu

in comes Carminha,


low rank,

this TV


to the imagination

of a heroic people

with no resounding cry,

an emerging one,

but without vision.


Silence, nothing.

Noise, everything.

And we find it funny

where the spite

of a mensalão

burdens a people

without scrutiny,

an avenue

without pavement,


of old asphalt.

José Dirceu


the little bearded one

without trial,

with this people

almost a donkey,

brays now

as never

it brayed before,

and follows the street

to Brasília,




through this clay

that makes the mud

and puts out the flame

of the generation

of “young” madmen

who do not read books

and cry out loud:



A Jesuit understands Pāṇini.

Published under the categories Linguistics, Sanskrit on 23 August 2012

Here is a very specific post, but no less interesting, I believe.


Until the 18th century, the study of language took the grammars of Latin and Greek as its model, and vernacular grammars were written on their basis. This only begins to change with the Jesuit missions in the East, a very fertile period for the discovery of studies and new ways of observing a language. The letter written by the Jesuit priest Jean-François Pons in 1740, sent to his friend du Halde in Paris, is an excellent example of the golden age of these discoveries. In it we find the first mention of Pāṇini (Pania) and an exact description of the traditional grammar of Sanskrit; by this point Pons had already begun, in 1738, to write his Sanskrit grammar in Latin.


This grammar was later studied by great philologists of the 19th century, such as A.L. Chézy (founder of the Sanskrit course at the Collège de France) and Franz Bopp.


I translate and comment on this letter in the first chapter of my master’s dissertation.


Here I give the third paragraph, adapted, the most grammatical of them:


It is published in Lettres édifiantes Et Curieuses, écrites Des Missions Étrangères, par quelques Missionaires de la Compagnie de Jesus, XXVI, II, p. 39.


Il est étonnant que l’esprit humain ait pu atteindre à la perfection de l’art qui éclate dans ces grammaires. Les auteurs y ont réduit par l’analyse, la plus riche langue du monde, à un petit nombre d’élémens primitifs, qu’on peut regarder comme le caput mortuum de la langue. Ces élémens ne sont par eux-mêmes d’aucun usage, ils ne signifient proprement rien ; ils ont seulement rapport à une idée, par exemple Kru à la idée d’action. Les élémens secondaires qui affectent le primitif, sont les terminaisons qui le fixent à être nom ou verbe ; celles selon lesquelles il doit se décliner ou se conjuguer, un certain nombre de syllabes à placer entre l’élément primitif et les terminaisons, quelques propositions, etc. A l’approche des élémens secondaires le primitif change souvent de figure ; Kru, par exemple, devient, selon ce qui lui est ajoûté, Kar, Kra, Kri, Kir, etc. La synthése réunit et combine tous ces élémens, et en forme une variété infinie de termes d’usage.



It is astonishing that the human mind could attain the perfection of art that shines forth in these grammars. The authors have reduced, by analysis, the richest language in the world to a small number of primitive elements, which can be regarded as the caput mortuum of the language. These elements are of no use by themselves; they properly signify nothing; they only relate to an idea, for example Kru to the idea of action. The secondary elements that affect the primitive are the endings that fix it as a noun or a verb; those according to which it must be declined or conjugated; a certain number of syllables to be placed between the primitive element and the endings; some propositions, etc. At the approach of the secondary elements the primitive often changes its shape; Kru, for example, becomes, according to what is added to it, Kar, Kra, Kri, Kir, etc. Synthesis reunites and combines all these elements and forms from them an infinite variety of terms in use.



This is the most grammatical paragraph, in which Pons shows that he really knew the language he was describing in his letter. Here he describes the élémens primitifs, or dhātu, the verbal roots, from which every declined or conjugated word of Sanskrit originates. The way he describes the structure of the language is exact. These roots are, indeed, the caput mortuum of the language, the element of the language that cannot be reduced, which carries the primary (and deepest) idea. The roots cannot be presented as words without first undergoing a morphological change. Y. Dahiya, in his Panini as a Linguist, states of these roots that “Pāṇini has not defined the dhātu in his treatise. But he has listed 2014 roots under ten groups in an appendix called Dhātupāṭha (pg.100).”


In any case, a good example for getting an idea of how Pāṇini presents these verbal roots is aphorism (sūtra) one of chapter three of the first part of his work:

bhūvādayo dhātavaḥ  (I.3.1)

The verbal roots [are] from bhū (to be) onward.

Another similar example, though more specific:


sanādyantā  dhātavaḥ  (III.1.32)

The verbal roots [are those with the affix] san, from beginning to end.

Or yet another, more direct:

dhātoḥ (III. 1. 91)

From the verbal root.

This is an adhikāra sūtra, a type of aphorism that “governs” or “leads” others.


I use Pons’s own example to set out, more descriptively, what he is dealing with.


a) Let us take the verbal root kṛ (Kru), which carries the idea of ‘doing’, ‘acting’.

b) Let us apply to it the morphological structure he describes at the end of the paragraph:


b.1) Noun:

kṛ > kar + ma + n > karman + 0 > karma =  action, activity (nominative, singular)


b.2) Verb:

kṛ > kar + o > karo + ti > karoti = he does, he acts (third person, present)


Of the ten classes of Sanskrit verbal roots found in the Dhātupāṭha (collection of roots) of Pāṇini’s work, the root kṛ appears in the first, sixth and eighth classes. It thus undergoes a morphological (in fact, morphophonemic) change (kṛ > kar), described by (the rules) vṛddhi and guṇa, which are set out in the first and second aphorisms of the Aṣṭādhyāyī, which shows their degree of importance. In the last sentence of the paragraph, Pons describes the technology of Sanskrit grammar by which the verbal roots, through the corresponding morphophonemic and structural transformation, give rise to an infinite number of words, including current ones. A technology sought by Wilhelm von Humboldt in the 19th century and, later, by Noam Chomsky.


When Pons is astonished at the perfection de l’art of these grammarians, Pāṇini being the first with a written work, we can understand the difference between Latin and Sanskrit grammar, as well as the importance of the latter for linguistic studies.
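The two derivations in b.1 and b.2 can be sketched as a tiny rule system. What follows is a simplified illustration in Python, not Pāṇini’s actual procedure: the function names are hypothetical, the guṇa table covers only the simple vowels, no sandhi is applied, and the noun rule (drop the final n of the stem before a zero ending) is hard-coded just to reproduce karman > karma.

```python
import unicodedata


def nfc(s: str) -> str:
    """Normalize so that letters like ṛ compare equal however they were typed."""
    return unicodedata.normalize("NFC", s)


# Guṇa grades of the simple vowels (a simplified table).
GUNA = {nfc(k): v for k, v in
        {"i": "e", "ī": "e", "u": "o", "ū": "o", "ṛ": "ar", "ṝ": "ar"}.items()}


def guna(root: str) -> str:
    """Replace the final vowel of the root with its guṇa grade, if it has one."""
    root = nfc(root)
    for vowel, grade in GUNA.items():
        if root.endswith(vowel):
            return root[: -len(vowel)] + grade
    return root


def derive_verb(root: str, stem_vowel: str, ending: str) -> str:
    """kṛ > kar + o > karo + ti > karoti (the stem vowel is taken as given)."""
    return guna(root) + stem_vowel + ending


def derive_noun(root: str, suffix: str) -> str:
    """kṛ > kar + man > karman + zero ending > karma (final n dropped)."""
    stem = guna(root) + suffix
    return stem[:-1] if stem.endswith("n") else stem


print(derive_verb("kṛ", "o", "ti"))  # the verb form of b.2
print(derive_noun("kṛ", "man"))      # the noun form of b.1
```

Even in this toy form the design is visible: the root itself is never a word, and a small set of ordered operations (vowel gradation, affixation, ending) turns it into one.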



