Corpora | Moscow Aramaic Circle

Turoyo corpora

A baseline corpus of previously recorded texts, collected almost exclusively from speakers in the diaspora, which we have assembled for our provisional database. It is currently password-protected.

An annotated corpus created with the help of the morphological analyzer UniParser, developed by T. Arkhangelskiy. The automatic analysis includes lemmatisation and morphological tagging of tokens. Each token (word form) is provided with its lemma and a translation into English and German. English lemmata for nouns, adjectives, adverbs and particles were translated from the German lemmata in H. Ritter's dictionary (1979). The verbal lemmata were translated, with some corrections, from the German of H. Ritter's grammatical sketch of Ṭuroyo (1990). Translations of lexemes absent from both sources were obtained by elicitation from informants and by translating the Swedish glosses in the lexicon of J. Beṯ-Şawoce (2012).

Another annotated corpus, containing texts recorded by MAC members in 2019–2022. It has not yet been published; however, the texts without annotation can be found in the first corpus.

NENA corpora

The corpus of Christian Urmi. This corpus of Christian Urmi Neo-Aramaic comprises 46 printed editions of Neo-Aramaic texts in a variety of the Latin script (the Assyrian New Alphabet), which were issued in the Soviet Union during the 1930s. For the history of the Assyrian New Alphabet project and the details of its orthography, see A. Lyavdansky "Neo-Aramaic Texts in the New Alphabet Published in the Soviet Union 1929-1938" (forthcoming). When selecting texts for the corpus, preference was given to literary texts printed according to the rules of the stabilized orthography adopted in 1933. Most of the selected texts are translations, from Russian and other languages, of fiction and popular-science works. Some original literary compositions in Christian Urmi have also been included. Some newspaper and oral texts have also been digitized for this corpus, but they have not yet been added to the annotated corpus because they are transcribed according to other orthographic systems. The complete list of texts included in this annotated corpus is available via the 'Select subcorpus' button.

The corpus of NENA varieties spoken in Russia. This corpus contains texts in several Northeastern Neo-Aramaic (NENA) varieties spoken in Russia. Most recordings were made during field trips to the village of Urmiya, Krasnodar Krai, the only settlement in Russia where ethnic Assyrians constitute the majority of the population.

MWA corpus

At present there is only one corpus of Modern Western Aramaic. It is a baseline corpus in which one can search both the Aramaic texts and their German translations.