18.07.2023

New language technology developed for Uralic languages

This year, there has been progress in developing modern software for the Uralic languages. Programmes related to Udmurt and Mari are being developed, in addition to a Uralic-language machine translation engine at the University of Tartu.

For Udmurt, Arseniy Pozdeev, an 11th-grade student at the State Boarding Lyceum, created an application for learning the Udmurt language and training one’s memory. He took part in a competition organised by the Ministry of National Affairs of Udmurtia and won second place, which earned him an award of 110 thousand roubles. With this he was able to develop the software WORDskon, which is now available to download for Android phones.

In Mari El, speech recognition software called Alisa is being developed for the Meadow Mari language. The system will convert speech to text and support the creation of voice assistants. The software draws on the National Corpus of the Mari Language, which contains over 20 million word usages. In addition, a subcorpus of Hill Mari is being created.

Researchers at the University of Tartu Institute for Computer Science have been working on a Uralic machine translation engine called NeuroTõlge since 2021. This year, they added support for 17 new languages. In total, the engine supports 23 languages, many of which are not supported by larger software such as Google Translate. In addition to the more commonly supported Estonian, Finnish and Hungarian, it now includes Livonian, Võro, Karelian, Livvi Karelian, Ludian, Veps, North Sami, South Sami, Inari Sami, Skolt Sami, Lule Sami, Komi, Komi-Permyak, Udmurt, Hill Mari, Meadow Mari, Erzya, Moksha, Mansi and Khanty.

The researchers have invited people to test the machine translation and give feedback to improve the system. This can be done by editing translations at translate.ut.ee. Texts in these languages, such as poems, articles and books, are also welcome and can be sent to ping@tartunlp.ai.