
Speech-to-Text with Dialect Recognition


This project has indirectly received funding from the European Commission’s Horizon 2020 Framework Programme through the STADIEM project (Grant Agreement 957321).


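Speech-to-text conversion is the process of automatically converting spoken words into written text. Conventional speech-to-text systems rely on at least two models: an acoustic model, which relates the audio signal to speech sounds, and a language model, which estimates how likely a given sequence of words is. In addition, large-vocabulary systems use a pronunciation model that maps words to the sounds they are made of.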
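How the acoustic model and the language model work together can be illustrated with a small sketch. It assumes a simplified setup in which each candidate transcript already carries a log-probability from each model and the two are combined as a weighted sum. The names `Hypothesis`, `combined_score`, `best_hypothesis` and `lm_weight`, as well as all numbers, are invented for illustration and do not describe the aiconix system.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One candidate transcript with illustrative model scores."""
    text: str
    acoustic_logprob: float  # how well the words match the audio
    lm_logprob: float        # how plausible the word sequence is


def combined_score(h: Hypothesis, lm_weight: float = 1.0) -> float:
    # Weighted sum of the two log-probabilities (higher is better).
    return h.acoustic_logprob + lm_weight * h.lm_logprob


def best_hypothesis(hyps: list[Hypothesis], lm_weight: float = 1.0) -> Hypothesis:
    # Pick the candidate with the highest combined score.
    return max(hyps, key=lambda h: combined_score(h, lm_weight))


# Made-up scores: the second candidate fits the audio slightly better,
# but the language model considers the first far more plausible.
hypotheses = [
    Hypothesis("recognise speech", acoustic_logprob=-12.0, lm_logprob=-4.0),
    Hypothesis("wreck a nice beach", acoustic_logprob=-11.5, lm_logprob=-9.0),
]

print(best_hypothesis(hypotheses, lm_weight=1.0).text)
print(best_hypothesis(hypotheses, lm_weight=0.0).text)
```

With the language model switched off (`lm_weight=0.0`), the acoustically closer but less plausible candidate wins. This is why a language model adapted to a dialect's vocabulary and phrasing can change which transcript is chosen.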

To get the best transcription quality, all of these models can be specialized for a given language, dialect, type of speech, and communication channel.

Transcription accuracy depends heavily on the speaker, the style of speech, and the environmental conditions.

The biggest challenge, however, is recognising dialects: only when dialects are recognised reliably can automatically generated transcripts and subtitles be almost error-free.

As part of the STADIEM acceleration programme, aiconix has set itself the goal of improving the automated recognition of Austrian dialects in audio-visual media. For this purpose, the company is training a language model that specialises in Austrian dialects. The project focuses first on "Wiener-Standard-Deutsch" (Viennese Standard German), then on other Austrian dialects, and finally on other dialects within Europe.

Crucial to the success of the project is the data available to the start-up for training the language model. For this purpose, aiconix worked closely with Austrian partners such as the Austrian Parliament, Austrian Television (ORF), Austria Presse Agentur (APA) and Russmedia.
They all supported the company with video and audio recordings featuring strong dialect, together with matching manually created transcripts. Collecting this training data is already a major success for aiconix and forms an essential cornerstone for training a dialect language model.

In the development phase of the project, the aiconix development team focused on training the language model with the existing data. The aim is to reduce the error rate in automatically generated transcripts. At the end of the project, the partners will use the resulting dialect model for automated transcription and subtitling of audio-visual content, which should save them both time and money. In addition, the partners' audiences should be guaranteed digital access to that content.
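The text does not say how the error rate is measured. A common choice for transcripts is the word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the automatic transcript into the reference transcript, divided by the number of words in the reference. The sketch below assumes this metric and simple whitespace tokenisation. The function name `word_error_rate` and the example sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one of five reference words is transcribed wrongly.
reference = "wir gehen heute nach hause"
hypothesis = "wir gehen heute nach haus"
print(word_error_rate(reference, hypothesis))
```

Comparing this value on the same dialect recordings before and after training gives a simple way to check whether a specialised model actually reduces errors.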

"In Austria, about 450,000 people live with a permanent hearing impairment. Without subtitles, it is very difficult for them to understand video content. Especially in the field of information, it is important that a large part of the population understands the transmitted content. By providing its most important news programmes with live subtitles, ORF plays an important role in the dissemination of information. Especially in times of crisis, this is an essential asset and almost a unique selling point on the Austrian media market."


Lisa Zuckerstätter, Head of "Access Services" at ORF


Quotes from our partners

Austrian Parliament

"The Parliamentary Directorate promotes equal and self-determined participation in democracy. However, this is only possible if barriers are removed. Automated speech recognition and other modern technologies can help."

Tatjana Novakovic

Accessibility Officer

ORF

"The cooperation between ORF and aiconix is running smoothly and purposefully. [...] are no other suitable solutions on the market."

Lisa Zuckerstätter

Head of "Access Services"


"The cooperation with aiconix in the "Stadiem" project represents an important milestone for APA - Austria Presse Agentur in the further development of its Media Intelligence Services. APA uses Speech-2-Text technology for monitoring audiovisual media (radio, TV, podcasts, web TV). The better this technology and the speech model are individualized for the Austrian media market, the higher the benefit. In addition, quality leaps in the speech model also result in further areas of application for use in the core target group of APA's media owners and beyond."

Klemens Ganner

CEO APA-DeFacto GmbH

Russmedia

"Russmedia sees an increased need for dialect recognition. This can make content accessible to a larger target audience and achieve higher reach. lt also enables barrierfree access to the produced content and offers an USP compared to other broadcasters."

Georg Burtscher


Want to learn more about our audio-visual content solutions, such as how live event transcription works or how to make your content discoverable, searchable and actionable?


Feel free to contact us!