As part of the STADIEM* programme, the Austrian Broadcasting Corporation (ORF) and the Hamburg-based startup aiconix are cooperating to jointly develop a speech-to-text solution with automated recognition of Austrian dialects.
We spoke with Lisa Zuckerstätter. She was head of the ORF TVthek for three years and since 2020 has been the main person responsible for the newly created "Access Services" department, in which role she works to advance the accessibility of ORF's programmes.
In your opinion, what forms of accessibility are urgently needed in the media world?
Subtitles are clearly essential for moving images.
What is ORF already doing today for the digital inclusion of people?
As the previous answer suggests, ORF is already doing a lot for inclusion. We know from experience that providing UT, AD and NIES, or designing our online presences to be barrier-free, is not only important for reaching people with a permanent disability. Statistically, most people depend on (technical) support at some point in their lives. Whether someone breaks an arm while skiing, a prolonged illness turns even small everyday tasks (like watching TV or surfing the internet) into a challenge, or a parent carries a toddler around all day with only one hand free at a time: inclusive design and accessible services are relevant to all of us (at some point).
How did the cooperation with aiconix come about, what exactly does it involve and what do you expect from the output?
A colleague who is very technology-savvy gave us the tip to get in touch with aiconix. And so, at the end of last year, the ORF entered into a cooperation with aiconix, which provides for the ORF to send data in the form of audiovisual content paired with the associated texts (e.g. transcripts) to aiconix. aiconix uses the material to train its speech recognition software. The aim is (to put it simply) that Austrian dialects are also correctly recognised and transcribed by the speech recognition software. This should eventually make it possible to automatically create subtitles and/or transcripts of content with dialects on a large scale.
Do you already use solutions from aiconix and if so, which ones?
We currently use a solution from aiconix to provide automatic live subtitles online. The software is only used for press conferences, because that setting (a quiet environment, clear pronunciation by the participants, etc.) has a positive effect on the results. We are still in a pilot phase in which we provide editorial support for the press conferences: an editor monitors the event, intervenes in the case of gross recognition errors, and corrects the words before they go online. For this purpose, the stream is delayed by a few minutes. As a next step, press conferences could be subtitled "unaccompanied" - but this will not be possible until we can "train" the software ourselves in advance with proper names (e.g. Omicron, 2.5-G rules) so that the recognition accuracy increases.
What solution enhancements / features would you like to see from aiconix in the future?
In addition to the recognition of Austrian dialects, a translation service would be desirable. It would be great to be able to translate video content into the most widely spoken languages in this country (Serbo-Croatian, Hungarian, Slovenian, Turkish, etc.) by means of subtitles, so that we can also reach members of these groups with our information.
* The STADIEM (Startup Driven Innovation in European Media) project is funded by the European Union's Horizon 2020 research and innovation programme under grant agreement no. 951981.