Automatic speech recognition in Spain: the Basque and Catalan case

Ana Tamayo (University of the Basque Country – UPV/EHU) and Irene de Higes Andino (Universitat Jaume I)

This contribution aims to analyze speech-to-text recognition of news programs in Basque and Catalan. It presents results from QuaLiSub (The Quality of Live Subtitling: A regional, national and international study, led by Universidade de Vigo), in which automatic speech recognition is analyzed by applying criteria from the NER model (Romero-Fresco and Martínez, 2015).

For Basque, 20 samples of approximately 5 minutes each were recorded in May 2022 from news programs on the regional channel ETB1. Since automatic live subtitling is not yet available on Basque television, the Elhuyar Foundation collaborated by generating subtitles through speech recognition for 19 of the samples (1 sample was not recognized by the program) using its ADITU technology. A total of 97 minutes and 1737 subtitles were analyzed.

For Catalan, the analysis covered 26 samples of approximately 5 minutes each from the bilingual regional news bulletin on Spanish national television (La 1). These bilingual subtitles (in Spanish and Catalan) were broadcast from April to July 2021 and recorded by TVE for quality assurance. This contribution presents the accuracy rate results for Catalan across 2116 subtitles (130 minutes in total).

The results in both languages show an average accuracy rate below the minimum threshold of 98% set by the NER model. A qualitative analysis based on the quantitative data points to room for improvement in the software's language models, particularly regarding proper nouns, punctuation, the recognition of numbers and percentages, and character identification. The conclusions show that, although the quantitative data do not reach the threshold at which the NER model considers the quality of recognition fair or comprehensible, the results seem promising. When presenters speak with clear diction and in standard language, accuracy rates are reasonable for two minority languages, Basque and Catalan, for which speech recognition software is still in the early phases of development.
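For readers less familiar with the NER model, the accuracy rate discussed above is commonly computed as (N - E - R) / N x 100, where N is the number of words in the subtitles, E the weighted edition errors and R the weighted recognition errors (errors are usually weighted by severity: 0.25 minor, 0.5 standard, 1 serious). The following is a minimal sketch of that calculation only; the function name and the sample figures are hypothetical and are not data from this study.

```python
# Sketch of the NER accuracy rate: (N - E - R) / N * 100.
# The function name and the sample figures are illustrative assumptions,
# not values taken from the Basque or Catalan corpora.

NER_THRESHOLD = 98.0  # minimum accuracy rate (percent) set by the NER model


def ner_accuracy(n_words, edition_errors, recognition_errors):
    """Return the NER accuracy rate as a percentage.

    n_words: number of words in the subtitles (N)
    edition_errors: severity-weighted sum of edition errors (E)
    recognition_errors: severity-weighted sum of recognition errors (R)
    """
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return (n_words - edition_errors - recognition_errors) / n_words * 100


# Hypothetical sample: 700 words, no edition errors,
# recognition errors adding up to a weighted score of 21.
rate = ner_accuracy(n_words=700, edition_errors=0.0, recognition_errors=21.0)
print(f"{rate:.2f}% (meets threshold: {rate >= NER_THRESHOLD})")
```

In this made-up sample the rate falls just under the 98% threshold, which mirrors the kind of result reported here: close to the benchmark, but not yet reaching it.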

Ana Tamayo is an Associate Professor at the University of the Basque Country (UPV/EHU). She obtained her BA and MA at Universitat Jaume I (Castellón, Spain), where in 2015 she also defended her PhD on captioning for d/Deaf and Hard of Hearing children. She has completed two international research stays, at the University of Roehampton (London, UK) and at the Universidad César Vallejo (Lima, Peru). Currently, she is a member of the research group TRALIMA/ITZULIK (UPV/EHU) and collaborates with TRAMA (UJI) and GALMA (Universidade de Vigo). Her research interests focus on audiovisual translation and accessibility in different modalities. She is especially interested in contributing to research on accessible filmmaking in relation to both captioning and sign language.

Irene de Higes Andino is a full-time lecturer and researcher in the Department of Translation and Communication at Universitat Jaume I (Castelló de la Plana, Spain) and a member of the research group TRAMA (Translation and Communication in Audiovisual Media). She mainly teaches audiovisual translation (voice-over, dubbing and subtitling) and audiovisual accessibility (audio description for the Blind and Visually Impaired and subtitling for the Deaf and Hard of Hearing). Her research interests focus on multilingualism, identity, audiovisual translation and accessibility.
She holds a bachelor's degree in Translation and Interpreting from Universitat Jaume I and a PhD in Translation and Interpreting from the same university, with a thesis on dubbing and subtitling multilingual films into Spanish. She has worked as a production assistant in a dubbing studio and as a freelance translator specialised in articles about cinema, dubbing and voice-over for TV, subtitling and audio description.