The quality of automatic and human live captions in English – and beyond

Pablo Romero Fresco (Universidade de Vigo) and Nazaret Fresno (University of Texas-Rio Grande Valley)

Closed captions play a vital role in making live broadcasts accessible to many viewers. Traditionally, stenographers and respeakers have been in charge of their production, but this scenario is changing due to the steady improvements that automatic speech recognition has experienced in recent years. This technology is being used to create intralingual live captions, and broadcasters have begun to explore its potential use. Human and automatic captions now co-exist on television and, while some research has focused on the accuracy of human live captions, comprehensive assessments of the accuracy and quality of automatic captions are still needed. This presentation will tackle this issue by introducing the main findings of the largest study exploring the accuracy of automatic live captions conducted to date. Through four case studies including approximately 17,000 live captions analysed with the NER model from 2018 to 2022 in the UK, the U.S. and Canada, this presentation will track the recent developments of automatic captions, compare their accuracy to that achieved by humans and wrap up with a brief discussion of what the future of live captioning looks like for both human and automatic captions.

Beyond this, and within the framework of the Spanish-government-funded Qualisub project, the presentation will end by addressing the initial findings of two related studies: the automation of the NER model and the quality assessment of two workflows used by the European Parliament to provide live interlingual captions (a completely automatic workflow and one involving simultaneous interpreting and automatic speech recognition). These findings help to shed light on how live speech-to-text communication is likely to develop in the near future.

Pablo Romero Fresco is senior lecturer at Universidade de Vigo (Spain) and Honorary Professor of Translation and Filmmaking at the University of Roehampton (London, UK). He is the author of the books Subtitling through Speech Recognition: Respeaking (Routledge), Accessible Filmmaking: Integrating translation and accessibility into the filmmaking process (Routledge) and Creativity in Media Accessibility (Routledge, forthcoming). He is on the editorial board of the Journal of Audiovisual Translation (JAT) and is the leader of the international research group GALMA (Galician Observatory for Media Access), for which he is currently coordinating several international projects on media accessibility and accessible filmmaking. Pablo is also a filmmaker. His first short documentary, Joining the Dots (2012), was used by Netflix as well as film schools around Europe to raise awareness about audio description. He has just released his first feature-length documentary, Where Memory Ends (2021), which has been selected for the London Spanish Film Festival and the Seminci, in Spain.

Nazaret Fresno

Nazaret Fresno is Assistant Professor at the University of Texas at Rio Grande Valley (United States), where she teaches a variety of courses in translation and interpreting. Before joining academia, she worked as a freelance translator and opera audio describer for several years. Her research interests include audiovisual translation and media accessibility, and she is particularly interested in exploring how audio-described and closed-captioned audiovisual products are received, comprehended and enjoyed by end users.

After participating in several research projects that focused on subtitling for the deaf and hard of hearing and audio description for the blind and visually impaired in Europe, Nazaret is now investigating closed captioning in the United States, both in pre-recorded materials and, especially, in live programming. She is also involved in the QuaLiSpain study, which is aimed at assessing the quality of live subtitling on Spanish television. She is a member of ATISA (American Translation and Interpreting Studies Association).