The steady increase in the production of audiovisual material has contributed to the adoption of live subtitling as an effective way to access audiovisual content, not only for deaf and hard-of-hearing audiences but also for viewers who need visual support to follow media content. Speech recognition (SR) technology plays a fundamental role in the provision of real-time subtitling, since it captures what is being said and automatically transforms those words into short segments of written text. As a consequence, further research is being carried out and new technological developments are emerging to produce better SR software.

Nowadays, a considerable number of SR programs are employed in the production of live subtitling, most of which require human involvement to provide subtitles, as in the respeaking technique. However, full automation of the real-time subtitling process is also being explored as a feasible option, and automatic subtitles are already being broadcast in several countries and languages, as is the case of Spanish on several local channels in Spain.

Alongside Spanish as the official language, there are four co-official languages in Spain, including Galician. Although the number of Galician speakers has declined in recent years and the presence of the language in the mass media is limited, Galician is still used as a vehicular language by a high percentage of the population of its autonomous community. With the aim of promoting audiovisual media accessibility in this region, the Galician Observatory for Media Accessibility (GALMA), in collaboration with the Multimedia Technology Group (GTM) of the University of Vigo, is currently developing SR software for live subtitling in Galician. To that end, a first prototype of an automatic SR application was developed. The pilot test of this technology involved a set of 20 audio samples from news programmes aired by the Galician public TV channel (TVG). These samples were used as input for the current version of the Galician automatic speech recognizer, and the output was analyzed following the NER model (Romero-Fresco & Martínez, 2015) in order to examine the quality of the resulting written text.
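For reference, the NER model scores live subtitling accuracy as

NER = ((N − E − R) / N) × 100

where N is the number of words in the subtitles, E the number of edition errors and R the number of recognition errors; a score of 98% is conventionally taken as the threshold of acceptable quality (Romero-Fresco & Martínez, 2015).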

This presentation, framed within the QuaLiSpain project, aims to (1) describe the functioning and potential of the automatic SR software prototype and the impact of language models on subtitle quality (Martín et al., 2018), (2) present the initial results of the research conducted to verify its efficiency, and (3) highlight the strengths and weaknesses of the software on the basis of the results obtained, with a view to improving its performance and, therefore, enhancing access to Galician audiovisual content.

References

Romero-Fresco, P., & Martínez, J. (2015). Accuracy Rate in Live Subtitling: The NER Model. In J. Díaz-Cintas & R. Baños-Piñeiro (Eds.), Audiovisual Translation in a Global Context: Mapping an Ever-changing Landscape (pp. 28-50). Palgrave Macmillan.

Martín, A. P., García-Mateo, C., Docío-Fernández, L., & Regueira, X. L. (2018). Estudio sobre el impacto del corpus de entrenamiento del modelo de lenguaje en las prestaciones de un reconocedor de habla [A study on the impact of the language model training corpus on the performance of a speech recognizer]. Procesamiento del Lenguaje Natural, 61, 75-82.