Automatic speech-to-text technologies, also known as Automatic Speech Recognition (ASR), are likely to transform captioning and media accessibility over the next five to seven years. As well as fundamentally changing how captions are produced and delivered, these technologies offer the opportunity, by reducing costs, to increase the amount of captioning across global media and communications.

Understanding how the media industry can optimise the benefits of ASR and minimise the risks or negative impact on audiences could have significant commercial, social and reputational impact. This presentation will first look at the reasons for thinking ASR technologies will become central to accessibility captioning. It will do this by looking at the rate of ASR development, by directly comparing automatic caption quality metrics with those of existing methods, and by considering the commercial motivations for using the technology.

I will then expand on the frameworks for understanding caption quality – analytical, commercial and pragmatic – and the implications of deploying Automatic Speech Recognition within these frameworks.

This section will also identify the key stakeholders in captioning, recognising that quality can mean different things to different people and organisations.

“Stakeholders” can sound abstract, so to be clear: the term covers the core audiences for captioning and its wider users, as well as wider commercial and regulatory captioning interests, such as broadcasters, providers and regulators. The section will also identify ‘early adoption’ use cases, and use cases where more caution is likely to be necessary.

I will then look at what modelling these frameworks of quality can tell us about how the uptake of ASR may be accelerated or delayed.

I will suggest that the time is right for ASR-driven captioning. This will provide the basis for conclusions on how we can optimise the overall benefits across all stakeholders, and on what we need to do to ensure these benefits are realised.