Knowledge Base

Impact of Contextual and Lip-Sync-Related Visual Cues on Speech Intelligibility through Immersive Audio-Visual Scene Recordings in a Reverberant Conference Room

Author(s)

Guastamacchia Angela 1, Galletto Andrea 2, Riente Fabrizio 3, Shtrepi Louena 1, Puglisi Giuseppina Emma 4, Albera Andrea 5, Astolfi Arianna 1

Affiliation

1 Department of Energy, Politecnico di Torino
2 Department of Control and Computer Engineering, Politecnico di Torino
3 Department of Electronics and Telecommunications, Politecnico di Torino
4 Campus Management, Logistics and Sustainability, Politecnico di Torino
5 Department of Surgical Sciences, Università degli Studi di Torino

Publication date

2024

Abstract

Recent hearing research has benefitted from the latest virtual reality systems, which enable the reproduction of immersive audio-visual scenarios for more ecologically valid listening tests. Considerable effort has gone into identifying the aspects that convey actual ecological validity, in particular the effects of visual cues and self-motion on speech intelligibility, mostly through tests based on simulated scenes. However, work is still needed on scenes built from real recordings inside reverberant environments. This study used 3rd-order Ambisonics recordings and stereoscopic 360° videos captured in a reverberant conference hall to create three virtual audio-visual scenes in which speech intelligibility tests were performed, with informational noise introduced from different angles. A 16-loudspeaker spherical array synchronized with a head-mounted display was used to administer the immersive tests to 50 normal-hearing subjects. First, tests composed only of the auditory scenes were compared, on the basis of the achieved scores, with tests that also provided contextual and positional source-related visual cues, both with and without self-motion, for a total of four test configurations. Then, to complete the investigation of the impact of visual cues on speech intelligibility, ten normal-hearing subjects were recruited to perform audio-visual tests incorporating lip-sync-related visual cues for the target speech.

Full paper

https://www.ingentaconnect.com/content/ince/incecp/2024/00000270/00000002/art00022

Keywords

speech intelligibility, immersive audio-visual environments, ambisonics recordings, visual cues in acoustics, virtual reality listening tests