Influence Of Stand Configurations On Ecological Validity Of Audiovisual Recording Systems

Author(s)

Guastamacchia Angela 1, Puglisi Giuseppina 2, Bottega Andrea 1, Shtrepi Louena 1, Riente Fabrizio 3, Astolfi Arianna 1

Affiliation

1 Department of Energy (DENERG), Politecnico di Torino
2 Campus Management, Logistics and Sustainability, Politecnico di Torino
3 Department of Electronics and Telecommunications, Politecnico di Torino

Publication date

2023

Abstract

Auditory research has recently stressed the importance of maximizing the ecological validity of the laboratory tests used for assessing the degree of hearing loss, enhancing hearing aids, and fitting them. Simulations and in-field recordings of spatialized audiovisual scenes that reproduce complex acoustic environments are therefore being integrated into these tests. However, because of the complicated interplay between human visual and auditory perception, capturing synchronized audiovisual scenes through in-field recordings can be difficult. In particular, ecologically valid, high-quality audio-video footage requires audio-video coherence, which occurs when the origins of the visual and acoustic scenes match. Yet most available recording devices embed high-resolution 360° stereoscopic cameras but support spatial audio recording only up to first-order ambisonics, so two different devices often have to be placed on the same stand, one for video footage and one for higher-order ambisonics audio acquisition. This work proposes a method for choosing the best stand configuration for a case study involving the Zylia ZM-1 3rd-order ambisonics microphone and the Insta360 Pro or Insta360 ONE X2 camera, based on the evaluation of (i) the influence of the camera on the captured sound field and of the microphone on the visual field, and (ii) the degree of quasi-coincidence between the ZM-1 and the camera.

Full paper

https://iris.polito.it/handle/11583/2980450

Keywords

audio-visual coherence, ecological validity, ambisonics recording, spatial hearing assessment, multimodal scene capture