Author(s)
Schörkhuber Christian 1, Höldrich Robert 2, Zotter Franz 2
Affiliation
1 sonible GmbH, Graz, Austria
2 Institute of Electronic Music and Acoustics, University of Music and Performing Arts, Graz, Austria
Publication date
2020
Abstract
Introduction: Recently, Six-Degrees-of-Freedom (6DoF) audio recording and rendering approaches have been proposed to enable variable-perspective playback for a virtual listener. We assume that a distributed sound scene is simultaneously recorded by several main microphone arrays positioned at multiple perspectives. Variable-perspective rendering enables the virtual subject to freely move within and listen to the (virtual) sound scene. Main representatives of such approaches use single-perspective recordings and map them onto an outer convex hull of the room [1, 2, 3]; moreover, there are works and patents on interpolation from perspective recordings synchronously taken at multiple perspectives in the room, with parametric concepts to extract and render the sources detected therein as well as the diffuse or unlocalized parts [4, 5, 6, 7, 8]. Some works explicitly avoid short-term time-frequency filtering for artifact-free baseline rendering [9, 10, 11, 12, 13, 6, 14], which, however, may stay limited in spatial precision.

This contribution establishes a theory to ensure directionally consistent variable-perspective rendering from multi-perspective recordings. While the theory is based on physical metrics such as the intensity and energy density, as in, e.g., [15], it is meant to work for any kind of distributed-perspective spatial recordings, for instance resolution-enhanced first-order Ambisonic microphone recordings (HARPEX [16] or DirAC [17]), higher-order Ambisonic microphone recordings (Eigenmike EM32 or Zylia ZM-1), ESMA [18], or distributions of different arrays.
For a moment, let us assume that each of these recordings can consistently relate to the energy densities and intensities reproduced at the listener to describe perceived sound levels and directions. The goal of our theory is to obtain a nearly signal-independent strategy that avoids the annoying artifacts that signal and position extraction from a complex recorded scene can entail. We assume the presence of a single source and a diffuse sound field seen by any local triplet of recording perspectives. Related to triplet-based panning approaches such as [19, 20], our contribution presents a model involving linear mixing weights related to the areal coordinates, corrected by distance and diffuseness ratios. This linear combination rule for mixing neighboring recording perspectives provides a consistent intensity-vector direction at the listener. Moreover, the rendering yields reasonable and robust energy densities.
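As a rough illustration of the areal-coordinate idea behind the triplet mixing weights, the sketch below computes barycentric (areal) weights of a listener position inside a triangle of three recording perspectives. All function names and the example geometry are hypothetical, and the paper's actual weights additionally involve distance and diffuseness corrections that are not reproduced here.

```python
# Illustrative sketch only: plain areal (barycentric) weights for a
# listener position p inside a triangle of three recording perspectives.
# The paper's full model corrects these weights by distance and
# diffuseness ratios, which this sketch omits.

import numpy as np

def signed_area2(a, b, c):
    """Twice the signed area of the triangle (a, b, c) in 2D."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])

def areal_weights(p, tri):
    """Areal (barycentric) weights of point p w.r.t. triangle tri.

    Each weight is the sub-triangle area opposite a vertex, divided by
    the full triangle area; the weights sum to 1 and are all positive
    when p lies inside the triangle.
    """
    a, b, c = tri
    total = signed_area2(a, b, c)
    return np.array([
        signed_area2(p, b, c),  # weight for perspective at a
        signed_area2(a, p, c),  # weight for perspective at b
        signed_area2(a, b, p),  # weight for perspective at c
    ]) / total

# Example: listener at the centroid of three array positions.
tri = [np.array([0.0, 0.0]), np.array([2.0, 0.0]), np.array([1.0, 2.0])]
p = np.mean(tri, axis=0)
w = areal_weights(p, tri)  # -> approximately [1/3, 1/3, 1/3]
```

At the centroid all three perspectives contribute equally; as the listener approaches one array position, that array's weight tends to 1 and the others to 0, which is the behavior the linear combination rule builds on.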
Full paper
https://pub.dega-akustik.de/DAGA_2020/data/articles/000295.pdf
Keywords
6DoF audio rendering, variable-perspective sound reproduction, multi-perspective recording, spatial audio interpolation, directional consistency, energy density and intensity vector, higher-order ambisonics (HOA)