How to: Immersive Experience training : 5 – Immersive audio

Immersive Audio

.Sound is the only sense experienced in full 360… as awareness
.Not Dolby 5.1 surround, but spatialised sound that allows us to pinpoint the source
.Surround format is 5.1 (also 7.1). 5.1 means there are 6 speakers present, with the .1 representing a sixth, low-frequency speaker.
.Spatialised sound: captured as such or manipulated in the studio

Beyond the distinction between audio captured with spatial properties and audio synthesised to obtain spatial properties, there are two further main techniques employed in immersive sound production: Binaural and Ambisonics.

Stereo audio was panned or placed sonically within what we could visualize as a two-dimensional horizontal plane.

Holophonic sound is a term sometimes used for spatial audio, which has been the next great paradigm shift, towards 3D. We pan sounds within a sphere!

The third dimension means that height is available: audio sources can be placed anywhere within the sphere, and the format is speaker agnostic.

Previous audio formats like stereo or surround sound have been defined by the number of speakers present at playback: 2 for stereo and 6 for 5.1 surround, where the .1 is in fact its own dedicated low-frequency speaker. Spatial audio formats can be speaker agnostic and scalable, meaning the audio can be placed more freely and with great precision within a spherical representation, and that positioning can, theoretically, remain the same whether there are 16, 22, 36 or any larger number of speakers in the playback system.

7.1 surround

Spatial Dolby Atmos

https://www.darkfield.org/home

AMBISONICS is speaker agnostic – WXYZ channels carry direction

RODE Ambisonic mic

https://www.gear4music.com/Recording-and-Computers/Rode-NT-SF1-Ambisonic-Microphone/2FP8?origin=product-ads&gclid=Cj0KCQjw16KFBhCgARIsALB0g8LcWAoqzRafZFsRwqO4QT2rPSDxIyBNkh4HyUybsT1pro4LqrU_S40aAmvJEALw_wcB

Ambisonics is a spatial audio format that aims to represent the sound field at any position, or from any perspective in space. When we use the term “sound field” this concept is generally visualised as a full 3-Dimensional sphere with the position of the listener directly in the centre, but it just means the full world of sound surrounding us.

Unlike previous surround sound formats, Ambisonics includes height information to recreate true 3D audio. An Ambisonic microphone captures this with four capsules, creating a combination of 4 recorded audio tracks that are described as “A-Format”. This raw A-Format audio needs to be converted into “B-Format” in order to be useful, however. Realistically, for the vast majority of immersive experience productions, B-Format is the only type of Ambisonics that will be used.
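As a rough illustration of what the A-Format to B-Format conversion involves, the sketch below applies the standard sum-and-difference matrix for a tetrahedral four-capsule microphone. The capsule order (front-left-up, front-right-down, back-left-down, back-right-up), the 0.5 scaling and the function name `a_to_b_format` are assumptions for illustration only. Real converter plug-ins also apply filtering to correct for the spacing between capsules, which is left out here.

```python
def a_to_b_format(flu, frd, bld, bru):
    """Convert one sample from each of four tetrahedral capsules to (W, X, Y, Z).

    Assumed capsule order: front-left-up, front-right-down,
    back-left-down, back-right-up. The 0.5 scaling is one common
    choice; scaling conventions vary between tools.
    """
    w = 0.5 * (flu + frd + bld + bru)  # sum of all capsules: omnidirectional
    x = 0.5 * (flu + frd - bld - bru)  # front capsules minus back capsules
    y = 0.5 * (flu - frd + bld - bru)  # left capsules minus right capsules
    z = 0.5 * (flu - frd - bld + bru)  # upper capsules minus lower capsules
    return w, x, y, z


# A sound picked up only by the two front capsules ends up in W and X.
print(a_to_b_format(1.0, 1.0, 0.0, 0.0))
```

In practice this step is done by a plug-in on whole audio files rather than sample by sample in a script, but the arithmetic is the same idea.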

The four channels in B-Format are named W, X, Y and Z. X, Y and Z represent the three axes of 3-Dimensional space (front-back, left-right and up-down respectively), with W, the omnidirectional component, at the centre.
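To make the four channels more concrete, here is a minimal sketch of how a single mono sample could be encoded into first-order B-Format for a given direction. The function name `encode_b_format`, the angle convention (azimuth measured counter-clockwise from straight ahead, so positive values are to the left) and the traditional 1/√2 scaling of W are assumptions; other tools may use different conventions.

```python
import math


def encode_b_format(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into first-order B-Format (W, X, Y, Z).

    Assumed convention: azimuth 0 is straight ahead and increases
    to the left; elevation 0 is ear height and increases upwards.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2)                 # omnidirectional, traditional scaling
    x = sample * math.cos(az) * math.cos(el)  # front-back axis
    y = sample * math.sin(az) * math.cos(el)  # left-right axis
    z = sample * math.sin(el)                 # up-down axis (the height information)
    return w, x, y, z


# A source straight ahead at ear height only excites W and X.
print(encode_b_format(1.0, 0.0, 0.0))
```

Note that nothing in these four numbers refers to a speaker layout, which is what "speaker agnostic" means in practice: the decoding to speakers or headphones happens later.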

Because the format is speaker agnostic and can be played back over loudspeakers, Ambisonics is very well suited to “in person” experiences or those intended for more than one person.

Deep listening exercise …

BINAURAL SOUND – dummy ears microphones

VR Binaural Audio has established itself as the main driving force behind the audio component of immersive experiences.

3Dio Free Space Binaural Microphone

£400

The defining feature of Binaural is that it attempts to replicate how humans hear naturally and to “hijack” our brain’s audio processing. The way that our brain processes the sounds around us is very dependent on the positioning of our ears and the differences between the soundwaves reaching our two separate ear drums. When we hear a sound to our left, the sound wave generated reaches our left ear before it reaches our right. This Interaural Time Delay (ITD) gives our brain some foundational information in order to start determining the position of the sound’s source.
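To give a sense of how small this delay is, the sketch below estimates it with Woodworth's spherical-head approximation, ITD = (r / c) × (θ + sin θ). The head radius of 8.75 cm, the speed of sound of 343 m/s and the function name are assumptions for illustration, and a real head is not a perfect sphere.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius in metres
SPEED_OF_SOUND = 343.0   # metres per second, air at roughly 20 °C


def interaural_time_delay(azimuth_deg):
    """Approximate ITD in seconds for a source between -90 and 90 degrees.

    Assumed convention: 0 is straight ahead and positive angles are to
    the left, so a positive result means the left ear hears it first.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))


for angle in (0, 30, 60, 90):
    print(angle, "degrees:", interaural_time_delay(angle) * 1000, "ms")
```

Under these assumptions a sound directly to one side arrives roughly 0.66 milliseconds earlier at the nearer ear, which is the scale of difference the brain is working with.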

In addition to this, all sound waves deform based on the geometry they encounter on their path towards the ears. This includes, importantly, the geometry of our own bodies. The shape of our ears, shoulders and head affects all sound waves in different ways before they reach the eardrum. Essentially, certain frequencies are boosted or attenuated by this upper body geometry. This “filtering” of the sound waves by our own body shape is another factor that the brain takes into consideration when figuring out a sound’s origin.

Binaural recording, then, attempts to “trick” the brain, by faking the distance between the ears, and the geometry of our bodies. Put simply, it places two microphones (one for each ear) in such a way that the microphone capsule surface is placed exactly where our eardrums would be.

This can be done by placing microphone capsules in real human ears, using recording equipment that looks like standard bud earphones. It can also be done by placing microphone capsules within a “fake” pair of ears, using something called a Dummy Head Microphone.

In order to play back Binaural Audio, headphones are vital. We want the left and right speakers carrying the audio feeds to be as close to our actual eardrums as possible, without any bleed between ears. Unlike Ambisonics, Binaural doesn’t work through loudspeakers.

Sanctuaries of Silence

Dynamic audio – responds to a listener’s head movement in real-time VR experiences with an HMD

Reactive and adaptive audio

Unity audio does that… plus SDKs for further dynamism

Leads to Binaural synthesis – spatialising any sound through a process called convolution and then making it dynamic (in games)

Binaural Synthesis lets us spatialise any audio, even if it hasn’t been captured spatially with specialist microphones. Essentially, any mono audio source, no matter its origin, can undergo the process of Binaural Synthesis and become spatialised.

Through a process called convolution, the feed from a standard audio file can be combined with a kind of “silence” that contains spatial information, imbuing the standard audio feed with that spatial information.

Head Related Transfer Functions (HRTFs) are a mathematical representation of a sound in relation to our head position at any one time. This abstract concept is difficult to understand, and the Head Related Impulse Responses (HRIRs), the original “silence” from which HRTFs are created, even more so. Essentially, there is a way to manipulate audio by doing complex calculations that synthesise the effect of a point in space. With this synthesis complete, any old sound recorded with any old microphone can be placed spatially as if it originated from any point in space around us. If you’d like to try and understand these concepts further, please do make use of the resources in the final section.
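A toy sketch may help show what convolution does here: the same mono signal is convolved once with an impulse response for the left ear and once with one for the right ear. The impulse responses below are made-up numbers, NOT measured HRIRs, chosen only so that the right ear receives the sound later and quieter, as if the source were on the left. The function names are hypothetical as well.

```python
def convolve(signal, impulse_response):
    """Plain convolution: every input sample triggers a scaled copy of the response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def binaural_synthesis(mono, hrir_left, hrir_right):
    """Return (left, right) ear signals for one mono source."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses (illustrative only, not real measurements):
# the left ear hears the sound immediately, the right ear two samples
# later and at half the level.
hrir_left = [1.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.5]

mono = [1.0, 0.5, 0.25]
left, right = binaural_synthesis(mono, hrir_left, hrir_right)
print("left :", left)
print("right:", right)
```

Real HRIRs are hundreds of samples long and also encode the frequency filtering of the ears, head and shoulders, so production tools use much faster convolution methods than this double loop.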

There are two ways to apply Binaural Synthesis:

1. The first is through the use of a “binaural panner”: limited, but good for non-interactive work.
2. The second is the audio processing within a game engine, which allows us to apply the same binaural synthesis techniques that a binaural panner would but, vitally, lets us update them constantly. This means that in any production made with a game engine it’s possible to reposition an audio source in space relative to head movements, every frame. So if you’re using head tracking, as with most Head Mounted Displays (HMDs), you can keep sound sources persistent with our shifting perspective.

Dynamic Binaural Synthesis allows sound source positioning to move alongside our real world head movement, maintaining immersion. The ability to update synthesis as our head moves is extremely important.
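The per-frame update can be sketched as a small calculation: given where a source sits in the world and which way the head is currently facing, work out the direction of the source relative to the head, then hand that direction to the binaural synthesis. The function name, the 2D simplification and the axis convention (x forward, y to the left, yaw measured counter-clockwise) are assumptions for illustration, not how any particular game engine exposes it.

```python
import math


def relative_azimuth(source_xy, listener_xy, head_yaw_deg):
    """Angle of the source relative to where the head is facing, in degrees.

    Assumed convention: x is forward, y is left, positive angles are to
    the left. The result is wrapped into the range [-180, 180).
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    world_azimuth = math.degrees(math.atan2(dy, dx))
    relative = world_azimuth - head_yaw_deg
    return (relative + 180.0) % 360.0 - 180.0


# Simulated frames: the source stays put while the head turns to the left,
# so the source drifts towards the listener's right (negative angles).
source = (1.0, 0.0)
listener = (0.0, 0.0)
for head_yaw in (0.0, 45.0, 90.0):
    print(head_yaw, relative_azimuth(source, listener, head_yaw))
```

A binaural panner in a timeline editor effectively fixes the head yaw at zero, which is why it suits non-interactive work.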

Spatial Capture vs. Spatial Synthesis

Use a Binaural Panner
Check the output plug-in in Logic, and the plug-in you downloaded
https://en-uk.sennheiser.com/ambeo-orbit
Free specialised plugin
https://freesound.org
https://info.dear-reality.com/en/download-dearvr-micro-for-free?utm_campaign=amb-mi_gbr_en&utm_source=amb-mi_gbr_en&utm_medium=button&utm_content=micro
dearVR MONITOR

Recording:

Binaural microphones – these place microphone capsules near the opening of our ears:

For Ambisonics, two solutions currently dominate the marketplace, manufactured by Rode and Zoom respectively.

Editing:

Importantly, for Binaural Audio, we want to do as little processing through effects as possible. The phenomenon of tricking the brain through Binaural is very fragile, and since it depends so heavily on the frequencies of the sound in both ears, the effect is easily ruined. Ideally, we want to import binaural recordings and do absolutely nothing to them, beyond cutting / trimming.

For Ambisonics, however, we need a very important plug-in to carry out the conversion from “A-Format”, as it is recorded, to “B-Format”. Rode, who manufacture the Soundfield microphone detailed above, have created a free plug-in that does exactly this:

Soundfield by Rode Plug-In

https://en.rode.com/soundfieldplugin

Designing a Soundscape

Get a pad and pen, or an image editing program like Photoshop. In this editing software, open up an image of a 3D panner. We’ve attached one here as a PDF that you can download and print off if you prefer to draw. You’ll notice it looks like the 3D sound panner we used in VIDEO 4. If you’re using a pen and paper, draw the same shape.

Try to note the correct elevation for each sound source as well. Remember that height is the third dimension that differentiates spatial audio from traditional surround formats. The column alongside the circular panner diagram can be used for this, as in the example PDF.

If we were preparing to recreate this soundscape within a game engine, for instance, we would use this sound list to create an asset list of sounds that we needed to record. We would then place these sounds within the scene, where we had plotted them in relation to each other.
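As a sketch of that last step, a plotted soundscape can be written down as a small list of sources, each with a direction, an elevation and a distance, and then turned into an asset list and into positions for a scene. The sound names, angles, distances and class name below are invented for illustration, and the axis convention (x forward, y left, z up) is an assumption that would need adapting to the game engine in use.

```python
import math
from dataclasses import dataclass


@dataclass
class SoundSource:
    name: str             # what we heard (hypothetical examples below)
    azimuth_deg: float    # angle on the circular panner, 0 = straight ahead
    elevation_deg: float  # height above (+) or below (-) ear level
    distance_m: float     # rough estimate of how far away it is

    def position(self):
        """Convert the plotted direction into (x, y, z) relative to the listener."""
        az = math.radians(self.azimuth_deg)
        el = math.radians(self.elevation_deg)
        horizontal = self.distance_m * math.cos(el)
        return (horizontal * math.cos(az),
                horizontal * math.sin(az),
                self.distance_m * math.sin(el))


# Hypothetical soundscape noted down during a listening exercise.
soundscape = [
    SoundSource("birdsong", 45.0, 30.0, 5.0),
    SoundSource("traffic", 180.0, 0.0, 20.0),
    SoundSource("birdsong", -60.0, 40.0, 8.0),
]

# The asset list is just the distinct sounds that need to be recorded.
asset_list = sorted({source.name for source in soundscape})
print(asset_list)

for source in soundscape:
    print(source.name, source.position())
```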

Francesca Panetta is an industry leader in the world of immersive and experimental storytelling. As an immersive artist and journalist she uses emerging technologies to innovate new forms that have social impact.

https://www.francescapanetta.com/portfolio/6×9/
6×9: a virtual experience of solitary confinement.
Editing process of sound — speak directly to the audience
Hard to absorb in VR — leave a lot of silence
Headset editing —
Enough sounds –

Rode have also created a fantastic source of ambisonic recordings with their Sound Library:

Oculus also offer a newer and even more detailed guide to VR Audio Design which also does a fantastic job of explaining Spatial Audio fundamentals, with good visual aids:

Unity Learn: Audio
Oculus Spatial Audio for Cinematic VR and 360 Videos
Oculus VR Audio Design, Engineering and Mastering Guide
https://creator.oculus.com/learn/spatial-audio/