Written by Rob Bridgett.
As we work towards greater control over the mix of a video game – and by mix I refer not only to volume changes but especially to the installation and removal of DSP effects, routing, and sends – and as we gain easier access to high-level sound-culling features like HDR mixing (soon hitting the mainstream via Wwise), a clear split in technology and approach starts to emerge.
HDR, essentially an automatic (passive) mixing system, is built to do the majority of the interpretive work of deciding which sounds should be granted permission to play. The feature itself is an ingenious blend of a sound-priority system, a playback-culling system, and a mixing engine that simulates perceptual realism based on dB SPL. Compare this with the somewhat hard-to-manage but powerful (active) mixing technology of state-based mixer snapshots (also now prevalent in Wwise and FMOD Studio), which puts all of the control and decision-making in the hands of the mix implementer, and you start to see two distinct aesthetic methods, in fact two distinct approaches, at work. The first is an aesthetic of extreme realism, of vérité, almost to the point of an entirely objective listener (depending on the realism of the dB SPL tagging), where sounds are prioritized by loudness rather than emotional or narrative value*. The latter seems like a method of overriding reality: a system of sound control that is completely implementer-defined. It requires manual (or at least programmatic) installation, and it requires the designer to decide for themselves what is or is not important at any particular moment.
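The mechanics of that loudness window can be sketched in a few lines. This is an illustrative toy, not Wwise's actual implementation: the loudest active sound sets the top of a sliding window (the window width here is a made-up constant), and anything falling below the window floor is culled.

```python
# Toy HDR-style culling pass (illustrative only; not Wwise's implementation).
# Each sound carries a nominal loudness tag in dB SPL; the loudest active
# sound sets the top of a sliding window, and anything below the window
# floor loses its permission to play.

WINDOW_SIZE_DB = 50.0  # hypothetical window width

def hdr_cull(active_sounds):
    """active_sounds: list of (name, db_spl) tuples.
    Returns only the sounds loud enough to stay audible."""
    if not active_sounds:
        return []
    window_top = max(db for _, db in active_sounds)
    window_floor = window_top - WINDOW_SIZE_DB
    return [(name, db) for name, db in active_sounds if db >= window_floor]

sounds = [("explosion", 120.0), ("gunshot", 110.0),
          ("footsteps", 55.0), ("wind", 40.0)]
print(hdr_cull(sounds))  # only the explosion and gunshot survive the window
```

Note how the system is entirely passive: nothing here knows whether the footsteps are narratively important – loudness alone decides.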
An HDR-exclusive system is perhaps not a good choice for games that require even occasional subjectivity from the point of view of their characters. I would also say, having co-developed and wrestled with exclusively state-based snapshot systems, that these can be a nightmare of one-off cases in which the rules you have created fail at some point down the line, and they are of little use when it comes to complex simulations of ‘realism’, story, or player choice (however you choose to define that).
What is clear is that these two systems can absolutely work together, hand in hand, passing control from one to the other – or, to think of it another way, layering state-based, unique-case narrative control over the top of generic HDR-controlled simulation sub-systems. What I think this hybrid approach brings sound designers is complete control over a rich and complex interactive sound world: a passive system that allows an active system to grab control at moments that demand it. It allows windows of extreme realism, and the ability to quickly shift to a subjective point of view – focusing only on very specific sounds, filtering out other sounds, or subtly pushing and pulling reverb saturation to orchestrate malaise, claustrophobia, or paranoia.
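That hand-off can be sketched as a passive HDR pass that an active snapshot layer is allowed to override. All the names and the `Snapshot` shape here are hypothetical, not any engine's API:

```python
# Sketch of the hybrid idea: a passive HDR pass runs every frame, and an
# optional active "snapshot" layer (hypothetical names) gets the last word,
# forcing sounds in or out regardless of their loudness.

class Snapshot:
    def __init__(self, force_keep=(), force_cull=()):
        self.force_keep = set(force_keep)
        self.force_cull = set(force_cull)

def mix(active_sounds, snapshot=None, window_db=50.0):
    """active_sounds: list of (name, db_spl) tuples."""
    window_top = max((db for _, db in active_sounds), default=0.0)
    floor = window_top - window_db
    out = []
    for name, db in active_sounds:
        if snapshot and name in snapshot.force_cull:
            continue                      # active layer overrides: drop it
        if db >= floor or (snapshot and name in snapshot.force_keep):
            out.append(name)              # passes HDR, or actively kept
    return out

sounds = [("explosion", 120.0), ("heartbeat", 30.0), ("crowd", 75.0)]
print(mix(sounds))                                     # HDR simulation alone
print(mix(sounds, Snapshot(force_keep={"heartbeat"},
                           force_cull={"crowd"})))     # narrative override
```

Here the snapshot gets the final say: a narratively important heartbeat survives even though the loudness window would mask it, and the crowd is pushed out of the mix entirely.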
What I think is missing, however (and we have this problem in all aspects of sound production, not just mixing), is a shorthand for communicating and playing with these ideas in the context of a collaborative team. These sound techniques are of no use whatsoever unless coupled, orchestrated, and integrated as part of a narrative and visual experience. These are powerful concepts and technologies that we now have access to – so how best to employ them?
I don’t necessarily think we need a lexicon, or a new sound- or mix-centric language, because every game developer and team is different, and relying on such rules and definitions ties us to our past. What is needed, I think, are passionate, enthusiastic, socially mobile people who are able to communicate with everyone on a game team, from top to bottom.
When watching a film, and thinking about its audio-visual mix and about the narrative and point of view of its characters, I tend to think in broad strokes. These are the marks of the director, and are usually something you can see reflected in the approach of the cinematography, whether through the use of color palette, lenses, or focal depth. These are often clues to what is rippling underneath in the mix.
The hybrid mix approach allows us to bounce from simulation to point of view; to color the scene with effects; to remove sounds, or groups of sounds; to focus, filter, and mangle the entire soundtrack, or even just a single part of it. We can begin to think in broad strokes, about the different lenses through which the player (or main character) sees and hears things. Game design may call them ‘modes’, but I think there is a way to de-clutter that technical thinking and apply it more directly to narrative.
These modes are opportunities to think about perspectives and treatments of perception. One of the easiest and most common ways of talking about this with producers and art directors is in terms of ‘sound lenses’ – and the examples these other disciplines relay to me are often framed as lenses too. I find the broad strokes of a sound lens very easy to describe and demonstrate, easy to mock up offline in an audio-only, low-cost context, and quick to implement at runtime as a state-based layer of DSP effects over the top of an HDR system.
Two Types of Lens
Again, very broadly: just as there are two categories of mixing technology, passive systems and active systems, there are two categories of sound lens – though these are aesthetic categories, not tied to either of the mix-technology types per se.
An ‘objective’ lens – for example, that of a hand-held video camera, with compression, distortion, and EQ – could give a cold, remorseless eye/ear on a scene.
Switching, on the other hand, to a reverb-bathed, low-pass-filtered soundscape applied to everything but the musical score allows an entirely ‘subjective’ perspective on the same interactive environment. Apply a snapshot of convolution reverb – using not an impulse recording but a shard of ambient sound – to all in-game environmental sound, and you can produce a completely broken, disturbing perspective, perhaps emulating ears that no longer function at all, and then coordinate an instant switch back to ‘reality’.
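That ‘subjective’ treatment can be expressed as plain data: a named set of per-bus DSP targets that whatever snapshot system the engine exposes could then apply. The bus names and parameter names below are hypothetical, purely for illustration:

```python
# A 'subjective' lens sketched as data: per-bus DSP targets (bus names and
# parameter names are hypothetical, not a real engine API). Every bus
# except the musical score is bathed in reverb and heavily low-passed.

SUBJECTIVE_LENS = {
    "sfx":      {"lowpass_hz": 800.0,   "reverb_wet": 0.9},
    "ambience": {"lowpass_hz": 800.0,   "reverb_wet": 0.9},
    "dialogue": {"lowpass_hz": 800.0,   "reverb_wet": 0.9},
    "music":    {"lowpass_hz": 20000.0, "reverb_wet": 0.0},  # score untouched
}

# A neutral 'reality' lens to snap back to on every bus.
REALITY_LENS = {bus: {"lowpass_hz": 20000.0, "reverb_wet": 0.0}
                for bus in SUBJECTIVE_LENS}
```

Keeping a lens as data like this is what makes it cheap to mock up offline and quick to audition with other disciplines.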
Arguably, another element, or parameter, of the sound lens is the way you transition from one to another. This can be a straight ‘cut’ or a blend over time, though this language and type of articulation is already thoroughly discussed and written about in the practice of film editing, and of course still relies heavily on coordination with the other disciplines to create an effect.
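A ‘cut’ and a timed blend can be modelled as the same operation with different durations: interpolate every DSP parameter between the outgoing and incoming lens. A minimal sketch, with a hypothetical lens layout:

```python
# Sketch of lens transitions: a 'cut' is simply a blend with zero duration,
# and a timed blend linearly interpolates each DSP parameter between two
# lenses. (Lens layout and parameter names are hypothetical.)

def blend_lenses(lens_a, lens_b, t):
    """t in [0, 1]: 0 -> lens_a, 1 -> lens_b."""
    return {bus: {param: (1.0 - t) * lens_a[bus][param] + t * lens_b[bus][param]
                  for param in lens_a[bus]}
            for bus in lens_a}

reality    = {"sfx": {"lowpass_hz": 20000.0, "reverb_wet": 0.0}}
subjective = {"sfx": {"lowpass_hz": 800.0,   "reverb_wet": 1.0}}

cut     = blend_lenses(reality, subjective, 1.0)  # instant switch
halfway = blend_lenses(reality, subjective, 0.5)  # one frame of a slow blend
print(halfway)  # {'sfx': {'lowpass_hz': 10400.0, 'reverb_wet': 0.5}}
```

In practice each parameter might want its own curve (a filter cutoff usually blends better logarithmically than linearly), which is exactly the kind of articulation film editing language already describes.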
Using all of these perspective changes in concert with transition times gives the game an underlying structure, or drama. It allows the creative team to think about moments, about contexts, and about what the other sub-systems and meta-game systems at work under the hood mean for the player.
This culmination of technologies – of enriched runtime mixing systems and state-based, DSP-rich processing – gives such powerful control to the sound designer that the real challenge is to leverage it fully in the service of narrative, and to orchestrate these lenses to powerful effect. This can only be done by seeding the ideas into game design, narrative design, and art direction, and by listening to and coordinating all these elements together into a single experience. All of them are at the service of the experience, and the technology underneath, while fascinating and exhilarating to discuss among technically minded implementers, should always be invisible to the player.
The notion of a mix is starting to expand for me into other areas beyond audio, and when you begin to think about mixing and transitioning lenses and broad emotional states, there seem to be a lot more creative possibilities for collaboration on the actual sound mix.
*This is not to say that you cannot tag the dB value of each sound in a narrative way rather than a realistic one.