Written by Raison Varner.
What does it mean to be “next gen”?
As I write this, the first round of excited customers are opening their new Xbox One and PS4 consoles. We’re here! The next generation is upon us! So – as follows with any console generation transition – we are all getting asked what we can do to be “next gen.”
The 8 gigabytes of memory that the new consoles offer is certainly a generous improvement over the previous generation’s 512 MB. But will that memory increase lead to an improvement in audio format quality that in and of itself would be impressive enough to declare next gen status?
The next hallmark in audio fidelity that could be considered “next gen” will likely be the abandonment of compression altogether. However, it’s unlikely that even with 8 gigabytes of memory, we will see audio memory budgets large enough to allow uncompressed audio to become the new normal. My guess is that we’ll have to wait out this generation before uncompressed audio becomes a new standard.
If that is true, then we’ll need to look to different areas to find our next gen bona fides.
Perhaps those bona fides lie within the pursuit of a true mixing AI? This has always been at the top of my list of dream projects, and if successfully done, is probably one of the single biggest tools we would have to begin to approach the quality of a film mix. But budgets being what they are, this is probably an effort beyond any but the most lavishly funded audio departments.
However, that doesn’t mean we can’t start building the foundation now!
After all, a mixing AI is a search for context: a frame-to-frame, real-time understanding of what the player is experiencing, matched with corresponding changes in our games. To that end, I believe that building emotional awareness will be the true hallmark of a “next gen” video game.
The games we remember best are the games that succeeded in establishing strong emotional connections with us. Titles that succeeded in that effort are the real gems of our industry and it is within those emotional connections that we will find the greatness of our art.
Music as the first battleground for context
My first forays into contextually-aware systems have always involved dynamic music systems. Not only is music one of the most easily identifiable elements of any game, it is also one of the strongest tools we have to tug on people’s emotional strings. It’s hard to think of other methods that are as universally accessible and have the same power to improve an emotional connection or evoke an emotional response as music.
The games that do dynamic music the best are the ones that are able to get away from a 1:1 relationship with basic game state changes. We are all familiar with combat music systems that are married to game state changes so literally that they become enemy telegraph systems. This level of implementation meets the minimum need for a change in tone during combat, but when a systemic music change is guaranteed, it becomes a redundant layer of messaging that impoverishes its emotional power. The player doesn’t need the musical cue to understand that they are in a fight and the utterly predictable nature of the music change ends up ruining what emotional impact it could have otherwise had.
Nuance is often treated as a luxury in game development, but without nuance, a game will never reach its true emotional potential. Without nuance, we’re left with a detached and repetitive mental exercise. While this can sometimes satisfy the needs of many projects and players alike, our most ambitious titles will have to explore greater domains than the purely intellectual.
So how do we begin to bridge that emotional gap? How do we account for the myriad situations that can develop unpredictably and reject the meaningless for the meaningful?
Building the Foundation of a Contextualized System
Identifying your context
- To use Borderlands 1 and 2 as an example, I tried to approach the problem of redundant combat music by focusing on the context of the player and the level of intensity they were experiencing at any given moment. For other games, this could be any subjective value or critical theme that makes the game unique. For Borderlands, the quality that seemed most important to key off of was intensity.
- Developing a system that was aware of when the player felt the game was most intense necessitated a way to reject the meaningless and unimportant fights for the more challenging and meaningful engagements.
- In order to get that balance right, a lot of different variables needed to be tracked and interpreted. As those variables became identified, edge cases emerged that needed to be tamed with additional controls and constraints.
- In Borderlands, this resulted in an experience where players that are over-leveled and unchallenged will generally not hear combat music until they reach a point in the game at which power levels between the player and enemies begin to even out and the player is once again introduced to threatening situations.
Establishing a Context Pool
- Making a determination about context first requires that the system be capable of recording multiple key values over time and distilling these values down to a single variable. I’ll refer to this single variable as our “context pool.” By monitoring a context pool over time, we can establish a delta which informs us about the rate of change in key values. Once we understand the rate of change, we can then use that delta to form some conclusions about the behavior of the context pool.
- In other words, you can use the rate of change to understand sudden spikes of the context pool vs slower increases over time. This opens up possibilities of reacting differently to slow vs. fast changes in your key values, providing some differentiation we would have lacked without the delta.
- To give another Borderlands example where I was focused on intensity…
- Three new monsters spawning and then attacking the player over 20 seconds is a very slow ramp up in intensity (or possibly none at all).
- Three new monsters spawning and then attacking the player over 3 seconds is a much more aggressive ramp up in intensity.
- If I were only keying into the state change from [no combat] -> [combat], these two situations would appear to be identical scenarios – but they are actually quite different situations and are best served with different approaches.
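The difference between those two encounters can be sketched in a few lines of Python. This is only an illustration of the delta idea, not the shipped Borderlands code: the function names, the 1.0 threat per monster, and the `FAST_RAMP_PER_SECOND` cutoff are all assumptions made up for the example.

```python
def rate_of_change(samples):
    """Return the average change in pool value per second.

    `samples` is a time-ordered list of (time_in_seconds, pool_value) pairs.
    """
    if len(samples) < 2:
        return 0.0
    (start_time, start_value), (end_time, end_value) = samples[0], samples[-1]
    elapsed = end_time - start_time
    if elapsed <= 0:
        return 0.0
    return (end_value - start_value) / elapsed


# Hypothetical cutoff: anything at or above this rate counts as a sudden spike.
FAST_RAMP_PER_SECOND = 0.5


def classify_ramp(samples):
    """Label a series of pool samples as a 'fast' or 'slow' ramp."""
    return "fast" if rate_of_change(samples) >= FAST_RAMP_PER_SECOND else "slow"


# Three monsters (an assumed 1.0 threat each) joining over 20 s vs. over 3 s.
slow_encounter = [(0.0, 0.0), (10.0, 1.0), (15.0, 2.0), (20.0, 3.0)]
fast_encounter = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]

print(classify_ramp(slow_encounter))  # slow
print(classify_ramp(fast_encounter))  # fast
```

Both encounters end at the same pool value, so a plain [no combat] -> [combat] check could not tell them apart; only the rate separates them.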
An Alternative to Using the Delta
- If the rate of change itself becomes a problematic control for the context pool, another way to approach interpretation of your key values is to consider your context pool to be an upwards pressure on the system. If a constant downward pressure is then applied that operates at a fixed rate, you can use that downward pressure to institute a control against a slow increase in the context pool. For Borderlands, this approach became ideal for us because we wanted to generally ignore ramps in intensity over long periods of time.
- To display that in flow chart format, here is how our high-level logic worked in Borderlands 1 & 2 for determining when to play combat music:
Building the Music System for Borderlands 1 & 2
Now that we have an overview of an approach to understanding context, let’s explore how these ideas were applied to the Borderlands universe and some of the logic or thinking behind the decisions that led to the music system in both titles.
Establishing key values
Key values had to be chosen that inform us of intensity. Since intensity revolved solely around combat in Borderlands, the bulk of our key values came from the enemies themselves. Some of these values were:
- Enemy level vs. player level
- Inherent threat/challenge of enemy type
- The badass rank of an enemy (Badasses = Elite Monsters)
- Number of enemies
- Player health
- Shield health
- Time since last combat
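As a rough illustration of how enemy-side key values might be distilled into one number, here is a minimal Python sketch. The `Enemy` fields, the 0.25-per-level scaling, and the 3x badass multiplier are invented for the example; the actual weights used in Borderlands are not given here. Player health, shield health, and time since last combat are left out because this article does not say how they were weighed.

```python
from dataclasses import dataclass


@dataclass
class Enemy:
    level: int
    base_threat: float       # inherent threat of the enemy type (assumed scale)
    is_badass: bool = False  # elite variant


def enemy_threat(enemy, player_level):
    """Combine level difference, enemy type, and badass rank into one value."""
    # Hypothetical scaling: each level of difference shifts threat by 25%,
    # and heavily out-leveled enemies bottom out at zero threat.
    level_factor = max(0.0, 1.0 + 0.25 * (enemy.level - player_level))
    badass_factor = 3.0 if enemy.is_badass else 1.0
    return enemy.base_threat * level_factor * badass_factor


def encounter_threat(enemies, player_level):
    """Number of enemies is accounted for by summing individual threats."""
    return sum(enemy_threat(enemy, player_level) for enemy in enemies)


skags = [Enemy(level=10, base_threat=1.0) for _ in range(5)]
print(encounter_threat(skags, player_level=10))  # 5.0
print(encounter_threat(skags, player_level=14))  # 0.0 (over-leveled player)
```

With weights like these, an over-leveled player contributes almost nothing to the pool, which matches the behavior described earlier where unchallenged players generally do not hear combat music.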
Setting an upwards threshold
In Borderlands 1, I found that on average, five basic enemies of equal level (skags in this case) was a point at which an average player would begin to experience some pressure. The sum value of five skags became my initial reference point for what constitutes a dangerous situation. This value then became the upwards threshold and was tweaked slightly over time.
Deciding against keying into acceleration of threat
Originally, I thought that it would be really cool to have different changes in music based on how fast the context pool changed. If we keyed into the acceleration of threat, we could accent a slow intensity ramp differently vs. a fast build. But when considering budget and time constraints, it became apparent that we didn’t realistically have the resources to support that level of detail. Additionally, it would have complicated the writing process. So to simplify things, we used the raw threat value in the context pool instead of the delta of threat.
The importance of the decay rate
When imagining how the system would work, it became apparent that once an enemy added its threat value to the context pool, we couldn’t just remove it when they died. Even though dead enemies are technically no longer threats, removing their threat outright could make the context pool spike or plummet so often that much of our data would be unusable. Instead, threat drained out of the pool at a fixed decay rate.
The decay rate also served as a control against sudden loss of threat within a group of enemies. This meant that in a situation where 3/4 of an enemy group died within a small window of time (not uncommon in 4-player Co-op), we would not lose the combat music.
The converse situation where enemies gradually add threat over time was also offset by the decay rate. If the rate of incoming enemies couldn’t exceed the decay rate for an entire combat encounter, then the player was experiencing a relatively relaxed fight and didn’t hear combat music.
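A small simulation makes the effect of the decay rate easier to see. This is a sketch under assumed numbers (1.0 threat per enemy, 0.2 threat drained per second, half-second ticks), and the class and function names are hypothetical. Note that the pool has no method for removing a dead enemy's threat; the decay is the only way down.

```python
class ContextPool:
    """Threat is only ever added; a fixed decay is the sole downward pressure."""

    def __init__(self, decay_per_second):
        self.decay_per_second = decay_per_second
        self.value = 0.0

    def add_threat(self, amount):
        self.value += amount

    def update(self, dt):
        # Drain at a fixed rate and never drop below empty.
        self.value = max(0.0, self.value - self.decay_per_second * dt)
        return self.value


def peak_pool_value(spawn_times, threat_per_enemy, decay_per_second,
                    duration, dt=0.5):
    """Step the pool through an encounter and return the highest value seen."""
    pool = ContextPool(decay_per_second)
    pending = sorted(spawn_times)
    peak = 0.0
    for step in range(int(round(duration / dt)) + 1):
        now = step * dt
        while pending and pending[0] <= now:
            pool.add_threat(threat_per_enemy)
            pending.pop(0)
        peak = max(peak, pool.value)
        pool.update(dt)
    return peak


# Same three enemies, arriving slowly vs. quickly (all values are assumptions).
slow_peak = peak_pool_value([0.0, 10.0, 20.0], 1.0, 0.2, duration=25.0)
fast_peak = peak_pool_value([0.0, 1.5, 3.0], 1.0, 0.2, duration=25.0)
print(round(slow_peak, 2), round(fast_peak, 2))
```

In the slow case each enemy's threat has drained away before the next one arrives, so the pool never climbs past a single enemy's worth; in the fast case the contributions stack and the peak is more than twice as high.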
Use of directional thresholds
Since a meter provides bi-directional information, we set thresholds that only responded to a specific direction of movement. The threshold at which combat music was activated would only respond to an increasing value. It ignored decreasing values. This helped solve for situations that can produce a “ping pong” effect when the context pool hovers above and below that combat music threshold. Had a single threshold value been used, we would have had rapid messages sent to turn the music on or off. This seemed like the simplest way to bypass that problem without having to rely on timers.
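The ping-pong problem and the directional fix can be sketched as follows. The threshold values are assumptions (a trigger of five units, echoing the five-skag reference point, and a lower exit value chosen arbitrarily), and `CombatMusicSwitch` is a hypothetical name rather than anything from the game's code.

```python
TRIGGER_THRESHOLD = 5.0  # assumed: roughly five basic enemies' worth of threat
EXIT_THRESHOLD = 3.0     # assumed: a lower value that only reacts to decreases


class CombatMusicSwitch:
    """Turns on only while rising past the trigger, off only while falling past the exit."""

    def __init__(self, trigger=TRIGGER_THRESHOLD, exit_threshold=EXIT_THRESHOLD):
        self.trigger = trigger
        self.exit_threshold = exit_threshold
        self.in_combat = False
        self.previous = 0.0  # the pool starts empty

    def update(self, pool_value):
        rising = pool_value > self.previous
        falling = pool_value < self.previous
        if not self.in_combat and rising and pool_value >= self.trigger:
            self.in_combat = True
        elif self.in_combat and falling and pool_value <= self.exit_threshold:
            self.in_combat = False
        self.previous = pool_value
        return self.in_combat


def count_switches(states):
    """Count how many on/off messages a sequence of states would send."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


# A pool value hovering around the trigger, then dropping away.
readings = [4.0, 5.2, 4.8, 5.1, 4.9, 2.5]

switch = CombatMusicSwitch()
directional = [switch.update(value) for value in readings]
single_threshold = [value >= TRIGGER_THRESHOLD for value in readings]

print(count_switches(directional))       # 2
print(count_switches(single_threshold))  # 4
```

The single-threshold version flips state every time the value crosses 5.0, while the directional version sends one "on" and one "off" for the whole encounter, with no timers involved.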
Soft exits
Another benefit of having directional thresholds is that it allowed us to create a soft exit from the combat state. Because we could set the exit threshold lower than the trigger threshold, we created a zone where we could prepare for an exit without having to actually trigger an exit. We used that zone to start fading the music to a lower volume level. Then, if we hovered in that zone long enough or exited downward, we would fade out the combat music more rapidly.
Conversely, if the context pool increased from new threats enough to exit upward, we surged the music volume back up (avoiding awkwardness that can arise when triggering music too soon after exiting a combat state).
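The soft-exit behavior can be expressed as a small decision function for music that is already playing. Everything numeric here is an assumption for illustration (the thresholds, the 0.6 reduced volume, the fade times, and the four-second hover limit), and `soft_exit_target` is a hypothetical name.

```python
def soft_exit_target(pool_value, seconds_in_soft_zone,
                     trigger=5.0, exit_threshold=3.0, max_hover_seconds=4.0):
    """Return (target_volume, fade_seconds) for combat music that is playing.

    The soft zone is the band between the exit and trigger thresholds.
    """
    if pool_value >= trigger:
        # New threats pushed the pool back up: surge to full volume.
        return (1.0, 0.5)
    if pool_value <= exit_threshold or seconds_in_soft_zone >= max_hover_seconds:
        # Exited downward, or hovered too long: fade out more rapidly.
        return (0.0, 1.0)
    # Inside the soft zone: ease down to a lower level and wait.
    return (0.6, 3.0)


print(soft_exit_target(4.0, seconds_in_soft_zone=1.0))  # (0.6, 3.0)
print(soft_exit_target(4.0, seconds_in_soft_zone=4.0))  # (0.0, 1.0)
print(soft_exit_target(5.5, seconds_in_soft_zone=2.0))  # (1.0, 0.5)
```

Because the music is only ducked rather than stopped while in the zone, a fresh wave of enemies can bring it back without a restart.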
The necessity for overrides
Because Borderlands has plenty of boss and mini-boss battles, we had to design a way to set overrides where we forced the system into a combat or an ambient state and ignored the context pool entirely. This also created a requirement for returning the music system to an automated state after player death and other circumstances that could lead to the player exiting the context of a boss fight or scripted moment.
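One way to picture the override requirement is a mode flag that sits in front of the automated decision. This is a hedged sketch with hypothetical names (`MusicMode`, `MusicDirector`, `on_player_death`), and the automated branch is reduced to a single threshold comparison to keep the example short.

```python
from enum import Enum


class MusicMode(Enum):
    AUTOMATIC = "automatic"
    FORCE_COMBAT = "force_combat"    # e.g. a boss or mini-boss fight
    FORCE_AMBIENT = "force_ambient"  # e.g. a scripted quiet moment


class MusicDirector:
    def __init__(self, trigger_threshold=5.0):
        self.mode = MusicMode.AUTOMATIC
        self.trigger_threshold = trigger_threshold  # assumed value

    def set_override(self, mode):
        self.mode = mode

    def clear_override(self):
        self.mode = MusicMode.AUTOMATIC

    def on_player_death(self):
        # Dying takes the player out of the boss context, so hand control
        # back to the automated system.
        self.clear_override()

    def wants_combat_music(self, pool_value):
        if self.mode is MusicMode.FORCE_COMBAT:
            return True
        if self.mode is MusicMode.FORCE_AMBIENT:
            return False
        # Simplified stand-in for the automated context-pool logic.
        return pool_value >= self.trigger_threshold


director = MusicDirector()
director.set_override(MusicMode.FORCE_COMBAT)
print(director.wants_combat_music(pool_value=0.0))  # True: pool is ignored
director.on_player_death()
print(director.wants_combat_music(pool_value=0.0))  # False: automated again
```

The important part is the reset path: any event that ends the scripted context needs to clear the override, or the system stays stuck in the forced state.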
To help put this all together in a visual way, here is a diagram that represents how this would all look if we actually had a visual meter to represent the context pool and all the controls that were mentioned above.
Truth be told, it would have been awesome to have this as a visual meter in the game, but the time to build a visual tool just wasn’t there and more pressing needs had to take priority. So we used debug text on screen to balance the system or reveal how it was behaving while testing the game.
A Story of Successful Contextualization
Before wrapping this article up, I have a story that I’d love to share. The experience I had highlights why I strongly feel that when we get context right, we provide some of the best experiences that modern games can offer.
So, we were a few weeks out from ship on Borderlands 2 and I had loaded up the Crater Lake level to run a balancing pass on some combat music parameters. I was balanced to the recommended level for this map with average gear and proceeded to wade into combat atop a long elevated path with a ravine of lava to my right and a high mountain wall to my left. I had a safe distance to travel before encountering any enemies and began to run down the path, deciding to just barrel right into them.
In my first wave of enemies, I fought some run of the mill bandits who did some damage, but really were nothing to worry about. I dispatched them easily – no combat music had played yet, good! It was working!
The second wave had spawned by now, and as I was killing off one of the last bandits from this group, more powerful enemies entered the fight before I could completely finish the wave.
While I was trying to get rid of a particularly obnoxious shock nomad, I accidentally shot the helmet off a Goliath, who then began to perform his rage transformation.
At this point, detecting the increased threat value from the now raging Goliath, the music system kicked in and I started to get a little more pumped/worried. As the music surged, the Goliath began to charge towards me, bellowing insults. I started back-pedaling furiously as I now was in real danger of dying. Having not paid attention well enough, I emptied a clip into the Goliath and went to reload… only to hear an empty *click click* from my gun… at which my character suddenly said “Oh no… not NOW!!!” as the Goliath closed in roaring and finished me off.
None of it was staged; all of it was emergent gameplay, and I had not expected any of it when I started walking towards the enemies. I didn’t even mind that I died. I just thought… “Man, that was awesome!”
That moment has stuck in my memory ever since.
In conclusion…
As artists, we almost never reach the point of perfection we desire before we have to release our work into the wild. This is definitely true for me when it comes to the music system for the Borderlands series, but I’m happy that even under aggressive timelines and other constraints, we were able to take a small step in a new direction. I’m excited to improve the system further in the future and to also experiment with applying this technique to non-music related areas.
It’s an exciting time for games and I’m eager to see how other teams tackle the challenge of building emotionally-aware systems over the next several years!
Thank you for reading!
-Raison Varner
Raison Varner is a composer and sound designer at Gearbox Software. He was the audio lead for Borderlands 1 and 2 and is a contributor to the soundtrack for both games. Raison can be contacted via Twitter (@raisonvarner) and his music, including his contributions to the Borderlands soundtracks, can be heard on Soundcloud (www.soundcloud.com/rvarner).
Hi Ariel,
Thank you, thank you for the detailed explanation of your and the Gearbox group’s design of this system. I’ve played BL 1 and a little bit of 2, but it was before I started really paying attention to sound/music systems. I’ll have to go back and play BL2 to enjoy this more deeply.
Thanks Dave, glad you liked the article. I know it might be confusing, but Raison Varner is the author of this article. I just posted it on his behalf. Raison is the man to thank!
I’m a big fan of Borderlands 1-2. I have all DLCs for BL2 hehe; furthermore, I’m an aspiring sound designer myself.
I can honestly say that your work is really inspiring for me.
That was one of the most interesting articles I’ve read in a long time. Thank you for that. Really looking forward to BL3.
I really enjoyed this article and thoroughly enjoyed the ending sequence; those moments where everything feels so perfect that it must have been scripted are wonderful to encounter.
I understand the memory limitation issue and its implications on un/compressed audio. I was wondering how large a factor load times are and whether that will remain an additional thorn in our side even when memory budgets increase to the point where uncompressed assets become viable.
Thanks again for taking the time to write the article Raison and thanks for posting it Ariel.