Nicolas Fournel is an audio programmer, and founder of “tsugi” (tsugi-studio.com), a company based in Japan that does R&D (including tools development) for video games, animation, music, and other disciplines. I’m excited to share this interview with Nicolas about his work and the fascinating area of audio programming.
What is your background and education?
I’m originally from the North-East of France (more precisely from the region where they make Champagne) but I studied in Paris where I obtained a diploma from a computer engineering school with a specialization in artificial intelligence, digital signal processing and automation. My final project was actually sound-related as it was the simulation of the singing voice. It used different methods to analyze and synthesize vocal signals. The main system I built was called DiVA (short for Digital Vocal Artist). It used a DSP chip from the Motorola 56000 family for the sound synthesis and was connected to a microcontroller responsible for parameter input, display, and MIDI interfacing. A few knobs controlled the formant frequencies and bandwidths, the amount of vibrato, etc. Of course at the time that looked a bit more impressive than it would now, as resources about this kind of thing were pretty much nonexistent.
How long have you been involved in audio programming? What games have you worked on?
I have been involved in audio programming professionally for more than 20 years, although it started many years before that as a hobby (I guess I’m showing my age here ;-)). The first commercial products I worked on were sample editors, small MIDI utilities, audio file converters etc… They were developed on Amiga and everything was coded in assembler and blazing fast! Then I started working on commercial PC software around the time of Windows 3.1 and focused more on sound design-related applications. My main project at the time was a modular software synthesizer called Virtual Waves. It was pretty good at generating really weird sound effects so many clients started to come from the game industry and I worked closely with some of them.
I made the jump to the game industry itself when I took a job at Factor 5 in California. There, I worked on the Star Wars: Rogue Squadron series as well as on the MusyX audio middleware and on some DSP for the GameCube itself, as Factor 5 was working closely with Nintendo on the audio system of that console. I remember that on “Rogue Squadron: Rebel Strike” I managed to convince the game designers to let me sit down with them and add a second – simplified – geometry of the levels for the audio in the editor, tagging it with the appropriate materials. The reverb I developed for MusyX was a variation of the Schroeder reverberator model (not enough cycles for a convolution reverb!), so it included both comb filters for the early reflections and allpass filters for the dense reverberation. In the code, I was performing a very crude audio ray casting using that simplified geometry and the material absorption coefficients. Then I calculated the parameters of the filters at run-time from that. At the time, it was really awesome as an audio programmer to be able to do this kind of stuff! The people at Factor 5 took the audio very seriously, which was great.
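As a rough illustration of the kind of reverb described above, here is a minimal Schroeder-style sketch: parallel feedback comb filters followed by allpass filters in series. It is not the MusyX code. Sabine's formula stands in for the ray casting mentioned in the answer as a simple way to turn material absorption coefficients into a decay time, and the delay lengths, the allpass gain of 0.7 and all function names are assumptions made for the example.

```python
import math

def sabine_rt60(volume_m3, surfaces):
    """Estimate decay time from (area_m2, absorption_coefficient) pairs.
    Sabine's formula is an assumed stand-in for the ray casting in the text."""
    absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / absorption

def comb_gain(delay_samples, rt60_s, sample_rate):
    """Feedback gain that makes a comb filter decay by 60 dB in rt60_s seconds."""
    return 10.0 ** (-3.0 * delay_samples / (rt60_s * sample_rate))

class Comb:
    """Feedback comb filter: y[n] = x[n-D] + g * y[n-D]."""
    def __init__(self, delay, gain):
        self.buf = [0.0] * delay
        self.gain = gain
        self.pos = 0

    def process(self, x):
        y = self.buf[self.pos]
        self.buf[self.pos] = x + self.gain * y
        self.pos = (self.pos + 1) % len(self.buf)
        return y

class Allpass:
    """Schroeder allpass: y[n] = -g * x[n] + x[n-D] + g * y[n-D]."""
    def __init__(self, delay, gain):
        self.buf = [0.0] * delay
        self.gain = gain
        self.pos = 0

    def process(self, x):
        delayed = self.buf[self.pos]
        y = -self.gain * x + delayed
        self.buf[self.pos] = x + self.gain * y
        self.pos = (self.pos + 1) % len(self.buf)
        return y

def schroeder_reverb(signal, rt60_s, sample_rate=48000,
                     comb_delays=(1557, 1617, 1491, 1422),  # illustrative values
                     allpass_delays=(225, 556)):            # illustrative values
    combs = [Comb(d, comb_gain(d, rt60_s, sample_rate)) for d in comb_delays]
    allpasses = [Allpass(d, 0.7) for d in allpass_delays]
    out = []
    for x in signal:
        # Parallel combs, averaged, then allpasses in series.
        y = sum(c.process(x) for c in combs) / len(combs)
        for ap in allpasses:
            y = ap.process(y)
        out.append(y)
    return out
```

In a game, the decay time passed to `schroeder_reverb` would be recomputed whenever the listener moves into a space with different geometry or materials, for example via `sabine_rt60(volume, [(area, absorption), ...])`.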
After Factor 5, I moved to Konami in Honolulu where I worked on all kinds of games from “Frogger” to “Dance Dance Revolution” and on many platforms. The next stop was EA in Burnaby, and finally Sony in London. In both cases I was working in central services, so your stuff ends up being used in many games and by many teams around the world, sometimes without you even knowing it. For example, I recently was told that I had been credited on the last “WipeOut” for Vita and that was quite a nice surprise because I just wrote a small audio analysis tool for them. But it definitely made my day as I have always been a huge “WipeOut” fan!
What inspired you to work in the audio field, and specifically programming?
I started programming on the Sinclair ZX81, like a lot of French kids from that period I suspect. At the time I was 11 and I was very much into electric organs and synthesizers. My dad was teaching electronics and was very interested in personal computers so he thought it would be great to order the ZX81. We ended up having a sound card for it and I spent a lot of time playing with that. That’s probably how it all started.
While I moved to other computers such as the ZX Spectrum, Apple IIc, etc., I continued to play with the audio, but I was also interested in many other things, especially artificial intelligence. However, when I switched to the Amiga with the famous Paula chip and its four channels of digital audio with modulations, things became more serious. There were all these trackers and other audio programs. At the same time, the electronic music scene was really booming in France so what started as a hobby slowly evolved into a passion and a career.
What programming languages would you say are essential to have? Do they vary from project to project?
Indeed, it depends on the platforms you are targeting, but if you want to work in the game industry on AAA titles on platforms from the big three, C++ is pretty much a given. Then of course something like C# can be useful to quickly write tools. Knowledge of at least one scripting language such as Lua or Python is helpful as well. Whatever language you use, it’s important to understand its strengths and its limitations, and to know how they will impact your project.
That being said, I’m not really a huge advocate of a language in particular. Like everything else, you should just use what works best for your project and within your constraints. Sometimes, people who are very knowledgeable about one language tend to use every little trick in the book, which makes their code pretty hard to read and to maintain for the rest of the team and ultimately costs the company more. In most cases, I believe it’s more important for an engineer to have good basics in mathematics, physics, computer architecture and algorithms and to be a creative problem-solver than to be a guru in a given language.
Provide a brief definition of each of these functions:
Oh, I can think of about a gazillion of serious things that should go in there, but since you asked me to be brief…
Do you mostly work with proprietary tools or third-party middleware (like FMOD, Wwise, Miles, etc.)? What are some benefits and challenges of each?
In my career I have exclusively developed and worked with proprietary tools and engines. This is due to the fact that I often ended up working in central services which were big enough to be able to develop their own technology, or in R&D departments. That is also the kind of work I was looking for. But of course I have also often examined the features of the available middleware such as Wwise or FMOD. A notable exception is my current work: at tsugi we are – among other things – providing technical support for AudioKinetic in Japan, so obviously we have engineers who know the ins and outs of the Wwise API and how to use the authoring tool in detail.
One of the major advantages of having a proprietary audio tool and engine is to be able to drive its development in the direction you want, which is adapted to the type of games you are doing and on a schedule that is compatible with your studio’s releases. It’s also easier to interface it closely with the other tools and run-time components of your studio such as the physics engine or the animation system. Finally, when something is not working, it’s easy to just jump into the code and debug, without having to wait for a fix from the middleware provider (although these days you can sometimes get access to the source code, depending on the type of license you have).
However, nowadays developing your own audio engine and associated tool is a big investment. It can be very demanding in terms of human resources and time, not only to develop it but also to maintain, document and update it. Game audio has evolved and it’s not only about playing back a sample anymore, you need to offer a way to script sound effects, to manage 3D emitters and occlusion, to do real-time processing, to have a mixing system, reactive music and so many other things… Also, tools often take a long time to develop and to mature into something that has a usable workflow and is stable. In the game industry where the end product is the run-time, people often underestimate the amount of time required for the development and maintenance of good tools.
So unless you are doing something very specific audio-wise, or you are a big studio with an audio system already in place, it’s probably safer at this point to pick an audio middleware than to start developing your own. Actually, I even know several big companies which had internal R&D audio teams in the past and have recently started using commercial middleware, or at least have given their teams the choice between the in-house tools/engines and a 3rd party offering.
You’re a strong proponent of procedural audio. What can this do to help advance game audio forward, and what role do audio programmers have in it?
Basically, procedural audio is to sample playback what 3D is to sprites. It allows you to deal with that one single asset that will be played differently depending on the context. If you take a 3D object in a game, it will appear differently depending on the rotation, the scale, the lighting, etc. If you wanted to do the same with sprites you would have to draw many of them – from different angles, with different sizes, and colors. And of course you would not be able to recreate the infinity of possible views of that object; you would have to limit yourself to a subset of them.
That’s what happens with samples. Let’s say you want to play an impact sound, you will have to record many of them, with different materials, on different surfaces, with different forces etc. With procedural audio you can just create a model of that impact and synthesize the corresponding sound at run-time after selecting the right parameters. Moreover, if it’s well connected to your physics engine, you will probably be able to generate many contact sounds as well with the duration and intensity you want without having to record anything.
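To make the idea of "a model of that impact" concrete, here is a minimal sketch using modal synthesis, i.e. a sum of exponentially decaying sinusoids. This is one common technique chosen for illustration, not necessarily the model used in any of the projects mentioned here. The material presets, their numbers and the function name `synth_impact` are all hypothetical.

```python
import math

# Hypothetical material presets: mode frequencies (Hz), relative mode
# amplitudes, and a base decay rate (1/s). Values are made up for the example.
MATERIALS = {
    "wood":  {"freqs": (180.0, 410.0, 790.0),   "amps": (1.0, 0.5, 0.25), "decay": 40.0},
    "metal": {"freqs": (520.0, 1340.0, 2870.0), "amps": (1.0, 0.7, 0.5),  "decay": 6.0},
}

def synth_impact(material, force, duration_s=0.5, sample_rate=48000):
    """Synthesize one impact as a sum of decaying sinusoidal modes.

    `force` would typically come from the physics engine (for example the
    collision impulse) and simply scales the amplitude in this sketch.
    """
    preset = MATERIALS[material]
    n = int(duration_s * sample_rate)
    out = [0.0] * n
    for k, (freq, amp) in enumerate(zip(preset["freqs"], preset["amps"])):
        decay = preset["decay"] * (1 + k)  # assumption: higher modes die out faster
        for i in range(n):
            t = i / sample_rate
            out[i] += force * amp * math.exp(-decay * t) * math.sin(2.0 * math.pi * freq * t)
    return out
```

Each call with a different material or force yields a different sound from the same small model, which is the memory and variation argument made above.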
Procedural audio allows the sound to be more reactive and better integrated with the other subsystems of the game. Because the sound is generated at run-time depending on the context, you can create an infinity of variations. Therefore procedural audio helps fight the repetitiveness due to sample playback and of course also saves memory, as you would otherwise need to record a lot of samples to provide even a limited number of variations. That being said, it’s not a silver bullet that will solve all your audio problems. There are a lot of interesting things you can do with other methods to get a rich and interactive audio experience, and it does not necessarily apply to all situations either.
As for the role of audio programmers, since it’s a more technical approach than recording samples, they are responsible for building new, adapted tools and workflows, and for educating sound designers about synthesis. I don’t think they should be solely responsible for creating the models though.
What are some of the challenges in bringing more procedural audio into video games, and what are some ways we can overcome them?
Procedural audio is still in its infancy. Making the creation of the procedural models easier and accessible to the sound designers is essential. Focusing more on the final sound than on the production mechanisms will allow us to build higher quality models. Until now, most of the models have been created using a bottom-up approach. Although this makes sense in an academic or research context, it is ill-suited for creative work, especially in a production environment, because it requires more knowledge from the sound designer and more time spent building the model, without offering any guarantee that the resulting sounds will correspond to what the sound designer wanted in the first place. Although they generate recognizable sounds, most of the models created that way are nowhere close to the quality you would expect in a game. That’s why I have been advocating a top-down approach that takes existing sounds, analyzes them and creates a model from them. That model can then re-synthesize the original sound the designer wanted as well as many variations. However, this is clearly a more complicated approach to develop.
Generally, procedural audio is progressively gaining traction. I think we will soon be in an interesting position, when the newest generation of sound designers and programmers – more aware of procedural audio (and certainly more excited about it from my discussions!) – moves into more senior positions, and when more researchers who understand the context in which procedural audio has to work have joined the game industry. Better tools and models will then start appearing.
You’ve also spoken, and written about, dynamic mixing. How will this make audio engines “smarter”, and thus provide a better sound experience for players?
While at Sony I spent some time thinking about spectrally-informed audio engines: basically, making the audio engine aware of what it is playing frequency-wise and allowing it to make the best decisions for the game at a given moment. The idea is to analyze the audio assets in the frequency domain either at run-time or before-hand on the tool side, and then have the engine keep track of what it is playing and where in the listening space. For any given frame, you end up with a frequency/location map, with a level in dB for each point.
This makes dynamic mixing possible: you can define mix profiles or targets and then have the engine slightly adjust the mix based on what is actually happening in the game. Other interesting features can be implemented such as perceptual voice limiting or audio shaders, for example.
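A very reduced sketch of that idea follows. It assumes each playing voice carries per-band levels in dB that were analyzed beforehand on the tool side, sums them into a per-band picture of the current mix, and nudges each band toward a target profile. The spatial dimension of the frequency/location map is left out, and the function names and the 3 dB limit are assumptions made for the example, not part of any engine described here.

```python
import math

def db_to_power(db):
    return 10.0 ** (db / 10.0)

def power_to_db(power):
    # Floor at -120 dB to avoid log10(0) for silent bands.
    return 10.0 * math.log10(power) if power > 0.0 else -120.0

def mix_band_levels(voices):
    """Sum the per-band levels (dB) of all playing voices into one level per band.

    `voices` is a list of equal-length lists of dB values, one list per voice,
    assumed to come from offline analysis of the assets.
    """
    band_count = len(voices[0])
    return [power_to_db(sum(db_to_power(voice[b]) for voice in voices))
            for b in range(band_count)]

def band_corrections(voices, target_db, max_adjust_db=3.0):
    """Per-band gain offsets (dB) that move the mix toward the target profile.

    The offsets are clamped so the engine only adjusts the mix slightly.
    """
    current = mix_band_levels(voices)
    return [max(-max_adjust_db, min(max_adjust_db, target - level))
            for target, level in zip(target_db, current)]
```

The same per-band picture could also feed a perceptual voice limiter, for example by dropping a voice whose bands are all far below the current mix level.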
Do you feel there is a mutually creative link between audio programmers and sound designers/ composers? What can we do to help the creative process of sound designers, and what can we learn from them that we can apply to our efforts?
Yes, definitely. It’s often by bouncing ideas between audio programmers and sound designers / composers that you come up with great new features for the game or the technology. Audio programmers often act as the link between these creatives and the other programmers on the game team, especially when other systems are involved. One of their roles is indeed to facilitate the implementation of the sound effects / music (and in addition, they will need a good dose of diplomacy to negotiate these few extra cycles or kilobytes…).
But they also should be here to educate the sound designers about what is possible to achieve with DSP, sound analysis and synthesis, or even just in terms of interface / features for the sound tools. You can’t expect a sound designer to ask for a new feature if he or she does not know what is achievable technically. On the other hand, sitting with sound designers, examining their workflow, and having them describe the issues and bottlenecks is probably the best way for a programmer to understand what the tool should do and how it should do it.
Can you think of any other audio features or technologies that have yet to be implemented in games that can add to the player experience? Or perhaps existing ones that can be improved upon?
We talked about a few of them, like procedural audio for which we need better models and tool suites, and dynamic mixing. Better simulation of the acoustics of closed spaces also comes to mind. At tsugi (www.tsugi-studio.com) this is part of our daily work: imagining how game technology (and especially audio) will evolve and what will be needed. For example, we are already thinking about what will happen after procedural audio or more precisely once most of your assets are patches or models instead of sample files. This brings some interesting challenges and requires new concepts. We also do audio pipeline audits. We visit clients, sit down with them and examine the workflow of the sound designers or the architecture of the audio engine with the programmers. Then we make recommendations about optimizations in the engine, or new features for the tool and we can also implement these changes ourselves if needed.
Although I can’t really go into the details of our projects here, there are a few obvious trends. One of them is that with the ever-growing amount of assets used in games, and the size of the sound databases, we need new ways to browse, to visualize and to select data, as well as to perform quality assurance once the assets are in the game. For example, one of our research projects, tsugi DataSpace, uses audio features extraction and self-organizing maps to visualize thousands of audio assets in a 3D world. It also supports gesture controllers such as the Leap to make the experience even more immersive. Because similar sounding assets are classified together it opens new possibilities, like finding appropriate sounds to layer together for example.
In the same area, i.e. dealing with numerous assets, the availability of full-featured, stable, batch processors is very limited and none of them boasts features dedicated to game audio development, so that’s why we built AudioBot. It offers all the functions of a regular batch processor but can also automatically create files compatible with game audio middleware such as Wwise work units or FMOD projects. It supports proprietary importers, exporters and processors through plug-ins so you can use your studio’s technology, you can edit chains of VST or VAMP plug-ins, it has a command line version to integrate with your audio pipeline etc… It has really been made with game audio people in mind.
But to come back to your question, in general, I believe that any technology which makes either the tool or the engine a bit smarter, and therefore improves the workflow of the sound designer or the quality of the run-time, would be a welcome addition; hence the importance of audio analysis.
How do you see the demand for audio programmers evolving in the industry? It seems most positions are in mid-size to larger studios. Can smaller companies benefit from an audio programmer on the team?
This is closely related to both the evolution of game audio technology and the definition of the audio programmer. On one hand, as we discussed earlier, it’s becoming harder for a studio to justify developing an in-house audio solution, because of the complexity and the costs involved. So it would seem to confirm your feeling, that only mid-size or larger studios will be able to do this and actually hire audio programmers. Smaller studios would usually use audio middleware and therefore allocate a generalist programmer to load banks, trigger the sound effects etc.
This brings us to the second point which is the definition of an audio programmer. In a lot of cases, we see job offers for audio programmers which involve very little more than just hooking sounds in the game. But in my opinion this has very little to do with game audio. You could be loading and triggering something entirely different than audio assets. An audio programmer for me is someone who can design audio systems, offer audio-centric solutions to problems, propose audio-related gameplay to the team and so on. So if you take that definition, then yes an audio programmer can definitely add something to any game, even a small one. But of course it’s a question of priorities and most likely the finances will go towards hiring another graphics engineer anyway ;-).
But from talking with recruiters and other people in the industry, it seems audio programmers are still very much in demand, also because there are not so many of them to start with.
What are some of your favorite DSP effects to use and/or design?
There are always new parameters to add or new tricks to use, even for effects as simple as a flanger for example. In that sense, I guess I don’t really have a favorite effect to design because they all offer countless hours of fun! However, I am very interested in effects which first analyze the sound – either by using a FFT, calculating MFCCs or any other feature extraction method – and then process it differently based on its characteristics; basically some kind of audio shaders.
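As a toy illustration of an effect that analyzes first and then processes accordingly, the sketch below measures the spectral centroid of a frame and only applies a low-pass filter when the frame is "bright". The naive DFT, the one-pole filter, the 3000 Hz threshold and all function names are assumptions made for the example; a real implementation would use an FFT and a richer set of features.

```python
import cmath
import math

def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of a frame, using a naive DFT."""
    n = len(frame)
    weighted = total = 0.0
    for k in range(n // 2 + 1):
        bin_value = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n)
                        for i in range(n))
        magnitude = abs(bin_value)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total > 0.0 else 0.0

def one_pole_lowpass(frame, cutoff_hz, sample_rate):
    """Simple one-pole low-pass: y[n] = (1 - a) * x[n] + a * y[n-1]."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    y = 0.0
    out = []
    for x in frame:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out

def adaptive_process(frame, sample_rate, bright_threshold_hz=3000.0):
    """Tame bright frames, leave dull ones untouched."""
    if spectral_centroid(frame, sample_rate) > bright_threshold_hz:
        return one_pole_lowpass(frame, bright_threshold_hz, sample_rate)
    return list(frame)
```

The decision step is where the "shader" idea comes in: the same structure could select a different effect, or different parameters, per asset or per frame based on whatever features the analysis provides.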
List some of your favorite sound effects that have been procedurally generated/synthesized:
I usually mention the work of Staccato Systems in the early 2000s. They had really nice procedural footsteps and car engines, a long time before everybody. More recently, I worked on impact / contact sounds for Sony and got some nice results. Also I have been told that AudioKinetic’s SoundSeed Air has been used in quite a few titles and I like their presets. However, sounds such as impacts / contacts, wind, rain etc., are still relatively easy to simulate.
I have to admit that until now, there hasn’t really been a sound effect for which I learned after the fact that it was generated procedurally and told myself “wow, how the heck did they manage to model that?” I’m really looking forward to this moment though! One of the projects we are currently developing for a client at tsugi could definitely belong to that category… if we ever manage to make it work of course!
What have you learned about audio as an audio programmer?
After all the mathematics have been studied, the filters designed, and the anti-aliasing programmed, there is still space to experiment and break the rules, and to find that ridiculous little trick that will make your engine, plug-in or tool sound awesome. There is still this magic part to it and it’s great. That, and a plug-in with a great GUI will always sound better ;-).