Here is a quick tutorial showing how I made two of the sounds in Retro City Rampage (RCR). Over 95% of the sounds are synthesized to the specifications of the Ricoh 2A03, the soundchip of the original Nintendo Entertainment System (NES). This means that nearly all of the sounds are generated using two pulse waves, a triangle wave and a noise channel. In the rare instances where a sampled sound was needed, we simulated the DPCM sampled sound channel of the NES by downsampling to 4 kHz or lower. For the music, we frequently used the sampled sound channel for snare drums and kick drums, but the music is a different story entirely. If you are interested in the music of RCR, you can have a listen and a look on our Bandcamp page. I have also written a book chapter on chiptunes for the upcoming Oxford Handbook on Interactive Audio, which further describes the techniques used for the music of RCR.
When RCR was first in development, the tools for generating actual NES sound files (.NSF files) were very limited. The composer Jake “virt” Kaufman had already produced many songs in the NES style using the Impulse Tracker format (.IT), and because OpenMPT is open source, the .IT format was easy to implement, so we decided to use it for the game.
When people think of classic video games, they think of “bleeps and bloops,” but they often don’t realize how much effort went into getting the soundchips to synthesize and simulate real-world sounds. For example, explosions were usually simulated with noise waveforms generated by a pseudo-random number generator. When I decided to recreate all the sound effects using synthesis, rather than just modifying sound effect samples from old games, I quickly found that I was starting a long journey into classic synthesis techniques.
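To make the pseudo-random noise idea concrete, here is a minimal Python sketch of a 15-bit linear-feedback shift register of the kind publicly documented for the 2A03 noise channel. It is an illustration only, not the code used for RCR (which was authored as .IT files); the function names and the `volume` parameter are my own, and the tap positions follow community hardware documentation rather than anything stated in this article.

```python
def lfsr_step(reg, short_mode=False):
    """Advance a 15-bit shift register by one clock and return the new value."""
    tap = 6 if short_mode else 1          # second feedback tap: bit 6 (short) or bit 1 (long)
    feedback = (reg ^ (reg >> tap)) & 1   # XOR of bit 0 with the tap bit
    return (reg >> 1) | (feedback << 14)  # shift right, feed the result into bit 14


def noise_samples(count, volume=15, short_mode=False, seed=1):
    """Return `count` noise levels: 0 when bit 0 is set, `volume` otherwise."""
    reg = seed & 0x7FFF
    samples = []
    for _ in range(count):
        reg = lfsr_step(reg, short_mode)
        samples.append(0 if reg & 1 else volume)
    return samples


def sequence_length(short_mode=False, seed=1):
    """Count the clocks until the register returns to its starting value."""
    reg, steps = lfsr_step(seed, short_mode), 1
    while reg != seed:
        reg = lfsr_step(reg, short_mode)
        steps += 1
    return steps


# A short burst of noise levels; a decaying volume would make it more explosion-like.
burst = noise_samples(32)
```

In long mode the register only repeats after 32,767 clocks, which is why it sounds like hiss rather than a pitch. An explosion would typically also ramp `volume` down over time, which this sketch leaves out.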
I’ll give two examples that take different approaches to creating the sound design for RCR. The first analyzes the spectrogram of a sampled sound and translates it into a stream of notes and pitch bends in an attempt to recreate the original sound, an “over here” whistle. The second analyzes an existing NES sound, using tools to extract the note data from the original cartridge ROM.
Synthesis by Analysis
For our first sound, I wanted to get an authentic-sounding whistle when the Jester signals for the bus to drive into the bank in one of the opening scenes. I could have recorded my own whistle but I thought that I would use the very handy FreeSound.org to find a good sound. The sound I decided to model my whistle after was this sound.
Now that I had the sound, all I needed to do was “NES-tify” it. I could have sampled the sound on the DPCM channel (giving Benboncan credit due to the Creative Commons license), but the frequency content of this sound is quite high, and the aliasing caused by the low sampling rate would likely have made a mess of it. With our sampling rate of 4 kHz, the Nyquist frequency is 2 kHz, and the spectrogram shows that much of the sound, including the fundamental (its lowest-frequency component), lies above 2 kHz. To see this, follow the blue line against the Hz (Hertz) axis on the left: it ranges between roughly 2000 Hz and 3000 Hz. And that is only the fundamental; it ignores the energy higher up in the spectrum. If you’re not used to reading spectral displays, then the following resource might help: “How do I read a spectrogram?”
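The arithmetic behind that decision can be sketched in a few lines of Python. This assumes ideal sampling of a pure tone with no anti-aliasing filter, which is a simplification of what a real downsampler does, and the function names are purely illustrative.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sampling rate can represent."""
    return sample_rate_hz / 2


def alias_hz(tone_hz, sample_rate_hz):
    """Frequency at which a pure tone is heard after sampling (no filtering)."""
    folded = tone_hz % sample_rate_hz
    # Anything above Nyquist folds back down, mirrored around it.
    return min(folded, sample_rate_hz - folded)


for tone in (2000, 2500, 3000):
    print(tone, "Hz is heard at", alias_hz(tone, 4000), "Hz")
# 2000 Hz is heard at 2000 Hz
# 2500 Hz is heard at 1500 Hz
# 3000 Hz is heard at 1000 Hz
```

Under these assumptions, a whistle rising from 2000 Hz to 3000 Hz would come out falling from 2000 Hz to 1000 Hz, so the contour of the sound would be turned upside down rather than merely dulled.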
To NES-tify the sound, I wanted to look at its spectrogram to see which frequencies it contained. I used WaveLab at the time, but for the purposes of this article, WaveSurfer is free and designed for this kind of spectral analysis. It runs on both Mac and PC, which is quite convenient as I work on a Mac running both OS X and Boot Camp.
From here, we run a formant analysis, and the program detects the contours of the changes in the resonant areas of the wave, which correspond to formants. The article on reading a spectrogram mentioned above defines formants as follows:
A formant is a dark band on a wide band spectrogram, which corresponds to a vocal tract resonance. Technically, it represents a set of adjacent harmonics which are boosted by a resonance in some part of the vocal tract.
For our purposes, the formant analysis simply tracks the changing frequency of the whistle for us. In our example, the formants we’ll use to track the pitch are the blue and green lines. All I needed to do now was convert the frequencies into pitch values using a frequency-to-MIDI-note chart and then play around with the pitch bend values until I found a result that I was happy with. The additional element here is the initial “whoosh” of the whistle, which can be approximated with a bit of the noise channel at the beginning. The resulting .IT file can be found here. As is typical with analysis and resynthesis, we can check our resulting spectrum against the original. We can definitely tell the difference between the two, but to our ears it should be a pretty good match given the limitations of the NES sound.
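The chart lookup can also be done with the standard equal-temperament formula. The following Python sketch assumes A4 = 440 Hz and the common convention that MIDI note 60 is C4 (trackers such as OpenMPT may label octaves differently); the helper names are hypothetical and this is not how the original .IT file was produced, which was done by hand and by ear.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def freq_to_midi(freq_hz, a4_hz=440.0):
    """Fractional MIDI note number for a frequency (69 is A4)."""
    return 69 + 12 * math.log2(freq_hz / a4_hz)


def nearest_note(freq_hz):
    """Return (MIDI note, name, offset in cents) for the closest semitone."""
    midi = freq_to_midi(freq_hz)
    note = round(midi)
    cents = (midi - note) * 100   # remainder to make up with pitch bend
    name = NOTE_NAMES[note % 12] + str(note // 12 - 1)
    return note, name, cents


# The two ends of the whistle's rough 2000 Hz to 3000 Hz range:
low = nearest_note(2000)    # note 95, "B6", a little sharp
high = nearest_note(3000)   # note 102, "F#7", a little sharp
```

The leftover cents value is the part a pitch bend has to cover, so the whistle's sweep spans about seven semitones plus a small bend at either end.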
Here you can see the whistle used in the game:
(see @ 3 mins 3 seconds)
This example shows how fun it can be to reconstruct a sound from its elements, which is a good exercise for any sound designer!
Synthesis by Reverse-Engineering
For the second example, I’ll demonstrate how I used the data in an .NSF as a basis for my own sound design. The Duck Hunt dog’s laugh is an iconic NES sound, and we wanted a similar sound in RCR for when, well, a dog pops out and laughs at the action, or when the homage to “toasty” appears onscreen. I’ll admit that I actually made this sound for the game using the same method as above, but for the sake of example, I’ll show how to use an .NSF to inspire a sound design.
The first thing to do is to find the .NSF for Duck Hunt. A good spot is Zophar’s Domain.
The .NSF contains not only the music but also the sound effects for this game. The format itself is basically a “clean rip” of the sound code, separated from the rest of the game code and packaged in a common format. When the file is played, the player is actually emulating the required parts of the NES to reproduce the sound. So, although there is no consistent data format describing where the notes are or which effects are being used, it all eventually comes down to modulating registers on the sound chip, and that is what we can capture to find out which notes the NES is playing.
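As a rough illustration of what “capturing registers” gives us, here is a Python sketch that turns a pulse or triangle channel’s timer value into a frequency. It relies on the commonly documented NTSC CPU clock of about 1.789773 MHz and the 11-bit timer split across two registers; those details come from community hardware documentation, not from this article, and the function names are hypothetical.

```python
NTSC_CPU_HZ = 1_789_773  # assumed NTSC CPU clock; PAL consoles run slower


def timer_from_registers(lo_reg, hi_reg):
    """Combine the low byte and the low 3 bits of the high register (11 bits)."""
    return ((hi_reg & 0x07) << 8) | (lo_reg & 0xFF)


def pulse_frequency(timer, cpu_hz=NTSC_CPU_HZ):
    """Output frequency of a pulse channel for a given timer value."""
    return cpu_hz / (16 * (timer + 1))


def triangle_frequency(timer, cpu_hz=NTSC_CPU_HZ):
    """The triangle channel plays one octave lower for the same timer value."""
    return cpu_hz / (32 * (timer + 1))


# A timer value of 0x0FD lands very close to A4 (440 Hz) on a pulse channel.
timer = timer_from_registers(0xFD, 0x00)
print(round(pulse_frequency(timer), 1))     # 440.4
print(round(triangle_frequency(timer), 1))  # 220.2
```

A register logger only ever sees values like `0xFD`; tools such as the ones described here do this kind of conversion and then snap the result to the nearest note name.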
A cool recent development is the ability to load .NSF files into FamiTracker, so that one can now see what the composers and sound designers were doing when they created audio on the NES. The import simply captures the notes and related data as the sound is playing. It doesn’t capture notions of “patches” or “instruments,” but it is a great way to see at a low level what composers were doing. As an example, it won’t notate a vibrato as a held note with a vibrato of a certain range and depth, but it will show exactly which frequencies the channel is being set to, which still gives a good idea of what the composer did. For the coders out there, it is like looking at the machine language disassembly of a program instead of the C++ source code. If you would like to use the modified version of FamiTracker that allows importing .NSF files, you can download it here. If you use only OS X, FamiTracker runs just fine under Darwine.
SIDENOTE: Setting up Darwine isn’t too difficult, but it is beyond the scope of this article. I have tested FamiTracker in Darwine, however, and it runs well. Handily, OpenMPT runs in Darwine as well. Rather than get into too many details of FamiTracker in this article, we can just download NSFPlay, which uses the same code.
We open the Duck Hunt NSF in NSFPlay and then open the keyboard window to see the notes as they are being played. For the “laugh,” we just need to go to song #22 and we can hear it. It goes by quite fast, so we can use the time multiplier to slow the sound down as much as we’d like. If you’re interested in recording NSFs as WAV files, you can do this in NSFPlay too, but to be somewhat safe from copyright trolls, you should own a copy of the original game as well.
From here, we can do an analysis of the notes as they are being played and which NES channels are playing them. You can download the .IT file here. We can then enter the notes into OpenMPT and we’ll have a result similar to:
You can also see where it is used in the game:
(see @ 53 seconds)
Although these two examples show how to model sounds on existing sounds, almost all of the sounds in RCR were made entirely from scratch. As when working on a film or on other games, I would look at the scene where the sounds would occur, or imagine the scenario in which they might be used, and design the sounds entirely in OpenMPT. I would try different approaches to modulating the different channels and eventually arrive at something that was a good fit. There are hundreds of different sound effects in the game, and hopefully this article gives a glimpse into the amount of effort and detail that went into them over the four years that I was involved with the project.
If you’re interested in learning more about the sound and music of Retro City Rampage, a good starting point is our music page at: http://RetroCityRampage.com/music.php