It’s been a little while since the last audio programming post here at Creating Sound. I spent some time trying to decide on a good, straightforward way to introduce DSP effects programming without the complications of writing plug-ins, or having to statically write the result to an output file (boring!). The clear answer was to use PortAudio to interface with the PC audio system. This open-source, free API was mentioned (and is listed) in the Audio Programming Primer. It’s simple to use, cross-platform, runs in real time, and doesn’t require a host application the way a plug-in would. That will hopefully make this a fun and enlightening look at DSP, and I’m very excited to be bringing it to you!

The purpose of this series is educational: an introduction to basic DSP effects. I’ll be discussing the theory and implementation of each effect, which you can then experiment with and extend. To facilitate this, full source code for the projects will go up on GitHub for both Mac (using Xcode) and Windows (using Visual Studio), which you can clone, fork, or just download and work with. The readme file included in each repository has additional information on building and working with the projects, including the external PortAudio dependencies.

So it’s with this that we begin part 1 of this series!


We begin by looking at delay, a basic and fundamental audio effect that forms the foundation of many more sophisticated effects such as reverb, filtering, and chorus. Delay is an incredibly useful (and widely used) effect on its own, however. It is implemented simply through the use of a circular buffer that is initialized to 0.

[Figure: the delay buffer, with every sample initialized to 0]

Each cell in the buffer represents one sample of audio. The length of the buffer therefore determines the length of the delay, and depends on the sample rate: a delay of 1.5 seconds at a sampling rate of 44.1 kHz, for example, equates to 66150 samples (1.5 x 44100). To keep track of our position in the buffer, we use an integer variable holding the current index into the buffer. This variable advances by 1 each time we do a combined read/write into the buffer, and wraps around to the beginning when it reaches the end.
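To make the numbers concrete, here is a minimal sketch of sizing and zero-initializing such a buffer (the names here are purely illustrative; the real setup lives in the CSDelay class in the repositories):

#include <vector>

// A 1.5-second delay at 44.1 kHz needs 66150 samples.
const float delayTime    = 1.5f;    // delay length in seconds
const int   sampleRate   = 44100;   // samples per second
const int   bufferLength = static_cast<int>(delayTime * sampleRate);

std::vector<float> buffer(bufferLength, 0.0f);  // every sample starts at 0
int pos = 0;                                    // current index into the buffer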

When new audio input is acquired, instead of sending it directly to the output, we write it into the delay buffer, and the output is read from the delay buffer instead. The following diagram illustrates this process.

[Figure: incoming audio is written to the delay buffer at the current position while the output is read from it]

(Compare this diagram with the code given in the audio callback routine further below.)

Initially all the values read from the buffer will be 0, effectively outputting silence, but once the position index wraps back around to the beginning, it starts to read the samples that were written to the buffer from the incoming audio. We can see how this delays the output of the audio by a time directly proportional to the length of the buffer.
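A toy example makes this easy to see (purely illustrative, not code from the project). With a 4-sample buffer, the first 4 output samples are silence, after which the input re-emerges:

float buffer[4] = {0.0f, 0.0f, 0.0f, 0.0f};
int pos = 0;
for (int n = 0; n < 8; ++n) {
    float in  = static_cast<float>(n + 1);  // pretend input: 1, 2, 3, ...
    float out = buffer[pos];                // read the delayed sample first
    buffer[pos] = in;                       // then overwrite with new input
    pos = (pos + 1 < 4 ? pos + 1 : 0);      // wrap at the end of the buffer
    // out yields 0, 0, 0, 0, 1, 2, 3, 4 -- the input delayed by 4 samples
}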

This process is handled by the CSDelay class in this project.  The read/write operations look like this:

inline float read () const
{
    // Read the delayed sample at the current position, scaled by the
    // delay's output level.
    return mBuffer[mPos] * mGain;
}

inline void write (const float value)
{
    // Mix the incoming sample with a portion of the sample already stored
    // there (the feedback), then advance the position index.
    mBuffer[mPos] = value + (mBuffer[mPos] * mFeedback);
    incr();
}

Here we can see the addition of a few extra variables, mGain and mFeedback (mPos is the position index in the buffer). mGain defines the amplitude level of the delayed signal; if set to 1.0, the delayed audio will have the same volume as the original. With mFeedback we control how much of the delayed audio is fed back into the buffer, essentially delaying the delayed samples. If this value is equal to 1.0, oscillation will occur and the audio, once started, will continue to repeat indefinitely. If it’s greater than 1.0, the feedback loop becomes unstable and the output grows without bound. Warning: don’t do this, as it can potentially damage your speakers!
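One simple safeguard (a hypothetical addition on my part, not something taken from the project code) is to clamp the feedback amount when it is set, so the echoes are guaranteed to decay:

#include <algorithm>  // for std::min / std::max

// Hypothetical setter: keep feedback in [0.0, 0.95] so the repeats
// always die away and the loop can never run away.
inline void setFeedback (const float feedback)
{
    mFeedback = std::min(std::max(feedback, 0.0f), 0.95f);
}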

The incr() function is given below. It simply advances mPos by 1, or wraps it around to the beginning once the end of the buffer is reached.

inline void incr ()
{
    // Advance to the next sample, wrapping back to the start of the
    // buffer after the last valid index has been used.
    mPos = (mPos + 1 < mBufferLength ? mPos + 1 : 0);
}
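An equivalent way to write the wrap is with the modulo operator: mPos = (mPos + 1) % mBufferLength;. The conditional form avoids the division implied by %, which can be worth it in code that runs once per sample.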

To produce the dry/wet mix of the delay effect, it’s a simple matter of scaling the original sample by the dry level and the delayed sample by the wet level, then summing the two.

inline float processSample (const float drySample, const float wetSample) const
{
    // Blend the original (dry) and delayed (wet) samples.
    return drySample * mDry + wetSample * mWet;
}
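A common convention (an assumption on my part; the project may well expose the two levels separately) is to derive both values from a single mix parameter, so their sum stays constant:

// Hypothetical setter: one mix value in [0.0, 1.0] drives both levels.
inline void setMix (const float mix)
{
    mWet = mix;          // 1.0 = delayed signal only
    mDry = 1.0f - mix;   // 0.0 = dry signal only
}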

The value processSample() returns is the final output sample that we send back to the PortAudio engine. With the basics of the delay implementation out of the way, our final task is to look at where the audio processing takes place: the audio callback routine.

The PortAudio callback function we implement is asynchronous (non-blocking, meaning it runs continuously in the background for as long as we tell it to continue), and we pass it our CSDelay instance to process the incoming audio. PortAudio invokes this function automatically once the stream is started (we hand PortAudio the address of the function we wish to use as the callback, aka a function pointer), and it runs at high priority to ensure audio drop-outs do not occur. As such, the body of the callback function needs to be kept as lightweight as possible. We define the entirety of this function ourselves, but its signature (i.e. the parameters) is fixed by PortAudio.
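For context, hooking the callback up to PortAudio looks roughly like the following condensed sketch (the full setup, with error checking, is in the repository code):

// Inside main(), after creating our CSDelay instance 'delay':
PaStream* stream = NULL;
Pa_Initialize();
Pa_OpenDefaultStream(&stream,
                     2, 2,           // stereo input and stereo output
                     paFloat32,      // 32-bit floating-point samples
                     44100,          // sample rate in Hz
                     paFramesPerBufferUnspecified,
                     audioCallback,  // our function pointer
                     &delay);        // arrives in the callback as userData
Pa_StartStream(stream);              // the callback now runs until we stop it

With that in place, our callback routine implementing the delay effect looks like this: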

int audioCallback (const void* input, void* output,
        unsigned long samples, const PaStreamCallbackTimeInfo* timeInfo,
        PaStreamCallbackFlags statusFlags, void* userData)
{
    const float *in = (const float*)input;  // interleaved stereo input
    float *out = (float*)output;            // interleaved stereo output
    CSDelay *delay = (CSDelay*)userData;    // our delay instance
    float delaySample;

    // 'samples' counts stereo frames, so each iteration handles one
    // left/right pair from the interleaved buffers.
    for (unsigned long i = 0; i < samples; ++i) {
        // left channel
        delaySample = delay->read();
        delay->write(*in);
        *out++ = delay->processSample(*in++, delaySample);

        // right channel
        delaySample = delay->read();
        delay->write(*in);
        *out++ = delay->processSample(*in++, delaySample);
    }

    return paContinue;
}

Inside the callback we are given a set number of frames (the samples parameter) that we are required to process. Since we’re dealing with stereo input/output, we need to apply the delay twice per frame, once for each channel. The stereo data in this case is stored in interleaved format: as opposed to de-interleaved, where each channel has its own buffer, both channels share one buffer with the samples alternating L R L R L, etc. We need to mirror this layout in our internal delay buffer in CSDelay, mBuffer.
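In index terms, the two samples of frame i sit side by side in the interleaved buffer (illustrative only; the callback above walks the buffers with pointer increments instead):

float left  = in[2 * i];      // left sample of frame i
float right = in[2 * i + 1];  // right sample of frame i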

Remember back to the beginning, where we determined the length of our delay buffer to be the product of the delay time and the sampling rate? In a stereo situation, we need to double this length in order to accommodate the additional channel of audio. That allows us to process the delay as shown above, once for the left channel, then with the next sample for the right channel.
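In terms of the sizing sketch from earlier, this just adds a channel factor (again with illustrative names):

const int numChannels  = 2;  // stereo
const int bufferLength = static_cast<int>(delayTime * sampleRate) * numChannels;
// 1.5 s at 44.1 kHz in stereo: 66150 x 2 = 132300 samples, stored
// interleaved as L R L R ... to match the PortAudio buffers.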

This concludes our look at CSDelay. Some of the finer details of interacting with PortAudio are documented in the source code included in the repositories. Regarding the DSP, there is much that can be done to expand upon this effect to achieve interesting sonic results; some suggestions are given in the readme file included in the repositories. Here are the links for the Xcode project and the Visual Studio project on GitHub:

CSDelay for Mac on GitHub: csdelay-mac
CSDelay for Windows on GitHub: csdelay-win