Posts Tagged ‘c’

PCM Audio | Part 3: Basic Audio Effects – Volume Control

January 12th, 2010

So now we know what data is stored in a PCM stream, let’s look at some real waveform examples. The easiest is a simple sine wave:

sine wave

Now if we “amplify” that wave by 5, we’d get a much louder sound, represented by a wave that looked like this:

sine wave times 10

So if you want to increase the volume of your PCM stream, just multiply every PCM value by some number. If we had 2048 bytes of audio (remember… that’s 1024 samples since each sample is two bytes), we could amplify the stream with this type of code:

int16_t pcm[1024] = read in some pcm data;
for (ctr = 0; ctr < 1024; ctr++) {
    pcm[ctr] *= 2;

Volume control is almost that simple. There's two catches.


Clipping occurs when your resulting value increases above the maximum value for a sample. So since we're dealing with signed 16 bit integers our maximum positive sample is 32767. If we have a PCM sample value of 5000 and we multiplied it by 10, the resulting value is -15536, not the expected 50000. When clipping occurs, you end up with noise in the audio. You should always check to see if the result of your multiplication would cause clipping, and if so, set the value to 32767 (or -32768) instead.

So our code above becomes:

int16_t pcm[1024] = read in some pcm data;
int32_t pcmval;
for (ctr = 0; ctr < 1024; ctr++) {
    pcmval = pcm[ctr] * 2;
    if (pcmval < 32767 && pcmval > -32768) {
        pcm[ctr] = pcmval
    } else if (pcmval > 32767) {
        pcm[ctr] = 32767;
    } else if (pcmval < -32768) {
        pcm[ctr] = -32768;

Volume Is Logarithmic

The other catch is that volume as perceived by humans (measured in decibels) is logarithmic, not linear. Your first instinct would be to think "Well if I wanted to double the volume, I should just multiply the samples by 2." Unfortunately, it's not quite that easy.

Multiplying a value by 1 will obviously give you no amplification. So to decrease volume, you would multiply by a value less than 1 and greater than 0. To increase volume, multiply by a number greater than one. Unfortunately, I didn't pay enough attention to logarithms in school, so I don't have a clever answer as to how to implement a proper volume control, but I've found that this function works pretty well:

int some_level;
float multiplier = tan(some_level/100.0);

If some_level is set to a value between 0 and 148 or so, this will give you a rather linear sounding multiplier. 79 is almost a multiplier of 1 (no amplification). It is far -- really far -- from perfect, but it worked well enough for my needs of implementing a volume slider. Graphing that function from 0 to 148 gives you this:

volume multiplier

So to set an appropriate level, now we have a volume slider at 39 (roughly half volume):

int16_t pcm[1024] = read in some pcm data;
int32_t pcmval;
uint8_t level = 39; // half as loud
// uint8_t level = 118 // twice as loud (79 * 1.5)
float multiplier = tan(level/100.0);
for (ctr = 0; ctr < 1024; ctr++) {
    pcmval = pcm[ctr] * multiplier;
    if (pcmval < 32767 && pcmval > -32768) {
        pcm[ctr] = pcmval
    } else if (pcmval > 32767) {
        pcm[ctr] = 32767;
    } else if (pcmval < -32768) {
        pcm[ctr] = -32768;

I wasn't able to find a simple logarithmic slider example, so if you have one, please post in the comments. I'd love to replace my hack.

Using some simple algorithms and that function above, you could easily implement a fade-in/out effect on PCM data by stepping through all 148 possible values over a period of time. And don't worry, we'll get to "time" later in the series.

That's pretty much all there is to know about volume, in the next part of the series, we're going to discuss mixing two streams together to create one stream.

General , ,

PCM Audio | Part 2: What does a PCM stream look like?

January 9th, 2010

In Part 1, we looked at how a PCM stream is described. Once you know all of the parameters for your PCM stream, we can examine the data and put it in memory as useful data.

So, let’s assume we have a file that contains signed 16-bit little endian mono PCM. That means that data in the file is just a collection of 16 bit integers. Each integer represents one sample. So the first 9 samples in the file could be:

|  500 |  300 | -100 | -20  | -300 |  900 | -200 |  -50 |  250 |      

Each of those integers is stored in the file as 2 bytes (16-bit), so the 9 samples above take up 18 bytes of space. The value of each sample, obviously, can range from -32768 to 32767. If you take those samples and plot them on a graph, you’ll end up with a visualization of the waveform for the audio that you see in your music player.

If we wanted to read that into an array in C, we would do something like this (obviously this is pseudo-code):

FILE *pcmfile
int16_t *pcmdata;
pcmfile = fopen(your pcm data file);
pcmdata = malloc(size of the file);
fread(pcmdata, sizeof(int16_t), size of file / sizeof(int16_t), pcmfile);

Of course, if you’re dealing with large files, you probably shouldn’t read the whole thing into memory. You should buffer the data and read it in chunks at a time.

If you take that data and send it to your sound card, you’ll hear the sample being played. However, the sound card will require you to know the sample rate. If you have an 8kHz stream and tell the sound card to play it at 16kHz, it’s like playing a 33.3 RPM record at 45 RPM. For the younger crowd out there, that means it will be too fast and it’ll be high pitched… think Alvin and the Chipmunks here.

Since this is a description of the waveform, a stream of all zeros would be silence (a flat line if you graphed it).

I haven’t really explained what those samples actually MEAN though… just what they are. It will be incredibly obvious what those samples mean starting in the next post, when we get to the fun stuff: basic audio effects processing (don’t get scared… it’s actually really easy).

General , ,

PCM Audio | Part 1: What is PCM?

January 8th, 2010

It’s been a long time since I posted anything. Most of my free time has been spent working on my ventrilo client for linux project. Of course, that project adds tons of things to discuss, such as how PCM audio works. I’m going to make this a multi-part series, because there is so much information to discuss.

When I first started working on that project, I knew nothing about how audio worked. I knew a little bit about encoders and decoders, but not really the inner workings. What are they encoding/decoding? It turns out, that the answer is PCM (pulse control modulation) audio. After messing with PCM for a few months, there are a lot of things that are painfully obvious now that were confusing. This guide is meant to be an introduction to at least give you the working knowledge you’ll need to ask proper questions and perform simple tasks. So let’s get started…

If you’ve ever used a computer MP3 player, you’ve probably seen those options to display the waveform of the audio or the little bars that pop up and down showing you treble and bass levels. What those are measuring is the PCM audio as it plays it. So what does all that crap mean?

Let’s start with the basics. There’s five terms that are important to know for PCM:

Sample Rate

Real actual audio (like someone talking to you in person) is transmitted as a wave. PCM is a digital representation of that audio wave at a specified sample rate. The sample rate is measured in Hz (cycles per second) and more often in kilohertz. So when you hear someone talk about about 128kHz vs. 160kHz audio, what they’re talking about is the sample rate. If you’ve ever done integrals in calculus, it’s a lot like that. The higher the sample rate, the better your quality (at the cost of size). There is no guessing here. You need to know what the sample rate is.


Whether the data is signed or unsigned. It is almost always signed. Treating a signed PCM stream as unsigned will hurt your ears… painfully… (I speak from experience here).

Sample Size

This determines how many bits make up one sample. 16-bit seems to be the most common.

Byte Ordering

Byte ordering refers to little-endian vs. big-endian data. If you don’t know what endian-ness means, you can probably assume little endian. If you have the option to choose endian for your data, you should always choose little-endian.

Number of channels

I’m mostly going to cover mono (1 channel), but multichannel PCM is usually handled by interleaving the PCM samples. Don’t worry about this for now. Once you understand mono, stereo is easy.

Add those five things together and you’ll come up with a description of a PCM stream. For example: signed 16-bit little-endian mono @ 44.1kHz. In order to actually play audio, you’ll need to know those 5 things.

Various sound devices support various types of streams, but there’s usually a set list of sign, sample size, and endian-ness options. Different APIs use different constants to specify, but usually you’ll see them as something like S16LE (signed 16-bit little-endian) or S32BE (signed 32-bit big-endian) and so on.

In my next post, I’ll go over how those are represented in a PCM stream.

General , ,

PulseAudio: An Async Example To Get Device Lists

October 13th, 2009

I have a love/hate relationship with PulseAudio. The PulseAudio simple API is… well…. simple. For 99% of the applications out there, you’ll rarely need anything more than the simple API. The documentation leaves a little to be desired, but it’s not to hard to figure out since you have the sample source code for pacat and parec.

The asynchronous API, on the other hand, is really complex. The learning curve isn’t really a curve. It’s more like a brick wall. Compounding the issue is that the documentation is atrocious. If you know exactly what you’re looking for and if you already know how it works, the documentation can be helpful.

More importantly, simple example code is nearly impossible to come by. So, since I took the time to figure it out, I figured I would document this here in the hopes that this little example will help someone else. This is not production ready code. There’s a lot of error checking that’s not being done. But this should at least give you an idea of how to use the PulseAudio asyncrhonous API.

Update: I spoke with the PulseAudio team and they encouraged me to put this source code on their wiki. So now you can find it at the main PulseAudio wiki:

Read more…

General , , ,

Detecting Xlib’s Keyboard Auto-repeat Functionality (and how to fix it)

June 23rd, 2009

If you’ve ever messed around with listening for XWindows keyboard events, you may have noticed something that’s odd. XWindows has a very strange way of dealing with keyboard auto-repeating. Let’s take a scenario:

Hold down the “h” key on the keyboard for a few seconds and you’ll get something that looks like this:


Now, if you try this and watch carefully, you’ll notice a couple of things. The first thing to notice, is that after the first character, there is a slight delay before it starts repeating. That delay is about a half of a second. After that initial delay, it repeats every 30ms. These delays are configurable in X. It may be somewhere in your window manager control panel, but the xset command will tell you everything you need to know:

$ xset q
Keyboard Control:
  auto repeat:  on    key click percent:  0    LED mask:  00000000
  auto repeat delay:  500    repeat rate:  30
  auto repeating keys:  00fffffffffffbbf

Now, when you grab the keyboard in X using XGrabKeyboard or XGrabKey (so you’ll get the keyboard events), something strange happens with all those h’s. When holding down a key, you would at first expect to see 2 events. The KeyPress event and the KeyRelease event. But in order for auto-repeat to work, there must be more. For each auto-repeating character, you should get an additional KeyPress event. This solves the auto-repeat problem, because you can write some simple code that can easily detect auto-repeat and ignore it if you just care if the user is holding down a key.
Read more…

General , , ,