Video Link: A Guide to Digital Audio

So far we have only looked at how sounds work in the “real world”: as pressure waves in the air, and as analog electrical signals. We have not yet looked at how sounds are represented in the computer, in their digital, numerical form. Digital sound behaves in more or less the same way as real-world, “analog” sound, but a number of special considerations apply, so it is worth examining the basic ideas behind it.

The defining characteristic of any kind of digital data, be it text, pictures, or movies, is that it is made of a bunch of numbers. Numbers are all that computers know how to work with. When computers work with audio, the situation is no different: they must figure out how to take the continuous time domain waveform of a sound and reduce it to a series of numbers.

They accomplish this by “sampling” the waveform. When you record an audio signal into your computer, the computer captures the signal by measuring the instantaneous amplitude of the waveform at regular intervals. These individual measurements are called “samples.” This process of sampling turns the continuous, analog waveform into a numeric, “digital” approximation that looks a lot like a staircase. Figure 1.2 illustrates the effect.
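As a rough illustration of sampling, the Python sketch below measures a sine wave at regular intervals. The function name `sample_sine`, the 1kHz tone, and the 8kHz rate are assumptions chosen for the example, not part of any particular audio system.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, num_samples, amplitude=1.0):
    """Measure a sine wave's instantaneous amplitude at regular intervals."""
    samples = []
    for n in range(num_samples):
        t = n / sample_rate_hz  # time of the n-th measurement, in seconds
        samples.append(amplitude * math.sin(2 * math.pi * freq_hz * t))
    return samples

# A 1kHz tone sampled at 8kHz yields 8 samples per cycle.
samples = sample_sine(1000, 8000, 8)
print([round(s, 3) for s in samples])
```

Each number in the list is one sample; the continuous curve between the measurements is not stored.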


The numeric value of a sample represents its amplitude. One of the limitations of digital systems is that they have a sharp, absolute limit on the maximum amplitude of the signals that can be represented; the computer will only count so high. Any amplitudes that are higher than the maximum countable amplitude will simply be “clipped” off. As you might guess, digital clipping generally sounds quite bad, and it is to be avoided in most circumstances. Whenever you are working with digital audio, you must make sure that it never exceeds the maximum digital amplitude.
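Clipping can be sketched in a few lines of Python. This is a minimal illustration that assumes samples are stored as numbers where 1.0 is the maximum representable amplitude; the helper name `clip` is hypothetical.

```python
def clip(sample, max_amplitude=1.0):
    """Limit a sample to the range [-max_amplitude, +max_amplitude]."""
    return max(-max_amplitude, min(max_amplitude, sample))

signal = [0.2, 0.9, 1.4, -0.5, -1.7]
clipped = [clip(s) for s in signal]
print(clipped)  # values beyond the limit are flattened to the limit
```

The two samples that exceeded the limit lose their original values, which is why the peaks of a clipped waveform look chopped off.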

Sampling Resolution

Besides clipping, the process of analog to digital conversion can have a number of other detrimental effects on the quality of audio. Furthermore, processing audio when it is in digital form can further degrade the quality, due to rounding errors in the numerical digital processing algorithms.

There are two attributes of a digital audio system that determine its fidelity: sampling rate and sampling resolution. If both of these attributes are sufficiently good, then digital recording and processing will create little or no audible degradation of the sound quality.

The sampling resolution of a system is the numeric accuracy of the individual samples. The more possible numeric values for a sample, the higher the sampling resolution is. Because computers work in binary, sampling resolution is typically described in terms of “bits.” A 4-bit digital system has 16 possible numeric values for each sample. An 8-bit system has 256 possible values. A 16-bit system has 65,536 possible values, and a 24-bit system has 16,777,216 possible values.
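The counts above are simply powers of two, as this short Python sketch shows (the function name `num_levels` is an illustrative choice):

```python
def num_levels(bits):
    """Number of distinct values a sample can take at a given bit resolution."""
    return 2 ** bits

for bits in (4, 8, 16, 24):
    print(bits, num_levels(bits))
```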

A low sampling resolution will degrade the quality of the audio by introducing “quantization noise.” Quantization noise is the audible artifact that results from the “rounding errors” inherent in analog to digital conversion. It usually manifests in the form of a low-volume hissing sound, somewhat similar to the sound heard in quiet sections on analog tapes and vinyl. This sound will mask subtle details in the sound and make sufficiently quiet sounds inaudible.
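To make the “rounding errors” concrete, the following Python sketch rounds samples to a grid of a given bit resolution and reports the worst error over one cycle of a sine wave. It is a simplified model that assumes samples lie between -1.0 and 1.0; real converters differ in detail, and the names `quantize` and `max_rounding_error` are hypothetical.

```python
import math

def quantize(sample, bits):
    """Round a sample in [-1.0, 1.0] to the nearest step of a signed `bits`-bit grid."""
    steps = 2 ** (bits - 1)                    # steps per unit of amplitude
    code = round(sample * steps)
    code = max(-steps, min(steps - 1, code))   # stay inside the representable range
    return code / steps

def max_rounding_error(bits, num_points=1000):
    """Largest rounding error over one cycle of a sine at half of full scale."""
    worst = 0.0
    for n in range(num_points):
        x = 0.5 * math.sin(2 * math.pi * n / num_points)
        worst = max(worst, abs(x - quantize(x, bits)))
    return worst

for bits in (4, 8, 16):
    print(bits, max_rounding_error(bits))
```

The error shrinks as the bit resolution grows, which corresponds to the quantization noise becoming quieter.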

Dynamic Range

The higher the bit resolution of a digital system is, the quieter the quantization noise is. The level of the quantization noise is what determines the system’s total “dynamic range;” that is, the ratio between the quietest possible sound and the loudest possible sound. The quietest possible sound is restricted by the level of the quantization noise, and the loudest possible sound is restricted by the threshold for clipping.

A digital system has a dynamic range of roughly 6dB times the bit resolution. In other words, each bit of sampling resolution adds about 6dB of dynamic range. Thus, the dynamic range of a 16-bit system is about 96dB. The dynamic range of a 24-bit system is about 144dB, larger than the dynamic range of human hearing.
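The 6dB-per-bit rule of thumb comes from expressing the number of possible sample values as an amplitude ratio in decibels. The Python sketch below does that calculation; the function name `dynamic_range_db` is an illustrative choice, and the result is the idealized figure rather than what any specific converter achieves.

```python
import math

def dynamic_range_db(bits):
    """Ratio between the full range of values and a single step, in decibels."""
    return 20 * math.log10(2 ** bits)

for bits in (16, 24):
    print(bits, round(dynamic_range_db(bits), 1))
```

Each extra bit doubles the number of values, and a doubling of amplitude is about 6.02dB.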

Volume levels in the digital world are measured in “full-scale decibels,” or dBFS. The digital full-scale measurement system measures peak volume, not average volume. The 0dB reference point is set at the highest representable amplitude; in other words, 0dBFS is the loudness of the loudest possible sound. All other volume levels are negative; a sound with a level of -6dBFS has a peak level 6dB below the digital maximum, for instance.
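A peak level in dBFS can be computed as in the Python sketch below. It assumes samples are stored as numbers where 1.0 is the highest representable amplitude; the function name `peak_dbfs` is hypothetical.

```python
import math

def peak_dbfs(samples, full_scale=1.0):
    """Peak level of a block of samples, relative to full scale, in dBFS."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")  # digital silence has no finite level
    return 20 * math.log10(peak / full_scale)

print(peak_dbfs([1.0, -0.3]))    # a peak at full scale
print(peak_dbfs([0.5, -0.25]))   # a peak at half of full scale
```

A peak at half of the maximum amplitude comes out at roughly -6dBFS, matching the example in the text.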

Standard Sampling Resolutions

There are two commonly used sampling resolutions: 16-bit and 24-bit. 16-bit is the resolution of audio CDs, and MP3s are usually decoded to 16-bit as well. It is typically used for the distribution of mixed-down music. Its dynamic range is sufficient for the vast majority of music.

In the actual mixing process, it is preferable to use 24-bit. 24-bit has more dynamic range than 16-bit. While the difference doesn’t matter much for finished mixdowns, it can make a difference during the mixing process, because the extra dynamic range gives some “slop room,” allowing the rounding errors introduced by digital processing to occur without significant audible effects.

Some DAWs also have a “32-bit” resolution. This usually refers to the so-called “floating point” representation of digital audio, as opposed to the usual “fixed-point” representation, which is what we have discussed so far.

32-bit floating point and 24-bit fixed point are, in a certain sense, the same thing. Without going into the technical differences between the two, 32-bit floating point audio has the same precision as 24-bit fixed point audio (24 significant bits per sample), with the added advantage that audio above the 0dBFS threshold will not clip. Instead, the computer will effectively take bits from the bottom and add them to the top. This raises the quantization noise, but also raises the maximum representable amplitude, so the dynamic range available at any given moment stays about the same.

It is generally not a good idea to take advantage of floating point’s ability to exceed the 0dBFS ceiling, because even in DAWs that fully support floating point, many plugins will convert their input audio to fixed point internally; when they do this, the audio will clip. So, even if you are working in floating point, it is best to act as if you were not, and keep all levels below 0dBFS at all times.
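The Python sketch below illustrates what happens when floating point audio that exceeds 0dBFS is converted to 24-bit fixed point. It assumes a convention in which 1.0 in floating point corresponds to 0dBFS; the function name `float_to_fixed24` is hypothetical, and actual plugins may scale and round differently.

```python
def float_to_fixed24(sample):
    """Convert a float sample (1.0 = 0dBFS, by assumption) to a signed 24-bit integer."""
    max_code = 2 ** 23 - 1
    min_code = -(2 ** 23)
    code = round(sample * 2 ** 23)
    return max(min_code, min(max_code, code))  # anything out of range is clipped

in_float = [0.5, 1.0, 1.5, -2.0]  # 1.5 and -2.0 are above 0dBFS
in_fixed = [float_to_fixed24(s) for s in in_float]
print(in_fixed)
```

The floating point values 1.5 and -2.0 are perfectly valid before the conversion, but afterwards they are pinned to the largest and smallest 24-bit codes.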

Sampling Rate

The sampling rate of a digital system is the number of samples per second that it uses to represent the audio. For instance, audio CDs use 44,100 samples per second. Sampling rates are measured in hertz (Hz), just like frequencies. Thus, the audio CD sampling rate might be written as 44,100Hz, or 44.1kHz.

Intuitively, you might expect that a higher sampling rate would yield higher quality audio, and this intuition is correct. Specifically, sampling rate affects the “frequency response” of the digital system; that is, the range of frequencies that it can represent.

Digital systems have no minimum representable frequency; they can go all the way down to 0Hz. They do, however, have a maximum representable frequency, and it is determined by the sampling rate. Specifically, the maximum representable frequency is half of the sampling rate. Thus, with a sampling rate of 44.1kHz, the maximum representable frequency is 22.05kHz. This maximum frequency is referred to as the “Nyquist frequency.”
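One way to see why frequencies above half the sampling rate cannot be represented is to sample such a tone and compare it with a lower one. In the Python sketch below, a 30kHz sine sampled at 44.1kHz produces the same sample values as a phase-inverted 14.1kHz sine, so the two are indistinguishable once sampled. The function names and the choice of 30kHz are assumptions made for the illustration.

```python
import math

def nyquist_hz(sample_rate_hz):
    """Maximum representable frequency for a given sampling rate."""
    return sample_rate_hz / 2

def sample_sine(freq_hz, sample_rate_hz, num_samples):
    """Sample a unit-amplitude sine wave at regular intervals."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(num_samples)]

rate = 44100
print(nyquist_hz(rate))  # 22050.0

above = sample_sine(30000, rate, 64)         # above the Nyquist frequency
alias = sample_sine(rate - 30000, rate, 64)  # 14.1kHz, below it
# The two sample sequences cancel, apart from floating point rounding.
mismatch = max(abs(a + b) for a, b in zip(above, alias))
print(mismatch)
```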

The most common sampling rates are 44.1kHz, 48kHz, 96kHz, and 192kHz. The lowest of these, 44.1kHz, is typically used for distributing finished mixes. Since this sampling rate can represent all audible frequencies, you might wonder why anyone would ever use a higher sampling rate.

The answer is that, besides allowing higher frequencies to be represented, higher sampling rates can also make certain audio processes sound better, with fewer sonic artifacts. Such processes include equalization and compression; certain aspects of synthesis, such as filtering and waveform synthesis; and certain aspects of sampling, such as repitching.

The drawback of higher sampling rates is that they imply higher CPU usage. For instance, going from 48kHz to 96kHz, you can expect most processes to use twice as much CPU, because they are processing twice as many samples in the same amount of time.
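The doubling is easy to check with a little arithmetic. The Python sketch below counts how many samples must be processed for a given duration, assuming stereo audio by default; the function name `samples_to_process` is hypothetical, and real CPU usage also depends on how each process is implemented.

```python
def samples_to_process(sample_rate_hz, seconds, channels=2):
    """Total number of samples a process must handle for a stretch of audio."""
    return int(sample_rate_hz * seconds * channels)

for rate in (48000, 96000):
    print(rate, samples_to_process(rate, seconds=1))
```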

submitted by /u/pauseplayrepeatcom