ePanorama.net

Digital audio page


Index


General information

Digital audio is the most commonly used method of representing sound inside computers, many audio processing devices, and modern audio storage devices (such as CD, MD and DVD).

Digital audio technology is a method of representing an audio signal using binary numbers. An analog audio signal is converted to digital by an analog-to-digital (A/D) converter chip, which takes samples of the signal at a fixed time interval (the sampling frequency). Binary numbers are assigned to these samples. This process is called sampling. The audio data is thus stored as a sequence of samples taken from the audio signal at constant time intervals. A sample represents the amplitude of the signal at the moment it was measured.
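To illustrate the idea, here is a minimal Python sketch (not from the original page) of sampling a signal: a 440 Hz sine tone is sampled at a fixed rate and each sample is mapped to a signed 16-bit number. The rate and frequency are just example values.

```python
import math

SAMPLE_RATE = 8000          # samples per second (telephone quality)
FREQ = 440.0                # example tone frequency in Hz
BITS = 16                   # sample resolution

# Take one sample every 1/SAMPLE_RATE seconds and map the
# -1.0..+1.0 signal amplitude to a signed 16-bit integer code.
max_code = 2 ** (BITS - 1) - 1   # 32767 for 16 bits
samples = [
    round(math.sin(2 * math.pi * FREQ * n / SAMPLE_RATE) * max_code)
    for n in range(SAMPLE_RATE // 100)   # 10 ms of audio = 80 samples
]
```

Playing the samples back through a D/A converter at the same rate reconstructs the tone.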

This digital stream of data is then recorded onto storage media (magnetic tape, optical disk, hard disk or computer memory) or transmission path (telecommunication network, Internet, digital satellite, digital TV transmission). Upon playback, a digital-to-analog (D/A) converter chip reads the binary data and reconstructs the original analog signal. This process virtually eliminates generation loss as every digital-to-digital copy is theoretically an exact duplicate of the original.

In uncompressed digital audio, each sample requires one or more bytes of storage. The number of bytes required depends on the number of channels (mono, stereo) and the sample format (8 or 16 bits, mu-Law, etc.). The length of the time interval between samples determines the sampling rate. Commonly used sampling rates range from 8 kHz (telephone quality) to 48 kHz (DAT tapes).
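The storage requirement follows directly from those parameters. A small sketch (assumed example figures, not from the original page):

```python
def bytes_per_second(sample_rate, bits, channels):
    """Uncompressed PCM data rate in bytes per second."""
    return sample_rate * (bits // 8) * channels

telephone = bytes_per_second(8000, 8, 1)     # mono 8-bit at 8 kHz
cd        = bytes_per_second(44100, 16, 2)   # stereo 16-bit at 44.1 kHz
```

Telephone-quality audio needs 8,000 bytes per second, while CD-quality stereo needs 176,400 bytes per second.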

In digital audio, bit depth (16 bits, 24 bits, etc.) describes the potential accuracy of a particular piece of hardware or software that processes audio data. In general, the more bits available, the more accurate the resulting output from the data being processed. Bit depth is frequently encountered in specifications for analog-to-digital converters (ADCs) and digital-to-analog converters (DACs), when reading about software plug-ins, and when recording audio using a professional medium such as a digital audio workstation or a Digital Audio Tape machine. Bit depth is the number of bits you have in which to describe something, and each additional bit in a binary number doubles the number of possibilities. A 16-bit system offers 65,536 possible levels of audio signal; a 24-bit process or piece of 24-bit hardware offers 16,777,216 levels.

16 bits is a typical resolution (bit depth) of many digital audio systems nowadays. The audio on a CD is stored at 16-bit resolution, and modern PC soundcards operate at 16-bit resolution. Some professional applications need more bits, so in professional digital audio systems you can encounter 20-bit or 24-bit resolutions. Some devices can process the data internally at even higher accuracy (for example 32-bit resolution), but the information is generally transferred and communicated at a maximum of 24-bit resolution. 24 bits is practically enough for any real-life audio application (pretty much at the performance limits of the best A/D converters), and it is usually the maximum resolution supported by digital audio interfaces (AES/EBU in the professional audio world, S/PDIF in consumer products).

There is a universal definition of resolution: resolution is the ratio between the largest representable signal and the basic ambiguity in representation, whether that ambiguity is due to random, unpredictable noise or to quantization. This ambiguity exists simply because at low enough signal levels you do not know whether a signal change is due to the signal or to the noise in the system (and by "you" we mean "your ear" or "your test instrument" equivalently).

It should be further noted that properly implemented digital systems do not sample the music in discrete "steps" but rather have the same unpredictable random ambiguity as conventional analog systems.

In some instances you might have heard of 1-bit audio systems. You might wonder how 1 bit can represent audio signals, because 1 bit can represent only two signal levels. The magic is that those 1-bit samples are generated using a special sampling process which produces a series of 1-bit samples that represent the audio signal accurately when played back through a playback device. The simplest 1-bit system you can think of is delta modulation (DM), where each bit indicates whether the output signal voltage should be increased or decreased by a small step. This was used in some telephone systems. Sigma-delta modulation (SDM) was developed in the 1960s to overcome the limitations of delta modulation. Sigma-delta systems quantize the delta (difference) between the current signal and the sigma (sum) of the previous differences. SDM is also known as pulse density modulation (PDM). The maximum quantizer range of SDM is determined by the maximum signal amplitude and is not dependent on the signal spectrum.
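Delta modulation is simple enough to sketch in a few lines of Python (an illustrative toy, not a production codec; the step size of 0.05 is an arbitrary choice):

```python
def dm_encode(signal, step=0.05):
    """1-bit delta modulation: each output bit says whether the
    tracked level should step up (1) or down (0) to follow the input."""
    bits, level = [], 0.0
    for x in signal:
        bit = 1 if x > level else 0
        level += step if bit else -step
        bits.append(bit)
    return bits

def dm_decode(bits, step=0.05):
    """Rebuild the staircase approximation from the bit stream."""
    out, level = [], 0.0
    for bit in bits:
        level += step if bit else -step
        out.append(level)
    return out
```

For a slowly rising input such as [0.1, 0.2, 0.3, 0.4] the encoder emits all 1s and the decoder's staircase climbs after it; a signal that changes faster than one step per sample cannot be tracked, which is exactly the limitation that sigma-delta modulation was developed to overcome.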

For the 1-bit samples to represent the audio signal accurately, they need to be taken at a very high rate (very much higher than the rate at which higher-resolution samples are generally taken in audio applications). To convert a maximum-amplitude 16-bit word, a 1-bit delta modulator would have to perform 2^16 toggles per conversion period; with a sampling frequency of 44.1 kHz, this would demand a toggle rate of approximately 2.9 GHz, an impossibility with today's technology. For comparison, with SDM covering a 22.05 kHz audio band and 64 times oversampling, the internal sampling frequency rises to 2.8224 MHz, so quantization noise is spread from DC to 1.4112 MHz. In addition, sigma-delta modulation adds noise-shaping benefits.
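The arithmetic behind those two figures can be checked directly:

```python
# Toggle rate plain delta modulation would need to cover a full
# 16-bit amplitude range at a 44.1 kHz conversion rate:
toggles = (2 ** 16) * 44100          # ~2.89 GHz

# Internal rate of a 64x oversampling sigma-delta modulator
# running from a 44.1 kHz base rate:
sdm_rate = 44100 * 64                # 2 822 400 Hz = 2.8224 MHz
```

The roughly thousandfold difference is why sigma-delta, not plain delta modulation, is what practical 1-bit converters use.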

Nowadays SDM (sigma-delta modulation) is used in the D/A converters of many CD players. The 16-bit 44.1 kHz data is requantized and oversampled with an SDM encoder and converted to an analog line-level signal with an SDM decoder. A 1-bit converter is in some cases easier and cheaper to implement, and its most important benefit is its better tolerance for small variances in component values, compared to the high demands of conventional weighted-resistor and R-2R ladder high-bit DACs.

The physical device that converts analogue audio to digital audio is called an ADC (Analog to Digital Converter), and the device which converts digital audio back to analogue audio is called a DAC (Digital to Analog Converter).

The sampling parameters affect the quality of sound that can be reproduced from the recorded signal. The most fundamental parameter is the sampling rate, which limits the highest frequency that can be stored. It is well known (Nyquist's sampling theorem) that the highest frequency that can be stored in a sampled signal is at most 1/2 of the sampling frequency. The sample encoding limits the dynamic range of the recorded signal (the difference between the faintest and the loudest signal that can be recorded). In theory the maximum dynamic range of the signal is number_of_bits * 6 dB. This means that 8-bit sampling resolution gives a dynamic range of 48 dB and 16-bit resolution gives 96 dB.

Analogue to digital conversion technology

An analog-to-digital converter (also known as an ADC or an A/D converter) is an electronic circuit that measures a real-world signal (analogue audio) and converts it to a digital representation of the signal (digital audio). The A/D converter compares the analog input voltage to a known reference voltage and then produces a digital representation of this analog input. The output of an ADC is a digital binary code. By its nature, an ADC introduces quantization error: this is simply the information that is lost, because for a continuous analog signal there are an infinite number of voltages but only a finite number of ADC digital codes. Increasing the resolution of the ADC increases the number of discrete steps, which reduces quantization error.

A typical analog-to-digital process consists of two stages: discrete-time sampling followed by amplitude quantization. The sampling stage takes samples of the incoming signal voltage at the predetermined rate (the sample rate). Amplitude quantization converts those voltage samples to numeric values with limited resolution (determined by the number of bits used). Each of these two steps has requirements that must be met. Discrete-time sampling demands that the signal be band-limited to the appropriate Nyquist band before sampling, to prevent aliasing. The signal must also be properly dithered before quantization, to prevent the introduction of signal-correlated artifacts (usually described as "grain and grunge"). If you do not add dither, nasty things happen to an undithered signal as it approaches the bottom of the bit bucket, with rapidly mounting severe distortion as a result. One of the big problems with quantization distortion is that it is not a fixed number of dBs below the signal level; it sits at a pretty much constant level regardless of the level of the programme material.

To avoid those problems, dither is used in some form practically always. The application of the correct amount of dither completely eliminates all signal-correlated artifacts: properly dithered digital audio contains no nonlinear distortion of the signal being quantized. With any normal form of dither (technically, with any wordlength reduction to quantized PCM) you end up with an entirely decorrelated noise floor, which mimics the behavior of ordinary noise. The dithered noise floor is not true Gaussian noise, and indeed there are continuing discussions among pro-audio people as to which form of dither is best for total inaudibility.
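One common form is TPDF (triangular probability density function) dither. Here is a minimal sketch, not from the original page, of applying it while truncating a 24-bit sample down to 16 bits: triangular noise of about one 16-bit LSB is added before the low 8 bits are dropped, so the quantization error ends up decorrelated from the signal.

```python
import random

def dither_and_truncate(sample_24bit):
    """Reduce a 24-bit sample value to 16 bits with TPDF dither.

    One 16-bit LSB equals 2^8 = 256 in 24-bit units.  The sum of two
    uniform random values gives the triangular distribution."""
    lsb = 1 << 8
    tpdf = (random.random() - random.random()) * lsb
    return int(round((sample_24bit + tpdf) / lsb))
```

Run on silence (input 0), the output hovers around 0 with occasional +/-1 codes; that low-level noise is the price paid for removing signal-correlated distortion.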

Dither can be applied in many ways, and sometimes you might not need any separate dither noise source at all. For example, when music is being played full-out there is generally enough non-harmonic, noise-like content to do an effective job of dithering on its own, without the need for a specific dither source. In a 24-bit recording system the intrinsic analogue noise floor will almost certainly be greater than 1 bit in amplitude, so a specific dither signal would be unnecessary. Of course it would still be needed during 16-bit truncation.

In a dithered system the definition of resolution is not always very clear. With an un-dithered quantized system, resolution is simple: the size of the step. There can be no argument about that. When you introduce dither, things change. You no longer have that step, and resolution turns into a rather moot subject which has more to do with how small a signal you can discern in the presence of noise. This is no longer simple: by choosing appropriate bandwidths you can discern as small a signal as you like. So resolution no longer has any easily defined meaning; it is replaced by signal-to-noise ratio.

This was the conventional architecture. Nowadays different A/D conversion architectures tend to blur the implementational division between the two steps.

A/D conversion in audio systems is a sampling process, and all sampling processes are limited by the Nyquist limit. The Nyquist limit is defined as half of the sampling frequency, and it sets the highest frequency that the system can sample without frequency aliasing. In a sampled data system, when the input signal of interest exceeds the Nyquist limit (f_IN > 0.5 * f_SAMPLE), the signal is effectively "folded back" into the Nyquist band, thus appearing to be at a lower frequency than it actually is. This unwanted signal is indistinguishable from genuine signals in the desired frequency band (0 to f_SAMPLE/2). Usually the signals are prefiltered before they enter the A/D converter to remove signal components at frequencies high enough to cause this kind of aliasing.
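The fold-back can be computed directly. A small sketch (example frequencies chosen for illustration):

```python
def alias_frequency(f_in, f_sample):
    """Frequency at which an input tone appears after sampling.
    Tones above the Nyquist limit fold back into 0..f_sample/2."""
    f = f_in % f_sample
    return f if f <= f_sample / 2 else f_sample - f

# A 30 kHz tone sampled at 44.1 kHz folds back to 14.1 kHz,
# while a 1 kHz tone is below Nyquist and passes unchanged.
```

This is why the anti-aliasing prefilter is essential: once a 30 kHz tone has been recorded as 14.1 kHz, no later processing can tell it apart from a real 14.1 kHz signal.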

Typical good quality modern digital audio systems use at least 16 bit resolution (typically 16-24 bits) and sample rate of 44.1 kHz (CD sample rate), 48 kHz (DAT sample rate) or higher (96 kHz, 192 kHz etc.).

Sometimes you might wonder what the dynamic range of a digital audio system is. An accurate but hard-to-use definition is that the total amount of uniquely representable information in the system is proportional to the dynamic range in amplitude multiplied by the bandwidth.

But if you accept the definition of dynamic range as "the ratio between the largest possible undistorted signal and the smallest unambiguous change in signal," then it's easy.

Actually, for run-of-the-mill PCM data, the dynamic range depends only on the number of bits. The bare calculation gives a dynamic range in dB of 20 log (2^n), where n is the number of bits, but that is far from the whole story. The generally accepted rule of thumb is 6 dB per bit, so a 10-bit system has 60 dB, a 16-bit system 96 dB, and so on. Please note that this value is for a single sample. Through common averaging techniques it is possible, just as in non-digital systems, to encode and detect signals that are well below the noise floor. The 96 dB figure is the worst-case, non-dithered instantaneous dynamic range of a 16-bit system.
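The exact formula and the 6 dB/bit rule of thumb agree closely:

```python
import math

def dynamic_range_db(bits):
    """Worst-case dynamic range of n-bit PCM: 20 * log10(2^n) dB."""
    return 20 * math.log10(2 ** bits)

# dynamic_range_db(16) gives about 96.33 dB; the 6 dB/bit
# rule of thumb gives 16 * 6 = 96 dB.
```

The exact figure is 6.02 dB per bit, which is where the small discrepancy comes from.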

When talking about A/D-conversion, things get more complicated. 0dB is easy - it is the biggest sine wave you can fit into the digital space. You are driving the ADC end-to-end. The other end is more problematic. You find it by measuring the noise level. Unfortunately this depends on the measuring bandwidth.

Every time you halve the measuring bandwidth, you reduce the noise by 3dB. Of course you can simply include the entire bandwidth, but in the case of audio that isn't really fair, because we don't hear that way. Particularly at low levels, we hear predominantly the middle range frequencies, and the extreme lows and highs disappear. At threshold levels even "A" weighting gives unfairly pessimistic levels of noise.

Digital to analogue conversion technology

A digital-to-analog converter (also known as a DAC or a D/A converter) is an electronic circuit that converts a digital representation of a quantity into a discrete analog value. The input to the DAC is typically a digital binary code, and this code, along with a known reference voltage, results in a voltage or current at the DAC output. The word "discrete" is very important to understand, because a DAC cannot provide a continuous time output signal; rather, it provides analog "steps." The steps can be lowpass-filtered to obtain a continuous signal.

In the D/A conversion process the output of the D/A converter is fed through a filter which removes the image-frequency information (signal content above 1/2 of the sampling frequency) from the output signal; this image-frequency information can distort the output signal. Two methods exist for removing unwanted image signals from the DAC output (for example, to prevent aliasing in a following ADC). The first approach is to use a high-performance lowpass filter (data -> DAC -> high-order lowpass filter); usually a sixth-order lowpass filter is enough. The second method is to use digital interpolation filters together with a simple analogue filter (data -> oversampling digital-interpolation filter -> DAC -> low-order lowpass filter).

Digital audio formats

Digital audio broadcasting

Networked audio

Papers and FAQs

Speech and music coding

Digital audio interfaces

Audio compression

The point of using compression is to reduce the amount of data. A direct measure of the total amount of data or information in audio is its bitrate. By using compression, the audio signal can be transported at a lower bit rate or stored in a smaller file.

A typical starting point is CD-quality audio. Audio Compact Discs store the audio data in PCM (Pulse Code Modulation) format: 16-bit samples at 44.1 kHz in stereo. Each minute of recording time consumes about 10 MB (megabytes) of storage space, so a three-minute song occupies roughly 32 MB and a five-minute song roughly 53 MB. A Compact Disc can contain up to 74 minutes of PCM audio (an audio CD holds more raw audio data than the 650 MB data-mode capacity, because audio mode carries less error-correction overhead).
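The per-minute figure falls straight out of the CD sampling parameters:

```python
def pcm_megabytes(seconds, sample_rate=44100, bits=16, channels=2):
    """Uncompressed CD-quality PCM storage, in megabytes (10^6 bytes)."""
    return seconds * sample_rate * (bits // 8) * channels / 1e6

one_minute = pcm_megabytes(60)        # about 10.6 MB per minute
```

At that rate, 74 minutes comes to roughly 783 MB of raw audio data.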

The sheer size of PCM audio data files made them unpopular as a storage medium for audio on a computer, or as a file format when the audio is transferred over a slow modem connection. Digital compression changed all that: it offers a huge reduction in the amount of file space required to store audio data, and this reduction in file size also dramatically reduces the time required to transmit the files electronically.

Audio compression can be one of two categories, lossless or lossy:

Lossless digital compression is commonly used to reduce the size of computer files for electronic transmission. For the files to be usable on a computer, the files extracted from a compressed archive must be identical to the originals (before compression). Lossless compression is great because it makes perfect copies, but it does not yield very high compression ratios, so it does not save huge amounts of disk storage space. ZIP, ARC, and SIT are some of the lossless compression formats commonly used on computers (TAR itself is an archive format, usually combined with a compressor such as gzip). Typically, lossless routines give a ratio no higher than about 2:1.
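The "perfect copy" property is easy to demonstrate with Python's built-in zlib module (the same DEFLATE algorithm that ZIP uses); the repetitive stand-in data here compresses far better than real audio would:

```python
import zlib

original = b"pcm audio data " * 1000        # repetitive stand-in data
packed = zlib.compress(original)
restored = zlib.decompress(packed)

# Lossless: the round trip is bit-exact.
assert restored == original
ratio = len(original) / len(packed)
```

On genuine audio samples, which look much more like noise, the achievable ratio drops toward the 2:1 figure mentioned above.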

Lossy compression algorithms offer much higher compression ratios than lossless algorithms, but to achieve this they must discard some of the original data. Lossy audio compression can yield a variety of compression rates based on the ability of your hardware and software to encode and decode the audio data. Lossy compression is only suitable for use on audio or graphical data: the audio or graphics are reproduced, but at a lower overall quality than before compression. In some cases the difference is difficult to perceive. The compression ratio can usually be adjusted, so the quality level can vary widely; audio compressed at a 20:1 or 10:1 ratio will certainly sound inferior to audio compressed at 2:1. Various compression schemes achieve different results; for example, many codecs remove portions of the audio signal that human hearing is less sensitive to. To some people's ears the resulting audio has distinguishing artifacts or characteristics that change the listening experience, and the final "quality" level is largely a matter of personal perception. Each encoder/decoder (codec) has different strengths and weaknesses. Typically, encoding takes a long time and large amounts of processing power, while decoding can usually be done in real time or faster. MPEG, MP3, AAC, RA, WMF, JPEG, QT, and DivX are some of the formats of lossy compression commonly used for audio and video. The most commonly used and most talked-about audio compression in typical computer and consumer audio environments is MP3.

Brief descriptions of some different audio compression formats:

The compression ratio for MPEG compression can be adjusted across a wide range of quality levels, however sound quality drops as the compression is increased. Trying to maintain the highest quality (without extremely noticeable signal loss) yields the following compression ratios for different MPEG coders:

Layer 1 = 1:4 (384 kbps for stereo)
Layer 2 = 1:6 or 1:8 (256 kbps or 192 kbps for a stereo signal)
Layer 3 = 1:10 or 1:12 (128 kbps or 112 kbps for a stereo signal)
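The bitrates above follow from the ratios when the source is 16-bit stereo PCM at 48 kHz (1536 kbps), as a quick check shows:

```python
# Source: 48 kHz * 16 bits * 2 channels = 1536 kbps of PCM.
pcm_kbps = 48 * 16 * 2

layer1 = pcm_kbps / 4      # 1:4  ratio -> 384 kbps
layer2 = pcm_kbps / 8      # 1:8  ratio -> 192 kbps
layer3 = pcm_kbps / 12     # 1:12 ratio -> 128 kbps
```

The intermediate figures (256 kbps for Layer 2, 112 kbps for Layer 3) correspond to the 1:6 and roughly 1:14 points on the same scale.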

With the increase in inexpensive mass storage and rising connection speeds, we are rapidly approaching a point where compression of audio may not be as important an issue as it was in the past.

Computer audio

Contents rights management

The recording industry is worried about copying of its products (CDs). It has tried to put in place the first critical pieces of a system it hopes will keep songs on the Net from being pirated.

The general term "piracy" refers to the illegal duplication and distribution of sound recordings and takes three specific forms: counterfeit, pirate and bootleg. Counterfeit recordings are unauthorized recordings of the prerecorded sounds, together with unauthorized duplication of the original artwork, label, trademark and packaging of prerecorded music. Pirate recordings are unauthorized duplications of only the sounds of one or more legitimate recordings. Bootleg recordings are unauthorized recordings of a musical broadcast on radio or television, or of a live concert.

The most talked about technology has been Secure Digital Music Initiative (SDMI). The first phase of the SDMI system requires that portable digital music player manufacturers implement several security components, foremost among them a digital rights management system (DRM). This will allow record labels to securely distribute and track files as they are transmitted over the Net and on to portable players.


Related pages





Copyright 2003 ELH Communications Ltd. all rights reserved.