Humans are usually the intended recipients of speech signals in telecommunication, so the quality of a transmission should be measured by how a human listener would judge it. Perceptual models are methods that try to approximate or predict the auditory quality judgements of human listeners. In coding applications, we can thus define perceptual models as evaluation models, which we use to approximate the perceptual effect of distortions.
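As a minimal sketch of a perceptual model used as an evaluation model, the function below scores a distortion by its spectral error power divided by a per-bin masking level, so that errors in well-masked bins count less. The function name `weighted_spectral_distortion` and the idea of passing the masking levels in as a plain array are illustrative assumptions, not a standard quality measure; practical objective quality measures are considerably more elaborate.

```python
import numpy as np


def weighted_spectral_distortion(reference, processed, mask):
    """Hypothetical perceptually weighted error between two equal-length frames.

    `mask` holds one positive masking level per rfft bin (len(frame) // 2 + 1);
    a larger value means errors in that bin are assumed to be less audible.
    """
    ref_spec = np.fft.rfft(np.asarray(reference, dtype=float))
    proc_spec = np.fft.rfft(np.asarray(processed, dtype=float))
    # Power of the distortion in each frequency bin.
    error_power = np.abs(ref_spec - proc_spec) ** 2
    # Weight the error by the inverse masking level and average over bins.
    return float(np.mean(error_power / np.asarray(mask, dtype=float)))


# Usage: the same sinusoidal error costs less where the mask is high.
n = np.arange(64)
reference = np.zeros(64)
mask = np.ones(33)
mask[5] = 100.0  # assume (hypothetically) that bin 5 is strongly masked
err_masked = 0.1 * np.cos(2 * np.pi * 5 * n / 64)
err_audible = 0.1 * np.cos(2 * np.pi * 20 * n / 64)
d_masked = weighted_spectral_distortion(reference, reference + err_masked, mask)
d_audible = weighted_spectral_distortion(reference, reference + err_audible, mask)
print(d_masked < d_audible)
```

Both error signals have the same energy, but the one falling in the assumed well-masked bin receives a much smaller score.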
Another type of model frequently used in speech coding is the source model, which describes the inherent characteristics of the source, that is, of the speech signal. Examples of source models include 1) physical models, which describe the physiological processes that produce speech sounds, and 2) the probability distribution of speech signals. The important distinction is that source models do not care about who is observing; they only describe the objective reality. In contrast, perceptual models are applied when we observe the signal, to evaluate its properties.
In speech and audio coding applications, practically all distortions caused by the algorithms are due to quantization of the signal. The objective of perceptual modelling is then to choose the quantization accuracy such that the perceptually degrading effect of quantization is minimized. Roughly speaking, this means that those signal components which are more important to a human listener are quantized with a higher accuracy than those which are less important.
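The idea of quantizing important components more accurately can be sketched with a uniform quantizer whose step size is scaled per coefficient. In this sketch, `mask` is an assumed per-coefficient masking level (for example, obtained from a perceptual model) and `step` is a hypothetical global step size; larger mask values give coarser quantization and therefore larger, but presumably less audible, errors.

```python
import numpy as np


def perceptual_quantize(x, mask, step=1.0):
    """Quantize coefficients with step sizes proportional to a masking level.

    Hypothetical sketch: coefficients with a large `mask` value are assumed
    to be less important and get a coarser quantizer.
    """
    x = np.asarray(x, dtype=float)
    mask = np.asarray(mask, dtype=float)
    local_step = step * mask
    # Encoder: integer indices that would be entropy coded and transmitted.
    indices = np.round(x / local_step).astype(int)
    # Decoder: reconstruct with the same per-coefficient step sizes.
    x_hat = indices * local_step
    return indices, x_hat


x = np.array([0.93, 0.93, -0.47])
mask = np.array([0.1, 1.0, 0.5])
indices, x_hat = perceptual_quantize(x, mask)
print(indices, x_hat)
```

Here the first coefficient (mask 0.1) is reconstructed as approximately 0.9, while the identical second coefficient (mask 1.0) is reconstructed as 1.0, so the error is smaller where the mask indicates higher importance. The error of each coefficient never exceeds half of its local step size.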
If we play two sinusoids with slightly different frequencies, the louder one can mask the quieter one so that it becomes inaudible. This effect is known as frequency masking. In other words, people are less sensitive to sounds that are near in frequency to other, louder sounds. Consequently, when quantizing a signal, we can use a lower quantization accuracy in frequency regions that have more energy, since the quantization noise there is masked by the signal itself. The masking effect weakens as the distance in frequency increases.
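A toy model of the two-sinusoid experiment: the louder sinusoid (the masker) produces a masking threshold that is highest at its own frequency and falls off linearly with frequency distance, and a quieter sinusoid is treated as inaudible if its level lies below that threshold. The offset of 10 dB and the slope of 20 dB per kHz are made-up illustrative parameters, not psychoacoustic data; real models use measured, asymmetric spreading functions, typically defined on a perceptual frequency scale.

```python
def masking_threshold_db(masker_freq_hz, masker_level_db, freq_hz,
                         offset_db=10.0, slope_db_per_khz=20.0):
    """Toy masking threshold (dB) caused by one sinusoidal masker.

    The threshold sits `offset_db` below the masker level at the masker
    frequency and decreases linearly with distance in frequency.
    Both parameters are hypothetical illustration values.
    """
    distance_khz = abs(freq_hz - masker_freq_hz) / 1000.0
    return masker_level_db - offset_db - slope_db_per_khz * distance_khz


def is_masked(masker_freq_hz, masker_level_db, probe_freq_hz, probe_level_db):
    """Return True if the quieter probe sinusoid falls below the toy threshold."""
    threshold = masking_threshold_db(masker_freq_hz, masker_level_db, probe_freq_hz)
    return probe_level_db < threshold


# A 70 dB masker at 1 kHz hides a 50 dB probe close by, but not one far away.
print(is_masked(1000.0, 70.0, 1100.0, 50.0))  # True: threshold is 58 dB
print(is_masked(1000.0, 70.0, 3000.0, 50.0))  # False: threshold is 20 dB
```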
In practice, frequency masking models resemble spectral (energy) envelopes: the shape of a frequency masking model is a smoothed and less pronounced version of the spectral envelope. More accurate models can be derived from psychoacoustic theory.
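One way to turn this description into code, under stated assumptions, is to smooth the log power spectrum of a frame across frequency and then compress its dynamic range around the mean, which yields a smoothed and less pronounced version of the envelope. The function name `envelope_based_mask_db`, the moving-average width `smooth_bins` and the compression factor `gamma` are hypothetical choices; this is a heuristic sketch, not a psychoacoustic model.

```python
import numpy as np


def envelope_based_mask_db(frame, smooth_bins=9, gamma=0.6):
    """Heuristic masking curve (dB per rfft bin) derived from the spectral envelope.

    Hypothetical sketch: `smooth_bins` (odd) sets the smoothing width and
    `gamma` < 1 flattens the curve to make it less pronounced.
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))
    power_db = 10.0 * np.log10(np.abs(np.fft.rfft(windowed)) ** 2 + 1e-12)
    # Smooth across frequency with a moving average; edge padding avoids
    # pulling the curve toward 0 dB at the ends of the spectrum.
    half = smooth_bins // 2
    padded = np.pad(power_db, half, mode="edge")
    kernel = np.ones(smooth_bins) / smooth_bins
    smoothed_db = np.convolve(padded, kernel, mode="valid")
    # Compress the dynamic range around the mean level.
    mean_db = smoothed_db.mean()
    return mean_db + gamma * (smoothed_db - mean_db)


# Usage: a sinusoid at bin 32 of a 512-sample frame plus weak noise.
rng = np.random.default_rng(0)
n = np.arange(512)
frame = np.sin(2 * np.pi * 32 * n / 512) + 1e-3 * rng.standard_normal(512)
mask_db = envelope_based_mask_db(frame)
print(len(mask_db), int(np.argmax(mask_db)))
```

The resulting curve peaks around the sinusoid and spans a narrower range in dB than the raw log power spectrum, matching the "smoothed and less pronounced" description.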
Frequency masking models are used in two ways:
- In frequency domain codecs, where a frequency-domain representation of the signal is quantized, we...