## Motivation

Where approaches such as linear regression and sub-space models are based on reducing the dimensionality of a signal to capture its essential information, in many cases we want to model the full range of possible signals. For that purpose we can design models of the *statistical distribution* of the signal. For example, it is possible to model a signal as a Gaussian process, where every observation has a (multivariate) Gaussian (normal) distribution.

Speech signals, however, feature much more structure than simple Gaussian processes. For example, voiced signals are very different from unvoiced signals, and within both voiced and unvoiced signals we have a multitude of distinct groups of utterances whose statistical characteristics are clearly different. Modelling them all with a single Gaussian process would ignore such structure and the model would therefore be inefficient.

*Mixture models* are a type of model in which we assume that the signal under study consists of several distinct classes, where each class has its own unique statistical model; for example, the statistics of voiced sounds are clearly different from those of unvoiced sounds. We model each class with its own distribution, and the joint distribution is the weighted sum of the class distributions. The weight of each distribution corresponds to the frequency with which its class appears in the signal. If unvoiced sounds were, in some hypothetical language, to constitute 30% of all speech sounds, then the weight of the unvoiced class would be 0.3.

The most typical mixture model structure uses Gaussian (normal) distributions for each of the classes, so that the whole model is known as a *Gaussian mixture model* (GMM). Depending on the application, the class distributions can obviously take other forms than the Gaussian; for example, a Beta mixture model could be used if the individual classes follow the Beta distribution. In this document we nevertheless focus on Gaussian mixture models, because the GMM is the most common mixture model and demonstrates the applications in an accessible way.

## Model definition

The multivariate normal distribution for a variable *x* is defined as

$$ f(x) = \frac{1}{\sqrt{(2\pi)^N \det(\Sigma)}} \exp\left( -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right), $$

where \( \Sigma \) and \( \mu \) are the covariance and mean of the process, respectively, with *N* dimensions. In other words, this is the familiar Gaussian (normal) distribution extended to vectors *x*.
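
To make the formula concrete, the following minimal sketch evaluates this density with NumPy and checks the result against `scipy.stats.multivariate_normal`; the mean, covariance, and evaluation point are arbitrary illustration values, not taken from any speech data.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the N-dimensional Gaussian density at the point x."""
    N = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** N * np.linalg.det(Sigma))
    exponent = -0.5 * diff @ np.linalg.solve(Sigma, diff)
    return np.exp(exponent) / norm

# Arbitrary example parameters with N = 2
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.3, 0.8])

print(gaussian_pdf(x, mu, Sigma))             # density from the formula above
print(multivariate_normal(mu, Sigma).pdf(x))  # SciPy reference value, identical
```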

Suppose then that we have *K* classes in the signal, where each class has its own covariance and mean, \( \Sigma_k \) and \( \mu_k \). The *Gaussian mixture model* is then defined as

$$ f(x) = \sum_{k=1}^{K} \alpha_k \, \mathcal{N}(x;\, \mu_k, \Sigma_k), $$

where \( \mathcal{N}(x;\, \mu_k, \Sigma_k) \) denotes the normal distribution above with the parameters of class *k*, and where the weights \( \alpha_k \) add up to unity, \( \sum_{k=1}^K \alpha_k = 1. \)
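
The definition translates directly into code. The sketch below, with made-up weights and parameters, evaluates the mixture density as the weighted sum of the component densities, using SciPy for the per-component Gaussians:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_pdf(x, weights, means, covariances):
    """Weighted sum of K Gaussian component densities."""
    return sum(w * multivariate_normal(mu, Sigma).pdf(x)
               for w, mu, Sigma in zip(weights, means, covariances))

# Two illustrative components; the weights add up to unity.
weights = [0.3, 0.7]
means = [np.zeros(2), np.array([2.0, 2.0])]
covariances = [np.eye(2), 0.5 * np.eye(2)]

print(gmm_pdf(np.array([1.0, 1.0]), weights, means, covariances))
```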

## Applications

- In recognition/classification applications, we can, for example, model a system which has two distinct states (like speech and noise) and train a GMM with mixture components matching those states. When receiving a microphone signal, we can then determine the likelihood of each mixture component and thus obtain the likelihood that the signal is speech or noise (see the sketch after this list).
- In transmission applications, our objective is to model the signal such that we can transmit likely signals with a small number of bits and unlikely signals with a large number of bits. If we train a GMM on a speech database, we can determine which signals are speech-like, such that those can be transmitted with a low number of bits.
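
As an illustration of the recognition use case above, the following sketch trains one GMM per class with scikit-learn's `GaussianMixture` and classifies an incoming frame by comparing the per-class log-likelihoods. The feature arrays are hypothetical placeholders (in practice they would be, e.g., spectral features extracted from audio); here they are simulated with random data so the example runs on its own.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Placeholder training features, one row per frame; simulated here so the
# example is self-contained. Real systems would use features such as MFCCs.
speech_features = rng.normal(loc=0.0, scale=1.0, size=(500, 12))
noise_features = rng.normal(loc=3.0, scale=0.5, size=(500, 12))

# One GMM per class; the number of components is a modelling choice.
speech_gmm = GaussianMixture(n_components=4).fit(speech_features)
noise_gmm = GaussianMixture(n_components=4).fit(noise_features)

# Classify an incoming frame by comparing per-class log-likelihoods.
frame = rng.normal(loc=0.0, scale=1.0, size=(1, 12))
if speech_gmm.score_samples(frame)[0] > noise_gmm.score_samples(frame)[0]:
    print("frame classified as speech")
else:
    print("frame classified as noise")
```

In practice the decision would also weight the class priors, but comparing likelihoods suffices to show the idea.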
