
# Gaussian mixture model (GMM)


## Motivation

Where approaches such as linear regression and sub-space models are based on reducing the dimensionality of a signal to capture its essential information, in many cases we want to model the full range of possible signals. For that purpose we can design models of the statistical distribution of the signal. For example, it is possible to model a signal as a Gaussian process, where every observation has a (multivariate) Gaussian (normal) distribution.

Speech signals, however, feature much more structure than simple Gaussian processes. For example, voiced signals are very different from unvoiced signals, and within both voiced and unvoiced signals we have a multitude of distinct groups of utterances whose statistical characteristics are clearly different. Modelling them all with a single Gaussian process would ignore such structure, and the model would therefore be inefficient.

Mixture models are a class of models in which we assume that the signal under study consists of several distinct classes, where each class has its own unique statistical model. For example, the statistics of voiced sounds are clearly different from those of unvoiced sounds. We model each class with its own distribution, and their joint distribution is the weighted sum of the class distributions. The weight of each distribution corresponds to the frequency with which that class appears in the signal. So if, in some hypothetical language, unvoiced signals constituted 30% of all speech sounds, then the weight of the unvoiced class would be 0.3.

The most typical mixture model structure uses Gaussian (normal) distributions for each of the classes, so that the whole model is known as a Gaussian mixture model (GMM). Depending on the application, the class distributions can obviously take other forms than Gaussian; for example, a Beta mixture model could be used if the individual classes follow the Beta distribution. In this document we nevertheless focus on Gaussian mixture models, because they are the most common mixture models and demonstrate the applications in an accessible way.

## Model definition

The multivariate normal distribution for a variable x is defined as

$f\left(x;\Sigma,\mu\right) = \frac{1}{\sqrt{\left(2\pi\right)^N |\Sigma|}} \exp\left[-\frac12 (x-\mu)^T \Sigma^{-1}(x-\mu)\right],$

where $$\Sigma$$ and $$\mu$$ are the covariance and mean of the process, respectively, with N dimensions. In other words, this is the familiar Gaussian distribution for vectors x.
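As a quick sanity check, the density above can be evaluated directly and compared against SciPy's implementation. The parameter values below are purely illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative 2-D parameters (values chosen only for this example).
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
x = np.array([0.5, 0.5])

# Evaluate the density by the formula above.
N = len(mu)
diff = x - mu
pdf_manual = (np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)
              / np.sqrt((2 * np.pi) ** N * np.linalg.det(Sigma)))

# Compare against SciPy's reference implementation.
pdf_scipy = multivariate_normal.pdf(x, mean=mu, cov=Sigma)
print(pdf_manual, pdf_scipy)
```

The two values agree to numerical precision, confirming the formula.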

Suppose then that we have K classes in the signal, where each class has its own covariance and mean $$\Sigma_k$$ and $$\mu_k.$$ The Gaussian mixture model is then defined as

$\boxed{f\left(x\right) = \sum_{k=1}^K \alpha_k f\left(x; \Sigma_k,\mu_k\right),}$

where the weights $$\alpha_k$$ add up to unity: $$\sum_{k=1}^K \alpha_k = 1.$$
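The boxed formula translates directly into code: the GMM density is simply the weighted sum of the component densities. A minimal sketch, with component parameters chosen only for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

# A two-component GMM with illustrative parameters.
alphas = np.array([0.3, 0.7])                        # weights sum to one
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]   # component means
Sigmas = [np.eye(2), 2.0 * np.eye(2)]                # component covariances

def gmm_pdf(x, alphas, mus, Sigmas):
    """Weighted sum of Gaussian densities, as in the boxed formula."""
    return sum(a * multivariate_normal.pdf(x, mean=m, cov=S)
               for a, m, S in zip(alphas, mus, Sigmas))

x = np.array([1.0, 1.0])
print(gmm_pdf(x, alphas, mus, Sigmas))
```

Because each component integrates to one and the weights sum to one, the mixture is itself a valid probability density.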

## Applications

• In recognition/classification applications, we can, for example, model a system which has two distinct states (like speech and noise) and train a GMM with mixture components matching those states. When receiving a microphone signal, we can then determine the likelihood of each mixture component and thus obtain the likelihood that the signal is speech or noise.
• In transmission applications, our objective is to model the signal such that we can transmit likely signals with a small number of bits and unlikely signals with a large number of bits. If we train a GMM on a speech database, we can determine which signals are speech-like, so that those can be transmitted with a low number of bits.
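The recognition use case can be sketched with scikit-learn's `GaussianMixture`. The two synthetic clusters below merely stand in for speech and noise features (a real system would use e.g. spectral feature vectors), so all values here are illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-ins for speech and noise feature vectors.
rng = np.random.default_rng(0)
speech = rng.normal(loc=2.0, scale=1.0, size=(500, 2))
noise = rng.normal(loc=-2.0, scale=1.0, size=(500, 2))
data = np.vstack([speech, noise])

# Fit a two-component GMM; the components should align with the two states.
gmm = GaussianMixture(n_components=2, random_state=0).fit(data)

# predict_proba gives the posterior probability of each mixture component,
# i.e. how likely a new frame is to belong to each state.
frame = np.array([[2.5, 1.5]])
print(gmm.predict_proba(frame))
```

For a frame deep inside one cluster, nearly all posterior mass falls on the matching component, which is exactly the speech/noise decision described above.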