## Introduction

Many of the most descriptive features of speech are characterized by energy; for example, formants are peaks in the power spectrum and the fundamental frequency is visible as a comb structure in it. A basic property of such features is that they are positive-valued: negative energy values are not physically realizable. However, most signal processing methods are applicable only to real-valued variables, and including a non-negativity constraint in them is cumbersome.

*Non-negative matrix factorization* (NMF or NNMF) and its tensor-valued counterparts form a family of methods which explicitly assume that the input variables are *non-negative*; that is, they are by definition applicable to energy signals. In some sense, NMF methods are an extension of principal component analysis (PCA) -type and other subspace methods to positive-valued signals.

## Model definition

Specifically, suppose that the power (or magnitude) spectrum of one window of a speech signal is represented as an *N×1* vector *v_k*, and furthermore that we arrange the spectra of *K* consecutive windows as the columns of an *N×K* matrix *V = [v_1, …, v_K]*. The model is then

\[ V \approx WH, \]

where *W* is the *N×M* source-model matrix, *H* is the *M×K* weight matrix, and *M* is the model order.

The idea is that *W* is a fixed matrix corresponding to our model of the signal, viz. the source model. Its columns describe typical features of the data. With the weights *H*, we interpolate between the columns of *W*. In some sense, this is then a generalization of a codebook (see vector quantization), but such that we interpolate between codevectors. In addition, we require that all elements of *W* and *H* are non-negative, such that we ensure that the reconstruction *WH* is also non-negative.
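As a toy illustration of this interpolation between codevectors (a minimal numpy sketch with made-up numbers, where the *M* template spectra are stored as the columns of the *N×M* factor in *V ≈ WH*):

```python
import numpy as np

# Hypothetical source model: M = 2 template spectra as columns (N = 4 bins).
W = np.array([[1.0, 0.0],
              [2.0, 0.5],
              [0.5, 2.0],
              [0.0, 1.0]])

# Non-negative weights for one frame; the frame lies "between" the templates.
h = np.array([0.7, 0.3])

# Reconstructed spectrum: non-negative because both W and h are non-negative.
v = W @ h
print(v)  # [0.7  1.55 0.95 0.3 ]
```

Unlike hard codebook quantization, the frame is not forced onto a single codevector; any non-negative mixture of the templates is representable.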

Since the model order *M* is chosen to be smaller than both *N* and *K*, this mapping is generally an approximation. The model thus tries to capture *the relevant features of the input signal with a small number of parameters*.

The model is generally optimized by

\[ \min_{W,H} \| V - WH \|_F \qquad\text{such that}\qquad W,H\geq 0. \]

Here the norm is the Frobenius norm, defined as the square root of the sum of squared elements. The above optimization problem has no analytic solution, but it can be solved by numerical methods, which are included in typical software libraries.
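One widely used numerical method is the multiplicative update rule of Lee and Seung for the Frobenius cost. The sketch below (a minimal numpy implementation on a randomly generated toy matrix, not production code) alternates updates that can never produce negative entries:

```python
import numpy as np

def nmf(V, M, iters=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (N x K) as V ~ W H, using multiplicative
    updates that monotonically decrease the Frobenius error || V - W H ||_F."""
    rng = np.random.default_rng(seed)
    N, K = V.shape
    W = rng.random((N, M)) + eps   # random non-negative initialization
    H = rng.random((M, K)) + eps
    for _ in range(iters):
        # Each update multiplies by a ratio of non-negative terms,
        # so W and H stay non-negative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy non-negative data with an exact rank-2 structure.
rng = np.random.default_rng(1)
V = rng.random((8, 2)) @ rng.random((2, 30))
W, H = nmf(V, M=2)
print(np.linalg.norm(V - W @ H))  # small residual
```

Library routines (e.g. `sklearn.decomposition.NMF` in scikit-learn) implement the same idea with additional solvers and stopping criteria.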

## Application

=20
A typical use of NMF-type algorithms is source separation, where we find the solution of the above optimization problem and then identify those dimensions of *H* which correspond to the different sources. By retaining only those dimensions of *W* which correspond to the desired source, we can extract the desired source signal from its mixture with the interfering sources. For example, we might want to recover a speech signal corrupted by noise by retaining the dimensions corresponding to speech and removing those which correspond to noise.

Note however that NMF-type methods estimate only the power (or magnitude) spectrum of the desired signal. In contrast, the input signal is usually a time-frequency representation which also has a phase component. After applying NMF estimation, we therefore also need an estimate of the phase component of the signal. Such methods will be discussed in the speech enhancement chapter of this document.
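The separation idea can be sketched schematically as follows (a toy numpy example with made-up template spectra, not real speech data): the template columns are held fixed, only the activations are estimated from the mixture, and the speech estimate keeps only the speech-related components. A phase estimate would still be needed on top of this, as noted above.

```python
import numpy as np

eps = 1e-9

# Hypothetical fixed templates: 2 "speech" columns, 1 "noise" column (N = 4).
W_speech = np.array([[3.0, 0.1],
                     [0.2, 2.5],
                     [0.1, 0.2],
                     [0.0, 0.1]])
W_noise = np.array([[0.1],
                    [0.1],
                    [1.0],
                    [1.0]])
W = np.hstack([W_speech, W_noise])  # N x 3 combined source model

# Toy "mixture" magnitude spectrogram, synthesized from known activations.
rng = np.random.default_rng(0)
H_true = rng.random((3, 10))
V = W @ H_true  # N x 10

# With W fixed, estimate the activations H by multiplicative updates.
H = np.full((3, 10), 0.5)
for _ in range(500):
    H *= (W.T @ V) / (W.T @ W @ H + eps)

# Retain only the speech dimensions to estimate the speech magnitude spectrum.
V_speech_est = W[:, :2] @ H[:2, :]
```

Discarding the noise column (and its activation row) removes the noise contribution from the reconstruction while keeping the speech part of the model intact.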

For more information, see the Wikipedia article: Non-negative matrix factorization.
