Problem definition

In speech processing and elsewhere, a common task is to predict an unknown vector y from an available observation vector x. Specifically, we want an estimate \( \hat y = f(x) \) such that \( \hat y \approx y. \) In particular, we will focus on linear estimates of the form \( \hat y=f(x):=A^T x, \) where A is a matrix of parameters.

The minimum mean square estimate (MMSE)

Suppose we want to minimise the squared error of our estimate on average. The estimation error is \( e=y-\hat y \) and the squared error is the squared L2-norm of the error, that is, \( \left\|e\right\|^2 = e^T e. \) Its mean can be written as the expectation \( E\left[\left\|e\right\|^2\right] = E\left[\left\|y-\hat y\right\|^2\right] = E\left[\left\|y-A^T x\right\|^2\right]. \) Formally, the minimum mean square problem can then be written as

\[ \min_A\, E\left[\left\|y-A^T x\right\|^2\right]. \]

In general, this cannot be implemented directly, because the objective contains the abstract expectation operator.

(Advanced derivation) To get a computational model, first note that the error expectation can be approximated by the mean over a sample of N error vectors \( e_k \) as

\[ E\left[\left\|e\right\|^2\right] \approx \frac1N \sum_{k=1}^N \left\|e_k\right\|^2 = \frac1N {\mathrm{tr}}(E^T E), \]

where \( E=[e_1,\,e_2,\dotsc,e_N] \) and tr() is the matrix trace. To minimize the error energy expectation, we can then set its derivative to zero

\[ 0 = \frac{\partial}{\partial A} \frac1N {\mathrm{tr}}(E^T E) = \frac1N\frac{\partial}{\partial A} {\mathrm{tr}}((Y-A^TX)^T (Y-A^TX)) = -\frac2N X(Y-A^T X)^T, \]

where the observation matrix is \( X=[x_1,\,x_2,\dotsc,x_N] \) and the desired output matrix is \( Y=[y_1,\,y_2,\dotsc,y_N]. \) (End of advanced derivation)
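The two steps of the derivation can be checked numerically. The following is a minimal sketch, assuming NumPy and randomly generated data; the dimensions `d`, `m`, `N` and the helper `mean_squared_error` are illustrative choices, not part of the original text. It verifies that the trace expression equals the average of the per-sample squared norms, and compares the gradient with respect to A, written in the same layout as A as \( -\frac2N X(Y-A^TX)^T, \) against central finite differences. That gradient vanishes exactly when \( (Y-A^TX)X^T = 0. \)

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, N = 4, 3, 50                 # input dim, output dim, sample count (assumed)
X = rng.standard_normal((d, N))    # columns are the observations x_k
Y = rng.standard_normal((m, N))    # columns are the desired outputs y_k
A = rng.standard_normal((d, m))    # an arbitrary parameter matrix


def mean_squared_error(A):
    """Sample estimate (1/N) tr(E^T E), where E = Y - A^T X."""
    E = Y - A.T @ X
    return np.trace(E.T @ E) / N


# The trace form equals the mean of the per-sample squared norms ||e_k||^2.
E = Y - A.T @ X
per_sample = np.mean(np.sum(E**2, axis=0))
trace_form = mean_squared_error(A)

# Analytic gradient with respect to A, same shape as A (d x m).
grad_analytic = -2.0 / N * X @ E.T

# Central finite differences, perturbing one entry of A at a time.
eps = 1e-6
grad_numeric = np.zeros_like(A)
for i in range(d):
    for j in range(m):
        step = np.zeros_like(A)
        step[i, j] = eps
        diff = mean_squared_error(A + step) - mean_squared_error(A - step)
        grad_numeric[i, j] = diff / (2 * eps)

max_gradient_gap = np.max(np.abs(grad_analytic - grad_numeric))
print(per_sample, trace_form, max_gradient_gap)
```

Because the objective is quadratic in A, central differences agree with the analytic gradient up to rounding error, so `max_gradient_gap` should be very small.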

It follows that the optimal weight matrix A can be solved as

\[ \boxed{A = (XX^T)^{-1}XY^T = (X^T)^\dagger Y^T}, \]

where the superscript \( \dagger \) denotes the Moore-Penrose pseudo-inverse.
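As an illustration, the closed-form solution can be computed in a few lines. This is a sketch under stated assumptions: NumPy is used, the data are synthetic, and `A_true` and the noise level are hypothetical values chosen only for the demonstration. The pseudo-inverse is applied to \( X^T, \) since \( (XX^T)^{-1}X \) is the Moore-Penrose pseudo-inverse of \( X^T \) when \( XX^T \) is invertible. The same estimate is also obtained with a general least-squares solver, which is usually the numerically safer route.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, N = 4, 2, 200                       # illustrative dimensions (assumed)
A_true = rng.standard_normal((d, m))      # hypothetical ground-truth parameters
X = rng.standard_normal((d, N))           # columns are observations x_k
Y = A_true.T @ X + 0.1 * rng.standard_normal((m, N))  # noisy targets y_k

# Normal equations: solve (X X^T) A = X Y^T without forming an explicit inverse.
A_normal = np.linalg.solve(X @ X.T, X @ Y.T)

# Pseudo-inverse form: pinv(X^T) equals (X X^T)^{-1} X for full-row-rank X.
A_pinv = np.linalg.pinv(X.T) @ Y.T

# General least-squares solver for X^T A ~ Y^T.
A_lstsq, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None)

# At the optimum the residual is orthogonal to the observations.
residual_gradient = (Y - A_normal.T @ X) @ X.T

print(np.max(np.abs(A_normal - A_true)))
```

All three routes give the same A up to rounding. With this many samples and little noise, the estimate should also lie close to `A_true`.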
