## Problem definition

In speech processing and elsewhere, a common task is to predict an unknown vector *y* from an available observation vector *x*. Specifically, we want an estimate
\( \hat y = f(x) \)
such that
\( \hat y \approx y. \)
In particular, we will focus on *linear estimates* where
\( \hat y=f(x):=A^T x, \)
and where *A* is a matrix of parameters.

## The minimum mean square estimate (MMSE)

Suppose we want to minimise the squared error of our estimate on average. The estimation error is
\( e=y-\hat y \)
and the squared error is the squared \( L_2 \)-norm of the error, that is,
\( \left\|e\right\|^2 = e^T e \)
and its mean can be written as the expectation
\( E\left[\left\|e\right\|^2\right] = E\left[\left\|y-\hat y\right\|^2\right] = E\left[\left\|y-A^T x\right\|^2\right]. \)
Formally, the minimum mean square problem can then be written as

\[ \min_A\, E\left[\left\|y-A^T x\right\|^2\right]. \]

In general, this cannot be implemented directly, because the abstract expectation operator appears in the middle.

*(Advanced derivation begins)* To get a computational model, first note that the error expectation can be approximated by the mean over a sample of error vectors *e _{k}* as

\[ E\left[\left\|e\right\|^2\right] \approx \frac{1}{N}\sum_{k=1}^{N} \left\|e_k\right\|^2 = \frac{1}{N}\,\mathrm{tr}\left(E^T E\right), \]
where
\( E=[e_1,\,e_2,\dotsc,e_N] \)
and *tr()* is the matrix trace. Since each error is \( e_k = y_k - A^T x_k \), we have \( E = Y - A^T X \). To minimize the error energy expectation, we can then set its derivative with respect to *A* to zero,

\[ 0 = \frac{\partial}{\partial A}\,\frac{1}{N}\,\mathrm{tr}\left[\left(Y-A^TX\right)^T\left(Y-A^TX\right)\right] = \frac{2}{N}\,X\left(A^T X - Y\right)^T, \]

which yields the normal equations \( X X^T A = X Y^T \),
where the observation matrix is
\( X=[x_1,\,x_2,\dotsc,x_N] \)
and the desired output matrix is
\( Y=[y_1,\,y_2,\dotsc,y_N] \)
. *(End of advanced derivation)*

It follows that the optimal weight matrix *A* can be solved as

\[ A = \left(X X^T\right)^{-1} X\, Y^T = \left(X^\dagger\right)^T Y^T, \]
where the superscript \( \dagger \) denotes the Moore-Penrose pseudo-inverse.
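As a concrete sketch, the closed-form solution can be checked numerically with NumPy on synthetic data (the variable names follow the notation above; the data and dimensions are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: N observations of dimension n, targets of dimension m.
N, n, m = 1000, 4, 2
A_true = rng.standard_normal((n, m))
X = rng.standard_normal((n, N))                        # X = [x_1, ..., x_N]
Y = A_true.T @ X + 0.01 * rng.standard_normal((m, N))  # Y = [y_1, ..., y_N]

# Optimal weights A = (X X^T)^{-1} X Y^T, written via the pseudo-inverse.
A = np.linalg.pinv(X).T @ Y.T

# With low noise and many samples, A should be close to A_true.
print(np.allclose(A, A_true, atol=0.05))
```

In practice one would call `np.linalg.lstsq(X.T, Y.T)` instead of forming the pseudo-inverse explicitly, which is numerically more stable; the explicit form above simply mirrors the equation in the text.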

### Estimates with a mean parameter

Suppose that instead of an estimate \( \hat y=A^T x \), we want to include a mean vector in the estimate as \( \hat y=A^T x + \mu \). While it is possible to rederive all of the above equations for this modified model, it is easier to rewrite the model in the same form as above with

\[ \hat y=A^T x + \mu = \begin{bmatrix} \mu & A^T \end{bmatrix} \begin{bmatrix} 1 \\ x \end{bmatrix} =: A'^T x', \qquad A' = \begin{bmatrix} \mu^T \\ A \end{bmatrix}. \]

That is, we can extend the observation matrix *X* by a row of constant 1s, and extend *A* to include the mean vector. With these modifications, the above Moore-Penrose pseudo-inverse can again be used to solve the modified model.
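A minimal sketch of this augmentation in NumPy (a noiseless toy model, so the recovery is exact; the names are mine, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)

N, n, m = 500, 3, 2
A_true = rng.standard_normal((n, m))
mu_true = rng.standard_normal(m)
X = rng.standard_normal((n, N))
Y = A_true.T @ X + mu_true[:, None]      # noiseless model y = A^T x + mu

# Extend X with a row of ones: x' = [1; x].
Xp = np.vstack([np.ones((1, N)), X])

# Solve A' with the same pseudo-inverse formula as before.
Ap = np.linalg.pinv(Xp).T @ Y.T          # shape (n + 1, m)

mu_hat = Ap[0]                           # first row of A' is mu^T
A_hat = Ap[1:]                           # remaining rows form A

print(np.allclose(mu_hat, mu_true), np.allclose(A_hat, A_true))
```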

### Estimates with linear equality constraints

In practical situations we often also have linear constraints, such as *b=Cx*. That is, we require that the constraint equality holds exactly, while at the same time minimizing the mean square error. In other words, we have the modified programming task

\[ \min_A\, E\left[\left\|y-A^T x\right\|^2\right] \quad\text{such that}\quad b = Cx. \]
We again find that this problem can be rewritten as an unconstrained minimization problem as follows. First, we can include the constraint in a single objective function using a Lagrange multiplier *g* as

\[ \eta(A,g) = \left\|y-A^Tx\right\|^2 - 2g^T(b-Cx). \]
Minimization of this unconstrained objective function then returns the solution of the constrained problem. The trick can be explained heuristically as follows: suppose *g* is a free parameter. Then *b-Cx* must be zero at the optimum, because otherwise the term \( -2g^T(b-Cx) \) could drive the objective function to an arbitrary value. But if *b-Cx=0*, then the constraint is satisfied. In other words, we have found a solution which satisfies the equality constraint while simultaneously minimizing the square error.
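The Lagrange-multiplier trick can be sketched numerically on a generic equality-constrained least-squares problem, \( \min_z \|y - Mz\|^2 \) subject to \( Cz = b \) (the problem and all names here are an illustrative stand-in, not the specific model of the text): setting the gradient of the Lagrangian to zero gives a linear "KKT" system in the unknowns and the multiplier.

```python
import numpy as np

rng = np.random.default_rng(2)

# Generic equality-constrained least squares:
#   minimize ||y - M z||^2  subject to  C z = b.
n, p, q = 8, 5, 2
M = rng.standard_normal((n, p))
y = rng.standard_normal(n)
C = rng.standard_normal((q, p))
b = rng.standard_normal(q)

# Zeroing the gradient of the Lagrangian ||y - Mz||^2 - 2 g^T (b - Cz)
# with respect to z and g gives the linear system:
#   [ M^T M   C^T ] [ z ]   [ M^T y ]
#   [   C      0  ] [ g ] = [   b   ]
KKT = np.block([[M.T @ M, C.T],
                [C, np.zeros((q, q))]])
rhs = np.concatenate([M.T @ y, b])
zg = np.linalg.solve(KKT, rhs)
z, g = zg[:p], zg[p:]

print(np.allclose(C @ z, b))   # the constraint is satisfied exactly
```

Note how the multiplier *g* appears as just another unknown of a linear system; the same pattern recurs below when the objective is written as a joint quadratic form in *x* and *g*.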

Secondly, note that the above objective function can be written as

\[ \eta(A,g) = \|y-A^Tx\|^2 - 2g^T(b-Cx) = x^TAA^T x - 2y^T A^T x + 2g^TCx + y^T y - 2g^T b = \begin{bmatrix} x^T & g^T \end{bmatrix} \begin{bmatrix} AA^T & C^T \\ C & 0 \end{bmatrix} \begin{bmatrix} x \\ g \end{bmatrix} - 2y^T A^T x + y^T y - 2g^T b. \]