Problem definition

In speech processing and elsewhere, a frequently appearing task is to make a prediction of an unknown vector y from available observation vectors x. Specifically, we want to have an estimate

\hat y = f(x)

such that

\hat y \approx y.

In particular, we will focus on linear estimates where

\hat y=f(x):=x^T A,

and where A is a matrix of parameters.

The minimum mean square estimate (MMSE)

Suppose we want to minimise the squared error of our estimate on average. The estimation error is

e=y-\hat y

and the squared error is the L2-norm of the error, that is,

\left\|e\right\|^2 = e^T e

and its mean can be written as