In speech processing and elsewhere, a common task is to predict an unknown vector y from available observation vectors x. Specifically, we want an estimate \( \hat y = f(x) \) such that \( \hat y \approx y. \) In particular, we will focus on linear estimates \( \hat y=f(x):=A^T x, \) where A is a matrix of parameters.
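As a minimal sketch in plain Python (no external libraries), the linear estimate can be computed as shown here. The function name linear_estimate and the storage of A as a list of rows, with one row per entry of x, are illustrative assumptions. With x of length N and A of size N×M, the estimate \( \hat y = A^T x \) has length M.

```python
# Minimal sketch of the linear estimate y_hat = A^T x (illustrative only).
# Assumption: x is a list of length N and A is an N x M nested list whose
# rows are indexed by the entries of x, so A^T x is a list of length M.

def linear_estimate(A, x):
    """Return y_hat = A^T x, i.e. y_hat[j] = sum_i A[i][j] * x[i]."""
    n = len(x)
    if len(A) != n:
        raise ValueError("A must have one row per entry of x")
    m = len(A[0])
    return [sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]


# Example: N = 2 observations mapped to M = 3 predicted values.
A = [[1.0, 0.0, 2.0],
     [0.5, -1.0, 0.0]]
x = [2.0, 4.0]
y_hat = linear_estimate(A, x)
print(y_hat)  # [4.0, -4.0, 4.0]
```

Each output y_hat[j] is the inner product of column j of A with x, which is exactly what the transpose in \( A^T x \) expresses.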
The minimum mean square error (MMSE) estimate
Suppose we want to minimise the squared error of our estimate on average. The estimation error is \( e=y-\hat y \), and the squared error is the squared L2-norm of the error, \( \left\|e\right\|^2 = e^T e \). Its mean is the expectation \( E\left[\left\|e\right\|^2\right] = E\left[\left\|y-\hat y\right\|^2\right] = E\left[\left\|y-A^T x\right\|^2\right]. \)
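In practice the expectation is often approximated by an average over a finite set of examples. The following sketch, again in plain Python, assumes hypothetical synthetic data generated from a known matrix A_true plus Gaussian noise, and compares the sample mean of \( \left\|y-A^T x\right\|^2 \) for the true parameters and for a perturbed matrix A_off. The names, the data model and the noise level are illustrative assumptions, not part of the method described here.

```python
import random

# Illustrative sketch: approximate E[||y - A^T x||^2] by a sample average.
# Assumption: hypothetical synthetic data y = A_true^T x + Gaussian noise.

def linear_estimate(A, x):
    """y_hat = A^T x, with A stored as N rows of length M."""
    n, m = len(x), len(A[0])
    return [sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]


def squared_error(y, y_hat):
    """||e||^2 = e^T e with e = y - y_hat."""
    return sum((yj - yhj) ** 2 for yj, yhj in zip(y, y_hat))


def empirical_mse(A, xs, ys):
    """Sample-average approximation of E[||y - A^T x||^2]."""
    total = sum(squared_error(y, linear_estimate(A, x)) for x, y in zip(xs, ys))
    return total / len(xs)


random.seed(0)
A_true = [[1.0, 0.0],
          [0.5, -1.0],
          [0.0, 2.0]]        # N = 3 inputs, M = 2 outputs (hypothetical)
noise_std = 0.1              # assumed noise level

xs, ys = [], []
for _ in range(1000):
    x = [random.gauss(0.0, 1.0) for _ in range(3)]
    y = [v + random.gauss(0.0, noise_std) for v in linear_estimate(A_true, x)]
    xs.append(x)
    ys.append(y)

# Perturb one parameter to see how the average squared error changes.
A_off = [[1.2, 0.0],
         [0.5, -1.0],
         [0.0, 2.0]]

mse_true = empirical_mse(A_true, xs, ys)  # should be near M * noise_std**2 = 0.02
mse_off = empirical_mse(A_off, xs, ys)    # larger, since the estimate is biased
print(mse_true, mse_off)
```

With the true parameters the average squared error should approach the noise power summed over the M outputs, here about 0.02, while the perturbed matrix gives a larger value. Minimising such an average over A is the sample-based counterpart of minimising the expectation above.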