## Problem definition

In speech processing and elsewhere, a common task is to predict an unknown vector y from an available observation vector x. Specifically, we want an estimate

 \hat y = f(x)

such that

 \hat y \approx y.

In particular, we will focus on linear estimates where

 \hat y=f(x):=A^T x,

and where A is a matrix of parameters.
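As a minimal sketch of such a linear estimate, the NumPy snippet below maps an observation vector to a prediction. The dimensions (3 observations, 2 outputs), the random values and the helper name `linear_estimate` are assumptions chosen only for illustration.

```python
import numpy as np

def linear_estimate(A, x):
    """Return the linear estimate of y given the observation x.

    A has shape (len(x), len(y)). For 1-D NumPy arrays, A.T @ x
    gives the same numbers as x @ A.
    """
    return A.T @ x

# Illustrative values only: 3-dimensional observation, 2-dimensional output.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))   # parameter matrix
x = rng.standard_normal(3)        # observation vector
y_hat = linear_estimate(A, x)     # estimate of the unknown y
```

Each entry of `y_hat` is a weighted sum of the entries of `x`, with the weights taken from one column of `A`.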

## The minimum mean square estimate (MMSE)

Suppose we want to minimise the squared error of our estimate on average. The estimation error is

 e=y-\hat y

and the squared error is the squared L2-norm of the error, that is,

 \left\|e\right\|^2 = e^T e

and its mean can be written as

 E\left[\left\|e\right\|^2\right].
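In practice the expectation is usually not available in closed form. A common approach, assumed here, is to approximate it by the average over a set of observed pairs. The sketch below stores one target vector per row; the function name `mean_squared_error` and the toy numbers are hypothetical.

```python
import numpy as np

def mean_squared_error(Y, Y_hat):
    """Sample average of the squared error norm e^T e.

    Y and Y_hat have shape (N, d): one target or estimate per row.
    """
    E = Y - Y_hat                      # one error vector per row
    squared_norms = np.sum(E**2, axis=1)
    return np.mean(squared_norms)

# Toy data: two 2-dimensional targets and their estimates.
Y = np.array([[1.0, 2.0], [3.0, 4.0]])
Y_hat = np.array([[1.0, 0.0], [0.0, 4.0]])
mse = mean_squared_error(Y, Y_hat)
```

Here the two error vectors are (0, 2) and (3, 0), with squared norms 4 and 9, so `mse` evaluates to 6.5.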