
# Linear regression


## Problem definition

In speech processing and elsewhere, a frequently appearing task is to predict an unknown vector y from an available observation vector x. Specifically, we want an estimate $$\hat y = f(x)$$ such that $$\hat y \approx y.$$ In particular, we will focus on linear estimates of the form $$\hat y=f(x):=A^T x,$$ where A is a matrix of parameters.

## The minimum mean square estimate (MMSE)

Suppose we want to minimise the squared error of our estimate on average. The estimation error is $$e=y-\hat y$$ and the squared error is the L2-norm of the error, that is, $$\left\|e\right\|^2 = e^T e$$ and its mean can be written as the expectation $$E\left[\left\|e\right\|^2\right] = E\left[\left\|y-\hat y\right\|^2\right] = E\left[\left\|y-A^T x\right\|^2\right].$$ Formally, the minimum mean square problem can then be written as

$\min_A\, E\left[\left\|y-A^T x\right\|^2\right].$

This can, in general, not be implemented directly because of the abstract expectation operator in the middle.

(Advanced derivation begins) To get a computational model, first note that the error expectation can be approximated by the mean over a sample of N error vectors $$e_k$$ as

$E\left[\left\|e\right\|^2\right] \approx \frac1N \sum_{k=1}^N \left\|e_k\right\|^2 = \frac1N {\mathrm{tr}}(E^T E),$

where $$E=[e_1,\,e_2,\dotsc,e_N]$$ and tr() is the matrix trace. To minimize the error energy expectation, we can then set its derivative to zero

$0 = \frac{\partial}{\partial A} \frac1N {\mathrm{tr}}(E^T E) = \frac1N\frac{\partial}{\partial A} {\mathrm{tr}}\left((Y-A^TX)^T (Y-A^TX)\right) = -\frac2N X(Y-A^T X)^T,$

where the observation matrix is $$X=[x_1,\,x_2,\dotsc,x_N]$$ and the desired output matrix is $$Y=[y_1,\,y_2,\dotsc,y_N]$$ . (End of advanced derivation)

It follows that the optimal weight matrix A can be solved as

$\boxed{A = (XX^T)^{-1}XY^T = \left(X^T\right)^\dagger Y^T},$

where the superscript $$\dagger$$ denotes the Moore-Penrose pseudo-inverse.
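The closed-form solution can be sketched numerically. The following is a minimal illustration with numpy, using randomly generated data and the dimension names of the text (X is n × N, Y is m × N); `np.linalg.lstsq` is used instead of forming $$(XX^T)^{-1}$$ explicitly, since it is numerically better behaved but solves the same least-squares problem.

```python
import numpy as np

# Demo of the closed-form solution A = (X X^T)^{-1} X Y^T.
# Dimensions follow the text: X is (n x N) observations, Y is (m x N) targets.
rng = np.random.default_rng(0)
n, m, N = 3, 2, 100

A_true = rng.normal(size=(n, m))     # ground-truth weights (for the demo only)
X = rng.normal(size=(n, N))          # observation matrix
Y = A_true.T @ X                     # noiseless targets, Y = A^T X

# min_A ||Y^T - X^T A||_F is a standard least-squares problem in A:
A_hat = np.linalg.lstsq(X.T, Y.T, rcond=None)[0]

print(np.allclose(A_hat, A_true))    # True: exact recovery in the noiseless case
```

With noisy targets the recovery is no longer exact, but A_hat remains the minimizer of the mean squared error on the sample.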

### Estimates with a mean parameter

Suppose that instead of an estimate $$\hat y=A^T x$$ , we want to include a mean vector in the estimate as $$\hat y=A^T x + \mu$$ . While it is possible to derive all of the above equations for this modified model, it is easier to rewrite the model into a similar form as above with

$\hat y=A^T x + \mu = \begin{bmatrix} \mu & A^T \end{bmatrix} \begin{bmatrix} 1 \\ x \end{bmatrix} =: (A')^T x', \qquad\text{where}\quad A' = \begin{bmatrix} \mu^T \\ A \end{bmatrix},\quad x' = \begin{bmatrix} 1 \\ x \end{bmatrix}.$

That is, we extend x with a constant first element 1, and extend A with a first row containing the mean vector. With these modifications, the Moore-Penrose pseudo-inverse solution above can again be used to solve the modified model.

### Estimates with linear equality constraints

In practical situations we often also have linear constraints, such as b=Cx. That is, we require that the constraint equality holds exactly, while at the same time minimizing the mean square error. We then have the modified optimization task

$\min_A\, \|y-A^Tx\|^2 \quad\text{such that}\quad b=Cx.$

This problem can be rewritten as an unconstrained minimization as follows. First, we fold the constraint into a single objective function using a Lagrange multiplier g:

$\eta(A,g) := \|y-A^Tx\|^2 - 2g^T(b-Cx).$

Minimizing this unconstrained objective function then yields the solution of the constrained problem. The trick can be explained heuristically as follows: since g is a free parameter, the term b-Cx must be zero at the optimum, because otherwise the objective function could be driven to an arbitrary value by the choice of g. But if b-Cx=0, then the constraint is satisfied. In other words, we have found a solution which satisfies the equality constraint while simultaneously minimizing the squared error.

Secondly, note that the above objective function can be written in block form as

$\eta(A,g) = \|y-A^Tx\|^2 - 2g^T(b-Cx) = x^TAA^T x - 2y^T A^T x + 2g^TCx + y^T y - 2g^T b = \begin{bmatrix} x \\ g \end{bmatrix}^T \begin{bmatrix} AA^T & C^T \\ C & 0 \end{bmatrix} \begin{bmatrix} x \\ g \end{bmatrix} - 2\begin{bmatrix} Ay \\ b \end{bmatrix}^T \begin{bmatrix} x \\ g \end{bmatrix} + y^T y.$
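As a hedged numerical sketch of this construction: if we treat x as the free variable with A, C, and b given (an assumption; the quadratic form above is in x and g), setting the derivatives of the objective with respect to x and g to zero yields a linear system of KKT type, which can be solved directly.

```python
import numpy as np

# Sketch (under the assumption that the minimization is over x, with A fixed):
# d/dx:  2 A A^T x - 2 A y + 2 C^T g = 0
# d/dg:  C x = b
# which together form the linear system
#   [ A A^T  C^T ] [x]   [A y]
#   [ C      0   ] [g] = [ b ]
rng = np.random.default_rng(2)
n, m, p = 4, 4, 2

A = rng.normal(size=(n, m))
y = rng.normal(size=(m, 1))
C = rng.normal(size=(p, n))
b = rng.normal(size=(p, 1))

K = np.block([[A @ A.T, C.T],
              [C, np.zeros((p, p))]])
rhs = np.vstack([A @ y, b])
sol = np.linalg.solve(K, rhs)
x, g = sol[:n], sol[n:]

print(np.allclose(C @ x, b))   # True: the equality constraint holds exactly
```

The system is solvable when $$AA^T$$ is positive definite and C has full row rank, which holds almost surely for the random data above.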
