Look at the speech signal segment to the right. On a large scale it is hard to discern any structure, but on a small scale the signal appears continuous. Speech signals typically have the property that samples close to each other in time have similar amplitudes. This property is often called short-term temporal structure.

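More specifically, samples of the signal are *correlated* with the preceding and following samples. In statistics, such structure is measured by covariance and correlation, defined for zero-mean variables *x* and *y* as

\[ \begin{aligned}\text{covariance:}\quad & \sigma_{xy} = E[xy]\\ \text{correlation:}\quad & \rho_{xy} = \frac{E[xy]}{\sqrt{E[x^2]E[y^2]}},\end{aligned} \]

where *E[ ]* is the expectation operator.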
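As a rough illustration of these two quantities, the sketch below replaces the expectation with a plain sample average over two equal-length, zero-mean sequences. The sample average is an assumption made for this example, and the function names `covariance` and `correlation` are hypothetical.

```python
import math

def covariance(x, y):
    """Sample-average estimate of E[xy] for zero-mean sequences of equal length."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(a * b for a, b in zip(x, y)) / len(x)

def correlation(x, y):
    """Covariance normalised by sqrt(E[x^2] E[y^2]), so the result lies in [-1, 1]."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Two zero-mean toy sequences (not speech data); y is a scaled copy of x
x = [1.0, -1.0, 2.0, -2.0]
y = [2.0, -2.0, 4.0, -4.0]

print(covariance(x, y))   # scales with the amplitudes of x and y
print(correlation(x, y))  # amplitude-independent measure of similarity
```

Because `y` is an exact multiple of `x`, the correlation reaches its maximum value, while the covariance still depends on how large the two signals are.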

Short segment of speech

For a speech signal *x_n*, where *n* is the time-index, we would like to measure the correlation between the samples at two time-indices, *x_n* and *x_h*. Since the structure we are interested in appears when *n* and *h* are near each other, it is better to measure the correlation between *x_n* and *x_{n-k}*. The scalar *k* is known as the *lag*. Furthermore, we can assume that the correlation is uniform over all *n* within the segment. The self-covariance and self-correlation, known as the *autocovariance* and *autocorrelation*, are defined as

\[ \begin{aligned}\text{autocovariance:}\quad & r_k = E_n[x_nx_{n-k}]\\ \text{autocorrelation:}\quad & c_k = \frac{E_n[x_nx_{n-k}]}{E_n[x_n^2]} = \frac{r_k}{r_0}.\end{aligned} \]

The figure on the right illustrates the autocovariance of the above speech signal. We can immediately see that the short-time correlations are preserved: on a small scale, the autocovariance looks similar to the original speech signal. The oscillating structure is also preserved.

Because we assume that the signal is stationary, the expectation does not change when the time-index is shifted from *n* to *n+k*. It follows from the above definitions that autocovariances and autocorrelations are symmetric:

\[ r_k = E_n[x_nx_{n-k}] = E_n[x_{n+k}x_{n+k-k}] = E_n[x_{n+k}x_{n}] = r_{-k}. \]

This symmetry is clearly visible in the figure to the right, where the curve is mirrored around lag 0.

The autocovariance of a speech segment

The above formulas use the expectation operator *E[ ]* to define the autocovariance and autocorrelation. It is an abstract tool, which needs to be replaced by a proper estimator in practical implementations. Specifically, to estimate the autocovariance from a segment of length *N*, we use

\[ r_k \approx \frac{1}{N-1}\sum_{n=k}^{N-1} x_nx_{n-k}. \]

Observe that the speech signal *x_n* has to be windowed before using the above formula.
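A minimal, dependency-free sketch of such a segment-based estimate is given below. Several choices are assumptions made for illustration only: the Hann window (the text does not name a window), the 1/(N-1) scaling with the sum running over the samples available inside the segment, the synthetic sinusoid standing in for speech, and the hypothetical function names `hann_window` and `autocovariance`.

```python
import math

def hann_window(N):
    """Hann window of length N (one common choice, assumed here)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]

def autocovariance(x, max_lag):
    """Estimate r_k for k = 0..max_lag from a windowed, zero-mean segment x.

    Requires max_lag < len(x); samples outside the segment are treated as zero.
    """
    N = len(x)
    return [sum(x[n] * x[n - k] for n in range(k, N)) / (N - 1)
            for k in range(max_lag + 1)]

# Synthetic stand-in for a voiced speech segment: a sinusoid with a 20-sample period
N = 200
segment = [math.sin(2.0 * math.pi * n / 20.0) for n in range(N)]
windowed = [w * s for w, s in zip(hann_window(N), segment)]

r = autocovariance(windowed, max_lag=40)
c = [rk / r[0] for rk in r]  # normalising by r_0 gives values in [-1, 1]

# The periodicity of the input reappears in the estimate:
# negative at half a period (lag 10), positive at a full period (lag 20).
print(c[10], c[20])
```

Normalising by the lag-0 value removes the dependence on signal energy, which makes estimates from segments of different loudness easier to compare.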

We can also make an on-line estimate of the autocovariance for sample position *n* with lag *k* as

\[ \hat r_k(n) := \alpha x_nx_{n-k} + (1-\alpha)\hat r_k(n-1), \]

where α is a small positive constant which determines how rapidly the estimate converges.
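One common way to realise such an on-line estimate is a first-order recursive average (a leaky integrator), in which each new product of samples is blended into the running estimate with weight α. The sketch below assumes that form; the zero initial value, the default `alpha=0.01`, and the function name `online_autocovariance` are illustrative assumptions.

```python
def online_autocovariance(x, k, alpha=0.01):
    """Track the lag-k autocovariance over time with a leaky integrator.

    Returns one estimate per sample position n. The estimate starts at zero
    and is only updated once the delayed sample x[n - k] is available.
    """
    r_hat = 0.0
    track = []
    for n in range(len(x)):
        if n >= k:
            r_hat = alpha * x[n] * x[n - k] + (1.0 - alpha) * r_hat
        track.append(r_hat)
    return track

# Toy input: a constant signal, for which the true lag-k product is 2 * 2 = 4
x = [2.0] * 2000
slow = online_autocovariance(x, k=3, alpha=0.01)
fast = online_autocovariance(x, k=3, alpha=0.1)

# A larger alpha approaches the true value sooner, but would also
# follow noise in the input more closely.
print(slow[50], fast[50], slow[-1])
```

The choice of α is thus a trade-off between fast adaptation and a smooth estimate.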

It is often easier to work with vector notation instead of scalars, so we need the corresponding definitions for autocovariances. Suppose

\[ x = \begin{bmatrix}x_0\\x_1\\\vdots\\x_{N-1}\end{bmatrix}. \]

We can then define the autocovariance matrix as

\[ R_x := E[x x^T] = \begin{bmatrix}E[x_0^2] & E[x_0x_1] & \dots & E[x_0x_{N-1}]\\E[x_1x_0] & E[x_1^2] & \dots & E[x_1x_{N-1}]\\\vdots&\vdots&\ddots&\vdots\\E[x_{N-1}x_0] & E[x_{N-1}x_1] & \dots & E[x_{N-1}^2]\end{bmatrix} = \begin{bmatrix}r_0 & r_1 & \dots & r_{N-1}\\ r_1 & r_0 & \dots & r_{N-2}\\\vdots&\vdots&\ddots&\vdots\\r_{N-1} & r_{N-2} & \dots & r_0\end{bmatrix}. \]

Clearly *R_x* is thus a symmetric Toeplitz matrix. Moreover, since it is the expectation of the product of *x* with itself, *R_x* is also positive (semi-)definite: for any vector *v*,

\[ v^T R_x v = E[v^T x x^T v] = E[(v^T x)^2] \geq 0. \]
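The structure of the autocovariance matrix can be checked numerically. The sketch below builds the matrix from a list of autocovariance values and evaluates the quadratic form for a set of random vectors. The toy input sequence, the 1/(N-1) scaling of the estimate, and the helper names `autocovariance_matrix` and `quadratic_form` are assumptions made for illustration. Testing random vectors is only a spot check, not a proof of positive semi-definiteness.

```python
import random

def autocovariance_matrix(r):
    """Build the symmetric Toeplitz matrix with entries R[i][j] = r[|i - j|]."""
    N = len(r)
    return [[r[abs(i - j)] for j in range(N)] for i in range(N)]

def quadratic_form(R, v):
    """Compute v^T R v for a square matrix R given as a list of rows."""
    size = len(v)
    return sum(v[i] * R[i][j] * v[j] for i in range(size) for j in range(size))

# Toy zero-mean sequence (not speech data) and its estimated autocovariance
x = [1.0, -2.0, 0.5, 1.5, -1.0]
N = len(x)
r = [sum(x[n] * x[n - k] for n in range(k, N)) / (N - 1) for k in range(N)]
R = autocovariance_matrix(r)

# Spot check: the quadratic form should never be negative (up to rounding)
rng = random.Random(0)
trials = [[rng.uniform(-1.0, 1.0) for _ in range(N)] for _ in range(100)]
min_form = min(quadratic_form(R, v) for v in trials)
print(min_form >= -1e-12)
```

Since every diagonal of the matrix holds a single value, only the *N* autocovariance values need to be stored, rather than all *N*² matrix entries.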