
# Autocorrelation and autocovariance


Look at the speech signal segment in the figure below. On a large scale it is hard to discern any structure, but on a small scale the signal seems continuous. Speech signals typically have structure such that samples near each other in time are similar in amplitude. Such structure is often called short-term temporal structure.

More specifically, samples of the signal are correlated with the preceding and following samples. In statistics, such structure is measured by covariance and correlation, defined for zero-mean variables $x$ and $y$ as

$\begin{split} \text{covariance:} & \quad \sigma_{xy} = E[xy] \\ \text{correlation:} & \quad \rho_{xy} = \frac{E[xy]}{\sqrt{E[x^2]E[y^2]}}, \end{split}$

where $E[\cdot]$ is the expectation operator.
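In practice the expectations are replaced by sample averages. The following sketch estimates the covariance and correlation of two illustrative zero-mean signals; the signals themselves and the mixing coefficients are made up for demonstration purposes.

```python
import numpy as np

# A hypothetical pair of correlated zero-mean signals (illustrative only).
rng = np.random.default_rng(0)
x = rng.standard_normal(10000)
y = 0.8 * x + 0.6 * rng.standard_normal(10000)

# Sample estimates of E[xy], E[x^2] and E[y^2].
covariance = np.mean(x * y)
correlation = covariance / np.sqrt(np.mean(x**2) * np.mean(y**2))

# By construction E[xy] = 0.8 and Var[y] = 0.64 + 0.36 = 1, so both
# estimates should come out roughly 0.8.
print(covariance, correlation)
```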

*Figure: Short segment of speech*

For a speech signal $x_n$, where $n$ is the time-index, we would like to measure the correlation between two time-indices $x_n$ and $x_h$. Since the structure which we are interested in appears when $n$ and $h$ are near each other, it is better to measure the correlation between $x_n$ and $x_{n-k}$. The scalar $k$ is known as the *lag*. Furthermore, we can assume that the correlation is uniform over all $n$ within the segment. The self-correlation and -covariance, known as the *autocorrelation* and *autocovariance*, are defined as

$\begin{split} \text{autocovariance:} & \quad r_{k} = E_n[x_nx_{n-k}] \\ \text{autocorrelation:} & \quad c_{k} = \frac{E_n[x_nx_{n-k}]}{E_n[x_n^2]} = \frac{r_k}{r_0}. \end{split}$

The figure below illustrates the autocovariance of the above speech signal. We can immediately see that the short-time correlations are preserved: on a small scale, the autocovariance looks similar to the original speech signal. The oscillating structure is also accurately preserved.

Because we assume that the signal is stationary, as a consequence of the above formulations we can readily see that autocovariances and -correlations are symmetric:

$r_k = E_n[x_nx_{n-k}] = E_n[x_{n+k}x_{n+k-k}] = E_n[x_{n+k}x_{n}] = r_{-k}.$

This symmetry is clearly visible in the figure below, where the curve is mirrored around lag 0.
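The symmetry $r_k = r_{-k}$ is easy to verify numerically. As a sketch, the full autocovariance sequence of a zero-mean test signal (white noise here, standing in for a speech segment) can be computed with `np.correlate` and checked lag by lag:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(512)
x = x - np.mean(x)  # zero-mean, as the definitions assume

# Full autocovariance sequence for lags -(N-1) .. N-1, normalized by N;
# index N-1+k corresponds to lag k.
N = len(x)
r = np.correlate(x, x, mode="full") / N

# Symmetry: r_k equals r_{-k} for every lag k.
for k in range(1, 10):
    assert np.isclose(r[N - 1 + k], r[N - 1 - k])
print("r_k = r_{-k} holds for all tested lags")
```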

*Figure: The autocovariance of a speech segment*

The above formulas use the expectation operator $E[\cdot]$ to define the autocovariance and -correlation. It is an abstract tool which needs to be replaced by a proper estimator in practical implementations. Specifically, to estimate the autocovariance from a segment of length $N$, we use

$r_k \approx \frac1{N-1} \sum_{n=k}^{N-1} x_n x_{n-k}.$

Observe that the speech signal $x_n$ has to be windowed before using the above formula.
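A minimal sketch of this estimator, assuming a Hamming window and a noisy sinusoid standing in for a voiced speech segment (both are illustrative choices, not prescribed by the text):

```python
import numpy as np

def autocovariance_estimate(segment, max_lag):
    """Estimate r_k for k = 0..max_lag from a windowed, zero-mean segment."""
    N = len(segment)
    w = np.hamming(N)                    # window the segment first, as noted above
    xw = w * (segment - np.mean(segment))
    # Sum of x_n x_{n-k} over n = k..N-1, scaled by 1/(N-1).
    return np.array([np.sum(xw[k:] * xw[:N - k]) / (N - 1)
                     for k in range(max_lag + 1)])

n = np.arange(400)
x = np.sin(2 * np.pi * 0.05 * n) + 0.1 * np.random.default_rng(2).standard_normal(400)

r = autocovariance_estimate(x, 20)
c = r / r[0]                             # autocorrelation, c_k = r_k / r_0
print(c[:5])                             # c_0 is exactly 1 by construction
```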

We can also make an on-line estimate of the autocovariance for sample position $n$ with lag $k$ as

$\hat r_k(n) := \alpha x_n x_{n-k} + (1-\alpha) \hat r_k(n-1),$

where $\alpha$ is a small positive constant which determines how rapidly the estimate converges.
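The recursion above can be sketched as follows. The choice $\alpha = 0.01$ and the white-noise test signal are assumptions for illustration; for unit-variance white noise the lag-0 estimate should settle near the signal variance, 1.

```python
import numpy as np

def online_autocovariance(x, k, alpha=0.01):
    """Exponentially weighted running estimate of r_k along the signal."""
    r_hat = 0.0
    estimates = []
    for n in range(k, len(x)):
        # hat r_k(n) = alpha * x_n * x_{n-k} + (1 - alpha) * hat r_k(n-1)
        r_hat = alpha * x[n] * x[n - k] + (1 - alpha) * r_hat
        estimates.append(r_hat)
    return np.array(estimates)

rng = np.random.default_rng(3)
x = rng.standard_normal(20000)       # unit-variance white noise, so r_0 = 1
est = online_autocovariance(x, k=0)
print(est[-1])                       # fluctuates around 1 once converged
```

A smaller $\alpha$ averages over a longer effective window, giving a smoother but slower-reacting estimate.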


It is often easier to work with vector notation instead of scalars, whereby we need the corresponding definitions for autocovariances. Suppose

$x = \begin{bmatrix}x_0\\x_1\\\vdots\\x_{N-1}\end{bmatrix}.$

We can then define the autocovariance matrix as

$R_x := E[x x^T] = \begin{bmatrix}E[x_0^2] & E[x_0x_1] & \dots & E[x_0x_{N-1}]\\E[x_1x_0] & E[x_1^2] & \dots & E[x_1x_{N-1}]\\\vdots&\vdots&\ddots&\vdots\\E[x_{N-1}x_0] & E[x_{N-1}x_1] & \dots & E[x_{N-1}^2]\end{bmatrix} = \begin{bmatrix}r_0 & r_1 & \dots & r_{N-1}\\ r_1 & r_0 & \dots & r_{N-2}\\\vdots&\vdots&\ddots&\vdots\\r_{N-1} & r_{N-2} & \dots & r_0\end{bmatrix}.$

Clearly $R_x$ is thus a symmetric Toeplitz matrix. Moreover, since it is the expectation of the outer product of $x$ with itself, $R_x$ is also positive (semi-)definite.
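A sketch of constructing such a matrix from data: estimate $r_0,\dots,r_{N-1}$ from a long test signal (white noise here, an illustrative stand-in), build the Toeplitz matrix $R[i,j] = r_{|i-j|}$, and check its claimed properties numerically. The biased estimator (dividing every lag by the full signal length) is used deliberately, since it guarantees a positive semi-definite matrix.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(2000)
N = 8                                   # size of the autocovariance matrix

# Biased sample estimates of r_0 .. r_{N-1}.
r = np.array([np.sum(x[k:] * x[:len(x) - k]) for k in range(N)]) / len(x)

# Symmetric Toeplitz matrix: R[i, j] = r_{|i - j|}.
R = r[np.abs(np.subtract.outer(np.arange(N), np.arange(N)))]

assert np.allclose(R, R.T)                        # symmetric
assert np.all(np.linalg.eigvalsh(R) > -1e-10)     # positive semi-definite
print(R[0, :3])
```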
