List of authors: Tom Bäckström, Okko Räsänen

Includes contributions from Sneha Das


Table of contents

  1. Introduction
    1. Why speech processing?
    2. Speech production and acoustic properties
    3. Phonetics (Wikipedia)
    4. Linguistics (Wikipedia)
    5. Speech perception (Wikipedia)
    6. Speech-language pathology (Wikipedia)
  2. Basic representations and models
    1. Waveform
    2. Windowing
    3. Spectrogram and the STFT
    4. Autocorrelation and autocovariance
    5. Cepstrum and MFCC
    6. Linear prediction
    7. Fundamental frequency (F0)
    8. Zero-crossing rate
    9. Deltas and delta-deltas
    10. PSOLA
  3. Pre-processing
    1. Pre-emphasis
    2. Voice activity detection (VAD)
    3. Vocal tract length normalization
    4. Speech enhancement
  4. Modelling tools in speech processing
    1. Linear regression
    2. Sub-space models
    3. Vector quantization (VQ)
    4. Gaussian mixture model (GMM)
    5. Neural networks
    6. Non-negative matrix and tensor factorization
    7. Hidden Markov models
  5. Evaluation of speech processing methods
    1. Subjective evaluation
    2. Objective evaluation
    3. Analysis of evaluation results
  6. Speech analysis
    1. Fundamental frequency estimation
    2. Formant estimation and tracking
    3. Inverse filtering for glottal activity estimation
  7. Recognition tasks in speech processing
    1. Voice activity detection (VAD)
    2. Keyword or wake-word spotting
    3. Speech recognition
    4. Speaker recognition and verification
    5. Paralinguistic speech processing
  8. Natural language processing
  9. Speech synthesis
    1. Concatenative synthesis
    2. Parametric synthesis
  10. Transmission, storage and telecommunication
    1. Short history of speech coding
    2. Design goals
    3. Basic tools
      1. Modified discrete cosine transform (MDCT)
      2. Entropy coding
      3. Perceptual modelling in speech and audio coding
      4. Vector quantization (VQ)
      5. Linear prediction
    4. Code-excited linear prediction (CELP)
    5. Frequency-domain coding
  11. Speech enhancement
    1. Noise attenuation
    2. Echo cancellation
    3. Dereverberation
    4. Source separation
  12. Speech analysis and imaging for medical applications
    1. Electroglottography (Wikipedia)
    2. Stroboscopy and videokymography (Wikipedia)
    3. High-speed camera
    4. MRI
    5. Rothenberg mask
    6. Glottal inverse filtering
  13. Computational models of speech perception and language acquisition
  14. References


