List of authors: Tom Bäckström, Okko Räsänen, Abraham Zewoudie, Pablo Pérez Zarazaga

Includes contributions from Sneha Das


Table of contents

  1. Introduction
    1. Why speech processing?
    2. Speech production and acoustic properties
    3. Phonetics (Wikipedia)
    4. Linguistics (Wikipedia)
    5. Speech perception (Wikipedia)
    6. Speech-Language pathology (Wikipedia)
    7. Applications and systems structures
  2. Basic representations and models
    1. Waveform
    2. Windowing
    3. Spectrogram and the STFT
    4. Autocorrelation and autocovariance
    5. Cepstrum and MFCC
    6. Linear prediction
    7. Fundamental frequency (F0)
    8. Zero-crossing rate
    9. Deltas and Delta-deltas
    10. PSOLA
    11. Jitter, shimmer, harmonicity, etc. (external link)
  3. Pre-processing
    1. Pre-emphasis
    2. Noise gate (Wikipedia)
    3. Dynamic Range Compression (Wikipedia)
    4. Voice activity detection (VAD)
    5. Speech enhancement
  4. Modelling tools in speech processing
    1. Linear regression
    2. Sub-space models
    3. Vector quantization (VQ)
    4. Gaussian mixture model (GMM)
    5. Neural networks
    6. Non-negative Matrix and Tensor Factorization
  5. Evaluation of speech processing methods
    1. Subjective quality evaluation
    2. Objective quality evaluation
    3. Other performance measures
    4. Analysis of evaluation results
  6. Speech analysis
    1. Fundamental frequency estimation
    2. Formant estimation and tracking
    3. Inverse filtering for glottal activity estimation
  7. Recognition tasks in speech processing
    1. Voice activity detection (VAD)
    2. Keyword or wake-word spotting
    3. Speech recognition
    4. Speaker recognition and verification
    5. Speaker diarization
    6. Paralinguistic speech processing
  8. Natural language processing
  9. Speech synthesis
    1. Concatenative speech synthesis
    2. Statistical parametric speech synthesis
  10. Transmission, storage and telecommunication
    1. Design goals
    2. Basic tools
      1. Modified discrete cosine transform (MDCT)
      2. Entropy coding
      3. Perceptual modelling in speech and audio coding
      4. Vector quantization (VQ)
      5. Linear prediction
    3. Code-excited linear prediction (CELP)
    4. Frequency-domain coding
  11. Speech enhancement
    1. Noise attenuation
    2. Echo cancellation
    3. Dereverberation
    4. Source separation
    5. Beamforming
  12. Speech analysis and imaging for medical applications
    1. Electroglottography (Wikipedia)
    2. Stroboscopy and videokymography (Wikipedia)
    3. High-speed camera
    4. MRI
    5. Rothenberg mask
    6. Glottal inverse filtering
  13. Chatbots / Conversational design (external link)
  14. Computational models of human speech processing
  15. Security and privacy in speech technology
  16. References


