List of authors: Tom Bäckström, Okko Räsänen, Abraham Zewoudie, Pablo Pérez Zarazaga

Includes contributions from Sneha Das


Table of contents

  1. Introduction
    1. Why speech processing?
    2. Speech production and acoustic properties
    3. Phonetics (Wikipedia)
    4. Linguistics (Wikipedia)
    5. Speech perception (Wikipedia)
    6. Speech-Language pathology (Wikipedia)
    7. Applications and systems structures
  2. Basic representations and models
    1. Waveform
    2. Windowing
    3. Spectrogram and the STFT
    4. Autocorrelation and autocovariance
    5. Cepstrum and MFCC
    6. Linear prediction
    7. Fundamental frequency (F0)
    8. Zero-crossing rate
    9. Deltas and Delta-deltas
    10. PSOLA
    11. Jitter, shimmer, harmonicity, etc. (external link)
  3. Pre-processing
    1. Pre-emphasis
    2. Voice activity detection (VAD)
    3. Vocal tract length normalization
    4. Speech enhancement
  4. Modelling tools in speech processing
    1. Linear regression
    2. Sub-space models
    3. Vector quantization (VQ)
    4. Gaussian mixture model (GMM)
    5. Neural networks
    6. Non-negative Matrix and Tensor Factorization
    7. Hidden Markov Models
  5. Evaluation of speech processing methods
    1. Subjective quality evaluation
    2. Objective quality evaluation
    3. Other performance measures
    4. Analysis of evaluation results
  6. Speech analysis
    1. Fundamental frequency estimation
    2. Formant estimation and tracking
    3. Inverse filtering for glottal activity estimation
  7. Recognition tasks in speech processing
    1. Voice activity detection (VAD)
    2. Keyword or wake-word spotting
    3. Speech recognition
    4. Speaker recognition and verification
    5. Speaker diarization
    6. Paralinguistic speech processing
  8. Natural language processing
  9. Speech synthesis
    1. Concatenative synthesis
    2. Parametric synthesis
  10. Transmission, storage and telecommunication
    1. Short history of speech coding
    2. Design goals
    3. Basic tools
      1. Modified discrete cosine transform (MDCT)
      2. Entropy coding
      3. Perceptual modelling in speech and audio coding
      4. Vector quantization (VQ)
      5. Linear prediction
    4. Code-excited linear prediction (CELP)
    5. Frequency-domain coding
  11. Speech enhancement
    1. Single-channel enhancement
    2. Multi-channel enhancement
  12. Speech analysis and imaging for medical applications
    1. Electroglottography (Wikipedia)
    2. Stroboscopy and videokymography (Wikipedia)
    3. High-speed camera
    4. MRI
    5. Rothenberg mask
    6. Glottal inverse filtering
  13. Chatbots / Conversational design (external link)
  14. Computational models of speech perception and language acquisition
  15. Security and privacy in speech technology
  16. References


