When using speech technology in real environments, we are often faced with less-than-perfect signal quality. For example, if you make a phone call in a cafeteria, there are typically plenty of other people speaking in the background, there could be music playing, and the room itself can have reverberation. Such effects distort the desired speech signal such that, at the receiving end, the speech sounds less pleasant, requires more effort to understand or, in the worst case, becomes less intelligible. Speech enhancement refers to methods that try to reduce such distortions, to make speech sound more pleasant, reduce listening effort and improve intelligibility.

The most prominent categories of speech enhancement are:

  1. Noise attenuation, where we try to extract the desired speech signal when it is distorted by background noise(s).
  2. Echo cancellation and feedback cancellation, which are used when the sound played from a loudspeaker is picked up by a microphone, distorting the desired signal.
  3. Dereverberation refers to methods which attenuate the effect of room acoustics on the desired signal.
  4. Source separation methods try to extract the sounds of single sources from a mixture. For example, in the classical cocktail-party problem, we would like to isolate individual speakers when multiple people are talking at the same time.
  5. Beamforming refers to spatially selective methods, where the objective is to isolate sounds coming from a particular direction by using information about the spatial separation of a set of microphones.
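
To make the last category more concrete, the following is a minimal sketch of delay-and-sum beamforming, one simple way to exploit the spatial separation of microphones. It is an illustration under strong assumptions, not a definitive implementation: the per-microphone delays of the desired source are assumed to be known, non-negative and whole numbers of samples, and the names `delay_and_sum` and `delayed` are hypothetical helpers introduced only for this example.

```python
def delay_and_sum(channels, delays):
    """Align each microphone channel by its known delay and average them.

    channels: list of sample lists, one per microphone.
    delays:   assumed-known, non-negative integer delay (in samples) with
              which the desired source reaches each microphone.
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative")
    # Number of samples that remain in every channel after alignment.
    n = min(len(x) - d for x, d in zip(channels, delays))
    output = []
    for t in range(n):
        # Sample t + d of a channel corresponds to sample t of the source.
        total = sum(x[t + d] for x, d in zip(channels, delays))
        output.append(total / len(channels))
    return output


def delayed(signal, d):
    """Hypothetical helper: simulate a microphone that hears `signal` d samples late."""
    return [0.0] * d + list(signal)


# Toy example: one source reaching three microphones with different delays.
source = [0.0, 1.0, 0.0, -1.0, 0.5, 0.0]
delays = [0, 2, 3]
mics = [delayed(source, d) for d in delays]
aligned = delay_and_sum(mics, delays)
```

In this noise-free toy case the aligned average reproduces the source. A sound arriving from a different direction reaches the microphones with a different delay pattern, so after this alignment its copies no longer line up and the averaging attenuates it relative to the desired source.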