De-essing using Temporal Sibilance Processing

The following de-esser software is made available in the hope that it might be useful to others who need to reduce sibilance in speech.

This de-esser uses a novel approach called Temporal Sibilance Processing (if you have a better name for this technique, or know that it has been done before, please enlighten the author). The idea is to distinguish fricatives from voiced sections of the speech signal by counting zero crossings over time: fricatives are noise-like and cross zero far more often than voiced speech. When a fricative is identified, it is placed in one of three classes:
1. Short fricatives are left untouched since, even if they are quite loud, they are needed to give crispness and clarity to speech.
2. Medium-length fricatives above threshold level 1 are filtered through a FIR filter with frequency response determined by eq1.cfg.
3. Long fricatives above threshold level 2 are filtered through a FIR filter with frequency response determined by eq2.cfg.
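
The detection and classification steps above can be sketched roughly as follows. This is an illustration only, not the code shipped with de-ess: the function names, the frame-based detection, the use of peak level as the loudness measure, and all parameters (frame_len, min_crossings, medium_len, long_len, threshold1, threshold2) are assumptions made for the example. In particular, the sketch assumes that a long fricative which does not exceed threshold level 2 is left untouched, which the description above does not specify.

```python
def zero_crossings(frame):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))


def find_fricatives(samples, frame_len, min_crossings):
    """Return (start, end) sample ranges covering runs of consecutive
    frames whose zero-crossing count is at least min_crossings.
    A trailing partial frame is ignored (simplification)."""
    runs = []
    start = None
    pos = 0
    for pos in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[pos:pos + frame_len]
        if zero_crossings(frame) >= min_crossings:
            if start is None:
                start = pos          # a new fricative run begins here
        elif start is not None:
            runs.append((start, pos))  # run ended at this frame
            start = None
    if start is not None:
        runs.append((start, pos + frame_len))
    return runs


def classify(run, samples, medium_len, long_len, threshold1, threshold2):
    """Map a fricative run to "none", "eq1" or "eq2".
    Loudness is taken as the peak absolute sample value (assumption)."""
    start, end = run
    length = end - start
    peak = max(abs(s) for s in samples[start:end])
    if length >= long_len:
        # Assumption: a long but quiet fricative is left untouched.
        return "eq2" if peak > threshold2 else "none"
    if length >= medium_len:
        return "eq1" if peak > threshold1 else "none"
    return "none"  # short fricatives are always left untouched
```

A run classified as "eq1" or "eq2" would then be passed through the corresponding filter, and everything else would be copied unchanged.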

Most of the speech file is left untouched (the samples are directly copied from source to destination). Only fricatives that are long enough and loud enough are filtered. The advantage of this approach over traditional approaches is that the clarity of the remaining speech is completely unaffected.
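
As a minimal sketch of this copy-or-filter behaviour, the example below starts from a direct copy of the source samples and overwrites only the selected regions with FIR-filtered output. The names filter_region and process, the (start, end, taps) region format, and the tap values are hypothetical. How the frequency responses in eq1.cfg and eq2.cfg are turned into filter taps is not shown here, and the real program may handle region boundaries differently.

```python
def filter_region(samples, start, end, taps):
    """Direct-form FIR over samples[start:end]:
    y[n] = sum_k taps[k] * x[n - k].
    Source samples before `start` serve as filter history;
    samples before index 0 are treated as zero."""
    out = []
    for n in range(start, end):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


def process(samples, regions):
    """regions: list of (start, end, taps) for the fricatives to filter.
    Every sample outside these regions is copied unchanged."""
    out = list(samples)  # direct copy from source to destination
    for start, end, taps in regions:
        out[start:end] = filter_region(samples, start, end, taps)
    return out
```

For example, process([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [(2, 4, [0.5, 0.5])]) changes only the samples at indices 2 and 3 and leaves the rest bit-identical to the input.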

The software could be modified to classify the fricatives into more classes with different processing for each class. A number of parameters are hard-coded and should be made into configurable options.

de-ess is now hosted on GitHub