Experimental Speech Recognition from Pathological Voices
HADJI SALAH
Laboratory of nano-materials (LANSER) of Energy Center (CRTEn)
Technopole of Borj Cedria, Hammam-Lif, Tunis, 2050
TUNISIA
Abstract: Speech recognition has been the subject of extensive research, since speech is a natural
means of dynamic, efficient and interactive human communication that relies simultaneously on the two
phenomena of phonation and hearing between speakers. Its applications are numerous, for example
dictation, speech synthesis within Windows software, and voice search in the Google search engine on
smartphones. All of these applications depend on the conditions of use in which they are deployed. To
cope with these conditions and overcome their imperfections, the speech signal must be properly
characterized by extracting its most relevant features, such as the fundamental frequency (pitch), timbre
and tonality. Many techniques can extract them; the most widely used are acoustic parameterizations
such as MFCC, PLP, LPC and RASTA, as well as combinations (hybridizations) such as PLP-RASTA
and MFCC-PLP. They are used in data transmission, speaker recognition and even speech
synthesis.
Keywords: speech signal, parameterization, SVM, pathological voices, classifier, MFCC, PLP,
RASTA, LPC
Received: April 18, 2021. Revised: August 17, 2022. Accepted: September 23, 2022. Published: October 31, 2022.
1. Introduction
The extraction of acoustic parameters or
characteristics, such as the fundamental
frequency and the formants, is done by applying
signal processing methods such as
time-frequency analysis, spectral analysis and
cepstral analysis. Parameterization constitutes
the initial block (Fig. 2) of any speech
recognition system: its role is to extract from
the speech signal the most relevant information
possible, so that the sounds can be separated
[8]. The extracted information is presented as a
sequence of acoustic vectors. Several methods
exist for extracting these parameters. Taking
into account the noise superimposed on the
sounds, we compare the different methods
(MFCC, PLP, PLP-RASTA, and combinations
of several other parameters such as LPC, pitch,
formants and energy). Given the redundancy
and complexity of the speech signal, different
methods are accepted for obtaining a better
parameterization. In
this paper we give a brief overview of signal
processing tools such as short-term energy and
weighting windows, then review the different
speech signal parameterization methods: LPC
(Linear Predictive Coding) analysis,
homomorphic or cepstral analysis, on which the
MFCC (Mel Frequency Cepstral Coefficients)
are based, PLP (Perceptual Linear Prediction)
and PLP-RASTA (RelAtive SpecTrAl). These
parameters are then fed to an SVM classifier to
distinguish between speech signals from people
with a voice pathology (nodule or oedema) and
normal signals (no pathology). In this paper,
two types of classification have been used:
- A two-class classification, in which we used
corpus samples from pathological signals
(nodule and oedema) and others from normal
signals,
WSEAS TRANSACTIONS on SIGNAL PROCESSING
DOI: 10.37394/232014.2022.18.23