
Experimental Speech Recognition from Pathological Voices
HADJI SALAH
Laboratory of Nanomaterials (LANSER), Energy Center (CRTEn)
Technopole of Borj-Cedria, Hammam-Lif, Tunis, 2050, TUNISIA
Abstract: Speech recognition has been the subject of considerable research, as it is an adequate means of dynamic and efficient human interaction that simultaneously uses the two phenomena of phonation and hearing between speakers. Its applications are numerous, for example dictation, speech synthesis within Windows software, and the speech recognition of the Google search engine on smartphones. All of these applications depend on the conditions of use in which they are implemented. To overcome the difficulties caused by imperfect conditions, the speech signal must be properly characterized by extracting its most relevant features, such as the fundamental frequency (pitch), timbre and tonality. Many techniques can be used to extract them, the most common being acoustic analyses such as MFCC, PLP, LPC and RASTA, as well as their combinations (hybridizations), namely RASTA-PLP, MFCC-PLP, etc. They are used in data transmission, speaker recognition and even in speech synthesis.
Keywords: speech signal, parameterization, SVM, pathological voices, classifier, MFCC, PLP, RASTA, LPC
Received: March 15, 2022. Revised: October 11, 2022. Accepted: November 15, 2022. Published: December 31, 2022.
1. Introduction:
The extraction of acoustic parameters or characteristics, such as the fundamental frequency, the formants, etc., is performed by applying signal processing methods such as time-frequency analysis, spectral analysis and cepstral analysis. Parameterization constitutes the initial block (Fig. 2) of any speech recognition system; its role is to extract from the speech signal the most relevant information possible so that the different sounds can be separated [8]. The extracted information is presented as a sequence of acoustic vectors. Several methods exist to extract these parameters; taking into account the noise superimposed on the sounds, we compare the different methods (MFCC, PLP, RASTA-PLP, and the combination of several other parameters such as LPC, pitch, formants and energy). Given the redundancy and complexity of the speech signal, different methods are admitted in order to obtain a better parameterization.
In this paper, we first give a brief overview of signal processing tools such as short-term energy and weighting windows, then present the different speech signal parameterization methods: LPC (Linear Predictive Coding) analysis, homomorphic or cepstral analysis, on which the MFCC (Mel Frequency Cepstral Coefficients) are based, PLP (Perceptual Linear Prediction) and RASTA-PLP (RelAtive SpecTrA).
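To illustrate these processing steps, a minimal sketch in Python is given below; the file name "sample.wav", the frame settings and the numbers of coefficients are illustrative assumptions, not the settings of our corpus. The signal is framed with a Hamming weighting window, the short-term energy of each frame is computed, and LPC and MFCC features are extracted with the librosa library.

```python
# Minimal sketch (assumed file name and frame settings, not the paper's corpus):
# Hamming-windowed framing, short-term energy, LPC and MFCC extraction.
import numpy as np
import librosa

signal, sr = librosa.load("sample.wav", sr=None)     # keep the original sampling rate

frame_len, hop = 512, 256                            # ~32 ms frames at 16 kHz, 50% overlap
frames = librosa.util.frame(signal, frame_length=frame_len, hop_length=hop)
windowed = frames * np.hamming(frame_len)[:, None]   # apply the Hamming weighting window

energy = np.sum(windowed ** 2, axis=0)               # short-term energy of each frame

lpc = librosa.lpc(windowed[:, 0], order=12)          # LPC coefficients of one frame

mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13,
                            n_fft=frame_len, hop_length=hop)

print(energy.shape, lpc.shape, mfcc.shape)           # (n_frames,), (13,), (13, n_frames)
```

Each column of the MFCC matrix is one acoustic vector, so the output of this block is precisely the sequence of acoustic vectors mentioned above.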
An SVM classifier is then used to distinguish between speech signals from people with a voice pathology (nodule or oedema) and normal signals (no pathology). Two types of classification have been used:
- A two-class classification, in which we used corpus samples from pathological signals (nodule and oedema) and others from normal signals; the principle of this classification is illustrated in the figure (see Fig.).
- A multi-class classification, in which samples of each of the two pathologies that we have (nodule and oedema) constitute the first and second classes, and samples from normal signals constitute the third class. To perform this multi-class classification, we used a one-vs-all algorithm, that is to say "one against all" (a minimal sketch of this scheme is given after this list).
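In a one-vs-all scheme with three classes, one binary SVM is trained per class to separate it from the other two, and the class whose classifier gives the highest decision score is retained. The sketch below shows both schemes using scikit-learn's SVC and its OneVsRestClassifier wrapper; the feature matrix, labels and kernel are placeholder assumptions and do not reproduce the corpus or the settings used in this work.

```python
# Minimal sketch (random placeholder features and labels, not the paper's corpus):
# a two-class SVM (pathological vs. normal) and a one-vs-all multi-class SVM.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 13))                        # placeholder acoustic feature vectors
y = np.array(["nodule", "oedema", "normal"] * 50)     # placeholder class labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Two-class case: merge the two pathologies into a single "pathological" class.
y2_train = np.where(y_train == "normal", "normal", "pathological")
y2_test = np.where(y_test == "normal", "normal", "pathological")
binary_svm = SVC(kernel="rbf").fit(X_train, y2_train)
print("two-class accuracy:", binary_svm.score(X_test, y2_test))

# Multi-class case: one-vs-all ("one against all"), i.e. one binary SVM per class.
ova_svm = OneVsRestClassifier(SVC(kernel="rbf")).fit(X_train, y_train)
print("three-class accuracy:", ova_svm.score(X_test, y_test))
```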