
Fig. 9 Local PCA segmentation results for the word 'face' (cumulative explained variance above 95%)
Fig. 10 Local PCA segmentation results for the word 'wash' (cumulative explained variance above 90%)
Fig. 11 Local PCA segmentation results for the word 'wash' (cumulative explained variance above 95%)
The upper panel of each figure shows the time-domain waveform of the signal, and the lower panel shows the number of principal components obtained over time. The two panels are aligned, so vertically aligned points correspond to the same instant.
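The lower-panel curve comes from the local PCA procedure the paper describes: PCA is applied to a set of temporally adjacent frames, and the number of components needed to reach a cumulative explained-variance ratio (90% or 95%) is recorded as the set slides along the signal. The following is a minimal NumPy sketch under stated assumptions. The frame length, hop size and window size (`frame_len`, `hop`, `window`) are hypothetical defaults, not the values used for Figs. 9 to 11, and all function names are illustrative.

```python
import numpy as np


def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split a 1-D signal into overlapping frames, one frame per row."""
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row k holds samples k*hop .. k*hop + frame_len - 1.
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return np.asarray(x)[idx]


def n_components_for_variance(frames: np.ndarray, ratio: float) -> int:
    """Smallest number of principal components whose cumulative explained
    variance reaches `ratio` (e.g. 0.90 or 0.95)."""
    centered = frames - frames.mean(axis=0)
    # Squared singular values of the centred data are proportional to the
    # eigenvalues of its covariance matrix, so their ratios are identical.
    var = np.linalg.svd(centered, compute_uv=False) ** 2
    total = var.sum()
    if total == 0.0:
        return 0  # all frames identical (e.g. digital silence)
    cumulative = np.cumsum(var) / total
    k = int(np.searchsorted(cumulative, ratio)) + 1
    return min(k, len(var))


def local_pc_count(x, frame_len=64, hop=4, window=64, ratio=0.90):
    """Local PCA (hypothetical parameters): component count for each run of
    `window` consecutive frames, sliding forward one frame at a time."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    return np.array([
        n_components_for_variance(frames[i:i + window], ratio)
        for i in range(len(frames) - window + 1)
    ])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = np.arange(4000)
    voiced_like = np.sin(2 * np.pi * 0.01 * n)   # periodic, voiced-like
    unvoiced_like = rng.standard_normal(4000)    # noise, unvoiced-like
    print(local_pc_count(voiced_like).max())     # at most 2
    print(local_pc_count(unvoiced_like).min())   # much larger
```

Every frame of a pure sinusoid lies in a two-dimensional subspace, so its count never exceeds 2, whereas white noise needs many components. This mirrors the voiced/unvoiced contrast the paper reports. Using the SVD of the centred frame matrix avoids forming the covariance matrix explicitly.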
The thresholds on the number of principal components are set to 13 when the retained components explain more than 90% of the variance, and to 18 when they explain more than 95%. Considering only the distinction between unvoiced and voiced sounds, the results above show that the red vertical lines in the figures accurately mark the boundaries between unvoiced and voiced sounds in the time-domain waveform; that is, the method can segment unvoiced and voiced sounds. However, the 'face' signal contains a silent segment. Although voiced and unvoiced sounds can still be segmented, the silent segment cannot be distinguished from them. Therefore, silent segments should be extracted first, and the remaining signal then segmented by the method proposed in this paper.
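Once the local principal-component count is available, the segmentation step reduces to comparing it with a threshold such as 13 or 18. The sketch below assumes, consistent with the frame-length behaviour summarized in the conclusion (unvoiced sounds need more components than voiced ones), that windows whose count exceeds the threshold are unvoiced. The function name is hypothetical, and boundaries are returned as window indices rather than sample positions.

```python
import numpy as np


def segment_by_pc_count(counts, threshold):
    """Label each local-PCA window as unvoiced (True) when its principal-
    component count exceeds `threshold`, otherwise voiced (False), and
    return the window indices where the label changes."""
    counts = np.asarray(counts)
    unvoiced = counts > threshold
    # A boundary sits at index i when window i differs from window i - 1.
    boundaries = np.flatnonzero(np.diff(unvoiced.astype(np.int8))) + 1
    return unvoiced, boundaries


if __name__ == "__main__":
    labels, boundaries = segment_by_pc_count([3, 4, 20, 25, 22, 5, 4], threshold=13)
    print(boundaries)  # -> [2 5]
```

Here windows 2 to 4 form an unvoiced stretch between two voiced ones. Converting window indices to sample positions depends on the frame length and hop used to build the count curve. As the text notes, a single threshold of this kind does not separate silent segments.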
5. Conclusion
This paper first studies the relationship between the number of principal components and the frame length after a monophone signal is divided into frames and reduced in dimension. As the frame length increases, the number of principal components approaches a limit for voiced sounds, whereas for unvoiced sounds it increases approximately linearly. Moreover, for the same frame length, different phonemes yield different numbers of principal components. Continuous speech segmentation by local PCA is then investigated: PCA is applied to sets of speech frames that are close in time, and the resulting curve of the number of local principal components over time is compared with the time-domain waveform. The results show that voiced and unvoiced sounds can be segmented effectively by thresholding this number. Future work will address the segmentation of silent segments as well as unvoiced and voiced sounds, with the goal of achieving high-accuracy, real-time segmentation by an approach that differs from traditional methods.
WSEAS TRANSACTIONS on SIGNAL PROCESSING
DOI: 10.37394/232014.2022.18.9
Zhaoting Liu, Zhongxiao Li,
Xiaodong Zhuang, Nikos Mastorakis