In each figure, the upper panel shows the time-domain waveform of the signal, and the lower panel shows the resulting number of principal components over time. The two panels share the same time axis.
The thresholds on the number of principal components are set to 13 (retaining more than 90% of the components) and 18 (retaining more than 95% of the components). Considering only the distinction between unvoiced and voiced sounds, the result graphs show that the red vertical lines correspond accurately to the unvoiced/voiced boundaries in the time-domain waveforms; that is, the method can segment unvoiced and voiced sounds. However, the "face" signal also contains a silent segment. Although its voiced and unvoiced sounds can still be segmented, the presence of the silent segment is not detected. Therefore, silent segments should be extracted first, and the remaining speech then segmented by the method in this paper.
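As an illustration, the following is a minimal Python sketch of this thresholding idea. The frame length, hop size, local-window size, and the reading of the 90%/95% criterion as cumulative explained variance are assumptions made for illustration rather than parameters taken from the paper; only the threshold values 13 and 18 follow the numbers given above.

```python
import numpy as np
from sklearn.decomposition import PCA

def local_pc_counts(signal, frame_len=256, hop=128, window_frames=40,
                    variance_ratio=0.90):
    """For each local window of consecutive frames, count the principal
    components needed to reach `variance_ratio` cumulative variance.
    All parameter values here are illustrative assumptions."""
    # Split the signal into overlapping frames (rows of the data matrix).
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])

    counts = []
    for start in range(n_frames - window_frames + 1):
        block = frames[start:start + window_frames]   # frames close in time
        cum = np.cumsum(PCA().fit(block).explained_variance_ratio_)
        counts.append(int(np.searchsorted(cum, variance_ratio)) + 1)
    return np.array(counts)

def unvoiced_mask(counts, threshold=13):
    """Label a local window as unvoiced when its PC count exceeds the
    threshold (13 for the 90% criterion, 18 for 95%), following the
    earlier observation that unvoiced frames need more components."""
    return counts > threshold
```

Plotting the returned counts against time and marking where the mask changes value reproduces the kind of red boundary lines described above.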
5 Conclusion
This paper first studies the relationship between the number of principal components and the frame length when a monophone signal is divided into frames and reduced in dimension. As the frame length increases, the number of principal components tends to a limit for voiced sounds, while for unvoiced sounds it increases approximately linearly; moreover, at the same frame length, different phoneme pronunciation signals yield different numbers of principal components. Continuous speech segmentation by local PCA is then investigated: PCA is applied to sets of speech frames that are close in time, and the resulting curve of the number of local principal components over time is compared with the time-domain waveform. It is found that voiced and unvoiced sounds can be segmented effectively by setting a threshold on this curve. Future research will address the segmentation of silent segments together with unvoiced and voiced sounds, aiming at high-accuracy real-time segmentation that differs from traditional methods.
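For completeness, a similarly hedged sketch of the frame-length study summarized above is given below; the frame lengths, the 50% overlap, and the 90% variance criterion are illustrative assumptions, not values reported in the paper.

```python
import numpy as np
from sklearn.decomposition import PCA

def pc_count_vs_frame_length(signal, frame_lengths=(64, 128, 256, 512),
                             variance_ratio=0.90):
    """For one monophone recording, count the principal components needed
    to reach `variance_ratio` cumulative variance at each frame length."""
    counts = {}
    for frame_len in frame_lengths:
        hop = frame_len // 2                              # 50% overlap (assumed)
        n_frames = 1 + (len(signal) - frame_len) // hop
        frames = np.stack([signal[i * hop:i * hop + frame_len]
                           for i in range(n_frames)])
        cum = np.cumsum(PCA().fit(frames).explained_variance_ratio_)
        counts[frame_len] = int(np.searchsorted(cum, variance_ratio)) + 1
    # Expected trend from the study above: the count saturates for voiced
    # phones and grows roughly linearly with frame length for unvoiced ones.
    return counts
```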