R. Child, A. Ramesh, D. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei, "Language models are few-shot learners," Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020.
[24] A. Chowdhery, S. Narang, J. Devlin, M. Bosma, G. Mishra, A. Roberts, P. Barham, H. W. Chung, C. Sutton, S. Gehrmann, P. Schuh, K. Shi, S. Tsvyashchenko, J. Maynez, A. Rao, P. Barnes, Y. Tay, N. Shazeer, V. Prabhakaran, E. Reif, N. Du, B. Hutchinson, R. Pope, J. Bradbury, J. Austin, M. Isard, G. Gur-Ari, P. Yin, T. Duke, A. Levskaya, S. Ghemawat, S. Dev, H. Michalewski, X. Garcia, V. Misra, K. Robinson, L. Fedus, D. Zhou, D. Ippolito, D. Luan, H. Lim, B. Zoph, A. Spiridonov, R. Sepassi, D. Dohan, S. Agrawal, M. Omernick, A. M. Dai, T. S. Pillai, M. Pellat, A. Lewkowycz, E. Moreira, R. Child, O. Polozov, K. Lee, Z. Zhou, X. Wang, B. Saeta, M. Diaz, O. Firat, M. Catasta, J. Wei, K. Meier-Hellstern, D. Eck, J. Dean, S. Petrov, and N. Fiedel, "PaLM: Scaling language modeling with Pathways," Journal of Machine Learning Research, vol. 24, no. 240, pp. 1–113, 2023.
[25] R. Taylor, M. Kardas, G. Cucurull, T. Scialom, A. Hartshorn, E. Saravia, A. Poulton, V. Kerkez, and R. Stojnic, "Galactica: A large language model for science," arXiv preprint arXiv:2211.09085, 2022, https://doi.org/10.48550/arXiv.2211.09085.
[26] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, and G. Lample, "LLaMA: Open and efficient foundation language models," arXiv preprint arXiv:2302.13971, 2023, https://doi.org/10.48550/arXiv.2302.13971.
[27] Y. Chang, X. Wang, J. Wang, Y. Wu, L. Yang, K. Zhu, H. Chen, X. Yi, C. Wang, Y. Wang, W. Ye, Y. Zhang, Y. Chang, P. S. Yu, Q. Yang, and X. Xie, "A survey on evaluation of large language models," ACM Transactions on Intelligent Systems and Technology, vol. 15, no. 3, pp. 1–45, 2024, https://doi.org/10.1145/3641289.
[28] W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z. Dong, Y. Du, C. Yang, Y. Chen, Z. Chen, J. Jiang, R. Ren, Y. Li, X. Tang, Z. Liu, P. Liu, J.-Y. Nie, and J.-R. Wen, "A survey of large language models," arXiv preprint arXiv:2303.18223, 2023, https://doi.org/10.48550/arXiv.2303.18223.
[29] S. M. Thede and M. Harper, "A second-order hidden Markov model for part-of-speech tagging," in Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, 1999, pp. 175–182.
[30] L. R. Bahl, P. F. Brown, P. V. de Souza, and R. L. Mercer, "A tree-based statistical language model for natural language speech recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 7, pp. 1001–1008, 1989, https://doi.org/10.1109/29.32278.
[31] T. Brants, A. Popat, P. Xu, F. J. Och, and J. Dean, "Large language models in machine translation," in Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 2007, pp. 858–867.
[32] X. Liu and W. B. Croft, "Statistical language modeling for information retrieval," Annual Review of Information Science and Technology, vol. 39, no. 1, pp. 1–31, 2005.
[33] C. Zhai, "Statistical language models for information retrieval: A critical review," Foundations and Trends® in Information Retrieval, vol. 2, no. 3, pp. 137–213, 2008, https://doi.org/10.1561/1500000008.
[34] T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, "Recurrent neural network based language model," in Interspeech, Makuhari, Japan, 2010, pp. 1045–1048.
[35] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," arXiv preprint arXiv:1301.3781, 2013, https://doi.org/10.48550/arXiv.1301.3781.
[36] J. Pennington, R. Socher, and C. D. Manning, "GloVe: Global vectors for word representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543.
[37] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, "Enriching word vectors with subword information," Transactions of the Association for Computational Linguistics, vol. 5, pp. 135–146, 2017, https://doi.org/10.1162/tacl_a_00051.