Proposed Activation Function Based Deep Learning Approach for Real-Time Face Mask Detection System

NAY KYI TUN 1, AYE MIN MYAT 2
1 Faculty of Computer System and Technology, Myanmar Information Institute of Technology, Mandalay, MYANMAR
2 University of Technology (Yadanapon Cyber City), Pyin Oo Lwin, MYANMAR

Abstract: The ongoing global pandemic has underscored the importance of effective preventive measures such as wearing face masks in public spaces. In this paper, we propose a deep learning-based approach for real-time face mask detection to aid in enforcing mask-wearing protocols. Our system uses convolutional neural networks (CNNs) to automatically detect whether individuals in images or video streams are wearing masks. The proposed system consists of three main stages: face detection, face mask classification, and real-time monitoring. First, faces are localized in the input image or video frame using a proposed face detection model. The detected faces are then fed into a proposed CNN model for mask classification, which determines whether each face is covered with a mask. Finally, the system provides real-time monitoring and alerts authorities or stakeholders about non-compliance with mask-wearing guidelines. We evaluate the performance of our system on publicly available datasets and demonstrate its effectiveness in accurately detecting face masks in various scenarios. Additionally, we discuss the challenges and limitations of deploying such a system in real-world settings, including issues related to privacy, bias, and scalability. Overall, our proposed face mask detection system offers a promising solution for automated monitoring and enforcement of face mask policies, contributing to public health efforts in mitigating the spread of contagious diseases.

Key-words: CNN, Face Mask, Detection, Classification, YOLO.

Received: March 27, 2024. Revised: September 5, 2024. Accepted: September 24, 2024. Published: October 17, 2024.

1. Introduction
The emergence of the COVID-19 pandemic has necessitated the adoption of stringent public health measures, with widespread face mask usage being one of the most effective means of preventing the spread of infectious diseases in communal settings. However, ensuring compliance with mask-wearing protocols in public spaces presents a formidable challenge for authorities and organizations worldwide. In response to this challenge, automated face mask detection systems have garnered significant attention as a promising solution for enforcing mask-wearing guidelines efficiently.
This article provides an in-depth exploration of recent
advancements in face mask detection technology, examining
the underlying methodologies, applications, challenges, and
future prospects of such systems. As the pandemic continues
to evolve, the need for reliable and scalable solutions for
monitoring mask compliance remains paramount. By
leveraging cutting-edge technologies such as computer vision,
deep learning, and edge computing, researchers and
practitioners have made substantial progress in the
development of robust and real-time face mask detection
systems.
Convolutional neural networks (CNNs), in particular,
have demonstrated remarkable capabilities in accurately
detecting faces and distinguishing between masked and
unmasked individuals. Moreover, the integration of
complementary technologies such as thermal imaging, edge
computing, and privacy-preserving techniques has further
enhanced the utility and efficacy of face mask detection
systems in diverse operational settings.
Beyond the immediate imperative of pandemic
management, face mask detection systems hold immense
potential for addressing broader societal challenges, including
security surveillance, access control, and public safety
monitoring. By leveraging the insights gained from research
and real-world deployments, policymakers, businesses, and
public health authorities can develop evidence-based
strategies to promote mask compliance and mitigate the risk
of disease transmission.
However, the widespread adoption of face mask detection
systems also raises important ethical, legal, and societal
considerations. Issues related to privacy, bias, algorithmic
fairness, and consent must be carefully addressed to ensure
that these systems are deployed responsibly and ethically.
Furthermore, the interoperability and standardization of face mask detection technologies are essential to facilitate seamless integration with existing infrastructure and compatibility across different platforms.
In light of these considerations, this article aims to provide
a comprehensive overview of the YOLO [5] based face mask
detection systems, offering insights into the technological
advancements, practical applications, and ethical implications
of this rapidly evolving field. By synthesizing existing
research and identifying key challenges and opportunities, we
seek to inform future research directions and contribute to the
development of effective and equitable solutions for
promoting mask compliance and safeguarding public health.
2. Aim
The aim of this article is to provide a comprehensive analysis of face mask detection systems, encompassing the technological advancements, practical applications, challenges, and ethical considerations in this emerging field.
3. Related Work
A. Velip and A. Dessai [1] proposed a multi-task learning framework combining face detection and mask classification, achieving high accuracy in diverse environmental conditions. Their study presents a real-time face mask detection system using deep learning techniques, employing a CNN architecture for face detection and mask classification.
S. Sakshi et al. [2] present a face mask detection system
based on a CNN model trained on a large dataset of masked
and unmasked faces. Their system exhibits robust
performance across various environmental conditions and
lighting scenarios. Additionally, they explore the impact of
different CNN architectures on detection accuracy.
N. Kowalczyk et al. [3] propose a comprehensive face
mask detection and recognition system integrated with
thermal imaging technology. Their system combines deep
learning-based mask detection with facial recognition to
identify individuals and enforce mask-wearing protocols in
public spaces. The study emphasizes the importance of
multimodal approaches for enhanced accuracy and reliability.
D. Singh and S. K. Joshi [4] introduce a lightweight CNN
model tailored for real-time face mask detection on edge
devices with limited computational resources. Their system
achieves competitive performance while maintaining low
computational overhead, making it suitable for deployment in
resource-constrained environments such as embedded
systems and IoT devices.
These studies represent a diverse range of approaches and methodologies for face mask detection, spanning real-time deep learning systems, thermal-imaging-based recognition, and lightweight models for edge devices. By building upon and
extending the findings of these works, our proposed face mask
detection system aims to contribute to the advancement of this
critical area of research.
4. Methodology
In this paper, we propose a new YOLO-based detection algorithm for a face mask detection system. The block diagram of the proposed system, shown in Fig. 1, consists of a training section and a detection section.
Fig. 1. Block diagram of proposed face mask detection system.
In the training section, we first build a ground-truth dataset from the original image dataset for detection and apply preprocessing to the images. Image features are extracted with our proposed convolutional neural network, which contains no fully connected layer; this backbone is concatenated with the YOLOv2 detection network model for the detection part, and the image dataset is then trained. For classification model training, the input dataset images are preprocessed by resizing and scaling, with each input image resized to 224x224 pixels; the smaller the image, the fewer features it contains. After resizing, the face mask images are used to train the proposed convolutional neural network for face mask classification.
In the detection section, the input is obtained from a video or webcam. The proposed system captures the input frames and detects faces using our proposed detection model. After a face is detected, the system applies face mask classification based on the face detection result.
In the process flow of the proposed system, shown in Fig. 2, the system starts by capturing an input frame from the webcam. Faces are detected using the proposed object detection method, which is based on the YOLOv2 algorithm [6]. If a face is detected in the input frame, the proposed CNN architecture is applied to classify the face mask, and the classification result is shown as the output. If the input frame does not contain a face or no face is detected, the process returns to the input stage.
Fig. 2. Flow chart of proposed face mask detection system.
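To make the capture-detect-classify loop of Fig. 2 concrete, the sketch below illustrates it with OpenCV in Python. This is an illustration under assumptions only: the paper does not name its implementation framework, and detect_faces and classify_mask are hypothetical stand-ins for the trained YOLOv2-based detector and the proposed classification CNN.

```python
import cv2  # OpenCV for webcam capture and drawing

LABELS = ["without mask", "with mask", "wrong position"]

def run_realtime(detect_faces, classify_mask, camera_index=0):
    """Capture frames, detect faces, and classify each detected face.

    detect_faces(frame) -> list of (x, y, w, h) boxes   (hypothetical helper)
    classify_mask(crop) -> index into LABELS            (hypothetical helper)
    """
    cap = cv2.VideoCapture(camera_index)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        boxes = detect_faces(frame)
        if not boxes:
            continue  # no face detected: return to the input stage (Fig. 2)
        for (x, y, w, h) in boxes:
            crop = frame[y:y + h, x:x + w]
            label = LABELS[classify_mask(crop)]
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(frame, label, (x, y - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        cv2.imshow("face mask detection", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()
```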
4.1 Preprocessing
In the preprocessing stage for detection, data augmentation is applied to the ground-truth labeled images. Data augmentation is used to improve network accuracy, and random transformations of the original data enlarge the labeled training set. We apply random horizontal flipping and random X/Y scaling. We also use a color space transformation to convert RGB to the HSV color space, and then jitter the image color to randomly augment the color of each pixel.
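The listed augmentations can be sketched with OpenCV and NumPy as below; the flip probability, scaling range, and jitter ranges are assumed values rather than the paper's settings, and the ground-truth boxes would have to be rescaled by the same X/Y factors.

```python
import cv2
import numpy as np

def augment_for_detection(image, rng=np.random.default_rng()):
    """Random transformations applied to a training image (RGB, uint8)."""
    # Random horizontal flip.
    if rng.random() < 0.5:
        image = cv2.flip(image, 1)

    # Random X/Y scaling (range is illustrative; boxes must be scaled too).
    fx, fy = rng.uniform(0.8, 1.2, size=2)
    image = cv2.resize(image, None, fx=fx, fy=fy)

    # RGB -> HSV color space, then jitter the color of each pixel.
    hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV).astype(np.float32)
    hsv[..., 0] = (hsv[..., 0] + rng.uniform(-10, 10)) % 180  # hue shift
    hsv[..., 1:] *= rng.uniform(0.8, 1.2, size=2)             # saturation/value
    hsv = np.clip(hsv, 0, 255).astype(np.uint8)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)
```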
For the face mask classification training stage, data augmentation is also used, including scale augmentation and position augmentation, which augment the scale and position of the training images. We crop and resize the
detected face images to 224x224x3 pixels. After the preprocessing stages, the detection and classification models are trained with our proposed CNN architectures. The preprocessing results for detection and classification are shown in Fig. 3 and Fig. 4.
Fig. 3. Preprocessing of input dataset image for detection
Fig. 4. Preprocessing of input dataset image for classification.
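As a small sketch of the crop-and-resize step (assuming OpenCV, a detected box in (x, y, w, h) form, and a [0, 1] scaling as the normalization, which the paper does not spell out):

```python
import cv2

def preprocess_face(frame, box, size=224):
    """Crop a detected face and resize it to size x size x 3 for the classifier."""
    x, y, w, h = box
    crop = frame[y:y + h, x:x + w]
    crop = cv2.resize(crop, (size, size))      # 224x224x3 pixels
    return crop.astype("float32") / 255.0      # assumed scaling to [0, 1]
```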
4.2 Proposed Detection Network Based on the YOLOv2 Algorithm
In the proposed system, faces are detected with a detection model transferred from the YOLOv2 object detection model. Before developing the YOLOv2-based detector, we augmented the dataset, not only to enlarge the utilized dataset artificially but also to lessen the likelihood of overfitting during the training stage [7].
In the CNN architecture, the input layer receives an RGB image of size 416x416. The architecture and parameters of the proposed face detection model are shown in Fig. 5 and Table I. The architecture consists of a feature extraction part and a detection part. For feature extraction, we use our tiny CNN architecture, which is shown in Table I. In a typical CNN, fully connected layers are usually placed toward the end of the architecture [8]. In our proposed architecture, we apply a series of convolution operations without any fully connected layer, because that part is replaced by the YOLO detection sub-network. We apply the YOLO convolution layers, the YOLO transform layer, and the YOLO output layer, and the face detection result is displayed as the output. There are 8 convolution layers, and each convolution layer consists of convolution, batch normalization, and the proposed activation function.
Fig. 5. Proposed detection network architecture.
TABLE I. THE LAYER VALUES OF THE PROPOSED CNN ARCHITECTURE FOR FACE DETECTION

Type     | Filters | Size/Stride | Output
---------|---------|-------------|-------
Conv1    |         |             |
MaxPool1 |         |             |
Conv2    |         |             |
Conv3    |         |             |
MaxPool3 |         |             |
Conv4    |         |             |
MaxPool4 |         |             |
Conv5    |         |             |
MaxPool5 |         |             |
Conv6    |         |             |
MaxPool6 |         |             |
Conv7    |         |             |
MaxPool7 |         |             |
Conv8    |         |             |
MaxPool8 |         |             |
4.3 Proposed Classification Network Architecture
After a face is detected, classification is applied to the detected face using the proposed CNN model. Several named architectures exist in the field of convolutional networks; the most common are AlexNet, VGGNet, and GoogLeNet. The architecture of the proposed classification method using a convolutional neural network is shown in Fig. 6.
Fig. 6. The architecture of the proposed CNN for face mask classification.
In the CNN architecture for face mask classification, the input image is resized to 224x224 RGB. There are seven convolution layers and six max-pooling layers. After passing through the two fully connected layers and the softmax layer, the final size is reduced to 1x1x3. The classification output layer produces the output corresponding to each group, and the output corresponds to three classes: without face mask, with face mask, and face mask in the wrong position. The parameters of the proposed CNN for face mask classification, including the number of filters and the size of the output feature maps, are shown in Table II.
TABLE II. THE PARAMETERS OF THE PROPOSED CNN ARCHITECTURE FOR FACE MASK CLASSIFICATION

Layer                    | Filter/Weight      | Number of Parameters
-------------------------|--------------------|---------------------
Convolutional Layer (C1) | ((3×3×3)+1)×16     | 448
Convolutional Layer (C2) | ((3×3×16)+1)×32    | 4,640
Convolutional Layer (C3) | ((3×3×32)+1)×64    | 18,496
Convolutional Layer (C4) | ((3×3×64)+1)×128   | 73,856
Convolutional Layer (C5) | ((3×3×128)+1)×256  | 295,168
Convolutional Layer (C6) | ((3×3×256)+1)×512  | 1,180,160
Convolutional Layer (C7) | ((3×3×512)+1)×1024 | 4,719,616
Fully Connected (F1)     | ((1024)+1)×100     | 102,500
Fully Connected (F2)     | ((100)+1)×3        | 303
Total parameters         |                    | 6,395,187
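For illustration, a PyTorch-style sketch of the classifier in Table II is given below (the framework is an assumption): seven 3x3 convolution layers with the filter counts of Table II, six max-pooling layers, and two fully connected layers producing three class scores. The pooling placement and the global pooling that reduces the feature map to 1x1x1024 before F1 are assumptions, and LeakyReLU again stands in for the proposed activation.

```python
import torch
import torch.nn as nn

class MaskClassifier(nn.Module):
    """Seven conv layers, six max-pooling layers, two FC layers, three classes."""
    def __init__(self, num_classes=3):
        super().__init__()
        widths = (16, 32, 64, 128, 256, 512, 1024)  # filter counts from Table II
        layers, in_ch = [], 3                       # 224x224 RGB input
        for i, w in enumerate(widths):
            layers += [nn.Conv2d(in_ch, w, 3, padding=1),   # ((3x3xC)+1)xW parameters
                       nn.LeakyReLU(0.1, inplace=True)]     # stand-in activation
            if i < 6:                 # six max-pooling layers (placement assumed)
                layers.append(nn.MaxPool2d(2, 2))
            in_ch = w
        self.features = nn.Sequential(*layers)
        # Assumed reduction to a 1x1x1024 feature so F1 has (1024+1)x100 parameters.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Linear(1024, 100)             # F1: 102,500 parameters
        self.fc2 = nn.Linear(100, num_classes)      # F2: 303 parameters

    def forward(self, x):
        x = self.pool(self.features(x)).flatten(1)
        x = torch.relu(self.fc1(x))
        return self.fc2(x)  # logits; softmax gives the final 1x1x3 class probabilities
```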
4.4 Proposed Activation Function
For feature extraction in the CNN, the proposed activation function is used after each convolution layer. The proposed activation function, shown in Fig. 7, extracts features without discarding the negative values of the convolved feature map.
Fig. 7. Proposed Activation Function.
The mathematical form of the proposed activation function is given by the piecewise definition in equation (3.1), with parts (3.1 a)-(3.1 e), where x is a convolution layer value; the remaining parameters are the maximum, minimum, and midpoint values of the convolution layer, the gradient start point G of the feature value, the gradients of the feature values, and two constant leak factors. In the proposed system, the leak factors are set to 1 and 0.01.
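Because the region-wise form of equation (3.1) is not reproduced above, the snippet below is only an illustrative sketch of an activation that keeps negative feature values instead of zeroing them, using the stated leak factors of 1 (positive side) and 0.01 (negative side); the full piecewise definition involving the gradient start point G and the per-region gradients is not implemented here.

```python
import torch

def leaky_piecewise(x, alpha=1.0, beta=0.01):
    """Illustrative activation only: scale positive values by alpha and retain
    negative values with the small leak factor beta, so that negative responses
    in the convolved feature map are not discarded as they are with ReLU."""
    return torch.where(x >= 0, alpha * x, beta * x)
```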
5. Result and Discussion
The detection network was trained with the stochastic gradient descent with momentum (SGDM) optimizer, with an initial learning rate of 0.001, a mini-batch size of 5, and a maximum of 32 epochs. The detection model was trained on 7,200 images of one class, and the saved detector model was then used for the detection of 2,400 testing images. This split was obtained by randomly using 60% of all images for training, 20% for validation, and 20% for testing. The average precision of detection is 0.97. The proposed system works well in different positions and lighting conditions, and it also works under blurred conditions.
For face mask classification, training used the SGDM optimizer with an initial learning rate of 0.001, a mini-batch size of 30, and a maximum of 7 epochs. Training was performed on 15,000 images of three classes. The saved classifier model was then used for the classification of 3,000 validation images and 3,000 testing images, obtained by randomly using 60% of all images for training, 20% for validation, and 20% for testing.
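A sketch of the reported training setup, assuming PyTorch as the framework (the paper does not specify one): an SGDM optimizer with the stated learning rate, mini-batch size, and epoch count, and a random 60/20/20 split. The momentum value of 0.9 is an assumption, as only the learning rate, batch size, and number of epochs are reported.

```python
import torch
from torch.utils.data import DataLoader, random_split

def make_splits(dataset, seed=0):
    """Randomly split a dataset into 60% train, 20% validation, 20% test."""
    n = len(dataset)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    gen = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val, n - n_train - n_val], generator=gen)

def train_classifier(model, train_set, epochs=7, batch_size=30, lr=0.001):
    """SGDM training loop using the reported classifier settings."""
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)  # momentum assumed
    loss_fn = torch.nn.CrossEntropyLoss()   # expects raw class logits
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
    return model
```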
Fig. 8. Visualization results of feature extraction: (a) original image, (b) feature extraction with ReLU activation, and (c) feature extraction with the proposed activation.
The proposed feature extraction process is compared with the ReLU activation function in Fig. 8. The feature extraction method based on the proposed activation function extracts image features more accurately than the ReLU-based method.
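A comparison like Fig. 8 can be reproduced by capturing intermediate activations with a forward hook; below is a small PyTorch and matplotlib sketch, assuming a trained model object such as the classifier sketched earlier.

```python
import torch
import matplotlib.pyplot as plt

def show_feature_maps(model, layer, image, n_maps=8):
    """Plot the first n_maps feature maps produced by `layer` for one image."""
    captured = {}
    handle = layer.register_forward_hook(
        lambda module, inputs, output: captured.update(out=output.detach()))
    with torch.no_grad():
        model(image.unsqueeze(0))        # image: 3 x H x W tensor
    handle.remove()

    maps = captured["out"][0]            # C x H' x W' activations
    fig, axes = plt.subplots(1, n_maps, figsize=(2 * n_maps, 2))
    for ax, fmap in zip(axes, maps[:n_maps]):
        ax.imshow(fmap.cpu().numpy(), cmap="gray")
        ax.axis("off")
    plt.show()
```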
The average precision of the proposed detection model is compared with YOLOv1 and YOLOv2 in Fig. 9. All models were trained with the SGDM optimizer. The average precision of the YOLOv1-based model is 0.919, that of the YOLOv2-based model is 0.941, and that of the proposed model is 0.979. According to these results, the SGDM-trained proposed detection model achieves higher accuracy than YOLOv1 and YOLOv2.
Fig. 9. Comparative analysis of the detection model.
The proposed detection model was tested for face mask detection in daylight, in room light, against different background colors, and in horizontal and vertical positions. Experimental results of detection and classification, including operation under real-time conditions, are shown in Fig. 10.
Fig. 10. Results of face mask detection: (a) face mask detection on an image, (b) face detection on an image, (c) face mask detection on video, and (d) face mask detection in real time.
The confusion matrix of the face mask classification, based on the 3,000 test dataset images across the three classes, is shown in Fig. 11. Table III shows the accuracy results of the proposed classification model, obtained from the confusion matrix. According to the average results, precision is higher than recall. The average result for the negative samples of all classes is over 99.9%, which reflects the specificity of the classification model. The accuracy of each class is over 94%, and the average accuracy across all classes is about 96%.
Fig. 11. Confusion Matrix of Proposed System.
TABLE III. ACCURACY RESULTS OF FACE MASK CLASSIFICATION

Class | Precision | Recall | Specificity | F1 Score | Accuracy
------|-----------|--------|-------------|----------|---------
1     | 0.938     | 0.942  | 0.968       | 0.940    | 0.9592
2     | 0.972     | 0.947  | 0.985       | 0.959    | 0.9724
3     | 0.914     | 0.935  | 0.955       | 0.924    | 0.9489
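The per-class values in Table III follow from the confusion matrix by the usual one-vs-rest definitions; the NumPy sketch below shows that computation (the counts behind Fig. 11 are not reproduced here, so the matrix passed in is whatever the evaluation produces).

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall, specificity, F1, and accuracy for each class,
    given cm[i, j] = number of samples of true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    results = []
    for k in range(cm.shape[0]):
        tp = cm[k, k]
        fp = cm[:, k].sum() - tp
        fn = cm[k, :].sum() - tp
        tn = total - tp - fp - fn
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        specificity = tn / (tn + fp)
        f1 = 2 * precision * recall / (precision + recall)
        accuracy = (tp + tn) / total
        results.append((precision, recall, specificity, f1, accuracy))
    return results
```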
6. Conclusion
In conclusion, face mask detection systems represent a
crucial technological advancement in the ongoing efforts to
combat infectious diseases and promote public health.
Throughout this article, we have explored the evolution of
face mask detection technology, from traditional image
processing methods to sophisticated deep learning algorithms.
Our analysis has highlighted the significant strides made in
improving the accuracy, reliability, and scalability of face
mask detection systems. Real-world deployments and case
studies have demonstrated the practical utility of these
systems in enforcing mask-wearing protocols and mitigating
the spread of contagious diseases in various settings. Looking
ahead, future research and development efforts should focus
on addressing the remaining challenges and limitations of face
mask detection systems, including dataset bias, robustness to
environmental conditions, and interoperability across
different platforms.
References
[1] A. Velip and A. Dessai, "Face Mask Detection Using Machine Learning Techniques," 2022 2nd Asian Conference on Innovation in Technology (ASIANCON), Ravet, India, 2022, pp. 1-5, doi: 10.1109/ASIANCON55314.2022.9908873.
[2] S. Sakshi, A. K. Gupta, S. Singh Yadav and U. Kumar, "Face Mask
Detection System using CNN," 2021 International Conference on
Advance Computing and Innovative Technologies in Engineering
(ICACITE), Greater Noida, India, 2021, pp. 212-216, doi:
10.1109/ICACITE51222.2021.9404731.
[3] N. Kowalczyk, M. Sobotka and J. Rumiński, "Mask Detection and
Classification in Thermal Face Images," in IEEE Access, vol. 11, pp.
43349-43359, 2023, doi: 10.1109/ACCESS.2023.3272214.
[4] D. Singh and S. K. Joshi, "Masked Face Detection using Lightweight
Deep Learning based Model for IoT based Healthcare applications,"
2022 IEEE 2nd Mysore Sub Section International Conference
(MysuruCon), Mysuru, India, 2022, pp. 1-7, doi:
10.1109/MysuruCon55714.2022.9972485.
[5] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You Only Look
Once: Unified, Real-Time Object Detection," 2016 IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779-
788.
[6] J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017.
[7] C. R. Alimboyong, A. A. Hernandez and R. P. Medina, "Classification
of Plant Seedling Images Using Deep Learning," TENCON 2018 -
2018 IEEE Region 10 Conference, Jeju, Korea (South), 2018, pp.
1839-1844.
[8] S. Khan, H. Rahmani, S. A. A. Shah and M. Bennamoun, A Guide to Convolutional Neural Networks for Computer Vision, 2018, p. 56.
Contribution of Individual Authors to the
Creation of a Scientific Article (Ghostwriting
Policy)
The authors contributed equally to the present research, at all stages from the formulation of the problem to the final findings and solution.
Sources of Funding for Research Presented in a
Scientific Article or Scientific Article Itself
No funding was received for conducting this study.
Conflict of Interest
The authors have no conflicts of interest to declare
that are relevant to the content of this article.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the
Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US