Self-Adaptation Feature Attention Network with Multi-Step Fusion for Single Image Dehazing

JIAWEI ZHANG, XIAOCHEN LIU, DONGHUA ZHAO, CHENGUANG WANG, CHONG SHEN, JUN TANG, JUN LIU
Key Laboratory of Instrumentation Science & Dynamic Measurement, Ministry of Education, School of Instrument and Electronics, North University of China, Taiyuan 030051, P. R. CHINA
School of Instrumentation Sciences and Engineering, Southeast University, Nanjing 210096, P. R. CHINA
School of Information and Communication Engineering, North University of China, Taiyuan 030051, P. R. CHINA
Shanxi Province Key Laboratory of Quantum Sensing and Precision Measurement (201905D121001), North University of China, Taiyuan 030051, P. R. CHINA

Abstract—Although single image dehazing has been widely studied as a common low-level computer vision task, it still faces serious challenges, notably the limited ability of existing methods to dehaze real foggy photographs. We propose an efficient end-to-end self-adaptation feature attention (SAFA) network with multi-step fusion for this purpose. The proposed SAFA module can adaptively expand the receptive field to capture key structural information in space and extract more comprehensive and accurate features. In addition, observing the lack of connection between features acquired at low and high levels of the network, we also implement a multi-step fusion module, which lets features from different layers complement each other effectively during image recovery. The network structure is simplified, and the required computing resources are significantly reduced by decreasing network parameters. On multiple datasets and on photographs with real haze, our method demonstrates better efficiency and applicability, both quantitatively and qualitatively.
Keywords— image dehazing; self-adaptation feature attention; multi-step fusion
Received: September 9, 2021. Revised: April 15, 2022. Accepted: May 12, 2022. Published: June 28, 2022.
1. Introduction
For a long time, images captured in hazy scenes have severely degraded the performance of computer vision tasks. When the environment is affected by particles floating in the atmosphere, such as smoke, haze, and dust, human activities in nature are seriously hindered, and safety may even be threatened by the loss of visibility. Images taken outdoors tend to suffer from reduced contrast, degraded colors, and loss of structural detail.
Therefore, single image dehazing has gradually become an indispensable line of research. Its purpose is to effectively recover the image from the corrupted input, that is, to restore the basic information of the clean picture. This can serve as a preprocessing step for high-level vision tasks in many fields such as real-time object detection, remote sensing, and automatic transportation. Other computer vision applications that are otherwise hampered by hazy environments can also benefit.
Basically, the generation of hazy images can be described by the classic atmospheric scattering model [1, 2].
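For reference, this model is commonly written as follows (a standard formulation consistent with [1, 2]):

$$I(x) = J(x)\,t(x) + A\,\bigl(1 - t(x)\bigr), \qquad t(x) = e^{-\beta d(x)}$$

where $I(x)$ is the observed hazy image, $J(x)$ the scene radiance (clean image), $A$ the global atmospheric light, $t(x)$ the transmission map, $\beta$ the scattering coefficient, and $d(x)$ the scene depth.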
Based on this physical model, most early dehazing methods relied on physical priors and various assumptions [3, 4, 5, 6]. For instance, the dark channel prior (DCP) proposed by He et al. [4] is the most representative algorithm among them. In general, this class of methods has achieved some success in image dehazing. However, their assumptions do not precisely reflect the inherent attributes of the image, so the performance of these techniques is often limited in practice.
With the rise and evolution of deep learning in recent years, it has been applied to computer vision tasks such as target recognition [7] and image reconstruction [8]. Compared with traditional approaches, deep learning methods show extraordinary capability and robustness in dehazing. With the remarkable success of convolutional neural network (CNN) techniques for image dehazing, more and more research teams tend to use similar methods to estimate the atmospheric light and transmission map from external data to achieve the desired effect. For instance, DehazeNet [9] estimates the transmission map in an end-to-end manner. In follow-up research [10, 11, 12, 13, 14], all kinds of novel techniques have been
gradually added to this field to strengthen the haze removal ability of the network. Owing to the strong expressive power of deep learning networks, these end-to-end models often achieve much better dehazing results than previous work. But real-world haze is much more complex than the simulated kind, which makes real hazy images harder for these methods to process. On the other hand, all of them inevitably require a huge computational budget. Previous studies [13, 15, 16, 17, 18] have focused on improving dehazing performance by greatly increasing the depth or width of models and using vast numbers of training parameters, without reasonably accounting for time, memory, or computing consumption, which also prevents these models from being deployed in resource-limited environments (such as mobile devices).
In this paper, we propose an end-to-end self-adaptation feature attention (SAFA) network with multi-step fusion for single image dehazing. Previous CNN-based dehazing networks usually adopt convolution kernels of fixed shape, so structural cues in the feature space cannot be exploited effectively. The SAFA module we propose adaptively adjusts a deformable convolution kernel during training to capture and process crucial structural information in space. In addition, a multi-step fusion module efficiently combines features from different levels of the network. This network not only reduces computation cost with a compact and simplified structure, but also achieves excellent visual quality and metrics on several datasets as well as on real foggy images.
Basically, the main contributions of this paper include:
A self-adaptation feature attention (SAFA) module is proposed, which integrates the attention mechanism with deformable convolution. This module pays more attention to dense haze areas and adaptively handles different kinds of complex information. Its uncomplicated structure also avoids heavy computational consumption.
A multi-step fusion module is developed, which adaptively fuses features from disparate steps so that they complement each other in producing the haze-free image.
An efficient end-to-end self-adaptation feature attention (SAFA) network with multi-step fusion for single image dehazing is implemented by combining the above modules. Moreover, we tune the network and carry out exhaustive experiments to obtain the best performance on public datasets as well as on real foggy images. Abundant experimental results demonstrate the validity and practicality of our dehazing network compared with state-of-the-art (SOTA) methods.
2. Related Work
In the past, most image dehazing work relied on external information, such as available geo-referenced models [19, 20] or information obtained from other sources [21]. However, because the transmission map and global atmospheric light are unknown, no suitable external information is available for image dehazing in real applications. For this extremely challenging task, current solutions generally fall into two categories: classical prior-based methods and novel deep learning-based methods. Either way, the fundamental problem of how to handle the transmission map and atmospheric light remains.
2.1 Prior-based Image Dehazing Methods
Prior-based image dehazing methods usually depend on the atmospheric scattering model. They use certain assumptions or priors to estimate the atmospheric light and transmission map, and take advantage of additional constraints to compensate for the information lost in the process; the haze-corrupted image can then be restored. He et al. [4] observed that in most haze-free image patches, at least one color channel contains pixels whose intensity is close to zero. Based on this, they proposed the dark channel prior to estimate the transmission map and atmospheric light, a landmark method of haze removal.
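As a concrete illustration, the dark channel of an image J is the minimum over a local patch Ω(x) of the per-pixel channel minimum. Below is a minimal NumPy/SciPy sketch of this computation; the patch size and variable names are our own illustrative choices, not the exact settings of [4]:

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(image, patch_size=15):
    """Compute the dark channel of an RGB image.

    image: float array of shape (H, W, 3), values in [0, 1].
    Returns an (H, W) array: the minimum over the color channels,
    then the minimum over a patch_size x patch_size neighborhood.
    """
    per_pixel_min = image.min(axis=2)                 # min over R, G, B
    return minimum_filter(per_pixel_min, size=patch_size)

# Pixels with a large dark-channel value are likely hazy; He et al. [4]
# use this statistic to estimate the atmospheric light and transmission map.
```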
By modeling the scene depth of the hazy image with a linear model, Zhu and Mai [22] proposed a simple but effective color attenuation prior that recovers depth information to estimate the transmission map and atmospheric light, thereby obtaining the clear image. Tan [23] developed a local contrast maximization dehazing technique to increase the visibility of hazy images, based on the observation that the contrast of foggy images is usually lower than that of clean ones. Fattal [24] proposed an image formation model that accounts for scene transmission and surface shading in order to improve scene visibility and restore the contrast of the haze-free environment. Although these methods achieved impressive results, their performance is still limited by the accuracy of the priors, which depend heavily on assumptions about the target scenarios. When the scene becomes more complex and the priors no longer hold, robustness suffers. As a result, these methods cannot handle all situations properly, such as dehazing the sky region of an image.
2.2 Learning-based Image Dehazing Methods
In recent years, as deep learning has proven effective in image processing tasks and related synthetic image datasets have become available, data-driven image dehazing methods have gradually become the mainstream.
Among them, early studies [9, 14, 25] usually apply neural networks to estimate the transmission map and atmospheric light of the physical scattering model. For example, the three-layer coarse-to-fine CNN developed by Cai et al. [9] estimates the medium transmission map from the foggy input to remove haze. AOD-Net, implemented by Li et al. [14], instead reformulates the scattering model through a creatively designed lightweight CNN and generates clean images directly. But these estimations are not always accurate, which leads to serious reconstruction errors
between the reconstructed images and the clear ones, such as
artifacts and distortion.
Another research strategy focuses on end-to-end dehazing models that use a neural network to learn the mapping from foggy images to clean ones directly [13, 15, 16, 26, 27, 28, 29]. The feature fusion attention network (FFA-Net) proposed by Qin et al. [16] uses only a simple L1 reconstruction loss, and its combination of different attention mechanisms makes the network more flexible when dealing with disparate information. After Ren et al. [26] presented the multi-scale convolutional neural network (MSCNN) for image dehazing, several essentially similar but substantially improved networks were built on this basis, such as the gated fusion network (GFN) [27] and the multi-scale boosted dehazing network (MSBDN) [13]. Compared with these methods, the enhanced pix2pix dehazing network (EPDN) implemented by Qu et al. [28] incorporates a generative adversarial network, which reduces the dependence on paired datasets and restores haze-free images directly.
2.3 Attention Mechanism
Since the attention mechanism can guide a network model to handle the crucial components of an image adaptively, it has received increasing attention and been applied to a series of computer vision tasks [30, 31, 32, 33]. For instance, Liu et al. [15] combined a channel-wise attention mechanism with an end-to-end neural network and used multi-scale estimation to guide information exchange and aggregation flexibly. Using the channel attention mechanism, Zhang et al. [25] proposed a pyramid-channel feature attention dehazing network to remove fog from images. Qin et al. [16] also developed FFA-Net, which includes both channel attention and pixel attention and can process different types of information efficiently.
3. Our Method
3.1 Method Overview
Inspired by the FA module of FFA-Net [16], we propose a new self-adaptation feature attention (SAFA) module as our basic building block; only five of these modules are used in the main architecture of the network. At the same time, a multi-step fusion module is placed between SAFA modules to fuse features across different steps, which dramatically reduces the memory required for computation (compared with the 57 FA modules of the original network [16]).
As shown in Fig. 1, our network first applies a downsampling operation (one convolution with stride 1 and one convolution layer with stride 2, both followed by the ReLU function) so that subsequent modules can learn feature representations in the low-resolution domain, followed by a regular convolution layer for shallow feature extraction. After the successive SAFA modules and multi-step fusion modules, one convolution layer and the corresponding upsampling operation produce the recovered haze-free image.
Fig. 1 The architecture of the self-adaptation feature attention
network with multi-step fusion
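To make the data flow concrete, the following PyTorch skeleton shows one possible wiring of the pipeline in Fig. 1. SAFABlock, FusionModule, and AdaptiveMixup are placeholders for the components sketched in the following subsections, and all layer sizes are illustrative assumptions rather than the paper's exact configuration:

```python
import torch.nn as nn

class SAFANet(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        # Downsampling: stride-1 conv + stride-2 conv, each followed by ReLU
        self.down = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.shallow = nn.Conv2d(ch, ch, 3, padding=1)   # shallow feature extraction
        self.steps = nn.ModuleList([SAFABlock(ch) for _ in range(5)])
        self.fusions = nn.ModuleList([FusionModule(ch) for _ in range(4)])
        self.mix = AdaptiveMixup()
        self.up = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
            nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, x):
        d = self.down(x)
        f = self.shallow(d)
        feats = []
        for step in self.steps:             # five SAFA modules
            f = step(f)
            feats.append(f)
        fused = feats[0]
        for fusion, high in zip(self.fusions, feats[1:]):
            fused = fusion(fused, high)     # progressively fuse low- and high-level features
        return self.up(self.mix(fused, d))  # adaptive mixup links the down/up-sampling paths
```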
Generally speaking, shallow features such as edges are gradually lost as network depth increases. Some works [34, 35], including FFA-Net [16], combine shallow and deep features through multiple skip connections and concatenation to form the output. To address the lack of contact between the downsampling layer and the upsampling layer in our network, the adaptive mixup operation [36] is used to link the information between the two layers, maintaining information flow adaptively and restoring the image better. In this network, the final output of this operation can be expressed as:
the final output of this operation can be expressed as:
Mix( , )= () (1 ())
(1)
where denotes the final output, and represent
feature maps from upsampling and downsampling,
respectively. ( ) refers to the learnable factor to combine the
inputs from the two layers, which is obtained by sigmoid
function with parameter .
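A minimal PyTorch sketch of this operation, assuming a single scalar mixing parameter (the actual parameterization in [36] may differ):

```python
import torch
import torch.nn as nn

class AdaptiveMixup(nn.Module):
    """Learnable convex combination of two feature maps, Eq. (1)."""
    def __init__(self):
        super().__init__()
        self.theta = nn.Parameter(torch.zeros(1))  # sigmoid(0) = 0.5: start as an even mix

    def forward(self, f_up, f_down):
        w = torch.sigmoid(self.theta)
        return w * f_up + (1 - w) * f_down
```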
3.2 The Self-Adaptation Feature Attention Module
Fig. 2 Principle of deformable convolution
In earlier work [13, 15, 16, 28, 29, 37], the fixed convolution kernel shown in the middle of Fig. 2 is usually adopted, which limits the receptive field and prevents the structured clues in the feature space from being explored effectively. Solving this problem requires adjusting the shape of the receptive field. As shown on the right of Fig. 2, the flexibility of the deformable convolution kernel allows it to capture more significant structural information adaptively. The spatially invariant
convolution kernel can destroy image texture, as confirmed in previous studies [38]. As the core component of our SAFA module, we introduce two deformable convolution layers with deformable 2D kernels into the original pixel attention module [39], as shown in Fig. 3. This adaptively expands the receptive field and improves the transformation ability of the model when the network focuses on thick-fog pixels and high-frequency image regions. The capability to sample an unconstrained deformation of the grid also enables the network to integrate more spatial structure information adaptively and achieve a better dehazing effect. It is worth noting that within each SAFA module, deploying the deformable convolutions deep in the module works better than deploying them in shallow positions. On the other hand, our experiments show that for the pixel attention part of our method, replacing the original two convolution layers and ReLU function with a single 1 × 1 convolution is more efficient and simplifies the network to a certain extent. Therefore, the process can be defined as:
$$F_{out} = F \odot \sigma\!\left(\mathrm{Conv}_{1\times1}\!\left(\mathrm{DConv}_2\!\left(\mathrm{DConv}_1(F)\right)\right)\right) \qquad (2)$$

where $\mathrm{DConv}$ refers to the deformable convolution operation, $\sigma$ represents the sigmoid function, and $\odot$ denotes element-wise multiplication. The remaining parts of the SAFA module keep the network structure of the FA module [16].
Fig. 3 The basic architecture of the SAFA module
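The sketch below shows our reading of Eq. (2) in PyTorch, using torchvision's deformable convolution; the channel counts and kernel sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformLayer(nn.Module):
    """3x3 deformable convolution whose sampling offsets are predicted from the input."""
    def __init__(self, ch):
        super().__init__()
        self.offset = nn.Conv2d(ch, 2 * 3 * 3, 3, padding=1)  # (dy, dx) per kernel position
        self.dconv = DeformConv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        return self.dconv(x, self.offset(x))

class SAFAPixelAttention(nn.Module):
    """Pixel attention with two deformable conv layers and one 1x1 conv, Eq. (2)."""
    def __init__(self, ch):
        super().__init__()
        self.dconv1 = DeformLayer(ch)
        self.dconv2 = DeformLayer(ch)
        self.conv1x1 = nn.Conv2d(ch, 1, kernel_size=1)  # single-channel attention map

    def forward(self, x):
        attn = torch.sigmoid(self.conv1x1(self.dconv2(self.dconv1(x))))
        return x * attn  # re-weight every pixel of the input features
```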
3.3 Multi-step Fusion Module
Low-level features (i.e., steps 1 and 2 in our network), such as local information like edges, can often be extracted easily. As the receptive field grows, high-level features capture semantics of the global scope. In many CNN-based tasks such as target detection [7] and image restoration [8], extracting and fusing features from different levels has demonstrated significant benefits. For the image dehazing field, however, existing feature fusion methods do not fully consider fusion across disparate levels. In general, using only high-level features yields images lacking local detail, while using only low-level features preserves detail but fails to recover global semantics. To make full use of both, we implement a multi-step feature fusion module for the dehazing network. As shown in Fig. 1, there are four fusion modules from left to right. The first fuses the features from step 1 and step 2, and the resulting fusion feature 1, acting as a low-level feature, is fused with the high-level features of step 3 in the second fusion module to produce fusion feature 2. Similarly, fusion feature 3, generated by fusing the step 4 features with fusion feature 2 in the third fusion module, feeds the final fusion module after step 5.
Each feature fusion module takes one low-level feature and one high-level feature. Each input passes through a convolution layer before fusion, and the fusion is completed by an element-wise product. The fused features, which combine the two inputs, then pass through a convolution layer and a ReLU layer before being processed by the next fusion module in sequence. Denoting the high-level and low-level features of each fusion module as $F_h$ and $F_l$ respectively, the ReLU function as $\delta$, and the final output of the whole module as $F_{out}$, this process can be expressed as:

$$F_{out} = \delta\!\left(\mathrm{Conv}\!\left(\mathrm{Conv}(F_l) \odot \mathrm{Conv}(F_h)\right)\right) \qquad (3)$$
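A minimal PyTorch sketch of one such fusion module under this reading (kernel sizes are assumptions):

```python
import torch
import torch.nn as nn

class FusionModule(nn.Module):
    """Fuse a low-level and a high-level feature map, Eq. (3)."""
    def __init__(self, ch):
        super().__init__()
        self.conv_low = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv_high = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv_out = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, f_low, f_high):
        fused = self.conv_low(f_low) * self.conv_high(f_high)  # element-wise product
        return torch.relu(self.conv_out(fused))
```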
3.4 Loss Function
Three loss functions are used to measure the deviation between the dehazed images and the corresponding clear ones and to optimize the model: the mean square error (MSE), the Smooth L1 loss, and the perceptual loss, each of which plays a different role in the total loss. The MSE precisely captures low-frequency information in the images, which is necessary for recovering clear pictures. This term is formulated as:

$$L_{MSE} = \frac{1}{CHW}\sum_{x} \left\| J(x) - \hat{J}(x) \right\|_2^2 \qquad (4)$$

where $J$ is the image after dehazing, $\hat{J}$ denotes the related ground truth image, and $C$, $H$, and $W$ represent the number of RGB channels, the height, and the width of the image, respectively.
In addition to strengthening low-frequency correctness, the Smooth L1 loss is insensitive to outliers and can mitigate problems such as exploding gradients. Accordingly, this term can be expressed as:

$$L_{SmoothL1} = \frac{1}{CHW}\sum_{x} \mathrm{smooth}_{L1}\!\left(J(x) - \hat{J}(x)\right), \quad \mathrm{smooth}_{L1}(z) = \begin{cases} 0.5\,z^2, & |z| < 1 \\ |z| - 0.5, & \text{otherwise} \end{cases} \qquad (5)$$
To enhance the semantic fidelity and perceptual quality of the recovered images, we use a perceptual loss that leverages multi-scale features from a pre-trained network to quantify the feature discrepancy between $J$ and $\hat{J}$:

$$L_{per} = \sum_{k=1}^{3} \frac{1}{C_k H_k W_k}\left\| \phi_k(J) - \phi_k(\hat{J}) \right\|_2^2 \qquad (6)$$
where $\phi_k$ denotes the feature extractor of the k-th of the three stages of the pre-trained VGG16 network, applied to the restored image $J$ and its clear counterpart $\hat{J}$, and $C_k$, $H_k$, and $W_k$ are the channel number, height, and width of the feature maps at the k-th stage, respectively.
Finally, the overall loss function integrates the three terms above:

$$L_{total} = L_{MSE} + L_{SmoothL1} + \lambda L_{per} \qquad (7)$$

where $\lambda$ is a weight parameter that controls the balance of the three terms.
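A sketch of this composite loss in PyTorch; the choice of VGG16 layers and the value of lam are illustrative assumptions, not the paper's exact settings:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16

class PerceptualLoss(nn.Module):
    """MSE between VGG16 features at three stages (relu1_2, relu2_2, relu3_3 assumed)."""
    def __init__(self):
        super().__init__()
        feats = vgg16(pretrained=True).features.eval()
        self.stages = nn.ModuleList([feats[:4], feats[4:9], feats[9:16]])
        for p in self.parameters():
            p.requires_grad_(False)  # the feature extractor stays frozen

    def forward(self, x, y):
        loss = 0.0
        for stage in self.stages:
            x, y = stage(x), stage(y)
            loss = loss + F.mse_loss(x, y)
        return loss

def total_loss(pred, gt, perceptual, lam=0.04):
    """Eq. (7): MSE + Smooth L1 + weighted perceptual term (lam is illustrative)."""
    return (F.mse_loss(pred, gt)
            + F.smooth_l1_loss(pred, gt)
            + lam * perceptual(pred, gt))
```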
4. Experiment
4.1 Datasets and Evaluation Metrics
Because authentic hazy images paired with haze-free references are extremely difficult to collect, we first choose the outdoor training set (OTS) and the synthetic objective testing set (SOTS) from the RESIDE-standard dataset [40] for training and testing, respectively. RESIDE contains plentiful synthetic hazy indoor and outdoor images together with their corresponding clear images (ground truth), and it has long served as a benchmark for CNN-based image dehazing. To further evaluate the dehazing capability of our model in real-world scenes, we also adopt the Dense-Haze dataset [41] and the NH-HAZE dataset [42], which each contain 55 pairs of outdoor images, covering dense homogeneous fog and nonhomogeneous fog respectively, together with their ground truth.
The peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [43] are used as evaluation metrics, as they are the most common criteria for comparing image quality in dehazing tasks.
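For reference, both metrics can be computed with scikit-image (a common choice; the paper does not state its implementation):

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# pred and gt are float RGB arrays of shape (H, W, 3) with values in [0, 1]
psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)
```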
4.2 Training Details
We implement the proposed model with the PyTorch framework [44]; all training and testing are performed on a platform with an Nvidia GeForce RTX 2080Ti GPU, 128 GB of RAM, and an Intel XEON E5-2698 v4 CPU. For training, the configuration is as follows: the Adam optimizer [45] is used with exponential decay rates of 0.9 and 0.99, and batches of 8 hazy image patches of size 240 × 240 are extracted as network input. The initial learning rate is set to 0.0002 and decayed with the cosine annealing strategy. The network is trained for about 130 epochs on the OTS subset. In addition, random 90, 180, and 270 degree rotations and horizontal/vertical flips are applied as extra data augmentation.
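These settings map directly onto PyTorch, as in the sketch below (model, dataloader, and the total_loss/PerceptualLoss sketches above are assumed to exist):

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.9, 0.99))
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=130)
perceptual = PerceptualLoss()

for epoch in range(130):
    for hazy, clear in dataloader:           # 8 random 240x240 patch pairs per batch
        optimizer.zero_grad()
        loss = total_loss(model(hazy), clear, perceptual)
        loss.backward()
        optimizer.step()
    scheduler.step()                          # cosine-annealed learning rate
```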
4.3 Evaluation on the Benchmark dataset
First, the proposed network is tested for visual quality and quantitative accuracy on the synthetic SOTS dataset [40]. We compare our method with SOTA methods on the visual quality of the recovered images, as shown in Fig. 4. It can be clearly seen that although DCP [4] and MSBDN [13] successfully remove the haze, they also introduce color distortion. GridDehazeNet [15] recovers the image but makes it too bright. In comparison, AOD-Net [14] and FFA-Net [16] produce relatively good results, but a small amount of haze still remains in local regions of the images.
Fig. 4 Visual results comparison of images on SOTS dataset
[40].
In addition, experimental comparisons are conducted with SOTA techniques including DCP [4] (a prior-based method), AOD-Net [14], GridDehazeNet [15], MSBDN [13], and FFA-Net [16]. The quantitative results on the testing set are summarized in Table 1 below.
Comparing with the previous FFA-Net [16] in Table 1, our SAFA network achieves a 0.24 dB PSNR improvement with significantly fewer parameters. Although the SSIM decreases slightly by 0.0063, our method generates more natural images.
Table 1 Quantitative comparisons with SOTA techniques on the SOTS [40] dataset.

Methods             | PSNR (dB)
DCP [4]             | 15.09
AOD-Net [14]        | 19.82
GridDehazeNet [15]  | 32.16
MSBDN [13]          | 33.79
FFA-Net [16]        | 36.39
SAFA-Net (ours)     | 36.63
4.4 Evaluation on real-world datasets
We also compare test results on the Dense-Haze [41] and NH-HAZE [42] datasets with other SOTA approaches. Both datasets feature much denser and harder-to-remove fog than the RESIDE dataset [40], especially the former. As Fig. 5 and Fig. 6 show, DCP [4], AOD-Net [14], GridDehazeNet [15], and MSBDN [13] all have limited visual effectiveness at removing dense haze: most of the fog remains in the processed images, while FFA-Net [16] exhibits particular problems such as texture loss and color degradation. Judging by the visual results, our method recovers noticeably clearer images than the other methods while retaining the original details and structure.
Fig. 5 Visual results comparison of images on Dense-Haze
dataset [41].
Fig. 6 Visual results comparison of images on NH-HAZE
dataset [42].
As shown in Table 2 and Table 3, the performance of our SAFA network on the Dense-Haze dataset [41] is far superior to all SOTA techniques, reaching 17.34 dB PSNR and 0.5817 SSIM. On the NH-HAZE dataset [42] we also obtain the highest PSNR and SSIM, namely 21.81 dB and 0.7253, respectively.
Moreover, as the last row of Table 3 shows, the proposed network achieves these results with fewer parameters, striking a good trade-off between parameter count and image recovery metrics and effectively reducing the cost of computation.
Table 2 Quantitative comparisons with SOTA techniques on the Dense-Haze [41] dataset.

Methods             | PSNR (dB) | SSIM
DCP [4]             | 10.06     | 0.3856
AOD-Net [14]        | 13.14     | 0.4144
GridDehazeNet [15]  | 14.31     | 0.4081
MSBDN [13]          | 15.47     | 0.4858
FFA-Net [16]        | 14.39     | 0.4725
SAFA-Net (ours)     | 17.34     | 0.5817
Table 3 Quantitative comparisons with SOTA techniques on the NH-HAZE [42] dataset.

Methods             | PSNR (dB) | Parameters
DCP [4]             | 10.57     | -
AOD-Net [14]        | 15.40     | 0.002M
GridDehazeNet [15]  | 13.80     | 0.96M
MSBDN [13]          | 19.23     | 31.35M
FFA-Net [16]        | 19.87     | 4.68M
SAFA-Net (ours)     | 21.81     | 2.37M
4.5 Evaluation on real-world hazy photographs
To evaluate the dehazing effect of our network on real foggy photographs more convincingly, we tested and compared plentiful real hazy photographs obtained from the RTTS [40] dataset together with foggy-day images collected by the author on the campus of the University of Kent. The visual results are shown in Fig. 7. Although AOD-Net [14], GridDehazeNet [15], MSBDN [13], and FFA-Net [16] perform very well on synthetic datasets, their fog removal on such real images is not satisfactory. The relatively effective DCP [4] suffers from color distortion and tends to over-enhance the images. In some cases, AOD-Net [14] produces floating shadows, and images processed by MSBDN [13] become darker. In general, our model achieves superior visual quality in recovering image detail while maintaining overall brightness, and it reconstructs clear, haze-free images with good perceptual quality.
Fig. 7 Visual results comparison of real photographs with
haze.
5. Conclusion
In this paper, we propose an end-to-end dehazing network that consists of a self-adaptation feature attention (SAFA) module and a multi-step fusion module. The former adaptively extracts the detailed features of the hazy image, enlarging the range of complicated information the network can handle and significantly increasing its transformation ability. The latter draws on features from multiple steps to benefit from their combination. We carried out exhaustive experiments on disparate datasets, and comparisons with SOTA algorithms demonstrate the clear advantages of this network structure in recovering image detail. In addition, because we reduce the depth and complexity of the network design, the more compact network significantly lowers the computational power and time required for operation. With further research, we expect to realize real-time dehazing and to apply this network structure to other image restoration tasks.
References
[1] E. J. McCartney, “Optics of the atmosphere: Scattering by
molecules and particles,” New York, John Wiley and Sons,
Inc, 1976. 421, 1976.
[2] S. G. Narasimhan and S. K. Nayar, “Vision and the
atmosphere,” International Journal of Computer Vision, vol.
48, no. 3, pp. 233–254, 2002.
[3] R. Fattal, “Dehazing using color-lines,” ACM Transactions
on Graphics, vol. 34, no. 1, pp. 1–14, 2014.
[4] K. He, J. Sun, and X. Tang, “Single image haze removal
using dark channel prior,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, pp. 2341–2353, 2010.
[5] R. T. Tan, “Visibility in bad weather from a single image,”
in Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, pp. 1–8, 2008.
[6] Q. Zhu, J. Mai, and L. Shao, “A fast single image haze
removal algorithm using color attenuation prior,” IEEE
transactions on image processing, vol. 24, no. 11, pp. 3522–3533, 2015.
[7] B. Xu and Z. Chen, “Multi-level fusion based 3d object
detection from monocular images,” in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pp.
2345–2353, 2018.
[8] J. Li, F. Fang, J. Li, K. Mei, and G. Zhang, “Mdcn:
Multiscale dense cross network for image super-resolution,”
IEEE Transactions on Circuits and Systems for Video
Technology, 2020.
[9] B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao, “Dehazenet:
An end-to-end system for single image haze removal,” IEEE
Transactions on Image Processing, vol. 25, no. 11, pp. 5187–5198, 2016.
[10] S. Zhao, L. Zhang, Y. Shen and Y. Zhou, “RefineDNet: A
Weakly Supervised Refinement Framework for Single Image
Dehazing,” in IEEE Transactions on Image Processing, vol.
30, pp. 3391-3404, 2021.
[11] L. Huang, J. Yin, B. Chen and S. Ye, “Towards
Unsupervised Single Image Dehazing With Deep
Learning,” 2019 IEEE International Conference on Image
Processing (ICIP), pp. 2741-2745, 2019.
[12] Y. Zhang and Y. Dong, “Single Image Dehazing via
Reinforcement Learning,” 2020 IEEE International
Conference on Information Technology, Big Data and
Artificial Intelligence (ICIBA), pp. 123-126, 2020.
[13] H. Dong, J. Pan, L. Xiang, Z. Hu, X. Zhang, F. Wang,
and M.-H. Yang, “Multi-scale boosted dehazing network with
dense feature fusion,” in Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, pp.
2157–2167, 2020.
[14] B. Li, X. Peng, Z. Wang, J. Xu, and D. Feng, “Aod-net:
All-in-one dehazing network,” in Proceedings of the IEEE
International Conference on Computer Vision, pp. 4770–4778,
2017.
[15] X. Liu, Y. Ma, Z. Shi, and J. Chen, “Griddehazenet:
Attention-based multi-scale network for image dehazing,” In
ICCV, 2019.
[16] X. Qin, Z. Wang, Y. Bai, X. Xie, and H. Jia, “Ffa-net:
Feature fusion attention network for single image dehazing.” in
AAAI, pp. 11 908– 11 915, 2020.
[17] T. Guo, X. Li, V. Cherukuri, and V. Monga, “Dense
scene information estimation network for dehazing,” In
Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition Workshops, 2019.
[18] X. Zhang, J. Wang, T. Wang and R. Jiang, “Hierarchical
Feature Fusion with Mixed Convolution Attention for Single
Image Dehazing,” in IEEE Transactions on Circuits and
Systems for Video Technology, 2021.
[19] G. Meng, Y. Wang, J. Duan, S. Xiang, and C. Pan,
“Efficient image dehazing with boundary constraint and
contextual regularization,” in Proceedings of the IEEE
International Conference on Computer Vision, pp. 617–624,
2013.
[20] C. O. Ancuti, C. Ancuti, C. Hermans, and P. Bekaert, “A
fast semi-inverse approach to detect and remove the haze from
a single image,” in Proceedings of the Asian Conference on
Computer Vision, pp. 501–514, 2010.
[21] S. G. Narasimhan and S. K. Nayar, “Interactive (de)
weathering of an image using physical models,” in
Proceedings of the IEEE Workshop on Color and Photometric
Methods in Computer Vision, vol. 6, no. 6.4, p. 1, 2003.
[22] Q. Zhu, J. Mai, and L. Shao, “A fast single image haze
removal algorithm using color attenuation prior,” IEEE
transactions on image processing, vol. 24, no. 11, pp. 3522–3533, 2015.
[23] R. T. Tan, “Visibility in bad weather from a single
image,” in Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 1–8, 2008.
[24] R. Fattal, “Dehazing using color-lines,” ACM
Transactions on Graphics, vol. 34, no. 1, pp. 1–14, 2014.
[25] H. Zhang and V. M. Patel, “Densely connected pyramid
dehazing network,” In CVPR, pages 3194–3203, 2018.
[26] W. Ren, S. Liu, H. Zhang, J. Pan, X. Cao, and M. Yang,
“Single image dehazing via multi-scale convolutional neural
networks,” In ECCV, pages 154–169, 2016.
[27] W. Ren, L. Ma, J. Zhang, J. Pan, X. Cao, W. Liu, and M.-
H. Yang, “Gated fusion network for single image dehazing,” in
Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 3253–3261, 2018.
[28] Y. Qu, Y. Chen, J. Huang, and Y. Xie, “Enhanced
pix2pix dehazing network,” in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pp.
8160–8168, 2019.
[29] M. Hong, Y. Xie, C. Li, and Y. Qu, “Distilling image
dehazing with heterogeneous task imitation,” In CVPR, pages
3462–3471, 2020.
[30] Q. Wang, Z. Teng, J. Xing, J. Gao, W. Hu, and S.
Maybank, “Learning attentions: residual attentional siamese
network for high performance online visual tracking,” in
Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 4854–4863, 2018.
[31] Y. Peng, X. He, and J. Zhao, “Object-part attention model
for fine-grained image classification,” IEEE Transactions on
Image Processing, vol. 27, no. 3, pp. 1487–1500, 2017.
[32] X. Zhang, R. Jiang, T. Wang, P. Huang, and L. Zhao,
“Attentionbased interpolation network for video deblurring,”
Neurocomputing, 2020.
[33] Y. Hu, J. Li, Y. Huang, and X. Gao, “Channel-wise and
spatial feature modulation network for single image super-
resolution,” IEEE Transactions on Circuits and Systems for
Video Technology, 2019.
[34] O. Ronneberger, P. Fischer, and T. Brox, “U-net:
Convolutional networks for biomedical image segmentation,”
In MICCAI, pages 234–241, 2015.
[35] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual
learning for image recognition,” In CVPR, pages 770–778,
2016.
[36] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz,
“mixup: Beyond empirical risk minimization,” arXiv preprint
arXiv:1710.09412, 2017.
[37] Y. Shao, L. Li, W. Ren, C. Gao, and N. Sang, “Domain
adaptation for image dehazing,” in Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pp. 2808–2817, 2020.
[38] X. Xu, M. Li, and W. Sun, “Learning deformable kernels
for image and video denoising,” arXiv preprint
arXiv:1904.06903, 2019.
[39] J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y.
Wei, “Deformable convolutional networks,” In ICCV, pages
764–773, 2017.
[40] B. Li, W. Ren, D. Fu, D. Tao, D. Feng, W. Zeng, and Z.
Wang, “Benchmarking single-image dehazing and beyond,”
IEEE Transactions on Image Processing, vol. 28, no. 1, pp.
492–505, 2018.
[41] C. O. Ancuti, C. Ancuti, M. Sbert, and R. Timofte, “Dense
haze: A benchmark for image dehazing with dense-haze and
haze-free images,” In ICIP, 2019.
[42] C. O. Ancuti, C. Ancuti, and R. Timofte, “NH-HAZE: An
image dehazing benchmark with nonhomogeneous hazy and
haze-free images,” In The IEEE Conference on Computer
Vision and Pattern Recognition (CVPR) Workshops, 2020.
[43] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli,
“Image quality assessment: from error visibility to structural
similarity,” IEEE transactions on image processing,
13(4):600–612, 2004.
[44] A. Paszke, S. Gross, S. Chintala, and G. Chanan,
“Pytorch,” Computer Software. Vers. 0.3, vol. 1, 2017.
[45] D. P. Kingma and J. Ba, “Adam: A method for stochastic
optimization,” arXiv preprint arXiv:1412.6980, 2014.
Authors' Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Jiawei Zhang and Xiaochen Liu. The first draft of the manuscript was written by Jiawei Zhang, and all authors commented on previous versions of the manuscript. Donghua Zhao, Chenguang Wang, Chong Shen, Jun Tang and Jun Liu provided guidance and help throughout the research. All authors read and approved the final manuscript.
Funding
This work was supported, in part, by the National
Natural Science Foundation of China (Grant Nos. 61973281
and 51705477), the Innovative Research Group Project of the
National Natural Science Foundation of China (Grant No.
51821003), the Pre-research Field Foundation (Grant No.
6140518010201), the Scientific and Technology Innovation
Programs of Higher Education Institutions in Shanxi (Grant
No. 201802084), the Aeronautical Science Foundation of
China (Grant No. 2018ZCU0002), the Weapons and
Equipment Joint Foundation (Grant No. 6141B021305), the
Program for the Top Young Academic Leaders of Higher
Learning Institutions of Shanxi, the Young Academic Leaders
Foundation in North University of China, the Science
Foundation of North University of China (Grant No.
XJJ201822), and the Fund for Shanxi “1331 Project” Key
Subjects Construction.
Creative Commons Attribution License 4.0
(Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the Creative
Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US