Image Style Conversion using Deep Convolutional Neural Network
LINGLING WANG, XINGGUANG DONG
School of Management Science and Engineering, Anhui University of Finance and Economics
Bengbu 233000, CHINA
Abstract: Research on deep-learning-based image style conversion is growing. Unlike conventional style conversion, this paper builds on convolutional neural networks, using the InceptionV3 model trained on the ImageNet dataset. Deep Dream technology gives a dull, ordinary background picture a warm color, making the picture content richer and its texture softer and more exquisite.
Keywords: deep learning, convolutional neural network, deep dream, image style conversion.
Received: March 9, 2022. Revised: August 13, 2022. Accepted: September 2, 2022. Available online: September 21, 2022.
1. Introduction
Image style conversion is an image processing method that converts the background and overall tone (style) of a source image without changing its overall framework, yielding a new, aesthetically pleasing image that combines the local characteristics of the source image with the target style. However, owing to their inherent limitations, traditional style conversion algorithms cannot extract high-level abstract features from the target image and can only synthesize pictures in an abstract painting style.
With the rapid development of artificial intelligence algorithms and growing investment in experimental facilities, German scientists led by Gatys proposed using convolutional neural networks to achieve image style conversion [1]. Their research showed that a convolutional neural network can be divided into deep and shallow convolutional layers: the deep layers capture the overall framework (content) of the image, while the shallow layers capture its style characteristics. Based on this discovery, style and content can be separated and then recombined to achieve style conversion. Although the method successfully implements style conversion, gradient descent must repeatedly update the pixel values of the source image, which greatly increases resource occupation and cost and makes generation slow. Researchers have since made many improvements on Gatys' work; for example, Johnson et al. proposed a fast stylization algorithm [2] that trains a feed-forward network in advance, so that the source image can be stylized after a single optimization pass, greatly improving generation speed.
Building on CNN-based real-time style conversion algorithms [3], this paper designs a more efficient style conversion algorithm that improves image quality and, to a certain extent, reduces the time spent on redundant data and nonparametric processing. Crawler technology is then used: keywords are extracted from the information the user requests, the required images are obtained from websites or user-side files, and the style of each image is transformed through deep learning, so that users obtain the exquisite pictures they want or extract the important information they need from them. Finally, the principle of Deep Dream technology is applied to optimize a random noise image: the randomly generated pixel values of the noise picture are passed through a fixed convolutional neural network, and the noise is adjusted and optimized iteratively so that the noise picture presents a certain feature distribution, thereby optimizing the output picture.
2. Convolutional Neural Network
2.1 Feature extraction stage
A prerequisite of image style conversion is obtaining the data of an image reasonably and completely [4]. We cannot use an image directly as input; for a colored picture, its basic characteristics comprise two parts, depth and pixels. The main idea of machine learning is to transform real-world information into vectors that computers can process. In image style conversion, the image data can be converted into the form of a pixel matrix for input to the CNN: for black-and-white pictures, each point holds only one pixel value, while for a color picture, each point holds three pixel values representing RGB.
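As a brief illustration of this pixel-matrix representation (a sketch only; the file name is hypothetical, not from the paper):

```python
import numpy as np
from PIL import Image

color = np.asarray(Image.open("photo.jpg"))              # hypothetical file name
print(color.shape)   # (height, width, 3): three pixel values (RGB) per point
gray = np.asarray(Image.open("photo.jpg").convert("L"))  # black-and-white version
print(gray.shape)    # (height, width): a single pixel value per point
```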
A digital image is stored as a matrix, and the cross-correlation operation means that, for each pixel in the image, the gray values of the surrounding pixels are weighted to adjust the gray value of that point [5]. First, a convolutional kernel, also known as a convolution template or convolution window, must be defined: an N x N matrix whose size determines the scope of the operation. N should be odd so that the kernel has a central point, and the numbers in the kernel are the weights applied to that point and the points around it [6]. An example is described in Fig. 1.
Figure 1. Convolutional calculation process with depth 1
For the example in the figure above, the convolution is
calculated as follows:
0×0+1×1+3×2+4×3=19,
1×0+2×1+4×2+5×3=25,
3×0+4×1+6×2+7×3=37,
4×0+5×1+7×2+8×3=43.
In addition, for a color (non-black-and-white) picture, the depth of the picture must also be considered as a basic characteristic. If the depth of a picture is 2, each pixel is composed of 2 values; if the basic convolution above is regarded as a two-dimensional cross-correlation operation, then convolution with depth is a three-dimensional cross-correlation operation [7]. An operation with depth 2, for example, proceeds as described in Fig. 2.
Figure 2. Convolutional calculation process with depth 2
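The depth-2 case of Fig. 2 is just the per-channel cross-correlation summed over the channels. A sketch building on the `corr2d` function above (the channel values here are hypothetical, not taken from the figure):

```python
import numpy as np

def corr2d_depth(x, k):
    """Depth-d cross-correlation: correlate each channel, then sum the d results."""
    return sum(corr2d(xc, kc) for xc, kc in zip(x, k))

x = np.stack([np.arange(9).reshape(3, 3),        # channel 1 (hypothetical values)
              np.arange(1, 10).reshape(3, 3)])   # channel 2 (hypothetical values)
k = np.stack([np.array([[0, 1], [2, 3]]),        # kernel for channel 1
              np.array([[1, 0], [3, 2]])])       # kernel for channel 2
print(corr2d_depth(x, k))                        # one 2x2 output, channels summed
```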
When the step size (stride) equals the side length N of the convolutional kernel, the operation reduces the image by a factor of N, so strided convolution is also a way of reducing dimensionality. The size of the picture after convolution is as follows [8]: if the stride is S, the original picture size is [N1, N1], and the convolutional kernel size is [N2, N2], then the size after convolution is [(N1 - N2)/S + 1, (N1 - N2)/S + 1]. For the example of Fig. 1, N1 = 3, N2 = 2, and S = 1, giving (3 - 2)/1 + 1 = 2, i.e., the 2 x 2 output above.
After the image data has been transformed, we find that in the convolutional layer one convolutional kernel can extract only one feature, which is clearly not enough; by applying multiple convolutional kernels at the same time, multiple sets of features can be obtained and then combined, judged, or classified. With the support of a large amount of data and a certain degree of machine learning, a freer image style conversion becomes possible. Convolution in image conversion emphasizes certain features; for a picture, pixels and depth are the most significant features [9], and the significance of convolution is to extract different characteristics, after calculation and strengthening, as "image features" the computer can understand. When the number of convolutional kernels reaches a certain level, a convolutional layer is formed, so the convolutional layer is also called the feature layer [10]. The characteristic of the convolutional layer is that it obtains many convolutional kernels (features) through convolutional operations; because the operation is repetitive, the output often contains much redundant data that cannot be used, so it still needs to be "pooled" [11].
The pooling layer, also known as the feature mapping layer, processes the large amount of feature data from the convolutional layer: it filters and integrates duplicate or similar data, reduces the overall data volume, and improves the usability of the data. From the perspective of hierarchy, the pooling layer performs downsampling. If the meaning of the convolutional layer lies in converting and acquiring the characteristics of the image data [12], then the significance of the pooling layer is to screen and refine the feature information extracted by the convolutional layer and select the most representative features, which lowers the repetitiveness of the pixels, makes subsequent convolutions more meaningful, and simplifies calculation [13].
Pooling can be done in different ways, such as maximum pooling and average pooling, and the choice depends on the situation. If the adjacent pixels of an image are very similar and numerous, maximum pooling often achieves better results [14]; if the picture composition is more complex and more features need to be extracted, average pooling can be considered. Fig. 3 below shows the basic idea and calculation process of maximum pooling:
Figure 3. The process of maximum pooling
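A minimal sketch of this pooling step (the 4 x 4 input values are hypothetical; the window size and non-overlapping stride are our assumptions):

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling: slide a size x size window with stride = size."""
    out = np.zeros((x.shape[0] // size, x.shape[1] // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = x[i*size:(i+1)*size, j*size:(j+1)*size]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

x = np.arange(16).reshape(4, 4)   # hypothetical 4x4 feature map
print(pool2d(x))                  # max pooling: [[ 5.  7.] [13. 15.]]
```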
The convolutional layers and pooling layers together constitute the feature extraction stage of the convolutional neural network [15]. A complete convolutional neural network includes many convolutional layers and pooling layers; supported by a large amount of data, the convolution and pooling process is repeated by the software or machine itself, making the data more and more useful, until image style conversion can be achieved. The overall process is shown in the following Fig. 4:
Figure 4. Feature extraction stage process
2.2 Classification identification stage
The classification recognition stage consists of one or more fully connected layers. The basic process by which a convolutional neural network handles an image can be seen as continuous convolution and pooling calculations on the image data [16]; by changing the size and number of the convolutional kernels, more features are extracted. Finally, the features that have gone through multiple rounds of convolution and pooling are fed into one or more fully connected layers and classified with the softmax function, thereby identifying different categories of objects. The overall flow is shown in the following Fig. 5:
Figure 5. Convolutional neural network process diagram
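As a small illustrative sketch of the softmax classification step at the end of this flow (the logits below are made-up numbers):

```python
import numpy as np

def softmax(z):
    """Turn the fully connected layer's raw scores into class probabilities."""
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # approx. [0.659 0.242 0.099]
```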
Therefore, the focus of a convolutional neural network is the fully connected network supported by a large amount of data, while the convolutional and pooling layers can be regarded, from the result, as tools for machine learning [17]. Image style conversion based on a convolutional neural network first takes the features out of the image and enters them into the network, then transforms them step by step and downsamples them, and finally repeats this process again and again. The longer the "learning" time, the more data can be used, the more accurate the classification, and the more diverse, accurate, and rapid the final transformation of the image style will be.
3. Proposed Method
3.1 Project introduction
Deep Dream is a convolutional-neural-network-based technique released by Google in 2015 that can make artistic modifications to an image [18]. It gives the modified picture a fantastic artistic effect, strange and abstract like the scenes people see in dreams, hence the name Deep Dream; with this technology, photos with an artistic style can be obtained. The images generated by this technique are not only impressive, they also help us understand what convolutional neural networks are learning [19].
3.2 Fundamentals
In past practice, we used convolutional neural networks for image recognition: input a large amount of sample data into the network, test the features extracted by the neurons, compute the gradients of the neural network, update the convolutional neural network weights by back-propagation, and iterate repeatedly until the network converges to the expected accuracy; the resulting neural network can then be used for classification, as in Fig. 6.
Figure 6. Training process
In contrast to the convolutional neural network's practice of testing the features extracted by neurons by entering pictures, the Deep Dream model randomly selects some neurons and observes what their simulated pictures might look like, feeding this information back through the network in reverse. In this way, the features or enhanced patterns that each neuron most wants to represent can be obtained; if we keep iterating on the output and constantly activating the features it wants to represent, the final output will come closer and closer to the target image. The essence of the Deep Dream model is to visualize the characteristics of each layer in the neural network through gradient ascent; the difference from an ordinary convolutional neural network [20] is that the reverse feedback updates not the network weights but the pixel values, as in Fig. 7.
Figure 7. Feedback network
Suppose the image input to the network is x and the N-dimensional vector [P1, P2, ..., PN] represents N classifications, where PA is the probability that the image belongs to class A; the higher the value of PA, the higher the probability that the image is of class A. Taking PA as our optimization goal and constantly adjusting the parameters so that PA is maximized makes the class-A characteristics of the image more prominent, finally achieving the effect of generating a Deep Dream picture. The result of running the Deep Dream model is a picture obtained by maximizing the probability of a certain category (or the image of various categories in the CNN); of course, we can also achieve the final goal by maximizing the activation of a certain channel of a convolutional layer, that is, by visualizing the features of the convolutional layer.
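As an illustrative sketch of this gradient-ascent update (assuming a TensorFlow 2 style `dream_model` that maps an image batch to the activations of one chosen layer; the step size `lr` and the gradient normalization are our own assumptions, not taken from the paper):

```python
import tensorflow as tf

def dream_step(dream_model, img, lr=0.01):
    """One gradient-ascent step: change the pixels, not the weights,
    so that the chosen layer's activations grow."""
    img = tf.cast(img, tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(img)
        loss = tf.reduce_mean(dream_model(img[tf.newaxis]))  # mean activation = goal
    grad = tape.gradient(loss, img)
    grad /= tf.math.reduce_std(grad) + 1e-8   # normalize so the step size is stable
    return img + lr * grad                    # ascend the gradient, do not descend
```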
3.3 Implementation
The network parameters of the convolutional neural network model involved in the Deep Dream model are fixed, since the model is already trained, so a convolutional neural network image classification model must be imported first. The starting point for the development of convolutional neural networks was the neocognitron model, which already had a convolutional structure; the first convolutional neural network was LeNet, but as the technology was not yet mature at the time, it was displaced by other hand-designed feature classifiers. With the advent of Dropout, ReLU, and GPUs plus big data, convolutional neural networks ushered in an epic breakthrough, as in Fig. 8.
Figure 8. Evolution of convolutional neural networks
The evolution of convolutional neural networks can be clearly seen from the graph. After AlexNet, people increased the functionality of the convolutional layers, moved from classification tasks to detection tasks, and added new functional modules, which made convolutional neural networks evolve into deep learning networks such as VGG, Inception, and ResNet. Here we use Inception as the imported model.
In TensorFlow, there are two ways to store and load a model: one generates checkpoint files, the other generates a graph protocol file. Here the model is imported as a graph file. After the model is stored as a graph file, we first define a placeholder for the input picture and then preprocess the picture by subtracting the mean and adding a dimension. The mean is subtracted because mean-subtraction preprocessing was applied when Inception was trained, so the same value must be subtracted here to maintain consistency. The dimension is added because images usually enter the network not one at a time but as a batch, so a batch dimension is needed so that multiple pictures can be fed into the network at the same time. Finally, the model is imported and the preprocessed image is fed into the network.
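A minimal sketch of this import step, in the TF1-style graph API the paragraph describes (the graph file name and the mean value 117.0 are assumptions based on the published Inception graph, not values given in the paper):

```python
import numpy as np
import tensorflow.compat.v1 as tf   # the graph-file workflow uses the TF1-style API
tf.disable_eager_execution()

graph = tf.Graph()
with graph.as_default():
    with tf.gfile.GFile("tensorflow_inception_graph.pb", "rb") as f:  # hypothetical name
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    t_input = tf.placeholder(np.float32, name="input")   # placeholder for the picture
    imagenet_mean = 117.0                                 # assumed training-time mean
    t_preprocessed = tf.expand_dims(t_input - imagenet_mean, 0)  # subtract mean, add batch dim
    tf.import_graph_def(graph_def, {"input": t_preprocessed})    # feed the image in
```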
After completing the basic construction of the graph, you can output the number of convolutional layers, the names of all convolutional layers, and the parameters of a specified convolutional layer. Note that when outputting parameters, the first three dimensions are the batch size, height, and width of the image; because no image has been entered at this point, the size and number of input images are still unknown. The last dimension is CHANNEL, that is, the number of channels of the convolutional layer. Since the ImageNet image classification model imported by the Deep Dream model is already trained, its network parameters are fixed values that never change.
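Continuing the sketch above, the number and names of the convolutional layers, and the shape of a given layer (batch, height, and width still unknown; channels fixed), could be printed like this:

```python
layers = [op.name for op in graph.get_operations() if op.type == "Conv2D"]
print(len(layers))     # number of convolutional layers
print(layers[:3])      # names of the first few convolutional layers
shape = graph.get_tensor_by_name(layers[0] + ":0").get_shape()
print(shape)           # (?, ?, ?, channels): batch/height/width unknown until input
```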
4. Experiments and Analysis
4.1 Features of the technology
The main idea of Deep Dream is to select a channel of a convolutional layer, an entire convolutional layer, or even multiple network layers, and maximize the activation value of that channel or layer by changing the image pixels; this is the biggest difference from training a classifier.
4.2 Training the model
In this experiment, the parameters of Deep Dream are optimized using the InceptionV3 model trained on the ImageNet dataset. The architecture of the InceptionV3 model is quite large, with 11 layers from 'mixed0' to 'mixed10'. Using different layers produces different images: the deeper layers respond to higher-level features such as eyes and faces, while the earlier layers respond to simpler features such as edges, shapes, and textures. The layers can be chosen freely, but deeper layers (layers with higher indexes) take longer to train because their gradients must be computed through more of the network, as in Fig. 9.
Figure 9. InceptionV3 model
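A sketch of selecting layers from a pretrained InceptionV3 with the Keras API (the choice of 'mixed3' and 'mixed5' is only an example; any of 'mixed0' to 'mixed10' can be tried):

```python
import tensorflow as tf

base = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet")
names = ["mixed3", "mixed5"]   # example layers; higher indexes = higher-level features
layers = [base.get_layer(name).output for name in names]
dream_model = tf.keras.Model(inputs=base.input, outputs=layers)
```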
4.3 Single-layer network compared with
multi-layer network
A single loss layer with a single channel is used to extract the characteristics of the specified channel for the style conversion of the background image, generating a Deep Dream image. Maximizing the average value of a certain channel yields a meaningful image: compared with the original background picture, the texture of the picture becomes visible to the naked eye, and the color of the image also changes significantly, which is the wonder of Deep Dream technology. However, the effect is only a slight change relative to the original image. The experimental demonstration is described in Fig. 10 – Fig. 15:
Figure 10. Background image A
Figure 11. Deep dream single-layer channel feature
conversion A
Figure 12. Background image B
Figure 13. Deep dream single-layer channel feature
conversion B
Figure 14. Background image C
Figure 15. Deep dream single-layer channel feature
conversion C
Using multiple loss layers and all channels, multiple features are extracted and applied comprehensively to the background layer, generating a Deep Dream image. The multi-layer network loss is the sum of the outputs of the selected layers' activation functions; because the loss is normalized at each layer, the influence of each layer on the result does not differ greatly, as sketched below.
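A sketch of this summed, per-layer-normalized loss (assuming the `dream_model` of Sec. 4.2, which outputs one activation tensor per selected layer; taking the mean is what normalizes each layer's contribution here):

```python
import tensorflow as tf

def total_loss(img, dream_model):
    """Sum over layers of the mean activation, so no layer dominates the result."""
    activations = dream_model(img[tf.newaxis])
    if not isinstance(activations, list):
        activations = [activations]
    return tf.reduce_sum([tf.reduce_mean(a) for a in activations])
```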
The purpose of this experiment is mainly to compare the effects of single-channel and multi-channel feature extraction on the same background picture, in order to find a more suitable optimization method for Deep Dream technology. According to the results, a Deep Dream image generated with all channel features is more abstract, and the color of the image changes significantly compared with single-channel feature extraction, with noticeable changes in brightness, edges, and some details of the picture. The experimental demonstration is described in Fig. 16 – Fig. 18:
Figure 16. Deep dream multilayer full-channel
feature conversion A
Figure 17. Deep dream multilayer full-channel
feature conversion B
Figure 18. Deep dream multilayer full-channel
feature conversion C
4.4 Problems and Solutions
From the results of Deep Dream processing of the original images, it is not difficult to find the following problems:
First, the output picture is relatively rough and noisy. Although the picture undergoes obvious changes under the single-channel and multi-channel feature processing above, overall, the texture and pixel values of the image are disorganized. Since this study aims to generate images with an abstract beauty, some aspects still need to be improved and optimized to produce images that the public will accept.
Second, the resolution of the image is relatively low. Both of the above methods share a significant drawback: the resolution of the image is low, so the image looks blurry and some details remain unclear after processing, making it difficult to attract the viewer's attention. It is therefore still necessary to find ways to improve Deep Dream so that the image is more easily recognized and liked.
Third, the texture of the output feature pattern is similar across the various parts. Texture features are represented by the grayscale distribution of pixels and their surrounding spatial neighborhoods, i.e., local texture information; the repetition of local texture information to varying degrees constitutes global texture information. While a texture feature reflects the nature of a global feature, it also describes the surface properties of the scene corresponding to the image or image area. Unlike color features, texture features are not pixel-based; they require statistical calculations over areas containing multiple pixels. When retrieving texture images with large differences in coarseness, density, and so on, texture features are an effective method. However, when the differences in coarseness and density between textures are small and no other easily distinguishable information is present, the usual texture features struggle to accurately reflect the differences that human visual perception registers between textures.
To address these shortcomings of Deep Dream when using existing trained models, this study scales the image to several different proportions. To shrink the image more effectively, the channel dimension of the image is removed and only the two pixel dimensions, width and height, are retained; then, through loop iteration, the image shape is changed in a fixed proportion under the preset scales, and each rescaled picture is optimized with the channel feature extraction of Deep Dream technology defined above. After the loop ends, the iteratively optimized image is adjusted back to the original scale. The results of the experiment are shown in Fig. 19 – Fig. 21.
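A sketch of this multi-scale optimization loop, under our own assumptions for the scale factor, octave count, and step count (`dream_step` is the gradient-ascent sketch of Sec. 3.2 and `dream_model` the model of Sec. 4.2; neither is specified in the paper):

```python
import tensorflow as tf

def run_deep_dream_octaves(dream_model, img, steps=50, octave_scale=1.3, n_octaves=3):
    """Optimize the picture at several fixed proportions, smallest first."""
    base_shape = tf.cast(tf.shape(img)[:2], tf.float32)  # keep only height and width
    for octave in range(-n_octaves + 1, 1):              # e.g. scales 1.3**-2, 1.3**-1, 1.3**0
        new_shape = tf.cast(base_shape * octave_scale ** octave, tf.int32)
        img = tf.image.resize(img, new_shape)
        for _ in range(steps):
            img = dream_step(dream_model, img)           # channel-feature gradient ascent
    return tf.image.resize(img, tf.cast(base_shape, tf.int32))  # back to original scale
```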
Figure 19. Deep dream optimization feature A
Figure 20. Deep dream optimization feature B
Figure 21. Deep dream optimization feature C
5. Conclusion
This paper mainly studies the use of the Deep Dream method to generate artistic images, further optimizes the Deep Dream algorithm, and compares the pictures generated by the original Deep Dream method with the optimized results, which appear as abstract artistic renderings with softer textures, clearer images, and richer content. The experimental results show that style conversion using the optimized Deep Dream technology makes the dream-like effect of the converted background picture more significant, the resulting image softer, and the image texture clearer. However, there are still some shortcomings in this technology that need to be improved, such as the short training batches of the algorithm, the small number of training runs, and the relative redundancy of the algorithm, which are important topics worth studying in the future.
Acknowledgement
This work was supported in part by the Science Research
Project of Anhui University of Finance and Economics under
grant No. ACKYC20085.
References
[1] Gatys L.A., Ecker A.S., Bethge M. Texture Synthesis Using Convolutional Neural Networks[C]//Proc of Advances in Neural Information Processing Systems, 2015.
[2] Gatys L.A., Ecker A.S., Bethge M. Image style transfer using convolutional neural networks[C]//Proc of IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2016: 2414-2423.
[3] Johnson J., Alahi A., Fei-Fei L. Perceptual losses for real-time style transfer and super-resolution[C]//Proc of European Conference on Computer Vision. Cham: Springer, 2016: 694-711.
[4] Shang Ronghua, Meng Yang, Zhang Weitong, Shang
Fanhua, Jiao Licheng, Yang Shuyuan. Graph
Convolutional Neural Networks with Geometric and
Discrimination information[J]. Engineering Applications
of Artificial Intelligence,2021,104.
[5] Zou Yan, Zhang Linfei, Liu Chengqian, Wang Bowen, Hu
Yan, Chen Qian. Super-resolution reconstruction of
infrared images based on a convolutional neural network
with skip connections[J]. Optics and Lasers in
Engineering,2021,146.
[6] Ozkok Fatma Ozge, Celik Mete. Convolutional neural
network analysis of recurrence plots for high resolution
melting classification[J]. Computer Methods and
Programs in Biomedicine,2021,207.
[7] Rekha Rajagopal. Comparative Analysis of COVID-19
X-ray Images Classification Using Convolutional Neural
Network, Transfer Learning, and Machine Learning
Classifiers Using Deep Features[J]. Pattern Recognition
and Image Analysis,2021,31(2).
[8] Wei Bingzhe, Feng Xiangchu, Wang Kun, Gao Bian. The Multi-Focus-Image-Fusion Method Based on Convolutional Neural Network and Sparse Representation[J]. Entropy (Basel, Switzerland), 2021, 23(7).
[9] Chen Yonghao, Cheng Ming Bao. Sports Sequence Images
Based on Convolutional Neural Network[J]. Mathematical
Problems in Engineering,2021, 2021.
[10] Tokarev K E, Zotov V M, Khavronina V N, Rodionova O
V. Convolutional neural network of deep learning in
computer vision and image classification problems[J]. IOP
Conference Series: Earth and Environmental
Science,2021,786(1).
[11] Liu Xun, Chen XiaoLin, Xia GuoQing, Chen HuaZhen,
Ye HeZhong, Chen ZhanChi. A Multi-view Image Sets
Classification Based on Graph Convolutional Neural
Network[J]. Journal of Physics: Conference
Series,2021,1948(1).
[12] Jiaqi Wang, Yibo Fan. Predictive fire image recognition
based on convolutional neural networks[J]. Scientific
Journal of Intelligent Systems Research,2021,3(6).
[13] Xiao Haixia, Zhang Feng, Shen Zhongping, Wu Kun,
Zhang Jinglin. Classification of Weather Phenomenon
from Images by Using Deep Convolutional Neural
Network[J]. Earth and Space Science,2021,8(5).
[14] Wang Hao, Jiao Kaijie. Blind guidance system based on
image recognition and convolutional neural network[J].
IOP Conference Series: Earth and Environmental
Science,2021,769(4).
[15] Ji Zou, Chao Zhang, Zhongjing Ma. An Image
Classification Algorithm for Plantar Pressure Based on
Convolutional Neural Network[J]. TS,2021,38(2).
[16] Yiyue Luo, Yu Fan, Xianjun Chen. Research on
optimization of deep learning algorithm based on
convolutional neural network[J]. Journal of Physics:
Conference Series,2021,1848(1).
[17] Chen Yuanyi. Research on Convolutional Neural Network
Image Recognition Algorithm Based on Computer Big
Data[J]. Journal of Physics: Conference
Series,2021,1744(2).
[18] Kazuya URAZOE, Nobutaka KUROKI, Yu KATO,
Shinya OHTANI, Tetsuya HIROSE, Masahiro NUMA.
Multi-Category Image Super-Resolution with
Convolutional Neural Network and Multi-Task Learning:
Regular Section[J]. IEICE Transactions on Information
and Systems,2021, E104.D(1).
[19] Zhang Qinghua. CNNA: A study of Convolutional Neural
Networks with Attention[J]. Procedia Computer
Science,2021,188.
[20] Ademola E. Ilesanmi, Taiwo O. Ilesanmi. Methods for
image denoising using convolutional neural network: a
review[J]. Complex & Intelligent Systems,2021,7(5).
Contribution of Individual Authors to the Creation of a Scientific Article (Ghostwriting Policy)
Lingling Wang and Xingguang Dong designed the experiments, implemented the deep learning models, performed the experiments, analyzed the experiment results and wrote the paper.

Sources of Funding for Research Presented in a Scientific Article or Scientific Article Itself
This work was supported in part by the Science Research Project of Anhui University of Finance and Economics under grant No. ACKYC20085.

Conflict of Interest
The authors declare that they have no competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Creative Commons Attribution License 4.0 (Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the Creative Commons Attribution License 4.0
https://creativecommons.org/licenses/by/4.0/deed.en_US