WSEAS Transactions on Systems and Control
Print ISSN: 1991-8763, E-ISSN: 2224-2856
Volume 15, 2020
A Survey on Different Deep Learning Architectures for Image Captioning
Authors:
Abstract: Vision plays an important part in how we look at the world and perceive information about our surroundings. A human perceives information by looking at an object or at the surroundings as a whole, mapping visual features and attributes, and summarizing those features to describe the scene. How the human brain accomplishes this is still largely a mystery. For a machine, this task is known as Image Captioning. The computer is fed images from which it learns to extract features, i.e. pixel information, object positions, geometry, etc. Using these features, the machine tries to map the image, word by word or as a whole, to a sentence that summarizes its content. Owing to recent advances in computer vision methods and deep learning architectures, computers have been able to accurately summarize the images fed to them. In this paper, we present a survey of the new types of architectures and the datasets used to train them. Furthermore, we discuss future methods that could be implemented.
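To make the encode-then-describe pipeline in the abstract concrete, below is a minimal sketch of an encoder-decoder captioning model in PyTorch: a small CNN extracts an image feature vector, and an LSTM decoder maps it to a caption word by word. All layer sizes, module names, and the toy data are illustrative assumptions, not an architecture from any specific surveyed paper.

import torch
import torch.nn as nn

class CNNEncoder(nn.Module):
    """Extracts a fixed-length feature vector from an input image."""
    def __init__(self, feature_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # global average pool -> (B, 64, 1, 1)
        )
        self.fc = nn.Linear(64, feature_dim)  # project to the decoder dimension

    def forward(self, images):                # images: (B, 3, H, W)
        x = self.conv(images).flatten(1)      # (B, 64)
        return self.fc(x)                     # (B, feature_dim)

class RNNDecoder(nn.Module):
    """Generates a caption word by word, conditioned on the image feature."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, features, captions):    # captions: (B, T) token ids
        # Prepend the image feature as the first step of the input sequence.
        embeddings = self.embed(captions)                          # (B, T, E)
        inputs = torch.cat([features.unsqueeze(1), embeddings], 1) # (B, T+1, E)
        hidden, _ = self.lstm(inputs)                              # (B, T+1, H)
        return self.out(hidden)                                    # logits over the vocabulary

# Toy forward pass with random data.
encoder, decoder = CNNEncoder(), RNNDecoder(vocab_size=1000)
images = torch.randn(4, 3, 224, 224)
captions = torch.randint(0, 1000, (4, 12))
logits = decoder(encoder(images), captions)
print(logits.shape)  # torch.Size([4, 13, 1000])

In practice, the encoder is usually a pretrained CNN (and, in more recent architectures covered by such surveys, attention over spatial features or a Transformer decoder replaces the plain LSTM), but the conditioning structure stays the same.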
Pages: 635-646
DOI: 10.37394/23203.2020.15.63