Tampere University of Technology

TUTCRIS Research Portal

Image and Video Captioning with Augmented Neural Architectures

Research output: Contribution to journal › Article › Scientific › peer-review


Original language: English
Pages (from-to): 34-46
Number of pages: 13
Journal: IEEE Multimedia
Issue number: 2
Publication status: Published - 1 Apr 2018
Publication type: A1 Journal article-refereed


Neural-network-based image and video captioning can be substantially improved by utilizing architectures that make use of special features from the scene context, objects, and locations. A novel discriminatively trained evaluator network for choosing the best caption among those generated by an ensemble of caption generator networks further improves accuracy.
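The selection step mentioned in the abstract can be illustrated with a minimal sketch: several generators each propose a caption, an evaluator assigns each candidate a score, and the highest-scoring caption is kept. Everything named here is an assumption made for illustration. `toy_evaluator_score` is a hypothetical word-overlap stand-in, not the discriminatively trained evaluator network from the article, and the candidate captions and image labels are made-up example data.

```python
from typing import Callable, Sequence, Set


def toy_evaluator_score(caption: str, image_labels: Set[str]) -> int:
    """Hypothetical stand-in for a trained evaluator network.

    Counts the distinct caption words that match labels detected in the image.
    A real evaluator would be a learned model; this is only a placeholder.
    """
    words = set(caption.lower().split())
    return len(words & image_labels)


def select_best_caption(
    candidates: Sequence[str], score: Callable[[str], float]
) -> str:
    """Return the candidate caption with the highest evaluator score."""
    if not candidates:
        raise ValueError("need at least one candidate caption")
    return max(candidates, key=score)


# Made-up example: labels an object detector might have produced for one image.
image_labels = {"dog", "beach"}

# Made-up example: one caption from each generator in the ensemble.
candidates = [
    "a man sitting on a bench",
    "a dog runs on the beach",
    "a cat",
]

best = select_best_caption(
    candidates, lambda caption: toy_evaluator_score(caption, image_labels)
)
print(best)
```

Passing the scoring function as an argument keeps the selection logic independent of how the evaluator is implemented, so the placeholder could be swapped for a learned model without changing `select_best_caption`.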


  • Feature extraction, Neural networks, Computational modeling, Multimedia communication, Object recognition, Detectors, image captioning, multimodal learning, recurrent networks, deep learning, pervasive computing, ubiquitous computing, video captioning, neural networks
