TUTCRIS - Tampere University of Technology

Keyframe-based video summarization with human in the loop

Research output: peer-reviewed

Standard

Keyframe-based video summarization with human in the loop. / Ainasoja, Antti E.; Hietanen, Antti; Lankinen, Jukka; Kämäräinen, Joni-Kristian.

VISIGRAPP 2018 - Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. Vol. 4 SCITEPRESS, 2018. p. 287-296.

Harvard

Ainasoja, AE, Hietanen, A, Lankinen, J & Kämäräinen, J-K 2018, Keyframe-based video summarization with human in the loop. in VISIGRAPP 2018 - Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. vol. 4, SCITEPRESS, pp. 287-296, International Conference on Computer Vision Theory and Applications. https://doi.org/10.5220/0006619202870296

APA

Ainasoja, A. E., Hietanen, A., Lankinen, J., & Kämäräinen, J-K. (2018). Keyframe-based video summarization with human in the loop. In VISIGRAPP 2018 - Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (Vol. 4, pp. 287-296). SCITEPRESS. https://doi.org/10.5220/0006619202870296

Vancouver

Ainasoja AE, Hietanen A, Lankinen J, Kämäräinen J-K. Keyframe-based video summarization with human in the loop. In: VISIGRAPP 2018 - Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. Vol. 4. SCITEPRESS. 2018. p. 287-296 https://doi.org/10.5220/0006619202870296

Author

Ainasoja, Antti E. ; Hietanen, Antti ; Lankinen, Jukka ; Kämäräinen, Joni-Kristian. / Keyframe-based video summarization with human in the loop. VISIGRAPP 2018 - Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. Vol. 4 SCITEPRESS, 2018. pp. 287-296

BibTeX

@inproceedings{40911791a0194f0aadfc43dbaaf96924,
title = "Keyframe-based video summarization with human in the loop",
abstract = "In this work, we focus on the popular keyframe-based approach for video summarization. Keyframes represent important and diverse content of an input video, and a summary is generated by temporally expanding the keyframes to key shots, which are merged into a continuous dynamic video summary. In our approach, keyframes are selected from scenes that represent semantically similar content. For scene detection, we propose a simple yet effective dynamic extension of a video Bag-of-Words (BoW) method which provides over-segmentation (high recall) for keyframe selection. For keyframe selection, we investigate two effective approaches: local region descriptors (visual content) and optical flow descriptors (motion content). We provide several interesting findings. 1) While scenes (visually similar content) can be effectively detected by region descriptors, optical flow (motion changes) provides better keyframes. 2) However, the suitable parameters of the motion-descriptor-based keyframe selection vary from one video to another, and average performance remains low. To avoid more complex processing, we introduce a human-in-the-loop step where the user selects keyframes produced by the three best methods. 3) Our human-assisted and learning-free method achieves superior accuracy to learning-based methods and for many videos is on par with average human accuracy.",
keywords = "Optical flow descriptors, Region descriptors, Video summarization, Visual bag-of-words",
author = "Ainasoja, {Antti E.} and Antti Hietanen and Jukka Lankinen and Joni-Kristian K{\"a}m{\"a}r{\"a}inen",
note = "INT=sgn,{"}Lankinen, Jukka{"}",
year = "2018",
doi = "10.5220/0006619202870296",
language = "English",
volume = "4",
pages = "287--296",
booktitle = "VISIGRAPP 2018 - Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications",
publisher = "SCITEPRESS",

}

RIS (suitable for import to EndNote)

TY - GEN

T1 - Keyframe-based video summarization with human in the loop

AU - Ainasoja, Antti E.

AU - Hietanen, Antti

AU - Lankinen, Jukka

AU - Kämäräinen, Joni-Kristian

N1 - INT=sgn,"Lankinen, Jukka"

PY - 2018

Y1 - 2018

N2 - In this work, we focus on the popular keyframe-based approach for video summarization. Keyframes represent important and diverse content of an input video, and a summary is generated by temporally expanding the keyframes to key shots, which are merged into a continuous dynamic video summary. In our approach, keyframes are selected from scenes that represent semantically similar content. For scene detection, we propose a simple yet effective dynamic extension of a video Bag-of-Words (BoW) method which provides over-segmentation (high recall) for keyframe selection. For keyframe selection, we investigate two effective approaches: local region descriptors (visual content) and optical flow descriptors (motion content). We provide several interesting findings. 1) While scenes (visually similar content) can be effectively detected by region descriptors, optical flow (motion changes) provides better keyframes. 2) However, the suitable parameters of the motion-descriptor-based keyframe selection vary from one video to another, and average performance remains low. To avoid more complex processing, we introduce a human-in-the-loop step where the user selects keyframes produced by the three best methods. 3) Our human-assisted and learning-free method achieves superior accuracy to learning-based methods and for many videos is on par with average human accuracy.

AB - In this work, we focus on the popular keyframe-based approach for video summarization. Keyframes represent important and diverse content of an input video, and a summary is generated by temporally expanding the keyframes to key shots, which are merged into a continuous dynamic video summary. In our approach, keyframes are selected from scenes that represent semantically similar content. For scene detection, we propose a simple yet effective dynamic extension of a video Bag-of-Words (BoW) method which provides over-segmentation (high recall) for keyframe selection. For keyframe selection, we investigate two effective approaches: local region descriptors (visual content) and optical flow descriptors (motion content). We provide several interesting findings. 1) While scenes (visually similar content) can be effectively detected by region descriptors, optical flow (motion changes) provides better keyframes. 2) However, the suitable parameters of the motion-descriptor-based keyframe selection vary from one video to another, and average performance remains low. To avoid more complex processing, we introduce a human-in-the-loop step where the user selects keyframes produced by the three best methods. 3) Our human-assisted and learning-free method achieves superior accuracy to learning-based methods and for many videos is on par with average human accuracy.

KW - Optical flow descriptors

KW - Region descriptors

KW - Video summarization

KW - Visual bag-of-words

U2 - 10.5220/0006619202870296

DO - 10.5220/0006619202870296

M3 - Conference contribution

VL - 4

SP - 287

EP - 296

BT - VISIGRAPP 2018 - Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications

PB - SCITEPRESS

ER -
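
The abstract describes the final step of keyframe-based summarization: each selected keyframe is temporally expanded into a key shot, and the key shots are merged into one continuous summary. The sketch below illustrates only that step, under loudly stated assumptions. The fixed symmetric window (`half_width`) and the rule that overlapping or touching shots are fused are hypothetical choices made for illustration; the abstract does not specify the paper's expansion rule. The function names `expand_keyframes`, `merge_shots` and `summarize` are likewise hypothetical. Scene detection (dynamic BoW) and keyframe selection (region or optical flow descriptors) are not modeled here.

```python
from typing import List, Tuple

# A key shot as a half-open frame interval: (start_frame, end_frame), end exclusive.
Interval = Tuple[int, int]


def expand_keyframes(keyframes: List[int], half_width: int, n_frames: int) -> List[Interval]:
    """Expand each keyframe index into a key shot of up to 2*half_width+1 frames.

    ASSUMPTION: a fixed symmetric window around each keyframe, clipped to the
    video bounds [0, n_frames). The original paper's rule may differ.
    """
    shots: List[Interval] = []
    for k in keyframes:
        start = max(0, k - half_width)
        end = min(n_frames, k + half_width + 1)
        shots.append((start, end))
    return shots


def merge_shots(shots: List[Interval]) -> List[Interval]:
    """Fuse overlapping or touching key shots so no frame appears twice.

    ASSUMPTION: merging is done by interval union; the resulting segments are
    then played back in temporal order as the dynamic summary.
    """
    merged: List[Interval] = []
    for start, end in sorted(shots):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous segment: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def summarize(keyframes: List[int], half_width: int, n_frames: int) -> List[Interval]:
    """Hypothetical end-to-end step: keyframes -> key shots -> summary segments."""
    return merge_shots(expand_keyframes(keyframes, half_width, n_frames))


if __name__ == "__main__":
    # Keyframes at frames 10, 14 and 100 of a 105-frame video, 5-frame half-window.
    segments = summarize([10, 14, 100], half_width=5, n_frames=105)
    print(segments)  # [(5, 20), (95, 105)]
    print(sum(end - start for start, end in segments))  # 25 summary frames
```

In this example, the key shots around frames 10 and 14 overlap and are fused into one segment, while the shot around frame 100 is clipped at the end of the video. Using interval union keeps the summary free of repeated frames. That is one plausible reading of "merged into a continuous dynamic video summary", not necessarily the authors' procedure.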