Tampere University of Technology

TUTCRIS Research Portal

Summarization of User-Generated Sports Video by Using Deep Action Recognition Features

Research output: Contribution to journal › Article › Scientific › peer-review

Standard

Summarization of User-Generated Sports Video by Using Deep Action Recognition Features. / Tejero-de-Pablos, Antonio; Nakashima, Yuta; Sato, Tomokazu; Yokoya, Naokazu; Linna, Marko; Rahtu, Esa.

In: IEEE Transactions on Multimedia, Vol. 20, No. 8, 08.2018, p. 2000-2011.

Harvard

Tejero-de-Pablos, A, Nakashima, Y, Sato, T, Yokoya, N, Linna, M & Rahtu, E 2018, 'Summarization of User-Generated Sports Video by Using Deep Action Recognition Features', IEEE Transactions on Multimedia, vol. 20, no. 8, pp. 2000-2011. https://doi.org/10.1109/TMM.2018.2794265

APA

Tejero-de-Pablos, A., Nakashima, Y., Sato, T., Yokoya, N., Linna, M., & Rahtu, E. (2018). Summarization of User-Generated Sports Video by Using Deep Action Recognition Features. IEEE Transactions on Multimedia, 20(8), 2000-2011. https://doi.org/10.1109/TMM.2018.2794265

Vancouver

Tejero-de-Pablos A, Nakashima Y, Sato T, Yokoya N, Linna M, Rahtu E. Summarization of User-Generated Sports Video by Using Deep Action Recognition Features. IEEE Transactions on Multimedia. 2018 Aug;20(8):2000-2011. https://doi.org/10.1109/TMM.2018.2794265

Author

Tejero-de-Pablos, Antonio ; Nakashima, Yuta ; Sato, Tomokazu ; Yokoya, Naokazu ; Linna, Marko ; Rahtu, Esa. / Summarization of User-Generated Sports Video by Using Deep Action Recognition Features. In: IEEE Transactions on Multimedia. 2018 ; Vol. 20, No. 8. pp. 2000-2011.

BibTeX - Download

@article{7ecea19309a549399b2e5decfb7bf148,
title = "Summarization of User-Generated Sports Video by Using Deep Action Recognition Features",
abstract = "Automatically generating a summary of sports video poses the challenge of detecting interesting moments, or highlights, of a game. Traditional sports video summarization methods leverage editing conventions of broadcast sports video that facilitate the extraction of high-level semantics. However, user-generated videos are not edited, and thus traditional methods are not suitable to generate a summary. In order to solve this problem, this work proposes a novel video summarization method that uses players' actions as a cue to determine the highlights of the original video. A deep neural network-based approach is used to extract two types of action-related features and to classify video segments into interesting or uninteresting parts. The proposed method can be applied to any sports in which games consist of a succession of actions. Especially, this work considers the case of Kendo (Japanese fencing) as an example of a sport to evaluate the proposed method. The method is trained using Kendo videos with ground truth labels that indicate the video highlights. The labels are provided by annotators possessing different experience with respect to Kendo to demonstrate how the proposed method adapts to different needs. The performance of the proposed method is compared with several combinations of different features, and the results show that it outperforms previous summarization methods.",
keywords = "3D convolutional neural networks, action recognition, Cameras, deep learning, Feature extraction, Games, Hidden Markov models, long short-term memory, Semantics, Sports video summarization, Three-dimensional displays, user-generated video",
author = "Antonio Tejero-de-Pablos and Yuta Nakashima and Tomokazu Sato and Naokazu Yokoya and Marko Linna and Esa Rahtu",
year = "2018",
month = aug,
doi = "10.1109/TMM.2018.2794265",
language = "English",
volume = "20",
pages = "2000--2011",
journal = "IEEE Transactions on Multimedia",
issn = "1520-9210",
publisher = "Institute of Electrical and Electronics Engineers",
number = "8",
}

RIS (suitable for import to EndNote) - Download

TY - JOUR

T1 - Summarization of User-Generated Sports Video by Using Deep Action Recognition Features

AU - Tejero-de-Pablos, Antonio

AU - Nakashima, Yuta

AU - Sato, Tomokazu

AU - Yokoya, Naokazu

AU - Linna, Marko

AU - Rahtu, Esa

PY - 2018/8

Y1 - 2018/8

N2 - Automatically generating a summary of sports video poses the challenge of detecting interesting moments, or highlights, of a game. Traditional sports video summarization methods leverage editing conventions of broadcast sports video that facilitate the extraction of high-level semantics. However, user-generated videos are not edited, and thus traditional methods are not suitable to generate a summary. In order to solve this problem, this work proposes a novel video summarization method that uses players' actions as a cue to determine the highlights of the original video. A deep neural network-based approach is used to extract two types of action-related features and to classify video segments into interesting or uninteresting parts. The proposed method can be applied to any sports in which games consist of a succession of actions. Especially, this work considers the case of Kendo (Japanese fencing) as an example of a sport to evaluate the proposed method. The method is trained using Kendo videos with ground truth labels that indicate the video highlights. The labels are provided by annotators possessing different experience with respect to Kendo to demonstrate how the proposed method adapts to different needs. The performance of the proposed method is compared with several combinations of different features, and the results show that it outperforms previous summarization methods.

AB - Automatically generating a summary of sports video poses the challenge of detecting interesting moments, or highlights, of a game. Traditional sports video summarization methods leverage editing conventions of broadcast sports video that facilitate the extraction of high-level semantics. However, user-generated videos are not edited, and thus traditional methods are not suitable to generate a summary. In order to solve this problem, this work proposes a novel video summarization method that uses players' actions as a cue to determine the highlights of the original video. A deep neural network-based approach is used to extract two types of action-related features and to classify video segments into interesting or uninteresting parts. The proposed method can be applied to any sports in which games consist of a succession of actions. Especially, this work considers the case of Kendo (Japanese fencing) as an example of a sport to evaluate the proposed method. The method is trained using Kendo videos with ground truth labels that indicate the video highlights. The labels are provided by annotators possessing different experience with respect to Kendo to demonstrate how the proposed method adapts to different needs. The performance of the proposed method is compared with several combinations of different features, and the results show that it outperforms previous summarization methods.

KW - 3D convolutional neural networks

KW - action recognition

KW - Cameras

KW - deep learning

KW - Feature extraction

KW - Games

KW - Hidden Markov models

KW - long short-term memory

KW - Semantics

KW - Sports video summarization

KW - Three-dimensional displays

KW - user-generated video

U2 - 10.1109/TMM.2018.2794265

DO - 10.1109/TMM.2018.2794265

M3 - Article

VL - 20

SP - 2000

EP - 2011

JO - IEEE Transactions on Multimedia

JF - IEEE Transactions on Multimedia

SN - 1520-9210

IS - 8

ER -