The 2015 Affective Impact of Movies Task (includes Violent Scenes Detection)
In this task, participants create systems that automatically detect video content that depicts violence, or predict the affective impact that video content will have on viewers. The use case scenario is a video search system that uses automatic tools to help users find videos that fit their particular mood, age or preferences. The use of automated tools has become increasingly crucial in today's world, where users face an ever-growing variety of video sources and content, e.g., user-generated footage, amateur videos, hobbyist content.

This task builds on previous years' editions of the Affect in Multimedia Task: Violent Scenes Detection. It is not necessary to have participated in previous years to be successful in the 2015 task.

Two subtasks will be offered to the participants.
  • Induced Affect Detection: the emotional impact of a video or movie can be a strong indicator in a search or recommendation scenario. For each video, participants are expected to predict its valence (i.e., negative-neutral-positive) and arousal (i.e., calm-neutral-excited) class.
  • Violence Detection: detecting violent content helps parents preview videos and select material that is suitable for their children. For a given video, participants are expected to classify it as violent or non-violent. The following definition of violence is to be used: “violent videos are those one would not let an 8-year-old child see because of their physical violence”.
Participants are strongly encouraged to submit to both subtasks; however, on request, we may also accept single-subtask submissions (please email the task organizers to explain why you wish to submit to only one subtask).

Target group
Researchers in the areas of multimedia information retrieval, machine vision, affective computing, multimedia content analysis and video recommender systems (but not limited to these).

The proposed dataset, used for both subtasks, consists of around 10,000 video clips extracted from about 100-200 movies, both professionally made and amateur movies. Movies are shared under Creative Commons licenses that allow redistribution and will be provided to participants. In addition to the raw data, participants will also be provided with pre-computed general purpose audio and visual content descriptors. The dataset will be partially based on the publicly available LIRIS-ACCEDE dataset.

In solving the task, participants are expected to use only the resources provided with the task. Use of external resources (e.g., Internet data) will, however, be allowed in specifically marked runs.

Ground truth and evaluation
Movies are annotated for their valence and arousal classification (annotations are obtained via crowdsourcing) and their violent contents (annotations are performed by human assessors).

For the violence subtask, the official evaluation metric will be average precision calculated over the returned video clips. Other standard evaluation metrics will also be provided to assess the systems' performance.
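To illustrate the metric, the following is a minimal sketch of average precision over a ranked list of clips. It is a generic illustration of the standard formula (mean of the precision values at each rank where a truly violent clip appears), not the official scoring script, and the function name and input format are assumptions.

```python
def average_precision(ranked_labels):
    """Average precision for a ranked list of clips.

    ranked_labels: list of 0/1 values in system ranking order,
    where 1 means the clip at that rank is truly violent.
    (Illustrative sketch, not the official evaluation tool.)
    """
    hits = 0            # number of relevant clips seen so far
    precision_sum = 0.0  # running sum of precision@rank at each hit
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0
```

For example, a ranking whose 1st and 3rd clips are violent scores (1/1 + 2/3) / 2 ≈ 0.833, rewarding systems that place violent clips near the top of the list.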

For the affect detection subtask, the global accuracy will likely be the official evaluation metric. This is the proportion of the returned video clips that have been assigned to the correct class.
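A sketch of global accuracy as described above, i.e., the fraction of clips assigned to the correct valence or arousal class (the function name and label encoding are assumptions; any consistent class labels work):

```python
def global_accuracy(predicted, actual):
    """Proportion of clips assigned to the correct class.

    predicted, actual: equal-length sequences of class labels,
    e.g., "negative"/"neutral"/"positive" for valence.
    (Illustrative sketch, not the official evaluation tool.)
    """
    if len(predicted) != len(actual):
        raise ValueError("prediction/ground-truth length mismatch")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

For instance, correctly classifying two out of three clips yields an accuracy of 2/3 ≈ 0.667.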

Recommended reading
[1] Sjöberg, M., Ionescu, B., Jiang, Y-G., Quang, V. L., Schedl, M., Demarty, C-H., The MediaEval 2014 Affect Task: Violent Scenes Detection. In Working Notes Proceedings of the MediaEval 2014 Workshop, Barcelona, Spain, October 16-17, 2014, ISSN 1613-0073.
[2] Baveye, Y., Dellandrea, E., Chamaret, C., Chen, L., LIRIS-ACCEDE: A Video Database for Affective Content Analysis. In IEEE Transactions on Affective Computing, 2015.
[3] Demarty, C-H., Ionescu, B., Jiang, Y.-G., Quang, V.L., Schedl, M., Penet, C., Benchmarking Violent Scenes Detection in Movies, IEEE International Workshop on Content-Based Multimedia Indexing - CBMI, 18-20 June, Klagenfurt, Austria, 2014.
[4] Acar, E., Hopfgartner, F., Albayrak, S., Violence Detection in Hollywood Movies by the Fusion of Visual and Mid-level Audio Cues. In Proceedings of ACM International Conference on Multimedia. ACM, Barcelona, Spain, 2013, 717-720.
[5] de Souza, F. D.M., Chavez, G.C., do Valle, E. A., Araújo, A. de A., Violence Detection in Video Using Spatio-Temporal Features. In Proceedings of 23rd SIBGRAPI Conference on Graphics, Patterns and Images. Gramado, Brazil, 2010.

Task organizers
Mats Sjöberg, University of Helsinki, Finland (contact person)
Bogdan Ionescu, University "Politehnica" of Bucharest, Romania
Emmanuel Dellandréa, Ecole Centrale de Lyon, France
Hanli Wang, Tongji University, China
Markus Schedl, Johannes Kepler University, Austria
Vu Lam Quang, HCMC University of Science, Vietnam
Yoann Baveye, Technicolor, France

Task auxiliaries
Claire-Hélène Demarty, Technicolor, France
Liming Chen, Ecole Centrale de Lyon, France

Task schedule
4 May: Development data release
1 June: Test data release
10 August: Run submission
14 August: Results returned to participants
23 August: Working notes paper deadline
14-15 September: MediaEval 2015 Workshop

Acknowledgments
  • UEFISCDI SCOUTER (under grant no. 28DPST/30-08-2013)
  • Austrian Science Fund (FWF): P25655
  • EU-FP7-ICT-2011-9, no. 601166 “PHENICX”
  • ANR through the VideoSense Project under the Grant 2009 CORD 026 02
  • The Visen project within the ERA-NET CHIST-ERA framework under the grant ANR-12-CHRI-0002-04