The task has concluded and the data has been released. Please see the MediaEval Datasets page.
The 2013 Affect Task: Violent Scenes Detection
This task is a follow-up of last year's edition. It requires participants to automatically detect portions of movies depicting violence. To solve the task, participants are strongly encouraged to deploy multimodal approaches that make use of audio, visual and text modalities.
This challenge derives from a use case at Technicolor. The use case involves helping users choose movies that are suitable for children of different ages. The movies should be suitable in terms of their violent content, e.g., for viewing by users' families. Users select or reject movies by previewing parts of the movies (i.e., scenes or segments) that are the most violent.
For additional information on this task, please see the Violent Scenes Detection page at Technicolor: https://research.technicolor.com/rennes/vsd/. This year, two different subtasks are proposed, corresponding to two different definitions (objective and subjective) of the targeted violent segments. Participants are required to submit to both subtasks. However, on request, we may also accept single-subtask submissions.
Subtask 1: objective definition. This subtask retains the definition used in 2012: violence is defined as "physical violence or accident resulting in human injury or pain".
Subtask 2: subjective definition. For this subtask, the targeted violent segments are those "one would not let an 8-year-old child see in a movie because they contain physical violence".
For both subtasks, and for the main run, any features automatically extracted from the video, including the subtitles, can be used by participants. Optional runs will also include the possibility for the participants to use additional external data (e.g., Internet resources).
This task targets researchers in the areas of event detection, multimedia affect, and multimedia content analysis.
The dataset is a set of approximately 25 Hollywood movies that must be purchased by the participants. The movies span different genres, ranging from extremely violent movies to movies without violence.
Ground truth and evaluation
Violence ground truth is created by human assessors and is provided by the task organizers. In addition to segments containing physical violence (with the two above definitions), annotations include the following high-level concepts: presence of blood, fights, presence of fire, presence of guns, presence of cold arms, car chases and gory scenes, for the visual modality; gunshots, explosions and screams for the audio modality. Note that participants are welcome to carry out detection of the high-level concepts. However, concept detection is not a requirement for the task since these high-level concept annotations are provided for training purposes.
The official evaluation metric will be the Mean Average Precision (MAP) over the top N best-ranked violent segments. In addition to this metric, several performance measures will be used for diagnostic purposes, for instance: false alarm rate, missed detection rate, AED precision and recall, F-measures, average precision, and the MediaEval cost (a weighted combination of the estimated probabilities of false alarms and missed detections, introduced at MediaEval 2011). Whenever possible, detection error trade-off curves will also be used, to avoid comparing systems only at given operating points. As an extra, participants will also be encouraged to present a summary of their most violent extracted segments during the MediaEval workshop. This will not be evaluated by the organizers this year, but it will serve as a first basis for future evolution of the task.
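For illustration, here is a minimal sketch of how MAP over the top-N ranked segments can be computed, assuming binary per-segment relevance judgments (violent / non-violent) and averaging over movies; the organizers' exact normalization and truncation rules may differ:

```python
def average_precision(ranked_relevance, n=None):
    """Average precision over a ranked list of segments.

    ranked_relevance: list of 0/1 flags, ordered by decreasing
    system confidence (1 = segment is truly violent).
    n: optional cutoff (top-N segments); None means the full list.
    """
    if n is not None:
        ranked_relevance = ranked_relevance[:n]
    hits = 0
    score = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            score += hits / rank  # precision at this recall point
    # Normalizing by retrieved hits; variants normalize by the
    # total number of relevant segments instead.
    return score / hits if hits else 0.0


def mean_average_precision(per_movie_rankings, n=None):
    """MAP: mean of the per-movie average precisions."""
    aps = [average_precision(r, n) for r in per_movie_rankings]
    return sum(aps) / len(aps)
```

For example, a ranking `[1, 0, 1]` yields an AP of (1/1 + 2/3) / 2 ≈ 0.833; averaging such scores across all test movies gives the MAP.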
2012 working notes together with the task description for last year can be found at: http://ceur-ws.org/Vol-927/.
C. H. Demarty, C. Penet, G. Gravier, M. Soleymani, The MediaEval 2012 Affect Task: Violent Scenes Detection, in MediaEval 2012 Workshop, ceur-ws.org, vol. 927, Pisa, October 2012.
C. H. Demarty, C. Penet, G. Gravier, M. Soleymani, A benchmarking campaign for detecting violent scenes in movies, ECCV2012 workshop on Information Fusion in Computer Vision for Concept Recognition, Firenze, October 2012.
B. Ionescu, J. Schlüter, I. Mironică, M. Schedl, A Naive Mid-level Concept-based Fusion Approach to Violence Detection in Hollywood Movies, ACM International Conference on Multimedia Retrieval (ICMR 2013), Dallas, Texas, USA, April 16-19, 2013.
F. de Souza, G. Chávez, E. do Valle & A. de A. Araújo, Violence Detection in Video Using Spatio-Temporal Features, 23rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), 2010, 224-230.
E. B. Nievas, O. D. Suarez, G. B. García & R. Sukthankar, Violence detection in video using computer vision techniques, Proceedings of the 14th International Conference on Computer Analysis of Images and Patterns - Volume Part II, Springer-Verlag, 2011, 332-339.
L.-H. Chen, H.-W. Hsu, L.-Y. Wang & C.-W. Su, Violence Detection in Movies, Eighth International Conference on Computer Graphics, Imaging and Visualization (CGIV), 2011, 119-124.
J. Lin & W. Wang, Weakly-Supervised Violence Detection in Movies with Audio and Video Based Co-training. PCM'09, 2009, 930-935.
T. Giannakopoulos, A. Makris, D. Kosmopoulos, S. Perantonis and S. Theodoridis, "Audio-visual fusion for detecting violent scenes in videos", Artificial Intelligence: Theories, Models and Applications. Lecture Notes in Computer Science, 2010, Volume 6040/2010, 91-100.
Yu Gong, Weiqiang Wang, Shuqiang Jiang, Qingming Huang and Wen Gao, "Detecting Violent Scenes in Movies by Auditory and Visual Cues", Advances in Multimedia Information Processing - PCM 2008. Lecture Notes in Computer Science, 2008, Volume 5353/2008, 317-326.
Claire-Helene Demarty, Technicolor, France
Cédric Penet, Technicolor, France
Yu-Gang Jiang, Fudan University, Shanghai, China
Bogdan Ionescu, University Politehnica of Bucharest, Romania
Markus Schedl, Johannes Kepler University, Linz, Austria
Vu Lam Quang, Multimedia LAB, University of IT, Vietnam National University of Ho Chi Minh City, Vietnam
Mohammad Soleymani, Imperial College London, UK
Guillaume Gravier, IRISA, France
15 May: Development set release
15 June: Test set release
15 September: Run submission due
19 September: Results returned
28 September: Working notes paper deadline
This task is made possible by a collaboration of projects including:
- FWF P22856-N23
- NSF China (#61201387 and #61228205)
- National 973 Program of China (#2010CB327900)
- VNU-HCMC Key Project A – 2013