The Affect Task

Task
Boredom detection: This task involves identifying videos with high and low levels of dramatic tension. In particular, it involves distinguishing between video content that causes the viewer to feel bored and video content that makes the viewer feel entertained. Participants can make use of spoken, visual, and audio content, including visual features and speech transcripts, as well as accompanying metadata.

Target group
The target audience is researchers in the field of multimedia content analysis who are interested in understanding the affective dimension of their content. Estimating the audience's affect and emotion can lead to better summarization and tagging of multimedia. This analysis can be based on text, speech/audio, or visual content. Low-level prosodic audio features and video content features such as color energy, motion components, and color variance are typical examples of the features used in affective multimedia content analysis.

Data
The video set consists of short videos from My Name is Bill, a documentary series made as part of Bill Bowles's travel project. Each episode tells a story about a place visited during his travels around the world. The videos are about two to five minutes long and were chosen to cover a broad spectrum with respect to their potential to be either boring or entertaining. The sample dataset will contain the videos, the speech transcripts produced by automatic speech recognition, the available metadata (including the episodes' popularity), and the annotations. A second dataset will be released later to serve as the test set for evaluating the submissions.

Groundtruth and evaluation
The video dataset will be affectively annotated by multiple participants of different genders and backgrounds. The videos will be shown to participants without the timeline visible, and participants will be asked to estimate the length of each video and to rate their level of boredom on a nine-point scale. Time perception is related to the level of joy and amusement, so it can be used as a second indicator of the boredom level. The evaluation will be done on the control (test) dataset. The estimated boredom levels will be used to rank the videos in the control set from the most entertaining (least boring) to the most boring, and the ranking distance measure of Kendall's tau (by the same definition given in Yi-Hsuan Yang and Homer H. Chen, Music emotion ranking, ICASSP 2009) will be computed against the groundtruth ranking. The groundtruth will be generated by averaging the judgments of more than 10 participants.
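
As a rough illustration of the evaluation described above, the following Python sketch averages per-video annotator ratings into a groundtruth score and compares a system's estimated boredom levels against it with Kendall's tau. It is only a sketch under stated assumptions: the video ids, ratings, and system outputs are made up, the function names are hypothetical, and the simple tau-a form used here (tied pairs count as neither concordant nor discordant) may differ in detail from the definition in Yang and Chen that the task will actually apply.

```python
from itertools import combinations
from statistics import mean


def mean_ratings(ratings_by_video):
    """Average each video's annotator judgments into one groundtruth score."""
    return {vid: mean(scores) for vid, scores in ratings_by_video.items()}


def kendall_tau_a(truth, predicted):
    """Kendall's tau-a between two score dicts keyed by the same video ids.

    Returns (concordant - discordant) / total pairs. Tied pairs add nothing
    to the numerator. This tie handling is an assumption of the sketch.
    """
    if set(truth) != set(predicted):
        raise ValueError("truth and predicted must cover the same videos")
    ids = sorted(truth)
    if len(ids) < 2:
        raise ValueError("need at least two videos to compare rankings")
    concordant = discordant = 0
    for a, b in combinations(ids, 2):
        product = (truth[a] - truth[b]) * (predicted[a] - predicted[b])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(ids) * (len(ids) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical nine-point boredom ratings. Three annotators are shown for
# brevity; the task description averages more than 10.
ratings = {
    "ep01": [2, 3, 1],
    "ep02": [7, 8, 9],
    "ep03": [5, 4, 6],
    "ep04": [8, 6, 7],
}
ground_truth = mean_ratings(ratings)

# Hypothetical system output: one estimated boredom level per video.
estimated = {"ep01": 1.5, "ep02": 6.0, "ep03": 6.5, "ep04": 8.0}

tau = kendall_tau_a(ground_truth, estimated)
print(f"Kendall's tau-a: {tau:.3f}")
```

A value of 1 means the estimated ranking matches the groundtruth ranking exactly, and -1 means it is fully reversed. Only the relative order of the estimated boredom levels matters, not their absolute values.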

Task coordinator: Mohammad Soleymani, University of Geneva
(Mohammad dot Soleymani at unige dot ch)