The Tagging Task (Professional Version)

This task required participants to assign semantic theme labels, drawn from a fixed list of subject labels, to videos. It is a multilabel task: a given video may have more than one correct label. The task ran in a smaller form at VideoCLEF 2008 and 2009, and each year more data and more labels are added. A system can be implemented either as a subject classification system (tags are used as subject labels) or as an information retrieval system (tags are used as queries).

Target group
Researchers in the area of information retrieval, spoken content retrieval, video retrieval and automatic metadata generation

The task used the TRECVid data collection from the Netherlands Institute for Sound and Vision (Dutch language). The 2007 and 2008 sets were used as training/development sets, and the 2009 set was used as the test set. Note that the tagging task differs completely from the original TRECVid task, since the relevance of the tags to the videos does not necessarily depend on what is depicted in the visual channel. Participants were provided with speech recognition transcripts, archival metadata and, if they wished to make use of it, the original video.

Ground truth and Evaluation
The ground truth consists of the labels assigned to the data by the archivists at the Sound and Vision archive.
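The results reported for this task are given as MAP (mean average precision). As a rough illustration of how such a score could be computed against the archivists' labels, the sketch below treats each label as a query with a ranked list of video IDs. The function names, the data layout and the choice to average over labels that have at least one relevant video are assumptions made for illustration; the exact evaluation protocol is not specified here.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant IDs."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, video_id in enumerate(ranked_ids, start=1):
        if video_id in relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(rankings, ground_truth):
    """Mean of per-label average precision.

    rankings:     {label: [video_id, ...]} as returned by a system (hypothetical layout)
    ground_truth: {label: {video_id, ...}} as assigned by archivists (hypothetical layout)
    Labels without any relevant video are skipped (an assumption).
    """
    labels = [label for label, rel in ground_truth.items() if rel]
    if not labels:
        return 0.0
    total = sum(
        average_precision(rankings.get(label, []), ground_truth[label])
        for label in labels
    )
    return total / len(labels)
```

For example, a label whose relevant videos appear at ranks 1 and 3 of a three-item ranking has an average precision of (1/1 + 2/3) / 2.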

2010 Results and Links to Related Working Notes Papers
Three groups, Novay, UNED Madrid and the SINAI group from the University of Jaen, crossed the finish line on this task in 2010. All groups approached the task as an information retrieval task, treating the label as a query and the test set as the collection: a label is assigned to the test-set items that are returned as relevant for it. Novay and UNED approached the task using only metadata. The Novay approach exploited term co-occurrences, with top performance (MAP 0.49) being achieved by a run that made use of synonyms. The UNED approach made use of term selection, multiple label expansions and an interesting clustering approach that exploited topical similarities within the collection. SINAI also made use of the speech recognition transcripts. Under one approach, SINAI created models for each label by exploiting an external text resource (Wikipedia); under a second approach, it exploited a semantic distance between the label and the videos based on Named Entities and WordNet synonyms.
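The label-as-query framing described above can be sketched in a few lines. This is a minimal illustration only, not a reconstruction of any participant's system: the TF-IDF-style scoring, the `threshold` parameter, the function names and the toy data are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens."""
    return re.findall(r"\w+", text.lower())


def tag_videos(labels, videos, threshold=0.0):
    """Assign each label to the videos it retrieves.

    labels: list of label strings, each used as a query
    videos: {video_id: text}, e.g. metadata or transcript text (hypothetical layout)
    Returns {video_id: [label, ...]}; a video may receive several labels.
    """
    docs = {vid: Counter(tokenize(text)) for vid, text in videos.items()}
    n_docs = len(docs)

    # Document frequency of each term across the collection.
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())

    assigned = {vid: [] for vid in docs}
    for label in labels:
        query_terms = tokenize(label)
        for vid, counts in docs.items():
            # Simple TF-IDF-style overlap score (an illustrative choice).
            score = sum(
                counts[t] * math.log(1 + n_docs / df[t])
                for t in query_terms
                if t in counts
            )
            if score > threshold:
                assigned[vid].append(label)
    return assigned


# Toy collection, invented for illustration.
videos = {
    "v1": "nieuws over politiek en economie",
    "v2": "muziek concert",
    "v3": "politiek debat muziek",
}
tags = tag_videos(["politiek", "muziek"], videos)
```

In this toy collection, "v3" matches both queries and so receives two labels, which reflects the multilabel nature of the task.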

Wartena, C. Using a Divergence Model for MediaEval’s Tagging Task (Professional Version).

Hernandez-Aranda, D., Granados, R., Cigarran, J., Rodrigo, A., Fresno, V. and Garcia-Serrano, A. UNED at MediaEval 2010: exploiting text metadata for Automatic Video Tagging.

Perea-Ortega, J.M., Montejo-Raez, A., Diaz-Galiano, M.C. and Martin-Valdivia, T. SINAI at Tagging Task Professional in MediaEval 2010.

Task coordinator: Martha Larson, Delft University of Technology
(m.a.larson at tudelft dot nl)