The MusiClef 2013 “Soundtrack Selection for Commercials” task aims to analyze music usage in TV commercials and to determine music that fits a given commercial video. Usually, music consultants select a song to advertise a particular brand or product. The MusiClef benchmarking activity, in contrast, aims to automate this process by taking into account both context- and content-based information about the video, the brand, and the music. This is a challenging task that requires considering multimodal information sources which do not trivially connect to each other.
General task description
Given a TV commercial, participants are required to predict the most suitable music soundtrack from a list of candidate songs. The setup is as follows:
- Participants will initially receive a multimodal dataset (‘the development dataset’) with existing commercials and their corresponding music tracks, involving both context- and content-based information, such as audio features, visual features, and text features (meta-data, web pages, social tags related to brands, products, artists and songs).
- Then, another dataset (‘the test dataset’) is released, containing a set of commercial videos represented by visual features and supporting information on the respective brands/products. However, this set does not contain the audio track of the original commercial. In addition, a music dataset is released with audio and social features for a separate, larger set of songs. The participants’ task is then, for each of the commercial videos, to produce a ranked list of songs that fit the commercial video well. It should be noted that the songs do not need to be synchronized to the video.
- Finally, the produced rankings will be evaluated based on human evaluator assessment.
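As a purely illustrative sketch (not part of the official task materials), a minimal baseline for the ranking step above might score candidate songs by cosine similarity between a commercial's feature vector and each song's feature vector, assuming both modalities have already been projected into a common vector space; the function name and data layout here are hypothetical.

```python
import numpy as np

def rank_songs(video_vec, song_vecs):
    """Rank candidate songs by cosine similarity to a commercial's
    feature vector. Assumes video and song features have already been
    mapped into a shared vector space (a non-trivial step in itself)."""
    video_vec = np.asarray(video_vec, dtype=float)
    song_vecs = np.asarray(song_vecs, dtype=float)
    # Cosine similarity between the video and every candidate song.
    sims = song_vecs @ video_vec / (
        np.linalg.norm(song_vecs, axis=1) * np.linalg.norm(video_vec) + 1e-12
    )
    # Return song indices ordered from best to worst match.
    return np.argsort(-sims).tolist()

# Toy example: three candidate songs with two-dimensional features.
print(rank_songs([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]))
# → [1, 2, 0]
```

A real submission would of course need a learned cross-modal mapping (e.g. trained on the development set's commercial/soundtrack pairs) rather than assuming the feature spaces align.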
The MusiClef task is particularly targeted at researchers in the broad areas of multimedia and music information retrieval. Additionally, the task is also expected to interest researchers with a specific interest in culture- and community-specific aspects of mass media.
The development set provided by the organizers includes YouTube links to approximately 600 commercial videos, with metadata on the commercial, video features of the commercial, web pages about the respective brands and music artists, social tags on the used music, and low-level and high-level music/audio features (for the audio track in the commercials and, where available, of the original music songs, as identified by an audio fingerprinting algorithm and verified by the organizers).
The test set provided by the organizers will include video features, brand metadata, and web pages about the respective brands for approximately 50 commercial videos (without references to the original audio of the commercials). In addition, the test set includes low- and high-level music features, social tags on music, and web pages about music artists for a dataset of approximately 5000 popular music files from a broadcasting company database.
Participants are allowed, and even encouraged, to build and apply their own feature extraction algorithms to the datasets.
Ground truth and evaluation
Due to the intrinsic difficulty and multimodality of the task, evaluation must involve users. An evaluation experiment is foreseen in which human evaluators (recruited either through crowdsourcing or through a more traditional lab experiment) will be provided with the original video stream of a test video, muted so that the original audio is not present, and without any references to the original soundtrack. In addition, the pooled music track results of the submitted algorithms will be provided, together with the music track belonging to the original soundtrack, in random order. The evaluators will then be asked to rate the suitability of the provided music tracks to the commercial video on a 5-point Likert scale. The exact temporal scope of the audio tracks (full tracks vs. dedicated excerpts) is still to be discussed with the task participants.
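To illustrate how such pooled Likert judgments might be aggregated per submission (a simple mean-rating sketch, not the task's official evaluation measure; the identifiers and data layout are hypothetical):

```python
from collections import defaultdict
from statistics import mean

def mean_suitability(ratings):
    """Aggregate 5-point Likert suitability ratings per submitted run.

    `ratings` is a list of (run_id, rating) pairs collected from the
    human evaluators across the pooled test videos.
    """
    by_run = defaultdict(list)
    for run_id, rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"Likert rating out of range: {rating}")
        by_run[run_id].append(rating)
    # Mean rating per run, e.g. for comparison against the scores
    # obtained by the original soundtrack.
    return {run_id: mean(vals) for run_id, vals in by_run.items()}

# Toy example: two submitted runs plus the original soundtrack.
print(mean_suitability([
    ("run_A", 4), ("run_A", 5), ("run_B", 2),
    ("run_B", 3), ("original", 5),
]))
# → {'run_A': 4.5, 'run_B': 2.5, 'original': 5}
```

Including the original soundtrack among the rated tracks, as the setup describes, gives a natural upper-bound reference for the submitted runs.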
Cynthia C.S. Liem, Delft University of Technology, Netherlands
Nicola Orio, University of Padua, Italy
Markus Schedl, Johannes Kepler University Linz, Austria
Geoffroy Peeters, UMR STMS IRCAM-CNRS, Paris, France
Note that this task is a "Brave New Task" and 2013 is the first year that it is running in MediaEval. If you sign up for this task, you will be asked to keep in particularly close touch with the task organizers concerning the task goals and the task timeline.
Task schedule (updated)
September 1: Run submissions
September 21: Crowdsourcing evaluation
September 24: Release of results
September 30: Working notes papers