The 2015 Retrieving Diverse Social Images Task
This task addresses the problem of image search result diversification in the context of social media.

We use a tourist use case scenario where a person tries to find more information about a place she is potentially visiting. To decide whether this place is worth visiting, the person is interested in getting a complete visual summary of that location.

Participants are required to develop algorithms to automatically refine a list of images that has been returned by a social image search engine (Flickr) in response to a query. The refined list should be both relevant to the query and diverse. To carry out the refinement and diversification, participants may use the social metadata associated with the images, the visual characteristics of the images, information related to user tagging credibility (an estimate of the global quality of the tag-image content relationships in a user's contributions), or external resources (e.g., the Internet).
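As an informal illustration (not an official baseline or any participant's method), the sketch below shows one common way to combine relevance and diversity during re-ranking: a greedy selection that, at each step, trades off a relevance score against visual similarity to the images already selected. The relevance scores, descriptor matrix and trade-off weight lambda_ are hypothetical inputs chosen for the example; they are not part of the released task data.

```python
import numpy as np

def greedy_diversify(relevance, descriptors, k=20, lambda_=0.5):
    """Greedy relevance/diversity re-ranking (MMR-style sketch).

    relevance  : (n,) array of hypothetical per-image relevance scores
    descriptors: (n, d) array of visual descriptors (e.g., color histograms)
    k          : number of images to keep in the refined list
    lambda_    : assumed trade-off between relevance and diversity
    """
    n = len(relevance)
    # Cosine similarity between all pairs of images.
    norm = descriptors / (np.linalg.norm(descriptors, axis=1, keepdims=True) + 1e-9)
    sim = norm @ norm.T

    selected, remaining = [], list(range(n))
    while remaining and len(selected) < k:
        if not selected:
            # First pick: most relevant image.
            scores = relevance[remaining]
        else:
            # Penalize images that are visually close to what is already selected.
            max_sim = sim[np.ix_(remaining, selected)].max(axis=1)
            scores = lambda_ * relevance[remaining] - (1 - lambda_) * max_sim
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected
```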

This task is a follow-up to the 2013 and 2014 editions. New this year are multi-concept queries related to events and states associated with locations, e.g., "Oktoberfest in Munich" or "Bucharest in winter". These are offered in addition to new one-concept location queries.

Target group
Researchers will find this task interesting if they work in either machine-based or human-based media analysis, including areas such as (but not limited to): image retrieval (text, vision, multimedia communities), re-ranking, machine learning, relevance feedback, natural language processing, crowdsourcing and automatic geo-tagging.

Data
The dataset consists of redistributable Creative Commons data for location-related queries (one- and multi-concept). Locations are selected from around the world. Each query is represented by up to 300 Flickr photos and their associated social metadata. For one-concept queries we also provide Wikipedia web pages and photos from Wikipedia. In addition, estimates of user tagging credibility are learned a priori from user data and provided to participants.

The dataset consists of around 150 one-concept queries and ~45,000 images for development, and 140 queries (70 one-concept and 70 multi-concept) with ~42,000 images for testing. An additional dataset, used to train the credibility descriptors, provides information about ~3,000 Flickr users and metadata for more than 16M images.

To encourage participation of groups from different communities, resources such as general-purpose visual descriptors, text models and credibility-based descriptors will also be provided for the entire dataset.

Ground truth and evaluation
All images will be annotated in terms of their relevance to the query and their diversity. Annotations will be carried out by expert annotators. Relevance annotation will consist of yes/no judgements (with an additional "don't know" option); input from different annotators will be aggregated via majority voting. Diversity annotation will mainly consist of grouping visually similar images into clusters (up to ~25 clusters per query), and each cluster will be accompanied by a short text description justifying its choice. Naturally, only relevant images are annotated for diversity.

System performance will be assessed in terms of Cluster Recall at X (CR@X), which measures how many of the ground-truth clusters are represented among the top X results (only relevant images are considered); Precision at X (P@X), which measures the proportion of relevant photos among the top X results; and F1-measure at X (F1@X), the harmonic mean of the previous two. Various cutoff points will be considered, e.g., X = 5, 10, 20, 30, 40, 50.
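For clarity, the following is a minimal sketch of how these metrics can be computed for a single query, assuming a ranked list of image ids, the set of ground-truth relevant ids, and a mapping from each relevant id to its ground-truth cluster; the variable names are illustrative and this is not the official evaluation tool.

```python
def precision_at(ranked, relevant, x):
    """P@X: fraction of the top X results that are relevant."""
    return sum(1 for img in ranked[:x] if img in relevant) / float(x)

def cluster_recall_at(ranked, clusters, x):
    """CR@X: fraction of ground-truth clusters covered by the relevant
    images among the top X results.
    `clusters` maps relevant image id -> ground-truth cluster id."""
    covered = {clusters[img] for img in ranked[:x] if img in clusters}
    total = len(set(clusters.values()))
    return len(covered) / float(total) if total else 0.0

def f1_at(ranked, relevant, clusters, x):
    """F1@X: harmonic mean of P@X and CR@X."""
    p = precision_at(ranked, relevant, x)
    cr = cluster_recall_at(ranked, clusters, x)
    return 2 * p * cr / (p + cr) if (p + cr) else 0.0
```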

The official ranking metric will be the F1-measure at 20 images (F1@20), which gives equal importance to diversity (via CR@20) and relevance (via P@20). This cutoff simulates the content of a single page of a typical Web image search engine and reflects user behaviour, i.e., the tendency to inspect mainly the first page of results.

Recommended reading
[1] Ionescu, B., Popescu, A., Lupu, M., Gînscă, A.L., Boteanu, B., Müller, H., Div150Cred: A Social Image Retrieval Result Diversification with User Tagging Credibility Dataset. In Proceedings of ACM MMSys International Conference on Multimedia Systems. ACM, Portland, Oregon, USA, 2015.

[2] Ionescu, B., Popescu, A., Lupu, M., Gînscă, A.L., Müller, H., Retrieving Diverse Social Images at MediaEval 2014: Challenge, Dataset and Evaluation. In Proceedings of MediaEval Benchmarking Initiative for Multimedia Evaluation, CEUR-WS.org, 1263, 2014.

[3] Ionescu, B., Popescu, A., Radu, A.-L., Müller, H., Result Diversification in Social Image Retrieval: A Benchmarking Framework. In Multimedia Tools and Applications, 2014.

[4] Paramita, M. L., Sanderson, M., Clough, P., Diversity in Photo Retrieval: Overview of the ImageCLEF Photo Task 2009. In Proceedings of the International Conference on Cross-Language Evaluation Forum: Multimedia Experiments (CLEF). Springer-Verlag, Berlin, Heidelberg, 2009, 45-59.

[5] Rudinac, S., Hanjalic, A., Larson, M.A., Generating Visual Summaries of Geographic Areas Using Community-Contributed Images. In IEEE Transactions on Multimedia, 15(4), 2013, 921-932.

[6] Taneva, B., Kacimi, M., Weikum, G., Gathering and Ranking Photos of Named Entities with High Precision, High Recall, and Diversity. In Proceedings of ACM WSDM International Conference on Web Search and Data Mining. ACM, New York, USA, 2010, 431-440.

[7] van Leuken, R. H., Garcia, L., Olivares, X., van Zwol, R., Visual Diversification of Image Search Results. In Proceedings of ACM WWW International Conference on World Wide Web. ACM, Madrid, Spain, 2009, 341-350.

Task organizers
Bogdan Ionescu, LAPI, University "Politehnica" of Bucharest, Romania (contact person),
Adrian Popescu, CEA LIST, France,
Mihai Lupu, Vienna University of Technology, Austria,
Henning Müller, University of Applied Sciences Western Switzerland in Sierre, Switzerland.

Task auxiliaries
Alexandru Lucian Gînscă, CEA LIST, France,
Bogdan Boteanu, LAPI, University Politehnica of Bucharest, Romania.

We also acknowledge the valuable help of the task supporters (in alphabetical order): Ioan Chera, Ionuț Duță, Andrei Filip, Florin Guga, Tiberiu Loncea, Corina Macovei, Cătălin Mitrea, Ionuț Mironică, Irina Emilia Nicolae, Ivan Eggel, Andrei Purică, Mihai Pușcaș, Oana Pleș, Gabriel Petrescu, Anca Livia Radu, Vlad Ruxandu, Gabriel Vasile.

Task schedule
4 May: Development data release,
1 June: Test data release,
14 August: Run submission due,
17 August: Results returned to participants,
23 August: Working notes paper initial deadline (updated),
28 August: Working notes paper camera ready deadline,
14-15 September: MediaEval 2015 Workshop.

This task is supported by the CHIST-ERA FP7 project MUCKE (Multimodal User Credibility and Knowledge Extraction).