Diverse Images

The 2014 Retrieving Diverse Social Images Task
This task is a follow-up to the 2013 edition. It addresses the problem of result diversification in social photo retrieval.

We use a tourist use case where a person tries to find more information about a place she is potentially visiting. The person has only a vague idea about the location, e.g., knowing the name of the location. She uses the name to learn additional information about the location from the Internet, for instance by visiting a Wikipedia page, e.g., getting a photo, the geographical position of the place and basic descriptions. Before deciding whether this location suits her needs, the person is interested in getting a more complete visual description of the place.

Given a ranked list of location photos retrieved from Flickr using text information, the participating systems are expected to refine the results by providing a set of images that are at the same time relevant, e.g., depict the target location partially or entirely, and provide a diversified summary, e.g., around 50 images that depict different views of the location at different times of the day/year and under different weather conditions, creative views, etc. Initial results are typically noisy and redundant.

The refinement and diversification process will be based on the social metadata associated with the images and on the visual characteristics of the images. In particular, this year we provide information about user annotation credibility. Credibility is determined as an automatic estimation of the quality (correctness) of a particular user's tags. Participants are allowed to exploit this credibility estimation or to devise their own, in addition to classical retrieval techniques. A specifically designed dataset will be used to train this measure and will be provided to the participants.
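As an illustration of the kind of refinement described above, the following is a minimal sketch of one possible diversification baseline: a greedy farthest-first selection over visual feature vectors. It is not a method prescribed by the task. The function names, the use of Euclidean distance and the feature dictionary are assumptions made for the example, and the sketch does not filter out irrelevant images, which a real system would also need to do.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def greedy_diversify(ranked_ids, features, k=50):
    """Re-rank images so that each new pick is far from those already chosen.

    ranked_ids: image ids in their initial (e.g., Flickr) rank order.
    features:   hypothetical dict mapping image id -> visual descriptor.
    k:          maximum number of images to return.
    """
    if not ranked_ids:
        return []
    # Seed with the top-ranked image, then grow the selection greedily.
    selected = [ranked_ids[0]]
    remaining = list(ranked_ids[1:])
    while remaining and len(selected) < k:
        # Choose the candidate whose nearest selected image is farthest away.
        # max() returns the first maximum, so ties favour the earlier rank.
        best = max(
            remaining,
            key=lambda c: min(
                euclidean(features[c], features[s]) for s in selected
            ),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy example with 2-D descriptors: "b" is a near-duplicate of "a".
toy_features = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (5.0, 5.0), "d": (0.0, 4.0)}
toy_result = greedy_diversify(["a", "b", "c", "d"], toy_features, k=3)
print(toy_result)
```

In the toy example the near-duplicate "b" is pushed out of the top three in favour of the visually distinct "c" and "d".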

Target group
Target participants are researchers in areas including, but not limited to, information retrieval (text, vision and multimedia communities), re-ranking, relevance feedback, crowd-sourcing and automatic geo-tagging.

Data
The dataset will be built around last year's dataset and consists of around 200 locations from around the world, ranging from very famous ones, e.g., Eiffel Tower, to lesser known ones, e.g., Palazzo delle Albere. For each location, we provide a ranked list of photos of various qualities with a Creative Commons license allowing redistribution, together with their associated metadata (mainly social data) retrieved from Flickr (ca. 300 photos per location). To serve as reference information, each location is accompanied by a few representative photos and a location description from Wikipedia. To encourage participation of groups from different communities, resources such as general-purpose visual descriptors and text models will be provided for the entire collection. In addition, estimates of user annotation credibility will be learned a priori from user data and provided to participants.

The dataset consists of a development dataset (devset, ca. 30 locations - to be used for training/tuning the methods), a testing dataset (testset, ca. 100 locations - for final evaluation) and an additional dataset used to train the credibility descriptors (credibilityset, ca. 300 locations and around 1,000 users, with at least 50 images per user). Overall, for the entire dataset we target a total of 95,000 photos (45,000 for devset and testset and 50,000 for credibility estimation).

Ground truth and evaluation
All the images from the devset and testset are to be annotated in terms of relevance to the query and diversity. All the images from the credibility set are to be annotated only for relevance. Annotations are to be carried out by expert annotators with advanced knowledge of the location characteristics (mainly learned from Internet sources). Diversity annotation will mainly consist of grouping visually similar images into clusters (up to 20-25 clusters). Only relevant images are to be considered.

System performance is to be assessed in terms of three measures: Cluster Recall at X (CR@X), which assesses how many different clusters from the ground truth are represented among the top X results (only relevant images are considered); Precision at X (P@X), which measures the proportion of relevant photos among the top X results; and F1-measure at X, defined as the harmonic mean of the previous two. Various cut-off points are to be considered, e.g., X=5,10,20,30,40,50. The official ranking metric will be the F1-measure (e.g., @20 images), which gives equal importance to diversity (via CR@20) and relevance (via P@20). This metric simulates the content of a single page of a typical Web image search engine and reflects user behaviour, i.e., inspecting the first page of results first.
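The three measures can be sketched in a few lines of Python. This is an illustrative sketch, not the official evaluation tool: the function names and the ground-truth layout (a dictionary mapping each relevant image to its cluster) are assumptions, as are the choices to divide P@X by X even when fewer than X images are returned and to divide CR@X by the total number of ground-truth clusters for the location.

```python
def precision_at(ranked, cluster_of, x):
    """P@X: fraction of the top X results that are relevant.

    cluster_of maps each relevant image id to its ground-truth cluster id;
    images absent from it are treated as irrelevant (an assumed layout).
    """
    top = ranked[:x]
    return sum(1 for img in top if img in cluster_of) / x


def cluster_recall_at(ranked, cluster_of, x):
    """CR@X: fraction of ground-truth clusters represented in the top X."""
    total = len(set(cluster_of.values()))
    if total == 0:
        return 0.0
    # Only relevant images (those with a cluster label) contribute.
    found = {cluster_of[img] for img in ranked[:x] if img in cluster_of}
    return len(found) / total


def f1_at(ranked, cluster_of, x):
    """F1@X: harmonic mean of P@X and CR@X."""
    p = precision_at(ranked, cluster_of, x)
    cr = cluster_recall_at(ranked, cluster_of, x)
    return 2 * p * cr / (p + cr) if (p + cr) > 0 else 0.0


# Toy ground truth: three clusters; "c" and "e" are irrelevant,
# and relevant image "f" (cluster 3) is never retrieved.
toy_truth = {"a": 1, "b": 1, "d": 2, "f": 3}
toy_run = ["a", "b", "c", "d", "e"]

p5 = precision_at(toy_run, toy_truth, 5)
cr5 = cluster_recall_at(toy_run, toy_truth, 5)
f5 = f1_at(toy_run, toy_truth, 5)
print(p5, cr5, f5)
```

In the toy run, three of the five returned images are relevant and two of the three clusters are covered, so P@5 is 3/5, CR@5 is 2/3 and F1@5 is their harmonic mean, 12/19.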

Recommended reading
[1] Ionescu, B., Menéndez, M., Müller, H., Popescu, A. Retrieving Diverse Social Images at MediaEval 2013: Objectives, Dataset and Evaluation. In Proceedings of MediaEval Benchmarking Initiative for Multimedia Evaluation, CEUR-WS.org, 1043, ISSN: 1613-0073, 2013.

[2] Ionescu, B., Radu, A.-L., Menéndez, M., Müller, H., Popescu, A., Loni, B. Div400: A Social Image Retrieval Result Diversification Dataset. In Proceedings of ACM MMSys International Conference on Multimedia Systems. ACM, Singapore, Singapore, 2014, 29-34.

[3] Paramita, M. L., Sanderson, M., Clough, P. Diversity in Photo Retrieval: Overview of the ImageCLEF Photo Task 2009. In Proceedings of ACM ImageCLEF International Conference on Cross-Language Evaluation Forum: Multimedia Experiments. ACM, Springer-Verlag Berlin, Heidelberg, 2009, 45-59.

[4] Popescu, A., Grefenstette, G. Social Media Driven Image Retrieval. In Proceedings of ACM ICMR International Conference on Multimedia Retrieval. ACM, Trento, Italy, 2011.

[5] Radu, A.-L., Ionescu, B., Menéndez, M., Stöttinger, J., Giunchiglia, F., De Angeli, A. A Hybrid Machine-Crowd Approach to Photo Retrieval Result Diversification. In Proceedings of International Conference on MultiMedia Modeling. LNCS 8325, 2014, 25-36.

[6] Rudinac, S., Hanjalic, A., Larson, M. Generating Visual Summaries of Geographic Areas Using Community-Contributed Images. IEEE Transactions on Multimedia, 15(4), 2013, 921-932.

[7] Taneva, B., Kacimi, M., Weikum, G. Gathering and Ranking Photos of Named Entities with High Precision, High Recall, and Diversity. In Proceedings of ACM WSDM International Conference on Web Search and Data Mining. ACM, New York, USA, 2010, 431-440.

[8] van Leuken, R. H., Garcia, L., Olivares, X., van Zwol, R. Visual Diversification of Image Search Results. In Proceedings of ACM WWW International Conference on World Wide Web. ACM, Madrid, Spain, 2009, 341-350.

Task organizers
Bogdan Ionescu, LAPI, University Politehnica of Bucharest, Romania,
Adrian Popescu, CEA LIST, France,
Mihai Lupu, Vienna University of Technology, Austria,
Henning Müller, University of Applied Sciences Western Switzerland (HES-SO) in Sierre, Switzerland.

Task auxiliaries
Alexandru Lucian Gînscă, CEA LIST, France,
Adrian Iftene, Faculty of Computer Science, Alexandru Ioan Cuza University, Romania.

Many thanks to (in alphabetical order): Bogdan Boteanu, Ioan Chera, Ionuț Duță, Andrei Filip, Corina Macovei, Cătălin Mitrea, Ionuț Mironică, Irina Emilia Nicolae, Andrei Purică, Mihai Pușcaș, Oana Pleș, Gabriel Petrescu, Anca Livia Radu, Vlad Ruxandu.

Task schedule
1 May: Development data release
2 June: Test data release
8 September: Run submission due
15 September: Results returned
22 September: Working notes paper deadline (updated deadline)

MediaEval Benchmarking Initiative for Multimedia Evaluation