The 2015 Placing Task: Multimodal geo-location prediction

Now Available: Check out the Placing Task 2015 Leaderboard

The Placing Task requires participants to estimate the locations where multimedia items (photos or videos) were captured solely by inspecting the content and metadata of these items, and optionally exploiting additional knowledge sources such as gazetteers.

This year we bring the task into the realm of human geography by predicting the places where the photos and videos were taken in terms of neighborhoods, cities, and so on, as well as into the domain of human mobility by estimating any missing locations given a sequence of photos taken in the same city. The Placing Task integrates all aspects of multimedia: text, audio, photo, video, location, time, users and context.

Sub-Tasks
The following subtasks are offered:
  1. Locale-based placing task: participants are given a hierarchy of places across the world, ranging from neighborhoods to continents, and are asked to pick a node (i.e. place) from the hierarchy in which they most confidently believe the photo or video was taken. While the ground truth locations of the photos and videos will be associated with the most accurate nodes (i.e. leaves) in the hierarchy, participants can express reduced confidence in their location estimates by selecting nodes at higher levels in the hierarchy. If their confidence is sufficiently high, participants may of course estimate the geographic coordinate of the photo/video directly instead of choosing a node from the hierarchy.
  2. Mobility-based placing task: participants are given a sequence of photos taken in a city by the same user, not all of which are associated with a geographic coordinate (e.g., the user took some photos when GPS was temporarily unavailable). The participants are asked to predict the locations of the photos with missing coordinates; a simple interpolation baseline is sketched below.
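To make the mobility-based sub-task concrete, the sketch below fills in missing coordinates by time-weighted linear interpolation between the nearest geotagged photos in the same sequence. This is only an illustrative, hypothetical baseline (not one of the official baseline methods mentioned further below), and the field names time, lat and lon are assumptions rather than the actual dataset format.

```python
def interpolate_missing(photos):
    """Illustrative baseline sketch: fill in missing (lat, lon) values by
    time-weighted linear interpolation between the nearest geotagged
    neighbours in the sequence. `photos` is a list of dicts with a numeric
    'time' (e.g. Unix timestamp) and optional 'lat'/'lon', sorted by time;
    the field names are assumptions for illustration only."""
    known = [i for i, p in enumerate(photos) if p.get('lat') is not None]
    for i, p in enumerate(photos):
        if p.get('lat') is not None:
            continue
        before = max((k for k in known if k < i), default=None)
        after = min((k for k in known if k > i), default=None)
        if before is None and after is None:
            continue  # no geotagged photo in the sequence; leave unresolved
        if before is None or after is None:
            # Only one geotagged neighbour exists: copy its coordinate.
            ref = photos[before if after is None else after]
            p['lat'], p['lon'] = ref['lat'], ref['lon']
            continue
        a, b = photos[before], photos[after]
        span = b['time'] - a['time']
        w = 0.0 if span == 0 else (p['time'] - a['time']) / span
        p['lat'] = a['lat'] + w * (b['lat'] - a['lat'])
        p['lon'] = a['lon'] + w * (b['lon'] - a['lon'])
    return photos
```

A real submission would of course also exploit the visual, aural and textual content of the photos; the sketch only illustrates the input/output structure of the sub-task.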

Target group
The task is of interest to researchers in the area of geographic multimedia information retrieval, social media, human mobility, and media analysis.

Data
The dataset will be a subset of the YFCC100M collection. We will only include those photos and videos that were taken within any of the GADM boundaries, supplemented with neighborhood data for several cities; photos taken in international waters or international airspace will therefore be excluded, since these are generally challenging to predict accurately anyway.
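To illustrate the boundary criterion, the sketch below checks whether a coordinate falls inside any administrative polygon read from a GADM shapefile, using the fiona and shapely packages. The file path is hypothetical and the organizers' actual dataset construction pipeline may differ; this is a sketch under those assumptions.

```python
import fiona
from shapely.geometry import Point, shape

def load_boundaries(shapefile_path):
    """Read administrative boundary polygons from a (hypothetical) GADM
    shapefile, e.g. 'gadm_level2.shp'."""
    with fiona.open(shapefile_path) as src:
        return [shape(feature['geometry']) for feature in src]

def within_any_boundary(lon, lat, boundaries):
    """Return True if the coordinate lies inside any boundary polygon;
    points in international waters or airspace fall outside every polygon
    and would thus be excluded from the dataset."""
    point = Point(lon, lat)  # shapely uses (x, y) = (lon, lat) order
    return any(polygon.contains(point) for polygon in boundaries)
```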

We will provide several visual, aural and textual features to the participants so they can focus on solving the task rather than spending time on reinventing the wheel.

Ground truth and evaluation
For the Locale-based and Mobility-based placing tasks, the evaluation of runs submitted by participating groups will be similar to last year's. One important difference is that this year we measure the distances between the predicted and actual geographic coordinates using Karney's algorithm; this algorithm models the shape of the Earth as an oblate spheroid and therefore produces more accurate distances than methods, such as the great-circle distance, that model the Earth as a sphere. For the Locale-based placing task, the evaluation will be based on a hierarchical distance metric between nodes in the hierarchy.
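As a concrete illustration, the geodesic distance given by Karney's algorithm can be computed with the geographiclib package, as sketched below; this is not necessarily the exact evaluation code we will use, only a minimal example of the distance measure.

```python
from geographiclib.geodesic import Geodesic

def karney_distance_km(lat1, lon1, lat2, lon2):
    """Distance in kilometres between two coordinates on the WGS84 oblate
    spheroid, computed with Karney's geodesic algorithm."""
    result = Geodesic.WGS84.Inverse(lat1, lon1, lat2, lon2)
    return result['s12'] / 1000.0  # 's12' is the geodesic distance in metres

# Example: error between a predicted and an actual location.
# karney_distance_km(52.0116, 4.3571, 37.7749, -122.4194)
```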

We will provide several baseline methods (source code + performance evaluation) to the participants so they have a starting point.

Leaderboard
We will have a running leaderboard system, where participants can submit up to two runs a day and view their standing relative to others, as evaluated on a representative development set (i.e., not the actual test set). Participants are not required to submit their runs to the leaderboard, and may hide their identity if they so desire.

Recommended reading
[1] Hays, J., Efros, A. A. “IM2GPS: Estimating Geographic Information from a Single Image”. In Proceedings of the IEEE Computer Vision and Pattern Recognition Conference, 2008.

[2] Cao, L., Yu, J., Luo, J., Huang, T. “Enhancing Semantic and Geographic Annotation of Web Images Via Logistic Canonical Correlation Regression”. In Proceedings of the ACM International Conference on Multimedia, 2009, pp. 125-134.

[3] Yin, Z., Cao, L., Han, J., Zhai, C., Huang, T. “Geographical Topic Discovery and Comparison”. In Proceedings of the ACM International Conference on World Wide Web, 2011, pp. 247-256.

[4] Larson, M., Soleymani, M., Serdyukov, P., Rudinac, S., Wartena, C., Murdock, V., Friedland, G., Ordelman, R., Jones, G. J.F. “Automatic Tagging and Geotagging in Video Collections and Communities”. In Proceedings of the ACM International Conference on Multimedia Retrieval, 2011, pp. 51-54.

[5] Luo, J., Joshi, D., Yu, J., Gallagher, A. “Geotagging in Multimedia and Computer Vision - A Survey”. In Springer Multimedia Tools and Applications, Special Issue: Survey Papers in Multimedia by World Experts, 51(1), 2011, pp. 187–211.

[6] Van Laere, O., Schockaert, S., Dhoedt, B. “Finding locations of Flickr resources using language models and similarity search”. In Proceedings of the ACM International Conference on Multimedia Retrieval, 2011, article 48.

[7] Penatti, O.A.B., Li, L. T., Almeida, J., Torres, R. da S. “A Visual Approach for Video Geocoding using Bag-of-Scenes”, In Proceedings of the ACM International Conference on Multimedia Retrieval. ACM, 2012, article 53.

[8] Choi, J., Lei, H., Ekambaram, V., Kelm, P., Gottlieb, L., Sikora, T., Ramchandran, K., Friedland, G. “Human vs. Machine: Establishing a Human Baseline for Multimodal Location Estimation”. In Proceedings of the ACM International Conference on Multimedia, 2013, pp. 866-867.

[9] Kelm, P., Schmiedeke, S., Choi, J., Friedland, G., Ekambaram, V., Ramchandran, K., Sikora, T. “A Novel Fusion Method for Integrating Multiple Modalities and Knowledge for Multimodal Location Estimation”. In Proceedings of the ACM Multimedia Workshop on Geotagging and Its Applications in Multimedia, 2013, pp. 7-12.

[10] Trevisiol, M., Jégou, H., Delhumeau, J., Gravier, G. “Retrieving Geo-location of Videos with a Divide & Conquer Hierarchical Multimodal Approach”. In Proceedings of the ACM International Conference on Multimedia Retrieval, 2013.

Task organizers
Bart Thomee (Yahoo Labs, San Francisco, CA, USA)
Olivier Van Laere (Blueshift Labs, San Francisco, CA, USA)
Claudia Hauff (TU Delft, Netherlands)
Jaeyoung Choi (ICSI, Berkeley, CA, USA / TU Delft, Netherlands)

Task schedule
1 May 2015: Data (development + test) released.
7 August 2015: Run submission deadline.
14 August 2015: Run results released.
28 August 2015: Working notes paper deadline.
14-15 September 2015: MediaEval 2015 workshop.