Interspeech 2015 Special Session CfP: Synergies of speech and multimedia technologies

Synergies of Speech and Multimedia Technologies
Call for Papers
Interspeech 2015 Special Session
Deadline: 20 March 2015

Speech processing and multimedia analysis are highly active research areas, exploring both fundamental research questions and the development of applications. While the research questions and applications they investigate often overlap, the research communities behind speech processing and multimedia analysis are largely separate, at best using each other's work as ‘black boxes’.

This session seeks to bring the speech and multimedia analysis research communities together to report current work of mutual interest and to explore potential synergies and opportunities for creative collaboration, with the aim of advancing research in both speech and multimedia technologies.

A growing amount of multimedia content is now produced with both mobile devices and professional equipment, and is shared online or stored in multimedia archives. Separate research directions use these data in parallel to develop and improve speech or video processing technologies. Integrating the modalities, however, can better reflect the nature of the content, which combines multiple sources of information.

Much of the work of the multimedia research community involves exploring new scenarios and frameworks for multimedia applications in which speech technologies play a crucial role. However, these are typically based on standard speech transcription tools, with no consideration of a well-founded integration of speech processing with multimedia analysis. Realizing the full potential of speech technologies in these applications requires multimedia technologists and speech researchers to understand each other's work and to seek novel means of integrating it. From the speech perspective, this requires an understanding of the fundamentals of human speech production and perception, and of how these relate to multimedia data and applications. Beyond speech and multimedia, the current growth of general mobile applications involving visual and context-based signals is creating opportunities for synergistic speech research, while applications involving speech in robotics also offer potential for creative research in multimodal speech processing.

The list of topics of interest includes (but is not limited to):

  • Navigation in multimedia content using advanced speech analysis features
  • Large scale speech and video analysis
  • Multimedia content segmentation and structuring using audio and visual features
  • Multimedia content hyperlinking and summarization
  • Natural language processing for multimedia
  • Multimodality-enhanced metadata extraction, e.g. entity extraction and keyword extraction
  • Generation of descriptive text for multimedia
  • Multimedia applications and services using speech analysis features
  • Affective and behavioural analytics based on multimodal cues
  • Audio event detection and video classification
  • Multimodal speaker identification and clustering

Important dates:
20 Mar 2015 paper submission deadline
01 Jun 2015 notification of acceptance/rejection
10 Jun 2015 camera-ready paper due
20 Jun 2015 early registration deadline
06-10 Sep 2015 Interspeech 2015 in Dresden, Germany

Submission takes place via the general Interspeech submission system.

Organizers:

  • Maria Eskevich, Communications Multimedia Group, EURECOM, France
  • Robin Aly, Database Management Group, University of Twente, The Netherlands
  • Roeland Ordelman, Human Media Interaction Group, University of Twente, The Netherlands
  • Gareth J.F. Jones, CNGL Centre for Global Intelligent Content, Dublin City University, Ireland

Pasted Graphicisca_sig_slim