2005: Meta Data Extraction from Linguistic Meeting Transcripts for the Annodex File Format

Claudia Schremmer and Silvia Pfeiffer, “Meta Data Extraction from Linguistic Meeting Transcripts for the Annodex File Format”, IEEE Computer Society, Proceedings for the 11th Intl. Conference on Multi-Media Modelling, January 2005, Melbourne, Australia, pp. 405–412.


Semantic interpretation of data distributed over the Internet is the subject of much current research. The Continuous Media Web (CMWeb) extends the World Wide Web to time-continuously sampled data such as audio and video, giving it searching, linking, and browsing functionality. The CMWeb technology is based on the Annodex file format, which streams the media content interspersed with markup in the Continuous Media Markup Language (CMML). This markup contains information relevant to the whole media file (e.g., title, author, language) as well as time-sensitive information (e.g., topics, speakers, time-sensitive hyperlinks). CMML markup may be generated manually or automatically. This paper investigates the automatic extraction of meta data and markup information from complex linguistic annotations, that is, annotated recordings collected for use in linguistic research. We are particularly interested in annotated recordings of meetings and teleconferences, and we see automatically generated CMML files and their corresponding Annodex streams as one way of viewing such recordings. The paper presents some experiments with generating Annodex files from hand-annotated meeting recordings.
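
As a rough illustration of the two kinds of information the abstract describes (file-level meta data and time-sensitive markup), the following Python sketch builds a CMML-like document from a few hand-annotated meeting segments. It is not the extraction method of the paper: the input dictionaries, the helper names `seconds_to_npt` and `build_cmml`, the sample data, and the exact element and attribute names are assumptions modeled loosely on CMML and should be checked against the CMML specification before use.

```python
import xml.etree.ElementTree as ET


def seconds_to_npt(seconds):
    """Format a time offset in seconds as 'npt:H:MM:SS.mmm' (assumed time format)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"npt:{hours}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_cmml(file_info, segments):
    """Return a CMML-like XML string (element names are assumptions).

    file_info: dict with 'title' and optional 'author' / 'language'
               (information relevant to the whole media file).
    segments:  list of dicts with 'start' (seconds), 'speaker', 'topic',
               and optional 'link' (time-sensitive information).
    """
    root = ET.Element("cmml")

    # File-level meta data goes into the head.
    head = ET.SubElement(root, "head")
    ET.SubElement(head, "title").text = file_info["title"]
    for name in ("author", "language"):
        if name in file_info:
            ET.SubElement(head, "meta", {"name": name, "content": file_info[name]})

    # Each annotated segment becomes one time-stamped clip.
    for i, seg in enumerate(segments, start=1):
        clip = ET.SubElement(
            root, "clip", {"id": f"clip{i}", "start": seconds_to_npt(seg["start"])}
        )
        ET.SubElement(clip, "meta", {"name": "speaker", "content": seg["speaker"]})
        ET.SubElement(clip, "desc").text = seg["topic"]
        if seg.get("link"):
            # Time-sensitive hyperlink attached to this clip.
            ET.SubElement(clip, "a", {"href": seg["link"]}).text = seg["topic"]

    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data, for illustration only.
example_info = {"title": "Project meeting", "author": "Example Lab", "language": "en"}
example_segments = [
    {"start": 0.0, "speaker": "A", "topic": "Agenda"},
    {"start": 75.5, "speaker": "B", "topic": "Budget", "link": "http://example.org/budget"},
]

if __name__ == "__main__":
    print(build_cmml(example_info, example_segments))
```

In an actual pipeline, a document like this would still have to be interleaved with the audio or video data by an Annodex authoring tool; the sketch covers only the markup side.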

