Those who know me well know that a few years ago (in fact, almost 10 years now) we developed the Annodex set of technologies at the CSIRO in a project called “Continuous Media Web”.
The idea was to make time-continuous data (read: audio and video) an integral part of the Web. It would be possible to search for media through standard search engines. It would be possible to link into and out of media as we link into and out of Web pages. It would be possible to mash up video from different Web servers into a single media stream just as we are able to mash up images, text and other Web resources from different Web servers.
As you are all aware, we have made huge steps towards this vision in the last 10 years. We now have what is called “universal search” – search engines like Google and Yahoo no longer return only links to HTML pages, but links to videos and images just as well.
But it doesn’t go far enough yet – even now we still cannot link into a long-form video at exactly the fragment that provides the context of what we have been searching for.
In the Annodex project we implemented a working version of such a deep universal search engine in the year 2003 on top of the Panoptic search engine (an enterprise search engine developed by CSIRO, later spun out and now sold as Funnelback).
The basis for our implementation was the combination of specifications that we developed around Ogg:
- An extension to Ogg that makes it possible to create valid Ogg streams from subparts of Ogg streams – this is now part of Ogg as Skeleton.
- A means of annotating Ogg streams with time-aligned text that could be interleaved with the Ogg media stream to produce streams that knew more about themselves – the format was called CMML for Continuous Media Markup Language.
- And an extension to the URI addressing of Ogg streams using temporal URIs.
I am very proud that in the last 2 years, the development of a generic media fragment URI addressing approach has been taken up by the W3C, and that Conrad Parker and I are invited experts on the Working Group.
I am even more proud that the Working Group has just published a First Public Working Draft of a document called “Use cases and requirements for Media Fragments”. It contains a large collection of examples of situations in which users will want to make use of media fragments. It identifies the key dimensions of fragmentation that need to be specified:
- Temporal fragmentation
- Spatial fragmentation
- Track fragmentation
- Name fragmentation
Beyond mere use cases and requirements, the document also contains a survey of technologies that address multimedia fragments.
In a first step towards the development of a Media Fragments W3C Recommendation, this document also discusses a proposed syntax for media fragment URI addressing and proposes different processing approaches. These sections will eventually be moved into the recommendation and are the most incomplete sections at this point.
To explain some of the approaches that are being proposed in more detail, here are some examples of media fragment URIs proposed in this WD:
- http://www.example.com/example.ogv#t=10s,20s – addresses the fragment of example.ogv that lies between the 10s and the 20s offset
- http://www.example.com/example.ogv#track='audio' – addresses the track called “audio” in the example.ogv file
- http://www.example.com/example.ogv#track='audio'&t=10s,20s – addresses the track called “audio” within the subpart between the 10s and 20s offset in the example.ogv file
- http://www.example.com/example.ogv#xywh=pixel:160,120,320,240 – addresses the example.ogv file but with the video track cut to a region of the size 320x240px positioned at a 160x120px offset
- http://www.example.com/example.ogv#id='chapter-1' – addresses the named fragment called “chapter-1” which is specified through some mechanism, e.g. Kate or CMML in Ogg
Note that the latter example works only if the encapsulation format provides a means of specifying a name for a fragment. Such a means is e.g. available in QuickTime through chapter tracks, or in Flash through cuepoints.
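To make the addressing scheme a bit more concrete, here is a minimal Python sketch of how a client might split such a fragment identifier into its dimensions. The function name and the returned structure are my own inventions, and the parsing follows only the example syntax shown above, not a normative grammar from the WD:

```python
from urllib.parse import urlparse

def parse_media_fragment(uri):
    """Split a media fragment URI into its dimension/value pairs.

    Illustrative sketch only: it handles the t, track, xywh and id
    dimensions in the form they appear in the WD examples above.
    """
    fragment = urlparse(uri).fragment
    if not fragment:
        return {}
    dims = {}
    for part in fragment.split('&'):
        name, _, value = part.partition('=')
        if name == 't':
            # Temporal dimension: "10s,20s" -> (10.0, 20.0) in seconds.
            start, _, end = value.partition(',')
            dims['t'] = (float(start.rstrip('s')), float(end.rstrip('s')))
        elif name == 'xywh':
            # Spatial dimension: "pixel:160,120,320,240" -> unit + 4 ints.
            unit, _, coords = value.partition(':')
            dims['xywh'] = (unit, tuple(int(c) for c in coords.split(',')))
        elif name in ('track', 'id'):
            # Track and name dimensions: values are quoted in the examples.
            dims[name] = value.strip("'")
    return dims

print(parse_media_fragment(
    "http://www.example.com/example.ogv#track='audio'&t=10s,20s"))
# e.g. {'track': 'audio', 't': (10.0, 20.0)}
```

A user agent could resolve such a parsed structure either client-side (seeking within a downloaded resource) or server-side (requesting only the byte range corresponding to the fragment) – which of these processing approaches works best is exactly what the WD leaves open.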
We know from our experience with Ogg that temporal fragmentation can be realized. For track addressing it is possible to use the recently developed ROE specification: the id tags used there could be included in Skeleton and then be used to address tracks by name. As for spatial fragmentation on Ogg Theora – I don’t think it can be achieved for an arbitrary rectangular selection without transcoding.
The next tasks of the Working Group are to create implementations of these specifications for diverse formats and thus find out which approaches work best.