Category Archives: Open Source

Metadata and Ogg

I am really excited about the huge progress we made at FOMS with metadata and Ogg. The metadata specifications are actually not Ogg-specific – only their mapping into Ogg is. Here are the things that I expect will enable a very structured and sensible distributed handling of metadata on the Web.

At FOMS, we started improving CMML and are now specifying the next version of CMML. CMML is a timed text description language that can easily be multiplexed alongside audio or video data. It is very flexible with its fields and satisfies needs for hypermedia, captions, annotations and other time-aligned text. We took out the Ogg dependencies, so it can now be used in any media container format. The specification is now also written as an XML Schema rather than a DTD, which enables us to reuse modules from XHTML and makes it generally more extensible.
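To give a flavour of what a CMML description looks like, here is a minimal illustrative example (the normative element set is defined in the CMML specification; the titles, times and URLs below are made up):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<cmml>
  <head>
    <title>Interview with a hacker</title>
    <meta name="author" content="Jane Doe"/>
  </head>
  <!-- a clip covers a time interval and can carry a hyperlink,
       a textual description, and arbitrary name/value metadata -->
  <clip id="intro" start="npt:0" end="npt:12.5">
    <a href="http://example.com/background.html">Background on the project</a>
    <desc>The interviewee introduces herself.</desc>
    <meta name="speaker" content="Jane Doe"/>
  </clip>
</cmml>
```

Each clip is time-aligned through its start and end attributes, which is what makes the annotations, captions and hyperlinks addressable alongside the media data.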

We introduced ROE, a description language (or “manifest”) for multitrack media files. It describes media tracks and their dependencies and thus goes much further than the old stream and import elements in CMML, which have now been deprecated.

ROE can be used to author multitrack media files – in the Ogg case to author Ogg files with a Skeleton track and multiple media tracks. We are in the process of extending Skeleton to incorporate the description of dependencies between logical bitstreams. To complete this, we will be creating a description of how to map ROE into Ogg/Skeleton and vice versa.
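As a rough illustration of the idea, a ROE manifest for a movie could describe the available tracks and their dependencies along these lines (the element and attribute names here are illustrative only – the normative vocabulary is defined in the ROE specification, which is still in flux):

```xml
<ROE>
  <body>
    <!-- one track element per logical bitstream in the multitrack file -->
    <track id="V1" provides="video" content-type="video/theora" lang="en"/>
    <track id="A1" provides="audio" content-type="audio/vorbis" lang="en"/>
    <track id="A2" provides="audio" content-type="audio/vorbis" lang="de"/>
    <!-- a timed text track that depends on the video track for display -->
    <track id="TT2" provides="text" content-type="text/cmml" lang="en" depends="V1"/>
  </body>
</ROE>
```

When authoring an Ogg file from such a manifest, the track descriptions and dependencies would be carried in the Skeleton track.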

ROE can also be used to negotiate with a Web client which media streams to send out of the complete set that the server’s manifest makes available. For example, a Web client could request the German sound track of a movie rather than the default English one, and add English subtitles on top. This requires a small negotiation protocol, which can easily be built on existing Web infrastructure. We are introducing some new HTTP request/response parameters and specific URLs, such as http://example.com/movie.ogg?track=V1,A2,TT2.
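On the server side, extracting the requested track list from such a URL takes only a few lines. This sketch assumes the comma-separated ?track= query syntax shown above, which is still being specified:

```python
from urllib.parse import urlparse, parse_qs

def requested_tracks(uri):
    """Return the track ids requested via a ?track= query parameter,
    e.g. '...?track=V1,A2,TT2' -> ['V1', 'A2', 'TT2'].
    Returns None when no track parameter is given (serve all tracks)."""
    query = parse_qs(urlparse(uri).query)
    if "track" not in query:
        return None
    # the parameter value is a comma-separated list of track ids
    return [t for t in query["track"][0].split(",") if t]

print(requested_tracks("http://example.com/movie.ogg?track=V1,A2,TT2"))
# ['V1', 'A2', 'TT2']
```

The server would then mux only the matching logical bitstreams (plus the Skeleton track) into the response.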

The set of ROE, Skeleton, CMML, and the HTTP and URI specifications will enable a very structured means of interacting with metadata-rich video on the Web. It will be distributed and integrated into the Web infrastructure, much like the Annodex set of technologies already is today.

Since I am a business owner as well as an open media enthusiast, let me add that I expect this to have a huge impact on online business around audio and video, enabling business processes and business models that are not possible today. Watch this space!

The greatest gathering of open media software developers

When I started organising the first FOMS (Foundations of Open Media Software developer workshop) in 2007, I did it because I saw a need for media hackers to get together in a room and discuss stuff in person. Email, IRC, svn, Bugzilla and wikis only get you so far in a collaboration. No remote communication tool can replace the energy and creative spirit that an in-person meeting creates, or the ability to have a beer together in the evening. Discussions are more intense, impossibilities are identified faster, progress is amazing – and the energy lasts and has an impact on the community for months after the event.

FOMS 2007 was great in that respect: some 25 hackers got to know each other for the first time, friendships were formed, trust was built, and new ideas (read: new code) were created. It was awesome and gave me the motivation to go and organise FOMS 2008. At this point, let me express my gratitude to the organising committees of both FOMS 2007 and FOMS 2008 for the support they gave me in organising both workshops – and I hope they will help again next year in Tasmania.

So then FOMS 2008 took place, and what can I say?! It totally blew me away. For me it was a much better experience than the year before because I wasn’t also organising the video recordings at LCA. I was therefore more relaxed, got involved in design discussions, and was able to sit down during the week after FOMS at LCA and actually interact with people. On a side note: thanks so much to Donna Benjamin, the main organiser of LCA 2008, for getting the FOMS participants a room to ourselves, where we were able to gather and get a whole lot of awesome work done.

Nearly the whole Xiph community was at FOMS, and issues that had been brewing for years were tabled and discussed. A large number of audio hackers were there too, and the issue of a standard sound API sparked some heated discussion. There’s a press release and the proceedings of the FOMS discussions up on the FOMS 2008 website, where you can get a complete picture of all the issues that were discussed.

In addition to FOMS, Conrad Parker and I also organised a Multimedia Miniconf at LCA. It was a great place to communicate some of the outcomes of FOMS and to present some of the latest developments in open media software in the Linux community. Video proceedings are available on the site.

Overall I must say that January has become the highlight of my year in open media software.

Quick links to Ogg-related W3C video Workshop papers

Michael Dale: Metavid & Free Online Video (University Of California at Santa Cruz)

Chris Double: Position Paper for the W3C Video On The Web Workshop (Mozilla Corporation)

Håkon Wium Lie: Opera Software’s position paper for Video on the Web (Opera Software)

Silvia Pfeiffer: Architecture of a Video Web – Experience with Annodex (Annodex Association)

Silvia Pfeiffer: Hyperlinking to time offsets: The temporal URI specification (Annodex Association)

Native javascript support for annotated and indexed media in Web browsers

Many people wonder what the future of video on the Web should be and want a more integrated and simpler video solution than what Flash provides right now.

The W3C and WHATWG’s move towards a video element in HTML5 is a good first step.

However, it is not enough.

At the recent W3C video workshop, I realised that people’s requirements and expectations go far beyond what the HTML5 spec currently provides. Most of those requirements can be satisfied with the Annodex technologies. But it will take a lot of explaining, documenting and demonstrating to show that Annodex provides these solutions in a simple yet comprehensive manner. And what’s more: any technology developed to satisfy these requirements will need to take on board many of the design decisions that we made for Annodex – so whatever the next Video Web technology turns out to be, I hope we can provide our input.

The most fundamental point to understand is that you cannot create a solution for video webs without considering all aspects of handling video on the Web in an integrated fashion. This includes topics such as the URI addressing scheme, seeking and indexing of video, the metadata and annotation scheme, and how all of this fits together with the binary video data and Web servers. Let me repeat: these topics have to be addressed together and not as separate projects, because they influence each other!

Apart from Annodex, no other existing or proposed video technology for the Web brings together all the facets required to really solve the big picture – and that includes video metadata specifications, hyperlinking approaches, codecs etc.

Having said all of this, let me demonstrate to you what I mean by full integration.

Shane Stephens has been coding on a library called liboggplay that brings native Annodex support to Web browsers, and has provided me with a video that demonstrates what you can do as a programmer once your Web browser understands Annodex. Take note of the integrated use of annotations, the simplicity of URI addressing, and the use of an adapted Web server.

Javascript video API liboggplay

The video is available in Ogg Theora format and on YouTube.

About baseline video codecs and HTML5

[I wrote this more than 8 months ago, but didn’t want to publish it at the time because I want us to solve the issues around video in HTML5 and not fight each other. But I’ve made some changes and I’m now ready to have it published.]

There’s a clash of ecosystems happening at the WHATWG mailing list around the need for the specification of a baseline codec for a future <video> tag in HTML.

The clash is mostly between the open community, which wants Ogg Theora as a recommended baseline codec, and big vendors (Apple & Nokia), which want that recommendation taken out. They claim that such a recommendation has no place in an HTML standard, which should specify tags but not recommend external file formats. From one perspective, I agree – some things are better left to software engineers to decide and left open to the market. However, in this particular instance, I think it would be a big mistake not to specify a baseline video codec. In fact, to my mind it would make the whole move to a new HTML5 standard an irrelevant exercise.

Let’s look at history and play a mind game on the consequences of such a decision.

Around the turn of the century we had a wonderfully diverse situation: RealMedia, QuickTime and WindowsMedia were all video formats that people expected to find on the Internet and use to stream video. It most certainly made business sense to the companies involved! However, it made no business sense to Web developers and media content producers, who had to set up transcoding and streaming infrastructure for all three formats in parallel if they wanted to reach all their potential clientele. I have actually seen this happen here in Australia at the ABC, which has a mandate to serve all Australian people and therefore had to provide video in all potential formats. I remember the pain written across the faces of the infrastructure people.

Fast forward a few years and the ABC can breathe a sigh of relief: by supporting Adobe Flash, it can do away with all this expensive and support-intensive infrastructure and just support one codec.

Another story from the past to keep in mind is that of PNG and GIF (see http://www.libpng.org/pub/png/pnghist.html), where the collection of royalties on the LZW compression used in GIF triggered the creation of the open and free PNG format, which became a W3C Recommendation in 1996 (see http://www.w3.org/Press/PNG-PR.en.html). Tim Berners-Lee states there: “We are seeing more of our Members adopt the format and are helping make it the industry standard.”

With these in mind, let’s try and project into the future.

Assume we do not provide a baseline codec in the spec. What will happen is that each browser adopts support for the codec that “makes business sense” to its vendor: Microsoft will support WindowsMedia and Apple will support QuickTime, while the rest will look for a “cheaper” codec, which could e.g. be MPEG-1 or Ogg Theora. Stated differently: we will end up with the same situation we had around 2001 with streaming codecs, except that Web developers and content owners still have the choice of Flash through the object/embed tag. Who will we confuse? The consumers who want to create their own content and publish it online. They will want a free and interoperable option. Since that is not to be had, they will choose what makes most sense on their OS platform – i.e. QuickTime on Macs (it comes for “free”), WindowsMedia on Windows, and Ogg Theora on Linux. Yes, this makes business sense to some of us. It will certainly make Adobe happy, because – as before – Flash will come out as the winner.

Assume instead we do provide a baseline codec in the spec. A very similar situation will actually unfold and browsers will support different codecs initially, since Ogg Theora is just a recommendation, which will probably not be implemented in Apple or MS Web browsers. However, Web developers and content owners now have, through the recommendation in the standard, a focus on which format they should provide. And they will request support for the recommended baseline format from the vendors. So there is actually a chance that the confusing mess of codec formats will get sorted out after a while. This is our chance to make things easier for Web developers and online businesses – and this is why a baseline codec is imperative.

What we need now is to address the issues that Apple, Nokia and MS have with Ogg Theora. These are mostly around submarine patents. My suggestion is that the W3C pay an independent patent attorney to perform a patent search on Ogg Theora to address the risks perceived by the big vendors. If the patent search is as comprehensive as possible, we may reach a situation where the big vendors no longer perceive a risk. There is, of course, also a risk that Theora is found to infringe specific patents. I guess we would then either correct the codebase or just put all our development efforts into Dirac instead. 🙂 In any case, all the FUD that is currently being sent both ways can then be addressed more easily with some decent data behind it.

Annodex the solution for ethnographic researchers

A few years ago, when I was still at CSIRO, I was contacted by Linda Barwick from PARADISEC to research the use of Annodex for linguists. The main problem was that ethnographic researchers publish research outcomes on paper or in HTML, which are essentially discussions about small sections of field recordings of exotic languages – yet they had no means to cite these sections through hyperlinks or any other simple interactive means. In the age of online media, that should be a trivial task, right? But it wasn’t. Annodex and the temporal URIs provided the right basis for a solution.
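The core of such a citation is a URI that addresses a time offset in the recording. As a rough sketch (the exact syntax, including the npt time scheme, is defined in the temporal URI I-D; the recording URL below is made up), a researcher’s hyperlink could be built like this:

```python
def npt_to_seconds(npt):
    """Convert an npt (normal play time) offset such as '0:03:22.5',
    '03:22.5' or '22.5' into a number of seconds."""
    seconds = 0.0
    for part in npt.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

def citation_uri(recording_uri, start_npt):
    # append a temporal query so the clip starts playing at the cited offset
    return "%s?t=npt:%s" % (recording_uri, start_npt)

print(npt_to_seconds("0:03:22.5"))  # 202.5
print(citation_uri("http://example.com/fieldwork.anx", "0:03:22.5"))
```

A server that understands such URIs can seek into the media file and deliver just the cited section – which is exactly what a citation of a field recording needs.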

Fast forward through many months of work in the EthnoER project and you get a solution for ethnographic researchers that is unique and completely based on open formats and open source software. Check out Linda’s blog entry of today!

Congratulations to everybody who has put all that effort into the project – Nick Thieberger, Linda Barwick, Shane Stephens, Stuart Hungerford, Jonathan McCabe, and all the others whom I forgot. EthnoER and Annodex might have changed the way in which linguistic research online can be published – not a small feat at all!

Editing the Skeleton and CMML standards

In the last few weeks, I’ve created an Internet-Draft (I-D – a draft specification of an IETF RFC) for the Ogg Skeleton metadata track, and updated the CMML I-D to include a new element called “caption” (see the CMML DTD). All of this is work that should have been done a long time ago, but I only got the motivation for it through the WHATWG work on HTML5, which will take Ogg Theora and Ogg Vorbis as baseline codecs. Since liboggplay is the key open source library that implements this baseline codec support, and liboggplay supports Annodex, it seems plausible that Annodex (which essentially adds Skeleton + CMML) will be available in the Web browsers of the future. So now is the time to fix up the few open issues that remain and cast the specifications into readable I-Ds.

If you haven’t seen the great functionality that will be available with liboggplay, you should check out the liboggplay javascript API. I’ve seen Shane make a demo web page through which you can toy with the javascript API, but I don’t have the link available right now.

WordPress bug? (was: Usability testing for Web2.0 sites)

Or… why password protection in WordPress sucks.

I wrote this entry and wanted to get feedback from the people it was about before posting it publicly. So I thought: WordPress has this great feature of password-protecting posts – let me use that! But then the problems started: the post was added to my RSS feed anyway and sucked in by a few planets, and people started complaining that they were not able to read it. Well, it’s great to get feedback from the community that my posts are actually being read. But it sucks that WordPress handles password-protected posts so badly. Is it something I did wrong, or is that indeed a bug in WordPress? Leave me a comment!

LCA Multimedia Miniconf

The organisers of LCA have found another slot for a miniconf, and ours is it! Yay!! We shall have an audio/video miniconf at LCA! This is particularly important since we are bringing a large number of key open media application developers to Australia for FOMS. These guys will also be able to provide deep insight and understanding in talks given to the more general LCA audience. Expect some awesome media talks at LCA!!

FOMS 2008 support by Mozilla Foundation

It is awesome to see FOMS – the open media software developer workshop we ran for the first time this year – turning into a major audio and video developer event for Linux. FOMS 2008 will be in Mel8ourne in January and will focus on audio on Linux (in particular libsydneyaudio) and on native Firefox support for Ogg Theora (in particular liboggplay). Because of the latter, FOMS has attracted sponsorship from the Mozilla Foundation. This sponsorship is very welcome, since most of the relevant developers come from overseas and are not part of large organisations that could cover the expense. Check out the current list of participants on the site – it will be another milestone event for open media! And… thanks, Mozilla Foundation!