All posts by silvia

Metadata and Ogg

I am really excited about the huge progress we made at FOMS with metadata and Ogg. The metadata specifications are actually not Ogg-specific – only their mapping into Ogg is. Here are the things that I expect will make for a very structured and sensible distributed handling of metadata on the Web.

At FOMS, we started improving CMML and are now specifying the next version of CMML. CMML is a timed text description language that can easily be multiplexed alongside audio or video data. It is very flexible with its fields and satisfies needs for hypermedia, captions, annotations and other time-aligned text. We took out the Ogg dependencies and it can now be used in any media container format. The specification is now also in an XML schema rather than a DTD, which enables us to reuse modules from XHTML and make it generally more extensible.

We introduced ROE, a description language (or a “manifest”) for multitrack media files. It describes media tracks and their dependencies and thus goes much further than the old stream and import elements in CMML, which have now been deprecated.

ROE can be used to author multitrack media files – in the Ogg case to author Ogg files with a Skeleton track and multiple media tracks. We are in the process of extending Skeleton to incorporate the description of dependencies between logical bitstreams. To complete this, we will be creating a description of how to map ROE into Ogg/Skeleton and vice versa.
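To make the idea of a manifest that records track dependencies a bit more concrete, here is a minimal Python sketch. It is not ROE syntax, nor the Skeleton extension being specified: the `Track` class, its fields, the track IDs and the `resolve` function are all hypothetical, assuming only that each track can list the tracks it depends on. Given some requested tracks, it pulls in their dependencies and orders the result so that every track comes after the tracks it needs.

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    """One entry in a hypothetical ROE-style manifest (not real ROE syntax)."""
    track_id: str
    kind: str                      # illustrative labels: "video", "audio", "text"
    lang: str = ""
    depends_on: list[str] = field(default_factory=list)

def resolve(manifest: dict[str, Track], requested: list[str]) -> list[str]:
    """Return the requested tracks plus their dependencies, dependencies first."""
    ordered: list[str] = []
    done: set[str] = set()
    in_progress: set[str] = set()

    def visit(track_id: str) -> None:
        if track_id in done:
            return
        if track_id in in_progress:
            raise ValueError(f"dependency cycle involving {track_id}")
        if track_id not in manifest:
            raise KeyError(f"unknown track {track_id}")
        in_progress.add(track_id)
        for dep in manifest[track_id].depends_on:
            visit(dep)  # depth-first: dependencies are emitted before the track itself
        in_progress.remove(track_id)
        done.add(track_id)
        ordered.append(track_id)

    for track_id in requested:
        visit(track_id)
    return ordered

if __name__ == "__main__":
    manifest = {
        "V1": Track("V1", "video"),
        "A1": Track("A1", "audio", "en"),
        "A2": Track("A2", "audio", "de"),
        # Assumption: subtitles are timed against the video, so they depend on it.
        "TT2": Track("TT2", "text", "en", depends_on=["V1"]),
    }
    print(resolve(manifest, ["TT2", "A2"]))  # V1 is pulled in ahead of TT2
```

Running the example prints `['V1', 'TT2', 'A2']`: asking only for the subtitles and the German audio still brings along the video track the subtitles are aligned to.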

ROE can also be used to negotiate with a Web client which media streams to send out of the complete manifest available on the server. For example, a Web client could request the German sound track with a movie rather than the default English one, and add English subtitles. This requires a small negotiation protocol, which can easily be built using Web infrastructure. We are introducing some new HTTP request/response parameters and specific URLs, e.g. http://example.com/movie.ogg?track=V1,A2,TT2.
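On the server side, a request such as http://example.com/movie.ogg?track=V1,A2,TT2 first has to be turned into a list of track IDs and checked against the manifest. The following Python sketch covers only that step. It assumes the `track` parameter is a comma-separated list of IDs and that a missing parameter means "send the default tracks"; both are guesses rather than part of the specification being drafted, and the function names and default set are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

def requested_tracks(url: str) -> list[str]:
    """Extract track IDs from a hypothetical ?track=ID1,ID2,... query parameter."""
    query = parse_qs(urlsplit(url).query)
    ids: list[str] = []
    # parse_qs splits only on "&", so the comma-separated list arrives as one value.
    for value in query.get("track", []):
        ids.extend(part.strip() for part in value.split(",") if part.strip())
    return ids

def select_tracks(url: str, available: set[str], default: list[str]) -> list[str]:
    """Pick the tracks to send; unknown IDs are rejected rather than silently dropped."""
    ids = requested_tracks(url)
    if not ids:
        return list(default)  # assumption: no track parameter means the default selection
    unknown = [t for t in ids if t not in available]
    if unknown:
        raise ValueError(f"unknown track IDs: {', '.join(unknown)}")
    return ids

if __name__ == "__main__":
    available = {"V1", "A1", "A2", "TT1", "TT2"}
    url = "http://example.com/movie.ogg?track=V1,A2,TT2"
    print(select_tracks(url, available, ["V1", "A1"]))  # ['V1', 'A2', 'TT2']
```

Rejecting unknown IDs keeps a mistyped request from quietly producing a file without, say, the requested subtitles; a real server would more likely map that error to an HTTP error response.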

The set of ROE, Skeleton, CMML, and the HTTP and URI specifications will enable a very structured means of interacting with metadata-rich video on the Web. It will be distributed and integrated into the Web infrastructure, much like the Annodex set of technologies already is today.

Since I am also a business owner aside from being an open media enthusiast, let me add that I expect this to have a huge impact on online business around audio and video, enabling business processes and business models that are not possible today. Watch this space!

The greatest gathering of open media software developers

When I started organising the first FOMS (Foundations of open media software developers workshop) in 2007, I did it because I saw a need to have media hackers get together in a room and discuss stuff in person. Email, IRC, SVN, Bugzilla and wikis only get you so far in collaboration. No remote communication tool can replace the energy and creative spirit that is created through an in-person meeting and the ability to have a beer together in the evening. Discussions are more intense, impossibilities are identified faster, progress is amazing – and the energy lasts and has an impact on the community for months after the event.

FOMS 2007 was great in that respect, because some 25 hackers got to know each other for the first time, friendships were formed, trust was built and new ideas (read: new code) were created. It was awesome and gave me the motivation to go and organise FOMS 2008. At this point let me express my gratitude to the organising committees of both FOMS 2007 and FOMS 2008 for the support they gave me in organising both workshops; I hope they will help again next year in Tasmania.

So then FOMS 2008 took place and what can I say!? It totally blew me away. For me it was a much better experience than the year before because I didn’t also organise the video recordings at LCA. I was therefore more relaxed, got involved in design discussions, and was able to sit down during the week after FOMS at LCA and actually interact with people. On a side note here: Thanks so much to Donna Benjamin, the main organiser of LCA 2008, for getting the FOMS participants a room to ourselves where we were able to gather and get an awesome whole lot of work done.

Nearly the whole Xiph community was at FOMS and issues that have been brewing for years were tabled and discussed. A large number of audio hackers were there, too, and the issue of a standard sound API got some heated discussion. There’s a press release and the proceedings of the FOMS discussions up on the FOMS 2008 website, where you can get a complete picture of all the issues that were discussed.

In addition to FOMS, Conrad Parker and I had also organised a Multimedia Miniconf at LCA. It was a great place to communicate some of the outcomes of FOMS and to present some of the latest developments in open media software in the Linux community. Video proceedings are available on the site.

Overall I must say that January has become the highlight of my year in open media software.

“IT’s a mad men’s world”

Last night I took part in a panel that was organised by Rachel Slattery under the title “IT’s a mad men’s world”. There were a whole lot of really fascinating women there, both in the audience and on the panel, but also some stray men, which was good to see. With me on the panel were Sue Klose, Corporate Development Director of News Digital Media; Juliet Potter, Founder & Director of www.autochic.com.au; and Tim Batten, Head of eChannels & Payments at Westpac. The panel was moderated by Sandra Davey, Director of kcollective.

The discussion was really awesome, and the stories each of us could tell of situations where we had to stand up for ourselves just for being a woman were shocking. But what really got me was the universal message that we all had: don’t let the morons get you down and don’t let go of your goals. Fight the fights that are worth it. It’s OK if not everybody loves you – you have to ask yourself: do you want to be liked or respected?

I actually did a little research before the event and wasn’t able to share half of the things I learnt. So I thought I’d put some more in this blog post.

I came across this xkcd cartoon just yesterday and thought: wow, this really captures the essence of why we have so few women in IT.

xkcd: how it works

You may be thinking that this cartoon represents a problem with male (or indeed societal) prejudices against women. I actually think the problem is deeper.

Imagine you’re a girl and have to decide on a career. You’re pretty good at many things and could be going into a technical career. But you have little experience, since you’ve had little exposure to the field and no mentors in it. Would you take the chance of exposing yourself to looking really dumb, possibly even failing? Not only are you taking the hard road for yourself if you do; there’s also the larger impact on the perception of women. By looking dumb or failing, you will cast a bad light on all women and thus confirm the prejudice, making it even harder for other women to go into the field. Now do you start to understand why there are so few women in technical jobs, and fewer each year?

You think I’m taking this too far? Don’t. Women are taught from very early on not to think only about themselves, but to be cooperative and always consider their environment. While such thoughts might not be held consciously, they are there and play a role.

What do I really want to say with this? It’s not just a matter of changing men’s and indeed society’s attitudes towards women. It’s also a matter of building up women’s self-confidence and teaching women how to be competitive and independent. And this has to start at school, by encouraging girls and introducing them to IT. Because really: “Computing is too important to be left to the men” (quote from Karen Spärck Jones).

UPDATE: I have heard from several men that they find that quote rather offensive and read it as in “we should not trust men with computing”. That is absolutely not the way I read it. I want it to be read as an encouragement to women to go into computing – it is an important field for the future of humanity and half of humanity is not taking part in shaping it. That’s just not right.

Further reading material:

Sexier new Vquence player

I’ve been meaning to write about this for a while, but haven’t found a good motivation yet. Today I stumbled across the videos from RailsConf2007 on Blip.tv and decided – this is it! I will show off the nice new sexy layout of the Vquence player with this content – after all, we are a rails shop (apart from all those other programming languages that we use).

Julian reworked the design of the player in December and did an awesome job. The image pane’s scrolling slows down as you reach the left or right border. It works similarly to a scrollbar: if you go to the middle of the image pane, it will scroll to the middle clip in the playlist. As you leave the image pane, it snaps back to focus on the clip that you are currently watching.

The new player also has a lot more text in it. As you mouse over the images, you get the titles of the clips. As you click on the (i) button, you get the annotations of the current clip (click (i) again to make it go away). At the beginning of each clip, there’s a small text reminder at the top that a click on the video will take you to the full video.

And finally – to give the video more space, the transport bar disappears as you keep watching and stop interacting with the player. This gives it more of a sit-back experience. The option to switch to full-screen display also adds to this experience.

Overall, I am really thrilled with how far we have taken the player. Enjoy!

(But should you have any feedback or suggestions for improvement, feel free to shoot me an email or leave a comment.)

Quick links to Ogg-related W3C video Workshop papers

Michael Dale: Metavid & Free Online Video (University Of California at Santa Cruz)

Chris Double: Position Paper for the W3C Video On The Web Workshop (Mozilla Corporation)

Håkon Wium Lie: Opera Software’s position paper for Video on the Web (Opera Software)

Silvia Pfeiffer: Architecture of a Video Web – Experience with Annodex (Annodex Association)

Silvia Pfeiffer: Hyperlinking to time offsets: The temporal URI specification (Annodex Association)

Native javascript support for annotated and indexed media in Web browsers

Many people wonder what the future of video on the Web should be and want a more integrated and simpler video solution than what Flash provides right now.

The W3C and WHATWG’s move towards a video element in HTML5 is a good first step.

However, it is not enough.

At the recent W3C video workshop, I realised that people’s requirements and expectations go far beyond what the HTML5 spec currently provides. Most of those requirements can be satisfied with the Annodex technologies, but it will take a lot of explaining, documenting and demonstrating to show that Annodex provides these solutions in a simple yet comprehensive manner. What’s more, any technology developed to satisfy these requirements will need to take on board many of the design decisions that we made for Annodex, so I hope that, whatever the next Video Web technology turns out to be, we will be able to provide our input.

The most fundamental point to understand is that you cannot create a solution for video webs without considering all aspects of handling video on the Web in an integrated fashion. This includes topics such as the URI addressing scheme, seeking and indexing of video, the metadata and annotation scheme, and how all of this fits together with the binary video data and Web servers. Let me repeat: these topics have to be addressed together and not as separate projects, because they influence each other!

Apart from Annodex, no other existing or suggested video technology for the Web brings together all the required facets to really solve the big picture – and that includes video metadata specifications, hyperlinking approaches, codecs etc.

Having said all of this, let me demonstrate to you what I mean by full integration.

Shane Stephens has been coding on a library that brings native Annodex support into Web browsers (called liboggplay) and has provided me with a video that demonstrates what you can do as a programmer once your Web browser understands Annodex. Take note of the integrated use of annotations. And also of the simplicity of URI addressing. And the use of an adapted Web server.

Javascript video API liboggplay

The video is available in Ogg Theora format and on YouTube.

About baseline video codecs and HTML5

[I wrote this more than 8 months ago, but didn’t want to publish it at the time because I want us to solve the issues around video in HTML5 and not fight each other. But I’ve made some changes and I’m now ready to have it published.]

There’s a clash of ecosystems happening at the WHATWG mailing list around the need for the specification of a baseline codec for a future <video> tag in HTML.

The clash is mostly between the open community, which wants Ogg Theora as a recommended baseline codec, and big vendors (Apple & Nokia), which wanted that recommendation taken out. They claim that such a recommendation has no place in an HTML standard, which should specify tags but not recommend external file formats. From one perspective, I agree – some things are better left to the software engineers to decide and left open to the market. However, in this particular instance, I think it would be a big mistake not to specify a baseline video codec. In fact, it would in my mind make the whole move to a new HTML5 standard an irrelevant exercise.

Let’s look at history and play a mind game on the consequences of such a decision.

Around the turn of the century we had a wonderfully diverse situation: RealMedia, QuickTime and WindowsMedia were all video formats that people expected to find on the Internet and to use for streaming. It most certainly made business sense to the companies involved! However, it made no business sense to Web developers and media content producers. They had to set up a transcoding and streaming infrastructure for all three formats in parallel if they wanted to reach all their potential clientele. I have actually seen this happening here in Australia at the ABC, which has a mandate to serve all the Australian people and therefore had to provide video in all potential formats. I remember the pain that was written across the faces of the infrastructure people.

A few years fast forward and the ABC can now give sighs of relief: supporting Adobe Flash, they can do away with all this expensive and support-intensive infrastructure and just support one codec.

Another story from the past to keep in mind is that of PNG and GIF (http://www.libpng.org/pub/png/pnghist.html), where the collection of royalties on the GIF codec prompted the creation of the open and free PNG format, which became a W3C recommendation in 1996 (see http://www.w3.org/Press/PNG-PR.en.html). In that press release, Tim Berners-Lee states: “We are seeing more of our Members adopt the format and are helping make it the industry standard.”

With these in mind, let’s try and project into the future.

Assuming we do not provide a baseline codec in the spec, what will happen is that we will see each browser adopt support for the codec that “makes business sense”, i.e. Microsoft will support WindowsMedia, and Apple will support QuickTime, while the rest will be looking for a “cheaper” codec which could e.g. be MPEG-1 or Ogg Theora. Or stated differently: we will end up with the same situation that we had around 2001 with streaming codecs, except that Web developers and content owners still have the choice of Flash through the object/embed tag. Who will we confuse? The consumers who will be wanting to create their own content and publish it online. They will want a free and interoperable option. Since that’s not to be had, they will choose what makes most sense on their OS platform – i.e. QuickTime on Macs (comes for “free”), WindowsMedia on Windows, and Ogg Theora on Linux. Yes, this makes business sense to some of us. It will certainly make Adobe happy because – as before – Flash will come out as the winner.

Assuming we do provide a baseline codec in the spec, a very similar situation will actually happen and the browsers will support different codecs initially, since Ogg Theora is just a recommendation, which will probably not be implemented in Apple or MS Web browsers. However, now Web developers and content owners have a focus on what format they should be providing through the recommendation in the standard. And they will request support for the recommended baseline format from the vendors. So there may actually be a chance that the confusing mess of codec formats gets sorted out after a while. This is the chance we have to make things easier for Web developers and online businesses – and this is why a baseline codec is imperative.

What we now need is to address the issues of Apple, Nokia and MS with Ogg Theora. These are mostly around submarine patents. My suggestion is that the W3C pay an independent patent attorney to conduct a patent search on Ogg Theora to address the perceived risks of the big vendors. If the patent search is as comprehensive as possible, we may reach a situation where the big vendors no longer perceive the risk. However, there is also a risk that Theora is found to infringe specific patents. I guess we will then either correct the codebase or just put all our development efforts into Dirac. 🙂 In any case, all the FUD that is currently being sent both ways can then be addressed more easily with some decent data behind it.

The Future of Video on the Web

We are in the middle of a big technological change for the dear old World Wide Web. And it will have a massive impact on how we are using video on the Web.

The Web Hypertext Application Technology Working Group (WHATWG) is defining an all-new HTML5 standard, which will have a native video tag (just as current HTML4 has a native img tag).

The W3C is wondering how to go even beyond that onto a road that will make video a first-class citizen on the Web. Next week, a W3C Video Workshop will be held on that exact topic.

Funnily enough, when we described the aim of the Annodex project at CSIRO in the year 2000, we used those exact words: how to make video a first-class citizen on the Web. At that time, people thought we were crazy. Now that YouTube is a commonly accepted phenomenon, we can actually see the limitations of existing video technology on the Web: we still cannot interact as naturally with video as we do with Web pages, we still cannot search well for video, and we still cannot mash up video as easily as we do with HTML pages, e.g. through RSS feeds.

I will be travelling to the US next week to share our experiences with Annodex with the Web world and to give my input on what the future of video on the Web should look like. To that end, I have submitted two position papers to the workshop – one on Temporal URIs and one on our experiences with Annodex and CMML. Check out the other cool talks on the agenda or even the full list of position papers that got submitted!

Also, I have just been asked whether I would like to be part of the “Future of Video and Next Steps” Panel on the second day of the workshop – a panel that has been very well selected to represent online and traditional video technology, content interests, and consumer interests. I am looking forward to a very lively discussion and a great overall workshop that may be the first step towards a better video web.

Video on the Web is still only at the beginning of its evolution – comparable to the evolution that film and movie theatres have gone through over the last hundred years. It’s awesome to be working on the next technology revolution and to see that the best is yet to come!