Category Archives: Open Source

Embedding time-aligned text into Ogg

As part of my accessibility work for Mozilla and Xiph, it is necessary to define how time-aligned text such as subtitles, captions, or annotations is encapsulated into Ogg. In the fansubber community this is called “hard subtitles”, as opposed to “soft subtitles”, which stay in a text file, are loaded into the media player separately from the video file, and are synchronised with the video by the media player. (As per the comment below, all text annotations are “soft” – or also “closed”.)

I can hear you ask: so how do I do subtitles/captions with Ogg now? Well, it would have been possible to simply choose one subtitling format and map that into Ogg, then ask everyone to just use that one format and be done. But which one to choose? And why prefer a simpler one over a more complex one? And why just do subtitles and not any other time-aligned text?

So, instead, I analysed what types of time-aligned text “codecs” I have come across. Each one would have a multitude of text formats to capture the text data, because it is easy to invent a new format and standardisation hasn’t really happened in this space yet.

I have come up with the following list of typical time-aligned text codecs:

  • CC: closed captions (for the deaf)
  • SUB: subtitles
  • TAD: textual audio descriptions (for the blind – to be transferred to braille or TTS)
  • KTV: karaoke
  • TIK: ticker text
  • AR: active regions
  • NB: metadata & semantic annotations
  • TRX: transcripts / scripts
  • LRC: lyrics
  • LIN: linguistic markup
  • CUE: cue points, DVD style chapter markers and similar navigational landmarks

Let me know if you can think of any other classes of video/audio-related time-aligned text.

All of these texts can be represented in text files with some kind of time marker, and possibly some header information to set up the interpretation environment. So, the simplest way of creating a representation of these inside Ogg was to define a generic mapping for time-aligned text into Ogg.

The Xiph wiki holds the current draft specification for mapping text codecs into Ogg. For anyone wanting to map a text codec into Ogg, this should provide the framework. The idea is to separate the text codec’s data into header data and into timed text segments (which can have all sorts of styling and other information with it). Then, the mapping is simple. An example for srt is described on the wiki page.
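
To make that split concrete, here is a minimal sketch of the two-part structure – header data plus timed text segments. The class and field names are my own for illustration and are not taken from the draft specification:

```python
from dataclasses import dataclass

@dataclass
class TextHeader:
    codec: str      # e.g. "srt"
    language: str   # e.g. "en"
    category: str   # e.g. "SUB", "CC" or "LRC" - one of the classes above

@dataclass
class TextSegment:
    start_ms: int   # presentation start time in milliseconds
    end_ms: int     # presentation end time in milliseconds
    text: str       # the payload, possibly carrying styling markup

# A full text track is then one header plus a time-ordered list of segments,
# which is what would get packetised into an Ogg logical bitstream.
track = (TextHeader("srt", "en", "SUB"),
         [TextSegment(0, 2000, "Hello"), TextSegment(2500, 4000, "world")])
```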

The specification is still in draft status, because we’re still expecting feedback. In fact, what we now need is people trying an implementation and providing fixes to the specification.

To map your text codec of choice into Ogg, you will probably require further mapping specifications. Depending on how complex your text codec of choice is, these additional mapping specifications may be rather simple or quite complicated. In the case of srt, it should be trivial. Considering the massive amount of srt content already freely available online, the srt mapping may well have a really large impact. Enough hints. Let me know if you’re coding up something!
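
As an illustration of how trivial the srt case is, here is a small sketch of splitting an srt file into timed text segments of the kind the mapping needs. The helper names are my own and not part of any specification:

```python
import re

# SRT timestamps look like "00:01:02,500" (hours:minutes:seconds,milliseconds).
TIME = re.compile(r"(\d+):(\d+):(\d+),(\d+)")

def srt_time_to_ms(stamp):
    """Convert one SRT timestamp to milliseconds."""
    h, m, s, ms = map(int, TIME.match(stamp).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms

def parse_srt(text):
    """Yield (start_ms, end_ms, text) cues from an SRT document.

    SRT cues are blank-line separated blocks: a counter line,
    a timing line "start --> end", then one or more text lines.
    """
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        start, _, end = lines[1].partition(" --> ")
        yield srt_time_to_ms(start), srt_time_to_ms(end.strip()), "\n".join(lines[2:])
```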

My next duty is to look for a representation that is generic enough to provide representations for any of the above listed text codecs. This representation is what will need to be available to a Web browser when working with a Web video that has related text. Current contenders are OggKate and W3C TimedText, but I am not sure whether either is too restrictive. I am indeed looking for the next generation of captioning technology that will be able to provide any type of time-aligned text that relates to audio/video.

News from the open media world

Today, there was so much news that I can only summarise it in a short post.

The guys from Collabora have announced that they are going to support the development of PiTiVi – one of the best open source video editors around. They are even looking to hire people to help Edward Hervey, the author of PiTiVi. The plan is to have a feature-rich video editor ready by April next year that is comparable in quality to basic proprietary video editors.

The BBC Dirac team have today announced an ffmpeg2dirac software package, which is built along the same lines as the commonly used ffmpeg2theora and, of course, transcodes any media stream to Ogg Dirac/Vorbis. With Ogg Dirac/Vorbis playback already available in vlc and mplayer, this covers the much-needed creation side of Ogg Dirac/Vorbis files. Dirac is an open source, non-patent-encumbered video codec developed by the BBC. It creates higher quality video than Theora at comparable bitrates.

The FOMS (Foundations of Open Media Software) hacker workshop announced today the current list of confirmed participants for the January workshop. It seems that this year we have a big focus on open video codecs, on browser support of media, on open Flash software, and on media frameworks. It is still possible to take part in the workshop – check out the CFP page.

Finally an important security message: Mozilla has decided to put a security measure around the HTML5 audio and video elements that will stop them from being exploited by cross-site scripting exploits. Chris Double explains the changes that are necessary to your setup to enable your published audio or video to be displayed on domains that are different to the domain on which these files are hosted.

Theora 1.0 released!

While the open source codec “Theora” has been available in a stable format since 2004, the open source community is very careful about giving any piece of software the “1.0” stamp of quality, and so libtheora has been under scrutiny for years.

Today, libtheora 1.0 was finally released – rejoice and go ahead using it in production!

More hard-core improvements to libtheora are also in the pipeline under a version nicknamed “Thusnelda”, which mostly improves quality and bit-rate.

W3C Technical Plenary / Advisory Committee Meetings Week 2008

I spent last week in France, near Cannes, at the W3C TPAC meeting. This is the one big meeting that the W3C has every year to bring together all (or most) of the technical working groups and other active groups at the W3C.

It was not my first time at a standards body meeting – I have been part of ISO/MPEG before and also of the IETF, and have spoken with people at the IEEE and SMPTE. However, this time was different. I felt like I was with people that spoke my language. I also felt like my experience was valued and will help solve some of the future challenges for the Web. I am very excited to be an invited expert on the Media Fragments and Media Annotations working groups and to be able to provide input into HTML5.

In the Media Fragments working group we are developing a URI addressing scheme that enables direct linking to media fragments, in particular temporal and spatial segments. Experience from our earlier temporal URI scheme is one of the inputs to the scheme. Currently it looks likely that we will choose a scheme that has “#” in it and then require changes to browsers, Web proxies, and servers to enable delivery of media fragments.
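
To illustrate what such a “#”-based temporal addressing scheme could look like on the client side, here is a small sketch. The exact syntax was still under discussion at the time, so treat the “t=start,end” form (and the helper name) as my own assumption rather than the working group’s result:

```python
def parse_temporal_fragment(uri):
    """Extract a (start, end) pair in seconds from a '#t=start,end'
    media fragment URI.

    Returns None if no temporal fragment is present; end is None for
    open-ended fragments like '#t=5'. A sketch only: a real scheme
    would also need to handle npt/smpte clock formats and spatial axes.
    """
    _, sep, frag = uri.partition("#")
    if not sep:
        return None
    for part in frag.split("&"):
        if part.startswith("t="):
            start, _, end = part[2:].partition(",")
            return (float(start) if start else 0.0,
                    float(end) if end else None)
    return None
```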

In the Media Annotations working group we are deciding upon an ontology to generically describe media resources – something based on Dublin Core but more extended and more appropriate for audio and video. We are currently looking at Adobe’s XMP specification.

As for HTML5 – there was not much discussion at the TPAC meeting about the audio and video elements (unless I missed it by attending the other groups). However, from some of the discussions it became clear to me that they are still at a very early stage of specification and much can be done to help define the general architecture of how to publish video on the Web and its metadata, help define JavaScript APIs and DOM models, and help define accessibility.

I actually gave a lightning talk about the next challenges of HTML5 video at TPAC (see my “video slides“) which points out the need for standard definitions of video structure and annotations together with an API to reach them. I had lots of discussions with people afterwards and also learnt a lot more about how to do accessibility for Web video. I should really write it up in an article…

Of course, I also met a lot of cool people at TPAC, amongst them Larry Masinter, Ian Hickson, and Tim Berners-Lee – past and new heroes of Web standards. 🙂 It was totally awesome and I am very grateful to Mozilla for sending me there and enabling me to learn more about the bigger picture of video accessibility and the role it plays on the Web.

Demo of new HTML5 features

Ian Hickson, the main editor of the new HTML5 specification, gave a talk about some of the cool new features in HTML5 and some of the early implementations of these features in different browsers.

It’s a pretty long demo at 1:25 hrs, but he types in all the code manually, so you can re-do all of the demos yourself. The script of the talk with code examples is here.

The first 5 minutes are about the new video element and really worth watching.

Also, at 1:11 hrs, Ian is asked about the choice of baseline codecs, in case you want to hear him say what he has publicly written elsewhere.

I can’t wait to marry the video features with:

  1. the new media fragment addressing schemes in development at the W3C
  2. captions, subtitles and other timed text annotations for videos.

These will allow us to search for specific topics directly inside the video (such as “form controls” in Ian’s video) and to hyperlink straight into these time offsets. A completely new world is coming!

Video Accessibility for Firefox

Ogg has struggled for the last few years to recommend the best format to provide caption and subtitle support for Ogg Theora. The OGM fork had a firm focus on using subtitles in SRT, SSA or VobSub format. However, in Ogg we have always found these too simplistic and wanted a more comprehensive solution. The main aim was to have timed text included into the video stream in a time-aligned fashion. Writ, CMML, and now Kate all do this. And yet, we have still not defined which is the one format that we want everybody to support as the caption/subtitle format.

With Ogg Theora having been chosen by Mozilla as the baseline video codec for Firefox and the HTML5 <video> tag, Mozilla is looking to solve this problem in a community fashion: the solution needs to be acceptable to Xiph, supported by Opera who are also experimenting with Ogg Theora, and ultimately provide a proposal to the W3C and WHATWG that can sensibly be included into HTML5.

As a first step in this direction, Mozilla have contracted me to analyse the situation and propose a way forward.

The contract goes beyond simple captions and subtitles though: it analyses all accessibility requirements for video, which includes audio annotations for the blind, sign language video tracks, and also transcripts, karaoke, and metadata tracks as more generic timed text example tracks. The analysis will thus be about how to enable a framework for creating a timed text track in Ogg and which concrete formats should be supported for each of the required functionalities.

While I can do much of the analysis myself, a decision on how to move forward can only be made with lots of community input. The whole process of this analysis will therefore be an open one with information being collected on the Mozilla Wiki, see https://wiki.mozilla.org/Accessibility/Video_Accessibility .

An open mailing list is also set up at Xiph to create a discussion forum for video accessibility: accessibility@lists.xiph.org. Join there if you’d like to provide input. I am particularly keen for people with disabilities to join because we need to get it right for them!

I am very excited about this project and feel honoured for being supported to help solve accessibility issues for Ogg and Firefox! Let’s get it right!

“Venturous Australia” at Pearcey awards event

Yesterday was a long and fascinating day of discussions about innovation in Australia.

At this year’s Pearcey Medal and NSW Pearcey State Award event, the focus was on the recently released innovation report by Terry Cutler, with particular attention to its effects on ICT (Information and Communication Technology).

If you only look at the summary report, you will miss the structure of the full report, which is why I have outlined it here:

  • Chapter 1: stalling not sprinting
  • Chapter 2: the national innovation system
  • Chapter 3: innovation in business
  • Chapter 4: the case for a public role in innovation
  • Chapter 5: strengthening people and skills
  • Chapter 6: building excellence in national research
  • Chapter 7: information and market design
  • Chapter 8: tax and innovation
  • Chapter 9: market facing programs
  • Chapter 10: innovation in government
  • Chapter 11: national priorities for innovation
  • Chapter 12: governance of the innovation system

I took home a few very interesting observations from reading the reports and from the discussions at the Pearcey event.

But before I can comment, I have to state which organisations I see as ICT innovators in Australia.

  • The government-funded ones are the Universities, NICTA and CSIRO (CRCs fall in the same general class).
  • The big drivers of transforming new research outcomes into business are start-ups and the SMEs.
  • Further innovation happens in large companies and multi-nationals with a stronger focus on incremental innovation rather than disruptive innovation.
  • In ICT, we need to add another big driver of innovation: open source software. I’ll explain this later in more depth.


The following observations on VenturousAustralia and what I took away from the Pearcey awards are on these topics:

  1. Support of fundamental R&D in ICT
  2. Commercialisation of ICT innovation
  3. Enabling SMEs to succeed
  4. Regard for the contribution of Open Source

TOPIC: ICT and innovation

At the Pearcey awards, we had long discussions about whether ICT was appropriately represented in the report and whether the recommendations are pushing ICT further into a supportive role while missing our opportunities to innovate and lead in core ICT.

It is generally accepted that ICT has a major effect on the productivity increase of almost all Australian industries. DCITA reports show that, in service industries, between 35 and 65 per cent of productivity growth is estimated to have been driven by technological factors.

Ogg Theora video, Dailymotion and OLPC

Today, three of the worlds that I am really engaged in, and that tend to not have much in common with each other, seemed to come to a sudden overlap.

The three worlds I am talking about are:

  • Social video publishing (through my company Vquence)
  • One Laptop Per Child (I am really keen to see more OLPC work in the Pacific)
  • Open media software and technology (through Xiph and Annodex work, as well as FOMS)

I was positively surprised to read in this blog post that Dailymotion and the OLPC Foundation have partnered to set up a video publishing channel for videos that can be viewed on the OLPC. The channel is available at olpc.dailymotion.com. You can view it on your computer if you have the appropriate codec libraries for Windows or the Mac installed. Your Linux computer should support it out of the box.

To understand the full impact of this message, you have to understand that the XO (the OLPC laptop) does not support the playback of Flash video by default. OLPC cannot ship the official Adobe Flash plugin on the XOs because it is legally restricted and doesn’t meet the OLPC’s standards for open software. Thus, children who receive an XO are somewhat cut off from social video sites like YouTube, Dailymotion, Blip.tv, MySpace.tv, video.google.com and others, even though there are lots of education-relevant videos published there.

The XO however ships with video technology that IS open: namely the Ogg Theora/Vorbis video codec and software. This is incidentally also the codec that the next version of Firefox will be supporting out of the box without need of installation of a further plugin.

Unfortunately, most video content nowadays available on the Internet is not available in the Ogg Theora/Vorbis format. Therefore, Dailymotion and the OLPC Foundation launching this channel that is automatically republishing all the videos uploaded to the Dailymotion OLPC group is a really big thing: It’s a major social video site republishing video in an open format to enable it to be viewed on open systems.

New Ogg MIME Types ratified

The IETF has just ratified RFC 5334 “Ogg Media Types”, which I have co-authored.

The new Ogg MIME types are as follows:

  • audio/ogg for all Ogg files that contain predominantly audio, such as Ogg Vorbis files (.ogg or .oga), Ogg Speex files (.spx) or Ogg FLAC files. The recommended file extension is .oga, but .ogg will continue to be used for Ogg Vorbis I files for backwards compatibility.
  • video/ogg for all Ogg files that contain predominantly video, such as Ogg Theora or Ogg Dirac files. The recommended file extension is .ogv. Please stop using .ogg for Ogg Theora files, since that causes havoc for any application trying to determine which program to use for opening such a file.
  • application/ogg used to be the MIME type recommended for any Ogg-encapsulated file. This is obsoleted by the new RFC. Instead, application/ogg is now a generic MIME type that can be used for Ogg files containing custom content tracks. This may, for example, be an Ogg file with 5 Vorbis, 2 Speex, 2 Theora, 5 CMML, 2 Kate, and a custom image track. Such files have to use the Skeleton extension to Ogg to be able to describe the content of the file. The recommended file extension is .ogx.

The RFC also specifies the possibility of adding a codecs parameter to the MIME types to state directly within the MIME type which codecs are contained inside the file. This may for example be: video/ogg; codecs="dirac,speex,CMML".
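
For illustration, the extension-to-MIME-type rules above could be tabulated like this. The table and function names are my own convenience helpers, not something defined by the RFC:

```python
import os

# Map the file extensions discussed above to their RFC 5334 MIME types.
OGG_MIME_TYPES = {
    ".oga": "audio/ogg",        # predominantly audio (Vorbis, Speex, FLAC)
    ".ogg": "audio/ogg",        # kept for Ogg Vorbis I backwards compatibility
    ".spx": "audio/ogg",        # Ogg Speex
    ".ogv": "video/ogg",        # predominantly video (Theora, Dirac)
    ".ogx": "application/ogg",  # custom multi-track files using Skeleton
}

def ogg_mime_type(filename, codecs=None):
    """Return the MIME type for an Ogg file, optionally adding the
    codecs parameter that the RFC allows."""
    ext = os.path.splitext(filename)[1].lower()
    mime = OGG_MIME_TYPES.get(ext, "application/ogg")
    if codecs:
        mime += '; codecs="%s"' % ",".join(codecs)
    return mime
```

A web server could use such a table to stop sending Theora files as audio/ogg, which is exactly the confusion the RFC tries to end.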

More details on these decisions and on further considered MIME types are in the Xiph wiki.

Disclaimer: I had no influence on the funny number game that happened between the obsoleted rfc3534 and the new rfc5334. 🙂

Happy MIME-typing!!