For the last few years, Ogg has struggled to recommend the best format for caption and subtitle support in Ogg Theora. The OGM fork had a firm focus on using subtitles in SRT, SSA or VobSub format. However, in Ogg we have always found these too simplistic and wanted a more comprehensive solution. The main aim was to have timed text included in the video stream in a time-aligned fashion. Writ, CMML, and now Kate all do this. And yet we have still not defined the one format that we want everybody to support for captions and subtitles.

With Ogg Theora having been chosen by Mozilla as the baseline video codec for Firefox and the HTML5 <video> tag, Mozilla is looking to solve this problem in a community fashion: the solution needs to be acceptable to Xiph, supported by Opera, who are also experimenting with Ogg Theora, and ultimately lead to a proposal to the W3C and WHATWG that can sensibly be included in HTML5.

As a first step in this direction, Mozilla have contracted me to analyse the situation and propose a way forward.

The contract goes beyond simple captions and subtitles, though: it analyses all accessibility requirements for video, which include audio annotations for the blind and sign language video tracks, as well as transcripts, karaoke, and metadata tracks as more generic examples of timed text tracks. The analysis will thus be about how to enable a framework for creating a timed text track in Ogg and which concrete formats should be supported for each of the required functionalities.

While I can do much of the analysis myself, a decision on how to move forward can only be made with lots of community input. The whole process of this analysis will therefore be an open one, with information being collected on the Mozilla Wiki.

An open mailing list has also been set up at Xiph as a discussion forum for video accessibility. Join it if you’d like to provide input. I am particularly keen for people with disabilities to join because we need to get it right for them!

I am very excited about this project and feel honoured to be supported in helping to solve accessibility issues for Ogg and Firefox! Let’s get it right!

  1. Hi. As a website developer with my own hobby site and some student projects I did at TAFE, as someone looking for part-time IT employment, and as a programming student using Visual 2008, I am also interested in accessibility for video and audio. I download video tracks from YouTube and sites like ninemsn. For example, I went to the video about what the ladies were wearing on the Wide World of Sports link, using JAWS, the screen reader I use with Internet Explorer 7 and Firefox 3.0, and the WebVisum accessibility plugin, which lets the user hear the contents of CAPTCHAs. You download and install the software and get an invite, I think.
    It is a Firefox add-on. When you install it and use Firefox 3.0, which I have done, for example when signing up on a site, it allows you to send the CAPTCHA to the site using Alt+Control+6 or 7 on the number row of the keyboard. The site then sends it back to your browser, and you are able to hear the CAPTCHA word, then copy and paste it. So this is useful for sites that use CAPTCHAs. But I am digressing. It would be handy if there were audio narration or audio description, as on some DVDs where you have a separate audio track with someone describing the non-dialogue scenes. For example, on a DVD like Spider-Man 3, you can go to special features and select audio description. For the video I was listening to today, it would have been handy if the non-dialogue scenes had been audio described; I would have liked to know what the ladies were wearing: what colour, what type, what shape, etc. You get the idea. Just my thoughts as a blind computer user, using the web with assistive technologies.

  2. Hi Marvin,

    Thanks for the feedback; the description of what you suggest is useful. We are definitely looking at captions and audio annotations.

    In fact, we are thinking about having audio annotations as text to make them more easily usable.

    They can then be passed to a text-to-speech (TTS) reader and played on a separate channel (possibly even at a faster speed), displayed as braille, translated into a different language, or even indexed by a search engine. All of this would be in addition to the actual sound track of the video.

    Would that make sense to you?


  3. Well, initially I have no suggestions because I need to look at what’s really going on first, but let me thank you for what you are going to do during the following weeks or months. As an audio geek, I have been following Ogg Vorbis development from the user’s perspective for some four or more years. I also use Vorbis when doing some private recordings here and there. I am interested in Mozilla’s browser because of their dedication to accessibility. Now that Xiph and Mozilla are “joining forces”, I really feel it might lead to success, at least from my personal perspective. You know, I find it surprising that hunting for audio-related news in Xiph’s mailing lists brought me to this resource.

