Tag Archives: Ogg

Accessibility support in Ogg and liboggplay

At the recent FOMS/LCA in Wellington, New Zealand, we talked a lot about how Ogg could support accessibility. Technically, this means support for multiple text tracks (subtitles/captions), multiple audio tracks (audio descriptions parallel to main audio track), and multiple video tracks (sign language video parallel to main video track).

Creating multitrack Ogg files
The creation of multitrack Ogg files is already possible using one of the muxing applications, e.g. oggz-merge. For example, I have my own little collection of multitrack Ogg files at http://annodex.net/~silvia/itext/elephants_dream/multitrack/. But then you are stranded with files that no player will play back.
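
For example, a command along the following lines merges a video, an audio description track and a Kate caption track into a single multitrack file (the file names are placeholders; -o names the output file):

  oggz-merge -o elephants_dream_multitrack.ogv video.ogv audio_description.oga captions_kate.ogg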

Multitrack Ogg in Players
As Ogg is now being used in multiple Web browsers through the new HTML5 media elements, there are in particular requirements for accessibility support for hard-of-hearing and vision-impaired users. Either multitrack Ogg needs to become more of a common case, or the association of external media files that provide synchronised accessibility data (captions, audio descriptions, sign language) with the main media file needs to become a standard in HTML5.

As it turns out, both these approaches are being considered and worked on in the W3C. Accessibility data that are audio or video tracks will in the near future have to come out of the media resource itself, but captions and other text tracks will also be available from external associated elements.

The availability of internal accessibility tracks in Ogg is a new use case – something Ogg has been ready to do, but has not gone into common usage. MPEG files on the other hand have for a long time been used with internal accessibility tracks and thus frameworks and players are in place to decode such tracks and do something sensible with them. This is not so much the case for Ogg.

For example, a current VLC build installed on Windows will display captions, because Ogg Kate support is activated. A current VLC build on any other platform, however, has Ogg Kate support deactivated in the build, so captions won’t display. This will hopefully change soon, but we have to look also beyond players and into media frameworks – in particular those that are being used by the browser vendors to provide Ogg support.

Multitrack Ogg in Browsers
Hopefully gstreamer (which is what Opera uses for Ogg support) and ffmpeg (which is what Chrome uses for Ogg support) will expose all available tracks to the browser so they can expose them to the user for turning on and off. Incidentally, a multitrack media JavaScript API is in development in the W3C HTML5 Accessibility Task Force for allowing such control.
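
To illustrate the kind of control such an API might give scripts, here is a purely hypothetical sketch; neither the property names nor the track kinds are part of any finalised specification at this point:

  // Hypothetical sketch of a multitrack API; all names are illustrative only.
  var video = document.getElementsByTagName("video")[0];
  // Imagine the browser exposing the tracks it found in the Ogg file as a
  // list of objects with kind, language, label and an enabled flag:
  for (var i = 0; i < video.tracks.length; i++) {
    var track = video.tracks[i];
    if (track.kind === "captions" && track.language === "en") {
      track.enabled = true;   // turn English captions on
    }
    if (track.kind === "audiodescription") {
      track.enabled = true;   // play the audio description in parallel
    }
  }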

The current version of Firefox uses liboggplay for Ogg support, but liboggplay’s multitrack support has been sketchy thus far. So, Viktor Gal – the liboggplay maintainer – and I sat down at FOMS/LCA to discuss this, and Viktor developed some patches to make the demo player in the liboggplay package, the glut-player, support the accessibility use cases.

I applied Viktor’s patch to my local copy of liboggplay and I am very excited to show you the screencast of glut-player playing back a video file with an audio description track and an English caption track all in sync:

elephants_dream_with_audiodescriptions_and_captions

Further developments
There are still important questions open: for example, how will a player know that an audio description track is to be played together with the main audio track, whereas a dub track (e.g. a German dub for an English video) is to be played as an alternative? Such per-track metadata is something that Ogg is still missing, but that Ogg can be extended with fairly easily through the use of the Skeleton track. It is something the Xiph community is now working on.
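
To give an idea of the direction, the fisbone packets in Skeleton already carry per-track message headers in a Name: Value form (e.g. Content-Type), so such role metadata could be added in the same style. Here is a sketch of what the headers for an audio description track might look like; the Role and Language field names are only illustrative of the proposal, not a finalised specification:

  Content-Type: audio/vorbis
  Role: audio/audiodesc
  Language: en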

Summary
This is great progress towards accessibility support in Ogg and therefore in Web browsers. And there is more to come soon.

More video accessibility work

It’s already old news, but I am really excited about having started a new part-time contract with Mozilla to continue pushing the HTML5 video and audio elements towards accessibility.

My aim is two-fold: firstly to improve the HTML5 audio and video tags with textual representations, and secondly to hook up the Ogg file format with these accessibility features through an Ogg-internal text codec.

The textual representation that I am after is closely based on the itext elements I have been proposing for a while. They are meant to be a simple way to associate external subtitle/caption files with the HTML5 video and audio tags. I am initially looking at the srt and DFXP formats, because I think they mark the two extremes of a spectrum of time-aligned text formats from simple to complex. I am preparing a specification and javascript demonstration of the itext feature and will then be looking for constructive criticism from accessibility, captioning, Web, video and any other experts who care to provide input. My hope is to move the caption discussion forward in the WHATWG and ultimately achieve a cross-browser standard means for associating time-aligned text with media streams.
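
As an illustration of the kind of javascript demonstration I have in mind, here is a sketch of one possible approach (not the itext specification itself; the srt file name and caption element are placeholders): it fetches an external srt file, parses it into cues, and renders the matching cue while the video plays:

  // Sketch only: attach an external srt file to a <video> element.
  function toSeconds(t) {
    // "00:00:10,500" -> 10.5
    var parts = t.replace(",", ".").split(":");
    return parseInt(parts[0], 10) * 3600 + parseInt(parts[1], 10) * 60 + parseFloat(parts[2]);
  }
  function parseSrt(text) {
    var cues = [];
    var blocks = text.replace(/\r/g, "").split("\n\n");
    for (var i = 0; i < blocks.length; i++) {
      var lines = blocks[i].split("\n");
      if (lines.length < 3) continue;          // need cue number, timing line, text
      var times = lines[1].split(" --> ");
      cues.push({ start: toSeconds(times[0]),
                  end: toSeconds(times[1]),
                  text: lines.slice(2).join("\n") });
    }
    return cues;
  }
  var video = document.getElementsByTagName("video")[0];
  var display = document.getElementById("captions");   // a div below the video
  var xhr = new XMLHttpRequest();
  xhr.open("GET", "captions_en.srt", true);             // placeholder file name
  xhr.onreadystatechange = function () {
    if (xhr.readyState !== 4) return;
    var cues = parseSrt(xhr.responseText);
    video.addEventListener("timeupdate", function () {
      var now = video.currentTime, text = "";
      for (var i = 0; i < cues.length; i++) {
        if (now >= cues[i].start && now <= cues[i].end) text = cues[i].text;
      }
      display.innerHTML = text;
    }, false);
  };
  xhr.send(null);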

The Ogg-internal solution for subtitles – and more generally for time-aligned text – is then a logical next step towards solving accessibility. From the many discussions I have had on the topic of how best to associate subtitles with video, I have learnt that there is a need for both: external text files with subtitles, as well as subtitles that are multiplexed with the media into a single binary file. Here, I am particularly looking at the Kate codec as a means of multiplexing srt and DFXP into Ogg.

Eventually, the idea is to have a seamless interface in the Web Browser for dealing with subtitles, captions, karaoke, timed metadata, and similar time-aligned text. The user interaction should be identical no matter whether the text comes from within a binary media file or from a secondary Web resource. Once this seamless interface exists, hooking up accessibility tools such as screen readers or braille devices to the data should in theory be simple.

Firefox plugin to encode Ogg video

Michael Dale just posted this to theora-dev. Go to one of the given URLs to install the Firefox plugin that lets you transcode video to Ogg using your Web browser.

Firefogg is developed by Jan Gerber and lives at http://www.firefogg.org/. There is a javascript API available so you can make use of Firefogg in your own Website project to allow people to upload any video and transcode it to Ogg on the fly.

Enjoy!

On Fri, Jun 5, 2009 at 7:08 AM, Michael Dale wrote:
> I mentioned it in the #theora channel a few days ago but here it is with
> a more permanent url:
>
> http://www.firefogg.org/make/advanced.html
> &
> http://www.firefogg.org/make/
>
> These will be simple links you can send people so that they can encode
> source footage to a local ogg video file with the latest and greatest
> ogg encoders (presently thusnelda and vorbis). Updates to thusnelda and
> possible other free codecs will be pushed out via firefogg updates 😉
>
> Pass along any feedback if things break or what not.
>
> I am also doing testing with “embed” these encoder interface. For those
> familiar with jQuery: an example to rewrite all your file inputs with
> firefogg enhanced inputs: $(“input:[type=’file’]”).firefogg() … Feel
> free to expeirment based on those examples. The form rewrite has mostly
> only been tested in the mediaWiki context:
> http://sandbox.kaltura.com/testwiki/index.php/Special:Upload
> but with minor hacking should work elsewhere 🙂
>
> enjoy
> –michael
>
> _______________________________________________
> theora mailing list
> theora@xiph.org
> http://lists.xiph.org/mailman/listinfo/theora
>

First draft of a new media fragment URI addressing standard

Those who know me well know that a few years ago (in fact, almost 10 years now) we developed the Annodex set of technologies at the CSIRO in a project called “Continuous Media Web”.

The idea was to make time-continuous data (read: audio and video) an integral part of the Web. It would be possible to search for media through standard search engines. It would be possible to link into and out of media as we link into and out of Web pages. It would be possible to mash up video from different Web servers into a single media stream, just as we are able to mash up images, text and other Web resources from different Web servers.

As you are all aware, we have made huge steps towards this vision in the last 10 years. We now have what is called “universal search” – search engines like Google and Yahoo no longer return only links to HTML pages, but return links to videos and images as well.

But it doesn’t go far enough yet – even now we still cannot link into a long-form video at exactly the fragment that provides the context of what we have been searching for.

In the Annodex project we implemented a working version of such a deep universal search engine in 2003, on top of the Panoptic search engine (an enterprise search engine developed by CSIRO, later spun out and now sold as Funnelback).

The basis for our implementation was the combination of specifications that we developed around Ogg:

  • An extension to Ogg that allows the creation of valid Ogg streams from subparts of Ogg streams – this is now part of Ogg as Skeleton.
  • A means of annotating Ogg streams with time-aligned text that could be interleaved with the Ogg media stream to produce streams that knew more about themselves – the format was called CMML for Continuous Media Markup Language.
  • And an extension to the URI addressing of Ogg streams using temporal URIs.

I am very proud that in the last 2 years, the development of a generic media fragment URI addressing approach has been taken up by the W3C and Conrad Parker and I are invited experts on the Working Group.

I am even more proud that the Working Group has just published a First Public Working Draft of a document called “Use cases and requirements for Media Fragments”. It contains a large collection of examples of situations in which users will want to make use of media fragments. It defines the key dimensions of fragmentation that need to be specified:

  1. Temporal fragmentation
  2. Spatial fragmentation
  3. Track fragmentation
  4. Name fragmentation

Beyond mere use cases and requirements, the document also contains a survey of technologies that address multimedia fragments.

In a first step towards the development of a Media Fragments W3C Recommendation, this document also discusses a proposed syntax for media fragment URI addressing and proposes different processing approaches. These sections will eventually be moved into the recommendation and are the most incomplete sections at this point.

To explain some of the approaches that are being proposed in more detail, here are some examples of media fragment URIs that are proposed through this WD:

  • http://www.example.com/example.ogv#t=10s,20s – addresses the fragment of example.ogv that lies between the 10s and the 20s offset
  • http://www.example.com/example.ogv#track='audio' – addresses the track called “audio” in the example.ogv file
  • http://www.example.com/example.ogv#track='audio'&t=10s,20s – addresses the track called “audio” on the subpart between the 10s and 20s offset in the example.ogv file
  • http://www.example.com/example.ogv#xywh=pixel:160,120,320,240 – addresses the example.ogv file but with a video track cut to a region of the size 320x240px positioned at 160x120px offset
  • http://www.example.com/example.ogv#id='chapter-1' – addresses the named fragment called “chapter-1” which is specified through some mechanism, e.g. Kate or CMML in Ogg

Note that the latter example works only if the encapsulation format provides a means of specifying a name for a fragment. Such a means is e.g. available in QuickTime through chapter tracks, or in Flash through cuepoints.
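
To make the processing side a little more concrete, here is a small javascript sketch, purely illustrative and based only on the draft syntax quoted above rather than on any finalised processing model, that splits such a fragment identifier into its name/value dimensions:

  // Illustrative only: split the fragment part of a media URI into its
  // name/value dimensions following the draft syntax shown above.
  function parseMediaFragment(uri) {
    var hash = uri.split("#")[1] || "";
    var dimensions = {};
    var pairs = hash.split("&");
    for (var i = 0; i < pairs.length; i++) {
      var eq = pairs[i].indexOf("=");
      if (eq < 0) continue;
      dimensions[pairs[i].substring(0, eq)] = pairs[i].substring(eq + 1);
    }
    return dimensions;
  }

  var frag = parseMediaFragment("http://www.example.com/example.ogv#track='audio'&t=10s,20s");
  // frag["track"] is "'audio'" and frag["t"] is "10s,20s"; each value would
  // still need to be interpreted per dimension (time range, track name, etc.).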

We know from our experience with Ogg that temporal fragmentation can be realised. For track addressing it is possible to use the recently developed ROE specification. The id tags used there could be included in Skeleton and then be used to address tracks by name. As for spatial fragmentation of Ogg Theora: I don’t think it can be achieved for an arbitrary rectangular selection without transcoding.

The next tasks of the Working Group are to create implementations of these specifications for diverse formats and thus find out which approaches work best.

Progress on captions for HTML5 video

Paul Rouget this week published another example implementation that uses a javascript library to attach srt subtitles to HTML5 video. This is at least the fourth javascript implementation that I know of for attaching srt subtitles to the video element.

It is great to see such a huge need for this. At the same time I am also worried about the amount of incompatible implementations of this feature. It will inhibit search engines from realising which text relates to and describes a particular video. It will also inhibit accessibility technology such as screen readers or braille devices from realising there is text that would be necessary to be rendered.

A standard means of associating srt (or other format) subtitle files with the video tag is really necessary. So, where are we at with this?

Recently, Greg Millam from Google posted a proposal to the WHATWG that shares a lot of elements with the proposal that has previously been discussed between Mozilla, Xiph, and Opera, the current state of which is summarised in the Mozilla wiki. No implementation in a browser has been made yet, but initial implementations in javascript exist. I think that we will ultimately come out with a harmonised solution between the browser vendors. It just needs implementation work and continuous improvement.

At the same time, in-band captions that come multiplexed within the Ogg file are also being progressed. At Xiph we are now focusing on using Ogg Kate for these purposes – it really doesn’t make much sense to invent another codec when Ogg Kate is already so close to solving most problems. So, the developer of Ogg Kate and I are preparing a Google Summer of Code project that should see an implementation for Firefox 3.1 that is capable of extracting the text from an Ogg file that has a Kate track and displaying that text as though it came from an srt file. If you are interested, shoot me an email!

UPDATE: Firefox 3.1 is apparently now called Firefox 3.5 – sorry guys. 🙂

ANOTHER UPDATE: My post seemed to imply that Firefox 3.5 will have Ogg Kate support. This is not the case. There is a patch for Firefox and liboggplay to provide Ogg Kate support in Firefox, and this patch will be the basis of the Summer of Code project. The student will then work mostly on implementing a comprehensive javascript library to display Ogg Kate encoded time-aligned text (read: captions, karaoke etc.) in the Web browser. This is a proof-of-concept and a first step towards standardising the handling of time-aligned text in Web browsers that support the HTML5 video tag.

LCA 2009 talk on video accessibility

During the LCA 2009 Multimedia Miniconf, I gave a talk on video accessibility. Videos have been recorded, but haven’t been published yet. But here are the talk slides:

Lca2009 Video A11y


I basically gave a very brief summary of my analysis of the state of video accessibility online and what should be done. More information can be found on the Mozilla wiki.

The recommendation is to support the most basic and possibly the most widely used of all online subtitle/captioning formats first: srt. This will help us explore how to relate out-of-band subtitles to an HTML5 video tag – a proposal for which has been made to the WHATWG and is presented in the slides. It will also help us create Ogg files with embedded subtitles – a means of encapsulation has been proposed in the Xiph wiki.

Once we have experience with these, we should move to a richer format that will also allow the creation of other time-aligned text formats, such as ticker text, annotations, karaoke, or lyrics.

Further, there is non-text accessibility data for videos, e.g. sign language recordings or audio annotations. These can also be multiplexed into Ogg through creating secondary video and audio tracks.

Overall, we aim to handle all such accessibility data in a standard way in the Web browser to achieve a uniform experience with text for video and a uniform approach to automating the handling of text for video. The aim is:

  • to have a default styling of time-aligned text categories,
  • to allow styling overrides of time-aligned text through CSS,
  • to allow the author of a Web page with video to serve a multitude of time-aligned text categories and turn on the ones of his/her choice,
  • to automatically use the default language and accessibility settings of a Web browser to request appropriate time-aligned text tracks,
  • to allow the consumer of a Web page with video to manually select time-aligned text tracks of his/her choice, and
  • to do all of this in the same way for out-of-band and in-line time-aligned text.

At the moment, none of this is properly implemented. But we are working on a liboggtext library and are further discussing how to include out-of-band text with the video in the Web page – e.g. should it go into the Web page DOM or into a separate browsing context?

If you feel strongly about video a11y, get involved at http://lists.xiph.org/mailman/listinfo/accessibility.

Embedding time-aligned text into Ogg

As part of my accessibility work for Mozilla and Xiph, it is necessary to define how time-aligned text such as subtitles, captions, or annotations is encapsulated into Ogg. In the fansubber community this is called “hard subtitles”, as opposed to “soft subtitles”, which are subtitles that stay in a separate text file, are loaded into a media player alongside the video file, and are synchronised with the video by the media player. (As per the comment below, all text annotations are “soft” – or also “closed”.)

I can hear you ask: so how do I do subtitles/captions with Ogg now? Well, it would have been possible to simply choose one subtitling format and map that into Ogg, then ask everyone to just use that one format and be done. But which one to choose? And why prefer a simpler one over a more complex one? And why just do subtitles and not any other time-aligned text?

So, instead, I analysed what types of time-aligned text “codecs” I have come across. Each one would have a multitude of text formats to capture the text data, because it is easy to invent a new format and standardisation hasn’t really happened in this space yet.

I have come up with the following list of typical time-aligned text codecs:

  • CC: closed captions (for the deaf)
  • SUB: subtitles
  • TAD: textual audio descriptions (for the blind – to be transferred to braille or TTS)
  • KTV: karaoke
  • TIK: ticker text
  • AR: active regions
  • NB: metadata & semantic annotations
  • TRX: transcripts / scripts
  • LRC: lyrics
  • LIN: linguistic markup
  • CUE: cue points, DVD style chapter markers and similar navigational landmarks

Let me know if you can think of any other classes of video/audio-related time-aligned text.

All of these texts can be represented in text files with some kind of time marker, and possibly some header information to set up the interpretation environment. So, the simplest way of creating a representation of these inside Ogg was to define a generic mapping for time-aligned text into Ogg.

The Xiph wiki holds the current draft specification for mapping text codecs into Ogg. For anyone wanting to map a text codec into Ogg, this should provide the framework. The idea is to separate the text codec’s data into header data and into timed text segments (which can have all sorts of styling and other information with it). Then, the mapping is simple. An example for srt is described on the wiki page.
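
To make the idea of splitting a text codec into header data and timed text segments a little more tangible, here is a rough javascript sketch of how an encoder might turn already-parsed srt cues into per-segment packets before handing them to an Ogg muxer. This is only an illustration of the principle; the granule rate and packet fields are assumptions of this sketch, not the actual mapping defined on the wiki page:

  // Illustration of the idea only, not the wiki mapping itself: given cues of
  // the form { start: seconds, end: seconds, text: "..." }, emit one data
  // packet per timed text segment. The granule rate of 1000 granules per
  // second is an assumption; header packets (codec setup, language, category)
  // are not shown.
  var GRANULE_RATE = 1000;
  function cuesToPackets(cues) {
    var packets = [];
    for (var i = 0; i < cues.length; i++) {
      packets.push({
        granulepos: Math.round(cues[i].start * GRANULE_RATE),             // presentation time
        duration: Math.round((cues[i].end - cues[i].start) * GRANULE_RATE),
        data: cues[i].text                                                // UTF-8 text payload
      });
    }
    return packets;
  }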

The specification is still in draft status, because we’re still expecting feedback. In fact, what we now need is people trying an implementation and providing fixes to the specification.

To map your text codec of choice into Ogg, you will probably require further mapping specifications. Depending on how complex your text codec of choice is, these additional mapping specifications may be rather simple or quite complicated. In the case of srt, it should be trivial. Considering the massive amount of srt already freely available online, the srt mapping may well have a really large impact. Enough hints. Let me know if you’re coding up something!

My next duty is to look for a representation that is generic enough to provide representations for any of the above listed text codecs. This representation is what will need to be available to a Web browser when working with a Web video that has related text. Current contenders are Ogg Kate and W3C TimedText, but I am not sure whether either is too restrictive. I am indeed looking for the next generation of captioning technology that will be able to provide any type of time-aligned text that relates to audio/video.