All posts by silvia

$100K towards Xiph developers

Today, Wikimedia and Mozilla announced a grant provided by the Mozilla Corporation towards maturing the support of Ogg in the Firefox Web browser. I’m happy to have helped in making the proposal become concrete and now we have the following three Xiph developers working on it:

  • Viktor Gal – the maintainer of liboggplay
  • Conrad Parker – the key developer of multiple Ogg support libraries, in particular liboggz
  • Tim Terriberry – the key developer of Ogg Theora

Viktor will work towards stabilising the current Ogg Theora support in Firefox, Conrad will work towards Ogg network seeking, language selection and improved library support, and Tim will include the new Thusnelda Theora encoder improvements into Theora mainstream.

Looking forward to awesome Firefox video technology!

UPDATE – Other posts on this topic:

LCA 2009 talk on video accessibility

During the LCA 2009 Multimedia Miniconf, I gave a talk on video accessibility. Videos have been recorded, but haven’t been published yet. But here are the talk slides:

Lca2009 Video A11y

View more presentations or upload your own. (tags: deaf captions)

I basically gave a very brief summary of my analysis of the state of video accessibility online and what should be done. More information can be found on the Mozilla wiki.

The recommendation is to support the most basic and possibly the most widely online used of all subtitle/captioning formats first: srt. This will help us explore how to relate to out-of-band subtitles for a HTML5 video tag – a proposal of which has been made to the WHATWG and is presented in the slides. It will also help us create Ogg files with embedded subtitles – a means of encapsulation has been proposed in the Xiph wiki.

Once we have experience with these, we should move to a richer format that will also allow the creation of other time-aligned text formats, such as ticker text, annotations, karaoke, or lyrics.

Further, there is non-text accessibility data for videos, e.g. sign language recordings or audio annotations. These can also be multiplexed into Ogg through creating secondary video and audio tracks.

Overall, we aim to handle all such accessibility data in a standard way in the Web browser to achieve a uniform experience with text for video and a uniform approach to automating the handling of text for video. The aim is:

  • to have a default styling of time-aligned text categories,
  • to allow styling override of time-aligned through CSS,
  • to allow the author of a Web page with video to serve a multitude of time-aligned text categories and turn on ones of his/her choice,
  • to automatically use the default language and accessibility settings of a Web browser to request appropriate time-aligned text tracks,
  • to allow the consumer of a Web page with video to manually select time-aligned text tracks of his/her choice, and
  • to do all of this in the same way for out-of-band and in-line time-aligned text.

At the moment, none of this is properly implemented. But we are working on a liboggtext library and are furher discussing how to include out-of-band text with the video in the Webpage – e.g. should it go into the Webpage DOM or in a separate browsing context.

If you feel strongly about video a11y, get involved at http://lists.xiph.org/mailman/listinfo/accessibility.

Top 10 commercials for 2008 on YouTube

I spent the last few days doing some nice research for Vquence, where I was able to watch lots of videos on YouTube. Fun job this is! 🙂 The full article is on the Vquence metrics blog.

One of the key things that I’ve put together is a list of top 10 commercials for 2008:

Rank Video Views Added
1 Pepsi – SoBe Lifewater Super Bowl 2008 3,652,217 February 02, 2008
2 Cadbury – Gorilla 3,338,011 August 31, 2007
3 Nike – Take it to the NEXT LEVEL 3,184,329 April 28, 2008
4 Macbook Air 2,648,717 January 15, 2008
5 Centraal Beheer Insurance – Gay Adam 2,512,425 May 30, 2008
6 Vodafone – Beatbox 2,380,237 March 17, 2008
7 E*Trade – Trading Baby 2,061,818 February 01, 2008
8 Guitar Hero – Heidi Klum 1,068,055 November 03, 2008
9 Bridgestone – Scream 980,406 January 30, 2008
10 Bud Light- Will Ferrell 966,177 February 04, 2008
Favorable mention OLPC – John Lennon 527,953 December 25, 2008
Favorable mention Blendtec – iPhone 3G 2,711,195 July 11, 2008
Favorable mention Stide Gum – Where the hell is Matt? 15,859,204 June 20, 2008

There are many more details over at vquence.com.

Enjoy! And let me know in the comments if you know of any other video ad released in 2008 in the same ballpark number of views that is an actual tv-style commercial.

NOTE: I just had to change the list, because the SoBe Lifewater Super Bowl ad of 2008 actually came out ahead. It’s difficult to discover an ad that has neither ad nor commercial in its annotations!

OSDC 2008 talks

The “Open Source Developer Conference” 2008 took place in Sydney between 2nd-5th December. I gave two talks at it:

As requested by the organisers, I just uploaded the slides to Slideshare, which incidentally can now also synchronise audio recordings of your talk to your slides. Here are my slides – even if they don’t actually give you much without the demo:

I had lots of fun giving the talks. The “YouTube” one talks about the Fedora Commons document repository and how we turned it into a video transcoding, keyframing, publication and sharing system. The one on Metavidwiki shows off the Annodex-technology using video wiki that is in use by Wikipedia. Most certainly, I also mentioned that open source CMS systems now have video extensions. However, they are not video-centric sites in general.

Of all the open source Web video technology, I find Fedora Commons and MetaVidWiki the most exciting ones. The former is exciting for its ability to archive and publish video and their metadata in a way that integrates with document management. The latter is even more exciting for using Ogg and the open Annodex technologies to create a completely open source system using open codecs, and for being the world’s second video wiki (just after CMMLwiki), but the first one to achieve wide uptake.

Attaching subtitles to HTML5 video

During the last week, I made a proposal to the HTML5 working group about how to support out-of-band time-aligned text in HTML5. What I mean by that is basically: how to link a subtitle file to a video tag in HTML5. This would mirror the way in which in desktop-players you can load separate subtitle files by hand to go alongside a video.

My suggestion is best explained by an example:


<video src="http://example.com/video.ogv" controls>
<text category="CC" lang="en" type="text/x-srt" src="caption.srt"></text>
<text category="SUB" lang="de" type="application/ttaf+xml" src="german.dfxp"></text>
<text category="SUB" lang="jp" type="application/smil" src="japanese.smil"></text>
<text category="SUB" lang="fr" type="text/x-srt" src="translation_webservice/fr/caption.srt"></text>
</video>

  • “text” elements are subelements of the “video” element and therefore clearly related to one video (even if it comes in different formats).
  • the “category” tag allows us to specify what text category we are dealing with and allows the web browser to determine how to display it. The idea is that there would be default display for the different categories and css would allow to override these.
  • the “lang” tag allows the specification of alternative resources based on language, which allows the browser to select one by default based on browser preferences, and also to turn those tracks on by default that a particular user requires (e.g. because they are blind and have preset the browser accordingly).
  • the “type” tag allows specification of what actual time-aligned text format is being used in this instance; again, it will allow the browser to determine whether it is able to decode the file and thus make it available through an interface or not.
  • the “src” attribute obviously points to the time-aligned text resource. This could be a file, a script that extracts data from a database, or even a web service that dynamically creates the data
    based on some input.

This proposal provides for a lot of flexibility and is somewhat independent of the media file format, while still enabling the Web browser to deal with the text (as long as it can decode it). Also note that this is not meant as the only way in which time-aligned text would be delivered to the Web browser – we are continuing to investigate how to embed text inside Ogg as a more persistent means of keeping your text with your media.

Of course you are now aching to see this in action – and this is where the awesomeness starts. There are already three implementations.

First, Jan Gerber independently thought out a way to provide support for srt files that would be conformant with the existing HTML5 tags. His solution is at http://v2v.cc/~j/jquery.srt/. He is using javascript to load and parse the srt file and map it into HTML and thus onto the screen. Jan’s syntax looks like this:


<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="jquery.srt.js"></script>

<video src="http://example.com/video.ogv" id="video" controls>
<div class="srt"
data-video="video"
data-srt="http://example.com/video.srt" />

Then, Michael Dale decided to use my suggested HTML5 syntax and add it to mv_embed. The example can be seen here – it’s the bottom of the two videos. You will need to click on the “CC” button on the player and click on “select transcripts” to see the different subtitles in English and Spanish. If you click onto a text element, the video will play from that offset. Michael’s syntax looks like this:


<video src="sample_fish.ogg" poster="sample_fish.jpg" duration="26">
<text category="SUB" lang="en" type="text/x-srt" default="true"
title="english SRT subtitles" src="sample_fish_text_en.srt">
</text>
<text category="SUB" lang="es" type="text/x-srt"
title="spanish SRT subtitles" src="sample_fish_text_es.srt">
</text>
</video>

Then, after a little conversation with the W3C Timed Text working group, Philippe Le Hegaret extended the current DFXP test suite to demonstrate use of the proposed syntax with DFXP and Ogg video inside the browser. To see the result, you’ll need Firefox 3.1. If you select the “HTML5 DFXP player prototype” as test player, you can click on the tests on the left and it will load the DFXP content. Philippe actually adapted Jan’s javascript file for this. And his syntax looks like this:


<video src="example.ogv" id="video" controls>
<text lang='en' type="application/ttaf+xml" src="testsuite/Content/Br001.xml"></text>
</video>

The cool thing about these implementations is that they all work by mapping the time-aligned text to HTML – and for DFXP the styling attributes are mapped to CSS. In this way, the data can be made part of the browser window and displayed through traditional means.

For time-aligned text that is multiplexed into a media file, we just have to do the same and we will be able to achieve the same functionality. Video accessibility in HTML5 – we’re getting there!

Whatever happened to the Innovation Review?

In September, as reported, Terry Cutler delivered an extensive review on the Australian innovation system and what should be done. But then, the worldwide economic downturn hit us, and nobody seems to think about creating incentives for Australian innovation any longer.

Where did it go? Nothing is happening on the innovation website.

In particular as a small business owner, I ask: what happened to incentives to innovate for SMEs? Before the review, the government as a precaution scrapped the existing Australian grant system for innovation – in particular the Commercial Ready grants. The idea was to have some leeway with building a new system. That seemed fair enough at the time – but it seems more and more that there will be no replacement for this scrapped funding. Is the government simply forgetting what they promised in view of other, more pressing needs? Do we need to remind them?

Terry Cutler in a recent interview with SmartCompany paints a dark picture for the Australian future. He is aware that commercial innovation originates from SMEs and is fearful that by dropping all support for SMEs, the whole SME technology sector will be wiped out, with a negative influence also on larger companies and Australia will fall so far behind in innovation that it may be hard to ever catch up (remember what we did with our advantage in creating computer hardware at the time of CSIRAC?).

The Commercial Ready grants and other innovation funding were abruptly scrapped in May 2008. It is now more than 6 months later and we haven’t seen an indication of anything new to replace them. In the midst of the global crisis, how can we make this important cause heard?

Embedding time-aligned text into Ogg

As part of my accessibility work for Mozilla and Xiph, it is necessary to define how time-aligned text such as subtitles, captions, or annotations, are encapsulated into Ogg. In the fansubber community this is called “hard subtitles” as opposed to “soft subtitles” which are subtitles that stay in a text file and are loaded separately to the video file into a media player and synchronised with the video by the media player. (as per comment below, all text annotations are “soft” – or also “closed”.)

I can hear you ask: so how do I do subtitles/captions with Ogg now? Well, it would have been possible to simply choose one subtitling format and map that into Ogg, then ask everyone to just use that one format and be done. But which one to choose? And why prefer a simpler one over a more complex one? And why just do subtitles and not any other time-aligned text?

So, instead, I analysed what types of time-aligned text “codecs” I have come across. Each one would have a multitude of text formats to capture the text data, because it is easy to invent a new format and standardisation hasn’t really happened in this space yet.

I have come up with the following list of typical time-aligned text codecs:

  • CC: closed captions (for the deaf)
  • SUB: subtitles
  • TAD: textual audio descriptions (for the blind – to be transferred to braille or TTS)
  • KTV: karaoke
  • TIK: ticker text
  • AR: active regions
  • NB: metadata & semantic annotations
  • TRX: transcripts / scripts
  • LRC: lyrics
  • LIN: linguistic markup
  • CUE: cue points, DVD style chapter markers and similar navigational landmarks

Let me know if you can think of any other classes of video/audio-related time-aligned text.

All of these texts can be represented in text files with some kind of time marker, and possibly some header information to set up the interpretation environment. So, the simplest way of creating a representation of these inside Ogg was to define a generic mapping for time-aligned text into Ogg.

The Xiph wiki holds the current draft specification for mapping text codecs into Ogg. For anyone wanting to map a text codec into Ogg, this should provide the framework. The idea is to separate the text codec’s data into header data and into timed text segments (which can have all sorts of styling and other information with it). Then, the mapping is simple. An example for srt is described on the wiki page.

The specification is still in draft status, because we’re still expecting feedback. In fact, what we now need is people trying an implementation and providing fixes to the specification.

To map your text codec of choice into Ogg, you will probably requrie further mapping specifications. Dependent on how complex your text codec of choice is, these additional mapping specifications may be rather simple or quite complicated. In the case of srt, it should be trivial. Considering the massive amount of srt already freely available online, the srt mapping may well have a really large impact. Enough hits. Let me know if you’re coding up something!

My next duty is to look for a representation that is generic enough to provide representations for any of the above listed text codecs. This representation is what will need to be available to a Web Browser when working with a Web video that has related text. Current contenders are OggKate and W3C TimedText, but I am not sure if either are too restrictive. I am indeed looking for the next generation of captioning technology that will be able to provide any type of time-aligned text that relates to audio/video.