All posts by silvia

Video accessibility for the HTML5 video tag

This is a submission to the W4A 2009 Web accessibility conference (http://www.w4a.info/). In the video, we explain the current status of video accessibility on the Web and ways forward for HTML5. We propose a solution for associating textual captions with video and illustrate it using Ogg Kate, SRT and DFXP. We then explain further challenges such as sign language, audio annotations, and more general types of time-aligned text, e.g. karaoke, music lyrics, ticker-text, transcripts, or annotations with hyperlinks.

Category: 2
Uploaded by: Silvia Pfeiffer
Hosted: youtube

Video accessibility for the HTML5 video tag

Higher quality copy: http://www.youtube.com/watch?v=qpbtpeofN3c


Beyond HTML5 Video

A short video (2:30 min) presented as a lightning talk at TPAC 2008 to make the case for getting more out of the HTML5 video element. It demonstrates Metavid and its wiki-style editing of transcripts, and shows how the transcript can be used to navigate and search long-form video. Exposing the structure and content of video to the User Agent and the Server enables video accessibility, content adaptation, and deep search. The video also points to the W3C Media Fragments working group, the W3C Timed Text working group, and the W3C Media Annotations working group.


An unintentional experiment in social networking

Today I logged back into a Bebo account that I hadn’t used in ages. The first thing it asked me to do was to add friends – and like so many other social networks it does so by logging into one of your email accounts and finding email addresses. Since their terms and conditions say they won’t store your gmail password and just retrieve email addresses, I tend to use this service to add a few select friends.

But Bebo is really f….d up in this respect. Pardon my words, but today I have spammed each and every one of my contacts and all the mailing lists I am subscribed to. I have been blocked from multiple mailing lists, and my email address has probably been marked as a spam source by many, so I may ultimately have to change my email account.

It was unintentional and I apologise to all of you who had to suffer.

I usually regard myself as fairly intelligent, but Bebo’s system totally tricked me. So, when you sign up with Bebo, make sure not to fall into the same trap. It looks like this is a pretty common complaint. Here are links to at least three others that had the same issues:

If you look at the second post, it has screenshots. My screen was slightly different from that. I had three sections with contacts: one with contacts already on Bebo, out of which I selected a handful; one with contacts suggested through other contacts, which I completely unselected; and a third one with all my email addresses. Each of the sections had a submit button. Naturally assuming that only the ticks in a particular section would be submitted with the button underneath that section, I clicked OK underneath the handful of selected contacts. Half an hour later I am now totally shattered and don’t know how to apologize.

The only interesting effect of this faux pas is that within the first hour I made 22 friends! An amazingly fast way to get a lot of friends in a social network. Unfortunately, Bebo has broken my trust and dropped to the bottom of my social networks, so if you’d like to be in contact with me, try me on LinkedIn or Facebook.

What is the raw format of time-aligned text?

My grant with Mozilla on exploring the state and possible ways forward for video accessibility on the Web is over. I have posted a detailed report in the Mozilla wiki, which you should read if you care about the details. It has been a very instructive time of analysis and I have learnt a lot about the needs and requirements for time-aligned text.

For example, I learnt that for many deaf people, our spoken language is just their second language while their first language is actually sign language, thus making it very important to allow for picture-in-picture display of sign-language video tracks in Web video.

I also learnt about more technical challenges, e.g. how difficult it may be to simply map the content of a linked Web resource into the current Web page when one cannot be certain about the security implications of this mapping. This will be of importance as we synchronise out-of-band time-aligned text (e.g. a subtitle file on a server) to a video file that is included in a Web page through an HTML5 <video> tag.

There are two large work items that need to be undertaken next in my opinion.

Firstly we have to create a collection of example files that explain the different categories of time-aligned text that were identified and their specific requirements. For example, the requirements of simple subtitle files are clear. But what about karaoke files? Or ticker-text? We need pseudo-code example files that explain the details of what people may want to display with such files.

I regard the DFXP test suite as one source of such files – the guys at the W3C TimedText working group have done a tremendous job at collecting the types of timed text documents that they want to be able to display.

Another source will be the examples directory in libkate, which implements OggKate, a format that I can envisage as the default encoding format for time-aligned text inside Ogg, because of its existing extensive code base and the principles with which OggKate was developed.

The second work item is more challenging and more fundamental to time-aligned text in Web browsers. We have to create a specification of how to represent time-aligned text inside Web browsers – basically the DOM and the API, but also what processing needs to be done to get the text there. I have a proposal on using a <text> element inside the <video> element to reference out-of-band time-aligned text resources. However, the question is what to do with them then.

The more I thought about this, the more the question reduced to finding the “raw format” of time-aligned text: when a Web browser decodes a time-aligned text file, what is its internal representation of it, its “raw” state? This will map to HTML, CSS, JavaScript, and other existing Web technology. But what is this minimal, “raw” representation? Text and graphics with positioning information, style information, timing information, state information, and potentially hyperlinks? Is that all?
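
One way to make the question concrete is to write down a candidate record for a single decoded item. The Python sketch below is purely hypothetical: the `TimedItem` name, its fields, and the `visible_at` helper are assumptions made for illustration and are not taken from any browser, specification, or proposal. It only lists the ingredients named above (text, positioning, style, timing, hyperlinks; graphics are left out for brevity) and derives the "state" at a given playback time from the timing.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TimedItem:
    """Hypothetical 'raw' record for one piece of time-aligned text."""
    start: float                                    # seconds from media start
    end: float                                      # seconds from media start
    text: str
    position: Optional[Tuple[float, float]] = None  # assumed: x/y as fractions of the video area
    style: Dict[str, str] = field(default_factory=dict)  # assumed: CSS-like property/value pairs
    href: Optional[str] = None                      # optional hyperlink target


def visible_at(items: List[TimedItem], t: float) -> List[TimedItem]:
    """Return the items whose interval [start, end) contains playback time t."""
    return [item for item in items if item.start <= t < item.end]
```

Whether such a flat record is enough for karaoke or ticker-text is exactly the open question; the sketch is only meant to give the discussion something to point at.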

These are the questions that I think need to be explored next.

In parallel we should start with an implementation of support for the simplest type of time-aligned text: plain SRT. The raw format for this is simple: just a series of text segments with start and end times. Even though this is simple, it has no straightforward mapping into HTML since HTML does not understand time, so it can only be dealt with in JavaScript or through SVG. It may be time to include a simple concept of time into HTML. Let’s just avoid making it as complex as HTML+TIME!
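
As a minimal sketch of that raw format, the following Python turns SRT text into a list of (start, end, text) records. The `Cue` type and the `parse_srt` function are hypothetical names chosen for this illustration, and the parser assumes well-formed input with blocks separated by blank lines; it is not meant to describe how any browser does or should implement this.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


# SRT timing line, e.g. "00:00:01,000 --> 00:00:04,500"
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert hour/minute/second/millisecond strings to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_srt(data: str) -> List[Cue]:
    """Parse SRT text into cues; blocks without a timing line are skipped."""
    cues = []
    normalized = data.replace("\r\n", "\n").strip()
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        # The timing line normally follows a counter line; search for it
        # so that a missing counter does not break parsing.
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                text = "\n".join(lines[i + 1:])
                cues.append(Cue(to_seconds(*g[:4]), to_seconds(*g[4:]), text))
                break
    return cues
```

Calling `parse_srt` on the contents of a subtitle file gives the series of timed text segments; everything beyond that, such as deciding when and where to render each cue, is the part that HTML currently has no native concept for.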

Basic support for SRT in Firefox would be a first step towards a framework for more complicated time-aligned text. It would also open up video to a large number of hearing-impaired and non-native viewers and take us a large step towards comprehensive media accessibility on the Web. I hope we can address this in 2009.

The argument for Xiph codecs

Yesterday I had a random technology developer email me with the question why he should use Ogg over other codecs that have a much more widespread uptake. Of course with “Ogg” he meant “Xiph codecs”, since a comparison of container formats isn’t really what people are asking for. He felt positive towards open codecs, but didn’t really know how to express this with reason. So I wrote up some arguments that can be made for open codecs.

First of all, the royalty-free character of Xiph technology makes it possible to use it for any application without having to consider what impact the use of the codec has on ROI and scalability of business models. It is important to have a video and audio codec available that you can just use for exchanging audio and video data, just like you can exchange text – nobody would consider paying a license for ASCII either.

Second, the flexibility that you get with Xiph is important for developing new applications. One example is the development of a scheme for encrypting audio or video for DRM and then transporting it inside Ogg. Since everything around Ogg is open, one can just go ahead and implement this, even if the Xiph community is not interested in such technology.

Third, let’s talk about quality. Ogg Vorbis is an audio codec that delivers higher quality than MP3 at comparable compression rates. Ogg Theora compares well to H.261 and also to MPEG-2 video. To achieve high-quality video such as in H.264, you will need to move into Dirac territory. And yet, Xiph has more to offer: for VoIP you can currently use the highly competitive Speex codec. And a new, hybrid speech/audio codec with low delay, a high compression rate, and very low quality loss is CELT, developed by the author of Speex. CELT has no counterpart among proprietary codecs. All of the software is available for free and in source code from svn.xiph.org and the authors are easily reachable for discussions. Should there be a need for improvement, everyone has the opportunity to develop it.

Lastly, I’d like to look at the capabilities of Ogg-based technology in a Web environment. Over the last few years we have developed technology that is now being included by the W3C into future Web standards. This includes URL addressing to time offsets into videos, which can ultimately help develop e.g. a Web-based video editor. This includes the implementation of Ogg Theora/Vorbis as baseline video codec in Firefox for the new HTML5 video element. This includes technology for making audio and video accessible, in particular to Web search engines through deeply searchable content, but also to hearing- and visually impaired users. All the base technology and specifications are available.

“Ogg” is totally the future of media technology, because the future has to be open and royalty-free to allow everybody on this planet equal rights and possibilities to participate in a media-centric Internet, and because anything else will continue to be a burden on innovation.

FOMS 2009 Awesomeness

I am a slacker, I know – sorry. FOMS happened almost 4 weeks ago and I have neither blogged about it nor uploaded the videos.

So, you will have to take my word for it for the moment: it was a totally awesome and effective workshop that led to a lot of work being started during LCA and having an impact far beyond FOMS.

Every year, the discussions we are having at FOMS are captured in so-called community goals. These are activities that we see as top priorities for open media software to be addressed to improve its use and uptake.

You can read up on our 2009 community goals here in detail. They fall into the following 10 sections:

  1. Patent and legal issues around codecs
  2. Ogg in Firefox: liboggplay
  3. Authoring tools for open media codecs
  4. Server Technology for open media
  5. Time-aligned text and accessibility challenges
  6. FFmpeg challenges
  7. GStreamer challenges
  8. Dirac challenges
  9. Jack challenges
  10. OpenMAX challenges

In this post, I’d just like to point out some cool activities that have already emerged since FOMS.

I’ve already written on the patents issue and how OpenMediaNow will hopefully be able to make a difference here.

Liboggplay provides a simple API for decoding and playback of Ogg codecs and is therefore in use for baseline Ogg Theora support in Firefox 3.1. A bunch of bugs were found around it, and the opportunity of having Shane Stephens, its original developer, together with Viktor Gal, its new maintainer, in the same room made for a whole lot of bug fixes. The $100K Mozilla grant towards the work of Xiph developers that was announced at FOMS will further help to mature this and other Xiph software. Conrad Parker, Viktor Gal, and Timothy Terriberry, the Xiph developers who will cut code under this grant, were incidentally all present at FOMS.

The discussion about the need for authoring software support for open media codecs is always a difficult one. We all know that it is important to have usable and graphically attractive authoring tools in order to get adoption. However, looking at reality, it is really difficult to design and implement a GUI authoring tool such as a video editor to a competitive quality. In other areas, it has also taken quite some time to gain good authoring software such as the Gimp or Inkscape. Plus there is the additional need to make it cross-platform. With video, often the underlying editing functionality is missing from media frameworks. Ed Hervey explained how he extended GStreamer with the required subroutines and included them into the GStreamer Python plugin, so now he will be able to focus on user interface work in PiTiVi rather than the underlying video editing functionality.

The authoring discussion led smoothly into the server technology discussion. Robin Garvin explained how he implemented a server-side video editor through EDLs. Michael Dale showed us the latest version of his video editor in the MediaWiki Metavid plugin. And Jan Gerber showed us the Firefogg Firefox plugin for transcoding to Ogg. Web-based tools are certainly the future of video authoring and will make a huge difference in favor of Ogg.

Then there were the accessibility discussions. During FOMS I was in the process of writing up my final report on the Mozilla video accessibility project and it was really important to get input from the FOMS community – in particular from Charles McCathieNevile from Opera, Michael Dale from Metavid/Wikipedia/Archive.org and Jan Gerber. In the end we basically agreed that a lot of work still needs to be done and that a standard way of providing SRT support in HTML5, both through Ogg and out-of-band, will be a great step forward, though by far not the final one.

The remaining topics were focused discussions on how to improve support, uptake or functionality of specific tools. Peter Ross took FOMS concerns about FFmpeg to the FFmpeg community and it seems there will be some changes, in particular an upcoming FFmpeg release. Ed Hervey took home a request for new API functions for GStreamer. Anuradha Suraparaju talked with Jan Gerber about support of Dirac in Firefogg and with Viktor Gal about support in liboggplay. Further, the idea of libfisheye was born: an abstraction library for Ogg video codecs similar to what libfishsound is for Ogg audio codecs.

As can be seen, there are already some awesome outcomes from FOMS 2009. We are looking forward to a FOMS 2010 in Wellington, New Zealand!

Patents and the bright future of open media codecs

It is clear that established video technology vendors resist supporting open and patent-unencumbered media codecs alongside codecs that are either proprietary or covered by a registered patent portfolio, in particular since Nokia’s attack on Ogg Theora in December 2007. The threat that is repeatedly cited by companies like Apple and Opera is that of so-called “submarine patents”.

The 2007 Alcatel-Lucent vs. Microsoft case shows that even so-called “standard codecs”, i.e. codecs for which the patent portfolio is registered and for which you can buy a license from a consortium, are not free of such submarine patent threats.

Given this situation, the open media community is continuing to demand equal treatment for open codecs, i.e. native support in desktop media applications by vendors. Even a simple thing such as making the XiphQT components available on Apple’s QuickTime Components download page would be a big step forward towards treating open codecs equally to proprietary ones.

For Web video applications, the situation becomes even more complicated. Because of their freedom from license fees, Xiph codecs would make perfect baseline codecs for the new HTML5 video and audio elements. But because vendors are not willing to support them on the desktop or in their browsers, the WHATWG was forced to take Ogg Theora out of the HTML5 specification. This will ultimately create many headaches for Web developers – but it will save vendors’ investment in proprietary codec technology and continue to provide a market place for a large number of media utility programs whose only reason for existence is to address the complexities created by a lack of standardisation. It’s a great inhibitor of simplification in the media space and therefore an inhibitor of innovation.

Given this rather depressing situation, it is not surprising that patents have been a major topic at every FOMS workshop (2007, 2008, 2009). They stop the rare set of open media software programmers from achieving success in many different ways.

OpenMediaNow is a new initiative. Rob Savoye from OpenMediaNow attended FOMS in 2009 and explained where he wants to take it. It is best said in his own words, so here is a copy of an email that he sent me.

On Thu, Jan 22, 2009 at 11:05 PM, Rob Savoye wrote:
> Here’s a few quick notes, since I figure you need this in the morning:
>
> * Build a freely accessible database of prior art involving multimedia patents
> * Research prior-art for Ogg and Theora to ensure that these codecs remain free.
> * Build an international community of legal volunteers who can contribute research on
> international patents.
> * Work on negotiating royalty-free redistribution terms for FOSS projects
> * Work on finding the legal ways FOSS can deal with codec patents
> * If need be, craft legal workarounds for the codec patents to allow them to be
> freely redistributable
>
> Any money we raise will initially go to getting the database, forums, etc.. set up.
> After support that, we plan to hire a para-legal or two to work on the actual research.
> More than that, we’d add an engineer experienced in codecs to work with the legal
> folks to define ways FOSS can legally support proprietary codecs.
>
> – rob –

Rob is an amazing free software hacker and seems to have been around forever, so he has seen and participated in a lot of successful fights in the free world. Rob has the right connections to actually achieve the goals he has set for OpenMediaNow. He is a key member of the Free Software Foundation, which in itself has a great community to get behind these goals. He also has great connections into the law community, in particular with the EFF and groklaw. If ever I have met anyone capable of fighting and surviving this dispute, while also successfully achieving its goals, it would be Rob.

The first thing that Rob needs to create is a Website – probably in collaboration with PJ from groklaw – through which the patent research work can be undertaken. For this, he has estimated that he will require $10K. In the currently desperate global economic situation, Rob has lost some of his key financial contributors to the project.

In the spirit of “every little bit helps” and “if we give Rob moral support, he may find further financial support elsewhere”, FOMS decided to donate $1,000 towards the OpenMediaNow effort. If you consider that $1K is about 15% of the FOMS budget, it is actually quite a large contribution and it is going towards the right activity. The donation was announced on the last day at LCA with this presentation.

If you would also like to donate to the OpenMediaNow effort – be that with your skills or your money – please write to Rob (rob@www.omnow.org) (sorry for the spamming, Rob) or donate here. It may take years to address all the existing patents in the video space, invalidate some and tone down others, so don’t despair but keep supporting.