This is a screencast of a first demo implementation of caption/subtitle support in Firefox where captions are stored in a Kate stream in Ogg video. This is a demo for HTML5 media accessibility.
I am sure Google is thinking about it. But … what does “it” mean?
Simply open sourcing it and making it available under a free license doesn’t help. That just provides open source code for a codec where relevant patents are held by a commercial entity and any other entity using it would still need to be afraid of using that technology, even if it’s use is free.
So, Google has to make the patents that relate to VP8 available under an irrevocable, royalty-free license for the VP8 open source base, but also for any independent implementations of VP8. This at least guarantees to any commercial entity that Google will not pursue them over VP8 related patents.
Now, this doesn’t mean that there are no submarine or unknown patents that VP8 infringes on. So, Google needs to also undertake an intensive patent search on VP8 to be able to at least convince themselves that their technology is not infringing on anyone else’s. For others to gain that confidence, Google would then further have to indemnify anyone who is making use of VP8 for any potential patent infringement.
I believe – from what I have seen in the discussions at the W3C – it would only be that last step that will make companies such as Apple have the confidence to adopt a “free” codec.
An alternative to providing indemnification is the standardisation of VP8 through an accepted video standardisation body. That would probably need to be ISO/MPEG or SMPTE, because that’s where other video standards have emerged and there are a sufficient number of video codec patent holders involved that a royalty-free publication of the standard will hold a sufficient number of patent holders “under control”. However, such a standardisation process takes a long time. For HTML5, it may be too late.
Also, let’s not forget that VP8 is just a video codec. A video codec alone does not encode a video. There is a need for an audio codec and a encapsulation format. In the interest of staying all open, Google would need to pick Vorbis as the audio codec to go with VP8. Then there would be the need to put Vorbis and VP8 in a container together – this could be Ogg or MPEG or QuickTime’s MOOV. So, apart from all the legal challenges, there are also technology challenges that need to be mastered.
It’s not simple to introduce a “free codec” and it will take time!
Google and Theora
There is actually something that Google should do before they start on the path of making VP8 available “for free”: They should formulate a new license agreement with Xiph (and the world) over VP3 and Theora. Right now, the existing license that was provided by On2 Technologies to Theora (link is to an early version of On2’s open source license of VP3) was only for the codebase of VP3 and any modifications of it, but doesn’t in an obvious way apply to an independent re-implementations of VP3/Theora. The new agreement between Google and Xiph should be about the patents and not about the source code. (UPDATE: The actual agreement with Xiph apparently also covers re-implementations – see comments below.)
That would put Theora in a better position to be universally acceptable as a baseline codec for HTML5. It would allow, e.g. Apple to make their own implementation of Theora – which is probably what they would want for ipods and iphones. Since Firefox, Chrome, and Opera already support Ogg Theora in their browsers using the on2 licensed codebase, they must have decided that the risk of submarine patents is low. So, presumably, Apple can come to the same conclusion.
Free codecs roadmap
I see this as the easiest path towards getting a universally acceptable free codec. Over time then, as VP8 develops into a free codec, it could become the successor of Theora on a path to higher quality video. And later still, when the Internet will handle large resolution video, we can move on to the BBC’s Dirac/VC2 codec. It’s where the future is. The present is more likely here and now in Theora.
Please note the comments from Monty from Xiph and from Dan, ex-On2, about the intent that VP3 was to be completely put into the hands of the community. Also, Monty notes that in order to implement VP3, you do not actually need any On2 patents. So, there is probably not a need for Google to refresh that commitment. Though it might be good to reconfirm that commitment.
ADDITION 10th April 2010:
Today, it was announced that Google put their weight behind the Theorarm implementation by helping to make it BSD and thus enabling it to be merged with Theora trunk. They also confirm on their blog post that Theora is “really, honestly, genuinely, 100% free”. Even though this is not a legal statement, it is good that Google has confirmed this.
At the recent FOMS/LCA in Wellington, New Zealand, we talked a lot about how Ogg could support accessibility. Technically, this means support for multiple text tracks (subtitles/captions), multiple audio tracks (audio descriptions parallel to main audio track), and multiple video tracks (sign language video parallel to main video track).
Creating multitrack Ogg files
The creation of multitrack Ogg files is already possible using one of the muxing applications, e.g. oggz-merge. For example, I have my own little collection of multitrack Ogg files at http://annodex.net/~silvia/itext/elephants_dream/multitrack/. But then you are stranded with files that no player will play back.
Multitrack Ogg in Players
As Ogg is now being used in multiple Web browsers in the new HTML5 media formats, there are in particular requirements for accessibility support for the hard-of-hearing and vision-impaired. Either multitrack Ogg needs to become more of a common case, or the association of external media files that provide synchronised accessibility data (captions, audio descriptions, sign language) to the main media file needs to become a standard in HTML5.
As it turn out, both these approaches are being considered and worked on in the W3C. Accessibility data that are audio or video tracks will in the near future have to come out of the media resource itself, but captions and other text tracks will also be available from external associated elements.
The availability of internal accessibility tracks in Ogg is a new use case – something Ogg has been ready to do, but has not gone into common usage. MPEG files on the other hand have for a long time been used with internal accessibility tracks and thus frameworks and players are in place to decode such tracks and do something sensible with them. This is not so much the case for Ogg.
For example, a current VLC build installed on Windows will display captions, because Ogg Kate support is activated. A current VLC build on any other platform, however, has Ogg Kate support deactivated in the build, so captions won’t display. This will hopefully change soon, but we have to look also beyond players and into media frameworks – in particular those that are being used by the browser vendors to provide Ogg support.
Multitrack Ogg in Browsers
The current version of Firefox uses liboggplay for Ogg support, but liboggplay’s multitrack support has been sketchy this far. So, Viktor Gal – the liboggplay maintainer – and I sat down at FOMS/LCA to discuss this and Viktor developed some patches to make the demo player in the liboggplay package, the glut-player, support the accessibility use cases.
I applied Viktor’s patch to my local copy of liboggplay and I am very excited to show you the screencast of glut-player playing back a video file with an audio description track and an English caption track all in sync:
There are still important questions open: for example, how will a player know that an audio description track is to be played together with the main audio track, but a dub track (e.g. a German dub for an English video) is to be played as an alternative. Such metadata for the tracks is something that Ogg is still missing, but that Ogg can be extended with fairly easily through the use of the Skeleton track. It is something the Xiph community is now working on.
This is great progress towards accessibility support in Ogg and therefore in Web browsers. And there is more to come soon.
During last week’s LCA, Jan Gerber, Michael Dale and I gave a 3 hour tutorial on how to publish HTML5 video in an open format.
We basically taught people how to create and publish Ogg Theora video in HTML5 Web pages and how to make them work across browsers, including much of the available tools and libraries. We’re hoping that some people will have learnt enough to include modules in CMSes such as Drupal, Joomla and WordPress, which will easily support the publishing of Ogg Theora.
I have been asked to share the material that we used. It consists of:
Note that if you would like to walk through the exercises, you should install the following software beforehand:
The exercises include:
For those that would like to see the slides here immediately, a special flash embed:
Vimeo started last week with a HTML5 beta test. They use the H.264 codec, probably because much of their content is already in this format through the Flash player.
But what really surprised me was their claim that roughly 25% of their users will be able to make use of their HTML5 beta test. The statement is that 25% of their users use Safari, Chrome, or IE with Chrome Frame. I wondered how they got to that number and what that generally means to the amount of support of H.264 vs Ogg Theora on the HTML5-based Web.
According to Statcounter’s browser market share statistics, the percentage of browsers that support HTML5 video is roughly: 31.1%, as summed up from Firefox 3.5+ (22.57%), Chrome 3.0+ (5.21%), and Safari 4.0+ (3.32%) (Opera’s recent release is not represented yet).
Out of those 31.1%,
8.53% browsers support H.264
27.78% browsers support Ogg Theora.
Given these numbers, Vimeo must assume that roughly 16% of their users have Chrome Frame in IE installed. That would be quite a number, but it may well be that their audience is special.
So, how is Ogg Theora support doing in comparison, if we allow such browser plugins to be counted?
With an installation of XiphQT, Safari can be turned into a browser that supports Ogg Theora. The Chome Frame installation will also turn IE into a Ogg Theora supporting browser. These could get the browser support for Ogg Theora up to 45%. Compare this to a claimed 48% of MS Silverlight support.
But we can do even better for Ogg Theora. If we use the Java Cortado player as a fallback inside the video element, we can capture all those users that have Java installed, which could be as high as 90%, taking Ogg Theora support potentially up to 95%, almost up to the claimed 99% of Adobe Flash.
I’m sure all these numbers are disputable, but it’s an interesting experiment with statistics and tells us that right now, Ogg Theora has better browser support than H.264.
UPDATE: I was told this article sounds aggressive. By no means am I trying to be aggressive – I am stating the numbers as they are right now, because there is a lot of confusion in the market. People believe they reach less audience if they publish in Ogg Theora compared to H.264. I am trying to straighten this view.
I am a very happy camper today! Not because of the New Year – well, yes, there are new opportunities and challenges for the New Year. But I’ve just received an email from Philip J
In the year 2000, while working at CSIRO as a research scientist, I had the idea that video (and audio) should be hyperlinked content on the Web just like any Web page. Conrad Parker and I developed the vision of a “Continuous Media Web” and called the technology that was necessary to develop “Annodex” for “annotated and indexed media”.
Not many people now know that this was really the beginning of Ogg on the Web. Until then, Ogg Vorbis and the emerging Ogg Theora were only targeted at desktop applications in competition to MP3 and MPEG-2.
Within a few years, we developed the specifications for a markup language for video called CMML that would provide the annotations, anchor points, and hyperlinks for video to make it possible to search and index video, hyperlink into video section, and hyperlink out of video sections.
We further developed the specification of temporal URIs to actually address to temporal offsets or segments in video.
And finally, we developed extensions to the Xiph Ogg framework to allow it to carry CMML, and more generally multi-track codecs. The resulting files were originally called “Annodex files”, but through increasing collaboration with Xiph, the specifications were simplified and included natively into Ogg and are now known as “Ogg Skeleton”.
Apart from specifications, we also developed lots of software to make the vision actually come true. Conrad, in particular, developed many libraries that helped develop software on top of the raw Xiph codecs, which include liboggz and libfishsound. Libraries were developed to deal with CMML and with embedding CMML into Ogg. Apache modules were developed to deal with segmenting sections from Ogg files and deliver them as a reply to a temporal URI request. And finally we actually developed a Firefox extension that would allow us to display the Ogg Theora/Vorbis videos inside a Web Browser.
Over time, a lot more sofware was developed, amongst them: php, perl and python bindings for Annodex, DirectShow filters to have Ogg Theora/Vorbis support on Windows, an ActiveX control for Windows, an authoring tool for CMML on Windows, Ogg format validation software, mobile phone support for Ogg Theora/Vorbis, and a video wiki for CMML and Ogg Theora called cmmlwiki. Several students and Annodex team members at CSIRO helped develop these, including Andre Pang (who now works for Pixar), Zen Kavanagh (who now works for Microsoft), and Colin Ward (who now works for Symbian). Most of the software was released as open source software by CSIRO and is available now either in the Annodex repository or the Xiph repositories.
Annodex technology became increasingly part of Xiph technology as team members also became increasingly part of the Xiph community, such as by now it’s rather difficult to separate out the Annodex people from the Xiph people.
Over time, other projects picked up on the Annodex technology. The first were in fact ethnographic researchers, who wanted their audio-visual ethnographic recordings usable in deeply. Also, other multimedia scientists experimented with Annodex. The first actual content site to publish a large collection of Ogg Theora video with annotations was OpenRoadTrip by Scott Shawcroft and Brandon Hines in 2006. Soon after, Michael Dale and Aphid from Metavid started really using the Annodex set of technologies and contributing to harden the technology. Michael was also a big advocate for helping Wikimedia and Archive.org move to using Ogg Theora.
By 2006, the team at CSIRO decided that it was necessary to develop a simple, cross-platform Ogg decoding and playback library that would allow easy development of applications that need deep control of Ogg audio and video content. Shane Stephens was the key developer of that. By the time that Chris Double from Firefox picked up liboggplay to include Ogg support into Firefox natively, CSIRO had stopped working on Annodex, Shane had left the project to work for Google on Wave, and we eventually found Viktor Gal as the new maintainer for liboggplay. We also found Cristian Adam as the new maintainer for the DirectShow filters (oggcodecs).
Now that the basic Ogg Theora/Vorbis support for the HTML5 <video> element is starting to be available in all major browsers (well, as soon as an ActiveX control is implemented for IE), we can finally move on to develop the bigger vision. This is why I am an invited expert on the W3C media fragments working group and why I am working with Mozilla on sorting out accessibility for <video>. Accessibility is an inherent part of making video searchable. So, if we can find a way to extend the annotations with hyperlinks, we will also be able to build Webs of videos and completely new experiences on the Web. Think about mashing up simply by creating a list of URLs. Think about tweeting video segments. Think about threaded video email discussions (Shane should totally include that into Google Wave!). And think about all the awesome applications that come to your mind that I haven’t even thought about yet!
I spent this week at the Open Video Conference in New York and was amazed about the 800 and more people that understand the value of open video and the need for open video technologies to allow free innovation and sharing. I can feel that the ball has got rolling – the vision developed almost 10 years ago is starting to take shape. Sometimes, in very very rare moments, you can feel that history has just been made. The Open Video Conference was exactly one such point in time. Things have changed. Forever. For the better. I am stunned.
On Jun 13th 2009 Chris DiBona of Google claimed on the WhatWG mailing list:
“If were to switch to theora and maintain even a semblance of the current youtube quality it would take up most available bandwidth across the Internet.”
Everyone who has ever encoded a Ogg Theora/Vorbis file and in parallel encoded one with another codec will have to immediately protest. It is sad that even the best people fall for FUD spread by the un-enlightened or the ones who have their own agenda.
Fortunately, Gregory Maxwell from Wikipedia came to the rescue and did an actual “YouTube / Ogg/Theora comparison”. It’s a good read and a comparison on one video. He has put his instructions there, so anyone can repeat it for themselves. You will have to start with a pretty good quality video though to see such differences.
This past week was amazing, not because of Google Wave, which everybody seems to be talking about now, and not because of Microsoft’s launch of the bing search engine, but amazing for the world of open video.
Overall a really exciting week for open media and HTML5 video! I think things are only going to heat up more in this space as more content publishers and more browsers will join the video tag implementations and the Ogg Theora/Vorbis support.