Category Archives: open codecs

Upcoming conferences / workshops

Lots is happening in open source multimedia land in the next few months.

Check out these cool upcoming conferences / workshops / miniconfs…

September 29th and 30th, New York
Open Subtitles Design Summit

October 1st and 2nd, New York
Open Video Conference

October 3rd and 4th, New York
Foundations of Open Media Software Developer Workshop

January 24/25th, Brisbane, Australia
LCA Multimedia Miniconf

VP8/WebM: Adobe is the key to open video on the Web

Google have today announced the open sourcing of VP8 and the creation of a new media format WebM.

Technical Challenges

As I predicted earlier, Google had to match VP8 with an audio codec and a container format – their choice was a subpart of the Matroska format and the Vorbis codec. To complete the technical toolset, Google have:

  • developed ffmpeg patches, so an open source encoding tool for WebM will be available
  • developed GStreamer and DirectShow plugins, so players that build on these frameworks will be able to decode WebM,
  • and developed an SDK such that commercial partners can implement support for WebM in their products.

This has already been successful and several commercial software products are already providing support for WebM.

Google haven’t forgotten the mobile space either – a bunch of Hardware providers are listed as supporters on the WebM site and it can be expected that developments have started.

The speed of development of software and hardware around WebM is amazing. Google have done an amazing job at making sure the technology matures quickly – both through their own developments and by getting a substantial number of partners included. That’s just the advantage of being Google rather than a Xiph, but still an amazing achievement.

Browsers

As was to be expected, Google managed to get all the browser vendors that are keen to support open video to also support WebM: Chrome, Firefox and Opera all have come out with special builds today that support WebM. Nice work!

What is more interesting, though, is that Microsoft actually announced that they will support WebM in future builds of IE9 – not out of the box, but on systems where the codec is already installed. Technically, that is be the same situation as it will be for Theora, but the difference in tone is amazing: in this blog post, any codec apart from H.264 was condemned and rejected, but the blog post about WebM is rather positive. It signals that Microsoft recognize the patent risk, but don’t want to be perceived of standing in the way of WebM’s uptake.

Apple have not yet made an announcement, but since it is not on the list of supporters and since all their devices exclusively support H.264 it stands to expect that they will not be keen to pick up WebM.

Publishers

What is also amazing is that Google have already achieved support for WebM by several content providers. The first of these is, naturally, YouTube, which is offering a subset of its collection also in the WebM format and they are continuing to transcode their whole collection. Google also has Brightcov, Ooyala, and Kaltura on their list of supporters, so content will emerge rapidly.

Uptake

So, where do we stand with respect to a open video format on the Web that could even become the baseline codec format for HTML5? It’s all about uptake – if a substantial enough ecosystem supports WebM, it has all chances of becoming a baseline codec format – and that would be a good thing for the Web.

And this is exactly where I have the most respect for Google. The main challenge in getting uptake is in getting the codec into the hands of all people on the Internet. This, in particular, includes people working on Windows with IE, which is still the largest browser from a market share point of view. Since Google could not realistically expect Microsoft to implement WebM support into IE9 natively, they have found a much better partner that will be able to make it happen – and not just on Windows, but on many platforms.

Yes, I believe Adobe is the key to creating uptake for WebM – and this is admittedly something I have completely overlooked previously. Adobe has its Flash plugin installed on more than 90% of all browsers. Most of their users will upgrade to a new version very soon after it is released. And since Adobe Flash is still the de-facto standard in the market, it can roll out a new Flash plugin version that will bring WebM codec support to many many machines – in particular to Windows machines, which will in turn enable all IE9 users to use WebM.

Why would Adobe do this and thus cement its Flash plugin’s replacement for video use by HTML5 video? It does indeed sound ironic that the current market leader in online video technology will be the key to creating an open alternative. But it makes a lot of sense to Adobe if you think about it.

Adobe has itself no substantial standing in codec technology and has traditionally always had to license codecs. Adobe will be keen to move to a free codec of sufficient quality to replace H.264. Also, Adobe doesn’t earn anything from the Flash plugins themselves – their source of income are their authoring tools. All they will need to do to succeed in a HTML5 WebM video world is implement support for WebM and HTML5 video publishing in their tools. They will continue to be the best tools for authoring rich internet applications, even if these applications are now published in a different format.

Finally, in the current hostile space between Apple and Adobe related to the refusal of Apple to allow Flash onto its devices, this may be the most genius means of Adobe at getting back at them. Right now, it looks as though the only company that will be left standing on the H.264-only front and outside the open WebM community will be Apple. Maybe implementing support for Theora wouldn’t have been such a bad alternative for Apple. But now we are getting a new open video format and it will be of better quality and supported on hardware. This is exciting.

IP situation

I cannot, however, finish this blog post on a positive note alone. After reading the review of VP8 by a x.264 developer, it seems possible that VP8 is infringing on patents that are outside the patent collection that Google has built up in codecs. Maybe Google have calculated with the possibility of a patent suit and put money away for it, but Google certainly haven’t provided indemnification to everyone else out there. It is a tribute to Google’s achievement that given a perceived patent threat – which has been the main inhibitor of uptake of Theora – they have achieved such an uptake and industry support around VP8. Hopefully their patent analysis is sound and VP8 is indeed a safe choice.

UPDATE (22nd May): After having thought about patents and the situation for VP8 a bit more, I believe the threat is really minimal. You should also read these thoughts of a Gnome developer, these of a Debian developer and the emails on the Theora mailing list.

HTML5 Media and Accessibility presentation

Today, I was invited to give a talk at my old workplace CSIRO about the HTML5 media elements and accessibility.

A lot of the things that have gone into Ogg and that are now being worked on in the W3C in different working groups – including the Media Fragments and HTML5 WGs – were also of concern in the Annodex project that I worked on while at CSIRO. So I was rather excited to be able to report back about the current status in HTML5 and where we’re at with accessibility features.

Check out the presentation here. It contains a good collection of links to exciting demos of what is possible with the new HTML5 media elements when combined with other HTML features.

I tried something now with this presentation: I wrote it in a tool called S5, which makes use only of HTML features for the presentation. It was quite a bit slower than I expected, e.g. reloading a page always included having to navigate to that page. Also, it’s not easily possible to do drawings, unless you are willing to code them all up in HTML. But otherwise I have found it very useful for, in particular, including all the used URLs and video element demos directly in the slides. I was inspired with using this tool by Chris Double’s slides from LCA about implementing HTML 5 video in Firefox.

Google’s challenges of freeing VP8

Since On2 Technology’s stockholders have approved the merger with Google, there are now first requests to Google to open up VP8.

I am sure Google is thinking about it. But … what does “it” mean?

Freeing VP8
Simply open sourcing it and making it available under a free license doesn’t help. That just provides open source code for a codec where relevant patents are held by a commercial entity and any other entity using it would still need to be afraid of using that technology, even if it’s use is free.

So, Google has to make the patents that relate to VP8 available under an irrevocable, royalty-free license for the VP8 open source base, but also for any independent implementations of VP8. This at least guarantees to any commercial entity that Google will not pursue them over VP8 related patents.

Now, this doesn’t mean that there are no submarine or unknown patents that VP8 infringes on. So, Google needs to also undertake an intensive patent search on VP8 to be able to at least convince themselves that their technology is not infringing on anyone else’s. For others to gain that confidence, Google would then further have to indemnify anyone who is making use of VP8 for any potential patent infringement.

I believe – from what I have seen in the discussions at the W3C – it would only be that last step that will make companies such as Apple have the confidence to adopt a “free” codec.

An alternative to providing indemnification is the standardisation of VP8 through an accepted video standardisation body. That would probably need to be ISO/MPEG or SMPTE, because that’s where other video standards have emerged and there are a sufficient number of video codec patent holders involved that a royalty-free publication of the standard will hold a sufficient number of patent holders “under control”. However, such a standardisation process takes a long time. For HTML5, it may be too late.

Technology Challenges
Also, let’s not forget that VP8 is just a video codec. A video codec alone does not encode a video. There is a need for an audio codec and a encapsulation format. In the interest of staying all open, Google would need to pick Vorbis as the audio codec to go with VP8. Then there would be the need to put Vorbis and VP8 in a container together – this could be Ogg or MPEG or QuickTime’s MOOV. So, apart from all the legal challenges, there are also technology challenges that need to be mastered.

It’s not simple to introduce a “free codec” and it will take time!

Google and Theora
There is actually something that Google should do before they start on the path of making VP8 available “for free”: They should formulate a new license agreement with Xiph (and the world) over VP3 and Theora. Right now, the existing license that was provided by On2 Technologies to Theora (link is to an early version of On2’s open source license of VP3) was only for the codebase of VP3 and any modifications of it, but doesn’t in an obvious way apply to an independent re-implementations of VP3/Theora. The new agreement between Google and Xiph should be about the patents and not about the source code. (UPDATE: The actual agreement with Xiph apparently also covers re-implementations – see comments below.)

That would put Theora in a better position to be universally acceptable as a baseline codec for HTML5. It would allow, e.g. Apple to make their own implementation of Theora – which is probably what they would want for ipods and iphones. Since Firefox, Chrome, and Opera already support Ogg Theora in their browsers using the on2 licensed codebase, they must have decided that the risk of submarine patents is low. So, presumably, Apple can come to the same conclusion.

Free codecs roadmap
I see this as the easiest path towards getting a universally acceptable free codec. Over time then, as VP8 develops into a free codec, it could become the successor of Theora on a path to higher quality video. And later still, when the Internet will handle large resolution video, we can move on to the BBC’s Dirac/VC2 codec. It’s where the future is. The present is more likely here and now in Theora.


ADDITION:
Please note the comments from Monty from Xiph and from Dan, ex-On2, about the intent that VP3 was to be completely put into the hands of the community. Also, Monty notes that in order to implement VP3, you do not actually need any On2 patents. So, there is probably not a need for Google to refresh that commitment. Though it might be good to reconfirm that commitment.


ADDITION 10th April 2010:
Today, it was announced that Google put their weight behind the Theorarm implementation by helping to make it BSD and thus enabling it to be merged with Theora trunk. They also confirm on their blog post that Theora is “really, honestly, genuinely, 100% free”. Even though this is not a legal statement, it is good that Google has confirmed this.

Accessibility support in Ogg and liboggplay

At the recent FOMS/LCA in Wellington, New Zealand, we talked a lot about how Ogg could support accessibility. Technically, this means support for multiple text tracks (subtitles/captions), multiple audio tracks (audio descriptions parallel to main audio track), and multiple video tracks (sign language video parallel to main video track).

Creating multitrack Ogg files
The creation of multitrack Ogg files is already possible using one of the muxing applications, e.g. oggz-merge. For example, I have my own little collection of multitrack Ogg files at http://annodex.net/~silvia/itext/elephants_dream/multitrack/. But then you are stranded with files that no player will play back.

Multitrack Ogg in Players
As Ogg is now being used in multiple Web browsers in the new HTML5 media formats, there are in particular requirements for accessibility support for the hard-of-hearing and vision-impaired. Either multitrack Ogg needs to become more of a common case, or the association of external media files that provide synchronised accessibility data (captions, audio descriptions, sign language) to the main media file needs to become a standard in HTML5.

As it turn out, both these approaches are being considered and worked on in the W3C. Accessibility data that are audio or video tracks will in the near future have to come out of the media resource itself, but captions and other text tracks will also be available from external associated elements.

The availability of internal accessibility tracks in Ogg is a new use case – something Ogg has been ready to do, but has not gone into common usage. MPEG files on the other hand have for a long time been used with internal accessibility tracks and thus frameworks and players are in place to decode such tracks and do something sensible with them. This is not so much the case for Ogg.

For example, a current VLC build installed on Windows will display captions, because Ogg Kate support is activated. A current VLC build on any other platform, however, has Ogg Kate support deactivated in the build, so captions won’t display. This will hopefully change soon, but we have to look also beyond players and into media frameworks – in particular those that are being used by the browser vendors to provide Ogg support.

Multitrack Ogg in Browsers
Hopefully gstreamer (which is what Opera uses for Ogg support) and ffmpeg (which is what Chrome uses for Ogg support) will expose all available tracks to the browser so they can expose them to the user for turning on and off. Incidentally, a multitrack media JavaScript API is in development in the W3C HTML5 Accessibility Task Force for allowing such control.

The current version of Firefox uses liboggplay for Ogg support, but liboggplay’s multitrack support has been sketchy this far. So, Viktor Gal – the liboggplay maintainer – and I sat down at FOMS/LCA to discuss this and Viktor developed some patches to make the demo player in the liboggplay package, the glut-player, support the accessibility use cases.

I applied Viktor’s patch to my local copy of liboggplay and I am very excited to show you the screencast of glut-player playing back a video file with an audio description track and an English caption track all in sync:

elephants_dream_with_audiodescriptions_and_captions

Further developments
There are still important questions open: for example, how will a player know that an audio description track is to be played together with the main audio track, but a dub track (e.g. a German dub for an English video) is to be played as an alternative. Such metadata for the tracks is something that Ogg is still missing, but that Ogg can be extended with fairly easily through the use of the Skeleton track. It is something the Xiph community is now working on.

Summary
This is great progress towards accessibility support in Ogg and therefore in Web browsers. And there is more to come soon.

Audio Track Accessibility for HTML5

I have talked a lot about synchronising multiple tracks of audio and video content recently. The reason was mainly that I foresee a need for more than two parallel audio and video tracks, such as audio descriptions for the vision-impaired or dub tracks for internationalisation, as well as sign language tracks for the hard-of-hearing.

It is almost impossible to introduce a good scheme to deliver the right video composition to a target audience. Common people will prefer bare a/v, vision-impaired would probably prefer only audio plus audio descriptions (but will probably take the video), and the hard-of-hearing will prefer video plus captions and possibly a sign language track . While it is possible to dynamically create files that contain such tracks on a server and then deliver the right composition, implementation of such a server method has not been very successful in the last years and it would likely take many years to roll out such new infrastructure.

So, the only other option we have is to synchronise completely separate media resource together as they are selected by the audience.

It is this need that this HTML5 accessibility demo is about: Check out the demo of multiple media resource synchronisation.

I created a Ogg video with only a video track (10m53s750). Then I created an audio track that is the original English audio track (10m53s696). Then I used a Spanish dub track that I found through BlenderNation as an alternative audio track (10m58s337). Lastly, I created an audio description track in the original language (10m53s706). This creates a video track with three optional audio tracks.

I took away all native controls from these elements when using the HTML5 audio and video tag and ran my own stop/play and seeking approaches, which handled all media elements in one go.

I was mostly interested in the quality of this experience. Would the different media files stay mostly in sync? They are normally decoded in different threads, so how big would the drift be?

The resulting page is the basis for such experiments with synchronisation.

The page prints the current playback position in all of the media files at a constant interval of 500ms. Note that when you pause and then play again, I am re-synching the audio tracks with the video track, but not when you just let the files play through.

I have let the files play through on my rather busy Macbook and have achieved the following interesting drift over the course of about 9 minutes:

Drift between multiple parallel played media elements

You will see that the video was the slowest, only doing roughly 540s, while the Spanish dub did 560s in the same time.

To fix such drifts, you can always include regular re-synchronisation points into the video playback. For example, you could set a timeout on the playback to re-sync every 500ms. Within such a short time, it is almost impossible to notice a drift. Don’t re-load the video, because it will lead to visual artifacts. But do use the video’s currentTime to re-set the others. (UPDATE: Actually, it depends on your situation, which track is the best choice as the main timeline. See also comments below.)

It is a workable way of associating random numbers of media tracks with videos, in particular in situations where the creation of merged files cannot easily be included in a workflow.

Tutorial on HTML5 open video at LCA 2010

During last week’s LCA, Jan Gerber, Michael Dale and I gave a 3 hour tutorial on how to publish HTML5 video in an open format.

We basically taught people how to create and publish Ogg Theora video in HTML5 Web pages and how to make them work across browsers, including much of the available tools and libraries. We’re hoping that some people will have learnt enough to include modules in CMSes such as Drupal, Joomla and WordPress, which will easily support the publishing of Ogg Theora.

I have been asked to share the material that we used. It consists of:

Note that if you would like to walk through the exercises, you should install the following software beforehand:

You might need to look for packages of your favourite OS (e.g. Windows or Mac, Ubuntu or Debian).

The exercises include:

  • creating a Ogg video from an editor
  • transcoding a video using http://firefogg.org/
  • creating a poster image using OggThumb
  • writing a first HTML5 video Web page with Ogg Theora
  • publishing it on a Web Server, with correct MIME type & Duration hint
  • writing a second HTML5 video Web page with Ogg Theora & MP4 to cover Safari/Webkit
  • transcoding using ffmpeg2theora in a script
  • writing a third HTML5 video Web page with Cortado fallback
  • writing a fourth Web page using “Video for Everybody”
  • writing a fifth Web page using “mwEmbed”
  • writing a sixth Web page using firefogg for transcoding before upload
  • and a seventh one with a progress bar
  • encoding srt subtitles into an Ogg Kate track
  • writing an eighth Web page using cortado to display the Ogg Kate track

For those that would like to see the slides here immediately, a special flash embed:

Enjoy!

HTML5 video: 25% H.264 reach vs. 95% Ogg Theora reach

Vimeo started last week with a HTML5 beta test. They use the H.264 codec, probably because much of their content is already in this format through the Flash player.

But what really surprised me was their claim that roughly 25% of their users will be able to make use of their HTML5 beta test. The statement is that 25% of their users use Safari, Chrome, or IE with Chrome Frame. I wondered how they got to that number and what that generally means to the amount of support of H.264 vs Ogg Theora on the HTML5-based Web.

According to Statcounter’s browser market share statistics, the percentage of browsers that support HTML5 video is roughly: 31.1%, as summed up from Firefox 3.5+ (22.57%), Chrome 3.0+ (5.21%), and Safari 4.0+ (3.32%) (Opera’s recent release is not represented yet).

Out of those 31.1%,

8.53% browsers support H.264

and

27.78% browsers support Ogg Theora.

Given these numbers, Vimeo must assume that roughly 16% of their users have Chrome Frame in IE installed. That would be quite a number, but it may well be that their audience is special.

So, how is Ogg Theora support doing in comparison, if we allow such browser plugins to be counted?

With an installation of XiphQT, Safari can be turned into a browser that supports Ogg Theora. The Chome Frame installation will also turn IE into a Ogg Theora supporting browser. These could get the browser support for Ogg Theora up to 45%. Compare this to a claimed 48% of MS Silverlight support.

But we can do even better for Ogg Theora. If we use the Java Cortado player as a fallback inside the video element, we can capture all those users that have Java installed, which could be as high as 90%, taking Ogg Theora support potentially up to 95%, almost up to the claimed 99% of Adobe Flash.

I’m sure all these numbers are disputable, but it’s an interesting experiment with statistics and tells us that right now, Ogg Theora has better browser support than H.264.

UPDATE: I was told this article sounds aggressive. By no means am I trying to be aggressive – I am stating the numbers as they are right now, because there is a lot of confusion in the market. People believe they reach less audience if they publish in Ogg Theora compared to H.264. I am trying to straighten this view.