All posts by silvia

New Theora encoder further improved

After posting only a month ago about the new Thusnelda release, there continues to be good news from the open codec front.

Monty posted last week about further improvements and this time there are actual statistics thanks to Greg Maxwell. Looking at the PSNR (peak signal-to-noise ratio) measure, the further improved Thusnelda outstrips even the x264 implementation of H.264.

Don’t get me wrong: PSNR is only one measure, it is an objective measure and the statistics were only calculated on one particular piece. Further analyses are needed, though these are very encouraging statistics.
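For readers who have not met PSNR before, here is a minimal sketch of how the measure is usually computed for two sequences of 8-bit sample values. It is not the tooling behind the statistics above; the function name `psnr` and the default peak of 255 are my own assumptions for illustration.

```python
import math


def psnr(reference, decoded, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists.

    `max_value` is the largest possible sample value (255 is assumed here,
    i.e. 8-bit samples).
    """
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the original and the decoded samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: no noise at all
    return 10 * math.log10(max_value ** 2 / mse)
```

A higher value means the decoded picture is numerically closer to the original, which is exactly why it is only one measure: it says nothing about how the differences look to a human viewer.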

This is important not just because it shows that open codecs can be as good in quality as proprietary ones. What is more important though is that Ogg Theora is royalty free and implementable in both proprietary and free software browsers.

H.264’s licensing terms, however, will really kick in in 2010, so that may well encourage more people to actually use Ogg Theora/Vorbis (or another open codec like Ogg Dirac/Vorbis) with the new HTML5 video element.

First draft of a new media fragment URI addressing standard

Those who know me well know that a few years ago (in fact, almost 10 years now) we developed the Annodex set of technologies at the CSIRO in a project called “Continuous Media Web”.

The idea was to make time-continuous data (read: audio and video) an integral part of the Web. It would be possible to search for media through standard search engines. It would be possible to link into and out of media as we link into and out of Web pages. It would be possible to mash up video from different Web servers into a single media stream just like we are able to mash up images, text and other Web resources from different Web servers.

As you are all aware, we have made huge steps towards this vision in the last 10 years. We now have what is called “universal search” – search engines like Google and Yahoo don’t return only links to HTML pages any longer, but return links to videos and images just as well.

But it doesn’t go far enough yet – even now we still cannot link into a long-form video to the right fragment that has the exact context of what we have been searching for.

In the Annodex project we implemented a working version of such a deep universal search engine in the year 2003 on top of the Panoptic search engine (an enterprise search engine developed by CSIRO, later spun out and now sold as Funnelback).

The basis for our implementation was the combination of specifications that we developed around Ogg:

  • An extension to Ogg that allows the creation of valid Ogg streams from subparts of Ogg streams – this is now part of Ogg as Skeleton.
  • A means of annotating Ogg streams with time-aligned text that could be interleaved with the Ogg media stream to produce streams that knew more about themselves – the format was called CMML for Continuous Media Markup Language.
  • And an extension to the URI addressing of Ogg streams using temporal URIs.

I am very proud that in the last 2 years, the development of a generic media fragment URI addressing approach has been taken up by the W3C, and that Conrad Parker and I are invited experts on the Working Group.

I am even more proud that the Working Group has just published a First Public Working Draft of a document called “Use cases and requirements for Media Fragments”. It contains a large collection of examples for situations in which users will want to make use of media fragments. It identifies the key dimensions of fragmentation that need to be specified as:

  1. Temporal fragmentation
  2. Spatial fragmentation
  3. Track fragmentation
  4. Name fragmentation

Beyond mere use cases and requirements, the document also contains a survey of technologies that address multimedia fragments.

In a first step towards the development of a Media Fragments W3C Recommendation, this document also discusses a proposed syntax for media fragment URI addressing and proposes different processing approaches. These sections will eventually be moved into the recommendation and are the most incomplete sections at this point.

To explain some of the approaches that are being proposed in more detail, here are some examples of media fragment URIs that are proposed through this WD:

  • http://www.example.com/example.ogv#t=10s,20s – addresses the fragment of example.ogv that lies between the 10s and the 20s offset
  • http://www.example.com/example.ogv#track='audio' – addresses the track called “audio” in the example.ogv file
  • http://www.example.com/example.ogv#track='audio'&t=10s,20s – addresses the track called “audio” on the subpart between the 10s and 20s offset in the example.ogv file
  • http://www.example.com/example.ogv#xywh=pixel:160,120,320,240 – addresses the example.ogv file but with a video track cut to a region of the size 320x240px positioned at 160x120px offset
  • http://www.example.com/example.ogv#id='chapter-1' – addresses the named fragment called “chapter-1” which is specified through some mechanism, e.g. Kate or CMML in Ogg

Note that the latter example works only if the encapsulation format provides a means of specifying a name for a fragment. Such a means is e.g. available in QuickTime through chapter tracks, or in Flash through cuepoints.
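To make the example URIs above a bit more concrete, here is a small sketch of how a client might pull the four dimensions out of the fragment part. It only understands the shapes shown in the examples in this post; the syntax in the WD is still a proposal and may change, and the function names and the returned dictionary layout are my own assumptions.

```python
from urllib.parse import urlsplit


def _seconds(text):
    """Turn '10s' (or '10') into 10.0; an empty value becomes None."""
    text = text.strip()
    if not text:
        return None
    return float(text[:-1] if text.endswith("s") else text)


def parse_media_fragment(uri):
    """Split the fragment of a media URI into its addressed dimensions."""
    result = {}
    for part in urlsplit(uri).fragment.split("&"):
        name, sep, value = part.partition("=")
        if not sep:
            continue  # skip empty or malformed pieces
        if name == "t":
            # Temporal fragment: start and end offset.
            start, _, end = value.partition(",")
            result["t"] = (_seconds(start), _seconds(end))
        elif name == "xywh":
            # Spatial fragment: optional unit, then x, y, width, height.
            unit, has_unit, numbers = value.partition(":")
            if not has_unit:
                unit, numbers = "pixel", value
            x, y, w, h = (int(n) for n in numbers.split(","))
            result["xywh"] = (unit, x, y, w, h)
        elif name in ("track", "id"):
            # Track and named fragments: drop the surrounding quotes.
            result[name] = value.strip("'")
    return result
```

For the third example above, this returns a dictionary with a `track` entry of `audio` and a `t` entry of `(10.0, 20.0)`. What a server or player then does with these values is exactly the processing question that the Working Group still has to settle.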

We know from our experience with Ogg that temporal fragmentation can be realized. For track addressing it is possible to use the recently developed ROE specification. The id tags used there could be included into Skeleton and then be used to address tracks by name. As for spatial fragmentation on Ogg Theora – I don’t think it can be achieved for an arbitrary rectangular selection without transcoding.

The next tasks of the Working Group are to create implementations of these specifications on diverse formats and thus find out which processes work best.

Google video: 2.5 years later, my predictions come true

When Google bought YouTube in October 2006, I wrote a blog entry about how Google video is a hosting site and that with the purchase of YouTube, Google has the opportunity to turn the Google brand back to video search.

Well, today, that prediction has come true and Google video has stopped hosting videos for users. So, things are now clear: YouTube is a video publishing site and Google video is a search engine.

Hold on: not so fast.

According to ComScore’s U.S. search engine rankings for August 2008, YouTube is the second largest search engine on the Web, ahead of Yahoo. At Vquence, we explain to customers that many people now use YouTube search as their entry point into the Web. Video is their Web. And when it comes to video, it’s all about YouTube.

Because people search for videos on YouTube, most videos that get published will have a copy on YouTube. Thus, YouTube is the dominant place to find video – not Google video. Also, YouTube is turning more and more into a search engine like Google: just this week they published “featured search results”, making a YouTube search result page look almost identical to a Google search result page: there is some featured content on top of the actual search results and there are some paid-for ads on the right.

Since it has taken Google such a long time to move Google video from hosting service to search service, I wonder if it’s not too late for Google video already. It feels now just like an add-on to YouTube – a place you go when all other searches fail.

Yahoo video search was once the best video search around. Then came Truveo and blinkx and a whole bunch more. Now, nobody writes about them any more – everybody just goes to YouTube itself or to Google Universal Search to go and find a video.

It would be nice if Google video search stayed around – if only as a discovery tool for when Web video goes directly onto our TVs. But I doubt Google will find a good way to monetize it. YouTube’s search will be monetized more quickly and more effectively.

Alpha version of next generation Theora codec released

On Thursday, Ralph Giles announced the alpha release of Thusnelda, the next generation implementation of the Theora encoder.

The primary change in comparison to the first generation Theora implementation is a completely rewritten encoder with vastly improved quality vs. bitrate in the default vbr/constant-quality mode, and better tracking of the target bitrate in cbr mode.

Jan Schmidt made some experiments to compare the two versions and found a 20% compression improvement for no loss in quality while at the same time also achieving a 14% improvement in speed.

In 2007 there was a huge (and mostly uninformed) discussion about the lack of quality of Theora on Slashdot, and Monty wrote a reply clarifying some of the misinformation and explaining the shortcomings that the Xiph team wants to work on to improve the codec. A lot of these issues are now being attacked through the community and through the financial support of the Mozilla grant.

Theora is now much closer to H.264, if not even having overtaken it in some dimensions. Congratulations to the Theora team, in particular Tim Terriberry, Monty, and Ralph Giles. Once this Theora generation is released, it will be a competitive modern video codec.

Happy Ada Lovelace Day – Pia Waugh

The choice of a single great woman for this day was a really difficult decision. There are so many great women to adore and point out in open source and in other fields. And since Pia has just written her Ada Lovelace Day blog post about me, I half considered changing my mind.

My favorite women of past centuries are actually Marie Curie and Joan of Arc – one for her scientific achievements and the other for her courage. In current times I could have chosen Anuradha Suraparaju, together with me one of the few female software developers for open media software (leave me a comment if I missed you!!). I could also have chosen Stormy Peters, whom I adore for her work in GNOME, Mitchell Baker for her work in Mozilla, or Akkana Peck, who is an awesome Firefox and GIMP developer. But weeks ago I had made up my mind to write about Pia, one of my best friends and an open source advocate with endless energy.

Pia Waugh

I can’t remember the day that I first met Pia, but I remember when she made the first impression on me. She came seemingly out of nowhere and decided to stand for president of Linux Australia. At the time, Linux Australia was a pretty useless organisation, having little more than a bank account through which the annual Linux.conf.au was organised. All of a sudden, with Pia spearheading Linux Australia, the organisation received a face, a goal and a purpose. I watched in amazement as LA became more and more representative of the *nix developers and enthusiasts in Australia, while AUUG took the opposite development.

And Pia also knew when to stop standing for LA – the moment that the organisation was threatened by just becoming “Pia’s organisation”, Pia decided to stand back and leave it to others to carry on her work. All they had to do was to stand on the back of a giant (well, almost).

Yes, I regard Pia as a giant – even if she’s actually really small in body size. 🙂 But we all know that short people can be very energetic. And Pia is one of the most energetic people that I know. She has a talent to express the opportunities and issues in open source with powerful words and has influenced many people to change their minds about open source. She has the energy to run her consultancy together with her husband Jeff, while at the same time being president of Software Freedom International which organises Software Freedom Day internationally, being active in many other groups, and particularly being a driving and coordinating force for OLPC work in Australia and the region.

While driving open source and OLPC, Pia also has a huge aim in getting more girls interested in technology, since she is enjoying it so much. She does many events at schools and has inspired many a girl to try out computing. Part of this is to give great women in computing more exposure so that they can serve as role models to the younger generation. Her “Heroes” presentation is an awesome example of her efforts. Pia is my hero for the day. 🙂

Xiph / Annodex part of GSoC this year

Yup, we made it again! And because we want to create some awesome code, I’m posting our call for student applications here.

To all students: Please apply to Xiph GSoC projects and help us make open media technology rock the world!

Xiph.org/Annodex.net seeking Summer of Code student applications!

2009 is an important year for free codecs: Ogg Vorbis on every Android device, Ogg Theora support in development for Mozilla Firefox 3.5, and expanded Ogg hosting by the Internet Archive and Wikimedia. Xiph.org and Annodex.net, who develop free codecs (Ogg Vorbis, Theora, Dirac, Speex, CELT, FLAC) and web video support for them, have been selected as a mentoring organization for Google Summer of Code 2009.

We are actively seeking student projects for Summer of Code.
A list of project suggestions is at:
http://wiki.xiph.org/index.php/Summer_of_Code_2009

Students should feel free to select one of these, develop a variation, or propose their own ideas! Some examples:

  • Develop a conference bridge or reference SIP client for CELT, the new, ultra-low delay audio codec that bridges the gap between Vorbis and Speex for applications where both high quality audio and low delay are desired. If you enjoy hacking on networks, you’ll have fun with these CELT projects.
  • Develop components to support all Ogg codecs for OpenMAX IL, the media plugins used in Maemo, Android and LIMO mobile devices. This touches on many interesting projects, and is perfect for anyone with an interest in mobile and embedded systems who wants a broad introduction to multimedia codecs.
  • Write a JavaScript Library for Subtitles, Captions and other time-aligned text. The main focus of this project is around enabling video accessibility for Ogg in Firefox. The project requires a student with experience in JavaScript development, HTML and CSS, but also with some understanding of C for liboggplay and libkate, and of C++ for Firefox.
  • Make a Proof of Concept for HTML5 Ogg Video support in the Chromium Browser, using liboggplay (our Ogg Theora playback library, as used in Mozilla Firefox). Full support for HTML5 <video> is a lot of work, but let’s get the ball rolling with a proof of concept for Theora frame decoding and rendering.
  • Add support for import and export of XSPF playlists to Songbird, the Mozilla-powered open music player. This project requires good XML foo, the opportunity to work with cross-platform XUL and JavaScript, and perhaps some C++.

Submissions

The student application period starts on Monday (March 23):
http://socghop.appspot.com/document/show/program/google/gsoc2009/timeline
and runs for a little under 2 weeks, until Friday April 3.

Details of our application process are at:
http://wiki.xiph.org/index.php/Summer_of_Code_Applications

Interested students *must* get involved with the project development community, on project mailing lists and IRC, before the application deadline. When selecting projects, preference will be given to students who have submitted at least one patch to a Xiph.org or Annodex.net project before the application deadline.

Students will receive a grant from Google for successful work on their GSoC projects. Hacking on free multimedia projects is fun and can have a big impact. We need students who love to hack, to help put support for free codecs into more applications, browsers and networks.

Progress on captions for HTML5 video

Paul Rouget this week published another example implementation for using srt with HTML5 video with a JavaScript library. This is at least the fourth JavaScript implementation that I know of for attaching srt subtitles to the video element.

It is great to see such a huge need for this. At the same time I am also worried about the number of incompatible implementations of this feature. It will inhibit search engines from realising which text relates to and describes a particular video. It will also inhibit accessibility technology such as screen readers or braille devices from realising that there is text that needs to be rendered.
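To show what each of these libraries has to do at its core, here is a rough sketch of reading an srt file and looking up the caption for a given playback time. It is written in Python rather than JavaScript purely for illustration, it is not taken from any of the implementations mentioned, and the names `parse_srt` and `active_cue` are made up for this example.

```python
import re

# An srt timestamp looks like 00:01:02,500 (hours:minutes:seconds,millis).
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_seconds(text):
    match = TIMESTAMP.fullmatch(text.strip())
    if match is None:
        raise ValueError("not an srt timestamp: %r" % text)
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(data):
    """Return a list of (start, end, text) cues, with times in seconds."""
    cues = []
    # Cues are separated by blank lines: a counter line, a timing line, text.
    for block in re.split(r"\n\s*\n", data.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip anything that does not look like a cue
        start, _, end = lines[1].partition("-->")
        cues.append((_to_seconds(start), _to_seconds(end), "\n".join(lines[2:])))
    return cues


def active_cue(cues, time):
    """Return the text to display at playback position `time`, or None."""
    for start, end, text in cues:
        if start <= time < end:
            return text
    return None
```

In a browser, the lookup would be driven by the current playback time of the video element and the returned text would be drawn over the video. The parsing is the easy part; the hard part is that every library links the subtitle file to the video in its own way.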

A standard means of associating srt (or other format) subtitle files with the video tag is really necessary. So, where are we at with this?

Recently, Greg Millam from Google posted a proposal to WHATWG that shares a lot of elements with the proposal that has been previously discussed between Mozilla, Xiph, and Opera, the current state of which is summarised in the Mozilla wiki. No implementation in a browser has been made yet, but initial implementations in JavaScript exist. I think that we will ultimately come out with a harmonised solution between the browser vendors. It just needs implementation work and continuous improvement.

At the same time, in-band captions that come multiplexed within the Ogg file are also being progressed. At Xiph we are now focusing on using Ogg Kate for these purposes – it really doesn’t make much sense to invent another codec when Ogg Kate is already so close to solving most problems. So, between the developer of Ogg Kate and myself, we are preparing a Google Summer of Code project that should see an implementation for Firefox 3.1 that is capable of extracting the text from an Ogg file that has a Kate track and displaying that track as though it were an srt file. If you are interested, shoot me an email!

UPDATE: Firefox 3.1 is apparently now called Firefox 3.5 – sorry guys. 🙂

ANOTHER UPDATE: My post seemed to imply that Firefox 3.5 will have Ogg Kate support. This is not the case. There is a patch for Firefox and liboggplay to provide Ogg Kate support in Firefox and this patch will be the basis of the Summer of Code project. The student will then work mostly on implementing a comprehensive JavaScript library to display Ogg Kate encoded time-aligned text (read: captions, karaoke etc) in the Web browser. This is a proof-of-concept and a first step towards standardising the handling of time-aligned text in Web browsers that support the HTML5 video tag.

Professional Tool support for open media codecs

Michael Dale from Metavid has posted an article on why we are about to hit the tipping point for professional video producers to move to open media codecs. His statement is that it’s not just because the H.264 licensing grace period is about to end, but has a lot to do with the support that open media codecs are increasingly seeing on the Web, where the next big professional video market will happen. I totally agree.

The increasing number of open tools on the Web for open codecs was stimulated by the HTML5 <video> element and is based on years of effort that have gone into Annodex and applications using Annodex technology such as Metavid. There is now Firefox 3.1’s native support for Ogg Theora/Vorbis through liboggplay, there is the mod_annodex Apache plugin and the oggzchop CGI tool to serve time ranges of Ogg media over HTTP from Web servers, and there is the new firefogg Firefox Ogg Theora/Vorbis encoding plugin. Together, these close the tool-chain from encoding to publishing to viewing.

Native editing of Ogg Theora/Vorbis video is still a challenge, but any professional video producer will not want to move away from their favorite tool for editing video anyway, so it is a matter of having an export function included into these professional editors. While such export functions will take some time to emerge in these proprietary editors, the use of ffmpeg2theora and similar transcoding tools will be perfectly sufficient to fulfill these needs.

If you want to see why open source codecs and open video technology make such a difference, just go and check out Metavid, the best software around for wiki-style editing of time-aligned annotations for long-form video. I look forward to all the cool new applications that will emerge with open media software on the Web – applications that are not possible with proprietary video technology because of their lack of flexibility, interoperability, and adaptability.

Website madness of marketing agencies

I have spent a lot of time recently researching Sydney-based agencies to invite to the upcoming launch of our Vquence VQmetrics service. This involved finding their websites, finding out about their target business (do they do online video?), finding a relevant contact, and emailing an invitation to them.

I am close to institutional confinement!

I do understand that agencies need to show off their creativity on their Website. The result of this is that most agency Websites are completely written in Flash. Fortunately I have the latest version of Flash installed, so I can load them all. But my Web browser and MacBook do not deal well with having more than about 5 tabs open with Flash content – my machine almost grinds to a halt. So, there goes the idea of opening multiple tabs at the same time while waiting for the lengthy Flash content of the sites to load…

Then, once the pages are loaded, it is always a surprise to see what the agency has come up with. At the beginning of the exercise it was a surprise. Later it became a nuisance. Now, I am utterly terrified before opening another agency Website. Will it break my browser? Will it start playing a video? Will it start playing music so loud that it blasts off my ears? Will I feel really stupid because I cannot navigate the site? Will I be able to locate the “Contact Us” section? Will they have bothered to publish an email address or do I have to fill in a stupid contact form that I know nobody will look at? Will the contact email work or just bounce?

It almost feels like the creation of the Website is a competition between the agencies as to who can create the maddest, most unusual, and most unusable Website.

Please, please! Can I just have a simple, usable site with obvious navigation, a simple and fast loading list of reference work, and a list of key people working at the agency with their email contacts?

Oh, and Mumbrella has just published a post that gives me scientific proof that this is a conspiracy against me by the agencies! No, stop that – I am not ready to be locked up yet!