Category Archives: Open Source

YouTube Ogg Theora+Vorbis & H.263/H.264 comparison

On Jun 13th 2009 Chris DiBona of Google claimed on the WhatWG mailing list:

“If were to switch to theora and maintain even a semblance of the current youtube quality it would take up most available bandwidth across the Internet.”

Everyone who has ever encoded a Ogg Theora/Vorbis file and in parallel encoded one with another codec will have to immediately protest. It is sad that even the best people fall for FUD spread by the un-enlightened or the ones who have their own agenda.

Fortunately, Gregory Maxwell from Wikipedia came to the rescue and did an actual “YouTube / Ogg/Theora comparison”. It’s a good read and a comparison on one video. He has put his instructions there, so anyone can repeat it for themselves. You will have to start with a pretty good quality video though to see such differences.

Cool HTML5 video demos

I’ve always thought that the most compelling reason to go with HTML5 Ogg video over Flash are the cool things it enables you to do with video within the webpage.

I’ve previously collected the following videos and demos:

First there was a demo of a potential javascript interface to playing Ogg video inside the Web browser, which was developed by CSIRO. The library in use later became the library that Mozilla used in Firefox 3.5:

Then there were Michael Dale’s demos of Metavidwiki with its direct search, access and reuse of video segments, even a little web-based video editor:

Then there was Chris Double’s video SVG demo with cool moving, resizing and reshaping of video:

and Chris kept them coming:

Then Chris Blizzard also made a cool demo for showing synchronised video and graph updates as well as a motion detector:

And now we have Firefox Director Mike Belitzer show off the latest and coolest to TechCrunch, the dynamic content injection bit of which you can try out yourself here:

It just keeps getting better!

UPDATE: Here are some more I’ve come across:

Sites with Ogg in HTML5 video tag

Yesterday, somebody mentioned that the HTML5 video tag with Ogg Theora/Vorbis can be played back in Safari if you have XiphQT installed (btw: the 0.1.9 release of XiphQT is upcoming). So, today I thought I should give it a quick test. It indeed works straight through the QuickTime framework, so the player looks like a QuickTime player. So, by now, Firefox 3.5, Chrome, Safari with XiphQT, and experimental builds of Opera support Ogg Theora/Vorbis inside the HTML5 video tag. Now we just need somebody to write some ActiveX controls for the Xiph DirectShow Filters and it might even work in IE.

While doing my testing, I needed to go to some sites that actually use Ogg Theora/Vorbis in HTML5 video tags. Here is a list that I came up with in no particular order:

I’m sure there’s a lot more out there – feel free to post links in the comments.

Firefox plugin to encode Ogg video

Michael Dale just posted this to theora-dev. Go to one of the given URLs to install the Firefox plugin that lets you transcode video to Ogg using your Web browser.

Firefogg is developed by Jan Gerber and lives at http://www.firefogg.org/. There is a javascript API available so you can make use of Firefogg in your own Website project to allow people to upload any video and transcode it to Ogg on the fly.

Enjoy!

On Fri, Jun 5, 2009 at 7:08 AM, Michael Dale wrote:
> I mentioned it in the #theora channel a few days ago but here it is with
> a more permanent url:
>
> http://www.firefogg.org/make/advanced.html
> &
> http://www.firefogg.org/make/
>
> These will be simple links you can send people so that they can encode
> source footage to a local ogg video file with the latest and greatest
> ogg encoders (presently thusnelda and vorbis). Updates to thusnelda and
> possible other free codecs will be pushed out via firefogg updates 😉
>
> Pass along any feedback if things break or what not.
>
> I am also doing testing with “embed” these encoder interface. For those
> familiar with jQuery: an example to rewrite all your file inputs with
> firefogg enhanced inputs: $(“input:[type=’file’]”).firefogg() … Feel
> free to expeirment based on those examples. The form rewrite has mostly
> only been tested in the mediaWiki context:
> http://sandbox.kaltura.com/testwiki/index.php/Special:Upload
> but with minor hacking should work elsewhere 🙂
>
> enjoy
> –michael
>
> _______________________________________________
> theora mailing list
> theora@xiph.org
> http://lists.xiph.org/mailman/listinfo/theora
>

Dailymotion using Ogg and other recent cool open video news

This past week was amazing, not because of Google Wave, which everybody seems to be talking about now, and not because of Microsoft’s launch of the bing search engine, but amazing for the world of open video.

  1. YouTube are experimenting with the HTML5 video tag. The demo only works in HTML5 video capable browsers, such as Firefox 3.5, Safari, Opera, and the new Chrome, which leads me straight to the next news.
  2. The Google Chrome 3 browser now supports the HTML5 video tag. The linked release only supports MPEG encoded video, but that’s a big step forward.
  3. More importantly even, recently committed code adds Ogg Theora/Vorbis support to Google Chrome 3’s video tag! This is based on using ffmpeg at this stage, which needs some further work to e.g. gain Ogg Kate support. But this is great news for open media!
  4. And then the biggest news: Dailymotion, one of the largest social video networks, has re-encoded all their videos to Ogg Theora/Vorbis and have launched an openvideo platform. The blog post is slightly negative about video quality – probably because they used an older encoder. The Xiph community has already recommended use of recommends experimenting with the new Thusnelda encoder and the latest ffmpeg2theora release that supports it, since they provide higher compression ratios and better quality.
  5. That latest ffmpeg2theora release is really awesome news by itself, but I’d also like to mention two other encoding tools that were released last week: the updated XiphQT QuickTime components, that now allow export to Ogg Theora/Vorbis directly from iMovie (I tested it and it’s awesome) and the new GStreamer command-line based python encoder gst2ogg which works mostly like ffmpeg2theora.

Overall a really exciting week for open media and HTML5 video! I think things are only going to heat up more in this space as more content publishers and more browsers will join the video tag implementations and the Ogg Theora/Vorbis support.

FOMS 2009: video introductions available

In January this year we had the third Foundations of Open Media software workshop for developers. The focus this year was on legal issues around codecs, Xiph and Web video (HTML5 video and video servers), authoring/editing software, and accessibility. Check out the complete set of areas of concern and community goals that we decided upon.

As every year, at the beginning of the workshop every participant provided a 5 min introduction about their field of speciality and the current challenges. These are video recorded and shared with the community.

The videos and accompanying slides have been available for about 2 months now, but I haven’t gotten around to blogging about it – apologies everyone! So, here are your star videos in reverse alphabetic order published using open source video software only:

Enjoy!

Video as an enabler for broadband applications

Last week, I gave a brief statement on the importance of video as an enabler for broadband applications at the Public Sphere event of Senator Kate Lundy.

I found it really difficult to summarize all the things that I find important about video technology in a modern distributed online world in a 10 min speech. Therefore, I’d like to extend on some of the key points that I was trying to make in this blog post.

Video provides presence

One of the biggest problems we have with the online world is that it mostly still evolves around text. To exchange information with others, to publish, to chat (email, irc or twitter) or do our work, we mostly still rely on the written word as a communication means. However, we all know how restrictive this is – everyone who has ever seen a flame war develop on a mailing list, a friendship break over a badly formulated email, a host of negative comments posted on a mis-formulated blog post, or a twitter storm explode over a misunderstanding knows that text is very hard to get right. Lacking any sort of personal expression supporting the expressed words (other than the occasional emoticon), sentences can be read or interpreted in the wrong way.

A phone call (or skype call) is better than text: how often have you exchanged 10 or even 20 emails with a friend to e.g. arrange to meet for a beer, when a simple phone call would have solved it within seconds. But even a phone call provides a reduced set of communication channels in comparison to a personal meeting: gesture, posture, mime and motion are there to enrich communication channels and help us understand the other better. Just think about the cognitive challenges in a phone conference in comparison to the ease of speaking to people when you see them.

With communication that uses video, we have a much higher communication “bandwidth” between people, i.e. a lot less has to be actually said in words so we can understand each other, because gesture, posture, mime and motion speak for us, too. While we cannot touch each other in a video communication, e.g. for shaking hands or kissing cheeks, video provides for all these other channels of communication providing a much higher perceived feeling of “presence” to the remote person or people. When my son speaks over skype with my family in Germany, and we cannot turn on the web cam because the bandwidth and latency are too poor, he loses interest very quickly in speaking to these “soul-less” voices.

The availability of bandwidth will make it possible for humans to communicate with each other at a more natural level, feeling more engaged and involved. This has implications not just on immediate communications, such as person-to-person calls or video conferences, but on any application that requires the interaction of people.

Video requirements are the block to create new applications

Bandwidth requirements for most online applications are pretty low. Consider for example a remote surgery where a surgical expert on one end operates on a patient at a remote location with surgical staff and operating equipment. The actual data that needs to be exchanged between the surgeon and the operating machines is fairly low – they are mostly command-control data that has to be delivered at high accuracy and low delay, but does not require high bandwidth. What turns such a remote surgery scenario into a challenge with existing networks are the requirements for multiple video channels – the surgeon needs to be visible to the staff and probably to the patient – in turn, the surgeon needs to see the staff, needs to see the patient from multiple angles to gain the full picture, needs to see the supporting documents such as X-rays, schedules, blood analysis etc, and of course he needs to see the video coming from the operating equipment possibly from within the patient that gives him feedback on the actual operation.

As you can see, it is video that creates the need for high bandwidth.

This is not restricted to medical applications. Almost all new remote applications that we create end up having a huge visual requirement with multiple video streams. This is natural, since almost all remote applications involve more than one person and each person has the capability to look into different directions. Thus, the presence of each person has to be replicated and the representation of the environment has to be replicated.

Even in a simple scenario such as a video conference, a single camera and microphone are very restrictive and do not provide the ability to every participant to interact with any of the other people present, but restrict them to the person/group that the camera is currently focused on. Back channels such as affirmative side chats or mimic exchanges of opinion are lost. Multiple video channels can make up for this.

In my experience from the many projects I have been somewhat involved with over the years that tried to develop new remote applications – teleteaching at Mannheim University or the CeNTIE project at CSIRO – video is the bandwidth-needy channel, but video is not the main purpose of the application. Rather, the needs for information for the involved people are what drives the setup of the data and communication channels for a particular application.

Immediately, applications in the following areas come to mind that will be enabled through broadband:

  • education: remote lectures, remote seminars, remote tutoring, remote access to research text/data
  • health: remote surgery, remote expert visits, remote patient monitoring
  • business: remote workplace, remote person-to-person collaboration with data sharing and visualisation, remote water-cooler conversations, remote team presence
  • entertainment: remote theatre/concert/opera visit, home cinema, high-quality video-on-demand

But ultimately, there is impact into all aspects of our lives: consider e.g. the new possibilities for citizen involvement in politics with remote video technology, or collaborative remote video editing in video production, or in sports for data collection. Simply ask yourself “what would I do differently if I had unlimited bandwidth?” and I’m sure you will come up with at least another 2 or 3 new applications in your field of expertise that have not been mentioned before.

Technical challenges

Video (with audio) is an inherently volatile data stream that is highly sensitive to specific kinds of networking issues.

End-to-end delays such as are typical with satellite-based connections destroy the feeling of presence and create at best awkward communications, at worst destructive feedback-loops in live operations. Unfortunately, there is a natural limit to the speed in which data can flow between two points. Given that the largest distance between two points on earth is approx 20,000 km and the speed of light is approx 300,000 km/s, a roundtrip must take at least 133ms. Considering that humans can detect a delay as small as 10ms in a remote communication and are really put off by a delay of 100ms, this is a technical challenge that we will find hard to overcome. It shows, however, that it is a technical requirement to minimize end-to-end dealys as much as possible.

Packet jitter is another challenge that video deals with badly. In networks, packets cannot easily be guaranteed to arrive at a certain required rate. For example, video needs to play back at a fixed picture rate (typically 25 frames per second) for humans to be able to view it as smooth motion. Whether video is transferred live or from a file, video packets are required to arrive at a certain rate such that the pictures can be decoded and displayed at the expected rate. The variance in delay of packets arriving because of network congestion is called packet jitter. If packet jitter is high, the video will either have to stop and buffer packets until enough video frames have arrived for it to display again, or it will have to drop packets and therefore video frames to keep in sync with a live stream. Typically the biggest problem of dropping packets is the drop-out of audio – while we can tolerate some drop-outs in video, audio drop-outs are unacceptable to maintain a conversation.

In most of the application scenarios, there is a varying need for video quality.

For example, a head shot of a person that is required for communication doesn’t need high-quality video – it is sufficient if the person can be seen and the communication can be held. The audio resolution can be telephone quality (i.e. 8kHz audio sampling rate) and the video can be highly compressed and at a smallish resolution (e.g. 320×240 px) giving standard skype quality video which requires about 400Kbps in bandwidth.

At the other end of the scale are e.g. medical and large-screen applications where a high sound quality is required e.g. to hear heart beats properly (i.e. 48-96kHz audio sampling rate) and the video can’t be compressed (much) so as not to introduce artifacts, which gives at a high HDTV resolution of e.g. 1920x1080px bandwidth requirements of 30Mbps compressed – uncompressed would be about triple that.

So, depending on the tolerance of the application to picture size, compression artifacts, and the number of parallel video streams required, bandwidth requirements for video can be relatively low or really high.

Further technical issues around video are that online video can be handled differently to analog video. The video can have all sorts of metadata associated with it – it can have hyperlinks to other content – it can be accompanied by advertising in more flexible ways – and it can be automatically personalised towards the needs of the individual viewers, just to name a few rich functions of online video. It is here where a lot of new ideas for monetisation will evolve.

Non-technical challenges

Apart from technical challenges, the use of video also creates issues in other dimensions.

People are worried about their behaviour as it is always potentially recorded and thus may not perform their duties with the same focus and concentration as is necessary.

People are worried about video connections always being potentially enabled and thus having potentially a remote listener/viewer that is unwanted.

On top of such privacy issues come issues in data security as increasingly data is distributed remotely.

We should also not forget that there are people that have varying requirements for their communication. A large challenge for such new applications will be to make them accessible. For example the automated creation of captions for remote video communication may well turn out to be a major challenge, but also an opportunity for later archiving and search.

When looking at the expected move of professional video content from TV to online, there are more issues about copyrighted content and usage rights – mostly this has to do with legacy content where online use was not considered in licensing agreements. This is a large inhibitor e.g. for Australia in creating a Hulu-like service.

In fact, monetisation is a huge issue, since video is not cheap: there is a cost in the development of applications, there is a cost in bandwidth, in storage, and a cost in content production that has to be covered somewhere. Simply expecting the user to pay for being online and then to pay again for each separate application, potentially subscribing to a multitude of services, may not be the best way to cope with the cost. Advertising will certainly play a big role in the monetisation mix and new forms of advertising will emerge, such as personalised permission-based advertising based on the information available about a person e.g. through their Google searches.

In this context, the measurement of the use of video in bandwidth, storage and as part of an application will be a big enabler towards figuring out how to pay for all the involved expenditure and what new monetisation models to come up with.

Further in the context of cost and monetisation it should be added that the use of open source software, in particular open source video technology such as open codecs can help bring down cost while at the same time create more interoperability. For example, if Skype used an open codec and open protocols rather than their proprietary technology, other applications could be built using the skype infrastructure and user base.

Approach to developing good new applications

These are just the challenges for video streams themselves. However, in new applications, video streams will just be a tool for creating an integral application, ultimately driven by the processes and data needs of the application. The creation of all the other parts of the application – the machinery, control panels, the data pools, the processes, the human interface, security and privacy measures etc – are what make up the product challenge. A product ultimately has to function in a way that makes it a usable tool in achieving a certain outcome. Unless the use of the product becomes natural and the distance disappears from the minds of the people involved, a remote application does not succeed.

The CeNTIE project, the approach towards the development of new remote applications was to assume no limits on available bandwidth. Then a challenge would be identified in an application area, e.g. in the medical space, and a prototype would be built with lots of input from the domain experts. Then the prototype would actually be deployed into a real working situation and tested. The feedback from the domain experts would be used to improve the application with further technology and improved processes. Ultimately, a usable setup would emerge, which was then ready to be turned into a product for commercialisation.

We have the capabilities here in Australia to develop world-class new applications on high-bandwidth networks. We need to support this further with bandwidth – hopefully the NBN will achieve this. But we also need to support this further with commercialisation support – unfortunately most of the applications that I saw being developed at the CSIRO never made it past the successful prototype. But this is fodder for another blog post at a different time.

Finally, I’d like to point out that we also have a large challenge in overcoming tradition. Most of us would be challenged to trust a doctor and his equipment for doing a surgical operation on our body from a remote location. There are issues of trust and culture involved that may take us a while to deal with and accept.

UPDATE (11/6/09): It seems that CISCO’s latest report, which predicts global IP traffic to increase 5-fold over the next 3 years, agrees with the analysis that most of this increase will be caused by video.

New Theora encoder further improved

After posting only a month ago about the new Thusnelda release, there continues to be good news from the open codec front.

Monty posted last week about further improvements and this time there are actual statistics thanks to Greg Maxwell. Looking at the PSNR (peak signal-to-noise ratio) measure, the further improved Thusdnelda outstrips even the X.264 implementation of H.264.

Don’t get me wrong: PSNR is only one measure, it is an objective measure and the statistics were only calculated on one particular piece. Further analysis are needed, though these are very encouraging statistics.

This is important not just because it shows that open codecs can be as good in quality as proprietary ones. What is more important though is that Ogg Theora is royalty free and implementable in both proprietary and free software browsers.

H.264’s licensing terms, however, will really kick in in 2010, so that may well encourage more people to actually use Ogg Theora/Vorbis (or another open codec like Ogg Dirac/Vorbis) with the new HTML5 video element.

First draft of a new media fragment URI addressing standard

Those who know me well know that a few years ago (in fact, almost 10 years now) we developed the Annodex set of technologies at the CSIRO in a project called “Continuous Media Web”.

The idea was to make time-continuous data (read: audio and video) a integral part of the Web. It would be possible to search for media through standard search engines. It would be possible to link into and out of media as we link into and out of Web pages. It would be possible to mash up video from different Web servers into a single media stream just like we are able to mash up images, text and other Web resources from different Web servers.

As you are all aware, we have made huge steps towards this vision in the last 10 years. We now have what is called “universal search” – search engines like Google and Yahoo don’t return only links to HTML pages any longer, but return links to videos and images just as well.

But it doesn’t go far enough yet – even now we still cannot link into a long-form video to the right fragment that has the exact context of what we have been searching for.

In the Annodex project we implemented a working version of such a deep universal search engine in the year 2003 on top of the Panoptic search engine (a enterprise search engine developed by CSIRO, later spun out and now sold as Funnelback).

The basis for our implementation was the combination of specifications that we developed around Ogg:

  • An extension on Ogg that allows to create valid Ogg streams from subparts of Ogg streams – this is now part of Ogg as Skeleton.
  • A means of annotating Ogg streams with time-aligned text that could be interleaved with the Ogg media stream to produce streams that knew more about themselves – the format was called CMML for Continuous Media Markup Language.
  • And an extension to the URI addressing of Ogg streams using temporal URIs.

I am very proud that in the last 2 years, the development of a generic media fragment URI addressing approach has been taken up by the W3C and Conrad Parker and I are invited experts on the Working Group.

I am even more proud that the Working Group has just published a First Public Working Draft of a document called “Use cases and requirements for Media Fragments“. It contains a large collection of examples for situations in which users will want to make use of media fragments. It defines that the key dimensions of fragmentation that need to be specified are:

  1. Temporal fragmentation
  2. Spatial fragmentation
  3. Track fragmentation
  4. Name fragmentation

Beyond mere use cases and requirements, the document also contains a survey of technologies that address multimedia fragments.

In a first step towards the development of a Media Fragments W3C Recommendation, this document also discusses a proposed syntax for media fragment URI addressing and proposes different processing approaches. These sections will eventually be moved into the recommendation and are the most incomplete sections at this point.

To explain some of the approaches that are being proposed in more detail, here are some examples of media fragment URIs that are proposed through this WD:

  • http://www.example.com/example.ogv#t=10s,20s – addresses the fragment of example.ogv that lies between the 10s and the 20s offset
  • http://www.example.com/example.ogv#track='audio' – addresses the track called “audio” in the example.ogv file
  • http://www.example.com/example.ogv#track='audio'&t=10s,20s – addresses the track called “audio” on the subpart between the 10s and 20s offset in the example.ogv file
  • http://www.example.com/example.ogv#xywh=pixel:160,120,320,240 – addresses the example.ogv file but with a video track cut to a region of the size 320x240px positioned at 160x120px offset
  • http://www.example.com/example.ogv#id='chapter-1' – addresses the named fragment called “chapter-1” which is specified through some mechanism, e.g. Kate or CMML in Ogg

Note that the latter example works only if the encapsulation format provides a means of specifying a name for a fragment. Such a means is e.g. available in QuickTime through chapter tracks, or in Flash through cuepoints.

We know from our experience with Ogg that temporal fragmentation can be realized. For track addressing it is possible to use the recently developed ROE specification. The id tags used there could be included into Skeleton and then be used to address tracks by name. What concerns spatial fragmentation on Ogg Theora – I don’t think it can be achieved for an arbitrary rectangular selection without transcoding.

The next tasks of the Working Group are in creating implementations for these specifications on diverse formats and thus finding out which processes work the best.

Alpha version of next generation Theora codec released

On Thursday, Ralph Giles announced the alpha release of Thusnelda, the next generation implementation of the Theora encoder.

The primary change in comparison to the first generation Theora implementation is a completely rewritten encoder with vastly improved quality vs. bitrate in the default vbr/constant-quality mode, and better tracking of the target bitrate in cbr mode.

Jan Schmidt made some experiments to compare the two versions and found a 20% compression improvement for no loss in quality while at the same time also achieving a 14% improvement in speed.

In 2007 there was a huge (and mostly uninformed) discussion about the lack of quality of Theora on slashdot and Monty wrote a reply clarifying some of the misinformation and explaining the shortcomings that the Xiph team wants to work on to improve the codec. A lot of these issues are now being attacked through the community and through the financial support of the Mozilla grant.

Theora is now much closer to H.264, if not even having overtaken it in some dimensions. Congratulations to the Theora team, in particular Tim Terriberry, Monty, and Ralph Giles. Once this Theora generation is released, it will be a competitive modern video codec.