Category Archives: Digital Media

Australian Startup Carnival

Vquence was presented today on the “Australian Startup Carnival” site – go check it out.

There are 28 participants in the startup carnival, and each of them is being introduced through an interview conducted electronically. The questions for this interview were quite varied and detailed, covering technical and system backgrounds as well as our use of open source software.

All the questions you have always wanted to ask about Vquence, and a few more. 😉

UPDATE: The Startup Carnival has announced the prizes and they are amazing – first prize being an exhibition package at CeBIT. Good luck to us all!!

Vquence: Measuring Internet Video

I have been so busy with my work as CEO of Vquence since the end of last year that I’ve neglected blogging about Vquence. It’s on my list of things to improve on this year.

I get asked frequently what it is that we actually do at Vquence. So here’s an update.

Let me start with a bit of history. At the beginning of 2007, Vquence was totally focused on building a social video aggregation site. The site now lives at http://www.vqslices.com/ and is useful, but lacks some of the key features we had envisaged would make it a breakthrough.

As the year progressed and we tried to build a corporate business and an income around our video aggregation, search and publication technology, we discovered that we had something of much higher value than the video handling technology itself: quantitative usage information about videos on social video sites, contained in our aggregated metadata. In addition, our “crawling” algorithms are able to supply up-to-date quantitative data instantly.

In fact, I should not simply call our data acquisition technology a “crawler” because, in the strict sense of the word, it’s not. Bill Burnham describes in his blog post about SkyGrid the difference between the crawlers of traditional search engines and the newer “flow-based” approach that is based on RSS/ping servers. At Vquence we are embracing the new “flow-based” approach and extending it by using REST APIs where available. A limitation of the flow-based approach is that only a very small part of the Web is accessible through RSS and REST APIs. We therefore complement flow-based search with our own new types of data-discovery algorithms (or “crawlers”) as we see fit. In particular, locating the long tail of videos stored on YouTube is a challenge that we have mastered.

But I digress…

So we have all this quantitative data about social videos, which we update frequently. With it, we can create graphs of the development of view counts, comment counts, video replies and the like. See, for example, the image below, which compares the aggregate view count of the videos published by the main political parties in Australia during last year’s federal election. The graph shows the development of the view count over the last 2.5 months before the 2007 election.

Aggregate Viewcount Graph Federal Election Australia

At first you will notice that Labor started far above everyone else. Unfortunately we didn’t start recording view counts that early, but we assume this is due to the Kevin07 website that was launched on 7th August. In the graph, you will notice a first increase in the Coalition’s view count on 2nd September – that’s when Howard published the video for the APEC meeting of 2-9 September 2007. Then there’s another bend on 14th September, when Google launched its federal election site and we saw the first videos of the Nationals going up on YouTube. The dip in the Nationals’ curve a little after that is due to a software bug. Then on 14th October the federal election was actually announced, and you can see the massive increase in view count from there on for all parties, ending with Labor holding a huge advantage over everybody else. Interestingly enough, this also mirrors the actual outcome of the election.

So, this is the kind of information that we are now collecting at Vquence and focusing our business around.

Against that background, check out a recent blog post by Judah Phillips on “Thinking about Measuring Internet Video?”. It is actually a wonderful description of the kinds of things we are either offering or working on.

Using his vocabulary: we can currently provide a mix of Instream and Outstream KPIs to the video advertising market. Our larger aim is to provide exceptional outstream audience metrics, and we know how to obtain them regardless of where a video goes on the Internet. Our technology plan centers around a mix of a panel-based approach (through a browser plugin), a census-based approach (through a social network plugin for Facebook et al., also using OpenID), and video duplicate identification.

This information isn’t yet published on our corporate website, which still mostly focuses on our capabilities in video aggregation, search, and publication. But we have a replacement in the making. Watch this space… 🙂

Activities for a possible Web Video Working Group

The report of the recent W3C Video on the Web workshop has come out and contains some recommendations to form a Video Metadata Working Group, or, even more generally, a Web Video Working Group.

I have had some discussions with people who have a keen interest in the space, and we have come up with a list of topics that a W3C Video Working Group should look into. I want to share this list here. It goes into somewhat more detail than the topics that the W3C Video on the Web workshop raised. Feel free to add any further concerns or suggestions in the comments – I’d be curious to get feedback.

First, there are the fundamental issues:

  • Choice of royalty-free baseline codecs for audio and video
  • Choice of encapsulation format for multi-track media delivery

Both of these really require the generation of a list of requirements and use cases, then an analysis of existing formats against these requirements, and finally a decision on which ones to use.

Requirements for the codecs would encompass, amongst others, the need to cover different delivery and receiving devices – from mobile phones on 3G bandwidth, through Web video, to full-screen TV video over ADSL.

Here are some requirements for an encapsulation format:

  • usable for live streaming and for canned delivery,
  • the ability to easily decode from any offset in a media file,
  • support for temporal and spatial hyperlinking and the partial delivery that these require,
  • the ability to dynamically create multi-track media streams on a server and to deliver requested tracks only,
  • the ability to compose valid streams by composing segments from different servers based on a (play)list of temporal hyperlinks,
  • the ability to cache segments in the network,
  • and the ability to easily add a different “codec” track into the encapsulation (as a means of preparing for future improved codecs or other codec plugins).

The decisions for an encapsulation format and for a/v codecs may potentially require a further specification of how to map specific codecs into the chosen encapsulation format.

Then we have the “Web” requirements:

The technologies that have created what is known as the World Wide Web are fundamentally a hypertext markup language (HTML), a hypertext transfer protocol (HTTP) and a resource addressing scheme (URIs). Together they define the distributed nature of the Web. We need to build an infrastructure for hypermedia that builds on the existing Web technologies so we can make video a first-class citizen on the Web.

  • Create a URI-compatible means of temporal hyperlinking directly into time offsets of media files.
  • Create a URI-compatible means of spatial hyperlinking directly into picture areas of video files.
  • Create an HTTP-compatible protocol for negotiating and transferring video content between a Web server and a Web client. This also includes a definition of how video can be cached in HTTP network proxies and the like.
  • Create a markup language for video that also enables hyperlinks from any time and region in a video to any other Web resource. Time-aligned annotations and metadata need to be part of this, just like HTML annotates text.

All of these measures together will turn ordinary media into hypermedia, ready for a distributed usage on the Web.
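
To make the hyperlinking requirements above more concrete, here is a rough sketch of what such URIs might look like. The syntax shown is purely illustrative (loosely in the spirit of the Annodex temporal URI work); the actual scheme would be for the working group to define.

    http://example.com/movie.ogv?t=npt:12.5/30
        (a temporal hyperlink: retrieve only the segment from 12.5s to 30s,
        with "npt" denoting normal play time)

    http://example.com/movie.ogv?area=40,60,100,80
        (a spatial hyperlink: address the 100x80 pixel region whose
        top-left corner sits at coordinates 40,60 of the video frame)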

In addition to these fundamental Web technologies, integration into modern Web environments would require:

  • a standard definition of a javascript API to interact with the media data,
  • an event model,
  • a DOM integration of the textual markup,
  • and possibly the use of CSS or SVG to define layout, effects, transitions and other presentation issues.
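
As a rough illustration of the first three points, here is a small JavaScript sketch of the kind of scripting such an integration could allow. The property and event names follow the general direction of the HTML5 media element drafts, but the details are illustrative rather than a finished API.

    // Assuming the markup contains a video element with id "clip".
    var video = document.getElementById("clip");

    // Scripted playback control.
    video.play();

    // React to time-aligned events as playback progresses,
    // e.g. to highlight the annotation covering the current time.
    video.addEventListener("timeupdate", function () {
      showAnnotationAt(video.currentTime);   // hypothetical page-defined helper
    }, false);

    // Seek to a time offset, e.g. in response to a temporal hyperlink.
    video.currentTime = 12.5;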

Then there are the Metadata requirements:

We all know that videos have a massive amount of metadata – i.e. data about the video. There are different types of metadata and they need to be handled differently.

  • Time-aligned text, such as captions, subtitles, transcripts, karaoke and similar text.
  • Header-type metadata, such as the ID3 tags for mp3 files, or the vorbiscomments for Ogg files.
  • Manifest-type descriptions of the relationships between different media file tracks, similar to what SMIL enables, like the recent ROE format in development at Xiph.

The time-aligned text should actually be regarded as a codec, because it is time-aligned just like audio or video data. If we want to be able to do live streaming of annotated media content and receive all the data as a multiplexed stream through one connection, we need to be able to multiplex the text codec into the binary stream just as we do with audio and video. Thus, the definitions of time-aligned text codecs have to ensure that they can be multiplexed.

Header-type metadata should be machine accessible and available for human consumption as required. They can be used to manage copyright and other rights-related information.

The manifest is important for dynamically creating multi-track media files as required through a client-server interaction, such as the request for a specific language audio track with the video rather than the default.

Other topics of interest:

There are two more topics that I would like to point out that require activities.

  • “DRM”: It needs to be analysed what the real need is here. Is it a need to encrypt the media file such that it can only be read by specific recipients? Maybe an encryption scheme with public and private keys could provide this functionality? Or is it a need to retain copyright and licensing information with the media data? Then the encapsulation of metadata inside the media files may be a good solution already, since this information stays with the media file after a delivery or copy act.
  • Accessibility: It needs to be ascertained that the association of captions, sign language, video descriptions and the like in a time-aligned fashion to the video is possible with the chosen encapsulation format. A standard time-aligned format for specifying sign language would be needed.

This list of required technologies has been built through years of experience experimenting with the seamless integration of video into the World Wide Web in the Annodex project, and through further recent discussions at the W3C Video on the Web workshop and elsewhere.

This list just provides a structure for what needs to be addressed to make video a first-class citizen on the Web. There are many difficult detail problems to solve in each of these areas. It is a challenge to understand the complexity of the problem, but I hope this structure can help break down some of that complexity and help us start attacking the issues.

Metadata and Ogg

I am really excited about the huge progress we made at FOMS with metadata and Ogg. The metadata specifications are actually not Ogg-specific – only their mapping into Ogg is. Here are the things that I expect will make for a very structured and sensible distributed handling of metadata on the Web.

At FOMS, we started improving CMML and are now specifying its next version. CMML is a timed text description language that can easily be multiplexed alongside audio or video data. It is very flexible with its fields and satisfies needs for hypermedia, captions, annotations and other time-aligned text. We took out the Ogg dependencies, so it can now be used in any media container format. The specification is now also an XML schema rather than a DTD, which enables us to reuse modules from XHTML and makes it generally more extensible.
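
For illustration only (the element names follow the general shape of CMML; the exact syntax is defined by the specification and its new schema), a CMML document is essentially a sequence of time-aligned clips, each of which can carry a hyperlink, a description and arbitrary name/value metadata:

    <?xml version="1.0" encoding="UTF-8"?>
    <cmml>
      <head>
        <title>Interview with a startup founder</title>
        <meta name="author" content="Example Author"/>
      </head>
      <!-- a clip starting 15 seconds into the stream -->
      <clip id="intro" start="npt:15">
        <a href="http://example.com/background.html">Background reading</a>
        <desc>The founder explains the company history.</desc>
        <meta name="speaker" content="Founder"/>
      </clip>
    </cmml>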

We introduced ROE, a description language (or “manifest”) for multitrack media files. It describes media tracks and their dependencies, and thus goes much further than the old stream and import elements in CMML, which have now been deprecated.

ROE can be used to author multitrack media files – in the Ogg case to author Ogg files with a Skeleton track and multiple media tracks. We are in the process of extending Skeleton to incorporate the description of dependencies between logical bitstreams. To complete this, we will be creating a description of how to map ROE into Ogg/Skeleton and vice versa.
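
To give a feel for what such a manifest describes, here is a rough sketch of a ROE-style description of a movie with one video track, two audio language tracks and a time-aligned text track. The element and attribute names are illustrative only; the real vocabulary is defined in the ROE draft at Xiph.

    <ROE>
      <body>
        <!-- one Theora video track -->
        <track id="V1" content-type="video/theora" src="movie-video.ogv"/>
        <!-- English (default) and German audio tracks -->
        <track id="A1" content-type="audio/vorbis" lang="en" src="movie-en.oga"/>
        <track id="A2" content-type="audio/vorbis" lang="de" src="movie-de.oga"/>
        <!-- English subtitles as time-aligned text (e.g. CMML) -->
        <track id="TT2" content-type="text/cmml" lang="en" src="movie-subs-en.cmml"/>
      </body>
    </ROE>

With identifiers like these, a client can refer to individual tracks when requesting a particular combination of streams from the server, as in the negotiation described next.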

ROE can also be used to negotiate with a Web client which media streams to send out of the complete manifest that is available on the server. For example, a Web client could request the German sound track with a movie rather than the default English one, and add English subtitles. This requires a small protocol for negotiation, which can easily be built using Web infrastructure. We are introducing some new HTTP request/response parameters and specific URLs, such as http://example.com/movie.ogg?track=V1,A2,TT2.
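
A minimal sketch of how such a negotiation might look on the wire, assuming the track identifiers from the manifest sketch above (the exact parameters and headers are still to be specified):

    GET /movie.ogg?track=V1,A2,TT2 HTTP/1.1
    Host: example.com

    HTTP/1.1 200 OK
    Content-Type: video/ogg

    ... an Ogg stream multiplexing only the video track, the German
        audio track and the English subtitles ...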

The set of ROE, Skeleton, CMML, and the HTTP and URI specifications will enable a very structured means of interacting with metadata-rich video on the Web. It will be distributed and integrated into the Web infrastructure, much like the Annodex set of technologies already is today.

Since I am also a business owner besides being an open media enthusiast, let me add that I expect this to have a huge impact on online business around audio and video, enabling business processes and business models that are not possible today. Watch this space!

The greatest gathering of open media sw developers

When I started organising the first FOMS (Foundations of Open Media Software) workshop in 2007, I did it because I saw a need for media hackers to get together in a room and discuss things in person. Email, irc, svn, bugzilla and wikis only get you a certain distance in collaboration. No distance communication tool can replace the energy and creative spirit that is created through an in-person meeting and the ability to have a beer together in the evening. Discussions are more intense, impossibilities are identified faster, progress is amazing – and the energy will last and have an impact on the community for months after the event.

FOMS 2007 was great in that respect, because some 25 hackers got to know each other for the first time, friendships were formed, trust was built and new ideas (read: new code) were created. It was awesome and gave me the motivation to go and organise FOMS 2008. At this point let me express my gratitude to the organising committees of both FOMS 2007 and FOMS 2008 for the support they have given me in organising both workshops – I hope they will help again next year in Tasmania.

So then FOMS 2008 took place, and what can I say!? It totally blew me away. For me it was a much better experience than the year before because I wasn’t also organising the video recordings at LCA. I was therefore more relaxed, got involved in design discussions, and was able to sit down during the week after FOMS at LCA and actually interact with people. On a side note: thanks so much to Donna Benjamin, the main organiser of LCA 2008, for getting the FOMS participants a room to ourselves where we were able to gather and get a whole lot of work done.

Nearly the whole Xiph community was at FOMS, and issues that had been brewing for years were tabled and discussed. A large number of audio hackers were there, too, and the issue of standard sound APIs got some heated discussion. There’s a press release and the proceedings of the FOMS discussions up on the FOMS 2008 website, where you can get a complete picture of all the issues that were discussed.

In addition to FOMS, Conrad Parker and I had also organised a Multimedia Miniconf at LCA. It was a great place to communicate some of the outcomes of FOMS and to present some of the latest developments in open media software in the Linux community. Video proceedings are available on the site.

Overall I must say that January has become the highlight of my year in open media software.

Sexier new Vquence player

I’ve been meaning to write about this for a while, but hadn’t found a good occasion yet. Today I stumbled across the videos from RailsConf2007 on Blip.tv and decided – this is it! I will show off the nice new sexy layout of the Vquence player with this content – after all, we are a Rails shop (apart from all those other programming languages that we use).

Julian reworked the design of the player in December and did an awesome job. The image pane’s scrolling slows down as you reach the left or right border. It works similarly to a scrollbar: if you go to the middle of the image pane, it scrolls to the middle clip in the playlist. As you leave the image pane, it snaps back to focus on the clip that you are currently watching.

The new player also has a lot more text in it. As you mouse over the images, you get the titles of the clips. When you click on the (i) button, you get the annotations of the current clip (click (i) again to make them go away). At the beginning of each clip, there’s a small text reminder at the top that a click on the video will take you to the full video.

And finally – to give the video more space, the transport bar disappears as you keep watching and stop interacting with the player. This gives it more of a sit-back experience. The option to switch to full-screen display also adds to this.

Overall, I am really thrilled with how far we have taken the player. Enjoy!

(But should you have any feedback or suggestions for improvement, feel free to shoot me an email or leave a comment.)

Quick links to Ogg-related W3C video Workshop papers

Michael Dale: Metavid & Free Online Video (University Of California at Santa Cruz)

Chris Double: Position Paper for the W3C Video On The Web Workshop (Mozilla Corporation)

Håkon Wium Lie: Opera Software’s position paper for Video on the Web (Opera Software)

Silvia Pfeiffer: Architecture of a Video Web – Experience with Annodex (Annodex Association)

Silvia Pfeiffer: Hyperlinking to time offsets: The temporal URI specification (Annodex Association)

Native javascript support for annotated and indexed media in Web browsers

Many people wonder what the future of video on the Web should be and want a more integrated and simpler video solution than what Flash provides right now.

The W3C and WHATWG’s move towards a video element in HTML5 is a good first step.

However, it is not enough.

At the recent W3C video workshop, I realised that people’s requirements and expectations go far beyond what the HTML5 spec currently provides. Most of those requirements can be satisfied with the Annodex technologies. But it will take a lot of explaining, documenting and demonstrating to show that Annodex provides these solutions in a simple, yet comprehensive manner. And what’s more: any technology developed to satisfy the requirements will need to take on board many of the design decisions that we made for Annodex, so I hope that, whatever the next Video Web technology will be, we can provide our input.

The most fundamental point to understand is that you cannot create a solution for video webs without considering all aspects of handling video on the Web in an integrated fashion. This includes topics such as the URI addressing scheme, seeking and indexing of video, the metadata and annotation scheme, and how all of this fits together with the binary video data and Web servers. Let me repeat: these topics have to be addressed together and not as separate projects, because they influence each other!

Apart from Annodex, no other existing or suggested video technology for the Web brings together all the required facets to really solve the big picture – and that includes video metadata specifications, hyperlinking approaches, codecs etc.

Having said all of this, let me demonstrate to you what I mean by full integration.

Shane Stephens has been coding on a library called liboggplay that brings native Annodex support into Web browsers, and has provided me with a video that demonstrates what you can do as a programmer once your Web browser understands Annodex. Take note of the integrated use of annotations, the simplicity of URI addressing, and the use of an adapted Web server.

Javascript video API liboggplay

The video is available in Ogg Theora format and on YouTube.
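
To give an idea of the kind of scripting the video demonstrates, here is a hypothetical sketch – not liboggplay’s actual API – of how a page might combine a temporal URI with annotations once the browser natively understands Annodex content and the Web server has been adapted to serve it:

    // Hypothetical example: play just the segment from 10s to 30s by
    // pointing the video at a temporal URI.
    var video = document.getElementById("annodexClip");
    video.src = "http://example.com/talk.anx?t=npt:10/30";
    video.play();

    // Ask the adapted server for the CMML annotations of the same
    // segment, to render them next to the video (URL is illustrative).
    var req = new XMLHttpRequest();
    req.open("GET", "http://example.com/talk.cmml?t=npt:10/30", true);
    req.onreadystatechange = function () {
      if (req.readyState === 4 && req.status === 200) {
        document.getElementById("annotations").textContent = req.responseText;
      }
    };
    req.send(null);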

About baseline video codecs and HTML5

[I wrote this more than 8 months ago, but didn’t want to publish it at the time because I want us to solve the issues around video in HTML5 and not fight each other. But I’ve made some changes and I’m now ready to have it published.]

There’s a clash of ecosystems happening at the WHATWG mailing list around the need for the specification of a baseline codec for a future <video> tag in HTML.

The clash is mostly between the open community, which wants Ogg Theora as a recommended baseline codec, and big vendors (Apple & Nokia), which want that recommendation taken out. They claim that such a recommendation has no place in an HTML standard, which should specify tags but not recommend external file formats. From one perspective, I agree – some things are better left to software engineers to decide and left open to the market. However, in this particular instance, I think it would be a big mistake not to specify a baseline video codec. In fact, it would in my mind make the whole move to a new HTML5 standard an irrelevant exercise.

Let’s look at history and play a mind game on the consequences of such a decision.

Around the turn of the century we had a wonderfully diverse situation: RealMedia, QuickTime and WindowsMedia were all video formats that people expected to find on the Internet and use to stream video. It most certainly made business sense to the companies involved! However, it made no business sense to Web developers and media content producers, who had to set up transcoding and streaming infrastructure for all three formats in parallel if they wanted to reach all of their potential clientele. I have actually seen this happening here in Australia at the ABC, which has a mandate to serve all Australian people and therefore had to provide video in all potential formats. I remember the pain that was written across the faces of the infrastructure people.

Fast forward a few years and the ABC can now breathe a sigh of relief: by supporting Adobe Flash, they can do away with all this expensive and support-intensive infrastructure and just support one codec.

Another story from the past to keep in mind is that of PNG and GIF (http://www.libpng.org/pub/png/pnghist.html), where the collection of royalties on the GIF format triggered the creation of the open and free PNG format, which became a W3C Recommendation in 1996 (see http://www.w3.org/Press/PNG-PR.en.html). Tim Berners-Lee states there: “We are seeing more of our Members adopt the format and are helping make it the industry standard.”

With these in mind, let’s try and project into the future.

Assuming we do not provide a baseline codec in the spec, what will happen is that each browser will adopt support for the codec that “makes business sense”, i.e. Microsoft will support WindowsMedia and Apple will support QuickTime, while the rest will be looking for a “cheaper” codec, which could e.g. be MPEG-1 or Ogg Theora. Or, stated differently: we will end up with the same situation that we had around 2001 with streaming codecs, except that Web developers and content owners still have the choice of Flash through the object/embed tag. Who will we confuse? The consumers who want to create their own content and publish it online. They will want a free and interoperable option. Since that’s not to be had, they will choose what makes most sense on their OS platform – i.e. QuickTime on Macs (comes for “free”), WindowsMedia on Windows, and Ogg Theora on Linux. Yes, this makes business sense to some of us. It will certainly make Adobe happy because – as before – Flash will come out as the winner.
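
To illustrate what this fragmentation means for a Web developer: without an agreed baseline codec, a page has to offer the same clip in several encodings and let each browser pick the first one it can decode. A sketch using the HTML5 draft’s video and source elements (the formats and MIME types are only examples):

    <video controls>
      <!-- each browser plays the first source it can decode -->
      <source src="clip.ogv" type="video/ogg"/>
      <source src="clip.mov" type="video/quicktime"/>
      <source src="clip.wmv" type="video/x-ms-wmv"/>
      Your browser does not support the video element.
    </video>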

Assuming we do provide a baseline codec in the spec, a very similar situation will actually happen at first, and the browsers will initially support different codecs, since Ogg Theora is just a recommendation, which will probably not be implemented in the Apple or MS Web browsers. However, Web developers and content owners now have a focus, through the recommendation in the standard, on which format they should be providing. And they will request support for the recommended baseline format from the vendors. So there is actually a chance that the confusing mess of codec formats may be sorted out after a while. This is the chance we have to make things easier for Web developers and online businesses – and this is why a baseline codec is imperative.

What we now need is to address the issues Apple, Nokia and MS have with Ogg Theora. These are mostly around submarine patents. My suggestion is that the W3C pay an independent patent attorney to perform a patent search on Ogg Theora to address the perceived risks of the big vendors. If the patent search is as comprehensive as possible, we may reach a situation where the big vendors no longer perceive the risk. However, there is also a risk that Theora is found to infringe specific patents. I guess we would then either correct the codebase or just put all our development efforts into Dirac. 🙂 In any case – all the FUD that is currently being sent both ways can then be addressed more easily with some decent data behind it.