
Best practice in Web video publication

I’ve spent a lot of time recently analysing video on the Web.

YouTube and Google Video introduced what now seems to be the standard: videos are published on what I like to call a “host page”, a single Web page dedicated entirely to that one video.

Why are there still so many videos out there that are published only as a hyperlink behind two words of text, instead of being given proper recognition?

Think about it: creating a video usually takes a lot of effort, and once it’s done, it deserves a proper presentation. Hiding it behind a hyperlink is like putting your blog up on an FTP server in PDF format.

So, what information has to be on a video host page?

Best practice is to have an embeddable video player on the host page that displays a keyframe.

Other information that typically resides on a host page includes a short textual description of the video, its duration, who published it, who created it, its license rights (check out Creative Commons for this), tags and category attributions, comments from viewers, the number of page views, and instructions for using the video in other environments, such as how to embed it in a blog or how to download it to an iPod or PSP.
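
To make that list concrete, here is a minimal Python sketch that renders a static host page from a metadata dictionary. The field names (`title`, `duration`, `player_html`, and so on) and the `render_host_page` function are my own hypothetical choices, not any standard. Comments and page-view counts are left out because they need a server-side application behind the page.

```python
from html import escape


def render_host_page(video: dict) -> str:
    """Render a static host page for one video from a metadata dict."""
    # Escape every text field; the player markup is inserted as-is.
    e = {k: escape(str(v)) for k, v in video.items()
         if k not in ("tags", "player_html")}
    tags = ", ".join(escape(t) for t in video.get("tags", []))
    player = video["player_html"]
    return f"""<html>
<head><title>{e['title']}</title></head>
<body>
  <h1>{e['title']}</h1>
  {player}
  <p>{e['description']}</p>
  <dl>
    <dt>Duration</dt><dd>{e['duration']}</dd>
    <dt>Published by</dt><dd>{e['publisher']}</dd>
    <dt>Created by</dt><dd>{e['creator']}</dd>
    <dt>License</dt><dd>{e['license']}</dd>
    <dt>Tags</dt><dd>{tags}</dd>
  </dl>
  <h2>Embed this video in your blog</h2>
  <pre>{escape(player)}</pre>
</body>
</html>"""


if __name__ == "__main__":
    print(render_host_page({
        "title": "My conference talk",
        "description": "A 20-minute talk on open media.",
        "duration": "20:14",
        "publisher": "Jane Doe",
        "creator": "Jane Doe",
        "license": "Creative Commons BY-SA",
        "tags": ["open source", "video"],
        "player_html": "<!-- player markup, e.g. an applet showing a keyframe -->",
    }))
```

The embed snippet is shown escaped inside `<pre>` so that visitors can copy it straight into their own blogs.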

We don’t need Google or YouTube to do this for us; we can publish video this way ourselves. Well, maybe apart from the bit about transcoding for the iPod or PSP. Incidentally, is there any open source software around to do that?

We can transcode our videos to Ogg Theora using ffmpeg2theora and then publish them with Cortado, the Java applet player for Theora, embedded in the page. Then we just need to create our own host page in HTML.
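
A rough sketch of those two steps in Python follows. It assumes ffmpeg2theora is installed and on the `PATH`, and it uses only the input argument and the `-o` output option. The applet markup (class `com.fluendo.player.Cortado.class`, a `url` parameter, `cortado.jar`) reflects Cortado's usual embedding as I understand it, so check it against the version you deploy; the 352x288 size and the file names are placeholders.

```python
import subprocess
from html import escape
from pathlib import Path


def theora_command(source: Path, target: Path) -> list[str]:
    """Build the ffmpeg2theora command line that writes an Ogg Theora file."""
    return ["ffmpeg2theora", str(source), "-o", str(target)]


def transcode_to_theora(source: Path) -> Path:
    """Run ffmpeg2theora (must be installed) and return the output path."""
    target = source.with_suffix(".ogg")
    subprocess.run(theora_command(source, target), check=True)
    return target


def cortado_applet(ogg_url: str, width: int = 352, height: int = 288,
                   jar_url: str = "cortado.jar") -> str:
    """Return HTML that embeds the Cortado Java applet playing ogg_url."""
    # Attribute values are escaped so URLs containing '&' stay valid HTML.
    return (
        f'<applet code="com.fluendo.player.Cortado.class" '
        f'archive="{escape(jar_url)}" width="{width}" height="{height}">\n'
        f'  <param name="url" value="{escape(ogg_url)}"/>\n'
        "</applet>"
    )
```

For example, `transcode_to_theora(Path("talk.dv"))` would produce `talk.ogg`, and `cortado_applet("http://example.org/talk.ogg")` returns markup you can paste into the host page (`example.org` being a placeholder).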

All we need now are a few more plugins for common Web content management systems like WordPress or Drupal to simplify this process even further. Here’s your Friday afternoon challenge. 🙂

Making your video discoverable

Videos will be everywhere on the Web! Yes, get used to it: soon the majority of videos won’t sit on a hosting site like YouTube, but will reside on our private servers, on company servers, in fact on any and every Web server. There will be interesting stuff among them, but it will be hard to find.

Yes, history will repeat itself, and finding the videos on the Web that satisfy our needs, be it for information or entertainment, will be a nightmare. Why? Because Google’s PageRank (like many other ranking algorithms) relies on Web pages linking to a resource to give that resource a higher rank. However, videos are currently published by embedding them into Web pages (let’s call such a page the “embedding page”), so other pages link to the embedding page rather than to the video itself. Thus, link analysis returns the PageRank of the embedding page, but not of the video!
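
To see why, here is a toy power-iteration PageRank in Python over a hypothetical four-node link graph. Two blogs link to an embedding page, and I assume the embed counts as one outgoing link from that page to the video file; real search engines treat embeds in their own, undisclosed ways, so this only illustrates the principle. Apart from the random-jump share that every node gets, the video only receives what the embedding page passes on to it.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Plain power-iteration PageRank; rank of dangling nodes is spread evenly."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is shared by all nodes.
        dangling = sum(rank[v] for v in nodes if not links[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in links.items():
            for dst in targets:
                new[dst] += damping * rank[src] / len(targets)
        rank = new
    return rank


LINKS = {
    "blog_a": ["embed_page"],
    "blog_b": ["embed_page"],
    # Assumption: the embed tag counts as a link from the page to the video.
    "embed_page": ["video.ogg", "blog_a"],
    "video.ogg": [],  # a binary file has no outgoing links
}

if __name__ == "__main__":
    for page, score in sorted(pagerank(LINKS).items(), key=lambda kv: -kv[1]):
        print(f"{page:12s} {score:.3f}")
```

Running it ranks the embedding page above the video file, even though the video is what a searcher is actually after.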

Now, if the only reason the embedding page exists is to publish the video and its annotations, then the page can be seen as representative of the video, and the PageRank of the embedding page is effectively the PageRank of the video. This is the case for Google Video, YouTube, and many other hosting sites.

However, you and I mostly publish our videos in blogs or on Web pages that describe more than just the video; some even have several videos embedded. This is where the chaos for video search engines begins. And this is where the discoverability of your videos through video search engines ends.

Here is the solution.

Just as we do with normal Web pages, we have to introduce SEO (search engine optimisation) for videos. That means we have to make it easier for search engines to find information about our videos, i.e. to index and rank them.

Because videos are binary data, a common Web search engine cannot extract information about this Web resource directly from it (let’s ignore signal analysis and automatic content analysis approaches for the moment). We have to help the search engine.

The solution is to have a text file sitting “next” to the actual video file that contains indexable text about the video. It holds all the annotations, metadata, tags, copyright information, and any other textual information that search engines need to index and rank the video better. This text file is an indexable textual representation of the video.

So, whenever a video search engine reaches a video in a crawl, it will look for this text file to do its indexing. If the text file is HTML, people can link directly to it, and it will be included in the PageRank calculations again. If it is an XML file, there should be a simple way to transcode it to HTML, e.g. via an XSLT script, so that links can again point to it directly.
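
On the crawler side, the lookup could look like the following Python sketch. The naming convention (same base name, `.html` or `.cmml` extension, same directory) is purely my assumption, since nothing standardises where the companion file lives; `sidecar_urls`, `find_sidecar_text`, and `http_fetch` are hypothetical helpers. The fetch function is passed in so that the logic can be tried without network access.

```python
import posixpath
import urllib.request
from typing import Callable, Optional
from urllib.parse import urlsplit, urlunsplit

# Hypothetical convention: talk.ogg -> talk.html, then talk.cmml.
# HTML is tried first because people can link to it directly.
SIDECAR_EXTENSIONS = (".html", ".cmml")


def sidecar_urls(video_url: str) -> list[str]:
    """Candidate URLs of the text file sitting next to a video."""
    parts = urlsplit(video_url)
    base, _ext = posixpath.splitext(parts.path)
    return [urlunsplit(parts._replace(path=base + ext, query="", fragment=""))
            for ext in SIDECAR_EXTENSIONS]


def find_sidecar_text(video_url: str,
                      fetch: Callable[[str], Optional[str]]
                      ) -> Optional[tuple[str, str]]:
    """Return (url, text) for the first companion file that can be fetched."""
    for url in sidecar_urls(video_url):
        text = fetch(url)
        if text is not None:
            return url, text
    return None  # nothing to index beyond the binary file itself


def http_fetch(url: str) -> Optional[str]:
    """Fetch a URL over HTTP, returning None on any network or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return None
```

A crawler would call `find_sidecar_text(video_url, http_fetch)` and index the returned text instead of the video's binary data.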

So much for the theory: here comes the practice.

For every video file (and incidentally this works for audio, too), you should write a CMML file and publish it on your Web server together with the original. Here is an XSLT script that you can use to transcode CMML to HTML. If you use Ogg Theora as your video publishing format, you can even publish Annodex videos and, with the Apache Annodex module, enable direct access to the clips you defined in CMML and to time offsets. Try using it in your blog with the external embedding feature of the Annodex Firefox extension.
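
If you want to generate the CMML files rather than write them by hand, here is a minimal Python sketch using the standard library's ElementTree. The element and attribute names (`cmml`, `stream`/`import`, `head`/`title`/`meta`, `clip` with `id` and `start`, `desc`) and the `npt:` start times follow my reading of the CMML specification, so please validate the output against the CMML DTD before relying on it. `build_cmml` and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_cmml(video_src: str, title: str, meta: dict[str, str],
               clips: list[tuple[str, str, str]]) -> str:
    """Build a minimal CMML document; each clip is (id, start time, description)."""
    root = ET.Element("cmml")
    stream = ET.SubElement(root, "stream")
    ET.SubElement(stream, "import", src=video_src)  # the media file described
    head = ET.SubElement(root, "head")
    ET.SubElement(head, "title").text = title
    for name, content in meta.items():
        ET.SubElement(head, "meta", name=name, content=content)
    for clip_id, start, description in clips:
        clip = ET.SubElement(root, "clip", id=clip_id, start=start)
        ET.SubElement(clip, "desc").text = description
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Save the printed document as talk.cmml next to talk.ogg on the server.
    print(build_cmml(
        "talk.ogg", "My conference talk",
        {"DC.Creator": "Jane Doe", "DC.Rights": "CC BY-SA"},
        [("intro", "npt:0", "Introduction"), ("demo", "npt:120", "Live demo")],
    ))
```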

When we’ve done this, all that remains is to encourage the video search engines to exploit the CMML data in their crawls. 🙂

Running flumotion on Ubuntu

Flumotion is a streaming server product developed by Fluendo. It runs in a distributed environment, where video capture, encoding, and transmission can run on different computers, so the load can be balanced better.

I found it rather difficult to find introductory help on how to set up and run Flumotion, so I’ll share my insights with you here.

Imagine a setup where you want machine A to capture and encode the video from a DV camera, machine B to relay the stream onto the Internet to several clients, and machine C to pull the stream off machine B and write it to disk. The software you’d need to run on each of these machines is the following:

  1. Run flumotion-manager on machine B. flumotion-manager is the central component of a flumotion streaming setup, which connects up all the components and makes sure that everything works. It has to run before anything else can happen.
  2. Run flumotion-worker on every machine where you want work to be done, i.e. on machines A, B, and C. The workers are daemons that connect to the manager and wait for commands to do something.
  3. Run flumotion-admin on any machine to set up the details of the flumotion streaming setup.
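
The division of labour in this example can be written down as a small data structure. This Python sketch is only a planning aid of my own (the `Machine` class and `processes_for` are hypothetical, not part of Flumotion): it lists which Flumotion programs each machine needs, with the manager first because it must be running before anything else. flumotion-admin is left out since it can run on any machine.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    duties: list[str] = field(default_factory=list)  # work this machine does
    runs_manager: bool = False


def processes_for(machine: Machine) -> list[str]:
    """Flumotion programs to start on one machine, in start-up order."""
    procs = []
    if machine.runs_manager:
        procs.append("flumotion-manager")  # central component, starts first
    if machine.duties:
        procs.append("flumotion-worker")  # daemon waiting for manager commands
    return procs


SETUP = [
    Machine("A", ["capture from DV camera", "encode"]),
    Machine("B", ["relay stream to clients"], runs_manager=True),
    Machine("C", ["pull stream from B", "write to disk"]),
]

# This example uses a single manager, on machine B.
assert sum(m.runs_manager for m in SETUP) == 1

if __name__ == "__main__":
    for m in SETUP:
        print(m.name, processes_for(m))
```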

So, here are the commands that I use to get it running with the default setup:

  1. flumotion start
    (which will run flumotion-manager -D -n default /etc/flumotion/managers/default/planet.xml for you).
  2. flumotion-worker -u pants -p off &
    (yes, these are the default user name and password :).
  3. flumotion-admin
    (and go through the GUI setup wizard).
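
If you restart this often, the same three steps can be scripted. Here is a minimal Python sketch, assuming the commands behave as described above (`flumotion start` returns once the manager is daemonized, and the worker keeps running in the background). The three-second pause before opening the admin GUI is an arbitrary guess, not something Flumotion requires.

```python
import subprocess
import time

# The default single-machine commands from the list above.
MANAGER_CMD = ["flumotion", "start"]
WORKER_CMD = ["flumotion-worker", "-u", "pants", "-p", "off"]
ADMIN_CMD = ["flumotion-admin"]


def start_flumotion(settle_seconds: float = 3.0) -> subprocess.Popen:
    """Start the manager, a background worker, then the admin GUI."""
    subprocess.run(MANAGER_CMD, check=True)  # manager must be up first
    worker = subprocess.Popen(WORKER_CMD)  # like the trailing '&' in the shell
    time.sleep(settle_seconds)  # give the worker time to log in (a guess)
    subprocess.run(ADMIN_CMD)  # blocks until the admin window is closed
    return worker  # call worker.terminate() to stop it later
```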

… and you should be up and running with your DV camera, your webcam, or your TV tuner card. Watch the cute smileys go happy! Then connect to the stream using your favorite media player that can decode Ogg Theora/Vorbis, e.g. Totem, VLC, or xine.
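
Before reaching for a player, you can check from a script that the stream actually delivers Ogg data: every Ogg page begins with the four-byte capture pattern `OggS`. This Python sketch reads the first bytes of the HTTP stream; the URL you pass in is whatever your streamer reports, and `probe_stream` is a hypothetical helper, not a Flumotion tool.

```python
import urllib.request

OGG_CAPTURE_PATTERN = b"OggS"  # start of every Ogg page


def looks_like_ogg(first_bytes: bytes) -> bool:
    """True if the data starts with the Ogg capture pattern."""
    return first_bytes[:4] == OGG_CAPTURE_PATTERN


def probe_stream(url: str, timeout: float = 10.0) -> bool:
    """Read the first four bytes of an HTTP stream and check for Ogg."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return looks_like_ogg(response.read(4))
```

For example, `probe_stream("http://machine-b.example/")` would return `True` for a working Ogg stream; the host name there is a placeholder.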

I found the online man pages of flumotion-manager, flumotion-worker, and flumotion-admin helpful, because the flumotion package installed on my Ubuntu Dapper system did not include them. If you are setting up on Ubuntu Dapper, you might actually be better off using Jeff Waugh’s packages for each of the flumotion commands. Another hint: use the theora-mmx library to get better performance.

Flumotion is an excellent solution for setting up video streaming. I have found that the following conferences have used it before:

  • GUADEC, June 2006, http://guadec.org/GUADEC2006/Live
  • DebConf, May 2006, http://technocrat.net/d/2006/5/12/3384
  • Linux Audio Conference, May 2006, http://lac.zkm.de/2006/streaming.shtml
  • Washington DC LUG, http://dclug.tux.org/webcast/