In the W3C Media Fragments Working Group (MFWG) we have had long discussions about whether to use the URI query (“?”) or the URI fragment (“#”) approach for addressing directly into media resources, and about the various new HTTP headers required to serve such URI requests, taking into account side conditions such as Web browsers stripping fragment parameters off a URI before sending a request, and the existence of caching Web proxies.
As explained earlier, URI queries request (primary) resources, while URI fragments address secondary resources, which have a relationship to their primary resource. So, in the strictest sense of their specifications, to address segments in media resources without losing the context of the primary resource, we can only use URI fragments.
Browser-supported Media Fragment URIs
For this reason, URI fragments are also the way in which my last media fragment addressing demo was implemented. For example, I would address a sub-part of a video directly with a fragment URI like the one sketched below.
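To illustrate with a made-up URI (not the demo’s actual address): the fragment form addresses seconds 10 to 20 of the video as a sub-part of the original resource, whereas the query form would ask the server for a new, separate resource:

    http://example.com/video.ogv#t=10,20   (fragment: secondary resource, context of the original preserved)
    http://example.com/video.ogv?t=10,20   (query: a new primary resource)

The t=10,20 temporal syntax follows the form discussed in the group; the host and file name are purely illustrative.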
Silvia,
I cannot agree more with the “URI queries for media fragments” paragraph, because most of the time redirecting from a seconds-range query to a bytes-range query will not work, especially with the media containers you find on the web today (i.e. FLV, MP4 and Ogg).
A server rewrite is necessary most of the time: the server has to synthesize some headers to create a new self-contained, valid A/V fragment, according to the specifics of each container type (building an FLV metadata block is fairly easy; re-creating the MOOV atom of an MP4 file is quite another story).
The response will also contain extra information pertaining to the parent document, so that media players may display the fragment at the right place in a representation of the whole document (either in-band, using container facilities such as FLV metadata or MP4 iTunes tags (!), or out-of-band using HTTP headers, provided the media players have access to these headers).
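As a rough sketch of what such an exchange could look like (header names and values are illustrative, along the lines of what the working group has been drafting, not a finalized protocol): the client asks for a time range, and the server answers with the mapped byte range plus the fragment’s position in the parent document’s timeline:

    GET /video.mp4 HTTP/1.1
    Host: media.example.com
    Accept: video/*
    Range: t:npt=10-20

    HTTP/1.1 206 Partial Content
    Content-Type: video/mp4
    Content-Range: bytes 1113724-2158323/4372870
    Content-Range-Mapping: { t:npt 9.85-21.16/0.0-653.79 } = { bytes 1113724-2158323/4372870 }

The hypothetical Content-Range-Mapping line is the out-of-band variant of the “extra information” above: it tells the player that the delivered bytes cover seconds 9.85 to 21.16 of a 653.79-second parent document.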
Hi Pierre-Yves,
Redirecting from a seconds-range query to a bytes-range query actually does work very well with Ogg. It is implemented in oggz-chop and mod_annodex, which help the server to re-synthesize the headers for the now smaller file. However, it may well be very complicated for other container formats, such as QuickTime or MP4, as you point out.
I also agree: some containers support the delivery of the extra information required to correctly display the shortened resource (Ogg with Skeleton does for temporal queries), but others may not, and thus the extra HTTP headers are required. It will indeed take a while before all media players and Web browsers support these extra headers.
The good thing is: it can all be done in stages. The most fundamental and important part, media fragment addressing using URI fragments, has already been shown to work, and it is a trivial extension for Web browsers that already implement the HTML5 video tag. So I am hopeful that we can get that support soon. The optimisations that I list and the various other options can be added over time. Server components will need to be written, media players/browsers/other UAs will need to be adapted, and ultimately Web proxies (such as Squid) will want to learn about the new range headers and support them. Lots to do still!
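To make the staged approach concrete: in the first stage nothing new is needed on the server at all. A browser that understands the fragment syntax resolves it locally, since the fragment part of a URI is never sent over the wire, and fetches what it needs with ordinary byte-range requests (illustrative trace for http://example.com/video.ogv#t=10,20):

    GET /video.ogv HTTP/1.1
    Host: example.com
    Range: bytes=0-16383

    HTTP/1.1 206 Partial Content
    Content-Range: bytes 0-16383/35614993

From the headers and index data retrieved this way, the browser maps t=10 to a byte offset itself and issues further byte-range requests, exactly as existing HTML5 video implementations already do for seeking.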
Hi Silvia,
“It is implemented in oggz-chop and mod_annodex, which help the server to re-synthesize the headers for the now smaller file.”: yes, my point exactly. There is no “direct” mapping between a time range and a byte range for most containers, and one needs some server assistance to rebuild a different document most of the time.
The only cases I can think of for MFWG [7.2] (the seconds-to-bytes range conversion through a Vary range referrer) are bare MP3 audio or MPEG-2 PS files, where data is organized as short closed GOPs (so to speak, for MP3 ^_^), with additional re-synchronization sequences at the beginning of each frame (these containers were specifically designed so that hardware decoders could re-synchronize easily without relying on extra headers or index tables).
With today’s formats found on the Internet (FLV, MP4 and Ogg), it is not actually practical to do so. So either one builds a server extension to “understand” the different containers and deal with them, or we are back to the client-side “try and guess” byte-range requests we actually see in the QuickTime player (iPhone clients) or in HTML5 video players (Firefox and Chrome do exactly that). When most of your content is VBR-encoded, this is obviously sub-optimal and error-prone (as you pointed out above, despite any clever algorithm you may throw at it).
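For readers unfamiliar with the “try and guess” approach: it is essentially a bisection over byte ranges, as in this sketch with made-up offsets for a seek to t=60s:

    Range: bytes=2000000-2065535    (first guess, derived from the average bitrate)
      -> timestamps in the returned data read ~72s: overshot, guess again lower
    Range: bytes=1600000-1665535    (refined guess)
      -> ~58s: close enough, decode forward from the next keyframe

With VBR content the bitrate-based first guess can be far off, which is why several round trips, and hence seek latency, are hard to avoid without server assistance.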
We (at Dailymotion) built such a server extension from scratch, supporting the three containers above with full key-frame seeking and delivery throttling, entirely on the server side (with some minor client-side tweaks (Flash player for now) to present the user with a consistent timeline while seeking through a given document). The extension primarily focused on efficiency, so that we could activate the seeking and bandwidth throttling features without upgrading our current hardware (when it comes to tens of thousands of simultaneously delivered streams translating into tens of Gbps, you may find the exercise not so easy all of a sudden ^_^).
It also integrates nicely within our streaming architecture, with multi-layer caching and stream security management. We may release the source code at some point in the (hopefully near) future, provided we can remove all Dailymotion-specific code from it. We are still trying to build an HTML5 video tag prototype with in-buffer and out-of-buffer seeking support, and are working closely with Mozilla on this to provide some additional access to the underlying HTTP connection headers.
Pierre-Yves
Great post, it helps to see how all the HTTP headers are meant to interact!
“It seems we can integrate the two without problems: the user agent can include both request ranges in one HTTP request.”
I have my doubts about this. In order to make a guess about the byte range, the UA needs to know the duration of the resource. At least for Ogg, that means we already need to have made at least two requests: one for the headers and one at the end of the resource to get the duration. The only exception is a server that supports X-Content-Duration but not time-range requests, which seems unlikely.
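For Ogg, those two requests look roughly like this (offsets illustrative): the first fetches the stream headers from the start of the file, and the second uses a suffix byte range to read the last pages, whose final granule position yields the duration:

    Range: bytes=0-16383    (stream headers at the start of the file)
    Range: bytes=-65536     (suffix range: last 64 KB, containing the final granule position)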
I would also be surprised if there wasn’t a lot of server software that assumes there will be at most one Range HTTP header and misbehaves otherwise.
It looks like there’s a lot to do but as you say support for media fragments will happen progressively rather than as one big step.
@pierre-yves: sounds like some awesome work that you’ve done! It would be great if that was available as open source.
@philip: I believe the request for the headers is actually part of “setting up the video tag”, i.e. it is done anyway by the browser to set up its decoding pipeline when it encounters a video tag. For the duration bit, I would indeed expect X-Content-Duration to be available.
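For completeness, X-Content-Duration (as shipped by Mozilla for Ogg) simply reports the resource duration in seconds, so the response to that set-up request might carry it like this (illustrative values):

    HTTP/1.1 200 OK
    Content-Type: video/ogg
    Content-Length: 35614993
    X-Content-Duration: 653.791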
There’s now a more in-depth specification of the HTTP header exchange for media fragments at http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/ . Feedback very welcome!