A hangout with Sam discussing the HTML5 track element.
Uploaded by: Silvia Pfeiffer
Hosted: youtube
People have been asking me lots of questions about WebVTT (Web Video Text Tracks) recently. Questions about its technical nature such as: are the features included in WebVTT sufficient for broadcast captions including positioning and colors? Questions about its standardisation level: when is the spec officially finished and when will it move from the WHATWG to the W3C? Questions about implementation: are any browsers supporting it yet and how can I make use of it now?
I’m going to answer all of these questions in this post, which is more efficient than answering tweets, emails, Skype calls and other phone conference requests. It’s about time I wrote a proper post about it.
Implementations
I’m starting with the last area, because it is the simplest to answer.
No, no browser has yet shipped support for the <track> element, and therefore there is no support for WebVTT in browsers yet. However, implementations are in progress. For example, WebKit has recently received its first patches for the track element, but there is still an open bug for a WebVTT parser. Similarly, Firefox can now parse the track element, but is still working on the element’s actual functionality.
However, you do not have to despair, because there are now a couple of JavaScript polyfill libraries for either just the track element or for video players with track support. You can start using these while you are waiting for the browsers to implement native support for the element and the file format.
Here are some of the libraries that I’ve come across that will support SRT and/or WebVTT (do leave a comment if you come across more):
I am actually most excited about the work of Ronny Mennerich from LeanbackPlayer on WebVTT, since he has been the first to really attack full support of cue settings and to discuss their meaning with Ian, me and the WHATWG. His review notes, with visual descriptions of how settings are to be interpreted, and his demo will be most useful to authors and other developers.
Standardisation
Before we dig into the technical progress that has been made recently, I want to answer the question of “maturity”.
The WebVTT specification is currently developed at the WHATWG. It is part of the HTML specification there. When development on it started (under its then name WebSRT), it was also part of the HTML5 specification of the W3C. However, there was a concern that HTML5 should be independent of the chosen captioning format and thus WebVTT currently only exists at the WHATWG.
In recent months – and particularly since browser vendors have indicated that they will indeed implement support for WebVTT as their implementation of the <track> element – the question of formal standardization of WebVTT at the W3C has arisen. I’m involved in this as a Google contractor and we’ve put together a proposed charter for a WebVTT Working Group at the W3C.
In the meantime, standardization progresses at the WHATWG productively. Much feedback has recently been brought together by Ian and changes have been applied or at least prepared for a second feature set to be added to WebVTT once the first lot is implemented. I’ve captured the potentially accepted and rejected new features in a wiki page.
Many of the new features are about making the WebVTT format more useful for authoring and data management. The introduction of comments, inline CSS settings and default cue settings will help authors reduce the amount of styling they have to provide. File-wide metadata will help with the exchange of management information in professional captioning scenarios and archives.
But even without these new features, WebVTT already has all the features necessary to support professional captioning requirements. I’ve prepared a draft mapping of CEA-608 captions to WebVTT to demonstrate these capabilities (CEA-608 is the TV captioning standard in the US).
So, overall, WebVTT is in a great state for you to start implementing support for it in caption creation applications and in video players. There’s no need to wait any longer – I don’t expect fundamental changes to be made, but only new features to be added.
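For caption creation applications, the core of WebVTT output is small. The following is a minimal sketch, not a complete implementation: it writes the `WEBVTT` signature line followed by cues made of a timing line and a text payload, separated by blank lines. The function names `format_timestamp` and `build_webvtt` are made up for this illustration, and cue identifiers, cue settings and escaping of the payload are deliberately left out.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues) -> str:
    """Build a WebVTT document from (start_seconds, end_seconds, text) tuples.

    Hypothetical helper: no cue identifiers, no cue settings, no escaping.
    """
    blocks = ["WEBVTT"]  # signature line that opens every WebVTT file
    for start, end, text in cues:
        if end <= start:
            raise ValueError("a cue must end after it starts")
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{timing}\n{text}")
    # Blank lines separate the signature and the individual cues.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(build_webvtt([(15.0, 17.95, "first cue")]))
```

A real authoring tool would also need to reject payloads that contain blank lines or the `-->` sequence, since either would break the cue structure.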
New WebVTT Features
This takes us straight to looking at the recently introduced new features.
Further to this, the email identifies the ways in which WebVTT is extensible:
Given this background, the following V2 extensions have been discussed:
    WEBVTT
    Language=zh
    Kind=Caption
    Version=V1_ABC
    License=CC-BY-SA

    1
    00:00:15.000 --> 00:00:17.950
    first cue

    WEBVTT

    DEFAULTS --> D:vertical A:end

    00:00.000 --> 00:02.000
    This is vertical and end-aligned.

    00:02.500 --> 00:05.000
    As is this.

    DEFAULTS --> A:start

    00:05.500 --> 00:07.000
    This is horizontal and start-aligned.

    WEBVTT

    STYLE -->
    ::cue(v[voice=Bob]) { color: green; }
    ::cue(c.narration) { font-style: italic; }
    ::cue(c.narration i) { font-style: normal; }

    00:00.000 --> 00:02.000
    <v Bob>Welcome.

    00:02.500 --> 00:05.000
    <c .narration>To <i>WebVTT</i>.

    WEBVTT

    COMMENT -->
    00:02.000 --> 00:03.000
    two; this is entirely commented out

    00:06.000 --> 00:07.000
    this part of the cue is visible
    <! this part isn't >
    <and neither is this>
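To give a feel for how a tool might consume the proposed file-wide metadata, here is a rough sketch of a header reader. It rests on assumptions that go beyond what has been decided: that each metadata item sits on its own `Name=Value` line directly after the `WEBVTT` signature, and that the first blank line ends the header. The function name `parse_header_metadata` is invented for this example, and the syntax itself is still only a V2 proposal.

```python
def parse_header_metadata(vtt_text: str) -> dict:
    """Collect Name=Value pairs from the header of a WebVTT file.

    Assumes the proposed layout: one pair per line after the WEBVTT
    signature, with the header ended by the first blank line.
    """
    lines = vtt_text.splitlines()
    if not lines or not lines[0].startswith("WEBVTT"):
        raise ValueError("not a WebVTT file: missing WEBVTT signature")

    metadata = {}
    for line in lines[1:]:
        if not line.strip() or "-->" in line:
            break  # blank line or a timing line: the header is over
        name, sep, value = line.partition("=")
        if sep:  # silently skip header lines that are not Name=Value
            metadata[name.strip()] = value.strip()
    return metadata


SAMPLE = """WEBVTT
Language=zh
Kind=Caption
Version=V1_ABC
License=CC-BY-SA

1
00:00:15.000 --> 00:00:17.950
first cue
"""

if __name__ == "__main__":
    print(parse_header_metadata(SAMPLE))
```

Stopping at the first blank line keeps the reader from mistaking cue text that happens to contain an equals sign for metadata.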
Finally, I believe we still need to add the following features:
Aside from these changes to WebVTT, there are also some things that can be improved on the <track> element. I personally support the introduction of the source element underneath the track element, because that allows us to provide different caption files for different devices through the @media media queries attribute and it allows support for more than just one default captioning format. This change needs to be made soon so we don’t run into trouble with the currently empty track element.
I further think an oncuelistchange event would be nice as well for cases where the number of tracks somehow changes – in particular when the change comes from within a media file.
Other than this, I’m really very happy with the state that we have achieved thus far.