Attaching subtitles to HTML5 video

Last week, I made a proposal to the HTML5 working group on how to support out-of-band time-aligned text in HTML5. What I mean by that is basically: how to link a subtitle file to a video tag in HTML5. This would mirror the way desktop players let you load a separate subtitle file by hand to go alongside a video.

My suggestion is best explained by an example:


<video src="http://example.com/video.ogv" controls>
<text category="CC" lang="en" type="text/x-srt" src="caption.srt"></text>
<text category="SUB" lang="de" type="application/ttaf+xml" src="german.dfxp"></text>
<text category="SUB" lang="ja" type="application/smil" src="japanese.smil"></text>
<text category="SUB" lang="fr" type="text/x-srt" src="translation_webservice/fr/caption.srt"></text>
</video>

  • “text” elements are subelements of the “video” element and therefore clearly related to one video (even if it comes in different formats).
  • the “category” attribute specifies what text category we are dealing with and allows the web browser to determine how to display it. The idea is that there would be a default display for each category, and CSS could be used to override it.
  • the “lang” attribute allows the specification of alternative resources based on language, which allows the browser to select one by default based on browser preferences, and also to turn on by default those tracks that a particular user requires (e.g. because they are blind and have preset the browser accordingly).
  • the “type” attribute specifies which time-aligned text format is being used in this instance; again, it allows the browser to determine whether it is able to decode the file and thus make it available through an interface or not.
  • the “src” attribute points to the time-aligned text resource. This could be a file, a script that extracts data from a database, or even a web service that dynamically creates the data based on some input.
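
To make the roles of “category”, “lang” and “type” more concrete, here is a minimal JavaScript sketch of how a browser might pick a default text resource. It is only an illustration under assumptions: the function name selectTextTrack, the plain-object representation of the “text” elements, and the preferredLangs and supportedTypes inputs are all hypothetical and not part of the proposal.

```javascript
// Hypothetical helper: pick the <text> resource a browser might enable by default.
// `tracks` mirrors the attributes from the example markup; `preferredLangs` and
// `supportedTypes` stand in for user preferences and browser decoding abilities.
function selectTextTrack(tracks, preferredLangs, supportedTypes, category) {
  // Keep only tracks of the wanted category in a format the browser can decode.
  const usable = tracks.filter(
    (t) => t.category === category && supportedTypes.includes(t.type)
  );
  // Walk the language preferences in order and return the first match.
  for (const lang of preferredLangs) {
    const match = usable.find((t) => t.lang === lang);
    if (match) return match;
  }
  return null; // nothing suitable: show no text by default
}

const tracks = [
  { category: "CC", lang: "en", type: "text/x-srt", src: "caption.srt" },
  { category: "SUB", lang: "de", type: "application/ttaf+xml", src: "german.dfxp" },
  { category: "SUB", lang: "fr", type: "text/x-srt", src: "translation_webservice/fr/caption.srt" },
];

// A user who prefers French, then German, in a browser that only decodes srt.
const chosen = selectTextTrack(tracks, ["fr", "de"], ["text/x-srt"], "SUB");
console.log(chosen ? chosen.src : "no matching track");
```

A track in an undecodable format is simply skipped here, which matches the idea that “type” lets the browser decide whether to offer a resource at all.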

This proposal provides for a lot of flexibility and is somewhat independent of the media file format, while still enabling the Web browser to deal with the text (as long as it can decode it). Also note that this is not meant as the only way in which time-aligned text would be delivered to the Web browser – we are continuing to investigate how to embed text inside Ogg as a more persistent means of keeping your text with your media.

Of course you are now aching to see this in action – and this is where the awesomeness starts. There are already three implementations.

First, Jan Gerber independently worked out a way to provide support for srt files that would be conformant with the existing HTML5 tags. His solution is at http://v2v.cc/~j/jquery.srt/. He is using JavaScript to load and parse the srt file and map it into HTML and thus onto the screen. Jan’s syntax looks like this:


<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="jquery.srt.js"></script>

<video src="http://example.com/video.ogv" id="video" controls>
<div class="srt"
data-video="video"
data-srt="http://example.com/video.srt" />
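
The parsing step in such a JavaScript approach can be sketched roughly as follows. This is not Jan’s code: parseSrt and srtTimeToSeconds are hypothetical names, and the sketch assumes plain srt blocks (an optional counter line, a timing line, then text) without any positioning information after the timestamps.

```javascript
// Convert an srt timestamp such as "00:01:02,500" into seconds.
function srtTimeToSeconds(stamp) {
  const m = /^(\d+):(\d{2}):(\d{2})[,.](\d{1,3})$/.exec(stamp.trim());
  if (!m) return NaN;
  const millis = Number(m[4].padEnd(3, "0"));
  return Number(m[1]) * 3600 + Number(m[2]) * 60 + Number(m[3]) + millis / 1000;
}

// Parse the text of an srt file into an array of { start, end, text } cues.
function parseSrt(data) {
  const cues = [];
  // Normalise line endings, then split into blocks separated by blank lines.
  const blocks = data.replace(/\r\n?/g, "\n").trim().split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n");
    // Locate the timing line; the counter line before it is not needed.
    const timingIndex = lines.findIndex((l) => l.includes("-->"));
    if (timingIndex === -1) continue;
    const [startStamp, endStamp] = lines[timingIndex].split("-->");
    const start = srtTimeToSeconds(startStamp);
    const end = srtTimeToSeconds(endStamp);
    // Skip blocks whose timing line does not match the simple format above.
    if (Number.isNaN(start) || Number.isNaN(end)) continue;
    cues.push({ start, end, text: lines.slice(timingIndex + 1).join("\n") });
  }
  return cues;
}

const sample =
  "1\n00:00:01,000 --> 00:00:04,000\nHello there\n\n" +
  "2\n00:00:05,500 --> 00:00:07,250\nSecond line\nof text\n";
const cues = parseSrt(sample);
console.log(cues.length, cues[1].start, cues[1].end);
```

Once the cues are available as plain objects, mapping them into HTML is a matter of showing the text of whichever cue covers the current playback time.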

Then, Michael Dale decided to use my suggested HTML5 syntax and add it to mv_embed. The example can be seen here – it’s the bottom of the two videos. You will need to click on the “CC” button on the player and click on “select transcripts” to see the different subtitles in English and Spanish. If you click on a text element, the video will play from that offset. Michael’s syntax looks like this:


<video src="sample_fish.ogg" poster="sample_fish.jpg" duration="26">
<text category="SUB" lang="en" type="text/x-srt" default="true"
title="english SRT subtitles" src="sample_fish_text_en.srt">
</text>
<text category="SUB" lang="es" type="text/x-srt"
title="spanish SRT subtitles" src="sample_fish_text_es.srt">
</text>
</video>
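
The link between text elements and playback offsets in such an interactive transcript can be sketched in a few lines of JavaScript. This is not mv_embed’s implementation: findActiveCue and playFromCue are hypothetical names, and the cue objects with start and end in seconds are an assumed representation. The only real API relied upon is the currentTime property and play() method of media elements.

```javascript
// Return the index of the cue that covers `time` (in seconds), or -1 if none does.
// Assumes `cues` is sorted by start time and that cues do not overlap.
function findActiveCue(cues, time) {
  let lo = 0;
  let hi = cues.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (time < cues[mid].start) hi = mid - 1;
    else if (time >= cues[mid].end) lo = mid + 1;
    else return mid;
  }
  return -1;
}

// Jump playback to the start of a cue. `media` is anything exposing
// `currentTime` and `play()`, e.g. a <video> element in the page.
function playFromCue(media, cue) {
  media.currentTime = cue.start;
  media.play();
}

const transcript = [
  { start: 0, end: 2, text: "first" },
  { start: 2.5, end: 5, text: "second" },
  { start: 6, end: 9, text: "third" },
];
console.log(findActiveCue(transcript, 3));   // cue covering second 3
console.log(findActiveCue(transcript, 5.5)); // gap between cues
```

In a page, findActiveCue would typically be called from a timeupdate handler to highlight the current line, and playFromCue from a click handler on each transcript line.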

Then, after a little conversation with the W3C Timed Text working group, Philippe Le Hegaret extended the current DFXP test suite to demonstrate use of the proposed syntax with DFXP and Ogg video inside the browser. To see the result, you’ll need Firefox 3.1. If you select the “HTML5 DFXP player prototype” as test player, you can click on the tests on the left and it will load the DFXP content. Philippe actually adapted Jan’s JavaScript file for this. His syntax looks like this:


<video src="example.ogv" id="video" controls>
<text lang='en' type="application/ttaf+xml" src="testsuite/Content/Br001.xml"></text>
</video>

The cool thing about these implementations is that they all work by mapping the time-aligned text to HTML – and for DFXP the styling attributes are mapped to CSS. In this way, the data can be made part of the browser window and displayed through traditional means.
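
As a rough illustration of mapping DFXP styling to CSS, the sketch below translates a handful of tts: styling attributes into a CSS declaration string. It is not taken from any of the three implementations: the name ttsToCss is hypothetical, only a small subset of attributes is covered, and values are passed through unchanged, which is a simplification since not every DFXP value is valid CSS as-is.

```javascript
// Subset of DFXP styling attributes and the CSS properties they correspond to.
const TTS_TO_CSS = new Map([
  ["tts:color", "color"],
  ["tts:backgroundColor", "background-color"],
  ["tts:fontFamily", "font-family"],
  ["tts:fontSize", "font-size"],
  ["tts:fontStyle", "font-style"],
  ["tts:fontWeight", "font-weight"],
  ["tts:textAlign", "text-align"],
]);

// Build a CSS declaration string from an object of DFXP attributes.
// Attributes without a known CSS counterpart (e.g. timing) are ignored.
function ttsToCss(attrs) {
  return Object.entries(attrs)
    .filter(([name]) => TTS_TO_CSS.has(name))
    .map(([name, value]) => `${TTS_TO_CSS.get(name)}: ${value}`)
    .join("; ");
}

const css = ttsToCss({ "tts:color": "yellow", "tts:textAlign": "center", begin: "0s" });
console.log(css);
```

The resulting string could be assigned to the style attribute of the HTML element that holds the cue text.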

For time-aligned text that is multiplexed into a media file, we just have to do the same and we will be able to achieve the same functionality. Video accessibility in HTML5 – we’re getting there!

27 thoughts on “Attaching subtitles to HTML5 video”

  1. Is a similar functionality/proposal planned for HTML5 audio? It might also be helpful for long (political) speeches, podcasts, etc.

    1. Brett Zamir, SMIL is much too complex to take it straight into HTML5. Possibly a subpart of it could be used for a specific purpose. But we have seen from SVG that taking over some SMIL markup isn’t always satisfactory – in fact it is often too restrictive and at the same time too broad. That doesn’t mean inspiration cannot be had from it though!

  2. Brett,

    XHTML+SMIL do not actually include any solutions for out-of-band time-aligned text. It is still much too complicated for what we are looking to solve.

    The basic problem is that SMIL still comes from a background of creating multimedia experiences. In particular it is about animation, content control, media objects, timing and synchronization, and transition effects. The idea is to create interactive experiences with SMIL. XHTML+SMIL doesn’t change that.

    HTML5 video comes from a *much* simpler approach: let’s just be able to include <video> as an element into a Web page and possibly associate some text with it. No animations, no interactivity, no transitions.

    Where the more complex functionality is required, SMIL should indeed be regarded. But not for the simple <video> or <audio> element.

  3. Pingback: Propuesta de subt
  4. Another new JavaScript has been released – this time as a Greasemonkey script that will automatically attach captions created in a wiki to the HTML5 video element. Felipe Sanches from Brazil published it at http://bighead.poli.usp.br/~juca/code/greasemonkey/wiki_subs.user.js and a test page is at http://www.gpopai.usp.br/subs/test_wikisubs.html .

    The Greasemonkey script fetches the subtitles from the following wiki page:
    http://www.wstr.org/subs/index.php?title=Subtitles/URL/http://www.fabricio.org/talks/2009/fisl10/FISL10-Zuardi.ogg
    and asks for a contribution of subtitles if they don’t exist.

  5. Since my interest in mobile computing confines my work to MPEG-4, I am more interested in HTML 5 constructs that can be used to switch between two or more alternate audio tracks and between the default (none) and one or more soft subtitle tracks. These tracks are easily installed (muxed) into an MPEG-4 file using applications such as Subler (http://code.google.com/p/subler/).

    You can see where I am heading with these at: http://hercules.gcsu.edu/~flowney/research/MPEG-4/subtitles/

    Previously, these things were handled nicely by plug-ins such as the QuickTime Plug-in, so as we transition to HTML 5, there is a need (strong, I would say) for methods to switch between and among these tracks so as to be able to create a web interface that uses the same conventions to play video as popular applications such as iTunes.app, QuickTime Player X, the Videos.app on iPhone OS devices and others do. As far as I can tell, there are no such HTML 5 methods at present.

    I would very much like to hear your thoughts on this aspect of the challenge to render video in a more versatile and accessible manner.

  6. Frank, yes, totally agree on the usability aspect of this work – more than just accessibility!

    As for the label element – that is a tag on externally referenced timed tracks, so it goes beyond just tracks inside the video resource. But indeed these attributes exist to allow for the creation of a uniform JavaScript API, independent of whether tracks are sourced from internal data or external files.

  7. EasyCaptions should not be overlooked.
    http://pipwerks.com/2010/06/07/for-your-reading-pleasure-easycaptions/

    The uncompressed source of the JavaScript library is only 222 lines.

    Introducing EasyCaptions: A simple system for adding captions and an interactive transcript to online videos. EasyCaptions uses progressive enhancement to provide the best possible experience for all visitors, regardless of their browser’s JavaScript, HTML5 or Flash support.

    EasyCaptions is not a commercial offering. It requires a slightly different format than TTML XML.

  8. Silvia – that is really elegant. To reacquaint myself with Perl, I wrote a little script to convert the TTML output of SubtitleEditor to an EasyCaptions format. Now I want to do the same for your solution. Are you aware of any other subtitle editors that output TTML?

  9. I’ve generally stayed away from external subtitling, preferring internal, track-based approaches for reasons of portability. I can see the benefit of external systems for highly automated processes that might use voice recognition to generate the transcript and subtitle code, but that seems not to be well in hand at this point in history. Are there other, more tangible arguments in favor of external subtitles?

  10. @Frank do you mean by “internal” those where the subtitles are part of the video and by “external” those where subtitles are in a separate file? If so, both actually have their advantages and disadvantages.

    When “internal”, you author the subtitles/captions and encode them into the file and you are pretty sure to never lose them again (unless somebody decodes the file and rips them out). However, it is very difficult to make changes/corrections to such a file. Also, it is impossible to use the same file for different encodings (e.g. for a WebM, an Ogg and an MP4 file).

    So, from a production POV, keeping the text separate from the media content is much easier. It is also more easily distributed. It can also be stored as text in a DB and extracted dynamically to a text file on a Web server. And if you are supporting 100 or more languages, you really do not want all of these subtitle tracks inside the video resource, but only want to make them available on the server.

    So, overall, keeping subtitles external is actually a lot more flexible and scalable.

  11. Toyed a bit with subtitles in JS, myself, and came up with this
    HTML5 Subtitles, Video Javascript, BubblesJS, Bubbles JS

    It makes no use of the track tag though. Generally I am not overly pleased with the support for such features, and the track tag is only the tip of the iceberg in HTML5 video. The fact that you need WebM, OGG, MP4? CHECK. The fact that you cannot embed video in other pages without having links to the source etc. (like YouTube does)? CHECK. The fact that most native players don’t even have a “full screen” button? CHECK.

    It just feels too half-baked and I hope the W3C comes up with a strong and complete standard soon. It’s 2011… and still video is a problem…

  12. BTW – there is more to this than accessibility. These are pedagogically very useful in a number of ways.
