Talk at Web Directions South, Sydney: HTML5 audio and video

On 14th October I gave a talk at Web Directions South on “HTML5 audio and video – using these exciting new elements in practice”.

I wanted to give people an introduction to using these elements while at the same time stirring their imagination about the design possibilities now that these elements are available natively in browsers. I re-used some of the demos that I have put together for the book that I am currently writing, added some of the cool stuff that others have done, and finished off with an outlook on the new features that will probably arrive next.

The “slides” are now available; they are really just a Web page with some demos that work in modern browsers.

Table of contents:

HTML5 Audio and Video

  1. Cross browser <video> element
  2. Cross browser <audio> element
  3. Encoding
  4. Fallback considerations
  5. CSS and <video> – samples
  6. <video> and the JavaScript API
  7. <video> and SVG
  8. <video> and Canvas
  9. <video> and Web Workers
  10. <video> and Accessibility
  11. audio plans

6 thoughts on “Talk at Web Directions South, Sydney: HTML5 audio and video”

  1. Hi Silvia, you might be interested in the script I have been working on to drop in TTML caption support to HTML 5 video( and there are some more use cases at

    It’s a fairly complete implementation of TTML and it also has CSS integration. I’ll be finishing it up and writing it up a bit over the coming weeks.
    It seems to work in IE9, FF3 and Safari (although I only have a very old Mac, so the last isn’t that well tested).


  2. @Sean Nice work! I am particularly impressed that you did the synchronization between the sign language video and the main video. It’s the only way to do it now and it somewhat works. I ran it in FF4 and it ran smoothly. In Chrome it flickered annoyingly. There’s probably something in Chrome that removes the currently displayed image during a seek. It didn’t work in Opera. In Safari it was a lot less smooth than in FF4. I think it’s something we will want to refer to in the a11y TF!

    As for TTML – the problem with the demo is that it only uses bare bones TTML – nothing more than a simple start time/ end time/ text combination. This will just confirm that there is no need for anything more than that. I think we have to do better.


  3. The demo also includes an example with two simultaneous text sections, one for the descriptions and one for the captions, which is more than simple start/end timing. This is also the kind of thing that is needed when there are multiple speakers, or narration etc.

    In the second “use cases” link, I’m working on some more complex examples, including so far: a highlighted transcript (which has the same needs as karaoke), a vertical text Ruby combination (needs writing-mode support, which afaik is only in IE at the moment), a hyperlink, and examples of ticker tape and rollup timing.

    Let me know what else you think would make good examples.

  4. @Sean – I just went back to the same link and the sign language is gone. Do you have multiple pages now? Might be good to have a page with links to the different examples.

    I like that you have extended it now also with furigana and the timed transcript etc. Just keep going through use cases of the a11y TF – I think everything is there.
