Audio Track Accessibility for HTML5

I have talked a lot about synchronising multiple tracks of audio and video content recently. The reason was mainly that I foresee a need for more than two parallel audio and video tracks, such as audio descriptions for the vision-impaired or dub tracks for internationalisation, as well as sign language tracks for the hard-of-hearing.

It is almost impossible to come up with a single scheme that delivers the right composition of tracks to every target audience. Most viewers will prefer plain audio and video. Vision-impaired viewers would probably prefer only the audio plus audio descriptions (but will probably take the video too). Hard-of-hearing viewers will prefer video plus captions and possibly a sign language track. It is possible to create files containing such tracks dynamically on a server and then deliver the right composition. However, such server-side approaches have not been very successful in recent years, and rolling out new infrastructure of this kind would likely take many years.

So the only other option we have is to synchronise completely separate media resources as the audience selects them.

This HTML5 accessibility demo addresses exactly this need: check out the demo of multiple media resource synchronisation.

I created an Ogg video with only a video track (10m53s750). Then I created an audio track containing the original English audio (10m53s696). Next, I used a Spanish dub track that I found through BlenderNation as an alternative audio track (10m58s337). Lastly, I created an audio description track in the original language (10m53s706). This gives one video track with three optional audio tracks.

I removed all native controls from the HTML5 audio and video elements and implemented my own play/pause and seeking controls, which handle all media elements in one go.
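A minimal sketch of such a shared control, assuming one "master" element (the video in this demo) and a list of follower elements (the audio tracks). The class name MediaGroup and its methods are hypothetical and not taken from the demo's source. It relies only on the standard currentTime property and the play() and pause() methods of HTML media elements. It also re-aligns the audio to the video on every play, as the demo does after a pause.

```javascript
// Hypothetical helper: one set of controls driving several media elements.
// Works with real <video>/<audio> elements or any object exposing
// currentTime, play() and pause().
class MediaGroup {
  constructor(master, followers) {
    this.master = master;       // element whose timeline the others follow
    this.followers = followers; // e.g. the alternative audio tracks
  }

  all() {
    return [this.master, ...this.followers];
  }

  play() {
    // Re-align every follower to the master's position, then start them all.
    for (const el of this.followers) el.currentTime = this.master.currentTime;
    for (const el of this.all()) el.play(); // in browsers play() returns a Promise
  }

  pause() {
    for (const el of this.all()) el.pause();
  }

  seek(seconds) {
    // Move every element to the same position on the shared timeline.
    for (const el of this.all()) el.currentTime = seconds;
  }
}
```

In a page, you might build the group from the video element and the audio elements you want to play alongside it, and wire play(), pause() and seek() to your own buttons and slider. How you select those elements depends on your markup.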

I was mostly interested in the quality of this experience. Would the different media files stay mostly in sync? They are normally decoded in different threads, so how big would the drift be?

The resulting page is the basis for such experiments with synchronisation.

The page prints the current playback position of all the media files at a constant interval of 500ms. Note that when you pause and then play again, I re-synchronise the audio tracks with the video track, but I do not do so while the files simply play through.
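A sketch of that 500ms reporting loop, assuming the video is the reference and the audio tracks are passed in as a name-to-element map. The functions driftReport and startLogging and the track names are illustrative assumptions, not the demo's actual code. A positive drift means the track is ahead of the video.

```javascript
// Hypothetical helper: each track's position and its offset from the master.
function driftReport(master, tracks) {
  return Object.entries(tracks).map(([name, el]) => ({
    name,
    position: el.currentTime,
    drift: el.currentTime - master.currentTime, // > 0: track is ahead of master
  }));
}

// Print the report every intervalMs; returns the timer id for clearInterval().
function startLogging(master, tracks, intervalMs = 500) {
  return setInterval(() => {
    for (const { name, position, drift } of driftReport(master, tracks)) {
      console.log(`${name}: ${position.toFixed(3)}s (drift ${drift.toFixed(3)}s)`);
    }
  }, intervalMs);
}
```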

I let the files play through on my rather busy MacBook and observed the following interesting drift over the course of about 9 minutes:

[Figure: Drift between multiple media elements played in parallel]

As the graph shows, the video was the slowest, advancing only roughly 540s, while the Spanish dub advanced about 560s in the same time.
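A rough estimate from these two numbers, assuming the drift accumulates at a constant rate (an assumption, since the real drift depends on system load), gives the worst-case drift that builds up within a 500ms window:

```latex
r = \frac{560\,\mathrm{s} - 540\,\mathrm{s}}{540\,\mathrm{s}} \approx 0.037
\qquad
\Delta_{500\,\mathrm{ms}} \approx r \times 500\,\mathrm{ms} \approx 18.5\,\mathrm{ms}
```

So even the fastest and slowest tracks in this run would move apart by only about 18ms between two 500ms checks.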

To fix such drift, you can include regular re-synchronisation points in the playback. For example, you could set an interval timer that re-syncs every 500ms. Within such a short time, a drift is almost impossible to notice. Don’t reset the video, because that leads to visual artifacts. Instead, use the video’s currentTime to reset the others. (UPDATE: Actually, which track is the best choice for the main timeline depends on your situation. See also the comments below.)
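A sketch of periodic re-synchronisation, assuming a master element (the video by default, per the text) whose position is never changed, and followers that are snapped to it. The function names and the toleranceSec threshold are hypothetical additions, not part of the original demo. The idea behind the threshold is that assigning currentTime forces a seek, which can itself cause a small audible glitch. Skipping corrections below a tolerance avoids needless seeks.

```javascript
// Hypothetical helper: snap each follower to the master if it drifted too far.
// Returns how many followers were corrected.
function resyncOnce(master, followers, toleranceSec = 0.1) {
  let corrected = 0;
  for (const el of followers) {
    if (Math.abs(el.currentTime - master.currentTime) > toleranceSec) {
      el.currentTime = master.currentTime; // only followers are ever seeked
      corrected++;
    }
  }
  return corrected;
}

// Re-sync every intervalMs while the master is playing.
// Returns a function that stops the timer.
function startResync(master, followers, intervalMs = 500, toleranceSec = 0.1) {
  const id = setInterval(() => {
    if (!master.paused) resyncOnce(master, followers, toleranceSec);
  }, intervalMs);
  return () => clearInterval(id);
}
```

Because the master is a parameter, you can pass a single audio element as master and the video as the follower when the audio timeline matters more, as the UPDATE and the comments suggest.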

This is a workable way of associating arbitrary numbers of media tracks with a video, in particular where creating merged files cannot easily be included in a workflow.

13 thoughts on “Audio Track Accessibility for HTML5”

    1. @Olivier Yes, you are right. I guess I chose the video track because there’s only one of them and there are three audio tracks (at least in this example). It is then difficult to choose one audio track over the others as the main one. Also, since the video was the slowest, it made sense to re-sync the audio to it, since otherwise you might lose some frames. I guess you just have to choose the best solution for your particular situation.

  1. I’m not sure that this very detached approach is correct, given that it makes it absolutely impossible to lip-sync with devices such as my Bluetooth headset, which introduces a delay of about 1/5 second. The video needs to be synchronised to that delayed audio, not the other way around.

  2. @Jeremy that’s quite a massive delay for a headset. How do you get lip sync with a normal video? In any case, the JavaScript is actually quite flexible, and you can hack it together yourself in the way you need it.

  3. Yes, it’s a massive delay. My Bluetooth headset also comes with an analogue-to-Bluetooth dongle, which lets me use it with a normal audio jack, e.g. on MP3 players. If I use that with my computer’s audio output, then lip sync is way off.

    However, when pairing the headset with my computer directly (no analogue

  4. The JavaScript resync makes sure that the underlying media framework is told to align the two tracks based on playback time. This happens at a higher level than the media framework and will be slightly less accurate, but it is not fundamentally different from having the media framework itself play them back in sync.

    If you run PulseAudio or ALSA underneath, these tell the media framework (and thus the JavaScript) how far the device has actually played back. So you really want to use the audio as the synchronising timeline and align the video to it (which would then also lead to a small pause in the video). With multiple audio tracks, that indeed becomes almost impossible.

    It would be nice to build synchronisation into the browser, so that the browser can leave the synchronisation to the media framework without having to step in itself. But from talks with several browser vendors, this is a feature that is not going to be implemented for a while. So I have tried to find out how far JavaScript can go for this purpose: for some use cases it works fine, while for others it will indeed be difficult to do well.

  5. Silvia,
    I know I’m starting to sound like a broken record :-) but it’s precisely to solve issues like this that SMIL got underway, 13 years ago. The syncBehavior, syncMaster and syncTolerance attributes give the document author fine-grained control over whether media items should be synchronised and how tight that synchronisation should be.

    Of course, you still rely on the underlying engine to actually implement the syncing correctly, but at least the web page author doesn’t have to add tricky javascript to resync things.

  6. @Jack No need to be defensive about SMIL. I think SMIL did a great job in specifying synchronisation issues between all kinds of different input media for a multimedia presentation that can take on all sorts of compositions. It’s however not what the media elements in HTML5 need, since there is only one timeline and everything runs along that timeline.

    I would, btw, consider introducing a <par> element into HTML5 to group such independent media resources along a single combined timeline, if I thought HTML5 was ready for it yet. I don’t think so, but I wanted to experiment with HTML5 and JavaScript to see if it was at least possible to imitate that functionality with existing HTML5 elements and JavaScript APIs. It is, alas, not perfectly.

    Just as SMIL has to rely on the underlying media framework to get this right, an introduction of something like a <par> element would need to rely on the media framework to do this, too. It is particularly hard to do with media coming from different servers, since you also have to deal with network effects. So, I can’t see this happening in HTML5 in the near future.

  7. Thanks very much for this. I am trying to figure out how to make an animated film that will run natively on the iPad (the client says that Flash won’t work). Your lag catch-up suggestion was superb. I have to do an arrow effect.

    1. @jb As far as I know, on the iPad only MPEG-4 video in Safari will work; Flash is definitely not supported. But I don’t own an iPad, so I cannot really help.

  8. I don’t understand how to manage the audio track. I guessed that on the video timeout I extract the time at which the video is currently playing, but how do I get the audio’s time, and how do I correct the audio playback?

  9. @megas You get the video’s current playback position through video.currentTime, and you set the audio’s playback position to the video’s with audio.currentTime = video.currentTime. It’s as simple as that.
