deck.js is one of the new HTML5-based presentation tools. It’s simple to use, in particular for your basic, every-day presentation needs. You can also create more complex slides with animations etc. if you know your HTML and CSS.
Yesterday at linux.conf.au (LCA), I gave a presentation using deck.js. But I didn’t give it from the lectern in the room in Perth where LCA is being held – instead I gave it from the comfort of my home office at the other end of the country.
I used my laptop with in-built webcam and my Chrome browser to give this presentation. Beforehand, I had uploaded the presentation to a Web server and shared the link with the organiser of my speaker track, who was on site in Perth and had set up his laptop in the same fashion as myself. His screen was projecting the Chrome tab in which my slides were loaded and he had hooked up the audio output of his laptop to the room speaker system. His camera was pointed at the audience so I could see their reaction.
I loaded a slide master URL: http://html5videoguide.net/presentations/lca_2014_webrtc/?master
and the room loaded the URL without query string: http://html5videoguide.net/presentations/lca_2014_webrtc/.
Then I gave my talk exactly as I would if I was in the same room. Yes, it felt exactly as though I was there, including nervousness and audience feedback.
How did we do that? WebRTC (Web Real-time Communication) to the rescue, of course!
We used one of the modules of the rtc.io project called rtc-glue to add the video conferencing functionality and the slide navigation to deck.js. It was actually really, really simple!
Here are the few things we added to deck.js to make it work:
Code added to index.html to make the video connection work:
The iceServers config is required to punch through firewalls – you may also need a TURN server. Note that you need a signalling server – in our case we used http://rtc.io/switchboard/, which runs the code from rtc-switchboard.
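The actual rtc-glue snippet is not reproduced here; as a rough illustration, the iceServers configuration mentioned above is the same kind of object you would hand to a standard RTCPeerConnection (a sketch only, with a public Google STUN server as a placeholder, not the original code):

// minimal sketch of the ICE configuration that any WebRTC connection needs;
// current browsers use the "urls" key (older builds used "url")
var pc = new RTCPeerConnection({
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' }
    // a TURN server entry would be added here if STUN alone cannot
    // punch through the firewall
  ]
});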
Code added to index.html to synchronize slide navigation:
glue.events.once('connected', function(signaller) {
  // only the slide master (loaded with the ?master query string) sends slide changes
  if (location.search.slice(1) !== '') {
    $(document).bind('deck.change', function(evt, from, to) {
      signaller.send('/slide', {
        idx: to,
        sender: signaller.id
      });
    });
  }
  // both ends listen for slide messages and navigate the deck accordingly
  signaller.on('slide', function(data) {
    console.log('received notification to change to slide: ', data.idx);
    $.deck('go', data.idx);
  });
});
This simply registers a callback on the slide master end to send a slide position message to the room end, and a callback on the room end that initiates the slide navigation.
Feel free to write your own slides in this manner – I would love to have more users of this approach. It should also be fairly simple to extend this to share pointer positions, so you can actually use the mouse pointer to point to things on your slides remotely. Would love to hear your experiences!
Note that the slides are actually a talk about the rtc.io project, so if you want to find out more about these modules and what other things you can do, read the slide deck or watch the talk when it has been published by LCA.
Many thanks to Damon Oehlman for his help in getting this working.
BTW: somebody should really fix that print style sheet for deck.js – I’m only ever getting the one slide that is currently showing. 😉
I decided to use socket.io for the signalling following the idea of Luc, which made the server code even smaller and reduced it to a mere reflector:
// plain HTTP server for socket.io to attach to
var app = require('http').createServer().listen(1337);
var io = require('socket.io').listen(app);

// a mere reflector: every message is broadcast to all other connected clients
io.sockets.on('connection', function(socket) {
  socket.on('message', function(message) {
    socket.broadcast.emit('message', message);
  });
});
Then I turned to the client code. I was surprised to see the massive changes that PeerConnection has gone through. Check out my slide deck to see the different components that are now necessary to create a PeerConnection.
I was particularly surprised to see the SDP object now fully exposed to JavaScript, and thus the ability to manipulate it directly rather than through some API. This allows Web developers to manipulate the type of session that they are asking the browsers to set up. I can imagine, for example, that if they have support for a video codec in JavaScript that the browser does not provide built-in, they could add that codec to the set of choices to be offered to the peer. While this is flexible, I am concerned that it might create more problems than it solves. I guess we’ll have to wait and see.
I was also surprised by the need to use ICE, even though in my experiment I got away with an empty list of ICE servers – the ICE messages just got exchanged through the socket.io server. I am not sure whether this is a bug, but I was very happy about it because it meant I could run the whole demo on a completely separate network from the Internet.
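For illustration, the browser end of such a reflector-based exchange might look roughly like the following – a sketch only, not the original client code, assuming the socket.io 0.9-style client API that matches the server above:

// connect to the reflector started on port 1337 above
var socket = io.connect('http://localhost:1337');

// everything another peer sends arrives as a 'message' event
socket.on('message', function(message) {
  var data = JSON.parse(message);
  // hand SDP offers/answers and ICE candidates to the PeerConnection here
  console.log('received', data.type);
});

// send our own SDP and ICE data the same way
function signal(data) {
  socket.send(JSON.stringify(data));
}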
The most exciting news since my talk is that Mozilla and Google have managed to get a PeerConnection working between Firefox and Chrome – this is the first cross-browser video conference call without a plugin! The code differences are minor.
Since the specification of the WebRTC API and of the MediaStream API are now official Working Drafts at the W3C, I expect other browsers will follow. I am also looking forward to the possibilities of:
multi-peer video conferencing like the efforts around webrtc.io,
The best places to learn about the latest possibilities of WebRTC are webrtc.org and the W3C WebRTC WG. code.google.com has open source code that continues to be updated to the latest released and interoperable features in browsers.
The video of my talk is in the process of being published. There is a MP4 version on the Linux Australia mirror server, but I expect it will be published properly soon. I will update the blog post when that happens.
A bit over a week ago I gave a presentation at Web Directions Code 2012 in Melbourne. Maxine and John asked me to speak about something related to HTML5 video, so I went for the new shiny: WebRTC – real-time communication in the browser.
I only had 20 min, so I had to make it tight. I wanted to show off video conferencing without special plugins in Google Chrome in just a few lines of code, as is the promise of WebRTC. To a large extent, I achieved this. But I made some interesting discoveries along the way. Demos are in the slide deck.
UPDATE: Opera 12 has been released with WebRTC support.
Housekeeping: if you want to replicate what I have done, you need to install a Google Chrome Web Browser 19+. Then make sure you go to chrome://flags and activate the MediaStream and PeerConnection experiment(s). Restart your browser and now you can experiment with this feature. Big warning up-front: it’s not production-ready, since there are still changes happening to the spec and there is no compatible implementation by another browser yet.
Here is a brief summary of the steps involved to set up video conferencing in your browser:
Set up a video element each for the local and the remote video stream.
Grab the local camera and stream it to the first video element.
(*) Establish a connection to another person running the same Web page.
Send the local camera stream on that peer connection.
Accept the remote camera stream into the second video element.
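Steps 1, 2 and 5 can be sketched as follows (using today's unprefixed getUserMedia; at the time, the prefixed navigator.webkitGetUserMedia call was needed instead):

<video id="local" autoplay muted></video>
<video id="remote" autoplay></video>
<script>
  // step 2: grab the local camera and stream it to the first video element
  navigator.mediaDevices.getUserMedia({ video: true, audio: true })
    .then(function(stream) {
      document.getElementById('local').srcObject = stream;
      // step 4 would add this stream to the peer connection
    });

  // step 5: when the remote stream arrives on the peer connection (pc),
  // attach it to the second video element:
  // pc.ontrack = function(evt) {
  //   document.getElementById('remote').srcObject = evt.streams[0];
  // };
</script>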
Now, the most difficult part of all of this – believe it or not – is the signalling part that is required to build the peer connection (marked with (*)). Initially I wanted to run completely without a server and just enter the remote’s IP address to establish the connection. This is, however, not a functionality that the PeerConnection object provides [might this be something to add to the spec?].
So, you need a server known to both parties that can provide for the handshake to set up the connection. All the examples that I have seen, such as https://apprtc.appspot.com/, use a channel management server on Google’s appengine. I wanted it all working with HTML5 technology, so I decided to use a Web Socket server instead.
I implemented my Web Socket server using node.js (code of websocket server). The video conferencing demo is in the slide deck in an iframe – you can also use the stand-alone html page. Works like a treat.
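The linked server code is not reproduced here; a minimal broadcast relay in node.js, written for example with the ws package (a sketch, not the original implementation), would look roughly like this:

var WebSocketServer = require('ws').Server;
var wss = new WebSocketServer({ port: 8080 });

wss.on('connection', function(ws) {
  ws.on('message', function(message) {
    // relay every message (SDP packets etc.) to all other connected clients
    wss.clients.forEach(function(client) {
      if (client !== ws) {
        client.send(message);
      }
    });
  });
});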
While it is still using Google’s STUN server to get through NAT, the messaging for setting up the connection is running completely through the Web Socket server. The messages that get exchanged are plain SDP message packets with a session ID. There are OFFER, ANSWER, and OK packets exchanged for each streaming direction. You can see some of it in the below image:
I’m not running a public WebSocket server, so you won’t be able to see this part of the presentation working. But the local loopback video should work.
At the conference, it all went without a hitch (while the wireless played along). I believe you have to host the WebSocket server on the same machine as the Web page, otherwise it won’t work for security reasons.
A whole new world of opportunities lies out there when we get the ability to set up video conferencing on every Web page – scary and exciting at the same time!
With the latest developments in HTML5 and the still fairly new ARIA (Accessible Rich Internet Applications) attributes introduced by the W3C WAI (Web Accessibility Initiative), browsers have now implemented many features that allow you to make your JavaScript-heavy Web applications accessible.
When I began working on making a complex Web application accessible just over a year ago, I discovered that there was no step-by-step guide to approaching the changes necessary for creating an accessible Web application. As a result, many people believe that it is still hard, if not impossible, to make Web applications accessible. In fact, it can be approached systematically, as this article will describe.
This post is based on a talk that Alice Boxhall and I gave at the recent Linux.conf.au titled “Developing accessible Web apps – how hard can it be?” (slides, video), which in turn was based on a Google Developer Day talk by Rachel Shearer (slides).
These talks, and this article, introduce a process that you can follow to make your Web applications accessible: each step will take you closer to having an application that can be accessed using a keyboard alone, and by users of screenreaders and other accessibility technology (AT).
The recommendations here only roughly conform to the requirements of WCAG (Web Content Accessibility Guidelines), which is the basis of legal accessibility requirements in many jurisdictions, so the steps in this article may or may not be sufficient to meet a legal requirement. The focus is on the practical outcome of ensuring that users with disabilities can use your Web application.
Step-by-step Approach
The steps to follow to make your Web apps accessible are as follows:
Use native HTML tags wherever possible
Make interactive elements keyboard accessible
Provide extra markup for AT (accessibility technology)
If you are a total newcomer to accessibility, I highly recommend installing a screenreader and just trying to read/navigate some Web pages. On Windows you can install the free NVDA screenreader, on Mac you can activate the pre-installed VoiceOver screenreader, on Linux you can use Orca, and if you just want a browser plugin for Chrome try installing ChromeVox.
1. Use native HTML tags
As you implement your Web application with interactive controls, try to use as many native HTML tags as possible.
HTML5 provides a rich set of elements which can be used both to add functionality and to provide semantic context to your page. HTML4 already included many useful interactive controls, like <a>, <button>, <input> and <select>, and semantic landmark elements like <h1>. HTML5 adds richer <input> controls and a more sophisticated set of semantic markup elements such as <time>, <progress>, <meter>, <nav>, <header>, <article> and <aside>. (Note: check browser support for the new tags.)
Using as much of the rich HTML5 markup as possible means that you get all of the accessibility features which have been implemented in the browser for those elements, such as keyboard support, short-cut keys and accessibility metadata, for free. With generic tags you have to implement all of this yourself from scratch.
What exactly do you miss out on when you use a generic tag such as <div> over a specific semantic one such as <button>?
Generic tags are not focusable. That means you cannot reach them using the [TAB] key on the keyboard.
You cannot activate them with the space bar or enter key or perform any other keyboard interaction that would be regarded as typical with such a control.
Since the role that the control represents is not specified in code but is only exposed through your custom visual styling, screenreaders cannot express to their users what type of control it is, e.g. button or link.
Neither can screenreaders add the control to the list of controls on the page that are of a certain type, e.g. to navigate to all headers of a certain level on the page.
And finally, you need to style the element manually so that it looks distinct from other elements on the page; a native control gets the browser’s default style for the platform, which you can still override using CSS if you want.
Example:
Compare these two buttons. The first one is implemented using a <div> tag, the second one using a <button> tag. Try using a screenreader to experience the difference.
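The demo itself is not reproduced here; the markup behind it is essentially the following (class name and handler are illustrative):

<!-- generic tag: not focusable, no keyboard activation, no role exposed to AT -->
<div class="custombutton" onclick="alert('sent!')">Send</div>

<!-- native tag: focusable, activates with [ENTER]/[SPACE], announced as a button -->
<button onclick="alert('sent!')">Send</button>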
2. Make interactive elements keyboard accessible

Many sophisticated web applications have some interactive controls that just have no appropriate HTML tag equivalent. In this case, you will have to build an interactive element with JavaScript and <div> and/or <span> tags and lots of custom styling. The good news is that it’s possible to make even these custom controls accessible, and as a side benefit you will also make your application smoother to use for power users.
The first thing you can do to test usability of your control, or your Web app, is to unplug the mouse and try to use only the [TAB] and [ENTER] keys to interact with your application.
Try the following:
Can you reach all interactive elements with [TAB]?
Can you activate interactive elements with [ENTER] (or [SPACE])?
Are the elements in the right tab order?
After interaction: is the right element in focus?
Is there a keyboard shortcut that activates the element (accesskey)?
No? Let’s fix it.
2.1. Reaching interactive elements
If you have an element on your page that cannot be reached with [TAB], put a @tabindex attribute on it.
Example:
Here we have a <span> tag that works as a link (don’t do this – it’s just a simple example). The first one cannot be reached using [TAB] but the second one has a tabindex and is thus part of the tab order of the HTML page.
(Note: since we experiment lots with the tabindex in this article, to avoid confusion, click on some text in this paragraph and then hit the [TAB] key to see where it goes next. The click will set your keyboard focus in the DOM.)
You set @tabindex=0 to add an element into the native tab order of the page, which is the DOM order.
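The two variants from the example above boil down to the following markup (class name illustrative):

<!-- not reachable with [TAB] -->
<span class="customlink" onclick="alert('activated!')">Click</span>

<!-- tabindex="0" puts the element into the page's natural tab order -->
<span class="customlink" onclick="alert('activated!')" tabindex="0">Click</span>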
2.2. Activating interactive elements
Next, you typically want to be able to use the [ENTER] and [SPACE] keys to activate your custom control. To do so, you will need to implement an onkeydown event handler. Note that the keyCode for [ENTER] is 13 and for [SPACE] is 32.
Example:
Let’s add this functionality to the <span> tag from before. Try tabbing to it and hit the [ENTER] or [SPACE] key.
<span class="customlink" onclick="alert('activated!')" tabindex="0"
      onkeydown="handlekey(event);">
  Click
</span>
<script>
  function handlekey(event) {
    var target = event.target || event.srcElement;
    if (event.keyCode == 13 || event.keyCode == 32) {
      target.onclick();
    }
  }
</script>
Note that there are some controls that might need support for keys other than [tab] or [enter] to be able to use them from the keyboard alone, for example a custom list box, menu or slider should respond to arrow keys.
2.3. Elements in the right tab order
Have you tried tabbing to all the elements on your page that you care about? If so, check if the order of tab stops seems right. The default order is given by the order in which interactive elements appear in the DOM. For example, if your page’s code has a right column that is coded before the main article, then the links in the right column will receive tab focus first before the links in the main article.
You could change this by re-ordering your DOM, but oftentimes this is not possible. So, instead give the elements that should be the first ones to receive tab focus a positive @tabindex. The tab access will start at the smallest non-zero @tabindex value. If multiple elements share the same @tabindex value, these controls receive tab focus in DOM order. After that, interactive elements and those with @tabindex=0 will receive tab focus in DOM order.
Example:
The one thing that always annoys me the most is if the tab order in forms that I am supposed to fill in is illogical. Here is an example where the first and last name are separated by the address because they are in a table. We could fix it by moving to a <div> based layout, but let’s use @tabindex to demonstrate the change.
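A minimal sketch of such a table-based form with the tab order corrected through @tabindex (field names and values are illustrative):

<table>
  <tr>
    <td>First name: <input type="text" tabindex="1"></td>
    <td>Street: <input type="text" tabindex="3"></td>
  </tr>
  <tr>
    <td>Last name: <input type="text" tabindex="2"></td>
    <td>City: <input type="text" tabindex="4"></td>
  </tr>
</table>

Without the @tabindex values, tabbing would follow DOM order (first name, street, last name, city); with them, first and last name are visited one after the other.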
Be very careful with using non-zero tabindex values. Since they change the tab order on the page, you may get side effects that you might not have intended, such as having to give other elements on the page a non-zero tabindex value to avoid skipping too many other elements as I would need to do here.
2.4. Focus on the right element
Some of the controls that you create may be rather complex and open elements on the page that were previously hidden. This is particularly the case for drop-downs, pop-ups, and menus in general. Oftentimes the hidden element is not defined in the DOM right after the interactive control, such that a [TAB] will not put your keyboard focus on the next element that you are interacting with.
The solution is to manage your keyboard focus from JavaScript using the .focus() method.
Example:
Here is a menu that is declared ahead of the menu button. If you tab onto the button and hit enter, the menu is revealed. But your tab focus is still on the menu button, so your next [TAB] will take you somewhere else. We fix it by setting the focus on the first menu item after opening the menu.
You will notice that there are still some things you can improve on here. For example, after you close the menu again with one of the menu items, the focus does not move back onto the menu button.
Also, after opening the menu, you may prefer not to move the focus onto the first menu item but rather just onto the menu <div>. You can do so by giving that div a @tabindex and then calling .focus() on it. If you do not want to make the div part of the normal tabbing order, just give it a @tabindex=-1 value. This will allow your div to receive focus from script, but be exempt from accidental tabbing onto (though usually you just want to use @tabindex=0).
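A sketch of this focus management (element IDs and class name are illustrative, and the event wiring is elided):

var menuButton = document.getElementById('menubutton');
var menu = document.getElementById('menu');

function openMenu() {
  menu.style.display = 'block';
  // hand keyboard focus to the first menu item (it needs a tabindex),
  // so the user's next key press interacts with the menu, not the button
  menu.querySelector('.menuitem').focus();
}

function closeMenu() {
  menu.style.display = 'none';
  // return focus to the button so the user does not lose their place
  menuButton.focus();
}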
Bonus: If you want to help keyboard users even more, you can also put outlines on the element that is currently in focus using CSS’s outline property. If you want to avoid the outlines for mouse users, you can dynamically add a class that removes the outline on mouseover events but leaves it in place for :focus.
2.5. Provide sensible keyboard shortcuts
At this stage your application is actually keyboard accessible. Congratulations!
However, it’s still not very efficient. Like power users, screenreader users love keyboard shortcuts: can you imagine being forced to tab through an entire page, or to navigate back to a menu tree at the top of the page, to reach each control you are interested in? And obviously, anything which makes navigating the app via the keyboard more efficient for screenreader users will benefit all power users as well, just like the ubiquitous keyboard shortcuts for cut, copy and paste.
HTML4 introduced so-called accesskeys for this. In HTML5 @accesskey is now allowed on all elements.
The @accesskey attribute takes the value of a keyboard key (e.g. @accesskey="x") and is activated through platform- and browser-specific activation keys. For example, on the Mac it’s generally the [Ctrl] key, in IE it’s the [Alt] key, in Firefox on Windows [Shift]-[Alt], and in Opera on Windows [Shift]-[ESC]. You press the activation key and the accesskey together, which either activates or focuses the element with the @accesskey attribute.
Example:
var button = document.getElementById('accessbutton');
if (button.accessKeyLabel) {
  button.innerHTML += ' (' + button.accessKeyLabel + ')';
}
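The markup that this snippet operates on is not shown above; it would be something along these lines (id and key choice are illustrative):

<button id="accessbutton" accesskey="x" onclick="alert('activated!')">
  Action
</button>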
Now, the idea behind this is clever, but the execution is pretty poor. Firstly, the different activation keys across platforms and browsers make it really hard for people to get used to accesskeys. Secondly, the key combinations can conflict with browser and screenreader shortcut keys; a conflict with a browser shortcut renders that browser shortcut unusable, while a conflict with a screenreader shortcut effectively removes the accesskey.
In the end it is up to the Web application developer whether to use the accesskey attribute or whether to implement explicit shortcut keys for the application through key event handlers on the window object. In either case, make sure to provide a help list for your shortcut keys.
Also note that a page with a really good hierarchical heading layout and use of ARIA landmarks can help to eliminate the need for accesskeys to jump around the page, since there are typically default navigations available in screen readers to jump directly to headings, hyperlinks, and ARIA landmarks.
3. Provide markup for AT
Having made the application keyboard accessible also has advantages for screenreaders, since they can now reach the controls individually and activate them. So, next we will use a screenreader and close our eyes to find out where we are only providing visual cues for the necessary interaction.
Here are some of the issues to consider:
Roles may need to be identified
States may need to be kept track of
Properties may need to be made explicit
Labels may need to be provided for elements
This is where the W3C’s ARIA (Accessible Rich Internet Applications) standard comes in. ARIA attributes provide semantic information to screen readers and other AT that is otherwise conveyed only visually.
Note that using ARIA does not automatically implement the standard widget behavior – you’ll still need to add focus management, keyboard navigation, and change aria attribute values in script.
3.1. ARIA roles
After implementing a custom interactive widget, you need to add a @role attribute to indicate what type of control it is, e.g. that it is playing the role of a standard element such as a button.
Example:
This menu button is implemented as a <div>, but with a role of “button” it is announced as a button by a screenreader.
<div tabindex="0" role="button">Menu</div>
ARIA roles also describe composite controls that do not have a native HTML equivalent.
Example:
This menu with menu items is implemented as a set of <div> tags, but with a role of “menu” and “menuitem” items.
3.2. ARIA states

Some interactive controls represent different states, e.g. a checkbox can be checked or unchecked, or a menu can be expanded or collapsed.
Example:
The following menu has states on the menu items, which are here not just used to give an aural indication through the screenreader, but also a visual one through CSS.
3.3. ARIA properties

Some of the functionality of interactive controls cannot be captured by the role attribute alone. We have ARIA properties to add features that the screenreader needs to announce, such as aria-label, aria-haspopup, aria-activedescendant, or aria-live.
Example:
The following drop-down menu uses aria-haspopup to tell the screenreader that there is a popup hidden behind the menu button together with an ARIA state of aria-expanded to track whether it’s open or closed.
var button = document.getElementById('button');
var menu = document.getElementById('menu');
var items = document.getElementsByClassName('menuitem');
var focused = 0;

function showMenu(evt) {
  evt.stopPropagation();
  menu.style.visibility = 'visible';
  button.setAttribute('aria-expanded', 'true');
  // move keyboard focus onto the currently selected menu item
  focused = getSelected();
  items[focused].focus();
}

function hideMenu(evt) {
  evt.stopPropagation();
  menu.style.visibility = 'hidden';
  button.setAttribute('aria-expanded', 'false');
  button.focus();
}

function getSelected() {
  for (var i = 0; i < items.length; i++) {
    if (items[i].getAttribute('aria-checked') == 'true') {
      return i;
    }
  }
}

function setSelected(elem) {
  var curSelected = getSelected();
  items[curSelected].setAttribute('aria-checked', 'false');
  elem.setAttribute('aria-checked', 'true');
}

function selectItem(evt) {
  setSelected(evt.target);
  hideMenu(evt);
}

function getPrevItem(index) {
  var prev = index - 1;
  if (prev < 0) {
    prev = items.length - 1;
  }
  return prev;
}

function getNextItem(index) {
  var next = index + 1;
  if (next == items.length) {
    next = 0;
  }
  return next;
}

function handleButtonKeys(evt) {
  evt.stopPropagation();
  var key = evt.keyCode;
  switch (key) {
    case (13): /* ENTER */
    case (32): /* SPACE */
      showMenu(evt);
      break;
    default:
  }
}

function handleMenuKeys(evt) {
  evt.stopPropagation();
  var key = evt.keyCode;
  switch (key) {
    case (38): /* UP */
      focused = getPrevItem(focused);
      items[focused].focus();
      break;
    case (40): /* DOWN */
      focused = getNextItem(focused);
      items[focused].focus();
      break;
    case (13): /* ENTER */
    case (32): /* SPACE */
      setSelected(evt.target);
      hideMenu(evt);
      break;
    case (27): /* ESC */
      hideMenu(evt);
      break;
    default:
  }
}

button.addEventListener('click', showMenu, false);
button.addEventListener('keydown', handleButtonKeys, false);
for (var i = 0; i < items.length; i++) {
  items[i].addEventListener('click', selectItem, false);
  items[i].addEventListener('keydown', handleMenuKeys, false);
}
<div class="custombutton" id="button" tabindex="0" role="button"
     aria-expanded="false" aria-haspopup="true">
  <span>Justify</span>
</div>
<div role="menu" class="menu" id="menu" style="visibility: hidden;">
  <div tabindex="0" role="menuitem" class="menuitem" aria-checked="true">
    Left
  </div>
  <div tabindex="0" role="menuitem" class="menuitem" aria-checked="false">
    Center
  </div>
  <div tabindex="0" role="menuitem" class="menuitem" aria-checked="false">
    Right
  </div>
</div>
[CSS for the example omitted]
3.4. Labelling
The one thing most people seem to know about accessibility is that you have to put alt text onto images. This is, however, only one means of providing labels to screenreaders for page content. Labels are short informative pieces of text that provide a name to a control.
There are actually several ways of providing labels for controls:
on img elements use @alt
on input elements use the label element
use @aria-labelledby if there is another element that contains the label
use @title if you also want a label to be used as a tooltip
otherwise use @aria-label
I’ll provide examples for the first two use cases – the other use cases are simple to deduce.
Example:
The following two images show the rough concept for providing alt text for images: images that provide information should be transcribed, images that are just decorative should receive an empty @alt attribute.
When marking up decorative images with an empty @alt attribute, the image is actually completely removed from the accessibility tree and does not confuse the blind user. This is a desired effect, so do remember to mark up all your images with @alt attributes, even those that don’t contain anything of interest to AT.
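A sketch of the two cases (file names and alt text are illustrative):

<!-- informative image: transcribe its content in the alt text -->
<img src="sales-chart.png" alt="Sales grew 20% between 2009 and 2010">

<!-- purely decorative image: an empty @alt removes it from the accessibility tree -->
<img src="corner-flourish.png" alt="">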
Example:
In the example form above in Section 2.3, when tabbing directly on the input elements, the screen reader will only say “edit text” without announcing what meaning that text has. That’s not very useful. So let’s introduce a label element for the input elements. We’ll also add checkboxes with a label.
In this example we use several different approaches to show what a difference it makes to use the <label> element to mark up input boxes.
The first two fields just have a <label> element next to an <input> element. When using a screenreader, you will not notice a difference between this and not using the <label> element, because there is no connection between the <label> and the <input> element.
In the third field we use the @for attribute to create that link. Now the input field isn’t just announced as “edit text”, but rather as “Lastname edit text”, which is much more useful. The screenreader can also skip the labels and jump straight to the input elements.
In the fourth and fifth fields we actually encapsulate the <input> element inside the <label> element, thus avoiding the need for a @for attribute, though it doesn’t hurt to explicitly add it.
Finally we look at the checkbox. By including a referenced <label> element with the checkbox, we change the screenreader’s announcement from just “checkbox not checked” to “Remember me checkbox not checked”. Also notice that the click target now includes the label, making the checkbox not only more usable for screenreader users, but also for mouse users.
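A sketch of the labelling variants described above (field names follow the earlier form example; the exact markup is illustrative):

<!-- no programmatic connection: announced only as "edit text" -->
<label>Firstname</label> <input type="text" id="firstname">

<!-- @for creates the link: announced as "Lastname edit text" -->
<label for="lastname">Lastname</label> <input type="text" id="lastname">

<!-- wrapping the input inside the label works without @for -->
<label>Street <input type="text" name="street"></label>

<!-- the referenced label also extends the click target of the checkbox -->
<input type="checkbox" id="remember"> <label for="remember">Remember me</label>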
4. Conclusions
This article introduced a process that you can follow to make your Web applications accessible. As you do so, you will notice that there are other things you may need to do in order to give the best experience to a power user on a keyboard, a blind user using a screenreader, or a vision-impaired user using a screen magnifier. But once you’ve made a start, you will notice that it’s not all black magic, and a lot can be achieved with just a little markup.
I’m pretty proud of this, which is why I’m dedicating a short blog post to it: today, John and I released my first WordPress plugin as open source to the WordPress plugins site.
It’s got the boring name “External Videos” and builds a bridge between your WordPress instance and your video channels on video hosting sites – currently YouTube, Vimeo, and DotSub are supported.
It does this by using a brand-new feature to be introduced in WordPress 3: custom post types.
Check out the screenshots on the plugins page to see more – I’m unfortunately not yet running this Website with WordPress 3, so am not yet using this plugin’s features.
In the admin interface of WordPress, you enter the video channels that you want to pull videos from. Then it goes and pulls the videos with their metadata from these sites and creates video posts for them. That pulling is done once a day to update with new posts. The videos can be looked at in the admin interface under a separate video post section. They can be linked to WordPress posts and pages where the video may be discussed in context.
The video posts can be exposed on the WordPress site through a gallery, which is created by a shortcode that can be added to any WordPress page. The gallery of thumbnails clicks through to an overlay with each video and its metadata, as well as a link to the related WordPress post.
You can also add a widget to the side bar of the WordPress site with links to the most recent videos.
There are many more features that I want to develop for this plugin. I’d of course like to move it to HTML5 video instead of Adobe Flash. But for now I am happy with it.
I’d like to say thank you to John Ferlito, who helped with some of the coding, to Jeff Waugh for suggesting that it would best be developed using the new post types feature, and to Senator Kate Lundy and Pia Waugh at her office, who funded a part of the development. I am hoping they will find it useful to give their awesome collection of videos better exposure.
Recently, I was asked to review the W3C Media Annotations specifications as they are about to go into Last Call (a state that comes before the request for implementations at the W3C).
The W3C Media Annotations group has defined a set of metadata that they believe is representative and common for media resources. The ontology consists of the following fields:
ma:identifier: a URI or string to identify a resource
ma:title: a string providing the title of the resource
ma:language: a language code describing the language used in the resource
ma:locator: the URI at which the resource can be accessed
ma:contributor: a URI or string identifying the contributor and the nature of the contribution
ma:creator: a URI or string identifying an author
ma:createDate: a date of creation or publication of the resource
ma:location: a string or geo code identifying where the resource has been shot/recorded
ma:description: a string describing the content of the resource
ma:keyword: a word or word combination providing a topic, keyword or tag representing the resource
ma:genre: a string providing the genre of the resource
ma:rating: rating value, including the rating scale
ma:relation: a URI and string identifying a related resource and the relationship
ma:collection: a URI or string providing the name of a collection to which the resource belongs
ma:copyright: a URI or string with the copyright statement.
ma:license: a string or URI with the usage license
ma:publisher: a string or URI with the publisher of the resource
ma:targetAudience: a URI and classification string providing the issuer of the classification and the classification value
ma:fragments: a list of string and URI values that identify media fragments and their type
ma:namedFragments: a list of string and URI values that provide names for media fragments
ma:frameSize: a width – height pair in pixels
ma:compression: a string providing the compression algorithm
ma:duration: a float to provide the resource duration in seconds
ma:format: a string with the MIME type of the resource
ma:samplingrate: a float with the audio sampling rate
ma:framerate: a float with the video frame rate
ma:bitrate: a float providing the average bit rate in kbps
ma:numTracks: an int of the number of tracks
Note that some of these fields are not single values, but simple constructs of multiple values. Thus, they are actually more complex than the name-value pairs that are typically used, e.g., in HTML meta headers or in Dublin Core. I regard this as an issue for implementations.
The fields were chosen as typical metadata being available about media resources. The media fragments fields are a bit dubious in this respect, but could be useful in future.
The metadata is determined either from within the resource itself or from a metadata collection about the resource. To that end, the document maps several existing metadata and media resource formats to this interface.
As they didn’t have a mapping table for Ogg content, I offered the following:
| MAWG | Relation | Ogg properties | How to do the mapping | Datatype |
| --- | --- | --- | --- | --- |
| Descriptive Properties (Core Set) | | | | |
| Identification | | | | |
| ma:identifier | exact | Name | Name field in skeleton header (new) | String |
| ma:title | exact | Title | TITLE field in vorbiscomment header | String |
| ma:title | exact | Title | Title field in skeleton header (new) | String |
| ma:title | related | Album | ALBUM title in vorbiscomment header | String |
| ma:language | exact | Language | Language field in skeleton header (new) | language code |
| ma:locator | exact | | file URI from system | URI |
| Creation | | | | |
| ma:contributor | exact | Artist, Performer | ARTIST and PERFORMER vorbiscomment headers | Strings |
| ma:creator | related | Organization | ORGANIZATION field in vorbiscomment header | String |
| ma:createDate | exact | Date | DATE field in vorbiscomment header | ISO date format |
| ma:location | exact | Location | LOCATION field in vorbiscomment header | String |
| Content description | | | | |
| ma:description | exact | Description | DESCRIPTION field in vorbiscomment header | String |
| ma:keyword | N/A | | | |
| ma:genre | exact | Genre | GENRE field in vorbiscomment header | String |
| ma:rating | N/A | | | |
| Relational | | | | |
| ma:relation | related | Version, Tracknumber | VERSION (version of a title) and TRACKNUMBER (CD track) fields in vorbiscomment header | Strings |
| ma:collection | related | Album | ALBUM field of vorbiscomment header | String |
| Rights | | | | |
| ma:copyright | exact | Copyright | COPYRIGHT field of vorbiscomment header | String |
| ma:license | exact | License | LICENSE field of vorbiscomment header | String |
| Distribution | | | | |
| ma:publisher | related | Organization | ORGANIZATION field of vorbiscomment header | String |
| ma:targetAudience | more specific | Role | Role field of Skeleton header (new) | String |
| Fragments | | | | |
| ma:fragments | N/A | | | |
| ma:namedFragments | N/A | | | |
| Technical Properties | | | | |
| ma:frameSize | exact | | extract from binary header of video track | int, int (width x height) |
| ma:compression | exact | Content-type | Content-type field of Skeleton header | MIME type |
| ma:duration | exact | | calculate as duration = last_sample_time - first_sample_time from the OggIndex header of skeleton | Float (or rather: rational - rational) |
| ma:format | exact | Content-type | Content-type field of Skeleton header | MIME type |
| ma:samplingrate | exact | | calculate as granulerate = granulerate_numerator / granulerate_denominator of Skeleton header | Rational (or rather int / int) |
| ma:framerate | exact | | calculate as granulerate = granulerate_numerator / granulerate_denominator of Skeleton header | Rational (or rather int / int) |
| ma:bitrate | exact | | calculate as bitrate = length_of_segment / duration from OggIndex headers of skeleton | Float |
| ma:numTracks | exact | Tracknumber | TRACKNUMBER field of vorbiscomment header (track number on album) | Int |
You will notice that the table mentions 4 fields in skeleton with a “new” marker – they are actually proposed fields in skeleton – a bit of coding will be necessary to introduce them into software. The space for these fields already exists in message header fields, so it won’t require a change of the skeleton format.
In the second specification of the Media Annotations WG, the group offers a standard API to access (i.e. read) the defined fields. They also intend to create an API to write the fields, but I doubt that will be easy because of the vast number of file types they intend to support.
There is basically a single function that allows the extraction of metadata:
MAObject[] getProperty(in DOMString propertyName, in optional DOMString sourceFormat, in optional DOMString subtype, in optional DOMString language, in optional DOMString fragment );
I proposed it may be possible to include this into HTML5 as follows:
interface HTMLMediaElement : HTMLElement {
  ...
  getter MAObject getProperty(in DOMString propertyName, in optional unsigned long trackIndex);
  ...
}
This would either extract the property for a particular track in a media resource or for the complete resource if no track index is given. The only problem I see is that the returned object is different depending on the requested property – the MAObject is only a parent class for the returned object types. I am not sure it is therefore possible to specify this easily in HTML5.
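If this proposal were adopted, usage from JavaScript might look like the following – purely hypothetical code, since no browser implements the proposed interface:

var video = document.getElementsByTagName('video')[0];
// property of the complete resource (no track index given)
var title = video.getProperty('title');
// property of the first track only
var language = video.getProperty('language', 0);
console.log(title, language);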
Overall I thought the specification was a nice piece of work. I am not sure I agree with all the chosen fields, but that is always an issue with metadata. The most important fields are there and that’s what matters.
At the recent FOMS/LCA in Wellington, New Zealand, we talked a lot about how Ogg could support accessibility. Technically, this means support for multiple text tracks (subtitles/captions), multiple audio tracks (audio descriptions parallel to main audio track), and multiple video tracks (sign language video parallel to main video track).
Creating multitrack Ogg files
The creation of multitrack Ogg files is already possible using one of the muxing applications, e.g. oggz-merge. For example, I have my own little collection of multitrack Ogg files at http://annodex.net/~silvia/itext/elephants_dream/multitrack/. But then you are stranded with files that no player will play back.
Multitrack Ogg in Players
As Ogg is now being used in multiple Web browsers in the new HTML5 media formats, there are in particular requirements for accessibility support for the hard-of-hearing and vision-impaired. Either multitrack Ogg needs to become more of a common case, or the association of external media files that provide synchronised accessibility data (captions, audio descriptions, sign language) to the main media file needs to become a standard in HTML5.
As it turns out, both of these approaches are being considered and worked on in the W3C. Accessibility data that are audio or video tracks will in the near future have to come out of the media resource itself, but captions and other text tracks will also be available from external associated elements.
The availability of internal accessibility tracks in Ogg is a new use case – something Ogg has been ready to do, but has not gone into common usage. MPEG files on the other hand have for a long time been used with internal accessibility tracks and thus frameworks and players are in place to decode such tracks and do something sensible with them. This is not so much the case for Ogg.
For example, a current VLC build installed on Windows will display captions, because Ogg Kate support is activated. A current VLC build on any other platform, however, has Ogg Kate support deactivated in the build, so captions won’t display. This will hopefully change soon, but we also have to look beyond players and into media frameworks – in particular those that are being used by the browser vendors to provide Ogg support.
Multitrack Ogg in Browsers
Hopefully gstreamer (which is what Opera uses for Ogg support) and ffmpeg (which is what Chrome uses for Ogg support) will expose all available tracks to the browser so they can expose them to the user for turning on and off. Incidentally, a multitrack media JavaScript API is in development in the W3C HTML5 Accessibility Task Force for allowing such control.
The current version of Firefox uses liboggplay for Ogg support, but liboggplay’s multitrack support has been sketchy thus far. So, Viktor Gal – the liboggplay maintainer – and I sat down at FOMS/LCA to discuss this, and Viktor developed some patches to make the demo player in the liboggplay package, the glut-player, support the accessibility use cases.
I applied Viktor’s patch to my local copy of liboggplay and I am very excited to show you the screencast of glut-player playing back a video file with an audio description track and an English caption track all in sync:
Further developments
There are still important questions open: for example, how will a player know that an audio description track is to be played together with the main audio track, but a dub track (e.g. a German dub for an English video) is to be played as an alternative? Such metadata for the tracks is something that Ogg is still missing, but that Ogg can be extended with fairly easily through the use of the Skeleton track. It is something the Xiph community is now working on.
Summary
This is great progress towards accessibility support in Ogg and therefore in Web browsers. And there is more to come soon.
Recently, I was asked for some help on coding with an HTML5 video element and its events. In particular the question was: how do I display the time position that somebody seeked to in a video?
Here is a code snippet that shows how to use the seeked event:
<video onseeked="writeVideoTime(this.currentTime);" src="video.ogv" controls></video>
<p>position:</p><div id="videotime"></div>
<script type="text/javascript">
  // get video element
  var video = document.getElementsByTagName("video")[0];
  function writeVideoTime(t) {
    document.getElementById("videotime").innerHTML = t;
  }
</script>
Other events that can be used in a similar way are:
loadstart: UA requests the media data from the server
progress: UA is fetching media data from the server
suspend: UA is intentionally idling on the server connection mid-fetch
abort: UA aborts fetching media data from the server
error: UA aborts fetching media because of a network error
emptied: UA runs out of network buffered media data (I think)
stalled: UA is waiting for media data from the server
play: playback has begun after play() method returns
pause: playback has been paused after pause() method returns
loadedmetadata: UA has received all its setup information for the media resource, duration and dimensions and is ready to play
loadeddata: UA can render the media data at the current playback position for the first time
waiting: playback has stopped because the next frame is not available yet
playing: playback has started
canplay: playback can resume, but at risk of buffer underrun
canplaythrough: playback can resume without estimated risk of buffer underrun
seeking: seeking attribute changed to true (may be too short to catch)
seeked: seeking attribute changed to false
timeupdate: current playback position changed enough to report on it
ended: playback stopped at media resource end; ended attribute is true
ratechange: defaultPlaybackRate or playbackRate attribute has just changed
durationchange: duration attribute has changed
volumechange: volume attribute or the muted attribute has changed
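For example, the timeupdate event can be hooked up in much the same way to continuously display the playback position – a sketch using addEventListener rather than the inline handler above:

<video src="video.ogv" controls></video>
<div id="videotime"></div>
<script type="text/javascript">
  var video = document.getElementsByTagName("video")[0];
  // fires whenever the playback position has changed enough to report on it
  video.addEventListener("timeupdate", function() {
    document.getElementById("videotime").innerHTML = video.currentTime;
  }, false);
</script>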
I have talked a lot about synchronising multiple tracks of audio and video content recently. The reason was mainly that I foresee a need for more than two parallel audio and video tracks, such as audio descriptions for the vision-impaired or dub tracks for internationalisation, as well as sign language tracks for the hard-of-hearing.
It is almost impossible to introduce a good scheme to deliver the right video composition to a target audience. Most people will prefer plain audio and video; vision-impaired users will probably prefer audio plus audio descriptions (but may take the video as well); and hard-of-hearing users will prefer video plus captions and possibly a sign language track. While it is possible to dynamically create files that contain such tracks on a server and then deliver the right composition, implementations of such a server-side method have not been very successful in recent years, and it would likely take many years to roll out such new infrastructure.
So, the only other option we have is to synchronise completely separate media resources together as they are selected by the audience.
I created an Ogg video with only a video track (10m53s750). Then I created an audio track that is the original English audio track (10m53s696). Then I used a Spanish dub track that I found through BlenderNation as an alternative audio track (10m58s337). Lastly, I created an audio description track in the original language (10m53s706). This gives me one video track with three optional audio tracks.
I took away all native controls from these elements when using the HTML5 audio and video tags and implemented my own play/pause and seeking controls, which handle all media elements in one go.
I was mostly interested in the quality of this experience. Would the different media files stay mostly in sync? They are normally decoded in different threads, so how big would the drift be?
The resulting page is the basis for such experiments with synchronisation.
The page prints the current playback position in all of the media files at a constant interval of 500ms. Note that when you pause and then play again, I am re-synching the audio tracks with the video track, but not when you just let the files play through.
I have let the files play through on my rather busy Macbook and have achieved the following interesting drift over the course of about 9 minutes:
You will see that the video was the slowest, only doing roughly 540s, while the Spanish dub did 560s in the same time.
To fix such drifts, you can always include regular re-synchronisation points into the video playback. For example, you could set a timeout on the playback to re-sync every 500ms. Within such a short time, it is almost impossible to notice a drift. Don’t re-load the video, because it will lead to visual artifacts. But do use the video’s currentTime to re-set the others. (UPDATE: Actually, it depends on your situation, which track is the best choice as the main timeline. See also comments below.)
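A minimal sketch of such periodic re-synchronisation (element IDs, the 500ms interval and the drift threshold are illustrative; the video is used as the master timeline here):

var video = document.getElementById('video');
var audioTracks = [
  document.getElementById('audio_en'),
  document.getElementById('audio_es'),
  document.getElementById('audio_ad')
];

// every 500ms, pull the audio tracks back onto the video's timeline
setInterval(function() {
  if (video.paused) return;
  audioTracks.forEach(function(audio) {
    // only correct noticeable drift to avoid audible glitches
    if (Math.abs(audio.currentTime - video.currentTime) > 0.1) {
      audio.currentTime = video.currentTime;
    }
  });
}, 500);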
It is a workable way of associating arbitrary numbers of media tracks with videos, in particular in situations where the creation of merged files cannot easily be included in a workflow.
We basically taught people how to create and publish Ogg Theora video in HTML5 Web pages and how to make it work across browsers, covering many of the available tools and libraries. We’re hoping that some attendees will have learnt enough to build modules for CMSes such as Drupal, Joomla and WordPress, which would make publishing Ogg Theora easy.
I have been asked to share the material that we used. It consists of: