All posts by silvia

WordPress plugin for external videos updated

Over the last few weeks I’ve updated my “external videos” WordPress plugin. I’ve fixed bugs and added some new functionality.

List of changes:

  • fixed a bug in attaching blog posts to videos for link-through from gallery overlays
  • allow re-attaching a different blog post to a video
  • added a shortcode that links straight through to video pages instead of the overlay
  • fixed a bug on retrieval of keyframe for dotsub
  • added option to add the video posts to the site’s RSS feed
  • fixed a bug on image paths for the thickbox
  • made sure whenever a user goes to the admin page that the cron hook is active
  • changed some class names to avoid clashes with other plugins that people reported
  • turned simple_html_dom code into a class of its own to avoid clashes with other plugins that use this code, too
  • cleaned up entered data by stripping surplus white space
  • fixed the styling of the overlay on the gallery
  • guarded against a bug that occurred when channels have no videos to retrieve yet

Download the new plugin version 0.13

Note: there is something weird going on with the WordPress plugins site, which still shows version 0.7 as the current one, but when you download it, you get the latest version 0.12. If somebody knows how to fix this, that would be awesome. I think it also stops people from auto-updating this plugin, which is sad with this many improvements.
(I think I fixed it by actually changing the version number in the external-videos.php file – how silly of me – and thanks to the WordPress Forum person who pointed it out to me! Download 0.13 now.)

WebVTT explained

On Wednesday, I gave a talk at Google about WebVTT, the Web Video Text Track file format that is under development at the WHATWG for solving time-aligned text challenges for video.

I started by explaining all the features that WebVTT supports for captions and subtitles, mentioned how WebVTT would be used for text audio descriptions and navigation/chapters, and explained how it is included into HTML5 markup, such that the browser provides some default rendering for these purposes. I also mentioned the metadata approach that allows any timed content to be included into cues.
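
To give a rough idea of what a cue looks like to a program, here is a minimal TypeScript sketch that parses the timing line of a cue, assuming the `hh:mm:ss.ttt --> hh:mm:ss.ttt` form (hours optional) used by WebVTT. The names `CueTiming`, `parseTimestamp` and `parseCueTiming` are made up for this illustration, and any cue settings after the end time are simply ignored rather than interpreted.

```typescript
// Sketch only: parses the timing line of a single cue, e.g.
// "00:01:02.500 --> 00:01:05.000". It does not parse a whole file.

interface CueTiming {
  start: number; // seconds
  end: number;   // seconds
}

// Accepts "hh:mm:ss.ttt" or "mm:ss.ttt"; returns seconds, or null if malformed.
function parseTimestamp(text: string): number | null {
  const match = /^(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})$/.exec(text);
  if (match === null) return null;
  const hours = match[1] === undefined ? 0 : Number(match[1]);
  const minutes = Number(match[2]);
  const seconds = Number(match[3]);
  const millis = Number(match[4]);
  if (minutes > 59 || seconds > 59) return null;
  return hours * 3600 + minutes * 60 + seconds + millis / 1000;
}

// Splits the line at "-->" and parses both timestamps.
function parseCueTiming(line: string): CueTiming | null {
  const parts = line.trim().split(/\s+-->\s+/);
  if (parts.length !== 2) return null;
  const [startText, rest] = parts;
  if (startText === undefined || rest === undefined) return null;
  // Anything after the end timestamp (cue settings) is dropped here.
  const endText = rest.split(/\s+/)[0] ?? "";
  const start = parseTimestamp(startText);
  const end = parseTimestamp(endText);
  if (start === null || end === null || end <= start) return null;
  return { start, end };
}
```

A caller would feed it one timing line at a time, for example `parseCueTiming("00:01:02.500 --> 00:01:05.000")`, and treat a `null` result as a malformed cue.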

The talk slides include a demo of how the <track> element works in the browser. I’ve actually used the Captionator polyfill for HTML5 to make this demo, which was developed by Chris Giffard and is available as open source from GitHub.

The talk was recorded and has been made available as a Google Tech talk with captions and also a separate version with extended audio descriptions.

The slides of the talk are also available (best to choose the black theme).

I’ve also created a full transcript of the described video.

Get the WebVTT specification from the WHATWG Website.

Ideas for new HTML5 apps

At the recent Linux conference in Brisbane, Australia, I promised a free copy of my book to the person that could send me the best idea for an HTML5 video application. I later also tweeted about it.

While I didn’t get many emails, I am still impressed by the things people want to do. Amongst the posts were the following proposals:

  • Develop a simple video cutting tool: say, set cut points and have a very simple backend take the cut points and generate output quickly enough. The cutting doesn’t need to re-transcode.
  • Develop a polyfill for the track element
  • Use HTML5 video, especially the tracking between video and text, to better present video from the NZ Parliament.
  • Making a small MMO game using WebGL, HTML5 audio and WebSockets. I also want to use the same code for desktop and web.

These are all awesome ideas and I found it really hard to decide whom to give the free book to. In the end, I decided to give it to Brian McKenna, who is working on the MMO game – simply because it is really pushing the boundaries of several HTML5 technologies.

To everyone else: the book is actually not that expensive to buy from APRESS or Amazon and you can get the eBook version there, too.

Thanks to everyone who started really thinking about this and sent in a proposal!

Linux.conf.au 2011: The Latest and Coolest with HTML5 Video

I gave a talk at LCA 2011 in Brisbane about some of the things that I have learnt and the code I have developed while writing my book, see http://www.amazon.com/Definitive-Guide-HTML5-Video/dp/1430230908/ .

The talk announcement: The new HTML 5 specification continues to change – a particularly large number of changes are still happening for audio and video. Not only were we provided with a new open codec format called WebM, which didn’t really change any functionality but may eventually lead to a common baseline codec; in July 2010, features for accessibility and a new caption format called WebSRT were also introduced. Also, a new video API is being discussed that will expose analytics about the video performance, e.g. the number of dropped frames, the download rate, and the playback rate. Lastly, an audio data API is proposed that allows the programmer to access raw audio data and do cool things such as frequency analysis.

I will provide a brief introduction to the new HTML5 video and audio elements, their JavaScript API and already standardized and available functionality in modern Web Browsers, such as pixel manipulation through the Canvas or the application of SVG filters to videos. Then I will show some cool demos of what will be possible once the newer features are standardized and rolled out.

This talk will contain lots of “bling”, i.e. lots of visual and aural demonstrations, but there will also be technical content at the level required by more or less hard-core Web developers. Do not expect a kernel talk from this though.

For slides see: http://blog.gingertech.net/2011/01/27/html5-video-presentations-at-lca-2011/

Creative Commons licensed http://creativecommons.org/licenses/by-sa/3.0/ by Linux Australia

Uploaded by: Silvia Pfeiffer
Hosted: youtube

HTML5 Video Presentations at LCA 2011

Working in the WHATWG and the W3C HTML WG, you sometimes forget that all the things that are being discussed so heatedly for standardization are actually leading to some really exciting new technologies that not many outside have really taken note of yet.

This week, during the Australian Linux Conference in Brisbane, I’ve been extremely lucky to be able to show off some awesome new features that browser vendors have implemented for the audio and video elements. The feedback that I got from people was uniformly plain surprise – nobody expected browsers to have all these capabilities.

The examples that I showed off have mostly been the result of working on a book for almost 9 months of the past year and writing lots of examples of what can be achieved with existing implementations and specifications. They have been inspired by diverse demos that people made in the last years, so the book links to many more amazing demos.

Incidentally, I promised to give a copy of the book away to the person with the best idea for a new Web application using HTML5 media. Since we ran out of time, please shoot me an email or a tweet (@silviapfeiffer) within the next 4 weeks and I will send another copy to the person with the best idea. The copy that I brought along was given to a student who wanted to use HTML5 video to display on surfaces of 3D moving objects.

So, let’s get to the talks.

On Monday, I gave a presentation on “Audio and Video processing in HTML5“, which had a strong focus on the Mozilla Audio API.

I further gave a brief lightning talk about “HTML5 Media Accessibility Update“. I am expecting lots to happen on this topic during this year.

Finally, I gave a presentation today on “The Latest and Coolest in HTML5 Media” with a strong focus on video, but also touching on audio and media accessibility.

The talks were streamed live – congrats to Ryan Verner for getting this working with support from Ben Hutchings from DebConf and the rest of the video team. The videos will apparently be available from http://linuxconfau.blip.tv/ in the near future.

UPDATE 4th Feb 2011: And here is my LCA talk …

with subtitles on YouTube:

Accessibility to Web video for the Vision-Impaired

In the past week, I was invited to an IBM workshop on audio/text descriptions for video in Japan. Geoff Freed and Trisha O’Connell from WGBH, and Michael Evans from BBC research were the other invited experts to speak about the current state of video accessibility around the world and where things are going in TV/digital TV and the Web.

The two-day workshop was very productive. The first day was spent with presentations which were open to the public. A large vision-impaired community attended to understand where technology is going. It was very humbling to be part of an English-language workshop in Japan, where much of the audience is blind, but speaks English much better than my average experience with English in Japan. I met many very impressive and passionate people who are creating audio descriptions, adapting NVDA for the Japanese market, advocating to Broadcasters and Government to create more audio descriptions, and performing fundamental research for better tools to create audio descriptions. My own presentation was on “HTML5 Video Descriptions”.

On the second day, we only met with the IBM researchers and focused discussions on two topics:

  1. How to increase the amount of video descriptions
  2. HTML5 specifications for video descriptions

The first topic included concerns about guidelines for description authoring by beginners, how to raise awareness, who to lobby, and what production tools are required. I personally was more interested in the second topic and we moved into a smaller breakout group to focus on these discussions.

HTML5 specifications for video descriptions
Two topics were discussed related to video descriptions: text descriptions and audio descriptions. Text descriptions are descriptions authored as time-aligned text snippets and read out by a screen reader. Audio descriptions are audio recordings either of a human voice or even of a TTS (text-to-speech) synthesis – in either case, they are audio samples.

For a screen reader, the focus was actually largely on NVDA and people were very excited about the availability of this open source tool. There is a concern about how natural-sounding a screen reader can be made and IBM is doing much research there with some amazing results. In a user experiment run by WGBH and IBM, they found that the more natural the voice sounds, the more people comprehend, but between a good screen reader and an actual human voice there is not much difference in the comprehension level. Broadcasters and other high-end producers are unlikely to accept TTS and will prefer the human voice, but for other materials – in particular for the large majority of content on the Web – TTS and screen readers can make a big difference.

An interesting lesson that I learnt was that video descriptions can be improved by 30% (i.e. 30% better comprehension) if we introduce extended descriptions, i.e. descriptions that can pause the main video to allow a description to be read for something that happens in the video where there is no obvious pause in which to read it out. So, extended descriptions are one of the major challenges to get right.

We then looked at the path that we are currently progressing on in HTML5 with WebSRT, the TimedTrack API, the <track> elements and the new challenges around a multitrack API.

For text descriptions we identified a need for the following:

  • extension marker on cues: often it is very clear to the author of a description cue that there is no time for the cue to be read out in parallel to the main audio and the video needs to be paused. The proposal is for introduction of an extension marker on the cue to pause the video until the screen reader is finished. So, a speech-complete event from the screen reader API needs to be dealt with. To make this reliable, it might make sense to put a max duration on the cue so the video doesn’t end up waiting endlessly in case the screen reader event isn’t fired. The duration would be calculated based on a typical word speaking rate.
  • importance marker on cues: the duration of all text cues being read out by screen readers depends on the speed set-up of the screen reader. So, even when a cue has been created for a given audio break in the video, it may or may not fit into this break. For most cues it is important that they are read out completely before moving on, but for some it’s not. So, an importance marker could be introduced that determines whether a video stops at the end of the cue to allow the screen reader to finish, or whether the screen reader is silenced at that time no matter how far it has gotten.
  • ducking during cues: making the main audio track quieter in relation to the video description for the duration of a cue is important for comprehension of the description cue
  • voice hints: an instruction at the beginning of the text description file for what voice to choose such that it won’t collide with e.g. the narrator voice of a video – typically the choice will be for a female voice when the narrator is male and the other way around – this will help initialize the screen reader appropriately
  • speed hints: an indicator at the beginning of a text description toward what word rate was used as the baseline for the timing of the cue durations such that a screen reader can be initialized with this
  • synthesis directives: while not a priority, eventually it will make for better quality synchronized text if it is possible to include some of the typical markers that speech synthesizers use (see e.g. SSML or speech CSS), including markers for speaker change, for emphasis, for pitch change and other prosody. It was, in fact, suggested that the CSS3’s speech module may be sufficient in particular since Opera already implements it.

This means we need to consider extending WebSRT cues with an “extension” marker and an “importance” marker. WebSRT further needs header-type metadata to include a voice and a speed hint for screen readers. The screen reader further needs to work more closely with the browser and exchange speech-complete events and hints for ducking. And finally we may need to allow for CSS3 speech styles on subparts of WebSRT cues, though I believe this latter one is not of high immediate importance.
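
To make the interplay of the extension marker, the importance marker and the speed hint a bit more concrete, here is a minimal TypeScript sketch. Everything in it is an assumption for illustration: the `DescriptionCue` shape, the `extended` and `important` flags, the function names and the default slack factor are hypothetical and not part of any WebSRT syntax or screen reader API.

```typescript
// Hypothetical cue shape: "extended" and "important" stand in for the
// proposed markers; no such fields exist in a published format.
interface DescriptionCue {
  start: number;      // seconds on the video timeline
  end: number;        // seconds on the video timeline
  text: string;       // text to be read out by the screen reader
  extended: boolean;  // author says: pause the video for this cue
  important: boolean; // author says: must be read out completely
}

// Estimate how long it takes to speak the text at a given word rate
// (the "speed hint" from the file header would supply wordsPerMinute).
function estimateSpeechSeconds(text: string, wordsPerMinute: number): number {
  if (wordsPerMinute <= 0) throw new RangeError("wordsPerMinute must be positive");
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0).length;
  return (words * 60) / wordsPerMinute;
}

// Upper bound on how long the video may wait for an extended cue, so it
// does not wait endlessly if the speech-complete event never fires.
function maxPauseSeconds(cue: DescriptionCue, wordsPerMinute: number, slack = 1.5): number {
  if (!cue.extended) return 0;
  return estimateSpeechSeconds(cue.text, wordsPerMinute) * slack;
}

type CueEndAction = "continue" | "pause-video" | "silence-reader";

// Decide what to do when the cue's end time is reached.
function actionAtCueEnd(cue: DescriptionCue, readerStillSpeaking: boolean): CueEndAction {
  if (!readerStillSpeaking) return "continue";
  // Extended and important cues hold the video until speech completes.
  if (cue.extended || cue.important) return "pause-video";
  // Everything else is cut off, no matter how far the reader has gotten.
  return "silence-reader";
}
```

In a browser, the result of `maxPauseSeconds` would feed a timer that resumes the video if the screen reader never reports completion; how that event is delivered is exactly the open question above.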

For audio descriptions we identified a need for:

  • external/in-band descriptions: allowing external or in-band description tracks to be synchronized with the main video. It would be assumed in this case that the timeline of the description track is identical to the main video.
  • extended external descriptions: since it’s impossible to create in-band extended descriptions without changing the timeline of the main video, we can only properly solve the issue of extended audio descriptions through external resources. One idea that we came up with is to use a WebSRT file with links to short audio recordings as external extended audio descriptions. These can then be synchronized with the video and pause the video at the correct time etc. through JavaScript. This is probably a sufficient solution for now. It supports both sighted and vision-impaired users and does not extend the timeline of the original video. As an optimization, we can also do this through a single “virtual” resource that is a concatenation of the individual audio cues and is addressed through the WebSRT file with byte ranges.
  • ducking: making the main audio track quieter in relation to the video description for the duration of a cue is important for comprehension with audio files, too, though it may be more difficult to realize
  • separate loudness control: making it possible for the viewer to separately turn the loudness of an audio description up/down in comparison to the main audio

For audio descriptions, we saw the need for introduction of a multitrack video API and markup to synchronize external audio description tracks with the main video. Extended audio descriptions should be solved through JavaScript and hooking up through the TimedTrack API, so mostly rolling it by hand at this stage. We will see how that develops in future. Ducking and separate loudness controls are equally needed here, but we do need more experiments in this space.
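
As a hand-rolled illustration of that idea, the following TypeScript sketch shows one way the cue logic could look. It is a sketch under assumptions, not a worked-out design: the `AudioDescriptionCue` shape (a start time plus a link to a recording), the `MainVideo` interface, the function names and the default duck factor are all hypothetical. The logic is kept free of DOM calls so it can be tried outside a browser.

```typescript
// Hypothetical cue: the cue payload is assumed to be a link to a short
// audio recording of the description.
interface AudioDescriptionCue {
  start: number;     // seconds on the main video timeline
  audioUrl: string;  // link to the description recording
  extended: boolean; // true: pause the main video while it plays
}

// The few members of the main video that are needed here; a browser's
// HTMLVideoElement has all of them.
interface MainVideo {
  currentTime: number;
  volume: number;
  pause(): void;
  play(): unknown;
}

// Cues whose start time was crossed between two time updates.
function cuesToTrigger(
  cues: AudioDescriptionCue[],
  previousTime: number,
  currentTime: number
): AudioDescriptionCue[] {
  return cues.filter((c) => c.start >= previousTime && c.start < currentTime);
}

// Begin a description: extended cues pause the main video, all others
// duck the main audio. Returns a function to call once the recording
// has finished, which undoes whatever was done here.
function beginDescription(
  video: MainVideo,
  cue: AudioDescriptionCue,
  duckFactor = 0.3
): () => void {
  const originalVolume = video.volume;
  if (cue.extended) {
    video.pause();
  } else {
    video.volume = originalVolume * duckFactor;
  }
  return () => {
    if (cue.extended) {
      void video.play();
    } else {
      video.volume = originalVolume;
    }
  };
}
```

In a page, `cuesToTrigger` would be called from the video’s `timeupdate` handler, the recording played through a separate audio element (which also gives the viewer a separate loudness control for it), and the function returned by `beginDescription` called from that element’s `ended` handler.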

Finally, we discussed general needs for vision-impaired users to locate accessibility content such as audio descriptions:

  • the need for accessible user menus to turn on/off accessibility content
  • the introduction of dedicated and standardized keyboard short-cuts to turn on and manipulate the volume of audio descriptions (and captions)
  • the introduction of user preferences for automatically activating accessibility content; these could even learn from current usage, such that if a user activates descriptions for a video on one Website, the preferences pick this up; different user profiles are already introduced by ISO in “Access for all” and used in websites such as teachersdomain
  • means to generally locate accessibility content on the web, such as fields in search engines and RSS feeds
  • more generally there was a request to have caption on/off and description on/off buttons be introduced into remote controls of machines, which will become prevalent with the increasing amount of modern TV/Internet integrated devices

Overall, the workshop was a great success and I am keen to see more experimentation in this space. I also hope that some of the great work that was shown to us at IBM with extended descriptions and text descriptions will become available – if only as screencasts – so we can all learn from it to make better standards and technology.

Talk at Web Directions South, Sydney: HTML5 audio and video

On 14th October I gave a talk at Web Directions South on “HTML5 audio and video – using these exciting new elements in practice”.

I wanted to give people an introduction into how to use these elements while at the same time stirring their imagination as to the design possibilities now that these elements are available natively in browsers. I re-used some of the demos that I have put together for the book that I am currently writing, added some of the cool stuff that others have done and finished off with an outlook towards what new features will probably arrive next.

“Slides” are now available, which are really just a Web page with some demos that work in modern browsers.

Table of contents:

HTML5 Audio and Video

  1. Cross browser <video> element
  2. Cross browser <audio> element
  3. Encoding
  4. Fallback considerations
  5. CSS and <video> – samples
  6. <video> and the JavaScript API
  7. <video> and SVG
  8. <video> and Canvas
  9. <video> and Web Workers
  10. <video> and Accessibility
  11. audio plans