
HTML5 video: 25% H.264 reach vs. 95% Ogg Theora reach

Last week, Vimeo launched an HTML5 beta test. They use the H.264 codec, probably because much of their content is already encoded in that format for their Flash player.

But what really surprised me was their claim that roughly 25% of their users will be able to make use of the HTML5 beta: the statement is that 25% of their users use Safari, Chrome, or IE with Chrome Frame. I wondered how they arrived at that number and what it says more generally about the reach of H.264 vs. Ogg Theora on the HTML5-based Web.

According to Statcounter’s browser market share statistics, the percentage of browsers that support HTML5 video is roughly 31.1%, summed up from Firefox 3.5+ (22.57%), Chrome 3.0+ (5.21%), and Safari 4.0+ (3.32%). (Opera’s recent release is not yet represented.)

Out of those 31.1%:

  • 8.53% of browsers support H.264 (Chrome + Safari), and
  • 27.78% of browsers support Ogg Theora (Firefox + Chrome).

Given these numbers, Vimeo must be assuming that roughly 16% of their users (the claimed 25% minus the 8.53% above) run IE with Chrome Frame installed. That would be quite a number, but it may well be that their audience is special.

So, how is Ogg Theora support doing in comparison, if we allow such browser plugins to be counted?

With an installation of XiphQT, Safari can be turned into a browser that supports Ogg Theora. A Chrome Frame installation will likewise turn IE into an Ogg-Theora-capable browser. Together, these plugins could lift browser support for Ogg Theora to around 45%. Compare this to the claimed 48% penetration of MS Silverlight.
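As an aside, a page can test what a given browser will actually play, plugins included, through the HTML5 canPlayType() API. Here is a minimal sketch; the MIME/codec strings are the standard identifiers for Ogg Theora/Vorbis and H.264/AAC:

```html
<script type="text/javascript">
  // Create a video element and ask it which codecs it can decode.
  var v = document.createElement('video');
  if (v.canPlayType) {
    // canPlayType() returns "", "maybe" or "probably".
    var theora = v.canPlayType('video/ogg; codecs="theora, vorbis"');
    var h264 = v.canPlayType('video/mp4; codecs="avc1.42E01E, mp4a.40.2"');
    alert('Ogg Theora: ' + (theora || 'no') + ' / H.264: ' + (h264 || 'no'));
  } else {
    // The browser does not have an HTML5 video element at all.
    alert('No HTML5 video support.');
  }
</script>
```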

But we can do even better for Ogg Theora. If we use the Java-based Cortado player as a fallback inside the video element, we can also capture all those users who have Java installed, which could be as many as 90%, taking Ogg Theora support potentially up to 95%, almost level with the claimed 99% of Adobe Flash.
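For illustration, here is a minimal sketch of such a fallback chain; the video file name and the location of cortado.jar are placeholders to adapt to your own hosting. Browsers that support the video element never see the applet; browsers that don’t will render the fallback content and play the same Ogg Theora file through Java:

```html
<video src="myvideo.ogv" controls width="320" height="240">
  <!-- Fallback for browsers without HTML5 video: the Cortado Java applet. -->
  <applet code="com.fluendo.player.Cortado.class"
          archive="cortado.jar" width="320" height="240">
    <param name="url" value="myvideo.ogv"/>
  </applet>
</video>
```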

I’m sure all these numbers are disputable, but it is an interesting exercise in statistics, and it tells us that, right now, Ogg Theora has wider browser support than H.264.

UPDATE: I was told this article sounds aggressive. By no means am I trying to be aggressive; I am simply stating the numbers as they are right now, because there is a lot of confusion in the market. People believe they reach a smaller audience if they publish in Ogg Theora rather than H.264. I am trying to set this straight.

View counts on YouTube contradictory

UPDATE (6th February 2010): YouTube have just reacted to my bug report, and it turns out that some gData feeds are more up to date than others. You need to use the “uploads” gData feeds rather than the search or user feeds to get accurate data. I’m glad YouTube told me, and it’s documented now!
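To illustrate the difference (USERNAME and VIDEO_ID are placeholders, and the exact paths are from my reading of the gData documentation, so treat them as indicative):

```
The up-to-date per-video entry from a user's uploads feed:
  http://gdata.youtube.com/feeds/api/users/USERNAME/uploads/VIDEO_ID

The generic video feed, which can lag behind:
  http://gdata.youtube.com/feeds/api/videos/VIDEO_ID

In both cases, the returned entry carries the views as an attribute:
  <yt:statistics favoriteCount="..." viewCount="12345"/>
```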

I am an avid user of YouTube Insight, the metrics tool that YouTube provides for free to everyone who publishes videos through them. YouTube Insight provides graphs of video views, the countries they originate in, viewer demographics, how the videos are discovered, engagement metrics, and hotspot analysis. It is a great tool for analysing the success of your videos, determining when to upload the next one, and finding out what works and what doesn’t.

However, you cannot rely on the accuracy of the numbers that YouTube Insight displays. In fact, YouTube provides three different means to find out what the current views (and other statistics, but let’s focus on the views) are for your videos:

  • the view count displayed on the video’s watch page
  • the view count displayed in YouTube Insight
  • the view count given in the gData API feed

The shocking reality is that for every video I have looked at that is less than about a month old and still gathering views, all three numbers are different.

Sometimes they are off by just one or two, which is tolerable and understandable: the data must be served from a number of load-balanced servers or even server clusters, and it would be difficult to keep all of these clusters at identical numbers all of the time.

However, for more than 50% of the videos I have looked at, the numbers are off by a substantial amount.

I have undertaken an analysis of randomly picked videos for which I collected both the gData views and the watch-page views. The Insight data tends to lie between these two numbers, but since Insight is only accessible to a video’s owner, I could not generally obtain it and have left it out of this analysis.
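For anyone who wants to repeat the experiment: the gData count can be pulled straight into a Web page through gData’s json-in-script output. A minimal sketch, assuming the v2 JSON mapping in which the yt:statistics element appears as yt$statistics; VIDEO_ID is a placeholder and showViews is my own callback name. The watch-page count, by contrast, still has to be read off the watch page itself:

```html
<script type="text/javascript">
  // Callback invoked by the gData json-in-script response.
  function showViews(data) {
    // The view count is carried in the yt$statistics part of the entry.
    alert('gData views: ' + data.entry['yt$statistics'].viewCount);
  }
</script>
<script type="text/javascript"
        src="http://gdata.youtube.com/feeds/api/videos/VIDEO_ID?v=2&amp;alt=json-in-script&amp;callback=showViews">
</script>
```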

Here are the stats for 36 randomly picked videos, four in each of the nine view-count classes defined by TubeMogul, and by how much their counts were off at the time I looked at them. The difference column is the watch-page count minus the gData count; the percentage expresses that difference relative to the watch-page count:

| Class     | Video | Watch page | gData API | Age     | Difference | Percentage |
|-----------|-------|------------|-----------|---------|------------|------------|
| >1M       | 1     | 7,187,174  | 6,082,419 | 2 weeks | 1,104,755  | 15.37%     |
| >1M       | 2     | 3,196,690  | 3,080,415 | 3 weeks | 116,275    | 3.63%      |
| >1M       | 3     | 2,247,064  | 1,992,844 | 1 week  | 254,220    | 11.31%     |
| >1M       | 4     | 1,054,278  | 1,040,591 | 1 month | 13,687     | 1.30%      |
| 100K-500K | 5     | 476,838    | 148,681   | 11 days | 328,157    | 68.82%     |
| 100K-500K | 6     | 356,561    | 294,309   | 2 weeks | 62,252     | 17.46%     |
| 100K-500K | 7     | 225,951    | 195,159   | 2 weeks | 30,792     | 13.63%     |
| 100K-500K | 8     | 113,521    | 62,241    | 1 week  | 51,280     | 45.17%     |
| 10K-100K  | 9     | 86,964     | 46        | 4 days  | 86,918     | 99.95%     |
| 10K-100K  | 10    | 52,922     | 43,548    | 3 weeks | 9,374      | 17.71%     |
| 10K-100K  | 11    | 34,001     | 33,045    | 1 month | 956        | 2.81%      |
| 10K-100K  | 12    | 15,704     | 13,653    | 2 weeks | 2,051      | 13.06%     |
| 5K-10K    | 13    | 9,144      | 8,967     | 1 month | 177        | 1.94%      |
| 5K-10K    | 14    | 7,265      | 5,409     | 1 month | 1,856      | 25.55%     |
| 5K-10K    | 15    | 6,640      | 5,896     | 2 weeks | 744        | 11.20%     |
| 5K-10K    | 16    | 5,092      | 3,518     | 6 days  | 1,574      | 30.91%     |
| 2.5K-5K   | 17    | 4,955      | 4,928     | 3 weeks | 27         | 0.54%      |
| 2.5K-5K   | 18    | 4,341      | 4,044     | 4 days  | 297        | 6.84%      |
| 2.5K-5K   | 19    | 3,377      | 3,306     | 3 weeks | 71         | 2.10%      |
| 2.5K-5K   | 20    | 2,734      | 2,714     | 1 month | 20         | 0.73%      |
| 1K-2.5K   | 21    | 2,208      | 2,169     | 3 weeks | 39         | 1.77%      |
| 1K-2.5K   | 22    | 1,851      | 1,747     | 2 weeks | 104        | 5.62%      |
| 1K-2.5K   | 23    | 1,281      | 1,244     | 1 week  | 37         | 2.89%      |
| 1K-2.5K   | 24    | 1,034      | 984       | 2 weeks | 50         | 4.84%      |
| 500-1K    | 25    | 999        | 844       | 6 days  | 155        | 15.52%     |
| 500-1K    | 26    | 891        | 790       | 6 days  | 101        | 11.34%     |
| 500-1K    | 27    | 861        | 600       | 3 days  | 261        | 30.31%     |
| 500-1K    | 28    | 645        | 482       | 4 days  | 163        | 25.27%     |
| 100-500   | 29    | 460        | 436       | 10 days | 24         | 5.22%      |
| 100-500   | 30    | 291        | 285       | 4 days  | 6          | 2.06%      |
| 100-500   | 31    | 256        | 198       | 3 days  | 58         | 22.66%     |
| 100-500   | 32    | 196        | 175       | 11 days | 21         | 10.71%     |
| 0-100     | 33    | 88         | 74        | 10 days | 14         | 15.90%     |
| 0-100     | 34    | 64         | 49        | 12 days | 15         | 23.44%     |
| 0-100     | 35    | 46         | 21        | 5 days  | 25         | 54.35%     |
| 0-100     | 36    | 31         | 25        | 3 days  | 6          | 19.35%     |

The videos were chosen to be no more than a month old, but older than a couple of days. For videos older than about a month, the growth had generally stopped and the metrics had caught up, except where views were still increasing rapidly, which is an unusual case.

Generally, it seems that the watch page has the correct views. The gData interface, in contrast, appears to be updated only about once a week. From the YouTube channels where I do have access to Insight, it further seems that Insight is updated about every 4 days and receives corrected data for the days in which it hadn’t caught up.

Further, it seems that YouTube make no distinction between partner channels and general users’ channels: both can show a massive difference between the watch page and gData. Most videos differ by less than 20%, but some show exceptionally high differences above 50%, and even up to 99.95%.

The difference is particularly pronounced for videos with a steep increase in views: the first few days tend to show massive differences. Since these are exactly the days that publishers most want to monitor, having the gData interface lag behind this much is shocking.

Further, videos with a low number of views, in particular fewer than 100, also show a particularly high percentage difference; sometimes an increase in view count isn’t reported in the gData API for weeks. It seems that YouTube treat the long tail worse than the rest of YouTube. For every video in this class the absolute difference is small, obviously less than 100 views, and with almost 30% of videos falling into this class, it is somewhat understandable that YouTube are not making the effort to update their views regularly. On the other hand, these views may be particularly important to their publishers.

It seems to me that YouTube need to change their approach to updating statistics across the watch pages, Insight and gData.

Firstly, it is important to have the watch page, Insight, and gData in sync; otherwise, which number would you use in a report? If the gData API for YouTube statistics lags behind the watch page and Insight by even 24 hours, it is useless for indicating trends and for reporting, and people have to go back to screen-scraping to obtain the actual views of their videos.

Secondly, it would be good to update the statistics daily during the first 3-4 weeks, or for as long as a video keeps gaining views heavily. This is the important time to track a video’s success, and if neither Insight nor gData is up to date during this period, with numbers that can be almost 100% off, the statistics are effectively useless.

Lastly, one has to wonder how accurate the success calculations are for YouTube partners, who rely on YouTube’s reporting for their advertising payments. Since the analysis showed that the inaccuracies extend to partner channels as well, one has to hope that the data eventually reported through Insight is accurate, even if there are large differences in the interim.

Finally, I must say that I was rather disappointed with the way this issue has so far been dealt with in the YouTube Forums. The issue of wrongly reported view counts was first raised more than a year ago and has been reported regularly since by various people. Some of the reports were really unfriendly in their demands. Still, I would have expected a serious reply from a YouTube employee about why these discrepancies exist and whether and how they will be fixed. Instead, all I found was a mention, more than 9 months old, that YouTube seem to be aware of the issue and working on it; no news since.

Also, I found no other blog posts analysing this issue, so here we are. Please, YouTube, let us know what is going on with Insight, why the numbers are off by this much, and what you are doing to fix it.

NB: I just posted a bug on gData, since we were unable to find any concrete bug reports relating to this issue there. I’m actually surprised about this, given how many people have reported it in the YouTube Forums!