
The state of HTML validation

The original address for this post is The state of HTML validation. If you're reading it on another site, please stop by and visit.

There’s been a lot of talk about HTML5 recently and, in some geek circles, there have been snickers when companies have done a poor job of implementing it. But what is the true state of HTML5? To find out, I decided to check whether the top sites on the internet had implemented it and how successful they were in doing so.

Methodology

One of the first things to do in this effort was to get a decent list of sites. Unfortunately, it has become increasingly difficult to get a sense of which sites are the most popular by number of visits. I eventually settled on Alexa’s Top Sites list because it features most of the sites people think of when considering large sites, and it includes a few non-US sites.

I then ran the W3C Validator against each of the top 25 sites. This gave me three different pieces of information:

  • Doctype: This is what the site declares as its HTML code version. In other words, how the site identifies what version of HTML it supports.
  • Encoding: This is the character encoding the site declares for its pages. It hints at whether the site is targeting a particular language (a language-specific encoding such as gb2312 or shift_jis) or trying to offer a global site (utf-8).
  • Validation: This is how the site validated when tested for errors relating to the HTML version it purported to be offering. It gives us an idea as to how compliant with the standards the site truly is.
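
For readers who want to check the first two items without the validator, here is a minimal Python sketch that reads the doctype and the declared character set straight out of a page's markup. The function names and regular expressions are illustrative assumptions, not what the W3C Validator does internally; the validator also considers the HTTP Content-Type header, which this sketch ignores.

```python
import re

# Hypothetical helpers for illustration; the figures in this post came from
# the W3C Validator's report, not from this code.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+([^>]*)>", re.IGNORECASE)
CHARSET_RE = re.compile(r"""<meta[^>]*charset\s*=\s*["']?([\w-]+)""", re.IGNORECASE)


def detect_doctype(html: str) -> str:
    """Return a short label for the doctype declared in the markup."""
    match = DOCTYPE_RE.search(html)
    if match is None:
        return "none declared"
    declaration = " ".join(match.group(1).split())
    if declaration.lower() == "html":
        return "HTML 5"  # the short <!DOCTYPE html> form
    # Older doctypes carry a public identifier such as
    # "-//W3C//DTD XHTML 1.0 Transitional//EN".
    public_id = re.search(r"-//W3C//DTD\s+([^/]+)//", declaration, re.IGNORECASE)
    return public_id.group(1).strip() if public_id else declaration


def detect_charset(html: str) -> str:
    """Return the charset named in a <meta> tag, lowercased."""
    match = CHARSET_RE.search(html)
    return match.group(1).lower() if match else "not declared"


modern_page = '<!DOCTYPE html><html><head><meta charset="utf-8"></head></html>'
older_page = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=gb2312" /></head></html>'
)

print(detect_doctype(modern_page), detect_charset(modern_page))
print(detect_doctype(older_page), detect_charset(older_page))
```

The third item, validation, still needs a real validator: counting errors means checking the whole document against the rules of the declared doctype.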

Surprisingly, a number of popular Web 2.0 sites were not in Alexa’s Top 25 so I created a separate list for them.

Top 25

Looking at the top 25, here are the results:

Name | Doctype | Encoding | Validation
Google | HTML 5 | iso-8859-1 | 37 errors, 3 warnings
Facebook | HTML 5 | utf-8 | 34 errors
YouTube | HTML 5 | utf-8 | 120 errors, 2 warnings
Yahoo! | HTML 5 | utf-8 | 144 errors, 8 warnings
Blogger | HTML 4.0 Strict | utf-8 | 34 errors, 45 warnings
Baidu | HTML 5 | gb2312 | 6 errors, 6 warnings
Wikipedia | HTML 5 | utf-8 | 5 errors, 1 warning
Windows Live | HTML 4.01 Transitional | utf-8 | 33 errors, 17 warnings
Twitter | HTML 5 | utf-8 | 5 errors, 1 warning
QQ.com | XHTML 1.0 Transitional | gb2312 | validator crashed
MSN | XHTML 1.0 Strict | utf-8 | Completely valid
Yahoo Japan | HTML 4.01 Transitional | utf-8 | 26 errors, 24 warnings
LinkedIn | HTML 5 | utf-8 | 12 errors, 1 warning
Google India | HTML 5 | iso-8859-1 | 40 errors, 2 warnings
Amazon | HTML 4.01 Transitional | iso-8859-1 | 516 errors, 125 warnings
Sina.com.cn | XHTML 1.0 Transitional | gb2312 | validator crashed
Taobao.com | HTML 5 | gb2312 | validator crashed
WordPress | XHTML 1.0 Transitional | utf-8 | 4 errors
Google HK | HTML 5 | Big5 | 40 errors, 1 warning
Google Germany | HTML 5 | iso-8859-1 | 37 errors, 3 warnings
eBay | HTML 4.01 Transitional | utf-8 | 386 errors, 19 warnings
Yandex | HTML 4.01 Transitional | utf-8 | 52 errors, 12 warnings
Google UK | HTML 5 | iso-8859-1 | 37 errors, 3 warnings
Google Japan | HTML 5 | shift_jis | 39 errors, 1 warning
Bing | XHTML 1.0 Transitional | utf-8 | 16 errors

Looking at the data, the first interesting thing is how many sites have made the switch to HTML 5: 14 of the top 25. This means that in the last year, 56 percent of the largest sites on the internet have modified their code base to target the new standard. Six sites are still on the older HTML 4 standard and five are sticking with the somewhat more recent XHTML standard.
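
The split can be re-derived from the Doctype column of the Top 25 table. The Python sketch below is only a worked tally: the list is transcribed by hand from the table, and the `family` grouping (every non-XHTML, non-HTML 5 doctype counted as HTML 4) is an assumption made for illustration.

```python
from collections import Counter

# Doctype column transcribed from the Top 25 table, in table order.
TOP_25_DOCTYPES = [
    "HTML 5", "HTML 5", "HTML 5", "HTML 5", "HTML 4.0 Strict",
    "HTML 5", "HTML 5", "HTML 4.01 Transitional", "HTML 5",
    "XHTML 1.0 Transitional", "XHTML 1.0 Strict", "HTML 4.01 Transitional",
    "HTML 5", "HTML 5", "HTML 4.01 Transitional", "XHTML 1.0 Transitional",
    "HTML 5", "XHTML 1.0 Transitional", "HTML 5", "HTML 5",
    "HTML 4.01 Transitional", "HTML 4.01 Transitional", "HTML 5", "HTML 5",
    "XHTML 1.0 Transitional",
]


def family(doctype: str) -> str:
    """Group a doctype label into one of three broad families."""
    if doctype.startswith("XHTML"):
        return "XHTML"
    if doctype == "HTML 5":
        return "HTML 5"
    return "HTML 4"


counts = Counter(family(d) for d in TOP_25_DOCTYPES)
total = len(TOP_25_DOCTYPES)
shares = {name: round(100 * n / total) for name, n in counts.items()}

for name, n in counts.most_common():
    print(f"{name}: {n} of {total} ({shares[name]}%)")
```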

However, it is also interesting to note that none of the sites that have made the transition comply with the HTML 5 standard. In fact, of the top 25 sites in the Alexa list, only MSN was found to provide completely valid code. Maybe Microsoft could point those people towards their other properties. Amazon was the worst offender, with 516 errors in its code, which suggests that disregard for standards compliance does not have much impact on economic performance. However, eBay (386 errors) and Yahoo! (144 errors) came close behind, so Amazon may be the exception rather than the rule.
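
To rank sites by error count, the free-text Validation column has to be turned into numbers first. Here is a minimal Python sketch over a handful of rows copied from the table; the `parse_result` helper is hypothetical, and treating "validator crashed" as "no result" rather than as zero errors is an assumption.

```python
import re

# A subset of the Validation column, copied from the Top 25 table.
VALIDATION = {
    "Amazon": "516 errors, 125 warnings",
    "eBay": "386 errors, 19 warnings",
    "Yahoo!": "144 errors, 8 warnings",
    "YouTube": "120 errors, 2 warnings",
    "Bing": "16 errors",
    "MSN": "Completely valid",
    "QQ.com": "validator crashed",
}


def parse_result(text: str):
    """Return (errors, warnings), or None when the validator gave no result."""
    lowered = text.lower()
    if lowered == "validator crashed":
        return None
    if lowered == "completely valid":
        return (0, 0)
    errors = re.search(r"(\d+)\s+errors?", lowered)
    warnings = re.search(r"(\d+)\s+warnings?", lowered)
    return (
        int(errors.group(1)) if errors else 0,
        int(warnings.group(1)) if warnings else 0,
    )


parsed = {site: parse_result(text) for site, text in VALIDATION.items()}

# Sort the sites that produced a result by error count, worst first.
ranked = sorted(
    ((site, result) for site, result in parsed.items() if result is not None),
    key=lambda item: item[1][0],
    reverse=True,
)
fully_valid = [site for site, result in ranked if result == (0, 0)]

print("Worst offender:", ranked[0][0])
print("Fully valid:", fully_valid)
```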

Another interesting phenomenon is that most of the large sites have adopted UTF-8, the encoding that supports the widest range of languages, as their default encoding. Once again, over half (56%) of the sites have switched, with Amazon and Google among the notable holdouts. An interesting aside here is that the W3C Validator may have issues with Chinese sites: it crashed on three of the four in the list (QQ.com, Sina.com.cn and Taobao.com).

Web 2.0 Companies

Looking at Web 2.0 companies, the data was surprising:

Name | Doctype | Encoding | Validation
Facebook | HTML 5 | utf-8 | 34 errors
YouTube | HTML 5 | utf-8 | 120 errors, 2 warnings
Blogger | HTML 4.0 Strict | utf-8 | 34 errors, 45 warnings
Twitter | HTML 5 | utf-8 | 5 errors, 1 warning
LinkedIn | HTML 5 | utf-8 | 12 errors, 1 warning
WordPress | XHTML 1.0 Transitional | utf-8 | 4 errors
Flickr | HTML 5 | utf-8 | 15 errors, 3 warnings
Tumblr | XHTML 1.0 Transitional | utf-8 | 19 errors
Foursquare | XHTML 1.0 Strict | utf-8 | 40 errors
Groupon | XHTML 1.0 Transitional | utf-8 | 6 errors
Zynga | XHTML 1.0 Transitional | utf-8 | 4 errors, 6 warnings

I captured the data for companies other than those in the top 25 and a few interesting trends seem to pop up. The first surprise is that fewer of these sites have made the transition to HTML 5, with only 5 sites out of 11 (or 45 percent) having completed it. There still seems to be a strong preference for XHTML as the way to encode pages.

Also of note is that all of these sites have plans for globalization, encoding their pages in UTF-8, which can support both western and non-western alphabets.

However, none of the sites successfully validates against its chosen standard. It looks like there is still much room for improvement in the world of HTML validation.

By Tristan Louis, a serial entrepreneur most often found at tnl.net, where this was initially posted under the title The state of HTML validation. You can follow Tristan on Twitter at @TNLNYC.