Hi,

I’ve been reading about HTML5 today, and that got me thinking about IE8 and about what HTML5 means for Symphony.

Let me explain my question…

I read this on the Web Hypertext Application Technology Working Group (WHATWG) wiki:

“What MIME type does HTML5 use?

The HTML serialisation must be served using the text/html MIME type.

The XHTML serialisation must be served using an XML MIME type, such as application/xhtml+xml or application/xml. Unlike XHTML 1.0, XHTML 5 must not be served as text/html.”

Now unless I’m mistaken, even IE8 still doesn’t accept “application/xhtml+xml”, so IE couldn’t handle the XHTML serialisation of HTML5.

That panicked me a bit until I read:

“Void elements in HTML (e.g. the br, img and input elements) do not require a trailing slash. e.g. Instead of writing <br />, you only need to write <br>. This is the same as in HTML 4.01. However, due to the widespread attempts to use XHTML 1.0, there are a significant number of pages using the trailing slash. Because of this, the trailing slash syntax has been permitted on void elements in HTML in order to ease migration from XHTML 1.0 to HTML5.”

So closing slashes can still be used in HTML5. That's good from an XML and a Symphony point of view.

But does this mean that in the immediate future a Symphony X/HTML 5 site would need to be served as HTML5 but have closing tags?

I know that in practice this is no big deal. But it seems worrying that HTML5 is a step away from XML. XHTML5 is therefore unusable because of IE, and HTML5 reverts to non-closing tags, allowing the trailing slash only to “ease migration”, which sounds kind of temporary.

I just wanted to start a discussion about what HTML5 means for Symphony.

I’ll just throw up a brief but hopefully accurate history of HTML/XML serialisation:

To HTML 4.01 and below, <br/> (XML serialisation for self-closing tags) is actually a syntax error. Classic HTML parsers expect <br> (HTML serialisation) since they know which elements self-close and don’t need the ‘/’ hint.

XML parsers require the self-closing <br/> syntax, otherwise they will treat following elements as the br’s children, and probably throw a syntax error if no </br> is found later. This is what happens with XML 1.0, but non-draconian error handling is supposedly planned for future versions. Spaces before the ‘/’ don’t matter - to an XML parser, <br /> is the same as <br/>.

Luckily, <br /> gets parsed correctly (sort of) in legacy HTML parsers. I'm 90% sure that IE's parser treats <br/> as an element with nodeName “br/”, which obviously won’t render the same as <br>. But it will treat <br /> as an element with nodeName “br” and an attribute “/”, which is good enough. This means documents can be serialised in a way that both HTML and XML parsers can read.

All browsers will use their HTML parser when sent a “text/html” MIME type, and since IE doesn’t recognise “application/xhtml+xml”, the only way to author a universally-readable document is by using an HTML-compatible serialisation.

HTML5 expands the HTML syntax to include <br/> as a substitute for <br>, but this doesn’t help us much unless all browsers are updated to comply.

XSLT processors know which serialisation to use based on whether you choose method="xml" or method="html" in the xsl:output element. Good XSLT processors know to use the <br /> form when an XHTML doctype is used with method="xml".
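As a rough illustration of the method="xml" case, here is a minimal sketch of a stylesheet that requests XML serialisation together with the XHTML 1.0 Strict doctype. The page content is made up for the example, and whether the empty br comes out as <br /> or <br/> depends on the processor, so treat the output as something to check rather than guaranteed behaviour.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">
    <!-- method="xml" plus an XHTML doctype asks for XML serialisation -->
    <xsl:output
        method="xml"
        encoding="UTF-8"
        doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
        doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
        omit-xml-declaration="yes"
        indent="yes"/>
    <xsl:template match="/">
        <!-- The default namespace above puts these elements in the XHTML namespace -->
        <html xml:lang="en" lang="en">
            <head>
                <title>XHTML serialisation test</title>
            </head>
            <body>
                <!-- br is an empty literal result element (illustrative content) -->
                <p>Line one<br/>Line two</p>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>
```

Changing method="xml" to method="html" (and dropping the two doctype attributes) should make the same template come out in HTML serialisation, which is an easy way to compare the two.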

Unfortunately, there are problems with either method. With HTML, some XSLT processors (namely LibXSLT, which is what PHP 5 and Symphony use) don’t do a very good job of indentation, which is a pain if you’re debugging. With XML, all sorts of hacks need to be used to generate HTML-compatible syntax, e.g. non-self-closing empty elements like <script></script>.
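One commonly seen workaround for the empty-element problem with method="xml" is to give the element a single space of content, so the serialiser has to write separate start and end tags. The sketch below assumes a hypothetical script path (/workspace/js/example.js) and is only one of several possible hacks, not necessarily the one used on any particular site.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="UTF-8" omit-xml-declaration="yes" indent="yes"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Empty element workaround</title>
                <!-- With no content, an XML serialiser may emit <script ... />.
                     An HTML parser ignores that slash and keeps reading
                     script content until it finds </script>.
                     The single space forces <script ...> </script> instead.
                     The src path is hypothetical. -->
                <script type="text/javascript" src="/workspace/js/example.js">
                    <xsl:text> </xsl:text>
                </script>
            </head>
            <body>
                <p>Script element written with an explicit end tag.</p>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>
```

The whitespace around xsl:text in the stylesheet is stripped by the processor, so the script element ends up containing only that one space.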

The consensus seems to be that the future of the web is HTML, not XML, which isn’t bad news for XSLT at all - XSLT is very well suited to producing HTML.

> But does this mean that in the immediate future a Symphony X/HTML 5 site would need to be served as HTML5 but have closing tags?

Not at all. Any strict doctype would do (either HTML 4.01 Strict, XHTML 1.0 Strict, or HTML5), and would allow for use of HTML5 features in supporting browsers. Here’s what I use to produce pages with the HTML5 doctype, which outputs HTML serialisation:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output
        method="html"
        media-type="text/html"
        encoding="UTF-8"
        doctype-system="about:legacy-compat"
        indent="yes"/>
    <xsl:template match="/">
        <html lang="en">
            <head>
                <title>HTML5 Doctype Test</title>
            </head>
            <body>
                <p>Hello world from HTML5 land!</p>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>

To overcome the indentation problem mentioned above, I just switch method="html" to method="xml" while I’m debugging, which gives me nice readable indentation, then I switch back (and possibly also set indent="no") when making the site live.

I think the penny is finally dropping! So let me get this straight then? What really determines if a document is HTML or XHTML is the MIME type. If the document is served with a text/html MIME type, it is treated as HTML and is parsed by an HTML parser. If it is served as application/xhtml+xml or text/xml, it gets treated as XHTML and parsed by an XML parser? So really Symphony is serving HTML anyway and not XHTML even when it is valid XHTML1 and has an XHTML doctype. The W3C validator is validating the doctype while the browser is parsing based on the MIME type?

So valid XHTML served as text/html is probably in reality invalid HTML 4.01? The doctype and the MIME type don't really match? So maybe HTML5 is just a recognition of that reality?

> So really Symphony is serving HTML anyway and not XHTML even when it is valid XHTML1 and has an XHTML doctype.

You are most correct. The XHTML 1.0 spec allows text/html as a valid MIME type, but to browsers, this is no different to HTML 4.01 Strict, since they both trigger the same rendering mode (unless you're trying to do things like include inline SVG).

HTML5's purpose is to consolidate everything that browsers do into a single parsing/rendering algorithm (with minor differences between standards and quirks modes when it comes to things like table layouts), in order to make browsers simpler, promote competition, and make updating and improving browsers easier.

The HTML5 doctype is just the shortest string that triggers standards mode in browsers today:

<!DOCTYPE html>

A longer form is also allowed since this one is easier to output with XSLT:

<!DOCTYPE html SYSTEM "about:legacy-compat">

HTML5 and XHTML5 are really the same thing, just with different serialisations and MIME types. Everything you can do in one you can also do in the other, so which you choose really depends on what's easiest for the author. Fortunately, XSLT is flexible enough to do both, so if XHTML ever becomes the de facto format for the web, all you'll ever need to do is change the <xsl:output> element.
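For comparison, here is a sketch of what an XHTML5 variant of an HTML5 test stylesheet might look like. It makes two assumptions worth flagging: the html element is placed in the XHTML namespace (XML parsers need that to recognise it as XHTML, so in practice it is a small change beyond xsl:output), and the media-type value only helps if the server-side setup actually passes it on as the Content-Type header.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">
    <!-- XML serialisation with an XML MIME type; no doctype is requested -->
    <xsl:output
        method="xml"
        media-type="application/xhtml+xml"
        encoding="UTF-8"
        indent="yes"/>
    <xsl:template match="/">
        <!-- Literal result elements inherit the XHTML default namespace -->
        <html lang="en" xml:lang="en">
            <head>
                <title>XHTML5 Serialisation Test</title>
            </head>
            <body>
                <p>Hello world from XHTML5 land!</p>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>
```

Browsers that don't recognise application/xhtml+xml (IE8 and earlier, as discussed above) won't render this, so it is only a picture of what the switch would involve.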

> So valid XHTML served as text/html is probably in reality invalid HTML 4.01?

Precisely. You might enjoy this article by WebKit’s Maciej Stachowiak: “Understanding HTML, XML and XHTML”

In fact, the vast majority of supposedly XHTML documents on the internet are served as text/html. Which means they are not XHTML at all, but actually invalid HTML that’s getting by on the error handling of HTML parsers. All those “Valid XHTML 1.0!” links on the web are really saying “Invalid HTML 4.01!”.
