I got fed up with my school’s awful grade website and decided that XSL was the perfect way to rearrange the data into a new design. Naturally, the perfect tool for XSL is Symphony.

I tried to import the important pages as dynamic data sources, but unfortunately their (X?)HTML is as bad as their design.

Is there a way to clean up the document en route? Or, failing that, does anyone have suggestions of what else I could use to fix it up? I first tried creating a userscript, but the multitude of tables, DHTML, inline CSS, and horrific specificity made that extremely complicated.

I did the same some days ago using a two-step process:

  1. convert the incoming data to valid XML
  2. transform that XML into whatever you like

Step 1 is the hard one. I used the EXSLT str:replace template and special matching templates (the Ninja Technique) to generate valid XML output from a Symphony page (with dynamic data sources attached to it). The EXSLT template allows you to pass a whole set of search/replace strings to the template as a variable. The Ninja Technique (based on the “identity templates” idea) is a good starting point for very powerful transformations.
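For readers who want to see the search/replace idea outside XSLT, here is a minimal sketch in Python. It is only loosely analogous to EXSLT str:replace and is not the stylesheet described above. The replacement pairs and the function names are hypothetical, and applying the pairs one after another is a simplification chosen for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical search/replace pairs for common tag-soup problems.
# A real page will need its own list, found by trial and error.
REPLACEMENTS = [
    ("<br>", "<br/>"),        # unclosed void element
    ("<hr>", "<hr/>"),        # unclosed void element
    ("&nbsp;", "&#160;"),     # HTML entity that plain XML does not define
]

def apply_replacements(text, pairs):
    """Apply each (search, replace) pair in order and return the new text."""
    for search, replace in pairs:
        text = text.replace(search, replace)
    return text

def to_xml(text, pairs=REPLACEMENTS):
    """Clean the markup, then parse it to prove it is well-formed XML.

    ET.fromstring raises xml.etree.ElementTree.ParseError if the cleaned
    text is still not well-formed, which shows which pair is still missing.
    """
    return ET.fromstring(apply_replacements(text, pairs))
```

Calling to_xml on a fragment such as "<p>Grade:&nbsp;A<br>Next</p>" returns a parsed element, while the same fragment without the replacements fails to parse.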

Sidenote: I used a two-step process for performance reasons as well. I write the valid XML (that is, the Symphony page output) to disk using a cron job. Then I can load this cache file as a document into my XSLT templates wherever I like. If you need caching (and you definitely will if your source is slow or if you are loading several feeds into one page) and want to store the data, you might also consider writing importer scripts. For me an XML file on disk was OK, because I didn’t want any archive data in the database.
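The write-to-disk half of that setup could look roughly like the following Python sketch. It is not the cron job described above. The function names are hypothetical, and the write-to-a-temporary-file-then-rename step is an assumption added here so that a template reading the cache never sees a half-written file.

```python
import os
import tempfile

def write_cache(xml_text, path):
    """Write xml_text to path by way of a temporary file in the same directory."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(xml_text)
        # os.replace swaps the finished file into place in one step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

def read_cache(path):
    """Return the cached XML text as a string."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

A scheduled job would call write_cache with the freshly generated XML, and anything that needs the data would call read_cache instead of hitting the slow source.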

You could try HTMLTidy.

It’s available as a PHP extension. If you can’t install that, there is a similar pure PHP solution:

htmlLawed

Or you could just send it through the W3C’s online tidy service:

http://www.w3.org/services/tidy?docAddr=http://example.com&forceXML=on
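One thing to watch with a URL like this: if the page you want tidied has its own query string, its address has to be percent-encoded before it goes into docAddr, or its parameters will be mixed up with the service’s own. A minimal Python sketch of building such a URL follows. The endpoint and the docAddr and forceXML parameters are taken from the example above, the function name is hypothetical, and the sketch only builds the string without contacting the service.

```python
from urllib.parse import urlencode

# Endpoint copied from the example URL above.
TIDY_ENDPOINT = "http://www.w3.org/services/tidy"

def tidy_url(page_url, force_xml=True):
    """Build a tidy-service URL with page_url safely percent-encoded."""
    params = {"docAddr": page_url}
    if force_xml:
        params["forceXML"] = "on"
    return TIDY_ENDPOINT + "?" + urlencode(params)

print(tidy_url("http://example.com/grades?student=42&term=2"))
# prints http://www.w3.org/services/tidy?docAddr=http%3A%2F%2Fexample.com%2Fgrades%3Fstudent%3D42%26term%3D2&forceXML=on
```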

Thanks, both of you! Since I’m lazy and this is only a personal project (but really, I’m lazy), I think I’ll go with the W3C’s tidier unless I hit an obstacle.

Cool, let us know how that solution works out (never used it myself).
