I have an external datasource (dynamic XML) which has HTML in it. Only… the HTML is converted to entities. So instead of clean HTML I’ve got:

&lt;p&gt;Bla bla bla &lt;a href="http://www.some-url.com"&gt;Some URL&lt;/a&gt; Bla bla bla...&lt;/p&gt;

Now, I can use this HTML nicely, by using disable-output-escaping, but what I want is to use XPath on this HTML. So I could select the first paragraph for example, or do some other neat stuff with it.

Any thoughts on how this is possible (if it’s even possible at all)?

This would seem to me to come down to the way the dynamic datasource itself is constructed. Is it a PHP file outputting some XML with some HTML in one of the nodes, is it a cURL call to another server, etc.?

The reason for escaping the HTML as entities, I imagine, would be to ensure that the feed stays valid XML. What I would do is experiment with a raw script that first fetches the data and tidies it up, and then set up the datasource to use this script as its source.

I think fixing this in XSLT will require quite a lot of jumping through hoops. My suggestion would be to create a custom data source which fetches the XML and translates the entities in PHP before returning the resulting XML to the template.
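A minimal sketch of the "translate the entities before returning the XML" step, written in Python purely for illustration (in a Symphony data source the same steps would be done in PHP, for instance with html_entity_decode and DOMDocument). The function name entities_to_nodes and the wrapper element "content" are made up for this example, and it assumes the decoded HTML is well-formed enough to parse as XML; real-world HTML may need tidying first.

```python
import html
import xml.etree.ElementTree as ET

# The entity-escaped string from the first post.
escaped = ('&lt;p&gt;Bla bla bla &lt;a href="http://www.some-url.com"&gt;'
           'Some URL&lt;/a&gt; Bla bla bla...&lt;/p&gt;')


def entities_to_nodes(escaped_html, wrapper="content"):
    """Decode entity-escaped markup and parse it into real element nodes.

    Hypothetical helper: assumes the decoded markup is well-formed XML.
    """
    decoded = html.unescape(escaped_html)
    # Wrap in a single root element so several top-level tags still parse.
    return ET.fromstring("<{0}>{1}</{0}>".format(wrapper, decoded))


root = entities_to_nodes(escaped)

# Once the markup is a tree, path expressions work on it.
first_paragraph = root.find("p")
link = root.find("p/a")
print(link.get("href"), link.text)
```

Once the HTML is real nodes, selecting the first paragraph (or anything else) is an ordinary path query, which is exactly what the XSLT template would then be able to do.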

As far as I know, the only things in XSLT 1.0 that you can convert to nodes, other than actual nodes, are Result Tree Fragments. But this requires EXSLT, and even then RTFs are more like nodes than strings, so that doesn’t help with an escaped string.

The only solution in XSLT that comes to mind right now is creating a state machine, reading the string character by character and creating nodes on the fly… But only an insane person would do that!

@kanduvisla - Couldn’t you technically create a new page (an XML page), add that datasource to it, clean up the feed on that page, and then use that newly created page as a dynamic datasource? That way you won’t have the entities.

Would you mind posting an example of the source code you are referring to?

The dynamic datasource is the RSS-feed of my Posterous blog: http://giel.posterous.com/rss.xml. You see the node rss/channel/item/description, which is HTML inside CDATA. The external datasource ‘entifies’ this HTML inside the CDATA.

I think modifying the PHP-file that reads the external datasource is my best shot at converting the entities back to their original tags.
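For what it’s worth, here is a rough sketch of what that conversion could look like, in Python rather than PHP just to show the steps. The sample feed is invented (it only mimics the rss/channel/item/description shape described above), expand_descriptions is a hypothetical name, and it assumes the HTML inside the CDATA is well-formed; descriptions that fail to parse are simply left as text.

```python
import xml.etree.ElementTree as ET

# Invented sample shaped like the feed described above.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <item>
      <title>Example post</title>
      <description><![CDATA[<p>First paragraph</p><p>Second paragraph</p>]]></description>
    </item>
  </channel>
</rss>"""


def expand_descriptions(feed_xml):
    """Replace the text of each item description with parsed child nodes."""
    root = ET.fromstring(feed_xml)
    for description in root.findall("./channel/item/description"):
        markup = description.text or ""
        try:
            # Temporary wrapper so multiple top-level tags parse as one tree.
            fragment = ET.fromstring("<body>" + markup + "</body>")
        except ET.ParseError:
            continue  # not well-formed: leave this description as plain text
        description.text = fragment.text
        description.extend(list(fragment))
    return root


feed = expand_descriptions(SAMPLE_FEED)
first = feed.find("channel/item/description/p")
print(first.text)
print(ET.tostring(feed, encoding="unicode"))
```

The XML that comes out has real p elements under description, so the datasource would hand the template nodes instead of an escaped string.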
