Hi!

I have some loosely structured XHTML data that I need to convert to properly structured XML for import into a Symphony section.

Here's the example:

<tbody>
<tr>
    <td class="header"><img src="http://www.abc.com/images/icon_apples.gif"/><img src="http://www.abc.com/images/flag/portugal.gif" alt="Portugal"/> First Grade</td>
</tr>
<tr>
    <td>Green</td>
    <td>Round shaped</td>
    <td>Tasty</td>
</tr>
<tr>
    <td>Red</td>
    <td>Round shaped</td>
    <td>Bitter</td>
</tr>
<tr>
    <td>Pink</td>
    <td>Round shaped</td>
    <td>Tasty</td>
</tr>
<tr>
    <td class="header"><img src="http://www.abc.com/images/icon_strawberries.gif"/><img src="http://www.abc.com/images/flag/usa.gif" alt="USA"/> Fifth Grade</td>
</tr>
<tr>
    <td>Red</td>
    <td>Heart shaped</td>
    <td>Super tasty</td>
</tr>
<tr>
    <td class="header"><img src="http://www.abc.com/images/icon_bananas.gif"/><img src="http://www.abc.com/images/flag/congo.gif" alt="Congo"/> Third Grade</td>
</tr>
<tr>
    <td>Yellow</td>
    <td>Smile shaped</td>
    <td>Fairly tasty</td>
</tr>
<tr>
    <td>Brown</td>
    <td>Smile shaped</td>
    <td>Too sweet</td>
</tr>
</tbody>

I am trying to achieve the following structure:

<data>
    <entry>
        <type>Apples</type>
        <country>Portugal</country>
        <rank>First Grade</rank>
        <color>Green</color>
        <shape>Round shaped</shape>
        <taste>Tasty</taste>
    </entry>
    <entry>
        <type>Apples</type>
        <country>Portugal</country>
        <rank>First Grade</rank>
        <color>Red</color>
        <shape>Round shaped</shape>
        <taste>Bitter</taste>
    </entry>
    <entry>
        <type>Apples</type>
        <country>Portugal</country>
        <rank>First Grade</rank>
        <color>Pink</color>
        <shape>Round shaped</shape>
        <taste>Tasty</taste>
    </entry>
    <entry>
        <type>Strawberries</type>
        <country>USA</country>
        <rank>Fifth Grade</rank>
        <color>Red</color>
        <shape>Heart shaped</shape>
        <taste>Super tasty</taste>
    </entry>
    <entry>
        <type>Bananas</type>
        <country>Congo</country>
        <rank>Third Grade</rank>
        <color>Yellow</color>
        <shape>Smile shaped</shape>
        <taste>Fairly tasty</taste>
    </entry>
    <entry>
        <type>Bananas</type>
        <country>Congo</country>
        <rank>Third Grade</rank>
        <color>Brown</color>
        <shape>Smile shaped</shape>
        <taste>Too sweet</taste>
    </entry>
</data>

First I need to extract the fruit type from tbody/tr/td/img[1]/@src, then the country from the tbody/tr/td/img[2]/@alt attribute, and finally the grade from the text of tbody/tr/td itself.

Next I need to populate all the entries under each category, including those values in each entry (as shown above).

But... as you can see, the data I was given is very loosely structured. A category is simply a td, and all the items in that category come after it. To make things worse, in my datasets the number of items under each category varies between 1 and 100...

I've tried a few approaches but just can't seem to get it. Any help is greatly appreciated. I know that XSLT 2.0 introduces xsl:for-each-group, but I am limited to XSLT 1.0.

One way to achieve this is: Count the elements up to the next header line.

<xsl:template match="//tbody">
    <xsl:for-each select="tr[td/@class='header']">
        <xsl:variable name="elements-number" select="count(following-sibling::*[not(self::tr[td/@class='header'])]) - count(following-sibling::tr/following-sibling::*[not(self::tr[td/@class='header'])])"/>
        <xsl:apply-templates select="following-sibling::*[position() &lt;= $elements-number]" mode="element">
            <xsl:with-param name="rank" select="."/>
        </xsl:apply-templates>
    </xsl:for-each>
</xsl:template>

<xsl:template match="tr" mode="element">
    <xsl:param name="rank"/>
    <entry>
        <color>
            <xsl:value-of select="td[1]"/>
        </color>
        <shape>
            <xsl:value-of select="td[2]"/>
        </shape>
        <taste>
            <xsl:value-of select="td[3]"/>
        </taste>
        <rank>
            <xsl:value-of select="rank"/>
        </rank>
    </entry>
</xsl:template>

I have not included the type or country, but that shouldn't be too hard.

Sorry, it's buggy. It only outputs 3 elements. Wait a minute...

I was counting wrong. Here you are:

<xsl:template match="//tbody">
    <xsl:for-each select="tr[td/@class='header']">
        <xsl:variable name="elements-number" select="
            count(following-sibling::*[not(self::tr[td/@class='header'])]) 
            - count(following-sibling::tr[td/@class='header']/following-sibling::*[not(self::tr[td/@class='header'])])"/>
        <xsl:apply-templates select="following-sibling::*[position() &lt;= $elements-number]" mode="element">
            <xsl:with-param name="rank" select="normalize-space(td/text())"/>
        </xsl:apply-templates>
    </xsl:for-each>
</xsl:template>

<xsl:template match="tr" mode="element">
    <xsl:param name="rank"/>
    <entry>
        <color>
            <xsl:value-of select="td[1]"/>
        </color>
        <shape>
            <xsl:value-of select="td[2]"/>
        </shape>
        <taste>
            <xsl:value-of select="td[3]"/>
        </taste>
        <rank>
            <xsl:value-of select="$rank"/>
        </rank>
    </entry>
</xsl:template>

[EDIT]: Added normalize-space()...

Just a short explanation:

  • For each header row (a tr whose td has the class 'header'), count all following sibling elements that are not header rows
  • Subtract the number of non-header elements that follow the next header row
  • Result: the number of elements up to the next "header". For the Apples header in the example, that is 6 - 3 = 3

I am sure there are simpler ways, but this technique saved my day once. :-)
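One candidate for a simpler way in XSLT 1.0 is to index the data rows with xsl:key. The sketch below is not from the thread. It assumes the input looks like the example (elements in no namespace, each category starting with a tr whose td has class 'header'), and the key name rows-by-header is made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <!-- Index every data row by the generated ID of its nearest preceding header row.
         On the reverse axis preceding-sibling, [1] is the closest preceding match. -->
    <xsl:key name="rows-by-header"
             match="tr[not(td/@class='header')]"
             use="generate-id(preceding-sibling::tr[td/@class='header'][1])"/>

    <xsl:template match="tbody">
        <data>
            <xsl:for-each select="tr[td/@class='header']">
                <!-- Text of the header cell, e.g. "First Grade" -->
                <xsl:variable name="rank" select="normalize-space(td/text())"/>
                <!-- All data rows that belong to this header row -->
                <xsl:for-each select="key('rows-by-header', generate-id())">
                    <entry>
                        <color><xsl:value-of select="td[1]"/></color>
                        <shape><xsl:value-of select="td[2]"/></shape>
                        <taste><xsl:value-of select="td[3]"/></taste>
                        <rank><xsl:value-of select="$rank"/></rank>
                    </entry>
                </xsl:for-each>
            </xsl:for-each>
        </data>
    </xsl:template>
</xsl:stylesheet>
```

If the real XHTML declares the XHTML default namespace, the element names in the patterns would need a namespace prefix bound to it, otherwise nothing matches.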

If you want a more "expert-like" solution, you may use IDs:

<xsl:template match="//tbody">
    <data>
        <xsl:apply-templates select="tr[td/@class='header']" mode="header"/>
    </data>
</xsl:template>

<xsl:template match="tr" mode="header">
    <xsl:variable name="next-header" select="generate-id(following-sibling::tr[td/@class='header'])"/>
    <xsl:apply-templates select="following-sibling::tr[not(generate-id(.) = $next-header or preceding-sibling::tr[generate-id(.) = $next-header])]" mode="element">
        <xsl:with-param name="rank" select="normalize-space(td/text())"/>
    </xsl:apply-templates>
</xsl:template>

<xsl:template match="tr" mode="element">
    <xsl:param name="rank"/>
    <entry>
        <color>
            <xsl:value-of select="td[1]"/>
        </color>
        <shape>
            <xsl:value-of select="td[2]"/>
        </shape>
        <taste>
            <xsl:value-of select="td[3]"/>
        </taste>
        <rank>
            <xsl:value-of select="$rank"/>
        </rank>
    </entry>
</xsl:template>

[EDIT]: Added initial apply-templates.

The above selects only those following rows that are neither the "next header" nor come after the "next header". The result is the same.
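The type and country were left out of the templates above. Here is a rough sketch of how they could be added to the ID-based version. It assumes every icon URL ends in icon_NAME.gif as in the example and that the country is always the alt attribute of the second img. The variable names lower, upper and name and the params type and country are my own additions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <!-- XSLT 1.0 has no upper-case(), so translate() is used to capitalise -->
    <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
    <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

    <xsl:template match="tbody">
        <data>
            <xsl:apply-templates select="tr[td/@class='header']" mode="header"/>
        </data>
    </xsl:template>

    <xsl:template match="tr" mode="header">
        <xsl:variable name="next-header" select="generate-id(following-sibling::tr[td/@class='header'])"/>
        <!-- Assumed URL shape: ".../icon_apples.gif" yields "apples" -->
        <xsl:variable name="name" select="substring-before(substring-after(td/img[1]/@src, 'icon_'), '.gif')"/>
        <xsl:apply-templates select="following-sibling::tr[not(generate-id(.) = $next-header or preceding-sibling::tr[generate-id(.) = $next-header])]" mode="element">
            <!-- Upper-case the first letter: "apples" becomes "Apples" -->
            <xsl:with-param name="type" select="concat(translate(substring($name, 1, 1), $lower, $upper), substring($name, 2))"/>
            <xsl:with-param name="country" select="string(td/img[2]/@alt)"/>
            <xsl:with-param name="rank" select="normalize-space(td/text())"/>
        </xsl:apply-templates>
    </xsl:template>

    <xsl:template match="tr" mode="element">
        <xsl:param name="type"/>
        <xsl:param name="country"/>
        <xsl:param name="rank"/>
        <entry>
            <type><xsl:value-of select="$type"/></type>
            <country><xsl:value-of select="$country"/></country>
            <rank><xsl:value-of select="$rank"/></rank>
            <color><xsl:value-of select="td[1]"/></color>
            <shape><xsl:value-of select="td[2]"/></shape>
            <taste><xsl:value-of select="td[3]"/></taste>
        </entry>
    </xsl:template>
</xsl:stylesheet>
```

The element order inside entry follows the target structure from the question. If an icon file name does not contain icon_ or does not end in .gif, the type comes out empty, so real data may need a more forgiving extraction.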

Thanks for the solution... Saved me a day...

You're welcome, @qnn!

Which solution will you use? (I think that the second one is more elegant.)
