String manipulation with multiple regex
This is an open discussion with no replies, filed under General.
This isn't a Symphony-specific problem, but you might be able to help me anyway.
What I'd like to do is add markup to a string using regular expressions. The regex patterns are stored in a separate file, and the same string needs to be run through all of them. Let me give you an example:
patterns.xml:

<patterns>
  <pattern>
    <regex>\d+\smetr(es|e)</regex>
  </pattern>
  <pattern>
    <regex>\d+\s(km/h|mph)</regex>
  </pattern>
  <pattern>
    <regex>\d+\s*secon(ds|d)</regex>
  </pattern>
</patterns>

data.xml:

<section>
  <para>This is a paragraph with <em>three</em> matches: 24 metres, 120 km/h and 60 seconds.</para>
</section>

desired result:

<section>
  <para>This is a paragraph with <em>three</em> matches: <match>24 metres</match>, <match>120 km/h</match> and <match>60 seconds</match>.</para>
</section>
The XSLT I've used for this looks like this: <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
The transformation starts at the matching template of the text() node.
The problem with this code is that only the last match gets marked up (the \d+\s*secon(ds|d) pattern, e.g. 12 seconds). This is because xsl:analyze-string can only operate on a string, not a node-set, so even if the input to the second iteration correctly contains a <match/> element, that element is stripped. If I use "pseudo markup" like #match# ... #/match# the script works correctly, since we're then only dealing with a string.
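One way to keep the per-pattern iteration without losing markup is to never feed already-tagged output back into xsl:analyze-string: each pattern is applied only to the non-matching substrings left over by the previous one. The following is a minimal sketch under some assumptions, not a tested drop-in. It assumes patterns.xml sits next to the stylesheet, and the template name mark-up, its parameters, and the variable all-regexes are hypothetical names chosen for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: patterns.xml lives next to this stylesheet. -->
  <xsl:variable name="all-regexes"
      select="document('patterns.xml')/patterns/pattern/regex/string()"/>

  <!-- Identity template: copy elements, attributes, etc. unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Explicit priority, because node() and text() share the same
       default priority and would otherwise conflict. -->
  <xsl:template match="text()" priority="1">
    <xsl:call-template name="mark-up">
      <xsl:with-param name="text" select="."/>
      <xsl:with-param name="regexes" select="$all-regexes"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical helper: apply the first regex, then recurse with the
       remaining regexes on the non-matching substrings only. -->
  <xsl:template name="mark-up">
    <xsl:param name="text"/>
    <xsl:param name="regexes"/>
    <xsl:choose>
      <xsl:when test="empty($regexes)">
        <!-- No patterns left: emit the text as is. -->
        <xsl:value-of select="$text"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:analyze-string select="$text" regex="{$regexes[1]}">
          <xsl:matching-substring>
            <match><xsl:value-of select="."/></match>
          </xsl:matching-substring>
          <xsl:non-matching-substring>
            <xsl:call-template name="mark-up">
              <xsl:with-param name="text" select="."/>
              <xsl:with-param name="regexes"
                  select="$regexes[position() gt 1]"/>
            </xsl:call-template>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

A consequence of this design is that earlier patterns take precedence: text already wrapped in <match> by one pattern is never offered to the later ones, so the order of the entries in patterns.xml matters when patterns can overlap.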
This is based on code from David Carlisle at http://www.dpawson.co.uk/xsl/sect2/replace.html#d9701e322.
Maybe I'm thinking all wrong about this, so please help me back on the right track! :)
(My formatting isn't the best (the regex in patterns.xml is kinda messed up) but I think you get the picture.)
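If iterating pattern by pattern is not a requirement, one option is to join all the regexes into a single alternation, so that each text node is analysed exactly once and there is no intermediate markup to lose. This is only a sketch under assumptions: patterns.xml is assumed to sit next to the stylesheet and to contain at least one pattern, no pattern may match an empty string (XSLT 2.0 raises an error when the regex of xsl:analyze-string matches a zero-length string), and the variable names patterns and combined are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: patterns.xml lives next to this stylesheet. -->
  <xsl:variable name="patterns"
      select="document('patterns.xml')/patterns/pattern/regex"/>

  <!-- Build one regex of the form (p1)|(p2)|(p3). -->
  <xsl:variable name="combined"
      select="string-join(
                for $r in $patterns return concat('(', $r, ')'),
                '|')"/>

  <!-- Identity template: copy elements, attributes, etc. unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Explicit priority, because node() and text() share the same
       default priority and would otherwise conflict. -->
  <xsl:template match="text()" priority="1">
    <xsl:analyze-string select="." regex="{$combined}">
      <xsl:matching-substring>
        <match><xsl:value-of select="."/></match>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Because the templates run on each text() node separately, existing inline elements such as <em> are copied through by the identity template and only the surrounding text is searched for matches.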