“XML Importer” – Forum Thread – Discuss

- bzerangue
- 25 Sep 10, 7:41 am
- Comment #201

@toman - Please forgive me if you’ve already answered this question (and I missed it), but have you tried the XML Importer? If so, would you mind making a screenshot of your setup and posting it here?

Also, have you looked over some of the middle posts in this same thread? If not, some of those might help you.

Regarding the XML: I find it helpful to run another transform on the XML you want to clean up. That lets you import exactly what you want.

Also, if you would like it to import automatically on a daily basis, or on whatever schedule you want, you can set up a cron job. Here’s how you do that with the XML Importer extension.

NOTE: I watched this helpful video on how Drupal uses Cron. The first half of the video shows you how to set up Cron on your Mac or Linux machine (Windows developers, here’s a helpful tutorial on how to set up wget for Windows). You can ignore the Drupal-specific material in the second half.

Setting up Cron to run XML Importer

Step 1) Install the following extension, XML Importer

Step 2) Set up your XML import through XML Importer.

Step 3) Edit your crontab by opening your Terminal and running crontab -e

NOTES ABOUT THE EXAMPLES: The authentication token (auth-token) is 123e45f6 and the XML Importer name is testimport. You should replace these values with your values.

10 * * * * php /home/username/Sites/test.com/extensions/shell/bin/symphony -t 123e45f6 cron run-tasks

Step 4) In the command line field in Cron, use the following if curl is installed:

curl --silent --compressed http://test.com/symphony/extension/xmlimporter/importers/run/testimport/?auth-token=123e45f6

OR if wget is installed

wget -O - -q -t 1 http://test.com/symphony/extension/xmlimporter/importers/run/testimport/?auth-token=123e45f6

Then wait and see if it works. Or you can force it by going to your terminal and typing in your command from the crontab:

php /home/username/Sites/test.com/extensions/shell/bin/symphony -t 123e45f6 cron run-tasks
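For anyone who would rather keep the curl call in a script than inline in the crontab, here is a minimal sketch. The function names importer_url and run_import and the RUN_IMPORT switch are made up for illustration; the host, importer name, and token are the placeholder values from the steps above. Note also that a schedule of 10 * * * * fires at ten past every hour; a once-a-day run at, say, 03:10 would be 10 3 * * *.

```shell
#!/bin/sh
# Build the XML Importer run URL from host, importer name, and auth token.
# (Hypothetical helper; the URL shape is copied from the curl example above.)
importer_url() {
    printf 'http://%s/symphony/extension/xmlimporter/importers/run/%s/?auth-token=%s\n' \
        "$1" "$2" "$3"
}

# Request the URL; --fail makes curl return non-zero on HTTP errors,
# so a failed run leaves a message for cron to mail or log.
run_import() {
    url=$(importer_url "$1" "$2" "$3")
    if ! curl --silent --fail --compressed "$url" > /dev/null; then
        echo "import failed: $2" >&2
        return 1
    fi
}

# Only hit the network when explicitly asked to (RUN_IMPORT=1).
if [ "${RUN_IMPORT:-0}" = "1" ]; then
    run_import "test.com" "testimport" "123e45f6"
fi
```

A crontab line could then look like 10 3 * * * RUN_IMPORT=1 sh /path/to/run-import.sh, where the script path is a placeholder.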

ALSO, since I’m no expert with Cron, anyone, please add detail to this thread so it can be more helpful to others. Thanks for all of y’all’s help.

Special thanks to the brains behind these extensions… @pointybeard (alistair), @buzzomatic (rowan), @nickdunn, @czheng, @allen, @nils, and everyone else in the community who develop all of the available extensions. Y’all truly are awesome! I hope one day to be able to develop a helpful extension. Thanks again!

- toman
- 25 Sep 10, 2:03 pm
- Comment #202

@bzerangue: I have tried the XML Importer. I followed the git patching gymnastics that you detailed in an earlier post, made an importer for a relatively simple data feed, and it worked first time out of the gate. Very nice, and thank you for stepping through that process.

What I am saying, though, is that I can see situations, such as the file I posted, where a simple XPath won’t be enough to massage the data into the format needed by the section being filled. Currently the solution for that is to write a PHP function to transform the data. That’s fine if you know PHP, but many who use Symphony are not programmers, or at least don’t know that they are. :-)

So, I’m suggesting that an optional stylesheet that the data gets piped through first would be both a more powerful and a more friendly approach. Yes, you could run the data through a stylesheet by hand if you are just doing a one-off import, or set up some automated transformation outside of Symphony, but that’s sort of kludgy. A simple XSLT file that, if it exists, gets run over the data first as the import is occurring is the most elegant solution to a number of problems such as the DocBook import. As I said, it’s just a suggestion.

- brendo
- 25 Sep 10, 3:29 pm
- Comment #203

So something like a select box that would allow you to choose an existing XSLT file from within the current workspace?

I think that’d work, but like you said, you still don’t get the option to preview what’s happening.

The best way (currently) would be to do as designermonkey said in this comment.

- toman
- 25 Sep 10, 4:17 pm
- Comment #204

@brendo: Yes, that would be one way to approach it. I was thinking a simple text input, but a select box is probably the better idea. I don’t think I said anything about previewing, but you’re right about that. I see what designermonkey is saying. It’s clever, but don’t you think it has an air of working in spite of Symphony, rather than because of Symphony? Maybe that’s a stylistic quibble, or I don’t get the Symphony aesthetic yet.

- brendo
- 25 Sep 10, 5:58 pm
- Comment #205

Symphony is great at transforming XML with XSLT already.

The XML Importer is good at what it does: take XML, apply XPath (not XSLT), and import the values.

I feel using Symphony to transform the data into more readable XML is the best path. It’s a familiar workflow (just like creating a normal page), you get the benefit of being able to debug the page and it saves adding bloat to the XML Importer.

If the XML Importer were updated to include the ability to pick an XSLT utility to apply, it could be done fine, but realistically you probably aren’t going to write that utility without testing it first against the data, so I begin to question its use. I can see many people creating a page using the Dynamic DS to get the utility working 100%, and then just removing that page (or leaving it).

So I wonder what’s the point when they can just make the Importer use that page they’ve already created.

And because of that, I feel its use is limited.

- designermonkey
- 25 Sep 10, 7:10 pm
- Comment #206

Just to add to my previous…

I always hide the URL behind a hash of some kind, as it is only ever used in the system itself. That makes it almost impossible to stumble across the page from the frontend.
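One way to come up with such a handle, as a sketch: hash a private phrase and use part of the digest as the page handle. page_handle is a hypothetical helper and the phrase is invented; sha1sum is assumed to be available (GNU coreutils; macOS ships shasum instead). This only makes the page hard to guess. It is not access control.

```shell
#!/bin/sh
# Hypothetical helper: turn a private phrase into a 16-character hex handle
# by taking the first 16 characters of its SHA-1 digest.
page_handle() {
    printf '%s' "$1" | sha1sum | cut -c1-16
}

# Prints a 16-character hex string to use as the page's URL handle.
page_handle "xml-import-cleanup-feed"
```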

It honestly is the best approach imho. Many XML feeds are a little verbose for my liking, so I cut the cr4p out of them, merge nodes, etc etc until it is nice and easy.

It’s clever, but don’t you think it has an air of working in spite of Symphony, rather than because of Symphony?

Thank you! Plus I think it works because of Symphony. Symphony is there to transform XML, whether it be output from the DB, or from a DDS. The extension is just that, an extension to the core to do a function, which it does beautifully (after a lot of learning, as you’ve probably read lol).

I do like your idea though, although I’d be more in favour of XSL being attached to DDSs rather than this. We wouldn’t want to overcomplicate this process.

- Nils
- 25 Sep 10, 7:18 pm
- Comment #207

So I wonder what’s the point when they can just make the Importer use that page they’ve already created.

The point is that you’d expect the XML Importer to handle this.

It works well the way you or John explained it (and this is what most of us do when using the importer) but it feels like a workaround. If you have PHP knowledge it is easy to write a few helper classes, but as Symphony is an XML/XSLT-based system, it would be great if the importer could just make use of an XSL template. It would streamline the importing process in my opinion.

- designermonkey
- 25 Sep 10, 7:25 pm
- Comment #208

Where would that template be stored?
How would you debug and test the output?

It’s a lot more work than just adding the ability to use XSLT. I have never yet written an entire XSLT page that works first time, and as discussed earlier in this thread, I’ve also really struggled to get the XPath just right. Debugging code would have to be added to the Importer etc.

It may look like an extra step, but honestly, the time it saves in the long run is greater.

Proper Preparation Prevents P*ss Poor Performance.

- nickdunn
- 25 Sep 10, 8:43 pm
- Comment #209

If you have PHP knowledge it is easy to write a few helper classes, but as Symphony is an XML/XSLT-based system, it would be great if the importer could just make use of an XSL template

The PHP is there to perform functions that are complex or impossible with XSLT, such as advanced date parsing and the like, so there would always be a need for PHP post-processing of the selected values.

But I like the idea of a pre-processing XSL transformation that manipulates the XML into something more easily parseable. Thinking about this some more, there’s no reason why the XSLT couldn’t replace the XML/Field pairing that you have to do through the XML Importer interface. Say you had the fields in your section:

Title
Description
Date

Instead of writing an XPath expression against the incoming XML and pairing it with the section field, wouldn’t it make more sense to have XSLT to output in some generic format such as:

<title unique="yes">My value here</title>
<description>My value here</description>
<date post-process="my_php_function">My new value here</date>

So you’d do almost all of the processing with XSLT to build a new XML document which is then parsed directly by XML Importer: the elements match field names and the values are inserted.

It’s potentially more work for the developer since they have to write XSLT instead of matching things up in the UI, but it might feel less clunky than the current workflow.

It would also make it easier to have a “dry run” to apply the XSLT without actioning the import itself.
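Until the Importer has something like this built in, a dry run can be approximated outside Symphony with xsltproc, the command-line tool that ships with libxslt. The sketch below assumes xsltproc is installed. The feed, the stylesheet, the entries and entry wrapper elements, and the dry_run function are all invented for illustration; the field elements follow the generic format above.

```shell
#!/bin/sh
# Apply a stylesheet to a feed and print the result; nothing is imported.
dry_run() {
    xsltproc "$1" "$2"
}

workdir=$(mktemp -d)

# A tiny stand-in for the incoming feed (invented for this sketch).
cat > "$workdir/feed.xml" <<'EOF'
<rss><channel>
  <item><title>Hello</title><pubDate>2010-09-25</pubDate></item>
</channel></rss>
EOF

# Stylesheet that reshapes each item into field-named elements.
cat > "$workdir/clean.xsl" <<'EOF'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
  <xsl:template match="/">
    <entries>
      <xsl:for-each select="//item">
        <entry>
          <title unique="yes"><xsl:value-of select="title"/></title>
          <date post-process="my_php_function"><xsl:value-of select="pubDate"/></date>
        </entry>
      </xsl:for-each>
    </entries>
  </xsl:template>
</xsl:stylesheet>
EOF

# Print the transformed XML if xsltproc is available.
if command -v xsltproc >/dev/null 2>&1; then
    dry_run "$workdir/clean.xsl" "$workdir/feed.xml"
else
    echo "xsltproc not found; skipping dry run" >&2
fi
```

Reading the printed XML before pointing the Importer at it gives roughly the preview step discussed earlier in the thread.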

- Nils
- 25 Sep 10, 9:59 pm
- Comment #210

I think it would be possible to combine backend interface and XSL templates. The Importer could just create an XML file that stores its settings like so:

<importer>
    <utilities>
        <import select="{$workspace}/utilities/datetime.xsl" />
        <import select="{$workspace}/utilities/something.xsl" />
    </utilities>
    <basenode select="/a/deep[x = 'path']/link" />
    <fields>
        <title unique="yes">
            <match select="/my/xml/title" />
            <template>
                <xsl:value-of select="substring-after(., ' ')" />       
            </template>
        </title>
        <description>
            <match select="/my/xml/description" />
            <template>
                <xsl:value-of select="." />
            </template>
        </description>
        <date post-process="my_php_function">
            <match select="/my/xml/dates" />
            <template>
                <xsl:for-each select="item">
                    <xsl:if test="position() != 1">, </xsl:if>
                    <xsl:value-of select="." />
                </xsl:for-each>
            </template>
        </date>
    </fields>
</importer>

This file could either be edited in the backend or on the server directly. While importing entry data, the Importer should first apply the stored template to the matched node and run the attached php function afterwards.

To check if everything is working, the Importer could offer a test mode which simulates the import with the first node it should import and either throws errors or shows the resulting values.

- toman
- 26 Sep 10, 2:22 pm
- Comment #211

@nickdunn: You could do that, though it’s not backwards compatible with existing XML Importer use. That’s why I was suggesting pre-processing with an optional stylesheet, but leaving the XPath and PHP bits in.

The elements would be defined by the field names in the XSLT-only model, yes? What if your field data should end up with an XML fragment in it whose elements are named the same as the field? A field named bob containing “<bob>Hi! I’m Bob!</bob>”, for instance? You run into the “escaping the delimiter” problem. That example might be a little far-fetched, but HTML5 “header” or “footer” tags could conceivably show up in fields named header or footer.

What you said about the intended use of the PHP function was interesting. I looked back at the server requirements for Symphony and noticed EXSLT support wasn’t required. I don’t have a lot of experience with various hosting environments, so is it often the case that XSLT support is baked in but EXSLT isn’t? EXSLT seems like the way to go for those things that are difficult or impossible in pure XSLT.

- bzerangue
- 27 Sep 10, 12:16 am
- Comment #212

EXSLT is usually supported if libxslt is supported.

- nickdunn
- 27 Sep 10, 2:21 am
- Comment #213

I was thinking something similar to Nils, but much simpler. Instead of storing the configuration in XML, I’m suggesting almost zero configuration and allowing XSLT to do the work.

My idea: say you want to import an RSS feed. You’d write some XSLT to transform the RSS XML into something that looks like this:

<section handle="articles" update="no">
    <entry unique="yes">
        <title>My new article</title>
        <date php="convert_to_gmt">2010-09-25</date>
        <description>Lorem ipsum dolor sit amet</description>
    </entry>
    ...
</section>

So your XSLT might look like:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output encoding="UTF-8" indent="yes" method="xml" />

<xsl:template match="/">

    <section handle="articles">
        <xsl:for-each select="//item">
            <entry>
                <title>
                    <xsl:value-of select="title"/>
                </title>
                <date php="convert_to_gmt">
                    <xsl:value-of select="pubDate"/>
                </date>
                <description>
                    <xsl:copy-of select="description/*"/>
                </description>
            </entry>
        </xsl:for-each>
    </section>

</xsl:template>

</xsl:stylesheet>

That’s all you’d need to write. The XML Importer would look for a standard XML schema that it understands (the <section>, <entry> and <field_name> elements). Just like a Symphony page, the source XML is presented like a Dynamic XML Data Source attached to a frontend page (so you get environment variables too) and you write the XSLT that you’re used to.

You can include any additional templates here or use EXSLT as you wish. EXSLT isn’t completely implemented in libxslt (for example, you can’t use regular expressions), so PHP post-processing of field values remains necessary. I see the PHP processing as a way to achieve things that are simply too complex to do neatly in XSLT, such as:

running content through text formatters such as Markdown
converting dates between complex formats
comparing version numbers and the like

Although, by registering PHP functions these could be referenced directly in the XSLT.

The idea behind this is to keep the same mental model and approach as writing standard XSLT for your normal pages. Kind of like building Data Source XML the wrong way round.

- Nils
- 27 Sep 10, 5:22 am
- Comment #214

I like your idea, Nick. The only thing that needs to be addressed is the variety of the section and field schemas. It might get quite complicated for the user to remember and reconstruct the structure of the import section. Maybe the Importer could generate an XML template or validate a given file against the current section schema?

- Ecko
- 28 Sep 10, 9:00 am
- Comment #215

I must be blind. Where can I grab the latest copy? Nils’ and the one on page one don’t work in Symphony 2.1.

- Nils
- 28 Sep 10, 3:18 pm
- Comment #216

I’m using it on 2.1.1 without problems. What’s not working?

- Ecko
- 29 Sep 10, 8:48 am
- Comment #217

I’m using it on 2.1.1 without problems. What’s not working?

After I save a set of fields, they seem to disappear when I try to run the script.

Also, on your version, there are no fields that display. I’m guessing that the javascript is conflicting with something.

- designermonkey
- 30 Sep 10, 5:44 am
- Comment #218

You need the unstable branch of the git repo… Don’t know why it’s still not the master as the master doesn’t work and is ‘unstable’.

- designermonkey
- 26 Oct 10, 10:55 pm
- Comment #219

I’m having difficulty with imports past commit date 2010-06-03 in the unstable branch on the repo.

If I use the importer that I have on my hard drive, whose last commit is the ‘dat-flip’ commit, it works perfectly. If I use the latest version, it fails every time. It only partially imports entries and misses nodes completely…

I can’t log an issue on the repo, as the option is not there, and I’m reluctant to do it on this issue tracker as my first issue is still open and unread!

Can someone look into this? I’ll try and find the problem too, but I’m not great with PHP.

- jonasd
- 03 Nov 10, 5:24 am
- Comment #220

Here’s a goofy XML Importer trick, which may be useful for others (or if someone has a better approach, I’m all for it.) I’m using multiple sources to import entries into the same section, and I needed a way for each entry to indicate its origin. I created a Text Input field called Feed Source and set the XML Importer’s XPath expression to something that would always be valid, e.g.:

title/text()

Then I set the PHP function as follows:

return "feedname";

This only seems to work if the XPath expression returns real content, so you have to pick a field that will always be non-null.
