Yes, it'd be interesting to know which values were omitted, and whether it's consistently the same field that isn't accepting them.

I am inexperienced with this extension. Can anybody tell me if it is capable of importing larger datasets, i.e. approximately 10,000 entries with 900,000 lines of XML?

I have done that, but not on a production site. It's how I started with Symphony.

I had the data imported on a cron task from an XML dump from another system, then tidied using pages and templates into a more semantic structure for use as an XML feed to another web system.

It never went into production (as I was made redundant) but it worked.

Thanks for the information, @designermonkey!

The cron task was to be set early morning as the process used a lot of memory and took a while, so it was expensive, but did work.

I'll bet the XML Importer works better now than it did back then, so I'm not sure how expensive it is now.

Michael, do you have control over the import source? XML Importer should be able to import large datasets, but it's a nightmare to debug sources that are invalid or not well-formed.

@designermonkey: Fortunately, I will have to import the stuff only once. And I have lots of RAM, at least on the "final" live server (32 GB). The testing servers or my local machine have less RAM, but I will see how it goes.


The import source will probably have to be fixed using a two-step process:

  • Tidy (which will make it valid XHTML)
  • XSLT transformation (which will output only whitelisted tags and attributes)
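A minimal sketch of the whitelisting step, written in Python with the standard library as a stand-in for the XSLT transformation, and assuming Tidy has already produced well-formed XHTML. The `ALLOWED` table and the `whitelist` function are hypothetical names for illustration, not part of Symphony or the XML Importer. Tags that are not whitelisted are unwrapped (their text is kept), and attributes that are not whitelisted are dropped.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist: tag name -> attributes allowed on that tag.
ALLOWED = {
    "p": set(),
    "a": {"href"},
    "em": set(),
    "strong": set(),
    "ul": set(),
    "li": set(),
}


def _append_text(parent, text):
    """Append text to parent, after its last child if it has one."""
    if not text:
        return
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def _copy_children(src, dst):
    """Copy the content of src into dst, keeping only whitelisted markup."""
    _append_text(dst, src.text)
    for child in src:
        if child.tag in ALLOWED:
            attrs = {k: v for k, v in child.attrib.items() if k in ALLOWED[child.tag]}
            kept = ET.SubElement(dst, child.tag, attrs)
            _copy_children(child, kept)
        else:
            # Unwrap: drop the tag itself but keep whatever it contains.
            _copy_children(child, dst)
        _append_text(dst, child.tail)


def whitelist(xhtml_fragment):
    """Return the fragment with only whitelisted tags and attributes.

    The root element is treated as a plain wrapper: its tag is kept,
    its attributes are dropped.
    """
    src = ET.fromstring(xhtml_fragment)
    dst = ET.Element(src.tag)
    _copy_children(src, dst)
    return ET.tostring(dst, encoding="unicode")
```

For example, `whitelist('<div><p class="MsoNormal">Hi <font color="red">there</font></p></div>')` strips the `class` attribute and the `font` wrapper while keeping the text.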

I will have to write a script to do this. At least the first part will have to be done in a copy of the source database, by "walking" through the table. Once the content is valid XHTML, I can try and build an XML dump (using another script, probably…) which can be transformed by XSLT. (I may then have to split it into chunks to be able to edit it in a text editor for some final improvement.)
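The chunking idea could look roughly like this in Python. It is only a sketch, assuming the dump is a single root element whose direct children are the entries; `split_entries` is a hypothetical helper name. Each chunk is a complete XML document with the same root tag, so it can be opened and edited on its own.

```python
import xml.etree.ElementTree as ET


def split_entries(xml_text, chunk_size):
    """Split a dump into well-formed chunks of at most chunk_size entries."""
    root = ET.fromstring(xml_text)
    children = list(root)
    chunks = []
    for start in range(0, len(children), chunk_size):
        # Each chunk reuses the original root tag and attributes.
        part = ET.Element(root.tag, root.attrib)
        part.extend(children[start:start + chunk_size])
        chunks.append(ET.tostring(part, encoding="unicode"))
    return chunks
```

This loads the whole dump into memory at once, which is probably fine for a one-off import on a machine with plenty of RAM; a streaming parser would be the alternative if it is not.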

(There will be even more, e.g. crawling images and other assets and manipulating the links in the source. Or building a workflow to quickly check and flag the resulting entries.)

This is a big task, and it would be very expensive for the client. I will tell him: Blame it on the WYSIWYG editor (and Microsoft Word).

I won't say it can't be done, but I'll say it might be unpleasant. I've been using the XML Importer to bring in 12.5 MB WordPress exports, and it required a lot of patience and some creative thinking to achieve.

I ended up consuming the raw XML file with a Remote DS and then massaging some of the XML into a more concise set for the XML Importer to consume.
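To illustrate the kind of "massaging" meant here, this is a rough Python sketch (in Symphony itself this would be done with a Data Source and XSLT). It assumes an RSS-style export whose `item` elements carry `title`, `link` and `pubDate` children; the `KEEP` tuple and the `condense` function are hypothetical, and the output element names `entries` and `entry` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical choice of the only fields worth passing on to the importer.
KEEP = ("title", "link", "pubDate")


def condense(rss_text):
    """Reduce an RSS-style export to a small, flat <entries> document."""
    out = ET.Element("entries")
    for item in ET.fromstring(rss_text).iter("item"):
        entry = ET.SubElement(out, "entry")
        for name in KEEP:
            value = item.findtext(name)
            if value is not None:
                ET.SubElement(entry, name).text = value.strip()
    return ET.tostring(out, encoding="unicode")
```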

Thanks, @brendo. As I said, I am sure that in my case cleaning up will be a big task, which will have to be "moderated" somehow… (I already looked at the data, and the HTML code is simply terrible. There are even cases where Tidy messes things up because it misinterprets the rotten code.)

I'll say it might be unpleasant

This is why it would be very expensive for the client (if he insists on importing the archived data).

The final import process is the smallest part of this project. I could as well use a bare PHP script posting to a Symphony page which has a native event attached (I did this successfully in the past), but I thought I might try the XML Importer this time. :-)
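For comparison, the "bare script posting to a page" approach might be sketched like this in Python instead of PHP. It assumes the event reads its values from `fields[...]` form keys and is triggered by an `action[...]` key named after the event; that is my understanding of how Symphony events are usually wired, so check it against the documentation of your own event. `build_event_request`, the page URL and the event handle are all hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_event_request(page_url, event_handle, fields):
    """Build a POST request that submits one entry to a page with an event.

    Assumption: the event expects fields[<name>] keys plus an
    action[<event_handle>] key, as in a typical Symphony event form.
    """
    form = {f"fields[{name}]": value for name, value in fields.items()}
    form[f"action[{event_handle}]"] = "Submit"
    # Supplying data makes urllib send this as a POST request.
    return Request(page_url, data=urlencode(form).encode("utf-8"))
```

Passing the returned request to `urllib.request.urlopen` would submit it; looping over the cleaned entries and sending one request each is the whole import.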

I'm currently using this extension to import a list of properties using a unique ID, which is working really well! However, when the unique ID is no longer in the original XML file, the entry is still visible on my Symphony site.

Is there a way to delete entries when their ID is no longer present in the XML file?

I found a couple of threads on the subject, but I'm afraid I couldn't get anything to work. I'm pretty new to extensions and would appreciate any help to guide me in the right direction. Thanks :)

You could use a PHP function in the XMLHelpers file to check existence of ID and maybe use a checkbox to publish and unpublish based on the ID being present or not in the system?

Not sure deleting is possible from the XML Importer code. Plus you then have an archive of properties in the system... What if the entry is accidentally omitted from the feed and the XML Importer deletes something it shouldn't? Using a checkbox condition may help with this: you can then filter on that checkbox for frontend output.
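The logic behind the checkbox idea is a simple set comparison. Here is a sketch in Python rather than PHP, assuming the feed marks each property with an `id` attribute on a `property` element; both names, and the functions `feed_ids` and `publish_flags`, are hypothetical and would need adapting to the real feed and to however the checkbox value gets written back.

```python
import xml.etree.ElementTree as ET


def feed_ids(xml_text, item_tag="property", id_attr="id"):
    """Collect the unique IDs currently present in the feed."""
    root = ET.fromstring(xml_text)
    return {el.get(id_attr) for el in root.iter(item_tag) if el.get(id_attr)}


def publish_flags(stored_ids, xml_text):
    """Map every stored entry ID to 'yes' (still in the feed) or 'no'."""
    present = feed_ids(xml_text)
    return {entry_id: ("yes" if entry_id in present else "no") for entry_id in stored_ids}
```

Nothing is deleted: an entry that drops out of the feed by accident is only unpublished and flips back to "yes" on the next import that contains it.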

Thanks for the super quick reply. The checkbox idea sounds good. I'm completely new to Symphony extensions and the PHP code behind them. Do you know of any resources or code samples I could take a look at, so I can work out how to check the existence of the ID and change the value of the checkbox accordingly? Thanks again.

Will see what I can uncover, I'm sure I've come across something another forum member used a while back. Fingers crossed I can locate it again.

Here's one post which touches on the topic:

I've never done this myself, but I suggested a couple of possible solutions here:

Thanks for this, guys, your help is much appreciated. I've taken a look at those links but I'm not getting far! Saving a date and only showing the last 24 hours sounds like it could work, but then I have the same problem as on this comment

I think this comment about using a PHP function to check IDs would be more suited to what I need, but I lack the PHP/Symphony knowledge to do this at the moment. It's something I need to learn, so no time like the present!
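For reference, the date-based idea mentioned above boils down to the following filter, sketched in Python. It assumes every import run stamps the entries it touches with the current time; `visible_entries` and the `last_seen_by_id` mapping are hypothetical names. In Symphony the same effect would come from filtering a Data Source on a date field.

```python
from datetime import datetime, timedelta


def visible_entries(last_seen_by_id, now, max_age=timedelta(hours=24)):
    """Return the IDs whose last import timestamp is within max_age of now."""
    return sorted(
        entry_id
        for entry_id, seen in last_seen_by_id.items()
        if now - seen <= max_age
    )
```

The weak point is that everything disappears from the frontend if the import itself fails to run for longer than `max_age`.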

Can you only run an XML import as a developer?

I would like to provide an update link for the user through a PHP script that uses cURL to cloak the auth-token and then redirects to the corresponding page in the frontend.

But I don't want that user to be logged in as developer.

EDIT: My bad, the URL with the auth-token logs out automatically. Solved!
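The cloaking idea, sketched in Python instead of PHP with cURL: the server-side script calls the import URL with the token appended, so the token never reaches the user's browser, and then hands back the frontend address to redirect to. The run URL, the redirect target and the function names are hypothetical; only the `auth-token` query parameter comes from the discussion above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_import_url(run_url, token):
    """Append the auth-token as a query parameter to the importer's run URL."""
    separator = "&" if "?" in run_url else "?"
    return run_url + separator + urlencode({"auth-token": token})


def trigger_import(run_url, token, redirect_to, fetch=urlopen):
    """Run the import server-side and return where to send the user next.

    fetch is injectable so the function can be tried without a live site.
    Returns None if the import request did not answer with status 200.
    """
    with fetch(build_import_url(run_url, token)) as response:
        succeeded = response.status == 200
    return redirect_to if succeeded else None
```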

Can anyone clarify what the datasources branch on the XML Importer repo is all about? It seems to have the most recent commits but it sounds like a radical shift in fetching XML and apparently not external feeds anymore?:

If you'd like to import external data, please have a look at Remote Data Source which accepts XML, JSON or CSV sources.

From that I take it that if you're currently pulling in from remote sources and want to grab and store a copy within Symphony that you need to stick with the current master/integration branch?

Incidentally it looks like @designermonkey's added a couple of useful fixes to the integration branch that could be worth pulling into the master? Are those part of providing Symphony 2.4 compatibility?

The Data Source branch is a radical shift indeed but it does still allow the usage of external feeds (anything else wouldn't make much sense).

The branch is the result of a longer discussion I had with Brendan on how we could improve the feedback in the interface if an error occurs during import. We noticed that the extension was duplicating a concept we already have in Symphony: data sources for fetching and providing data. So the idea behind this branch is to rely on Symphony's core functions to get the import data.

From that I take it that if you're currently pulling in from remote sources and want to grab and store a copy within Symphony that you need to stick with the current master/integration branch?

No, that's not the case. If you don't have it already installed, please take a look at Remote Data Source. It's the most advanced way to fetch external data in Symphony: it understands XML, JSON and CSV sources.

In your Data Source editor, set up a Remote Data Source that fetches your external data, e.g. an RSS feed (XML) or your latest post from Facebook (JSON via the Graph API). Then go to the XML Importer and create an importer using this Data Source.

The import will work as usual but you will get better feedback if the import failed. Furthermore, this new approach allows you to debug external sources in the browser because you can attach your Data Sources to a page and check their sources directly.

By the way, with this new approach it's also possible to migrate one section to another: you can now create a Data Source that returns all entries and fields from a section and use the resulting XML as source for an importer to a second section.

But you could do that anyway ;) Just sayin'
