
I'm using XML Importer to import feeds every 24 hours. This works fine, but the feed sources themselves are subject to change, in that entries are removed from the source every now and then.

The problem I'm having is that I need the data in my system to reflect that of the external feeds, so if an entry is removed from an external feed I need to be able to recognise that and somehow automatically delete it (or at least change a value to set it to "unpublished" or some alternative) so that it doesn't display on the front-end.

Each feed entry has a unique ID which can help to distinguish it, but I'm still not sure how I can match what's in the system against the entries of the external feed and then trigger a status change or deletion for system entries based on what's been removed from the external feed.

Can anyone suggest a good way to achieve this?

Have you played with the XML Importer yet?

When setting it up, you can specify which field acts as the unique key for matching feed entries to existing entries, so you would mark your external ID field as unique and have a corresponding field in the section to store that value.

Also, the best method is to set entries to unpublished. Does the feed physically remove the entries, or are they marked as removed?

I can't remember if the XML Importer can delete entries or not.

I've got XML Importer set up and running, and the import and even the update part seem to work fine. The next hurdle is figuring out how to delete entries which are no longer in the feeds being imported from.

In this case it's a list of properties, and although in some cases they may be marked as "sold" whilst still being listed, in most cases they'll just be removed from the listing with nothing to indicate that they've been removed.

So it's almost like I need to evaluate what's in the current system against what's in the feed being imported, and if there are entries in the system which aren't in the external feed, either delete them or mark them as unpublished. There doesn't appear to be anything in XML Importer which will delete entries.

Internally in Symphony I don't mind whether the actual entries remain but are just marked as unpublished (somehow), or whether they are removed entirely. I just need a way to automatically remove them from display on the site.
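For illustration, here is a minimal sketch of that comparison step: collect the unique IDs in the feed, compare them to the IDs already stored, and unpublish whatever has disappeared. The feed URL, the XPath, and the sym_entries_data_42 / sym_entries_data_43 data tables (an external ID text field and a "Published" checkbox field) are placeholders, so check the real field ids in sym_fields before using anything like this.

<?php
// Sketch: compare the IDs in the current feed against the IDs already stored
// in Symphony and unpublish anything that has disappeared from the feed.
// Table names / field ids are assumptions -- look them up in sym_fields.

$pdo = new PDO('mysql:host=localhost;dbname=symphony', 'user', 'pass');

// 1. Collect the unique IDs present in the external feed (hypothetical URL and XPath).
$feed = simplexml_load_file('http://example.com/properties.xml');
$feedIds = array();
foreach ($feed->xpath('//property/id') as $id) {
    $feedIds[] = (string) $id;
}

// 2. Collect the external IDs currently stored in Symphony
//    (sym_entries_data_42 = the external ID field's data table).
$stored = $pdo->query('SELECT entry_id, value FROM sym_entries_data_42')
              ->fetchAll(PDO::FETCH_KEY_PAIR);

// 3. Anything stored but no longer in the feed gets its "Published" checkbox
//    (sym_entries_data_43) set to 'no'.
$missing = array_keys(array_diff($stored, $feedIds));

if (!empty($missing)) {
    $in = implode(',', array_map('intval', $missing));
    $pdo->exec("UPDATE sym_entries_data_43 SET value = 'no' WHERE entry_id IN ($in)");
}

Run as a cron task straight after the import, something along these lines would keep the section in step with the feed without deleting anything.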

Is this more a job for the Remote Datasource, maybe? That way the data in the XML is true to the feed you are fetching.

Or what about importing a timestamp with each new entry? If the entries that are updated get a new timestamp, only the entries with a fresh timestamp get filtered and shown on the front end. Hence the ones that are removed from the imported XML won't get an updated timestamp and therefore fall off the end of your filtering rules in the datasource for output.

Make sense?

Yeah, I'd considered Remote Datasource and have even got it installed and run some tests, but there are a couple of problems:

A) I wouldn't be able to order by newest entries because they'd all be as new as each other in that respect. I was looking to rely on system created and updated dates but those wouldn't be available with Remote Datasource.

B) It doesn't seem to be possible to paginate the results in the remote datasource settings page. I have hundreds or thousands of entries so I need to be able to break up the results.

The other thing I don't quite like the idea of is that once the cache life runs out, if there is a problem re-polling the external feeds, the site is essentially taken out of action. In this case the site is powered entirely by outside data, so a polling problem would take out 100% of the content. At least with the XML Importer the old data would still be there, albeit with a small handful of entries out of date.

Or what about importing a timestamp with each new entry? If the entries that are updated get a new timestamp, only the entries with a fresh timestamp get filtered and shown on the front end. Hence the ones that are removed from the imported XML won't get an updated timestamp and therefore fall off the end of your filtering rules in the datasource for output.

Hmmmm. Where would the timestamp be added in the process? I've looked at adding a couple of date fields to the section, and at first glance they are set to the current date and time, but unfortunately this is only done via JavaScript in the Symphony admin panel and not saved to the database. So on a remote fetch the date isn't saved unless you're actually entering the entry in the Symphony publish page and clicking the submit button.

I'm not entirely sure I follow. Can you elaborate a little more?

With regards to XML Importer and a timestamp: do you have any control over the structure of the external feed? That's where I was thinking of introducing the timestamp.

Assuming there is a cron job or something similar to run the XML import every 24 hours… what if there was a preceding job that set all existing entries to Unpublished? Then the XML Importer could match on the unique ID and set those back to Published.

Not very elegant, but it might work.
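As a rough sketch of that preceding job (the sym_entries_data_43 table for a "Published" checkbox field and the database credentials are assumptions):

<?php
// Sketch of the preceding job: flip every entry's "Published" checkbox to 'no'
// just before the XML Importer runs, so that only entries the importer still
// finds in the feed (matched on the unique external ID) get set back to 'yes'.
// sym_entries_data_43 is an assumed field id -- check sym_fields.

$pdo = new PDO('mysql:host=localhost;dbname=symphony', 'user', 'pass');
$pdo->exec("UPDATE sym_entries_data_43 SET value = 'no'");

The importer would then need a mapping on the checkbox field that always sets it back to 'yes', and the cron entry for this script would sit immediately before the importer's.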

OK, what about using the Reflection field with a slight alteration so that it reflects the entry's updated date when the entry is edited? @designermonkey showed me a tweak to the Reflection field in order to do this. I'll try to find it when I'm at my desk.

OK, here's the thread discussing a way to reflect an entry's modification date.

Interesting idea. You could also add your own date field and use this helper in class.xmlimporterhelpers.php to set the time it was updated, and only show entries updated within the last 24 hours.

// Called by XML Importer for the mapped field; $string is the value matched
// by the XPath expression and is ignored here.
static function returnCurrentDate($string) {
  // Return the current date and time in ISO 8601 format.
  return date('c');
}

Brian, your approach might work better, because if the entry is edited away from an XML Importer action, i.e. edited from the Symphony backend, the Reflection field result would be out of sync with the other entries and filtering would be inaccurate.

Thanks guys, some interesting ideas here which I'll give a try to see if I can figure out the best way to go about it.

@briandrum, so by using that function it would effectively get around the problem I mentioned before about the date fields being populated by JavaScript in the publish screen? That would actually save the current date to the database on import?

Only showing entries updated within the last 24 hours would get around the need to delete or change entry status. The only potential downside is that, as with the Remote Datasource, if for some reason the remote polling was interrupted then the whole site would be empty. The ideal would still be to at least show old data. It's fine if it's something secondary like a Twitter feed in a sidebar, but when it's the entire site contents it's a bit more critical, which I'd like to try and safeguard against if possible. I guess some kind of conditional check would be good here: if there are entries within the last 24 hours just show those, but if not show older entries (although I can still think of issues with that, it may be a step closer).
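For illustration, a minimal sketch of that conditional check, assuming the importer writes to a date field whose data table is sym_entries_data_44 (the field id and the column name should be checked against your own install):

<?php
// Sketch of the fallback: prefer entries whose imported "updated" date is
// within the last 24 hours, but fall back to everything if the poll failed
// and nothing recent exists, so the site never goes completely empty.
// sym_entries_data_44 and its `date` column are assumptions.

$pdo = new PDO('mysql:host=localhost;dbname=symphony', 'user', 'pass');

$recent = $pdo->query(
    "SELECT entry_id FROM sym_entries_data_44
     WHERE date >= DATE_SUB(NOW(), INTERVAL 24 HOUR)"
)->fetchAll(PDO::FETCH_COLUMN);

// If the import didn't run, show the stale entries rather than nothing.
$entryIds = !empty($recent)
    ? $recent
    : $pdo->query('SELECT entry_id FROM sym_entries_data_44')->fetchAll(PDO::FETCH_COLUMN);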

@moonoo, good point about the sync with the Reflection field. Although in this case the likelihood of entries being edited in the back-end is very low, I am still offering that facility, so it may be a possibility, albeit remote. I think it's a case of weighing up the pros and cons.

@briandrum, so by using that function it would effectively get around the problem I mentioned before about the date fields being populated by JavaScript in the publish screen? That would actually save the current date to the database on import?

Yep. In XML Importer, just make sure there is something to match in the XPath Expression (it doesn't matter what) and put “XMLImporterHelpers::returnCurrentDate” in the PHP Function field.
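For example, the importer field mapping for the date field might look something like this (the XPath is a placeholder; any expression that always matches a node will do):

XPath Expression: id
PHP Function:     XMLImporterHelpers::returnCurrentDate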
