
Has anyone ever rewritten dynamic datasources to just load from a cache file, and refresh that cache file using cron in the background?

My current website will import feeds from about 10 different websites, and the first visitor will face load times of over 2 seconds because all these feeds have to be refreshed. So doing the loading in the background would be really, really great.

I don't think this should be very complex, but has anyone ever done this? Is this something you'd like to see abstracted into an extension?

I've never done exactly this, but it's been discussed quite a lot. If you want simple XML caching, you could use Nick Dunn's Cacheable Datasource extension and set the cache expiry time to infinite.

Possibly add some customisation (not sure if he has a way to manually flush per request): what I had discussed was adding a new URL parameter stating that the cache has to be manually flushed, and then just hitting that URL from your cron (the simplest way). The problem would be if you're running a multi-server environment where each server keeps its own cache; that could be solved by using a cache that saves into a shared DB.
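As a rough sketch of that idea — the parameter name (`flush`), the feed URL, and the cache path below are my own placeholders, not part of Cacheable Datasource:

```php
<?php
// Sketch: serve cached XML, with a manual-flush URL parameter that cron
// can hit to force a refresh. The 'flush' parameter name, feed URL and
// cache path are assumptions for illustration.

function needsRefresh(array $getParams, $cacheFile)
{
    return isset($getParams['flush']) || !file_exists($cacheFile);
}

function refreshCache($cacheFile, $xml)
{
    // Write to a temp file first, then rename: the rename is atomic on
    // the same filesystem, so readers never see a half-written cache.
    file_put_contents($cacheFile . '.tmp', $xml);
    rename($cacheFile . '.tmp', $cacheFile);
}

// Only run the web-facing part when actually serving an HTTP request.
if (!empty($_SERVER['HTTP_HOST'])) {
    $cacheFile = sys_get_temp_dir() . '/feed-cache.xml';

    if (needsRefresh($_GET, $cacheFile)) {
        $ctx = stream_context_create(array('http' => array('timeout' => 5)));
        $xml = @file_get_contents('http://example.com/feed.xml', false, $ctx);
        if ($xml !== false) {
            refreshCache($cacheFile, $xml);
        }
    }

    header('Content-Type: text/xml');
    readfile($cacheFile);
}
```

The cron entry would then just request the flush URL every few minutes, e.g. `curl -s 'http://yoursite/page?flush=1' > /dev/null`.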

Has anyone ever rewritten dynamic datasources to just load from a cache file, and refresh that cache file using cron in the background?

Yes. One website uses a few weather feeds to show current and forecast weather icons in the header. This can sometimes be really, really slow. I didn't use the Dynamic DS in this example, but used the same principle:

The weather feeds are called from a basic PHP script, and save their result as an XML file on the server. There's a scheduled cron job to run these every X minutes. The Symphony side just reads these XML files (as Dynamic XML data sources) from their URL on the site. I could have bypassed the DS entirely and called them in via document(...) in XSLT directly, but I wanted the extra level of parsing Symphony provides, such as when the XML is missing or invalid.
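A minimal version of such a fetch-and-save script might look like this (the feed URL and output path are placeholders, not from the actual site); the cron entry would then be something like `*/10 * * * * php /path/to/fetch-weather.php`:

```php
<?php
// Minimal fetch-and-save script, intended to be run from cron rather
// than by visitors. Feed URL and output path are placeholders.
$feedUrl = 'http://example.com/weather.xml';
$outFile = __DIR__ . '/cache/weather.xml';

$ctx = stream_context_create(array('http' => array('timeout' => 5)));
$xml = @file_get_contents($feedUrl, false, $ctx);

// Only overwrite the cache if we actually got well-formed XML back,
// so a failed fetch never clobbers the last good copy.
if ($xml !== false && isValidXml($xml)) {
    file_put_contents($outFile . '.tmp', $xml);
    rename($outFile . '.tmp', $outFile);
}

function isValidXml($string)
{
    $prev = libxml_use_internal_errors(true);
    $doc  = simplexml_load_string($string);
    libxml_use_internal_errors($prev);
    return $doc !== false;
}
```

Keeping the last good copy on a failed fetch is what lets Symphony keep rendering the page even when the upstream feed is down.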

You could have a cron job that hits a page with these data sources attached to ensure they are always up to date, yes. However, it won't always work: there is an edge case where the user might still trigger the update, unless the cron is exactly synced with the DS expiry! Ideally your DS would have long expiry times (longer than the cron frequency, at least), and the cron would have the ability to first purge the existing data, or bypass the cache check entirely. This way the public consumption of the DS always gets back the cached version, and the cron consumption always refreshes it.

Thanks, I'll go for the most failsafe solution, as the datasource caching does not work reliably on my server (no idea why, and I don't feel like debugging it). So I will trigger an update script via cron, write the results to a file, and read that file within Symphony.

Now, back to my second question, would anybody be interested in this as an extension? I will need to build this for a project, but I could abstract it a bit if anyone finds it useful.

I've been attempting to build an extension myself that needs a Dynamic DS that's loaded from cache, or at least updates the feed every 30 minutes or so. I'd definitely be interested in knowing how you put this together, for reference.

You could also investigate background processes and non-blocking cURL requests. This way, the visitor that triggers the cache refresh would still get the previous result, but would trigger a process to grab a new result. The stale cache would be served until this process completes.
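One simple way to get a non-blocking trigger in PHP (without reaching for the cURL multi interface) is a socket-based "fire and forget" request: send the request headers and close the connection without ever reading the response. A sketch, with a hypothetical refresh URL:

```php
<?php
// "Fire and forget" HTTP request: open a socket, send the request, and
// close without waiting for the response. The page receiving this
// request does the slow feed refresh; the visitor who triggered it has
// already been served the stale cache.
function triggerBackgroundRefresh($host, $path, $port = 80)
{
    $fp = @fsockopen($host, $port, $errno, $errstr, 1); // 1s connect timeout
    if ($fp === false) {
        return false; // couldn't connect; the stale cache stays in service
    }

    $request  = "GET {$path} HTTP/1.1\r\n";
    $request .= "Host: {$host}\r\n";
    $request .= "Connection: Close\r\n\r\n";

    fwrite($fp, $request);
    fclose($fp); // close immediately: we never read the response

    return true;
}

// Example (hypothetical refresh URL):
// triggerBackgroundRefresh('example.com', '/refresh-feeds?key=secret');
```

Note the refresh page itself should still guard against concurrent refreshes, otherwise several visitors can kick off the same slow fetch at once.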

I did once write a class to achieve some of this non-blocking stuff which stored everything in memcache. When the cached object becomes stale, the next request fetches the new data, while subsequent requests still receive the stale data. Another memcache key is created as a flag to say "serve the stale version until I don't exist", so when the cache is updated, this flag is destroyed.
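A minimal sketch of that pattern, written against a trivial in-memory cache so it stays self-contained (in production the same logic would sit on top of memcache); the key and function names here are my own, not from that class:

```php
<?php
// "Serve stale while refreshing": a second cache key acts as the flag
// "serve the stale version until I don't exist". Trivial in-memory
// cache stands in for memcache so the sketch runs anywhere.
class ArrayCache
{
    private $store = array();
    public function get($key)       { return isset($this->store[$key]) ? $this->store[$key] : false; }
    public function set($key, $val) { $this->store[$key] = $val; }
    public function delete($key)    { unset($this->store[$key]); }
}

function fetchWithStaleFallback(ArrayCache $cache, $key, $refreshCallback)
{
    $flagKey = $key . ':refreshing';

    if ($cache->get($flagKey) !== false) {
        // A refresh is already in flight: keep serving the stale copy.
        return $cache->get($key);
    }

    // This request takes on the refresh; set the flag first so
    // concurrent requests fall through to the stale branch above.
    $cache->set($flagKey, 1);
    $fresh = call_user_func($refreshCallback);
    $cache->set($key, $fresh);
    $cache->delete($flagKey); // cache updated: flag destroyed

    return $fresh;
}
```

With real memcache the flag write would need to be atomic (e.g. an add rather than a set) to fully close the stampede window.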

Cache stampede race conditions suck.

Yes, I thought about that, but the content will change very often, and I am afraid I won't have enough visitors to make sure everybody is shown reasonably up to date content.

I will put something together sometime in August (going on holiday next week, woo!), and I'll put it on GitHub!

Something related: is it possible for an extension to create a datasource type (rather than a single datasource) with 2.3? I do recall a discussion about it, but I can't seem to find it.

The Union Datasource extension would be an example of an extension providing a new datasource type.

Thanks, awesome!

See also the latest Section Schemas.

@creativedurchmen: I did something similar to what you are looking for. It's an extension providing various field types (e.g. a "Facebook API" field). As with every field, I am saving the API's (XML) content to the database. There are some advantages if you follow the Symphony "field" idea, e.g.: you can simply add the fields to sections/datasources to get the content in your page XML. And: every field can save several "feeds" (i.e. entries, in Symphony speak) using parameters. The amount of content you can save (and retrieve rather fast, I think) actually depends on the limits of the API.

Twitter, for example, has a rather strict limit (350 API calls per hour for one authenticated user). So if you would like to save many more, you would have to extend this idea and use multiple accounts (maybe authenticated via a Twitter app) to call the API.

Of course, the regular API calls must be triggered by cron, so the extension has a private API itself. Once called, it will loop through the available fields. For each field, it will loop through the entries and call the corresponding API for each entry, then parse the content (especially in the case of JSON) and save the XML results to the database.
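As an illustration of the parse-and-save step, here is a rough sketch of converting a JSON API response into XML for storage (the element names and structure are mine, not the extension's actual format):

```php
<?php
// Sketch: turn a JSON API response into XML suitable for storing in
// the database. Element names ('response', 'item') are illustrative.
function jsonToXml($json, $rootName = 'response')
{
    $data = json_decode($json, true);
    $xml  = new SimpleXMLElement("<{$rootName}/>");
    appendArray($xml, $data);
    return $xml->asXML();
}

function appendArray(SimpleXMLElement $node, $data)
{
    foreach ((array) $data as $key => $value) {
        // Numeric keys are not valid element names; use a generic one.
        $name = is_int($key) ? 'item' : $key;
        if (is_array($value)) {
            appendArray($node->addChild($name), $value);
        } else {
            // addChild() does not escape ampersands, so escape manually.
            $node->addChild($name, htmlspecialchars((string) $value));
        }
    }
}
```

A real implementation would also have to handle keys that aren't valid XML names at all, but this shows the general shape of the loop.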

That said, I don't really think that you can create a "universal" extension easily, because APIs and authentication mechanisms differ too much. That is why I went for the "field" idea. If I want to add another API, I can simply create a new field in the extension.

Woah. Michael, that sounds awesome! But, I was only planning on supporting basic XML feeds (RSS, for example), not all advanced API based feeds.

I too have thought of the field idea, but I think it's not the best way for me: my client will be able to add new feeds when I enable this, but they do not understand namespaces, XPath or any of that. So I will not be able to limit the XML, or retrieve namespaced feeds.

But I think that my idea of providing a datasource type can be used to develop plugins for authenticated feeds, too. Just extend the datasource with a new authentication method, and you can still use the main logic: serve XML from the cache when requested, and flush and rebuild the cache when the private URL is called.
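That pluggable-authentication idea could be sketched roughly like this; every class and method name here is hypothetical, not from any existing extension:

```php
<?php
// Sketch: the base class owns the cache-or-fetch logic; subclasses only
// decide how the request is authenticated. All names are hypothetical.
abstract class CachedFeedDatasource
{
    protected $url;
    protected $cacheFile;

    public function __construct($url, $cacheFile)
    {
        $this->url = $url;
        $this->cacheFile = $cacheFile;
    }

    // Public consumption: always serve the cache, never hit the feed.
    public function grab()
    {
        return file_exists($this->cacheFile)
            ? file_get_contents($this->cacheFile)
            : false;
    }

    // Cron consumption (private URL): always rebuild the cache.
    public function refresh()
    {
        $xml = @file_get_contents($this->buildRequestUrl());
        if ($xml !== false) {
            file_put_contents($this->cacheFile, $xml);
        }
        return $xml !== false;
    }

    // Subclasses decide how authentication is added to the request.
    abstract public function buildRequestUrl();
}

class PlainFeedDatasource extends CachedFeedDatasource
{
    public function buildRequestUrl()
    {
        return $this->url; // plain RSS/XML feed, no authentication
    }
}

class ApiKeyFeedDatasource extends CachedFeedDatasource
{
    private $apiKey;

    public function __construct($url, $cacheFile, $apiKey)
    {
        parent::__construct($url, $cacheFile);
        $this->apiKey = $apiKey;
    }

    public function buildRequestUrl()
    {
        return $this->url . (strpos($this->url, '?') === false ? '?' : '&')
             . 'api_key=' . urlencode($this->apiKey);
    }
}
```

Adding OAuth or similar would just mean another subclass; the cache-serving and cache-flushing paths stay untouched.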
