
From a thread discussing publishing flat files, we started touching on caching.

Remie asked:

In light of this discussion: does anyone have any experience with MemCache in combination with Symphony?

And some requested a bit more information. So...

Presently I use Cacheable Datasource a lot. This caches the XML result of a data source onto disk and serves this instead of querying the database (until it becomes stale). Problem is, you have to customise the data source. I'm hoping from Symphony 2.3 I'll be able to rewrite the extension using delegates so no customisation is necessary.

However I have seen APC being used in custom data sources, instead of saving the output to disk. Your data source grab() function might look like this:

public function grab(&$param_pool=NULL){

    // unique key for this data source
    $cache_key = $this->dsParamROOTELEMENT . md5($_SERVER['REQUEST_URI']);

    // get cache if APC is available
    $data = false;
    if (function_exists('apc_fetch')) $data = apc_fetch($cache_key);

    // cache is still good, return cached XML and param pool
    if ($data !== false) {
        $param_pool = $data['param_pool'];
        return $data['output'];
    }

    $result = new XMLElement($this->dsParamROOTELEMENT);

    // normal data source processing
    try{
        include(TOOLKIT . '/data-sources/datasource.section.php');
    }
    catch(Exception $e){
        $result->appendChild(new XMLElement('error', $e->getMessage()));
        return $result;
    }

    if($this->_force_empty_result) $result = $this->emptyXMLSet();

    // store new XML in cache for 60 seconds
    if (function_exists('apc_add')) {
        apc_add($cache_key, array(
            'output'        => $result,
            'param_pool'    => $param_pool
        ), 60);
    }

    return $result;
}

This checks that APC is available on your server, and stores the data source XML and param pool (including output parameters) in the cache. Exactly the same principle applies for memcache. You can try this out using MAMP by selecting "APC" from the Cache dropdown (MAMP Pro).
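As a rough sketch of the Memcache version, using PHP's Memcached extension (the server host/port and the 60-second TTL are assumptions, and in practice you would reuse a single connection rather than opening one per data source):

// rough Memcache equivalent of the APC calls above, using the Memcached extension;
// host, port and the 60-second TTL are assumptions
$memcache = new Memcached();
$memcache->addServer('localhost', 11211);

$cache_key = $this->dsParamROOTELEMENT . md5($_SERVER['REQUEST_URI']);

// read: get() returns FALSE on a cache miss
$data = $memcache->get($cache_key);
if ($data !== false) {
    $param_pool = $data['param_pool'];
    return $data['output'];
}

// ... normal data source processing, as in the APC example ...

// write: store the XML and param pool for 60 seconds
$memcache->set($cache_key, array(
    'output'     => $result,
    'param_pool' => $param_pool
), 60);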

You can see in the example above that the object is cached for one minute (60 seconds). This got me thinking about how we might be able to add caching into Symphony's core, and where it would be appropriate.

I am not suggesting data source caching goes into the core. Data caching is so personal to your application that there are many ways in which it can and should be implemented. The above is just one way, and it would be restrictive for Symphony to force developers to use one method.

However I do think caching could be used internally for repetitive tasks that occur on every page load:

  • selecting a list of all extensions and their delegate subscriptions from the database
  • resolving sections and fields, namely looking up field configuration in sym_fields and the sym_fields_{fieldtype} tables

These are already kind of cached in the core: on each page load the queries are only performed once and the results cached in static arrays, so if the data is required again during that page cycle, it is read from the array and not the database. But these lookups still occur on every page view. What if these common lookups could be stored in memory, so the result could be shared between page views and between visitors?
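As a simplified illustration of that per-request memoisation (not the actual core code, just the shape of it):

// simplified illustration of per-request memoisation, not the actual core code
public static function fetchFieldsForSection($section_id){
    static $cache = array();

    // already looked up during this page load, so reuse the result
    if (isset($cache[$section_id])) return $cache[$section_id];

    // otherwise query once and remember the result for the rest of this request
    $cache[$section_id] = Symphony::Database()->fetch(sprintf(
        "SELECT * FROM `sym_fields` WHERE `parent_section` = %d",
        $section_id
    ));

    return $cache[$section_id];
}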

The problem with caching is knowing when to purge the cache: when does the data become stale? The above mechanisms all use an expiry time. This is fine, and an expiry of even just a few seconds would benefit sites with heavy traffic. However, I suggested the two areas above as places where caching would be appropriate because we have events (delegates) in the backend through which we can invalidate the cache.

Take for example the list of extensions and their delegates. The extension list in the backend could perhaps bypass the cache entirely. And when you enable/disable an extension, Symphony would know to invalidate the cache storing the array holding extension subscriptions, and it would be rebuilt on next page load.

With this in mind, these lookups could be cached for longer periods, perhaps minutes or even hours, and invalidated by backend actions when we know the structure has changed.
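As a rough sketch of what that might look like inside an extension, assuming a backend delegate fires when an extension is enabled or disabled (the delegate name, page and cache key below are hypothetical):

// hypothetical sketch: the delegate name, page and cache key are assumptions
public function getSubscribedDelegates(){
    return array(
        array(
            'page'     => '/system/extensions/',
            'delegate' => 'ExtensionsPostEnable', // assumed delegate name
            'callback' => 'purgeDelegateCache'
        )
    );
}

public function purgeDelegateCache($context){
    // drop the cached extension/delegate lookup so it is rebuilt on the next page load
    if (function_exists('apc_delete')) {
        apc_delete('extension_delegate_subscriptions'); // hypothetical cache key
    }
}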

Granted, the queries to build data sources are the largest, and I'm not suggesting caching these at the moment. But these extra queries all add up, and I'm convinced there is some benefit to sharing the lookups between page views.

definitely +1

@nickdunn, This is how I picture you these days: Nick Dunn, the Thinker. Caching has always been a bitch to figure out. Any savings in lookups is always good in my books. +1

This kind of internal object caching is something that WordPress and Drupal both do via plugins, e.g. WordPress's memcache object cache. I'm not sure how it could be implemented in Symphony, especially using specific libraries. APC and Memcached are just two, but I'm sure there are more.

Would be interested to hear from other folks who have experience of object caching.

Thanks, Nick, for posting this. I am using APC as opcode cache, but I never noticed that it's rather easy to use APC's "user cache" as well.

Regarding your thoughts on Symphony's internal caching mechanisms: We must keep in mind that memcache or APC are not standard on shared hosting environments. But it would be great, of course, to be able to use them if available.

We must keep in mind that memcache or APC are not standard on shared hosting environments

Agreed. However I've read that APC will become standard in PHP 5.4.

I've decided to work on the Cacheable Datasource extension over the coming months (after the Symposium in October) with a view to adding APC and Memcache as cache options. If I can make it plugin-able (much like the Email providers in Symphony's core) then that'll be a step forward in considering it for core use.
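One illustrative way to make it pluggable is a small driver interface that each backend implements; none of these class or method names exist yet, they are purely a sketch:

// purely illustrative driver interface; these names are not part of Symphony or the extension
interface CacheDriver {
    public function read($key);                 // return the cached value, or FALSE on a miss
    public function write($key, $value, $ttl);  // store a value for $ttl seconds
    public function delete($key);               // purge a single entry
}

class APCCacheDriver implements CacheDriver {
    public function read($key){
        return function_exists('apc_fetch') ? apc_fetch($key) : false;
    }
    public function write($key, $value, $ttl){
        if (function_exists('apc_store')) apc_store($key, $value, $ttl);
    }
    public function delete($key){
        if (function_exists('apc_delete')) apc_delete($key);
    }
}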

I am wondering why you would use refreshing (or purging) by expiry time if you can refresh the cache by updating a record (an event), since the cache only needs to be updated when something changes.
I am using the CacheLite extension as a reference, but I think this can be said of all cache methods.

That's what I went on to say. For content that changes often, in CacheLite we use backend delegates to listen for content change events, and purge the relevant caches. However a timeout fallback is always there just in case.

I am considering caching in this context not for content, but for all of the structural lookups Symphony needs to perform. 99% of changes to Symphony's structure are performed through the backend (saving a section, saving a page etc.) so we have dedicated delegates (events) to listen for these actions and purge caches accordingly.

However power users might make changes directly to their database tables, and not through the Symphony UI. We can't ignore this, so a cache expiry based on time will catch these changes, albeit not immediately.

This is where core object caching may be problematic — advanced developers might be changing things directly in the database but then do not see these changes reflected in the frontend. There would need to be an option to disable caching in development environments.

advanced developers might be changing things directly in the database

Or extensions like CDI, which replay SQL statements containing structural and configuration changes made on local development machines against acceptance/production instances.

It's a different approach than memcache, but what do we think about using CirruxCache?

I originally considered using Symphony to generate flat files (see the initiating thread) as the ultimate caching mechanism, but it turns out that CirruxCache combines the benefits of flat files with those of advanced caching.
Having gone over the CirruxCache docs again, and more importantly a now-vanished blog post by the authors, I am starting to believe that a CacheLite extension modified to work with CirruxCache would seriously rock.

The post stated:

While some people may think this is limited to purely static content, this is not accurate. You may very well apply that to dynamically generated content as well. The only question you have to ask yourself is: at what rate do you want your changes on dynamic content to be propagated? Typically, a web blog can very well suffer a one-hour latency before a new post hits the planet. And a data feed may as well easily suffer a 20-minute refresh policy. Either way, this all depends on your application structure, but if you are running anything that doesn't need to be fully real-time in its entirety, then it's quite likely you can get important gains from a CDN's HTTP caching capabilities.

and

Some requests may not be cached and are instead forwarded directly to the origin (typically POST). Deciding what you want to be cached, and what you don't, is under your control, by specifying cache-control directives.

So this is not just for your JS, CSS and images, with everything else being dynamically created, as this tutorial might imply. It turns your Symphony site into a rock-solid one that behaves just like a static site.
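For what it's worth, the cache-control directives mentioned in that quote are just HTTP response headers, which a Symphony page or extension could send with plain PHP (the TTL value is purely illustrative):

// tell any upstream cache (CDN, Varnish, Squid) it may hold this page for 20 minutes
header('Cache-Control: public, max-age=1200');

// or, for pages that must never be shared (logged-in users, form POSTs), opt out explicitly:
// header('Cache-Control: private, no-cache');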

For large web applications (10 million+ requests per day) this is common practice. Servers would explode and our application would die instantly if we turned off memcache.

Dynamic content, whether from the CMS or user generated, takes 10-30 minutes before it is published (with some exceptions).

Newnomad, what you're talking about is static (asset) publishing and output caching, which is quite a different proposition from building object caching into the core. I think we need to keep these ideas quite separate:

  • CacheLite/CirruxCache/Varnish/Squid/Akamai/CDNs, which are designed to cache the entire output of the system and deliver it in a more efficient way
  • APC/Memcache opcode/object caching, which makes Symphony's internal data handling more efficient

True, good that you cleared that up. But even when you don't care about distributed delivery, you could use delivery networks just for caching the entire output, beyond just assets, as long as you don't need super-dynamic interaction such as site search or forums (those could be handled by cloud services such as Disqus and Bing). The end result for the site visitor can be the same: memcache can make the first load of a page a lot faster, but after the first load (probably triggered by the author of the site anyway) each subsequent load comes entirely from the cache. In that case setting up memcache is not even needed.

Hi, if you install Symfony 2.0 on WAMP and install phpapc.dll on PHP 5.3.x, you have to use extension=phpapc316php53vc9nts.dll, because APC versions higher than 3.1.6 cause a bug in phpMyAdmin.

Symfony works with higher versions, but phpMyAdmin does not.

You can get it here - http://dev.freshsite.pl/php-accelerators/apc/sorting//1.html

Best regards Axel Arnold Bangert - Herzogenrath 2014

Axel Arnold Bangert, this forum is for Symphony CMS. Symphony CMS is not Symfony.
