As discussed here

What do the Symphonians here think of using Blogger as your blog (because of easy integration with other Google apps such as Picasa, and good PageRank) and mirroring its content on your Symphony site? Since pictures, videos, calendars, documents and much more already come from external web apps, why not be consistent and do the same for the last bit of content that actually sits in your Symphony database...?

Of course you'd have to be careful not to ruin your SEO with duplicate content. Maybe it can be solved by setting the blog to private, but then you lose the PageRank?

What do you guys think about this?

So, if all of your content is from external sources, why use Symphony? Wouldn't it be easier to just use a feed parser to mirror content? A CMS seems like overkill.

I think the problem with using external sources for all your content is that Symphony can't pull the content and add it to its database for permanent archiving or editing. If it could suck in and copy external feeds, I could see it being useful for what you're talking about. Maybe an extension could add this functionality...

I think JeffCroft.com uses a custom Django CMS that duplicates his content from other sites and integrates everything and archives it all. It's huge.

I use Tumblr.com as a basic way to do the same thing, but I'd rather have that content archived on my site too.

We do a similar thing on a large site that makes use of social networking sites. There are 20 featured users, who all have Bebo blogs, Flickr accounts, Vimeo video accounts, some have Blogger blogs, and Twitter accounts. We have a cron task that polls the XML API feeds (or RSS feeds if no API is available) and caches the data back into Symphony.

Each site (Flickr, Vimeo etc.) has a corresponding Section in Symphony to store the data as we need it.

The reason being that we have to moderate each piece of content before it appears on the site. We have a simple extension that provides a preview of each (Flickr photo, Vimeo video etc) and a button to Approve/Decline.

The catch, of course, is that you need to set up the cron job that does this grabbing of the data yourself.
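Something along these lines in the crontab does the job; the path and interval below are purely illustrative:

    */10 * * * * php /path/to/poll-feeds.php >> /path/to/poll-feeds.log 2>&1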

I did a similar thing on a previous version of my blog/personal site. I grabbed my Last.fm feed every 10 minutes or so, storing in Symphony every single instance of a played track. The reason being that Last.fm only provide the last 10 plays in their API feeds. I managed to save about 30,000 records this way — the plan being that I can query and do some visualisation stuff in the future.

Do you know of any such PHP feed parser which has the essential URL scheme, navigation and 'fixed' page features Symphony has? There is more to a 'digital activity stream front end' than just parsing.

Currently Symphony's external data sources are already cached; I assumed this meant they were written to the DB? And you can modify Symphony's image function to cache the pictures 'forever'.

Jeff's site is indeed a good example, yet I guess even he doesn't host his own copies of videos pulled over from an external app.

I am just investigating how much or how little one is 'dependent' on outside apps, and whether Symphony could use a spin-off application limited to binding together external feeds. Or whether Symphony should step up to offer, via extensions, the features we find on external web apps:

  • importing of iCal
  • photo CMS import from iPhoto/Picasa
  • integrating with a video app (will always be hosted externally)
  • Twitter
  • usable as a replacement for Google Docs (spreadsheets will be hard...)

  • forum
  • webshop
  • CRM/ERP...

Realistically, it seems to me that many will be running these last three separately on their server, and will always be pulling from and copying external apps that are better at their specific task (Flickr, ...).

Personally, for example, I'm pondering using Symphony photo management vs. Flickr.

What's the team's idea on this all?

I'm not 100% sure what you're asking.

My personal view is that you should leave the specifics up to the specialist applications. Flickr is built for managing photos, so the photo management and storage should always reside there.

The Dynamic XML data sources are fine, but they simply cache the feed XML as-is. If the feed shows the latest 10 photos, that's all that will ever be cached. By replicating the XML schema as a Symphony Section you can archive the feed continually.

With reference to Jeff Croft's site — I'm fairly sure he just caches the 'links' to his assets elsewhere. So for a Flickr photo he would simply cache the photo title, description, thumbnail URL and full size image URL. There's no need to cache the image itself. But don't forget that with most of these sites, you could replicate Jeff Croft's efforts using the API, grabbing data on the fly, without saving anything in Symphony at all.

Could you elaborate on "By replicating the XML schema as a Symphony Section you can archive the feed continually"?

Sure.

Let's say I want to cache my listening habits into Symphony. My recent tracks feed from Last.fm displays only the last 10 listens. So I decided to store this information in Symphony for future querying etc. A complete archive.

I created a Section named "Recent Tracks" with fields mirroring those found in the XML file (or just the ones I want to store): artist, mbid, name, URL, date. I then have a PHP script that grabs the Last.fm XML feed every few minutes, checks which entries I've already saved (date comparison) and creates entries for the new additions.
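Very roughly, such a script might look something like the sketch below. The feed URL, field handles, event name and the little "last run" state file are just placeholders; new entries are POSTed to a normal Symphony front-end event, exactly as a form on the site would submit them.

    <?php
    // Rough sketch only: poll a recent tracks feed and push any new plays into
    // a Symphony section through a front-end event. URLs, field handles and
    // the event name are hypothetical -- adjust to match your own setup.

    $feed      = 'http://ws.audioscrobbler.com/1.0/user/your-username/recenttracks.xml';
    $event_url = 'http://example.com/recent-tracks/';  // page with the save event attached
    $state     = dirname(__FILE__) . '/last-run.txt';  // crude stand-in for "what have I saved already"

    $last_saved = is_file($state) ? (int) trim(file_get_contents($state)) : 0;
    $newest     = $last_saved;

    $xml = @simplexml_load_file($feed);
    if ($xml === false) {
        // Feed unavailable or invalid: do nothing, the existing archive stays intact
        exit(1);
    }

    foreach ($xml->track as $track) {
        $played = (int) $track->date['uts'];
        if ($played <= $last_saved) continue;  // already archived, skip

        // Submit one entry to the Symphony event as a normal form POST
        $post = http_build_query(array(
            'fields[artist]' => (string) $track->artist,
            'fields[name]'   => (string) $track->name,
            'fields[mbid]'   => (string) $track->mbid,
            'fields[url]'    => (string) $track->url,
            'fields[date]'   => date('Y-m-d H:i:s', $played),
            'action[save-recent-track]' => 'Submit',
        ));

        $ch = curl_init($event_url);
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_exec($ch);
        curl_close($ch);

        if ($played > $newest) $newest = $played;
    }

    // Remember the newest play we stored so the next run only adds newer ones
    file_put_contents($state, $newest);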

On other sites we do the same thing for Flickr photos (a Photos section, fields include title, description and URL), Vimeo videos, Twitter tweets, scraped blog posts, and so on.

I firmly believe this shouldn't be part of the Symphony core. The caching of Dynamic XML data sources is adequate for most cases. In these edge cases, Symphony provides the core tools for CRUD (create, read, update, delete) of the data; the developer just needs to do some additional work to parse a feed and submit its entries to Symphony for storage.

@nickdunn

I would LOVE to learn how this is done with Symphony. It'd be a great tutorial once RC1 is done. :-)

Right now, I use Tumblr as an easy way to do this sort of thing. I'd really like to have more control over my data, so if I ever want to stop using, say, Flickr, my photos are already copied and available on my site and I can simply set up and use a different service.

If I get time I'll write up an explanation of how I do it — it's quite simple.

The only thing I'll add is that it isn't intended as a replacement for these services. It merely archives the data stored in them. With the Flickr example, all I store is the photo ID, its title and the URL of the image on the Flickr servers. I don't bother grabbing the image as well.

But yes, if your "Photos" section in Symphony is generic enough, you could have Flickr, Photobucket, Webshots, Ringo et al publishing into it.

I second the call for a tutorial for RC1. A most valuable contribution!

@nickdunn: Could you provide an example PHP script used for grabbing a feed and writing data in the Symphony database? This would surely be very helpful.

Have been snowed under with work recently, but I'll try and find time to cobble something together!

Huzzah, I finally had time to finish my blog template and write up the method. Hope you find it useful!

Archiving XML with Symphony

Nick, thanks for that tutorial! It will come in handy when I finish work on my business site before Christmas.

So, to clarify:

By default a data source pointing to an external feed is cached every however many minutes you specify. So this cache is stored as a file on your server; where is it located? And with updates as frequent as that, how big is the impact on your server?

If there were a way to detect when a feed changes (without polling every X minutes), and to revert to the previous state if the feed drops out, would this default method not be sufficient?

You argue that your 'writing to DB' method is good for archiving, but I assume every feed has archives available?

For those of us who see it as a backup and don't 'trust' the feeds they use, how about pictures? Somewhere on this forum is a hack to make the caching of pictures (by the Symphony image manipulation function) never expire. But to be consistent with your method, one would need to actually write these pictures' metadata into the DB, and the files themselves onto your server. How would that be solved?

And regarding movies, many would argue that serving self-hosted MP4s is not convenient in terms of space and traffic, and therefore (and for added community features like commenting) choose to use a web app and import its feed.

So if we can never completely get rid of non-locally stored feeds, then should we?

A Dynamic XML data source is cached for a set number of minutes. However it is the page load (execution of the data source) that triggers the update, rather than a scheduled task updating the local feed every X minutes. So if nobody visits your site, the local XML cache remains unchanged. The first visitor page load will trigger the DS to update.

My method has several advantages:

  • Speed: a Dynamic XML DS runs on page load, so the initial "fetch" needs to make a request to download the remote XML feed. If the XML is large, or the network/remote server is slow, this can have a negative performance impact on your own page load times. My method pre-caches the data so the query is another MySQL query.
  • Querying: the archived data can be queried, sorted and filtered as a normal Data Source, since it is just data in a Section.
  • Reliability: if the XML fetched by a Dynamic XML DS is invalid, the data source will fail. My method can trap the errors and potentially report them by email to the developer, and provided there are entries in the Section, the front end will always have some data to display.

Not all feeds have archives available. Sites like Flickr do indeed have an API for drilling down through all of your data, but other sites, particularly plain RSS feeds, will only return the latest 20 or so items.

However if you don't trust the third party sites, then there's rationale right there for not using them at all ;-)

Using Flickr as an example, you have a few options:

  1. Hack the /image/ script to cache the JPEGs and never expire them.
  2. Use the event/cron method I describe but store the image URL in a Textfield, leaving the image hosted on Flickr itself
  3. Use the event/cron method but have an Image field in the section, and modify the cron PHP script to fetch the JPEG image as a bytestream and send it along with the POST to your event (just as an image upload script would). The benefit is that you have an exact copy of the image cached on your server, and it is stored as a native Image field in your section.

Obviously the downside of #3 is that if the file is large, you take the bandwidth hit. In the case of movies, you'd want to use #2 and simply save the URL of the remotely-hosted file.
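For what it's worth, a rough sketch of what #3 might look like. The URLs, field handles and event name are made up, and it assumes your section has an upload/Image field and an event that accepts a normal multipart file upload:

    <?php
    // Rough sketch of option 3 only: download the Flickr-hosted JPEG and submit
    // it to a Symphony event as if a user had uploaded it through a front-end form.

    $image_url = 'http://farm1.static.flickr.com/123/456789_abcdef.jpg'; // taken from the Flickr API response
    $event_url = 'http://example.com/photos/';                           // page with the save event attached

    // Fetch the remote JPEG and write it to a temporary file
    $tmp = tempnam(sys_get_temp_dir(), 'flickr');
    file_put_contents($tmp, file_get_contents($image_url));

    $post = array(
        'fields[title]'       => 'Photo title from the API',
        'fields[description]' => 'Photo description from the API',
        'fields[image]'       => new CURLFile($tmp, 'image/jpeg', 'photo.jpg'), // PHP 5.5+; older PHP used the '@' prefix
        'action[save-photo]'  => 'Submit',
    );

    $ch = curl_init($event_url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $post); // passing an array sends multipart/form-data, so the file is uploaded
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    curl_close($ch);

    unlink($tmp);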

Does that explain things a little more clearly?

Aha, almost crystal clear, thanks.

So if the update is set at 30 minutes, the first load will fetch and cache the feed, and any visits within that 30-minute window will have no effect; they'll just be shown the cached version. The first visit 30 minutes after this initial visit will re-fetch the feed, and so on... If so, what update time would you recommend for Flickr photos? I would be OK with 1440 minutes, i.e. 24h...

Is this cached feed floating somewhere in your server's memory, or is it written to a file in a directory within Symphony?

About the strategy/methods:

Comparing to Jeff Croft's custom-brewed Django CMS: reading the list items below in this post, he states he pulled all the Flickr photos 'on the fly'. (At the time of the post he did use images from static.flickr.com, and had an enormous number of thumbnails on his page, which took quite long to load.) Currently it seems he has copies of each photo hosted on his server (and a limited cropped-photo count), like so (hm... and he marks them up as inline background images, not sure why...).

His blogroll is also 'stored to DB' indeed, and on his tumblelog (a lifestream) he achieves what we here need to use Tumblr for: mixing all activity into one feed.

Yes, you've explained the caching exactly. The refresh rate depends on how often your Flickr photos update. If you only upload new photos at the weekend, then you could potentially have a much longer time (60 * 24 * 7 = a week).

I'm fairly certain the feed is written to a file on the server, but I haven't looked at the code itself. Most likely it'll reside in the manifest/cache directory in some way.

When Jeff refers to "on the fly" he doesn't mean his data is coming from Flickr on the fly (I don't think). He's making a comparison to the Movable Type blogging platform, which publishes a static HTML file for each post made, whereas WordPress, Django and Symphony (et al.) build "pages" on the fly using .htaccess and a database of content.

Jeff appears to use various Django scripts (Python code, leveraging the Django framework) to do the heavy lifting of data from third party sites and archiving it on his system. For example there is an Integrating Flickr With Django article on the Django site which documents this functionality, and it looks like a script similar to what Jeff uses.

By also caching the tags and set relationships in his database, he has essentially built his own mini Flickr. Jeff chooses to allow visitors to browse by tags and date, which are nice features. The same could be done with Symphony by simply adding a Tag field to the cached photos section. You could build data sources to filter by a tag, or by a specific year.

With the Flickr API you can easily obtain the URL of the photo and its various sizes, and so downloading and caching this on your own server would be fairly trivial.

I guess the images are marked up as anchors with background images so he can easily add the border-radius CSS (which could otherwise have been achieved in a somewhat more fiddly manner with absolutely positioned PNG masks over the top of the image).

By caching each service into your database, you can easily achieve with Symphony what Jeff has done with Django. The lifestream wouldn't be too difficult. Providing you use a consistent naming convention between your sections (e.g. every entry has an ID, a Date etc.) you can use a data source to fetch the latest entries (e.g. Twitter posts, Flickr photos, Last.fm plays) and use XSLT to combine and order them into one list.

So I think what I'm saying is that Jeff is using a concept almost identical to the one I suggest; only his scripts are written in Python on top of the Django framework, whereas mine is PHP posting data to Symphony. I don't think Jeff is doing anything on his site that this method can't achieve :-)

I thought mixing several activities from external blog apps into a chronological lifestream feed still required an external 'mixer' application like tubes, because importing, then 'statifying' and then combining with XSLT is just too complex?

@nickdunn: Thank you so much for the thorough explanation of your concept. It really helps me a lot!

(Just in case somebody missed it: Article on Nick's website.)
