The release of Symphony 2.0.7 saw some performance optimisations that deserve mention beyond a line in the release notes. The optimisations are relevant to core contributors, extension developers and end users alike: some are changes to private functions within core classes (interesting, but incidental to the casual developer); some are tweaks to the public interface of core classes (extension developers, this is for you); and one is a change to the Data Source editor.
Last year we launched a new client website that looks simple on the surface, but in reality is a complex Symphony build of 40 sections, 275 fields, 50 pages and almost 100 data sources. One particular page uses a single data source to output XML markers for a Google Map. While performance was fair, I noticed that this page consumed almost 5,000 queries... just to render a list of 300 entries, one text field per entry. Concerns over query counts have been raised before, so this one data source assumed the role of arch nemesis for the next day or so.
I'm pleased to report that after a few nips and tucks to Symphony's core the total queries required fell from 4,677 to just 11. That's a saving of about 99.7 percent! Not that I'm counting. Oh wait, I am. After several months sitting in the integration branch on Github, these optimisations were finally merged into the 2.0.7 release. Here's an explanation of the problems and how they were solved.
Comparing query counts
I set up some simple sections and pointed separate 2.0.6 and 2.0.7 builds to the same database instance:
- Users: Name, Profile, Email, Profile Image, Date Added, Enabled (y/n) (screenshot)
- Comments: Comment, User (linked to Users)
- Articles: Title, Content, User (linked to Users)
- Uploads: File, User (linked to Users)
- Inbox: Subject, Message, User (linked to Users)
Notice that the last four sections all contain Select Box Link fields back to the Users section. I added three entries to the Users section, but for the purpose of this example no entries were required in any other section. I then created a single Users data source using the default options and selecting "Name" in the Included Elements list. When attached to the page the output looks something like this (2.0.6):
Oh my, 103 queries for 12 lines of XML? Rather a lot. These are the queries the data source must perform:
- resolve section name from ID
- retrieve a list of matching entry IDs (using filters, limit, sort etc.), then for each entry:
    - retrieve a list of all fields in the section and resolve their names
    - query the data for all fields in the section
    - retrieve a list of all sections linked to this section, and for each:
        - resolve the section name
        - count the number of matching related entries
Change the order in which these queries are performed, throw in some caching, and give developers an option to keep the XML lean... the same data source running in 2.0.7 uses only 9 queries, in a fraction of the running time:
Why so many queries?
First off, a quick primer on Symphony's database structure, since this is the root cause of the high query counts.
If you were tasked with creating the Users section above in a normal MySQL database, no doubt you would create a single table and add six columns using the native MySQL data types (varchar, integer, binary etc.). This works well when data requirements are prescribed early, and when each column need only hold one piece of data. Symphony, however, takes this normalisation one level further by creating a new database table for each field.
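To make that concrete, the sketch below shows the kind of storage this implies. The table and column names are simplified approximations rather than Symphony's exact schema, so treat it as an indicative layout, not real DDL.

```sql
-- Illustrative sketch only: table and column names are approximate.
-- Every entry, regardless of section, gets a row in a central entries table:
CREATE TABLE entries (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    section_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (id)
);

-- Each field then gets its own data table, keyed by the owning entry.
-- "Name" field of the Users section (hypothetical field id 1):
CREATE TABLE entries_data_1 (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
    entry_id INT UNSIGNED NOT NULL,
    value    VARCHAR(255) DEFAULT NULL,
    PRIMARY KEY (id),
    KEY entry_id (entry_id)
);

-- "Email" field of the Users section (hypothetical field id 2), and so on
-- for every other field in every other section:
CREATE TABLE entries_data_2 (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
    entry_id INT UNSIGNED NOT NULL,
    value    VARCHAR(255) DEFAULT NULL,
    PRIMARY KEY (id),
    KEY entry_id (entry_id)
);
```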
There are several benefits to this approach. Firstly, fields within a section can be created and destroyed with ease. By changing a reference in the database directly you can swap the parent section of any field, leaving its data intact. Secondly, a section "field" becomes more than a single database column — it can have multiple storage types (formatted and unformatted, in the case of a Textarea) and have options stored against it for text formatting, validation, default values, user interface options... they are almost infinitely extensible.
The downside of this normalisation is that querying entries becomes a three-step process. What you gain in architectural flexibility you forfeit in query complexity. No longer can you perform a SELECT * FROM users to return data for all users. Instead Symphony must do the following (sketched in SQL after the list):
- retrieve a list of IDs of entries in the Users section
- retrieve a list of field IDs comprising the Users section
- query data for each entry by joining field tables together
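At the SQL level, those three steps look roughly like the queries below. The IDs, table names and field handles are hypothetical (a Users section with id 5, Name and Email fields with ids 1 and 2), and the real queries Symphony generates are more involved, so read this as a sketch of the shape of the work rather than the literal SQL.

```sql
-- Step 1: which entries are in the Users section? (filters, sorting and
-- limits from the data source are applied here)
SELECT e.id
FROM entries AS e
WHERE e.section_id = 5
ORDER BY e.id DESC
LIMIT 20;

-- Step 2: which fields make up the Users section, and what are their
-- XML element names?
SELECT f.id, f.element_name
FROM fields AS f
WHERE f.parent_section = 5;

-- Step 3: fetch the data for one entry by joining each field's data table.
-- In 2.0.6 this step ran once per entry returned by step 1.
SELECT name_data.value AS `name`, email_data.value AS `email`
FROM entries AS e
LEFT JOIN entries_data_1 AS name_data  ON name_data.entry_id  = e.id
LEFT JOIN entries_data_2 AS email_data ON email_data.entry_id = e.id
WHERE e.id = 42;
```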
What this means is that as the number of entries returned by a data source increases, the number of queries grows in step with it: every extra entry brings its own set of data and association queries. The same is true if you add additional fields to your section. Until now.
Optimisation #1: cache schemas resolved from the database
The FieldManager and ExtensionManager classes now maintain a static array of any resolved fields and extensions. This means that while the first entry in a data source will resolve its fields (XML element names) from the database, the values are cached so that subsequent entries do not re-query the database to build the section schema. This cache is shared across data sources, so if two DSs on a page are querying the same section, the savings are greater.
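The idea is plain memoisation. The snippet below is a simplified, hypothetical illustration of that pattern rather than the actual FieldManager code (the class name, method and resolver are invented for the example):

```php
<?php
// Hypothetical sketch of the caching pattern, not the real core code.
class FieldCache
{
    // Shared across every data source rendered on the page
    private static $pool = array();

    private $resolver;

    public function __construct($resolver)
    {
        // $resolver stands in for the real database lookup against the
        // fields table (resolving a field ID to its XML element name)
        $this->resolver = $resolver;
    }

    public function fetchElementName($field_id)
    {
        // Only the first request for a given field touches the database;
        // every later request is served from the static array
        if (!array_key_exists($field_id, self::$pool)) {
            self::$pool[$field_id] = call_user_func($this->resolver, $field_id);
        }
        return self::$pool[$field_id];
    }
}

// Usage: the closure pretends to be the database query
$cache = new FieldCache(function ($id) {
    echo "querying database for field {$id}\n"; // printed once per field
    return "element-name-for-field-{$id}";
});

$cache->fetchElementName(1); // hits the "database"
$cache->fetchElementName(1); // served from the cache
```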
This change does not affect developers or users directly: it is a change to private functions in the core.
When will I see the most improvement?
On any DS that returns more than one entry. On any DS that uses fields provided by an extension. On any page where two or more DSs are querying the same section.
Optimisation #2: limit the fields being queried
I was surprised to discover that a data source would query all fields for each entry returned, regardless of whether that field was in the Included Elements list (i.e. returned in the XML). If your section contains twenty fields but you choose to include only two in a DS, the data for all twenty are still queried. The reason for this was that all entry data must be available in case an Output Parameter is in use, at which point the value can be plucked from the query result and added to the parameter pool.
Instead, some functions in the EntryManager class now accept an optional $element_names parameter: an array of field handles. A data source passes a merged list of its Included Elements and Output Parameters so that EntryManager queries data only for these fields. If no array is passed, the query reverts to grabbing data for the entire section schema as before.
Extension developers take note: if you use the fetch, fetchByPage or __buildEntries functions from the EntryManager class, you can optimise the queries by adding this additional parameter.
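For example, a fetch restricted to a couple of field handles might look something like the sketch below. The argument order shown is an assumption, and the section ID and field handles are placeholders, so verify the fetch() signature against class.entrymanager.php in your install; the part that matters is the trailing $element_names array.

```php
<?php
// Rough sketch only, from inside an extension or custom data source.
// $this->_Parent is the usual context object passed to managers; the
// argument order of fetch() may differ in your Symphony version.
$entryManager = new EntryManager($this->_Parent);

$section_id = 5; // hypothetical ID of the Users section

$entries = $entryManager->fetch(
    null,                   // $entry_id: null means "all matching entries"
    $section_id,            // the section being queried
    20,                     // $limit
    0,                      // $start
    null,                   // $where
    null,                   // $joins
    false,                  // $group
    true,                   // $buildentries
    array('name', 'email')  // $element_names: only query these fields' data
);
```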
When will I see the most improvement?
When you have a section containing many fields but your DS returns only a handful of them. It does not save many queries, but it does reduce memory usage.
Optimisation #3: counts of entries in related sections are costly
If your section is linked to others using a Select Box Link field, Bi-Link field or similar, then you automatically see an attribute for each section appended to <entry> elements in the XML. Unfortunately this seems to be the most prolific perpetrator of query consumption. Taking the Users example above, for each User <entry> returned Symphony would do the following (sketched after the list):
- query the section relationships table and return the IDs of any related sections
- resolve the handle of each (for the attribute name) in the sections table
- look up the field that links the two sections together
- query the count of related entries from the above field
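In SQL terms, the work behind each attribute looks roughly like the queries below. The table names, column names and IDs are approximations for illustration (a Users section with id 5, a Comments section with id 6 linked back by a field with id 12), not the exact queries the core runs.

```sql
-- Approximate sketch of the per-attribute queries (names and IDs invented):

-- 1. which sections are associated with the Users section (id 5),
--    and through which field?
SELECT child_section_id, child_section_field_id
FROM sections_association
WHERE parent_section_id = 5;

-- 2. resolve the handle of an associated section, e.g. Comments (id 6),
--    to use as the attribute name
SELECT handle FROM sections WHERE id = 6;

-- 3. count the Comments entries whose linking field (data table for
--    field id 12) points back at this User entry (id 42)
SELECT COUNT(*) AS related
FROM entries_data_12
WHERE relation_id = 42;
```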
This means an additional four or five queries to build each attribute on each entry. In the Users example above each entry is related to 4 other sections, each requiring 4 queries to build the attribute. When three entries are returned:
4 sections x 4 queries x 3 entries = 48 queries
That's a lot of overhead for XML you may never need. These values are usually used when displaying the number of comments for a blog post, or the number of articles written by a user. They aren't usually a requirement of every page on a website, so it seems wasteful to include them in every data source.
Developers take note: there is a new checkbox in the Data Source Editor labelled "Include a count of entries in associated sections" which allows you to toggle these attributes on and off. If you are upgrading from an earlier version the checkbox will remain ticked, but when creating new Data Sources the default is off.
When will I see the most improvement?
When you have a section that is related to many other sections. The more links, the greater the savings by removing these attributes.
Detailed comparison
Here are the results of the aforementioned "Users" data source running in 2.0.6 and 2.0.7. (Averages are over ten page loads. Results for 2.0.7 had the "Include a count of entries in associated sections" checkbox un-ticked.)
| Metric | 2.0.6 | 2.0.7 | Improvement |
| ------ | :-----: | :-----: | :-----------: |
| Total queries | 103 | 9 | 91.3% |
| Time (average, sec.) | 0.049 | 0.015 | 69.4% |
| Memory used (average, bytes) | 1,202,692 | 618,128 | 48.6% |
They pretty much speak for themselves: the DS now requires a fraction of the number of queries, takes a third of the time to execute, and consumes half the memory it did before.
Result.
Comments
Nice to see a more in-depth view into Symphony DS as well as the improvements you made.
What I am still wondering: Right before “Optimization #1” you write
Why don’t you just let MySQL unwrap your normalization and write queries like
This would result in one to two queries per DS, no matter how many fields or entries you select.
Hm, an edit-button would be handy. :-)
f1.fid are the foreign keys, which are equal to each of your structure table IDs.
Apologies, my ordering was incorrect, I’ll update the post.
The actual querying of the data is similar to this. However, building this query string takes a number of queries to work out the section you are querying from and to get the desired field IDs and their element names.
Symphony first builds a query to get the list of entry IDs matching DS filters. It does the JOINs of the tables together, applies the WHERE filters and any LIMITs. This returns just a list of entry IDs. It then needs to build an entry object for each of these entries. The querying of the data is performed as one query per entry.
The difference is that your example does one query for the entire thing, whereas Symphony will use one query per entry.
One would naturally choose your method since it is the most efficient; if you were querying the database directly this is the way to go. However, because things are so abstracted and a DS is so flexible, building a single query is almost impossible.
Having the query of JOIN/WHERE/LIMIT separate from the data SELECTs is useful since it allows the query to be run just to return entry counts (for pagination) without selecting any data.
That’s great, Nick. What about core queries? Could they be decreased?
Core queries in the frontend? They are actually remarkably lean now. A blank page uses about 25 queries, about 75% of which are for delegates. I’m going to have a think about how these can be optimised for a future release.
But Data Sources were always the biggest offenders.
Great post, Nick, thanks.
Nick, great post. Thanks for giving us this insight!
Nick, thanks for your work on optimizing the database queries. It relieves one of the biggest concerns about the level of database normalization that Symphony uses.
And the site design that you have developed for Original Travel looks pretty complex to me, actually. It’s an impressive example of what can be accomplished with Symphony. I’d love to see a book of case studies that details how sites like this have been built using Symphony.
Great article Nick.
I think we all know that the speed versus ease-of-use balance is as old as time itself and it applies to the code in the same way. You can’t have a user-friendly layer of abstraction without paying a price somewhere.
Furthermore, an understanding of what is happening “under the hood” will always help a developer do their job that little bit better so this kind of thing is very useful.
Thanks guys.
I’ve played with 2.0.7 some more and got the number of queries to resolve and render a blank page down to ten which is pretty damned good.
This includes four queries verifying the logged-in user to view the debug page, so normal users would only require six queries.
The same in 2.0.6 requires thirty.
The Juice Maker!
I like it. Nick “The Juice Maker” Dunn.
Thanks for the article and the improvements!
This article is great. Thanks Nick.