Hi! I've made a page which is a kind of "Interactive Mineral Determinator".
The idea is as follows.
The user selects a number of checkboxes on the page, presses the "Show" button, and a list of minerals whose properties match the selected ones is displayed.

To get the idea you can look here: http://world-of-stones.ru/min-filter/

(Just please don't select too many checkboxes! 1 to 4 is okay.)

Now, what's behind the scenes.

The 'minerals' section is where all the properties of the minerals are stored. At present, this section has fewer than 70 entries.

Each mineral (an entry in the section) has several properties, such as "streak color", "color", "cleavage" and so on.
Each property may contain several values. For example, the mineral agate may come in different colors such as white, gray, yellow and so on. In the section, all the mineral properties use the Select Box Link field type (with 'multiple selection' allowed).

On the front end these properties are presented as groups of multi-selectable checkboxes representing all the possible choices.

Inside the data source's PHP file these filters look like this:

    public $dsParamFILTERS = array(
        '237' => '{$color}',
        '238' => '{$hue}',
        '223' => '{$luster}',
        '224' => '{$streak}',
        '225' => '{$cleav}',
        '226' => '{$fract}',
        '244' => '{$misc}',
        '231' => 'equal to or less than {$dens-max}',
        '232' => 'equal to or greater than {$dens-min}',
        '229' => 'equal to or less than {$mohs-max}',
        '230' => 'equal to or greater than {$mohs-min}',
        '251' => '{$trans}',
    );

So each of these variables ($color, $hue, $luster and so on) is an array containing anywhere from 0 to 12 or more option IDs.

Here is an example of what the params can look like (I output the params to the screen while debugging):

parameters:
luster = 1335
streak = 1561 1538 1537 1468 1467 1357
cleav = 1342
fract = 1317
color = 1376 1377 1378 1379 1380 1381 1401
hue = 1387 1388 1389 1391 1392 1419 1390
trans = 1406 1407 1408 1529

When the user selects many checkboxes in each group, the SQL query takes too long to execute, up to a few minutes on a local setup. The host did not wait that long: when I checked all the boxes, it blocked my site for a few minutes and sent me the message "MySQL server has gone away".

Why is this happening? Is the resulting SQL query really that hard to execute?

I'm now investigating the queries that Symphony displays for me when I use

<form method="post" action="?profile=database-queries">

Does anybody know what should be displayed in the second column? I would assume it shows the execution duration of each query, but in my case, no matter how long a query takes, this column always contains '0.0000'.
Like this:

1   0.0000  SET character_set_connection = 'utf8', character_set_database = 'utf8', character_set_server = 'utf8'
2   0.0000  SET CHARACTER SET 'utf8'
3   0.0000  SELECT SQL_CACHE t1.name, t2.page, t2.delegate, t2.callback FROM `wos1_extensions` as t1 INNER JOIN `wos1_extensions_delegates` as t2 ON t1.id = t2.extension_id WHERE t1.status = 'enabled' ORDER BY t2.delegate, t1.name

and so on, for all 175 queries.

And here are the queries (obtained with action="?profile=slow-queries") which take about 15 seconds:

SELECT SQL_CACHE DISTINCT `e`.id, `e`.section_id, e.`author_id`, UNIX_TIMESTAMP(e.`creation_date`) AS `creation_date` FROM `wos1_entries` AS `e` LEFT JOIN `wos1_entries_data_237` AS `t237` ON (`e`.`id` = `t237`.entry_id) LEFT JOIN `wos1_entries_data_238` AS `t238` ON (`e`.`id` = `t238`.entry_id) LEFT JOIN `wos1_entries_data_223` AS `t223` ON (`e`.`id` = `t223`.entry_id) LEFT JOIN `wos1_entries_data_224` AS `t224` ON (`e`.`id` = `t224`.entry_id) LEFT JOIN `wos1_entries_data_225` AS `t225` ON (`e`.`id` = `t225`.entry_id) LEFT JOIN `wos1_entries_data_226` AS `t226` ON (`e`.`id` = `t226`.entry_id) LEFT JOIN `wos1_entries_data_231` AS `t231` ON (`e`.`id` = `t231`.entry_id) LEFT JOIN `wos1_entries_data_232` AS `t232` ON (`e`.`id` = `t232`.entry_id) LEFT JOIN `wos1_entries_data_229` AS `t229` ON (`e`.`id` = `t229`.entry_id) LEFT JOIN `wos1_entries_data_230` AS `t230` ON (`e`.`id` = `t230`.entry_id) LEFT JOIN `wos1_entries_data_251` AS `t251` ON (`e`.`id` = `t251`.entry_id) WHERE 1 AND `e`.`section_id` = '12' AND (`t237`.relation_id IN ('1377') ) AND (`t238`.relation_id IN ('1388') ) AND (`t223`.relation_id IN ('1336') ) AND (`t224`.relation_id IN ('1538') ) AND (`t225`.relation_id IN ('1343') ) AND (`t226`.relation_id IN ('1318') ) AND `t231`.`value` <= 8 AND `t232`.`value` >= 1 AND `t229`.`value` <= 10 AND `t230`.`value` >= 1 AND (`t251`.relation_id IN ('1407') ) ORDER BY ( SELECT `ed`.value FROM wos1_entries_data_40 AS `ed` WHERE entry_id = e.id ) ASC LIMIT 0, 10
    SELECT SQL_CACHE count(DISTINCT `e`.id) as `count` FROM `wos1_entries` AS `e` LEFT JOIN `wos1_entries_data_237` AS `t237` ON (`e`.`id` = `t237`.entry_id) LEFT JOIN `wos1_entries_data_238` AS `t238` ON (`e`.`id` = `t238`.entry_id) LEFT JOIN `wos1_entries_data_223` AS `t223` ON (`e`.`id` = `t223`.entry_id) LEFT JOIN `wos1_entries_data_224` AS `t224` ON (`e`.`id` = `t224`.entry_id) LEFT JOIN `wos1_entries_data_225` AS `t225` ON (`e`.`id` = `t225`.entry_id) LEFT JOIN `wos1_entries_data_226` AS `t226` ON (`e`.`id` = `t226`.entry_id) LEFT JOIN `wos1_entries_data_231` AS `t231` ON (`e`.`id` = `t231`.entry_id) LEFT JOIN `wos1_entries_data_232` AS `t232` ON (`e`.`id` = `t232`.entry_id) LEFT JOIN `wos1_entries_data_229` AS `t229` ON (`e`.`id` = `t229`.entry_id) LEFT JOIN `wos1_entries_data_230` AS `t230` ON (`e`.`id` = `t230`.entry_id) LEFT JOIN `wos1_entries_data_251` AS `t251` ON (`e`.`id` = `t251`.entry_id) WHERE `e`.`section_id` = '12' AND (`t237`.relation_id IN ('1377') ) AND (`t238`.relation_id IN ('1388') ) AND (`t223`.relation_id IN ('1336') ) AND (`t224`.relation_id IN ('1538') ) AND (`t225`.relation_id IN ('1343') ) AND (`t226`.relation_id IN ('1318') ) AND `t231`.`value` <= 8 AND `t232`.`value` >= 1 AND `t229`.`value` <= 10 AND `t230`.`value` >= 1 AND (`t251`.relation_id IN ('1407') )

Are they really so bad that they take 15 seconds on a dedicated local server with only 70 entries in the section?

I haven't fully digested your particular setup, so this is mostly a general reaction to what I've seen so far...

From the looks of the query itself, it is indeed quite complicated. The parts that are degrading performance would be:

The use of the count function count(DISTINCT e.id) can be a resource sink. Are you using the "Include a count of entries in associated sections" option?

The DS has many fields filtering by an array of values. Are the arrays of ID values coming from the output of other data sources? DS chaining can be computationally expensive.

How many elements are you including in your XML? Are there any areas where this could be minimised?

Generally I wouldn't go beyond filtering a DS by more than two different DS param output sources.

If there isn't an alternative way to restructure your content to be more streamlined, I'd suggest customising the DS so that the query is much better optimised. The queries the system automatically generates can be written much better by hand when it comes to complex DSes.

Are they really so bad that they take 15 seconds on a dedicated local server with only 70 entries in the section?

Yes. The count distinct together with all those joins is a pretty bad combination. Try running it yourself, with EXPLAIN in front of it. This will show you what the engine has to do to get you the result set. You'll be surprised by the number of temporary tables, unindexed lookups and other time-consuming steps the engine has to go through.
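It's literally just a matter of prefixing the statement. For example, trimmed down here to a single join so it fits (you would of course paste your full query):

    EXPLAIN
    SELECT DISTINCT `e`.id
    FROM `wos1_entries` AS `e`
    LEFT JOIN `wos1_entries_data_237` AS `t237` ON (`e`.id = `t237`.entry_id)
    WHERE `e`.`section_id` = '12'
      AND `t237`.relation_id IN ('1377');

Watch the 'type', 'key' and 'rows' columns in the output, and anything like 'Using temporary; Using filesort' under 'Extra'.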

From there you should be able to optimise the query, or rethink your structure. At the moment I don't have the time to think of a better way, but I'll keep it in the back of my head.

edit: Ah come on! Allen beat me to it (as always)

Thanks, both of you.

I haven't fully digested your particular setup, so this is mostly a general reaction to what I've seen so far...

I know, it's my fault. Let me try to put it another way (not sure I won't mess it up again :-)):

I have a section with 70 entries and a number of multi-select SBL fields.
I want to filter it by those SBL fields' values. The matching is to be done against user-defined arrays of values (one array of values per SBL field).

This is how I would filter the records without using SQL queries:

For each entry: the values from the user's first array are compared with the values of the first SBL field. If any of the values match, the SBL field is matched; this is done for each user-array/SBL-field pair. If any of the SBL fields doesn't match, there is no need to check the others: the entry is not included in the resulting set.
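Roughly, in PHP terms (the function and variable names here are just made up to illustrate the matching rule I mean):

    // $user_filters: array of 'field name' => array of selected option IDs
    // $entry_props:  array of 'field name' => array of option IDs stored for one entry
    function entry_matches(array $user_filters, array $entry_props) {
        foreach ($user_filters as $field => $wanted_ids) {
            $stored_ids = isset($entry_props[$field]) ? $entry_props[$field] : array();
            // at least one selected ID must be present among the entry's values
            if (count(array_intersect($wanted_ids, $stored_ids)) === 0) {
                return false; // this field failed, no need to check the rest
            }
        }
        return true; // every filtered field had at least one match
    }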

Are you using the "Include a count of entries in associated sections" option?

No, I'm not. I wonder why Symphony inserted that count function into this query?

The DS has many fields filtering by an array of values. Are the arrays of ID values coming from the output of other data sources? DS chaining can be computationally expensive.

Yes, too many, as it turned out. But this is just the beginning; I actually need to add more fields to filter on.
Each additional field, however, seems to degrade performance geometrically. The arrays of IDs come from a custom event, which takes the values from the $_POST variable. That all takes a fraction of a millisecond.

How many elements are you including in your XML? Are there any areas where this could be minimised?

Not many, indeed.

Generally I wouldn't go beyond filtering a DS by more than two different DS param output sources.

Of course, if I cut the number of fields in the filter by half, I wouldn't notice any slowdowns.
But I really need all of them, and even more.

I have a crazy idea to test here: what if I replace this one DS filtered by 12 fields with, say, 4 chained DSs filtered by 3 fields each?
Would I gain much in speed?

The queries the system automatically generates can be written much better by hand when it comes to complex DSes.

I was very surprised at how heavyweight and inefficient this query turned out to be! I'm sure that if I'd just pulled all 70 records into the XML and done all the filtering in a simple XSL for-each loop, I would get the results in a fraction of a second, at the user's expense :-). Still, I'd really like to do it using the database's resources.

Try running it yourself, with EXPLAIN in front of it.

It never occurred to me that I could play with that query outside of Symphony. From phpMyAdmin, probably? Maybe I'll try it, just for learning purposes.
Maybe not, because I believe what you said anyway.


OK. What have we got?

  • Symphony creates a very inefficient query, at least for my particular setup.
  • There is no way to simplify my sections: they already seem very simple to me.

What to try?

  • Replace the current DS, which contains so many filters, with a number of chained DSs having smaller sets of filters. Just an idea.
  • Learn how to write my own SQL queries.

Anything else?

Maybe not, because I believe what you said anyway.

It really, really is a good idea to run it (in phpMyAdmin, for instance, yes) because it shows you exactly where you can make progress. It will break the query into bits, and it will generally show where it can be improved too. This takes the guesswork out of optimising the query, which makes your life a whole lot easier.

Something else you could try (people are going to hate me for suggesting this) is creating another table with all the properties you want to filter on as columns. You can then query this table directly, without the need for all the JOIN operations. You would have to re-generate each row when an entry is saved, so you'll have to create an extension to do that for you.

This will make the entire operation quite a lot faster, but it does mean more work up front.

...creating another table with all the properties you want to filter on as columns. You can then query this table directly, without the need for all the JOIN operations. You would have to re-generate each row when an entry is saved, so you'll have to create an extension to do that for you.

Could you reword that for me, please, or maybe expand on it a little further? I can't quite get the idea. Thanks.

Essentially what Huib is saying is that the query being run in your DS is very, very complex, because it is trying to query a whole lot of tables at once using joins (many, many of them).

There are two options here: either optimise the query, which may take some fancy SQL ninja skills, or do the reverse and simplify the database tables Symphony has to query.

If you somehow managed to create a single database table that kept all the data you need, the SQL query would be much simpler to write; in fact, it would be possible to do this without any fancy JOIN clauses and go with a straight SELECT and a few simple WHERE clauses.
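For example, a purely hypothetical sketch (the table and column names are invented, and this is just one of several ways the values could be flattened):

    -- one row per mineral, one column per filterable property group
    CREATE TABLE minerals_flat (
        entry_id    INT UNSIGNED NOT NULL PRIMARY KEY,
        url_handle  VARCHAR(255) NOT NULL,
        color_ids   VARCHAR(255) NOT NULL,   -- e.g. '1376,1377,1401'
        streak_ids  VARCHAR(255) NOT NULL,
        dens_min    DECIMAL(4,2) NOT NULL,
        dens_max    DECIMAL(4,2) NOT NULL,
        KEY (dens_min), KEY (dens_max)
    );

    -- filtering then becomes a single-table query
    SELECT entry_id, url_handle
    FROM minerals_flat
    WHERE dens_min <= 8 AND dens_max >= 1
      AND FIND_IN_SET('1377', color_ids);

With a couple of indexes on the numeric columns, a query like this over roughly 70 rows is trivial for MySQL.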

The challenge with this approach, however, is finding a way to save your data NOT in the standard table structure but in a custom database table. Huib suggested that you create an extension that hooks into the action of saving an entry, thereby allowing the entry to also be saved into a different database table. It's a pretty clever way of doing it, actually. However, you will need to know some fancy extension ninja skills.
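A very rough skeleton of what Huib has in mind might look like this (the delegate names are the standard entry-save delegates, but check them and the MySQL::insert() signature against your Symphony version; the flat table and the buildRow() helper are hypothetical):

    <?php
    // extensions/flat_minerals/extension.driver.php (hypothetical extension handle)
    class extension_flat_minerals extends Extension {

        // Run our callback whenever an entry is created or edited in the backend.
        public function getSubscribedDelegates() {
            return array(
                array('page' => '/publish/new/',  'delegate' => 'EntryPostCreate', 'callback' => 'updateFlatTable'),
                array('page' => '/publish/edit/', 'delegate' => 'EntryPostEdit',   'callback' => 'updateFlatTable'),
            );
        }

        // Rebuild the denormalised row for the entry that was just saved.
        public function updateFlatTable(array $context) {
            $entry = $context['entry'];
            $row = $this->buildRow($entry); // hypothetical helper, see below
            // third argument: update on duplicate key (verify the exact
            // MySQL::insert() signature for your Symphony version)
            Symphony::Database()->insert($row, 'tbl_minerals_flat', true);
        }

        // Flatten the entry's field data into one associative array of columns.
        private function buildRow($entry) {
            return array(
                'entry_id' => $entry->get('id'),
                // ... map each field's data to its own column here,
                // e.g. a comma-separated list of relation IDs per SBL field ...
            );
        }
    }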

So where we stand is either going for SQL ninjutsu or Extension API ninjutsu. Unfortunately either way, ninjas will be involved.

Aha, I think I see what you mean.
Creating a normal database table containing all the fields, like in the good old pre-Symphony times?
An interesting idea. I just need to think about how to integrate it with Symphony.
I wonder if anybody has done something like that, or maybe there are extensions I could look at?

Hey, I've just been struck by a thought: isn't the Database Section Views extension something I could benefit from?
I should probably install it and see what it does.

As far as I know, database views won't generally help with actual performance and are mainly for ease of querying.

Edit: I just came across this from Allen:

Regarding my point about DB optimisation through index tables: for those who heard my talk and suggested MySQL views, I just want to point out that MySQL views are pseudo temporary tables whose main purpose is to allow developers to build simpler queries on highly normalised or complex table structures. They do not, however, provide any performance gains; in fact they're a bit slower, because MySQL views are built at run-time before the user's query is run against the temp table.

Simple queries that run against physical index tables offer performance gains at the cost of additional disk space and some DB update performance.

You could use the Reflection field or the Search Index / Elasticsearch extensions to build this intermediate table. It would not be as custom and specialised, but it might be just enough to speed up the queries. The Reflection field supports different query modes you can use. It would be more suitable for a full-text search, because the data is saved in a single column, but it might be worth a try.

One way to optimise would be to use a Select Box instead of an SBL. This way there are no link tables, and you won't need to model each individual attribute as its own section. You lose some flexibility, since the relationships are stored as plain text strings rather than IDs, so if you change a select box option (correcting a typo, for example), it would become de-selected for any mineral that had the original value.

You could still get a list of all the unique select box values using the Section Schemas data source.

I've run queries as complex as yours before without much issue, so I wonder if your MySQL is configured differently, or whether Symphony's column indexes are being created properly; table indexes should make these queries fast.
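If you want to verify, you can run something like this in phpMyAdmin or the MySQL console against one of the tables joined in your slow query:

    -- list the indexes MySQL actually has on one of the entry-data tables
    SHOW INDEX FROM `wos1_entries_data_237`;

For a relation field you would expect to see indexes covering entry_id and relation_id; if they are missing, the joins have to scan the table instead.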

OK, thanks. Armed with all your tips, I will now do some research and tests.

One way to optimise would be to use a Select Box instead of an SBL.

I'd really like to keep the SBLs. One of the reasons is that I want to keep Cyrillic words out of all the indexing, searching and querying as much as possible.

I'm happy to report that the problem is solved now. The results exceeded my expectations.
Here is the output of action="?profile=render-statistics":

Total Database Queries 151
Slow Queries (> 0.09s) 0
Total Time Spent on Queries 0.0290 s
Time Triggering All Events 0.0002 s
Time Running All Data Sources 0.0863 s
XML Generation Function 0.0070 s
XSLT Generation 0.0096 s
Output Creation Time 0.1185 s
Total Memory Usage 2.58 MB

And this was the worst case, when the visitor checked all 86 options in the 8 groups of checkboxes.
Such a selection forces filtering of the section by 8 SBL fields and 4 Number fields.

This is what I've replaced that huge, slow, automagically created query with (everything happens inside the custom DS):

  1. Query 1. Filters only by the 4 Number fields (they represent two "from min to max" ranges). Returns a minimal set of fields: namely, those of the 8 SBLs the visitor wants to filter on.
  2. A 'foreach' loop over the results of Query 1 performs array-intersection tests to filter on the SBL fields.
  3. Query 2. Its '->where' contains just the list of handles of the minerals that passed the filters of the first two steps. This time the 'select' lists all the fields needed to display on the page.
  4. Query 3. This fetches the related entries from the subsections. Unfortunately, I could not figure out how to make SymQL do that by itself.

Anyone interested in seeing the results can go to http://world-of-stones.ru/min-filter/ and press a couple of buttons.
Sorry, it's all in Russian. But if you press the second button, you should see some pretty pictures below in the table.
To the right of the buttons is a page-size selector; it sets the maximum number of minerals to show in the result table.

That's really great!

Would you mind pasting your DS code? I think it'll be valuable for others to follow the thread in a "before/after" format and give other users a working example to learn from.

The result is astounding and I'm glad you stuck through it!

Would you mind pasting your DS code?

Gladly.
Just one warning: I'm still not very comfortable with the terminology conventions, so some comments or variable names may be confusing for you (they are sometimes confusing to me too :-) ).
Feel free to ask; I'm ready to explain any line or make corrections to the comments.

I'm pasting only the grab function. The rest is just leftovers from the standard DS.

require_once(EXTENSIONS . '/symql/lib/class.symql.php');

public function grab(&$param_pool=NULL){

    // A list of fields to include in the final result.
    // The 'images' fields are commented out, because SymQL does not pull
    // data from subsections anyway.
    $incl = array(
        'title',
        'url-handle',
        'publish',
        'intro: formatted',
        'props: formatted',
        'diagnos: formatted',
//      'images: name',
//      'images: alt',
//      'images: image',
//      'images: description: formatted'
    );

    // List of all the SBL fields by which filtering may be needed.
    // Not all will be used - that depends on the visitor's selections.
    $lbs_props = array(
        'color-d',
        'hue',
        'luster',
        'streak',
        'cleavage',
        'fracture',
        'misc-props',
        'trans'
    );

    // User-selected filters
    $u_sbl_f = array();

    foreach($lbs_props as $prop_name){
        if(isset($_POST[$prop_name])) $u_sbl_f[$prop_name] = $_POST[$prop_name];
    }

    // If the visitor checks no checkboxes at all,
    // then we do no filtering on the SBL fields.
    $sbl_filtering = (sizeof($u_sbl_f) > 0) ? 1 : 0;

    // Query 1. Gets most of the minerals from the 'minerals' section.
    // Filtering is done only on the non-SBL (Number) fields.
    $sel1_str = "url-handle";
    if($sbl_filtering) $sel1_str .= ("," . implode(',', array_keys($u_sbl_f)));

    $query = new SymQLQuery('tmp');
    $query
        ->select($sel1_str)
        ->from('minerals')
        ->where('dens-min', 'equal to or less than ' . $_POST['dens-max'])
        ->where('dens-max', 'equal to or greater than ' . $_POST['dens-min'])
        ->where('mohs-min', 'equal to or less than ' . $_POST['mohs-max'])
        ->where('mohs-max', 'equal to or greater than ' . $_POST['mohs-min'])
        ->perPage(999); // I don't know how to turn off pagination
    $r = SymQL::run($query, SymQL::RETURN_RAW_COLUMNS);

    $hndls_toshow = array();

    $n = 0;

    if(sizeof($r['entries']) > 0){
        foreach ($r['entries'] as $min_id => $min_props){
            $match = 1;
            $n++;

            // The records fetched by the first query are simply
            // filtered here in the foreach loop.
            if($sbl_filtering){
                foreach ($u_sbl_f as $prop_name => $filter){
                    // if this property is not set for this mineral - no match
                    if( !$min_props[$prop_name] ) {
                        $match = 0;
                    }
                    else {
                        if(is_array($min_props[$prop_name]['relation_id'])) {
                            $minprop_v = $min_props[$prop_name]['relation_id'];
                        }
                        else $minprop_v = array($min_props[$prop_name]['relation_id']);

                        $prop_matched = 0;
                        // Check for array intersection.
                        // A single match of any of the values is enough.
                        foreach($filter as $fv){
                            foreach($minprop_v as $v) if($fv == $v) {$prop_matched = 1; break;}
                            if($prop_matched) break;
                        }
                        if(!$prop_matched) {$match = 0; break;}
                    }
                }
            }

            if($match) {
                $hndls_toshow[] = $min_props['url-handle']['value'];
            }
        }
    }

    // Query 2.
    // Gets all the necessary field values for the previously filtered
    // list of minerals.
    $sel = implode(',', $incl);
    if(sizeof($hndls_toshow) > 0) {
        $regexp = "regexp:" . implode('|', $hndls_toshow);
        $query = new SymQLQuery('min-filter');
        $query
            ->select($sel)
            ->from('minerals')
            ->where('url-handle', $regexp)
            ->orderby('title', 'asc')
            ->perPage($_POST['pagesize'])
            ->page(1);

        $result = SymQL::run($query);

        // Prepare a regexp for the third query.
        // We need to pull out one image per mineral. The 'name' of the image
        // matches the 'url-handle' of the mineral.
        $regexp = "regexp:";
        $n = 0;
        foreach($result->getChildren() as $entries) {
            foreach($entries->getChildrenByName('url-handle') as $urlhandles){
                $regexp .= ((($n > 0) ? "|" : "") . $urlhandles->getValue() . "$");
                $n++;
            }
        }

        // Query 3.
        // Get one image per mineral (image's 'name' == mineral's 'url-handle').
        $query = new SymQLQuery('images');
        $query
            ->select('name, alt, image')
            ->from('images')
            ->where('name', $regexp)
            ->perPage($_POST['pagesize'])
            ->page(1);

        $r2 = SymQL::run($query);

        // Just append the images after the minerals.
        // It would be better to stick each image into the corresponding
        // mineral's entry, but I was already tired by that time.
        $result->appendChild($r2);
    }

    return $result;
}

I love the SymQL extension...

I love the SymQL extension...

Yes, I'm finding it very handy, too.
Though I really missed a couple of features when designing the queries above:

  • 'where'-ing on multiple system IDs
  • pulling relevant data out of subsections

The absence of these features made me look for workarounds. I found them, but I believe they are inferior in several respects.
