Search

What sort of thing are you after, Lewis?

Well, I just need a way for my nonprofit friends to be able to search the member section they have by name. This is probably overkill, and as you may recall I utilized the Publish Filtering extension but started to have problems when using it in combination with the Bi-Link field.

Anyways, I don't want to sidetrack the thread.

How can I highlight the search terms in the results page? For example, if my search terms are "testing an event", the results page should highlight the matches in "this a test an event". Is this possible, and if so, how do I achieve it?

I have a problem with Search Index when using Cyrillic characters.

When I submit the search form, the search returns zero results because the search string is URL-encoded.

When I write the search params directly in the URL, like ?search=???-??-??????????????-???????, everything works fine.

My question is: how can I URL-decode the query string automatically?

Tested only on Symphony version 2.1.2.
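To illustrate what is happening to the keyword, here is a minimal sketch in Python (not the extension's own code, which is PHP, where the built-in urldecode() does the equivalent job). The keyword "тест" is just a made-up example; the point is that the encoded form has to be decoded back before it is compared against the index.

```python
from urllib.parse import quote_plus, unquote_plus

# Hypothetical Cyrillic keyword, as a user would type it into the form.
keyword = "тест"

# What a browser sends for a UTF-8 form submission: percent-encoded bytes.
encoded = quote_plus(keyword)

# Decoding restores the original string, which is what the search
# query needs to see instead of the percent-encoded text.
decoded = unquote_plus(encoded)
```

If the search runs against the encoded string, no indexed word can match it, which would explain the zero results.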

Another option is to use REGEXP. I used this recently, with a modification like this. Results are good and the client is pleased.

Here's another example of a REGEXP-based search query I've been playing with.

SELECT *,
(
    IF(LOCATE('nick', LOWER(`data`)) > 0, 1, 0) +
    IF(LOCATE('xslt', LOWER(`data`)) > 0, 1, 0)
) AS number_of_terms_found,
(
    (length(`data`) - length(replace(LOWER(`data`),LOWER('nick'),''))) / length('nick') +
    (length(`data`) - length(replace(LOWER(`data`),LOWER('xslt'),''))) / length('xslt')
) AS number_of_terms_matched
FROM sym_search_index
WHERE `data` REGEXP '[[:<:]]nick[[:>:]]' OR `data` REGEXP '[[:<:]]xslt[[:>:]]'
ORDER BY (number_of_terms_matched * number_of_terms_found) DESC

The algorithm is quite simple:

  • count how many of the terms are found in each row (e.g. if nick is found, number_of_terms_found is 1, but if nick and xslt are found, number_of_terms_found is 2)
  • count the number of times each of these terms is matched in each row (by removing the term from the column value and dividing the drop in length by the length of the term)
  • multiply these to sort, as a function of "relevance"

This works well. It lets you do an OR search (nick or xslt), but because we multiply the total number of matches by how many of the terms are found, a column containing both nick and xslt a few times will generally score higher than a column mentioning only xslt many times. That's correct behaviour in my opinion.
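For anyone who finds the SQL hard to read, the same scoring can be sketched in Python. This is only an illustration of the algorithm described above, not code from the extension; the function names and sample rows are made up.

```python
import re

def score_row(data, terms):
    """Relevance of one row: terms found multiplied by total matches."""
    text = data.lower()
    # number_of_terms_found: how many of the terms appear at least once.
    found = sum(1 for t in terms if t.lower() in text)
    # number_of_terms_matched: occurrences of each term, measured by how
    # much shorter the text gets when the term is removed.
    matched = sum(
        (len(text) - len(text.replace(t.lower(), ""))) / len(t)
        for t in terms
    )
    return found * matched

def search(rows, terms):
    """Keep rows where any term matches as a whole word, best score first."""
    patterns = [
        re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms
    ]
    hits = [r for r in rows if any(p.search(r) for p in patterns)]
    return sorted(hits, key=lambda r: score_row(r, terms), reverse=True)

# Hypothetical index rows.
rows = [
    "xslt xslt xslt xslt",
    "nick likes xslt, nick writes xslt",
    "nothing relevant here",
]
results = search(rows, ["nick", "xslt"])
```

Here the second row mentions both terms twice (2 terms found × 4 matches), so it outranks the first row, which mentions xslt four times (1 term found × 4 matches).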

Running the above on relatively large data sets has a slight performance cost, but not a massive one: 7,000 rows searched in 0.5s, 20,000 rows searched in 1.1s.

My plan is for the Search Index extension to provide a sort of modular plugin system, so that the developer can choose which query they want to use. Some might want the performance of boolean MATCH... AGAINST and not be worried about the length limitations, whereas others might want to use a LIKE query, or a REGEXP query, or something entirely different.

Each has its pros and cons, and I'm not convinced that choosing a single query will satisfy everyone.

I've got an interesting one.

I'm using DB Sync as well as Search Index on my development server. It triggered this error on submitting the search words from an arbitrary page:

Fatal error: Class 'Administration' not found in extensions\db_sync\lib\class.logquery.php on line 27

I think it tried to log some queries that were fired by Search Index. It's not a problem on Stage and Production because DB Sync is deactivated but I found it interesting enough to mention it.

How do I display the total number of results, and a "no results found" message?

Take a look at the README. The number of results is in the pagination element of the data source XML:

<search keywords="foo+bar" sort="score" direction="desc">
    <pagination total-entries="5" total-pages="1" entries-per-page="20" current-page="1" />
    ...
</search>
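In a Symphony page this check would normally live in your XSLT template, but the logic is easy to show in Python. This is a minimal sketch, assuming the XML looks like the README example above (with the elided entries left out); the function name and message wording are made up.

```python
import xml.etree.ElementTree as ET

# Sample data source output, based on the README example.
SAMPLE = """<search keywords="foo+bar" sort="score" direction="desc">
    <pagination total-entries="5" total-pages="1" entries-per-page="20" current-page="1" />
</search>"""

def result_summary(xml_text):
    """Return a message based on the pagination element's total-entries."""
    root = ET.fromstring(xml_text)
    pagination = root.find("pagination")
    # Treat a missing pagination element the same as zero results.
    total = int(pagination.get("total-entries", "0")) if pagination is not None else 0
    if total == 0:
        return "No results found"
    noun = "result" if total == 1 else "results"
    return f"{total} {noun} found"

message = result_summary(SAMPLE)
```

The same test in XSLT would compare the total-entries attribute against zero and output one message or the other.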

In case anyone is thinking of using this extension, you should hold off until I release the next version. It's coming soon, but it's not yet complete. I'm adding:

  • indexing of each word in an entry into normalised tables, for future algorithm improvements (if I ever work out the complex SQL!)
  • an alternative to the MATCH... AGAINST fulltext query, thereby working around the 4-letter minimum limitation. I'm using LIKE with my own sorting algorithm which provides a fairly good relevance score too. A config option switches between them.
  • an option to also search by word stems (e.g. searching for "filtering" would search for "filter" or "filtering")
  • suggested alternative spellings for keywords. It isn't perfect, but it uses MySQL's soundex() and PHP's levenshtein() functions
  • an option to automatically build entry XML rather than just providing IDs as output parameters. This can be expensive (all fields are included in the result) but useful for some setups
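The spelling-suggestion idea in the list above can be sketched with a plain edit-distance function. This is not the extension's implementation: it is a rough Python illustration that leaves out the soundex step entirely, and the function names, word list and distance threshold are all assumptions.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def suggest(keyword, indexed_words, max_distance=2):
    """Return the closest indexed word, or None if nothing is close enough."""
    keyword = keyword.lower()
    words = [w.lower() for w in indexed_words]
    if keyword in words:
        return None  # spelled as indexed, nothing to suggest
    best = None
    for word in words:
        distance = levenshtein(keyword, word)
        if distance <= max_distance and (best is None or distance < best[0]):
            best = (distance, word)
    return best[1] if best else None

# Hypothetical indexed words.
suggestion = suggest("symfony", ["symphony", "search", "index"])
```

A phonetic pass such as soundex would typically narrow the candidate list first, so the edit distance only has to be computed for words that sound alike.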

It's likely that this release will not be backwards compatible with 0.6.x, so upgrading will require some playing around. Considering I'm not at 1.0 yet this is to be expected. I'll try and package some upgrade instructions if requested.

Search Index updated to version 0.7.1 on 7th of April 2011

This is a major update bringing the following functionality:

  • support for fulltext (as before) as well as LIKE and REGEXP matching
  • a "did you mean" algorithm to suggest alternative spellings
  • word stemming
  • build full entry XML in the data source without the need for chaining

It also fixes a ton of bugs and makes things more stable.

I'm having a bit of an issue with 0.7.1 and 0.7.3: neither creates the sym_fields_search_index table, and Symphony throws an error after I add the field to a section. The field seems to be successfully associated, however, so I am locked out of editing the section again until I manually create a bogus table with id and field_id so I can remove the field from the section.

What other information do you need, Nick?

So, on a whim (and mostly out of deadline desperation), I went ahead and changed line 30 of extension.driver.php to create sym_fields_search_index instead of sym_fields_search_index_filter, and everything seems to be working like normal. Is this correct?

You're quite right, 0.7.3 has a legacy bug whereby the table name is incorrect. Will roll out an update shortly.

Search Index updated to version 0.7.4 on 19th of April 2011

  • fixes the issue reported above

Sorry if this has been covered elsewhere, but I couldn't find anything on it. Also it might relate more to Symphony in general than the Search Index extension or my general inexperience with web server config/character encoding.

I'm just starting off trying to implement a site-wide search which handles 7 (and counting) indexed sections.

Any idea why the URL parameters string might be being encoded more than once and how to avoid that? This results in commas returning as %252C rather than %2C which ends up borking the "sections" parameter for Search Index. I've tried passing them through as straight commas, HTML entity codes and url-encoded characters. I've also tried explicitly setting the enctype attribute of the form element itself. None of this appears to be having any effect, so I've figured it's either something to do with Symphony or the host configuration/htaccess.

Any ideas?

I'm running Symphony 2.2 on a Linux cPanel hosting solution (Apache 2.0.63).

Could you post an example of your relevant URL, XML, XSLT and resultant HTML? I can't replicate here, but I might not be doing the same thing as you are.

Apologies Nick, I'm now thinking a little clearer (mid-afternoon frustration had set in when I posted the original issue). I was expressing the form action for the search box with an explicit url - "{$root}/search" - rather than the more concise and less-exploitable "/search/" as suggested in the readme. I'm assuming the double url-encoding was due to sanitising by the server. My bad.
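For reference, the %252C symptom is what you get whenever an already-encoded value is encoded a second time: the % of %2C itself becomes %25. A small Python sketch shows it; the section handles here are made up.

```python
from urllib.parse import quote, unquote

# Hypothetical comma-delimited section handles.
sections = "articles,events"

# First encoding: the comma becomes %2C.
once = quote(sections, safe="")

# Second encoding: the "%" of "%2C" becomes "%25", giving "%252C".
twice = quote(once, safe="")

# A single decode of the double-encoded value only gets back to "%2C",
# so the receiving code never sees a real comma.
decoded_once = unquote(twice)
```

So if the sections parameter arrives containing %2C after it has been decoded, something upstream has encoded the URL twice.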

Now to actually build the search results, thanks for another excellent extension!

Is it possible to search label fields as well?

Nick, this is a fantastic extension. Very powerful IMO, but I'm having some difficulties that I need to pick your brains about (uurrghh, zombies, brains!)

I can't seem to get the multiple section search working and I'm at a loss on how to do it. Single section is working using the search field and filtering on that. Brilliant! I've created my indexes, created a page too and attached the search index datasource to it, yet it always tells me <error>Invalid search sections</error>. What am I missing? I'm using it as default as possible so I'm not breaking anything, but I can't get it to work.

Also, a quick question about the search index suggestions datasource... What is it for and how do I use it? Can you update the readme to explain? Or explain here and I'll do it for you...

Thanks in advance!

Also, is there any way to get it to use page params rather than url params? Or is that not a good idea? (just asking, no real reason).

Invalid search sections

Are you passing sections in the querystring along with the keywords? The default is:

?keywords=foo&sections=section1,section2,section3

However if you want to omit these from your URLs then you can add this string to the default-sections key in the config. If that still doesn't work, take a look inside data-sources/data.search.php and just before the foreach on line 94, see what $param_sections contains. It should contain a comma-delimited string of section handles.
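As a rough illustration of what the data source expects to end up with, here is a Python sketch that reads keywords and sections from a querystring and falls back to a default list when sections is missing. It is not the extension's PHP code; the function name and the default_sections argument (standing in for the default-sections config key) are assumptions.

```python
from urllib.parse import parse_qs

def parse_search_query(query, default_sections=""):
    """Extract the keywords and a list of section handles from a querystring."""
    params = parse_qs(query)
    keywords = params.get("keywords", [""])[0]
    # Fall back to the configured default when no sections are passed.
    raw_sections = params.get("sections", [default_sections])[0]
    sections = [s.strip() for s in raw_sections.split(",") if s.strip()]
    return keywords, sections

keywords, sections = parse_search_query(
    "keywords=foo&sections=section1,section2,section3"
)
```

An empty list here corresponds to the case where neither the URL nor the config supplies any section handles.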

the search index suggestions datasource... What is it for and how do I use it?

It would let you build an AJAX auto-complete for a search input box. It accepts two querystring params: keyword and sort. The XML will return words starting with your keyword string. The words it searches are the indexed words from your entries... so if the user does select a word from the auto-suggest they are usually guaranteed a result. The returned words are sorted alphabetically by default, but sort=frequency will sort them by frequency instead (more popular words first).
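The behaviour described above (prefix match, alphabetical by default, optionally by frequency) can be sketched like this in Python. It only illustrates the idea; the function name and the sample word list are made up, and the real data source returns XML rather than a list.

```python
from collections import Counter

def suggest_words(indexed_words, keyword, sort="alphabetical"):
    """Return indexed words starting with keyword, sorted as requested."""
    prefix = keyword.lower()
    counts = Counter(w.lower() for w in indexed_words)
    matches = [w for w in counts if w.startswith(prefix)]
    if sort == "frequency":
        # Most frequent first; ties fall back to alphabetical order.
        return sorted(matches, key=lambda w: (-counts[w], w))
    return sorted(matches)

# Hypothetical indexed words, one item per occurrence in the entries.
words = ["filter", "filtering", "field", "filter", "xslt", "filter", "field"]
alphabetical = suggest_words(words, "fi")
by_frequency = suggest_words(words, "fi", sort="frequency")
```

With these words, the default order is field, filter, filtering, while sort="frequency" puts filter first because it occurs most often.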

You will want to attach it to a page, then format the output to what your JavaScript expects. Some jQuery plugins want XML, others JSON, others plain text.

I have no intention of providing any further implementation for this (beyond a README description) as each auto-complete/auto-suggest plugin works slightly differently, so I don't want to prescribe anything more.
