
To continue an earlier discussion without spamming the wrong thread: the idea is to set up a hosted ElasticSearch service for Symphony.

Nick: I am toying with the idea of setting up a VPS node somewhere just running ElasticSearch (proxied through Apache/nginx, to provide authentication) and charging a nominal fee to cover the hosting costs. If a VPS costs $20/m then it only needs a few of us to make it worthwhile. The only question is whether people would be happy using a service like this.

Who would be interested?

Put me down for it. I'm already planning to do this and would prefer that we pool our resources!

I'd be interested if I have a future project that requires search in Symphony.

I was going to set up ElasticSearch on a Linode in the same data centre (UK) as my Hiawatha/PHP hosting VPS which I use for a variety of client sites, meaning that the two VPSes can communicate directly through Linode's internal network, which should be quick and doesn't count against bandwidth quotas.

I'm thinking using a hosted ElasticSearch service would add a certain delay to searches, but would this generally be significant? I guess it's just a single HTTP request and response, so probably neither here nor there in the grand scheme of things for my kind of small/medium site usage.

I'm thinking using a hosted ElasticSearch service would add a certain delay to searches, but would this generally be significant?

I have the same concern. Given that other similar services exist, I'd say no. It would also be interesting to know how much data has to travel back and forth.

The idea looks cool!

Put me down for it. I'm already planning to do this and would prefer that we pool our resources!

Cool!

I'm thinking using a hosted ElasticSearch service would add a certain delay to searches, but would this generally be significant?

Sure, it would cause a certain delay, but I don't think it's significant. For instance, the ping between my VPS and a Linode VPS is about 11ms. So hosting the ES instance on Linode instead of my own VPS would cause an extra delay of that 11ms. I think that's very reasonable.

I already have an Amazon EC2 instance set up for the Symphony community, which I was planning to use to make testing the core easier (it has PHP 5.2 and PHP 5.3). I can easily add a node for ElasticSearch and maintain it if you guys are interested?

Who'd have thought back in the days of dial-up that we'd be sitting here debating whether 11ms is reasonable!

I can easily add a node for ElasticSearch and maintain it if you guys are interested?

Something tells me it will need more effort than that, namely a web UI:

  • do we want to charge, to cover hosting costs? We could easily integrate with a basic recurring payment provider
  • do we want to allow anyone to register and create an index automatically, or should it be done manually, vetting each new customer?
  • ElasticSearch isn't easy to secure, since it's not really meant to be public-facing on the Internet. I usually block its port and proxy through Apache, adding HTTP Basic authentication along the way. This is something we should consider, along with SSL to prevent the username and password being sent as plain text
  • multiple customers share the same server instance, meaning we can all write to each other's indexes (there's no permissions model). That is why services like IndexDepot and Bonsai use long hashes as index names, to stop people writing to each other's indexes
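As a minimal sketch of the "long hash as index name" idea, assuming a hypothetical `new_index_name` helper and a made-up `sym-` prefix (neither comes from IndexDepot, Bonsai or the Symphony extension), Python's standard `secrets` module can produce an unguessable, lowercase name (ES index names must be lowercase):

```python
import secrets


def new_index_name(prefix: str = "sym") -> str:
    """Return a hard-to-guess index name for a new customer.

    Hypothetical naming scheme: a short lowercase prefix plus 32 hex
    characters (128 bits from the OS's cryptographically secure RNG).
    """
    return f"{prefix}-{secrets.token_hex(16)}"


print(new_index_name())  # e.g. sym-<32 random hex characters>
```

Note that this only makes names hard to guess; it is obscurity, not a permissions model, which is why the proxy and authentication points above still matter.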

A few months ago I actually started planning a service just like this, aimed at Symphony users, and bought http://symphonysearch.net. The idea is that users install the ElasticSearch extension, paste in the index name (hash), and they're done.

Whatever service exists, I'd want a guarantee that it's going to be stable and stay around indefinitely, so it needs commitment from its owner.

@Nick: I have no problem making this commitment.

I was kind of hoping that ElasticSearch would have the notion of "cores" like Solr does, in addition to its more flexible REST architecture for creating and maintaining indexes. Cores allow a cleaner separation between different users.

paste in the index name (hash) and you're done.

How do you then manage the indexing of the search results? Do you have some sort of cron job that updates the index with the entries from Symphony?

there's no permissions model

You could use Basic authentication for the XPUT requests. It's not that hard to make this work in Apache. For search requests it's also possible to use IP restrictions instead of authentication. Especially in combination with HTTPS, the authentication challenge is an overhead you might not want to have.
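To make the suggested split concrete, here is a minimal Python sketch of the decision such a proxy would make: reads and searches from trusted IPs pass without a challenge, and everything else (e.g. XPUT writes) needs Basic auth. The trusted network, the credentials and all function names are hypothetical; in practice this logic would live in the Apache or nginx config, not in application code.

```python
import base64
import hmac
import ipaddress

# Hypothetical policy values, for illustration only.
TRUSTED_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]
USERNAME = b"symphony"
PASSWORD = b"change-me"


def basic_auth_ok(header):
    """Validate an 'Authorization: Basic <base64>' header value."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[len("Basic "):], validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    user, sep, password = decoded.partition(b":")
    if not sep:
        return False
    # Compare both parts in constant time so timing doesn't leak credentials.
    user_ok = hmac.compare_digest(user, USERNAME)
    password_ok = hmac.compare_digest(password, PASSWORD)
    return user_ok and password_ok


def is_read(method, path):
    """GET/HEAD requests, plus POST searches to a path ending in /_search."""
    method = method.upper()
    if method in ("GET", "HEAD"):
        return True
    return method == "POST" and path.split("?", 1)[0].rstrip("/").endswith("/_search")


def is_allowed(method, path, client_ip, auth_header=None):
    """Trusted IPs may read without a challenge; everything else needs Basic auth."""
    ip = ipaddress.ip_address(client_ip)
    if is_read(method, path) and any(ip in net for net in TRUSTED_NETWORKS):
        return True
    return basic_auth_ok(auth_header)
```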

I haven't used Solr, so I can't really comment. ES uses "index" to mean a collection of "types", which in turn are collections of "documents". This maps nicely onto Symphony: an index is a site, a type is a section and a document is an entry. So each user of the service could be assigned a new index in ES (which is created with a POST).
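A minimal sketch of that mapping, assuming a hypothetical `document_url` helper and an ES node at `http://localhost:9200` (neither comes from the extension). It follows the index/type/document layout of the ES versions discussed here; later ES releases removed mapping types, so a current setup would need a different layout.

```python
from urllib.parse import quote


def document_url(base_url, index, section, entry_id):
    """Build the ES URL for one Symphony entry: /<index>/<type>/<id>.

    site -> index, section -> type, entry -> document.
    """
    parts = (index, section, str(entry_id))
    return base_url.rstrip("/") + "/" + "/".join(quote(p, safe="") for p in parts)


print(document_url("http://localhost:9200", "mysite", "articles", 42))
# http://localhost:9200/mysite/articles/42
```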

How do you then manage the indexing of the search results? Do you have some sort of cron job that updates the index with the entries from Symphony?

Give the extension a try. It posts entries to ES in three ways:

  • when you submit a new or updated entry through an event (delegates)
  • when you submit a new or updated entry in the backend (delegates)
  • there's an update page in the backend which makes batch requests, useful if you need to re-index everything. No crons.
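For a sense of what a batch re-index involves, here is a minimal Python sketch that builds bodies for Elasticsearch's `_bulk` endpoint in chunks. It is not the extension's code: the `(entry_id, fields)` input shape and the function names are hypothetical. Each body would be POSTed to the index-level bulk endpoint (e.g. `/<index>/_bulk`).

```python
import json
from itertools import islice


def batches(iterable, size):
    """Yield lists of at most `size` items, so a full re-index is sent in chunks."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def bulk_body(index, section, entries):
    """Build a newline-delimited body for Elasticsearch's _bulk endpoint.

    `entries` is an iterable of (entry_id, fields) pairs, a hypothetical
    stand-in for however entries are read out of Symphony.
    """
    lines = []
    for entry_id, fields in entries:
        # One action/metadata line, followed by the document source itself.
        action = {"index": {"_index": index, "_type": section, "_id": str(entry_id)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(fields))
    # The bulk API expects every line, including the last, to end in "\n".
    return "".join(line + "\n" for line in lines)
```

Chunking keeps each request small, so one oversized section doesn't turn into a single huge request.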

You could use Basic authentication for the XPUT requests

I'd rather use authentication for everything, not just PUT. For example, an instance's configuration and an index's configuration are both exposed via the API. I wouldn't want someone seeing how I've set the index up, synonyms, etc.

The other hosted ES services rely primarily on the obfuscation of index names and nothing else. IndexDepot also allows you to add basic authentication if you wish, but generally a long index name is sufficient.

The problem is that you need to block server admin API stuff (creating new indexes), but allow operations on indexes themselves.

This is what I've been doing on my servers:

  • add an iptables rule to drop incoming traffic on port 9200 (ES)
  • use a rewrite rule to proxy only to index URLs, thereby exposing just the API for individual indexes, not the API for creating new indexes or viewing the server config
  • use IP restriction, falling back to Basic auth, on a separate ProxyPass through to the API for managing the instance

What this gives me is a locked down ES instance whereby one can only create new indexes (and see server config) with authentication.
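The rewrite rule in the second bullet is Apache configuration, which isn't quoted here. As a language-neutral illustration of the decision such a rule might encode, here is a hypothetical Python predicate (the exact rule is an assumption, not the actual config): only paths of the form `/<index>/...` on a concrete index name are proxied, while root-level APIs and bare `/<index>` (used to create or delete an index) are refused.

```python
def is_index_level(path):
    """Return True if a request path targets an existing index's own API.

    Hypothetical rule: allow /<index>/<anything>... where <index> is a
    concrete name; refuse "/", bare "/<index>", and root-level endpoints.
    """
    segments = [s for s in path.split("?", 1)[0].split("/") if s]
    if len(segments) < 2:
        return False  # "/" (server info) or "/<index>" (create/delete index)
    index = segments[0]
    # Names starting with "_" are server-level endpoints (_cluster, _nodes,
    # _all, ...); "*" and "," would let one request span several indexes.
    return not (index.startswith("_") or "*" in index or "," in index)
```

Anything this rejects would only be reachable through the separate, IP-restricted or authenticated ProxyPass described in the third bullet.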

Give the extension a try.

I will!

The other hosted ES services

If there are ES hosted services, why would we want to set up a Symphony-specific host? Is there some extra stuff we can offer that the other services can't?

This is what I've been doing on my servers [...] What this gives me is a locked down ES instance

So if I understand correctly, you have multiple ES instances running, each locked down for its specific use. Setting up the same lockdown on a consolidated server is more work and less secure. What is the reason to consider this (other than saving costs)?

If there are ES hosted services, why would we want to set up a Symphony-specific host

Cost. The cheapest seems to be $15/m for a single index, which seems rather high to me.
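A rough comparison using only the figures quoted in this thread ($20/m for a shared VPS, $15/m for the cheapest hosted index), with N as the number of subscribers sharing the VPS and ignoring admin time:

```latex
c(N) = \frac{\$20}{N}\ \text{per month}, \qquad
c(N) < \$15 \iff N > \tfrac{4}{3} \iff N \ge 2, \qquad
c(2) = \$10,\quad c(4) = \$5
```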

This is what I've been doing on my servers...

Oh sorry, that was ambiguous. I have several ES servers for different projects/clients/networks (i.e. clients wanting to use their own servers). But I have been using this same configuration on each instance I set up, to lock it down. Each of these instances can, and do, serve multiple indexes.

What I envisage for this service is a single ES instance locked down as I describe. Each new customer gets a new index on the instance. The lockdown simply prevents the "root level" ES API from being exposed without authentication.

Maybe we should have a chat on IM or similar to discuss how this might work?

Maybe we should have a chat on IM or similar to discuss how this might work?

Sure, let's start with e-mail (I believe you already have my address) and see how it goes from there. Maybe we can do a PoC and see if we can pull this off somehow :)

This looks very interesting - would definitely be interested to pool in :)

Especially as I start to put more into search features. And since most of the sites I work on are either small or handled by an external provider who doesn't like us messing around with the setup, something like this would be great.

I'd be interested in this as well probably, depending on cost.

probably, depending on cost

What do you think it is worth?

Hmmm, hard question, isn't it? It depends a bit on whether it's a site I'm building for myself or for my company. My first thought was to use this on BoxedUpFun, my personal site about board games. The search capability really needs to be beefed up there. However, we make almost no money on it right now (but also don't have a ton of traffic yet), whereas for my company the cost isn't as big a deal, as long as we know about it up front on a project.

I guess I'd say somewhere around 5-10 bucks a month? That's pretty similar to what I pay for shared hosting right now...

This looks very interesting - would definitely be interested to pool in :)

I agree. I would love to know what the latest is on this.

The latest news is that May, June and July were busy months :) @Nickdunn: as soon as you're back from your road trip we should look into this again!

I just set up ES on my Digital Ocean droplet and it was painless. Their SSDs and RAM configurations have enough capacity to support ES.


