How To Install and Set Up Sphinx on Ubuntu 16.04


In this article, we will learn how to install and set up Sphinx on Ubuntu 16.04. Sphinx is an open source search engine that supports full-text search and performs searches over very large data sets efficiently. The data can come from many kinds of sources (for example, SQL databases or plain text files).

Features of Sphinx

  • Advanced indexing and good querying tools.

  • High indexing and search performance.

  • Advanced post-processing of results.

  • Easily scalable, with advanced search features.

  • Can be integrated with SQL and XML data sources.

  • Scales to huge data sets and thousands of queries.

Prerequisites

Before we begin, we need a few prerequisites.

  • An Ubuntu 16.04 machine with a non-root user that has sudo permissions.

  • MySQL installed on the machine.

Installing Sphinx on the Machine

We can install Sphinx directly from Ubuntu's native package repository using apt-get. Below is the command to install Sphinx.

$ sudo apt-get install sphinxsearch
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
libmysqlclient20 libstemmer0d
The following NEW packages will be installed:
libmysqlclient20 libstemmer0d sphinxsearch
0 upgraded, 3 newly installed, 0 to remove and 92 not upgraded.
Need to get 2,608 kB of archives.
After this operation, 20.5 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://in.archive.ubuntu.com/ubuntu xenial/universe amd64 libstemmer0d amd64 0+svn585-1 [62.1 kB]
Get:2 http://in.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libmysqlclient20 amd64 5.7.15-0ubuntu0.16.04.1 [809 kB]
Get:3 http://in.archive.ubuntu.com/ubuntu xenial/universe amd64 sphinxsearch amd64 2.2.9-1build1 [1,737 kB]
Fetched 2,608 kB in 2s (986 kB/s)
Selecting previously unselected package libstemmer0d:amd64.
(Reading database ... 117542 files and directories currently installed.)
Preparing to unpack .../libstemmer0d_0+svn585-1_amd64.deb ...
Unpacking libstemmer0d:amd64 (0+svn585-1) ...
Selecting previously unselected package libmysqlclient20:amd64.
Preparing to unpack .../libmysqlclient20_5.7.15-0ubuntu0.16.04.1_amd64.deb ...
Unpacking libmysqlclient20:amd64 (5.7.15-0ubuntu0.16.04.1) ...
Selecting previously unselected package sphinxsearch.
Preparing to unpack .../sphinxsearch_2.2.9-1build1_amd64.deb ...
Unpacking sphinxsearch (2.2.9-1build1) ...
Processing triggers for libc-bin (2.23-0ubuntu3) ...
Processing triggers for ureadahead (0.100.0-19) ...
Processing triggers for systemd (229-4ubuntu4) ...
Setting up libstemmer0d:amd64 (0+svn585-1) ...
Setting up libmysqlclient20:amd64 (5.7.15-0ubuntu0.16.04.1) ...
Setting up sphinxsearch (2.2.9-1build1) ...
Adding system user `sphinxsearch' (UID 119) ...
Adding new group `sphinxsearch' (GID 125) ...
Adding new user `sphinxsearch' (UID 119) with group `sphinxsearch' ...
Not creating home directory `/var/run/sphinxsearch'.
Processing triggers for libc-bin (2.23-0ubuntu3) ...
Processing triggers for ureadahead (0.100.0-19) ...
Processing triggers for systemd (229-4ubuntu4) ...
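
Before moving on, we can confirm that the package put its command-line tools on the PATH. The following is a minimal sketch, not part of the package itself: the helper name check_tools is hypothetical, and it assumes the package provides the indexer and searchd programs used later in this article.

```shell
#!/bin/sh
# check_tools is a hypothetical helper, not something shipped with Sphinx.
# It prints "<name>: found" or "<name>: missing" for each command name given
# and returns non-zero if at least one of them is missing.
check_tools() {
    rc=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: missing"
            rc=1
        fi
    done
    return "$rc"
}

# Assumption: the sphinxsearch package installs these two programs.
check_tools indexer searchd || echo "Sphinx does not appear to be fully installed"
```

If either tool is reported as missing, rerunning the apt-get command above is the first thing to try.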

Creating a Test Database for Sphinx

Next, we create a test database from the sample data that ships with the package, so that we can test Sphinx searching in later steps.

Let us log in to MySQL, create the test database and import the sample data.

$ mysql -u root -p
mysql> create database test;
Query OK, 1 row affected (0.01 sec)
mysql> SOURCE /etc/sphinxsearch/example.sql;
Query OK, 0 rows affected, 1 warning (0.01 sec)
Query OK, 0 rows affected (0.03 sec)
Query OK, 4 rows affected (0.01 sec)
Records: 4 Duplicates: 0 Warnings: 0
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 10 rows affected (0.01 sec)
Records: 10 Duplicates: 0 Warnings: 0
mysql> quit

Configuring Sphinx for Searching

Sphinx is configured through three main blocks (source, index and searchd), which we need to edit to suit our environment. They are defined in the configuration file /etc/sphinxsearch/sphinx.conf. The package ships a sample at /etc/sphinxsearch/sphinx.conf.sample, so we first copy the sample to sphinx.conf in the /etc/sphinxsearch folder and then edit the copy.

$ sudo cp /etc/sphinxsearch/sphinx.conf.sample /etc/sphinxsearch/sphinx.conf
$ sudo vi /etc/sphinxsearch/sphinx.conf

After editing, the configuration file should contain the following blocks.

Source Block in sphinx.conf

source src1
{
   type = mysql
   # SQL settings (for 'mysql' and 'pgsql' types)
   sql_host = localhost
   sql_user = root
   sql_pass = ubuntu
   sql_db = test
   sql_port = 3306 # optional, default is 3306
   sql_query = \
   SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
   FROM documents
   sql_attr_uint = group_id
   sql_attr_timestamp = date_added
}

Index Block in sphinx.conf

index test
{
   source = src1
   path = /var/lib/sphinxsearch/data/test
   docinfo = extern
}

Searchd Block in sphinx.conf

searchd
{
   listen = 9312:sphinx #SphinxAPI port
   listen = 9306:mysql41 #SphinxQL port
   log = /var/log/sphinxsearch/searchd.log
   query_log = /var/log/sphinxsearch/testquery.log
   read_timeout = 5
   max_children = 30
   pid_file = /var/run/sphinxsearch/testsearchd.pid
   seamless_rotate = 1
   preopen_indexes = 1
   unlink_old = 1
   binlog_path = /var/lib/sphinxsearch/datatest
}
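
Since all three blocks must be present, a quick sanity check on the edited file can catch a block that was deleted by accident. The sketch below is only an illustration: check_conf_blocks and SPHINX_CONF are hypothetical names, and the check looks for block headers by text matching only. It does not parse or validate the configuration the way Sphinx itself does.

```shell
#!/bin/sh
# check_conf_blocks is a hypothetical helper. For each of the three block
# types it prints "<block>: ok" or "<block>: missing", and it returns
# non-zero if any block header is absent from the file given as $1.
check_conf_blocks() {
    conf_file=$1
    rc=0
    for block in source index searchd; do
        # Lines containing "=" are settings (e.g. "source = src1" inside an
        # index block), so they are dropped before looking for a header line
        # that starts with the block keyword.
        if grep -v '=' "$conf_file" 2>/dev/null |
            grep -Eq "^[[:space:]]*${block}([[:space:]{]|\$)"; then
            echo "$block: ok"
        else
            echo "$block: missing"
            rc=1
        fi
    done
    return "$rc"
}

# Path taken from this article; the check is skipped if the file is unreadable.
SPHINX_CONF=/etc/sphinxsearch/sphinx.conf
if [ -r "$SPHINX_CONF" ]; then
    check_conf_blocks "$SPHINX_CONF" || echo "sphinx.conf is missing a required block"
fi
```

Commented-out lines are ignored because a header is only recognised at the start of a line.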

Once the configuration has been edited, we need to build the Sphinx index.

Managing the Indexes on Sphinx

Here, we build the index using the configuration file that we edited in the earlier steps.

$ sudo indexer --all
Sphinx 2.2.9-id64-release (rel22-r5006)
Copyright (c) 2001-2015, Andrew Aksyonoff
Copyright (c) 2008-2015, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/etc/sphinxsearch/sphinx.conf'...
indexing index 'test'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.007 sec, 24319 bytes/sec, 504.03 docs/sec
total 4 reads, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
total 12 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg

In a production environment, the index needs to be kept up to date, so we create a cron job for this.

$ crontab -e

Add the following line to the end of the file.

# Edit this file to introduce tasks to be run by cron.
#
# Each task to run has to be defined through a single line
# indicating with different fields when the task will be run
# and what command to run for the task
#
# To define the time you can provide concrete values for
# minute (m), hour (h), day of month (dom), month (mon),
# and day of week (dow) or use '*' in these fields (for 'any').#
# Notice that tasks will be started based on the cron's system
# daemon's notion of time and timezones.
#
# Output of the crontab jobs (including errors) is sent through
# email to the user the crontab file belongs to (unless redirected).
#
# For example, you can run a backup of all your user accounts
# at 5 a.m every week with:
# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
#
# For more information see the manual pages of crontab(5) and cron(8)
#
# m h dom mon dow command
@hourly /usr/bin/indexer --rotate --config /etc/sphinxsearch/sphinx.conf --all
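
The same entry can also be added without opening an editor. The sketch below is one possible approach under stated assumptions: CRON_LINE and with_cron_line are hypothetical names. The function is a plain text filter that reads a crontab on standard input and appends the line only if it is not already there, so running it twice does not create a duplicate entry.

```shell
#!/bin/sh
# Entry from this article: rebuild all indexes every hour and rotate them
# into the running daemon.
CRON_LINE='@hourly /usr/bin/indexer --rotate --config /etc/sphinxsearch/sphinx.conf --all'

# with_cron_line is a hypothetical helper. It copies the crontab text from
# stdin to stdout and appends the line given as $1 unless an identical line
# was already present.
with_cron_line() {
    awk -v line="$1" '
        { print }
        $0 == line { seen = 1 }
        END { if (!seen) print line }
    '
}
```

To apply it to the current user's crontab, the existing entries can be piped through the filter and back into cron: crontab -l 2>/dev/null | with_cron_line "$CRON_LINE" | crontab -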

Starting the Sphinx Services

Now that the index has been built, we need to enable the Sphinx daemon. By default the daemon is not started, so we edit the file /etc/default/sphinxsearch and set START=yes.

$ sudo vi /etc/default/sphinxsearch
#
# Settings for the sphinxsearch searchd daemon
# Please read /usr/share/doc/sphinxsearch/README.Debian for details.
#
# Should sphinxsearch run automatically on startup? (default: no)
# Before doing this you might want to modify /etc/sphinxsearch/sphinx.conf
# so that it works for you.
START=yes
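
For scripted setups, the START setting can be changed without an editor. This is a minimal sketch, assuming the file uses a plain START=value line as shown above. The function name enable_sphinx_start is hypothetical, and it must be run by a user who can write to the file (root, for /etc/default/sphinxsearch).

```shell
#!/bin/sh
# enable_sphinx_start is a hypothetical helper. It sets START=yes in the
# file given as $1: an existing START=... line is rewritten, otherwise a
# new START=yes line is appended.
enable_sphinx_start() {
    file=$1
    if grep -q '^START=' "$file"; then
        tmp=$(mktemp) || return 1
        # Write the edited copy to a temporary file, then copy it back so
        # the original file keeps its owner and permissions.
        sed 's/^START=.*/START=yes/' "$file" > "$tmp" && cat "$tmp" > "$file"
        rc=$?
        rm -f "$tmp"
        return "$rc"
    else
        echo 'START=yes' >> "$file"
    fi
}
```

In a script started with sudo, the call would be enable_sphinx_start /etc/default/sphinxsearch.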

Below is the command to start the Sphinx daemon.

$ sudo systemctl restart sphinxsearch.service

Once the sphinxsearch service has restarted, we check its status using the command below.

$ sudo systemctl status sphinxsearch.service
sphinxsearch.service - LSB: Fast standalone full-text SQL search engine
Loaded: loaded (/etc/init.d/sphinxsearch; bad; vendor preset: enabled)
Active: active (exited) since Mon 2016-09-19 13:00:20 IST; 1h 10min ago
Docs: man:systemd-sysv-generator(8)
Tasks: 0 (limit: 512)
Memory: 0B
CPU: 0
Sep 19 13:00:20 ubuntu-16 systemd[1]: Starting LSB: Fast standalone full-text SQL search engine...
Sep 19 13:00:20 ubuntu-16 sphinxsearch[7804]: To enable sphinxsearch, edit /etc/default/sphinxsearch and set START=ye
Sep 19 13:00:20 ubuntu-16 systemd[1]: Started LSB: Fast standalone full-text SQL search engine.
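
In addition to the service status, we can check whether searchd is actually listening on the two ports defined in the searchd block (9312 and 9306). The following is a sketch under assumptions: port_listening is a hypothetical helper, and it expects the column layout printed by ss -ltn, where the fourth column is the local address and port.

```shell
#!/bin/sh
# port_listening is a hypothetical helper. It reads `ss -ltn` output on
# stdin and succeeds if some local address ends in the port given as $1.
port_listening() {
    awk -v port="$1" '
        NR > 1 {                        # skip the header line
            n = split($4, parts, ":")   # column 4 is Local Address:Port
            if (parts[n] == port) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Ports taken from the searchd block in this article.
if command -v ss >/dev/null 2>&1; then
    for port in 9312 9306; do
        if ss -ltn | port_listening "$port"; then
            echo "port $port: listening"
        else
            echo "port $port: not listening"
        fi
    done
fi
```

If a port is reported as not listening, the START setting in /etc/default/sphinxsearch and the searchd log are the first places to look.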

Testing the Sphinx Search

Now we connect to SphinxQL on port 9306 using the MySQL client.

$ mysql -h0 -P9306
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 2.2.9-id64-release (rel22-r5006)
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>

Search for the word "test" in the index

mysql> SELECT * FROM test WHERE MATCH('test'); SHOW META;
+------+----------+------------+
| id   | group_id | date_added |
+------+----------+------------+
|    1 |        1 | 1474272578 |
|    2 |        1 | 1474272578 |
|    4 |        2 | 1474272578 |
+------+----------+------------+
3 rows in set (0.00 sec)
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total         | 3     |
| total_found   | 3     |
| time          | 0.000 |
| keyword[0]    | test  |
| docs[0]       | 3     |
| hits[0]       | 5     |
+---------------+-------+
6 rows in set (0.00 sec)
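
The same search can be run from a shell script instead of the interactive prompt. The sketch below only builds the statement text. The function name sphinxql_match is hypothetical, and it deliberately accepts a single plain keyword (letters, digits and underscores) so that no quoting or escaping is needed.

```shell
#!/bin/sh
# sphinxql_match is a hypothetical helper. It prints a SphinxQL statement
# that searches index $1 for the single keyword $2, followed by SHOW META.
# Anything other than letters, digits and underscores is rejected.
sphinxql_match() {
    case $1 in
        ''|*[!A-Za-z0-9_]*) echo "invalid index name" >&2; return 1 ;;
    esac
    case $2 in
        ''|*[!A-Za-z0-9_]*) echo "invalid search keyword" >&2; return 1 ;;
    esac
    printf "SELECT * FROM %s WHERE MATCH('%s'); SHOW META;\n" "$1" "$2"
}
```

With searchd running, the statement can be passed to the MySQL client through its -e option: mysql -h0 -P9306 -e "$(sphinxql_match test test)"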

With this setup and configuration, Sphinx serves as a powerful and efficient search engine for large data sets. Sphinx can handle billions of documents and terabytes of data, and can execute thousands of search queries per second.

raja
Published on 20-Jan-2020 10:27:28