msgbartop
很傻很天真的程序员
msgbarbottom

01 八 10 windows下安装使用sphinx

1.直接在http://www.sphinxsearch.com/downloads.html找到最新的windows版本,我这里下的是 Win32 release binaries with MySQL support,下载后解压在D:\sphinx目录下;

2.在D:\sphinx\下新建一个data目录用来存放索引文件,一个log目录方日志文件,复制D:\sphinx\sphinx.conf.in到D:\sphinx\bin\sphinx.conf(注意修改文件名);

3.修改D:\sphinx\bin\sphinx.conf,我这里列出需要修改的几个:

type = mysql # 数据源,我这里是mysql
sql_host = localhost # 数据库服务器
sql_user = root # 数据库用户名
sql_pass = ” # 数据库密码
sql_db = test # 数据库
sql_port = 3306 # 数据库端口

sql_query_pre = SET NAMES utf8 # 去掉此行前面的注释,如果你的数据库是uft8编码的

index test1
{
# 放索引的目录
path = D:/sphinx/data/
# 编码
charset_type = utf-8
# 指定utf-8的编码表
charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
# 简单分词,只支持0和1,如果要搜索中文,请指定为1
ngram_len = 1
# 需要分词的字符,如果要搜索中文,去掉前面的注释
ngram_chars = U+3000..U+2FA1F
}

# index test1stemmed : test1
# {
# path = @CONFDIR@/data/test1stemmed
# morphology = stem_en
# }

# 如果没有分布式索引,注释掉下面的内容

# index dist1
# {
# ‘distributed’ index type MUST be specified
# type = distributed

# local index to be searched
# there can be many local indexes configured
# local = test1
# local = test1stemmed

# remote agent
# multiple remote agents may be specified
# syntax is ‘hostname:port:index1,[index2[,...]]
# agent = localhost:3313:remote1
# agent = localhost:3314:remote2,remote3

# remote agent connection timeout, milliseconds
# optional, default is 1000 ms, ie. 1 sec
# agent_connect_timeout = 1000

# remote agent query timeout, milliseconds
# optional, default is 3000 ms, ie. 3 sec
# agent_query_timeout = 3000
# }

# 搜索服务需要修改的部分
searchd
{
# 日志
log = D:/sphinx/log/searchd.log

# PID file, searchd process ID file name
pid_file = D:/sphinx/log/searchd.pid

# windows下启动searchd服务一定要注释掉这个
# seamless_rotate = 1
}

4.导入测试数据

C:\Program Files\MySQL\MySQL Server 5.0\bin>mysql -uroot test

5.建立索引

D:\sphinx\bin>indexer.exe –all
Sphinx 0.9.8-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file ‘./sphinx.conf’…
indexing index ‘test1′…
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.101 sec, 1916.30 bytes/sec, 39.72 docs/sec

D:\sphinx\bin>

6.搜索’test’试试

D:\sphinx\bin>search.exe test
Sphinx 0.9.8-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file ‘./sphinx.conf’…
index ‘test1′: query ‘test ‘: returned 3 matches of 3 total in 0.000 sec

displaying matches:
1. document=1, weight=2, group_id=1, date_added=Wed Nov 26 14:58:59 2008
id=1
group_id=1
group_id2=5
date_added=2008-11-26 14:58:59
title=test one
content=this is my test document number one. also checking search within
phrases.
2. document=2, weight=2, group_id=1, date_added=Wed Nov 26 14:58:59 2008
id=2
group_id=1
group_id2=6
date_added=2008-11-26 14:58:59
title=test two
content=this is my test document number two
3. document=4, weight=1, group_id=2, date_added=Wed Nov 26 14:58:59 2008
id=4
group_id=2
group_id2=8
date_added=2008-11-26 14:58:59
title=doc number four
content=this is to test groups

words:
1. ‘test’: 3 documents, 5 hits
D:\sphinx\bin>

都所出来了吧。

6.测试中文搜索

修改test数据库中documents数据表,

UPDATE `test`.`documents` SET `title` = ‘测试中文’, `content` = ‘this is my test document number two,应该搜的到吧’ WHERE `documents`.`id` = 2;

重建索引:

D:\sphinx\bin>indexer.exe –all

搜索’中文’试试:

D:\sphinx\bin>search.exe 中文
Sphinx 0.9.8-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file ‘./sphinx.conf’…
index ‘test1′: query ‘中文 ‘: returned 0 matches of 0 total in 0.000 sec

words:
D:\sphinx\bin>

貌似没有搜到,这是因为windows命令行中的编码是gbk,当然搜不出来。我们可以用程序试试,在D:\sphinx\api下新建一个foo.php的文件,注意utf-8编码

require ’sphinxapi.php’;
$s = new SphinxClient();
$s->SetServer(’localhost’,3312);
$result = $s->Query(’中文’);
var_dump($result);
?>

启动Sphinx searchd服务

D:\sphinx\bin>searchd.exe
Sphinx 0.9.8-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

WARNING: forcing –console mode on Windows
using config file ‘./sphinx.conf’…
creating server socket on 0.0.0.0:3312
accepting connections

执行PHP查询:
php d:/sphinx/api/foo.php

08 七 10 Sphinx使用入门

搜索三大要素:数据来源、预处理、查询。 
在Sphinx+MySQL的架构中,MySQL主要提供了数据来源和查询接口,真正进行全文索引建立和查询的是Sphinx。 
MySQL里面存放真正的数据;Sphinx从MySQL中获取数据建立全文索引;应用程序使用相应的api与Sphinx交互以获得真正的数据(此处的api包含SQL接口、php接口,以及其他一些编程语言能够调用的接口)。 

假设test库内有表test,结构如下: 
CREATE TABLE `test` ( 
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT, 
`content` varchar(20000) DEFAULT NULL, 
`gmt_create` datetime DEFAULT NULL, 
`v` char(100) DEFAULT NULL, 
PRIMARY KEY (`id`) 
) ENGINE=InnoDB DEFAULT CHARSET=utf8 
我们需要在content字段上面做全文索引。 

根据以上信息配置Sphinx是一件相对容易的事情。在只涉及到单数据来源(MySQL)的情况下,我们只需要关注下面这几个参数: 
source src1 

# data source type. mandatory, no default value 
# known types are mysql, pgsql, mssql, xmlpipe, xmlpipe2, odbc 
type = mysql 

##################################################################### 
## SQL settings (for ‘mysql’ and ‘pgsql’ types) 
##################################################################### 

# some straightforward parameters for SQL source types 
sql_host = 192.168.*.* 
sql_user = realzyy 
sql_pass = realzyy 
sql_db = test 
sql_port = 3306 # optional, default is 3306 

# pre-query, executed before the main fetch query 
# multi-value, optional, default is empty list of queries 

sql_query_pre = SET NAMES utf8 
# sql_query_pre = SET SESSION query_cache_type=OFF 

# range query setup, query that must return min and max ID values 
# optional, default is empty 

# sql_query will need to reference $start and $end boundaries 
# if using ranged query: 

sql_query = \ 
SELECT id, gmt_create, content,v \ 
FROM test \ 
WHERE id>=$start AND id<=$end 

sql_query_range = SELECT MIN(id),MAX(id) FROM test 

# range query step 
# optional, default is 1024 

sql_range_step = 128 

# UNIX timestamp attribute declaration 
# multi-value (an arbitrary number of attributes is allowed), optional 
# similar to integer, but can also be used in date functions 

# sql_attr_timestamp = posted_ts 
# sql_attr_timestamp = last_edited_ts 
sql_attr_timestamp = gmt_create 

# string ordinal attribute declaration 
# multi-value (an arbitrary number of attributes is allowed), optional 
# sorts strings (bytewise), and stores their indexes in the sorted list 
# sorting by this attr is equivalent to sorting by the original strings 

sql_attr_str2ordinal = v 

# ranged query throttling, in milliseconds 
# optional, default is 0 which means no delay 
# enforces given delay before each query step 
sql_ranged_throttle = 0 

# document info query, ONLY for CLI search (ie. testing and debugging) 
# optional, default is empty 
# must contain $id macro and must fetch the document by that id 
sql_query_info = SELECT * FROM test WHERE id=$id 

因为这个表的字符集采用了utf8,所以还需要修改一下: 
index test1 

source = src1 
# document source(s) to index 
# multi-value, mandatory 
# document IDs must be globally unique across all sources 
source = src1 

# index files path and file name, without extension 
# mandatory, path must be writable, extensions will be auto-appended 
path = /u01/sphinx/var/data/index_for_test 

# document attribute values (docinfo) storage mode 
# optional, default is ‘extern’ 
# known values are ‘none’, ‘extern’ and ‘inline’ 
docinfo = extern 

# memory locking for cached data (.spa and .spi), to prevent swapping 
# optional, default is 0 (do not mlock) 
# requires searchd to be run from root 
mlock = 0 

# a list of morphology preprocessors to apply 
# optional, default is empty 

# builtin preprocessors are ‘none’, ‘stem_en’, ‘stem_ru’, ‘stem_enru’, 
# ‘soundex’, and ‘metaphone’; additional preprocessors available from 
# libstemmer are ‘libstemmer_XXX’, where XXX is algorithm code 
# (see libstemmer_c/libstemmer/modules.txt) 

# morphology = stem_en, stem_ru, soundex 
# morphology = libstemmer_german 
# morphology = libstemmer_sv 
morphology = none 
# minimum indexed word length 
# default is 1 (index everything) 
min_word_len = 1 

# charset encoding type 
# optional, default is ‘sbcs’ 
# known types are ‘sbcs’ (Single Byte CharSet) and ‘utf-8′ 
charset_type = utf-8 

# charset definition and case folding rules “table” 
# optional, default value depends on charset_type 

# defaults are configured to include English and Russian characters only 
# you need to change the table to include additional ones 
# this behavior MAY change in future versions 

# ‘sbcs’ default value is 
# charset_table = 0..9, A..Z->a..z, _, a..z, U+A8->U+B8, U+B8, U+C0..U+DF->U+E0..U+FF, U+E0..U+FF 

# ‘utf-8′ default value is 
charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F 

执行一下indexer创建全文索引;再执行searchd打开Sphinx的监听端口以便接收请求;最后可以用search去检测一下战果~ 
#indexer –all 
#searchd 
#search *****

08 七 10 mysql sphinx存储引擎安装

由于业务需求,需要在现有mysql中安装sphinx的存储引擎,要保证现有mysql运行的情况下完成。mysql也的确支持存储引擎的在线热插拔,下面介绍安装步骤: 

1、查看现有mysql的运行版本 
# mysqladmin  -u user -p pwd version 
… 
Server version          5.1.47-log 
Protocol version        10 
Connection              Localhost via UNIX socket 
UNIX socket             /data/mysql_db/mysql.sock 
Uptime:                 15 days 2 hours 17 min 40 sec 

2、下载mysql和sphinx 
mysql:wget ftp://ftp.ntu.edu.tw/pub/MySQL/Downloads/MySQL-5.1/mysql-5.1.47.tar.gz 
sphinx:wget http://sphinxsearch.com/downloads/sphinx-0.9.9.tar.gz 

注意:mysql源码包的版本一定要与当前运行的mysql版本一致! 

解压 
# tar -xzvf mysql-5.1.47.tar.gz 
# tar -xzvf sphinx-0.9.9.tar.gz 

3、将sphinx-0.9.9下的mysqlse目录复制到mysql目录中 
#cp -r sphinx-0.9.9/mysqlse/ mysql-5.1.47/storage/sphinx 

build 
# cd mysql-5.1.47 
# sh BUILD/autorun.sh 
#./configure 
# make 

注意:这里到make这步即可,不用install 

4、将make好的文件复制到当前运行的mysql目录中 
# cp storage/sphinx/.libs/ha_sphinx.* /usr/local/mysql/lib/mysql/plugin 
更改所有者 
# chown mysql.mysql /usr/local/mysql/lib/mysql/plugin/* 

注:我当前运行的mysql目录在/usr/local/mysql 

5、登陆mysql加载sphinx引擎模块 
#mysql -u root -p -h localhost 
# mysql> INSTALL PLUGIN sphinx SONAME ‘ha_sphinx.so’; 
检查引擎模块是否正常加载 
mysql> show engines; 
+————+———+—————————————————————-+————–+——+————+ 
| Engine     | Support | Comment                                                        | Transactions | XA   | Savepoints | 
+————+———+—————————————————————-+————–+——+————+ 
| ndbcluster | NO      | Clustered, fault-tolerant tables                               | NULL         | NULL | NULL       | 
| MRG_MYISAM | YES     | Collection of identical MyISAM tables                          | NO           | NO   | NO         | 
| BLACKHOLE  | YES     | /dev/null storage engine (anything you write to it disappears) | NO           | NO   | NO         | 
| InnoDB     | YES     | Supports transactions, row-level locking, and foreign keys     | YES          | YES  | YES        | 
| SPHINX     | YES     | Sphinx storage engine 0.9.9                                    | NO           | NO   | NO         | 
| MyISAM     | DEFAULT | Default engine as of MySQL 3.23 with great performance         | NO           | NO   | NO         | 
| CSV        | YES     | CSV storage engine                                             | NO           | NO   | NO         | 
| MEMORY     | YES     | Hash based, stored in memory, useful for temporary tables      | NO           | NO   | NO         | 
+————+———+—————————————————————-+————–+——+————+ 

安装完成!如果要卸载存储模块使用 
mysql> UNINSTALL PLUGIN sphinx;

Analytics Plugin created by Web Hosting

普人特福的博客cnzz&51la for wordpress,cnzz for wordpress,51la for wordpress