support multiple etymologies #893

Open
wants to merge 55 commits into
base: master
55 commits
ec62312
reformat code
qicz Jul 7, 2021
cf705e2
optimize code
qicz Jul 8, 2021
64c43a7
crlf to lf
qicz Jul 8, 2021
fd920fc
update config to yml
qicz Jul 9, 2021
a843449
update grant
qicz Jul 9, 2021
8d39443
update yml read logic
qicz Jul 9, 2021
f730a79
optimize code
qicz Jul 10, 2021
0d42c2d
optimize configuration
qicz Jul 10, 2021
df31678
optimize configuration properties
qicz Jul 10, 2021
ed59f32
refactor configuration properties
qicz Jul 11, 2021
3fc5545
refactor configuration properties
qicz Jul 11, 2021
497adf5
rename reload dict
qicz Jul 11, 2021
e61efef
add mysql and redis config
qicz Jul 11, 2021
5a62761
add remote dictionary logic
qicz Jul 11, 2021
222c606
export properties
qicz Jul 11, 2021
a2f698e
optimize dictionary & helper
qicz Jul 11, 2021
bc019df
refactor remote dictionary
qicz Jul 11, 2021
cffb603
fix remote dictionary inital logic
qicz Jul 11, 2021
8a9569a
typo
qicz Jul 11, 2021
2f42e53
optimize dictionary
qicz Jul 11, 2021
9ac1906
mysql remote dictionary implementation
qicz Jul 12, 2021
cca6af1
redis remote dictionary implementation
qicz Jul 12, 2021
c6f7843
fix http schema remote dictionary bug & optimize redis remote dictionary
qicz Jul 12, 2021
baee3e6
update readme
qicz Jul 12, 2021
a64a303
update readme
qicz Jul 12, 2021
b04baad
refactor dictionary logic
qicz Jul 12, 2021
4879120
typo
qicz Jul 12, 2021
b26cd88
optimize configuration
qicz Jul 12, 2021
922e77d
refactor remote dictionary schema logic
qicz Jul 12, 2021
f141420
support custom domain and etymology
qicz Jul 13, 2021
b3f364d
refactor dictionary logic
qicz Jul 13, 2021
4135faa
update readme
qicz Jul 13, 2021
3833fab
typo
qicz Jul 13, 2021
dc612cf
format readme
qicz Jul 13, 2021
6d7187c
fix default-domain logic
qicz Jul 13, 2021
2d6a813
check etymology
qicz Jul 13, 2021
7717197
format log
qicz Jul 13, 2021
8a5b208
refactor mysql remote dictionary logic
qicz Jul 14, 2021
334c670
link starter
qicz Jul 14, 2021
60d5812
import redip
qicz Jul 15, 2021
b4fc23d
update log
qicz Jul 15, 2021
0a080d1
using released redip
qicz Jul 15, 2021
78ac658
using AssertKit
qicz Jul 15, 2021
03e94df
update redip
qicz Jul 16, 2021
fd42a35
optimize dictionary reload logic
qicz Jul 16, 2021
42cd120
update testing
qicz Jul 16, 2021
de55714
testing & using redip 1.0.2, redis remote dictionary using zset store…
qicz Jul 16, 2021
e7961da
using enableMonitor function variable
qicz Jul 16, 2021
c00362f
update test logging
qicz Jul 16, 2021
a0b8d17
update redip1.0.3: add remote dictionary shutdown hook, close the res…
qicz Jul 19, 2021
bb4b9b2
adaptive ik xml configuration
qicz Jul 22, 2021
ad6a830
optimize import
qicz Jul 22, 2021
c87f744
update readme
qicz Jul 22, 2021
8c51480
fix readme bug
qicz Jul 22, 2021
bae22f7
support redis cluster
qicz Aug 24, 2021
20 changes: 10 additions & 10 deletions .gitignore
@@ -1,10 +1,10 @@
-/data
-/work
-/logs
-/.idea
-/target
-/out
-.DS_Store
-*.iml
-\.*
-!.travis.yml
+/data
+/work
+/logs
+/.idea
+/target
+/out
+.DS_Store
+*.iml
+\.*
+!.travis.yml
147 changes: 146 additions & 1 deletion README.md
@@ -5,6 +5,151 @@ The IK Analysis plugin integrates Lucene IK analyzer (http://code.google.com/p/i

Analyzer: `ik_smart` , `ik_max_word` , Tokenizer: `ik_smart` , `ik_max_word`

### 2021.7.22 Adapted to the XML configuration, by Qicz

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- configure your own extension dictionaries here -->
    <entry key="ext_dict"></entry>
    <!-- configure your own extension stop-word dictionaries here -->
    <entry key="ext_stopwords"></entry>
    <!-- configure remote extension dictionaries here -->
    <!-- <entry key="remote_ext_dict">words_location</entry> -->
    <entry key="remote_ext_dict"></entry>
    <!-- configure remote extension stop-word dictionaries here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
    <!-- redis configuration -->
    <entry key="redis.host"></entry>
    <entry key="redis.port">6379</entry>
    <entry key="redis.username"></entry>
    <entry key="redis.password"></entry>
    <entry key="redis.database">0</entry>
    <!-- mysql configuration: <![CDATA[ put the url here ]]-->
    <entry key="mysql.url"><![CDATA[ ]]></entry>
    <entry key="mysql.username">root</entry>
    <entry key="mysql.password">dbadmin</entry>
    <!-- refresh configuration -->
    <entry key="refresh.delay">10</entry>
    <entry key="refresh.period">60</entry>
</properties>
```
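The file above uses the Java properties XML layout (`<entry key="...">value</entry>`), so any XML parser can read it for a quick sanity check. The following is a minimal Python sketch, not the plugin's own loader; `load_properties` and the sample host and URL values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the configuration; the host and URL values are sample data.
SAMPLE_CFG = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <comment>IK Analyzer</comment>
  <entry key="ext_dict"></entry>
  <entry key="redis.host">localhost</entry>
  <entry key="redis.port">6379</entry>
  <entry key="mysql.url"><![CDATA[ jdbc:mysql://127.0.0.1/ik-db?useSSL=false&serverTimezone=GMT%2B8 ]]></entry>
  <entry key="refresh.delay">10</entry>
  <entry key="refresh.period">60</entry>
</properties>
"""


def load_properties(xml_text):
    """Hypothetical helper: return {key: value} for every <entry> element."""
    # Passing bytes lets the parser honor the encoding declaration itself.
    root = ET.fromstring(xml_text.encode("utf-8"))
    # Empty entries have text None; CDATA values keep their padding, so strip it.
    return {entry.get("key"): (entry.text or "").strip() for entry in root.iter("entry")}


props = load_properties(SAMPLE_CFG)
print(props["redis.port"], props["mysql.url"])
```

Wrapping the JDBC URL in CDATA matters because the URL contains `&`, which would otherwise have to be escaped as `&amp;` inside XML.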

### 2021.7.12 update, by Qicz

- Switched to a yml configuration file;

```yaml
dict: # extension dictionary configuration
  local: # local extension dictionaries
    main: # extension files for the local main dictionary
      - extra_main.dic
      - extra_single_word.dic
      - extra_single_word_full.dic
      - extra_single_word_low_freq.dic
    stop: # extension files for the local stop-word dictionary
      - extra_stopword.dic
  remote: # remote extension dictionaries
    http:
      # http service address
      # main-words path: ${base}/es-dict/main-words/{domain}
      # stop-words path: ${base}/es-dict/stop-words/{domain}
      base: http://localhost
    redis:
      # main-words key: es-ik-words:{domain}:main-words
      # stop-words key: es-ik-words:{domain}:stop-words
      host: localhost
      port: 6379
      database: 0
      username:
      password:
    mysql:
      url: jdbc:mysql://127.0.0.1/ik-db?useSSL=false&serverTimezone=GMT%2B8
      username: root
      password: dbadmin
    refresh: # refresh settings
      delay: 10 # initial delay, in seconds
      period: 60 # refresh period, in seconds
```
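The comments in the `http` and `redis` sections give the path and key templates used per domain. As a rough illustration of how those templates expand, here is a small Python sketch; the function names are hypothetical and only the template strings come from the configuration comments.

```python
# The two word lists named in the configuration comments.
WORD_KINDS = ("main-words", "stop-words")


def http_dict_url(base, domain, kind):
    """Expand ${base}/es-dict/{kind}/{domain} (hypothetical helper)."""
    if kind not in WORD_KINDS:
        raise ValueError(f"unknown word kind: {kind!r}")
    # Drop a trailing slash on base so the result has no double slash.
    return f"{base.rstrip('/')}/es-dict/{kind}/{domain}"


def redis_dict_key(domain, kind):
    """Expand es-ik-words:{domain}:{kind} (hypothetical helper)."""
    if kind not in WORD_KINDS:
        raise ValueError(f"unknown word kind: {kind!r}")
    return f"es-ik-words:{domain}:{kind}"


print(http_dict_url("http://localhost", "order", "main-words"))
print(redis_dict_key("order", "stop-words"))
```

Because the domain is part of every path and key, each business domain gets its own pair of word lists on the same backend.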

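- Tuned, optimized, and refactored the Dictionary implementation;
- Support specifying a remote dynamic word source (etymology) per business domain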

```bash
PUT es-ik-index
{
  "settings": {
    "analysis.analyzer": {
      "ik_smart": {
        "type": "ik_smart",
        "enable_remote_dict": true,
        "domain": "order", # business domain
        "etymology": "redis" # word source; one of redis, http, mysql; defaults to http (for compatibility with the xml configuration)
      }
    }
  },
  "mappings": {
    "properties": {
      "field": {
        "type": "text",
        "analyzer": "ik_smart"
      }
    }
  }
}
```
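When the request is sent from a script rather than typed by hand, the body has to be strict JSON, which has no comment syntax, so the `#` annotations are left out. The sketch below builds the same body in Python; `ik_index_body` is a hypothetical helper, and the allowed `etymology` values are taken from the comment in the example.

```python
import json

# Word sources listed in the README example; http is the stated default.
ETYMOLOGIES = ("http", "redis", "mysql")


def ik_index_body(domain, etymology="http", analyzer="ik_smart", field="field"):
    """Hypothetical helper: build the index body for one business domain."""
    if etymology not in ETYMOLOGIES:
        raise ValueError(f"etymology must be one of {ETYMOLOGIES}, got {etymology!r}")
    return {
        "settings": {
            "analysis.analyzer": {
                analyzer: {
                    "type": analyzer,
                    "enable_remote_dict": True,
                    "domain": domain,
                    "etymology": etymology,
                }
            }
        },
        "mappings": {"properties": {field: {"type": "text", "analyzer": analyzer}}},
    }


body = ik_index_body("order", "redis")
print(json.dumps(body, indent=2))
```

Validating `etymology` before sending keeps a typo from reaching the cluster, where it would only surface when the analyzer is created.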



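- Fixed bugs in, and refactored, the HTTP-based extension word source;
- Extended RemoteDictionary with configurable MySQL- and Redis-based ways of updating the extension dictionaries;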

```sql
/*
@author Qicz

Date: 13/07/2021 10:18:19
*/

SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;

-- ----------------------------
-- Table structure for ik_dict_state
-- ----------------------------
DROP TABLE IF EXISTS `ik_dict_state`;
CREATE TABLE `ik_dict_state` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `domain` varchar(100) NOT NULL COMMENT 'owning domain',
  `state` varchar(10) NOT NULL COMMENT 'newly: has updates; non-newly: no updates',
  PRIMARY KEY (`id`) USING BTREE,
  UNIQUE KEY `domain` (`domain`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- ----------------------------
-- Table structure for ik_words
-- ----------------------------
DROP TABLE IF EXISTS `ik_words`;
CREATE TABLE `ik_words` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `word` varchar(200) NOT NULL,
  `word_type` tinyint(4) unsigned NOT NULL COMMENT 'word type: 1 main dictionary, 2 stop dictionary',
  `domain` varchar(100) NOT NULL COMMENT 'owning domain',
  `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'creation time',
  PRIMARY KEY (`id`),
  KEY `domain` (`domain`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

SET FOREIGN_KEY_CHECKS = 1;
```
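The two tables suggest a flag-and-poll pattern: a writer adds rows to `ik_words` and marks the domain as `newly` in `ik_dict_state`, and a reader reloads the words only when it sees that flag. The Python sketch below illustrates that pattern under stated assumptions: SQLite stands in for MySQL, the DDL is simplified, and `add_word` and `poll_words` are hypothetical names, not the plugin's actual logic.

```python
import sqlite3

MAIN, STOP = 1, 2  # word_type values from the column comment

# In-memory SQLite stand-in for the MySQL schema (types simplified).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ik_dict_state (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    domain TEXT NOT NULL UNIQUE,
    state TEXT NOT NULL
);
CREATE TABLE ik_words (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    word TEXT NOT NULL,
    word_type INTEGER NOT NULL,
    domain TEXT NOT NULL,
    create_time TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
""")


def add_word(conn, domain, word, word_type):
    """Writer side: store the word and flag the domain as updated."""
    conn.execute(
        "INSERT INTO ik_words (word, word_type, domain) VALUES (?, ?, ?)",
        (word, word_type, domain),
    )
    # The unique domain column means REPLACE keeps one state row per domain.
    conn.execute(
        "INSERT OR REPLACE INTO ik_dict_state (domain, state) VALUES (?, 'newly')",
        (domain,),
    )


def poll_words(conn, domain):
    """Reader side: return (main, stop) lists if flagged, else None."""
    row = conn.execute(
        "SELECT state FROM ik_dict_state WHERE domain = ?", (domain,)
    ).fetchone()
    if row is None or row[0] != "newly":
        return None
    rows = conn.execute(
        "SELECT word, word_type FROM ik_words WHERE domain = ? ORDER BY id", (domain,)
    ).fetchall()
    # Clear the flag so the next poll is a cheap no-op until new words arrive.
    conn.execute("UPDATE ik_dict_state SET state = 'non-newly' WHERE domain = ?", (domain,))
    return ([w for w, t in rows if t == MAIN], [w for w, t in rows if t == STOP])


add_word(conn, "order", "backorder", MAIN)
add_word(conn, "order", "the", STOP)
first = poll_words(conn, "order")
second = poll_words(conn, "order")
print(first, second)
```

Polling the small state table first avoids rereading every word on each refresh period.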

> Add `permission java.security.AllPermission;` to the grant block in `jre/lib/security/java.policy`.

- [TODO] Write a complete user manual
- [SpringBoot-based starter for writing extension dictionaries to MySQL and Redis](https://github.com/OpeningO/redip)

Versions
--------

@@ -224,7 +369,7 @@ mvn compile
mvn package
```

-Copy and unzip the release file #{project_path}/elasticsearch-analysis-ik/target/releases/elasticsearch-analysis-ik-*.zip into your elasticsearch plugin directory, e.g. plugins/ik
+Copy and unzip the release file #{project_path}/elasticsearch-analysis-ik/target/releases/elasticsearch-analysis-ik-*.zip into your elasticsearch plugin directory, e.g. plugins/analysis-ik
Restart elasticsearch

3. Tokenization test fails
13 changes: 0 additions & 13 deletions config/IKAnalyzer.cfg.xml

This file was deleted.

28 changes: 28 additions & 0 deletions config/analysis-ik/IKAnalyzer.cfg.xml
@@ -0,0 +1,28 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- configure your own extension dictionaries here -->
    <entry key="ext_dict"></entry>
    <!-- configure your own extension stop-word dictionaries here -->
    <entry key="ext_stopwords"></entry>
    <!-- configure remote extension dictionaries here -->
    <!-- <entry key="remote_ext_dict">words_location</entry> -->
    <!-- configure remote extension stop-word dictionaries here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
    <!-- redis configuration -->
    <entry key="redis.host"></entry>
    <entry key="redis.port">6379</entry>
    <entry key="redis.username"></entry>
    <entry key="redis.password"></entry>
    <entry key="redis.database">0</entry>
    <!-- put the cluster nodes inside <![CDATA[ here, as host:port ]]> -->
    <entry key="redis.cluster.nodes"><![CDATA[ ]]></entry>
    <!-- mysql configuration: <![CDATA[ put the url here ]]-->
    <entry key="mysql.url"><![CDATA[ ]]></entry>
    <entry key="mysql.username">root</entry>
    <entry key="mysql.password">dbadmin</entry>
    <!-- refresh configuration -->
    <entry key="refresh.delay">10</entry>
    <entry key="refresh.period">60</entry>
</properties>
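The `redis.cluster.nodes` entry takes its nodes as `host:port` inside the CDATA section. The file does not say how several nodes are separated, so the Python sketch below assumes a comma-separated list; both that separator and the `parse_cluster_nodes` name are assumptions for illustration, not the plugin's documented behavior.

```python
def parse_cluster_nodes(raw):
    """Hypothetical helper: turn 'host:port,host:port' into [(host, port), ...].

    Assumes nodes are comma-separated, which the configuration does not state.
    """
    nodes = []
    for item in raw.split(","):
        item = item.strip()  # CDATA values are padded with spaces
        if not item:
            continue
        # Split on the last colon so the port is always the final component.
        host, sep, port = item.rpartition(":")
        if not sep or not host:
            raise ValueError(f"expected host:port, got {item!r}")
        nodes.append((host, int(port)))
    return nodes


print(parse_cluster_nodes(" 10.0.0.1:7000, 10.0.0.2:7001 "))
```

An empty CDATA section yields an empty list, which a caller could treat as "cluster mode not configured" and fall back to `redis.host` and `redis.port`.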