#854: this PR provides an ik_max_word_char analyzer, which is effectively ik_max_word with splitWord2Char set to true.
Problem description
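When ik_smart/ik_max_word tokenizes text and a word is matched, that word is not split further into single characters. For example, ik_max_word tokenizes “唐诗三百首” as: ["唐诗三百首", "唐诗三百", "唐诗", "三百", "首"]
The problem is that a query for the single character “诗” will not match this document.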
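To make the miss concrete, here is a minimal sketch that models a term query as an exact lookup against the indexed tokens. This is a simplification of how a search engine matches terms, and `term_matches` is a hypothetical helper, not part of IK or Elasticsearch; the token list is the one quoted above.

```python
# Tokens quoted in the issue for ik_max_word on "唐诗三百首".
INDEXED_TOKENS = ["唐诗三百首", "唐诗三百", "唐诗", "三百", "首"]


def term_matches(query_term, indexed_tokens):
    """Simplified term lookup: the query must equal one indexed token."""
    return query_term in set(indexed_tokens)


if __name__ == "__main__":
    for term in ["唐诗", "首", "诗"]:
        print(term, term_matches(term, INDEXED_TOKENS))
```

“唐诗” and “首” are found because they appear as tokens; “诗” is not, because it only occurs inside longer tokens.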
Preferred solution
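Add a splitWord2Char attribute to ik_smart/ik_max_word. When it is true, matched words are also split into single characters. It defaults to false, so the existing behavior is unchanged.
For ik_max_word, “唐诗三百首” would then be tokenized as: ["唐诗三百首", "唐诗三百", "唐诗", "三百", "唐", "诗", "三", "百", "首"]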
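The intended effect of splitWord2Char can be sketched as a post-processing step over the existing token list. This is only an illustration of the proposed output, not how IK would implement it internally: `split_word_to_char` is a hypothetical function, it ignores token positions and offsets, and it emits repeated characters once per occurrence.

```python
def split_word_to_char(text, tokens, split_word_2_char=False):
    """Return tokens, optionally followed by every single character of text.

    With the flag off, the token list is returned unchanged, mirroring the
    proposed default of false.
    """
    if not split_word_2_char:
        return list(tokens)
    # Keep the multi-character words, then add each character in text order.
    words = [t for t in tokens if len(t) > 1]
    chars = [ch for ch in text if not ch.isspace()]
    return words + chars


if __name__ == "__main__":
    text = "唐诗三百首"
    ik_max_word_tokens = ["唐诗三百首", "唐诗三百", "唐诗", "三百", "首"]
    print(split_word_to_char(text, ik_max_word_tokens, split_word_2_char=True))
```

For the example in this issue, the sketch yields the nine tokens listed above, so a query for “诗” would now find a matching token.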
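Add a new ik_char analyzer that splits text into single characters. “唐诗三百首” would be tokenized as: ["唐", "诗", "三", "百", "首"]
The main purpose of this analyzer is to handle wide characters that are encoded as surrogate pairs. ES itself cannot handle them, whereas the ik analyzer currently can, and it also has good support for Chinese word segmentation.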
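The surrogate-pair concern can be illustrated with a small sketch: a character outside the Basic Multilingual Plane takes two UTF-16 code units, so splitting by code unit would cut it in half, while splitting by code point keeps it whole. The functions `utf16_units` and `split_chars` are hypothetical names for this illustration, the sample character U+20BB7 is my own choice, and the code says nothing about how ik_char would be implemented.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def split_chars(text):
    """Split text into single characters by code point, skipping whitespace."""
    return [ch for ch in text if not ch.isspace()]


if __name__ == "__main__":
    sample = "\U00020BB7野家"  # first character lies outside the BMP
    print(len(utf16_units(sample)))  # number of UTF-16 code units
    print(split_chars(sample))       # one token per code point
    print(split_chars("唐诗三百首"))
```

Python strings iterate by code point, so `split_chars` returns three tokens for the sample even though it occupies four UTF-16 code units; a splitter that works on code units would have to pair the surrogates explicitly.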