
Hybrid application method of two word segmenters based on the SOLR search engine

A hybrid word segmentation technique applied in the field of search engines. It addresses the inability of Solr's built-in segmenter to segment Chinese text accurately and improves the usability of the search engine.

Status: Inactive. Publication date: 2016-10-12
INSPUR QILU SOFTWARE IND

AI Technical Summary

Problems solved by technology

Solr's built-in word segmenter cannot segment Chinese text accurately, so an external Chinese word segmentation technology must be introduced.

Method used

Multiple word segmentation plug-ins are integrated into Solr. When the index is built, the fine-grained segmentation mode of IK Analyzer is used; when the index is searched, the segmentation mode of mmseg4j is used.

Examples


Embodiment 1

[0061] The hybrid application method of two word segmenters based on the SOLR search engine takes advantage of Solr's ability to integrate multiple word segmentation plug-ins: the index is built with the fine-grained segmentation mode of IK Analyzer, and the index is searched with the segmentation mode of mmseg4j.
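The split between index-time and query-time segmentation described in [0061] can be illustrated with a small, self-contained Python sketch. It is only an illustration under stated assumptions: `fine_grained` and `max_match` are hypothetical toy stand-ins that operate on a five-word dictionary, not the IK Analyzer or mmseg4j implementations, and `max_match` is a plain forward maximum match, which is simpler than what mmseg4j does.

```python
# Toy illustration of hybrid segmentation: fine-grained at index time,
# longest-match at query time. These functions are hypothetical stand-ins,
# NOT the real IK Analyzer or mmseg4j algorithms.
from collections import defaultdict

DICTIONARY = {"中华", "人民", "共和国", "中华人民共和国", "华人"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def fine_grained(text):
    """Index-time analysis: emit every dictionary word found at any offset."""
    terms = []
    for start in range(len(text)):
        last_end = min(len(text), start + MAX_WORD_LEN)
        for end in range(start + 1, last_end + 1):
            if text[start:end] in DICTIONARY:
                terms.append(text[start:end])
    return terms


def max_match(text):
    """Query-time analysis: greedy forward longest match, one term per position."""
    terms, pos = [], 0
    while pos < len(text):
        for length in range(min(MAX_WORD_LEN, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            # Fall back to a single character when no dictionary word matches.
            if candidate in DICTIONARY or length == 1:
                terms.append(candidate)
                pos += length
                break
    return terms


def build_index(docs):
    """Build an inverted index (term -> set of document ids)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in fine_grained(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every query term."""
    postings = [index.get(term, set()) for term in max_match(query)]
    return set.intersection(*postings) if postings else set()


DOCS = {1: "中华人民共和国", 2: "人民"}
INDEX = build_index(DOCS)
```

In this sketch, indexing document 1 stores the long word together with its sub-words, so a query such as `search(INDEX, "共和国")` can still find it, while the query side produces only one term per position and therefore stays precise. This is the coverage and precision trade-off that the hybrid method aims at.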

[0062] The method steps are as follows:

[0063] Step 1. First, configure the IK Analyzer tokenizer in Solr;

[0064] 1) In the configuration file schema.xml (located in {SOLR_HOME}/config/), the configuration information is as follows:

[0065]

[0066]

[0067]

[0068]

[0069]

[0070]

[0071] mode="complex" dicPath="/hadoop/kbscloud/hhh/solr/credit/conf"/>

[0072]

[0073]

[0074]

[0075]

[0076]

[0077]

[0078] 2) IKAnalyzer.cfg.xml configuration file

[0079] Copy stopword.dic and IKAnalyzer.cfg.xml to the class root directory to en...



Abstract

The invention discloses a hybrid application method of two word segmenters based on the SOLR search engine. The method relies on Solr's ability to integrate multiple word segmentation plug-ins: when the index is built, the fine-grained segmentation mode of IK Analyzer is used, and when the index is searched, the segmentation mode of mmseg4j is used. Compared with the prior art, the method compensates for the weakness of Solr's own segmentation of Chinese text and addresses the problems caused by using a single segmentation mode, namely searches that are not precise enough and coverage that is not wide enough. The usability of the search engine is thereby improved.

Description

Technical field

[0001] The invention relates to the technical field of search engines, in particular to a hybrid application method of two word segmenters based on the SOLR search engine.

Background art

[0002] Apache Solr is an open-source search server based on Lucene and Java that is easy to add to web applications. Solr provides faceted search and hit highlighting, and supports multiple output formats (including XML/XSLT and JSON). It is easy to install and configure, and it comes with an HTTP-based management interface. You can rely on Solr's basic search function as it is, or extend it to meet the needs of your business.

[0003] For a search engine, word segmentation matters because it directly affects search accuracy, the most important property of a search engine. Solr's built-in word segmentation cannot segment Chinese accurately, so external Chinese word segmentation technology needs to be ...
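As a small illustration of the HTTP interface and JSON output mentioned in the background, the following Python sketch builds a Solr query URL. The `q`, `wt` and `rows` request parameters and port 8983 are standard for Solr; the helper name `select_url`, the core name `credit` and the field name `title` are assumptions made for the example and are not taken from the patent text.

```python
# Minimal sketch: build a Solr /select URL that asks for JSON output.
# The core name "credit" and the field "title" are hypothetical.
from urllib.parse import urlencode


def select_url(base_url, core, query, fmt="json", rows=10):
    """Return the URL of a Solr select request for the given core and query."""
    params = urlencode({"q": query, "wt": fmt, "rows": rows})
    return f"{base_url}/{core}/select?{params}"


URL = select_url("http://localhost:8983/solr", "credit", "title:solr")
```

Changing `fmt` to `"xml"` would request the XML response format instead; the URL is only constructed here, not sent.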

Claims


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/3344, G06F16/951
Inventor: 孔彪
Owner: INSPUR QILU SOFTWARE IND