Self-learning word segmentation method and device, computer equipment and storage medium

A self-learning word segmentation method, applied in the computer field, that addresses the failure to recognize new words, index pollution, and slow dictionary updates, and improves both search accuracy and word segmentation accuracy.

Active Publication Date: 2020-08-04
上海七印信息科技有限公司

AI Technical Summary

Problems solved by technology

[0003] 1. The update speed is slow, and users need to rebuild the index every time they update, which consumes a lot of computing resources;
[0004] 2. The dictionary cannot be updated in time, and new words are not screened, so words that do not need word segmentation may enter the word segmentation dictionary and pollute the index.

Method used


Examples


Detailed Description of the Embodiments

[0055] To make the technical means, inventive features, objectives, and effects of the present invention easy to understand, the invention is further described below with reference to the specific figures.

[0056] The self-learning word segmentation method provided by the present invention can be applied in the application environment shown in Figure 1. The user terminal 101 communicates with the server 102 over a network. The user terminal 101 can be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device, and the server 102 can be implemented as an independent server or as a cluster of multiple servers. The user accesses the search service through the user terminal 101 and enters search keywords in the search window provided by the search engine. The server 102 receives the search keywords entered at the user terminal 101, determines whether the sear...


Abstract

The invention discloses a self-learning word segmentation method. The method comprises the following steps: obtaining missed search hot words; identifying whether the missed search hot words need to be segmented; if word segmentation is needed, adding the missed search hot words to a remote word segmentation dictionary; periodically scanning the remote word segmentation dictionary and judging whether the amount of change in the dictionary meets the index reconstruction condition; if the re-indexing condition is met, having a sub-node re-establish the search index; and, after the sub-node has re-established the search index, resetting the search hot word lexicon in that sub-node. The invention further discloses a device for implementing the self-learning word segmentation method, computer equipment, and a storage medium. Hot words are selected through real-time self-learning and the remote word segmentation dictionary is updated, so that the ElasticSearch search service is updated without interruption, search accuracy is effectively improved, and word segmentation accuracy is improved by optimizing the dictionary.
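The steps listed in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class names `RemoteDictionary` and `SubNode`, the `needs_segmentation` predicate, and the use of a simple change-count `threshold` as the index reconstruction condition are all assumptions made for the example, since the text shown here does not specify them.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set


@dataclass
class RemoteDictionary:
    """Hypothetical stand-in for the remote word segmentation dictionary."""
    words: Set[str] = field(default_factory=set)
    pending_changes: int = 0  # words added since the last index rebuild

    def add(self, word: str) -> bool:
        """Add a word; return True only if it was not already present."""
        if word in self.words:
            return False
        self.words.add(word)
        self.pending_changes += 1
        return True


@dataclass
class SubNode:
    """Hypothetical stand-in for a search sub-node with a hot word lexicon."""
    hot_words: Set[str] = field(default_factory=set)
    rebuild_count: int = 0

    def rebuild_index(self) -> None:
        # A real node would re-create its search index here.
        self.rebuild_count += 1

    def reset_hot_words(self) -> None:
        self.hot_words.clear()


def learn_hot_words(
    missed: Iterable[str],
    needs_segmentation: Callable[[str], bool],
    dictionary: RemoteDictionary,
) -> int:
    """Screen missed hot words and add the accepted ones to the dictionary."""
    added = 0
    for word in missed:
        if needs_segmentation(word) and dictionary.add(word):
            added += 1
    return added


def scan_dictionary(
    dictionary: RemoteDictionary, nodes: List[SubNode], threshold: int
) -> bool:
    """One periodic scan: rebuild and reset every node if enough has changed.

    The threshold test is an assumed form of the re-indexing condition.
    """
    if dictionary.pending_changes < threshold:
        return False
    for node in nodes:
        node.rebuild_index()
        node.reset_hot_words()
    dictionary.pending_changes = 0
    return True
```

In a deployment, `scan_dictionary` would be called on a timer, and the screening predicate would decide which missed hot words are worth adding, which is the step that keeps unwanted words from polluting the index.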

Description

Technical field

[0001] The present invention relates to the field of computer technology, and in particular to a self-learning word segmentation method, device, computer equipment, and storage medium.

Background

[0002] ElasticSearch is a Lucene-based search server that provides a distributed, multi-user full-text search engine. Word segmentation is mainly used when ElasticSearch builds its index: words produced by the word segmentation system can be recognized by the search engine and returned in subsequent searches. However, some words cannot be segmented successfully by the word segmentation system, and newly emerging hot words cannot be processed by it either. In these cases, the mainstream practice is to update the word segmentation dictionary and re-establish the search index. However, this method has the following drawbacks:

[0003] 1. The update speed is slow, and th...
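To illustrate why a word missing from the dictionary hurts search, here is a small sketch of dictionary-based forward maximum matching. It is a generic textbook technique chosen for illustration, not the segmenter used by ElasticSearch or by the invention; the function name, the `max_len` parameter, and the sample words are assumptions.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Segment text greedily, preferring the longest dictionary word.

    Characters that start no known word are emitted as single-character tokens.
    """
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


dictionary = {"搜索", "引擎"}

# A new hot word that is not yet in the dictionary is split into characters.
before = forward_max_match("直播带货", dictionary)

# After the dictionary is updated, the same text yields one token.
dictionary.add("直播带货")
after = forward_max_match("直播带货", dictionary)
```

Because documents indexed before the dictionary update were tokenized the old way, the index has to be rebuilt before the new token can match them, which is the cost the background section describes.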

Claims


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/289, G06F40/242, G06F16/31
CPC: G06F40/289, G06F40/242, G06F16/316, Y02D10/00
Inventor: 张浩, 甘露
Owner: 上海七印信息科技有限公司