Word segmentation method based on MMseg algorithm and pointwise mutual information algorithm

A word segmentation method based on pointwise mutual information, applied in computing, natural language analysis, special data processing applications, etc. It addresses the problems of slow word segmentation speed and low word segmentation efficiency, and improves segmentation accuracy and ambiguity resolution ability.

CN106528524A · Inactive · Publication Date: 2017-03-22 · SUN YAT SEN UNIV

2 Cites 13 Cited by

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: SUN YAT SEN UNIV
Publication Date: 2017-03-22
Estimated Expiration: Not applicable · inactive patent

Abstract

The invention relates to a word segmentation method based on an MMseg algorithm and a pointwise mutual information algorithm. A text is subjected to word segmentation processing by the MMseg algorithm based on a dictionary, and a word segmentation result is corrected by the pointwise mutual information algorithm after the word segmentation result is obtained. A specific process of correcting the word segmentation result by the pointwise mutual information algorithm comprises the following steps: calculating pointwise mutual information of a character x and a character y which are adjacent to each other in the text; judging whether the pointwise mutual information of the character x and the character y is larger than a set threshold value or not; and if so, segmenting the character x and the character y as an independent word.
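The correction step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it uses the standard definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))), estimates the probabilities from raw character and adjacent-pair counts in a small corpus, and merges only adjacent single-character tokens in the dictionary-based result. The function names `pmi_table` and `correct_segmentation`, the probability estimates, and the threshold value are all hypothetical choices; the source does not specify them.

```python
import math
from collections import Counter


def pmi_table(corpus):
    """Estimate PMI for every adjacent character pair in a list of strings.

    Assumption: probabilities are plain relative frequencies of characters
    and of adjacent character pairs; the source does not say how they are
    estimated.
    """
    char_counts = Counter()
    pair_counts = Counter()
    for line in corpus:
        char_counts.update(line)                 # count single characters
        pair_counts.update(zip(line, line[1:]))  # count adjacent pairs (x, y)
    n_chars = sum(char_counts.values())
    n_pairs = sum(pair_counts.values())

    table = {}
    for (x, y), count in pair_counts.items():
        p_xy = count / n_pairs
        p_x = char_counts[x] / n_chars
        p_y = char_counts[y] / n_chars
        table[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return table


def correct_segmentation(words, pmi, threshold):
    """Merge adjacent single-character tokens whose PMI exceeds the threshold.

    Assumption: only tokens the dictionary-based pass left as single
    characters are candidates for merging.
    """
    corrected = []
    i = 0
    while i < len(words):
        has_next = i + 1 < len(words)
        if (has_next
                and len(words[i]) == 1
                and len(words[i + 1]) == 1
                and pmi.get((words[i], words[i + 1]), float("-inf")) > threshold):
            corrected.append(words[i] + words[i + 1])  # treat x and y as one word
            i += 2
        else:
            corrected.append(words[i])
            i += 1
    return corrected
```

In this sketch, `words` would be the output of the dictionary-based MMseg pass, and pairs never seen in the corpus are treated as having PMI of negative infinity so they are never merged.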


Description

Technical field

[0001] The present invention relates to the field of Chinese word segmentation, and more specifically to a word segmentation method based on the MMseg algorithm and a pointwise mutual information algorithm.

Background art

[0002] Research on natural language processing in China started relatively late; domestic natural language processing models were only established in the 1980s. Later, as computers developed and user needs grew, domestic attention to natural language processing increased greatly: the number of research institutions rose and research teams grew. Drawing on foreign achievements while taking the characteristics of Chinese text into account, researchers proposed new theoretical models that raised the level of research on Chinese language understanding.

[0003] In English text, words are separated by spaces, but in Chinese text the characters within a sentence are connected togethe...

Claims

