A method for establishing a word segmentation model, a word segmentation method and a device thereof

A word segmentation and modeling technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It solves the problems of information loss and reduced word segmentation accuracy, and achieves the effects of improving accuracy and expanding dimensions.

Active Publication Date: 2016-06-29
BEIJING BAIDU NETCOM SCI & TECH CO LTD
Cites: 2; Cited by: 0

AI Technical Summary

Problems solved by technology

[0005] The technical problem to be solved by the present invention is to provide a method for establishing a word segmentation model, a word segmentation method and a device thereof, so as to overcome the defect in the prior art that replacing word-to-word relationships with part-of-speech-to-part-of-speech relationships causes a large loss of information and thereby reduces word segmentation accuracy.

Embodiment Construction

[0037] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0038] Please refer to Figure 1, which is a schematic flowchart of an embodiment of the method for establishing a word segmentation model according to the present invention. As shown in Figure 1, the method includes:

[0039] Step S101: Label each entry in the training corpus and the characteristic of each entry.

[0040] Step S102: Determine the part of speech of each entry under the corresponding characteristic.

[0041] Step S103: Use the labeled training corpus to count the generation probability of each entry under the corresponding part of speech and the transition probabilities between parts of speech.

[0042] Step S104: Use the generation probability of each entry under the corresponding part of speech to...
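The counting in step S103 can be read as estimating an HMM-style model from a tagged corpus. The following is a minimal Python sketch of that counting, assuming unsmoothed maximum-likelihood estimates and a hypothetical corpus format in which each sentence is a list of `(entry, part_of_speech)` pairs (the characteristic labels of S101 and S102 are left out). The function name `estimate_probabilities` and the `<BOS>` start tag are illustrative, not taken from the patent.

```python
from collections import Counter, defaultdict

BOS = "<BOS>"  # hypothetical sentence-start tag, not from the patent


def estimate_probabilities(corpus):
    """Unsmoothed maximum-likelihood estimates from a POS-labeled corpus.

    corpus: list of sentences; each sentence is a list of (entry, pos) pairs
            (a format assumed for this sketch).
    Returns (generation, transition):
      generation[pos][entry] = P(entry | pos)
      transition[prev][pos]  = P(pos | prev), normalized over observed successors
    """
    pos_counts = Counter()                # occurrences of each POS tag
    entry_counts = defaultdict(Counter)   # pos -> entry -> count
    bigram_counts = defaultdict(Counter)  # previous pos -> next pos -> count

    for sentence in corpus:
        prev = BOS
        for entry, pos in sentence:
            pos_counts[pos] += 1
            entry_counts[pos][entry] += 1
            bigram_counts[prev][pos] += 1
            prev = pos

    generation = {
        pos: {entry: n / pos_counts[pos] for entry, n in entries.items()}
        for pos, entries in entry_counts.items()
    }
    transition = {
        prev: {pos: n / sum(nexts.values()) for pos, n in nexts.items()}
        for prev, nexts in bigram_counts.items()
    }
    return generation, transition
```

For example, with `corpus = [[("我", "r"), ("爱", "v"), ("北京", "ns")], [("我", "r"), ("去", "v"), ("北京", "ns")]]`, `generation["v"]["爱"]` is 0.5 and `transition["r"]["v"]` is 1.0. A practical model would also smooth the estimates so that unseen entries and transitions do not receive zero probability.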

Abstract

The invention provides a method for establishing a word segmentation model, a word segmentation method, a device for establishing the word segmentation model and a word segmentation device. The method for establishing the word segmentation model comprises the following steps: A1, labeling each entry of a training corpus and the characteristic of each entry; B1, determining the part of speech of each entry under the corresponding characteristic; C1, counting, from the labeled training corpus, the generation probability of each entry under the corresponding part of speech and the transition probabilities between parts of speech; and D1, obtaining a basic dictionary from the generation probabilities of each entry under the corresponding parts of speech, obtaining a transition dictionary from the transition probabilities between parts of speech, and adding the basic dictionary and the transition dictionary to the word segmentation model. When the word segmentation model is used for word segmentation, word segmentation accuracy is improved, and characteristic labeling is completed at the same time as word segmentation.
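The abstract does not state how the basic dictionary and the transition dictionary are combined at segmentation time. One common choice, assumed here rather than taken from the patent, is Viterbi decoding over a lattice of dictionary entries, which yields the segmentation and the part-of-speech tags in a single pass. In this Python sketch the dictionary layouts, `max_len`, the `<BOS>` start tag, the `x` tag for unknown single characters and the `FLOOR` probability are all hypothetical.

```python
import math

BOS = "<BOS>"   # hypothetical sentence-start tag
UNK_POS = "x"   # hypothetical tag for out-of-dictionary single characters
FLOOR = 1e-8    # hypothetical probability for unseen entries/transitions


def segment(text, basic_dict, transition_dict, max_len=4):
    """Jointly segment `text` and tag each entry with a part of speech.

    basic_dict:      {entry: {pos: P(entry | pos)}}       (assumed layout)
    transition_dict: {(prev_pos, pos): P(pos | prev_pos)}  (assumed layout)
    All stored probabilities must be positive (they are passed to math.log).
    Returns the highest-scoring list of (entry, pos) pairs.
    """
    n = len(text)
    # best[i][pos] = (log score, backpointer) for the best path that covers
    # text[:i] and whose last entry is tagged `pos`.
    best = [{} for _ in range(n + 1)]
    best[0][BOS] = (0.0, None)

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            tags = basic_dict.get(word)
            if tags is None:
                if end - start > 1:
                    continue  # only single characters may be out of dictionary
                tags = {UNK_POS: FLOOR}
            for pos, gen_p in tags.items():
                for prev_pos, (score, _) in best[start].items():
                    trans_p = transition_dict.get((prev_pos, pos), FLOOR)
                    cand = score + math.log(trans_p) + math.log(gen_p)
                    if pos not in best[end] or cand > best[end][pos][0]:
                        best[end][pos] = (cand, (start, prev_pos))

    # Backtrack from the best-scoring tag at the end of the text.
    pos = max(best[n], key=lambda p: best[n][p][0])
    path, end = [], n
    while end > 0:
        _, (start, prev_pos) = best[end][pos]
        path.append((text[start:end], pos))
        end, pos = start, prev_pos
    return path[::-1]
```

For example, with `basic_dict = {"我": {"r": 1.0}, "爱": {"v": 0.5}, "北京": {"ns": 0.5}, "北": {"f": 0.5}, "京": {"j": 0.1}}` and `transition_dict = {("<BOS>", "r"): 1.0, ("r", "v"): 1.0, ("v", "ns"): 0.5, ("v", "f"): 0.1, ("f", "j"): 0.1}`, `segment("我爱北京", basic_dict, transition_dict)` returns `[("我", "r"), ("爱", "v"), ("北京", "ns")]`: the path through `北京` scores 0.25, against 0.0005 for `北` + `京`. Because each lattice edge carries a tag, the tags come out together with the segmentation, which matches the abstract's point that labeling is finished at the same time as segmentation.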

Description

【Technical field】

[0001] The invention relates to the technical field of natural language processing, in particular to a method for establishing a word segmentation model, a word segmentation method and a device thereof.

【Background art】

[0002] With the widespread use of the Internet, more and more texts and information are disseminated through it. Natural language processing is an indispensable technology for retrieving and mining valuable content from these texts and information, and word segmentation is fundamental work in natural language processing.

[0003] In the prior art, word segmentation mainly includes rule-based word segmentation and statistics-based word segmentation. Rule-based word segmentation includes forward maximum matching, reverse maximum matching, bidirectional maximum matching, shortest segmentation, segmentation based on rule sets, etc. It is characterized by fast speed, but the effect on ambiguity segmentation is not g...
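Paragraph [0003] names forward and reverse maximum matching among the rule-based methods. The following Python sketch implements both, assuming a plain set of words as the dictionary and a hypothetical `max_len` window. It also shows the kind of segmentation ambiguity that greedy matching can mishandle.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Scan left to right, taking the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so that the scan advances.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def reverse_max_match(text, dictionary, max_len=4):
    """Scan right to left, taking the longest dictionary word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.append(piece)
                j -= size
                break
    return words[::-1]
```

With `dictionary = {"研究", "研究生", "生命", "起源"}`, `forward_max_match("研究生命起源", dictionary)` yields `["研究生", "命", "起源"]`, while `reverse_max_match` yields `["研究", "生命", "起源"]`. Bidirectional maximum matching runs both passes and chooses between the results, typically with heuristics such as preferring fewer words.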

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
Inventors: 何径舟 (He Jingzhou), 吴中勤 (Wu Zhongqin)
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD