Adaptive word segmentation method

An adaptive word segmentation method, applied to the construction and application of word segmentation systems. It addresses three problems of existing approaches: limited improvement in practice, the high cost of labeling corpora, and the difficulty of extending a system to different fields.

Inactive Publication Date: 2016-03-30
贺惠新
4 Cites, 13 Cited by

AI Technical Summary

Problems solved by technology

However, this method depends heavily on the training corpus: a separate model must be trained for each field in which it is applied, and the cost of labeling corpora is high, so the method is hard to extend to different fields.
Some technologies combine dictionary-based and statistical methods, but they add the dictionary and the training corpus to the model as internal or independent resources, so the improvement they deliver in practice is limited.


Examples


Embodiment Construction

[0074] This embodiment is described below with reference to Figure 1 and Figure 2.

[0075] The method of the present invention consists of two stages, model training and model application: the training phase of specific application mode 1 and the application phase of specific application mode 2.

[0076] Specific application mode 1: training phase

[0077] Training step 1: Obtain the resources required in the model training phase. Obtain a set S={S(i)} of NS sentences that have already been segmented into words (the characters of each sentence have a definite gold-standard division into independent words) as the training corpus. Each sentence is denoted S(i), where 1≤i≤NS, and NS≥50000 is required. A dictionary D containing more than 50 words is supplied manually, and the dictionary expansion coefficients ε1 and ε2 are specified manually.
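The resources listed in training step 1 can be pictured as a small container with sanity checks. The following is only a sketch: the class name `TrainingResources`, the method `problems`, and the choice of `float` for ε1 and ε2 are assumptions made for illustration, since the visible text does not say what type or range the expansion coefficients have. Only the two thresholds (NS≥50000 and more than 50 dictionary words) come from the text.

```python
from dataclasses import dataclass
from typing import List, Set

MIN_SENTENCES = 50_000  # NS >= 50000, as required in training step 1
MIN_DICT_WORDS = 51     # "more than 50 words" in dictionary D


@dataclass
class TrainingResources:
    """Hypothetical container for the inputs of training step 1."""
    sentences: List[List[str]]  # each S(i) stored as its gold-standard word list
    dictionary: Set[str]        # the manually supplied dictionary D
    eps1: float                 # expansion coefficient ε1 (type is an assumption)
    eps2: float                 # expansion coefficient ε2 (type is an assumption)

    def problems(self) -> List[str]:
        """Return a list of unmet size requirements (empty if all are met)."""
        issues = []
        if len(self.sentences) < MIN_SENTENCES:
            issues.append(
                f"corpus has {len(self.sentences)} sentences, "
                f"need at least {MIN_SENTENCES}"
            )
        if len(self.dictionary) < MIN_DICT_WORDS:
            issues.append(
                f"dictionary has {len(self.dictionary)} words, "
                f"need at least {MIN_DICT_WORDS}"
            )
        return issues


if __name__ == "__main__":
    toy = TrainingResources(
        sentences=[["自然", "语言", "处理"]],
        dictionary={"自然", "语言"},
        eps1=0.5,
        eps2=0.5,
    )
    for issue in toy.problems():
        print(issue)
```

Storing each sentence as a list of words keeps the gold-standard segmentation explicit; the raw character string can be recovered by joining the words.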

[0078] Training step 2: Extract highly relevant words from the tra...

Abstract

The present invention provides a method for constructing and applying a word segmentation system, and belongs to the field of computer technology applications for natural language processing. It provides a word segmentation method that combines a dictionary with a statistical model, based on inherent properties of natural-language sentences. In this method the dictionary is treated as an externally adjustable resource that takes effect in the statistical model by generating features. The influence of the original training corpus on the dictionary during model learning is fully taken into account, and the algorithm generates and combines features efficiently, which reduces computational complexity and yields a high-precision word segmentation model. When the method is used, relevant words can be added to the dictionary conveniently, which improves the applicability of the algorithm in different scenarios. With the adaptive word segmentation method, a computer can automatically segment natural-language sentences into words, the method is easy to extend to different fields, and the segmentation results can be used by the computer for subsequent analysis.
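One way to picture a dictionary acting as an external resource that generates features for a statistical model is sketched below. This is an illustration under stated assumptions, not the feature design of the patent, which is not given in the visible text: the function name `dictionary_features`, the B/M/E/S/O labels, and the `max_len` limit are all hypothetical. For every character, the sketch records whether it begins (B), sits inside (M), or ends (E) a dictionary word, or forms a single-character dictionary word (S); characters covered by no dictionary word get O.

```python
from typing import List, Set


def dictionary_features(sentence: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Label each character by its positions within all dictionary matches."""
    feats = [set() for _ in sentence]
    n = len(sentence)
    for start in range(n):
        for length in range(1, max_len + 1):
            end = start + length
            if end > n:
                break
            if sentence[start:end] not in dictionary:
                continue
            if length == 1:
                feats[start].add("S")
            else:
                feats[start].add("B")
                feats[end - 1].add("E")
                for k in range(start + 1, end - 1):
                    feats[k].add("M")
    # Join labels in sorted order so the feature string is deterministic.
    return ["".join(sorted(f)) or "O" for f in feats]


if __name__ == "__main__":
    words = {"自然", "语言", "自然语言"}
    print(dictionary_features("自然语言处理", words))
    # ['B', 'EM', 'BM', 'E', 'O', 'O']

    # Adding a domain word changes the features without touching any model.
    words.add("处理")
    print(dictionary_features("自然语言处理", words))
    # ['B', 'EM', 'BM', 'E', 'B', 'E']
```

Because the features are computed from the dictionary at lookup time, adding a word changes the features a model would see without retraining, which matches the abstract's point that related words can be added to the dictionary conveniently.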

Description

Technical field

[0001] The invention relates to a method for constructing and applying an adaptive word segmentation system, and belongs to the field of computer technology applications for natural language processing.

Background technique

[0002] Information is currently recorded and transmitted mainly through human natural language. Human language is closely tied to the environment in which groups of people act, and it is the basic condition for consulting one another and completing tasks jointly within a shared cognitive frame. Natural language is the tool people use in social activities to exchange and retain information more efficiently. Human language takes independent words as its cognitive units, and characters are the constituent elements of words. When language is used for communication, words that can express independent meanings are connected in the form of sequences, and recorded in the form of continuous word strings with order...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: G06F40/284
Inventor: 贺惠新
Owner: 贺惠新