A Chinese word segmentation method based on deep learning and a forgetting algorithm

A deep learning-based Chinese word segmentation technique, applied in the field of word segmentation, that addresses the inability to segment words missing from the dictionary and keeps the dictionary at an appropriate size.

Active Publication Date: 2019-02-26
北京布本智能科技有限公司

AI Technical Summary

Problems solved by technology

Existing word segmentation algorithms rely on a thesaurus, so words that do not appear in the thesaurus cannot be segmented.

Examples


Embodiment 1

[0076] Natural language is obtained by scanning sentences; the forgetting algorithm and deep learning are then each used to segment it, and the two word segmentation results are fused:

[0077] The following are the word segmentation results of the two algorithms:

[0078] Word segmentation result of the improved forgetting algorithm

[0079]-[0084] (segmentation output not reproduced in this text)

[0085] Word segmentation result of deep learning algorithm

[0086]-[0091] (segmentation output not reproduced in this text)

[0092] The result after combining the above schemes:

[0093] Combined word segmentation results

[0094]-[0099] (segmentation output not reproduced in this text)


Abstract

The invention discloses a Chinese word segmentation method based on deep learning and a forgetting algorithm, which comprises the following steps: 1. scanning sentences character by character to obtain natural language, dividing the scanned natural language into word sequences with the deep learning word segmentation method, and inputting them into a first thesaurus; 2. scanning sentences character by character to obtain natural language, from which the forgetting algorithm word segmentation method produces the candidate words of a second thesaurus; 3. fusing the word sequences in the first thesaurus with the candidate words in the second thesaurus to obtain the final word segmentation result. The fusion method is as follows: if a word sequence in the second thesaurus corresponds to a word in the deep learning result, the word sequence is merged into that word; if a single character in the second thesaurus corresponds to a word in the deep learning result, it is combined forward or backward to form a word. By integrating the deep learning word segmentation method with the forgetting algorithm word segmentation method, the invention can automatically detect domain knowledge, discover new words in a domain without supervision, and improve the word segmentation effect.
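The fusion step can be read in more than one way, so the following Python sketch shows only one possible interpretation, not the patent's specified procedure. It assumes the deep learning segmenter returns a list of tokens and the forgetting algorithm yields a set of candidate words; adjacent tokens whose concatenation is a candidate word are merged. The function name `fuse` and the sample inputs are hypothetical and are not taken from the patent's results.

```python
def fuse(dl_words, candidates):
    """Merge runs of adjacent tokens whose concatenation is a candidate word.

    dl_words:   tokens from the (assumed) deep learning segmenter
    candidates: candidate words from the (assumed) forgetting algorithm
    """
    fused = []
    i = 0
    while i < len(dl_words):
        merged_end = None
        # Prefer the longest run of at least two adjacent tokens
        # that spells out a candidate word.
        for j in range(len(dl_words), i + 1, -1):
            if "".join(dl_words[i:j]) in candidates:
                merged_end = j
                break
        if merged_end is not None:
            fused.append("".join(dl_words[i:merged_end]))
            i = merged_end
        else:
            # No candidate covers this position: keep the original token.
            fused.append(dl_words[i])
            i += 1
    return fused


# Hypothetical inputs for illustration only.
dl_tokens = ["区块", "链", "技术"]
candidate_words = {"区块链"}
result = fuse(dl_tokens, candidate_words)
print(result)  # ['区块链', '技术']
```

In this reading, the stray single character "链" is absorbed into the preceding token because the forgetting-algorithm candidates contain "区块链"; tokens that no candidate covers pass through unchanged.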

Description

Technical field

[0001] The invention relates to the technical field of word segmentation, in particular to a Chinese word segmentation method based on deep learning and a forgetting algorithm.

Background

[0002] Chinese word segmentation refers to dividing a sequence of Chinese characters into individual words. Word segmentation is the process of recombining a continuous character sequence into a word sequence according to certain specifications.

[0003] 1. Word segmentation method based on string matching

[0004] This method is also called mechanical word segmentation. It matches the Chinese character string to be analyzed against the entries of a "sufficiently large" machine dictionary according to a certain strategy. If a string is found in the dictionary, the match succeeds (a word is recognized). Depending on the scanning direction, string matching segmentation can be divided into forward matching and reverse matching; a...
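The string matching approach of paragraph [0004] can be sketched in Python as follows. The text only says the match follows "a certain strategy"; this sketch assumes the common longest-match (maximum matching) strategy, and the function names, the `max_len` parameter, and the toy dictionary are illustrative assumptions rather than anything specified in the patent.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation, longest dictionary entry first."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink until a dictionary hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, dictionary, max_len=4):
    """Greedy right-to-left segmentation, longest dictionary entry first."""
    words = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.append(piece)
                j -= size
                break
    words.reverse()  # pieces were collected from the end of the sentence
    return words


# Toy dictionary, assumed for illustration.
DEMO_DICT = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", DEMO_DICT))   # ['研究生', '命', '起源']
print(backward_max_match("研究生命起源", DEMO_DICT))  # ['研究', '生命', '起源']
```

The two scanning directions disagree on this sentence, and any string absent from the dictionary falls back to single characters, which is the limitation of thesaurus-based segmentation that the invention targets.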


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
CPC: G06F40/295, G06F40/289, Y02D10/00
Inventor: 卢学裕, 王安, 杨大海, 杨利军
Owner: 北京布本智能科技有限公司