Chinese word segmentation method

A Chinese word segmentation technology, applied in the fields of natural language processing and deep learning

Inactive Publication Date: 2017-09-15
YUNNAN UNIV
4 Cites, 24 Cited by

AI Technical Summary

Problems solved by technology

However, an LSTM neural network can only remember past information, that is, the context preceding the current position.



Examples


Embodiment Construction

[0014] With reference to Figure 1 and Figure 2, the specific implementation provided by the present invention is described in detail as follows.

[0015] The present invention aims to provide a Chinese word segmentation solution based on an attention-based bidirectional long short-term memory (LSTM) neural network and a conditional random field (CRF). It comprises five parts: (1) converting the input Chinese text into character vectors; (2) training a bidirectional LSTM neural network based on the attention mechanism and using it to model the sequence; (3) obtaining the score vector of the sequence through a linear-chain conditional random field; (4) obtaining the word segmentation tag corresponding to each character from the score vector; and (5) converting the word segmentation tag corresponding to each character into an output word segmentation text sequence separated by spaces.
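Parts (3) and (4) can be illustrated with Viterbi decoding, the standard way to pick the best tag path in a linear-chain model. The following is only a minimal sketch, not the patent's implementation: the B/M/E/S tag set (begin, middle, end, single-character word), the hand-set transition scores in `bmes_transitions`, and all function names are assumptions made for illustration. In a trained CRF the transition scores would be learned rather than fixed.

```python
from typing import Dict, List

# Assumed tag set: Begin, Middle, End of a multi-character word; Single-character word.
TAGS = ["B", "M", "E", "S"]

# Hypothetical hand-set transitions: 0 for legal tag bigrams, a large penalty otherwise.
NEG = -1e4
ALLOWED = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
           ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}


def bmes_transitions() -> Dict[str, Dict[str, float]]:
    """Build a transition table that only permits well-formed B/M/E/S sequences."""
    return {p: {c: (0.0 if (p, c) in ALLOWED else NEG) for c in TAGS} for p in TAGS}


def viterbi_decode(emissions: List[Dict[str, float]],
                   transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the highest-scoring tag path given per-character and transition scores."""
    if not emissions:
        return []
    # best[t][tag] is the best score of any path that ends in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in TAGS}]
    back = []  # back[t - 1][tag] is the previous tag on that best path
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(TAGS, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

Decoding over the whole sequence matters because picking the best tag for each character independently can produce an ill-formed sequence such as B followed by B; the transition scores rule such paths out.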

[0016] Figure 1 represents the entire process from the input text sequence to...


Abstract

Chinese word segmentation is the process of segmenting a Chinese character string into a word sequence according to a given specification. Because Chinese sentence structure is complex, there are no formal delimiters between words, and a segmentation decision may even need to take the following context into account, the accuracy of existing Chinese word segmentation methods needs to be improved. The invention discloses a Chinese word segmentation method comprising the following steps: 1: inputting the Chinese text to be segmented into the system as sequence A; 2: passing sequence A to a vector lookup layer, which converts each input character into a character vector, to obtain sequence B; 3: passing sequence B as the input sequence to a bidirectional long short-term memory neural network based on an attention mechanism and then through one hidden layer to obtain output sequence C; 4: passing sequence C as the input sequence to a conditional random field decoding layer to generate a word segmentation tag sequence D; and finally, converting sequence D into a text sequence E separated by spaces.
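The final step, turning the tag sequence D into the space-separated text E, can be sketched as follows. This is a hedged illustration only: the B/M/E/S tag scheme (begin, middle, end, single-character word) and the function name `tags_to_text` are assumptions, since the text above does not name the tag set.

```python
from typing import List


def tags_to_text(chars: str, tags: List[str]) -> str:
    """Join characters into words using assumed B/M/E/S tags, separated by spaces."""
    if len(chars) != len(tags):
        raise ValueError("each character needs exactly one tag")
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        # "E" closes a multi-character word; "S" is a complete one-character word.
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    # Keep any unfinished word if the tag sequence ends on "B" or "M".
    if current:
        words.append(current)
    return " ".join(words)


print(tags_to_text("我爱自然语言", ["S", "S", "B", "E", "B", "E"]))
# prints: 我 爱 自然 语言
```

Flushing the trailing buffer is a design choice that keeps the output lossless even when the tag sequence is not well formed.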

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing and deep learning, and specifically relates to a Chinese word segmentation method based on an attention-based bidirectional long short-term memory (LSTM) neural network and a conditional random field.

Background technique

[0002] Chinese word segmentation is the process of dividing a continuous string of Chinese characters into a word sequence according to a given specification. Unlike English, Chinese uses the character as its basic writing unit: sentences and paragraphs are marked off by delimiters, but there is no formal delimiter between words. In natural language processing, the word is the smallest meaningful language unit that can be used independently, so the quality of word segmentation directly affects subsequent natural language processing tasks. Chinese word segmentation is an important basic research in...


Application Information

IPC(8): G06F17/27, G06N3/04
CPC: G06F40/253, G06F40/289, G06N3/04
Inventor: 金宸, 李维华, 王顺芳, 郭延哺, 邓春云
Owner: YUNNAN UNIV