
Thai sentence segmentation method based on twin recurrent neural network

The method combines a recurrent neural network with a twin-network architecture and falls under biological neural network models, neural learning methods, and neural architectures. It addresses the problem that Thai has no obvious separators between sentences, which makes lexical analysis and other natural language processing tasks difficult. Its effects are fewer parameters, easier training, and improved segmentation performance.

Active Publication Date: 2020-05-08
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0002] Thai rarely uses punctuation marks, and there are no obvious separators between sentences, which brings additional difficulties to natural language processing tasks such as Thai lexical analysis, syntactic analysis, and machine translation.



Examples


Embodiment 1

[0020] Embodiment 1: As shown in Figure 1, a Thai sentence segmentation method based on a twin recurrent neural network comprises the following steps:

[0021] Step 1. Take the word sequences before and after a space in the corpus as the input to the input layer of the twin recurrent neural network model, and obtain the one-hot matrix representation $X$ for each of the two word sequences. Here, the twin recurrent neural network model denotes two recurrent neural network models, and $X = [x_1, x_2, \ldots, x_t, \ldots, x_T]$, where the one-hot vector $x_t$ of each word has dimension $N_w$, $T$ is the number of words in the word sequence, and $N_w$ is the vocabulary size, that is, the number of distinct words counted from the corpus after deduplication.

[0022] Step 2. The one-hot matrix representations $X$ obtained in Step 1 for the word sequences before and after the space are each passed through the emb...
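The one-hot representation in Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the helper names `build_vocab` and `one_hot_matrix` and the placeholder tokens are hypothetical, and the handling of out-of-vocabulary words is left out because the source does not describe it.

```python
def build_vocab(corpus_sentences):
    """Assign an index to each distinct word; N_w is len(vocab)."""
    vocab = {}
    for words in corpus_sentences:
        for w in words:
            if w not in vocab:
                vocab[w] = len(vocab)
    return vocab


def one_hot_matrix(words, vocab):
    """Return X = [x_1, ..., x_T]; each row x_t has dimension N_w."""
    n_w = len(vocab)
    rows = []
    for w in words:
        row = [0] * n_w
        row[vocab[w]] = 1  # a single 1 at the word's vocabulary index
        rows.append(row)
    return rows


# Placeholder tokens stand in for already-segmented Thai words.
corpus = [["w1", "w2", "w3"], ["w2", "w4"]]
vocab = build_vocab(corpus)             # N_w = 4 distinct words
X = one_hot_matrix(["w2", "w4"], vocab)  # T = 2 rows, each of length N_w
```

Each of the two word sequences (before and after the space) would get its own matrix of this form, built against the same vocabulary.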



Abstract

The invention discloses a Thai sentence segmentation method based on a twin recurrent neural network, and belongs to the field of sentence segmentation. The method requires no manually designed features and does not depend on part-of-speech tagging or syntactic information. Compared with existing methods, it is simpler and also improves sentence segmentation quality. When the word sequences before and after a space are encoded to obtain sentence segmentation features, both sequences use the same model framework and share the same parameters. This better accounts for the comparability between the two word sequences, reduces the number of parameters, and makes the model easier to train. Feature representations of the word sequences are learned through word embedding and a recurrent neural network, which can capture the implicit semantics of sentences and thereby improves sentence segmentation performance.
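The parameter sharing described in the abstract can be illustrated with a small sketch in which one encoder object processes both word sequences. Several details here are assumptions rather than facts from the source: the plain tanh recurrent cell, the dimensions, the random initialization, and the concatenation of the two final states into one feature vector. The names `SharedEncoder` and `space_features` are hypothetical.

```python
import math
import random


class SharedEncoder:
    """Embedding lookup followed by a plain tanh recurrent cell.

    The cell type is an assumption; the source only says that a
    recurrent neural network and word embedding are used.
    """

    def __init__(self, vocab_size, emb_dim, hid_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.E = mat(vocab_size, emb_dim)  # embedding table
        self.W = mat(hid_dim, emb_dim)     # input-to-hidden weights
        self.U = mat(hid_dim, hid_dim)     # hidden-to-hidden weights
        self.b = [0.0] * hid_dim

    def encode(self, word_ids):
        """Return the final hidden state for a sequence of word indices."""
        h = [0.0] * len(self.b)
        for i in word_ids:
            e = self.E[i]  # equivalent to multiplying a one-hot row by E
            h = [math.tanh(sum(w * x for w, x in zip(self.W[k], e))
                           + sum(u * p for u, p in zip(self.U[k], h))
                           + self.b[k])
                 for k in range(len(self.b))]
        return h


def space_features(encoder, left_ids, right_ids):
    """Encode both sides with the SAME encoder and concatenate (assumed)."""
    return encoder.encode(left_ids) + encoder.encode(right_ids)


encoder = SharedEncoder(vocab_size=5, emb_dim=3, hid_dim=4)
features = space_features(encoder, [0, 1, 2], [3, 4])
```

Because a single set of weights serves both branches, the parameter count is that of one encoder rather than two, and identical word sequences on either side of a space map to identical encodings.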

Description

Technical field

[0001] The invention relates to a Thai sentence segmentation method based on a twin recurrent neural network, and belongs to the field of sentence segmentation.

Background technique

[0002] Thai rarely uses punctuation marks, and there are no obvious separators between sentences, which brings additional difficulties to natural language processing tasks such as Thai lexical analysis, syntactic analysis, and machine translation.

[0003] Thai does have punctuation marks, and Unicode even provides a special zero-width space character (Zero Width Space, ZWSP) for separating Thai words. However, unlike English, Thai rarely uses punctuation marks in practice and usually places no separator between words. Instead, spaces are used to separate sentences, phrases, and special words, for example between titles and names, between labels and content, and between brackets and content. Therefore, Thai sentence segmentation cannot rely on punctuation, but must ful...
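Since spaces in Thai may or may not mark a sentence boundary, each space can be treated as a candidate whose surrounding words are examined. The sketch below shows one way to collect the word sequences before and after each space. The window size, the decision to skip over other spaces when filling a window, and the name `space_candidates` are illustrative assumptions, and the placeholder tokens stand in for already-segmented Thai words.

```python
SPACE = " "


def space_candidates(tokens, window=3):
    """For each space token, collect up to `window` words on each side.

    Returns a list of (position, left_words, right_words) tuples.
    """
    candidates = []
    for pos, tok in enumerate(tokens):
        if tok != SPACE:
            continue
        # Other spaces are skipped when building the windows (assumed).
        left = [t for t in tokens[:pos] if t != SPACE][-window:]
        right = [t for t in tokens[pos + 1:] if t != SPACE][:window]
        candidates.append((pos, left, right))
    return candidates


tokens = ["a", "b", " ", "c", "d", "e", " ", "f"]
pairs = space_candidates(tokens, window=2)
```

Each `(left_words, right_words)` pair would then be the input on which a model decides whether the space ends a sentence.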


Application Information

IPC(8): G06F40/211, G06F40/30, G06F40/126, G06N3/04, G06N3/08
CPC: G06N3/08, G06N3/045
Inventors: 线岩团, 王红斌, 余正涛, 文永华, 张志菊
Owner: KUNMING UNIV OF SCI & TECH