A Thai Sentence Segmentation Method Based on Siamese Recurrent Neural Network

The method applies a Siamese recurrent neural network (classified under biological neural network models, neural learning methods, and neural architectures) to Thai sentence segmentation. It addresses the lack of obvious separators between Thai sentences, which complicates lexical analysis and other natural language processing tasks. The method is simple, its shared parameters benefit model training, and it improves sentence segmentation performance.

Active Publication Date: 2021-10-29
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0002] Thai rarely uses punctuation marks, and there are no obvious separators between sentences, which brings additional difficulties to natural language processing tasks such as Thai lexical analysis, syntactic analysis, and machine translation.



Examples


Embodiment 1

[0020] Embodiment 1: As shown in Figure 1, a Thai sentence segmentation method based on a Siamese recurrent neural network comprises the following steps:

[0021] Step 1. Take the word sequences before and after each space in the corpus as the input to the input layer of the Siamese recurrent neural network model, and obtain the one-hot matrix representation X of each of the two word sequences. Here the Siamese recurrent neural network model denotes two recurrent neural network models; X = [x_1, x_2, ..., x_t, ..., x_T], where x_t is the one-hot vector of the t-th word and has dimension N_w; T is the number of words in the word sequence; and N_w is the vocabulary size, that is, the number of distinct words counted from the corpus.
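Step 1 can be illustrated with a minimal sketch in plain Python. It assumes the corpus is already word-segmented; the helper names `build_vocab` and `one_hot_matrix` and the placeholder words are hypothetical and do not come from the patent.

```python
def build_vocab(corpus):
    """Map each distinct word in the corpus to an index (deduplicated, sorted)."""
    words = sorted({w for seq in corpus for w in seq})
    return {w: i for i, w in enumerate(words)}


def one_hot_matrix(words, vocab):
    """Return X = [x_1, ..., x_T]: one N_w-dimensional one-hot row per word."""
    n_w = len(vocab)  # N_w, the vocabulary size
    matrix = []
    for w in words:
        row = [0] * n_w
        row[vocab[w]] = 1
        matrix.append(row)
    return matrix


# Toy corpus with placeholder words (hypothetical data, not real Thai text).
corpus = [["w1", "w2", "w3"], ["w2", "w4"]]
vocab = build_vocab(corpus)

left_words, right_words = ["w1", "w2"], ["w4"]  # words before / after a space
X_left = one_hot_matrix(left_words, vocab)      # T = 2 rows, N_w = 4 columns
X_right = one_hot_matrix(right_words, vocab)    # T = 1 row,  N_w = 4 columns
```

Each side of the space gets its own matrix X, but both are built from the same vocabulary, so the two inputs live in the same N_w-dimensional space.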

[0022] Step 2. The one-hot matrix representations X obtained in Step 1 for the word sequences before and after the space are each passed through the emb...


Abstract

The invention discloses a Thai sentence segmentation method based on a Siamese recurrent neural network, which belongs to the field of sentence segmentation. The method needs no hand-designed features and does not rely on part-of-speech tagging or syntactic information. Compared with existing methods, it is more concise and improves sentence segmentation performance. When the method encodes the word sequences before and after a space to obtain sentence segmentation features, both sequences use the same model framework and share the same parameters. This better accounts for the comparability of the two word sequences and reduces the number of parameters, which makes the model easier to train. Learning a feature representation of the word sequence through word embeddings and a recurrent neural network helps capture the hidden semantics of sentences, thereby improving sentence segmentation performance.
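The parameter sharing described above can be sketched as follows. This is an illustration under stated assumptions, not the patented model: the abstract names only word embeddings and a recurrent neural network, so the plain tanh recurrence, the dimensions, the class name `SharedRNNEncoder`, and the concatenation of the two encodings are all assumptions made for this example.

```python
import math
import random


class SharedRNNEncoder:
    """Embedding + plain tanh RNN; one instance encodes both sides of a space."""

    def __init__(self, vocab_size, emb_dim, hid_dim, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.E = init(vocab_size, emb_dim)  # embedding table
        self.W = init(hid_dim, emb_dim)     # input-to-hidden weights
        self.U = init(hid_dim, hid_dim)     # hidden-to-hidden weights
        self.hid_dim = hid_dim

    def encode(self, word_ids):
        """Return the final hidden state for a sequence of word indices."""
        h = [0.0] * self.hid_dim
        for i in word_ids:
            x = self.E[i]  # row lookup, equivalent to one-hot vector times E
            h = [
                math.tanh(
                    sum(w * xj for w, xj in zip(self.W[k], x))
                    + sum(u * hj for u, hj in zip(self.U[k], h))
                )
                for k in range(self.hid_dim)
            ]
        return h


# A single encoder object serves both branches, so E, W and U are stored once.
encoder = SharedRNNEncoder(vocab_size=5, emb_dim=3, hid_dim=4)
h_left = encoder.encode([0, 2, 1])  # word indices before the space
h_right = encoder.encode([3, 4])    # word indices after the space
features = h_left + h_right         # concatenation: an assumed way to combine them
```

Because both branches call the same `encode` method, identical word sequences always map to identical encodings, which is what makes the two sides directly comparable.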

Description

Technical field

[0001] The invention relates to a Thai sentence segmentation method based on a Siamese recurrent neural network, which belongs to the field of sentence segmentation.

Background

[0002] Thai rarely uses punctuation marks, and there are no obvious separators between sentences, which brings additional difficulties to natural language processing tasks such as Thai lexical analysis, syntactic analysis, and machine translation.

[0003] Thai does have punctuation marks, and Unicode even provides a special zero-width space character (ZWSP) to separate Thai words. However, unlike English, Thai rarely uses punctuation marks in practice and usually does not put separators between words. Instead, spaces are used to separate sentences, phrases, and special words, for example between titles and names, between labels and content, and between brackets and content. Therefore, Thai sentence segmentation cannot rely on punctuation, but must ful...
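Since a space in Thai may or may not mark a sentence boundary, each space can be treated as a candidate to classify. The sketch below collects the word sequences on both sides of every space. It assumes the text is already word-segmented with spaces kept as separate tokens; the function name `space_candidates`, the `window` size, and the choice to let a context reach across neighbouring spaces are hypothetical and are not specified in the text above.

```python
def space_candidates(tokens, window=3, space=" "):
    """Return (index, left_words, right_words) for every space token."""
    candidates = []
    for i, tok in enumerate(tokens):
        if tok != space:
            continue
        # Up to `window` words on each side, skipping other space tokens.
        left = [t for t in tokens[:i] if t != space][-window:]
        right = [t for t in tokens[i + 1:] if t != space][:window]
        candidates.append((i, left, right))
    return candidates


# Placeholder tokens standing in for segmented Thai words.
tokens = ["a", "b", " ", "c", " ", "d", "e"]
cands = space_candidates(tokens, window=2)
# Each candidate pairs a space position with its left and right word sequences.
```

Each `(left_words, right_words)` pair is the kind of input that Step 1 of the embodiment converts to one-hot matrices.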


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/211, G06F40/30, G06F40/126, G06N3/04, G06N3/08
CPC: G06N3/08, G06N3/045
Inventors: 线岩团, 王红斌, 余正涛, 文永华, 张志菊
Owner: KUNMING UNIV OF SCI & TECH