Thai and Burmese part-of-speech tagging method for fusing word-syllable pairs by using a local multi-head attention mechanism

A technology combining attention and syllable features, applied in neural learning methods, special data processing applications, unstructured text data retrieval, etc. It addresses the poor tagging of low-frequency parts of speech, and achieves a simple and effective model structure, a small parameter scale, and improved part-of-speech prediction.

Active Publication Date: 2022-01-07
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0006] The present invention provides a part-of-speech tagging method for Thai and Burmese texts that fuses word-syllable pairs using a local multi-head attention mechanism. It is intended for tagging Thai, Burmese and other Southeast Asian phoneme-syllable languages, and addresses the poor part-of-speech tagging of low-frequency parts of speech and unknown words.



Examples


Embodiment 1

[0048] Embodiment 1: As shown in Figures 1 and 2, the Thai and Burmese part-of-speech tagging method that fuses word-syllable pairs using a local multi-head attention mechanism comprises the following specific steps:

[0049] Step 1. Perform text preprocessing on the Thai LST20 data set or the Burmese ALT data set. Taking a Thai sentence with m words as an example, the present invention finds potential affix information in the words by performing syllable segmentation on each word in the sentence, thereby expanding the sequence of words into a sequence of word-syllable pairs.

[0050] Step 1.1. From the words separated by "\n" in the Thai text, construct a word alphabet and a part-of-speech tag alphabet for the training set;
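Step 1.1 can be illustrated with a minimal sketch. It assumes, purely for illustration, that each line of the training text holds one word and its tag separated by a tab and that a blank line ends a sentence; the patent text only states that words are separated by "\n". The function names `read_sentences` and `build_alphabet` and the special symbols `<pad>` and `<unk>` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def read_sentences(text: str) -> List[List[Tuple[str, str]]]:
    """Parse newline-separated 'word<TAB>tag' lines into sentences.

    Assumption: a blank line marks a sentence boundary.
    """
    sentences: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        word, tag = line.split("\t")[:2]
        current.append((word, tag))
    if current:
        sentences.append(current)
    return sentences


def build_alphabet(items: Iterable[str],
                   specials: Tuple[str, ...] = ("<pad>", "<unk>")) -> Dict[str, int]:
    """Map each distinct item to an integer index, special symbols first."""
    alphabet = {s: i for i, s in enumerate(specials)}
    for item in items:
        if item not in alphabet:
            alphabet[item] = len(alphabet)
    return alphabet


# Toy placeholder tokens stand in for real Thai words.
sample = "w1\tNN\nw2\tVV\n\nw3\tNN\nw1\tVV\n"
sentences = read_sentences(sample)
word_alphabet = build_alphabet(w for s in sentences for w, _ in s)
tag_alphabet = build_alphabet((t for s in sentences for _, t in s), specials=("<pad>",))
```

Reserving an `<unk>` index in the word alphabet is a common design choice so that words unseen in the training set still map to a valid index at test time.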

[0051] Step 1.2. Call a state-of-the-art Thai or Burmese syllable segmenter to segment the words in the text into syllables, and construct a syllable alphabet;

[0052] Step 1.3. Subsequently, for each word, the present invention...
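The expansion described in Step 1 and the syllable alphabet of Step 1.2 might look like the sketch below. The patent does not name the syllable segmenter, so the segmenter is passed in as a callable; `toy_segmenter` is a hypothetical stand-in that simply cuts a word into two-character chunks and is not a real Thai or Burmese syllable segmenter. Representing each pair as `(word, [syllables])` is also an assumption.

```python
from typing import Callable, Dict, List, Tuple

Segmenter = Callable[[str], List[str]]


def toy_segmenter(word: str) -> List[str]:
    """Hypothetical stand-in: split a word into 2-character chunks."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]


def expand_to_pairs(words: List[str], segment: Segmenter) -> List[Tuple[str, List[str]]]:
    """Turn a word sequence into a sequence of (word, syllables) pairs."""
    return [(word, segment(word)) for word in words]


def build_syllable_alphabet(pairs: List[Tuple[str, List[str]]]) -> Dict[str, int]:
    """Index every distinct syllable; <pad> and <unk> are assumed specials."""
    alphabet = {"<pad>": 0, "<unk>": 1}
    for _, syllables in pairs:
        for syllable in syllables:
            if syllable not in alphabet:
                alphabet[syllable] = len(alphabet)
    return alphabet


# ASCII placeholders stand in for real words.
pairs = expand_to_pairs(["abcd", "xyz"], toy_segmenter)
syllable_alphabet = build_syllable_alphabet(pairs)
```

In practice `toy_segmenter` would be replaced by whichever language-specific syllable segmenter is available; the rest of the sketch does not depend on how the syllables are produced.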


Abstract

The invention relates to a Thai and Burmese part-of-speech tagging method that fuses word-syllable pairs using a local multi-head attention mechanism, and belongs to the field of natural language processing. The method comprises the following steps: preprocessing a Thai or Burmese text data set; selecting word-syllable pair features as model input using a window; learning context features from the word-syllable pair sequence with a local multi-head attention mechanism; and finally, modeling part-of-speech dependencies with a conditional random field and predicting the part-of-speech tags. Experimental results on Thai and Burmese part-of-speech tagging data sets show that, compared with the current best-performing model, the method integrates syllables as morphological features of words, which helps the model learn context features of unknown words and reduces the impact of mis-tagged unknown words on model performance. Moreover, the local multi-head self-attention mechanism lets the model capture richer local dependency features, yielding better results on the part-of-speech tagging task.
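To make the idea of local multi-head self-attention concrete, here is a minimal pure-Python sketch under stated assumptions. It restricts each position to attend only to neighbours within a fixed distance `window`, and splits the feature dimension evenly across heads. The learned query, key and value projections of a real model are omitted (treated as identity) to keep the sketch short, and the function name and parameters are hypothetical; this is not the patent's exact architecture.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_multi_head_attention(x: List[Vector], num_heads: int, window: int) -> List[Vector]:
    """Self-attention where position i only sees positions i-window..i+window.

    Assumption: Q, K and V are the input itself (no learned projections).
    """
    n, d = len(x), len(x[0])
    assert d % num_heads == 0, "feature size must be divisible by num_heads"
    head_dim = d // num_heads
    out = [[0.0] * d for _ in range(n)]
    for head in range(num_heads):
        start, stop = head * head_dim, (head + 1) * head_dim
        for i in range(n):
            lo, hi = max(0, i - window), min(n, i + window + 1)
            query = x[i][start:stop]
            # Scaled dot-product scores against neighbours inside the window only.
            scores = [
                sum(q * k for q, k in zip(query, x[j][start:stop])) / math.sqrt(head_dim)
                for j in range(lo, hi)
            ]
            weights = softmax(scores)
            for weight, j in zip(weights, range(lo, hi)):
                for k in range(start, stop):
                    out[i][k] += weight * x[j][k]
    return out


# Four toy positions with 4 features each, 2 heads, window of 1 neighbour per side.
features = [[1.0, 0.0, 0.5, 2.0], [0.0, 1.0, 1.5, 0.5],
            [2.0, 2.0, 0.0, 1.0], [1.0, 3.0, 1.0, 0.0]]
context = local_multi_head_attention(features, num_heads=2, window=1)
```

Limiting attention to a window means a change to a distant position cannot affect the output at position i, which is what lets the model focus on local dependency features. In the described method, the resulting context features would then be passed to a conditional random field layer for tag prediction.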

Description

Technical field

[0001] The invention relates to a part-of-speech tagging method for Thai and Burmese texts that fuses word-syllable pairs using a local multi-head attention mechanism, and belongs to the technical field of natural language processing.

Background technique

[0002] Part-of-speech tagging determines the part of speech of each word in a given sentence and is one of the basic tasks in natural language processing (NLP). Part-of-speech tagging can improve the accuracy of syntactic analysis, thus benefiting many NLP tasks.

[0003] Early part-of-speech tagging methods are mainly rule-based or based on statistical machine learning. Rule-based methods suffer from incomplete rule sets and rule conflicts. At present, methods based on statistical machine learning mainly include the Support Vector Machine (SVM), the Hidden Markov Model (HMM) and Conditional Random Fields (CRFs). This kind of method us...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06F40/284, G06N3/04, G06N3/08
CPC: G06F16/353, G06F40/284, G06N3/08, G06N3/047, G06N3/044
Inventor: 线岩团, 王悦寒, 余正涛, 相艳
Owner: KUNMING UNIV OF SCI & TECH