Lao named entity recognition method based on an improved Transformer + CRF (Conditional Random Field)

A named entity recognition technology, applied in neural learning methods, semantic analysis, natural language data processing, etc. It addresses the difficulty of summarizing Lao language rules and the scarcity of corpora for Lao language research, improves recognition accuracy, and mitigates the distance (long-range dependency) problem.

Inactive Publication Date: 2020-10-16
KUNMING UNIV OF SCI & TECH
2 Cites, 13 Cited by

AI Technical Summary

Problems solved by technology

However, the characteristics of the Lao language make its language rules difficult to summarize, and research corpora for Lao are extremely scarce. At present, there are few studies on Lao in the field of named entity recognition, so there is still considerable room for improvement in the recognition accuracy for Lao.

Examples

Embodiment 1

[0040] Embodiment 1: As shown in Figures 1-3, a Lao named entity recognition method based on an improved Transformer+CRF proceeds as follows:

[0041] Step1: Preprocess the existing Lao named entity corpus and split it into a training set (90%) and a test set (10%).

[0042] Step2: Segment each Lao sentence into words and pre-train word vectors with Gensim's word2vec model, yielding word vectors that carry contextual semantics.

[0043] Step3: Take the individual characters of each word in the segmented Lao sentence as input, and use a Transformer as the character encoder to output character-level feature vectors.

[0044] Step4: Splice (concatenate) the character-level feature vectors with the word vectors trained in Step2 to form the word embeddings.

[0045] Step5: Perform position encoding on the word embeddings obtained in Step4, and combine a position vector representing position information on the w...
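
The data split of Step1, the splicing of Step4, and the position encoding of Step5 can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions: the function names (`split_corpus`, `build_word_embedding`, `sinusoidal_position`, `add_position`) are hypothetical, the random shuffle before splitting is an assumption, and because the text of Step5 is truncated, the sketch assumes the standard sinusoidal encoding of the original Transformer combined by element-wise addition. The patent may use a different encoding or combination.

```python
import math
import random


def split_corpus(sentences, train_ratio=0.9, seed=0):
    """Shuffle a corpus and split it into training and test sets (Step1)."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)  # shuffling is an assumption
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def build_word_embedding(char_vec, word_vec):
    """Splice (concatenate) a character-level vector and a word vector (Step4)."""
    return list(char_vec) + list(word_vec)


def sinusoidal_position(pos, dim):
    """Sinusoidal position vector (assumed; the source text is truncated)."""
    vec = []
    for i in range(dim):
        angle = pos / (10000 ** (2 * (i // 2) / dim))
        # Even indices use sine, odd indices use cosine.
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec


def add_position(embeddings):
    """Add a position vector to each word embedding of one sentence (Step5)."""
    out = []
    for pos, emb in enumerate(embeddings):
        pe = sinusoidal_position(pos, len(emb))
        out.append([e + p for e, p in zip(emb, pe)])
    return out


if __name__ == "__main__":
    train, test = split_corpus([f"sentence-{i}" for i in range(10)])
    print(len(train), len(test))
    emb = build_word_embedding([0.1, 0.2], [0.3, 0.4])
    print(add_position([emb, emb]))
```

The character-level vectors would in practice come from the Transformer character encoder of Step3 and the word vectors from the word2vec model of Step2; here both are replaced by small hand-written lists so the sketch stays self-contained.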

Abstract

The invention discloses a Lao named entity recognition method based on an improved Transformer + CRF, belonging to the field of low-resource (minority) language recognition in natural language processing. The method encodes Lao text with an improved Transformer model and then decodes it with a CRF (Conditional Random Field) model to obtain the optimal annotation sequence. The steps are as follows: first, a Transformer is used as a character encoder to encode the individual characters of Lao words; the resulting character-level vectors are spliced with pre-trained word vectors and position-encoded; the result serves as the input to the Transformer encoder, where it passes through a multi-head attention layer and a feedforward neural network layer in turn; finally, the resulting set of vectors is fed into a CRF model that incorporates linguistic features of Lao person names, place names, and organization names for named entity recognition training, yielding a named entity recognition model that incorporates Lao language rules. Compared with the mainstream BLSTM + CRF named entity recognition method, the proposed method achieves better recognition results.
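
The CRF decoding step mentioned in the abstract (obtaining the optimal annotation sequence) is commonly done with the Viterbi algorithm. The following is a minimal sketch of Viterbi decoding for a linear-chain CRF, not the patent's implementation. The function name `viterbi_decode`, the tag set, and all scores are hypothetical. The large negative transition score that forbids `O -> I-PER` only illustrates how a hard rule could be encoded; how the patent actually fuses Lao person-name, place-name, and organization-name features into the CRF is not stated in this excerpt.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence for one sentence.

    emissions[t][j]:   score of tag j at position t (e.g. from the encoder).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n_tags = len(tags)
    score = list(emissions[0])  # best score of any path ending in each tag
    backptr = []                # backptr[t-1][j]: best previous tag for tag j at t
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j]
                             + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backptr.append(ptrs)
    # Follow the back-pointers from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backptr):
        path.append(ptrs[path[-1]])
    path.reverse()
    return [tags[k] for k in path]


if __name__ == "__main__":
    tags = ["O", "B-PER", "I-PER"]          # hypothetical tag set
    transitions = [[0.0, 0.0, -10000.0],    # O -> I-PER is forbidden
                   [0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]]
    emissions = [[0.1, 2.0, 0.0],
                 [0.2, 0.0, 1.5],
                 [2.0, 0.1, 0.3]]
    print(viterbi_decode(emissions, transitions, tags))
```

In a trained model the emission scores would come from the Transformer encoder output and the transition scores would be learned parameters; the decoder itself is independent of how those scores are produced.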

Description

Technical field

[0001] The invention relates to a Lao named entity recognition method based on an improved Transformer+CRF, and belongs to the field of low-resource (minority) language recognition in natural language processing.

Background

[0002] Named entity recognition is a fundamental task in natural language processing. Commonly used approaches to named entity recognition include rule-based and dictionary-based methods, statistical methods, hybrid-model methods, and deep-learning methods. Because labeled Lao corpora are scarce, research on Lao named entity recognition currently relies mainly on deep models such as RNN, CNN, BiLSTM, and BiGRU, in which the neural network learns and extracts features by itself during training. With the introduction of the Transformer model, which is based on the attention mechanism and offers higher parallel efficiency, people have also begun to use the Transformer+CRF model to complete the task of natural language p...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/295, G06F40/30, G06N3/04, G06N3/08
CPC: G06F40/295, G06F40/30, G06N3/08, G06N3/044, G06N3/045
Inventor: 周兰江, 杨志婥琪
Owner: KUNMING UNIV OF SCI & TECH