
Multi-task Chinese entity naming recognition method

A named entity recognition method technology, applied in neural learning methods, character and pattern recognition, instruments, etc. It addresses problems such as entity boundaries that are difficult to delimit, word segmentation errors on out-of-vocabulary words, and the unsatisfactory performance of BiLSTM as a feature extractor, and achieves a time-saving effect.

Pending Publication Date: 2022-02-25
CHANGSHA UNIVERSITY OF SCIENCE AND TECHNOLOGY
Cites 0 · Cited by 6

AI Technical Summary

Problems solved by technology

[0003] Chinese named entity recognition suffers from entity boundaries that are difficult to delimit, word segmentation errors, and out-of-vocabulary (OOV) words, so BiLSTM alone is not ideal as a feature extractor.
Conventional word-vector pre-training extracts features from words and characters in isolation, ignoring the contextual information of words and producing static, context-free word vectors, which in turn degrades the model's ability to recognize entities.

Method used


Examples


Embodiment Construction

[0067] The specific embodiments of the present invention are described in further detail below in conjunction with the examples:

[0068] Build the model and train it:

[0069] Divide the experimental data set into a training set, a validation set, and a test set, and label entities with the BIO scheme. The tags used are B-&lt;entity type&gt; (Begin), I-&lt;entity type&gt; (Inside), and O (Outside). When a named entity consists of a single word, that word is tagged B-&lt;entity type&gt;; when a named entity consists of multiple words, the first word is tagged B-&lt;entity type&gt; and the remaining words are tagged I-&lt;entity type&gt;; words that are not part of any named entity are tagged O. Then build the BERT-BI-BiLSTM-CRF network structure, which includes a bidirectional encoder/decoder, a double-layer long short-term memory network layer, an attention network, a hidden layer, and a conditional random field layer. The encoder, decoder, double-layer long short-term memory network layer and conditional random field lay...
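As a concrete illustration of the BIO labeling rules above, the following is a minimal sketch (the helper name `bio_tags` and the half-open `[start, end)` span convention are assumptions for illustration, not part of the patent):

```python
def bio_tags(tokens, entities):
    """Label tokens with BIO tags given (start, end, type) entity spans.

    A one-token entity gets a single B-<type> tag; a multi-token entity
    gets B-<type> on its first token and I-<type> on the rest; every
    other token is tagged O.  Spans are half-open: [start, end).
    """
    tags = ["O"] * len(tokens)
    for start, end, etype in entities:
        tags[start] = "B-" + etype          # first token of the entity
        for i in range(start + 1, end):     # remaining tokens, if any
            tags[i] = "I-" + etype
    return tags

# A four-character organization name followed by a non-entity character:
print(bio_tags(["长", "沙", "理", "工", "是"], [(0, 4, "ORG")]))
# → ['B-ORG', 'I-ORG', 'I-ORG', 'I-ORG', 'O']
```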



Abstract

The invention discloses a multi-task Chinese entity naming recognition method, which comprises the following steps: (1) first, preprocess the data, divide the data set, and set the labeling task; (2) perform feature extraction on the input main-task data and auxiliary-task data through BERT; (3) for the main task and the auxiliary task, perform classification training on the word vectors with a double-layer LSTM neural network model comprising input, hidden, and output layers; (4) fully connect the trained hidden-layer information of the auxiliary task and the main task through an attention mechanism layer; (5) finally, consider global label information in the sequence through a CRF layer and output the optimal label sequence; (6) evaluate the performance of the trained model on the validation set. The method helps researchers efficiently obtain valuable information and knowledge from massive Chinese text data, effectively alleviates the time- and labor-consuming problem of manual information extraction, and is of great significance for further text mining work.
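Step (5), in which the CRF layer considers global label information and outputs the optimal label sequence, is conventionally realized with Viterbi decoding. The following is a minimal pure-Python sketch under stated assumptions: the function name and score layout are illustrative, and in a real CRF the transition scores are learned during training rather than fixed:

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence for one sentence.

    emissions:   list of per-token score lists, one score per tag
                 (e.g. produced by the upstream BiLSTM/attention layers).
    transitions: transitions[i][j] is the score of moving from tag i
                 to tag j (the CRF's global label information).
    """
    n_tags = len(emissions[0])
    # score[j]: best score of any path ending in tag j at the current token
    score = list(emissions[0])
    backptr = []  # backptr[t][j]: previous tag on that best path
    for emit in emissions[1:]:
        prev, new_score = [], []
        for j in range(n_tags):
            # pick the previous tag i that maximizes the path score into j
            best_i = max(range(n_tags),
                         key=lambda i: score[i] + transitions[i][j])
            prev.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        score, backptr = new_score, backptr + [prev]
    # walk the back-pointers from the best final tag to recover the path
    best = [max(range(n_tags), key=lambda j: score[j])]
    for prev in reversed(backptr):
        best.append(prev[best[-1]])
    return best[::-1]

# Two tokens, two tags, neutral transitions: each token takes its best tag.
print(viterbi_decode([[2, 0], [0, 2]], [[0, 0], [0, 0]]))
# → [0, 1]
```

Because the transition term couples adjacent tags, the decoder can reject locally tempting but globally invalid sequences (e.g. an I- tag with no preceding B- tag), which is exactly why the CRF layer sits on top of the LSTM output.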

Description

Technical field

[0001] The invention relates to the technical field of text mining, and in particular to a multi-task Chinese entity naming recognition method.

Background technique

[0002] Named entity recognition in English text has been widely studied. Chinese NER, however, still faces challenges such as Chinese word segmentation, where it is often difficult to determine what constitutes a word. In past Chinese NER tasks, recurrent neural networks (RNNs) were often used to improve performance on entity classification tasks, but RNNs suffer from vanishing and exploding gradients when trained over long distances. The long short-term memory model (LSTM) performs better on longer sequences, and a simple tuning trick for LSTM units in RNNs can significantly reduce overfitting. The neural network model combining Bidirectional Long Short-Term Memory (BiLSTM) and Conditional Random Field (CR...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F40/295, G06K9/62, G06N3/04, G06N3/08
CPC: G06F40/295, G06N3/049, G06N3/08, G06N3/045, G06F18/214
Inventor: 唐小勇黄勇许佳豪王仕果章登勇张经宇
Owner: CHANGSHA UNIVERSITY OF SCIENCE AND TECHNOLOGY