Chinese named entity recognition method based on BERT-BiGRU-CRF

A Chinese named entity recognition method based on BERT-BiGRU-CRF, classified under instruments, computing, and electrical digital data processing. It addresses the problems that existing methods are not transferable, that rules are difficult to enumerate, and that the work is tedious and complicated. Its effects are improved accuracy, fewer training parameters, and shorter training time.

Active Publication Date: 2019-08-02
WUHAN UNIV

AI Technical Summary

Problems solved by technology

[0009] 1. Dictionary-based named entity recognition relies heavily on the dictionary database and cannot identify unregistered (out-of-dictionary) words or nested entities.
[0010] 2. Rule-based named entity recognition requires linguistic background knowledge to construct the rules. Chinese expressions are diverse, so rules are difficult to enumerate and prone to conflict. The rules are also not transferable, and the work is cumbersome and complicated.
[0012] 4. Neural-network-based named entity recognition cannot represent the polysemy of characters or words.
[0013] 5. Fine-tuning methods based on a language model have a large number of parameters and a long training time.
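The first limitation can be illustrated with a minimal sketch of dictionary lookup by forward maximum matching. The function name `dictionary_ner`, the two-entry dictionary, and the example sentences are hypothetical and chosen only for illustration; the patent text does not prescribe a particular matching algorithm.

```python
def dictionary_ner(text, entity_dict, max_len=4):
    """Forward maximum matching: at each position, take the longest dictionary entry."""
    found = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in entity_dict:
                match = (candidate, entity_dict[candidate], i)
                break
        if match:
            found.append(match)
            i += len(match[0])  # skip past the matched entity
        else:
            i += 1
    return found


# Hypothetical toy dictionary (an assumption, not from the patent).
entity_dict = {"武汉": "LOC", "武汉大学": "ORG"}

# The longest match wins, so the nested place name "武汉" is never reported.
print(dictionary_ner("他在武汉大学读书", entity_dict))

# "珞珈山" is not in the dictionary, so nothing is found.
print(dictionary_ner("他在珞珈山散步", entity_dict))
```

In this sketch the first call reports only the organization and misses the place name nested inside it, and the second call returns an empty list because the entity is unregistered.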


Examples


Embodiment Construction

[0040] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0041] Figure 1 shows a schematic flow chart of the Chinese named entity recognition method based on the BERT-BiGRU-CRF model of the present invention. The method includes the following steps:

[0042] A. Obtain the training corpus data of the language model and perform preprocessing;

[0043] B. Train the BERT (Bidirectional Encoder Representations from Transformers) language model according to the training corpus data preprocessed in step A;

[0044] C. Obtain and mark the training corpus data of the named entity recognition model to form a marked corpus;

[0045] D. Preprocessing the tagged c...
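As an illustration of the marking in step C, the sketch below labels each character of a sentence with BIO tags. The BIO scheme, the function name `bio_tags`, and the example sentence are assumptions; the text available here does not state which tagging scheme the invention uses.

```python
def bio_tags(sentence, entities):
    """Label each character with a BIO tag.

    entities: list of (start, end_exclusive, entity_type) character spans.
    """
    tags = ["O"] * len(sentence)
    for start, end, etype in entities:
        tags[start] = "B-" + etype          # first character of the entity
        for k in range(start + 1, end):
            tags[k] = "I-" + etype          # remaining characters of the entity
    return list(zip(sentence, tags))


# Hypothetical example: a person name followed by an organization name.
marked = bio_tags("小明在武汉大学", [(0, 2, "PER"), (3, 7, "ORG")])
for char, tag in marked:
    print(char, tag)
```

Character-level labels suit Chinese text because no word segmentation is needed before tagging.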


Abstract

The invention discloses a Chinese named entity recognition method based on BERT-BiGRU-CRF. The method comprises three stages. In the first stage, a large text corpus is preprocessed and a BERT language model is pre-trained. In the second stage, the named entity recognition corpus is preprocessed and encoded by the trained BERT language model. In the third stage, the encoded corpus is input into a BiGRU+CRF model for training, and the trained model performs named entity recognition on the sentences to be recognized. The BERT pre-trained language model enhances the semantic representation of characters: semantic vectors are generated dynamically according to each character's context, so the ambiguity of characters is effectively represented. Compared with methods based on fine-tuning a language model, the method reduces the number of training parameters and saves training time.
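To show what the CRF layer contributes in the third stage, here is a minimal sketch of Viterbi decoding for a linear-chain CRF, the standard way to pick the best tag sequence from per-character scores and tag-transition scores. The function name `viterbi_decode`, the three-tag set, and all numbers are made up for illustration. In the method described above, the per-character scores would come from the BiGRU over BERT encodings, and the transition scores would be learned during training.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions[t][j]   : score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(tags)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for reaching tag j at this position.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    best = max(range(n_tags), key=lambda i: score[i])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [tags[i] for i in path]


tags = ["O", "B-LOC", "I-LOC"]
# Made-up scores: picking the top tag per character gives O, I-LOC, O,
# which is invalid because I-LOC cannot follow O.
emissions = [[1.0, 0.8, 0.0],
             [0.0, 0.1, 1.5],
             [1.0, 0.0, 0.2]]
# Made-up transitions: heavily penalize O -> I-LOC.
transitions = [[0.0, 0.0, -10.0],
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]

print(viterbi_decode(emissions, transitions, tags))  # ['B-LOC', 'I-LOC', 'O']
```

Choosing the highest-scoring tag independently for each character would yield an inconsistent sequence here. Decoding jointly with transition scores repairs it, which is the usual motivation for placing a CRF on top of a recurrent encoder.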

Description

Technical field

[0001] The invention belongs to the field of named entity recognition, and in particular relates to a Chinese named entity recognition method based on a BERT-BiGRU-CRF model.

Background technique

[0002] Named entity recognition aims to identify specific entity information in text, such as person names, place names, and organization names. It is widely used in information extraction, information retrieval, intelligent question answering, and machine translation, and is one of the foundations of natural language processing. Traditional named entity recognition methods can be divided into dictionary-based methods, rule-based methods, methods based on traditional machine learning, neural-network-based methods, and fine-tuning methods based on language models.

[0003] A dictionary-based named entity recognition method, which first constructs a large-scale entity dictio...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: G06F40/295, G06F40/30, Y02D10/00
Inventor: 董文永, 杨飘
Owner: WUHAN UNIV