Named entity recognition method based on rules and improved pre-training model

Named entity recognition and pre-training technology, applied to neural learning methods, biological neural network models, instruments, and related areas, that addresses problems such as poor transfer across domains, high time and labor costs, and a lack of training data.

Pending Publication Date: 2021-05-18
ZHEJIANG UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0004] At present, few studies address the extraction of proper nouns in the textile field, mainly because high-quality training data is lacking and texts in this field are highly specialized, for example "chemical fiber" and "rayon". Entities with modifiers cannot be transferred from general-domain entity recognition tasks, which makes recognizing such entities more challenging than general entity recognition.

Examples

Embodiment 1

[0079] As shown in Figure 3, the named entity recognition method based on rules and an improved pre-trained model proceeds as follows:

[0080] Step 1: Obtain the structured and unstructured text information for "long-staple cotton" from Baidu Encyclopedia. Parse the HTML files, mainly extracting the keywords, abstracts, content, and other information in the web pages;
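The parsing in Step 1 can be sketched with Python's standard `html.parser` module. This is a minimal illustration, not the patent's implementation: the source does not describe the page markup, so reading keywords and the abstract from `<meta name="keywords">` and `<meta name="description">` and taking body content from `<p>` elements are assumptions, and `PageExtractor` and `extract_page` are hypothetical names.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects meta keywords, a meta description, and paragraph text.

    The tag and attribute names used here are assumptions about the page
    layout; the source does not specify them.
    """

    def __init__(self):
        super().__init__()
        self.keywords = []
        self.abstract = ""
        self.paragraphs = []
        self._in_paragraph = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta":
            name = (attributes.get("name") or "").lower()
            content = attributes.get("content") or ""
            if name == "keywords":
                # Accept both ASCII and full-width commas as delimiters.
                parts = content.replace("，", ",").split(",")
                self.keywords = [p.strip() for p in parts if p.strip()]
            elif name == "description":
                self.abstract = content.strip()
        elif tag == "p":
            self._in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            text = "".join(self._buffer).strip()
            if text:  # skip empty paragraphs
                self.paragraphs.append(text)
            self._in_paragraph = False

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. links) is kept as well.
        if self._in_paragraph:
            self._buffer.append(data)


def extract_page(html):
    """Return the keywords, abstract, and content paragraphs of one page."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return {
        "keywords": parser.keywords,
        "abstract": parser.abstract,
        "content": parser.paragraphs,
    }
```

Calling `extract_page(html_text)` on a downloaded page returns a dictionary with the three fields, which can then be passed to the cleaning step. Fetching the page itself is left out because the source gives no URL or crawling details.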

[0081] Step 2: Process the extracted text: remove special symbols such as quotation marks, exclamation marks, tildes, and ellipses, then merge the keywords and text information into plain text, and split the text into sentences at sentence separators;
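The cleaning and sentence splitting in Step 2 might look like the sketch below. The exact symbol set and the choice of sentence separators (the Chinese full stop, question marks, semicolons, and line breaks) are assumptions, since the source names the symbol categories but not the characters; `clean_text`, `split_sentences`, and `prepare_sentences` are hypothetical helper names.

```python
import re

# Symbols named in Step 2: quotation marks, exclamation marks, tildes, and
# ellipses. The concrete character set is an assumption.
SPECIAL_SYMBOLS = re.compile(r'["“”‘’!！~～…]|\.{3,}')

# A sentence is a run of non-separator characters plus an optional trailing
# separator. The separator set is an assumption.
SENTENCE = re.compile(r"[^。？?；;\n]+[。？?；;]?")


def clean_text(text):
    """Remove the special symbols listed above."""
    return SPECIAL_SYMBOLS.sub("", text)


def split_sentences(text):
    """Split text into sentences, keeping each trailing separator."""
    return [s.strip() for s in SENTENCE.findall(text) if s.strip()]


def prepare_sentences(keywords, paragraphs):
    """Merge keywords and paragraphs into one text, clean it, and split it."""
    merged = "\n".join(["，".join(keywords)] + list(paragraphs))
    return split_sentences(clean_text(merged))
```

Because exclamation marks are removed before splitting, they cannot act as sentence boundaries in this sketch; if they should, the split would have to run before the cleaning.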

[0082] Step 3: The domain dictionary database consists of domain vocabulary collected from major authoritative professional websites and from dictionary databases such as Sogou. The collected words serve as a dedicated dictionary database for the textile field, and all text data first undergo a first round of labeling against this textile domain dictionary database; t...
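The first round of dictionary labeling described in Step 3 could be sketched as follows. The source does not say how dictionary words are matched or which tag scheme is used, so forward maximum matching, character-level BIO tags, and the tag name `TEX` are all assumptions made for illustration, as is the function name `dictionary_label`.

```python
def dictionary_label(sentence, lexicon, tag="TEX"):
    """Label one sentence character by character using a domain lexicon.

    Uses forward maximum matching: at each position the longest lexicon
    entry starting there wins. Returns a list of (character, label) pairs.
    """
    max_len = max((len(word) for word in lexicon), default=0)
    labels = ["O"] * len(sentence)
    i = 0
    while i < len(sentence):
        match_len = 0
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            if sentence[i:i + size] in lexicon:
                match_len = size
                break
        if match_len:
            labels[i] = "B-" + tag
            for j in range(i + 1, i + match_len):
                labels[j] = "I-" + tag
            i += match_len
        else:
            i += 1
    return list(zip(sentence, labels))
```

Labels produced this way are only a starting point: dictionary matching cannot resolve ambiguous or unseen terms, which is presumably why the source calls it the first round.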

Abstract

The invention discloses a named entity recognition method based on rules and an improved pre-training model. Building on BERT pre-training, the method continues pre-training on data from the same domain as the downstream task and then fine-tunes on the named entity recognition task. Because part of speech can express attribute information about important words, additional feature information is added to the internal structure of the BERT model to enhance the recognition performance of the system. For the deep learning model, a convolutional neural network and a bidirectional recurrent neural network are combined to extract sentence-level features from the text. Finally, the entities recognized by the model are corrected with rules: the method checks whether the entity length is below a certain value and, if the preceding word is an adjective, splices the two into a new entity that serves as the final entity word. The method improves named entity recognition accuracy and effectively extracts proper nouns in the textile fabric field; compared with existing methods, accuracy, recall, and F1 are greatly improved.
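The rule-based correction mentioned in the abstract can be illustrated with a short sketch. It reads the rule as "an entity shorter than a threshold that is directly preceded by an adjective is extended to include that adjective". The threshold value `min_len=3`, the adjective tag `"a"`, the token-span representation, and the name `correct_entities` are assumptions; the source gives none of these specifics.

```python
def correct_entities(tokens, pos_tags, entities, min_len=3, adj_tag="a"):
    """Extend short entities with a directly preceding adjective.

    tokens:   list of words in one sentence
    pos_tags: part-of-speech tag for each token (same length as tokens)
    entities: list of (start, end) token spans, end exclusive, as produced
              by the recognition model
    Returns a list of (start, end, text) tuples after correction.
    """
    corrected = []
    for start, end in entities:
        text = "".join(tokens[start:end])
        is_short = len(text) < min_len
        has_adjective_before = start > 0 and pos_tags[start - 1] == adj_tag
        if is_short and has_adjective_before:
            start -= 1  # splice the adjective onto the entity
        corrected.append((start, end, "".join(tokens[start:end])))
    return corrected
```

For example, with hand-assigned tags, `correct_entities(["优质", "棉", "用于", "纺织"], ["a", "n", "v", "n"], [(1, 2)])` extends the one-character entity "棉" to "优质棉", while an entity that already meets the length threshold is left unchanged.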

Description

Technical field

[0001] The invention relates to a named entity recognition method based on rules and an improved pre-training model, aimed in particular at recognizing proper nouns in data from the textile fabric field. Building on BERT pre-training, the invention continues pre-training on data from the same domain as the downstream task and then trains the named entity recognition model. Because part of speech can express attribute information about important words, additional feature information is also added to the internal structure of the BERT model to enhance the recognition performance of the system. For the deep learning model, a method that combines a convolutional neural network (CNN) with a bidirectional recurrent neural network (BiLSTM) is proposed to extract sentence-level features from text. Finally, combined with the rule-based method, the entity words identified by the named entity recognition model are verifie...

Application Information

IPC(8): G06F40/295, G06F40/211, G06F40/44, G06N3/04, G06N3/08
CPC: G06F40/295, G06F40/211, G06F40/44, G06N3/08, G06N3/047, G06N3/045
Inventor: 杨良怀, 裴慧
Owner: ZHEJIANG UNIV OF TECH