Named entity corpus annotation training system

A named entity corpus labeling technology, applied in natural language data processing, instruments, calculations, etc., addresses the scarcity of large-scale general corpora, the heavy manpower and material cost of manual labeling, and the poor adaptive ability of models, so as to reduce labor costs, raise labeling efficiency, and reduce complexity.

Active Publication Date: 2022-06-14
10TH RES INST OF CETC
12 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Statistics-based methods rely heavily on corpora, yet relatively few large-scale general corpora are available for building and evaluating named entity recognition systems.
Because acquiring large-scale manually labeled data requires a lot of manpower and material resources, training corpora are scarce and the domain adaptive ability of the model is poor.
This also makes it difficult for existing named entity recognition methods to be widely adopted.


Examples

Embodiment Construction

[0016] See Figure 1. In the preferred embodiment described below, a named entity corpus labeling training system comprises a named entity corpus labeling preparation module, a semi-automatic corpus named entity labeling module, a feedback model learning and training module, and a named entity labeling model effect evaluation module, characterized in that: the named entity corpus labeling preparation module distinguishes data from different sources, selects the corpus source for each named entity corpus, and provides selectable, applicable labeling algorithms during the labeling process; the semi-automatic corpus named entity labeling module independently selects a suitable algorithm for the given annotation requirements and corpus characteristics and carries out automatic annotation, by integrating conditional random fields (CRF), long short-term memory networks with CRF (LSTM + CRF), hidden Markov models (HMM), support vector machines (SVM), graph-ranking-based...
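The idea of selectable labeling algorithms feeding an automatic pre-labeling step can be sketched as follows. This is a minimal illustration, not the patented implementation: the registry, the function names, and the two toy annotators (a gazetteer matcher and a date pattern) are hypothetical stand-ins for the trained models the text lists, such as CRF or LSTM + CRF.

```python
import re
from typing import Callable, Dict, List, Tuple

# An entity span is (start, end, label), with end exclusive.
Span = Tuple[int, int, str]
Annotator = Callable[[str], List[Span]]


def dictionary_annotator(text: str) -> List[Span]:
    """Toy gazetteer matcher (hypothetical stand-in for a trained model)."""
    gazetteer = {"Beijing": "LOC", "CETC": "ORG"}
    spans = []
    for term, label in gazetteer.items():
        for match in re.finditer(re.escape(term), text):
            spans.append((match.start(), match.end(), label))
    return spans


def date_annotator(text: str) -> List[Span]:
    """Toy pattern matcher for ISO-style dates (also a stand-in)."""
    pattern = r"\d{4}-\d{2}-\d{2}"
    return [(m.start(), m.end(), "DATE") for m in re.finditer(pattern, text)]


# Hypothetical registry: the preparation step exposes the available
# algorithms, and the labeling step picks from them by name.
REGISTRY: Dict[str, Annotator] = {
    "dictionary": dictionary_annotator,
    "date": date_annotator,
}


def pre_label(text: str, algorithms: List[str]) -> List[Span]:
    """Run the selected annotators and merge their spans, ordered by position."""
    spans = set()
    for name in algorithms:
        spans.update(REGISTRY[name](text))
    return sorted(spans)


text = "CETC was founded in Beijing on 2002-03-01."
for start, end, label in pre_label(text, ["dictionary", "date"]):
    print(text[start:end], label)
```

In a real system each registry entry would wrap a trained model, and a human annotator would then correct the pre-labeled spans rather than label from scratch.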

Abstract

The named entity corpus labeling training system disclosed in the present invention aims to provide a semi-automatic labeling training device for named entity recognition that can improve the accuracy rate, correct rate, and recall rate of named entity recognition. The present invention is realized through the following technical solutions: the named entity corpus labeling preparation module provides selectable and applicable labeling algorithms during the labeling process; the semi-automatic corpus named entity labeling module independently selects a suitable algorithm and performs automatic labeling, using at least one named entity extraction algorithm to pre-label individual named entities in the text corpus to be labeled; when the labeling task is completed, the feedback model learning and training module uses the labeled corpus to train the named entity model, and automatic feedback adjustment is applied to complete new named entity labeling tasks; the named entity labeling model effect evaluation module quantitatively evaluates the labeling effect using model metrics, and recommends the default optimal algorithm model based on the evaluation results.
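The evaluation and recommendation step can be illustrated with a short sketch. The assumptions are loud ones: the source does not say how metrics are computed or combined, so exact-match span scoring and ranking by F1 are choices made here for illustration, and all names are hypothetical.

```python
from typing import Dict, Set, Tuple

# An entity span is (start, end, label), with end exclusive.
Span = Tuple[int, int, str]


def evaluate(predicted: Set[Span], gold: Set[Span]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1 over entity spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def recommend(predictions: Dict[str, Set[Span]], gold: Set[Span]) -> str:
    """Return the name of the model with the highest F1 (assumed criterion)."""
    scores = {name: evaluate(spans, gold)["f1"] for name, spans in predictions.items()}
    return max(scores, key=scores.get)


gold = {(0, 4, "ORG"), (20, 27, "LOC")}
predictions = {
    "model_a": {(0, 4, "ORG")},                                      # misses one entity
    "model_b": {(0, 4, "ORG"), (20, 27, "LOC"), (31, 41, "DATE")},   # one spurious span
}
print(recommend(predictions, gold))
```

Ranking by F1 balances missed entities against spurious ones; a deployment that cares more about one kind of error could rank by precision or recall instead.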

Description

Technical field

[0001] The present invention relates to the field of training corpora and applied text mining techniques, in particular to a semi-automatic labeling training method and apparatus for named entity corpora.

Background

[0002] In recent years, deep learning methods based on neural networks have achieved great success in the fields of computer vision and speech recognition, and have also made considerable progress in the field of natural language processing. Deep learning has also achieved good results in named entity recognition (NER), a key fundamental task of NLP. However, deep learning methods require a large amount of labeled corpus data; otherwise the model overfits very easily and cannot achieve the expected generalization ability. With the rapid development of big data collection and acquisition methods, it has become particularly urgent to extract maximum value from data, which places new requirements on the intelligent analysis of big data. The mode...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F40/295
CPC: G06F40/295
Inventor: 代翔崔莹黄细凤杨露丁洪丽张志朱宇涛谭礼晋
Owner: 10TH RES INST OF CETC