Iterative construction method and device for military scenario text event extraction corpus

An iterative construction method for a military scenario text event extraction corpus, applied in the field of event extraction. It addresses the heavy time and manpower consumption, poor security and confidentiality, and difficult corpus management and protection of existing approaches, with the effect of reducing manual annotation cost and improving the efficiency and speed of corpus construction.

Active Publication Date: 2019-12-20
NAT UNIV OF DEFENSE TECH

AI Technical Summary

Problems solved by technology

Corpus construction is mostly carried out by purely manual annotation: the annotator works directly on the original corpus. This approach has a low degree of automation, consumes a lot of time and manpower, is inefficient, and offers poor security and confidentiality, which is not conducive to corpus management and protection.



Embodiment Construction

[0054] The specific embodiments of the present invention will be further described below in conjunction with the accompanying drawings. It should be noted here that the descriptions of these embodiments are used to help understand the present invention, but are not intended to limit the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts fall within the protection scope of the present invention.

[0055] Referring to Figure 1, a schematic flow chart of an embodiment of the iterative construction method for a military scenario text event extraction corpus of the present invention is shown. The method specifically comprises the following steps:

[0056] A. Preprocessing: input the military scenario text corpus, perform sentence segmentation and then word segmentation on it, and generate a data set represented as word sequences;

[00...
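Step A (preprocessing) can be illustrated with a minimal sketch. The patent text shown here does not name a segmentation tool, so everything below is an assumption: the input is taken to be Chinese text, sentences are split on sentence-ending punctuation, and words are segmented by forward maximum matching against a small hypothetical lexicon. The names `split_sentences`, `segment_words`, and `preprocess` are illustrative only.

```python
import re
from typing import List, Set

# Assumed sentence delimiters (Chinese and Western); not specified by the source.
_SENTENCE = re.compile(r"[^。！？!?\n]+[。！？!?]?")


def split_sentences(text: str) -> List[str]:
    """Split raw text into sentences, keeping the closing punctuation."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def segment_words(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: take the longest lexicon entry at each position,
    falling back to a single character when nothing matches."""
    words: List[str] = []
    i = 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def preprocess(text: str, lexicon: Set[str]) -> List[List[str]]:
    """Sentence segmentation, then word segmentation: one word sequence per sentence."""
    return [segment_words(s, lexicon) for s in split_sentences(text)]


if __name__ == "__main__":
    toy_lexicon = {"红方", "蓝方", "向北", "机动", "防御"}  # hypothetical lexicon
    for words in preprocess("红方向北机动。蓝方防御！", toy_lexicon):
        print(words)
```

A production pipeline would more likely rely on a trained segmenter with a domain lexicon; the dictionary-matching approach here only makes the shape of the output (a list of word sequences) concrete.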


Abstract

The invention discloses an iterative construction method and device for a military scenario text event extraction corpus. The method comprises the following steps: 1, preprocessing, to obtain an original data set represented by word sequences; 2, constructing a seed data set: defining an event template, constructing an event trigger word dictionary, forming the seed data set through manual annotation, and dividing the seed data set into a seed training set and a test set; 3, training a model: training a machine learning model on the seed training set, testing the model on the test set, and optimizing the model parameters according to the test result to obtain a first learning model; 4, selecting an unlabeled training corpus and inputting it into the first learning model to obtain a prediction result set; 5, correcting the prediction result set to form a new annotated corpus; and 6, through continuous iteration, generating the training sets in sequence to form the event extraction corpus. The method improves corpus construction efficiency, reduces manual annotation cost, and achieves relatively high corpus annotation accuracy.
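The iteration in steps 3 to 6 can be sketched as a loop: train on the current annotated set, predict labels for a batch of unlabeled sentences, correct the predictions, and fold them back into the training set. The sketch below is an illustration under stated assumptions, not the patented implementation: `CountModel` is a toy token-vote classifier standing in for the unspecified machine learning model, `correct` is a hypothetical callback standing in for the correction step, and parameter optimization is reduced to recording test accuracy each round.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Tokens = List[str]
Labeled = Tuple[Tokens, str]  # (word sequence, event type)


class CountModel:
    """Toy stand-in model: each token votes for the event types it was seen with."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def fit(self, data: List[Labeled]) -> None:
        self.counts.clear()
        for tokens, label in data:
            for tok in tokens:
                self.counts[tok][label] += 1

    def predict(self, tokens: Tokens) -> str:
        votes: Counter = Counter()
        for tok in tokens:
            votes.update(self.counts.get(tok, {}))
        return votes.most_common(1)[0][0] if votes else "None"


def accuracy(model: CountModel, test: List[Labeled]) -> float:
    if not test:
        return 0.0
    return sum(model.predict(t) == y for t, y in test) / len(test)


def build_corpus(
    seed_train: List[Labeled],
    seed_test: List[Labeled],
    unlabeled: List[Tokens],
    correct: Callable[[List[Labeled]], List[Labeled]],
    batch_size: int = 2,
) -> Tuple[List[Labeled], List[float]]:
    """Grow the annotated corpus batch by batch until the unlabeled pool is empty."""
    model = CountModel()
    corpus = list(seed_train)
    pool = list(unlabeled)
    history: List[float] = []
    while pool:
        model.fit(corpus)                            # step 3: train on current set
        history.append(accuracy(model, seed_test))   # step 3: test the model
        batch, pool = pool[:batch_size], pool[batch_size:]
        predicted = [(t, model.predict(t)) for t in batch]  # step 4: predict
        corpus.extend(correct(predicted))            # step 5: corrected annotations
    return corpus, history                           # step 6: accumulated corpus
```

The point of the design is that the corrector only fixes the model's mistakes rather than annotating every sentence from scratch, which is where the claimed reduction in manual annotation cost would come from.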

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing, in particular to a method and device for iteratively constructing a military scenario text event extraction corpus.

Background technique

[0002] With the rapid development of information technology, information is also growing explosively. How to extract and organize a large amount of unordered information in a timely manner, quickly and accurately obtain useful information needed by users, and transform it into a structured form that both humans and machines can understand and use has become the focus of research and development. Information extraction is generated and developed under this background. Event extraction is an advanced stage of information extraction, and it is the most challenging task. It mainly studies the extraction of event information of interest to users from various texts, presents it in a structured form, and provides it for other information extracti...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F16/36, G06F16/35, G06F17/27, G06N3/00, G06N20/00
CPC: G06F16/36, G06F16/35, G06N3/006, G06N20/00
Inventor: 蒋序平, 战立莹, 杨若鹏, 温鸿鹏, 鲁义威, 卢稳新, 朱巍
Owner: NAT UNIV OF DEFENSE TECH