
Nested event extraction method based on domain pre-training

An event extraction and pre-training technology, applied in special-purpose data processing applications, instruments, and electrical digital data processing. It addresses problems such as overlapping event attributes, poor information extraction, and performance degradation.

Pending Publication Date: 2021-07-06
EAST CHINA UNIV OF SCI & TECH
Cites: 2 · Cited by: 0

AI Technical Summary

Problems solved by technology

First, migrating BERT directly to a vertical domain limits it in domain-specific scenarios and degrades its performance. The present invention therefore proposes a method for pre-training a domain language model, taking the news case domain as an example, to improve domain language processing capability. Second, the invention addresses poor information extraction caused by factors such as the overlapping attributes of multi-subject nested events and event attributes scattered across different sentences.



Embodiment Construction

[0041] In order to enable those skilled in the art to better understand the solution of the present invention, and to make the above-mentioned purpose, technical solution and advantages of the present invention more obvious and understandable, the present invention will be further described in detail below in conjunction with the embodiments and accompanying drawings.

[0042] See Figure 1, a schematic diagram of the principle of the nested event extraction method based on domain pre-training provided by an embodiment of the present invention. The method proceeds as follows: first, an original domain corpus is obtained from the domain database and preprocessed; second, the general BERT model is improved and pre-trained on the domain corpus to obtain the domain model CaseBERT; third, the hierarchical relationships of nested events are sorted out and a layered extraction template for nested events is predefined; finally, through the joint extraction model of trigger words and event attrib...
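The patent does not publish its template schema, so the following is only a minimal sketch of what a layered extraction template could look like. `EventTemplate`, `extraction_layers`, `role_paths`, and the `Arrest`/`Fraud` event types are hypothetical names chosen for illustration. The idea shown is that some argument roles are themselves events, so the template records the nesting explicitly and extraction can proceed from the outermost layer inward.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EventTemplate:
    """One node of a hypothetical layered extraction template."""
    event_type: str
    roles: list[str]  # flat argument roles filled directly by text spans
    # role name -> template of the event that fills that role (the nesting)
    nested: dict[str, EventTemplate] = field(default_factory=dict)


def extraction_layers(root: EventTemplate) -> list[list[str]]:
    """Group event types by nesting depth, outermost layer first (breadth-first)."""
    layers: list[list[str]] = []
    frontier = [root]
    while frontier:
        layers.append([t.event_type for t in frontier])
        frontier = [child for t in frontier for child in t.nested.values()]
    return layers


def role_paths(t: EventTemplate, prefix: str = "") -> list[str]:
    """List every slot to fill as a path from the outermost event."""
    base = f"{prefix}{t.event_type}"
    paths = [f"{base}.{r}" for r in t.roles]
    for role, child in t.nested.items():
        paths += role_paths(child, prefix=f"{base}.{role}:")
    return paths


# Hypothetical news-case schema: an arrest whose cause is itself a fraud event.
fraud = EventTemplate("Fraud", roles=["perpetrator", "victim", "amount"])
arrest = EventTemplate("Arrest", roles=["agency", "suspect", "time"], nested={"cause": fraud})

print(extraction_layers(arrest))  # [['Arrest'], ['Fraud']]
print(role_paths(arrest))
```

A breadth-first ordering is one plausible reading of "layered" extraction: outer events are resolved first, and their nested roles then constrain which inner events to look for.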


Abstract

The invention discloses a nested event extraction method based on domain pre-training. The method comprises the following steps: 1, acquire and preprocess a domain corpus, build a domain-specific vocabulary from the corpus using the adjacent-word solidification degree (cohesion) method, and randomly sample text data for manual labeling to obtain a nested event text dataset; 2, using the domain corpus and the domain vocabulary, add a news-category classification pre-training task to the general language model BERT and pre-train the domain language model CaseBERT; 3, sort out the hierarchical relationships of nested events and define a template for extracting nested event information layer by layer; 4, use the CaseBERT model and the predefined nested event extraction template to jointly extract trigger words and event attributes from the nested event text dataset. The method suits domain-specific, multi-subject nested event extraction tasks; pre-training the domain language model and predefining the hierarchical extraction template for nested events effectively improve the accuracy of domain nested event extraction.
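The abstract names an "adjacent word solidification degree" method for building the domain vocabulary but gives no formula. The sketch below assumes the cohesion measure commonly used in Chinese new-word discovery: for a candidate string, take the minimum over all two-way splits of P(string) / (P(left) · P(right)), so a high score means the parts co-occur far more often than chance. It works on character n-grams and normalizes every n-gram count by the corpus length, a simplification; the patent may instead apply the measure to adjacent segmented words. The function names and the thresholds `min_count` and `min_cohesion` are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def ngram_counts(text: str, max_n: int) -> Counter:
    """Count every character n-gram of length 1..max_n."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def cohesion(gram: str, counts: Counter, total: int) -> float:
    """Minimum over binary splits of P(gram) / (P(left) * P(right)).

    Every substring of an observed n-gram is itself observed, so no count is zero.
    """
    p = counts[gram] / total
    return min(
        p / ((counts[gram[:k]] / total) * (counts[gram[k:]] / total))
        for k in range(1, len(gram))
    )


def candidate_terms(text: str, max_n: int = 4, min_count: int = 2,
                    min_cohesion: float = 5.0) -> list[str]:
    """Return frequent, highly cohesive n-grams (length >= 2) as vocabulary candidates."""
    counts = ngram_counts(text, max_n)
    total = len(text)
    return sorted(
        g for g, c in counts.items()
        if len(g) >= 2 and c >= min_count and cohesion(g, counts, total) >= min_cohesion
    )


print(candidate_terms("xyzxyz", min_cohesion=2.5))  # ['xy', 'xyz', 'yz']
```

In practice cohesion is usually paired with a boundary-freedom measure (left/right neighbor entropy) and run on a large corpus; the toy string here only shows the mechanics.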

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing, specifically to text information extraction, and more specifically provides a nested event extraction method based on domain pre-training.

Background art

[0002] With the explosive growth of Internet information, the need to obtain required information quickly and accurately from vast information sources is increasingly urgent. Event extraction is an in-depth research task within information extraction. It aims to extract events of interest to users from plain text and present them in a structured form for querying, analysis, and use in important downstream applications such as building knowledge graphs, intelligent question answering, and information retrieval. Event extraction from text, especially multi-agent nested event extraction, has become a research diffi...


Application Information

IPC (8): G06F16/9532
CPC: G06F16/9532
Inventors: 张维彦, 阮彤, 叶琪, 翟洁
Owner: EAST CHINA UNIV OF SCI & TECH