BERT-based multi-model fusion event subject extraction method

A BERT-based multi-model fusion method for event subject extraction, applied in the field of data processing. It addresses the poor recall, low portability, and limited event-type coverage of existing methods by training several models of differing complexity and fusing their results, which keeps the models diverse and improves accuracy.

Pending Publication Date: 2020-06-09
民生科技有限责任公司

AI Technical Summary

Problems solved by technology

Pattern-matching methods depend on the specific form of the text (language, domain, document format, and so on). Obtaining the templates is time-consuming, laborious, and requires considerable domain expertise. Moreover, it is difficult for the formulated patterns to cover all event types, and when the corpus changes, the patterns must be derived again.
Because pattern-matching methods have low portability and poor recall, event subject extraction based on machine learning has become the mainstream approach.

Examples

Embodiment

[0079] 1. Data preprocessing.

[0080] Clean the data with the following steps:

[0081] Remove special symbols that are not useful for training language models, such as ▽[[+_+]];

[0082] Replace consecutive spaces with commas;

[0083] For samples with multiple event subjects, reorder the event subjects to match the order in which they appear in the original text;

[0084] Divide the data into training samples and prediction samples.
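The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the noise-token list, the function names, the reading of "consecutive spaces" as runs of two or more spaces, and the ratio-based split are all assumptions made for the example.

```python
import re

# Hypothetical noise tokens; the source only gives "▽[[+_+]]" as an example.
NOISE_TOKENS = ["▽", "[[+_+]]"]


def clean_text(text):
    """Remove noise tokens, then replace runs of spaces with a comma."""
    for token in NOISE_TOKENS:
        text = text.replace(token, "")
    # Assumption: "consecutive spaces" means two or more spaces in a row.
    return re.sub(r" {2,}", ",", text).strip()


def order_subjects(text, subjects):
    """Sort event subjects by their first position in the text.

    Subjects that do not occur in the text are placed last; sorted() is
    stable, so they keep their original relative order.
    """
    def position(subject):
        index = text.find(subject)
        return index if index >= 0 else len(text)

    return sorted(subjects, key=position)


def split_samples(samples, train_ratio=0.8):
    """Assumed split: first part for training, remainder for prediction."""
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]


cleaned = clean_text("公司A  涉嫌▽[[+_+]]违规   公司B")
ordered = order_subjects(cleaned, ["公司B", "公司A"])
```

Noise removal runs before the space replacement so that a noise token surrounded by single spaces still collapses into one comma.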

[0085] 2. Embed the training samples as vectors.
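The source does not show how the input sequence is built, so the following is only a sketch of the input format BERT conventionally expects: token ids wrapped in [CLS] and [SEP], an attention mask, and segment ids. The toy vocabulary, the character-level tokenization, and the name build_input are assumptions for illustration; a real pipeline would use the tokenizer shipped with the pre-trained model.

```python
def build_input(text, vocab, max_len=16):
    """Turn a sentence into fixed-length BERT-style input lists."""
    # Character-level tokens, truncated to leave room for [CLS] and [SEP].
    tokens = ["[CLS]"] + list(text)[: max_len - 2] + ["[SEP]"]
    # Characters missing from the vocabulary map to [UNK].
    input_ids = [vocab.get(token, vocab["[UNK]"]) for token in tokens]
    attention_mask = [1] * len(input_ids)

    # Pad to max_len; padded positions get mask value 0.
    padding = max_len - len(input_ids)
    input_ids += [vocab["[PAD]"]] * padding
    attention_mask += [0] * padding

    # Single-sentence input, so every position belongs to segment 0.
    segment_ids = [0] * max_len
    return input_ids, attention_mask, segment_ids


# Toy vocabulary (hypothetical ids, not those of any released BERT model).
toy_vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "公": 4, "司": 5}
ids, mask, segments = build_input("公司A", toy_vocab, max_len=8)
```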

Abstract

The invention relates to a BERT-based multi-model fusion event subject extraction method, and belongs to the technical field of data processing. The method comprises the following steps: preprocessing crawled data to obtain training samples and prediction samples; performing an embedding operation on the training samples and the prediction samples to obtain a training sample input sequence and a prediction sample input sequence for the BERT pre-trained network; adopting a plurality of single models of different complexity based on the BERT pre-trained network, training the single models with the training sample input sequence, and optimizing the network parameters; inputting the prediction sample input sequence into the trained single models and outputting a plurality of model results; and fusing the plurality of model results to obtain the final prediction result for the prediction samples. Because the method adopts models of different complexity, the models are diverse; their parameters are tuned during training, and fusing the detection results of the multiple models further improves detection accuracy.
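The abstract does not say how the model results are fused. As one possible reading, the sketch below uses majority voting over the event subject predicted by each model for each sample. The voting rule, the tie-break by model order, and the name fuse_predictions are assumptions, not the patent's stated method.

```python
from collections import Counter


def fuse_predictions(model_outputs):
    """Fuse per-sample predictions from several models by majority vote.

    model_outputs is a list with one entry per model; each entry is a list
    holding that model's predicted event subject for every sample.
    """
    fused = []
    # zip(*...) regroups the predictions so each tuple holds one sample.
    for candidates in zip(*model_outputs):
        counts = Counter(candidates)
        top = max(counts.values())
        # Tie-break (assumed): keep the answer of the earliest listed model.
        fused.append(next(c for c in candidates if counts[c] == top))
    return fused


outputs = [
    ["公司A", "公司C"],  # hypothetical results of model 1
    ["公司A", "公司D"],  # hypothetical results of model 2
    ["公司B", "公司D"],  # hypothetical results of model 3
]
final = fuse_predictions(outputs)
```

Voting only helps when the models make different mistakes, which is consistent with the abstract's emphasis on using models of different complexity to keep them diverse.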

Description

Technical field

[0001] The invention relates to the technical field of data processing, and in particular to a BERT-based multi-model fusion method for extracting event subjects.

Background

[0002] Event identification is one of the important tasks in public opinion monitoring and in finance. In the financial field, "events" are an important decision-making reference for investment analysis and asset management. Event subject extraction for the financial field is a form of limited-domain event extraction, and is one of the important links in information extraction and knowledge graph construction. The complexity of "event recognition" lies in judging both the event type and the event subject: only a subject that is involved in a specific event type is an extraction target.

[0003] There are currently two main types of methods: pattern-matching-based methods and machine-learning-based methods. [...

Application Information

IPC(8): G06K9/62
CPC: G06F18/214, G06F18/25
Inventor: 李振, 刘恒, 赵兴莹, 秦培歌, 李勇辉
Owner: 民生科技有限责任公司