Entity relationship extraction method for automatically constructing data set

An entity relationship extraction and automatic data set construction technology, applied in natural language data processing, unstructured text data retrieval, electronic digital data processing, etc. It addresses the difficulty of generating training and test data sets and the reliance of earlier extraction methods on a large amount of computing resources.

Active Publication Date: 2021-03-12
NO 30 INST OF CHINA ELECTRONIC TECH GRP CORP

AI Technical Summary

Problems solved by technology

This solution mainly solves two problems. First, it parses text content automatically, which effectively addresses the difficulty of generating training and test data sets. Second, it addresses the reliance of earlier entity relationship extraction on a large amount of computing resources: by fine-tuning the BERT model alone, the entity relationships in the text can be extracted efficiently, so that the essential connections between multi-source heterogeneous data are presented intuitively.

Embodiment Construction

[0032] The present invention will be further described below in conjunction with the accompanying drawings.

[0033] As shown in Figure 1, the present invention proposes an entity relationship extraction method for automatically constructing a data set, including the following steps:

[0034] Step 1: collect and preprocess the corpus;

[0035] Step 2: define a triple dictionary table and build a synonym table;

[0036] Step 3: use the LTP tool to generate a training data set and a test data set;

[0037] Step 4: train the network model on the training data set;

[0038] Step 5: predict entities and relationships on the test data set with the trained network model;

[0039] Step 6: optimize the prediction results to obtain a triple data set.
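
Steps 2 and 3 can be illustrated with a minimal sketch. The source does not give the layout of the triple dictionary table or the synonym table, nor how the LTP tool is called, so everything below is an assumption: the table contents, the ENTITY_TYPES lexicon, and the functions find_mentions and label_sentence are hypothetical, and plain substring matching stands in for the segmentation and entity recognition that LTP would provide.

```python
# Hypothetical triple dictionary table: allowed (head type, relation, tail type) patterns.
TRIPLE_DICTIONARY = {
    ("Organization", "develops", "Product"),
    ("Person", "works_for", "Organization"),
}

# Hypothetical synonym table: surface form -> canonical entity name.
SYNONYM_TABLE = {
    "ACME Corp": "ACME",
    "ACME Corporation": "ACME",
    "RoadRunner X": "RoadRunner",
}

# Hypothetical entity lexicon: canonical entity name -> entity type.
ENTITY_TYPES = {
    "ACME": "Organization",
    "RoadRunner": "Product",
    "Alice": "Person",
}


def find_mentions(sentence):
    """Return sorted (canonical_name, entity_type) pairs found in the sentence.

    Substring matching is a stand-in for LTP-based segmentation and
    entity recognition, which this sketch does not attempt to reproduce.
    """
    surface_forms = dict(SYNONYM_TABLE)
    surface_forms.update({name: name for name in ENTITY_TYPES})
    found = {}
    # Try longer surface forms first so the most specific alias is checked first.
    for form in sorted(surface_forms, key=len, reverse=True):
        canonical = surface_forms[form]
        if form in sentence and canonical not in found:
            found[canonical] = ENTITY_TYPES[canonical]
    return sorted(found.items())


def label_sentence(sentence):
    """Emit candidate training samples whose entity types fit the dictionary."""
    mentions = find_mentions(sentence)
    samples = []
    for head, head_type in mentions:
        for tail, tail_type in mentions:
            if head == tail:
                continue
            for h_type, relation, t_type in sorted(TRIPLE_DICTIONARY):
                if (head_type, tail_type) == (h_type, t_type):
                    samples.append(
                        {"text": sentence, "head": head,
                         "relation": relation, "tail": tail}
                    )
    return samples
```

Calling label_sentence on a sentence that mentions "ACME Corporation" and "RoadRunner X" yields one candidate sample with the canonical names ACME and RoadRunner. Labels produced this way are only candidates: a type match does not guarantee that the sentence actually expresses the relation, which is one reason a later optimization step is useful.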

[0040] Specifically:

[0041] In step 1, crawler data and undisclosed document data are mainly collected, the collected multi-source heterogeneous data is cleaned up, and fina...

Abstract

The invention provides an entity relationship extraction method for automatically constructing a data set. The method comprises the following steps: 1, collecting and preprocessing the corpus; 2, defining a triple dictionary table and constructing a synonym table; 3, generating a training data set and a test data set by using the LTP tool; 4, training a network model on the training data set; 5, predicting entities and relationships on the test data set with the trained network model; 6, optimizing the prediction results to obtain a triple data set. With this scheme, text content can be parsed automatically, which effectively addresses the difficulty of generating training and test data sets. By optimizing and adjusting the BERT model, the scheme also removes the earlier dependence of entity relationship extraction on a large amount of computing resources: the entity relationships in the text can be extracted efficiently by fine-tuning the BERT model alone, so that the intrinsic relationships between multi-source heterogeneous data are presented visually.
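
The source describes step 6 only as optimizing the prediction results and does not say how. The following sketch shows one plausible reading, offered as an assumption rather than the patented procedure: predicted triples are normalized through a synonym table, filtered against the allowed relations, and deduplicated. The function optimize_predictions and the sample data are hypothetical.

```python
def optimize_predictions(predictions, synonym_table, allowed_relations):
    """Normalize, filter, and deduplicate predicted (head, relation, tail) triples.

    This is an assumed post-processing scheme, not taken from the source.
    """
    seen = set()
    result = []
    for head, relation, tail in predictions:
        # Map each entity mention to its canonical name when a synonym is known.
        head = synonym_table.get(head.strip(), head.strip())
        tail = synonym_table.get(tail.strip(), tail.strip())
        # Drop triples with unknown relations, empty entities, or self-loops.
        if relation not in allowed_relations or not head or not tail or head == tail:
            continue
        triple = (head, relation, tail)
        if triple not in seen:
            seen.add(triple)
            result.append(triple)
    return result


# Hypothetical model output before optimization.
raw_predictions = [
    ("ACME Corp", "develops", "RoadRunner"),
    ("ACME Corporation", "develops", "RoadRunner "),
    ("ACME", "likes", "RoadRunner"),
    ("ACME", "develops", "ACME Corp"),
]
synonyms = {"ACME Corp": "ACME", "ACME Corporation": "ACME"}
triples = optimize_predictions(raw_predictions, synonyms, {"develops"})
```

Keeping the first occurrence of each normalized triple preserves the order of the model output, which makes the resulting triple data set easier to trace back to the source sentences.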

Description

Technical field

[0001] The invention relates to the field of natural language processing, in particular to an entity relationship extraction method for automatically constructing a data set.

Background

[0002] Entities and relationships summarize the main content of a text, show the connections between data intuitively, and provide basic data for downstream tasks such as intelligent question answering and retrieval systems. At present, apart from the few keywords provided by standardized documents such as papers, most documents do not provide an intuitive data structure that reflects their content. The traditional approach of reading texts manually to extract document entities and relationships is increasingly unable to meet the needs of practical applications, given today's multi-source and massive document data. How to extract entities and relationships efficiently and accurately is therefore a problem that urgently needs to be solved. At pre...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/36, G06F40/295, G06F40/247, G06F40/242, G06N3/04
CPC: G06F16/367, G06F40/295, G06F40/247, G06F40/242, G06N3/045
Inventor: 房冬丽魏超李俊衡宇峰黄元稳
Owner: NO 30 INST OF CHINA ELECTRONIC TECH GRP CORP