Entity and relationship extraction method and system, device and medium

An entity and relation extraction technique, applicable to neural learning methods, instruments, unstructured text data retrieval, and related areas. It addresses the low efficiency of existing methods, improving extraction efficiency and avoiding poor model performance.

Active Publication Date: 2022-07-12
CHENGDU UNION BIG DATA TECH CO LTD

AI Technical Summary

Problems solved by technology

Existing span-based methods must enumerate spans for every word in the sentence, so their efficiency is quite low. A fast and efficient named entity extraction method is urgently needed.

Examples

Embodiment 1

[0084] Please refer to Figures 1 and 2. Figure 1 is a schematic diagram of the principle of the entity and relationship extraction method, and Figure 2 is a schematic diagram of the Ngram CNN architecture. The method specifically includes:

[0085] Word vector representation learning:

[0086] For the input document $D=\{w_1, w_2, \ldots, w_n\}$, each word of D comes from the vocabulary: $w_i \in W^v$, $i=1,\ldots,n$, where n is the number of words in the document, v is the size of the vocabulary, and W is the vocabulary space. The BERT pre-trained language model then yields the vector representation sequence of the document word sequence: $X=\{x_1, x_2, \ldots, x_n\}$, $x_i \in \mathbb{R}^d$, $i=1,\ldots,n$. Here $x_i$ is a d-dimensional vector in the real number space representing the i-th word, and $\mathbb{R}$ denotes the real number space.
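The mapping from a word sequence D to a vector sequence X can be illustrated with a minimal Python sketch. It is not the patent's implementation: a random lookup table stands in for BERT (which produces context-dependent vectors), and the function name `embed_document`, the dimension `d=8`, and the toy token list are all assumptions made for illustration.

```python
import random

def embed_document(words, d=8, seed=0):
    """Map each word w_i to a d-dimensional vector x_i.

    Assumption: a random static lookup table replaces BERT here, so a
    repeated word gets the same vector regardless of its context.
    """
    rng = random.Random(seed)
    table = {}  # hypothetical lookup: one vector per distinct word
    X = []
    for w in words:
        if w not in table:
            table[w] = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        X.append(table[w])
    return X

# Toy document D with n = 5 tokens (illustrative only).
D = ["K", "Xian", "Sheng", "Country", "."]
X = embed_document(D, d=8)
```

The only property the sketch is meant to show is the shape: n words in, n vectors of dimension d out.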

[0087] Use CNN network for ngram encoding:

[0088] For word embedding representation matrix ...

Embodiment 2

[0124] In the second embodiment, the entity and relation extraction method in the present invention is described in detail.

[0125] For the sentence "Mr. K was born in place D, and he led a party to establish country A on a certain day in a certain year.":

[0126] The sentence is split into tokens ["K","Xian","Sheng",...,"Country","."], and the vector representation of each token is obtained through the BERT model;

[0127] The Ngram CNN encoder extracts the vector representation of each Ngram text segment headed by the current word in the sentence. For example, the vector representation of "Mr. K" is [0.3, 0.4, 0.44,..., 0.234];

[0128] The attention mechanism yields the attention weight of each Ngram text segment headed by each character. For example, the Ngram text segments headed by the character "K" in the first position of the sentence are "K", "K first", "Mr. K", and "Mr. K out" (the one- to four-character segments starting at "K"), with attention weights of 0.1, 0.2, 0.5, and 0.2 respectively. Then calculate the vector repr...
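As a rough illustration of "Ngram text segments headed by the current word", the following Python sketch lists the segments that start at each position and encodes one of them with a single convolution-style filter. The helper names `ngram_segments` and `conv_encode`, the maximum length of 4, and the per-position scalar filter weights are assumptions; the patent's actual Ngram CNN architecture is not reproduced here.

```python
def ngram_segments(tokens, max_n=4):
    """For each position i, list the segments of length 1..max_n starting at i."""
    return [
        [tokens[i:i + n] for n in range(1, max_n + 1) if i + n <= len(tokens)]
        for i in range(len(tokens))
    ]

def conv_encode(X, i, weights):
    """Encode the n-gram headed by position i, where n = len(weights).

    Simplified filter (assumption): output[c] = sum_j weights[j] * X[i + j][c].
    Returns None when the n-gram would run past the end of the sequence.
    """
    n = len(weights)
    if i + n > len(X):
        return None
    d = len(X[0])
    return [sum(weights[j] * X[i + j][c] for j in range(n)) for c in range(d)]

# Shortened toy token list (illustrative only).
tokens = ["K", "Xian", "Sheng", "Country", "."]
segments = ngram_segments(tokens)

# Toy 2-dimensional embeddings and a width-2 filter.
X_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bigram_vec = conv_encode(X_toy, 1, [0.5, 0.5])
```

Each position yields at most `max_n` segments, so the number of candidates grows linearly with sentence length.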
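The attention step can be sketched in Python as follows. The source text is truncated before it states how the weights are used, so the weighted sum below is an assumption based on how attention is usually applied. The segment vectors are made-up toy values, and `softmax` and `weighted_sum` are hypothetical helper names.

```python
import math

def softmax(scores):
    """Turn raw scores into attention weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(vectors, weights):
    """Combine segment vectors into one vector using attention weights."""
    d = len(vectors[0])
    return [sum(w * v[c] for w, v in zip(weights, vectors)) for c in range(d)]

# Attention weights from the example for the four segments headed by "K".
weights = [0.1, 0.2, 0.5, 0.2]
# Toy 2-dimensional segment vectors (illustrative values, not from the patent).
segment_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
combined = weighted_sum(segment_vectors, weights)
```

Under this assumption, the segment with the largest weight (here the third, 0.5) contributes most to the representation of the position headed by "K".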

Embodiment 3

[0135] Please refer to Figure 3, a schematic diagram of the composition of an entity and relationship extraction system. The third embodiment of the present invention provides an entity and relationship extraction system, which includes:

[0136] a pre-trained language model, used to process the document input to it and obtain the vector representation sequence of the document word sequence;

[0137] a convolutional neural network, used to process the vector representation sequence input to it and, with the attention mechanism, encode the embedding representation of each word to obtain the sequence embedding representation;

[0138] a first encoder, configured to process the sequence embedding representation input to it to obtain entity feature embedding representation information;

[0139] an entity classifier, used to perform entity classification by embedding the entity f...

Abstract

The invention discloses a method, system, device and medium for extracting entities and relationships, and relates to the field of natural language processing. The method obtains a sequence embedding representation of the document; inputs the sequence embedding representation into the first encoder to obtain the entity feature embedding representation information; inputs the entity feature embedding representation information into the entity classifier to obtain the entity classification result; and inputs the sequence embedding representation into the second encoder to obtain the relationship feature embedding representation information. The entity feature embedding representation information and the relationship feature embedding representation information are spliced and input into the feedforward neural network to obtain the embedded representation of relationship extraction, which is input into the relationship classifier to obtain the relationship classification result. The entity classification result and the relationship classification result are optimized with the entity classification and relationship extraction loss function. The present invention improves the efficiency of entity and relationship extraction.
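The relation branch and the joint objective described in the abstract can be sketched in plain Python. This is a minimal sketch, not the patented implementation: the single hidden layer, the ReLU activation, the unweighted sum of the two cross-entropy losses, and all names (`relation_head`, `joint_loss`, the weight matrices) are assumptions chosen for illustration.

```python
import math

def linear(x, W, b):
    """Affine map: one output per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(z):
    m = max(z)
    exps = [math.exp(a - m) for a in z]
    total = sum(exps)
    return [e / total for e in exps]

def relation_head(entity_feat, relation_feat, W1, b1, W2, b2):
    """Splice entity and relation features, then feed-forward and classify."""
    spliced = entity_feat + relation_feat    # list concatenation
    hidden = relu(linear(spliced, W1, b1))   # feed-forward layer (assumed ReLU)
    return softmax(linear(hidden, W2, b2))   # relation class probabilities

def cross_entropy(probs, gold):
    return -math.log(probs[gold])

def joint_loss(entity_probs, entity_gold, relation_probs, relation_gold):
    """Assumed joint objective: plain sum of the two classification losses."""
    return (cross_entropy(entity_probs, entity_gold)
            + cross_entropy(relation_probs, relation_gold))

# Toy values (illustrative only).
W1, b1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]], [0.0, 0.0]
W2, b2 = [[1.0, 1.0], [0.0, 0.0]], [0.0, 0.0]
rel_probs = relation_head([1.0, 0.0], [0.0, 1.0], W1, b1, W2, b2)
loss = joint_loss([0.5, 0.5], 0, [0.25, 0.75], 1)
```

Summing the two losses lets one backward pass update both branches; a weighted sum would be an equally plausible reading of the source.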

Description

Technical field

[0001] The present invention relates to the field of natural language processing, and in particular to an entity and relation extraction method, system, device and medium.

Background

[0002] Entity and relationship extraction is an important branch of information extraction in natural language processing. Its main task is to extract entities, and the relationships between them, from various unstructured documents. It is widely used across natural language processing, for example in knowledge base construction and knowledge-base-driven intelligent question answering.

[0003] Named Entity Recognition: also known as entity recognition, entity chunking, and entity extraction, it is a subtask of information extraction that aims to locate named entities in text and classify them into pre-defined categories such as person, organization, location, and time; what kind of entity type to identify needs to...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F40/295, G06F40/284, G06N3/04, G06N3/08
CPC: G06F16/355, G06F40/295, G06F40/284, G06N3/08, G06N3/045
Inventor: not disclosed (不公告发明人)
Owner: CHENGDU UNION BIG DATA TECH CO LTD