Baidu Encyclopedia Relation Triple Extraction Method Based on Rules and Remote Supervision

Applies remote (distant) supervision to Baidu Encyclopedia in the field of knowledge graphs, addressing the poor performance of purely rule-based extraction on unstructured text.

Active Publication Date: 2022-03-15
SUN YAT SEN UNIV

AI Technical Summary

Problems solved by technology

The information box (infobox) is highly structured and its content is fragmented, which makes it well suited to rule-based extraction. The body text is unstructured, so rules alone perform poorly on it.


Examples


Embodiment 1

[0050] As shown in Figures 1-3, a method for extracting relation triples from Baidu Encyclopedia based on rules and remote supervision includes the following steps:

[0051] S1: Extract relation triples from the information box. Extract the information box portion from the HTML source code. For each line of the information box, the first attribute is the relation, the second attribute is the tail entity, and the entry name is the head entity. Keep only the relations that occur at least N times (a threshold) as meaningful relations, and on that basis select the triples whose head and tail entities are mainly nouns and named entities. Then retain every triple whose tail entity is wholly enclosed in book-title marks (《》); split tail entities that list parallel items into multiple triples sharing the same head entity and relation; any relational triples related to mater...
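The filtering and splitting rules in step S1 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes the infobox has already been parsed out of the HTML into `(entry_name, attribute, value)` rows, the function name `extract_infobox_triples`, the `min_count` parameter (standing in for the threshold N), and the list of separators for parallel tail entities are all hypothetical, and the noun / named-entity filter is omitted because the excerpt does not say how it is performed.

```python
import re
from collections import Counter

# Hypothetical separators for parallel tail entities; the patent text does not list them.
PARALLEL_SEP = re.compile(r"[、，,；;/]")
# A tail entity wholly enclosed in book-title marks, e.g. 《茶馆》.
BOOK_TITLE = re.compile(r"^《[^《》]+》$")


def extract_infobox_triples(rows, min_count=2):
    """Turn infobox rows (entry_name, attribute, value) into (head, relation, tail) triples.

    min_count plays the role of the threshold N: relations seen fewer
    times than this across all rows are discarded.
    """
    rows = [(h.strip(), r.strip(), t.strip()) for h, r, t in rows]
    freq = Counter(rel for _, rel, _ in rows)
    triples = []
    for head, rel, tail in rows:
        if freq[rel] < min_count or not tail:
            continue  # rare relation or empty value
        if BOOK_TITLE.match(tail):
            triples.append((head, rel, tail))  # keep titled works intact
            continue
        # Split parallel tail entities into one triple each.
        for part in PARALLEL_SEP.split(tail):
            part = part.strip()
            if part:
                triples.append((head, rel, part))
    return triples


if __name__ == "__main__":
    demo_rows = [
        ("鲁迅", "代表作品", "《狂人日记》"),
        ("鲁迅", "职业", "作家、思想家"),
        ("老舍", "职业", "作家"),
        ("老舍", "代表作品", "《茶馆》"),
        ("鲁迅", "别名", "周树人"),
    ]
    for triple in extract_infobox_triples(demo_rows, min_count=2):
        print(triple)
```

In the demo rows, the relation 别名 appears only once and is dropped by the threshold, while the value 作家、思想家 is split into two triples with the same head entity and relation.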


Abstract

The present invention provides a method for extracting relation triples from Baidu Encyclopedia based on rules and remote supervision. For structured text in information collections such as information boxes, the method mainly uses rules and regular expressions to extract relation triples, which can then serve as input to the remote supervision algorithm. For unstructured text, where information is scattered, the invention on the one hand extracts a small number of relation triples by writing simple, accurate, and obvious rules. On the other hand, taking the extracted triples as input to the remote supervision algorithm, it labels every sentence in the body text that contains both the head and tail entities, groups the sentences by relation, trains a classifier, and then applies the classifier to the other sentences in the body text, thereby discovering more triples.
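The labeling step of remote supervision described in the abstract can be sketched as below. This is an illustrative assumption of how the matching might look, using plain substring matching; the function names `label_sentences` and `unlabeled_sentences` are hypothetical, and the classifier itself is left out because the abstract does not specify a model.

```python
def label_sentences(sentences, seed_triples):
    """Label each sentence that mentions both the head and tail of a seed triple.

    Returns a list of (sentence, head, tail, relation) training examples.
    Matching is naive substring containment, which is an assumption here.
    """
    labeled = []
    for sent in sentences:
        for head, rel, tail in seed_triples:
            if head != tail and head in sent and tail in sent:
                labeled.append((sent, head, tail, rel))
    return labeled


def unlabeled_sentences(sentences, labeled):
    """Sentences that no seed triple matched; a trained classifier would be applied to these."""
    seen = {sent for sent, _, _, _ in labeled}
    return [sent for sent in sentences if sent not in seen]


if __name__ == "__main__":
    seeds = [
        ("鲁迅", "代表作品", "《狂人日记》"),
        ("老舍", "出生地", "北京"),
    ]
    body = [
        "鲁迅于1918年发表《狂人日记》。",
        "老舍出生于北京。",
        "鲁迅出生于绍兴。",
    ]
    training = label_sentences(body, seeds)
    for example in training:
        print(example)
    print(unlabeled_sentences(body, training))
```

The labeled examples would be grouped by relation to train the classifier, and the remaining sentences are the candidates in which it might discover new triples.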

Description

Technical field

[0001] The present invention relates to the field of knowledge graphs, and more specifically to a method for extracting relation triples from Baidu Encyclopedia based on rules and remote supervision.

Background

[0002] A knowledge graph is, in essence, a semantic network that reveals the relationships between entities. It can formally describe things in the real world and how they relate to one another, and it is widely used in a growing number of natural language processing applications, such as search, intelligent question answering, and dialogue robots.

[0003] In a knowledge base, structured knowledge is usually expressed as triples (h, r, t), where h, r, and t denote the head entity, the relation, and the tail entity, respectively. Extracting relation triples is therefore the most basic task in building a knowledge base. Only by ensuring a certain number and quality of triples can the subsequent application of knowledge gr...
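As a small illustration of the (h, r, t) representation from paragraph [0003], a triple can be modeled as a named tuple and a toy knowledge base as an index keyed by head entity. The names `Triple` and `index_by_head` are hypothetical and chosen only for this sketch.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    head: str      # h: head entity
    relation: str  # r: relation
    tail: str      # t: tail entity


def index_by_head(triples):
    """Group (relation, tail) pairs under each head entity, preserving input order."""
    index = defaultdict(list)
    for triple in triples:
        index[triple.head].append((triple.relation, triple.tail))
    return dict(index)


if __name__ == "__main__":
    kb = index_by_head([
        Triple("鲁迅", "职业", "作家"),
        Triple("鲁迅", "出生地", "绍兴"),
        Triple("老舍", "职业", "作家"),
    ])
    print(kb["鲁迅"])
```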


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F16/36
Inventors: 王珩, 毛明志, 潘嵘
Owner: SUN YAT SEN UNIV