Relationship extraction data set construction method and device and electronic equipment

A relation extraction data set construction method, applied in fields such as electrical digital data processing, special data processing applications, and natural language data processing. It addresses noise in constructed data sets, with the stated effects of reducing noise, improving data set quality, and improving relation extraction performance.

Pending Publication Date: 2022-06-03
CHINA TELECOM CLOUD TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] The technical problem to be solved by the present invention is the noise present in data sets built by existing remote supervision data set construction processes. To overcome this defect, the invention provides a method, a device, and electronic equipment for constructing relation extraction data sets.



Examples


Embodiment Construction

[0024] The technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are a part of the embodiments of the present invention, but not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.

[0025] In the description of the present invention, it should be noted that the orientations or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", and "outer" are based on the orientations or positional relationships shown in the accompanying drawings. They are used only for convenience in describing the present invention and simplifying the description, rather than indicating or implying that the indicated device or element must hav...


Abstract

The invention discloses a relation extraction data set construction method, a device, and electronic equipment. The method comprises: obtaining a plurality of triples whose relation types match preset relation types; collecting, from a target database and by a preset data search method, an unstructured corpus in the same field as the triples; splitting the paragraphs of the unstructured corpus according to paragraph labels and preset types of sentence-ending punctuation marks to obtain a sentence set, and performing word segmentation on the sentence set with a preset word segmentation tool; matching each sentence in the sentence set against the head entity and the tail entity of each triple at word granularity; if a sentence matches both the head entity and the tail entity of a triple, judging that the triple exists in the sentence and taking the combination of the sentence and the corresponding triple as a labeled sample; and, when an entity in the triple corresponding to a labeled sample is in the target parallel sequence, expanding the sentence set by using the entity in the triple.
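The segmentation and matching steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Triple` type, the `SENTENCE_ENDERS` punctuation set, the `simple_tokenize` function, and the sample data are all hypothetical. Paragraphs are assumed to be already separated, and a regex word tokenizer stands in for the "preset word segmentation tool" (for Chinese text a dedicated segmenter would be passed in through the `tokenize` parameter). The final expansion step is omitted because the abstract does not define the target parallel sequence.

```python
import re
from typing import Callable, List, NamedTuple, Tuple


class Triple(NamedTuple):
    """Hypothetical container for a (head entity, relation, tail entity) triple."""
    head: str
    relation: str
    tail: str


# Assumed "preset types of sentence segmentation punctuation marks".
SENTENCE_ENDERS = "。！？!?."


def split_sentences(paragraph: str, enders: str = SENTENCE_ENDERS) -> List[str]:
    """Split one paragraph into sentences, keeping the closing punctuation mark."""
    cls = re.escape(enders)
    pattern = f"[^{cls}]+[{cls}]?"
    return [s.strip() for s in re.findall(pattern, paragraph) if s.strip()]


def simple_tokenize(text: str) -> List[str]:
    """Stand-in word segmenter: runs of word characters become tokens."""
    return re.findall(r"\w+", text)


def contains_entity(tokens: List[str], entity_tokens: List[str]) -> bool:
    """True if the entity's tokens occur as a contiguous run in the sentence tokens."""
    n = len(entity_tokens)
    if n == 0:
        return False
    return any(tokens[i:i + n] == entity_tokens for i in range(len(tokens) - n + 1))


def label_sentences(
    paragraphs: List[str],
    triples: List[Triple],
    tokenize: Callable[[str], List[str]] = simple_tokenize,
) -> List[Tuple[str, Triple]]:
    """Pair each sentence with every triple whose head and tail both match it."""
    samples = []
    for paragraph in paragraphs:
        for sentence in split_sentences(paragraph):
            tokens = tokenize(sentence)
            for triple in triples:
                head_found = contains_entity(tokens, tokenize(triple.head))
                tail_found = contains_entity(tokens, tokenize(triple.tail))
                if head_found and tail_found:
                    samples.append((sentence, triple))
    return samples


# Illustrative data only.
triples = [Triple("Paris", "capital_of", "France")]
paragraphs = ["Paris is the capital of France. Parisian food is famous in France."]
samples = label_sentences(paragraphs, triples)
print(samples)
```

In this example only the first sentence becomes a labeled sample. The second sentence contains "France" and the substring "Paris" inside "Parisian", but matching on whole tokens rather than raw substrings rejects it, which is one way word-granularity matching can keep spurious samples out of the data set.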

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, and in particular to a relation extraction data set construction method, device, and electronic device.

Background

[0002] With the rapid development of Internet technology and the arrival of the big data era, data has grown explosively, and unstructured data accounts for more than 75% of it. How to extract useful knowledge from large amounts of unstructured data has therefore become a topic of concern. Entity relation extraction is a technology for accurately extracting the relationships between entities from unstructured and semi-structured data such as text and web pages. At present, owing to the scarcity of data sets and other reasons, the most commonly used approach to entity relation extraction is a semi-supervised machine learning method based on remote supervision (also known as distant supervision).

[0003] Remote supervision assumes that if two entities are...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/295, G06F40/289, G06F16/36
CPC: G06F40/295, G06F40/289, G06F16/367
Inventor: 刘雅璇
Owner: CHINA TELECOM CLOUD TECH CO LTD