Method, device and electronic equipment for determining knowledge sample data set

Applied in the field of data processing, this sample data set technology addresses the problems of high labor cost, difficult labeling, and small data scale.

Active Publication Date: 2020-09-22
BEIJING BAIDU NETCOM SCI & TECH CO LTD
Cites: 5; Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] However, crowdsourced labeling is difficult, crowdsourcing users label slowly, and the labor cost of labeling is high. As a result, the output data is small in scale and cannot meet the training needs of the deep learning models commonly used in academia and industry.

Examples

Embodiment Construction

[0063] In order to make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0064] SPO (Subject-Predicate-Object) triplet data refers to a triplet composed of an entity S, a relationship P and an entity O, or of an entity S, an attribute P and an attribute value O, where S is the subject of P and O is the object of P. A schema refers to a relationship or attribute P together with the subject (S) type and object (O) type of P. ...
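
The definitions in [0064] can be sketched as plain data types. This is a minimal illustration, not the patent's implementation: the class names `SPO` and `Schema`, the function `conforms`, and the `entity_types` lookup (mapping each entity or attribute value to a type) are hypothetical names introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SPO:
    """One triplet: subject S, predicate P (relationship or attribute), object O."""
    subject: str
    predicate: str
    obj: str  # an entity, or an attribute value when P is an attribute


@dataclass(frozen=True)
class Schema:
    """A schema: the predicate P plus the allowed subject type and object type."""
    predicate: str
    subject_type: str
    object_type: str


def conforms(triple: SPO, schema: Schema, entity_types: dict[str, str]) -> bool:
    """Return True if the triplet uses the schema's predicate and its S and O
    have the schema's subject and object types. Types come from a hypothetical
    lookup table; names missing from it have no type and never conform."""
    return (
        triple.predicate == schema.predicate
        and entity_types.get(triple.subject) == schema.subject_type
        and entity_types.get(triple.obj) == schema.object_type
    )


# Example: a relationship schema ("director": Film -> Person).
director = Schema("director", "Film", "Person")
types = {"Titanic": "Film", "James Cameron": "Person"}
print(conforms(SPO("Titanic", "director", "James Cameron"), director, types))  # True
print(conforms(SPO("James Cameron", "director", "Titanic"), director, types))  # False
```

The same check covers attribute triplets: a hypothetical schema `Schema("release_year", "Film", "Year")` accepts `SPO("Titanic", "release_year", "1997")` once `"1997"` is typed as `"Year"` in the lookup.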

Abstract

The embodiments of the present invention disclose a method, device, and electronic equipment for determining a knowledge sample data set. The method includes: obtaining a preset number of subject-predicate-object (SPO) triplet formats and source texts; for each SPO triplet format, obtaining n SPO entries corresponding to that format from a preset knowledge base; finding, in each source text, m first texts that match the n SPO entries, and generating a first knowledge sample data set; determining, according to the m first texts, k second texts that conform to each SPO triplet format, and generating a second knowledge sample data set; and generating a target knowledge sample data set from the first and second knowledge sample data sets. This embodiment thus generates the knowledge sample data set automatically: generation is fast and low-cost, and the data can be produced at a scale large enough to meet training requirements.
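
The steps in the abstract can be sketched as a small pipeline. This is a hedged illustration under stated assumptions, not the disclosed method: an SPO triplet format is reduced to its predicate name, the knowledge base is a plain list of triples, and a sentence "matches" an entry when it contains both S and O (a distant-supervision style rule). The abstract does not say how the k second texts are derived from the m first texts, so that step is left as a caller-supplied function `find_second`. All function names here are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SPO:
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Sample:
    text: str    # a sentence taken from a source text
    triple: SPO  # the SPO entry that labels it


def split_sentences(doc: str) -> list[str]:
    # Naive split after Chinese/English sentence-final punctuation (assumption).
    return [s for s in re.split(r"(?<=[。.!?])\s*", doc) if s]


def find_first_texts(entries: list[SPO], sources: list[str]) -> list[Sample]:
    # Hypothetical matching rule: a sentence matches an entry when it
    # contains both the subject S and the object O of that entry.
    samples = []
    for doc in sources:
        for sentence in split_sentences(doc):
            for entry in entries:
                if entry.subject in sentence and entry.obj in sentence:
                    samples.append(Sample(sentence, entry))
    return samples


def build_target_dataset(
    formats: list[str],  # each SPO triplet format, reduced here to a predicate name
    kb: list[SPO],       # the preset knowledge base
    sources: list[str],  # the source texts
    find_second: Callable[[list[Sample]], list[Sample]],  # unspecified in the abstract
) -> list[Sample]:
    first_set: list[Sample] = []
    second_set: list[Sample] = []
    for predicate in formats:
        entries = [t for t in kb if t.predicate == predicate]  # the n entries
        matched = find_first_texts(entries, sources)            # the m first texts
        first_set.extend(matched)
        second_set.extend(find_second(matched))                 # the k second texts
    # Target set: both sets merged, exact duplicates dropped, order kept.
    return list(dict.fromkeys(first_set + second_set))
```

Passing `lambda matched: []` as `find_second` yields only the first knowledge sample data set. Because `Sample` and `SPO` are frozen dataclasses they are hashable, so `dict.fromkeys` can merge the two sets while dropping exact duplicates and preserving order.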

Description

Technical Field

[0001] The embodiments of the present invention relate to the technical field of data processing, and in particular to a method, device and electronic equipment for determining a knowledge sample data set.

Background

[0002] A knowledge graph is a semantic network whose nodes are entities and concepts and whose edges are semantic relationships. Because a knowledge graph makes knowledge acquisition more direct, it can supply semantically related knowledge during reading, making reading more convenient, intelligent and user-friendly. SPO triplets are a key component of a knowledge graph and express either entity relationships or entity attributes. From the perspective of knowledge graph construction, entity attributes enrich the information attached to entities, while entity relationships enrich the edges of the graph and thereby improve its connectivity.

[0003] When constructing a knowledge graph, it is necessary to tr...
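
Paragraph [0002] distinguishes attribute triplets, which enrich entity information, from relationship triplets, which add edges and improve connectivity. The sketch below illustrates that distinction under one assumption of ours: a triplet is treated as a relationship when its object O is a known entity and as an attribute otherwise. The functions `build_graph` and `count_components` and the example data are hypothetical.

```python
from collections import defaultdict


def build_graph(triples, entities):
    """Split triplets into relationship edges and attribute values.

    Assumption: a triplet is a relationship if its object O is a known entity;
    otherwise it is an attribute whose value is stored on the subject node."""
    entity_set = set(entities)
    adj = {e: set() for e in entities}  # entity -> neighbouring entities
    attrs = defaultdict(dict)            # entity -> {attribute: value}
    for s, p, o in triples:
        if o in entity_set:
            adj.setdefault(s, set()).add(o)
            adj.setdefault(o, set()).add(s)  # undirected, for connectivity only
        else:
            attrs[s][p] = o
    return adj, attrs


def count_components(adj):
    """Number of connected components, via iterative depth-first search."""
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
    return components


entities = ["Titanic", "Avatar", "James Cameron"]
triples = [("Titanic", "release_year", "1997")]  # attribute only
print(count_components(build_graph(triples, entities)[0]))  # 3
triples += [("Titanic", "director", "James Cameron"),
            ("Avatar", "director", "James Cameron")]  # two relationships
print(count_components(build_graph(triples, entities)[0]))  # 1
```

The attribute triplet adds information to the "Titanic" node but no edge, so the three entities remain three separate components; the two relationship triplets join them into one component.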

Application Information

Patent type & authority: Patents (China)
IPC(8): G06F16/36; G06F40/295
CPC: G06F40/295; G06N5/022; G06N20/00; G06F40/289; G06F40/30; G06F16/335
Inventors: 李双婕, 史亚冰, 梁海金, 张扬, 朱勇
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD