Knowledge base entity normalizing method, system, terminal and computer readable storage medium

This knowledge base entity technology, applied in the field of database construction, addresses problems such as large differences in data forms, classification schemes that cannot solve the normalization problem in general, and the complexity and difficulty of knowledge base construction. Its effects are to reduce the amount of computation and to break through the limit on computation scale.

Active Publication Date: 2018-06-12
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

[0002] Knowledge base construction is a very complex and difficult technical problem, so existing methods generally handle only small-scale, single-vertical knowledge bases (millions to tens of millions of entities).
When facing a large-scale knowledge base (hundred-million-level entities), they cannot efficiently solve the problem of normalizing entities at that scale.
In addition, because the form of entity data varies widely, a single classification scheme cannot solve all normalization problems, and cannot uniformly and efficiently support the various attributes, categories, and problem scenarios. Existing methods therefore apply specialized processing to knowledge base entities: they directly filter out entities with sparse attribute information and leave them unprocessed, and they also apply related processing based on the quality of the entity information.


Examples

Embodiment 1

[0060] This embodiment of the present invention provides a knowledge base entity normalization method. As shown in Figure 1, the method mainly includes the following steps:

[0061] Step S100: Obtain the entity set in the knowledge base.

[0062] The knowledge base may contain millions, tens of millions, or hundreds of millions of entities. A knowledge base of any of these scales may be a Chinese knowledge graph, or a single-category or multi-category hybrid knowledge base.

[0063] Step S200: Pre-partition the entity set by combining multiple partitioning methods.

[0064] It should be noted that "multiple partitioning methods" means two or more partitioning methods. Pre-partitioning divides the entity set into multiple groups (or partitions), where each group contains several entities that are suspected to be the same. The combination of multiple partitioning methods can be understood as each part...
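The pre-partitioning idea in step S200 can be sketched as follows. This is a minimal illustration, not the patented implementation: the two partitioning methods (`by_surface_form`, `by_birth_year`), the sample entities, and the choice to combine the methods by taking the union of their groups are all assumptions made for the example, since the source text is truncated at this point.

```python
from collections import defaultdict
from itertools import combinations


def by_surface_form(entity):
    """Hypothetical partitioning method: one key per name or alias."""
    forms = [entity.get("name"), entity.get("alias")]
    return [f.lower() for f in forms if f]


def by_birth_year(entity):
    """Hypothetical partitioning method: one key for the birth year."""
    year = entity.get("birth_year")
    return [year] if year is not None else []


def pre_partition(entities, methods):
    """Group entity ids under every key produced by every partitioning method.

    An entity may fall into several groups; groups with a single member are
    dropped because they yield no entity pair to compare.
    """
    groups = defaultdict(set)
    for entity_id, entity in entities.items():
        for method_name, method in methods.items():
            for key in method(entity):
                groups[(method_name, key)].add(entity_id)
    return {k: v for k, v in groups.items() if len(v) > 1}


def candidate_pairs(groups):
    """Collect the entity pairs that share at least one group."""
    pairs = set()
    for members in groups.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


entities = {
    "e1": {"name": "Liu Dehua", "alias": "Andy Lau", "birth_year": 1961},
    "e2": {"name": "Andy Lau", "alias": None, "birth_year": 1961},
    "e3": {"name": "Andy Lau", "alias": None, "birth_year": 1990},
    "e4": {"name": "Zhang Xueyou", "alias": "Jacky Cheung", "birth_year": 1961},
}
methods = {"surface_form": by_surface_form, "birth_year": by_birth_year}

groups = pre_partition(entities, methods)
pairs = candidate_pairs(groups)
```

Only pairs that share a group are passed on for comparison, so the pair `("e3", "e4")`, which shares neither a surface form nor a birth year, is never examined. On a large entity set this is what keeps the number of pairwise comparisons far below the full quadratic count.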

Embodiment 2

[0119] This embodiment of the present invention provides a knowledge base entity normalization system. As shown in Figure 4, the system includes:

[0120] An obtaining module 10, used to obtain the entity set in the knowledge base;

[0121] A multi-dimensional partition module 20, used to pre-partition the entity set by combining multiple partitioning methods;

[0122] A sample construction module 30, used to construct samples according to the pre-partitioning result and extract key samples;

[0123] A feature construction module 40, used to construct features according to the pre-partitioning result and extract similarity features;

[0124] A normalization determination module 50, used to combine the key samples and similarity features through at least one normalization model, perform a normalization determination on each entity pair in the pre-partitioning result, and determine whether the two entities in each pair are the same entity;

[0125] A set division module 60...
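The last two modules in the list above can be illustrated with a short sketch: a pairwise same-entity decision followed by grouping the decisions into sets. Everything here is an assumption for illustration. `is_same_entity` is a hypothetical stand-in that thresholds attribute overlap, not the normalization model of the patent, and the use of a union-find structure for set division is one common choice that the source does not confirm.

```python
def attribute_overlap(e1, e2):
    """Jaccard overlap of the (attribute, value) pairs of two entities."""
    a, b = set(e1.items()), set(e2.items())
    return len(a & b) / len(a | b) if a | b else 0.0


def is_same_entity(e1, e2, threshold=0.5):
    """Hypothetical stand-in for a normalization model."""
    return attribute_overlap(e1, e2) >= threshold


class UnionFind:
    """Minimal disjoint-set structure with path halving."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def divide_into_sets(entity_ids, same_pairs):
    """Merge pairs judged to be the same entity into disjoint sets."""
    uf = UnionFind(entity_ids)
    for a, b in same_pairs:
        uf.union(a, b)
    clusters = {}
    for x in entity_ids:
        clusters.setdefault(uf.find(x), set()).add(x)
    return sorted(sorted(c) for c in clusters.values())


entities = {
    "e1": {"name": "Andy Lau", "birth_year": 1961, "occupation": "actor"},
    "e2": {"name": "Andy Lau", "birth_year": 1961, "occupation": "singer"},
    "e3": {"name": "Andy Lau", "birth_year": 1990, "occupation": "engineer"},
}
# Candidate pairs as they might come out of a pre-partitioning step.
candidates = [("e1", "e2"), ("e1", "e3"), ("e2", "e3")]

same_pairs = [
    (a, b) for a, b in candidates if is_same_entity(entities[a], entities[b])
]
entity_sets = divide_into_sets(list(entities), same_pairs)
```

With these sample entities, `e1` and `e2` share two of four distinct attribute values and are merged, while `e3` shares only the name with either and stays in its own set. Merging through union-find also makes the result transitive: if A matches B and B matches C, all three end up in one set even when the pair A and C was never compared.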

Embodiment 3

[0137] This embodiment of the present invention provides a knowledge base entity normalization terminal. As shown in Figure 5, the terminal includes:

[0138] A memory 400 and a processor 500, where the memory 400 stores a computer program that can run on the processor 500. When the processor 500 executes the computer program, the knowledge base entity normalization method of the foregoing embodiments is implemented. There may be one or more memories 400 and processors 500.

[0139] The communication interface 600 is used for the memory 400 and the processor 500 to communicate with the outside.

[0140] The memory 400 may include high-speed RAM, and may also include non-volatile memory, such as at least one magnetic disk memory.

[0141] If the memory 400, the processor 500, and the communication interface 600 are implemented independently, the memory 400, the processor 500, and the communication interface 600 may be connected to each other through a ...

Abstract

The invention provides a knowledge base entity normalization method, system, terminal and computer-readable storage medium. The method comprises the following steps: obtaining an entity set in a knowledge base; pre-partitioning the entity set by combining multiple partitioning methods; constructing samples according to the pre-partitioning result; constructing features according to the pre-partitioning result; performing a normalization decision on each entity pair through at least one normalization model; and performing set partitioning on the normalization decision result. The system comprises an acquisition module for obtaining the entity set in the knowledge base; a multi-dimensional partitioning module for pre-partitioning the entity set; a sample construction module for constructing samples according to the pre-partitioning result; a feature construction module for constructing features according to the pre-partitioning result; a normalization decision module for performing a normalization decision on each entity pair in the pre-partitioning result; and a set partitioning module for performing set partitioning on the normalization decision result. The method can solve the entity normalization problem for large-scale knowledge bases.

Description

Technical field

[0001] The present invention relates to the technical field of database construction, and in particular to a knowledge-base-based large-scale open-domain entity normalization method, system, terminal and computer-readable storage medium.

Background

[0002] Knowledge base construction is a very complex and difficult technical problem, so existing methods generally handle only small-scale, single-vertical knowledge bases (millions to tens of millions of entities). When facing a large-scale knowledge base (hundred-million-level entities), they cannot efficiently solve the problem of normalizing entities at that scale. In addition, because the form of entity data varies widely, a single classification scheme cannot solve all normalization problems, and cannot uniformly and efficiently support the various attributes, categories, and problem scenarios. Therefore, existing methods apply specialized processing to knowledge base entities, direct...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06N5/02
CPC: G06N5/022, G06F18/25, G06F18/241, G06N3/08, G06N3/044, G06N3/045, G06F18/24, G06N20/00, G06N5/025
Inventor: 冯知凡, 陆超, 徐也, 方舟, 朱勇, 李莹
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD