An optimization method of entity alignment based on graph partition

An optimization method and entity pairing technology, applied in the database field, that addresses the high computational cost and lack of generality of existing entity alignment methods and improves alignment accuracy.

Active Publication Date: 2019-02-19
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

[0004] Traditional entity alignment research currently faces three problems: (1) Even when only two data sources are matched, directly traversing all entity pairs gives a computational complexity proportional to the square of the data source size, so the computational cost is too high.
On the Internet, entity data is gener
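
The quadratic cost named in problem (1) can be made concrete with a little arithmetic. The sketch below is illustrative only; the function names are hypothetical and not part of the patent.

```python
def exhaustive_pair_count(n):
    """Number of unordered pairs among n entities from one pool: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def cross_source_pair_count(n_a, n_b):
    """Number of pairs when every entity of source A is compared with every entity of source B."""
    return n_a * n_b


# Comparing two sources of one million entities each requires 10**12 comparisons.
pairs_needed = cross_source_pair_count(1_000_000, 1_000_000)
```

At this scale an exhaustive traversal is impractical, which is why the method first narrows the search to candidate pairs.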



Examples


Example Embodiment

[0057] The technical scheme of the present invention will be further described in combination with specific implementation and schematic diagrams.

[0058] As shown in Figure 1, the embodiment of the present invention and its specific implementation process are as follows:

[0059] Step 1: Parse and extract document-type data, such as raw web pages, from the Internet, and use existing tools such as Scrapy to convert them into entities with a unified data structure. Each page or document is mapped to one entity. The main information of an entity includes its name, unique code (ID), attributes, and context information; the context information of all entities together constitutes the context corpus.
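
A minimal sketch of the unified entity structure described in Step 1, assuming Python dataclasses. The class name, field names, and sample values are illustrative assumptions; the patent does not specify a concrete schema.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One page or document mapped to a single entity (field names are assumed)."""
    entity_id: str                      # unique code (ID)
    name: str                           # entity name
    attributes: dict[str, str] = field(default_factory=dict)
    context: str = ""                   # free-text context of the page


def build_context_corpus(entities):
    """Collect the context text of every entity, keyed by entity ID."""
    return {e.entity_id: e.context for e in entities}


# Hypothetical sample entities from two different sources.
entities = [
    Entity("baidu:1", "Zhejiang University", {"location": "Hangzhou"},
           "a university in Hangzhou"),
    Entity("hudong:7", "Zhejiang Univ.", {"location": "Hangzhou"},
           "public university located in Hangzhou"),
]
corpus = build_context_corpus(entities)
```

Keeping the ID as a plain string with a source prefix is one simple way to keep entities from different data sources distinguishable in later steps.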

[0060] Preprocessing before entity matching: traverse the attribute information of the entities and compute the weights of the different attributes; traverse the context information of the entities, segment each context into words, and count the word frequency distribution of the entire corpus and oth...
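
The counting part of this preprocessing step can be sketched as follows. This is a sketch under stated assumptions: whitespace splitting stands in for a real word segmenter, and raw attribute occurrence counts stand in for the attribute weights, whose exact formula is not given in the visible text.

```python
from collections import Counter


def tokenize(text):
    # Assumption: whitespace splitting stands in for a real word segmenter.
    return text.lower().split()


def corpus_statistics(contexts):
    """Return (term frequency, document frequency) counters over all contexts."""
    term_freq = Counter()
    doc_freq = Counter()
    for text in contexts:
        tokens = tokenize(text)
        term_freq.update(tokens)        # every occurrence counts
        doc_freq.update(set(tokens))    # each context counts once per word
    return term_freq, doc_freq


def attribute_counts(attribute_dicts):
    """Count how many entities carry each attribute key (an assumed proxy for weight)."""
    counts = Counter()
    for attrs in attribute_dicts:
        counts.update(attrs.keys())
    return counts


contexts = ["a university in Hangzhou", "public university located in Hangzhou"]
term_freq, doc_freq = corpus_statistics(contexts)
attr_counts = attribute_counts(
    [{"location": "Hangzhou"}, {"location": "Hangzhou", "founded": "1897"}]
)
```

Document frequencies of this kind are the input a TF-IDF model needs, which is one of the models the description names.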


Abstract

The invention discloses an optimization method of entity alignment based on graph partition. Candidate entity pairs are mined from all entities using a composite index, and an entity similarity measure determines whether each candidate pair is aligned. An optimization algorithm based on graph partition then uses the similarity relationships between entities to improve the accuracy of equivalent entity alignment. The method solves the entity alignment problem for large-scale Internet data and can accurately and completely mine the sets of mutually equivalent entities in the original data.
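
The three stages in the abstract can be illustrated with a deliberately simplified sketch. It is not the patented algorithm: a plain inverted index on name tokens stands in for the composite index, Jaccard similarity stands in for the entity similarity measure, and connected components (via union-find) stand in for the graph partition step. All function names and the threshold value are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(names):
    """Inverted index from token to entity IDs; entities sharing a token become candidates."""
    index = defaultdict(set)
    for eid, name in names.items():
        for tok in name.lower().split():
            index[tok].add(eid)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def jaccard(a, b):
    """Token-set Jaccard similarity between two names."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def group_equivalent(names, threshold=0.5):
    """Link candidate pairs above the threshold and return the connected groups."""
    parent = {eid: eid for eid in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in candidate_pairs(names):
        if jaccard(names[a], names[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for eid in names:
        groups[find(eid)].add(eid)
    return sorted(sorted(g) for g in groups.values())


names = {
    "a1": "zhejiang university",
    "b1": "zhejiang university hangzhou",
    "b2": "peking university",
}
groups = group_equivalent(names)
```

In this toy input, `a1` and `b1` end up in one group and `b2` stays alone. Connected components merge everything reachable through any chain of links, so one false match can join two unrelated groups; that weakness is the kind of error a finer partition of the similarity graph is meant to correct.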

Description

Technical field

[0001] The invention relates to an entity processing method in the database field, in particular to an entity alignment optimization method based on graph partition.

[0002] It involves the inverted index and locality-sensitive hashing methods from the database field, the TF-IDF and Doc2Vec models from the machine learning field, community partition algorithms from the social network field, and entity alignment methods from the semantic network field.

Background technique

[0003] A large number of Internet resources containing rich information and knowledge have emerged, such as Baidu Encyclopedia and Hudong Baike. Natural data barriers exist between these data sources, making it difficult to associate them or let them interact. However, if only a single data source is used to describe real-world objects, problems such as low object coverage and incomplete information arise. Entity alignment ...


Application Information

IPC(8): G06F16/31, G06F16/95
Inventor: 陈珂, 寿黎但, 王凌阳, 陈刚, 江大伟, 伍赛, 胡天磊
Owner: ZHEJIANG UNIV