
An optimization method of entity alignment based on graph partition

An optimization method and entity pairing technique in the database field that addresses the high computational cost and limited generality of existing entity alignment methods and improves alignment accuracy.

Active Publication Date: 2019-02-19
ZHEJIANG UNIV
Cites: 7; Cited by: 7

AI Technical Summary

Problems solved by technology

[0004] Research on traditional entity alignment methods currently faces three problems:
(1) When entities are matched between only two data sources, directly traversing all entity pairs has a computational complexity proportional to the square of the data-source size, so the computational cost is too high.
(2) On the Internet, entity data is generally represented as a single page or document, and current entity alignment methods are not general enough to handle this form of data.
(3) With multiple data sources, most current entity alignment methods reduce the task to aligning multiple pairs of data sources, instead of analyzing and computing from a multi-source perspective.
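A quick count illustrates problem (1). The source sizes below are illustrative assumptions, not figures from the patent: exhaustive matching of two sources $S_1$ and $S_2$ compares every cross-source pair, and pooling all entities into one set of size $N$ compares every unordered pair.

```latex
\begin{align*}
C_{\text{two}}  &= |S_1|\,|S_2| = n^2 \qquad (|S_1| = |S_2| = n)\\
C_{\text{pool}} &= \binom{N}{2} = \frac{N(N-1)}{2}\\
n = 10^6 \;&\Rightarrow\; C_{\text{two}} = 10^{12}\ \text{comparisons}
\end{align*}
```

Indexing-based candidate mining aims to replace this with comparisons only among pairs that share at least one index key.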




Embodiment Construction

[0057] The technical solution of the present invention is further described below with reference to specific embodiments and the accompanying drawings.

[0058] As shown in Figure 1, an embodiment of the present invention is implemented as follows:

[0059] Step 1: Parse and extract document-type data originating from the Internet, such as web pages, and use existing tools such as Scrapy to convert it into entities with a unified data structure, mapping each page or document to one entity. The main information of an entity includes its name, unique code (ID), attributes, and context information; the context information of all entities forms a context corpus.
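A minimal sketch of the unified entity structure from Step 1, in Python. The class, function, and field names (`Entity`, `build_context_corpus`, `entity_id`, and so on) are hypothetical, and the page parsing itself (for example with Scrapy) is assumed to have already happened.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    """One web page or document mapped to a unified entity (hypothetical field names)."""
    entity_id: str                                            # unique code (ID)
    name: str                                                 # entity name, e.g. the page title
    attributes: Dict[str, str] = field(default_factory=dict)  # e.g. infobox key/value pairs
    context: str = ""                                         # free-text body of the page


def build_context_corpus(entities: List[Entity]) -> Dict[str, str]:
    """Collect the context information of all entities into one corpus keyed by ID."""
    return {e.entity_id: e.context for e in entities}


if __name__ == "__main__":
    # Illustrative pages from two encyclopedia-style sources.
    pages = [
        Entity("baidu:1", "Zhejiang University", {"founded": "1897"},
               "Zhejiang University is a university in Hangzhou"),
        Entity("hudong:7", "Zhejiang University", {"established": "1897"},
               "a public university located in Hangzhou"),
    ]
    corpus = build_context_corpus(pages)
    print(len(corpus))  # prints 2
```

Keeping the context as raw text here leaves word segmentation to the preprocessing stage.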

[0060] Preprocessing before entity matching: traverse the attribute information of the entities and compute the weights of the different attributes; traverse the context information of the entities, segment the context into words, and count the word-frequency distribution of the entire corpu...
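One possible reading of this preprocessing step, sketched in Python. The excerpt does not say how attribute weights are computed; here each attribute is assumed to be weighted by the share of entities that carry it. Whitespace splitting stands in for real word segmentation (Chinese pages would need a segmenter such as jieba). The corpus document frequencies then feed a TF-IDF context vector, one of the models the description names; all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def attribute_weights(attribute_maps: List[Dict[str, str]]) -> Dict[str, float]:
    """Assumed scheme: weight = fraction of entities that have the attribute."""
    n = len(attribute_maps)
    counts = Counter(name for attrs in attribute_maps for name in attrs)
    return {name: c / n for name, c in counts.items()}


def tokenize(text: str) -> List[str]:
    # Stand-in for word segmentation: lowercase and split on whitespace.
    return text.lower().split()


def corpus_statistics(contexts: List[str]) -> Tuple[Counter, Counter]:
    """Corpus-wide word frequencies and per-word document frequencies."""
    word_freq: Counter = Counter()
    doc_freq: Counter = Counter()
    for text in contexts:
        tokens = tokenize(text)
        word_freq.update(tokens)
        doc_freq.update(set(tokens))  # count each word once per document
    return word_freq, doc_freq


def tfidf_vector(text: str, doc_freq: Counter, n_docs: int) -> Dict[str, float]:
    """Raw term frequency times log(N / df); words unseen in the corpus are dropped."""
    tf = Counter(tokenize(text))
    return {w: c * math.log(n_docs / doc_freq[w]) for w, c in tf.items() if doc_freq[w]}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Words that occur in every context receive an IDF of zero, so they contribute nothing to the context similarity of two entities.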



Abstract

The invention discloses an entity alignment optimization method based on graph partition. Candidate entity pairs are mined from all entities using a composite index, and an entity similarity measure is used to judge whether each candidate pair is aligned. An optimization algorithm based on graph partition then exploits the similarity relationships between entities to improve the accuracy of equivalent-entity alignment. The method solves the entity alignment problem for large-scale Internet data and can accurately and completely mine the sets of mutually equivalent entities in the original data.
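The abstract does not spell out the graph-partition algorithm, so this Python sketch uses weighted label propagation as a stand-in community partition method. It assumes candidate pairs judged as matches become weighted edges whose weight is the similarity score. Taking connected components would merge two clusters joined by one weak, possibly wrong match; label propagation tends to keep densely connected groups together and split at weak bridges. `label_propagation` and its parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, float]  # (entity_id_a, entity_id_b, similarity score)


def label_propagation(edges: List[Edge], max_iter: int = 20) -> List[Set[str]]:
    """Partition a weighted similarity graph into groups of presumed-equivalent entities.

    Every node starts with its own label and repeatedly adopts the label with the
    largest total edge weight among its neighbours. Labels are updated in place
    during a sweep, and ties go to the smallest label so results are deterministic.
    """
    # Symmetric weighted adjacency map; repeated edges add up.
    adj: Dict[str, Dict[str, float]] = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w

    nodes = sorted(adj)                # fixed visiting order
    label = {n: n for n in nodes}      # each entity starts in its own group
    for _ in range(max_iter):
        changed = False
        for n in nodes:
            score: Dict[str, float] = defaultdict(float)
            for nb, w in adj[n].items():
                score[label[nb]] += w  # total similarity toward each neighbouring label
            best = max(score.values())
            new = min(lab for lab, s in score.items() if s == best)
            if new != label[n]:
                label[n] = new
                changed = True
        if not changed:                # converged: no label moved in a full sweep
            break

    groups: Dict[str, Set[str]] = defaultdict(set)
    for n, lab in label.items():
        groups[lab].add(n)
    return list(groups.values())
```

For example, two triangles of 0.9-weight edges joined by a single 0.3-weight edge come out as two groups of three, whereas connected components would return one group of six.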

Description

Technical field

[0001] The invention relates to an entity processing method in the database field, and in particular to an entity alignment optimization method based on graph partition.

[0002] It involves inverted indexes and locality-sensitive hashing from the database field, the TF-IDF and Doc2Vec models from machine learning, community partition algorithms from social network analysis, and entity alignment methods from the semantic network field.

Background art

[0003] A large number of Internet resources containing rich information and knowledge have emerged, such as Baidu Encyclopedia and Hudong Baike. Natural data barriers exist between these data sources, making it difficult to associate their data and let it interact. However, if only a single data source is used to describe real-world objects, object coverage is low and descriptions are incomplete. Entity alignment ...
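The description names inverted indexes and locality-sensitive hashing, and the abstract says candidate pairs are mined with a composite index. Under that reading, here is a minimal Python sketch: an inverted index over name tokens proposes pairs that share a name word, MinHash LSH over context token sets proposes pairs whose contexts are likely similar, and the union is the candidate set. How the patent actually combines the two is not stated in this excerpt; all function names and parameters (`bands`, `rows`) are illustrative assumptions.

```python
import hashlib
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def _hash(seed: int, token: str) -> int:
    """Deterministic 64-bit hash of a token for the seed-th MinHash function."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def minhash_signature(tokens: Set[str], num_perm: int) -> Tuple[int, ...]:
    """MinHash signature: the minimum hash of the token set under each seeded hash."""
    return tuple(min(_hash(i, t) for t in tokens) for i in range(num_perm))


def candidate_pairs(names: Dict[str, str], contexts: Dict[str, str],
                    bands: int = 8, rows: int = 2) -> Set[Tuple[str, str]]:
    """Union of inverted-index (name) and MinHash-LSH (context) candidate pairs."""
    pairs: Set[Tuple[str, str]] = set()

    # (1) Inverted index: name token -> IDs of entities whose name contains it.
    inverted: Dict[str, List[str]] = defaultdict(list)
    for eid, name in names.items():
        for tok in set(name.lower().split()):
            inverted[tok].append(eid)
    for ids in inverted.values():
        pairs.update(combinations(sorted(ids), 2))

    # (2) MinHash LSH: entities whose signatures agree on every row of some band
    # land in the same bucket and become candidates.
    buckets: Dict[Tuple[int, Tuple[int, ...]], List[str]] = defaultdict(list)
    for eid, text in contexts.items():
        tokens = set(text.lower().split())
        if not tokens:
            continue  # no context, so no signature
        sig = minhash_signature(tokens, bands * rows)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(eid)
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))

    return pairs
```

With `bands` $b$ and `rows` $r$, two contexts with Jaccard similarity $s$ become candidates with probability $1-(1-s^r)^b$, so raising `rows` makes LSH stricter and raising `bands` makes it more permissive. A production version would also drop very frequent name tokens (such as "university"), since every shared token emits all pairs in its posting list.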


Application Information

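IPC(8): G06F16/31; G06F16/95
Inventors: 陈珂 (Chen Ke), 寿黎但 (Shou Lidan), 王凌阳 (Wang Lingyang), 陈刚 (Chen Gang), 江大伟 (Jiang Dawei), 伍赛 (Wu Sai), 胡天磊 (Hu Tianlei)
Owner: ZHEJIANG UNIV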