Generic similarity calculation method and system based on heterogeneous information network

A heterogeneous information network and similarity calculation technology, applied in the fields of information technology and the Internet, that addresses the lack of general calculation methods, the low accuracy of results, and low calculation efficiency. It offers users a high degree of freedom of choice, helps solve information overload, and improves calculation accuracy.

Inactive Publication Date: 2015-08-19
NORTHEAST NORMAL UNIVERSITY

AI Technical Summary

Problems solved by technology

[0005] At present, traditional methods for calculating the similarity between real-world entities usually target only specific data. These methods are simple and fixed, cannot adequately reflect the rich relationships between real-world entities, and lack a general calculation method and framework. They usually rely on simple similarity calculation methods, and only co


Examples


Embodiment 1

[0050] Embodiment 1: As shown in Figure 1, a general similarity calculation method based on a heterogeneous information network according to an embodiment of the present invention includes:

[0051] Step 1, preprocessing the input data set to ensure the validity of the input data;

[0052] The following uses the MovieLens 100K movie recommendation data set provided by the University of Minnesota as an example. The specific implementation is as follows: when only movie recommendation is considered, a large amount of the user data is redundant and needs to be removed. The data set lacks information such as movie actors (Actor) and directors (Director), but it provides a link from each movie to the Internet movie data set IMDb. The MovieLens 100K and IMDb data sets are therefore combined to obtain the effective data.
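The combination of the two data sets described in [0052] might look roughly like the following Python sketch. The record layout, the field names (`movie_id`, `imdb_key`, `director`, `actors`), the placeholder values, and the function `merge_movie_data` are all hypothetical and chosen for illustration; they are not the actual MovieLens or IMDb file formats, and the source does not specify how the merge is implemented.

```python
# Hypothetical movie records with a link key to an external IMDb-like source.
movielens_movies = [
    {"movie_id": 1, "title": "Movie A", "imdb_key": "imdb-001"},
    {"movie_id": 2, "title": "Movie B", "imdb_key": "imdb-002"},
    {"movie_id": 3, "title": "Movie C", "imdb_key": None},  # no usable link
]

# Hypothetical external records keyed by the same link key.
imdb_records = {
    "imdb-001": {"director": "Director X", "actors": ["Actor P", "Actor Q"]},
    "imdb-002": {"director": "Director Y", "actors": ["Actor Q"]},
}


def merge_movie_data(movies, imdb):
    """Attach director and actor fields; drop movies with no matching record."""
    merged = []
    for movie in movies:
        extra = imdb.get(movie["imdb_key"])
        if extra is None:
            continue  # records that cannot be enriched are treated as invalid
        merged.append({
            "movie_id": movie["movie_id"],
            "title": movie["title"],
            "director": extra["director"],
            "actors": list(extra["actors"]),
        })
    return merged


merged_movies = merge_movie_data(movielens_movies, imdb_records)
```

Dropping unmatched records is only one possible validity rule; a real preprocessing step could equally keep them with empty attributes.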

[0053] Step 2, perform metadata extraction, extract the description information of the input data, and store the description information in the metadata datab...

Embodiment 2

[0076] Embodiment 2: As shown in Figure 4, the present invention also provides a general-purpose similarity calculation system based on heterogeneous information networks, including:

[0077] The processing module preprocesses the input data set to ensure the validity of the input data;

[0078] The extraction module performs metadata extraction: it extracts the description information of the input data and stores it in the metadata database, where the description information includes global information on the input data set as a whole, local information on each record, and the conversion and correspondence information between data attribute identifiers and their internal representations;

[0079] In the modeling module, the user selects the entities and data attributes involved in the similarity calculation, queries the corresponding metadata, displays the data type and value range of each metadata, and prompts the user to select the met...
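As a rough illustration of the description information handled by the extraction module and shown to the user by the modeling module (data type and value range per attribute), a minimal Python sketch follows. The function `extract_metadata`, the dictionary layout, and the sample records are assumptions made for this example; the source does not describe the internal format of the metadata database.

```python
def extract_metadata(records):
    """Collect global info, per-attribute type and value range, and an id mapping."""
    # Global information: every attribute name seen anywhere in the data set.
    attributes = sorted({key for record in records for key in record})

    attribute_info = {}
    for name in attributes:
        values = [record[name] for record in records if name in record]
        numeric = all(isinstance(value, (int, float)) for value in values)
        attribute_info[name] = {
            "type": "numeric" if numeric else "categorical",
            # Numeric attributes report (min, max); others list distinct values.
            "range": (min(values), max(values)) if numeric
                     else sorted(set(map(str, values))),
        }

    return {
        "global": {"record_count": len(records), "attributes": attributes},
        "attributes": attribute_info,
        # Mapping between attribute identifiers and internal integer ids.
        "internal_ids": {name: index for index, name in enumerate(attributes)},
    }


sample_records = [
    {"title": "Movie A", "year": 1995},
    {"title": "Movie B", "year": 1997},
]
metadata = extract_metadata(sample_records)
```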


Abstract

The present invention discloses a generic similarity calculation method based on a heterogeneous information network, comprising the following steps: step 1, preprocessing an input data set to ensure the validity of the input data; step 2, performing metadata extraction, that is, extracting description information of the input data and storing the description information in a metadata library; step 3, establishing a heterogeneous information network schema via user interaction and storing the network schema; step 4, performing similarity calculation using a meta-path-based similarity calculation method of the heterogeneous information network; step 5, performing similarity post-processing to form the overall similarity as the final output. The generic similarity calculation method and system based on a heterogeneous information network have the following beneficial effects: modeling is performed using the heterogeneous information network, and a generic similarity calculation method is proposed; different types of data sets can be processed; various similarity calculation demands can be met; the user can designate various calculation methods and result post-processing manners, so the degree of freedom of selection is high; calculation accuracy and efficiency are improved; and the problem of information overload is better addressed.
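Steps 4 and 5 can be illustrated with a small Python sketch. It assumes PathSim over a symmetric meta-path (for example Movie-Actor-Movie) as one possible meta-path-based measure, and a weighted average as one possible post-processing rule. Neither choice is confirmed by the text above as the patent's own method, and the adjacency matrices, weights, and function names are hypothetical.

```python
def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def pathsim(adjacency):
    """PathSim for the symmetric meta-path A-B-A, given the A-to-B adjacency."""
    # Commuting matrix: number of A-B-A path instances between each pair.
    commuting = matmul(adjacency, transpose(adjacency))
    size = len(commuting)
    scores = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            denominator = commuting[i][i] + commuting[j][j]
            if denominator:
                scores[i][j] = 2 * commuting[i][j] / denominator
    return scores


def combine(similarities, weights):
    """Assumed post-processing: weighted average of per-meta-path scores."""
    total = sum(weights)
    size = len(similarities[0])
    return [[sum(w * s[i][j] for s, w in zip(similarities, weights)) / total
             for j in range(size)] for i in range(size)]


# Hypothetical data: 3 movies x 3 actors, and 3 movies x 2 directors.
movie_actor = [[1, 1, 0], [1, 0, 1], [0, 0, 1]]
movie_director = [[1, 0], [1, 0], [0, 1]]

actor_sim = pathsim(movie_actor)        # Movie-Actor-Movie
director_sim = pathsim(movie_director)  # Movie-Director-Movie
overall = combine([actor_sim, director_sim], [0.5, 0.5])
```

With these toy matrices, movies 0 and 1 share one of their two actors and the same director, so their combined score is higher than that of movies 1 and 2, which share only one actor.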

Description

Technical field

[0001] The present invention relates to the fields of information technology and Internet technology, and in particular to a general similarity calculation method and system based on a heterogeneous information network.

Background

[0002] With the development of information technology and the Internet, people have gradually moved from the early period of data scarcity into the era of information overload. Especially in the current era of big data, how to solve the problem of information overload and extract valuable information from massive data is a key issue that urgently needs to be solved. Whether in information retrieval systems or in personalized recommendation systems and applications, information similarity calculation is a key technology, and it usually plays a decisive role in the processing accuracy of the related systems and applications.

[0003] Heterogeneous information network is a relatively new research fi...


Application Information

IPC(8): G06F17/30
CPC: G06F16/288
Inventor 张邦佐, 汤树林, 尹宗铭, 徐桂萍, 蔡永健, 徐坤
Owner NORTHEAST NORMAL UNIVERSITY