Large-scale heterogeneous data oriented co-clustering method

A technology concerning heterogeneous data and clustering methods, applied in text database clustering/classification, structured data retrieval, unstructured text data retrieval, and related areas. It addresses high time complexity, imbalanced growth of entity types, and extremely sparse relational data, and achieves fast co-clustering, improved accuracy, and reduced sparsity.

Active Publication Date: 2015-05-20
HARBIN ENG UNIV
Cites: 2; Cited by: 10

AI Technical Summary

Problems solved by the technology

[0004] (1) Imbalance problem: as the scale of the heterogeneous data to be analyzed increases, the different entity types in the data do not grow at a uniform rate. The time complexity of the traditional non-negative matrix factorization method depends on both the row and column dimensions of the matrix, so computation becomes expensive on large-scale data.
[0005] (2) Sparsity problem: relational data in real heterogeneous networks is already fairly sparse, and as the scale of the data to be analyzed grows further, it becomes extremely sparse. The traditional non-negative matrix factorization method performs poorly on such extremely sparse relation matrices.
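
The sparsity trend in [0005] can be quantified by the density of a relation matrix, the fraction of its entries that are non-zero. As an illustrative assumption not taken from the patent, suppose R is an m x n relation matrix whose rows each hold d non-zero entries on average (for example, a message that contains about d distinct words out of a vocabulary of n words):

```latex
\rho(R) = \frac{\operatorname{nnz}(R)}{m\,n} = \frac{m\,d}{m\,n} = \frac{d}{n}
```

If d stays roughly constant while n grows, the density falls in proportion to 1/n; with d = 10 and n = 10^5, the density is 10^-4.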

Detailed Description

[0041] The present invention will be described in further detail below in conjunction with the accompanying drawings.

[0042] When large-scale heterogeneous data is co-clustered, the different entity types grow at unbalanced rates and the heterogeneous relational data becomes extremely sparse, giving rise to the imbalance and sparsity problems. To address both problems, the present invention proposes a co-clustering method for heterogeneous relation matrices based on an association matrix; its overall schematic diagram is shown in Figure 1. The method turns the traditional non-negative matrix factorization problem into a two-stage factorization problem. First, the association relations of the smaller-scale entity type are extracted to build an association matrix, and the partition indicator matrix is obtained by symmetric non-negative matrix factorization. Compared with the original relationship matrix...
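
A minimal sketch of the first stage, under stated assumptions. The text above does not say how the association matrix is built, so the sketch assumes a co-occurrence form A = R^T R (two X2 entities are associated when they relate to the same X1 entities). It then factorizes A ~= B B^T with a commonly used damped multiplicative update for symmetric NMF, B <- B * (1/2 + 1/2 * (A B) / (B B^T B)) applied elementwise; this is a standard rule, not necessarily the one in the patent's claims. The function names, iteration count, and seed are hypothetical, and plain Python lists keep the example dependency-free.

```python
import random


def matmul(X, Y):
    """Matrix product of two list-of-lists matrices."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def association_matrix(R):
    """ASSUMPTION: associate X2 entities by co-occurrence, A = R^T R (n2 x n2)."""
    return matmul(transpose(R), R)


def symmetric_nmf(A, k, iters=500, seed=0, eps=1e-12):
    """Approximate A ~= B B^T with B >= 0 using the damped multiplicative rule
    B <- B * (1/2 + 1/2 * (A B) / (B B^T B)), applied elementwise."""
    rng = random.Random(seed)
    n = len(A)
    B = [[rng.random() for _ in range(k)] for _ in range(n)]  # random positive start
    for _ in range(iters):
        AB = matmul(A, B)
        BBtB = matmul(B, matmul(transpose(B), B))
        B = [[B[i][j] * (0.5 + 0.5 * AB[i][j] / (BBtB[i][j] + eps))
              for j in range(k)] for i in range(n)]
    return B


def hard_labels(M):
    """Assign each row to the column (cluster) with the largest membership value."""
    return [max(range(len(row)), key=row.__getitem__) for row in M]


# Toy relation matrix: 6 X1 entities (rows) x 4 X2 entities (columns).
R = [[1, 1, 0, 0], [1, 1, 0, 0], [2, 2, 0, 0],
     [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 3, 3]]
A = association_matrix(R)
B = symmetric_nmf(A, k=2)
print(hard_labels(B))
```

Because A is only n2 x n2, this stage works on the smaller entity type; the hard labels are read off B by taking the largest entry in each row.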

Abstract

The invention discloses a co-clustering method for large-scale heterogeneous data. The method comprises the following steps: entities and the heterogeneous relations between them are extracted from the heterogeneous data to obtain a heterogeneous relation matrix R; of the two entity types corresponding to R, the smaller-scale entity type X2 is selected, and an association matrix is constructed from the association relations of X2; the association matrix is decomposed by symmetric non-negative matrix factorization to obtain the clustering indicator matrix B for X2; with B as input, R is tri-factorized to obtain the clustering indicator matrix for entity type X1; and the entities of both types are partitioned into clusters using the clustering indicator matrices of X1 and X2. The method reduces matrix sparsity and improves the accuracy of co-clustering.
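
The second stage can be sketched in the same hedged spirit. With B held fixed, R is approximated as F S B^T, where F (n1 x k1) is the indicator matrix for X1 and S (k1 x k2) links the two sets of clusters. The multiplicative updates below follow from the gradient of the squared Frobenius error ||R - F S B^T||^2 and are a standard choice for non-negative tri-factorization, not necessarily the exact rules claimed in the patent. For a self-contained demo, B is a hand-written 0/1 indicator standing in for the output of the symmetric factorization step; the names `tri_factorize_fixed_b` and `hard_labels` are hypothetical.

```python
import random


def matmul(X, Y):
    """Matrix product of two list-of-lists matrices."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def tri_factorize_fixed_b(R, B, k1, iters=500, seed=0, eps=1e-12):
    """Approximate R ~= F S B^T with B held fixed; F and S stay non-negative.

    Elementwise multiplicative updates for ||R - F S B^T||_F^2:
      F <- F * (R B S^T) / (F S B^T B S^T)
      S <- S * (F^T R B) / (F^T F S B^T B)
    """
    rng = random.Random(seed)
    n1, k2 = len(R), len(B[0])
    F = [[rng.random() for _ in range(k1)] for _ in range(n1)]
    S = [[rng.random() for _ in range(k2)] for _ in range(k1)]
    RB = matmul(R, B)              # n1 x k2; constant because B is fixed
    BtB = matmul(transpose(B), B)  # k2 x k2; constant because B is fixed
    for _ in range(iters):
        St = transpose(S)
        num = matmul(RB, St)
        den = matmul(matmul(matmul(F, S), BtB), St)
        F = [[F[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k1)]
             for i in range(n1)]
        Ft = transpose(F)
        num = matmul(Ft, RB)
        den = matmul(matmul(matmul(Ft, F), S), BtB)
        S = [[S[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k2)]
             for i in range(k1)]
    return F, S


def hard_labels(M):
    """Assign each row to the column (cluster) with the largest membership value."""
    return [max(range(len(row)), key=row.__getitem__) for row in M]


# Toy data: 6 X1 entities x 4 X2 entities; B stands in for the stage-one result.
R = [[1, 1, 0, 0], [1, 1, 0, 0], [2, 2, 0, 0],
     [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 3, 3]]
B = [[1, 0], [1, 0], [0, 1], [0, 1]]
F, S = tri_factorize_fixed_b(R, B, k1=2)
print("X1 clusters:", hard_labels(F))
print("X2 clusters:", hard_labels(B))
```

Since B is fixed, R B and B^T B are computed once outside the loop, and only F and S are updated, which is one way the fixed indicator matrix can simplify the second stage.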

Description

Technical field

[0001] The invention belongs to the field of Internet information mining, and in particular relates to a co-clustering method for large-scale heterogeneous data that can reduce the sparsity of such data.

Background

[0002] With the rise of heterogeneous information networks such as Weibo and other social networks, heterogeneous information mining has become a research hotspot in data mining. A heterogeneous network contains many types of entities with complex interactions among them. For example, Weibo contains entities such as users, messages, tags, and words: a user publishes a message, the message is composed of words, and the message also carries tags. By extracting the relation data between entities and performing co-clustering analysis, the latent structural relationships between different entities in a heterogeneous network can be mined.

[0003] Non-negative matrix factorizati...
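
The extraction step described in the abstract and in [0002] can be illustrated with the Weibo example, treating messages as one entity type and words as another and weighting each relation by word count. The input format, the count weighting, and the names `build_relation_matrix` and `density` are illustrative assumptions; the text here does not specify how relations are weighted.

```python
from collections import Counter


def build_relation_matrix(messages):
    """Build a message-by-word relation matrix from tokenized messages.

    messages: dict mapping a message id to its list of words (hypothetical format).
    Returns (row_ids, col_ids, R) where R[i][j] counts how often word col_ids[j]
    occurs in message row_ids[i].
    """
    row_ids = sorted(messages)
    col_ids = sorted({w for words in messages.values() for w in words})
    col_index = {w: j for j, w in enumerate(col_ids)}
    R = [[0] * len(col_ids) for _ in row_ids]
    for i, mid in enumerate(row_ids):
        for word, count in Counter(messages[mid]).items():
            R[i][col_index[word]] = count
    return row_ids, col_ids, R


def density(R):
    """Fraction of non-zero entries, nnz(R) / (rows * cols)."""
    nnz = sum(1 for row in R for v in row if v != 0)
    return nnz / (len(R) * len(R[0]))


messages = {
    "m1": ["nmf", "cluster", "nmf"],
    "m2": ["weibo", "tag"],
    "m3": ["cluster", "tag"],
}
rows, cols, R = build_relation_matrix(messages)
print(cols)        # ['cluster', 'nmf', 'tag', 'weibo']
print(R)           # [[1, 2, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]]
print(density(R))  # 0.5
```

A real corpus would use a sparse representation instead of nested lists, since, as noted in [0005], such matrices become extremely sparse as the vocabulary grows.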

Application Information

Patent type and authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/212, G06F16/256, G06F16/27, G06F16/285, G06F16/35, G06F16/951
Inventors: 杨武, 申国伟, 王巍, 苘大鹏, 玄世昌
Owner: HARBIN ENG UNIV