Probability clustering method of cross-categorical data based on key word

A clustering method based on subject entries, applied to the probabilistic clustering of cross-type data, that addresses the failure of existing methods to account for uncertainty in the clustering process.

Inactive Publication Date: 2009-04-15

NORTHEASTERN UNIV

Cites: 0; Cited by: 17

Problems solved by technology

[0005] Existing data clustering methods do not take into account the uncertainty in the clustering process.

Embodiment Construction

[0080] An embodiment of the invention:

[0081] (1) Define the type of subject entry and rank the entries by weight

[0082] Assume d1 and d2 are two data items in the data space, and let T(d1) and T(d2) denote the entries contained in each, where T(d1) = {data, index, search, precision, meeting, clustering, lookup, similarity, summary, contains, version} and T(d2) = {data, search, accuracy, session, image, measure, indeterminate}. Each entry in T(d1) and T(d2) is given a weight, and the entries are sorted from high to low by weight, as shown in Figure 7 (a) and (b).
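The ranking in step (1) can be illustrated with a minimal Python sketch. The weight values below are hypothetical placeholders, since Figure 7 is not reproduced here; only the entry names come from T(d1), and the helper name `rank_entries` is an assumption made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical weights for the entries of T(d1); the real values are in
# Figure 7 of the patent, which is not available in this text.
T_D1: Dict[str, float] = {
    "index": 8, "data": 10, "precision": 6, "search": 7,
    "clustering": 3, "meeting": 4, "lookup": 2, "similarity": 1.5,
    "summary": 1, "contains": 0.5, "version": 0.2,
}


def rank_entries(weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (entry, weight) pairs sorted from highest to lowest weight."""
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for entry, weight in rank_entries(T_D1):
        print(entry, weight)
```

Sorting once up front keeps the later steps simple: the highest-weight entries sit at the head of the list, so a split into related, semi-related and irrelevant entries can be made by position or by weight cutoff.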

[0083] (2) Representing data subjects with probabilities

[0084] In d1, "data", "index", "search" and "precision" are taken as topic-related entries, "meeting" and "clustering" are topic semi-related entries, and the rest are topic-irrelevant entries. The weights of "meeting" and "clustering" are 4 and 3 respectively, and d1 the maximum w...

Abstract

A probabilistic clustering method for cross-type data based on keyword entries belongs to the database field and comprises the following steps: (1) defining the types of keyword entry, dividing the entries of the cross-type data into keyword-related entries, keyword semi-related entries and keyword-unrelated entries; (2) allocating a probability to each entry; (3) expressing the data keywords by these probabilities; (4) constructing a data keyword entry probabilistic similarity matrix M: for any two data dx and dy of the cross-type data in step (3), computing the similarity of every pair of descriptive forms of dx and dy, summing the probabilities of the pairs whose similarity is greater than a certain threshold, and storing the resulting direct correlation probability of the two data in the matrix M; (5) constructing a clustering model M<c> based on the matrix M; and (6) obtaining the clustering based on the clustering model M<c>. The method clusters cross-type data using the similarity of keyword-related entries, which improves clustering precision and reduces clustering time.
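Step (4) of the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the abstract does not name the similarity measure, so Jaccard similarity is swapped in; the probability of a pair of descriptive forms is assumed to be the product of the two form probabilities (independence); and the form sets, probabilities and function names are all hypothetical. Steps (5) and (6) are not covered.

```python
from itertools import product
from typing import FrozenSet, List, Tuple

# A descriptive form: (set of entries, probability of that form).
Form = Tuple[FrozenSet[str], float]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity (an assumed measure; the source does not name one)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def correlation_probability(forms_x: List[Form], forms_y: List[Form],
                            threshold: float) -> float:
    """Sum the probabilities of form pairs whose similarity exceeds threshold.

    The pair probability is taken as px * py, which assumes the forms of
    the two data items are independent.
    """
    return sum(px * py
               for (sx, px), (sy, py) in product(forms_x, forms_y)
               if jaccard(sx, sy) > threshold)


def similarity_matrix(data: List[List[Form]], threshold: float) -> List[List[float]]:
    """Build the symmetric matrix M of pairwise correlation probabilities.

    Diagonal cells are left at 0.0 because the source does not say how a
    data item relates to itself.
    """
    n = len(data)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = correlation_probability(data[i], data[j], threshold)
    return m


# Hypothetical descriptive forms for d1 and d2 (probabilities are made up).
d1 = [(frozenset({"data", "index", "search", "precision"}), 0.6),
      (frozenset({"data", "index", "search", "precision", "meeting"}), 0.4)]
d2 = [(frozenset({"data", "search", "accuracy"}), 0.7),
      (frozenset({"data", "search", "accuracy", "session"}), 0.3)]

M = similarity_matrix([d1, d2], threshold=0.3)
```

Each cell of `M` is itself a probability, the chance that two data items are similar enough to be directly correlated, which is how the uncertainty in the entry assignments carries through to the clustering model built on top of it.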

Description

Technical field

[0001] The invention belongs to the field of databases, and in particular relates to a method for probabilistic clustering of cross-type data based on subject entries.

Background

[0002] Over the past few decades, traditional relational database management systems have played a very important role. However, with the continuous development of computer applications, especially Web information technology, today's data is both massive and ubiquitous ("data everywhere"), and its features are complex. A single traditional database management system can therefore no longer meet these data management needs, and much of today's data or information is not stored in a database management system at all. As Serge Abiteboul et al. reported in Communications of the ACM (Volume 48, No. 5) and Homman pointed out in a DASFAA 2007 conference report, only about 20% of data or information is currently stored in databases. This mea...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F17/30

Inventor: 王国仁, 于亚新, 王波涛, 丁国辉, 王斌, 赵相国, 赵宇海, 信俊昌, 乔百友, 韩东红, 张恩德, 李淼

Owner: NORTHEASTERN UNIV
