Non-recursive clustering algorithm based on quicksort (NR-CAQS) suitable for large data

A technology of large-scale data and clustering methods, applied in database models, relational databases, electronic digital data processing, etc.

Inactive Publication Date: 2015-08-19

BEIJING UNIV OF TECH

0 Cites, 3 Cited by

Problems solved by technology

Each data partition is completed in a single scan of the data to be processed.

Examples

Embodiment Construction

[0022] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0023] In the following example, the data sequence is D={d1, d2, d3, d4, d5, d6, d7, d8, d9}, and there are four known clusters, C={C_1={d1, d3, d5}, C_2={d2, d6}, C_3={d4, d9}, C_4={d7, d8}}. The similarity between data in the same cluster is greater than or equal to 0.8, and the similarity between data in different clusters is less than 0.8. To obtain the correct clustering result, the similarity threshold supplied as input is set to 0.8. The steps for applying the quicksort-based non-recursive clustering method to this data sequence are as follows:

[0024] Step 1: Input the user similarity threshold K=0.8 and the initial data sequence D to be processed containing 9 data samples;

[0025] Step 2: Define the indicator pointers of the head and tail of the data sequence to be processed as start and end respectively, and ass...

Abstract

A non-recursive clustering algorithm based on quicksort (NR-CAQS), suitable for large data, belongs to the technical field of data mining. The algorithm clusters data with a two-level nested loop. Two positioning pointers are defined in advance. In each pass of the outer loop, one benchmark datum is randomly selected from the data sequence to serve as the representative of a cluster and is exchanged to the rightmost position of the data to be processed; a scanning pointer is defined and initialized at the same time. The data to be processed is then scanned: the similarity between each remaining datum and the benchmark datum is computed and compared with the user threshold, and the datum's position in the sequence is adjusted according to the result. Data whose similarity exceeds the user threshold is exchanged to the left side of the sequence, and data whose similarity is below the threshold is exchanged to the right side, which completes one data partition. Finally, the positioning pointer is reset to locate the new data to be processed, and control returns to the outer loop, which repeats until the whole data sequence has been clustered. The algorithm is suited to clustering spherical data and large data sets with strict time requirements.
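
The procedure described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The function name `nr_caqs`, the `boundary` variable, the `seed` parameter, and the similarity values 0.9 and 0.1 are hypothetical. A datum is treated as similar when its similarity is at or above the threshold, which matches the example in [0023], where same-cluster similarity is at least 0.8. The sketch also assumes the benchmark is swapped next to its cluster after each scan so that every cluster occupies a contiguous slice.

```python
import random
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def nr_caqs(data: Sequence[T],
            similarity: Callable[[T, T], float],
            threshold: float,
            seed: Optional[int] = None) -> List[List[T]]:
    """Cluster `data` with quicksort-style partitioning and no recursion."""
    items = list(data)               # work on a copy; partitioned in place
    rng = random.Random(seed)        # assumption: seedable for repeatability
    clusters: List[List[T]] = []
    start, end = 0, len(items) - 1   # positioning pointers: unprocessed range

    while start <= end:              # outer loop: one cluster per pass
        # Pick a random benchmark and exchange it to the rightmost position.
        p = rng.randint(start, end)
        items[p], items[end] = items[end], items[p]
        benchmark = items[end]

        boundary = start             # items[start:boundary] are similar so far
        for scan in range(start, end):   # inner loop: one scan of the range
            if similarity(items[scan], benchmark) >= threshold:
                # Similar data moves to the left part of the range.
                items[scan], items[boundary] = items[boundary], items[scan]
                boundary += 1

        # Assumption: place the benchmark directly after its similar data.
        items[boundary], items[end] = items[end], items[boundary]
        clusters.append(items[start:boundary + 1])

        start = boundary + 1         # reset pointer: the rest is unprocessed
    return clusters


# Usage on the example from [0023]. The cluster labels come from the text;
# the similarity values 0.9 and 0.1 are made up to satisfy its 0.8 condition.
labels = {"d1": 1, "d3": 1, "d5": 1, "d2": 2, "d6": 2,
          "d4": 3, "d9": 3, "d7": 4, "d8": 4}


def similarity_by_label(a: str, b: str) -> float:
    return 0.9 if labels[a] == labels[b] else 0.1


D = [f"d{i}" for i in range(1, 10)]
clusters = nr_caqs(D, similarity_by_label, threshold=0.8, seed=0)
print(sorted(sorted(c) for c in clusters))
# [['d1', 'd3', 'd5'], ['d2', 'd6'], ['d4', 'd9'], ['d7', 'd8']]
```

Each pass of the outer loop scans only the data that has not yet been assigned to a cluster, so the processed range shrinks from the left. With a similarity that is not this cleanly separated, the result can depend on which benchmark is drawn.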

Description

Technical field

[0001] A fast clustering method suitable for large-scale data, belonging to the research field of clustering in data mining. In particular, it relates to a clustering method for applications with strict time requirements.

Background

[0002] With the popularization of mobile computing technology and the rise of the Internet of Things, massive amounts of data are being generated, especially multimedia data such as text, images, and videos. As stated in "IDC Predictions 2014", in 2014 the size of the "digital universe" (that is, all digital information created, copied, and consumed in a year) will continue to expand by more than 50%, reaching about 6 ZB (6 × 10^21 bytes). Analyzing and mining such big data within a reasonable and acceptable time has become the biggest challenge in the field of IT. Clustering, or cluster analysis, in the field of data mining is often used for data preprocessing, which is a common form of exploratory data analysis and ...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F17/30

CPC: G06F16/285

Inventor: 冀俊忠, 高明霞, 宋辰, 刘金铎

Owner: BEIJING UNIV OF TECH
