Text data stream clustering algorithm based on affinity propagation

A data stream clustering technology based on affinity propagation, applied in electrical digital data processing, special data processing applications, and computing. It addresses problems such as convergence to local solutions and a priori parameters, like the average clustering dimension, that are difficult to determine.

Active Publication Date: 2015-07-15

HEFEI UNIV OF TECH


Problems solved by technology

[0003] This algorithm also has the following disadvantages: the number of clusters must be fixed in advance for each clustering run, and it cannot change as the categories change.

The algorithm achieves good clustering results for spherical clusters, but it has difficulty forming clusters of arbitrary shape.

Other studies propose the HPStream algorithm, which uses high-dimensional projection to select subspaces for clustering and uses decay functions to represent evolution information. However, its a priori parameter, the average clustering dimension, is difficult to determine.

These improvements adapt to the problems of stream clustering to a certain extent, but the accuracy and robustness of the clustering results remain unresolved and need further improvement.


Embodiment Construction

[0044] In this embodiment, a text data stream clustering algorithm based on affinity propagation, the OWAP-s algorithm, is carried out according to the following steps:

[0045] Step 1. Perform dimensionality reduction processing on the text data set to obtain the corresponding text vector set;

[0046] In order to cope with the high-dimensional and sparse characteristics of text data, the following dimensionality reduction method is adopted:

[0047] First, build a word index over the entire document collection, and then convert each document into index-value pairs, where index is the serial number of the word and value is the value associated with it. Because the indexes of every document are arranged in ascending order, the similarity of two documents is calculated by scanning the indexes of the two vectors in order. Whenever the two documents share the same index, the corresponding values are multiplied together and accumulated until the similarity betw...
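The index-ordered scan described in [0047] can be sketched as a merge over two sorted sparse vectors. This is a minimal illustration, not the patent's implementation: the function names `build_word_index`, `to_sparse_vector`, and `sparse_dot` are hypothetical, and using the raw term count as the value is an assumption, since the text does not say how the value is weighted.

```python
from collections import Counter


def build_word_index(documents):
    """Assign every distinct word in the collection a serial number."""
    vocabulary = sorted({word for doc in documents for word in doc})
    return {word: i for i, word in enumerate(vocabulary)}


def to_sparse_vector(doc, word_index):
    """Convert a tokenized document to (index, value) pairs sorted by index.

    ASSUMPTION: the value is the raw term count; the source does not
    specify the weighting.
    """
    counts = Counter(word_index[word] for word in doc)
    return sorted(counts.items())


def sparse_dot(a, b):
    """Accumulate products of values whose indexes match in both vectors.

    Both inputs must be sorted by ascending index, so one forward pass
    over each vector is enough.
    """
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        index_a, value_a = a[i]
        index_b, value_b = b[j]
        if index_a == index_b:
            total += value_a * value_b
            i += 1
            j += 1
        elif index_a < index_b:
            i += 1
        else:
            j += 1
    return total


if __name__ == "__main__":
    docs = [["stream", "cluster", "text"], ["text", "text", "stream"]]
    index = build_word_index(docs)
    vectors = [to_sparse_vector(doc, index) for doc in docs]
    print(sparse_dot(vectors[0], vectors[1]))
```

Because only non-zero entries are stored and each vector is traversed once, the cost of one comparison grows with the number of distinct words in the two documents rather than with the vocabulary size, which suits high-dimensional, sparse text data.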


Abstract

The invention discloses a text data stream clustering algorithm based on affinity propagation. The algorithm includes the following steps: 1, performing dimensionality reduction on a text data set to obtain the corresponding text vector set; 2, obtaining the cluster centers at each moment, which completes the clustering. The algorithm improves accuracy and robustness without requiring the number of clusters to be specified in advance, and therefore meets the requirements of practical problems.
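For orientation, the following is a minimal sketch of standard batch affinity propagation, the general technique the invention builds on. It is not the OWAP-s stream variant, and the helper `similarity_matrix`, the negative squared distance similarity, the damping factor, and the preference value are illustrative assumptions. It shows why the number of clusters need not be given: every point competes to become an exemplar, and the diagonal "preference" only influences how many exemplars emerge.

```python
def similarity_matrix(points, preference):
    """Negative squared distance between 1-D points (an assumed similarity).

    The diagonal holds the preference of each point to be an exemplar.
    """
    n = len(points)
    return [
        [preference if i == j else -(points[i] - points[j]) ** 2 for j in range(n)]
        for i in range(n)
    ]


def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for each point, the index of the exemplar it selects.

    S is a square similarity matrix with at least two points.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i, k) = s(i, k) - max over k' != k of (a(i, k') + s(i, k'))
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                target = S[i][k] - best_other
                R[i][k] = damping * R[i][k] + (1 - damping) * target
        # a(k, k) = sum over i' != k of max(0, r(i', k))
        # a(i, k) = min(0, r(k, k) + sum over i' not in {i, k} of max(0, r(i', k)))
        for k in range(n):
            positive = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(positive) - positive[k]
            for i in range(n):
                if i == k:
                    target = total
                else:
                    target = min(0.0, R[k][k] + total - positive[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * target
    # Each point picks the candidate with the largest combined evidence.
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


if __name__ == "__main__":
    points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
    S = similarity_matrix(points, preference=-20.0)
    print(affinity_propagation(S))
```

A lower (more negative) preference tends to yield fewer exemplars and a higher one more, so the preference takes the place of an explicit cluster count. The damping factor is needed in practice because undamped message updates can oscillate.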

Description

Technical field

[0001] The invention relates to a text data stream clustering algorithm based on affinity propagation.

Background

[0002] With the advent of the big data era, a large amount of unstructured data has been generated on the network. This data is generated in real time, is huge in volume, and has complex structure, so people urgently need to extract valuable information and knowledge from it. Text data stream clustering is a common method for analyzing such unstructured data. It has achieved good results in news filtering, topic detection and tracking (TDT), user feature recommendation, and similar applications, and has quickly become a research hotspot. Because text data is high-dimensional and sparse, improving the efficiency and accuracy of clustering algorithms is very important. In 2005, Shi Zhong proposed the OSKM algorithm, which is an extension of the k-means algorithm. It divid...


Application Information


IPC(8): G06F17/30

Inventor: 倪丽萍, 李一鸣, 倪志伟, 伍章俊

Owner: HEFEI UNIV OF TECH
