Text Data Stream Clustering Algorithm Based on Neighbor Propagation

A data stream clustering technique based on neighbor propagation, applied in electrical digital data processing, special data processing applications, computing, and related fields. It addresses problems such as convergence to local solutions and prior parameters, such as the average clustering dimension, that are difficult to determine.

Active Publication Date: 2018-02-02
HEFEI UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0003] This algorithm also has the following disadvantages: the number of clusters must be fixed in advance for each clustering run, and it cannot change as the categories change.
The algorithm achieves good results on spherical clusters, but it has difficulty forming clusters of arbitrary shape.
Other studies propose the HPStream algorithm, which uses high-dimensional projection to select subspaces for clustering and uses decay functions to represent evolution information, but its prior parameter, the average clustering dimension, is difficult to determine.
These improvements have adapted to the problems of stream clustering to a certain extent, but they have not adequately resolved the accuracy and robustness of the clustering results, and further improvement is needed.

Embodiment Construction

[0044] In this embodiment, a text data stream clustering algorithm based on neighbor propagation, the OWAP-s algorithm, is carried out according to the following steps:

[0045] Step 1. Perform dimensionality reduction processing on the text data set to obtain the corresponding text vector set;

[0046] In order to cope with the high-dimensional and sparse characteristics of text data, the following dimensionality reduction method is adopted:

[0047] First, build a word index over the entire document collection, and then convert the result into (index, value) pairs, where index is the serial number of the word and value is the value stored for that word. Because the indexes of every document are arranged in ascending order, the similarity of two documents is calculated by scanning the indexes in the two document vectors in order. Whenever the two vectors contain the same index, the corresponding values are multiplied together and accumulated until the similarity betw...
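The indexing and ordered scan described in [0047] can be illustrated with a minimal sketch. It is not the patent's implementation: the function names `build_word_index`, `to_sparse_vector`, and `sparse_dot` are hypothetical, whitespace tokenization is assumed, and the value stored for each word is assumed to be its raw term frequency, because the source does not state how values are computed.

```python
from collections import Counter


def build_word_index(documents):
    """Assign each distinct word a serial number in first-seen order."""
    index = {}
    for doc in documents:
        for word in doc.split():  # assumption: whitespace tokenization
            if word not in index:
                index[word] = len(index)
    return index


def to_sparse_vector(doc, word_index):
    """Convert a document to (index, value) pairs sorted by index.

    Assumption: value is the raw term frequency of the word.
    """
    counts = Counter(word_index[w] for w in doc.split() if w in word_index)
    return sorted(counts.items())


def sparse_dot(u, v):
    """Accumulate products of values whose indexes match in both vectors.

    Both inputs must be sorted by index, so one ordered pass suffices.
    """
    i = j = 0
    total = 0.0
    while i < len(u) and j < len(v):
        index_u, value_u = u[i]
        index_v, value_v = v[j]
        if index_u == index_v:
            total += value_u * value_v
            i += 1
            j += 1
        elif index_u < index_v:
            i += 1  # index only present in u, skip it
        else:
            j += 1  # index only present in v, skip it
    return total


docs = ["data stream clustering", "text stream clustering stream"]
word_index = build_word_index(docs)
vec_a = to_sparse_vector(docs[0], word_index)
vec_b = to_sparse_vector(docs[1], word_index)
similarity = sparse_dot(vec_a, vec_b)
```

Because both vectors are sorted, the scan visits each stored entry at most once, so the cost depends on the number of distinct words in the two documents rather than on the size of the full vocabulary.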

Abstract

The invention discloses a text data stream clustering algorithm based on neighbor propagation, characterized by the following steps: 1. perform dimensionality reduction on the text data set to obtain the corresponding set of text vectors; 2. obtain the cluster centers at all times and complete the clustering. The invention improves the accuracy and robustness of the algorithm without requiring the number of clusters to be specified in advance, so as to meet the needs of practical problems.

Description

Technical field

[0001] The invention relates to a text data stream clustering algorithm based on neighbor propagation.

Background art

[0002] With the advent of the big data era, a large amount of unstructured data has been generated on the network. These data are generated in real time, are huge in volume, and have complex structures, so people urgently need to extract valuable information and knowledge from them. Text data stream clustering is a common method for analyzing such unstructured data. It has achieved good results in news filtering, topic detection and tracking (TDT), user feature recommendation, and similar applications, and has quickly become a research hotspot. Because text data is high-dimensional and sparse, improving the efficiency and accuracy of clustering algorithms is very important. In 2005, Shi Zhong proposed the OSKM algorithm, which is an extension of the k-means algorithm. It divid...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
Inventor: 倪丽萍, 李一鸣, 倪志伟, 伍章俊
Owner: HEFEI UNIV OF TECH