34 results about "Data stream clustering" patented technology

In computer science, data stream clustering is the clustering of data that arrive continuously, such as telephone records, multimedia data and financial transactions. It is usually studied as a streaming algorithm: given a sequence of points, the objective is to construct a good clustering of the stream using a small amount of memory and time.
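
To make the single-pass, small-memory setting concrete, here is a minimal sketch of sequential (online) k-means. It is a generic textbook technique and is not taken from any patent listed on this page; the class name `OnlineKMeans` and its interface are hypothetical.

```python
import math


class OnlineKMeans:
    """Sequential k-means: one pass over the stream, O(k * d) memory."""

    def __init__(self, k):
        self.k = k
        self.centers = []  # current centroids
        self.counts = []   # number of points absorbed by each centroid

    def update(self, point):
        """Consume one point and return the index of its cluster."""
        point = [float(v) for v in point]
        # Assumption: the first k points of the stream seed the centroids.
        if len(self.centers) < self.k:
            self.centers.append(point)
            self.counts.append(1)
            return len(self.centers) - 1
        # Assign the point to the nearest centroid.
        idx = min(range(self.k),
                  key=lambda i: math.dist(point, self.centers[i]))
        # Move that centroid toward the point (running-mean update).
        self.counts[idx] += 1
        eta = 1.0 / self.counts[idx]
        self.centers[idx] = [c + eta * (p - c)
                             for c, p in zip(self.centers[idx], point)]
        return idx
```

Each point is seen once and then discarded; only the k centroids and their counts are kept, which is what keeps memory independent of the stream length.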

Methods and apparatus for data stream clustering for abnormality monitoring

Techniques for monitoring abnormalities in a data stream are provided. A plurality of objects are received from the data stream and one or more clusters are created from these objects. At least a portion of the one or more clusters have statistical data of the respective cluster. It is determined from the statistical data whether one or more abnormalities exist in the data stream.
Owner:IBM CORP
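
As a rough illustration of the idea above (clusters that carry statistics, which are then inspected for abnormalities), the following sketch keeps a count and a running mean per cluster and flags sparsely populated clusters. The distance rule, the `radius` and `min_size` parameters, and all names are assumptions made for illustration; the abstract does not specify which statistics or tests the patent uses.

```python
import math


class StatCluster:
    """A cluster summarized by its size and running mean."""

    def __init__(self, point):
        self.n = 1
        self.mean = list(point)

    def add(self, point):
        self.n += 1
        self.mean = [m + (p - m) / self.n for m, p in zip(self.mean, point)]


def monitor(stream, radius, min_size):
    """Cluster the stream, then report clusters that look abnormal.

    Assumption: a point joins the nearest cluster if it lies within
    `radius` of that cluster's mean; otherwise it starts a new cluster.
    Clusters with fewer than `min_size` points are treated as abnormal.
    """
    clusters = []
    for point in stream:
        nearest = min(clusters,
                      key=lambda c: math.dist(point, c.mean),
                      default=None)
        if nearest is not None and math.dist(point, nearest.mean) <= radius:
            nearest.add(point)
        else:
            clusters.append(StatCluster(point))
    abnormal = [c for c in clusters if c.n < min_size]
    return clusters, abnormal
```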

Distributed data stream clustering method and system

The invention discloses a distributed data stream clustering method and system that overcome the defects of most existing data stream clustering algorithms, which cannot run in a distributed cloud environment, are hard to extend, and have low run-time efficiency. The method includes: summarizing data streams to obtain a plurality of eigenvectors of the data streams; applying a locality-sensitive hashing algorithm to obtain a plurality of clusters, each comprising at least one eigenvector, and selecting at least one cluster as a candidate cluster; and periodically using the candidate cluster to cluster the eigenvectors of newly arrived data streams. Because they are based on locality-sensitive hashing, the method and system provide better real-time performance than the prior art.
Owner:CHINA INFORMATION TECH SECURITY EVALUATION CENT +1
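
The hashing step can be sketched with random-hyperplane locality-sensitive hashing, in which vectors that point in similar directions tend to receive the same bit signature and so land in the same bucket. The abstract does not say which LSH family the patent uses, so the hyperplane scheme, the function names and the choice of the largest bucket as the candidate cluster are all assumptions.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals (Gaussian entries)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side."""
    return tuple(int(sum(v * w for v, w in zip(vec, plane)) >= 0)
                 for plane in planes)


def lsh_buckets(vectors, planes):
    """Group vector indices by signature; each bucket is a coarse cluster."""
    buckets = defaultdict(list)
    for i, vec in enumerate(vectors):
        buckets[signature(vec, planes)].append(i)
    return buckets


def candidate_cluster(buckets):
    """Assumption: the most populated bucket serves as the candidate."""
    return max(buckets.values(), key=len)
```

Hashing a vector costs one dot product per hyperplane, so newly arrived eigenvectors can be assigned without comparing them against every stored vector.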

System and method for recommending hot spot area in real time

The invention provides a system and a method for recommending a hot spot area in real time. The system for recommending the hot spot area in real time comprises a server and user equipment, wherein the server comprises a GPS (global positioning system) information extraction module, a real-time data stream clustering module, a hot event mining module, a hot event library and a hot spot regional information integration module, wherein the GPS information extraction module is used for extracting GPS information from the user equipment and / or a picture sharing website; the real-time data stream clustering module is used for receiving the extracted GPS information from the GPS information extraction module and performing real-time data stream clustering on the GPS information, so that a clustering center taken as the hot spot area is obtained; the hot event mining module is used for mining a hot discussed event through an information resource sharing platform and reserving a hot event having territoriality; the hot event library is used for storing the reserved hot event having the territoriality; and the hot spot regional information integration module is used for integrating obtained hot event information and hot scenic spot information and providing the integrated hot spot area information for the user equipment.
Owner:SAMSUNG ELECTRONICS CHINA R&D CENT +1

Trojan detection method based on uncontrolled end flow analysis

The invention discloses a Trojan detection method based on uncontrolled end flow analysis. The method includes the steps that firstly, captured network data packets are processed; secondly, the packets are organized into data flows according to quintuple (five-tuple) information and the requirements of protocol specifications; then, the data flows are classified according to equivalent tetrads to form data flow sets identified by the tetrads; finally, the data flows in each data flow set are clustered into data flow clusters by a data flow clustering algorithm based on timestamps. Having clustered the network data flows, the method processes them with the data flow cluster as the unit to analyze the difference between Trojan communication behaviors and normal network communication behaviors. This difference is further explored in depth with statistical analysis and data mining techniques, so that uncontrolled end Trojan flows in a network can be detected.
Owner:PLA STRATEGIC SUPPORT FORCE INFORMATION ENG UNIV PLA SSF IEU
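
Two of the steps above can be sketched briefly: grouping packets into flows by their five-tuple, and clustering flows by timestamp. The packet field names and the gap-based clustering rule (`max_gap`) are hypothetical, since the abstract only says that the clustering is "based on timestamps"; the tetrad classification step is omitted because the abstract does not define the tetrad.

```python
from collections import defaultdict


def group_flows(packets):
    """Group packets into flows keyed by the five-tuple.

    Assumption: each packet is a dict with the keys
    src, sport, dst, dport, proto and ts (timestamp in seconds).
    """
    flows = defaultdict(list)
    for p in packets:
        key = (p["src"], p["sport"], p["dst"], p["dport"], p["proto"])
        flows[key].append(p["ts"])
    return flows


def cluster_by_time(start_times, max_gap):
    """Put flows whose start times are at most max_gap apart in one cluster."""
    clusters = []
    for t in sorted(start_times):
        if clusters and t - clusters[-1][-1] <= max_gap:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return clusters
```

A typical use would take the earliest timestamp of each flow, `min(ts_list)`, as its start time and pass those values to `cluster_by_time`.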

Mixed attribute data flow clustering method for automatically determining clustering center based on density

The invention discloses a mixed attribute data flow clustering method for automatically determining a clustering center based on density. The method comprises the following steps: 1) initialization: the initial Ninit data objects in a data flow are clustered with a New-FSFDP algorithm to generate initial dense micro-clusters, which initializes the whole on-line process, and the mean radius of all generated initial dense micro-clusters serves as the initial epsilon; 2) on-line maintenance; 3) off-line clustering. The method achieves higher precision and handles outliers well.
Owner:ZHEJIANG UNIV OF TECH

Trojan horse communication feature fast extraction method based on clustering analysis of multiple data streams

The invention discloses a Trojan horse communication feature fast extraction method based on network data stream clustering. The method comprises the steps that firstly, a captured network data packet is sorted according to a network conversation, wherein an IP address and a port of a monitoring object serve as a source IP address and a source port, and the data packet is subjected to conversation division according to equivalent tetrads; secondly, data streams are clustered into data stream clusters through a data stream clustering algorithm based on timestamps; lastly, Trojan horse communication features are extracted, wherein the Trojan horse communication features are extracted at the Trojan horse interactive operation stage. According to the Trojan horse communication feature fast extraction method, on the basis of network data stream clustering, the network data streams are processed with clusters as units, the difference between a Trojan horse communication behavior and a normal network communication behavior is analyzed, the difference between the two behaviors is dug deeply and the network communication features are extracted in combination with traditional statistic analysis, correlation analysis and other technologies, the false alarm rate is lowered while the detection rate is guaranteed, and the Trojan horse communication feature fast extraction method can be used for detecting a secret stealing behavior in a network.
Owner:PLA STRATEGIC SUPPORT FORCE INFORMATION ENG UNIV PLA SSF IEU

Method for fusion reconstruction and interaction of intelligent power distribution network big data

CN104616210A (Active); Improve control intelligence level; Enhanced self-healing control function; Data processing applications; Information technology support system; Self-healing; Data stream
The invention discloses a method for fusion reconstruction and interaction of intelligent power distribution network big data. The method is characterized by comprising the steps that (1) division of initial clusters is conducted on the intelligent power distribution network big data, an association rule is established according to the operating state of an intelligent power distribution network, and division of extended clusters is achieved; (2) current data are allocated to the initial clusters based on historical data, the operating state is predicted on the basis of the association rule, so that a self-healing control strategy is determined, and panoramic risk management and control and self-healing control are conducted. According to the method, division of the initial clusters is conducted according to the grid density, division of the extended clusters is conducted through establishment of the association rule, fusion reconstruction and interaction of the intelligent power distribution network big data are achieved, cluster efficiency and cluster accuracy of data streams are effectively improved, and the method has good extensibility and achieves system integration between panoramic risk management and control and self-healing control of the intelligent power distribution network. When the method for fusion reconstruction and interaction of the intelligent power distribution network big data is applied, the intelligent control level of the power distribution network can be improved, and the self-healing control function of the power distribution network is enhanced.
Owner:HOHAI UNIV CHANGZHOU

Text data stream clustering algorithm based on affinity propagation

The invention discloses a text data stream clustering algorithm based on affinity propagation. The text data stream clustering algorithm is characterized by including the following steps: 1, carrying out dimension reduction processing on a text data set to obtain a corresponding text vector set; 2, obtaining clustering centers of all moments, and completing the clustering algorithm. By means of the text data stream clustering algorithm, the accuracy and the robustness of the algorithm can be improved without assigning the number of clusters in advance, and therefore the requirements for solving practical problems are met.
Owner:HEFEI UNIV OF TECH

Cancer subtype precise discovery and evolution analysis method based on data stream clustering

The invention provides a cancer subtype precise discovery and evolution analysis method based on data stream clustering. The method comprises the following steps of (a) initialization of gene expression data stream; (b) online real-time clustering of the gene expression data stream: putting each reachable data point into a corresponding grid cell; performing online grid maintenance; and when the specific time node is reached, deleting a sparse grid according to the grid density information; (c) offline precise clustering of the gene expression data stream: regarding the grid as a virtual data point with the density information; clustering the virtual data point by using a clustering method based on density-distance distribution; performing fast clustering division on other data points according to the density information of the determined clustering center points; and finally outputting a clustering result; and (d) class cluster evolution migration analysis. The invention provides the cancer subtype precise discovery and evolution analysis method based on data stream clustering with high precision.
Owner:ZHEJIANG UNIV OF TECH

A DDoS attack detection method based on an intelligent bee colony algorithm

The invention provides a DDoS attack detection method based on an intelligent bee colony algorithm. Fusing a clustering algorithm with the intelligent bee colony algorithm effectively improves DDoS attack detection accuracy: it removes the clustering algorithm's excessive reliance on the original clustering centers and thus improves the data flow clustering effect. The IP addresses of the exceptional data flows obtained by the improved clustering are statistically analyzed and the flow characteristic entropy H(x) of the IP addresses is calculated; if H(x) is greater than or equal to a discriminating factor RM(x) of a primary clustering data flow, the data flows are judged to be DDoS attack data flows; otherwise, they are determined to be other exceptional data flows. The method has the advantages of low time consumption, high DDoS attack detection accuracy and a low false alarm rate.
Owner:SHANGHAI MARITIME UNIVERSITY
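
The entropy test in the last step can be illustrated with the Shannon entropy of the IP address distribution. This is only a sketch: the abstract does not define how H(x) or the discriminating factor RM(x) are computed, so Shannon entropy in bits is an assumption and the threshold is simply passed in as a parameter. The function names are hypothetical.

```python
import math
from collections import Counter


def ip_entropy(ips):
    """Shannon entropy (in bits) of the empirical IP address distribution."""
    counts = Counter(ips)
    total = len(ips)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_ddos(ips, threshold):
    """Flag the flows as DDoS when the entropy reaches the threshold.

    `threshold` stands in for the discriminating factor RM(x), whose
    definition is not given in the abstract.
    """
    return ip_entropy(ips) >= threshold
```

Traffic from a single address has zero entropy, while traffic spread evenly over many addresses has high entropy, which is why the comparison runs in the "greater than or equal to" direction.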

Data flow clustering method based on density and extension network

CN107273532A (Inactive); Solve manual setting of clustering parameters; Solve the problem of improper selection of initial centroid; Character and pattern recognition; Special data processing applications; Cluster algorithm; Grid density
The invention relates to a data flow clustering method based on density and an extension network. A Spark parallel computing platform is used to analyze and improve a traditional data flow clustering algorithm, and a data flow clustering algorithm based on density and the extension grid is provided, which overcomes the drawbacks of manually setting clustering parameters and can find clusters of any shape. The basic steps of the algorithm are as follows: 1, the local density of each sampling point and its distances to the other sampling points are used to determine the number of cluster centers in a grid, so that the cluster centers are determined automatically and the influence of an improperly selected initial centroid on the clustering result is avoided; 2, data points outside the grid are clustered by expanding the network, so that the clusters in the grid are expanded and clustering accuracy is ensured; 3, adjacent density estimation and grid boundaries are introduced for merging grids, which reduces memory consumption; and 4, an attenuation factor is used to update the grid density in real time and to reflect the evolution of the spatial data flow.
Owner:UNIV OF JINAN
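
Step 4, updating grid density with an attenuation factor, can be sketched as follows. The exponential decay rule shown here is a common choice in grid-based stream clustering and is an assumption, as are the class name `DecayGrid` and its parameters; the abstract does not give the patent's exact update formula.

```python
class DecayGrid:
    """Grid cells whose density fades by `decay` per unit of time."""

    def __init__(self, cell_size, decay):
        self.cell_size = cell_size
        self.decay = decay   # attenuation factor, assumed to lie in (0, 1)
        self.cells = {}      # cell index -> (density, time of last update)

    def cell_of(self, point):
        """Map a point to the index of the grid cell that contains it."""
        return tuple(int(v // self.cell_size) for v in point)

    def add(self, point, t):
        """Decay the cell's old density to time t, then add one new point."""
        cell = self.cell_of(point)
        density, last = self.cells.get(cell, (0.0, t))
        self.cells[cell] = (density * self.decay ** (t - last) + 1.0, t)

    def density_at(self, cell, t):
        """Density of a cell as seen at time t (without modifying it)."""
        density, last = self.cells.get(cell, (0.0, t))
        return density * self.decay ** (t - last)
```

Storing the last update time with each cell means densities are decayed lazily, only when a cell is touched or queried, rather than by sweeping the whole grid at every time step.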

Real-time knowledge discovery method and system for coal-fired boiler process object

The invention provides a real-time knowledge discovery method and system for a coal-fired boiler process object. The method comprises the steps of: carrying out time sequence adjustment of the collected production state parameter data of a boiler to obtain correct time sequence data; applying a data flow clustering method based on a sliding window, storing the clustering center result of each run and comparing it with the result of the previous run; if the difference between two adjacent clustering results is within a set range, no operation is performed and the method waits for the next data flow; otherwise, the change trend mathematical formula is modified and updated to suit the latest production state; continuing the subsequent knowledge discovery process to obtain a new formula, carrying out association chain mining through an association rule algorithm to obtain the latest influence relationships and change rules of each production parameter, and generating an association chain among the parameters; and finally, performing modeling prediction on the data through a flexible neural tree and outputting a new change trend mathematical formula of the data, thereby assisting in adjusting production process parameters.
Owner:UNIV OF JINAN

Real-time clustering method for evolution data stream

The invention provides an online clustering method for an evolution data stream. According to the technical scheme, the method comprises the following steps: 1, a valid-type set, a vanishing-type set and a separation-point set are established; 2, a to-be-processed point obtained at the current moment is classified into one of the sets; 3, the separation-point set, the valid-type set and the vanishing-type set are updated. For the three typical evolution modes in an evolution data stream, namely the emergence, vanishing and re-emergence of a type, detection functions are designed separately and then integrated and unified, which improves the stability of the data stream clustering method and enlarges its application range.
Owner:NAT UNIV OF DEFENSE TECH

Data flow clustering method integrating cluster existence strength

The invention relates to the technical field of the web, and discloses a data flow clustering method integrating cluster existence strength. The method includes the following specific steps of conducting preprocessing, wherein information of a specific user is preprocessed and stored to a user attribute database; conducting user clustering, wherein skill-oriented clustering is conducted on user attributes; forming association rules, wherein association rules based on user attribute data are formed; conducting drift detection, wherein the association rules are detected in real time so that effectiveness of the association rules can be ensured. The data flow clustering method has the advantages that the influences of the cluster existence strength on clustering are fully utilized, and the uncertain data flow clustering method can integrate three factors, namely, the distance, the cluster existence probability and the cluster existence strength, indeed.
Owner:ZHEJIANG GONGSHANG UNIVERSITY

Time sequence data stream clustering method based on wavelet attenuation synopsis tree

The invention discloses a time sequence data stream clustering method based on a wavelet attenuation synopsis tree. Firstly, time sequence attenuation characteristics are introduced into a wavelet synopsis structure, and a multi-dimensional time sequence tree-like attenuation synopsis construction method based on wavelet transformation is provided. A good approximation of the original sequence can be reconstructed by retaining the r < n most important wavelet coefficients, which relieves the "curse of dimensionality"; the synopsis structure is constructed and the time sequence is approximately represented. On this basis, similar characteristics of the data stream are rapidly extracted from the synopsis structure and the approximate distance between the data stream and a clustering center is calculated, so that the K-means clustering method becomes suitable for multi-dimensional time sequence data streams. This solves the problem that a traditional clustering method cannot be applied directly to data streams that are unbounded in length, evolve over time and have large data volume.
Owner:ZHEJIANG GONGSHANG UNIVERSITY
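
The statement that a sequence can be approximated from its r < n largest wavelet coefficients can be illustrated with an orthonormal Haar transform. This sketch covers only the plain wavelet synopsis; the attenuation (time decay) and the tree structure described in the abstract are not modeled, and the function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar(signal):
    """Orthonormal Haar transform; len(signal) must be a power of two."""
    coeffs = [float(v) for v in signal]
    n = len(coeffs)
    while n > 1:
        half = n // 2
        avg = [(coeffs[2 * i] + coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        diff = [(coeffs[2 * i] - coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        coeffs[:n] = avg + diff  # averages first, details after them
        n = half
    return coeffs


def inverse_haar(coeffs):
    """Invert haar(): rebuild the signal level by level."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) / SQRT2, (a - d) / SQRT2]
        out[:2 * n] = merged
        n *= 2
    return out


def top_r(coeffs, r):
    """Keep the r largest-magnitude coefficients and zero the rest."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:r])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]
```

Because the transform is orthonormal, dropping the smallest coefficients minimizes the squared reconstruction error for a given r, so `inverse_haar(top_r(haar(x), r))` is the best r-term Haar approximation of `x` in that sense.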

Data flow clustering method and device based on density peak value

The invention discloses a data flow clustering method and device based on density peak values. Building on density peaks and a fuzzy clustering method, the invention proposes the concept of suspected outliers for the first time and uses a width-adaptive sampling window model and a space-time attenuation mechanism as its main innovations. With improving the efficiency of data stream clustering as the main goal, the density-peak-based data stream clustering method and device achieve more efficient data stream clustering while maintaining considerable clustering precision.
Owner:UNIV OF JINAN

Spatio-temporal data stream clustering method based on data field

The invention discloses a spatio-temporal data stream clustering method based on a data field. The method comprises the steps of: firstly dividing the research region into grids; when new data arrive, distributing them to the corresponding grid units according to their attribute values and adding them to the cache list of the grids; attenuating the historical quality of the grids at every calculation interval, calculating the new quality of the grids through a data field method, and then updating the grid potential values and the data field parameters; and finally dynamically adjusting the clustering result according to changes in the grid state. Data field theory is thus introduced into data stream clustering, which remedies the difficulty traditional data stream clustering algorithms have in sensing the correlation between data. The method can perform effective dynamic clustering on spatio-temporal data streams and can therefore be applied to spatio-temporal data mining scenarios such as dynamic detection of urban hotspots.
Owner:WUHAN UNIV

SDN stream clustering method based on Gaussian mixture

The invention relates to an SDN stream clustering method based on a Gaussian mixture. According to the method, a basic Gaussian mixture model algorithm is improved: side information of streams is introduced, and a Gaussian mixture model based on side information and constrained by an equivalent set is constructed, so that the clustering effect is improved, and the method is applied to SDN data stream clustering. The method greatly improves both the accuracy of the clustering result and the clustering speed.
Owner:FUZHOU UNIV

Data stream clustering method and device based on density peak value

CN113269238A (Inactive); Guaranteed freshness; Mitigate the effects of obsolescence; Character and pattern recognition; Clustered data; Algorithm
The invention provides a data stream clustering method and device based on a density peak value, and relates to the technical field of data processing. According to the method, a Jaccard similarity distance and a Gaussian kernel function are combined to calculate the local density of an evolved data stream, a heuristic strategy for kernel density estimation is introduced, and a local clustering center point is selected from points with large local density. And finally, local clustering center points are recorded through a variable-scale bucket sequence, and when a clustering request arrives, the local clustering centers are merged to obtain a global clustering center. According to the method, the concept drift in the data stream can be identified, the data stream is quickly clustered, and good clustering performance is achieved.
Owner:NANJING UNIV OF POSTS & TELECOMM
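
One plausible reading of "a Jaccard similarity distance and a Gaussian kernel function are combined to calculate the local density" is to sum a Gaussian kernel of the pairwise Jaccard distances. The sketch below follows that reading; the exact formula, the bandwidth `sigma` and the function names are assumptions, not details confirmed by the abstract.

```python
import math


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention: two empty sets are identical
    return 1.0 - len(a & b) / len(a | b)


def local_density(items, sigma):
    """Gaussian-kernel density of each item with respect to all others."""
    densities = []
    for i, x in enumerate(items):
        rho = sum(math.exp(-(jaccard_distance(x, y) / sigma) ** 2)
                  for j, y in enumerate(items) if j != i)
        densities.append(rho)
    return densities
```

Items with many near-identical neighbors receive a high density and are natural candidates for local clustering centers, while isolated items receive a density close to zero.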

A Method of Event Detection in Wireless Sensor Networks

The invention belongs to the technical field of wireless sensors, and discloses an event detection method in a wireless sensor network. The method comprises: using the time correlation of node readings to cluster the data streams in a node time window, detect an event, and store the data features of the event via an improved low rank spatial clustering algorithm; and classifying abnormal values in real time and judging whether each abnormal value is an event reading via an improved random forest classification algorithm. Because the clustering algorithm and the classification algorithm are combined, event region detection can be performed rapidly and accurately in the absence of a training set. Furthermore, the detection algorithm has low energy consumption: compared with traditional algorithms, it saves the energy spent on information exchange between nodes, and it is particularly applicable to multi-dimensional data.
Owner:XIDIAN UNIV

Mixed data stream clustering method based on merging and pruning

The invention discloses a mixed data stream clustering method based on merging and pruning, which comprises the following steps: converting a classification attribute value into a numerical attribute by using an important measurement criterion, normalizing data, and then reducing the dimension of the data by using a principal component analysis method. The mixed data stream clustering method adopts an online / offline two-stage processing framework. In the online stage, a new micro-cluster feature vector is adopted as a data structure to store data flow summary information, the data flow summary information required in the offline stage is dynamically maintained through a micro-cluster merging algorithm and a micro-cluster pruning algorithm, and the evolution process of the data flow is accurately reflected. In the offline stage, a density peak clustering method is adopted, the micro-clusters are used as virtual objects for clustering, and a final clustering result is obtained.
Owner:ZHEJIANG GONGSHANG UNIVERSITY
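
The abstract mentions a new micro-cluster feature vector without defining it. As a stand-in, the sketch below uses the classic additive cluster feature (count, linear sum, squared sum) known from BIRCH and CluStream, which shows why micro-clusters can be merged cheaply in the online stage. It is not the patent's data structure, and all names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    n: int     # number of points summarized
    ls: list   # per-dimension linear sum of the points
    ss: list   # per-dimension sum of squared values

    @classmethod
    def from_point(cls, point):
        return cls(1, [float(v) for v in point], [float(v) ** 2 for v in point])

    def merge(self, other):
        """Merging is component-wise addition; no raw points are needed."""
        return MicroCluster(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            [a + b for a, b in zip(self.ss, other.ss)],
        )

    def center(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        """Root of the summed per-dimension variance around the center."""
        var = sum(q / self.n - (s / self.n) ** 2
                  for s, q in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))
```

In an online/offline framework of the kind described above, the offline stage can then treat each micro-cluster's center as a virtual object to be clustered.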