A data flow detection method based on fuzzy c-means clustering algorithm and entropy theory

A mean-clustering and detection technology, applied in computing, computer components, and character and pattern recognition. It addresses the problem that, once concept drift occurs, a system can no longer process new data accurately and its performance degrades.

Active Publication Date: 2018-03-27
天津津汉科技有限公司

AI Technical Summary

Problems solved by technology

When concept drift occurs in the data, a system tuned to the old data can no longer process the new data accurately, and its performance drops.

Examples

Embodiment 1

[0032] We ran experiments on one artificial dataset and two real datasets. The real data were downloaded from the open UCI database. The first real dataset, Seeds, has no concept drift; it contains three categories of wheat (Kama, Rosa and Canadian), each with 70 samples and seven attributes. Figure 1 shows that FCM clusters the data fairly accurately; Figure 2 shows the entropy curve of the data. As the ordinate shows, when the data are well classified and no attribute-change concept drift occurs, the entropy stays relatively low.
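The clustering step used in this embodiment can be sketched as follows. This is a minimal NumPy sketch of the standard fuzzy c-means iteration, not the patent's own implementation. The function name `fuzzy_c_means`, the fuzzifier `m = 2`, the stopping tolerance, and the synthetic stand-in data (three well-separated groups of 70 samples with seven attributes, mirroring only the shape of the Seeds data) are all assumptions made for illustration.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means (hypothetical helper, not from the patent).

    X: (n, d) data matrix, c: number of clusters, m: fuzzifier (> 1).
    Returns (centers, U), where U is the (n, c) membership matrix.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each row is normalized to sum to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centers are membership-weighted means of the samples.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared sample-to-center distances, floored to avoid division by zero.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)
        # u_ij is proportional to d_ij^(-2/(m-1)), normalized over the clusters.
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


# Synthetic stand-in data (assumption): 3 groups x 70 samples x 7 attributes.
rng = np.random.default_rng(42)
group_means = np.array([[0.0] * 7, [5.0] * 7, [10.0] * 7])
X = np.vstack([rng.normal(mu, 0.5, size=(70, 7)) for mu in group_means])

centers, U = fuzzy_c_means(X, c=3)
labels = U.argmax(axis=1)  # hard labels, taken as the largest membership
print(np.bincount(labels, minlength=3))
```

Each row of `U` holds one sample's membership degrees across the clusters; those degrees are the quantity from which the method derives its entropy value.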

Embodiment 2

[0034] Gaussian datasets are used to detect concept drift. The two sets of Gaussian data follow the distributions N([2;2], 1) and N([4;4], 8). The data stream has length 1000, and the concept drift length is 400. Figure 3 shows the classification of the two sets of Gaussian data. Because their means and variances differ, the data attributes have changed, and attribute-change concept drift occurs at the junction between the two sets. Figure 4 shows the entropy curve of the data stream. The curve peaks at the junction, indicating that attribute-change concept drift has occurred; afterwards the entropy stabilizes, indicating that the current system can adapt to the new data stream and does not need to update its parameters.
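The shape of such an entropy curve can be illustrated with a small sketch. The excerpt does not give the patent's entropy formula, so the code below assumes one plausible choice: the Shannon entropy of the mean membership vector inside a sliding window. The function name `window_entropy`, the window size of 100, the idealized memberships (0.95 / 0.05), and the split of the 1000-sample stream into 600 samples of the first concept followed by 400 of the second are all assumptions made for illustration; no Gaussian data or fitted clusters are used here.

```python
import numpy as np


def window_entropy(U, window):
    """Entropy (in nats) of the mean membership vector in each sliding window.

    U: (n, c) membership matrix whose rows sum to 1.
    Returns an array of length n - window + 1; entry t covers
    samples t .. t + window - 1. (Hypothetical helper, not from the patent.)
    """
    n, c = U.shape
    # Cumulative sums give every window mean in O(n * c).
    csum = np.vstack([np.zeros((1, c)), np.cumsum(U, axis=0)])
    p = (csum[window:] - csum[:-window]) / window
    p = np.clip(p, 1e-12, 1.0)  # guard against log(0)
    return -(p * np.log(p)).sum(axis=1)


# Idealized memberships (assumption): samples of the first concept belong
# mostly to cluster 0, samples of the second concept mostly to cluster 1.
U = np.vstack([
    np.tile([0.95, 0.05], (600, 1)),
    np.tile([0.05, 0.95], (400, 1)),
])

H = window_entropy(U, window=100)
peak_start = int(np.argmax(H))
print(peak_start, peak_start + 50)  # window start and window center at the peak
```

Under these assumptions the curve is flat and low while a window lies entirely inside one concept, rises as the window straddles the junction, and is highest when the window is split evenly between the two concepts, which is the qualitative behaviour described for Figure 4.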

Embodiment 3

[0036] Power supply dataset. This dataset records 24 hours of main-grid and sub-grid power supply data, with 1247 samples per hour. The experiment uses data from three time periods: 0:00, 1:00 and 21:00. First, the 0:00 and 21:00 data are used. Because 21:00 is the peak of electricity consumption, attribute-change concept drift can be considered to have occurred relative to 0:00. Figure 5 shows the entropy curve of these two sets of data: at the junction the entropy rises significantly, and once the data stabilize the entropy falls. Figure 6 shows the entropy curve of the data stream for 0:00 and 1:00. Power consumption at 0:00 and 1:00 is similar, so the stream can be regarded as free of concept drift, and the entropy curve is stable.

Abstract

The invention discloses a data stream detection method based on the fuzzy C-means (FCM) clustering algorithm and entropy theory. The method introduces the FCM algorithm into the cluster analysis of data streams: it performs fuzzy C-means clustering on the stream data, calculates the information entropy of the data stream from the membership degrees of the data, and detects whether attribute-change concept drift has occurred by analyzing the trend of that entropy. Its two computational steps are the calculation of the membership degrees and the calculation of the data stream entropy. The entropy is computed from the membership degree of each data point to each class and plotted along the time axis, so that attribute-change concept drift can be detected visually and intuitively from the trend of the curve. The detection is mainly used to signal to the system when its parameters should be updated, so that the continuously arriving data stream is clustered as correctly as possible.
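For reference, the two computations named in the abstract can be written out as follows. The first group is the standard fuzzy C-means formulation (objective, membership update, center update) with fuzzifier m > 1. The entropy expression is an assumption: the excerpt does not state the patent's exact formula, so a Shannon entropy over window-averaged memberships is shown as one plausible reading, with the window length w introduced here purely for illustration.

```latex
% Standard FCM: samples x_i (i = 1..n), cluster centers v_j (j = 1..c), fuzzifier m > 1
J_m = \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{m} \, \lVert x_i - v_j \rVert^{2},
\qquad \sum_{j=1}^{c} u_{ij} = 1

u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \dfrac{\lVert x_i - v_j \rVert}{\lVert x_i - v_k \rVert} \right)^{2/(m-1)}},
\qquad
v_j = \frac{\sum_{i=1}^{n} u_{ij}^{m} \, x_i}{\sum_{i=1}^{n} u_{ij}^{m}}

% Assumed stream entropy over a window of w consecutive samples starting at t
\bar{u}_j(t) = \frac{1}{w} \sum_{i=t}^{t+w-1} u_{ij},
\qquad
H(t) = -\sum_{j=1}^{c} \bar{u}_j(t) \, \log \bar{u}_j(t),
\qquad 0 \le H(t) \le \log c
```

With this reading, H(t) is near its minimum when the samples in the window belong almost entirely to one class and approaches log c when the memberships are spread evenly across the classes.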

Description

Technical field

[0001] The invention relates to data stream clustering and to the detection of attribute-change concept drift. The method is simple, practical and intuitive, and avoids the complexity of the classification algorithms previously used in data mining.

Background technique

[0002] In recent years, as data stream mining has become a research hotspot, its classification problem has attracted wide attention from the academic community. The Internet and wireless communication networks have produced many kinds of data streams: transaction records of large supermarkets, stock prices and stock transaction information from stock exchanges, network monitoring data, call records of telecommunications departments, credit card transaction streams, data sent back by sensors, etc. We have noticed that most of these data are related to geographic information, mainly because geographic information has a large dimension, and it is ea...

Application Information

Patent Type & Authority Patents (China)
IPC IPC(8): G06K9/62
CPC G06F18/2321
Inventor 赵一航, 王为, 秦姗, 张宝菊
Owner 天津津汉科技有限公司