
Abnormal data diagnosis method and system based on two-step clustering algorithm

A technology for abnormal data diagnosis using a clustering algorithm, applied in relational databases, database models, structured data retrieval, etc. It addresses problems such as difficulty of effective application, a cumbersome calculation process, and a large data-processing volume, improving efficiency and flexibility while ensuring reliability and stability.

Pending Publication Date: 2021-05-07
北京易莱信科技有限公司

AI Technical Summary

Problems solved by technology

[0006] 1) Selecting a prediction model is not a simple yes/no choice. Different users often choose different model algorithms and parameters, and there is usually more than one evaluation index, so different users may arrive at different models. The abnormal data diagnosis result therefore depends too heavily on the choice of prediction model, which places relatively high demands on the user's professional skill in data processing and analysis and results in poor applicability;
[0007] 2) The reliability of the prediction model cannot be guaranteed. The quality of the prediction model for each type of data depends on massive data for calculation and fitting, which means a large data-processing volume and a cumbersome calculation process; otherwise the model cannot truly represent the overall trend of the data;
[0008] 3) Methods of this type are difficult to apply effectively in unsupervised data diagnosis scenarios. They are usually applied to data with two types of parameters: independent variables (causes) and dependent variables (results). In an unsupervised scenario, or when the original data shows no obvious and effective trend, the prediction model cannot be established. The model-based identification approach therefore lacks stability and consistency, which greatly limits its practicability across different data scenarios.



Examples


Embodiment 1

[0057] Figure 2 shows a schematic flowchart of the abnormal data diagnosis method based on the two-step clustering algorithm provided by Embodiment 1 of the present invention. As shown in Figure 2, the method includes the following steps:

[0058] Diagnostic class definition step: before the diagnostic operation is performed, the data to be diagnosed is divided into user-defined categories, and the diagnostic class to be used as auxiliary setting information for clustering is determined.

[0059] Data clustering step: based on the determined diagnostic class, clustering is performed with a two-step clustering algorithm according to a set strategy.

[0060] Abnormal diagnosis step: the clustered data is analyzed and calculated with a set algorithm to determine the abnormal index of each data item, and all target abnormal data is found according to the abnormal index.
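
The clustering and diagnosis steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the text shown here does not specify the "set strategy" or the "set algorithm", so everything in the sketch is an assumption made for illustration. The data is assumed to belong to a single diagnostic class already. The two-step clustering is simplified to a sequential pre-clustering pass followed by agglomerative merging of the sub-clusters, and the abnormal index is assumed to be a point's distance to its cluster centroid divided by the mean distance within that cluster. All function names, the distance threshold, and the index cutoff of 2.0 are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def pre_cluster(points, threshold):
    """Step 1 (assumed form): one sequential pass. Each point joins the
    nearest sub-cluster whose centroid is within `threshold`, otherwise
    it starts a new sub-cluster."""
    subclusters = []
    for p in points:
        best, best_d = None, threshold
        for sc in subclusters:
            d = dist(p, centroid(sc))
            if d <= best_d:
                best, best_d = sc, d
        if best is None:
            subclusters.append([p])
        else:
            best.append(p)
    return subclusters

def merge_clusters(subclusters, k):
    """Step 2 (assumed form): repeatedly merge the two sub-clusters with
    the closest centroids until only k clusters remain."""
    clusters = [list(sc) for sc in subclusters]
    while len(clusters) > k:
        pairs = ((a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        i, j = min(pairs, key=lambda ab: dist(centroid(clusters[ab[0]]),
                                              centroid(clusters[ab[1]])))
        clusters[i].extend(clusters.pop(j))
    return clusters

def abnormal_indices(clusters):
    """Assumed abnormal index: distance to the cluster centroid divided by
    the mean of those distances within the same cluster."""
    scored = []
    for c in clusters:
        ctr = centroid(c)
        ds = [dist(p, ctr) for p in c]
        mean_d = sum(ds) / len(ds)
        for p, d in zip(c, ds):
            scored.append((p, d / mean_d if mean_d > 0 else 0.0))
    return scored

# Toy data: two tight groups plus one point that sits apart from the first group.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.0, 5.1),
          (1.0, 3.0)]

clusters = merge_clusters(pre_cluster(points, threshold=1.0), k=2)
flagged = [p for p, idx in abnormal_indices(clusters) if idx > 2.0]
print(flagged)  # [(1.0, 3.0)]
```

In this toy run the isolated point first forms its own sub-cluster, is merged into the nearest group in the second step, and then receives an index well above the other members of that group. A point left alone in its own final cluster would receive an index of 0 under this assumed definition, so a practical scheme would also need a rule for very small clusters.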

[0061] Specifically, the embodiment of the present invention determines ...

Embodiment 2

[0095] Further, given the characteristics of the clustering algorithm, the data in the clusters it forms may contain multiple fields. For example, after all consumer objects in the above embodiment are divided into five different types of consumers, each consumer has different identity feature fields, or a person's combined date-of-birth information has different date fields. Each field is regarded as a different feature attribute of the data. To further improve the reliability and practicability of abnormal data diagnosis, the researchers of the present invention consider that data that is abnormal only in individual feature attributes sometimes cannot be effectively identified and remains in the rest of the data as a potential influencing factor. To make such data as clear as possible to users and to prevent potential influencing factors from affecting the subsequent dat...


Abstract

The invention provides an abnormal data diagnosis method and system based on a two-step clustering algorithm. The method comprises: a diagnosis class definition step, in which data with a diagnosis requirement is divided into user-defined classes and a diagnosis class is determined; a data clustering step, in which clustering is performed with a two-step clustering algorithm according to a set strategy on the basis of the determined diagnosis class; and an abnormality diagnosis step, in which the clustered data is analyzed and calculated, an abnormality index is determined for each data item, and the target abnormal data is found. With this scheme, abnormal data is diagnosed on the basis of the classes formed by two-step clustering and the data distances corresponding to those classes. This overcomes the limits that existing diagnosis techniques place on applicable data types and data scenarios, so the method is suitable for all users with diagnosis requirements and is more practical. It provides each user with a more stable and more accurate data diagnosis result and a reliable resource basis for data analysis and processing work.

Description

Technical field

[0001] The invention relates to the technical field of data cleaning and processing, in particular to a method and system for diagnosing abnormal data based on a two-step clustering algorithm.

Background technique

[0002] Data cleaning is an important step in data analysis, and the identification and processing of abnormal data is the cornerstone of effective data cleaning and data analysis. Researchers currently diagnose abnormal data mostly with the following types of identification methods:

[0003] 1. Identification based on data characteristics. Commonly used methods of this type include box plot analysis, standardization, and distance identification. Box plot analysis identifies data that lies beyond a certain range outside the upper and lower quartiles and defines the data meeting this condition as abnormal data. The standardization method calculates the standardized value of each data, a...
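
The two characteristic-based methods named in the background can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source text is truncated and gives no concrete ranges, so the 1.5 × IQR fence and the z-score cutoff used below are conventional choices, not values taken from the patent, and the function names are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Box plot rule: flag values more than k * IQR below the lower
    quartile or above the upper quartile (k = 1.5 is a common convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def zscore_outliers(values, cutoff):
    """Standardization rule: flag values whose standardized value
    (distance from the mean in standard deviations) exceeds `cutoff`."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []
    return [v for v in values if abs(v - mean) / std > cutoff]

data = [10, 12, 11, 13, 12, 11, 10, 12, 95]
print(iqr_outliers(data))          # [95]
print(zscore_outliers(data, 2.0))  # [95]
```

Both rules operate on a single numeric field at a time, and the extreme value itself inflates the mean and standard deviation used by the standardization rule, so the cutoff has to be chosen with the sample size in mind.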


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/215, G06F16/28, G06K9/62
CPC: G06F16/215, G06F16/285, G06F18/231
Inventor: 汪尚闫秀媛
Owner: 北京易莱信科技有限公司