Data preprocessing method

A data preprocessing method operating on data points, applied in the fields of electrical digital data processing, special data processing applications, and instruments. It addresses the low efficiency of judging a large amount of data one by one, and achieves reliable removal of abnormal data points with improved accuracy and efficiency.

Active Publication Date: 2014-07-02
厦门见福连锁管理有限公司

AI Technical Summary

Problems solved by technology

[0003] Especially for data on social and economic activities, whether a single data point is credible often cannot be judged by the laws of natural science, and judging a large amount of data one by one is very inefficient.


Examples


Embodiment 1

[0044] As shown in Figure 1, the data preprocessing method of this embodiment includes the following steps:

[0045] S101: select a plurality of data points as the first data group, where each data point in the first data group includes a first coordinate value and a second coordinate value;

[0046] S102: remove from the first data group every data point whose first coordinate value differs from the first coordinate values of all other data points, and take the remaining data points as the second data group;

[0047] S103: take the data points in the second data group that share the same first coordinate value as a sub-point group, set all sub-point groups to the uncalculated state, and set the same-group point-number threshold k;

[0048] S104: judge whether any sub-point group is in the uncalculated state; if yes, execute step S105; if no, execute step S112;

[0049] S105: select an uncalculated ...
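
Steps S101 to S103 above can be illustrated with a minimal Python sketch. The function name build_sub_point_groups, the (first, second) tuple layout, and the "calculated" flag are assumptions made for illustration and do not come from the patent text. The threshold k is not used here, because the steps that consume it are not reproduced above.

```python
from collections import defaultdict


def build_sub_point_groups(first_data_group):
    """Sketch of S101-S103: drop points whose first coordinate is unique,
    then group the remaining points by first coordinate."""
    by_first = defaultdict(list)
    for first, second in first_data_group:
        by_first[first].append((first, second))

    # S102: a point whose first coordinate matches no other point is removed.
    second_data_group = [
        point
        for points in by_first.values()
        if len(points) > 1
        for point in points
    ]

    # S103: points sharing a first coordinate form one sub-point group,
    # and every group starts in the uncalculated state.
    sub_point_groups = {
        first: {"points": points, "calculated": False}
        for first, points in by_first.items()
        if len(points) > 1
    }
    return second_data_group, sub_point_groups


# Hypothetical sample data: (first coordinate, second coordinate).
sample = [(1.0, 10), (1.0, 12), (2.0, 7), (3.0, 5), (3.0, 6), (3.0, 40)]
second_group, groups = build_sub_point_groups(sample)
```

In this sample the point (2.0, 7) is the only one with first coordinate 2.0, so it is excluded from the second data group and forms no sub-point group.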

Embodiment 2

[0064] The data preprocessing method of this embodiment differs from that of Embodiment 1 only in the following:

[0065] Between step S113 and step S114 there is an additional step S1131: all data points of the denoised data set are used for curve fitting to obtain a second fitted curve and a second standard deviation, and all data points whose distance from the second fitted curve is greater than or equal to three times the second standard deviation are removed from the denoised data set.
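
The three-standard-deviation rule of step S1131 can be sketched as follows. The text above does not state which curve model is fitted or how the distance is measured, so this sketch assumes a least-squares straight line, takes the vertical residual as the distance, and takes the population standard deviation of the residuals as the second standard deviation. The function name three_sigma_filter is hypothetical.

```python
import math


def three_sigma_filter(points):
    """Sketch of S1131: fit a line, then drop points whose residual is
    at least three standard deviations away from the fitted line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)

    # Assumption: straight-line model y = slope * x + intercept.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Assumption: "distance" is the vertical residual to the fitted line.
    residuals = [y - (slope * x + intercept) for x, y in points]
    sigma = math.sqrt(sum(r * r for r in residuals) / n)
    if sigma == 0:
        return list(points)  # perfect fit, nothing to remove

    return [p for p, r in zip(points, residuals) if abs(r) < 3 * sigma]


# Hypothetical data: 20 points on y = 2x plus one point far off the line.
denoised = [(float(x), 2.0 * x) for x in range(20)] + [(9.5, 69.0)]
filtered = three_sigma_filter(denoised)
```

With few points a single outlier inflates the standard deviation enough that it can never reach the three-sigma limit, so the rule is only effective on reasonably large data sets.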

[0066] Between step S102 and step S103 there is an additional step S1021: the data points with the largest and smallest second coordinate values are removed from the second data group.
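
Step S1021 can be sketched in a few lines of Python. The text does not say how ties at the extremes are handled. This sketch assumes that every point holding the largest or smallest second coordinate value is removed. The function name drop_extreme_points is hypothetical.

```python
def drop_extreme_points(second_data_group):
    """Sketch of S1021: remove the points with the largest and smallest
    second coordinate values."""
    if not second_data_group:
        return []
    seconds = [second for _, second in second_data_group]
    smallest, largest = min(seconds), max(seconds)
    # Assumption: all points tied at an extreme value are removed.
    return [
        (first, second)
        for first, second in second_data_group
        if second != smallest and second != largest
    ]


# Hypothetical data: (first coordinate, second coordinate).
trimmed = drop_extreme_points([(1, 5), (1, 9), (2, 1), (2, 7)])
```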

[0067] Through the above steps, the reliability of screening abnormal data points can be further improved.

[0068] Figure 2 shows the distribution of data points for the price and sales values of the raw data. Figure 2, Figure 3, Figure 4 A...


Abstract

The invention discloses a data preprocessing method. The method includes the following steps: dividing data points into sub-point groups according to their first coordinate values; calculating the differences between the second coordinate values of the data points in each sub-point group, calculating the local outlier factors of the data points, and removing abnormal data points from each sub-point group by outlier denoising; fitting all the data points that remain after outlier denoising, removing the data points with large deviations, and outputting the denoised data group. By calculating and analyzing the coordinate values of all the data points, the accuracy and efficiency of data preprocessing are improved, and abnormal data points can be removed reliably.
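
The local outlier factor step named in the abstract can be illustrated with the standard LOF definition applied to the second coordinate values of one sub-point group, using the absolute difference between two values as the distance. This is a sketch of the general technique, not necessarily the exact computation of the patent, whose detailed steps are not reproduced above. The names local_outlier_factors, k_neighbors, and LOF_LIMIT are assumptions, and the removal limit of 2.0 is chosen only for the example.

```python
def local_outlier_factors(values, k_neighbors):
    """Standard LOF on 1-D values with |a - b| as the distance.
    Ties beyond the k-th neighbour are ignored for simplicity."""
    n = len(values)
    if not 1 <= k_neighbors < n:
        raise ValueError("k_neighbors must satisfy 1 <= k_neighbors < len(values)")

    neighbors, k_dist = [], []
    for i in range(n):
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: abs(values[i] - values[j]),
        )
        knn = order[:k_neighbors]
        neighbors.append(knn)
        k_dist.append(abs(values[i] - values[knn[-1]]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach_sum = sum(
            max(k_dist[j], abs(values[i] - values[j])) for j in neighbors[i]
        )
        lrd.append(float("inf") if reach_sum == 0 else k_neighbors / reach_sum)

    # LOF: mean density of the neighbours divided by the point's own density.
    lof = []
    for i in range(n):
        if lrd[i] == float("inf"):
            lof.append(1.0)  # assumption: exact duplicates count as inliers
        else:
            lof.append(sum(lrd[j] for j in neighbors[i]) / (k_neighbors * lrd[i]))
    return lof


# Hypothetical second coordinate values of one sub-point group.
sales = [10.0, 11.0, 12.0, 13.0, 40.0]
factors = local_outlier_factors(sales, k_neighbors=2)

LOF_LIMIT = 2.0  # hypothetical removal limit, not taken from the patent
kept = [v for v, f in zip(sales, factors) if f <= LOF_LIMIT]
```

A factor close to 1 means a point is about as densely surrounded as its neighbours, while a factor well above 1 marks a point that sits in a much sparser region than its neighbours.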

Description

Technical field

[0001] The invention relates to a data preprocessing method.

Background

[0002] When collected or measured data are used for further research, calculations must be performed on the data in order to find laws or principles in them. However, abnormal data points are encountered when measuring or collecting data. Some data points carry large errors because of objective measurement conditions, defects in the collected samples, or subjective operating errors by the data collection personnel, and such points have no research value. If abnormal data points are included in the subsequent calculation and analysis without discrimination, they will have a great impact on the final results. How to distinguish and eliminate abnormal data points is therefore an important topic in data preprocessing.

[0003] Especially for the data of some social activities and economic activities, whether a single data point is credible...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/00
Inventor: 蔡飞向旗
Owner: 厦门见福连锁管理有限公司