The traditional approach is to ignore missing data, but doing so introduces errors when the incomplete data is then used for data mining and analysis.
Embodiment 1
[0039] Referring to Figure 1, the data filling method based on a clustering algorithm of this embodiment may include the following steps:
[0040] Step 01, determine the attributes of the missing data.
[0041] During data collection or transmission, human error or mechanical faults may produce null values, which results in missing data. In this embodiment, missing data can be located with a null value positioning method.
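The source does not spell out the null value positioning method, so the following is only a minimal sketch of one way it could work. It assumes the records are plain dictionaries, and that `None` and blank strings count as null. The field names and the helpers `is_null` and `locate_missing` are hypothetical. Each null cell is reported as a (row index, field name) pair, so the field name can then serve as the attribute of the missing data.

```python
from typing import Any

# Hypothetical record layout; the field names are illustrative only.
records = [
    {"name": "A", "gender": "boy", "basketball_liking": 4},
    {"name": "B", "gender": "boy", "basketball_liking": None},
    {"name": "C", "gender": "girl", "basketball_liking": ""},
]


def is_null(value: Any) -> bool:
    """Treat None and blank strings as null (an assumed convention)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def locate_missing(rows: list[dict[str, Any]]) -> list[tuple[int, str]]:
    """Return (row index, attribute name) for every null cell."""
    return [
        (index, attribute)
        for index, row in enumerate(rows)
        for attribute, value in row.items()
        if is_null(value)
    ]


missing = locate_missing(records)
# The distinct attribute names are the attributes of the missing data.
missing_attributes = sorted({attribute for _, attribute in missing})

print(missing)
print(missing_attributes)
```

Returning the row index together with the attribute name keeps the location available for the later filling step.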
[0042] In this embodiment, once the missing data has been located, its attribute can be determined from the data content. For example, if a boy's degree of liking for basketball is missing, then the degree of liking for basketball is taken as the attribute of the missing data. As another example, if the missing data is the probability that a user renews a purchased target insurance policy after it expires, then the probability of renewal of the target insurance after expiration is ...
Embodiment 2
[0096] Referring to Figure 5, the data filling method based on a clustering algorithm of this embodiment builds on Embodiment 1 and includes the following steps:
[0097] Step 501, determining attributes of missing data.
[0098] Step 502, performing two-tuple integration on the data according to the attributes of the missing data.
[0099] Step 503, clustering the data after the two-tuple integration to form clusters.
[0100] In this embodiment, to fill the missing data, the data that shares the attribute of the missing data can be clustered using the two-tuple-integrated data, with the reference data as the benchmark. For example, taking boys as the benchmark, clustering the degree of liking for basketball can form multiple clusters. Every cluster describes boys' liking for basketball, but the degree of liking differs from cluster to cluster. For example, five clusters are formed, respectively: Like it very mu...
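Steps 502 and 503 can be illustrated with a small sketch under stated assumptions. The visible text does not define the exact form of the two-tuples or name the clustering algorithm. Here a two-tuple is assumed to be a (reference value, attribute value) pair, and a plain one-dimensional k-means stands in for the clustering step. The data, the 1 to 5 liking scale, and the helpers `integrate_two_tuples` and `kmeans_1d` are all hypothetical, and two clusters are used instead of five only because the sample is tiny.

```python
from collections import defaultdict

# Hypothetical rows: (person id, reference value, liking score 1-5 or None).
rows = [
    ("p1", "boy", 5), ("p2", "boy", 4), ("p3", "boy", 1),
    ("p4", "boy", None), ("p5", "girl", 2), ("p6", "boy", 2),
]


def integrate_two_tuples(rows):
    """Pair each observed attribute value with its reference (benchmark) value."""
    return [(ref, value) for _, ref, value in rows if value is not None]


def kmeans_1d(values, k, iterations=20):
    """One-dimensional k-means with deterministic, evenly spread initial centers."""
    ordered = sorted(values)
    if not 1 <= k <= len(ordered):
        raise ValueError("k must be between 1 and the number of values")
    last = len(ordered) - 1
    centers = [float(ordered[i * last // max(k - 1, 1)]) for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for value in ordered:
            # Assign each value to the nearest current center.
            nearest = min(range(k), key=lambda c: abs(value - centers[c]))
            clusters[nearest].append(value)
        # Recompute centers; an empty cluster keeps its previous center.
        updated = [
            sum(members) / len(members) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    return centers, clusters


# Group the two-tuples by reference value, then cluster within the benchmark.
by_reference = defaultdict(list)
for ref, value in integrate_two_tuples(rows):
    by_reference[ref].append(value)

centers, clusters = kmeans_1d(by_reference["boy"], k=2)
print(centers, clusters)
```

Clustering only the values that share the benchmark ("boy") mirrors the example in the text, where every cluster describes boys' liking for basketball and only the degree differs.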
Abstract
The invention provides a data filling method and device based on a clustering algorithm, and a computer device. The method comprises the steps of: determining attributes of missing data; performing two-tuple integration on the data according to the attributes of the missing data; clustering the data after the two-tuple integration to form clusters; determining the cluster in which the missing data is located; determining a reference data set for filling the missing data according to that cluster; and filling the missing data according to the reference data set. The method fills the missing data while ensuring the accuracy of the filled values, and thereby provides a basis for accurate data mining and analysis.
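The last three steps of the abstract (finding the cluster of the missing data, deriving a reference data set, and filling) can be sketched as follows. The visible text does not say how cluster membership is decided or how the fill value is computed, so this sketch makes two loud assumptions: the incomplete record is matched to a cluster through an auxiliary observed feature (hypothetical `hours` played per week), and the fill value is the mean of the reference data set. The cluster labels, data, and the helpers `find_cluster` and `fill_missing` are hypothetical.

```python
from statistics import mean

# Hypothetical clusters of complete records. "hours" is an assumed auxiliary
# feature; "liking" is the attribute that is missing in the incomplete record.
clusters = {
    "likes a lot": [{"hours": 6.0, "liking": 5}, {"hours": 5.0, "liking": 4}],
    "dislikes": [{"hours": 0.5, "liking": 1}, {"hours": 1.5, "liking": 2}],
}


def find_cluster(record, clusters, feature):
    """Pick the cluster whose mean auxiliary feature is closest to the record's."""
    return min(
        clusters,
        key=lambda name: abs(mean(r[feature] for r in clusters[name]) - record[feature]),
    )


def fill_missing(record, clusters, feature, target):
    """Return the chosen cluster, its reference data set, and a filled copy."""
    name = find_cluster(record, clusters, feature)
    # Reference data set: the target values of the records in that cluster.
    reference_set = [r[target] for r in clusters[name]]
    filled = dict(record)  # leave the original record untouched
    filled[target] = mean(reference_set)  # assumed fill rule: cluster mean
    return name, reference_set, filled


incomplete = {"hours": 5.5, "liking": None}
name, reference_set, filled = fill_missing(incomplete, clusters, "hours", "liking")
print(name, reference_set, filled)
```

Using the cluster mean is only one possible fill rule; a median or most frequent value could be substituted without changing the structure of the sketch.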
Description
Technical field
[0001] The present invention relates to the field of big data technology, and in particular to a data filling method, device, and computer equipment based on a clustering algorithm.
Background art
[0002] With the rise of big data, the demand for data processing has grown in both volume and scope. However, data may go missing during data acquisition or during data processing. The traditional approach is to ignore the missing data, but doing so introduces errors when the incomplete data is used for data mining and analysis.
Summary of the invention
[0003] The object of the present invention is to provide a data filling method, device, and computer equipment based on a clustering algorithm, in order to solve the problems existing in the prior art.
[0004] To achieve the above object, the present invention provides a data filling method based on a clustering algorithm, characterized in that the method comprises the following steps:
[0005] Identify attr...