Data sample uniform sampling method and device for big data analysis

A uniform sampling technique for data samples, classified under complex mathematical operations. It addresses problems such as uneven data distribution and wasted manpower, with the aim of improving sampling efficiency, improving the overall efficiency of big data analysis, and producing more accurate automated analysis results.

Pending Publication Date: 2020-01-24
WUHAN UNIV

AI Technical Summary

Problems solved by technology

The simplest sampling method is random sampling. Its biggest problem is that high-density regions of the sample space may be oversampled, while samples that lie toward the outliers may not be covered at all.
Most sampling methods share this problem. When the sampled data is supervised by experts, such results can waste a great deal of manpower, and rare samples may never be selected. The supervision is then incomplete, the trained model performs poorly, and the accuracy of subsequent automated analysis results suffers.
Many methods assume that the data is uniformly distributed. Actual data, however, is often very unevenly distributed, and in some special analysis scenarios certain categories also contain very few samples. This is a tricky problem: in common practice, upsampling leads to overfitting and downsampling loses data.

Embodiment Construction

[0022] To help those of ordinary skill in the art understand and implement the present invention, it is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the embodiments described here are only used to illustrate and explain the present invention and are not intended to limit it.

[0023] The embodiment of the present invention provides a uniform sampling method of data samples for big data analysis that is not affected by the distribution density of the data samples. A given data set is recorded as $P=\{p_1,p_2,\ldots,p_n\}$, where $p_i$ is the $i$-th data point ($1 \le i \le n$) in the data set and is a $d$-dimensional vector. Assuming that the $t$-th representative point is obtained after the $t$-th selection, $R_t$ and $C_t$ denote the representative point set and the candidate point set after the $t$-th selection; each data point ...
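The paragraph above is cut off before it states the selection rule. Combining its notation with the rule given in the abstract (pick the candidate farthest from its nearest representative point), one selection step might be written as follows. This is a sketch under assumptions: the symbol $r_{t+1}$, the function $\operatorname{dist}$ (for example, Euclidean distance), and the explicit update of $C_t$ are notation introduced here, not taken from the patent text.

```latex
r_{t+1} = \arg\max_{p \in C_t} \; \min_{r \in R_t} \operatorname{dist}(p, r),
\qquad
R_{t+1} = R_t \cup \{ r_{t+1} \},
\qquad
C_{t+1} = C_t \setminus \{ r_{t+1} \}
```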


Abstract

The invention discloses a data sample uniform sampling method for big data analysis, which comprises the following steps. First, an initial data point is determined as the first representative point: it is either specified by the user or chosen as the data point closest to the center of the data set. Next, the distance from every candidate point to its nearest representative point is calculated, and the candidate point with the largest such distance is added to the representative point set. This is repeated until enough representative points are found, and the representative points are returned as the finally selected sampling points. The method yields a sampling result that is uniformly distributed and complete in coverage, so that data preprocessing is better accomplished, sampling efficiency and the overall efficiency of big data analysis are improved, and a more accurate automated analysis result is provided.
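The steps in the abstract can be sketched in Python as below. This is an illustration under stated assumptions, not the patented implementation: the function name `farthest_point_sample`, the use of Euclidean distance, and the reading of "center of the data set" as the coordinate-wise mean are all choices made here for the example.

```python
import math
from typing import List, Optional, Sequence

Point = Sequence[float]


def farthest_point_sample(points: List[Point], k: int,
                          start: Optional[int] = None) -> List[int]:
    """Return the indices of k representative points, in selection order."""
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and len(points)")

    if start is None:
        # Assumption: the "center" is the coordinate-wise mean of the data.
        dim = len(points[0])
        center = [sum(p[j] for p in points) / n for j in range(dim)]
        start = min(range(n), key=lambda i: math.dist(points[i], center))

    chosen = [start]
    chosen_set = {start}
    # nearest[i]: distance from point i to its nearest representative so far.
    # Assumption: Euclidean distance (math.dist).
    nearest = [math.dist(p, points[start]) for p in points]

    while len(chosen) < k:
        # Pick the candidate whose nearest representative is farthest away.
        nxt = max((i for i in range(n) if i not in chosen_set),
                  key=lambda i: nearest[i])
        chosen.append(nxt)
        chosen_set.add(nxt)
        # Only the newly added representative can shorten a nearest distance.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < nearest[i]:
                nearest[i] = d
    return chosen
```

Keeping a running `nearest` list means each round costs one pass over the data, so selecting k representatives from n points takes on the order of n·k distance computations. On a data set with one dense cluster and a few isolated points, the isolated points are picked early, which is the coverage behavior the abstract describes.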

Description

Technical field

[0001] The invention belongs to the field of data preprocessing in big data analysis, and in particular relates to a method and device for uniformly sampling data samples for big data analysis.

Background technique

[0002] Data is the industrial foundation of the big data era. Selecting representative samples from large-scale data is the premise of big data analysis. Data from logistics, multimedia and other fields is widely collected and analyzed, and big data analysis has a very wide range of applications. For example, big data analysis technology is used to explore the behavior trajectories behind the behavior of file users; IBM also makes full use of big data analysis tools to help companies make predictions; and big data has also achieved great results in medical disease prediction. At present, there are some research results on the realization of big data technology, such as big data storage service method-201610668885.1 and a big data encryption method-201410258...

Application Information

IPC(8): G06F17/10
CPC: G06F17/10
Inventor: 雷伯涵, 彭亚楠, 黄浩
Owner: WUHAN UNIV