Three-decision unbalanced data oversampling method based on Spark big data platform

The method combines a big data platform with oversampling technology and is applied in the fields of instruments, character and pattern recognition, and computer components. It addresses the problem of reduced efficiency, helps solve classification problems, improves performance, and preserves the recognition rate.

Active Publication Date: 2017-04-26

CHONGQING UNIV OF POSTS & TELECOMM

3 Cites, 17 Cited by

Problems solved by technology

Another feature of an RDD is that it is resilient (elastic). If a machine runs out of memory while a job is running, the RDD exchanges data with the hard disk. This reduces efficiency, but it ensures that the job still runs normally.

Embodiment Construction

[0035] The technical solutions in the embodiments of the present invention will be described clearly and in detail below with reference to the drawings in the embodiments of the present invention. The described embodiments are only some of the embodiments of the invention.

[0036] The technical scheme by which the present invention solves the above technical problems is as follows:

[0037] The three-way decision unbalanced data oversampling method based on the Spark big data platform includes the following steps:

[0038] Obtain the sample set to be sampled from the system; HDFS automatically stores it in a distributed manner. Then use Spark to transform the entire sample set into a normalized sample set in LabeledPoint format. Specific steps: first create a SparkContext object, then use its textFile(URL) function to create a distributed dataset (RDD). Once created, this distributed dataset can be operated on in parallel; se...
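The transformation in [0038] can be illustrated with a small, Spark-free sketch in plain Python. It is not the patent's implementation: the comma-separated input layout with the label in the last field, the use of min-max scaling as the normalization, and the helper names `parse_line` and `min_max_normalize` are all assumptions made for illustration.

```python
from typing import List, Tuple

# (label, features), mirroring the <label: [features]> layout of a LabeledPoint
Sample = Tuple[float, List[float]]


def parse_line(line: str, sep: str = ",") -> Sample:
    """Parse one text record. Assumption: the label is the last field."""
    fields = [float(x) for x in line.strip().split(sep)]
    return fields[-1], fields[:-1]


def min_max_normalize(samples: List[Sample]) -> List[Sample]:
    """Scale every feature column to [0, 1]; constant columns map to 0.0.

    Min-max scaling is an assumed choice; the source only says "normalized".
    """
    columns = list(zip(*(features for _, features in samples)))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    normalized = []
    for label, features in samples:
        scaled = [
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(features, lows, highs)
        ]
        normalized.append((label, scaled))
    return normalized


# Hypothetical input lines: two features followed by the class label.
lines = ["1.0,10.0,1", "3.0,20.0,0", "5.0,40.0,0"]
samples = min_max_normalize([parse_line(line) for line in lines])
```

In a PySpark job, a function like `parse_line` could be passed to `map` on the RDD returned by `SparkContext.textFile`, and each tuple could be wrapped in `pyspark.mllib.regression.LabeledPoint(label, features)`. The column minima and maxima would then have to be computed with a distributed aggregation, which this sketch does not show.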

Abstract

The invention discloses a three-way decision unbalanced data oversampling method based on the Spark big data platform, and relates to Spark big data technology in the field of data mining. The method comprises the following steps: first, transforming the data with Spark's RDD (Resilient Distributed Dataset) to obtain a normalized sample set in the LabeledPoint format <label: [features]>, and dividing the sample set into a training set and a test set; second, transforming the data with Spark's RDD, calculating the distances between samples, determining the neighborhood radius, and classifying the samples of the whole training set into positive domain samples, boundary domain samples and negative domain samples according to a neighborhood three-way decision model; then oversampling the boundary domain samples and the negative domain samples separately; and finally, calling a Spark MLlib machine learning algorithm to verify the sampling result. The method effectively addresses the classification of large-scale unbalanced data sets in the fields of machine learning and pattern recognition.
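The partition and oversampling steps named in the abstract can be sketched in plain Python on a toy data set. Everything specific here is an assumption, since the source does not state the rules: the neighborhood purity criterion, the thresholds `alpha` and `beta`, the fixed `radius`, and the SMOTE-style interpolation used for boundary samples are illustrative stand-ins rather than the patented procedure. The separate handling of negative domain samples is not shown.

```python
import math
import random
from typing import Dict, List, Tuple

Sample = Tuple[int, List[float]]  # (label, features)


def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def partition_regions(samples: List[Sample], radius: float,
                      alpha: float = 0.7, beta: float = 0.3) -> Dict[str, List[int]]:
    """Assign each sample index to the positive, boundary or negative domain.

    Assumed rule: purity = share of samples within `radius` (the sample itself
    included) that carry the same label. purity >= alpha -> positive,
    purity <= beta -> negative, anything in between -> boundary.
    """
    regions: Dict[str, List[int]] = {"positive": [], "boundary": [], "negative": []}
    for i, (label_i, x_i) in enumerate(samples):
        neighbour_labels = [lab for lab, x in samples if distance(x_i, x) <= radius]
        purity = neighbour_labels.count(label_i) / len(neighbour_labels)
        if purity >= alpha:
            regions["positive"].append(i)
        elif purity <= beta:
            regions["negative"].append(i)
        else:
            regions["boundary"].append(i)
    return regions


def oversample_boundary(samples: List[Sample], regions: Dict[str, List[int]],
                        minority_label: int, rng: random.Random) -> List[Sample]:
    """Create one synthetic point per boundary minority sample by interpolating
    toward its nearest minority neighbour (a SMOTE-style stand-in)."""
    minority = [x for lab, x in samples if lab == minority_label]
    synthetic = []
    for i in regions["boundary"]:
        label, x = samples[i]
        others = [m for m in minority if m is not x]
        if label != minority_label or not others:
            continue
        nearest = min(others, key=lambda m: distance(x, m))
        t = rng.random()
        synthetic.append((minority_label, [a + t * (b - a) for a, b in zip(x, nearest)]))
    return synthetic


# Toy data: label 0 is the majority class, label 1 the minority class.
samples = [
    (0, [0.0, 0.0]), (0, [0.1, 0.0]), (0, [0.0, 0.1]), (0, [0.1, 0.1]),
    (0, [0.5, 0.5]),
    (1, [0.9, 0.9]), (1, [1.0, 1.0]), (1, [0.6, 0.5]), (1, [0.05, 0.05]),
]
regions = partition_regions(samples, radius=0.2)
synthetic = oversample_boundary(samples, regions, minority_label=1, rng=random.Random(0))
```

On this toy set, the minority point surrounded by majority points falls into the negative domain, while the two points of opposite class that lie close together fall into the boundary domain. A distributed version would need pairwise distance computation across RDD partitions, which is outside the scope of this sketch.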

Description

Technical Field

[0001] The invention belongs to the fields of data mining, pattern recognition and big data processing, and specifically relates to a three-way decision unbalanced data oversampling method based on the Spark big data platform.

Background

[0002] In recent years, mobile phones have become daily necessities, and they are replaced quite frequently; it is increasingly common for users to replace their phones. On the one hand, the faster users change phones, the greater the market value and the higher the manufacturers' income, so manufacturers do everything possible to design new products that stimulate users to replace their phones. On the other hand, major operators are successively adopting data mining technology to improve marketing efficiency. In actual work, the analysis of customer terminal preferences in the current communication industry is simply based on busine...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06K9/62

CPC: G06F18/217, G06F18/24133, G06F18/214

Inventor: 胡峰, 王蕾, 欧阳卫华, 于洪, 王进, 雷大江, 李智星, 瞿原, 赵蕊, 张其龙

Owner: CHONGQING UNIV OF POSTS & TELECOMM
