Method for determining data sample class and system thereof

A technology for determining the class of data samples, applied in multi-programming devices, concurrent instruction execution, machine execution devices, etc. It addresses the low processing efficiency of existing classification methods, reducing memory capacity requirements and improving data processing efficiency.

Active Publication Date: 2010-08-11
CHINA MOBILE SUZHOU SOFTWARE TECH CO LTD +2

AI Technical Summary

Problems solved by technology

[0008] Embodiments of the present invention provide a method and system for determining the category of data samples, to solve the problem of low processing efficiency in existing data classification methods.


Embodiment Construction

[0026] The embodiment of the present invention is implemented using the Map / Reduce mechanism. Map / Reduce is a method for the distributed processing of massive data. The mechanism allows a program to be executed in a distributed manner on a very large cluster composed of ordinary nodes.
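The general shape of the mechanism can be sketched on a single machine. The following is a minimal illustration, not the patent's implementation: the helper `run_map_reduce`, the thread pool standing in for cluster nodes, and the label-counting example are all assumptions introduced here.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_map_reduce(partitions, map_fn, reduce_fn, max_workers=4):
    """Run map_fn on each partition in parallel, group the emitted
    (key, value) pairs by key, then apply reduce_fn to each group.

    Threads stand in for cluster nodes (an assumption for illustration).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        mapped = list(pool.map(map_fn, partitions))

    grouped = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            grouped[key].append(value)

    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def count_labels(partition):
    # Map step: emit (label, 1) for every label in this partition.
    return [(label, 1) for label in partition]


def total(key, values):
    # Reduce step: sum the counts collected for one label.
    return sum(values)


result = run_map_reduce([["a", "b"], ["a"], ["b", "a"]], count_labels, total)
```

Each partition is processed independently, so no single worker needs to hold the whole input.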

[0027] Embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0028] Referring to Figure 1, which is a schematic flow diagram of determining the data sample category in the embodiment of the present invention, the flow includes:

[0029] A plurality of Map tasks are generated for parallel execution, and each Map task is responsible for processing part of the data samples in the data sample set to be classified (equivalent to a subset of the data sample set to be classified).
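One way to form such subsets is sketched below. The source does not say how samples are assigned to Map tasks, so the round-robin split, the function name `split_into_subsets`, and the parameter `num_tasks` are hypothetical.

```python
def split_into_subsets(samples, num_tasks):
    """Split the sample set into at most num_tasks disjoint subsets,
    one per Map task, using a round-robin assignment (an assumption)."""
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    subsets = [samples[i::num_tasks] for i in range(num_tasks)]
    # Drop empty subsets so that no Map task is started without work.
    return [subset for subset in subsets if subset]


samples = list(range(7))
subsets = split_into_subsets(samples, 3)
```

Every sample lands in exactly one subset, so the Map tasks together cover the whole set without overlap.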

[0030] Before the Map tasks are started, they can be configured through the Map function. The Map function can generate multiple Map tasks according to the preset para...


Abstract

The invention discloses a method and a system for determining the class of a data sample. The method comprises the following steps: executing a plurality of Map tasks in parallel, wherein each Map task obtains a portion of the data samples in the set of data samples to be classified, calculates the similarity between each data sample in that portion and the training samples in the training sample set, and, for each data sample, takes the classes corresponding to the top K similarities when sorted in descending order; and executing a Reduce task, which collects the classes corresponding to the K similarities of each data sample and determines the class that accounts for the majority among them to be the class of that data sample. The invention can improve the efficiency of classifying data samples.
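The steps in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the abstract does not name a similarity measure, so cosine similarity is assumed, and the names `map_task`, `reduce_task`, and the toy data are hypothetical. The Map tasks are called sequentially here, although the method runs them in parallel.

```python
import heapq
import math
from collections import Counter


def cosine_similarity(a, b):
    # Assumed similarity measure; the source does not specify one.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def map_task(subset, training_set, k):
    """For each (sample_id, features) in the subset, emit the classes of
    the k training samples with the highest similarity."""
    output = []
    for sample_id, features in subset:
        scored = [(cosine_similarity(features, train_features), label)
                  for train_features, label in training_set]
        top_k = heapq.nlargest(k, scored, key=lambda pair: pair[0])
        output.append((sample_id, [label for _, label in top_k]))
    return output


def reduce_task(map_outputs):
    """Collect the K classes of each sample and pick the majority class."""
    classes = {}
    for output in map_outputs:
        for sample_id, labels in output:
            classes[sample_id] = Counter(labels).most_common(1)[0][0]
    return classes


training_set = [((1.0, 0.0), "A"), ((0.9, 0.1), "A"), ((0.8, 0.2), "A"),
                ((0.0, 1.0), "B"), ((0.1, 0.9), "B")]
subsets = [[("s1", (1.0, 0.1))], [("s2", (0.1, 1.0))]]

predictions = reduce_task([map_task(s, training_set, k=3) for s in subsets])
```

Each Map task holds only its own subset of the samples to be classified, which is consistent with the reduced memory requirement the page describes. How ties between classes are broken is not stated in the source; this sketch simply takes whichever class `Counter.most_common` returns first.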

Description

Technical field

[0001] The invention relates to data mining technology in the communication field, and in particular to a method and system for determining the category of data samples.

Background art

[0002] K-Nearest Neighbor (KNN), a classification method commonly used in data mining, was first proposed by Cover and Hart in 1968 and is a relatively mature method in theory. The idea of the method is: if most of the k samples most similar to a given sample in the feature space (that is, its nearest neighbors in the feature space) belong to a certain category, then the sample also belongs to that category. In the classification decision, the method determines the category of the sample to be classified based only on the categories of the nearest one or several samples. Using this method can better avoid the problem of sample imbalance. In addition, since the KNN method mainly relies on the limited surrounding samples rather than the method o...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/38; G06F9/46
Inventor: 徐萌, 邓超, 高丹, 罗治国, 周文辉, 何清, 庄福振, 郑诗豪, 沈亚飞, 陈磊
Owner: CHINA MOBILE SUZHOU SOFTWARE TECH CO LTD