Automatic big-data analysis method and system capable of simplifying programming

An analysis method and big data technology, applied in the field of data analysis, that addresses problems such as a steep learning curve, extensive program modifications, and complex data matching and parsing.

Active Publication Date: 2016-08-10
XUANCAI INTERACTIVE NETWORK SCI & TECH

AI Technical Summary

Problems solved by technology

[0008] 1) Data stored as plain text is unfriendly to developers, for example because types and formats have to be converted.
[0009] 2) Parsing of file data is coupled with business logic processing, resulting in a linear increase in program complexity...

Examples

Embodiment Construction

[0058] Specific embodiments of the present invention will be further described below:

[0059] An automatic big-data parsing method that simplifies programming. Before the parsing method runs, the Hive table that serves as the MapReduce input data source must be predefined: specify the Hive table the task will process and the data address corresponding to that table, then register an anonymous class on the Map class (member variables of the anonymous class that carry the @IField annotation are automatically mapped to the corresponding fields of the Hive table). Once this predefinition step is complete, the system automatically submits the MapReduce task and its related parameters, and runs the automatic parsing method during the Map phase. The method includes:

[0060] Match the input data of the Map stage and its file address against the predefined Hive table and the table's corresponding data address, and at the same time match the anonymous class object consis...
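The predefinition and matching steps described above can be pictured with a minimal Java sketch. Everything in it is an assumption made for illustration: the source names @IField but does not show its declaration, and the class `PredefinitionSketch`, the table name `login_log`, the paths, and the longest-prefix matching rule are hypothetical rather than taken from the patent.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical declaration: the source only names @IField.
@Retention(RetentionPolicy.RUNTIME) // kept at runtime so reflection can find it
@Target(ElementType.FIELD)
@interface IField {}

class PredefinitionSketch {

    // One predefined input: a Hive table, its data address, and the registered record object.
    static final class Registration {
        final String table;
        final String location;
        final Object record;

        Registration(String table, String location, Object record) {
            this.table = table;
            this.location = location;
            this.record = record;
        }
    }

    private final List<Registration> registrations = new ArrayList<>();

    // Registers a table with its data address; the address is normalized to end with "/".
    void register(String table, String location, Object record) {
        String dir = location.endsWith("/") ? location : location + "/";
        registrations.add(new Registration(table, dir, record));
    }

    // Assumed rule: the table whose data address is the longest prefix of the file path wins.
    Registration match(String inputFilePath) {
        Registration best = null;
        for (Registration r : registrations) {
            if (inputFilePath.startsWith(r.location)
                    && (best == null || r.location.length() > best.location.length())) {
                best = r;
            }
        }
        return best; // null when no predefined table covers the path
    }

    // Lists the member variables that carry @IField and would be mapped to table fields.
    static List<String> mappedFieldNames(Object record) {
        List<String> names = new ArrayList<>();
        for (Field f : record.getClass().getDeclaredFields()) {
            if (f.isAnnotationPresent(IField.class)) {
                names.add(f.getName());
            }
        }
        return names;
    }

    // An anonymous class whose annotated member variables are named after table fields.
    static Object sampleRecord() {
        return new Object() {
            @IField String user_id;
            @IField long login_time;
            String scratch; // not annotated, so it is ignored by the mapping
        };
    }

    public static void main(String[] args) {
        PredefinitionSketch sketch = new PredefinitionSketch();
        sketch.register("login_log", "/warehouse/login_log", sampleRecord());
        Registration hit = sketch.match("/warehouse/login_log/part-00000");
        System.out.println(hit.table + " -> " + mappedFieldNames(hit.record));
    }
}
```

In a real Hadoop Mapper the file path would typically come from the task's input split rather than a string literal; the sketch leaves Hadoop out so that it runs on a plain JDK.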


Abstract

The invention provides an automatic big-data analysis method and system capable of simplifying programming. Existing big-data programs require tedious parsing code that is error-prone, which leads to dirty data and low parsing efficiency. To address these problems, the invention provides a way to automatically parse formatted and unformatted data: each record in a Hive table is converted into a Java anonymous class object; the correspondence between the Hive table and the anonymous class fields is recorded in the system; and the anonymous class matches the names of its @IField-annotated member variables against the field names of the Hive table, assigning each table field value directly to the matched member variable. This increases the speed and accuracy of data parsing while preserving data flexibility in big-data processing.
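The name-based matching in the abstract can be sketched in plain Java with reflection. This is an illustration under stated assumptions, not the patented implementation: the @IField declaration, the `HiveRowMapperSketch` and `LoginRecord` names, the column list, the tab delimiter, and the set of supported types are all hypothetical. A named nested class stands in for the anonymous class so that the populated fields are easy to read back.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical declaration: the source only names @IField.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface IField {}

class HiveRowMapperSketch {

    // Hypothetical record type; member variable names equal the Hive field names.
    static class LoginRecord {
        @IField String user_id;
        @IField long login_time;
        @IField double amount;
        String note; // not annotated, never touched by the mapper
    }

    // Converts the raw column text into the Java type of the target member variable.
    static Object convert(String raw, Class<?> type) {
        if (type == String.class) return raw;
        if (type == int.class || type == Integer.class) return Integer.parseInt(raw);
        if (type == long.class || type == Long.class) return Long.parseLong(raw);
        if (type == double.class || type == Double.class) return Double.parseDouble(raw);
        throw new IllegalArgumentException("Unsupported field type: " + type);
    }

    // Splits one text row and assigns each value to the @IField member of the same name.
    static <T> T fill(T target, String[] columns, String line, String delimiter) {
        String[] values = line.split(Pattern.quote(delimiter), -1);
        Map<String, String> byColumn = new HashMap<>();
        for (int i = 0; i < columns.length && i < values.length; i++) {
            byColumn.put(columns[i], values[i]);
        }
        for (Field f : target.getClass().getDeclaredFields()) {
            if (!f.isAnnotationPresent(IField.class)) continue;
            String raw = byColumn.get(f.getName());
            if (raw == null) continue; // no table field with this name
            try {
                f.setAccessible(true);
                f.set(target, convert(raw, f.getType()));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot set field " + f.getName(), e);
            }
        }
        return target;
    }

    public static void main(String[] args) {
        String[] columns = {"user_id", "login_time", "amount"};
        LoginRecord r = fill(new LoginRecord(), columns, "u42\t1470787200\t9.5", "\t");
        System.out.println(r.user_id + " " + r.login_time + " " + r.amount);
    }
}
```

Because the type conversion is driven by the declared type of each member variable, the business logic in a Map function could read typed fields instead of splitting and casting strings itself, which is the decoupling the abstract describes.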

Description

Technical field

[0001] The invention relates to the field of data analysis, in particular to a Hadoop-based object-oriented parsing method and system for big data processing.

Background

[0002] Hadoop is a distributed system infrastructure developed by the Apache Software Foundation. Users can develop distributed programs without knowing the underlying details of the distribution, making full use of the cluster's power for high-speed computing and storage.

[0003] HDFS (Hadoop Distributed File System) provides Hadoop with a file system for storing data. HDFS is highly fault tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data, which makes it suitable for applications with large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream (streaming access).

[0004] MapReduce is a programming model for parallel operation...


Application Information

IPC(8): G06F17/30
CPC: G06F16/258, G06F16/283
Inventors: 尤海浪, 唐勇, 陈杰
Owner XUANCAI INTERACTIVE NETWORK SCI & TECH