Method for generating data processing flow codes

A data processing flow technology, applied in the field of program control devices, that addresses problems such as the lack of direct support for n-step, n-branch data processing flow operations and unstable flow execution efficiency.

Active Publication Date: 2011-04-27
WUHAN DAMENG DATABASE
2 Cites, 64 Cited by

AI Technical Summary

Problems solved by technology

[0008] 1) Complex n-step, n-branch data processing flow operations, which are very common in practice, are not directly supported.
[0009] 2) There is no rigorous support for processing multiple data sets at the same time; users must implement this in their own code, which is very difficult.
[0010] 3) Commonly used basic data operations, such as filtering, joining, and grouping, must be hand-coded again each time they are used.
[0013] 1) There is no unified data operation component model.
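
To make problems [0010] and [0013] concrete, the following is a minimal Python sketch of what a shared operator interface could look like. It is not the patent's component model: the function names (filter_rows, join_rows, group_sum) and the convention that every operator takes and returns a list of dict rows are assumptions made only for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]  # hypothetical row representation: column name -> value


def filter_rows(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def join_rows(left: Iterable[Row], right: Iterable[Row],
              left_key: str, right_key: str) -> List[Row]:
    """Inner equi-join: index the right side by key, then probe with each left row."""
    index = defaultdict(list)
    for r in right:
        index[r[right_key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]


def group_sum(rows: Iterable[Row], key: str, value: str, out: str) -> List[Row]:
    """Group rows by `key` and sum the `value` column into a new column `out`."""
    totals: Dict[object, object] = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    return [{key: k, out: total} for k, total in totals.items()]
```

Because each operator consumes and produces the same row-list shape, the three can be chained in any order without writing new glue code for each flow.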


Examples

Example Embodiment

[0056] The present invention will be further described below in conjunction with the drawings and embodiments.

[0057] Assume the database contains a customer table and an order table holding customer and order information: about 100 million customers and 7 billion orders. We need to compute the following statistic over these tables: the customer information and total order amount of the 100 customers with the largest total order amount. In addition, some customer fields, such as birthday and income, must be displayed in a revised format. The structure of the customer table is as follows:

[0058] customer(
[0059]     c_custkey  decimal(9, 0)  not null,
[0060]     c_name     varchar(25)    not null,
[0061]     c_address  varchar(40)    not null,
[0062]     c_birthday datetime       not null,
[0063]     c_phone    char(15)       not null,
[0064]     c_income   decimal(7, 2)  not null,
[0065]     c_comment  varchar(117)   not null
[0066] )
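
The statistic described in [0057] can be sketched as a map, shuffle and reduce pass followed by a top-N selection. The Python below is only an in-memory illustration under stated assumptions, not the code the method generates: the order table structure is not shown in this excerpt, so the order columns o_custkey and o_totalprice are hypothetical, as are the function names and the total_amount output column.

```python
import heapq
from collections import defaultdict


def map_order(order):
    # Map phase: emit (customer key, order amount) for each order row.
    # o_custkey and o_totalprice are assumed column names.
    yield order["o_custkey"], order["o_totalprice"]


def reduce_sum(custkey, amounts):
    # Reduce phase: total order amount for one customer.
    return custkey, sum(amounts)


def top_customers(customers, orders, n=100):
    # Shuffle: group the emitted values by key, as a MapReduce runtime would.
    groups = defaultdict(list)
    for order in orders:
        for key, amount in map_order(order):
            groups[key].append(amount)
    totals = [reduce_sum(key, amounts) for key, amounts in groups.items()]
    # Second stage: keep the n customers with the largest totals.
    best = heapq.nlargest(n, totals, key=lambda pair: pair[1])
    # Join the winners back to their customer rows.
    by_key = {c["c_custkey"]: c for c in customers}
    return [{**by_key[k], "total_amount": t} for k, t in best if k in by_key]
```

Joining after the top-N step keeps the join input at n rows rather than one row per customer, which is the kind of ordering decision a flow optimizer would care about at the data volumes mentioned above.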

[0...

Abstract

The invention relates to a method for generating data processing flow codes and belongs to the field of data integration in cloud computing. The method comprises the steps of: (1) extracting a visualized data processing flow omega as a logic model embodiment, wherein the data processing flow is a directed acyclic graph flow comprising a flow name, a version, data processing nodes and node connection information; (2) converting the logic model embodiment into a physical model embodiment of the data processing flow, wherein the physical model embodiment has a directed acyclic graph structure; and (3) generating MapReduce codes of the data processing flow according to the physical model embodiment. The method reduces development difficulty for users and speeds up data analysis. In addition, users can apply parameter tuning, code optimization, automatic flow logic optimization and the like to the data processing flow, which greatly improves flow execution efficiency.
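
As a rough illustration of steps (1) and (2), the Python sketch below models a flow as a directed acyclic graph with a name, a version, nodes and connections, then orders the nodes so that each one follows its inputs. The Flow class, its field names, the operation labels and to_physical_plan are all hypothetical; the patent's actual logical and physical models are not given in this excerpt, and no MapReduce code is generated here.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Flow:
    """Hypothetical logical model: flow name, version, nodes and connections."""
    name: str
    version: str
    nodes: Dict[str, str] = field(default_factory=dict)         # node id -> operation
    edges: List[Tuple[str, str]] = field(default_factory=list)  # (from id, to id)


def topological_order(flow: Flow) -> List[str]:
    """Kahn's algorithm; raises ValueError if the flow contains a cycle."""
    indegree = {n: 0 for n in flow.nodes}
    successors: Dict[str, List[str]] = {n: [] for n in flow.nodes}
    for src, dst in flow.edges:
        successors[src].append(dst)
        indegree[dst] += 1
    ready = deque(n for n in flow.nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(flow.nodes):
        raise ValueError("flow is not a directed acyclic graph")
    return order


def to_physical_plan(flow: Flow) -> List[str]:
    """Toy stand-in for step (2): one plan entry per node, in dependency order."""
    return [f"{i}: {flow.nodes[n]}({n})"
            for i, n in enumerate(topological_order(flow), start=1)]
```

A code generator for step (3) could walk such a plan and emit one job or job stage per entry; how nodes are actually merged into MapReduce jobs is left open here.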

Description

Technical field

[0001] The invention relates to a method for generating data processing flow codes, in particular to a method for generating data processing flow codes oriented to the MapReduce computing model, and belongs to the data integration field of cloud computing.

Background technique

[0002] Data processing can usually be described as a sequence of multi-step operations on one or more data sets. These operations include common relational operations, such as filtering, merging, grouping, joining, and counting, as well as domain-specific operations, such as semantic annotation and face detection. We call this pipelined form of data processing a data processing flow. Common data processing flow applications include ETL processes in data warehouse applications, data analysis and mining processes in business intelligence applications, scientific workflows in the field of scientific computing, and a large number of anal...

Application Information

IPC(8): G06F9/44
Inventor: 叶丹, 易小华, 刘杰, 虞海江, 徐罡
Owner: WUHAN DAMENG DATABASE