A data traceability method and system based on a big data model analysis platform

A technology of model analysis and data traceability, applied in file systems, electronic digital data processing, digital data information retrieval, etc., to achieve the effect of improving query speed and reducing I/O operations

Active Publication Date: 2019-04-02
UNIV OF ELECTRONICS SCI & TECH OF CHINA
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to overcome the deficiencies of the prior art by providing a data traceability method and system based on a big data model analysis platform, solving the data traceability problem of multi-model combined processing on such a platform.



Examples


Embodiment Construction

[0040] The technical solution of the present invention is described in further detail below with reference to the accompanying drawings, but the protection scope of the present invention is not limited to the following description.

[0041] As shown in Figure 2, a data traceability method based on a big data model analysis platform includes the following steps:

[0042] S1. Model workflow analysis: A model workflow is a workflow running on the Hadoop platform, composed of control flow nodes (for example, start nodes and end nodes) and action nodes. This step analyzes the input nodes, output nodes, and action nodes of the model workflow composed of models on the Hadoop platform, and obtains the unique identifier of each node.

[0043] Step S1 includes the following sub-steps:

[0044] S11. Scan the model workflow, find its first action node, and take the input file path of that node as the input file path of the model workflow;

[00...
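Sub-step S11 can be sketched roughly as follows. This is only an illustrative sketch, not the patent's implementation: the `Node` class, its fields, the single-successor assumption, and the sample paths are all hypothetical, and the workflow is modelled as a plain in-memory dictionary rather than an actual Hadoop workflow definition.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    node_id: str                      # unique identifier of the node
    kind: str                         # "start", "end", or "action"
    next_id: Optional[str] = None     # assumed: one successor per node
    input_path: Optional[str] = None  # only meaningful for action nodes


def first_action_node(nodes: Dict[str, Node]) -> Node:
    """Follow successors from the start node to the first action node."""
    current = next((n for n in nodes.values() if n.kind == "start"), None)
    if current is None:
        raise ValueError("workflow has no start node")
    visited = set()
    while current.kind != "action":
        if current.node_id in visited or current.next_id is None:
            raise ValueError("no action node reachable from the start node")
        visited.add(current.node_id)
        current = nodes[current.next_id]
    return current


def workflow_input_path(nodes: Dict[str, Node]) -> Optional[str]:
    """S11: the first action node's input path is the workflow's input path."""
    return first_action_node(nodes).input_path


def ids_by_kind(nodes: Dict[str, Node]) -> Dict[str, List[str]]:
    """Group the unique identifiers of all nodes by node kind."""
    grouped: Dict[str, List[str]] = {}
    for node in nodes.values():
        grouped.setdefault(node.kind, []).append(node.node_id)
    return grouped


# Hypothetical two-model workflow: start -> clean -> train -> end
workflow = {
    "start": Node("start", "start", next_id="clean"),
    "clean": Node("clean", "action", next_id="train", input_path="/data/raw"),
    "train": Node("train", "action", next_id="end", input_path="/data/clean"),
    "end": Node("end", "end"),
}
print(workflow_input_path(workflow))
print(ids_by_kind(workflow)["action"])
```

Here the workflow's input path is taken from `clean`, the first action node reached from the start node, which matches the rule stated in S11.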


Abstract

The invention discloses a data traceability method and system based on a big data model analysis platform. The method comprises the following steps: S1. Model workflow analysis: analyze the input nodes, output nodes, and action nodes of the model workflow, and obtain the unique identifier of each node; S2. Traceability information metadata model design: describe a traceability file for each model workflow; S3. Traceability information storage: build an index for the traceability file, store the index information in the cache database, and store the index file on HDFS; S4. Data traceability and tracking: determine whether to trace the data generation process and, if not, obtain the address of the traceability file by querying the index information. The invention overcomes the problem that traditional data traceability methods are not applicable on a big data platform; by building an index for the traceability file, it reduces input/output operations and improves query speed.
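The index lookup described in S3 and S4 might look roughly like the sketch below. Everything here is an assumption made for illustration: the `TraceIndex` class, keying the index by a data file path, and the sample paths are hypothetical, and a plain dictionary stands in for the cache database.

```python
from typing import Dict, Optional


class TraceIndex:
    """In-memory stand-in for the index information kept in the cache database."""

    def __init__(self) -> None:
        # Assumed layout: data file path -> address of its traceability file
        self._cache: Dict[str, str] = {}

    def add(self, data_path: str, trace_file_address: str) -> None:
        """S3: record where the traceability file for a data file is stored."""
        self._cache[data_path] = trace_file_address

    def lookup(self, data_path: str) -> Optional[str]:
        """S4: resolve the traceability file address without scanning files."""
        return self._cache.get(data_path)


index = TraceIndex()
index.add("/out/model_a/part-0", "hdfs:///trace/model_a.json")  # hypothetical paths
print(index.lookup("/out/model_a/part-0"))
print(index.lookup("/out/unknown"))
```

Answering the lookup from the cached index rather than reading traceability files directly is what the abstract credits with reducing input/output operations.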

Description

Technical field

[0001] The invention relates to the technical field of data traceability, in particular to a data traceability method and system based on a big data model analysis platform.

Background

[0002] The big data model analysis platform is a platform for model design, development, and transaction built on a Hadoop cluster. The system provides basic models on which users can build their own models through a visual designer, and then use those models to analyze the industry data provided by the platform. Since the underlying storage and computing are supported by the Hadoop cluster, this platform is a model analysis platform built on top of a big data environment. The design diagram of the model is shown in Figure 1.

[0003] In recent years, with the development of computers and the mobile Internet, all kinds of information have grown explosively. This information can basically be divided into two categories: one is the original input data, and the other...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/182, G06F16/13, G06F16/2458
CPC: G06F16/134, G06F16/182, G06F16/2471
Inventor: 林劼, 郝鹏飞, 彭世锦, 李年华, 陆文斌, 王晓明
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA