Large-scale data flow analysis method and system based on Spark

A large-scale data flow analysis technology, applied in the fields of electrical digital data processing, special data processing applications, and digital data information retrieval. It addresses the problem of constrained computing and memory resources, accelerating static data flow analysis with high scalability.

Active Publication Date: 2021-09-24
BEIJING INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

However, as program scale continues to increase, computing and memory resources become constrained during analysis.
Therefore, static analysis of large-scale programs, especially data flow analysis, is still a very challenging problem.

Examples

Embodiment 1

[0059] Embodiment 1: The present invention designs a distributed large-scale data flow analysis algorithm that can quickly analyze large-scale programs. The algorithm consists of two stages, each using a different level of parallelism: an intra-procedural preprocessing stage using function-level parallelism, and an inter-procedural data flow integration stage using edge-level parallelism. In the intra-procedural preprocessing stage, single-entry analysis is no longer used; instead, intra-procedural analysis is performed on all functions in a distributed, parallel manner, and the impact of inter-procedural calls on the data flow analysis is temporarily ignored. All functions in the program under test are distributed to different computing nodes for distributed computation. A schematic diagram of function-level preprocessing is shown in Figure 1. The master node distributes the function task set to different computing nodes, and each computing node processes the functions in the...
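The function-level preprocessing described in this embodiment can be illustrated with a minimal sketch. It is not the patented implementation: the toy program representation, the names `PROGRAM`, `analyze_function`, and `preprocess`, and the choice of "variables written" as the local data flow fact are all assumptions made for illustration. A standard-library thread pool stands in for the Spark cluster so the sketch runs without Spark; on Spark, the same shape would plausibly be expressed by parallelizing the function set and mapping the per-function analysis over it.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy program: function name -> list of (kind, target) statements.
# "assign" writes a variable; "call" invokes another function.
PROGRAM = {
    "main":   [("assign", "x"), ("call", "parse"), ("assign", "y")],
    "parse":  [("assign", "buf"), ("call", "helper")],
    "helper": [("assign", "tmp")],
}


def analyze_function(item):
    """Intra-procedural pass for one function.

    Call sites are only recorded, not followed, so the effect of
    inter-procedural calls is deferred to a later stage.
    """
    name, body = item
    defs = {target for kind, target in body if kind == "assign"}
    calls = [target for kind, target in body if kind == "call"]
    return name, {"defs": defs, "calls": calls}


def preprocess(program, workers=4):
    """Analyze every function independently, one task per function.

    The thread pool is a local stand-in (an assumption) for worker nodes
    that receive function tasks from a master node.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(analyze_function, program.items()))


summaries = preprocess(PROGRAM)
for name in sorted(summaries):
    print(name, sorted(summaries[name]["defs"]), summaries[name]["calls"])
```

Because no function's result depends on another's at this stage, every function is a valid entry point for a task, which is what allows the work to be spread across nodes without coordination.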


Abstract

The invention discloses a large-scale data flow analysis method and system based on Spark. The method comprises constructing a distributed computing framework based on Spark and performing distributed data flow analysis of target code through that framework, where the distributed data flow analysis comprises an intra-procedural preprocessing stage and a distributed inter-procedural data integration stage. The system comprises a data acquisition module, a data processing module, a data analysis module, a data storage module, and a data display module. The method and system adopt a two-stage parallel strategy and carry out parallel computing in a multi-entry mode, achieving high scalability. They also take the load balancing problem of distributed computing into account, so that the computing power of a distributed cluster can be fully utilized and static data flow analysis is accelerated.
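The inter-procedural integration stage can be sketched as a fixed-point iteration in which each call-graph edge is an independent unit of work. This is a minimal sketch under stated assumptions, not the method claimed by the patent: the inputs `LOCAL` and `CALL_EDGES`, the helper names, and the rule that a callee's facts flow to its caller are hypothetical, and the analysis shown is context-insensitive for brevity. The map and merge steps are written in plain Python; on Spark they would correspond to RDD operations such as `map` and `reduceByKey`.

```python
from collections import defaultdict

# Hypothetical inputs: per-function local facts (variables each function
# writes) and call-graph edges as (caller, callee) pairs.
LOCAL = {"main": {"x"}, "parse": {"buf"}, "helper": {"tmp"}, "log": {"out"}}
CALL_EDGES = [("main", "parse"), ("parse", "helper"), ("main", "log")]


def propagate_edge(edge, facts):
    """Edge-level task: offer the callee's current facts to the caller."""
    caller, callee = edge
    return caller, facts[callee]


def integrate(local, edges):
    """Propagate facts along call edges until nothing changes."""
    facts = {name: set(values) for name, values in local.items()}
    changed = True
    while changed:
        # Map step: one independent task per call edge.
        contributions = [propagate_edge(edge, facts) for edge in edges]
        # Merge step: union all contributions that target the same caller.
        merged = defaultdict(set)
        for caller, callee_facts in contributions:
            merged[caller] |= callee_facts
        # Apply the merged facts and record whether any set grew.
        changed = False
        for caller, incoming in merged.items():
            if not incoming <= facts[caller]:
                facts[caller] |= incoming
                changed = True
    return facts


RESULT = integrate(LOCAL, CALL_EDGES)
for name in sorted(RESULT):
    print(name, sorted(RESULT[name]))
```

The fact sets only grow and are bounded by the finite set of variables, so the loop terminates. Splitting work by edge rather than by function gives finer-grained tasks, which is one plausible way to even out load when a few functions have many call sites.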

Description

Technical field

[0001] The invention relates to the field of code data flow analysis, and in particular to a large-scale data flow analysis method and system based on Spark.

Background

[0002] As software continues to develop and iterate, its scale grows rapidly, and the number of software vulnerabilities grows with it. Because of the complex inter-procedural calls in large-scale (million-line) programs, the overall analysis time and difficulty increase greatly. Furthermore, inter-procedural data flow analysis needs to be context-sensitive to ensure the effectiveness of the analysis, so many different calling contexts must be generated and retained. However, as program size continues to increase, computing and memory resources become constrained during analysis. Therefore, the static analysis of large-scale programs, especially data flow analysis, is still a very challeng...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F16/2455, G06F16/2458, G06F9/54
CPC: G06F16/24568, G06F16/2471, G06F9/542
Inventor: 计卫星张宗毓景德江王一拙高玉金石峰
Owner: BEIJING INSTITUTE OF TECHNOLOGY