A Spark-based large-scale data flow analysis method and system

Large-scale data flow analysis technology, applied in electrical digital data processing, digital data information retrieval, special data processing applications, etc. It addresses the problem of strained computing and memory resources during analysis, accelerates static data flow analysis, and is highly scalable.

Active Publication Date: 2022-06-17
BEIJING INSTITUTE OF TECHNOLOGY
7 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

As programs continue to grow in scale, computing and memory resources become strained during analysis.
Static analysis of large-scale programs, and data flow analysis in particular, therefore remains a very challenging problem.

Examples

Embodiment 1

[0059] Embodiment 1: The present invention designs a distributed large-scale data flow analysis algorithm that can quickly analyze large-scale programs. The algorithm consists of two stages that use different levels of parallelism: an intra-procedural preprocessing stage using function-level parallelism, and an inter-procedural data flow integration stage using edge-level parallelism. In the intra-procedural preprocessing stage, single-entry analysis is no longer used; instead, intra-procedural analysis is performed on all functions in a distributed, parallel manner, temporarily ignoring the impact of inter-procedural calls on the data flow analysis. All functions in the program under test are distributed to different computing nodes for distributed computing. A schematic diagram of function-level preprocessing is shown in Figure 1. The master node distributes the function task set to different computing nodes, and each computing node processes the functions i...
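The two stages described in this embodiment can be illustrated with a minimal Python sketch. It is not the patented algorithm: a thread pool stands in for the Spark cluster, and the toy program, the `source`/`call` statement kinds, and the "may return tainted data" property are all hypothetical choices made only for illustration. Stage 1 summarizes every function independently (function-level parallelism); stage 2 repeatedly processes every call edge independently until nothing changes (edge-level parallelism).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy program: each function is a list of (op, arg) statements.
PROGRAM = {
    "main": [("call", "parse"), ("call", "log")],
    "parse": [("call", "read_input")],
    "read_input": [("source", "stdin")],
    "log": [("const", "msg")],
}


def summarize(item):
    """Stage 1: summarize one function on its own, ignoring what callees do."""
    name, body = item
    tainted = any(op == "source" for op, _ in body)
    callees = {arg for op, arg in body if op == "call"}
    return name, tainted, callees


def propagate(edge, tainted):
    """Stage 2 step for one call edge: a tainted callee taints its caller."""
    caller, callee = edge
    return caller if callee in tainted else None


def analyze(program, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Function-level parallelism: every function is an independent task.
        summaries = list(pool.map(summarize, program.items()))
        tainted = {name for name, is_tainted, _ in summaries if is_tainted}
        edges = [(name, c) for name, _, callees in summaries for c in callees]

        # Edge-level parallelism: each round handles all call edges
        # independently; rounds repeat until a fixed point is reached.
        while True:
            found = pool.map(lambda e: propagate(e, tainted), edges)
            new = {f for f in found if f is not None} - tainted
            if not new:
                return tainted
            tainted |= new


if __name__ == "__main__":
    print(sorted(analyze(PROGRAM)))
```

In a Spark setting, the two `pool.map` calls would roughly correspond to map operations over a dataset of functions and a dataset of call edges, with the driver checking for convergence between rounds; that mapping is an assumption, not something the excerpt spells out.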

Abstract

The invention discloses a Spark-based large-scale data flow analysis method and system. The method includes constructing a distributed computing framework based on Spark and performing distributed data flow analysis on target code through that framework, wherein the distributed data flow analysis includes an intra-procedural preprocessing stage and a distributed inter-procedural data integration stage. The system includes a data acquisition module, a data processing module, a data analysis module, a data storage module, and a data display module. The invention adopts a two-level parallel strategy and performs parallel computation from multiple entry points in a highly scalable manner, while also taking the load balancing of distributed computing into account. It can make full use of the computing power of a distributed cluster and accelerate static data flow analysis.
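The abstract states that load balancing is considered but this excerpt does not say how. As a hedged illustration only, the sketch below uses a common greedy heuristic (largest task first, always onto the least-loaded node) to spread functions across computing nodes. The function name `assign_functions`, the cost estimates, and the choice of heuristic are assumptions and are not taken from the patent.

```python
import heapq


def assign_functions(costs, num_nodes):
    """Greedy longest-first assignment of functions to nodes.

    costs maps a function name to an estimated analysis cost
    (for example, its statement count; this is an assumed proxy).
    """
    heap = [(0, node) for node in range(num_nodes)]  # (current load, node id)
    heapq.heapify(heap)
    plan = {node: [] for node in range(num_nodes)}
    # Place the most expensive functions first; ties are broken by name.
    for name, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)  # least-loaded node so far
        plan[node].append(name)
        heapq.heappush(heap, (load + cost, node))
    return plan


if __name__ == "__main__":
    print(assign_functions({"a": 8, "b": 5, "c": 4, "d": 3}, num_nodes=2))
```

With the example costs, the two nodes end up with total loads of 11 and 9, rather than the 13 and 7 that assigning functions in alphabetical pairs would give.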

Description

Technical field

[0001] The invention relates to the field of code data flow analysis, in particular to a Spark-based large-scale data flow analysis method and system.

Background

[0002] As software is continuously developed and iterated, its scale grows rapidly, and the number of software vulnerabilities grows with it. The complex inter-procedural calls in large-scale (million-line) programs greatly increase the time and difficulty of whole-program analysis. Moreover, inter-procedural data flow analysis needs to be context-sensitive to ensure the validity of its results, so many different calling contexts must be generated and retained. As programs continue to grow in scale, computing and memory resources become strained during analysis. Static analysis of large-scale programs, and data flow analysis in particular, therefore remains a very challenging problem. ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/2455, G06F16/2458, G06F9/54
CPC: G06F16/24568, G06F16/2471, G06F9/542
Inventor: 计卫星, 张宗毓, 景德江, 王一拙, 高玉金, 石峰
Owner: BEIJING INSTITUTE OF TECHNOLOGY