Correlativity feature-containing simulative stream big data generation method for system test

A technology for data generation and system testing, applied in electrical digital data processing, software testing/debugging, error detection/correction, etc., with the effect of increasing parallelism, increasing efficiency, and enriching fields.

Inactive Publication Date: 2017-01-04
NANJING UNIV OF AERONAUTICS & ASTRONAUTICS

AI Technical Summary

Problems solved by technology

[0007] [Purpose of the invention]: The present invention mainly addresses the problem of generating simulated data for testing streaming big data systems.

Embodiment Construction

[0017] The present invention will be described in detail below in conjunction with the accompanying drawings and specific examples.

[0018] As shown in Figure 1, the method consists of three parts: parameter setting and data description, correlation control, and parallelism and flow rate control.

[0019] After obtaining the seed data, the parameter setting and data description module first processes it to extract the attributes of the data: the number of attributes, the attribute types, and the attribute value ranges. For string-type data, the maximum length must also be calculated. The module then generates an XML file. At the same time, the user defines four parameters: the maximum correlation ignore coefficient c, the time series correlation regression order r, the time segment T, and the segment flow rate S.
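The attribute extraction described in [0019] can be sketched roughly as follows. This is only an illustration under stated assumptions: the XML layout (element names `dataset`, `attribute`, `param`), the helper names `infer_type` and `describe_seed`, the sample seed rows, and the parameter values are all hypothetical and do not come from the patent, which does not disclose its file format here.

```python
import xml.etree.ElementTree as ET

def infer_type(values):
    """Classify a column as 'int', 'float', or 'string' from its raw text values."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in values:
                caster(v)
            return name
        except ValueError:
            continue
    return "string"

def describe_seed(header, rows, params):
    """Build an XML description of the seed attributes plus the user parameters.

    The element and attribute names used here are assumptions for illustration.
    """
    root = ET.Element("dataset", attributeCount=str(len(header)))
    # User-defined parameters c, r, T, S from paragraph [0019].
    for key, value in params.items():
        ET.SubElement(root, "param", name=key, value=str(value))
    for i, name in enumerate(header):
        column = [row[i] for row in rows]
        kind = infer_type(column)
        attr = ET.SubElement(root, "attribute", name=name, type=kind)
        if kind == "string":
            # String attributes record their maximum length.
            attr.set("maxLength", str(max(len(v) for v in column)))
        else:
            # Numeric attributes record their value range.
            cast = int if kind == "int" else float
            nums = [cast(v) for v in column]
            attr.set("min", str(min(nums)))
            attr.set("max", str(max(nums)))
    return root

# Hypothetical seed data and parameter values.
header = ["sensor", "temp", "count"]
rows = [["a1", "20.5", "3"], ["b22", "21.0", "7"], ["c3", "19.5", "5"]]
params = {"c": 0.3, "r": 2, "T": 10, "S": 1000}

root = describe_seed(header, rows, params)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Keeping the description in a separate file would let later stages read attribute types and ranges without rescanning the seed data, although whether the patented method works this way is not stated in the text above.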

[0020] Next, data correlation analysis is carried out, and the specific implementation process is as follows:

[0021] Step 1: Perform MIC calculation...

Abstract

The invention discloses a method for generating simulated streaming big data with correlation features for system testing, and solves the problem of generating simulated data when testing streaming big data systems. The method proceeds in four steps. First, a seed data set from a real scenario is analyzed and a scatter plot is produced for every pair of attributes. Second, the correlation between every pair of attributes is described using the maximal information coefficient (MIC), producing a weighted undirected complete graph of N nodes, where N is the number of data attributes and each edge weight is the calculated correlation coefficient. Third, a c-prim algorithm is proposed to partition the attribute set so that the resulting attribute subsets have properties similar to high cohesion and low coupling; a time series model selection policy is given, and different time series models are used for simulation according to the characteristics of each attribute subset, which preserves the temporal correlation of the generated data. Finally, a double-layer sliding window method is proposed to control the degree of parallelism and the output rate of the stream data. The method can generate streaming big data whose features are relatively close to those of the real scenario, and the flow rate of the generated data stream can be controlled simply and effectively.
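The graph construction and attribute partitioning steps can be illustrated with a small sketch. Several things here are assumptions rather than the patented method: absolute Pearson correlation is swapped in for the MIC score to keep the example dependency-free, and `partition` is one possible reading of a Prim-style procedure with a cutoff c (grow a group along the heaviest edge, and close the group when that edge falls below c). The abstract does not give the details of the c-prim algorithm, and all function names and sample data are hypothetical.

```python
import math

def abs_pearson(x, y):
    """Absolute Pearson correlation; a stand-in here for the MIC score."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return abs(cov / (sx * sy)) if sx and sy else 0.0

def build_graph(columns):
    """Weighted complete graph: weights[i][j] scores attributes i and j."""
    n = len(columns)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weights[i][j] = weights[j][i] = abs_pearson(columns[i], columns[j])
    return weights

def partition(weights, c):
    """Prim-style growth that closes a group when the best outgoing edge is below c.

    This is an assumed interpretation, not the patent's c-prim algorithm.
    """
    remaining = set(range(len(weights)))
    groups = []
    while remaining:
        group = [min(remaining)]
        remaining.discard(group[0])
        while remaining:
            # Heaviest edge from the current group to any unassigned attribute.
            w, node = max((weights[g][r], r) for g in group for r in remaining)
            if w < c:
                break
            group.append(node)
            remaining.discard(node)
        groups.append(sorted(group))
    return groups

# Hypothetical seed columns, one list per attribute.
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],    # attribute 0
    [2.0, 4.0, 6.0, 8.0, 10.0],   # attribute 1: linear in attribute 0
    [5.0, 1.0, 4.0, 1.0, 5.0],    # attribute 2: uncorrelated with attribute 0
]
groups = partition(build_graph(columns), c=0.5)
print(groups)
```

With this sample data, attributes 0 and 1 end up in one group and attribute 2 in its own, which matches the intent of keeping strongly related attributes together so they can be simulated jointly.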

Description

Technical field

[0001] The invention discloses a method for generating simulated streaming big data with correlation features for system testing. It is mainly used to solve the problem of generating simulated data when testing streaming big data systems, and involves correlation detection, graph partitioning algorithms, data flow control, and other technologies.

Background

[0002] Well-known streaming big data processing platforms include Apache Storm, Apache Spark, and Apache Samza. To evaluate the performance of such platforms accurately, Yahoo developed the cloud service test suite YCSB, which runs basic tests on cloud services with the goal of comparing the performance of cloud data service systems. Ruirui Lu et al. proposed the test suite StreamBench, a performance testing framework that comprehensively evaluates streaming big data systems. Zhan Jianfeng et al. proposed the big data test benchm...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F11/36
CPC: G06F11/3684
Inventor: 江国华, 曹旭峰, 周明泉
Owner: NANJING UNIV OF AERONAUTICS & ASTRONAUTICS