A method used for mass generation of relational simulation data

A relational data simulation technique, applied in the field of data simulation, that addresses the high cost of hand-written test data, test data that violates constraints, and computing-task failures that cannot be reproduced in the test environment.

Pending Publication Date: 2019-01-18
南京安讯科技有限责任公司

AI Technical Summary

Problems solved by technology

[0002] In the prior art, manually written test data is usually small in volume, several orders of magnitude below the data volume of a production environment. Manually written test data also often turns out to contain dirty data once it is actually tested, that is, data that violates constraints or is incomplete. Correcting this dirty data by hand is costly and yields little benefit. Experience from past big data projects shows that, once a system runs in production, problems often appear that cannot be reproduced in the test environment. For example, a very large, unevenly distributed data set may cause computing tasks to fail during shuffle, and the test environment can rarely expose such hidden problems early, because data sets of that size cannot be produced manually.


Examples


[0022] (1) Name simulator: simulates Chinese names from a pre-configured Chinese character library;

[0023] (2) Date simulator: simulates random dates within the period defined by the specified start and end dates;

[0024] (3) Dictionary simulator: randomly selects an enumeration value from a pre-configured dictionary list;

[0025] (4) Numeric simulator: randomly generates numbers within the specified value range;

[0026] (5) Linkage simulator: combines two or more fields into one linkage field and uses a linkage dictionary simulator to randomly generate linked enumeration values (currently only linkage dictionaries are supported, such as a three-level province / city / district dictionary);

[0027] (6) Sequence simulator: starts from the minimum value of the specified interval and generates numbers in order until the maximum value of the interval is reached.
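
The six simulators listed above can be illustrated with a minimal Python sketch. This is not the patent's implementation: every function name, the tiny character library, and the nested-dictionary layout used for the linkage dictionary are assumptions made only for illustration.

```python
import random
from datetime import date, timedelta

# Hypothetical stand-ins for the pre-configured Chinese character library.
SURNAMES = ["王", "李", "张"]
GIVEN_CHARS = ["伟", "芳", "敏", "静"]


def name_simulator(rng):
    """Build a Chinese name: one surname plus one or two given-name characters."""
    given = "".join(rng.choice(GIVEN_CHARS) for _ in range(rng.randint(1, 2)))
    return rng.choice(SURNAMES) + given


def date_simulator(rng, start, end):
    """Pick a random date in the closed interval [start, end]."""
    return start + timedelta(days=rng.randint(0, (end - start).days))


def dictionary_simulator(rng, values):
    """Pick one enumeration value from a pre-configured list."""
    return rng.choice(values)


def numeric_simulator(rng, low, high):
    """Pick a random integer in the closed interval [low, high]."""
    return rng.randint(low, high)


def linkage_simulator(rng, tree):
    """Pick a consistent (province, city, district) path.

    `tree` is assumed to be a nested dict: province -> city -> list of districts.
    """
    province = rng.choice(sorted(tree))
    city = rng.choice(sorted(tree[province]))
    district = rng.choice(tree[province][city])
    return province, city, district


def sequence_simulator(low, high):
    """Yield low, low + 1, ..., high in order."""
    yield from range(low, high + 1)
```

Passing a seeded `random.Random` instance to each simulator keeps a run reproducible, which is one plausible design choice when generated test data needs to be regenerated later; the source does not say how randomness is seeded.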

[0028] S4: Create and maintain the simulator instance acc...

Abstract

The invention discloses a method for mass generation of relational simulation data. At run time, a sequence composed of one or more partition fields is generated from the partition field definitions in the data template. Starting from the first partition set of the sequence, a local file is generated by the simulators and logical rules. After that local file has been imported into the specified location of the Hive warehouse, simulation moves on to the second partition set and generates a second local file. The second local file references the first local file line by line, carries over the same key fields, and changes the remaining fields in accordance with the logical rules.
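
The partition-by-partition flow described in the abstract can be sketched roughly as follows. This is an illustrative assumption, not the patented implementation: the field names `id` and `balance`, the CSV file format, the file naming, and the "adjust the balance by a small random amount" rule are all hypothetical, and the import into the Hive warehouse is only marked by a comment.

```python
import csv
import random
from pathlib import Path

FIELDS = ["id", "balance"]  # hypothetical: "id" is the key field, "balance" may change


def simulate_partitions(partitions, n_rows, out_dir, seed=0):
    """Write one local CSV file per partition and return the file paths.

    The first file is generated from scratch. Each later file reads the
    previous file line by line, keeps the key field, and changes the rest.
    """
    rng = random.Random(seed)
    out_dir = Path(out_dir)
    paths = []
    for part in partitions:
        path = out_dir / f"part_{part}.csv"
        with path.open("w", newline="", encoding="utf-8") as dst:
            writer = csv.DictWriter(dst, fieldnames=FIELDS)
            writer.writeheader()
            if not paths:
                # First partition set: every row comes from the simulators.
                for i in range(1, n_rows + 1):
                    writer.writerow({"id": i, "balance": rng.randint(0, 1000)})
            else:
                # Later partition sets: reference the previous file row by row.
                with paths[-1].open(newline="", encoding="utf-8") as src:
                    for row in csv.DictReader(src):
                        writer.writerow({
                            "id": row["id"],  # key field carried over unchanged
                            "balance": int(row["balance"]) + rng.randint(-50, 50),
                        })
        # The import of `path` into the Hive warehouse would happen here (omitted).
        paths.append(path)
    return paths
```

Because each file is derived from its predecessor, the same key appears in every partition, so joins and history queries across partitions have matching rows to work with.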

Description

Technical field

[0001] The invention relates to the technical field of data simulation, in particular to a method for generating relational simulation data in large batches.

Background art

[0002] In the prior art, the amount of manually written test data is often small, and there is a gap of several orders of magnitude from the actual data amount of the production environment; the manually written test data is often found to have dirty data after actual testing, that is, it does not meet the constraints or is incomplete data. Manually adjusting these dirty data is costly but has little effect. According to the experience of launching big data projects in the past, after running in the production environment, problems that cannot be reproduced in the test environment often occur. For example, there may be very large datasets with unbalanced distribution that cause computing tasks to fail during shuffle, and it is difficult to detect such hidden problems early on in t...

Application Information

IPC(8): G06F16/28
Inventor 王晟
Owner 南京安讯科技有限责任公司