Social insurance big data distributed preprocessing method and system

A data preprocessing technology, applied in the field of big data processing, that can solve problems such as time inconsistency and achieves the effects of improving efficiency, reducing frequency, and reducing time consumption.

Inactive Publication Date: 2016-11-16
SOUTH CHINA UNIV OF TECH
4 Cites, 16 Cited by

AI Technical Summary

Problems solved by technology

After the data record completes the preprocessing operation node, it arrives at the data loading operation node. However, since the data preprocessing operation and the data loading operation are carried out in two threads at the same time, the

Examples

Embodiment Construction

[0034] The present invention will be further described below in conjunction with specific examples.

[0035] The distributed preprocessing method for social security big data described in this embodiment is as follows. First, each data preprocessing operation, such as data extraction, field type conversion, data format conversion, data connection, deduplication, and data loading, is defined as a data operation node. A complete data preprocessing process is defined as a data preprocessing job, so a data preprocessing job is composed of data operation nodes. For a given data preprocessing job, one or more threads are assigned to each data operation node. A data operation node with multiple threads assigned is called a parallel data operation node. Starting a preprocessing job means starting all of these threads so that they work at the same time. As shown in Figure 1, in a preprocessing flow chart that includes only single-threaded data operation nodes, the data flow is passed...
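The job model in paragraph [0035] can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `run_node`, `run_job`, `convert_types`, and `make_dedup`, the use of in-memory queues between nodes, and the sample records are all assumptions made for illustration. Each data operation node runs in its own thread or threads, and a node given more than one thread plays the role of a parallel data operation node.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the record stream


def run_node(func, inbox, outbox, n_threads):
    """Start one data operation node with n_threads worker threads.

    func maps a record to a record, or returns None to drop the record.
    """
    remaining = [n_threads]
    lock = threading.Lock()

    def worker():
        while True:
            record = inbox.get()
            if record is _DONE:
                inbox.put(_DONE)  # let sibling workers see the sentinel too
                break
            result = func(record)
            if result is not None:
                outbox.put(result)
        with lock:
            remaining[0] -= 1
            if remaining[0] == 0:
                outbox.put(_DONE)  # the last worker closes the downstream queue

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    return threads


def run_job(records, nodes):
    """Run a preprocessing job; nodes is a list of (func, n_threads) pairs."""
    first = queue.Queue()
    inbox = first
    threads = []
    for func, n_threads in nodes:
        outbox = queue.Queue()
        threads += run_node(func, inbox, outbox, n_threads)
        inbox = outbox
    for record in records:
        first.put(record)
    first.put(_DONE)
    output = []
    while True:
        item = inbox.get()
        if item is _DONE:
            break
        output.append(item)
    for t in threads:
        t.join()
    return output


def convert_types(rec):
    # Field type conversion: hypothetical "id" and "name" fields.
    return {"id": int(rec["id"]), "name": rec["name"].strip()}


def make_dedup():
    seen = set()  # safe only because this node is given a single thread

    def dedup(rec):
        if rec["id"] in seen:
            return None
        seen.add(rec["id"])
        return rec

    return dedup


raw = [{"id": "1", "name": " Li "}, {"id": "2", "name": "Wang"}, {"id": "1", "name": "Li"}]
# Type conversion is a parallel node (3 threads); deduplication is single-threaded.
result = run_job(raw, [(convert_types, 3), (make_dedup(), 1)])
```

In this sketch a parallel node does not preserve record order, so any downstream node that depends on ordering would need its own handling.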

Abstract

The invention discloses a method and system for distributed preprocessing of social insurance big data. According to the main technical scheme, the method comprises the steps of defining a data preprocessing process as a data preprocessing job that contains a plurality of preprocessing operation nodes, and concurrently executing the preprocessing operation nodes in independent threads; allocating a plurality of execution threads to each data operation node with high complexity, and concurrently executing the data preprocessing job on a distributed cloud server cluster; and loading and writing the data of the distributed preprocessing system into a distributed file system by column, and using NoSQL to perform cache optimization on the data writing operation. According to the method and the system, the processing performance of the preprocessing cloud servers is brought into full play, the performance bottleneck of a single server is overcome, redundant data transmission between the servers and the data nodes of the HDFS (Hadoop Distributed File System) is avoided, and the efficiency of loading data into the HDFS is improved, so that the big data preprocessing efficiency is enhanced.
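The column-wise, cached loading step can be sketched as follows. The text available here does not describe the file layout, the NoSQL store, or the flush policy, so everything in this example is an assumption: a plain dictionary stands in for the NoSQL cache, local files stand in for HDFS, and the class name `ColumnBufferedWriter` and the `batch_size` parameter are hypothetical. The sketch only shows the general idea of buffering records per column and writing each column in batches, which reduces the number of write operations.

```python
import os
import tempfile
from collections import defaultdict


class ColumnBufferedWriter:
    """Buffer records per column and flush each column in one batch.

    Assumption: values contain no newline characters.
    """

    def __init__(self, out_dir, columns, batch_size=1000):
        self.out_dir = out_dir
        self.columns = list(columns)
        self.batch_size = batch_size
        self.cache = defaultdict(list)  # stand-in for a NoSQL cache (assumption)
        self.buffered = 0
        self.flush_count = 0

    def write(self, record):
        # Split the record into columns and cache each value.
        for col in self.columns:
            self.cache[col].append(str(record[col]))
        self.buffered += 1
        if self.buffered >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffered == 0:
            return
        # One append per column file; local files stand in for HDFS here.
        for col in self.columns:
            path = os.path.join(self.out_dir, col + ".col")
            with open(path, "a", encoding="utf-8") as f:
                f.write("\n".join(self.cache[col]) + "\n")
            self.cache[col].clear()
        self.buffered = 0
        self.flush_count += 1


out_dir = tempfile.mkdtemp()
writer = ColumnBufferedWriter(out_dir, ["id", "name"], batch_size=2)
for rec in [{"id": 1, "name": "Li"}, {"id": 2, "name": "Wang"}, {"id": 3, "name": "Chen"}]:
    writer.write(rec)
writer.flush()  # write out the records still held in the cache
```

With `batch_size=2`, the three sample records reach the column files in two batches instead of three separate row writes.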

Description

Technical field

[0001] The invention relates to the technical field of big data processing, in particular to a method and system for distributed preprocessing of social security big data.

Background art

[0002] The national-level informatization planning program "Golden Insurance Project" proposes to comprehensively promote the construction of e-government projects, and to promote the informatization of national economic and social development in a "government first" manner. Today, social security covers more than one billion people, and the government holds a large amount of social security big data. If rapidly developing big data technology can be used to mine and statistically analyze data in the various social security business fields, it can provide a reference for the formulation of government policies and guidelines, enable innovative government services, and further promote the construction of e-government projects.

[0003] The mining and analysis of so...

Application Information

IPC(8): G06F17/30
CPC: G06F16/27, G06F16/283
Inventor: 张星明, 陈伟健, 林育蓓, 吴世豪
Owner: SOUTH CHINA UNIV OF TECH