Multi-source data aggregation sampling method and system based on big data environment

A multi-source data technique applied in the field of big data. It addresses the lack of research on preprocessing semi-structured and unstructured data, the inability to fuse large-scale heterogeneous data sources, and the resulting failure to meet user needs well. Its stated benefits are avoiding the loss of useful information, reducing storage and network bandwidth usage, and reducing or eliminating noisy data.

Pending Publication Date: 2019-08-20

ZHEJIANG UNIVERSITY OF SCIENCE AND TECHNOLOGY




Problems solved by technology

[0004] Existing multi-source data aggregation sampling in big data environments gives insufficient attention to preprocessing structured, semi-structured, and unstructured data. The preprocessing usually comprises only two modules, data acquisition and data cleaning, and the cleaning method is relatively simple, so it does not meet user needs well. In addition, data fusion is performed without an open linked data set as prior knowledge, so large-scale heterogeneous data sources cannot be fused efficiently and accurately while keeping complexity low.


Embodiment Construction

[0072] To make the content, features, and effects of the present invention easier to understand, the following embodiments are given as examples and described in detail with reference to the accompanying drawings.

[0073] The structure of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0074] As shown in Figure 1, the multi-source data aggregation sampling method based on a big data environment provided by the present invention comprises the following steps:

[0075] S101, collect multiple original data sources through the data source collection module; each original data source includes a data source name and at least one associated field;

[0076] S102, through the preprocessing module, the central control module uses the data processing program to clean and identify the collected data sources and to remove redundancy from them;

[0077] S103, using the construction program to obtain the original policy l...
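Steps S101 and S102 above can be illustrated with a minimal sketch. It is not the patent's implementation: the `DataSource` class, the `clean_and_deduplicate` function, and the specific cleaning rules (trimming strings, dropping empty and duplicate records) are hypothetical choices made only to show what "a data source name plus at least one associated field" and "cleaning and redundancy removal" might look like in code.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """Hypothetical container for one original data source (step S101)."""
    name: str
    fields: list[str]  # the associated fields; at least one is required
    records: list[dict] = field(default_factory=list)

    def __post_init__(self):
        if not self.fields:
            raise ValueError("a data source needs at least one associated field")


def clean_and_deduplicate(source: DataSource) -> DataSource:
    """Illustrative stand-in for step S102 (assumed rules, not from the patent).

    Trims string values, drops records whose fields are all empty,
    and drops exact duplicates. Field values are assumed to be hashable.
    """
    seen = set()
    cleaned = []
    for record in source.records:
        # Keep only the declared fields and trim surrounding whitespace.
        row = {}
        for name in source.fields:
            value = record.get(name)
            if isinstance(value, str):
                value = value.strip()
            row[name] = value
        # Skip records in which every field is missing or empty.
        if all(v in (None, "") for v in row.values()):
            continue
        # Skip records already seen after normalisation.
        key = tuple(row[name] for name in source.fields)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return DataSource(source.name, list(source.fields), cleaned)
```

In a real deployment the redundancy check would likely be fuzzier than exact matching; exact matching is used here only to keep the sketch short.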


Abstract

The invention belongs to the technical field of big data and discloses a multi-source data aggregation sampling method and system based on a big data environment. The method comprises the following steps: collecting a plurality of original data sources, wherein each original data source comprises a data source name and at least one associated field; cleaning and identifying the collected data sources and removing redundancy from them; obtaining an original strategy list from the original data sources by using a construction program, and sorting the original strategies in the list to form a strategy list between the data sources; fusing the data sets from different sources by using a fusion program; performing word segmentation on the fused file to form a two-dimensional word frequency matrix of files and words; setting a balance verification value, cyclically matching each word, and carrying out snowball sampling; and displaying the collected multi-source data on a display. In this method, the preprocessing module performs distributed computing by having Spark schedule the computing nodes, which enables more efficient data preprocessing; the method is highly practical and widely applicable.
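The word segmentation step in the abstract can be illustrated with a short sketch of a two-dimensional word frequency matrix, with one row per file and one column per word. The function name `word_frequency_matrix` and the regular-expression tokeniser are assumptions made for illustration; the source does not say which segmentation method is used, and Chinese text would need a dedicated segmenter rather than this simple word split.

```python
import re
from collections import Counter


def word_frequency_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts vocabulary[j] in documents[i]."""
    # Assumption: lowercase word tokenisation stands in for the segmentation step.
    tokenised = [re.findall(r"\w+", doc.lower()) for doc in documents]
    # Sorted vocabulary gives the matrix a stable column order.
    vocabulary = sorted({token for tokens in tokenised for token in tokens})
    matrix = []
    for tokens in tokenised:
        counts = Counter(tokens)
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


vocabulary, matrix = word_frequency_matrix(["data source data", "source fusion"])
print(vocabulary)  # ['data', 'fusion', 'source']
print(matrix)      # [[2, 0, 1], [0, 1, 1]]
```

A dense list of lists is enough for a handful of files; a large corpus would more plausibly use a sparse representation.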

Description

Technical field

[0001] The invention belongs to the technical field of big data, and in particular relates to a multi-source data aggregation sampling method and system based on a big data environment.

Background technique

[0002] Multi-source data fusion technology integrates all the information obtained through investigation and analysis, evaluates that information in a unified way, and finally produces unified information. Its purpose is to combine data from various sources, absorb the characteristics of each source, and extract information that is more unified, more accurate, and richer than what any single source provides. However, existing multi-source data aggregation sampling in big data environments gives insufficient attention to preprocessing structured, semi-structured, and unstructured data, and usually comprises only two modules: data acquisition and data cleaning. Moreover, the method of da...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F16/182, G06F16/174, G06F16/11

CPC: G06F16/11, G06F16/174, G06F16/182

Inventor: 云本胜, 钱亚冠, 胡月

Owner: ZHEJIANG UNIVERSITY OF SCIENCE AND TECHNOLOGY
