
Data source selection method for multi-source heterogeneous data fusion

A data source selection technology for multi-source heterogeneous data, applied in the field of big data analysis, that addresses problems such as low analysis efficiency and wasted resources.

Active Publication Date: 2020-06-19
HARBIN INST OF TECH
Cites: 3; Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] The present invention aims to solve the problems of low analysis efficiency and heavy waste of resources that arise because existing big data analysis relies on manual data collection.


Examples

Specific Embodiment 1

[0033] Specific Embodiment 1: this embodiment is described below with reference to Figure 1. The data source selection method for multi-source heterogeneous data fusion described in this embodiment is implemented on the basis of a heterogeneous data source set S={S1, S2, ..., Sn}, where the attribute set of each data source Si in S is <xi1, xi2, ..., xin>.

[0034] The method specifically includes:

[0035] Step 1. Establish the attribute set A={A1,A2,...,Ar} of the target data set of the data analysis task; randomly extract a target attribute Ai from A as the search attribute; search the data source set S for the data sources that contain Ai, obtaining the data source set P; and initialize the discriminant function value D_old to 0;

[0036] Step 2. Wrap each element Pi of the data source set P in a singleton set {Pi}, forming the set T={{Pi} | Pi ∈ P};

[0037] Step 3. Calculate the score of each sub-se...
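Steps 1 and 2 above can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: it assumes each data source is represented as a name mapped to a set of attribute names, and the function name `select_candidate_sources` and the sample sources are hypothetical.

```python
import random


def select_candidate_sources(sources, target_attributes, rng=None):
    """Sketch of Steps 1 and 2 under an assumed data model.

    sources: dict mapping a data source name to the set of attributes it offers
             (an assumed representation of the set S).
    target_attributes: the attribute set A of the target data set.
    """
    rng = rng or random.Random()
    # Step 1: randomly draw one target attribute Ai as the search attribute.
    search_attribute = rng.choice(sorted(target_attributes))
    # P: every data source in S whose attribute set contains Ai.
    P = [name for name, attrs in sources.items() if search_attribute in attrs]
    d_old = 0  # initial discriminant function value D_old
    # Step 2: wrap each Pi in its own singleton set, T = {{Pi} | Pi in P}.
    T = {frozenset([name]) for name in P}
    return search_attribute, P, T, d_old


# Hypothetical example data, not taken from the patent.
sources = {
    "S1": {"name", "age"},
    "S2": {"age", "city"},
    "S3": {"city", "income"},
}
attr, P, T, d_old = select_candidate_sources(sources, {"age", "city"})
```

Using `frozenset` for the elements of T keeps each subset hashable, so later steps can store subsets in a set or use them as dictionary keys when scoring.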

Abstract

The invention discloses a data source selection method for multi-source heterogeneous data fusion and belongs to the technical field of big data analysis. It addresses the problem that existing big data analysis relies on manual data collection, which makes analysis inefficient and wastes a large amount of resources. The method includes: on the basis of a heterogeneous data source set, establishing the attribute set A of the target data set of a data analysis task; randomly extracting a target attribute from A to serve as the search attribute, and searching the data sources to obtain a data source set P; constructing each element Pi of P into a set {Pi} to form a set T; calculating the score of each subset in T; obtaining the subset Tmax with the maximum score; and judging whether the attributes of Tmax contain all target attributes. If they do, the method judges whether redundant attributes are contained and reselects the redundant attributes; if they do not, the search continues. A discrimination function value of the target data source set is then calculated; the search continues if the value has increased and otherwise stops, yielding the target data source set.
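The control flow in the abstract can be illustrated with a simplified greedy search. The sketch below is hedged throughout: the patent's actual score and discrimination functions are not given in this excerpt, so both are replaced by a hypothetical stand-in that counts covered target attributes, the redundancy check is omitted, and the names `covered` and `greedy_select` are illustrative only.

```python
def covered(subset, sources, target):
    """Target attributes supplied by at least one source in the subset."""
    attrs = set()
    for name in subset:
        attrs |= sources[name]
    return attrs & target


def greedy_select(sources, target, start_attribute):
    """Grow a source set until it covers the target or stops improving.

    ASSUMPTION: score() and the discriminant value both count covered target
    attributes. The patent's real functions are not shown in this excerpt.
    """
    def score(subset):
        return len(covered(subset, sources, target))

    # Candidate sources containing the search attribute, as singleton sets.
    T = [frozenset([n]) for n in sorted(sources) if start_attribute in sources[n]]
    best, d_old = frozenset(), 0
    while T:
        t_max = max(T, key=score)   # subset with the maximum score
        d_new = score(t_max)        # stand-in discriminant value
        if d_new <= d_old:          # no increase: stop searching
            break
        best, d_old = t_max, d_new
        if covered(best, sources, target) == target:
            break                   # all target attributes are covered
        # Continue searching: extend the best subset by one unused source.
        T = [best | {n} for n in sorted(sources) if n not in best]
    return best


# Hypothetical example data, not taken from the patent.
sources = {
    "S1": {"name", "age"},
    "S2": {"age", "city"},
    "S3": {"city", "income"},
}
target = {"name", "age", "city", "income"}
selected = greedy_select(sources, target, "age")
```

With this sample data the search starts from the sources that contain `age`, then extends the best singleton by one source per round and stops once every target attribute is covered. A fuller version would also handle the redundant-attribute step, which is left out here because the excerpt does not describe it.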

Description

Technical field

[0001] The invention belongs to the technical field of big data analysis.

Background

[0002] With the advent of the era of big data, hundreds of millions of data items are generated every moment. From this massive data, people need to extract useful information to understand and even guide daily life and work. Hence big data analytics was born, and it is becoming an increasingly popular field.

[0003] However, for a big data analysis task, obtaining the data set the task requires is a critical issue. In many data analysis algorithms, especially most machine learning algorithms, the data largely determines the quality of the analysis results. Yet people often assume that the data set is given. In practice, most data sets for data analysis tasks are still collected manually by experts or institutions in the field. Although manual acquisition of data sets can ensure data quality and is feasible when...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/242, G06F16/22, G06F16/28
CPC: G06F16/2246, G06F16/242, G06F16/284
Inventors: 王宏志, 赖昕, 王春楠
Owner: HARBIN INST OF TECH