Data integration method and system, equipment and storage medium

A data integration and data fusion technology, applied in the field of data processing, that addresses the unsuitability of traditional similarity calculation methods and their inaccurate results, reducing the number of comparisons and improving accuracy and applicability.

Pending Publication Date: 2021-12-07
SHANGHAI MININGLAMP ARTIFICIAL INTELLIGENCE GRP CO LTD
7 Cites, 4 Cited by

AI Technical Summary

Problems solved by technology

[0007] (2) Attribute similarity. Similarity calculation for single-valued attributes is relatively mature. For multi-valued (set) attributes, however, the existing technology treats the set as a whole: by Jaccard similarity, for example, the sets {a, b, c} and {c} have a similarity of 1/3. In an entity matching scenario, a set arises from combining values from different data sources, and its elements are relatively independent. For example, if an attribute of an entity integrated from multiple data sources has the value {a, b, c}, and the same attribute of the entity in a new data source has the value {c}, the actual meaning is that the new value exactly matches the value from one of the data sources. The traditional calculation method is therefore not suitable for this scenario.
In entity matching and data fusion across different data sources, retaining only the true value causes information loss and does not suit incremental or dynamically changing scenarios. The proposed solution is to store multi-valued attribute information in a set data structure.
Furthermore, to address the inaccurate results that prior-art methods give for multi-valued attribute similarity in this scenario, a new method for calculating the similarity of set attribute values is proposed.
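
The contrast described above can be made concrete with a short Python sketch. The `jaccard` function reproduces the 1/3 figure for {a, b, c} and {c}. The `best_element_match` function is a hypothetical illustration only: the excerpt does not give the application's actual formula, so the sketch simply assumes that set elements are scored independently and that the best pairwise score is taken.

```python
from fractions import Fraction


def jaccard(a: set, b: set) -> Fraction:
    """Classic Jaccard similarity |A ∩ B| / |A ∪ B|, treating each set as a whole."""
    union = a | b
    if not union:
        return Fraction(1)  # convention: two empty sets are identical
    return Fraction(len(a & b), len(union))


def exact(x, y) -> float:
    """Single-value similarity used here: 1.0 on equality, else 0.0 (an assumption)."""
    return 1.0 if x == y else 0.0


def best_element_match(a: set, b: set, sim=exact) -> float:
    """Hypothetical element-wise score: the best single-value similarity over all pairs.

    This is NOT the formula from the application, only one way to treat
    the elements of a set as independent values from different sources.
    """
    if not a or not b:
        return 0.0
    return max(sim(x, y) for x in a for y in b)


integrated = {"a", "b", "c"}  # attribute value after integrating several sources
incoming = {"c"}              # attribute value in a new data source

print(jaccard(integrated, incoming))             # 1/3
print(best_element_match(integrated, incoming))  # 1.0
```

Under this assumption the incoming value {c} scores as an exact match against one of the integrated values, whereas Jaccard penalizes it for the unrelated elements a and b.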

Embodiment Construction

[0045] In order to make the purpose, technical solutions and advantages of the present application clearer, the present application will be described and illustrated below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application, and are not intended to limit the present application. Based on the embodiments provided in the present application, all other embodiments obtained by persons of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.

[0046] Obviously, the accompanying drawings in the following description are only some examples or embodiments of the present application, and those skilled in the art can also apply the present application to other similar scenarios. In addition, it can also be understood that although such development efforts may be complex and lengthy, for those of o...

Abstract

The invention discloses a data integration method which comprises the following steps: an entity matching step: based on similarity calculation of set data, matching a plurality of data source entities to complete multi-data source entity matching; and a data fusion step: linking and fusing at least one successfully matched data source entity, storing the multi-valued attributes from the plurality of data source entities by adopting a set type data structure, generating a multi-valued attribute set, and completing data integration of the multi-source data. Incremental multi-source data integration is realized, the matching frequency can be reduced, and the storage space of processed data is released.
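
The data fusion step can be sketched as follows. This is a minimal illustration, assuming entities and source records are plain dictionaries; the function name `fuse` and the sample attribute names and values are hypothetical and do not come from the application. Each attribute of the integrated entity holds a set, so values from every matched source are kept rather than only a single "true" value.

```python
def fuse(entity: dict, record: dict) -> dict:
    """Merge one matched source record into the integrated entity, in place.

    Every attribute of the entity maps to a set of values, so fusing a new
    record only adds to existing sets and never overwrites earlier values.
    """
    for attr, value in record.items():
        entity.setdefault(attr, set()).add(value)
    return entity


# Records from three data sources, already matched to the same entity.
entity = {}
fuse(entity, {"name": "Acme Ltd", "phone": "021-1111"})
fuse(entity, {"name": "Acme Ltd", "phone": "021-2222"})
fuse(entity, {"name": "ACME Limited"})

print(sorted(entity["name"]))   # ['ACME Limited', 'Acme Ltd']
print(sorted(entity["phone"]))  # ['021-1111', '021-2222']
```

Because the integrated entity accumulates all observed values, a later source only needs to be compared against the integrated entity rather than against every earlier source record, which is one plausible reading of how the matching frequency is reduced and processed source data can be released.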

Description

Technical field

[0001] The present application relates to the field of data processing, in particular to a data integration method, system, computer equipment and computer-readable storage medium.

Background art

[0002] Many enterprises now regard data as an important asset. However, because of changes in management personnel, scattered physical layout, system autonomy and similar factors, data sources are often complex (different types of relational databases, data from different departments, etc.) and structurally heterogeneous (SQL and NoSQL databases, text files, Hive big data, etc.), so unified management of data assets across departments is not easy to achieve. In the digital transformation of an enterprise, the integration and fusion of multi-source heterogeneous data is a necessary foundation for building upper-level applications, and entity matching and data fusion are very important links in this process. For ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/2455
CPC: G06F16/2456
Inventors: 黄艳香, 白强伟
Owner: SHANGHAI MININGLAMP ARTIFICIAL INTELLIGENCE GRP CO LTD