Entity pair matching method and device in database, electronic equipment and storage medium

An entity pair matching technique, applied in the field of data recognition, that addresses the low efficiency of existing matching methods and the high cost and difficulty of labeling. It reduces matching difficulty, shortens the labeling process, and improves data fusion capability.

Active Publication Date: 2021-10-29
北京明智和术科技有限公司
17 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

[0003] In the prior art, there are two approaches to entity matching: unsupervised and supervised. To judge whether two entities refer to the same real-world person, unsupervised entity matching first checks whether the names are exactly the same. If they are, it computes the string similarity of the addresses and, if that similarity exceeds a certain threshold, continues the judgment using other attributes. Unsupervised entity matching therefore needs different judgment conditions for different judgment methods, and the process requires multiple rounds of judgment, which makes it inefficient.
Although the supervised entity matching method is effective, it suffers from high labeling cost and difficulty.
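
To make the criticized workflow concrete, the rule cascade described in [0003] might look roughly like the Python sketch below. The attribute names (`name`, `address`), the threshold of 0.8, and the use of `difflib.SequenceMatcher` as the string similarity are assumptions chosen for illustration; the source does not specify them.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two strings (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rule_based_match(e1: dict, e2: dict, address_threshold: float = 0.8) -> bool:
    """Hypothetical cascade of hand-written rules, one judgment per attribute."""
    # Judgment 1: the names must be exactly the same.
    if e1["name"] != e2["name"]:
        return False
    # Judgment 2: the address similarity must reach the (assumed) threshold.
    if string_similarity(e1["address"], e2["address"]) < address_threshold:
        return False
    # Further attribute-specific judgments would follow here, each with
    # its own condition, which is the inefficiency the text points out.
    return True


a = {"name": "Li Wei", "address": "12 Main Street, Springfield"}
b = {"name": "Li Wei", "address": "12 Main Street, Springfield"}
c = {"name": "Li Wei", "address": "99 Ocean Avenue Apt 4"}
print(rule_based_match(a, b), rule_based_match(a, c))
```

Every new attribute needs its own rule and threshold, so the number of judgments grows with the schema.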

Examples

Embodiment Construction

[0057] To make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in these embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of this application, not all of them. The components of the embodiments, as generally described and illustrated in the figures herein, may be arranged and designed in a variety of different configurations. Accordingly, the following detailed description of the embodiments provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments of the application. Based on the embodiments of the present application, every other embodiment obtained by those skilled in the art withou...

Abstract

The invention provides an entity pair matching method and device for a database, electronic equipment, and a storage medium. The method comprises: obtaining two target entities to be matched from the database and removing the attribute tags from each entity to obtain the corresponding text sequences; inputting each text sequence into a vector representation learning model to obtain the vector representation of the corresponding entity; calculating the similarity between the two vector representations; and determining whether the two target entities match according to the difference between that similarity and a threshold. The BERT model is trained with a contrastive (comparison) loss function over entity pairs and a contrastive loss function over entity attributes, which avoids the problem of vector representations failing to reflect the characteristics of different attributes, improves the accuracy of the vector representations, and thereby improves the accuracy of the matching result. The repeated judgment process of the prior art is avoided, and entity matching efficiency is improved.
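
The steps in the abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: `toy_encode` is a stand-in character-trigram encoder, not the trained BERT model the abstract refers to, and the function names, the sorted-key serialization, and the threshold of 0.7 are all hypothetical.

```python
import math
import re
from collections import Counter


def to_text_sequence(entity: dict) -> str:
    """Remove attribute tags: keep only the values, in sorted-key order (an assumption)."""
    return " ".join(str(entity[k]) for k in sorted(entity) if entity[k])


def toy_encode(text: str, n: int = 3) -> Counter:
    """Placeholder encoder (NOT BERT): a sparse bag of character n-grams."""
    s = re.sub(r"\s+", " ", text.lower()).strip()
    if not s:
        return Counter()
    return Counter(s[i:i + n] for i in range(max(len(s) - n + 1, 1)))


def cosine_similarity(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def is_match(e1: dict, e2: dict, threshold: float = 0.7):
    """Return (matched, similarity); matched when similarity - threshold >= 0."""
    sim = cosine_similarity(
        toy_encode(to_text_sequence(e1)), toy_encode(to_text_sequence(e2))
    )
    return sim - threshold >= 0, sim


e1 = {"name": "Ann Lee", "city": "Oslo"}
e2 = {"name": "Ann Lee", "city": "Oslo"}
print(to_text_sequence(e1), is_match(e1, e2))
```

A single similarity computation and one threshold comparison replace the chain of per-attribute judgments. In the described method the encoder would be the BERT model trained with the two contrastive losses.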

Description

Technical field

[0001] The present application relates to the technical field of data identification, and in particular to a method, device, electronic equipment and storage medium for matching entity pairs in a database.

Background

[0002] With the continuous development of information technology, enterprises all over the world are facing a wave of digital transformation, and this process generates a large amount of data. To make good use of this data in promoting digital transformation, data governance is necessary to provide enterprises with a unified and clean data source. An important problem in the field of data governance is entity matching, also called entity resolution. The goal of entity matching is to determine whether two entities in a database refer to the same entity in the real world.

[0003] In the existing technology, there are two ways of entity matching: unsupervised entity matching and supervise...

Application Information

IPC(8): G06F40/194, G06F40/30, G06N3/04, G06N3/08
CPC: G06F40/194, G06F40/30, G06N3/04, G06N3/08
Inventor 白强伟薛小娜
Owner 北京明智和术科技有限公司