Object name editing distance calculating method and object name editing distance matching method based on information entropy

The method applies edit distance to object names in the field of data processing. It addresses the incorrect conclusions and poor similarity identification that conventional edit distance produces in name matching, improving the calculation method and achieving better matching results.

Active Publication Date: 2015-04-29
SHENZHEN AUDAQUE DATA TECH

AI Technical Summary

Problems solved by technology

When users apply the conventional edit distance method to name matching, they can draw incorrect conclusions and fail to identify the similarity between two object names effectively.


Embodiment Construction

[0030] The technical solution and beneficial effects of the present invention will become apparent from the following detailed description of specific embodiments, taken in conjunction with the accompanying drawings.

[0031] Figure 1 is a flow chart of a preferred embodiment of the information entropy-based method for calculating the edit distance of object names according to the present invention. The method mainly includes:

[0032] Step 10: collect all object names to be identified, and count the number of occurrences freq of each character and the total number totalNum of object names. A character that appears multiple times within one object name is counted only once for that name;
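
The counting in Step 10 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `count_char_frequencies` and the sample names are hypothetical, and only the identifiers freq and totalNum (written here as `total_num`) come from the text.

```python
from collections import Counter


def count_char_frequencies(names):
    """Return (freq, total_num) for a list of object names.

    freq[ch] is the number of names that contain ch at least once;
    total_num is the number of names.
    """
    freq = Counter()
    for name in names:
        # set(name) collapses repeats, so a character that appears
        # several times in one name is counted once for that name.
        freq.update(set(name))
    return freq, len(names)


# Hypothetical sample data for illustration only.
names = ["aab", "abc", "bcd"]
freq, total_num = count_char_frequencies(names)
```

Here "a" occurs twice in "aab" but contributes only one count for that name, so freq records how many names contain each character rather than how many times the character occurs overall.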

[0033] Step 20: for each character, calculate the information entropy of the character from the ratio between the total number totalNum of object names and the number of occurrences freq of the character...
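
The text above is truncated before it gives the exact formula, so the following is only a sketch under a stated assumption: the information entropy of a character is taken as the base-2 logarithm of totalNum / freq, the usual self-information form. The logarithm base, the function name `char_edit_costs` and the sample counts are all assumptions, not taken from the source.

```python
import math


def char_edit_costs(freq, total_num):
    """Map each character to an entropy-style edit cost.

    Assumption: cost = log2(total_num / freq[ch]). The source only says
    the entropy is computed from the ratio totalNum / freq.
    """
    return {ch: math.log2(total_num / n) for ch, n in freq.items()}


# Hypothetical counts: 4 names in total; "b" appears in all of them.
freq = {"a": 2, "b": 4, "c": 1}
costs = char_edit_costs(freq, 4)
```

Under this assumption, a character that appears in every name gets cost 0, and rarer characters get higher costs, so editing a rare character contributes more to the distance than editing a common one.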


Abstract

The invention relates to an information entropy-based method for calculating the edit distance of object names and a corresponding object name matching method. The calculation method comprises the following steps: (10) counting the number of occurrences of each character and the total number of object names, where a character that appears several times in one object name is counted only once for that name; (20) calculating the information entropy of each character from the ratio of the total number of object names to the number of occurrences of the character, which gives the edit cost of the character; and (30) when calculating the edit distance between object names, setting the cost of inserting or deleting a character equal to the edit cost of that character, and setting the cost of a substitution to zero if the two characters are the same and to the sum of the edit costs of the two characters if they differ. The invention also provides the corresponding matching method. The methods accurately reflect the absolute difference between two object name strings, effectively identify the similarity between two object names, and are effective for matching data such as names.
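
Step (30) of the abstract can be sketched as a standard dynamic-programming edit distance with per-character costs. This is an illustrative sketch, not the patent's code: the function name `weighted_edit_distance`, the `default_cost` fallback for characters missing from the cost table, and the sample costs are hypothetical.

```python
def weighted_edit_distance(a, b, cost, default_cost=1.0):
    """Edit distance between names a and b with per-character costs.

    Insert/delete of ch costs cost[ch]; substituting x by y costs 0 if
    x == y, otherwise cost[x] + cost[y]. default_cost (an assumption)
    is used for characters that are absent from the cost table.
    """
    def c(ch):
        return cost.get(ch, default_cost)

    # prev[j] = distance between the empty prefix of a and b[:j]
    prev = [0.0]
    for ch in b:
        prev.append(prev[-1] + c(ch))

    for ca in a:
        cur = [prev[0] + c(ca)]  # delete every character of a so far
        for j, cb in enumerate(b, start=1):
            substitute = prev[j - 1] + (0.0 if ca == cb else c(ca) + c(cb))
            delete = prev[j] + c(ca)
            insert = cur[j - 1] + c(cb)
            cur.append(min(substitute, delete, insert))
        prev = cur
    return prev[-1]


# Hypothetical per-character costs for illustration only.
cost = {"a": 1.0, "b": 0.5, "c": 2.0}
d_same = weighted_edit_distance("abc", "abc", cost)
d_diff = weighted_edit_distance("ab", "ac", cost)
d_empty = weighted_edit_distance("", "ab", cost)
```

Because a substitution between different characters costs the sum of both characters' costs, it is never cheaper than deleting one character and inserting the other; the distance is therefore driven by how informative the differing characters are.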

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to an information entropy-based object name editing distance calculation method and an object name matching method.

Background technique

[0002] Object recognition, also known as record matching, aims to identify records representing the same real-world object from various (unreliable) data sources. Object recognition plays an important role in applications such as data cleaning, data integration, and data analysis. Among the data used for object recognition, a commonly encountered and very important type of data is name data, such as institution names, drug names, building names, etc. How to effectively calculate the similarity between two names is crucial to object recognition.

[0003] The result of name matching is usually obtained by comparing string similarity. Existing calculation methods of string similarity include edit distance, vector space, QGram, etc. The e...

Application Information

IPC(8): G06F17/27
CPC: G06F17/10, G06F40/194
Inventors: 王明兴, 吴颖徽, 马帅, 汤南, 贾西贝
Owner SHENZHEN AUDAQUE DATA TECH