A visual-based string similarity calculation method and similarity judgment method

A similarity calculation technique based on visual similarity, applied in the field of string matching. It addresses the problem that existing methods do not take the visual characteristics of characters into account, and it offers easy implementation, easy extension, and a simple model.

Active Publication Date: 2019-06-04
INST OF INFORMATION ENG CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0006] Current string matching methods mostly measure similarity from the perspective of the strings themselves and do not take the visual characteristics of characters into account.

Examples

[0045] Example 1 Vision-based calculation method of string similarity

[0046] Take the strings "g00gle" and "google" (Google) as an example and calculate their visual similarity:

[0047] 1. First, obtain the rasterized font image by configuring parameters such as the font and the rasterization size, as in Figure 1; the example character there is the lowercase form of the letter "L":

[0048] 2. Then convert all the characters into vectors; the process is shown schematically in Figure 4.
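The vectorization step can be sketched in a few lines of Python. This is a minimal illustration that assumes the rasterized glyph is already available as a list of rows of gray values; the function name `image_to_vector` and the 3x3 sample glyph are hypothetical and are not taken from the patent.

```python
from typing import List, Sequence


def image_to_vector(gray_rows: Sequence[Sequence[int]]) -> List[int]:
    """Concatenate the rows of a grayscale image into one 1-D vector."""
    if not gray_rows:
        return []
    width = len(gray_rows[0])
    vector: List[int] = []
    for row in gray_rows:
        # A rasterized glyph is rectangular, so every row has the same width.
        if len(row) != width:
            raise ValueError("all rows must have the same width")
        vector.extend(row)
    return vector


# Hypothetical 3x3 glyph: a vertical bar, gray values in 0..255.
glyph = [
    [0, 255, 0],
    [0, 255, 0],
    [0, 255, 0],
]
vector = image_to_vector(glyph)
```

Because every character is rasterized with the same font and size, all resulting vectors have the same length, which is what allows them to be compared element by element in the next step.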

[0049] 3. Character similarity calculation

[0050] For any two characters, calculate the cosine distance between their vectors. The formula is as follows:

[0051]
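The patent's own formula is not reproduced on this page. As a sketch, the standard cosine measure between two equal-length vectors (the dot product divided by the product of the Euclidean norms) could be computed as below. The function name `cosine_similarity` and the choice to return 0.0 for an all-zero vector are assumptions made for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat an all-zero (blank) glyph as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)
```

For non-negative gray values the result lies between 0 and 1, with 1 meaning the two glyph vectors point in the same direction.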

[0052] The final results are as follows (each value represents the similarity between a pair of characters):

[0053]

[0054]

[0055] 4. Calculate the similarity of the strings using an improved edit distance (Levenshtein distance) formula, as follows:

[0056]

[0057]
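The improved formula itself is not reproduced on this page, so the following is only one plausible sketch of a visually weighted edit distance, not the patent's definition. It assumes that substituting one character for another costs 1 minus their character similarity, that insertions and deletions cost 1, and that the distance is normalized by the longer string's length. The names `visual_edit_distance`, `visual_string_similarity`, and `toy_char_sim`, and the 0.9 similarity value for "0" and "o", are all hypothetical.

```python
from typing import Callable

CharSim = Callable[[str, str], float]


def visual_edit_distance(s: str, t: str, char_sim: CharSim) -> float:
    """Levenshtein distance with a similarity-weighted substitution cost."""
    n = len(t)
    prev = [float(j) for j in range(n + 1)]
    for i in range(1, len(s) + 1):
        curr = [float(i)] + [0.0] * n
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                sub_cost = 0.0
            else:
                # Assumption: visually similar characters are cheap to swap.
                sub_cost = 1.0 - char_sim(s[i - 1], t[j - 1])
            curr[j] = min(
                prev[j] + 1.0,           # delete s[i-1]
                curr[j - 1] + 1.0,       # insert t[j-1]
                prev[j - 1] + sub_cost,  # substitute (or match)
            )
        prev = curr
    return prev[n]


def visual_string_similarity(s: str, t: str, char_sim: CharSim) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0
    return 1.0 - visual_edit_distance(s, t, char_sim) / longest


def toy_char_sim(a: str, b: str) -> float:
    """Illustrative lookup only; the 0.9 value is made up, not a patent result."""
    table = {frozenset(("0", "o")): 0.9}
    return table.get(frozenset((a, b)), 0.0)


distance = visual_edit_distance("g00gle", "google", toy_char_sim)
similarity = visual_string_similarity("g00gle", "google", toy_char_sim)
```

With a similarity function that always returns 0, this reduces to the ordinary Levenshtein distance, which makes the effect of the visual weighting easy to compare against the unweighted baseline.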

[0058] Th...

Abstract

The invention discloses a vision-based character string similarity calculation method and a similarity judgment method. The character string similarity calculation method of the present invention is: 1) convert each character in the two character strings to be compared into a corresponding grayscale image; 2) concatenate the gray values of each row of the image obtained in step 1) to obtain a one-dimensional vector for that image; 3) for any two different characters in the two character strings to be compared, calculate the similarity of the two one-dimensional vectors corresponding to these characters, and determine the character similarity of the two characters from it; 4) calculate the visual similarity of the two character strings to be compared based on the character similarities obtained above and the character positions. The method takes into account the differences between characters as perceived by human vision, so the calculated string similarity agrees more closely with human subjective perception.

Description

Technical field

[0001] The present invention relates to the field of character string matching, and in particular to a vision-based method for calculating the similarity of character strings. The invention considers how similar character strings appear to human vision and can calculate string similarity based on human visual characteristics.

Background

[0002] Because similar duplicate records are widespread, take complex forms, and seriously affect subsequent data processing and decision support, detecting and eliminating duplicate records with similar strings has long been one of the important topics in data cleaning research. In addition, string similarity calculation has important applications in malicious domain name detection systems, plagiarism detection systems, automatic scoring systems, code plagiarism detection systems, web search, and other fields. ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/62, G06K9/20
CPC: G06V10/22, G06F18/22
Inventor: 柳厅文, 张洋, 亚静, 李全刚, 时金桥, 郭莉
Owner: INST OF INFORMATION ENG CHINESE ACAD OF SCI