Vision-based character string similarity calculation method and similarity judgment method

A similarity calculation and visual similarity technology in the field of string matching. It addresses the problem that existing methods do not adequately consider the visual characteristics of characters.

Active Publication Date: 2016-11-16
INST OF INFORMATION ENG CHINESE ACAD OF SCI
Cites: 5; Cited by: 11

AI Technical Summary

Problems solved by technology

[0006] Current string matching methods assess similarity mainly from the perspective of the character sequence and do not take into account the visual characteristics of characters.


Examples


[0045] Example 1: Vision-based string similarity calculation method

[0046] Take "g00gle" (written with zeros) and Google's "google" as an example and calculate their visual similarity:

[0047] 1. First, configure the font, grid size, and other parameters to obtain a grid font picture of each character, as shown in Figure 1. The example character there is the lowercase letter "l" (lowercase "L"):

[0048] 2. Then convert every character into a vector by concatenating the rows of grayscale values of its picture; the process is shown schematically in Figure 4.
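Steps 1 and 2 turn each glyph into a fixed-length vector. The sketch below shows only step 2, the row-by-row concatenation. It assumes the grid from step 1 is already available as a list of rows of integer grayscale values. The 3x3 grid for "l", the choice of 255 for ink pixels, and the name `grid_to_vector` are all hypothetical. A real grid would come from rendering the configured font at the configured grid size.

```python
from typing import List

Grid = List[List[int]]  # rows of grayscale values (hypothetical: 255 = ink, 0 = background)


def grid_to_vector(grid: Grid) -> List[int]:
    """Concatenate the rows of a character's grayscale grid, top to bottom,
    into a single one-dimensional vector."""
    vector: List[int] = []
    for row in grid:
        vector.extend(row)  # append this row's values after the previous rows
    return vector


# Hypothetical 3x3 grid for lowercase "l"; real grids are larger and font-rendered.
LOWER_L: Grid = [
    [0, 255, 0],
    [0, 255, 0],
    [0, 255, 0],
]

print(grid_to_vector(LOWER_L))  # [0, 255, 0, 0, 255, 0, 0, 255, 0]
```

Every character must be rendered on the same grid size. Otherwise the vectors have different lengths and cannot be compared component by component in step 3.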

[0049] 3. Character similarity calculation

[0050] Calculate the cosine distance between the vectors V_i and V_j of any two characters i and j using the following formula:

[0051] $$1 - \cos\theta = 1 - \frac{V_i \cdot V_j}{\lvert V_i \rvert \times \lvert V_j \rvert}$$
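Here is a minimal Python sketch of the cosine distance in [0051]. It assumes the character vectors are equal-length sequences of numbers, as produced in step 2. The function name `cosine_distance` is hypothetical. Returning 1.0 when either vector is all zeros (a blank glyph) is also an assumption; the patent does not specify this case.

```python
import math
from typing import Sequence


def cosine_distance(v_i: Sequence[float], v_j: Sequence[float]) -> float:
    """Return 1 - cos(theta) = 1 - (V_i . V_j) / (|V_i| * |V_j|)."""
    if len(v_i) != len(v_j):
        raise ValueError("character vectors must have the same length")
    dot = sum(a * b for a, b in zip(v_i, v_j))
    norm_i = math.sqrt(sum(a * a for a in v_i))
    norm_j = math.sqrt(sum(b * b for b in v_j))
    if norm_i == 0 or norm_j == 0:
        # Assumption: a blank glyph is treated as maximally distant from everything.
        return 1.0
    return 1.0 - dot / (norm_i * norm_j)


print(cosine_distance([1, 0], [1, 0]))  # 0.0 (same direction)
print(cosine_distance([1, 0], [0, 1]))  # 1.0 (orthogonal)
```

Grayscale values are non-negative, so the cosine lies in [0, 1]. The distance therefore also stays in [0, 1], with 0 meaning the two glyphs have proportional pixel patterns.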

[0052] The final result, giving the character similarity between each pair of characters, is as follows:
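The result table itself is not reproduced in this excerpt. The sketch below only shows how such a pairwise table could be built. It assumes that character similarity is taken as cos θ, that is, 1 minus the cosine distance above. `TOY_VECTORS` holds hypothetical 4-element stand-ins for real glyph vectors, and the names `cosine_similarity` and `similarity_table` are illustrative.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(v_i: Sequence[float], v_j: Sequence[float]) -> float:
    """cos(theta) between two character vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(v_i, v_j))
    norm_i = math.sqrt(sum(a * a for a in v_i))
    norm_j = math.sqrt(sum(b * b for b in v_j))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)


def similarity_table(
    s1: str, s2: str, vectors: Dict[str, Sequence[float]]
) -> Dict[Tuple[str, str], float]:
    """Character similarity for every pair (a, b) with a taken from s1 and b from s2."""
    return {
        (a, b): cosine_similarity(vectors[a], vectors[b])
        for a in set(s1)  # each distinct character is compared only once
        for b in set(s2)
    }


# Hypothetical vectors: "0" is made close to "o", and "l" is made dissimilar to both.
TOY_VECTORS = {
    "o": [1, 1, 1, 1],
    "0": [1, 1, 1, 2],
    "l": [0, 1, 0, 0],
}

table = similarity_table("0l", "ol", TOY_VECTORS)
print(table[("0", "o")] > table[("0", "l")])  # True
```

Only distinct characters are compared, so the table can be computed once and then looked up during the string-level step.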

[0053]

[0054]

[0055] 4. String similarity is calculated using a modified Levenshtein distance formula, as follows:
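The exact modification is not reproduced in this excerpt, because paragraph [0056] is truncated. The sketch below assumes one plausible way to make Levenshtein distance visually aware:

- Substituting character a by character b costs 1 minus their character similarity (the cosine distance from step 3) instead of a flat 1.
- Insertions and deletions still cost 1.
- The distance is normalized by the longer string's length to give a similarity.

The names `visual_edit_distance`, `visual_similarity`, and `toy_sim`, and the 0.9 similarity between "0" and "o", are hypothetical.

```python
from typing import Callable

CharSim = Callable[[str, str], float]  # returns a character similarity in [0, 1]


def visual_edit_distance(s1: str, s2: str, char_sim: CharSim) -> float:
    """Levenshtein distance where substituting a by b costs 1 - char_sim(a, b)
    (assumed modification); insertions and deletions cost 1."""
    prev = [float(j) for j in range(len(s2) + 1)]  # row for the empty prefix of s1
    for i in range(1, len(s1) + 1):
        curr = [float(i)] + [0.0] * len(s2)
        for j in range(1, len(s2) + 1):
            a, b = s1[i - 1], s2[j - 1]
            sub_cost = 0.0 if a == b else 1.0 - char_sim(a, b)
            curr[j] = min(
                prev[j] + 1.0,             # delete a
                curr[j - 1] + 1.0,         # insert b
                prev[j - 1] + sub_cost,    # substitute a by b (or match)
            )
        prev = curr
    return prev[-1]


def visual_similarity(s1: str, s2: str, char_sim: CharSim) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0
    return 1.0 - visual_edit_distance(s1, s2, char_sim) / longest


def toy_sim(a: str, b: str) -> float:
    """Hypothetical character similarity: only "0" and "o" look alike."""
    if a == b:
        return 1.0
    if {a, b} == {"0", "o"}:
        return 0.9
    return 0.0


print(visual_edit_distance("g00gle", "google", toy_sim))  # about 0.2
```

With the toy similarity, "g00gle" and "google" differ by two cheap substitutions of about 0.1 each, so the distance is about 0.2. Plain Levenshtein distance would be 2, which is why a visually aware cost ranks look-alike strings as much closer.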

[0056...


Abstract

The invention discloses a vision-based character string similarity calculation method and similarity judgment method. The character string similarity calculation method comprises the following steps:

1) converting each character in the two character strings to be compared into a corresponding grayscale picture;
2) concatenating the rows of grayscale values of each picture obtained in step 1) into a one-dimensional vector for that picture;
3) calculating the similarity between the one-dimensional vectors of any two characters from the two strings, and determining the character similarity of those two characters from it; and
4) calculating the visual similarity between the two strings based on the obtained character similarities and the character positions.

Because the method considers how different characters differ as perceived by human vision, the calculated string similarity better matches human subjective perception.

Description

Technical field

[0001] The present invention relates to the field of string matching, and in particular to a vision-based string similarity calculation method. The invention takes into account differences in how strings are perceived by human vision and calculates string similarity based on human visual characteristics.

Background art

[0002] Similar duplicate records are widespread, take complex forms, and seriously affect subsequent data processing and decision support. For this reason, detecting and eliminating duplicate records of similar strings has long been one of the important topics of data cleaning research. In addition, string similarity calculation has important applications in malicious domain detection systems, plagiarism detection systems, automatic scoring systems, code plagiarism detection systems, web search, and other fields.

[0003] Currently, there are many methods for calculating string s...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62; G06K9/20
CPC: G06V10/22; G06F18/22
Inventor: 柳厅文张洋亚静李全刚时金桥郭莉
Owner: INST OF INFORMATION ENG CHINESE ACAD OF SCI