A construction method of a universal string similarity measurement framework

A construction method for a string similarity measurement framework, applied in the field of data mining. It addresses the complexity, poor extensibility, and limited feature support of existing metrics.

Active Publication Date: 2019-01-29
CHENGDU UNIV OF INFORMATION TECH

AI Technical Summary

Problems solved by technology

Many types of metrics have been proposed so far, but they are either complex, hard to extend flexibly, or limited in their ability to incorporate other semantic features (such as affixes).

Examples

Embodiment Construction

[0027] The construction method of the general string similarity measurement framework of the present invention proceeds as follows:

[0028] (1) First, let $X = \{x_0, x_1, x_2, \ldots\}$ and $Y = \{y_0, y_1, y_2, \ldots\}$ be two groups of strings to be compared. Each element $x_i \in X$ and $y_j \in Y$ is a sequence of characters; $p$ and $q$ index the $p$-th character of $x_i$ and the $q$-th character of $y_j$, and $m$ and $n$ are the lengths of $x_i$ and $y_j$, respectively. String similarity measures are typically used to find the best mapping pairs between the $x_i$ and the $y_j$, or to evaluate the similarity between a particular $x_i$ and each $y_j$ in $Y$.
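The "best mapping" use described in step (1) can be illustrated with a minimal Python sketch. The patent excerpt does not name a concrete measure here, so the sketch swaps in `difflib.SequenceMatcher.ratio()` from the Python standard library as a stand-in; the helper names `similarity` and `best_match` and the sample strings are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Sequence


def similarity(x: str, y: str) -> float:
    """Stand-in similarity in [0, 1]; NOT the measure defined by the patent."""
    return SequenceMatcher(None, x, y).ratio()


def best_match(x: str, Y: Sequence[str]) -> str:
    """Return the y_j in Y that is most similar to x_i under `similarity`."""
    return max(Y, key=lambda y: similarity(x, y))


# Hypothetical sample data for the two string groups X and Y.
X = ["color", "centre"]
Y = ["colour", "center", "collar"]

# Map every x_i to its best-scoring y_j.
best = {x: best_match(x, Y) for x in X}
```

Any other pairwise measure with the same signature could be passed in place of `similarity`; the mapping logic does not depend on the choice.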

[0029] (2) Second, the set of string pairs $X \times Y = \{(x_i, y_j);\ x_i \in X,\ y_j \in Y\}$ is composed of the matching or similar set $M = \{(x_i, y_j);\ x_i = y_j,\ x_i \in X,\ y_j \in Y\}$ and the non-matching set $N = \{(x_i, y_j);\ x_i \neq y_j,\ x_i \in X,\ y_j \in Y\}$.
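The decomposition of X×Y into M and N in step (2) can be sketched as follows. This is an illustration only: the function name `partition_pairs` and the `is_match` parameter are hypothetical, and the default predicate uses exact equality, whereas in practice "matching or similar" would normally come from labeled data or a domain-specific rule.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def partition_pairs(
    X: Sequence[str],
    Y: Sequence[str],
    is_match: Callable[[str, str], bool] = lambda x, y: x == y,
) -> Tuple[List[Pair], List[Pair]]:
    """Split the Cartesian product X x Y into (M, N).

    M holds the pairs judged matching by `is_match`; N holds all other pairs.
    """
    M: List[Pair] = []
    N: List[Pair] = []
    for x, y in product(X, Y):
        (M if is_match(x, y) else N).append((x, y))
    return M, N


# Hypothetical example: only ("b", "b") is an exact match.
M, N = partition_pairs(["a", "b"], ["b", "c"])
```

By construction every pair lands in exactly one of the two sets, so M and N together cover X×Y without overlap.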

[0030] (3) Then, based on the matching or similar set $M = \{(x_i, y_j);\ x_i = y_j,\ x_i \in X,\ y_j \in Y\}$ and the non-matching set $N = \{(x_i, y_j);\ x_i \neq y_j,\ x_i \in X,\ y$ ...

Abstract

The invention relates to a construction method for a general string similarity measurement framework. First, two groups of strings to be compared are defined. Second, the set of string pairs is divided into a matching or similar set and a non-matching set. Third, based on these two sets, a set of comparison criteria is defined for each string similarity measure, from which an accurate similarity result is obtained. Fourth, the posterior probability is estimated by maximum likelihood estimation. Finally, a string similarity measurement framework that incorporates additional features is proposed. The invention is based on the Fellegi-Sunter model, is reasonable and simple, and provides guidance for designing string similarity measurement systems that need to incorporate a large number of semantic features quickly and flexibly.
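The excerpt does not reproduce the patent's formulas, so the following Python sketch only illustrates the general idea the abstract names: a Fellegi-Sunter-style posterior over the matching and non-matching sets, with per-feature rates estimated by maximum likelihood (relative frequencies). The conditional-independence assumption between features, the function names `mle_rates` and `posterior_match`, the two toy features, and the sample pairs are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]
Feature = Callable[[str, str], bool]


def mle_rates(pairs: Sequence[Pair], features: Sequence[Feature]) -> List[float]:
    """MLE of P(feature fires | class): relative frequency over non-empty `pairs`."""
    n = len(pairs)
    return [sum(f(x, y) for x, y in pairs) / n for f in features]


def posterior_match(x: str, y: str, features: Sequence[Feature],
                    m: Sequence[float], u: Sequence[float], prior: float) -> float:
    """P(match | feature outcomes), assuming features are independent per class."""
    like_m, like_n = prior, 1.0 - prior
    for f, m_k, u_k in zip(features, m, u):
        fired = f(x, y)
        like_m *= m_k if fired else 1.0 - m_k
        like_n *= u_k if fired else 1.0 - u_k
    total = like_m + like_n
    return like_m / total if total > 0 else 0.0


# Hypothetical binary comparison criteria (extra features plug in the same way).
features: List[Feature] = [
    lambda x, y: x[:1] == y[:1],             # same first character
    lambda x, y: abs(len(x) - len(y)) <= 1,  # lengths differ by at most 1
]

# Hypothetical labeled pairs: M (matching) and N (non-matching).
M = [("color", "colour"), ("center", "centre"), ("gray", "grey")]
N = [("color", "centre"), ("gray", "colour"), ("center", "grey"), ("color", "grey")]

m = mle_rates(M, features)        # P(feature fires | match)
u = mle_rates(N, features)        # P(feature fires | non-match)
prior = len(M) / (len(M) + len(N))

score = posterior_match("honor", "honour", features, m, u, prior)
```

Adding a semantic feature such as a shared affix amounts to appending one more predicate to `features` and re-estimating `m` and `u`. Plain relative frequencies can yield rates of exactly 0 or 1 on small samples, so a real system would likely need smoothing.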

Description

Technical field

[0001] The invention belongs to the technical field of data mining, and in particular relates to a method for constructing a general string similarity measurement framework.

Background technique

[0002] String similarity metrics are important techniques for detecting duplicate and literally similar strings in a database. Various types of metrics have been proposed so far, but they are either complex, hard to extend flexibly, or limited in their ability to incorporate other semantic features (such as affixes).

[0003] A string similarity measure, also known as a string distance measure, or string measure for short, quantifies the similarity (or distance) between two strings by matching them against each other. String similarity measures are widely used in applications such as record linkage, entity normalization, information integration, and ontology alignment. So far, many string similarity measurement methods have been proposed, such as Dic...

Application Information

IPC(8): G06F16/903, G06F17/22
CPC: G06F40/194
Inventor: 王亚强, 闫飞飞, 王晓峰, 舒红平, 唐聃
Owner: CHENGDU UNIV OF INFORMATION TECH