Document similarity detecting method based on kernel function

The detection method applies kernel function technology in the field of information retrieval and addresses the need to improve the precision, recall, and overall performance of document similarity detection.

Status: Inactive. Publication Date: 2013-11-20
JIANGSU UNIV

AI Technical Summary

Problems solved by technology

Compared with the latent semantic kernel and the ANOVA kernel, the CLA compound kernel [Wang Xiuhong, Ju Guangguang. Fusion of distributed information retrieval results based on hybrid kernel function [J]. Journal of Communications, 2011, 32(4): 112-118, 125.] improves the precision and recall of similarity detection. Even so, its precision, recall, and overall performance still need to be improved.



Examples


Embodiment Construction

[0047] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail in conjunction with the accompanying drawings and embodiments. The specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0048] Figure 1 shows the flow chart of document similarity detection based on the kernel function. The present invention includes the following steps:

[0049] (1) Input and preprocessing steps

[0050] The two documents to be compared for similarity are dX and dZ. After word statistics, their terms are as shown in Table 1.

[0051] Table 1

dX: A, B, C, F, P, M, B
dZ: B, C, D, G, L, D

[0052] The corpus consists of 10 documents, and all concept terms in the corpus, namely A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, form a ...
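The preprocessing step above can be sketched in Python. This is a minimal illustration, not the patent's implementation: it assumes raw term counts as vector weights (the source does not state the weighting scheme), and the names `DICTIONARY`, `to_vector`, `inner_product`, and `euclidean_distance` are hypothetical helpers introduced here.

```python
from collections import Counter
import math

# Dictionary terms A..P as listed in paragraph [0052].
DICTIONARY = list("ABCDEFGHIJKLMNOP")

# Term sequences of the two documents from Table 1.
DOC_X = ["A", "B", "C", "F", "P", "M", "B"]
DOC_Z = ["B", "C", "D", "G", "L", "D"]


def to_vector(terms, dictionary=DICTIONARY):
    """Return term counts in dictionary order (raw counts are an assumption)."""
    counts = Counter(terms)
    return [counts[term] for term in dictionary]


def inner_product(x, z):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))


def euclidean_distance(x, z):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))


x = to_vector(DOC_X)
z = to_vector(DOC_Z)
print("dX vector:", x)
print("dZ vector:", z)
print("inner product:", inner_product(x, z))
print("Euclidean distance:", euclidean_distance(x, z))
```

Only the shared terms B and C contribute to the inner product, while every term that appears in one document but not the other increases the distance.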


Abstract

The invention discloses a document similarity detection method based on a kernel function. It belongs to the technical field of information retrieval and mainly addresses the poor overall evaluation performance of text retrieval. The method comprises the following steps: 1, creating a document set, wherein the set of lexical items of the document set forms a dictionary of size N; 2, representing the two documents to be compared, dX and dZ, as document vectors; 3, describing the similarity of the two documents through the inner product of the two document vectors and the Euclidean distance between them, thereby forming a new S_Wang kernel function suited to document similarity detection, wherein σ (σ > 0) in the formula is a width parameter that controls the radial range of the function and adjusts how strongly the distance between the two documents, which arises from their differing words, influences the similarity; and 4, completing the document similarity detection task with the resulting kernel function. The method offers high detection precision, high recall, and good overall performance, and can be applied to document similarity calculation, document classification, document information filtering, pattern recognition, and artificial intelligence.
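The abstract names the two ingredients of the kernel (the inner product and the Euclidean distance of the document vectors) and a width parameter σ, but the S_Wang formula itself is not reproduced in this text. The Python sketch below therefore uses a stand-in chosen purely for illustration: the inner product multiplied by a Gaussian of the distance. The function name `illustrative_kernel` and this particular combination are assumptions, not the patent's S_Wang kernel.

```python
import math


def illustrative_kernel(x, z, sigma=1.0):
    """Hypothetical stand-in, NOT the patent's S_Wang kernel.

    Combines the inner product <x, z> with a Gaussian of the Euclidean
    distance ||x - z||; sigma > 0 is the width parameter.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    dot = sum(a * b for a, b in zip(x, z))
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return dot * math.exp(-squared_distance / (2 * sigma ** 2))


# Two small term-count vectors that share some terms but differ in others.
x = [1, 2, 0]
z = [2, 1, 1]

# A larger sigma weakens the penalty that the distance puts on the similarity.
for sigma in (0.5, 1.0, 5.0):
    print(sigma, illustrative_kernel(x, z, sigma))
```

In this stand-in, identical vectors incur no distance penalty, and increasing σ reduces the effect of differing words, which matches the role the abstract assigns to the width parameter.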

Description

Technical field

[0001] The invention relates to the field of information retrieval, specifically to a method that uses the S_Wang kernel function constructed by the invention for document similarity detection.

Background technique

[0002] The idea of the kernel method is to take a problem that is not linearly separable in a low-dimensional space and map it to a high-dimensional space in which it becomes linearly separable. A linear learning machine then establishes an optimized hyperplane in that feature space, and the inner product in the high-dimensional feature space is used to classify the problem posed in the low-dimensional space. The most critical part of the transformation is finding the mapping from x in the input space to φ(x) in the high-dimensional space. There is no systematic way to find this mapping φ. In fact, the mapping function is often not easy to find, and may no...
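The relationship between a mapping φ and an inner product in the feature space can be illustrated with a standard textbook case that is not taken from the patent: the degree-2 polynomial kernel on two-dimensional inputs. The Python sketch below checks that evaluating the kernel in the input space gives the same value as the inner product of the explicitly mapped vectors. The names `phi` and `poly2_kernel` are hypothetical.

```python
import math


def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel on 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly2_kernel(x, z):
    """Kernel evaluated in the input space: (x . z) ** 2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


x = (1.0, 2.0)
z = (3.0, -1.0)

# Inner product computed after mapping both points into the feature space.
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))
# Same quantity computed without ever constructing phi(x) or phi(z).
implicit = poly2_kernel(x, z)

print(explicit, implicit)
```

Because the kernel value can be computed directly from x and z, a learning machine that only needs inner products never has to construct φ explicitly.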


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30; G06F17/27
Inventors: 王秀红, 鞠时光
Owner: JIANGSU UNIV