Document similarity distinguishing method based on Fourier transform

A Fourier-transform-based discrimination technique, applied in fields such as instrumentation, computing, and electrical digital data processing, that addresses the reduced efficiency, higher similarity-calculation complexity, and increased difficulty of existing methods.

Active Publication Date: 2013-09-25
STATE GRID CORP OF CHINA +4
3 Cites, 16 Cited by

AI Technical Summary

Problems solved by technology

The similarity between texts is then calculated with the inner product and the cosine formula. The biggest disadvantage of this approach is that, as the corpus grows, representing texts as vectors becomes more difficult. At the same time, as the vector dimension increases, the complexity of calculating the similarity increases and efficiency decreases accordingly.
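The inner-product and cosine calculation referred to here can be sketched as follows. This is a minimal illustration, not the patent's code: the function name `cosine_similarity` and the whitespace tokenization are assumptions made for the example. It shows why the vector dimension is tied to vocabulary size.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the term-frequency vectors of two texts.

    Hypothetical helper: tokenization by whitespace is an assumption.
    """
    freq_a = Counter(text_a.lower().split())
    freq_b = Counter(text_b.lower().split())
    # One vector dimension per distinct term, so the dimension grows
    # with the vocabulary (and therefore with the corpus).
    vocabulary = set(freq_a) | set(freq_b)
    dot = sum(freq_a[t] * freq_b[t] for t in vocabulary)
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)
```

Every distinct term adds a dimension, so both the representation and the inner product become more expensive as the corpus grows, which is the drawback described above.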


Embodiment Construction

[0053] The present invention will be described in further detail below in conjunction with the accompanying drawings.

[0054] As shown in Figure 1, a document similarity discrimination method based on Fourier transform is provided. The method comprises the following steps:

[0055] Step 1: Obtain the keyword sequence Ks and the corresponding keyword frequency set Ns of the document set S, as well as the keyword sequence Ks' and the corresponding keyword frequency set Ns' of the detection document s' relative to the document set S;

[0056] Step 2: Calculate the weight coefficient of each keyword in the keyword sequences Ks and Ks', and from these the weight sequence FKs of the keyword sequence Ks and the weight sequence FKs' of the keyword sequence Ks';

[0057] Step 3: Perform a Fourier transform on the weight sequences FKs and FKs', and calculate the similarity-distance threshold ω_S used to detect whether the document s' is similar to any document in the document set S;

[0058] S...
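The steps listed so far can be sketched roughly as below. This is only an illustration under stated assumptions, since the source text is truncated before the weight, distance, and threshold formulas are given: the relative-frequency weighting, the alphabetical keyword ordering, the Euclidean distance between DFT coefficients, and all function names are hypothetical. The threshold is passed in as a parameter rather than computed.

```python
import cmath
import math
from collections import Counter


def keyword_frequencies(docs):
    """Step 1 (sketch): keyword sequence and frequency list for a document set."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    keywords = sorted(counts)  # assumed ordering: alphabetical
    return keywords, [counts[k] for k in keywords]


def weight_sequence(doc, keywords):
    """Step 2 (sketch): relative frequency of each keyword in doc (assumed weighting)."""
    counts = Counter(doc.lower().split())
    total = sum(counts[k] for k in keywords)
    return [counts[k] / total if total else 0.0 for k in keywords]


def dft(seq):
    """Step 3 (sketch): naive discrete Fourier transform of a real sequence."""
    n = len(seq)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * m / n) for m, x in enumerate(seq))
        for k in range(n)
    ]


def spectral_distance(w_a, w_b, n_coeffs=None):
    """Assumed distance: Euclidean distance over the first n_coeffs DFT coefficients."""
    fa, fb = dft(w_a)[:n_coeffs], dft(w_b)[:n_coeffs]
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(fa, fb)))


def is_similar(candidate, docs, threshold):
    """Compare the candidate with every document in the set against a given threshold."""
    keywords, _ = keyword_frequencies(docs)
    w_c = weight_sequence(candidate, keywords)
    return any(
        spectral_distance(w_c, weight_sequence(d, keywords)) <= threshold
        for d in docs
    )
```

With all coefficients kept, Parseval's theorem makes this distance equal to sqrt(N) times the distance between the original weight sequences; keeping only the first few coefficients via the hypothetical `n_coeffs` parameter is one common way to work in a lower dimension.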


Abstract

The invention provides a document similarity distinguishing method based on Fourier transform. The method comprises the following steps: acquiring the keyword sequence Ks of a document set S and the corresponding keyword frequency set Ns, as well as the keyword sequence Ks' of the detection document s' relative to the document set S and its corresponding keyword frequency set Ns'; calculating the weight coefficient of each keyword in the keyword sequences Ks and Ks', as well as the weight sequence FKs of the keyword sequence Ks and the weight sequence FKs' of the keyword sequence Ks'; performing a Fourier transform on the weight sequences FKs and FKs'; calculating the similarity-distance threshold ω_S used to detect whether the detection document s' is similar to any document in the document set S; calculating the similarity distance D(s', si) between the detection document s' and each document si in the document set S, and comparing the similarity distance D with the threshold ω_S; and judging whether the detection document s' is similar to the document set S. The method not only relaxes the requirements placed on the document representation when calculating similarity, but also reduces computational complexity and improves computational efficiency.

Description

Technical field

[0001] The invention belongs to the technical field of information retrieval and text mining, and in particular relates to a document similarity discrimination method based on Fourier transform.

Background technique

[0002] As people pay more and more attention to science and technology and social development, the academic field is gradually developing towards diversification, informationization and modernization. In this situation, people urgently need efficient, comprehensive and convenient retrieval of academic information more than ever. On the other hand, people also need to prevent academic plagiarism in order to achieve the purpose of supervising and regulating dissertations and academic journals. The key to paper retrieval and plagiarism check is the comparison and calculation of similarity of text information. Therefore, the calculation of text similarity is widely used in information retrieval, text mining and other fields. It is a very basic and ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
Inventor: 张涛, 林为民, 马媛媛, 邓松, 时坚, 李伟伟, 汪晨, 陈亚东, 周诚
Owner: STATE GRID CORP OF CHINA