Fingerprint feature-based text copy detection system and method

A fingerprint-feature-based detection system technology, applied in special data processing applications, instruments, electrical digital data processing, etc., that addresses the low efficiency of fingerprint feature extraction.

Active Publication Date: 2016-08-31
吴国华

AI Technical Summary

Problems solved by technology

[0006] To overcome the low fingerprint feature extraction efficiency of existing text copy detection technology, the present invention provides a text copy detection system and method based on fingerprint features. By selecting trigger conditions to extract fingerprints, it improves the efficiency of fingerprint feature extraction and thereby improves user satisfaction with text copy detection.



Examples


Embodiment Construction

[0051] The preferred embodiments of the present invention will be further described below in conjunction with the accompanying drawings.

[0052] As shown in Figure 1, the text copy detection system based on fingerprint features in this embodiment contains the following modules:

[0053] The text preprocessing module converts the format of the text, filters noise such as numbers, stop words, prepositions, and special symbols from the text to be detected, normalizes the words, and removes the interference of English letter case.
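The filtering steps of the preprocessing module can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `STOP_WORDS` set and the `preprocess` function name are hypothetical, format conversion is not shown, and word normalization is reduced to lowercasing.

```python
import re

# Hypothetical stop-word and preposition list; the source does not enumerate one.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "is", "and"}

def preprocess(text: str) -> str:
    """Lowercase the text, drop digits and special symbols, filter stop words.

    Sentence punctuation (. ! ? , ;) is kept here because the source removes
    punctuation later, in the dictionary sorting module.
    """
    text = text.lower()  # remove the interference of letter case
    # Keep only runs of letters and sentence punctuation; digits and other
    # special symbols are discarded because they match neither alternative.
    tokens = re.findall(r"[a-z]+|[.!?,;]", text)
    kept = [t for t in tokens if t not in STOP_WORDS]
    return " ".join(kept)
```

For example, `preprocess("The Cat sat on 2 mats!")` yields `"cat sat mats !"`: the case is folded, the digit is dropped, and the stop words "the" and "on" are filtered out.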

[0054] The word encoding module encodes each word of the preprocessed text according to set rules, based on the original characteristics of the word.

[0055] The dictionary sorting module sorts the encoded text in dictionary order, taking the sentence as the unit, and removes the punctuation from the text.
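One possible reading of sorting "in sentence units" is that the words inside each sentence are put into dictionary order, which would make the result insensitive to word reordering within a sentence. The sketch below assumes that reading; the function name `sort_sentences` and the choice of sentence delimiters are illustrative assumptions, and plain words stand in for the encoded words.

```python
import re

def sort_sentences(text: str) -> list[str]:
    """Split text into sentences, sort the words of each, and drop punctuation."""
    sentences = re.split(r"[.!?]+", text)  # assumed sentence delimiters
    result = []
    for sentence in sentences:
        # Extracting only alphanumeric runs discards commas, semicolons, etc.
        words = re.findall(r"[A-Za-z0-9]+", sentence)
        if words:
            result.append(" ".join(sorted(words)))
    return result
```

With this reading, `sort_sentences("cat sat mats ! dog barked loudly .")` gives `["cat mats sat", "barked dog loudly"]`.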

[0056] The hash value mapping module uses a rolling hash function to calculate the hash values of the dictionary-sorted text ...
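The source does not say which rolling hash function is used. As an illustration only, the sketch below uses a polynomial (Rabin-Karp style) rolling hash; the `window`, `base`, and `mod` parameters and the function name `rolling_hashes` are assumptions, not values taken from the patent.

```python
def rolling_hashes(text: str, window: int = 4, base: int = 257,
                   mod: int = 1_000_000_007) -> list[int]:
    """Return the hash of every `window`-byte substring, updated in O(1) per step."""
    data = text.encode("utf-8")
    if len(data) < window:
        return []
    high = pow(base, window - 1, mod)  # weight of the byte that leaves the window
    h = 0
    for b in data[:window]:            # hash of the first window, computed directly
        h = (h * base + b) % mod
    hashes = [h]
    for i in range(window, len(data)):
        h = (h - data[i - window] * high) % mod  # remove the oldest byte
        h = (h * base + data[i]) % mod           # append the newest byte
        hashes.append(h)
    return hashes
```

Each step reuses the previous hash instead of rehashing the whole window, which is what makes producing a hash value sequence over a long text cheap.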


Abstract

The invention discloses a fingerprint feature-based text copy detection system and method. The system comprises a text preprocessing module, a word encoding module, a dictionary sorting module, a hash value mapping module, a fingerprint extraction module, and a similarity calculation module. The text preprocessing module converts the format of a text, filters noise in the text, normalizes words, and removes the interference of English letter case. The word encoding module encodes the words of the preprocessed text according to the native characteristics of the words. The dictionary sorting module sorts the text in dictionary order, taking the sentence as the unit, and removes punctuation from the text. The hash value mapping module computes hash values with a rolling hash function to obtain a hash value sequence. The fingerprint extraction module selects a trigger condition based on the text content, divides the text into blocks according to the trigger condition, calculates the hash value of each text block with a hash function, and converts several bits at specific positions of each hash value into ASCII codes, which serve as fingerprint features. The similarity calculation module compares text fingerprints and calculates their degree of similarity with a similarity algorithm.
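The fingerprint extraction and similarity steps described in the abstract can be sketched as follows. Everything specific here is an assumption made for illustration: the trigger condition (window hash modulo `divisor`), the use of SHA-256 from `hashlib` for both the window and block hashes (recomputed per window for brevity rather than rolled), the choice of the low 6 bits of the last digest byte, the base64 alphabet as the ASCII mapping, and `difflib.SequenceMatcher` as a stand-in for the unspecified similarity algorithm.

```python
import hashlib
from difflib import SequenceMatcher

# 64 printable ASCII characters, indexed by a 6-bit value (assumed mapping).
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def split_blocks(text: str, window: int = 4, divisor: int = 8) -> list[str]:
    """Cut a block wherever the hash of the trailing window meets the trigger."""
    blocks, start = [], 0
    for end in range(window, len(text) + 1):
        digest = hashlib.sha256(text[end - window:end].encode("utf-8")).digest()
        # Assumed trigger condition: window hash is congruent to divisor - 1.
        if int.from_bytes(digest[:4], "big") % divisor == divisor - 1:
            blocks.append(text[start:end])
            start = end
    if start < len(text):
        blocks.append(text[start:])  # trailing block that hit no trigger
    return blocks

def fingerprint(text: str) -> str:
    """Emit one ASCII character per block, taken from selected bits of its hash."""
    chars = []
    for block in split_blocks(text):
        digest = hashlib.sha256(block.encode("utf-8")).digest()
        chars.append(B64[digest[-1] & 0x3F])  # low 6 bits of the last byte
    return "".join(chars)

def similarity(fp_a: str, fp_b: str) -> float:
    """Similarity of two fingerprints in [0, 1]; SequenceMatcher is a stand-in."""
    return SequenceMatcher(None, fp_a, fp_b).ratio()
```

Because block boundaries depend on the content rather than on fixed offsets, a local edit tends to change only the blocks near it, so the fingerprints of a text and a lightly modified copy still share most of their characters.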

Description

Technical field

[0001] The invention belongs to the technical field of text copy detection, and in particular relates to a text copy detection system and method based on fingerprint features.

Background technique

[0002] Text copy detection technology has been widely used in many fields, such as digital libraries, information retrieval, academic papers, spam filtering, and malicious code. It reduces information redundancy for users, improves satisfaction with information retrieval, and offers effective solutions for preventing plagiarism of academic papers, for spam, for malicious code, and for deduplication of web pages. However, with the rapid increase in the amount of text data, the detection efficiency of traditional text copy detection technology is not high. To improve detection efficiency, some detection methods introduce fingerprint technology.

[0003] Text copy detection technology based on fingerprint features is a novel text copy detection method, w...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/22
CPC: G06F40/194
Inventor: 吴国华, 付二帅, 王玉娟
Owner: 吴国华