Similar text calibration method

A similar-text calibration method, applicable to the fields of instruments, electrical digital data processing, and character and pattern recognition, that addresses the poor handling of reordered plagiarized text and improves accuracy and efficiency.

Inactive Publication Date: 2020-08-25
SOUTH CHINA UNIV OF TECH +1
2 Cites, 0 Cited by
AI Technical Summary

Problems solved by technology

[0005] The heuristic matching algorithm essentially relies on approximately contiguous matches, so it can only handle plagiarized text to which a small amount of interfering content has been added. It performs poorly on plagiarized text that has been reordered or that contains a large amount of interfering content.

Examples

Embodiment Construction

[0047] The present invention will be further described below in conjunction with specific examples.

[0048] As shown in Figure 1, the similar text marking method provided in this embodiment includes the following steps:

[0049] Step 1: Denoise the document and generate the original fingerprint vector, as follows:

[0050] Use regular expressions to remove symbols in the document that do not affect semantics;

[0051] Convert case: uniformly convert all letters in the document to lowercase;

[0052] Replace variable names appearing in the document with meaningless placeholder names;

[0053] After preprocessing, record the position that each remaining character occupied in the document before preprocessing;

[0054] Use the k-gram method to generate the original fingerprint vector from the processed document, and record the text position represented by each original fingerprint.
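The sub-steps of Step 1 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the noise pattern (`[\W_]`), the CRC32 hash, the default `k=5`, and the function names `denoise` and `kgram_fingerprints` are all assumptions made for illustration, and the variable-renaming sub-step [0052] is omitted because the source does not say how variables are detected.

```python
import re
import zlib

# Assumption: "symbols that do not affect semantics" is taken to mean
# whitespace, punctuation, and underscores. The patent does not list them.
NOISE = re.compile(r"[\W_]")


def denoise(text):
    """Return (cleaned, positions), where positions[i] is the index in the
    original text of the character that produced cleaned[i]."""
    cleaned, positions = [], []
    for i, ch in enumerate(text):
        if NOISE.match(ch):
            continue  # drop noise symbols
        # lower() can expand one character into several, so map each
        # resulting character back to the same original index.
        for low in ch.lower():
            cleaned.append(low)
            positions.append(i)
    return "".join(cleaned), positions


def kgram_fingerprints(text, k=5):
    """Hash every k-gram of the cleaned text.

    Each entry is (hash, start, end): the fingerprint value plus the first
    and last original-text positions covered by that k-gram.
    """
    cleaned, positions = denoise(text)
    fingerprints = []
    for i in range(len(cleaned) - k + 1):
        gram = cleaned[i:i + k]
        # CRC32 is a stand-in hash chosen only because it is deterministic.
        value = zlib.crc32(gram.encode("utf-8"))
        fingerprints.append((value, positions[i], positions[i + k - 1]))
    return fingerprints
```

Keeping the original positions next to each hash is what later lets a matched fingerprint be mapped back to a span of the unprocessed document.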

[0055] Step 2: Sample the original fingerprint vector formed in Step 1 to form a new sampl...


Abstract

The invention discloses a similar text calibration method. The method comprises: denoising a document and generating an original fingerprint vector; sampling the original fingerprint vector; comparing the two sampled fingerprint vectors with a fast matching algorithm to obtain the matching fingerprint pairs and the position of each fingerprint in its own vector, yielding all matching fingerprint index pairs between the two sampled vectors; projecting all matching index pairs onto a two-dimensional coordinate plane; clustering the matching fingerprint pairs that meet the approximation condition along the 45-degree direction with a sequential approximation clustering method to form a set of clusters; clustering the clusters that meet the density requirement along the 45-degree direction with a slope density clustering method; post-processing the clustering result and discarding clusters that do not meet the slope requirement; and calculating the original document positions of the first and last fingerprints of each cluster to form the calibration result for similar text between the two documents. The method is stated to have high accuracy and strong anti-interference capability, and to effectively improve the accuracy of similar text calibration.
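The matching and projection steps in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the patent's sequential approximation clustering or slope density clustering, whose exact conditions are not given in this text. The functions `match_pairs` and `diagonal_clusters` and the thresholds `max_gap` and `max_drift` are hypothetical. The underlying idea is that copied text appears as matching index pairs (i, j) lying near a line of slope 1 (the 45-degree direction) in the two-dimensional plane.

```python
from collections import defaultdict


def match_pairs(fp_a, fp_b):
    """Return sorted (i, j) index pairs with fp_a[i] == fp_b[j]."""
    index_b = defaultdict(list)
    for j, value in enumerate(fp_b):
        index_b[value].append(j)
    return sorted(
        (i, j) for i, value in enumerate(fp_a) for j in index_b.get(value, ())
    )


def diagonal_clusters(pairs, max_gap=2, max_drift=1):
    """Greedily group points that continue along a roughly 45-degree line.

    A point joins a cluster when it lies at most max_gap steps ahead of the
    cluster's last point on the x axis, moves forward on the y axis, and its
    diagonal offset (j - i) differs from the last point's by at most
    max_drift. Both thresholds are illustrative assumptions.
    """
    clusters = []
    for i, j in pairs:  # pairs are sorted by i, then j
        for cluster in clusters:
            last_i, last_j = cluster[-1]
            close = 0 < i - last_i <= max_gap and j > last_j
            if close and abs((j - i) - (last_j - last_i)) <= max_drift:
                cluster.append((i, j))
                break
        else:
            clusters.append([(i, j)])
    return clusters
```

In this sketch each cluster's first and last pairs give the start and end fingerprint indices in the two documents, which would then be mapped back to original text positions as the abstract describes.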

Description

Technical field

[0001] The invention relates to the technical field of text classification, in particular to a similar text marking method.

Background

[0002] Historically, the line between plagiarism and research has been blurred. In academia, all research builds on and extends existing results. In some fields (such as economics and law), an academic paper may cite the views of hundreds of other articles to support or refute its own position. In other fields (such as mathematics), the content related to the author's own conclusions may occupy only a small part of a paper, with the rest drawn from other people's related results. The phenomenon is particularly prominent in engineering and computer science, where changing a single parameter of an existing algorithm is often enough to form a paper.

[0003] International research on plag...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/216, G06F40/284, G06K9/62
CPC: G06F40/216, G06F40/284, G06F18/2321
Inventor: 艾飞张凌邹杜卢伟健陈炳林黄基峰伍晓林
Owner: SOUTH CHINA UNIV OF TECH