Method, device and apparatus for checking duplication of text

The technology, applied to text duplication checking and computer-readable storage media, addresses the low efficiency and heavy computation of conventional duplication checking. It reduces the amount of calculation and improves the efficiency of text duplication checking.

Inactive Publication Date: 2019-03-15
LAUNCH TECH CO LTD

AI Technical Summary

Problems solved by technology

[0005] The purpose of this application is to provide a text plagiarism checking method, device, equipment, and computer-readable storage medium. It addresses a problem of the traditional plagiarism checking method, which must calculate the similarity between each source text and the target text: when the number of source texts is large, the amount of calculation is very large, resulting in low checking efficiency.

Examples

Embodiment 1

[0081] The following introduces Embodiment 1 of a text plagiarism checking method provided by the present application. As shown in Figure 1, Embodiment 1 includes:

[0082] Step S101: Obtain the target text.

[0083] The above-mentioned target text may specifically be a text input by a user, and the main purpose of this embodiment is to find a text similar to the target text from multiple texts to be checked for duplicates.

[0084] Step S102: Segment the target text to obtain a target text sequence including multiple words.

[0085] Text segmentation refers to the process of automatically identifying the boundaries between fragments with independent meaning in a text. As an optional implementation, in this embodiment the target text may be cut in an interleaved (overlapping) manner according to a preset text interval, and the size of the preset text interval may be determined according to actual requirements.
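The interleaved cutting in step S102 can be sketched as a sliding window. This is only an illustration under stated assumptions: the source does not define "interleaved" precisely, so the sketch assumes it means overlapping fragments of length `interval` taken every `step` characters. The function name `interleaved_segments` and both parameters are hypothetical.

```python
from typing import List


def interleaved_segments(text: str, interval: int = 4, step: int = 1) -> List[str]:
    """Cut `text` into overlapping fragments of length `interval`.

    Assumption: "interleaved cutting" is read as a sliding window that
    advances `step` characters at a time; the patent text does not spell
    this out.
    """
    if interval <= 0 or step <= 0:
        raise ValueError("interval and step must be positive")
    if not text:
        return []
    if len(text) <= interval:
        # A text shorter than the window is kept as a single fragment.
        return [text]
    return [text[i:i + interval] for i in range(0, len(text) - interval + 1, step)]


if __name__ == "__main__":
    print(interleaved_segments("abcdef", interval=4, step=1))
```

With `step` smaller than `interval`, neighbouring fragments share characters, so a small edit to the text changes only the few fragments that cover it.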

[0086] Step S103: Calculate ...

Embodiment 2

[0099] Embodiment 2 is mainly used to find texts similar to the target text among a large number of texts to be checked for duplicates. As shown in Figure 2, Embodiment 2 specifically includes the following steps:

[0100] Step S201: Create a text fingerprint database in advance.

[0101] Specifically, a plurality of texts to be checked for duplicates are determined in advance, the fingerprint sequence of each is calculated, and the fingerprint sequences are stored in the text fingerprint database. As shown in Figure 3, the fingerprint sequences of text 1 to be checked through text M to be checked are calculated respectively and stored in the text fingerprint database, where M is the number of texts to be checked, so as to facilitate the subsequent retrieval task.
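Step S201 can be sketched as follows. The excerpt does not show how a fingerprint is computed, so this example assumes, purely for illustration, that each overlapping fragment is hashed to a 64-bit integer by truncating its MD5 digest. The names `fingerprint`, `fingerprint_sequence`, and `build_fingerprint_db`, and the use of an in-memory dictionary as the "database", are all hypothetical.

```python
import hashlib
from typing import Dict, List


def fingerprint(segment: str) -> int:
    """Map one fragment to a 64-bit integer.

    Assumption: a truncated MD5 digest stands in for the patent's
    fingerprint function, which is not given in this excerpt.
    """
    digest = hashlib.md5(segment.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def fingerprint_sequence(text: str, interval: int = 4) -> List[int]:
    """Fingerprint every overlapping fragment of length `interval`."""
    if not text:
        return []
    count = max(len(text) - interval + 1, 1)
    return [fingerprint(text[i:i + interval]) for i in range(count)]


def build_fingerprint_db(texts: Dict[str, str], interval: int = 4) -> Dict[str, List[int]]:
    """Precompute the fingerprint sequence of texts 1..M, keyed by text id."""
    return {text_id: fingerprint_sequence(body, interval) for text_id, body in texts.items()}


if __name__ == "__main__":
    db = build_fingerprint_db({"text_1": "abcdef", "text_2": "abcdxx"})
    for text_id, sequence in db.items():
        print(text_id, len(sequence))
```

Because the database is built once in advance, the cost of fingerprinting the M texts is not paid again for each new target text.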

[0102] Step S202: Obtain the target text A input by the user.

[0103] Step S203: Interleave and c...

Abstract

A method for checking duplication of text is disclosed. The fingerprint sequence of each text to be checked for duplicates can be stored in a text fingerprint database in advance. After the target text is obtained, a target fingerprint sequence is generated, and the similar fingerprints of each fingerprint in the target fingerprint sequence are then calculated to obtain a similar fingerprint sequence. Finally, the fingerprint sequences in the text fingerprint database that include the target fingerprint sequence or the similar fingerprint sequence are determined; the texts corresponding to those fingerprint sequences are the texts similar to the target text. Because the method generates similar fingerprint sequences for the target fingerprint sequence, judging whether a text to be checked is similar to the target text only requires checking whether its fingerprint sequence includes the target fingerprint sequence or the similar fingerprint sequence. No similarity calculation between the text to be checked and the target text is needed, which saves calculation and improves the efficiency of text duplication checking. In addition, the present application also provides a text duplication checking device, equipment, and a computer-readable storage medium, the functions of which correspond to those of the above-described method.
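The lookup described in the abstract can be sketched as below. Two points are assumptions, since this excerpt does not define them: a "similar fingerprint" is taken to be any value within Hamming distance 1 of the original, and a stored sequence "includes" the target when it has a contiguous run whose i-th element is the i-th target fingerprint or one of its similar fingerprints. All function names are hypothetical, and the toy fingerprints are small integers for readability.

```python
from typing import Dict, List, Set


def similar_fingerprints(fp: int, bits: int = 8) -> Set[int]:
    """Return fp plus every value that differs from it in exactly one bit.

    Assumption: Hamming distance 1 stands in for the patent's notion of
    a similar fingerprint.
    """
    return {fp} | {fp ^ (1 << b) for b in range(bits)}


def contains_match(stored: List[int], candidates: List[Set[int]]) -> bool:
    """True if `stored` has a contiguous run matching `candidates` position by position."""
    n = len(candidates)
    if n == 0 or n > len(stored):
        return False
    return any(
        all(stored[start + i] in candidates[i] for i in range(n))
        for start in range(len(stored) - n + 1)
    )


def find_similar_texts(target_seq: List[int], db: Dict[str, List[int]], bits: int = 8) -> List[str]:
    """Return ids of stored texts whose fingerprint sequence includes the target or a similar sequence."""
    candidates = [similar_fingerprints(fp, bits) for fp in target_seq]
    return [text_id for text_id, stored in db.items() if contains_match(stored, candidates)]


if __name__ == "__main__":
    db = {
        "doc1": [0b1010, 0b0110, 0b1111],
        "doc2": [0b0001, 0b1000, 0b0100],
    }
    print(find_similar_texts([0b0110, 0b1110], db))
```

In this toy run only `doc1` is returned: its second fingerprint equals the first target fingerprint, and its third differs from the second target fingerprint in one bit. The check uses set membership only, so no pairwise similarity score between texts is computed.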

Description

Technical field

[0001] The present application relates to the field of text plagiarism checking, in particular to a text plagiarism checking method, device, equipment, and computer-readable storage medium.

Background technique

[0002] With the development of computer networks, information resources are increasing day by day. How to filter out duplicate content in a large amount of information has become a key issue.

[0003] Text plagiarism checking is the process of finding duplicate texts from a large number of texts based on a certain similarity model. It is widely used in search engine construction, plagiarism detection, news classification and other fields. The traditional text plagiarism check is to judge whether the similarity between the target text and the source text is greater than a threshold, so as to draw the conclusion whether the target text is a duplicate text.

[0004] However, this method of plagiarism checking needs to calculate the similarity between ...
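For contrast, the traditional threshold-based check described in paragraph [0003] might look like the sketch below. The similarity model is not named in the source; Jaccard similarity over character bigrams is used here only as a stand-in, and `char_ngrams`, `jaccard`, `traditional_check`, and the threshold value are hypothetical.

```python
from typing import Dict, List, Set


def char_ngrams(text: str, n: int = 2) -> Set[str]:
    """Set of all character n-grams in `text`."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a: str, b: str, n: int = 2) -> float:
    """Jaccard similarity of the n-gram sets of two texts (assumed similarity model)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def traditional_check(target: str, sources: Dict[str, str], threshold: float = 0.4) -> List[str]:
    """Compare the target against every source text; one similarity computation per source."""
    return [sid for sid, body in sources.items() if jaccard(target, body) > threshold]


if __name__ == "__main__":
    sources = {"s1": "abcd", "s2": "wxyz", "s3": "abce"}
    print(traditional_check("abcd", sources))
```

The loop in `traditional_check` runs one full similarity computation per source text, so its cost grows with the number of source texts.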

Application Information

IPC(8): G06F16/33, G06F17/22
CPC: G06F40/194
Inventors: 刘均, 秦文礼
Owner LAUNCH TECH CO LTD