Similar web page detection method, device, storage medium and electronic equipment

Web page similarity-rate technology, applied in the field of text recognition, that can address problems such as harm to the interests of original authors and the impact of plagiarized content on websites.

Active Publication Date: 2020-01-10
BEIJING BYTEDANCE NETWORK TECH CO LTD
Cites: 9; Cited by: 0

AI Technical Summary

Problems solved by technology

[0002] Plagiarism in website content submissions is not uncommon, and it is also common in online communities for multiple websites to carry similar content. This not only damages the interests of original authors but also harms websites that cannot identify plagiarized content.
Therefore, there is a need for a method that detects text similarity across the entire network, so that plagiarism in submissions can be identified and submissions copied from other websites do not go undetected.

Examples

Embodiment Construction

[0066] Specific embodiments of the present disclosure will be described in detail below in conjunction with the accompanying drawings. It should be understood that the specific embodiments described here are only used to illustrate and explain the present disclosure, and are not intended to limit the present disclosure.

[0067] Figure 1 is a flowchart of a method for detecting similar web pages according to an exemplary embodiment of the present disclosure. As shown in Figure 1, the method includes steps 101 to 104.

[0068] In step 101, a first preset number of target sentences is selected from the target text. When detecting web pages similar to the target text, some sentences can first be selected from the target text and used for searching; this takes much less time than searching with the entire target text, thereby improving the efficiency of similar web page detection. The value of the first preset number should preferably be less than the total number of all sent...
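The selection in step 101 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented procedure: the excerpt does not say how sentences are split or which ones are chosen, so the punctuation-based splitter and the "prefer the longest sentences" rule (longer queries tend to be more distinctive) are hypothetical choices, as are the names `split_sentences` and `select_target_sentences`.

```python
import re
from typing import List

# Assumed sentence delimiters: Chinese and Western end-of-sentence punctuation.
_SENTENCE_END = re.compile(r"(?<=[。！？!?.])\s*")


def split_sentences(text: str) -> List[str]:
    """Split text after each sentence-ending mark and drop empty pieces."""
    return [part.strip() for part in _SENTENCE_END.split(text) if part.strip()]


def select_target_sentences(text: str, first_preset_number: int) -> List[str]:
    """Choose up to `first_preset_number` sentences to use as search queries.

    Hypothetical selection rule: keep the longest sentences, then return
    them in their original order within the text.
    """
    sentences = split_sentences(text)
    n = min(first_preset_number, len(sentences))
    # Rank indices by length (descending), breaking ties by earlier position.
    longest = sorted(range(len(sentences)), key=lambda i: (-len(sentences[i]), i))[:n]
    return [sentences[i] for i in sorted(longest)]


# Example: the two longest of four sentences, in text order.
print(select_target_sentences(
    "Short one. This is a much longer sentence here. Mid size sentence! Tiny?", 2))
# prints ['This is a much longer sentence here.', 'Mid size sentence!']
```

Capping `n` at the sentence count keeps the sketch safe when the preset number exceeds the number of available sentences. The naive splitter also breaks on abbreviations and decimal points, which a real system would need to handle.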

Abstract

The invention relates to a similar web page detection method and device, a storage medium, and electronic equipment. The method comprises: selecting a first preset number of target sentences from a target text; searching each target sentence using a second preset number of search engines, and selecting a third preset number of target web pages from the search results according to a second preset rule; obtaining the web page text information of all the target web pages; and calculating a matching rate between the target text and the web page text information, and determining each web page whose matching rate is greater than a first preset threshold to be a web page similar to the target text. In this way, the target text to be identified can be segmented, search engines can be used to find target web pages whose content is similar to the target text, and the text information in those web pages can be matched against the target text. Web pages similar to the target text are thereby detected, making it easy to determine whether the target text plagiarizes content from other web pages.
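The four steps summarized in the abstract can be assembled into a rough end-to-end sketch. Every rule below is an assumption, because the abstract names the steps but not their details. Search engines and page fetching are passed in as plain callables, acting as hypothetical stand-ins for real engine APIs and an HTTP client. The "second preset rule" is assumed to rank URLs by how many result lists they appear in. The matching rate is assumed to be the share of the target text's sentences found verbatim in the page text. All function names are illustrative.

```python
from collections import Counter
from typing import Callable, Iterable, List

# Hypothetical type: a search engine maps a query string to a ranked list of URLs.
SearchEngine = Callable[[str], List[str]]


def select_target_webpages(query_sentences: Iterable[str],
                           engines: List[SearchEngine],
                           third_preset_number: int) -> List[str]:
    """Query every sentence on every engine and keep the most frequent URLs.

    Hypothetical "second preset rule": rank URLs by the number of result lists
    (one per sentence/engine pair) they appear in; ties keep first-seen order.
    """
    counts = Counter()
    for sentence in query_sentences:
        for engine in engines:
            # dict.fromkeys de-duplicates while keeping order, so a URL counts
            # at most once per result list.
            counts.update(dict.fromkeys(engine(sentence), 1))
    # most_common orders equal counts by first insertion.
    return [url for url, _ in counts.most_common(third_preset_number)]


def normalize(text: str) -> str:
    # Collapse whitespace runs so line breaks in scraped pages do not block matches.
    return " ".join(text.split())


def matching_rate(text_sentences: List[str], page_text: str) -> float:
    """Hypothetical matching rate: fraction of the target text's sentences that
    occur verbatim (after whitespace normalization) in the page text."""
    if not text_sentences:
        return 0.0
    page = normalize(page_text)
    hits = sum(1 for s in text_sentences if normalize(s) in page)
    return hits / len(text_sentences)


def detect_similar_pages(query_sentences: List[str],
                         text_sentences: List[str],
                         engines: List[SearchEngine],
                         fetch_text: Callable[[str], str],
                         third_preset_number: int,
                         first_preset_threshold: float) -> List[str]:
    """Return the target URLs whose matching rate exceeds the threshold."""
    urls = select_target_webpages(query_sentences, engines, third_preset_number)
    return [url for url in urls
            if matching_rate(text_sentences, fetch_text(url)) > first_preset_threshold]


if __name__ == "__main__":
    # Toy in-memory "web": two pages, one of which copies the target text.
    pages = {
        "https://a.example/copy": "Alpha beta gamma. Delta epsilon zeta. Unrelated text.",
        "https://b.example/other": "Something else entirely. Delta epsilon zeta.",
    }

    # Stub engines: one does substring search, the other returns every page.
    def engine_substring(query: str) -> List[str]:
        return [url for url, text in pages.items() if query in text]

    def engine_all(query: str) -> List[str]:
        return list(pages)

    sentences = ["Alpha beta gamma.", "Delta epsilon zeta."]
    print(detect_similar_pages(sentences, sentences, [engine_substring, engine_all],
                               pages.__getitem__, 2, 0.6))
    # prints ['https://a.example/copy']
```

In the toy run, the second page shares only one of the two sentences (rate 0.5), so it falls below the assumed 0.6 threshold. Keeping the engines and fetcher as injected callables lets the selection and matching logic be tested without network access.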

Description

Technical Field

[0001] The present disclosure relates to the field of text recognition, and in particular to a similar web page detection method, device, storage medium and electronic equipment.

Background Art

[0002] Plagiarism in website content submissions is not uncommon, and it is also common in online communities for multiple websites to carry similar content. This not only damages the interests of original authors but also harms websites that cannot identify plagiarized content. Therefore, there is a need for a method that detects text similarity across the entire network, so that plagiarism in submissions can be identified and submissions copied from other websites do not go undetected.

Summary of the Invention

[0003] The purpose of this disclosure is to provide a similar web page detection method, device, storage medium and electronic equipment,...

Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F16/953; G06F16/33
Inventor: 邹启波
Owner: BEIJING BYTEDANCE NETWORK TECH CO LTD