
Web crawler thesis duplicate checking method

A web crawler and thesis-checking technology, applied in the field of web crawlers, that addresses the problems of a heavy checking workload, a large volume of candidate articles, and low accuracy.

Inactive Publication Date: 2016-04-06
上海尧博信息科技有限公司

AI Technical Summary

Problems solved by technology

At present, college graduates must complete a qualified graduation thesis before they can graduate from university, and tutors must check and evaluate these theses. In this process, plagiarism checking is the most important and most labor-intensive task. In the traditional method, a student's paper is uploaded to a plagiarism-checking website, the system compares it against the articles in a paper database, and a series of highly similar articles is returned for manual verification. Because the web search information is not refined, the number of returned articles is large, the verification workload is heavy, and the accuracy is low. Web crawler technology can quickly and accurately obtain target information from the Internet, which exactly meets the needs of thesis plagiarism checking. For this reason, a web crawler thesis duplicate checking method is proposed.

Method used


Image

  • Web crawler thesis duplicate checking method

Examples


Embodiment Construction

[0017] A web crawler thesis duplicate checking method comprises the following specific steps:

[0018] The first step is to upload the paper that needs to be checked for duplication. The uploaded paper may be in various formats, such as WORD or PDF.

[0019] The second step is to extract the keyword groups from the paper. Keyword groups are phrases, paragraphs, and symbols that appear frequently in the article. An input window is also provided so that the user can enter keywords manually.

[0020] In the third step, the server searches for URLs of articles related to the keywords of the uploaded paper and retrieves the initial webpage information at those URLs; an information window displays the retrieved information.

[0021] The fourth step is to grab articles similar to the uploaded paper from the thesis information database.

[0022] The fifth step is to compare the captured article with...
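The patent does not specify the keyword-extraction or comparison algorithms. The sketch below illustrates the overall flow of steps 2, 5, and 6 under two explicit assumptions: frequency-based keyword extraction and Jaccard similarity over word sets. The function names (`extract_keywords`, `similarity`, `rank_by_similarity`) are hypothetical stand-ins, not names from the patent.

```python
from collections import Counter
import re


def extract_keywords(text, top_n=10):
    """Step 2 (assumed): frequency-based keyword extraction, returning
    the most common words in the uploaded paper. A real system would
    also filter stop words and handle phrases and symbols."""
    words = re.findall(r"\w+", text.lower())
    return [w for w, _ in Counter(words).most_common(top_n)]


def similarity(a, b):
    """Step 5 (assumed): Jaccard similarity over word sets, standing in
    for the patent's unspecified comparison measure."""
    sa = set(re.findall(r"\w+", a.lower()))
    sb = set(re.findall(r"\w+", b.lower()))
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0


def rank_by_similarity(uploaded, candidates):
    """Step 6: display candidate articles in descending similarity order."""
    return sorted(candidates, key=lambda c: similarity(uploaded, c), reverse=True)
```

A production system would replace the word-set comparison with a measure robust to paraphrasing, such as shingled n-gram overlap or sentence-level fingerprinting.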


PUM

No PUM data available.

Abstract

The invention discloses a web crawler thesis duplicate checking method comprising the following steps: 1) uploading a thesis to be checked for duplication; 2) extracting a keyword group from the thesis; 3) searching, by a server, for articles containing information associated with the keywords of the uploaded thesis; 4) grabbing articles similar to the uploaded thesis from a thesis information base; 5) comparing the grabbed articles with the uploaded thesis to obtain similarity scores; and 6) displaying the articles ranked by similarity. When web crawler technology is used for duplicate checking, the method first intelligently extracts keywords from the thesis and uses them to compare the similarity between articles in the information base and the uploaded thesis. Because the thesis information base is very large, matching similar articles manually would require great manpower and might still fail to find complete matches. By refining the duplicate-checking information, the method provides users with a convenient and effective way to check a thesis for duplication.

Description

Technical field

[0001] The invention relates to the technical field of web crawlers, in particular to a web crawler thesis duplicate checking method.

Background technique

[0002] A web crawler is a program that automatically extracts web pages. It downloads pages from the World Wide Web for search engines and is an important component of them. The crawler starts from the URLs of one or several initial webpages, obtains the URLs on those pages, and during crawling continuously extracts new URLs from the current page and puts them into a queue, until a certain stop condition of the system is met. A web crawler is thus a program or script that automatically grabs information on the World Wide Web according to certain rules; other, less common names include ant, autoindex, emulator, and worm. At present, college graduates must complete a qualified graduation thesis before they can successfully graduate from the university. T...
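The crawling process described above (seed URLs, a queue of newly extracted URLs, and a stop condition) can be sketched as a breadth-first traversal. This is a minimal illustration, not the patent's implementation; `fetch_page` and `extract_urls` are caller-supplied stand-ins for real HTTP fetching and link parsing.

```python
from collections import deque


def crawl(seed_urls, fetch_page, extract_urls, max_pages=100):
    """Breadth-first crawl: start from the seed URLs, repeatedly fetch a
    page, extract new URLs from it into the queue, and stop when the
    queue is empty or a stop condition (here, a page limit) is met."""
    queue = deque(seed_urls)
    seen = set(seed_urls)          # avoid re-visiting URLs
    pages = []
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        content = fetch_page(url)  # stand-in for an HTTP request
        pages.append((url, content))
        for link in extract_urls(content):  # stand-in for link parsing
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

For example, with a toy link graph `{"a": ["b", "c"], "b": ["c"], "c": []}`, `fetch_page` as a dictionary lookup, and `extract_urls` as the identity function, the crawl visits the pages in breadth-first order starting from `"a"`.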

Claims


Application Information

Patent Timeline
No application timeline available.
Patent Type & Authority: Application (China)
IPC(8): G06F17/30
Inventor 姚王平
Owner 上海尧博信息科技有限公司