
Theme webpage crawling method and theme crawler system

A theme crawler technology, applied in special data processing applications, instruments, electronic digital data processing and related fields. It addresses the problem that general search engines cannot meet the personalized needs of users, and achieves a good user experience.

Active Publication Date: 2018-12-07
JILIN UNIV

AI Technical Summary

Problems solved by technology

At present, commonly used search engines such as Google and Baidu are all general search engines, which try to obtain all the resources on the Internet. However, people's needs are diverse: users often want webpage content on a specified topic, and general search engines cannot meet such personalized needs.



Examples


Detailed Description of the Embodiments

[0056] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0057] An embodiment of the present invention provides a method for crawling theme webpages, which is used to crawl webpages related to a specified theme. Figure 1 shows a schematic flow chart of the method, which may include:

[0058] Step S101: Obtain uncrawled links from the first set of links to be crawled.

[0059] Wherein, the first set of links to be crawled includes pre-acquired seed links.
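Step S101 can be illustrated with a minimal sketch of a link set that is initialized from seed links and hands out links that have not yet been crawled. The class name `LinkSet` and its methods are hypothetical and chosen only for illustration; the patent text does not specify a data structure.

```python
from collections import deque


class LinkSet:
    """Hypothetical container for a set of links to be crawled."""

    def __init__(self, seed_links):
        self._pending = deque()  # links not yet handed out, in insertion order
        self._known = set()      # every link ever added, to avoid duplicates
        for link in seed_links:
            self.add(link)

    def add(self, link):
        """Add a link unless it has already been seen."""
        if link not in self._known:
            self._known.add(link)
            self._pending.append(link)

    def next_uncrawled(self):
        """Return the next uncrawled link, or None if the set is exhausted."""
        return self._pending.popleft() if self._pending else None
```

A caller would build the set from its seed links and call `next_uncrawled()` until it returns `None`, at which point the first set is exhausted.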

[0060...


Abstract

The invention provides a theme webpage crawling method and a theme crawler system. The method comprises: obtaining un-crawled links from a first to-be-crawled link set that includes seed links; determining a first correlation degree and a second correlation degree for the target webpage corresponding to each obtained link, where the first correlation degree is the correlation between the target text content in the target webpage and a specified theme, and the second correlation degree is the correlation between the target links and the specified theme; determining a temperature value of the target webpage according to the first and second correlation degrees, and storing the to-be-displayed content of the target webpage; placing the target links into a second to-be-crawled link set if the temperature value of the target webpage is greater than or equal to a preset temperature value; and, if no un-obtained links remain in the first to-be-crawled link set, obtaining from the second to-be-crawled link set the un-crawled links whose correlation degree with the specified theme is the highest and crawling them. With the method and the system, a user can obtain a large number of webpages related to the specified theme from a network.

Description

Technical field

[0001] The present invention relates to the technical field of webpage crawling, in particular to a method for crawling theme webpages and a theme crawler system.

Background technique

[0002] With the rapid development of the Internet, people have entered an era of information explosion, and all kinds of information flood every aspect of life. Search engines emerged to make information easier to obtain: they let people quickly retrieve information from many web pages and have improved the efficiency of access to information. At present, commonly used search engines such as Google and Baidu are all general search engines, which try to obtain all the resources on the Internet. However, people's needs are diverse: users often want webpage content on a specified topic, and general search engines cannot meet such personalized needs.

Contents of th...


Application Information

IPC(8): G06F17/30
Inventor: 彭涛, 包铁, 徐凯旋, 张雪松, 王上
Owner: JILIN UNIV