A method and device for web page information update discovery and statistics

The technology relates to web page information and web page technology, applied in computing and special data processing applications. It addresses the inability to obtain new news sources and the inability to track statistics on news sources.

Active Publication Date: 2017-11-28
UNIV OF ELECTRONICS SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0003] Aiming at the deficiencies of the prior art, the present invention provides a method and device for web page information update discovery and statistics. It solves the problem that traditional RSS news subscription cannot obtain new news sources and cannot track and count news sources, and it estimates the update cycle and update frequency of news sources in the course of analyzing them.

Examples

Embodiment Construction

[0033] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0034] Referring to the accompanying drawings, the steps of a method for web page information update discovery and statistics are as follows:

[0035] (1) The URL-grabbing module takes the overall domain name to be searched as input, for example www.news.baidu.com, and grabs web page links from it;

[0036] (2) The news link module judges whether the web page link is a news link; if it is, proceed to step (3), otherwise return to step (1). Whether the URL is a news link is judged as follows:

[0037] (21) Use regular expression matching to judge whether the captured web page link address string (for example www.news.baidu.com) contains the keyword "news"; if so, go to step (22), otherwise continue judging and record the number of links;

[0038] (22) Follow the web page link to grab the web page it points to, and judge whether all the ch...
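The keyword test in step (21) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the names `is_news_link_candidate` and `filter_news_candidates` are hypothetical, the case-insensitive match is an assumption, and "record the number of links" is read here as counting the links that fail the test. Step (22) is truncated in the source and is not modeled.

```python
import re

# Assumption: the keyword test is a plain, case-insensitive search for "news".
NEWS_PATTERN = re.compile(r"news", re.IGNORECASE)


def is_news_link_candidate(url: str) -> bool:
    """Step (21): does the link address string contain the keyword "news"?"""
    return NEWS_PATTERN.search(url) is not None


def filter_news_candidates(urls):
    """Return (links that pass step (21), number of links that did not)."""
    candidates = []
    rejected_count = 0  # assumption: the recorded number counts non-matching links
    for url in urls:
        if is_news_link_candidate(url):
            candidates.append(url)  # these would go on to step (22)
        else:
            rejected_count += 1
    return candidates, rejected_count
```

A link such as www.news.baidu.com passes this test, while a link whose address string lacks the keyword is only counted.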

Abstract

The invention discloses a method and device for web page information update discovery and statistics. It belongs to the field of computer applications and solves the problem that traditional RSS news subscription cannot obtain new news sources or track and count news sources. The specific steps are as follows: (1) the URL-grabbing module grabs web page links; (2) the news link module judges whether a web page link is a news link; if so, go to step (3), otherwise go to step (1); (3) the news source module judges whether the web page link is a news source; if so, go to step (4), otherwise go to step (1); (4) the news information statistics module obtains the update time of the web page and calculates the update cycle of the news source. The invention can dynamically grab and analyze the obtained news sources.
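Step (4) can be illustrated with a short Python sketch. The source does not state how the update cycle is computed, so the formula here is an assumption: the cycle is taken as the mean gap between consecutive observed update times, and the update frequency as the number of such cycles per day. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def estimate_update_cycle(update_times):
    """Mean gap between consecutive update times, or None with fewer than two."""
    times = sorted(update_times)
    if len(times) < 2:
        return None
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gaps, timedelta()) / len(gaps)


def updates_per_day(cycle):
    """Convert an update cycle into an update frequency in updates per day."""
    if cycle is None or cycle.total_seconds() == 0:
        return None
    return timedelta(days=1) / cycle


# Example with three observed update times (illustrative values only).
observed = [
    datetime(2017, 1, 1, 0, 0),
    datetime(2017, 1, 1, 6, 0),
    datetime(2017, 1, 1, 18, 0),
]
cycle = estimate_update_cycle(observed)
frequency = updates_per_day(cycle)
```

A mean is sensitive to a single long gap; a median of the gaps would be a reasonable alternative if a news source updates irregularly.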

Description

Technical field

[0001] The invention relates to a method and device for web page information update discovery and statistics, used for dynamically grabbing and analyzing obtained news sources, and belongs to the field of computer applications.

Background technique

[0002] A web crawler is a program that automatically extracts web pages. It downloads web pages from the World Wide Web for search engines and is an important component of search engines. A traditional crawler starts from the URLs of one or several initial web pages, obtains the URLs on those pages, and, while crawling, continuously extracts new URLs from the current page and puts them into a queue until a certain stop condition of the system is met. The workflow of a focused crawler is more complicated. It needs to filter out links that have nothing to do with the topic according to a certain web page analysis algorithm, keep the useful links, and put them into the URL queue waiting to be crawled. Then, it will s...
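The queue-driven crawl described in the background can be sketched in Python as below. To stay self-contained the sketch does no network access: `get_links` is a hypothetical callback that returns the links found on a page, `max_pages` stands in for the unspecified stop condition, and `keep_link` is an assumed hook for the topic filter of a focused crawler (it accepts every link by default, which gives the traditional behavior).

```python
from collections import deque


def crawl(seed_urls, get_links, max_pages=100, keep_link=lambda url: True):
    """Breadth-first crawl; returns URLs in the order they were visited."""
    queue = deque()
    seen = set()
    for seed in seed_urls:
        if seed not in seen:
            seen.add(seed)
            queue.append(seed)

    visited = []
    while queue and len(visited) < max_pages:  # assumed stop condition
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            # A focused crawler would discard off-topic links here.
            if link not in seen and keep_link(link):
                seen.add(link)
                queue.append(link)
    return visited


# Toy link graph standing in for real pages (illustrative only).
graph = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
order = crawl(["a"], lambda url: graph.get(url, []))
```

The `seen` set keeps a page from being queued twice, which matters because real link graphs contain cycles such as the one between "a" and "c" above.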

Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F17/30
CPC: G06F16/9566
Inventor: 张小松, 戴中印, 牛伟纳, 王标, 何永强
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA