A method and device for web page information update discovery and statistics

Web page information technology, applied in computing, special data processing applications, instruments, etc., can solve the problems of being unable to obtain new news sources and unable to track statistics of news sources.

Active Publication Date: 2017-11-28

UNIV OF ELECTRONICS SCI & TECH OF CHINA

5 Cites, 0 Cited by


Problems solved by technology

[0003] To address the deficiencies of the prior art, the present invention provides a method and device for web page information update discovery and statistics. It solves the problem that traditional RSS news subscription cannot obtain new news sources and cannot track and count news sources, and it estimates the update cycle and update frequency of news sources in the process of analyzing them.


Embodiment Construction

[0033] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0034] Referring to the accompanying drawings, a method for web page information update discovery and statistics comprises the following steps:

[0035] (1) The URL grabbing module receives the overall domain name to be searched (for example, www.news.baidu.com) and grabs web page links;

[0036] (2) The news link module determines whether a web page link is a news link; if so, proceed to step (3), otherwise return to step (1). Whether a URL is a news link is judged as follows:

[0037] (21) Use regular expression matching to judge whether the captured web page link address string (for example, www.news.baidu.com) contains the keyword "news"; if so, go to step (22), otherwise continue judging and record the number of links;

[0038] (22) Grab the web page that the link points to, and judge whether all the ch...
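The keyword check in step (21) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_by_news_keyword`, the case-insensitive match, and the reading of "record number of links" as a count of non-matching links are all assumptions.

```python
import re

# Assumption: a plain, case-insensitive search for the substring "news".
NEWS_PATTERN = re.compile(r"news", re.IGNORECASE)


def split_by_news_keyword(urls):
    """Return (candidate_links, skipped_count) for a list of URL strings.

    Candidates contain the keyword "news" and would move on to step (22);
    every other link is only counted (hypothetical reading of step (21)).
    """
    candidates = []
    skipped = 0
    for url in urls:
        if NEWS_PATTERN.search(url):
            candidates.append(url)
        else:
            skipped += 1
    return candidates, skipped


links = ["www.news.baidu.com", "www.example.com/about"]
candidates, skipped = split_by_news_keyword(links)
```

A substring match of this kind is only a first filter; step (22) still has to inspect the page that each candidate link points to.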


Abstract

The invention discloses a method and device for web page information update discovery and statistics, which belongs to the field of computer applications and solves the problem that traditional RSS news subscription cannot obtain new news sources or track and count news sources. The specific steps are as follows: (1) the URL grabbing module grabs web page links; (2) the news link module judges whether a web page link is a news link; if so, go to step (3), otherwise go to step (1); (3) the news source module determines whether the web page link is a news source; if so, proceed to step (4), otherwise return to step (1); (4) the news information statistics module obtains the update time of the web page and calculates the update cycle of the news source. The invention can dynamically capture and analyze the obtained news sources.
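Step (4) can be illustrated with a short sketch. The text available here does not say how the update cycle is calculated, so the example assumes the simplest estimate: the mean gap between consecutive observed update times, with the update frequency taken as its reciprocal. The function names and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def estimate_update_cycle(update_times):
    """Mean interval between consecutive update times, or None if unknown.

    Assumption: the update cycle is the arithmetic mean of the gaps.
    """
    times = sorted(update_times)
    if len(times) < 2:
        return None  # a single observation gives no interval
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gaps, timedelta()) / len(gaps)


def updates_per_day(cycle):
    """Update frequency as the number of cycles that fit into one day."""
    if cycle is None or cycle <= timedelta(0):
        return None
    return timedelta(days=1) / cycle


# Hypothetical update times observed for one news source.
observed = [
    datetime(2017, 1, 1, 8, 0),
    datetime(2017, 1, 1, 10, 0),
    datetime(2017, 1, 1, 14, 0),
]
cycle = estimate_update_cycle(observed)   # gaps of 2 h and 4 h
frequency = updates_per_day(cycle)
```

A mean is sensitive to long quiet periods such as nights or weekends; a median or a windowed estimate would be an equally plausible choice under the same assumptions.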

Description

Technical field

[0001] A method and device for web page information update discovery and statistics, used for dynamically grabbing and analyzing obtained news sources, belonging to the field of computer applications.

Background technique

[0002] A web crawler is a program that automatically extracts web pages. It downloads web pages from the World Wide Web for search engines and is an important component of search engines. A traditional crawler starts from the URLs of one or several initial web pages and obtains the URLs on those pages; while crawling, it continuously extracts new URLs from the current page and puts them into a queue until a certain stop condition of the system is met. The workflow of a focused crawler is more complicated. It needs to filter out links that are unrelated to the topic according to a certain web page analysis algorithm, keep the useful links, and put them into the queue of URLs waiting to be crawled. Then, it will s...
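The traditional crawler loop described in the background can be sketched as a breadth-first traversal over a URL queue. This is a generic illustration, not the method of the invention: `fetch_links` is a hypothetical callable that returns the links found on a page, `max_pages` stands in for the unspecified stop condition, and an in-memory dictionary replaces real network access.

```python
from collections import deque


def crawl(seed_urls, fetch_links, max_pages=100):
    """Visit pages breadth-first, starting from the seed URLs.

    fetch_links(url) must return an iterable of URLs found on that page.
    Crawling stops when the queue is empty or max_pages pages were visited.
    """
    seeds = list(dict.fromkeys(seed_urls))  # drop duplicate seeds, keep order
    queue = deque(seeds)
    seen = set(seeds)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:  # enqueue each URL at most once
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical link graph standing in for downloaded pages.
link_graph = {"a": ["b", "c"], "b": ["c", "d"], "c": [], "d": ["a"]}
order = crawl(["a"], lambda url: link_graph.get(url, []))
```

A focused crawler would differ at the point where links are enqueued: it would first apply a topic filter and keep only the links judged relevant.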


Application Information


Patent Type & Authority: Patents (China)

IPC(8): G06F17/30

CPC: G06F16/9566

Inventor: 张小松, 戴中印, 牛伟纳, 王标, 何永强

Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA
