
Method and system for obtaining web page content

A technology for acquiring web page content, in the field of information retrieval. It addresses wasted resources, redundant crawling, and the inability to acquire information in real time, and it relieves pressure on target websites, reduces crawler workload, and improves real-time information acquisition.

Active Publication Date: 2015-09-09
CHINA TELECOM CORP LTD
Cites: 6 | Cited by: 0

AI Technical Summary

Problems solved by technology

[0003] At present, search engines mostly use crawlers to obtain web page information. Because crawlers continuously follow and fetch the links contained in web pages, information acquisition is inefficient and a great deal of resources is wasted. Moreover, because this approach generates a huge number of visits, an update to the content at a given web page location cannot be obtained as soon as it occurs, so real-time information is essentially impossible to present.
[0004] Specifically, the following problems of the crawling approach seriously affect the real-time acquisition of web page information: (1) the crawler must redundantly crawl large numbers of irrelevant or duplicate web pages, which is very inefficient; (2) content updates to a web page cannot be obtained in real time; (3) to obtain information from the same web page, the crawler must visit it repeatedly, which puts heavy pressure on the server and on bandwidth.
[0005] Consequently, the traditional method of acquiring web page information cannot deliver real-time information and therefore cannot meet practical needs.



Examples


Embodiment Construction

[0030] The present invention will be described more fully hereinafter with reference to the accompanying drawings, in which exemplary embodiments of the invention are illustrated. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention, but do not constitute an improper limitation of the present invention.

[0031] A very difficult problem in realizing real-time search is finding and obtaining users' updated data among a vast amount of network information. To capture updates to web page content as soon as they occur, the web page content acquisition method of the present invention, which is based on a reporting trigger condition, has a web page actively report its information when it meets the reporting trigger condition, so that the search platform can then acquire the web page content. For example, when a blogger updates an article that day, the web page actively reports infor...
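The source does not state what the reporting trigger condition is or what the report contains. The following is a minimal sketch of the website side, assuming purely for illustration that the trigger is "the page content differs from what was last reported" (matching the blogger-update example) and that the report is a small JSON message carrying only the page URL and a timestamp. `ReportingSite`, `meets_trigger`, `on_page_saved`, and the payload fields are hypothetical names, and `outbox` stands in for whatever channel actually delivers reports to the search platform.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field


def _digest(content: str) -> str:
    # Fingerprint of the page content, used to detect changes cheaply.
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


@dataclass
class ReportingSite:
    """Hypothetical website-side reporter (illustrative assumption, not the patented design)."""

    last_digest: dict = field(default_factory=dict)  # URL -> digest of last reported content
    outbox: list = field(default_factory=list)  # stand-in for sending reports to the search platform

    def meets_trigger(self, url: str, content: str) -> bool:
        # Assumed trigger condition: the content changed since the last report
        # (a page never reported before always qualifies).
        return self.last_digest.get(url) != _digest(content)

    def on_page_saved(self, url: str, content: str) -> bool:
        """Called when the site publishes or edits a page; returns True if a report was sent."""
        if not self.meets_trigger(url, content):
            return False
        self.last_digest[url] = _digest(content)
        # Only page information is reported; the crawler fetches the content itself later.
        report = {"url": url, "updated_at": int(time.time())}
        self.outbox.append(json.dumps(report))
        return True
```

Saving the same post twice with unchanged content produces only one report, which is the property that avoids the repeated visits described in [0004].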



Abstract

The invention discloses a method and system for obtaining web page content. The method comprises the following steps: determining whether a web page on a website satisfies a reporting trigger condition; when the web page satisfies the reporting trigger condition, reporting the web page information through the website; and, according to the reported web page information, having an allocation server assign a crawler to capture the web page content from that web page. In this method and system, web page information is reported only when the web page satisfies the reporting trigger condition, and the crawler captures content from the designated web page according to that information. This reduces the crawler's workload, relieves pressure on the target website, improves the ability to obtain real-time information, and creates favorable conditions for real-time search.
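The abstract says the allocation server assigns a crawler according to the reported web page information but does not say how. The sketch below is one possible reading, assuming round-robin assignment across crawlers and de-duplication of URLs already waiting in the queue, so that repeated reports for the same page do not trigger redundant crawls. `AllocationServer`, `receive_report`, `dispatch`, and the `fetch` callback (a stand-in for the real HTTP fetch) are hypothetical.

```python
import itertools
import json
from collections import deque
from typing import Callable, Dict, List


class AllocationServer:
    """Hypothetical allocation server: turns reported web page information into crawl tasks."""

    def __init__(self, crawler_ids: List[str]):
        self._crawlers = itertools.cycle(crawler_ids)  # assumed round-robin assignment policy
        self._queue: deque = deque()  # reported URLs waiting to be crawled, in arrival order
        self._pending = set()  # URLs already queued, used to drop duplicate reports

    def receive_report(self, raw_report: str) -> None:
        """Accept one JSON report such as {"url": ...} from a website."""
        url = json.loads(raw_report)["url"]
        if url not in self._pending:
            self._pending.add(url)
            self._queue.append(url)

    def dispatch(self, fetch: Callable[[str, str], str]) -> Dict[str, str]:
        """Assign each queued URL to a crawler and capture only that page's content."""
        results = {}
        while self._queue:
            url = self._queue.popleft()
            self._pending.discard(url)
            crawler = next(self._crawlers)
            results[url] = fetch(crawler, url)
        return results
```

For example, with crawlers `"crawler-a"` and `"crawler-b"`, reports for `http://a.example/1`, `http://b.example/2`, and `http://a.example/1` again yield exactly two fetches, one per crawler. Crawlers visit only reported pages instead of following every link, which is where the reduced workload claimed in the abstract would come from.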

Description

Technical Field

[0001] The present invention relates to the field of information retrieval, and more specifically to a method and system for acquiring web page content.

Background Art

[0002] With the emergence of large numbers of blogs, microblogs, and similar websites on the Internet, users have high requirements for the real-time acquisition of network content, and the need to manage sudden, massive volumes of information has made real-time information acquisition the most important concern.

[0003] At present, search engines mostly use crawlers to obtain web page information. As the links contained in web pages are continuously followed and fetched, information acquisition becomes inefficient and a great deal of resources is wasted. Moreover, because this approach generates a huge number of visits, an update to the content at a given web page location cannot be obtained as soon as it occurs, and the presentation of real-time information cann...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): H04L29/06, H04L29/08, G06F17/30
Inventors: 王爱宝 (Wang Aibao), 张涛 (Zhang Tao), 李屹 (Li Yi), 杨德利 (Yang Deli)
Owner CHINA TELECOM CORP LTD