
Web page acquisition method and device

A web page acquisition technology in the field of network processing. It addresses low web page coverage and the inability to obtain web page information effectively, and achieves precise processing operations and increased web page coverage.

Active Publication Date: 2017-11-03
人民日报媒体技术股份有限公司 (People's Daily Media Technology Co., Ltd.)
Cites: 5; Cited by: 0

AI Technical Summary

Problems solved by technology

[0005] The technical problem to be solved by this application is to provide a web page acquisition method that overcomes two shortcomings of the prior art: low web page coverage and the inability to obtain web page information effectively.



Examples


Embodiment 1

[0162] Corresponding to Embodiment 1 of the webpage acquisition method of the present application, and referring to Figure 4, the present application also provides Embodiment 1 of a device for acquiring a webpage. In this embodiment, the device may include:

[0163] The first determining module 401 is configured to determine a first hub webpage among the crawled webpages.

[0164] The parsing module 402 is configured to parse the page-turning information contained in the first hub webpage; the page-turning information includes a page-turning link address.

[0165] The address generating module 403 is configured to generate, according to the page-turning information, a second hub webpage address related to the first hub webpage.

[0166] The second hub webpage addresses may be the addresses of all hub webpages related to the first hub webpage, or the addresses of a preset number of such hub webpages.

[0167] Therefore, the address generat...
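Paragraph [0166] says the address generating module 403 can produce either all related hub addresses or only a preset number of them. The sketch below illustrates one way this could be done and is not the patent's method: it assumes (hypothetically) that page-turning links differ only in one integer field, such as `list_2.html` and `list_5.html`, and that pages are numbered from 1 with the same pattern. `generate_hub_addresses` and `preset_count` are illustrative names.

```python
import re


def generate_hub_addresses(page_links, preset_count=None):
    """Expand page-turning link addresses into a run of hub page addresses.

    Hypothetical assumptions: the links differ in exactly one integer field,
    and pages are numbered from 1 using the same URL pattern.
    preset_count=None -> every page up to the highest number seen;
    an integer -> at most that many pages.
    """
    # re.split with a capturing group keeps the digit runs at the odd indices.
    parts = [re.split(r"(\d+)", url) for url in page_links]
    if len(parts) < 2 or len({len(p) for p in parts}) != 1:
        return []
    width = len(parts[0])
    differs = [i for i in range(width) if len({p[i] for p in parts}) > 1]
    # Exactly one field may vary, and it must be a numeric field (odd index).
    if len(differs) != 1 or differs[0] % 2 == 0:
        return []
    field = differs[0]
    last_page = max(int(p[field]) for p in parts)
    count = last_page if preset_count is None else min(last_page, preset_count)
    addresses = []
    for page in range(1, count + 1):
        pieces = list(parts[0])
        pieces[field] = str(page)
        addresses.append("".join(pieces))
    return addresses
```

For example, links to pages 2 and 5 of a list yield the addresses of pages 1 through 5, or only pages 1 through 3 with `preset_count=3`. Sites whose first page uses a different URL, or that zero-pad page numbers, would need extra handling.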

Embodiment 2

[0172] Corresponding to Embodiment 2 of the webpage acquisition method of the present application, and referring to Figure 5, the present application also provides Embodiment 2 of a device for acquiring a webpage. In this embodiment, the device may specifically include:

[0173] The first determining module 501 is configured to determine a first hub webpage among the crawled webpages.

[0174] The parsing module 502 is configured to parse the page-turning information contained in the first hub webpage; the page-turning information includes a page-turning link address.

[0175] The parsing module 502 may include:

[0176] The parsing sub-module 5021 is configured to parse the webpage content of the first hub webpage and to identify, within that content, the page-turning area, that is, the content containing page-turning keywords and repeated link content.

[0177] The page-turning information determining module 5022 is configured to determine page-turning information...
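Sub-module 5021 looks for the area of a hub page that contains page-turning keywords together with repeated link content. The sketch below shows one way this could work; it is an illustration under stated assumptions, not the patent's implementation. It groups links by their innermost block element and treats a block as the page-turning area when it contains a keyword such as "下一页" (next page) and at least two links whose URLs differ only in their digits. It then returns that area's absolute page-turning link addresses. `parse_page_turning_links`, `BLOCK_TAGS`, and `PAGE_KEYWORDS` are hypothetical names and values.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical values: which tags delimit an "area", which labels mark page turning.
BLOCK_TAGS = {"div", "ul", "ol", "nav", "p", "td"}
PAGE_KEYWORDS = {"next", "previous", "prev", "last", "下一页", "上一页", "尾页", "末页"}


class _BlockAnchors(HTMLParser):
    """Groups <a> links by the innermost block element that contains them."""

    def __init__(self):
        super().__init__()
        self.stack = [0]        # ids of currently open blocks; 0 = document root
        self.blocks = {0: []}   # block id -> list of (href, anchor text)
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            block_id = len(self.blocks)
            self.blocks[block_id] = []
            self.stack.append(block_id)
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_endtag(self, tag):
        # Assumes reasonably well-formed markup; mismatched tags are not repaired.
        if tag in BLOCK_TAGS and len(self.stack) > 1:
            self.stack.pop()
        elif tag == "a" and self._href is not None:
            self.blocks[self.stack[-1]].append((self._href, "".join(self._text).strip()))
            self._href = None

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)


def _shape(href):
    # Collapse digit runs so "list_2.html" and "list_3.html" share one shape.
    return re.sub(r"\d+", "#", href)


def parse_page_turning_links(html, base_url, min_repeats=2):
    """Return absolute page-turning link addresses from the best area, or []."""
    parser = _BlockAnchors()
    parser.feed(html)
    parser.close()
    best, best_score = [], 0
    for anchors in parser.blocks.values():
        has_keyword = any(text.lower() in PAGE_KEYWORDS for _, text in anchors)
        repeats = max(Counter(_shape(h) for h, _ in anchors).values(), default=0)
        if has_keyword and repeats >= min_repeats and repeats > best_score:
            best, best_score = anchors, repeats
    links = []
    for href, _ in best:
        url = urljoin(base_url, href)
        if url not in links:  # keep first occurrence, drop duplicates
            links.append(url)
    return links
```

Requiring both signals matters: an article list also has repeated link shapes, but it lacks page-turning keywords, so it is not mistaken for the pager.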



Abstract

The present application provides a web page acquisition method and device. The method comprises: determining a first hub web page among the crawled web pages; parsing the page-turning information contained in the first hub web page, the page-turning information including a page-turning link address; generating, according to the page-turning information, a second hub webpage address related to the first hub webpage; and obtaining content webpages according to the second hub webpage address. The embodiments of the present application improve webpage coverage during webpage acquisition, so that more comprehensive network information can be obtained.
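The four steps of the abstract can be wired into a single loop. The sketch below is hypothetical: `acquire_content_pages` and its step-function parameters are illustrative names, not interfaces defined by the patent. Each step is passed in as a function so that any hub test, page-turning parser, or address generator can be plugged in. The function returns the set of content-page URLs it finds and leaves downloading those pages to the caller.

```python
from typing import Callable


def acquire_content_pages(
    crawled: dict[str, str],
    fetch: Callable[[str], str],
    is_hub: Callable[[str], bool],
    parse_page_links: Callable[[str, str], list[str]],
    generate_addresses: Callable[[list[str]], list[str]],
    extract_content_links: Callable[[str, str], list[str]],
) -> set[str]:
    """Run the abstract's steps over already-crawled pages (url -> html)."""
    content_urls: set[str] = set()
    for url, html in crawled.items():
        # Step 1: keep only crawled pages judged to be (first) hub webpages.
        if not is_hub(html):
            continue
        # Step 2: parse page-turning information, i.e. page-turning link addresses.
        page_links = parse_page_links(html, url)
        # Step 3: generate second hub webpage addresses from that information.
        hub_urls = {url, *generate_addresses(page_links)}
        # Step 4: collect content webpage addresses from every hub page;
        # the first hub page is reused, the others are fetched.
        for hub_url in hub_urls:
            hub_html = html if hub_url == url else fetch(hub_url)
            content_urls.update(extract_content_links(hub_html, hub_url))
    return content_urls
```

Passing the steps as parameters keeps the control flow of the claimed method separate from the heuristics, which the abstract does not specify.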

Description

Technical field

[0001] The present application relates to the technical field of network processing, and in particular to a method and device for acquiring webpages.

Background

[0002] With the development of Internet technology, the amount of Internet information keeps growing and is updated ever faster. How to obtain Internet information in a timely and comprehensive manner, so as to provide better network services, has therefore become an increasingly important research focus.

[0003] In network services such as web search, public opinion monitoring, and web mining, Internet information is obtained through webpage acquisition: by obtaining a content webpage, the Internet information it carries can be obtained. In the prior art, webpages are usually obtained by crawling hub (central) webpages, that is, webpages whose content is built around webpage link addresses, and then polling and grabbing...
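The background describes hub webpages as pages built around link addresses. The source does not say how a crawler decides that a page is a hub. One common heuristic, assumed here rather than taken from the patent, is to measure how much of the page's visible text sits inside links. In the sketch below, `is_hub_page` and its thresholds `min_links` and `min_link_ratio` are hypothetical.

```python
from html.parser import HTMLParser


class _LinkTextCounter(HTMLParser):
    """Counts visible text characters, separating those inside <a> tags."""

    def __init__(self):
        super().__init__()
        self.link_depth = 0     # nesting depth of open <a> elements
        self.skip_depth = 0     # inside <script>/<style>, whose text is not visible
        self.link_chars = 0
        self.total_chars = 0
        self.link_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
            self.link_count += 1
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1
        elif tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth:
            return
        n = len(data.strip())
        self.total_chars += n
        if self.link_depth:
            self.link_chars += n


def is_hub_page(html, min_links=10, min_link_ratio=0.5):
    """Hypothetical test: many links, and most visible text is link text."""
    counter = _LinkTextCounter()
    counter.feed(html)
    counter.close()
    if counter.total_chars == 0 or counter.link_count < min_links:
        return False
    return counter.link_chars / counter.total_chars >= min_link_ratio
```

A list page of article titles scores close to 1.0 on the link-text ratio, while an article page with a long body and a few navigation links scores low; both thresholds would need tuning per site.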


Application Information

Patent Type & Authority: Patent (China)
IPC (8th edition): G06F17/30
Inventors: 于维纬 (Yu Weiwei), 刘卓 (Liu Zhuo)
Owner: 人民日报媒体技术股份有限公司 (People's Daily Media Technology Co., Ltd.)