Web page acquisition method and device
A web page acquisition technology, applied in the field of network processing, that addresses problems such as low web page coverage and the inability to obtain web page information effectively, and that achieves more precise processing operations and higher web page coverage.
Examples
Embodiment 1
[0162] Corresponding to Embodiment 1 of the webpage acquisition method of the present application, and referring to Figure 4, the present application also provides Embodiment 1 of a device for obtaining a webpage. In this embodiment, the device may include:
[0163] The first determining module 401 is configured to determine the first hub webpage in the crawled webpages.
[0164] The parsing module 402 is configured to parse the page-turning information contained in the first hub webpage, and the page-turning information includes a page-turning link address.
[0165] The address generating module 403 is configured to generate a second hub webpage address related to the first hub webpage according to the page turning information.
[0166] The second hub webpage address may comprise the addresses of all hub webpages related to the first hub webpage, or the addresses of a preset number of hub webpages related to the first hub webpage.
[0167] Therefore, the address generat...
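The role of the address generating module 403 can be illustrated with a minimal sketch. The source does not state how addresses are derived from the page-turning link, so this example makes an assumption: the last run of digits in the page-turning link address is treated as the page number and is incremented. The function name `generate_hub_addresses`, the parameter `preset_count`, and the example URL are hypothetical.

```python
import re


def generate_hub_addresses(page_turn_url, preset_count):
    """Derive up to preset_count hub webpage addresses from one page-turning link.

    Assumption (not stated in the source): the last run of digits in the
    address is the page number, and related hub pages use consecutive numbers.
    """
    matches = list(re.finditer(r"\d+", page_turn_url))
    if not matches:
        # No page number to vary, so only the link itself can be returned.
        return [page_turn_url][:preset_count]

    last = matches[-1]
    start_page = int(last.group())
    prefix = page_turn_url[:last.start()]
    suffix = page_turn_url[last.end():]

    # Substitute consecutive page numbers, starting at the one in the link.
    return [f"{prefix}{start_page + i}{suffix}" for i in range(preset_count)]


addresses = generate_hub_addresses("http://example.com/news/list_2.html", 3)
```

Here `preset_count` corresponds to the "preset number of hub webpages" case in [0166]. Obtaining the addresses of all related hub webpages would additionally require knowing the last page number, which this sketch does not attempt to detect.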
Embodiment 2
[0172] Corresponding to Embodiment 2 of the webpage acquisition method of the present application, and referring to Figure 5, the present application also provides Embodiment 2 of a device for obtaining a webpage. In this embodiment, the device may specifically include:
[0173] The first determining module 501 is configured to determine the first hub webpage in the crawled webpages.
[0174] The parsing module 502 is configured to parse the page-turning information contained in the first hub webpage, and the page-turning information includes a page-turning link address.
[0175] Wherein, the parsing module 502 may include:
[0176] The parsing sub-module 5021 is configured to parse the webpage content of the first hub webpage and to determine, within that content, the page-turning area, that is, the content that has page-turning keywords and repeated link content.
[0177] The page-turning information determining module 5022 is configured to determine page-turning information...
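One possible reading of the parsing sub-module 5021 is sketched below using only the Python standard library. The source does not define the keyword list or how "repeated link content" is detected, so both are assumptions here: a link counts as a page-turning candidate if its text is a number or one of a few hypothetical keywords, and links are treated as repeated when their addresses differ only in their digits. The names `PAGE_TURN_KEYWORDS`, `_LinkCollector`, and `find_page_turn_links` are illustrative.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical keyword list; a real system would likely use a larger,
# language-specific set.
PAGE_TURN_KEYWORDS = {"next", "previous", "prev", "first", "last"}


class _LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_page_turn_links(html):
    """Return the page-turning link addresses found in html, without duplicates."""
    collector = _LinkCollector()
    collector.feed(html)

    groups = defaultdict(list)
    for href, text in collector.links:
        if text.isdigit() or text.lower() in PAGE_TURN_KEYWORDS:
            # Addresses that differ only in their digits share one template.
            groups[re.sub(r"\d+", "{n}", href)].append(href)

    # "Repeated link content": keep only templates that occur more than once.
    repeated = [hrefs for hrefs in groups.values() if len(hrefs) > 1]
    if not repeated:
        return []
    # Take the largest repeated group and drop duplicates, keeping order.
    return list(dict.fromkeys(max(repeated, key=len)))


sample = (
    '<a href="/about.html">About</a>'
    '<a href="/news/story_17.html">Story</a>'
    '<div><a href="/list_2.html">2</a><a href="/list_3.html">3</a>'
    '<a href="/list_2.html">Next</a></div>'
)
page_turn_links = find_page_turn_links(sample)
```

Requiring both a keyword match and a repeated address template mirrors the two conditions named in [0176]. A single "Next" link with no sibling page links would be ignored by this sketch, which is a limitation of the assumed heuristic rather than something stated in the source.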