Method and device for obtaining web page information, and computer-readable medium

A method for obtaining web page information, applied in the computer field. It addresses the low efficiency of web page information acquisition, improves acquisition efficiency and reliability, and ensures normal execution of the web crawler system.

Pending Publication Date: 2018-12-18
SHANGHAI SHENGPAY E PAYMENT SERVICE CO LTD
4 Cites, 18 Cited by

AI Technical Summary

Problems solved by technology

When a web crawler system wants to continue crawling web page information after a restart, it has to find the URLs to be crawled again and reload them into memory, which makes obtaining web page information inefficient.

Method used

Before web page collection, the web crawler queue containing the URLs to be crawled is placed in an in-memory database, so that the URLs can be read back quickly after the web crawler system restarts. A content analysis tool then extracts the web page content information from each obtained page, and the content is cleaned and stored.


Examples


Embodiment Construction

[0023] The application will be further described in detail below in conjunction with the drawings.

[0024] In a typical configuration of this application, the terminal, the equipment of the service network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

[0025] The memory may include non-permanent memory in computer-readable media, in the form of random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.

[0026] Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic rand...


Abstract

An object of the present application is to provide a method and device for obtaining web page information, and a computer-readable medium. By placing the web crawler queue containing the Uniform Resource Locators (URLs) to be crawled into an in-memory database before web page collection, the invention avoids losing the URLs held in memory when the web crawler system has to be restarted. After a restart, the URLs to be crawled can be read quickly from the web crawler queue in the in-memory database, which ensures normal execution of the web crawler system. A content analysis tool extracts the web page content information from each obtained page, the content is cleaned, and the cleaned content information is finally stored, which improves the efficiency and reliability of obtaining web page content information.
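The extraction, cleaning, and storage steps named in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions: the source does not name the content analysis tool, the cleaning rules, or the storage backend, so Python's standard `html.parser` stands in for the tool, whitespace normalization stands in for cleaning, and a SQLite table named `pages` stands in for the store. All function and table names here are hypothetical.

```python
import re
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the bodies of <script> and <style>.

    Assumption: stands in for the unnamed "content analysis tool".
    """

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def extract_text(html):
    """Extract the text content of an HTML page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def clean_text(text):
    """Assumed cleaning rule: collapse whitespace runs and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def store_content(conn, url, content):
    """Store the cleaned content, keyed by URL (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, content TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, content) VALUES (?, ?)",
        (url, content),
    )
    conn.commit()


def process_page(conn, url, html):
    """Run extract -> clean -> store for one fetched page."""
    content = clean_text(extract_text(html))
    store_content(conn, url, content)
    return content
```

Calling `process_page(conn, url, html)` with an open `sqlite3` connection runs the three steps in order for one page. Fetching the page itself is left out so that the sketch does not depend on network access.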

Description

Technical field

[0001] This application relates to the computer field, and in particular to a method, device, and computer-readable medium for obtaining web page information.

Background

[0002] At present, when crawling web page information, a web crawler system usually stores the uniform resource locators (URLs) to be crawled in memory. When the web crawler system needs to be restarted, the URLs stored in memory are lost. To continue crawling after the restart, the system has to find the URLs to be crawled again and load them back into memory, which makes obtaining web page information inefficient.

Summary of the invention

[0003] One purpose of this application is to provide a method, device, and computer-readable medium for obtaining web page information.

[0004] According to one aspect of the present application, there is provided a method for obtaining web page information....
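To make the restart problem from the background concrete, the sketch below keeps the crawl queue in a store outside the crawler's own process memory and reopens it after a simulated restart. It is a minimal illustration, not the patented implementation. The source does not say which in-memory database is used, so a SQLite file stands in for it here, purely so the example runs with the standard library alone. The class name `PersistentUrlQueue`, the table layout, and the example URLs are all hypothetical.

```python
import os
import sqlite3
import tempfile


class PersistentUrlQueue:
    """FIFO queue of URLs to crawl, kept outside the crawler's own memory.

    Assumption: a SQLite file stands in for the in-memory database
    described in the source, which is not named there.
    """

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS queue ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT UNIQUE)"
        )
        self.conn.commit()

    def push(self, url):
        # INSERT OR IGNORE skips URLs that are already waiting in the queue.
        self.conn.execute("INSERT OR IGNORE INTO queue (url) VALUES (?)", (url,))
        self.conn.commit()

    def pop(self):
        # Return the oldest queued URL, or None when the queue is empty.
        row = self.conn.execute(
            "SELECT id, url FROM queue ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        self.conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
        self.conn.commit()
        return row[1]

    def close(self):
        self.conn.close()


def demo_restart():
    """Queue three URLs, take one, simulate a restart, then drain the rest."""
    path = os.path.join(tempfile.mkdtemp(), "crawler_queue.db")

    queue = PersistentUrlQueue(path)
    for url in (
        "https://example.com/a",
        "https://example.com/b",
        "https://example.com/c",
    ):
        queue.push(url)
    first = queue.pop()
    queue.close()  # the crawler process "stops" here

    restarted = PersistentUrlQueue(path)  # the crawler "restarts"
    pending = []
    url = restarted.pop()
    while url is not None:
        pending.append(url)
        url = restarted.pop()
    restarted.close()
    return first, pending
```

In `demo_restart`, the second `PersistentUrlQueue` instance picks up the URLs that were still pending, without rediscovering them. A plain Python list held by the first instance would have been lost when that instance went away.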


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 孟祥祥, 陈冲
Owner: SHANGHAI SHENGPAY E PAYMENT SERVICE CO LTD