Network information batch acquisition method of combined text and picture information

A collection method combining text and picture information, applied in the Internet field. It addresses the failure of existing approaches to handle abnormal situations during collection, to correctly identify redundant code in information pages, and to process pictures in web pages, thereby avoiding repeated development and improving development efficiency.

Active Publication Date: 2014-07-16
FOCUS TECH
4 cites, 23 cited by

AI Technical Summary

Problems solved by technology

Existing collection methods do not handle abnormal situations in the collection process, cannot correctly identify redundant code in information pages, and do not process pictures in web pages.



Examples


Embodiment Construction

[0050] The present invention will be described in further detail below in conjunction with the accompanying drawings and specific embodiments.

[0051] As shown in Figure 1, the processing flow of the method for batch collection of network information combining text and picture information in this embodiment includes:

[0052] Step 11. Determine the websites from which information is to be collected, the specific URLs of the information list pages to be collected on those websites, and the number of those list pages.

[0053] Multiple websites can be selected for batch collection. During peak Internet-access hours, collection is set to serial mode: information collection for the next website starts only after collection for the current website is complete. During off-peak hours, collection is set to parallel mode, that is, to collect information from multiple websites at the same...
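The switch between serial and parallel collection described in [0053] can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the peak window (`PEAK_START`, `PEAK_END`), the `collect_site` placeholder, and the worker count are all assumptions, since the source does not state concrete hours or how a single site is collected.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import time

# Hypothetical peak window; the source does not give actual hours.
PEAK_START = time(8, 0)
PEAK_END = time(22, 0)


def is_peak(now: time) -> bool:
    """Return True if `now` falls inside the assumed peak window."""
    return PEAK_START <= now < PEAK_END


def collect_site(site: str) -> str:
    """Placeholder for collecting one website (hypothetical helper)."""
    return f"collected:{site}"


def collect_all(sites, now: time, max_workers: int = 4):
    """Collect serially during peak hours, in parallel otherwise."""
    if is_peak(now):
        # Serial mode: the next site starts only after the previous one ends.
        return [collect_site(site) for site in sites]
    # Parallel mode: several sites are collected at the same time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(collect_site, sites))
```

Calling `collect_all(["site-a", "site-b"], time(3, 0))` would take the parallel branch under these assumed hours; `pool.map` returns results in input order, so both branches yield the same list shape.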


Abstract

The invention provides a method for batch acquisition of network information combining text and picture information. After a series of configuration steps, the method acquires target network information, deduplicates it, stores it in a database, and sends it to a client-designated location in a client-designated format. The method includes the following steps: the websites from which information is to be acquired, the specific URLs of their information list pages, and the number of list pages are determined; the common part of the list page URLs is identified and stored in the list configuration information; during acquisition, the system reads the URL common part from the list configuration information, derives the serial numbers of all list pages from the total number of list pages, and combines them into the URLs of all list pages to be acquired from the target website; detail page content is captured according to the detail page link addresses stored in the to-be-captured link library; pictures in the detail page content are processed; and after the detail page content has been captured, the content data are exported to a designated interface.
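The list-URL combination and the to-be-captured link library described in the abstract might look something like the sketch below. It assumes the "common part" can be modeled as a prefix and suffix around the page serial number and that serial numbers start at 1; the function names and the example URL are hypothetical and do not come from the source.

```python
def build_list_urls(prefix: str, suffix: str, total_pages: int,
                    start: int = 1) -> list[str]:
    """Combine the URL common part with each page serial number.

    Assumption: the common part splits into a prefix and a suffix
    surrounding the serial number.
    """
    return [f"{prefix}{n}{suffix}" for n in range(start, start + total_pages)]


def add_to_capture_library(library: list[str], seen: set[str], links) -> None:
    """Append detail-page links to the library, skipping duplicates."""
    for link in links:
        if link not in seen:
            seen.add(link)
            library.append(link)


# Hypothetical list configuration for one target website.
urls = build_list_urls("http://example.com/news/list_", ".html", 3)

library: list[str] = []
seen: set[str] = set()
add_to_capture_library(library, seen, ["/detail/1", "/detail/2", "/detail/1"])
```

Keeping a separate `seen` set makes the duplicate check constant-time while the list preserves the order in which links were discovered.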

Description

Technical field

[0001] The invention belongs to the technical field of the Internet and relates to a method for batch collection of network information combining text and picture information.

Technical background

[0002] With the rapid development of the Internet, a large amount of varied information has accumulated online, such as news, potential customer information, price information for competing products, real-time financial information, statistical reports, industry analysis reports, and supply and demand information. For enterprises, analyzing this information together with internal business data greatly assists business decision-making. The content of the enterprise website also does much to improve the visitor experience.

[0003] Now there are many tools that can realize the collection of webpage content, but the collection method of text information is the main m...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/958
Inventors: 唐宇波, 夏平嵩
Owner: FOCUS TECH