RSS (Really Simple Syndication)-based multi-threaded control method and system for synchronized crawling of image and text information
A control method and information synchronization technology applied in the field of web crawlers. It addresses problems such as the lack of semantic information, poor targeting, and difficulty supporting semantic information queries, and it improves access efficiency, increases effectiveness, and better matches user needs.
Embodiment Construction
[0058] The purpose of the present invention is to propose a multi-threaded method for synchronously crawling web page information based on RSS technology. The method constructs a focused crawler that collects the image and text information in web pages by category using a breadth-first strategy, and uses the weight contribution of hyperlink information to improve the crawler's search strategy, so that content is filtered and extracted effectively and both matching accuracy and speed are maximized. In particular, image data, which traditional crawlers do not handle well, is analyzed and processed in a targeted way, so that image and text information is acquired synchronously and in real time and the captured information is more complete.
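The breadth-first, multi-threaded collection described in [0058] can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `crawl_breadth_first`, `fetch_links`, `is_image`, and the in-memory `SITE` map are hypothetical, images are recognized only by file extension, and the hyperlink weighting is omitted because this excerpt does not say how the weights are computed.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

# Assumption: a link is treated as an image if its path has one of these extensions.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def is_image(url):
    """Classify a URL as image or page by its file extension (illustrative rule)."""
    return urlparse(url).path.lower().endswith(IMAGE_EXTENSIONS)


def crawl_breadth_first(seed, fetch_links, max_depth=2, workers=4):
    """Visit pages level by level, fetching each level with a thread pool.

    fetch_links(url) is a caller-supplied (hypothetical) function that
    returns the list of URLs found on that page.
    """
    seen = {seed}
    pages, images = [], []
    frontier = [seed]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_depth + 1):
            if not frontier:
                break
            pages.extend(frontier)
            next_frontier = []
            # pool.map fetches the whole level concurrently and keeps input order.
            for links in pool.map(fetch_links, frontier):
                for link in links:
                    if link in seen:
                        continue
                    seen.add(link)
                    if is_image(link):
                        images.append(link)      # image branch
                    else:
                        next_frontier.append(link)  # page branch, next level
            frontier = next_frontier
    return pages, images


# Tiny in-memory "site" standing in for real HTTP fetches.
SITE = {
    "http://example.com/": ["http://example.com/a", "http://example.com/logo.png"],
    "http://example.com/a": ["http://example.com/", "http://example.com/b",
                             "http://example.com/photo.JPG"],
    "http://example.com/b": [],
}
pages, images = crawl_breadth_first("http://example.com/", lambda u: SITE.get(u, []))
```

Processing one level at a time keeps the traversal breadth-first while still letting the fetches within a level run in parallel. A weighted variant would reorder or prune `next_frontier` before the next level starts.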
[0059] To achieve the above object, the technical solution adopted by the present invention is as follows: for the network information to be crawled, analyze the different characteristics of text and images, analyze the XML file...
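Since the paragraph above is truncated, the following is only a hedged sketch of what analyzing an RSS XML file for text and image information might look like. It assumes a standard RSS 2.0 layout (`item`, `title`, `link`, `description`, `enclosure`). The function `parse_feed`, the sample feed, and the regular expressions are illustrative and not taken from the patent.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample feed; the description holds escaped HTML, as is common in RSS 2.0.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First story</title>
      <link>http://example.com/first</link>
      <description>&lt;p&gt;Hello&lt;/p&gt;&lt;img src="http://example.com/a.jpg"/&gt;</description>
      <enclosure url="http://example.com/b.png" type="image/png" length="0"/>
    </item>
  </channel>
</rss>"""

IMG_SRC = re.compile(r'<img[^>]+src=["\']([^"\']+)["\']', re.IGNORECASE)
TAG = re.compile(r"<[^>]+>")


def parse_feed(xml_text):
    """Split each RSS item into its text fields and its image URLs."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        description = item.findtext("description", default="")
        # Images embedded in the description HTML.
        images = IMG_SRC.findall(description)
        # Images attached as enclosures with an image MIME type.
        for enclosure in item.findall("enclosure"):
            url = enclosure.get("url")
            if url and enclosure.get("type", "").startswith("image/"):
                images.append(url)
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "text": TAG.sub("", description).strip(),  # crude tag stripping
            "images": images,
        })
    return items


items = parse_feed(SAMPLE_FEED)
```

Keeping the text fields and the image URLs in the same record per item is one simple way to keep the two kinds of information associated, so they can be fetched and stored together.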