System and method for real-time intelligent capturing of articles
A real-time intelligent article capture technology, applied in the field of Internet crawling. It addresses the inability to extract articles accurately, the low usability of captured articles, and the consumption of network and hardware resources. Its effects are improved news coverage, better real-time performance, and fast approximate deduplication.
Embodiment Construction
[0080] The capture system consists of five modules or subsystems, as shown in Figure 1: a real-time crawling module, a web page extraction system, a document approximate deduplication module, an automatic document classification module, and an article publishing module.
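The five modules can be pictured as stages chained one after another. The following is a minimal sketch only, assuming data passes through the modules in the order they are listed; the function names, the dictionary record shape, the injected `fetch` callable, the exact-hash deduplication (a stand-in for the approximate method), and the keyword classifier are all hypothetical placeholders and are not taken from the patent.

```python
import hashlib
import re
from typing import Callable, Iterable, Iterator


def crawl(urls: Iterable[str], fetch: Callable[[str], str]) -> Iterator[dict]:
    """Real-time crawling module: download each page (fetch is injected)."""
    for url in urls:
        yield {"url": url, "html": fetch(url)}


def extract(pages: Iterable[dict]) -> Iterator[dict]:
    """Web page extraction: strip tags to get text (placeholder logic)."""
    for page in pages:
        text = re.sub(r"<[^>]+>", " ", page["html"])
        yield {**page, "text": " ".join(text.split())}


def deduplicate(docs: Iterable[dict]) -> Iterator[dict]:
    """Deduplication: exact hash of lowercased text, standing in for
    the approximate matching the patent describes."""
    seen = set()
    for doc in docs:
        digest = hashlib.sha1(doc["text"].lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield doc


def classify(docs: Iterable[dict]) -> Iterator[dict]:
    """Automatic classification: toy keyword rule (placeholder logic)."""
    for doc in docs:
        label = "sports" if "match" in doc["text"].lower() else "general"
        yield {**doc, "category": label}


def publish(docs: Iterable[dict]) -> list:
    """Article publishing: collect results (placeholder for a real sink)."""
    return list(docs)


def run_pipeline(urls: Iterable[str], fetch: Callable[[str], str]) -> list:
    """Chain the five stages in the order the modules are listed."""
    return publish(classify(deduplicate(extract(crawl(urls, fetch)))))
```

Because each stage is a generator, documents flow through one at a time, which suits a real-time setting; a production system would replace every placeholder stage with the corresponding module.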
[0081] The overall data flow of the system is shown in Figure 2. The specific steps are as follows:
[0082] Step 1: submit a job or a batch of jobs to the real-time capture module of the system. The real-time capture module consists of two main parts: a job parsing and scheduling module and a crawler download module (task download module).
[0083] Step 2: the job parsing and scheduling module of the real-time capture module is responsible for interpreting each job into several rules defined by this system. These rules specify the crawling logic that the crawler module follows in the next step. The module also schedules and distributes each job to a suitable server to achieve faster job capture ...
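Steps 1 and 2 can be illustrated with a short sketch. It assumes a job is a dictionary carrying seed URLs and a crawl depth, that each seed URL becomes one rule, and that a "suitable server" means the one with the fewest rules assigned so far. The patent text shown here does not specify the rule format or the server selection criterion, so `Rule`, `parse_job`, `pick_server`, and `schedule` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One crawl rule derived from a job (hypothetical shape)."""
    url: str
    max_depth: int


def parse_job(job: dict) -> list:
    """Interpret a submitted job as crawl rules, one per seed URL."""
    depth = job.get("max_depth", 1)
    return [Rule(url=u, max_depth=depth) for u in job["seed_urls"]]


def pick_server(load: dict) -> str:
    """Assumed policy: least-loaded server, ties broken by server name."""
    return min(load, key=lambda s: (load[s], s))


def schedule(jobs: list, servers: list) -> dict:
    """Parse every job and assign each resulting rule to a server."""
    load = {s: 0 for s in servers}
    plan = {s: [] for s in servers}
    for job in jobs:
        for rule in parse_job(job):
            server = pick_server(load)
            plan[server].append(rule)
            load[server] += 1
    return plan


if __name__ == "__main__":
    jobs = [{"seed_urls": ["http://a.example/1", "http://b.example/2"]}]
    for server, rules in schedule(jobs, ["s1", "s2"]).items():
        print(server, [r.url for r in rules])
```

Least-loaded assignment is only one possible reading of "suitable"; a real scheduler might also weigh network locality or per-site politeness limits.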