Construction of a self-adaptive network resource collection system and a network resource collection method
A network resource collection system technology, applied to the construction of adaptive network resource collection systems and to the field of network resource collection. It addresses problems such as a limited scope of application and poor scalability, and achieves strong versatility and high scalability.
Examples
Embodiment 1
[0041] In this embodiment, the user needs to collect one type of software project data from the Apache Lucene project: its SVN version repository.
[0042] First, a network resource collection system is constructed and a unified network resource collection module is configured. The network resource collection module includes a unified crawler distribution unit and at least one crawler execution unit to be called. The crawler distribution unit consists of:
[0043] The initialization unit performs preprocessing before any information is crawled. This includes checking the validity of the SVN data interface, creating a file directory for storing the data, writing log information, obtaining idle sub-threads for the crawl tasks, creating crawl task records, and so on.
[0044] The collection unit selects a crawler program according to the data type of the target network resource and uses it to collect that resource's data. The specific steps include: finding the ...
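The initialization and collection units described above can be pictured as a single distribution class. This is a minimal Python sketch under stated assumptions, not the patent's implementation: `Crawler`, `TaskRecord`, `CrawlerDistributionUnit`, and `FakeSvnCrawler` are hypothetical names; the interface check only validates the URL scheme instead of querying a real SVN server; and a `ThreadPoolExecutor` stands in for the pool of idle sub-threads.

```python
import logging
import tempfile
import uuid
from abc import ABC, abstractmethod
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from pathlib import Path
from urllib.parse import urlparse

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("crawler-distribution")


class Crawler(ABC):
    """Hypothetical unified crawler interface implemented by every execution unit."""

    @abstractmethod
    def crawl(self, url: str, out_dir: Path) -> list[Path]:
        """Collect the resource at `url` into `out_dir` and return the files written."""


@dataclass
class TaskRecord:
    """Hypothetical record of one crawl task."""
    task_id: str
    url: str
    data_type: str
    out_dir: Path
    status: str = "created"


class CrawlerDistributionUnit:
    """Sketch of a distribution unit: an initialization step plus a collection step."""

    def __init__(self, root: Path, max_workers: int = 4) -> None:
        self.root = root
        self.crawlers: dict[str, Crawler] = {}
        self.records: list[TaskRecord] = []
        # Pool threads are reused once idle; this stands in for "idle sub-threads".
        self.pool = ThreadPoolExecutor(max_workers=max_workers)

    def register(self, data_type: str, crawler: Crawler) -> None:
        self.crawlers[data_type] = crawler

    def initialize(self, url: str, data_type: str) -> TaskRecord:
        """Initialization unit: validate the interface, create storage, log, record the task."""
        # Assumption: only the URL scheme is checked; a real check would contact the server.
        if urlparse(url).scheme not in ("http", "https", "svn"):
            raise ValueError(f"invalid data interface: {url}")
        task_id = uuid.uuid4().hex
        out_dir = self.root / data_type / task_id
        out_dir.mkdir(parents=True, exist_ok=True)
        record = TaskRecord(task_id, url, data_type, out_dir)
        self.records.append(record)
        log.info("created task %s for %s (%s)", task_id, url, data_type)
        return record

    def collect(self, record: TaskRecord) -> Future:
        """Collection unit: choose a crawler by data type and run it on a pool thread."""
        crawler = self.crawlers.get(record.data_type)
        if crawler is None:
            raise LookupError(f"no crawler registered for {record.data_type!r}")
        record.status = "running"

        def run() -> list[Path]:
            try:
                files = crawler.crawl(record.url, record.out_dir)
            except Exception:
                record.status = "failed"
                raise
            record.status = "done"
            return files

        return self.pool.submit(run)


class FakeSvnCrawler(Crawler):
    """Hypothetical offline stand-in for an SVN crawler: writes one placeholder file."""

    def crawl(self, url: str, out_dir: Path) -> list[Path]:
        path = out_dir / "svn-log.txt"
        path.write_text(f"placeholder for {url}\n")
        return [path]


# Usage: one SVN crawl task, with a temporary directory as the storage root.
unit = CrawlerDistributionUnit(Path(tempfile.mkdtemp()))
unit.register("svn", FakeSvnCrawler())
rec = unit.initialize("https://svn.example.org/repos/lucene", "svn")
files = unit.collect(rec).result()
unit.pool.shutdown()
```

Because the collection step only looks up a crawler by data type, supporting a new kind of resource in this sketch means registering another `Crawler` implementation; the distribution code itself does not change.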
Embodiment 2
[0055] In this embodiment, the user needs to collect another type of software project data from the Apache Lucene project: the user mailing list.
[0056] This embodiment differs from the first only in the configuration of the dependency module; the remaining steps are the same as in the first embodiment. To configure the dependency module, first write a crawler execution unit for the mailing list that implements the unified crawler interface, then add the dependency on the user mailing list resource to the crawler dependency module, as follows:
[0057]
[0058] Here, ApacheMailListCrawler is a crawler program that can collect mail information from the mailing list management pages of the Apache website.
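One way to picture the step in [0056] is a registry that maps each data type to a crawler class implementing the unified interface. The following is a hypothetical Python sketch, not the configuration used in the patent: `Crawler`, `CRAWLER_DEPENDENCIES`, `depends_on`, `crawler_for`, and `MailListCrawler` are illustrative names, `MailListCrawler` merely stands in for ApacheMailListCrawler, and the injected `fetch` function replaces real access to a mailing list archive so the example runs offline.

```python
from abc import ABC, abstractmethod
from email import message_from_string
from typing import Callable


class Crawler(ABC):
    """Hypothetical unified crawler interface."""

    @abstractmethod
    def crawl(self, source: str) -> list[dict]:
        """Collect records from `source`."""


# Hypothetical crawler dependency module: data type -> crawler class.
CRAWLER_DEPENDENCIES: dict[str, type[Crawler]] = {}


def depends_on(data_type: str):
    """Class decorator that adds a crawler class to the dependency module."""
    def register(cls: type[Crawler]) -> type[Crawler]:
        CRAWLER_DEPENDENCIES[data_type] = cls
        return cls
    return register


def crawler_for(data_type: str, **kwargs) -> Crawler:
    """Look up and instantiate the crawler configured for a data type."""
    try:
        cls = CRAWLER_DEPENDENCIES[data_type]
    except KeyError:
        raise LookupError(f"no crawler dependency configured for {data_type!r}") from None
    return cls(**kwargs)


@depends_on("mail_list")
class MailListCrawler(Crawler):
    """Stand-in for ApacheMailListCrawler; `fetch` is injected so the sketch runs offline."""

    def __init__(self, fetch: Callable[[str], list[str]]) -> None:
        self.fetch = fetch

    def crawl(self, source: str) -> list[dict]:
        records = []
        for raw in self.fetch(source):
            msg = message_from_string(raw)  # parse one raw RFC 5322 message
            records.append({"from": msg["From"], "subject": msg["Subject"], "date": msg["Date"]})
        return records


def fake_fetch(source: str) -> list[str]:
    """Hypothetical fetcher returning one raw message instead of downloading an archive."""
    return [
        "From: alice@example.org\n"
        "Subject: Indexing question\n"
        "Date: Mon, 1 Jan 2024 10:00:00 +0000\n"
        "\n"
        "Body text\n"
    ]


crawler = crawler_for("mail_list", fetch=fake_fetch)
mails = crawler.crawl("https://lists.example.org/user-list")
```

In this sketch, adding a crawler for another data type only adds an entry to the registry; `crawler_for` and its callers stay unchanged, which is one plausible reading of the scalability the description attributes to the unified interface.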
[0059] The above embodiments illustrate the general process by which the method and system of the present invention capture certain software-related data; the approach can be applied to other data sets that have a clear data interface a...