Network information monitoring and analyzing system
The invention relates to a network information monitoring and analysis system. It addresses the following problems: the timeliness and authority of information cannot be guaranteed, retrieval is slow and inefficient, and results contain a large amount of "garbage" information.
Examples
Embodiment 1
[0028] Embodiment 1: Network information collection and analysis for the automobile industry
[0029] (1) The "network information collection subsystem" collects the automobile channel URLs, anchor text, and web pages of automobile industry websites or portal websites.
[0030] (2) The collected web pages are cleaned. First, noise content is removed so that the main content of each page becomes the processing object, which improves the accuracy of the processing results. Second, the tag structure of each page is simplified and the page size is reduced, which saves time and space in subsequent processing.
[0031] (3) The "intelligent analysis and pre-categorization subsystem" classifies the web pages collected in the system and filters out useless information according to a threshold.
[0032] (4) The "automatic summary and retrieval subsystem" provides in-site retrieval and autom...
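Steps (2) and (3) can be illustrated with a minimal Python sketch. It is not the patented implementation: the set of noise tags, the keyword-ratio score standing in for the classifier, and the names `clean_page`, `relevance_score`, and `filter_pages` are all assumptions introduced here for illustration. The source does not state how pages are classified or what the threshold is.

```python
from html.parser import HTMLParser

# Assumption: tags treated as noise. The source does not list them.
NOISE_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}


class MainTextExtractor(HTMLParser):
    """Keeps text that lies outside noise tags and drops all markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._noise_depth = 0  # > 0 while inside at least one noise tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self._noise_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self._noise_depth > 0:
            self._noise_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._noise_depth == 0:
            self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def clean_page(html):
    """Step (2): strip noise content and tag structure, return plain text."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def relevance_score(text, keywords):
    """Hypothetical stand-in for the classifier: fraction of words that are keywords."""
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w.strip(".,;:!?()\"'") in keywords)
    return hits / len(words)


def filter_pages(pages, keywords, threshold):
    """Step (3): keep (url, text) pairs whose score reaches the threshold."""
    kept = []
    for url, html in pages:
        text = clean_page(html)
        if relevance_score(text, keywords) >= threshold:
            kept.append((url, text))
    return kept
```

Cleaning before scoring means navigation links and scripts do not dilute or inflate the score, which matches the stated goal of using the main content as the processing object.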
Embodiment 2
[0033] Embodiment 2: Implementation process of the network information collection subsystem
[0034] To collect network information automatically, the processing of the network information collection subsystem is divided into four steps: initial URL selection, web page collection, web page preprocessing, and data storage. The main workflow of this subsystem is as follows: the Spider first collects web pages from the Web according to the selected initial URLs and the theme definition, then preprocesses the collected pages, and finally sends the results to the specified database for storage.
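The four-step workflow can be sketched in Python as follows. This is an illustration under stated assumptions, not the subsystem described in the patent: the `fetch` and `is_on_topic` callables, the `pages` table layout, the breadth-first crawl order, and the `max_pages` limit are all hypothetical. The fetch function is injected so that the sketch runs without network access, and SQLite stands in for the unspecified database.

```python
import sqlite3
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collects href targets of <a> tags and all text nodes of one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def preprocess(url, html):
    """Step 3: reduce a page to plain text plus absolute outgoing links."""
    parser = LinkAndTextParser()
    parser.feed(html)
    parser.close()
    links = [urljoin(url, href) for href in parser.links]
    return " ".join(parser.text_parts), links


def run_collection(seed_urls, fetch, is_on_topic, conn, max_pages=100):
    """Runs seed selection, collection, preprocessing, and storage.

    fetch(url) returns HTML or None; is_on_topic(text) is the theme test.
    Both are hypothetical hooks supplied by the caller.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body TEXT)"
    )
    seeds = list(seed_urls)      # Step 1: the initial (seed) URLs
    frontier = deque(seeds)
    seen = set(seeds)
    stored = 0
    while frontier and stored < max_pages:
        url = frontier.popleft()
        html = fetch(url)        # Step 2: web page collection
        if html is None:
            continue
        text, links = preprocess(url, html)
        if not is_on_topic(text):
            continue             # off-theme pages are neither stored nor expanded
        conn.execute(            # Step 4: data storage
            "INSERT OR REPLACE INTO pages VALUES (?, ?)", (url, text)
        )
        stored += 1
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    conn.commit()
    return stored
```

A caller would pass a real HTTP fetcher and a database connection, for example `run_collection(seeds, fetch, is_on_topic, sqlite3.connect("pages.db"))`. Links found on off-theme pages are not followed here, which is one simple way to keep the crawl focused; the source does not say which policy the Spider uses.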
[0035] (1) Selection of initial URL
[0036] A general web page collection system starts from a set of seed URLs and expands to the required pages on the Web through Web protocols. The information collection system needs to select high-quality, on-theme URLs as the initial seed URLs. In this embodiment, the seed URL set is manually defined, and the...