Ontology-based topical web crawler system construction method
A web crawler construction method, applied in the field of topic-focused web crawler systems. It addresses problems such as deviation in topic relevance evaluation, high computational overhead, the cost of maintaining high-dimensional data, and difficulty in describing topics or page content, with the aim of improving accuracy, working efficiency, and the degree of intelligence.
Detailed Description of Embodiments
[0023] As shown in Figure 1, the web crawler system constructed by the method of the present invention includes a basic crawler working module, a topic relevance evaluation module, and an ontology management system module. The topic relevance evaluation module in turn contains preprocessing and relevance calculation sub-modules.
[0024] The process of the method of the present invention is shown in Figure 2 and is detailed as follows:
[0025] Step (1): Parse the HTML file of the current web page and separate out the text of its main content.
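Step (1) can be illustrated with a minimal sketch that uses only the Python standard library. The class name `TextExtractor` and the function `extract_text` are hypothetical, and the sketch only strips markup and script/style content; the description does not say how the invention identifies the "main content", so no boilerplate detection is attempted here.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring script/style/noscript content."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self.chunks = []       # visible text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def extract_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

A real crawler would likely add a heuristic for dropping navigation bars and footers before passing the text on to the preprocessing step.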
[0026] Step (2): Preprocess the separated text. This usually means counting the number of occurrences N(w_i) of each keyword w_i in the current document, according to the keyword list preset by the system.
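The keyword count of Step (2) can be sketched as follows. The function name `count_keywords`, the lowercase alphanumeric tokenization, and the restriction to single-word keywords are all assumptions made for illustration; the description does not specify how the text is tokenized or whether keywords may be phrases.

```python
import re
from collections import Counter


def count_keywords(text, keywords):
    """Return N(w_i) for each preset keyword w_i in the given text.

    Assumes single-word keywords and case-insensitive matching.
    """
    # Split the text into lowercase alphanumeric tokens.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for keywords that never occur.
    return {w: counts[w.lower()] for w in keywords}
```

For example, `count_keywords("Crawler visits pages; the crawler ranks pages.", ["crawler", "pages", "ontology"])` yields a count of 2 for `crawler`, 2 for `pages`, and 0 for `ontology`.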
[0027] Step (3): According to the keyword set corresponding to each ontology class in the ontology database, calculate the class frequency of the ontology class in the curre...