Method for classifying Chinese webpages based on keyword frequency analysis
A webpage classification and frequency analysis technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the problems of non-standard webpage writing and the high time cost and complexity of webpage classification, and is described as having broad significance and application value.
Embodiment Construction
[0027] A method for classifying Chinese webpages based on keyword frequency analysis: the keywords of the analyzed Chinese webpage are fuzzy-matched against a Chinese classification thesaurus to determine the category of the webpage. The steps are as follows:
[0028] 1) Obtain the HTML source code of the Chinese webpage from the website URL entered by the user, filter and denoise the acquired source code, and extract the Chinese text in the webpage. The purpose is to preprocess Chinese webpages in their various encodings and remove noise information irrelevant to the topic, including redundant information such as tags, script language code, advertisement and image links, designer comments, function declarations, and copyright information. Such noise significantly affects the speed and accuracy of extracting the text content of the webpage, so it must be removed.
[0029] 2) Carry ...
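As an illustration of the filtering and text extraction in step 1), the following is a minimal Python sketch and not the patented implementation. It assumes the page bytes have already been downloaded (fetching from the URL is omitted), tries UTF-8 and then GB18030 as candidate encodings, and treats the contents of `script`, `style`, and `noscript` elements plus HTML comments as noise. The names `decode_html`, `ChineseTextExtractor`, `extract_chinese_text`, and `NOISE_TAGS` are hypothetical, as is the choice of encodings and tags; the source does not specify them. Advertisement and copyright text is not detected by this sketch and would need additional rules.

```python
import re
from html.parser import HTMLParser

# Elements whose contents are discarded as noise (an assumption; the source
# names scripts and similar content but lists no specific tags).
NOISE_TAGS = {"script", "style", "noscript"}

# Runs of characters in the CJK Unified Ideographs block.
CHINESE_RUN = re.compile(r"[\u4e00-\u9fff]+")


def decode_html(raw: bytes) -> str:
    """Decode page bytes, trying UTF-8 first and then GB18030 (assumed order)."""
    for encoding in ("utf-8", "gb18030"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep what can be decoded and replace the rest.
    return raw.decode("utf-8", errors="replace")


class ChineseTextExtractor(HTMLParser):
    """Collects Chinese character runs from text outside noise elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._noise_depth = 0  # > 0 while inside a noise element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self._noise_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self._noise_depth > 0:
            self._noise_depth -= 1

    def handle_data(self, data):
        # Tags never reach this method, and comments are routed to
        # handle_comment (a no-op by default), so both are dropped.
        if self._noise_depth == 0:
            self.chunks.extend(CHINESE_RUN.findall(data))


def extract_chinese_text(html_source: str) -> str:
    """Return the Chinese text of a page as space-separated runs."""
    parser = ChineseTextExtractor()
    parser.feed(html_source)
    parser.close()
    return " ".join(parser.chunks)
```

A typical call would be `extract_chinese_text(decode_html(raw_bytes))`, whose result could then be passed to the later steps of the method.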