Method for classifying Chinese webpages based on keyword frequency analysis

A webpage classification and frequency analysis technology, applied in fields such as special data processing applications, instruments, and electrical digital data processing. It addresses the lack of standardization in how webpages are written and the high time cost and complexity of webpage classification, and has broad significance and application value.
CN101593200A · Inactive · Publication Date: 2009-12-02 · HUAIHAI INST OF TECH

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
HUAIHAI INST OF TECH
Publication Date
2009-12-02
Estimated Expiration
Not applicable · inactive patent

Abstract

The invention relates to a method for classifying Chinese webpages based on keyword frequency analysis. The class of a Chinese webpage is fuzzily matched against a Chinese classification subject thesaurus using the keywords analyzed from that webpage. First, the HTML source code of the webpage is obtained and the page is preprocessed: a regular-expression filter, designed through testing and analysis, removes noise information, and the Chinese text of the webpage is extracted. Next, a word segmenter and a keyword frequency analyzer segment the extracted Chinese text into words and count them. Through weighted ranking of the words in the text and a fuzzy webpage classification algorithm, a ranking of the classes to which the webpage keywords belong is obtained; the top-ranked keywords are then selected and their membership degrees are calculated, giving the fuzzy matching result for the class to which the webpage belongs. The method helps organize the massive amount of information on the Internet efficiently and can be used for Internet user interest analysis, search engine directory updating, Web content mining, online document management, and digital library construction.
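The pipeline described in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the noise-filter regular expressions, the forward-maximum-matching segmenter (a simple stand-in for the unspecified word segmenter), raw term frequency as the keyword weight, and the membership formula (each class's share of the top-N keyword frequencies) are all assumptions. The function names, lexicon, and thesaurus are hypothetical.

```python
import re
from collections import Counter

# Assumed noise filters: the patent does not publish its regular expressions.
NOISE_PATTERNS = [
    re.compile(r"<script\b.*?</script>", re.S | re.I),
    re.compile(r"<style\b.*?</style>", re.S | re.I),
    re.compile(r"<!--.*?-->", re.S),
    re.compile(r"<[^>]+>"),  # any tag left after the blocks above are gone
]
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")  # CJK Unified Ideographs only


def extract_chinese_text(html: str) -> list[str]:
    """Strip noise from raw HTML, then return the runs of Chinese characters."""
    for pattern in NOISE_PATTERNS:
        html = pattern.sub(" ", html)
    return CJK_RUN.findall(html)


def segment(run: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Forward maximum matching against a lexicon (stand-in for the word segmenter)."""
    words, i = [], 0
    while i < len(run):
        # Try the longest candidate first; a single character always matches.
        for n in range(min(max_len, len(run) - i), 0, -1):
            if n == 1 or run[i:i + n] in lexicon:
                words.append(run[i:i + n])
                i += n
                break
    return words


def keyword_frequencies(runs: list[str], lexicon: set[str]) -> Counter:
    """Count multi-character words; single characters are dropped as noise (assumption)."""
    return Counter(w for run in runs for w in segment(run, lexicon) if len(w) > 1)


def fuzzy_classify(freqs: Counter, thesaurus: dict[str, set[str]],
                   top_n: int = 5) -> list[tuple[str, float]]:
    """Score each class by the frequencies of the top-N keywords it contains, then
    normalise the scores into membership degrees that sum to 1 (assumed formula)."""
    top = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    scores = {cls: sum(f for word, f in top if word in terms)
              for cls, terms in thesaurus.items()}
    total = sum(scores.values())
    if total == 0:
        return []  # none of the top keywords appears in the thesaurus
    ranked = [(cls, s / total) for cls, s in scores.items() if s > 0]
    return sorted(ranked, key=lambda kv: -kv[1])


# Toy usage with a hypothetical lexicon and thesaurus.
html = ("<html><head><title>足球</title><script>var a = 1;</script></head>"
        "<body><p>足球比赛 足球 球员 股票</p><!-- 广告 --></body></html>")
lexicon = {"足球", "比赛", "球员", "股票", "市场"}
thesaurus = {"体育": {"足球", "比赛", "球员"}, "财经": {"股票", "市场"}}
freqs = keyword_frequencies(extract_chinese_text(html), lexicon)
print(fuzzy_classify(freqs, thesaurus))  # 体育: 5/6, 财经: 1/6
```

Here the page matches the sports class with membership 5/6 and the finance class with 1/6, so the fuzzy result ranks both instead of forcing a single label. Normalising over the top-N keywords keeps memberships comparable across pages of different lengths; a real system would replace the toy lexicon with a full dictionary and the thesaurus with the classification subject thesaurus the abstract refers to.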

Description

Technical field

[0001] The present invention concerns research on keyword frequency analysis of Chinese webpages and on a webpage classification method based on that analysis. It mainly studies how to filter and extract the content of Chinese webpages by technical means, how to segment webpage text into words and analyze keyword frequencies, and how to classify webpages using weighted Chinese webpage keywords. It involves technical fields such as automatic webpage acquisition, Chinese webpage preprocessing, Chinese word segmentation and keyword frequency analysis, and fuzzy classification of Chinese webpages.

Background art

[0002] With the rapid development of Internet and Web technology, the number of webpages on the Internet keeps increasing. This growth in network information makes it much easier for people to obtain information, but the excessive volume of information also brings many difficulties for people to ...
