The invention relates to an automatic web page classification method based on a training set. The classification process combines methods such as feature selection, feature weight determination, and text vector comparison. Automatic classification based on a classification system assigns each document to be classified to a corresponding category according to a previously established category model, namely a
training set. With the development of multimedia technology, the content of web pages has become rich and varied: it comprises not only text but also a great deal of structural information and information in other forms, such as sound, graphics, and images. However, because text-based web pages still account for the larger proportion, classification based on
web page text still takes precedence. The method has reliable theoretical support, offers favorable extensibility and accuracy, and is easy to connect to operator-related application interfaces.
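
The pipeline named above (feature selection, feature weight determination, and text vector comparison against a training set) can be illustrated with a minimal sketch. The source does not specify which techniques are used at each step, so everything here is an assumption chosen for illustration: document-frequency ranking for feature selection, TF-IDF for weights, cosine similarity against per-category centroid vectors for the comparison, and whitespace tokenization. The function names and the sample training set are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization of page text.
    return text.lower().split()


def vectorize(counts, idf):
    # Feature weights: term frequency times inverse document frequency.
    # Terms that were not selected as features are ignored.
    return {t: tf * idf[t] for t, tf in counts.items() if t in idf}


def cosine(u, v):
    # Text vector comparison: cosine similarity of two sparse vectors.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train(training_set, max_features=1000):
    """Build the category model from (text, category) pairs."""
    docs = [(Counter(tokenize(text)), cat) for text, cat in training_set]

    # Feature selection (assumed criterion): keep the terms that occur
    # in the most training documents.
    df = Counter()
    for counts, _ in docs:
        df.update(counts.keys())
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0
           for t, d in df.most_common(max_features)}

    # One centroid per category: the sum of its documents' weighted vectors.
    centroids = {}
    for counts, cat in docs:
        centroid = centroids.setdefault(cat, defaultdict(float))
        for t, w in vectorize(counts, idf).items():
            centroid[t] += w
    return idf, centroids


def classify(text, idf, centroids):
    # Assign the document to the category whose centroid is most similar.
    vec = vectorize(Counter(tokenize(text)), idf)
    return max(centroids, key=lambda cat: cosine(vec, centroids[cat]))


# Hypothetical training set, for illustration only.
TRAINING_SET = [
    ("football match goal team", "sports"),
    ("team wins match league", "sports"),
    ("stock market bank shares", "finance"),
    ("bank raises interest rates market", "finance"),
]

IDF, CENTROIDS = train(TRAINING_SET)
print(classify("the team scored a goal in the match", IDF, CENTROIDS))
```

In this sketch the training set plays the role of the previously established category model: adding a category only requires adding labeled examples and retraining. A real system would first extract the text from the web page markup, a step omitted here.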