A Web Page Classification Method Based on Semi-Supervised Multi-view Learning

Applying multi-view technology to web page classification in the Internet field addresses the problem that existing methods do not fully consider all available information, which leads to web page classification errors.

Active Publication Date: 2020-04-17
GUANGDONG UNIV OF PETROCHEMICAL TECH

AI Technical Summary

Problems solved by technology

[0005] Although the above-mentioned methods can classify web pages, they use only part of the information in the multi-view data. They do not fully consider all the information between views, within views, between classes, and within classes, which leads to web page classification errors.



Examples


Embodiment

[0086] This embodiment provides a web page classification method based on semi-supervised multi-view learning, including:

[0087] Step S1: Obtain data from web pages and establish a training set.

[0088] The training set includes a labeled training set and an unlabeled training set.

[0089] The labeled training set is a data set whose samples have undergone information identification, that is, they carry labels.

[0090] The unlabeled training set is a data set whose samples have not undergone information identification.

[0091] Step S2: Train the classifier on the labeled training set, and use the validation set to calculate the accuracy of the classifier.

[0092] Step S3: Encode the labeled training set and the unlabeled training set with the trained classifier to obtain sample features.

[0093] Step S4: Perform density clustering on the sample features to obtain clustering results.

[0094] Step S5: Classify the samples of the unlabeled training set according to the clustering results.

[0095] Step S6: If t...
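Steps S1 to S5 can be illustrated with a minimal Python sketch. It is not the patented implementation: the nearest-centroid classifier, the distance-to-centroid encoding, the DBSCAN variant, the majority-vote assignment, and every function name and parameter (`centroids`, `encode`, `dbscan`, `pseudo_label`, `eps`, `min_pts`) are assumptions chosen only to make the data flow concrete. The source does not specify the classifier architecture or how cluster membership is turned into a class, and Step S6 is truncated, so it is not modeled here.

```python
import math
from collections import Counter


def centroids(labeled):
    """S2 stand-in: 'train' a nearest-centroid classifier on labeled samples."""
    sums, counts = {}, Counter()
    for x, y in labeled:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model, key=lambda y: math.dist(model[y], x))


def accuracy(model, validation):
    """S2: fraction of validation samples classified correctly."""
    return sum(predict(model, x) == y for x, y in validation) / len(validation)


def encode(model, x):
    """S3 stand-in: use the distances to each class centroid as the sample feature."""
    return [math.dist(model[y], x) for y in sorted(model)]


def dbscan(points, eps, min_pts):
    """S4: minimal DBSCAN; returns one cluster id per point, -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:
                queue.extend(nb)  # j is a core point, so keep expanding
    return labels


def pseudo_label(labeled, unlabeled, eps=1.0, min_pts=2):
    """S3-S5: encode all samples, cluster them, and give each unlabeled sample
    the majority label of the labeled samples in its cluster (assumed rule).
    Samples in noise or in clusters without labeled members stay None."""
    model = centroids(labeled)
    feats = [encode(model, x) for x, _ in labeled] + [encode(model, x) for x in unlabeled]
    clusters = dbscan(feats, eps, min_pts)
    votes = {}
    for (_, y), c in zip(labeled, clusters):
        if c != -1:
            votes.setdefault(c, Counter())[y] += 1
    return [votes[c].most_common(1)[0][0] if c in votes else None
            for c in clusters[len(labeled):]]


if __name__ == "__main__":
    # S1: toy two-dimensional "web page" features; the labels are hypothetical.
    labeled = [([0.0, 0.0], "news"), ([0.2, 0.1], "news"),
               ([5.0, 5.0], "shop"), ([5.1, 4.9], "shop")]
    unlabeled = [[0.1, 0.2], [4.9, 5.2], [20.0, -20.0]]
    print(pseudo_label(labeled, unlabeled))
```

In this sketch a sample that falls outside every dense region is left unlabeled rather than forced into a class, which is one plausible reason to use density clustering here; whether the patent handles such samples the same way is not stated in the available text.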


Abstract

The present invention relates to the technical field of the Internet, and more specifically to a web page classification method based on semi-supervised multi-view learning, comprising: obtaining data from web pages and establishing a training set; training a classifier on the labeled training set; encoding the labeled training set and the unlabeled training set with the trained classifier to obtain sample features; performing density clustering on the sample features to obtain clustering results; and classifying the samples of the unlabeled training set according to the clustering results. This scheme trains the classifier on the labeled training set, adds orthogonal constraints and adversarial similarity constraints to the existing multi-view classification method, and then uses the trained classifier to perform density clustering on all the data in the training set. Finally, the accuracy of the classifier is verified, and the whole process can be iterated multiple times to improve the classification performance of the classifier.

Description

Technical field

[0001] The present invention relates to the technical field of the Internet, and more specifically to a web page classification method based on semi-supervised multi-view learning.

Background

[0002] Computer technology is advancing rapidly, and the Internet has become an indispensable part of human society. With the rapid development of the mobile Internet and Web 2.0, the number of web pages on the Internet has grown explosively over the past few decades. The increasing amount of information on the Internet makes research on Web Information Retrieval and Analysis (TSIRA) more difficult and places higher demands on it. Web page classification plays an important role in web information retrieval and analysis. How to classify a large number of web pages more quickly and accurately, so that users can find the information they need more easily, has become a difficult problem in this field....


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/958, G06K9/62, G06N3/04, G06N3/08
CPC: G06F16/958, G06N3/088, G06N3/045, G06F18/2321, G06F18/2155, G06F18/24137
Inventor: 荆晓远, 贾晓栋, 訾璐, 黄鹤, 姚永芳, 彭志平
Owner: GUANGDONG UNIV OF PETROCHEMICAL TECH