Webpage classification method for semi-supervised multi-view learning

Relates to web page classification and multi-view technology in the Internet field; addresses web page classification errors caused by not fully considering all of the available information.

Active Publication Date: 2019-11-05
GUANGDONG UNIV OF PETROCHEMICAL TECH
Cites: 6; cited by: 19

AI Technical Summary

Problems solved by technology

[0005] Although the above-mentioned methods achieve web page classification, they use only part of the information in multi-view data and do not fully consider all of the information between views, within views, between classes, and within classes, which leads to web page classification errors.




Embodiment

[0085] The present embodiment provides a web page classification method for semi-supervised multi-view learning, including:

[0086] Step S1: Obtain data from web pages and establish a training set;

[0087] The training set includes a labeled training set and an unlabeled training set;

[0088] The labeled training set is a data set whose samples carry label information;

[0089] The unlabeled training set is a data set whose samples carry no label information;

[0090] Step S2: Train the classifier on the labeled training set, and use the validation set to calculate the classifier's accuracy;

[0091] Step S3: Encode the labeled and unlabeled training sets with the trained classifier to obtain sample features;

[0092] Step S4: Perform density clustering on the sample features to obtain clustering results;

[0093] Step S5: Classify the samples of the unlabeled training set according to the clustering results;

[0094] Step S6: If t...
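As a minimal sketch of Step S1, assuming each page is stored as a list of per-view feature vectors and that a missing label (`None`) marks an unlabeled page, the split below separates the labeled and unlabeled training sets. The function name `split_training_set`, the `val_fraction` parameter, and the choice to hold the validation set out of the labeled pages are all hypothetical; the text does not say where the validation set comes from.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical representation: a multi-view web page is a list of views,
# each view a feature vector (e.g. page-text and hyperlink features).
Views = List[List[float]]
LabeledSet = List[Tuple[Views, int]]


def split_training_set(
    samples: List[Views],
    labels: List[Optional[int]],
    val_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[LabeledSet, LabeledSet, List[Views]]:
    """Return (labeled training set, validation set, unlabeled training set).

    A label of None marks a page without label information. The validation
    set is assumed (hypothetically) to be held out from the labeled pages.
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    labeled = [(x, y) for x, y in zip(samples, labels) if y is not None]
    unlabeled = [x for x, y in zip(samples, labels) if y is None]
    random.Random(seed).shuffle(labeled)  # reproducible shuffle
    n_val = int(len(labeled) * val_fraction)
    return labeled[n_val:], labeled[:n_val], unlabeled
```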
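Step S2 does not name the classifier. In the sketch below, a nearest-centroid classifier over the concatenated views stands in for it purely for illustration, and `accuracy` computes validation-set accuracy as the fraction of correct predictions. All names here (`concat_views`, `train_nearest_centroid`, `accuracy`) are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Views = List[List[float]]


def concat_views(x: Views) -> List[float]:
    """Flatten the views of one page into a single feature vector."""
    return [v for view in x for v in view]


def train_nearest_centroid(train: Sequence[Tuple[Views, int]]) -> Callable[[Views], int]:
    """Stand-in classifier: predict the class whose mean feature vector is closest."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in train:
        f = concat_views(x)
        if y not in sums:
            sums[y], counts[y] = [0.0] * len(f), 0
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(x: Views) -> int:
        f = concat_views(x)
        # Squared Euclidean distance to each class centroid.
        return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(f, centroids[y])))

    return predict


def accuracy(predict: Callable[[Views], int], val: Sequence[Tuple[Views, int]]) -> float:
    """Fraction of validation pages whose predicted label matches the true label."""
    if not val:
        raise ValueError("validation set is empty")
    return sum(predict(x) == y for x, y in val) / len(val)
```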
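Step S3 uses the trained classifier as an encoder for both training sets. Assuming, hypothetically, that the encoder applies one ReLU layer per view and concatenates the outputs into a single sample feature, the encoding could look like this. The weight matrices would come from the trained classifier; the layer structure and the names `encode_view`, `encode_sample`, and `encode_all` are assumptions, not the patented architecture.

```python
from typing import List

Matrix = List[List[float]]


def encode_view(weights: Matrix, x: List[float]) -> List[float]:
    """One hypothetical ReLU layer: one output per weight row."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def encode_sample(view_weights: List[Matrix], views: List[List[float]]) -> List[float]:
    """Encode each view with its own layer and concatenate the results."""
    if len(view_weights) != len(views):
        raise ValueError("need exactly one weight matrix per view")
    feature: List[float] = []
    for weights, x in zip(view_weights, views):
        feature.extend(encode_view(weights, x))
    return feature


def encode_all(view_weights: List[Matrix], samples: List[List[List[float]]]) -> List[List[float]]:
    """Apply the encoder to every page, labeled or unlabeled."""
    return [encode_sample(view_weights, s) for s in samples]
```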
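The text does not specify which density clustering algorithm Step S4 uses. DBSCAN is a standard density-based method and is used below as an assumed stand-in: a point with at least `min_pts` neighbors within distance `eps` (counting itself) is a core point, clusters grow outward from core points, and points reachable from no core point are marked as noise (-1). `eps` and `min_pts` are hypothetical tuning parameters.

```python
import math
from typing import List, Optional

NOISE = -1


def dbscan(points: List[List[float]], eps: float, min_pts: int) -> List[int]:
    """Return a cluster id per point (0, 1, ...), or NOISE for noise points."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None = not yet visited

    def neighbors(i: int) -> List[int]:
        # Brute-force O(n) region query; includes i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster  # i is a core point and starts a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbors(j)
            if len(j_seeds) >= min_pts:
                queue.extend(j_seeds)  # j is also a core point
        cluster += 1
    return [NOISE if c is None else c for c in labels]
```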
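Step S5 does not state how clustering results become class labels. One plausible rule, assumed here for illustration, is a majority vote: each unlabeled sample takes the most common label among the labeled samples in its cluster, while samples in clusters with no labeled members, or marked as noise, stay unlabeled. The function `pseudo_label` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def pseudo_label(
    cluster_ids: List[int], labels: List[Optional[int]], noise: int = -1
) -> List[Optional[int]]:
    """Assign each unlabeled sample the majority label of its cluster."""
    # Count the known labels inside each non-noise cluster.
    votes: Dict[int, Counter] = {}
    for c, y in zip(cluster_ids, labels):
        if c != noise and y is not None:
            votes.setdefault(c, Counter())[y] += 1
    result: List[Optional[int]] = []
    for c, y in zip(cluster_ids, labels):
        if y is not None:
            result.append(y)  # keep true labels unchanged
        elif c in votes:
            result.append(votes[c].most_common(1)[0][0])
        else:
            result.append(None)  # noise, or a cluster without labeled members
    return result
```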



Abstract

The invention relates to the technical field of the Internet, and in particular to a webpage classification method for semi-supervised multi-view learning, comprising the following steps: obtaining data from web pages and establishing a training set; training a classifier on the labeled training set; encoding the labeled and unlabeled training sets with the trained classifier to obtain sample features; performing density clustering on the sample features to obtain a clustering result; and classifying the samples of the unlabeled training set according to the clustering result. In this scheme, the labeled training set is used to train the classifier; orthogonal constraints and adversarial similarity constraints are added on top of an existing multi-view classification method; all data in the training set are labeled by density clustering of the features produced by the trained classifier; and finally the classifier's accuracy is verified. Repeating this process over multiple iterations can improve the classifier's classification performance.
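The abstract describes repeating training, density-clustering labeling, and accuracy verification over multiple iterations. The loop below is a minimal sketch of that cycle with the three stages passed in as functions. The stopping rule (stop when validation accuracy no longer improves, or after `max_rounds`) is an assumption, since the stopping condition is not given in the visible text; `iterate_training` is a hypothetical name, and the orthogonal and adversarial similarity constraints would live inside the `train` stage and are not modeled here.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

X = TypeVar("X")  # one multi-view sample
M = TypeVar("M")  # a trained model


def iterate_training(
    samples: List[X],
    labels: List[Optional[int]],
    train: Callable[[List[X], List[int]], M],
    evaluate: Callable[[M], float],
    relabel: Callable[[M, List[X], List[Optional[int]]], List[Optional[int]]],
    max_rounds: int = 10,
) -> Tuple[Optional[M], float]:
    """Alternate training and relabeling; return the best model and its accuracy."""
    best_model: Optional[M] = None
    best_acc = float("-inf")
    current = list(labels)
    for _ in range(max_rounds):
        # Train only on samples that currently carry a (true or pseudo) label.
        pairs = [(x, y) for x, y in zip(samples, current) if y is not None]
        model = train([x for x, _ in pairs], [y for _, y in pairs])
        acc = evaluate(model)  # e.g. validation-set accuracy
        if acc <= best_acc:
            break  # assumed stopping rule: no improvement
        best_model, best_acc = model, acc
        # Recompute pseudo-labels from the original labels each round.
        current = relabel(model, samples, labels)
    return best_model, best_acc
```

Recomputing pseudo-labels from the original labels each round, rather than from the previous round's output, keeps an early clustering mistake from being locked in permanently.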

Description

Technical field

[0001] The present invention relates to the technical field of the Internet, and more specifically to a webpage classification method for semi-supervised multi-view learning.

Background art

[0002] Computer technology is advancing rapidly, and the Internet has become an indispensable part of human society. With the rapid development of the mobile Internet and Web 2.0, the number of web pages on the Internet has grown explosively over the past few decades. The growing volume of information on the Internet makes Web Information Retrieval and Analysis (TSIRA) more difficult and places higher demands on it. Webpage classification plays an important role in webpage information retrieval and analysis, and classifying large numbers of webpages more quickly and accurately, so that users can more easily find the information they need, has become a difficult problem in this field. ...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F16/958; G06K9/62; G06N3/04; G06N3/08
CPC: G06F16/958; G06N3/088; G06N3/045; G06F18/2321; G06F18/2155; G06F18/24137
Inventors: 荆晓远 (Jing Xiaoyuan), 贾晓栋 (Jia Xiaodong), 訾璐 (Zi Lu), 黄鹤 (Huang He), 姚永芳 (Yao Yongfang), 彭志平 (Peng Zhiping)
Owner GUANGDONG UNIV OF PETROCHEMICAL TECH