Website classification method and device

A website classification method and device, applied in the computer field, that address two problems: clustering algorithms produce categories that are too coarse, and machine learning algorithms classify poorly when a page has little content, so neither meets practical needs. The method achieves accurate website classification with a small workload.

Active Publication Date: 2018-11-23
BEIJING KNOWNSEC INFORMATION TECH +1

AI Technical Summary

Problems solved by technology

However, the categories generated by a clustering algorithm are relatively coarse and may not meet practical needs. Likewise, when a web page contains little content, the classification results of a machine learning algorithm are very poor.

Detailed Description of Embodiments

[0045] The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present application, not all of them. The components of the embodiments, as generally described and illustrated in the figures herein, may be arranged and designed in a variety of different configurations.

[0046] Accordingly, the following detailed description of the embodiments of the application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments of the application. Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of this application.

[0047] It should be noted that like ...

Abstract

Embodiments of the invention provide a website classification method and device. The method comprises: obtaining a website to be classified; crawling the page text and keywords of the website; counting how often each preset website category label occurs in the keywords to obtain a first classification result set, where each website category label comprises a label name and its synonyms; inputting the page text and keywords into a pre-configured Bayesian classification model to obtain a second classification result set, which comprises a predicted probability value for each website category label, the training samples of the Bayesian classification model being obtained through website crawling; and outputting a classification result based on the first and second classification result sets. The method and device require no manual processing, involve a small workload, and can classify websites correctly even when little web page text is available.
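
The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the label set, the synonyms, and the tiny hard-coded training list (standing in for crawled samples) are all hypothetical, and the weighted-sum rule used to merge the two result sets is an assumption, since the abstract does not say how they are combined.

```python
import math
import re
from collections import Counter

# Hypothetical category labels: label name -> synonyms (illustrative only).
LABELS = {
    "news": ["headline", "press"],
    "shopping": ["shop", "store", "cart"],
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_label_counts(keywords, labels=LABELS):
    """First result set: occurrences of each label name or synonym in the keywords."""
    tokens = Counter(tok for kw in keywords for tok in tokenize(kw))
    return {name: sum(tokens[t] for t in {name, *syns}) for name, syns in labels.items()}

class NaiveBayes:
    """Minimal multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples):
        # samples: iterable of (text, label) pairs
        self.word_counts, self.doc_counts, self.vocab = {}, Counter(), set()
        for text, label in samples:
            toks = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(toks)
            self.doc_counts[label] += 1
            self.vocab.update(toks)
        return self

    def predict_proba(self, text):
        """Second result set: a predicted probability for every label."""
        toks = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in toks)
            log_scores[label] = score
        peak = max(log_scores.values())  # normalise in log space to avoid underflow
        exp = {label: math.exp(s - peak) for label, s in log_scores.items()}
        z = sum(exp.values())
        return {label: v / z for label, v in exp.items()}

def classify(page_text, keywords, model, labels=LABELS, weight=0.5):
    """Merge both result sets with a weighted sum (an assumed combination rule)."""
    counts = keyword_label_counts(keywords, labels)
    total = sum(counts.values())
    first = {name: (c / total if total else 0.0) for name, c in counts.items()}
    second = model.predict_proba(page_text + " " + " ".join(keywords))
    scores = {name: weight * first[name] + (1 - weight) * second.get(name, 0.0)
              for name in labels}
    return max(scores, key=scores.get), scores

# Hard-coded stand-in for training samples that would be obtained by crawling.
TRAIN = [
    ("breaking news headline today report", "news"),
    ("daily press report world politics", "news"),
    ("buy shoes online store discount cart", "shopping"),
    ("shop deals checkout cart free shipping", "shopping"),
]
model = NaiveBayes().fit(TRAIN)
label, scores = classify("discount shoes free shipping", ["online store", "shop"], model)
print(label, scores)
```

Because the keyword-count result set does not depend on the amount of page text, it still contributes a signal when the page body is nearly empty, which is the case the abstract highlights.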

Description

Technical field

[0001] The present application relates to the field of computer technology, and in particular to a website classification method and device.

Background

[0002] Website classification methods in the prior art mainly generate training samples by manual labeling, extract features from web page content, and train a machine learning algorithm on the samples to obtain a Bayesian classification model, thereby classifying websites. However, this approach requires manually labeling the training samples, which is a huge workload, and if a web page contains little content, the classification error of the machine learning algorithm is relatively large.

[0003] In addition, there is also a method of crawling a large number of websites, generating training samples through clustering algorithms and manual annotation, thereby extracting features based on web page content and using machine learning algorithms to learn traini...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06K9/62
CPC: G06F18/24155, G06F18/214
Inventor: 蔡自彬, 刘哲理, 叶金辉, 梁爽
Owner: BEIJING KNOWNSEC INFORMATION TECH