Student browsed webpage classification method

A technique for classifying the webpages that students browse, applied in network data navigation, network data retrieval, instruments, etc., which addresses problems such as reduced classification accuracy.

Active Publication Date: 2017-12-22
HUAIYIN INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

Existing methods do not use N-Grams to capture the connections between adjacent words, which reduces classification accuracy.

Embodiment Construction

[0076] The present invention will be further explained below in conjunction with the accompanying drawings and specific embodiments.

[0077] Step 1: Crawl the URL, the URL description, the URL primary category and the URL secondary category from a navigation website and save them in the URL collection. Build a four-category corpus, represent the URL description text in the corpus as uni-grams and bi-grams, use TF-IDF as the weight of the text features, and train the classifier with the naive Bayes classification algorithm, as shown in Figure 2:
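The pipeline in Step 1 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: whitespace tokenization, the smoothed IDF formula, the toy English descriptions and every function name are assumptions made for the example, and the crawled data itself is not reproduced here.

```python
import math
from collections import Counter, defaultdict


def ngram_features(text):
    """Return the uni-grams plus bi-grams of a whitespace-tokenized text."""
    tokens = text.lower().split()
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def fit_idf(docs):
    """Smoothed inverse document frequency for every feature in the corpus."""
    n_docs = len(docs)
    doc_freq = Counter(feat for doc in docs for feat in set(doc))
    return {f: math.log((1 + n_docs) / (1 + df)) + 1.0 for f, df in doc_freq.items()}


def tfidf(doc, idf):
    """TF-IDF weights of one document; features unseen in training are dropped."""
    counts = Counter(f for f in doc if f in idf)
    return {f: c * idf[f] for f, c in counts.items()}


def train_naive_bayes(texts, labels, alpha=0.5):
    """Multinomial naive Bayes over TF-IDF weights with additive smoothing."""
    docs = [ngram_features(t) for t in texts]
    idf = fit_idf(docs)
    class_docs = Counter(labels)
    weights = defaultdict(Counter)  # category -> feature -> summed TF-IDF
    for doc, label in zip(docs, labels):
        weights[label].update(tfidf(doc, idf))
    model = {"idf": idf, "alpha": alpha, "vocab_size": len(idf), "classes": {}}
    for label, feats in weights.items():
        model["classes"][label] = {
            "log_prior": math.log(class_docs[label] / len(labels)),
            "weights": feats,
            "total": sum(feats.values()),
        }
    return model


def predict(model, text):
    """Return (best category, posterior probability of that category)."""
    doc = tfidf(ngram_features(text), model["idf"])
    alpha, vocab_size = model["alpha"], model["vocab_size"]
    scores = {}
    for label, c in model["classes"].items():
        denom = c["total"] + alpha * vocab_size
        scores[label] = c["log_prior"] + sum(
            w * math.log((c["weights"][f] + alpha) / denom) for f, w in doc.items()
        )
    top = max(scores.values())
    norm = sum(math.exp(s - top) for s in scores.values())
    return max(scores, key=scores.get), 1.0 / norm


# Toy corpus invented for illustration only (two descriptions per category).
TEXTS = [
    "online video games and movies", "music streaming and funny videos",
    "programming tutorials and network tools", "software downloads and developer forum",
    "food delivery and weather forecast", "train tickets and map navigation",
    "university online courses and library", "exam preparation and academic papers",
]
LABELS = [
    "entertainment and leisure", "entertainment and leisure",
    "computer network", "computer network",
    "life service", "life service",
    "cultural education", "cultural education",
]
model = train_naive_bayes(TEXTS, LABELS)
```

Calling `predict(model, some_description)` returns a category together with a normalized posterior, which is one plausible quantity to compare against a confidence threshold. Adding bi-grams to the feature set is what lets the classifier use word order, which a plain bag of words discards.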

[0078] Step 1.1: Define the text stop word set SWORD = {sword_1, sword_2, ..., sword_num}, where sword_swi is the swi-th stop word and num is the total number of stop words; define the naive Bayes smoothing parameter Alpha, where Alpha ∈ (0, 1); define the four categories of the corpus, namely entertainment and leisure, computer network, life service, and cultural education, G ...
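The definitions in Step 1.1 might look like this in code. This is a sketch under stated assumptions: the stop words listed are placeholders (the excerpt does not list the actual SWORD set), the value 0.5 for Alpha is arbitrary within (0, 1), and the use of Alpha as an additive (Lidstone) smoothing term is the standard naive Bayes convention rather than something this excerpt spells out.

```python
# Hypothetical stop words; the patent's actual SWORD set is not listed in this excerpt.
SWORD = {"the", "a", "of", "and", "to"}

# Smoothing parameter; any value in the open interval (0, 1) satisfies Step 1.1.
ALPHA = 0.5

# The four corpus categories named in Step 1.1.
CATEGORIES = [
    "entertainment and leisure",
    "computer network",
    "life service",
    "cultural education",
]


def remove_stop_words(tokens, stop_words=SWORD):
    """Drop every token that appears in the stop word set."""
    return [t for t in tokens if t not in stop_words]


def smoothed_likelihood(feature_count, category_total, vocab_size, alpha=ALPHA):
    """Additive smoothing: P(feature | category) = (count + alpha) / (total + alpha * |V|)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in the open interval (0, 1)")
    return (feature_count + alpha) / (category_total + alpha * vocab_size)
```

With smoothing in place, a feature never seen in a category still gets a small non-zero probability, so a single unseen word cannot zero out the whole product of likelihoods.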

Abstract

The invention discloses a method for classifying webpages browsed by students, based on N-Grams and a naive Bayes classifier. First, URL description information is crawled from a navigation website, a four-category corpus is constructed, the corpus texts are represented as uni-grams and bi-grams, TF-IDF is used as the weight of the text features, and a naive Bayes classification algorithm is used to build the classifier. Then, the URLs in students' browsing records are segmented according to set rules, and URL categories are determined by combining the classifier with a URL category base; if a category determined by the classifier meets the set confidence, it is added to the URL category base. The method effectively classifies the URLs in students' browsing records, which increases the webpage recognition rate and the classification accuracy.
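The interplay between the URL category base and the classifier described in the abstract could be sketched as below. Everything concrete here is an assumption: the segmentation rule (keeping only the host name), the threshold of 0.8, the function names, and the idea that the caller supplies some description text for the URL. The abstract does not say what happens to a low-confidence prediction, so this sketch returns it without storing it.

```python
from urllib.parse import urlparse


def url_key(url):
    """Assumed segmentation rule: keep only the lower-cased host, without 'www.'."""
    parsed = urlparse(url if "://" in url else "//" + url)
    host = parsed.hostname or ""
    return host[4:] if host.startswith("www.") else host


def classify_url(url, description, category_base, classifier, min_confidence=0.8):
    """Look the URL up in the category base, falling back to the classifier.

    `classifier` is any callable returning (category, confidence).
    A confident prediction is written back into `category_base`.
    """
    key = url_key(url)
    if key in category_base:
        return category_base[key]
    category, confidence = classifier(description)
    if confidence >= min_confidence:
        category_base[key] = category
    return category
```

Writing confident predictions back into the base means repeated visits to the same site skip the classifier, and the base grows as more browsing records are processed.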

Description

Technical Field

[0001] The invention belongs to the field of webpage classification, and in particular relates to a method for classifying webpages browsed by students based on N-Grams and a naive Bayes classifier.

Background Technique

[0002] Classifying the webpages that students browse plays an important role in analyzing students' online interests. Traditional classification corpora contain only a small vocabulary of webpage titles. It is therefore necessary to find a classification corpus suited to students' browsing, and to use the classifier together with the URL category library to determine the URL category.

[0003] From 2009 to 2017, Zhu Quanyin and others proposed methods for Web text processing and push (Li Xiang, Zhu Quanyin. Collaborative filtering recommendation for joint clustering and scoring matrix sharing. Computer Science and Exploration. 2014, Vol. 8(6): 751-759; Suqun Cao, Quanyin Zhu, Zhiwei Hou. Customer Segmentation Based on a Nove...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/9535, G06F16/954, G06F16/955
Inventor: 肖绍章, 朱全银, 李翔, 钱凯, 于柿民, 潘舒新, 瞿学新, 唐海波, 邵武杰, 高阳, 江丽萍
Owner: HUAIYIN INSTITUTE OF TECHNOLOGY