
A Web Page Classification Method Based on Distributed Computing

A distributed computing and web page classification technology, applied in computing, special data processing applications, instruments, etc. It addresses the low efficiency of classification algorithms and improves classification accuracy.

Active Publication Date: 2016-10-19
TONGJI UNIV
3 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0003] However, with the exponential growth of network information, the processing speed of most traditional web page classification algorithms cannot keep pace with the growth of information on the network, so many classification algorithms are inefficient in practical applications.



Examples


Embodiment Construction

[0034] The process of the web page classification algorithm is shown in Figure 1. The algorithm comprises two processes: establishing the classification model and classifying web pages. Establishing the classification model mainly includes: preprocessing the web pages in the training set; calculating the TF-IDF of the category feature words from the web page data; calculating the association relationships between the feature words; and calculating the position information of the feature words in the document. Of these, TF-IDF is the weight calculation used in the traditional naive Bayesian classification model, while the association relationships and position information are the calculations added by the present invention. The web page classification process includes: preprocessing the web pages; calculating the posterior probability of each category according to the classification model; and establishing and updating the dynamic lexicon. Finall...
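The baseline described above (TF-IDF weights feeding a naive Bayesian posterior) can be sketched roughly as follows. This is a minimal illustration, not the patented method: it omits the association and position information the invention adds, it leaves out preprocessing and the dynamic lexicon, and the function names `train` and `classify`, the smoothed IDF formula, and the add-one smoothing are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """Build a model from docs, a list of (category, tokens) pairs."""
    n_docs = len(docs)
    doc_freq = Counter()                # documents containing each word
    term_counts = defaultdict(Counter)  # raw term counts per category
    class_docs = Counter()              # documents per category
    for category, tokens in docs:
        class_docs[category] += 1
        term_counts[category].update(tokens)
        doc_freq.update(set(tokens))
    vocab = set(doc_freq)
    # Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1, which is always positive.
    idf = {w: math.log((1 + n_docs) / (1 + doc_freq[w])) + 1 for w in vocab}
    weights, totals = {}, {}
    for category, counts in term_counts.items():
        # TF-IDF weight of each feature word within the category.
        weights[category] = {w: c * idf[w] for w, c in counts.items()}
        totals[category] = sum(weights[category].values())
    priors = {c: math.log(k / n_docs) for c, k in class_docs.items()}
    return {"priors": priors, "weights": weights, "totals": totals, "vocab": vocab}

def classify(model, tokens):
    """Return (best_category, log_scores) for a tokenized page."""
    vocab_size = len(model["vocab"])
    scores = {}
    for category, log_prior in model["priors"].items():
        score = log_prior
        for w in tokens:
            if w not in model["vocab"]:
                continue  # words never seen in training are ignored
            # Add-one smoothing over the TF-IDF weights (an assumption).
            num = model["weights"][category].get(w, 0.0) + 1.0
            score += math.log(num / (model["totals"][category] + vocab_size))
        scores[category] = score
    return max(scores, key=scores.get), scores
```

Scores are kept in log space so that multiplying many small probabilities does not underflow; the category with the highest log score is the predicted class.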


Abstract

The invention relates to a web page classification algorithm based on distributed computing. The algorithm comprises the following steps: step 1, building the classification model, consisting of (1) preprocessing of web pages, (2) association information of feature words, and (3) position information of feature words; step 2, the web page classification process, consisting of (1) preprocessing of web pages, (2) category computation for web pages, and (3) the dynamic lexicon. The algorithm can cope with the exponentially growing information in real networks, and its processing speed improves markedly as the number of cluster nodes in the distributed system increases, so it has strong application prospects.
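The source does not say which distributed framework is used, so the following is only an in-process sketch of how the word counting behind the classification model could be split across partitions and merged, in a map/reduce style. The names `map_partition`, `merge_counts`, and `count_terms` are hypothetical, and the partitions here stand in for data held on separate cluster nodes.

```python
from collections import Counter
from functools import reduce

def map_partition(pages):
    """Map step: count (category, word) pairs within one partition of pages."""
    counts = Counter()
    for category, tokens in pages:
        for word in tokens:
            counts[(category, word)] += 1
    return counts

def merge_counts(left, right):
    """Reduce step: merge two partial count tables into a new one."""
    merged = Counter(left)
    merged.update(right)
    return merged

def count_terms(partitions):
    """Count terms over all partitions; each map call is independent of the others."""
    partials = map(map_partition, partitions)  # on a cluster, one call per node
    return reduce(merge_counts, partials, Counter())
```

Because each partition is counted independently and the merge is associative, adding nodes lets more partitions be processed at once, which is the kind of scaling the abstract describes.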

Description

Technical field

[0001] The invention relates to the classification of web pages in the field of information service networks.

Background technique

[0002] In recent years, with the popularity of the Internet, network information has grown exponentially, and the Internet has developed into a huge global information service network with sites all over the world. It has become an important means for people to search for and obtain information. Faced with such massive and complicated network information, people often cannot accurately locate the information they want. They face the contradiction of "information explosion" and "knowledge poverty", and need methods and means of extracting refined knowledge that meets their requirements. Through the classification of web pages, the information that users are interested in can be quickly and accurately obtained from the massive network information, so it can deal with the problem of "knowledge poverty" caused by the complexity of ...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/3335, G06F16/353
Inventor: 蒋昌俊, 陈闳中, 闫春钢, 丁志军, 王鹏伟, 孙海春, 邓晓栋, 王昕
Owner: TONGJI UNIV