Web page classification method based on von Mises-Fisher probability model

A web page classification technique based on a probabilistic model, applied in the fields of the Internet and machine learning. It addresses the problems that there are no clear rules for interpreting context, that the knowledge needed for comprehension cannot all be stored in a computer, and that the scope of a comprehension system is hard to expand. Its stated effects are high efficiency and high web page classification accuracy.

Active Publication Date: 2016-05-04
BEIJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

[0003] Two problems currently remain. On the one hand, grammar research has so far been limited to analyzing isolated sentences, and there is still no systematic study of how context and the conversational environment constrain and influence a sentence. There are no clear rules to follow for phenomena such as the same sentence carrying different meanings on different occasions or when spoken by different people; further linguistic research is needed to resolve this gradually.
On the other hand, people understand a sentence not only through grammar but also by drawing on a large body of relevant knowledge, including everyday and professional knowledge, not all of which can be stored in a computer.
Therefore, a written-language comprehension system can only be built for a limited vocabulary, limited sentence patterns, and specific topics; its scope can be appropriately expanded only after computer storage capacity and operating speed have greatly improved.

Embodiment Construction

[0017] The present invention will be described in detail below in conjunction with the accompanying drawings and embodiments.

[0018] The present invention provides a method for classifying web pages based on the von Mises-Fisher probability model. It adopts the von Mises-Fisher probability model, which according to the invention has not previously been used in the field of natural language processing, performs feature extraction and modeling on the preprocessed web page text content, and classifies pages according to the resulting probability density function. This yields higher web page classification accuracy together with high efficiency. For the von Mises-Fisher probability model, see reference [1]: Sra, S. 'A short note on parameter approximation for von Mises-Fisher distributions: and a fast implementation of I_s(x)'. Computational Statistics 27:177-190.
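For orientation, the standard form of the von Mises-Fisher density is given here. The excerpt does not show the patent's own notation, so the symbols below are assumptions: x is a unit-length feature vector of dimension d, \mu is the unit-length mean direction, \kappa is the concentration parameter, and I_s is the modified Bessel function of the first kind of order s.

```latex
f(x \mid \mu, \kappa) = C_d(\kappa)\, \exp\!\left(\kappa\, \mu^{\top} x\right),
\qquad
C_d(\kappa) = \frac{\kappa^{d/2-1}}{(2\pi)^{d/2}\, I_{d/2-1}(\kappa)},
\qquad
\lVert x \rVert_2 = \lVert \mu \rVert_2 = 1,\ \kappa \ge 0 .
```

Because the density is defined only on the unit sphere, feature vectors have to be scaled to unit length before the model can be applied.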

[0019] Implementation platform: Python

[0020] The web page classification ...

Abstract

The invention discloses a web page classification method based on a von Mises-Fisher probability model and belongs to the technical field of the Internet and machine learning. The method comprises the following steps: first, data preprocessing, feature extraction, and feature screening are carried out on the training samples and a model is built; then the feature vector of a web page to be classified is substituted into the model to obtain the final classification. The method applies two-norm normalization to the extracted feature vectors, which eliminates the influence of text length on the feature vector and at the same time prepares the data for von Mises-Fisher modeling. The von Mises-Fisher probability model is then used to model the text feature vectors; according to the invention, this is the first application of the model in the field of natural language processing.
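The steps in the abstract (two-norm normalization, per-class von Mises-Fisher modeling, classification by density) can be sketched in Python, the stated implementation platform. This is a minimal illustration under assumptions, not the patented procedure: the function names, the toy term-count vectors, the class labels, and the equal class priors are all hypothetical, and the concentration estimate uses the closed-form approximation of Banerjee et al. that is discussed in reference [1]. The Bessel function is evaluated with a plain power series, which is adequate only for moderate concentration values and low dimensions.

```python
import math

def l2_normalize(vec):
    """Scale a feature vector to unit Euclidean length (two-norm normalization)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero feature vector")
    return [x / norm for x in vec]

def log_bessel_i(order, x, terms=200):
    """log I_order(x) from the series sum_m (x/2)^(2m+order) / (m! Gamma(m+order+1)).

    Summed in log space; a fixed number of terms suits moderate x only.
    """
    log_half_x = math.log(x / 2.0)
    logs = [(2 * m + order) * log_half_x
            - math.lgamma(m + 1) - math.lgamma(m + order + 1)
            for m in range(terms)]
    peak = max(logs)
    return peak + math.log(sum(math.exp(t - peak) for t in logs))

def fit_vmf(unit_vectors):
    """Estimate the mean direction mu and concentration kappa of one class."""
    n = len(unit_vectors)
    d = len(unit_vectors[0])
    resultant = [sum(v[i] for v in unit_vectors) for i in range(d)]
    length = math.sqrt(sum(c * c for c in resultant))
    mu = [c / length for c in resultant]
    r_bar = length / n  # mean resultant length, must be < 1 here
    # Assumption: closed-form approximation of kappa, not an exact MLE.
    kappa = r_bar * (d - r_bar ** 2) / (1.0 - r_bar ** 2)
    return mu, kappa

def vmf_log_density(x, mu, kappa):
    """Log of the von Mises-Fisher density at the unit vector x."""
    d = len(x)
    order = d / 2.0 - 1.0
    log_norm = (order * math.log(kappa)
                - (d / 2.0) * math.log(2.0 * math.pi)
                - log_bessel_i(order, kappa))
    return log_norm + kappa * sum(a * b for a, b in zip(mu, x))

def classify(vec, models):
    """Return the label whose class model gives the highest log density.

    Assumption: all classes have equal prior probability.
    """
    x = l2_normalize(vec)
    return max(models, key=lambda label: vmf_log_density(x, *models[label]))

# Hypothetical toy data: term counts over a three-word vocabulary.
training = {
    "sports": [[5, 1, 0], [4, 0, 1], [6, 1, 1]],
    "finance": [[0, 1, 5], [1, 0, 4], [1, 1, 6]],
}
models = {label: fit_vmf([l2_normalize(v) for v in docs])
          for label, docs in training.items()}

print(classify([7, 1, 0], models))
print(classify([0, 2, 9], models))
```

Normalizing first means that two pages with the same word proportions but different lengths map to the same point on the unit sphere, which is the length-invariance the abstract describes. For realistic vocabulary sizes, a library Bessel routine and the refined estimators in reference [1] would be the more robust choice.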

Description

Technical field

[0001] The invention belongs to the technical field of the Internet and machine learning and relates to natural language processing, in particular to a method for classifying web pages based on their text content.

Background

[0002] Natural language processing studies the theories and methods that enable effective communication between humans and computers in natural language. A Chinese text or a string of Chinese characters (including punctuation marks, etc.) may have multiple meanings. This ambiguity is a major difficulty and obstacle in natural language understanding, because it creates a many-to-many relationship between the form (string) of natural language and its meaning. From a computer processing point of view, however, the ambiguity has to be resolved. Because ambiguity is so widespread, eliminating it requires a great deal of knowledge and reasoning, which poses great difficulties for methods based on linguistics and knowledge. Therefore, the resea...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/35
Inventors: 马占宇, 黄迪, 周环宇
Owner: BEIJING UNIV OF POSTS & TELECOMM