Intelligent web page classifier based on user behaviors

A web page classification and classifier technology, applied in the field of intelligent learning, that addresses two shortcomings of existing classifiers: they are not suited to scientific literature, and they do not take user search behavior into account.

Inactive Publication Date: 2008-07-02
SHANGHAI XINSHENG ELECTRONICS TECH

AI Technical Summary

Problems solved by technology

However, the disadvantage is that it does not take into account the interaction with user search b




Embodiment Construction

[0009] In the vector space model, a text is any machine-readable record, denoted D (Document). A feature item T (Term) is a basic language unit that appears in document D and represents its content; feature items are mainly words or phrases. A text can be expressed as D(T1, T2, ..., Tn), where Tk is a feature item and k ∈ {1, 2, ..., n}. For a text containing n feature items, each feature item is usually given a weight that indicates its importance: D = D(T1, W1; T2, W2; ...; Tn, Wn), abbreviated as D = D(W1, W2, ..., Wn). This is the vector representation of the text, where Wk is the weight of Tk, k ∈ {1, 2, ..., n}. In the vector space model, the content correlation Sim(Di, Dj) between any two texts Di and Dj is commonly represented by the cosine of the angle between their vectors, and the formula is:

[0010] Sim ( Di , Dj ...
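
The formula in [0010] is truncated in the source. As a minimal sketch, assuming it is the standard cosine of the angle between two weight vectors (the sum of Wik * Wjk over k, divided by the product of the two vector norms), the computation might look like the Python below. The function names, the sample texts, and the use of raw term counts as the weights Wk are illustrative assumptions, not details taken from the patent.

```python
import math
from collections import Counter


def to_vector(text, terms):
    """Represent a text as D = (W1, ..., Wn) over a fixed list of feature terms.

    Assumption: the weight Wk is the raw count of term Tk in the text.
    """
    counts = Counter(text.lower().split())
    return [float(counts[t]) for t in terms]


def cosine_sim(d_i, d_j):
    """Cosine of the angle between two weight vectors of equal length."""
    dot = sum(a * b for a, b in zip(d_i, d_j))
    norm_i = math.sqrt(sum(a * a for a in d_i))
    norm_j = math.sqrt(sum(b * b for b in d_j))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0  # a text with no feature terms is treated as unrelated
    return dot / (norm_i * norm_j)


# Hypothetical feature terms and texts, for illustration only.
terms = ["web", "page", "classifier", "music"]
d1 = to_vector("web page classifier for web search", terms)
d2 = to_vector("a classifier for each web page", terms)
d3 = to_vector("music music music", terms)

sim_12 = cosine_sim(d1, d2)  # texts sharing feature terms score close to 1
sim_13 = cosine_sim(d1, d3)  # texts sharing no feature terms score 0
```

Because the cosine depends only on the direction of the vectors, a long page and a short page with the same mix of feature terms receive the same score.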


Abstract

An intelligent web page classification device based on user behavior works in three steps. (1) An initial group of classification samples is entered in the background for training, yielding the cluster center of each category in the feature space. (2) A URL entered by the user is received, and the corresponding page is fetched and analyzed in the background; the text in the page that has indexing value is output. A feature set is then extracted from the user's input and the web page content, and is used to apply feedback corrections to the feature space of the initial classification samples and to adjust the feature weight factors in the vector space. (3) The classifier selected by the user automatically classifies the text produced in the previous step and outputs the result. When the user runs a search, the classifier automatically determines the category of each result and is adjusted step by step; the more searches the user runs, the more accurate the web page classifier becomes. This helps different users shrink the set of search results and locate the information they need more accurately.
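
The three steps above can be sketched as a nearest-center classifier with a feedback update. This is only an illustration under stated assumptions: the abstract does not give the update rule, so a simple Rocchio-style step (moving a category's center toward a page vector by a hypothetical rate `alpha`) stands in for the feedback correction, and page fetching and feature extraction are left out. The class name, method names, categories, and vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class CentroidClassifier:
    def __init__(self, alpha=0.2):
        self.alpha = alpha   # hypothetical feedback rate, not from the source
        self.centers = {}    # category -> cluster center in the feature space

    def train(self, samples):
        """Step 1: compute one cluster center per category from labeled vectors."""
        grouped = {}
        for label, vec in samples:
            grouped.setdefault(label, []).append(vec)
        for label, vecs in grouped.items():
            n = len(vecs)
            self.centers[label] = [sum(col) / n for col in zip(*vecs)]

    def feedback(self, label, vec):
        """Step 2 (assumed rule): move the center toward a user-associated page."""
        center = self.centers[label]
        self.centers[label] = [
            (1 - self.alpha) * c + self.alpha * v for c, v in zip(center, vec)
        ]

    def classify(self, vec):
        """Step 3: assign the category whose center is most similar to the page."""
        return max(self.centers, key=lambda c: cosine(self.centers[c], vec))


clf = CentroidClassifier(alpha=0.5)
clf.train([
    ("sports", [3.0, 0.0, 1.0]),
    ("sports", [1.0, 0.0, 1.0]),
    ("finance", [0.0, 2.0, 0.0]),
    ("finance", [0.0, 4.0, 2.0]),
])

page = [2.0, 1.0, 0.0]
before = clf.classify(page)        # initial centers place the page in "sports"
clf.feedback("finance", page)      # user behavior ties the page to "finance"
middle = clf.classify(page)        # one correction is not yet enough
clf.feedback("finance", page)
after = clf.classify(page)         # repeated feedback shifts the result
```

In this toy run the page moves from "sports" to "finance" only after the second feedback call, which mirrors the gradual adjustment the abstract describes; a real system would need a principled choice of rate and of which features to reweight.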

Description

Technical field

[0001] The invention relates to a technology for intelligently classifying web pages; in particular, the classifier learns intelligently from a combination of user behavior characteristics and web page content.

Background art

[0002] Existing web page classifiers fall into two main categories. The first is manual classification. For example, YAHOO's directory search manually classifies the files stored in its local database. The classification accuracy is high, but efficiency is very low, updates are slow, and the maintenance workload is heavy. The second is automatic classification, in which a computer system replaces manual classification of web pages. It has two main implementations: classifiers based on knowledge engineering and classifiers based on statistics. The former relies mainly on linguistic knowledge and requires compiling a large number of inference rules as classification knowledge. The search results are very accurat...


Application Information

IPC(8): G06F17/30
Inventor: 蔡阳波, 陈勇
Owner: SHANGHAI XINSHENG ELECTRONICS TECH