Web modeling method and system in weblog mining

A weblog and network technology applied in the field of Web user modeling. It addresses problems such as performance degradation and achieves higher accuracy and good convergence.

Inactive Publication Date: 2011-11-23
BEIJING UNIV OF POSTS & TELECOMM
0 Cites, 30 Cited by

AI Technical Summary

Problems solved by technology

As the dimensionality of the raw input data increases dramatically, the performance of existing Web user modeling techniques gradually degrades.



Examples


Embodiment 1

[0052] Embodiment 1. A Web modeling method in network log mining

[0053] As shown in Figure 1, this embodiment mainly includes the following steps:

[0054] Step S110: preprocess the network log to obtain a trusted network log. The preprocessing mainly includes data cleaning, user identification, and session identification. Data cleaning filters out images in web pages, dynamic web pages, and web pages whose click rate is lower than a preset click threshold.

[0055] In this embodiment, the preset click threshold for a web page is 2. A web page with a click rate below this threshold generally reflects a transient user action and does not represent the user's attention or browsing interest.
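The data cleaning part of step S110 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the record layout (`(user, url)` pairs), the extension lists, the rule that a query string marks a dynamic page, and the helper names are all assumptions; only the threshold value 2 comes from paragraph [0055].

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed extension lists; the source does not enumerate them.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".bmp", ".ico")
DYNAMIC_EXTENSIONS = (".php", ".asp", ".aspx", ".jsp", ".cgi")
CLICK_THRESHOLD = 2  # value stated in paragraph [0055]


def is_image(url):
    """True if the URL path ends with a known image extension."""
    return urlparse(url).path.lower().endswith(IMAGE_EXTENSIONS)


def is_dynamic(url):
    """Assumption: a page is dynamic if it has a query string or a script extension."""
    parsed = urlparse(url)
    return bool(parsed.query) or parsed.path.lower().endswith(DYNAMIC_EXTENSIONS)


def clean_log(records, click_threshold=CLICK_THRESHOLD):
    """records: iterable of (user, url) pairs. Returns the records that survive cleaning."""
    kept = [(user, url) for user, url in records
            if not is_image(url) and not is_dynamic(url)]
    clicks = Counter(url for _, url in kept)
    # Drop pages whose click count is below the threshold.
    return [(user, url) for user, url in kept if clicks[url] >= click_threshold]


sample = [
    ("u1", "/news/a.html"),
    ("u2", "/news/a.html"),
    ("u1", "/img/logo.png"),
    ("u1", "/search.php?q=x"),
    ("u2", "/about.html"),
]
print(clean_log(sample))  # [('u1', '/news/a.html'), ('u2', '/news/a.html')]
```

User identification and session identification, the other two preprocessing stages, are not described in this excerpt and are therefore left out of the sketch.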

[0056] Step S120: according to the user's access interest and the trusted network log, select characteristic web pages and segment URLs, and establish a user browsing access pattern matrix based on ...
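The description of step S120 is truncated here, so the patent's exact weighting scheme is not available. As a rough illustration of the general random indexing idea only, the sketch below gives each page a sparse ternary index vector and builds each user's row as a weighted sum of the index vectors of the pages that user visited. The dimension, the number of non-zero entries, the `weights` mapping, and all function names are hypothetical.

```python
import random


def make_index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0] * dim
    positions = rng.sample(range(dim), nonzeros)
    for i, pos in enumerate(positions):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def build_pattern_matrix(sessions, weights, dim=16, nonzeros=4, seed=0):
    """sessions: {user: [page, ...]}; weights: {page: float} (hypothetical weights).

    Returns ({user: row_vector}, {page: index_vector}).
    """
    rng = random.Random(seed)
    pages = sorted({page for visits in sessions.values() for page in visits})
    index = {page: make_index_vector(dim, nonzeros, rng) for page in pages}
    matrix = {}
    for user, visits in sessions.items():
        row = [0.0] * dim
        for page in visits:
            w = weights.get(page, 1.0)  # pages without a weight count once
            for k, value in enumerate(index[page]):
                row[k] += w * value
        matrix[user] = row
    return matrix, index


sessions = {"u1": ["/a"], "u2": ["/a", "/b"]}
matrix, index = build_pattern_matrix(sessions, weights={"/a": 2.0})
print(len(matrix["u1"]))  # 16
```

The appeal of this representation is that the row length is fixed by `dim` rather than by the number of distinct pages, which matters when the raw input dimensionality grows.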

Embodiment 2

[0078] Embodiment 2, a Web modeling system in network log mining

[0079] With reference to the embodiment shown in Figure 1, the embodiment shown in Figure 2 mainly includes a preprocessing module 210, a first building module 220, a second building module 230 and a pre-extraction module 240, wherein:

[0080] A preprocessing module 210, configured to preprocess the network log to obtain a trusted network log;

[0081] The first building module 220, connected with the preprocessing module 210, is used for selecting characteristic web pages and segmenting URLs according to the user's access interest and the trusted network log, and for establishing a user browsing access pattern matrix based on a weighted random index method;

[0082] The second building module 230, connected with the first building module 220, is used to optimize the clustering of the user's browsing access pattern matrix using a clustering algorithm based on chaotic ant colony optimization, and mark the user's categor...


Abstract

The invention discloses a Web modeling method and system in weblog mining, with the aim of improving website service quality. The method comprises the following steps: preprocessing a weblog to obtain a trusted weblog; selecting characteristic web pages and segmenting URLs according to users' access interests and the trusted weblog, and building a user browsing access pattern matrix based on a weighted random index method; performing optimized clustering on the user browsing access pattern matrix using a clustering algorithm based on chaotic ant swarm optimization, labeling user categories according to predetermined category labels, and building common user profiles; and, according to the common user profiles and a predetermined pre-fetching probability threshold, extracting the web pages whose pre-fetching probability exceeds the threshold and storing them in the server's cache. Compared with conventional pre-fetching technology, the method greatly improves accuracy.
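The final pre-fetching step can be illustrated with a short sketch. It assumes that a per-category user profile is already available as a mapping from URL to pre-fetching probability; how those probabilities are derived is not covered in this excerpt. The dictionary cache, the `fetch` callback, and the function names are hypothetical.

```python
def select_prefetch_pages(profile, threshold):
    """Return URLs whose pre-fetching probability exceeds the threshold, most probable first."""
    chosen = [(url, p) for url, p in profile.items() if p > threshold]
    chosen.sort(key=lambda item: item[1], reverse=True)
    return [url for url, _ in chosen]


def prefetch_into_cache(cache, profile, threshold, fetch):
    """Store every selected page in the cache unless it is already there."""
    for url in select_prefetch_pages(profile, threshold):
        if url not in cache:
            cache[url] = fetch(url)
    return cache


profile = {"/a": 0.8, "/b": 0.3, "/c": 0.6}  # hypothetical probabilities
print(select_prefetch_pages(profile, threshold=0.5))  # ['/a', '/c']
```

The comparison is strict, matching the abstract's wording that the probability must exceed the threshold.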

Description

Technical field

[0001] The invention relates to Web user modeling technology, in particular to a Web modeling method and system in network log mining.

Background

[0002] With the rapid development and popularization of the Internet, the tension between the rapid growth of information and people's limited attention keeps increasing, and network users care more and more about how to find the most suitable information in the shortest time. Website operators likewise increasingly want to understand visitor activity on their sites, mine customer activity information from the vast data of a huge user base, and improve the website structure according to users' browsing patterns, so as to improve the quality of Web services and ultimately realize personalized recommendation, providing users with better services.

[0003] In order to facilitate the application of web log mining, it is necessary to formalize the ...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 李丽香, 彭海朋, 沈红斌, 钮心忻
Owner: BEIJING UNIV OF POSTS & TELECOMM