
Web log data preprocessing method

A Web log preprocessing technology, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc., that addresses the problem of inaccurate Web log data and improves the accuracy and credibility of the mining results.

Status: Inactive. Publication date: 2012-03-28
UNIV OF ELECTRONICS SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0004] 1. Web log mining can use only part of the Web log data. The useful data therefore has to be extracted, and the noise in the Web log data eliminated;
[0005] 2. Requests from multiple users behind a proxy carry the same identifier in the log, namely the IP address of the proxy server, which makes the Web log data inaccurate;
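
The proxy problem in [0005] can be illustrated with a small sketch. The record layout, the sample values, and the use of the User-Agent string as a second identifying field are assumptions made for illustration; the source does not say which heuristic rules the method actually applies.

```python
from collections import defaultdict

# Hypothetical log records: (ip, user_agent, url).
# All three requests arrive through the same proxy, so they share one IP.
records = [
    ("10.0.0.1", "Mozilla/5.0 (Windows NT 10.0)", "/index.html"),
    ("10.0.0.1", "Mozilla/5.0 (Macintosh)", "/news.html"),
    ("10.0.0.1", "Mozilla/5.0 (Windows NT 10.0)", "/about.html"),
]


def group_urls(records, key):
    """Group the requested URLs by the value that `key` extracts from each record."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec[2])
    return dict(groups)


# Grouping by IP alone merges every user behind the proxy into one "user".
by_ip = group_urls(records, key=lambda r: r[0])

# Assumed heuristic: same IP but different User-Agent means a different user.
by_ip_and_agent = group_urls(records, key=lambda r: (r[0], r[1]))

print(len(by_ip), len(by_ip_and_agent))
```

A heuristic of this kind only reduces the ambiguity: two users behind the same proxy with identical browsers would still be merged.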



Examples


Embodiment

[0025] Figure 1 is the overall framework diagram of a specific embodiment of the Web log data preprocessing method of the present invention.

[0026] As shown in Figure 1, users access the website over the Internet, and the website server stores their access information as Web log data in the Web log database. In this embodiment, a default rule base is first used to clean the Web log data by deleting useless information, that is, unnecessary records; the rules are updated by modifying the default rule base. User identification is then carried out based on heuristic rules, and the Web log data is grouped by user. Finally, session identification is completed by jointly considering the home page and the navigation pages, and the necessary path supplementation is performed to obtain the final session sequence of each user's Web page accesses, which completes the preprocessing of the web lo...
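
The cleaning step with a modifiable default rule base might look roughly like the sketch below. The rule format (a field name paired with a regular expression), the specific default rules, and the record fields `url` and `status` are all assumptions; the source only states that a default rule base exists and that rules are updated by modifying it.

```python
import re

# Assumed default rule base: a record is dropped if any (field, regex) rule matches.
DEFAULT_RULES = [
    ("url", r"\.(gif|jpe?g|png|css|js|ico)$"),  # embedded resources, not page views
    ("status", r"^[45]\d\d$"),                  # failed requests
]


def clean(records, rules=DEFAULT_RULES):
    """Return the records that match none of the rules in the rule base."""
    compiled = [(field, re.compile(pattern, re.IGNORECASE)) for field, pattern in rules]
    return [
        rec for rec in records
        if not any(rx.search(str(rec.get(field, ""))) for field, rx in compiled)
    ]


# Hypothetical log records for illustration.
log = [
    {"url": "/index.html", "status": "200"},
    {"url": "/logo.GIF", "status": "200"},
    {"url": "/missing.html", "status": "404"},
    {"url": "/robots.txt", "status": "200"},
]

kept_default = clean(log)

# Updating the rules means modifying the rule base, here by adding one rule.
custom_rules = DEFAULT_RULES + [("url", r"^/robots\.txt$")]
kept_custom = clean(log, custom_rules)

print([r["url"] for r in kept_default])
print([r["url"] for r in kept_custom])
```

Keeping the rules as data rather than hard-coded conditions is what allows the rule base to be changed without touching the cleaning routine itself.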


Abstract

The invention provides a Web log data preprocessing method for Web log mining systems, comprising the following steps: first, useless information (that is, unnecessary records) is deleted using a default rule base, and the cleaning of the Web log data is completed by modifying the rule base as needed; then, the user identification problems caused by proxies and firewalls are solved using heuristic rules, and session identification is completed by jointly considering home pages and navigation pages; finally, access paths are supplemented based on the reference relations between Web pages, yielding the final page access sequence of each user and completing the preprocessing of the Web log data. Compared with traditional session identification based on simple time thresholds, the method noticeably improves the accuracy and reliability of mining users' access behavior.
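
Two of the ideas in the abstract can be sketched in a few lines: the traditional time-threshold session split that the method is compared against, and path supplementation from page reference relations. Neither function is taken from the patent. The 30-minute timeout, the `(timestamp, url)` and `(url, referrer)` record shapes, and the backtracking rule (a referrer that is an earlier page of the session implies the user navigated back through cached pages) are assumptions for illustration.

```python
def split_by_timeout(hits, timeout=1800):
    """Baseline: start a new session when the gap between hits exceeds `timeout` seconds.

    hits: list of (timestamp_seconds, url) for one user, sorted by time.
    """
    sessions = []
    for ts, url in hits:
        if sessions and ts - sessions[-1][-1][0] <= timeout:
            sessions[-1].append((ts, url))
        else:
            sessions.append([(ts, url)])
    return sessions


def complete_path(requests):
    """Insert backtracked pages that never reached the server log.

    requests: list of (url, referrer) for one session, in time order.
    """
    path = []
    for url, referrer in requests:
        if path and referrer and referrer != path[-1] and referrer in path:
            # Index of the most recent visit to the referrer page.
            idx = len(path) - 1 - path[::-1].index(referrer)
            # Replay the pages between the last page and the referrer, going backwards.
            path.extend(reversed(path[idx:-1]))
        path.append(url)
    return path


# Hypothetical data for illustration.
sessions = split_by_timeout([(0, "/A"), (600, "/B"), (4000, "/C")])
full_path = complete_path([("/A", None), ("/B", "/A"), ("/C", "/B"), ("/D", "/B")])

print(len(sessions), full_path)
```

In the second example the request for `/D` names `/B` as its referrer although the last logged page was `/C`, so the sketch assumes the user went back to `/B` and adds that step to the path.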

Description

Technical field

[0001] The invention belongs to the technical field of artificial intelligence in computer networks, and more specifically relates to a method for preprocessing Web log data in a Web log mining system.

Background art

[0002] Data mining technology emerged to address the information overload brought about by the rapid development of science and technology. In the 21st century, the Internet has spread all over the world, and the specific needs of the network environment have given rise to a new research field: Web mining. According to the purpose and data source, existing Web mining technologies can be divided into Web content mining, Web structure mining, and Web usage mining.

[0003] Web log mining is the most commonly used Web usage mining technology. It applies the ideas of data mining to analyze and process Web server logs. In order to optimize the organizational structure ...


Application Information

IPC(8): G06F17/30
Inventors: 孙健, 隆克平, 李志, 谢发川, 黄悦
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA