Mass web data mining method based on Hadoop

A Hadoop-based technique for mining massive web data, classified under electrical digital data processing and special data processing applications. It addresses the problem that existing approaches neglect the processing speed and management of massive data.

Inactive Publication Date: 2015-07-29
INSPUR GROUP CO LTD

AI Technical Summary

Problems solved by technology

With the rise of "cloud computing" technology, the existing data mining methods are integrated with the "cloud computing" platform to improve the efficiency of data mining, b...


Examples


Embodiment Construction

[0025] The present invention will be further explained with reference to the drawings.

[0026] A method of massive web data mining based on Hadoop:

[0027] Set up the data mining environment: the Hadoop platform consists of 6 Proud PR2310N servers. One server hosts both the HDFS NameNode and the MapReduce JobTracker, and the remaining 5 serve as computing nodes and data storage nodes. The test data set comes from the server logs of the web server room of Antfang Software. The test program is developed on the Eclipse for Java developer platform;

[0028] ① Data mining job submission: users submit jobs written according to the MapReduce programming specification;
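To make the shape of such a job concrete, the following is a minimal single-process sketch of a job written in the map/reduce style. It is not the patent's program and does not use the Hadoop API: the function names (`map_log_line`, `reduce_counts`, `run_job`) and the assumed log layout (`<client_ip> <timestamp> <url>`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_log_line(line):
    """Map step: emit (requested_url, 1) for one access-log line."""
    parts = line.split()
    # Assumed layout: "<client_ip> <timestamp> <url>"; malformed lines are skipped.
    if len(parts) >= 3:
        yield parts[2], 1


def reduce_counts(url, counts):
    """Reduce step: sum all counts emitted for one URL."""
    return url, sum(counts)


def run_job(lines):
    """Run map, shuffle (group by key), and reduce inside one process."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_log_line(line):
            grouped[key].append(value)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Hypothetical sample log lines, not taken from the patent's data set.
sample_log = [
    "10.0.0.1 2015-01-01T10:00:00 /index.html",
    "10.0.0.2 2015-01-01T10:00:05 /products.html",
    "10.0.0.1 2015-01-01T10:00:09 /products.html",
    "malformed",
]
hits = run_job(sample_log)
print(hits)
```

On a real cluster the map and reduce functions would run on different nodes and the grouping step would be performed by the framework; the sketch only shows the contract a submitted job has to satisfy.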

[0029] ② Task assignment: calculate the required number of Map tasks M and Reduce tasks R, and assign the Map tasks to the task execution nodes (TaskTrackers); at the same time, assign the corresponding TaskTrackers to execute the Reduce tasks. The specific process is: the job control node JobTracker according to the...
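The rule the JobTracker actually applies is cut off in the text above, so the following is only a sketch under stated assumptions: M is taken as one Map task per input split, R is a user-chosen number, and tasks are handed to TaskTrackers in round-robin order. The input size, split size, value of R, and the helper names are all hypothetical.

```python
def count_map_tasks(input_bytes, split_bytes):
    """One Map task per input split; the last split may be partial (integer ceiling)."""
    return -(-input_bytes // split_bytes)


def assign_round_robin(num_tasks, trackers):
    """Assign task ids 0..num_tasks-1 to trackers in round-robin order."""
    assignment = {tracker: [] for tracker in trackers}
    for task_id in range(num_tasks):
        assignment[trackers[task_id % len(trackers)]].append(task_id)
    return assignment


MIB = 2**20
# Five worker nodes, matching the five computing nodes described in [0027].
trackers = [f"tasktracker-{i}" for i in range(1, 6)]

# Hypothetical job: a 1000 MiB log file cut into 64 MiB splits, with 2 Reduce tasks.
M = count_map_tasks(input_bytes=1000 * MIB, split_bytes=64 * MIB)
R = 2

map_plan = assign_round_robin(M, trackers)
reduce_plan = assign_round_robin(R, trackers)
print(M, map_plan)
print(R, reduce_plan)
```

Round-robin is used here only because it is the simplest policy to read; it ignores where the data is stored and how busy each node is.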


Abstract

The invention discloses a Hadoop-based method for mining massive web data and belongs to the field of computer data processing. A genetic algorithm is fused with Hadoop's MapReduce, and massive Web data stored in the Hadoop Distributed File System (HDFS) is mined to verify the efficiency of the platform; the fused algorithm is used on the platform to mine users' preferred access paths from Web logs. Experimental results show that processing large volumes of Web data with a distributed algorithm on Hadoop significantly improves the efficiency of Web data mining.
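The abstract does not say how paths are encoded or scored, so the sketch below is an illustration under assumptions rather than the patented algorithm: a candidate is a fixed-length sequence of pages, its fitness is the number of sessions that contain it as a contiguous run, and the genetic operators are elitist selection, single-point crossover, and single-page mutation. All names, parameters, and the sample sessions are hypothetical. The fitness evaluation is the part that could be spread over Map and Reduce tasks; here it runs in one process.

```python
import random


def support(path, sessions):
    """Fitness: number of sessions that contain `path` as a contiguous run of pages."""
    k = len(path)
    return sum(
        any(tuple(s[i:i + k]) == path for i in range(len(s) - k + 1))
        for s in sessions
    )


def evolve(sessions, path_len=3, pop_size=20, generations=30, seed=0):
    """Search for a frequently followed path with a small elitist genetic algorithm."""
    rng = random.Random(seed)
    pages = sorted({page for s in sessions for page in s})
    population = [
        tuple(rng.choice(pages) for _ in range(path_len)) for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Selection: keep the fitter half unchanged (elitism).
        population.sort(key=lambda p: support(p, sessions), reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, path_len)  # single-point crossover
            child = list(a[:cut] + b[cut:])
            if rng.random() < 0.3:  # mutation: replace one page at random
                child[rng.randrange(path_len)] = rng.choice(pages)
            children.append(tuple(child))
        population = parents + children
    return max(population, key=lambda p: support(p, sessions))


# Hypothetical click sessions reconstructed from a web log.
sessions = [
    ["home", "list", "item", "cart"],
    ["home", "list", "item"],
    ["list", "item", "cart"],
    ["home", "about"],
]
best = evolve(sessions)
print(best, support(best, sessions))
```

The random generator is seeded so that repeated runs behave the same way; the result is a heuristic answer, not a guaranteed optimum.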

Description

Technical field

[0001] The invention discloses a method for mining massive web data. It belongs to the field of computer data processing and relates in particular to a Hadoop-based method for mining massive web data.

Background technique

[0002] As Web data grows rapidly, the computing power of a single node can no longer cope with the analysis and processing of large-scale data. In recent years, with the rise of "cloud computing", attention has turned to this emerging technology for storing and processing massive data. The biggest advantage of the Hadoop "cloud computing" platform is that it implements the idea of "moving computation close to storage". The traditional model of "moving data close to computation" incurs too much system overhead once the data reaches massive scale, whereas "moving computation close to storage" saves the large overhead of transmitting massive data over the network, and processing time can be greatly re...
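The difference between the two models can be illustrated with a small calculation. The sketch below is a simplification and not Hadoop's scheduler: it assumes every block has a known list of nodes holding a replica, prefers a free node that already stores the block, and counts only the bytes that would have to cross the network. Block names, node names, and the block size are hypothetical.

```python
def schedule_local_first(block_locations, free_nodes):
    """For each block, pick a free node that already stores it; else fall back to any free node."""
    plan = {}
    for block, holders in block_locations.items():
        local = [node for node in holders if node in free_nodes]
        plan[block] = local[0] if local else sorted(free_nodes)[0]
    return plan


def network_bytes(plan, block_locations, block_size):
    """Bytes sent over the network: only blocks processed on a node without a replica."""
    return sum(
        block_size for block, node in plan.items() if node not in block_locations[block]
    )


# Hypothetical cluster state: which nodes hold a replica of each block.
block_locations = {
    "blk_1": ["node1", "node2"],
    "blk_2": ["node2", "node3"],
    "blk_3": ["node4", "node5"],
}
block_size = 64 * 2**20
free_nodes = {"node1", "node3", "node5"}

# "Moving computation close to storage": run each task where its block lives.
local_plan = schedule_local_first(block_locations, free_nodes)
# "Moving data close to computation": ship every block to one central machine.
central_plan = {block: "central" for block in block_locations}

print(network_bytes(local_plan, block_locations, block_size))
print(network_bytes(central_plan, block_locations, block_size))
```

In this toy setup the local-first plan moves no data at all, while the centralized plan moves every block, which is the overhead the paragraph above refers to.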


Application Information

IPC(8): G06F17/30
Inventors: 王之滨, 孙海峰, 崔乐乐
Owner: INSPUR GROUP CO LTD