Mass web data mining method based on Hadoop

A massive-data mining technology, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc., that addresses problems such as neglect of the processing speed of massive data.

Inactive Publication Date: 2015-07-29

INSPUR GROUP CO LTD

Cites: 3; Cited by: 8

Problems solved by technology

With the rise of "cloud computing" technology, existing data mining methods have been integrated with "cloud computing" platforms to improve the efficiency of data mining. However, current data mining research focuses mainly on improving the effectiveness of the mining system while neglecting the management of processing speed for massive data.

Embodiment Construction

[0025] The present invention will be further explained with reference to the drawings.

[0026] A method of massive web data mining based on Hadoop:

[0027] Set up the data mining environment: the Hadoop platform consists of six Proud PR2310N servers. One server hosts both the HDFS NameNode and the MapReduce JobTracker; the remaining five serve as computing nodes and data storage nodes. The test data set comes from the server logs of the web server room of Antfang Software. The test program is developed on the Eclipse for Java developer platform;

[0028] ① Data mining job submission: users submit jobs written based on the MapReduce programming specification;

[0029] ② Task assignment: calculate the required number of Map tasks M and Reduce tasks R, assign the Map tasks to the task execution nodes (TaskTrackers), and at the same time assign the corresponding TaskTrackers to execute the Reduce tasks. The specific process is: the job control node JobTracker according to the...
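The excerpt is cut off before it explains how the JobTracker derives M and R, so the following Python sketch only illustrates the general idea under stated assumptions: M is taken as the number of fixed-size input splits, R is supplied by the caller, and tasks are handed out round-robin. The names `plan_tasks`, `split_size`, and the tracker labels are hypothetical; they come neither from the patent nor from the Hadoop API.

```python
import math
from collections import defaultdict


def plan_tasks(input_bytes, split_size, num_reduce, trackers):
    """Return (M, R, assignment) for a hypothetical job.

    Assumption: one Map task per input split; the Reduce count is given.
    """
    if split_size <= 0 or not trackers:
        raise ValueError("split_size must be positive and trackers non-empty")
    m = max(1, math.ceil(input_bytes / split_size))
    r = num_reduce
    assignment = defaultdict(list)
    # Round-robin: task i goes to tracker i mod len(trackers).
    for i in range(m):
        assignment[trackers[i % len(trackers)]].append(f"map-{i}")
    for j in range(r):
        assignment[trackers[j % len(trackers)]].append(f"reduce-{j}")
    return m, r, dict(assignment)


# Five worker nodes, mirroring the five computing nodes in the test setup.
trackers = [f"tasktracker-{n}" for n in range(1, 6)]
m, r, plan = plan_tasks(
    input_bytes=640 * 2**20,  # 640 MiB of log data (illustrative figure)
    split_size=64 * 2**20,    # 64 MiB splits (illustrative figure)
    num_reduce=2,
    trackers=trackers,
)
```

Round-robin is chosen here only because it is the simplest policy to read; a real scheduler would also weigh node load and where the data blocks live.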

Abstract

The invention discloses a Hadoop-based method for mining massive web data and belongs to the field of computer data processing. A genetic algorithm is fused with Hadoop's MapReduce to mine massive Web data stored in the Hadoop Distributed File System (HDFS), which further verifies the efficiency of the platform; the fused algorithm is then used on the platform to mine users' preferred access routes from Web logs. The experimental results show that processing large volumes of Web data with a distributed algorithm on Hadoop significantly improves the efficiency of Web data mining.
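The abstract does not say how access routes are encoded or scored, so the sketch below is only one plausible reading, written in plain Python rather than against the Hadoop API. A map step emits each page-to-page hop in a session, a reduce step sums the counts, and a hypothetical `fitness` function scores a candidate route by its weakest hop. The selection, crossover, and mutation steps of a genetic algorithm are omitted, and all function names and sample sessions are illustrative assumptions.

```python
from collections import defaultdict


def map_session(session):
    """Map step: emit ((page, next_page), 1) for each hop in one session."""
    for a, b in zip(session, session[1:]):
        yield (a, b), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: total number of times a hop was observed."""
    return key, sum(values)


def path_support(sessions):
    """Run map, shuffle, and reduce in-process and return hop counts."""
    emitted = [kv for s in sessions for kv in map_session(s)]
    return dict(reduce_counts(k, vs) for k, vs in shuffle(emitted).items())


def fitness(candidate, support):
    """Hypothetical fitness: the smallest hop count along the route."""
    hops = list(zip(candidate, candidate[1:]))
    return min((support.get(h, 0) for h in hops), default=0)


# Illustrative sessions, not taken from the patent's test data.
sessions = [
    ["/", "/news", "/news/1"],
    ["/", "/news", "/about"],
    ["/", "/about"],
]
support = path_support(sessions)
best = max(sessions, key=lambda route: fitness(route, support))
```

Scoring by the weakest hop is a design choice for this sketch: a route is only as "preferred" as its least-travelled link. Other fitness definitions, such as the mean hop count, would fit the same map and reduce structure.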

Description

Technical field

[0001] The invention discloses a method for mining massive web data. It belongs to the field of computer data processing and relates in particular to a Hadoop-based method for mining massive web data.

Background technique

[0002] With the rapid growth of Web data, the computing power of a single node can no longer handle the analysis and processing of large-scale data. In recent years, with the rise of "cloud computing" technology, attention has turned to this emerging technology for storing and processing massive data. The biggest advantage of the Hadoop "cloud computing" platform is that it implements the idea of "computing close to storage". The traditional model of moving data close to computation incurs too much system overhead once the data reaches massive scale, whereas moving computation close to storage saves the large overhead of transmitting massive data over the network, and processing time can be greatly re...
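The contrast between moving data to computation and moving computation to storage can be made concrete with a small Python sketch. It is a simplification under stated assumptions, not Hadoop's actual scheduler: each block lists the nodes holding a replica, and a task reading a block on a node without a replica is assumed to pull the whole block over the network. The function names, node labels, and block sizes are hypothetical.

```python
def bytes_over_network(blocks, placement):
    """Sum the sizes of blocks whose assigned node holds no local replica."""
    return sum(
        b["size"] for b in blocks if placement[b["id"]] not in b["replicas"]
    )


def locality_first(blocks, nodes):
    """Prefer the least-loaded node that stores the block; else any node."""
    load = {n: 0 for n in nodes}
    placement = {}
    for b in blocks:
        local = [n for n in nodes if n in b["replicas"]]
        candidates = local or nodes
        chosen = min(candidates, key=lambda n: load[n])
        placement[b["id"]] = chosen
        load[chosen] += 1
    return placement


def round_robin(blocks, nodes):
    """Ignore data location and hand out tasks in node order."""
    return {b["id"]: nodes[i % len(nodes)] for i, b in enumerate(blocks)}


nodes = ["n1", "n2", "n3"]
blocks = [  # sizes in MiB; replica locations are illustrative
    {"id": 0, "size": 64, "replicas": {"n2"}},
    {"id": 1, "size": 64, "replicas": {"n3"}},
    {"id": 2, "size": 64, "replicas": {"n1"}},
    {"id": 3, "size": 64, "replicas": {"n2"}},
]
moved_naive = bytes_over_network(blocks, round_robin(blocks, nodes))
moved_local = bytes_over_network(blocks, locality_first(blocks, nodes))
```

In this toy layout the location-blind policy moves every block across the network, while the locality-first policy moves none, which is the saving the background section attributes to "computing close to storage".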

Application Information

IPC(8): G06F17/30

Inventor: 王之滨, 孙海峰, 崔乐乐

Owner: INSPUR GROUP CO LTD
