Mass web data mining method based on Hadoop

A massive-data mining technology, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc., that addresses problems such as existing approaches neglecting the processing speed of massive data
CN104809231A · Inactive · Publication Date: 2015-07-29 · INSPUR GROUP CO LTD

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
INSPUR GROUP CO LTD
Publication Date
2015-07-29
Estimated Expiration
Not applicable · inactive patent

Abstract

The invention discloses a Hadoop-based method for mining massive web data and belongs to the field of computer data processing. A genetic algorithm is fused with Hadoop's MapReduce, and massive web data stored in the Hadoop Distributed File System (HDFS) is mined to verify the efficiency of the platform; the fused algorithm is then used on the platform to mine users' preferred access paths from web logs. Experimental results show that processing large volumes of web data with a distributed algorithm on Hadoop can significantly improve the efficiency of web data mining.
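The visible text does not specify how the genetic algorithm and MapReduce are combined, so the following is only a minimal sketch of one plausible arrangement, not the patented method. It assumes that candidate access paths are fixed-length page sequences, that fitness is the number of logged sessions containing a path as a contiguous run, and that the map and reduce steps are simulated in plain Python rather than run on a Hadoop cluster. All function names, parameters, and operator choices (truncation selection, one-point crossover, a 0.2 mutation rate) are hypothetical.

```python
import random
from collections import defaultdict
from itertools import chain


def contains_path(session, path):
    """Return True if `path` occurs as a contiguous run inside `session`."""
    n = len(path)
    return any(tuple(session[i:i + n]) == path
               for i in range(len(session) - n + 1))


def map_support(partition, population):
    """Map step (simulated): emit (path, 1) for every session in one
    log partition that contains a candidate path."""
    for session in partition:
        for path in population:
            if contains_path(session, path):
                yield path, 1


def reduce_support(pairs):
    """Reduce step (simulated): sum the emitted counts per candidate path."""
    totals = defaultdict(int)
    for path, count in pairs:
        totals[path] += count
    return totals


def evaluate(partitions, population):
    """One simulated MapReduce round: map over each partition, then reduce."""
    emitted = chain.from_iterable(
        map_support(part, population) for part in partitions)
    return reduce_support(emitted)


def evolve(partitions, pages, path_len=3, pop_size=20, generations=30, seed=0):
    """Hypothetical GA loop; assumes path_len >= 2 and pop_size >= 4."""
    rng = random.Random(seed)
    population = [tuple(rng.choice(pages) for _ in range(path_len))
                  for _ in range(pop_size)]
    best, best_support = population[0], -1
    for _ in range(generations):
        # Fitness evaluation is the part delegated to map and reduce.
        support = evaluate(partitions, set(population))
        ranked = sorted(population, key=lambda p: support[p], reverse=True)
        if support[ranked[0]] > best_support:
            best, best_support = ranked[0], support[ranked[0]]
        # Truncation selection: keep the better half as parents.
        parents = ranked[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, path_len)      # one-point crossover
            child = list(a[:cut] + b[cut:])
            if rng.random() < 0.2:                # point mutation
                child[rng.randrange(path_len)] = rng.choice(pages)
            children.append(tuple(child))
        population = parents + children
    return best, best_support
```

Calling `evolve(partitions, pages)`, where `partitions` is a list of session lists and `pages` is the list of known page names, returns the best path seen and its support count. In this arrangement only the fitness evaluation touches the log data, which is why it is the step that would be distributed; selection, crossover, and mutation operate on the small population and could run on a single node.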

Description

Technical field

[0001] The invention discloses a method for mining massive web data, belonging to the field of computer data processing, and in particular a Hadoop-based method for mining massive web data.

Background art

[0002] With the rapid growth in the scale of Web data, the computing power of a single node is no longer sufficient for analyzing and processing large-scale data. In recent years, with the rise of "cloud computing", attention has turned to emerging technologies for storing and processing massive data. The biggest advantage of the Hadoop "cloud computing" platform is that it implements the idea of "moving computation close to storage". The traditional model of "moving data close to computation" incurs too much system overhead once the data reaches massive scale, whereas "moving computation close to storage" saves the large overhead of transmitting massive data over the network, and processing time can be greatly re...
