Hadoop-based mass Web data mining genetic method

A data mining technique for massive Web data on Hadoop. It targets the problem that MapReduce data partitioning presupposes loosely coupled data contexts, with the stated effects of overcoming the disadvantages of the network environment, improving mining efficiency, and achieving high execution efficiency.

Status: Inactive
Publication Date: 2018-09-04
山东爱城市网信息技术有限公司
Cites: 0; Cited by: 2

AI Technical Summary

Problems solved by technology

With the approach described above, MapReduce partitions the data, but this presupposes that the data contexts are loosely coupled.


Embodiment Construction

[0026] The content of the present invention is described in more detail below:

[0027] The method of the present invention proceeds as follows:

[0028] Step 1: Data segmentation. The Web data is segmented according to its characteristics; for example, Web log files are segmented by user and access date. The segments are transmitted to different sub-nodes, and the user-defined support S is obtained at the same time.
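The segmentation in Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the record layout `(user, date, url)`, the function names `segment_log` and `assign_to_nodes`, and the hash-based assignment of segments to sub-nodes are all assumptions made for the example.

```python
from collections import defaultdict
import zlib


def segment_log(records):
    """Group (user, date, url) records into one segment per (user, date).

    The record layout is a hypothetical simplification of a Web log line.
    """
    segments = defaultdict(list)
    for user, date, url in records:
        segments[(user, date)].append(url)
    return dict(segments)


def assign_to_nodes(segments, num_nodes):
    """Assign each segment to a sub-node index using a stable hash of its key.

    Hash-based placement is an assumption; the source only says that the
    segments are transmitted to different sub-nodes.
    """
    nodes = [dict() for _ in range(num_nodes)]
    for key, pages in segments.items():
        idx = zlib.crc32(repr(key).encode("utf-8")) % num_nodes
        nodes[idx][key] = pages
    return nodes
```

Because every record of one user on one date lands in the same segment, each sub-node can process its segments without consulting the others, which matches the loosely coupled data context that MapReduce relies on.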

[0029] Step 2: Population initialization. Using Map and Reduce operations on the Hadoop platform, each sub-node converts its data set into 1-itemsets of preferred sub-paths that meet the user-defined support. These form the initial population of the genetic algorithm.
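A plain-Python sketch of the Map and Reduce operations in Step 2 is shown below. It does not use the Hadoop API; it only mimics the two phases in memory. The assumptions are that a 1-itemset is a single visited page, that support is an absolute count of sessions containing the page, and that the names `map_phase`, `reduce_phase`, and `initial_population` are hypothetical.

```python
from collections import defaultdict


def map_phase(sessions):
    """Map: emit (page, 1) once for every session that contains the page."""
    for pages in sessions:
        for page in set(pages):
            yield page, 1


def reduce_phase(pairs, min_support):
    """Reduce: sum the counts per page and keep pages meeting the support."""
    counts = defaultdict(int)
    for page, n in pairs:
        counts[page] += n
    return {page: c for page, c in counts.items() if c >= min_support}


def initial_population(sessions, min_support):
    """Return the frequent 1-itemsets, sorted, as the initial GA population."""
    return sorted(reduce_phase(map_phase(sessions), min_support))
```

For example, with the sessions `[["/a", "/b"], ["/a", "/c"], ["/a", "/b", "/b"]]` and a minimum support of 2, only `/a` and `/b` survive the filter and would seed the population.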

[0030] Step 3: Fitness calculation. The frequency of an access path is used to measure whether it is a preferred access path for users. The fitness function is therefore defined as follows:

[0031]

[0032] Here, S' is the access frequency of the path. I...


Abstract

The invention provides a Hadoop-based genetic method for mining massive Web data and belongs to the field of data mining and analysis. The method combines a genetic algorithm with MapReduce to perform Web data analysis in a Hadoop cluster environment. Experimental results show that the platform can extract implicit information of practical value with high execution efficiency, and that it both improves mining efficiency and overcomes the disadvantages of the network environment.

Description

Technical field

[0001] The invention relates to data mining and data analysis technology, in particular to a Hadoop-based massive Web data mining genetic method.

Background technique

[0002] At present, with the rapid expansion of data scale, the computing power of a single node can no longer meet the requirements of large-scale data analysis and processing, and the "cloud computing" technology that can be used for massive data storage and processing has emerged as the times require. "Cloud Computing" is Internet-based computing in which shared resources, software and information, etc. are provided to computers and devices in an on-demand manner. With the help of powerful computing resources in the network, "cloud computing" technology distributes complex calculations that consume a large amount of computing resources to multiple nodes for calculation through the network, which is currently an effective solution. The Internet is the world's largest data collection, and Web...


Application Information

IPC(8): G06F17/30, G06N3/12
CPC: G06N3/126
Inventor: 王利鑫
Owner: 山东爱城市网信息技术有限公司