Massive high-dimension data clustering method for MapReduce platform
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- FUDAN UNIV
- Publication Date
- 2013-02-27
- Estimated Expiration
- Not applicable · inactive patent
Abstract
Description
Technical field
[0001] The invention belongs to the technical field of cloud computing and data mining, and in particular relates to a method for clustering massive high-dimensional data using the MapReduce distributed computing framework.
Background art
[0002] The analysis of high-dimensional data has long been a difficult problem in data mining. Once the dimensionality becomes sufficiently high, many clustering methods that are effective for low-dimensional data are no longer applicable. For massive high-dimensional data, analysis and mining are further constrained by the limits of memory and hard disk capacity.
[0003] In recent years, research on MapReduce and its open-source implementation, Hadoop, has been very active. Many single-machine algorithms have been re-implemented on Hadoop, which gives them the availability and scalability needed to process massive data.
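As a rough illustration of what re-implementing a single-machine algorithm in the MapReduce style can look like, the following sketch expresses one iteration of ordinary k-means as a map step, a shuffle step, and a reduce step. It is not the method claimed in this patent, and it runs in plain Python rather than on Hadoop. The function names (`map_phase`, `shuffle`, `reduce_phase`, `kmeans_iteration`) and the choice of k-means are assumptions made only for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_phase(points: Iterable[Point],
              centroids: Sequence[Point]) -> Iterator[Tuple[int, Point]]:
    """Map: for each point, emit (index of nearest centroid, point)."""
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(p, centroids[i]))
        yield nearest, p


def shuffle(pairs: Iterable[Tuple[int, Point]]) -> Dict[int, List[Point]]:
    """Shuffle: group emitted values by key, as the framework would do."""
    groups: Dict[int, List[Point]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[int, List[Point]],
                 centroids: Sequence[Point]) -> List[Point]:
    """Reduce: replace each centroid by the mean of its assigned points.

    A centroid that received no points keeps its previous position.
    """
    new_centroids = list(centroids)
    for idx, members in groups.items():
        dim = len(members[0])
        new_centroids[idx] = tuple(
            sum(m[d] for m in members) / len(members) for d in range(dim)
        )
    return new_centroids


def kmeans_iteration(points: Iterable[Point],
                     centroids: Sequence[Point]) -> List[Point]:
    """One k-means iteration written as map -> shuffle -> reduce."""
    return reduce_phase(shuffle(map_phase(points, centroids)), centroids)
```

For example, calling `kmeans_iteration([(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)], [(0.0, 0.0), (10.0, 10.0)])` returns `[(0.0, 1.0), (10.0, 11.0)]`. On a real cluster the map step would run in parallel over data splits and the framework would perform the grouping, which is what lets the same logic scale beyond one machine's memory.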
[0004] Mahout is an open-source Apache project built on Hadoop that provides implementations of some ...