A large data set clustering method based on MapReduce
The method relates to data set and data technology in the field of big data processing. It addresses problems of existing clustering methods, such as low accuracy, failure to effectively mine hidden information, and computational overhead that cannot meet practical needs, and achieves improved performance.
Detailed Description of the Embodiments
[0033] To make the objectives, technical solution, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it.
[0034] The application principle of the present invention will be further described below in conjunction with the accompanying drawings.
[0035] As shown in Figure 1, the MapReduce-based large-scale data set clustering method provided by the embodiment of the present invention includes the following steps:
[0036] S101: Input and format conversion of the raw data. Hadoop defines three input data formats: TextInputFormat, KeyValueTextInputFormat, and SequenceFileInputFormat. Because the data for cluster analysis takes the form of high-dimensional vectors, SequenceFileInputFormat is selected; call the InputDriver class that comes...
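The format conversion in step S101 can be illustrated with a minimal sketch, given here in Python rather than Hadoop's Java API so that it runs without a cluster. It assumes the raw data is plain text with one whitespace-delimited vector per line, and it writes each vector as a binary key-value record. The record layout and the function names `parse_vector`, `write_records`, and `read_records` are hypothetical and chosen only for illustration; this is not Hadoop's actual SequenceFile format, and it does not reproduce the InputDriver class named in the text.

```python
import struct
from typing import BinaryIO, Iterable, Iterator, List, Tuple

# Hypothetical record layout (NOT Hadoop's SequenceFile format):
# int32 key, int32 dimension, then `dimension` float64 values, big-endian.
HEADER = ">ii"


def parse_vector(line: str) -> List[float]:
    """Parse one whitespace-delimited text line into a dense vector."""
    return [float(token) for token in line.split()]


def write_records(lines: Iterable[str], out: BinaryIO) -> int:
    """Write each non-empty line as a (key, vector) binary record.

    The key is the running record index. Returns the number of records written.
    """
    count = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in the raw input
        vector = parse_vector(line)
        out.write(struct.pack(HEADER, count, len(vector)))
        out.write(struct.pack(f">{len(vector)}d", *vector))
        count += 1
    return count


def read_records(inp: BinaryIO) -> Iterator[Tuple[int, List[float]]]:
    """Read back (key, vector) records until the stream is exhausted."""
    header_size = struct.calcsize(HEADER)
    while True:
        header = inp.read(header_size)
        if len(header) < header_size:
            return
        key, dim = struct.unpack(HEADER, header)
        body = inp.read(8 * dim)  # each float64 occupies 8 bytes
        yield key, list(struct.unpack(f">{dim}d", body))
```

Storing the vectors as binary key-value records avoids re-parsing text in every later pass over the data, which is a plausible motivation for preferring a binary key-value format over a text format when the input consists of high-dimensional numeric vectors.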