Spark-based multi-feature combined efficient Chinese text clustering method
A clustering method based on multi-feature combination, applied in the field of machine learning. It addresses problems of existing approaches, such as ignoring semantic similarity, losing semantic information, and increased computational and time complexity. The stated effects are a good text clustering result and reduced computing cost and time cost.
Embodiment
[0085] With reference to Figure 1, the specific implementation steps of the Spark-based multi-feature combined efficient Chinese text clustering method include:
[0086] Step 1: Build the Spark platform and HDFS file system on the physical server;
[0087] Step 2: Upload the original text data set to the HDFS file system, use the ICTCLAS Chinese word segmentation system and the Hadoop parallel computing platform to segment the original text data set into words in parallel, and upload the segmented data set back to the HDFS file system;
[0088] Step 3: The Spark platform reads the word-segmented data set from the HDFS file system and converts it into a resilient distributed dataset (RDD). According to the number of RDD partitions set in the user program, it starts a corresponding number of concurrent threads, which read the data and store it in system memory;
[0089] Step 4: According to the interdependence between the partitions in the RDD, the Spark job scheduling system splits the wri...
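The data flow of Steps 2 and 3 can be illustrated with a minimal plain-Python sketch. It is an analogy under stated assumptions, not the patented implementation: the segmenter is a toy forward-maximum-matching stand-in (ICTCLAS is not used and its API is not shown), the dictionary `TOY_DICT` and all function names are hypothetical, and ordinary lists and a thread pool stand in for HDFS files and RDD partitions. In PySpark, the read in Step 3 would roughly correspond to `sc.textFile(path, minPartitions=n)`.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy dictionary, for illustration only.
TOY_DICT = {"中文", "文本", "聚类", "方法"}
MAX_WORD_LEN = max(len(word) for word in TOY_DICT)


def segment(line):
    """Forward maximum matching: a stand-in segmenter, not ICTCLAS."""
    words, i = [], 0
    while i < len(line):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(MAX_WORD_LEN, len(line) - i), 0, -1):
            piece = line[i:i + size]
            if size == 1 or piece in TOY_DICT:
                words.append(piece)
                i += size
                break
    return words


def segment_corpus(raw_lines):
    """Step 2 analogue: write each document as space-separated words."""
    return [" ".join(segment(line)) for line in raw_lines]


def split_into_partitions(records, num_partitions):
    """Deal records round-robin into num_partitions lists."""
    parts = [[] for _ in range(num_partitions)]
    for idx, record in enumerate(records):
        parts[idx % num_partitions].append(record)
    return parts


def load_partitions(segmented_lines, num_partitions):
    """Step 3 analogue: one thread per partition parses its lines into
    token lists and keeps them in memory."""
    parts = split_into_partitions(segmented_lines, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        return list(pool.map(
            lambda part: [line.split() for line in part], parts))


if __name__ == "__main__":
    segmented = segment_corpus(["中文文本聚类方法", "高效文本"])
    for partition in load_partitions(segmented, num_partitions=2):
        print(partition)
```

The number of worker threads is tied to the number of partitions, mirroring the description in Step 3; the round-robin split is only one possible partitioning choice and is not taken from the source.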