Parallelized method for classifying power equipment defect texts
A text classification technology for power equipment, applied in text database clustering/classification, unstructured text data retrieval, electronic digital data processing, etc., that reduces time consumption and improves reliability.
Examples
Comparative approach 1
[0085] Comparison scheme 1: TF-IDF representation + naive Bayes;
Comparative approach 2
[0086] Comparison scheme 2: TF-IDF representation + SVM;
Comparative approach 3
[0087] Comparison scheme 3: word2vec pre-trained on a general corpus + SVM;
[0088] The present invention: word2vec pre-trained on a domain corpus + SVM;
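For orientation, the TF-IDF representation used in comparison schemes 1 and 2 can be sketched in plain Python. This is a minimal illustration, not the patent's implementation: the toy defect records, the whitespace tokenizer, the function name `tfidf_vectors`, and the smoothed IDF formula are all assumptions chosen for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) for a list of tokenized documents.

    Assumptions (not taken from the patent):
      - TF is the raw term count divided by the document length.
      - IDF is smoothed: idf(t) = ln((1 + N) / (1 + df(t))) + 1.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


# Hypothetical toy defect records, tokenized on whitespace.
records = [
    "transformer oil leak at valve",
    "breaker fails to close",
    "transformer oil temperature high",
]
docs = [r.split() for r in records]
vocab, vectors = tfidf_vectors(docs)
```

Each record becomes a vector with one weight per vocabulary term. Terms that occur in many records (here "transformer") receive a lower weight than terms specific to one record (here "leak"), and the resulting vectors are what a naive Bayes or SVM classifier would consume.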
[0089] Table 1 Comparison of classification results of different schemes
[0090]
[0091] The comparison of the above results shows that the schemes based on word2vec+SVM generally outperform the other schemes. Among them, word2vec vectors trained on the domain corpus adapt better to the classification task in this scenario than vectors trained on a general corpus.
[0092] To verify the improvement in running speed of the parallelized algorithm, we divided the data set into scales of 200K, 20M, 500M, and 1G. For parallelism based on the Spark framework, each executor has a fixed number of cores, and that number directly determines how many tasks can run in parallel in each executor. Therefore, the more total execution cores set here, the more the parallelism of th...
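The relationship between cores and parallelism can be made concrete with a little arithmetic. The sketch below assumes the usual Spark semantics, in which an executor runs at most `spark.executor.cores / spark.task.cpus` tasks at once; the function names and the example cluster sizes are hypothetical and do not come from the patent's experiments.

```python
import math


def max_concurrent_tasks(num_executors, cores_per_executor, cpus_per_task=1):
    """Upper bound on the number of tasks running at the same time.

    Assumes every executor has the same fixed number of cores and
    every task reserves `cpus_per_task` cores.
    """
    tasks_per_executor = cores_per_executor // cpus_per_task
    return num_executors * tasks_per_executor


def scheduling_waves(num_partitions, num_executors, cores_per_executor,
                     cpus_per_task=1):
    """Number of rounds needed to process all partitions of one stage."""
    slots = max_concurrent_tasks(num_executors, cores_per_executor,
                                 cpus_per_task)
    if slots == 0:
        raise ValueError("no task slots: cpus_per_task exceeds executor cores")
    return math.ceil(num_partitions / slots)


# Hypothetical settings: 64 partitions on 4 executors.
waves_2_cores = scheduling_waves(64, num_executors=4, cores_per_executor=2)
waves_4_cores = scheduling_waves(64, num_executors=4, cores_per_executor=4)
```

Under these assumptions, doubling the cores per executor doubles the task slots and halves the number of scheduling rounds for a stage, provided the data is split into at least as many partitions as there are slots.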


