
Parallelized method for defect text classification of power equipment

A text classification technology for power equipment, applied in text database clustering/classification, unstructured text data retrieval, electronic digital data processing, etc., that reduces time consumption and improves reliability

Inactive Publication Date: 2018-05-11
ELECTRIC POWER RESEARCH INSTITUTE OF STATE GRID SHANDONG ELECTRIC POWER COMPANY +2
Cites: 3; Cited by: 16

AI Technical Summary

Problems solved by technology

[0006] The process is built on the Spark parallel framework so that large inputs can be computed efficiently, but the SVM classification package in the platform's MLlib is a binary classifier, so it is difficult to handle the multi-class classification encountered in this scenario.
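One common way to obtain a multi-class classifier from a binary one is one-vs-rest: train one binary model per class and pick the class whose model scores highest. The sketch below illustrates that idea only; it is not confirmed to be the scheme used in the patent. The function names, the class labels, and the toy 2-D data are all hypothetical, and `train_binary` is a simple unregularized hinge-loss trainer standing in for a real binary SVM.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
BinaryModel = Tuple[List[float], float]  # (weights, bias)


def train_binary(X: Sequence[Vector], y: Sequence[int],
                 lr: float = 0.1, max_epochs: int = 1000) -> BinaryModel:
    """Toy stand-in for a binary linear SVM (assumption, not MLlib's code).

    Takes hinge-loss subgradient steps without regularization.
    Labels must be +1 or -1.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        updated = False
        for x, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1.0:  # misclassified or inside the margin
                w = [wj + lr * yi * xj for wj, xj in zip(w, x)]
                b += lr * yi
                updated = True
        if not updated:  # every training point now has margin >= 1
            break
    return w, b


def score(model: BinaryModel, x: Vector) -> float:
    """Signed distance-like score of x under one binary model."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_one_vs_rest(X: Sequence[Vector],
                      labels: Sequence[str]) -> Dict[str, BinaryModel]:
    """Train one binary model per class: that class (+1) against the rest (-1)."""
    models = {}
    for cls in sorted(set(labels)):
        y = [1 if lab == cls else -1 for lab in labels]
        models[cls] = train_binary(X, y)
    return models


def predict(models: Dict[str, BinaryModel], x: Vector) -> str:
    """Return the class whose binary model gives the highest score."""
    return max(models, key=lambda cls: score(models[cls], x))


# Hypothetical, well-separated toy data with three made-up defect classes.
X = [(0, 0), (1, 0), (0, 1), (1, 1),
     (10, 0), (11, 1), (10, 1),
     (0, 10), (1, 11), (1, 10)]
labels = ["insulation"] * 4 + ["mechanical"] * 3 + ["thermal"] * 3

models = train_one_vs_rest(X, labels)
predictions = [predict(models, x) for x in X]
```

One-vs-rest needs only as many binary models as there are classes, and the per-class training runs are independent of each other, which is one reason it is often paired with a parallel framework.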



Examples

[0085] Comparison scheme 1: TF-IDF representation + naive Bayes;

[0086] Comparison scheme 2: TF-IDF representation + SVM;

[0087] Comparison scheme 3: word2vec + SVM, with word vectors trained on a general corpus;

[0088] The present invention: word2vec + SVM, with word vectors trained on a domain corpus;
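For readers unfamiliar with the TF-IDF representation used in comparison schemes 1 and 2, the following is a minimal sketch of one common variant: term frequency multiplied by log(N / document frequency). Libraries differ in smoothing and normalization, so the exact weights in the patent's experiments may not match this formula. The function name and the example tokens are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} mapping per tokenized document.

    weight = (count of term in doc / doc length) * log(N / docs containing term)
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenized defect records.
docs = [
    ["transformer", "oil", "leak", "defect"],
    ["breaker", "trip", "defect"],
    ["transformer", "overheating", "defect"],
]
weights = tfidf(docs)
```

A term that occurs in every record (here "defect") gets weight zero, while a term confined to one record (here "breaker") is weighted most heavily. Unlike word2vec, this representation carries no information about which terms are semantically related.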

[0089] Table 1 Comparison of classification results of different schemes

[0090]

[0091] The comparison of the above results shows that the schemes based on word2vec + SVM are generally better than the other schemes. Among them, word2vec vectors trained on the domain corpus adapt better to the classification task in this scenario than vectors trained on a general corpus.

[0092] To verify the improvement in running speed of the parallelized algorithm, we divided the data set into scales of 200K, 20M, 500M, and 1G. For parallelism based on the Spark framework, each executor has a fixed number of cores, and that number determines how many tasks run in parallel in each executor. Therefore, the more total execution cores set here, the more the parallelism of th...
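The relationship between cores and parallel tasks described above can be written as simple arithmetic. The sketch below assumes the standard Spark settings `spark.executor.instances`, `spark.executor.cores`, and `spark.task.cpus` (default 1); the helper names and the example numbers are hypothetical and are not the experimental configuration from the patent.

```python
import math


def total_task_slots(num_executors: int, cores_per_executor: int,
                     cpus_per_task: int = 1) -> int:
    """Number of tasks that can run at the same time across the cluster."""
    return num_executors * (cores_per_executor // cpus_per_task)


def scheduling_waves(num_partitions: int, task_slots: int) -> int:
    """Rounds needed to run one task per partition with the given slots."""
    return math.ceil(num_partitions / task_slots)


# Hypothetical example: 20 partitions on two different cluster sizes.
small = total_task_slots(num_executors=2, cores_per_executor=2)
large = total_task_slots(num_executors=4, cores_per_executor=2)
waves_small = scheduling_waves(20, small)
waves_large = scheduling_waves(20, large)
```

This is an idealized model: it ignores scheduling overhead, data skew, and stragglers, so measured speed-ups are usually smaller than the ratio of task slots.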


Abstract

The invention discloses a parallelized method for defect text classification of power equipment. The method includes the steps of: adding a domain lexicon to a user dictionary, preprocessing defect cases, and performing word segmentation and stop word removal; using a crawler to collect a text corpus of power grid fault cases, and training Spark's word2vec on it to obtain domain word vector representations; using the word vectors to build a vectorized representation of the defect case texts, forming a matrix; and inputting the matrix into an SVM multi-classifier for training and classification to obtain the classification results.
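The abstract does not say how word vectors are combined into one vector per defect case. A frequently used option is to average the vectors of the words in each text, which the sketch below assumes. The function names, the two-dimensional toy vectors, and the tokens are hypothetical.

```python
from typing import Dict, List, Sequence


def document_vector(tokens: Sequence[str],
                    word_vectors: Dict[str, Sequence[float]],
                    dim: int) -> List[float]:
    """Average the vectors of in-vocabulary tokens (assumed pooling scheme).

    Returns a zero vector when no token has a known word vector.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def build_matrix(docs: Sequence[Sequence[str]],
                 word_vectors: Dict[str, Sequence[float]],
                 dim: int) -> List[List[float]]:
    """Stack one document vector per defect case into a row-major matrix."""
    return [document_vector(doc, word_vectors, dim) for doc in docs]


# Toy vectors; real word2vec vectors typically have 100 or more dimensions.
word_vectors = {"oil": [1.0, 0.0], "leak": [0.0, 1.0]}
matrix = build_matrix([["oil", "leak"], ["unknown"]], word_vectors, dim=2)
```

Each row of the resulting matrix has the same length regardless of how long the defect description is, which is what a classifier such as an SVM requires as input.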

Description

Technical field

[0001] The invention relates to a parallelized defect text classification method for electric power equipment.

Background technique

[0002] A text classification algorithm mainly includes four steps: preprocessing, text feature extraction, text representation, and classification calculation. The preprocessing of Chinese text mainly includes word segmentation and stop word removal. Text feature extraction mainly includes methods based on word frequency statistics, represented by TF-IDF and TextRank, and methods based on topic models, represented by LDA. Text representation mainly includes the one-hot method, which does not consider context, and the method based on word2vec. The final classification step can use general-purpose classification algorithms. In the text classification task of a specific field, the main problem to be considered is to combine the language and professional characteristics of the field, and make corres...
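To illustrate the preprocessing step for Chinese text (word segmentation followed by stop word removal), here is a minimal sketch using forward maximum matching against a lexicon. This toy segmenter is an assumption chosen for brevity; a production pipeline would normally use an established segmentation library with a user dictionary. The function names and the example lexicon are hypothetical.

```python
from typing import List, Set


def segment(text: str, lexicon: Set[str]) -> List[str]:
    """Forward maximum matching: at each position take the longest lexicon
    word that matches, falling back to a single character."""
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += length
                break
    return tokens


def preprocess(text: str, lexicon: Set[str], stop_words: Set[str]) -> List[str]:
    """Segment the text, then drop stop words."""
    return [tok for tok in segment(text, lexicon) if tok not in stop_words]


# "主变压器" = main transformer, "漏油" = oil leak, "的" = a particle (stop word).
general_lexicon = {"变压器", "漏油"}
domain_lexicon = general_lexicon | {"主变压器"}

without_domain_term = segment("主变压器漏油", general_lexicon)
with_domain_term = preprocess("主变压器的漏油", domain_lexicon, {"的"})
```

Without the domain term in the lexicon, "主变压器" is split into "主" and "变压器"; with it, the equipment name survives as one token. This is the kind of effect that adding a domain lexicon to the user dictionary is meant to achieve.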


Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F16/35, G06F40/242, G06F40/284
Inventor 杨祎宇文梦柯王智翔白德盟辜超郭志红陈玉峰闫丹凤李贞林颖李程启秦佳峰郑文杰李娜
Owner ELECTRIC POWER RESEARCH INSTITUTE OF STATE GRID SHANDONG ELECTRIC POWER COMPANY