A text data stream classification method based on word vectors and an integrated SVM

A technology combining text data and an integrated classifier, applied in text database clustering/classification, unstructured text data retrieval, etc. It addresses problems such as complex construction schemes, the low classification accuracy of weak classifiers, and high time complexity.

Active Publication Date: 2019-06-28
HEFEI UNIV OF TECH

AI Technical Summary

Problems solved by technology

The above algorithms have made some improvements on the existing problems of dynamic learning and the low classification accuracy of weak classifiers. However, integrated learning still has shortcomings such as complex construction schemes, the need for a large amount of labeled data, and high time complexity. These shortcomings are not yet well solved and need further improvement.


Examples

Embodiment Construction

[0054] In this example, as shown in Figure 1, a text data stream classification method based on word vectors and an integrated SVM is carried out as follows:

[0055] 1. A text data stream classification method based on word vectors and an integrated SVM, characterized in that it is carried out as follows:

[0056] Step 1. Obtain a text data set and label part of the texts in it to obtain a labeled text set, which is used as the seed text set; the seed texts are obtained by randomly selecting about 10% of the total text data set.

[0057] Step 2. Perform word vector expansion processing on the seed text set to obtain the corresponding feature dictionary and noise dictionary; the word vectors are obtained by training the deep learning word vector algorithm proposed by Google on a Wikipedia corpus.

[0058] Step 2.1, segment the seed text in the seed text set into words;

[0059] Step 2.2, sort the words after the segmentation according to the word frequency, and fil...
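Steps 1 through 2.2 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the fixed random seed, and the whitespace tokenizer are all assumptions (a real system would likely use a proper word segmenter), and the filtering rule of Step 2.2 is left out because the source text is truncated at that point.

```python
import random
from collections import Counter


def select_seed_texts(texts, fraction=0.1, seed=0):
    """Randomly pick about `fraction` of the texts to be labeled as seed texts.

    `fraction=0.1` follows the "about 10%" in Step 1; `seed` is a
    hypothetical parameter added only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    k = max(1, round(len(texts) * fraction))
    return rng.sample(texts, k)


def segment(text):
    """Stand-in word segmenter (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def words_by_frequency(seed_texts):
    """Segment every seed text (Step 2.1) and sort words by frequency (Step 2.2).

    Returns (word, count) pairs, most frequent first. The filtering that
    follows in Step 2.2 is not implemented because the source is truncated.
    """
    counts = Counter()
    for text in seed_texts:
        counts.update(segment(text))
    return counts.most_common()


if __name__ == "__main__":
    corpus = [f"sample text number {i}" for i in range(20)]
    seeds = select_seed_texts(corpus)
    print(len(seeds), "seed texts selected for labeling")
    print(words_by_frequency(seeds))
```

The seed texts returned by `select_seed_texts` would then be labeled by hand before the dictionary-building steps run.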

Abstract

The invention discloses a text data stream classification method based on word vectors and an integrated SVM (Support Vector Machine). The method comprises the following steps: 1, obtaining a seed text set from a text data set; 2, performing word vector expansion processing on the seed text set to obtain a corresponding feature dictionary and noise dictionary; 3, performing feature-weighted vectorization processing on the text data set to obtain a corresponding text vector set; and 4, constructing an integrated classifier to obtain the classification results of all the texts. By making full use of the data features, the method can improve the accuracy of the classification result while reducing computational complexity, so that the requirements of practical problems are met.
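As a rough illustration of step 4, the sketch below builds an integrated classifier from several linear SVMs and combines them by majority vote. Everything specific here is an assumption, since the abstract does not say how the base classifiers are trained or combined: one SVM per data-stream chunk, training by plain subgradient descent on the hinge loss, binary labels of +1/-1, and the hyperparameters `lam`, `lr`, and `epochs`. The two-dimensional toy vectors stand in for the text vectors produced by step 3.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100):
    """Fit a linear SVM (labels +1/-1) by subgradient descent on the hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # L2 regularization shrinks the weights on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Sample is inside the margin: step toward classifying it correctly.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict_one(model, x):
    """Return +1 or -1 from the sign of the linear decision function."""
    w, b = model
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


def train_ensemble(chunks):
    """Train one base SVM per (X, y) chunk of the stream (assumed scheme)."""
    return [train_linear_svm(X, y) for X, y in chunks]


def ensemble_predict(models, x):
    """Combine the base SVMs by unweighted majority vote (assumed scheme)."""
    votes = sum(predict_one(m, x) for m in models)
    return 1 if votes >= 0 else -1


# Toy stream: three chunks of hypothetical 2-D text vectors.
toy_chunks = [
    ([[1.0, 0.0], [0.0, 1.0]], [1, -1]),
    ([[0.9, 0.1], [0.1, 0.9]], [1, -1]),
    ([[0.8, 0.2], [0.2, 0.8]], [1, -1]),
]

if __name__ == "__main__":
    models = train_ensemble(toy_chunks)
    print(ensemble_predict(models, [1.0, 0.0]))
    print(ensemble_predict(models, [0.0, 1.0]))
```

Using an odd number of base classifiers avoids tied votes in the binary case; a multi-class setting would need a one-vs-rest or similar wrapper, which this sketch does not attempt.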

Description

Technical field

[0001] The invention relates to the field of text data stream classification; specifically, it is a text data stream classification method based on word vectors and an integrated SVM.

Background technique

[0002] With the continuous development of self-media and social networks, identifying features in and classifying unstructured short text data that is generated in real time, is large in volume, and has a complex structure has become a hot research field. Such classification can help users quickly extract valuable information and knowledge from the data. However, traditional methods such as KNN, SVM, NB, and deep learning require large training samples for multi-class classification, their accuracy is low, and their dynamic adaptability is not strong. The following problems still exist:

[0003] Most of the information disseminated in social media streams is invalid information; classification algorithms in social media streams hav...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35
Inventors: 倪丽萍, 夏千姿, 倪志伟, 朱旭辉, 夏平凡, 李想
Owner: HEFEI UNIV OF TECH