An Online Short Text Data Stream Classification Method Based on Feature Expansion

A short text classification method, applicable to text database clustering/classification, text database querying, and unstructured text data retrieval. It addresses problems such as the dependence of model stability on the integrity of the external corpus, poor model portability and stability, and the ineffectiveness of traditional text classification techniques, and it achieves reduced sparsity, improved accuracy, and improved processing capacity.

Active Publication Date: 2021-08-17
HEFEI UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0003] Challenge 1: The high-dimensional sparseness of short texts makes traditional text classification techniques ineffective. Current solutions take one of two approaches: the first expands short texts with an external corpus and then applies a traditional classifier such as Naive Bayes, a support vector machine (SVM), or a decision tree; the second expands short texts using their own hidden statistical information, for example with LDA or KNN.
However, the stability of these models is greatly affected by the integrity of the external corpus, resulting in poor portability and stability.
[0004] Challenge 2: Because continuous data streams are massive and unbounded, traditional multi-iteration deep learning frameworks built on static data sets (such as Text-CNN and RNN) cannot handle them well, and the models cannot achieve good performance.
[0005] Challenge 3: Short text streams change dynamically. Because their network layers form a static framework, current mainstream deep learning frameworks cannot adapt quickly to changing data streams, so traditional neural network models cannot handle the dynamics of short texts well.

Examples

Embodiment Construction

[0074] In this example, as shown in Figure 2, an online short text data stream classification method based on feature expansion is carried out as follows:

[0075] Step 1: Build the Word2vec model based on the external corpus, and obtain the word vector set Vec:

[0076] Step 1.1: According to the sliding window mechanism, the given short text data stream $Stream = \{d_1, d_2, \ldots, d_e, \ldots, d_E\}$ is divided by time into $T$ data blocks, denoted $D = \{D_1, D_2, \ldots, D_t, \ldots, D_T\}$, where $d_e$ denotes the $e$-th short text in the short text data stream Stream and $D_t$ denotes the data block at time $t$; $e = 1, 2, \ldots, E$ and $t = 1, 2, \ldots, T$;
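
The block split in Step 1.1 can be illustrated with a minimal sketch. The source only states that the stream is divided into data blocks by time; the fixed block size, the function name `split_into_blocks`, and the toy stream below are assumptions made for illustration, not details taken from the patent.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def split_into_blocks(stream: Iterable[T], block_size: int) -> Iterator[List[T]]:
    """Yield consecutive data blocks D_1, D_2, ... in arrival order.

    Assumption: each block holds a fixed number of texts; the patent
    text only says the stream is divided "according to time".
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    block: List[T] = []
    for text in stream:
        block.append(text)
        if len(block) == block_size:
            yield block
            block = []
    if block:
        # Emit the final block even if it is shorter than block_size.
        yield block


# Hypothetical toy stream standing in for d_1, ..., d_E.
stream = ["d1", "d2", "d3", "d4", "d5"]
blocks = list(split_into_blocks(stream, block_size=2))
```

Because the function is a generator, it never holds more than one block in memory, which matches the premise that the stream is too large to store in full.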

[0077] Step 1.2: Obtain the external text corpus from the knowledge base for the short text data stream Stream, denoted $C' = \{d'_1, d'_2, \ldots, d'_m, \ldots, d'_M\}$, $m = 1, 2, \ldots, M$, where $M$ represents the total number of texts in the external corpus $C'$ and $d'_m$ represents the $m$-th text, with $d'_m = \{w'_1$ ...

Abstract

The invention discloses an online short text data stream classification method based on feature expansion. Its steps include: (1) constructing a Word2vec model from an external corpus and obtaining a word vector set Vec; (2) using Vec to vectorize the short text data stream and expand the text vectors; (3) building an online deep learning network on the expanded text vectors; (4) introducing a concept drift signal into the neurons of the LSTM network to detect changes in the distribution of the short text stream; and (5) updating the online deep learning network model and predicting the short text data stream. The invention can effectively improve the classification accuracy of short text data streams, correctly detect concept drift, and adjust the model accordingly, so that it adapts quickly to the short text data stream environment.
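
As a rough illustration of step 2 (vectorizing a short text with Vec and expanding it), the sketch below appends each token's nearest neighbour in the word vector set and averages the resulting vectors. The expansion rule itself is not given in this excerpt, so the nearest-neighbour rule, the averaging, the names `expand_tokens` and `text_vector`, and the two-dimensional toy vectors are all assumptions rather than the patented procedure.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_tokens(tokens: List[str], vec: Dict[str, Vector], k: int = 1) -> List[str]:
    """Append the k most similar vocabulary words for each known token.

    Assumed expansion rule (not from the source): nearest neighbours
    by cosine similarity in the word vector set.
    """
    expanded = list(tokens)
    for tok in tokens:
        if tok not in vec:
            continue  # out-of-vocabulary tokens cannot be expanded
        candidates = [w for w in vec if w != tok and w not in expanded]
        candidates.sort(key=lambda w: cosine(vec[tok], vec[w]), reverse=True)
        expanded.extend(candidates[:k])
    return expanded


def text_vector(tokens: List[str], vec: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of known tokens; zero vector if none are known."""
    known = [vec[t] for t in tokens if t in vec]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


# Hypothetical toy stand-in for the word vector set Vec.
VEC: Dict[str, Vector] = {
    "goal": [1.0, 0.0],
    "match": [0.9, 0.1],
    "exam": [0.0, 1.0],
    "school": [0.1, 0.9],
}

expanded = expand_tokens(["goal"], VEC, k=1)
doc_vec = text_vector(expanded, VEC, dim=2)
```

In practice Vec would come from a Word2vec model trained on the external corpus; the hand-written dictionary here only keeps the example self-contained.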

Description

Technical field

[0001] The invention belongs to the field of short text data stream mining and online deep learning for practical applications, and in particular relates to the classification of constantly changing, fast, and unbounded short text data streams.

Background

[0002] With the rise of information technologies such as mobile development and microservice frameworks, a massive, high-speed, and dynamic form of data, the data stream, has emerged in practical application areas such as social networking, online shopping, and sensor networks. In the social domain, owing to the popularity of social media and forums, very short texts flood our lives, such as Weibo posts, tweets, and user comments and interactions on Facebook and other forums. Short texts contain a great deal of information from fields such as sports, education, and science. Compared with ordinary texts, short texts are sparse, real-time, massive, irregular, and dynamic, which leads to topic evolution. ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/33, G06F16/35, G06F40/30, G06K9/62, G06N3/04, G06N3/08
CPC: G06F16/3344, G06F16/35, G06N3/084, G06N3/044, G06N3/045, G06F18/241
Inventor: 李培培, 胡阳, 胡学钢
Owner: HEFEI UNIV OF TECH