Incremental naive Bayes text classification method based on lifelong learning

A text classification and incremental learning technology, applied in text database clustering/classification, unstructured text data retrieval, character and pattern recognition, etc. It solves the problem that classifier performance cannot be further improved, and achieves the effects of handling new features, adapting to new domains, and improving classification accuracy.

Active Publication Date: 2018-05-22
NANJING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

In the latter category, relatively little work has modified the parameters of the naive Bayes model; such methods remain built on the existing naive Bayes model, so their performance cannot be further improved.

Method used



Examples


Experiment example

[0044] In this embodiment, the performance of the incremental naive Bayes text classification method based on lifelong learning is analyzed on classic text classification data: three classification tasks, the Movie review dataset, and the Multi-domain sentiment datasets. The three classification tasks are movie3, network3 and health3; the Multi-domain sentiment datasets cover the book, dvd, electronics and kitchen domains.

[0045] Experiments are further divided into two data conditions: domain-specific and domain-variant. Domain-specific means that the historical data and the incremental data come from the same domain, which is the most common text classification setting. In the domain-variant condition, the historical data and the incremental data come from different but related domains, which corresponds to the domain adaptation classification task.
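The two data conditions can be made concrete with a short sketch; the toy corpora and the load_reviews helper below are hypothetical stand-ins for the datasets named above, not the patent's evaluation setup.

```python
# Illustrative sketch of the two data conditions; the toy corpora below are
# hypothetical stand-ins for the Multi-domain sentiment datasets named above.
TOY_CORPORA = {
    "dvd":         [("great plot and acting", "pos"), ("dull and overlong", "neg")],
    "electronics": [("battery lasts long", "pos"), ("charger broke quickly", "neg")],
}

def load_reviews(domain):
    """Return labeled reviews for one domain (placeholder loader)."""
    return TOY_CORPORA[domain]

# Domain-specific: historical and incremental data come from the same domain.
historical, incremental = load_reviews("dvd"), load_reviews("dvd")

# Domain-variant: historical and incremental data come from different but
# related domains (the domain adaptation setting).
historical_dv, incremental_dv = load_reviews("dvd"), load_reviews("electronics")
```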

[0046] (1) Domain-specific text classification

[0047] The domain-specific text classification and sentiment classification are respectively carried out on the...



Abstract

The invention relates to an incremental naive Bayes text classification method based on lifelong learning. The method comprises: obtaining an initial text; extracting feature words of the text, and storing the feature word list of the text set and the number of texts; calculating the word frequency of each feature word in the text, and dividing the text set into a training set and a validation set; training, through a naive Bayes classifier, the training set vector model generated in the previous step to obtain and store the prior probability and feature class-conditional probability of the naive Bayes model; if a new text exists, training on the new text in an incremental manner and updating the prior probability and feature class-conditional probability of the naive Bayes model; if no new text exists, selecting a test corpus from the validation set, obtaining the predicted category of the test corpus according to the naive Bayes model, and calculating the prediction accuracy. The method can use knowledge learned in previous tasks to guide the learning of new tasks in an incremental manner, and has the ability to process new features and adapt to new domains.
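As a rough illustration of the steps above, the sketch below shows a count-based incremental naive Bayes classifier: class priors and feature class-conditional probabilities are derived from running counts, so new texts (and previously unseen feature words) can be folded in without retraining on the historical corpus. It assumes a bag-of-words representation with Laplace smoothing; the class and method names are hypothetical and the code is not the patent's implementation.

```python
# Minimal sketch of a count-based incremental multinomial naive Bayes
# classifier (illustrative only). Priors and class-conditional probabilities
# are recomputed from running counts, so new batches of text update the
# model without revisiting historical documents.
from collections import defaultdict
import math


class IncrementalNaiveBayes:
    def __init__(self, alpha=1.0):
        self.alpha = alpha                                         # Laplace smoothing
        self.class_doc_counts = defaultdict(int)                   # documents per class
        self.word_counts = defaultdict(lambda: defaultdict(int))   # count(word, class)
        self.class_word_totals = defaultdict(int)                  # total words per class
        self.vocab = set()
        self.total_docs = 0

    def update(self, documents, labels):
        """Fold a batch of tokenized documents into the running counts."""
        for tokens, label in zip(documents, labels):
            self.class_doc_counts[label] += 1
            self.total_docs += 1
            for w in tokens:
                self.word_counts[label][w] += 1
                self.class_word_totals[label] += 1
                self.vocab.add(w)          # previously unseen feature words are added here

    def predict(self, tokens):
        """Return the class with the highest log posterior for one document."""
        vocab_size = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for c in self.class_doc_counts:
            # log prior from document counts
            score = math.log(self.class_doc_counts[c] / self.total_docs)
            denom = self.class_word_totals[c] + self.alpha * vocab_size
            for w in tokens:
                # smoothed class-conditional probability of each feature word
                score += math.log((self.word_counts[c][w] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Usage: train on historical data, then update incrementally with new texts.
nb = IncrementalNaiveBayes()
nb.update([["good", "movie"], ["bad", "plot"]], ["pos", "neg"])
nb.update([["great", "acting"]], ["pos"])        # incremental batch with new words
print(nb.predict(["good", "acting"]))
```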

Description

Technical field

[0001] The invention belongs to the field of data mining and machine learning, and in particular relates to an incremental naive Bayes text classification method based on lifelong learning.

Background technique

[0002] With the advent of the information age, the information we can obtain increases day by day, and how to process and utilize such massive data is particularly important. Although hardware performance keeps improving, the amount of information is also growing explosively. Many traditional classification methods read all the data into memory at once when processing it, which greatly limits the generality and scalability of the algorithms. In addition, in most natural language processing tasks the training set is incomplete; to improve the performance of the model, training samples need to be continuously added and updated. Traditional classification methods need to reintegrate a...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F17/30; G06K9/62
CPC: G06F16/35; G06F18/24155
Inventor: 夏睿, 潘振春
Owner: NANJING UNIV OF SCI & TECH