Incremental naive Bayes text classification method based on lifelong learning

A text classification and incremental learning technology, applied in text database clustering/classification, unstructured text data retrieval, character and pattern recognition, etc. It solves the problem that classifier performance cannot be further improved, and achieves the effects of handling new features, adapting to new domains, and improving classification accuracy.

Active Publication Date: 2018-05-22
NANJING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

In the latter category, relatively little work has modified the parameters of the naive Bayes model; such methods remain built on the existing naive Bayes model, so their performance cannot be further improved.

Method used



Examples


Experiment example

[0044] In this embodiment, the performance of the incremental naive Bayes text classification method based on lifelong learning is analyzed on classic text classification data: three classification tasks, the Movie review dataset, and the Multi-domain sentiment datasets. The three classification tasks are movie3, network3 and health3; the Multi-domain sentiment datasets cover the book, dvd, electronics and kitchen domains.

[0045] Experiments are further divided into two data conditions: domain-specific and domain-variant. Domain-specific means that the historical data and the incremental data come from the same domain, which is the most common text classification setting. In the domain-variant condition, the historical data and the incremental data come from different but related domains, which corresponds to the domain adaptation classification task.
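The two data conditions can be made concrete with a short sketch; the toy corpora and the load_reviews helper below are hypothetical stand-ins for the datasets named above, not the patent's evaluation setup.

```python
# Illustrative sketch of the two data conditions; the toy corpora below are
# hypothetical stand-ins for the Multi-domain sentiment datasets named above.
TOY_CORPORA = {
    "dvd":         [("great plot and acting", "pos"), ("dull and overlong", "neg")],
    "electronics": [("battery lasts long", "pos"), ("charger broke quickly", "neg")],
}

def load_reviews(domain):
    """Return labeled reviews for one domain (placeholder loader)."""
    return TOY_CORPORA[domain]

# Domain-specific: historical and incremental data come from the same domain.
historical, incremental = load_reviews("dvd"), load_reviews("dvd")

# Domain-variant: historical and incremental data come from different but
# related domains (the domain adaptation setting).
historical_dv, incremental_dv = load_reviews("dvd"), load_reviews("electronics")
```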

[0046] (1) Domain-specific text classification

[0047] The domain-specific text classification and sentiment classification are respectively carried out on the...



Abstract

The invention relates to an incremental naive Bayes text classification method based on lifelong learning. The method comprises: obtaining an initial text; extracting feature words of the text, and storing the feature word list of the text set and the number of texts; calculating the word frequency of each feature word in the text, and dividing the text set into a training set and a validation set; training, through a naive Bayes classifier, the training set vector model generated in the previous step to obtain and store the prior probability and feature class-conditional probability of the naive Bayes model; if a new text exists, training on the new text in an incremental manner and updating the prior probability and feature class-conditional probability of the naive Bayes model; if no new text exists, selecting a test corpus from the validation set, obtaining the predicted category of the test corpus according to the naive Bayes model, and calculating the prediction accuracy. The method can use knowledge learned in previous tasks to guide the learning of new tasks in an incremental manner, and has the ability to process new features and adapt to new domains.
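As a rough illustration of the steps above, the sketch below shows a count-based incremental naive Bayes classifier: class priors and feature class-conditional probabilities are derived from running counts, so new texts (and previously unseen feature words) can be folded in without retraining on the historical corpus. It assumes a bag-of-words representation with Laplace smoothing; the class and method names are hypothetical and the code is not the patent's implementation.

```python
# Minimal sketch of a count-based incremental multinomial naive Bayes
# classifier (illustrative only). Priors and class-conditional probabilities
# are recomputed from running counts, so new batches of text update the
# model without revisiting historical documents.
from collections import defaultdict
import math


class IncrementalNaiveBayes:
    def __init__(self, alpha=1.0):
        self.alpha = alpha                                         # Laplace smoothing
        self.class_doc_counts = defaultdict(int)                   # documents per class
        self.word_counts = defaultdict(lambda: defaultdict(int))   # count(word, class)
        self.class_word_totals = defaultdict(int)                  # total words per class
        self.vocab = set()
        self.total_docs = 0

    def update(self, documents, labels):
        """Fold a batch of tokenized documents into the running counts."""
        for tokens, label in zip(documents, labels):
            self.class_doc_counts[label] += 1
            self.total_docs += 1
            for w in tokens:
                self.word_counts[label][w] += 1
                self.class_word_totals[label] += 1
                self.vocab.add(w)          # previously unseen feature words are added here

    def predict(self, tokens):
        """Return the class with the highest log posterior for one document."""
        vocab_size = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for c in self.class_doc_counts:
            # log prior from document counts
            score = math.log(self.class_doc_counts[c] / self.total_docs)
            denom = self.class_word_totals[c] + self.alpha * vocab_size
            for w in tokens:
                # smoothed class-conditional probability of each feature word
                score += math.log((self.word_counts[c][w] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Usage: train on historical data, then update incrementally with new texts.
nb = IncrementalNaiveBayes()
nb.update([["good", "movie"], ["bad", "plot"]], ["pos", "neg"])
nb.update([["great", "acting"]], ["pos"])        # incremental batch with new words
print(nb.predict(["good", "acting"]))
```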

Description

Technical field

[0001] The invention belongs to the field of data mining and machine learning, and in particular relates to an incremental naive Bayes text classification method based on lifelong learning.

Background technique

[0002] With the advent of the information age, the information we can obtain increases day by day, and how to process and utilize such massive data is particularly important. Although hardware performance keeps improving, the amount of information is also growing explosively. Many traditional classification methods read all the data into memory at once when processing it, which greatly limits the generality and scalability of the algorithms. In addition, in most natural language processing tasks the training set is incomplete; to improve the performance of the model, training samples need to be continuously added and updated. Traditional classification methods need to reintegrate a...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F17/30; G06K9/62
CPC: G06F16/35; G06F18/24155
Inventor: 夏睿, 潘振春
Owner: NANJING UNIV OF SCI & TECH