Optimization method and device for text categorization model

A text classification model optimization technology, applied in the computer field, that addresses problems such as the inability to classify short texts accurately.

Active Publication Date: 2018-07-17
ADVANCED NEW TECH CO LTD
Cites: 2; Cited by: 4

AI Technical Summary

Problems solved by technology

However, this method is usually suitable only for classifying long texts: in a long text, some words appear multiple times, so term frequency (TF) carries useful information.
In a short text, which usually contains only a few words, most words appear only once, so TF carries no information and a text classification model built this way cannot classify the short text accurately.
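The TF argument can be checked with a small Python sketch. The two sample texts are hypothetical, and whitespace splitting stands in for real word segmentation. In the longer text, repeated words receive a higher TF than words seen once. In the short address-like text, every word appears once, so all TF values are equal and TF cannot tell the words apart.

```python
from collections import Counter


def term_frequencies(tokens: list[str]) -> dict[str, float]:
    """TF of each term: occurrences of the term / number of tokens in the text."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}


# Longer text: repeated words get a higher TF than words seen once.
long_text = "the courier left the parcel at the station and the station kept the parcel".split()
# Short address-like text: each word appears once, so every TF is identical.
short_text = "Hangzhou Xihu Tmall Service Station".split()

long_tf = term_frequencies(long_text)
short_tf = term_frequencies(short_text)

print(long_tf["parcel"] > long_tf["courier"])  # True: TF separates these terms
print(set(short_tf.values()))                  # a single value: TF ranks nothing
```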

Embodiment Construction

[0030] Embodiments of the present application are described below in conjunction with the accompanying drawings.

[0031] The text classification model optimization method and device provided in the embodiments of the present application apply to scenarios in which a text classification model is optimized automatically based on pre-collected texts. The text classification model here includes a naive Bayes model, a KNN model, a maximum entropy model, and so on. The optimized model is suited to classifying address text, which has the following characteristics: (a) the content is short, so each word appears only once in the text; (b) the important words are at the end of the text. For example, the address text may be a user's delivery address.
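Characteristic (b) suggests that later terms in an address deserve more weight. The excerpt does not say how the patent computes term weights, so the following is only a hypothetical sketch: it assumes a linear position weight, (i + 1) / n for the i-th of n terms, and a pre-segmented sample address. Because of characteristic (a), each term appears once, so a plain dict keyed by term loses nothing.

```python
def position_weights(terms: list[str]) -> dict[str, float]:
    """Hypothetical weighting: the i-th of n terms (0-based) gets (i + 1) / n,
    so the last term weighs 1.0 and earlier terms weigh less.
    Assumes each term appears once (characteristic a), so dict keys are unique."""
    n = len(terms)
    return {term: (i + 1) / n for i, term in enumerate(terms)}


# Hypothetical delivery address, already segmented into terms.
address = ["Zhejiang", "Hangzhou", "Xihu", "Wensan Road", "Tmall Service Station"]
weights = position_weights(address)
for term, w in weights.items():
    print(f"{term:>24}: {w:.1f}")
```

A linear ramp is just the simplest monotone choice; any weighting that grows toward the end of the text would express the same assumption.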

[0032] Figure 1 is a flowchart of an optimization method for a text classification model provided by an embodiment of the present application. The subject of execution of t...

Abstract

The invention relates to the field of computer technology, in particular to an optimization method and device for a text categorization model. The method comprises the following steps. First, a text set is acquired and each text in it is preprocessed to obtain a term set for that text. The terms in each term set are matched against the category feature words in a preset feature word set, and the categories of the terms are determined from the matching result. The preset feature word set is then expanded according to these term categories, and the terms in each term set are filtered according to the expanded feature word set. Finally, weights are determined for the filtered terms, and a preset text categorization model is optimized according to the filtered terms and their weights. In this way, a text categorization model that classifies texts accurately is obtained.
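The abstract's steps can be sketched in Python. This is a minimal illustration under assumptions that do not come from the patent: whitespace tokenization stands in for preprocessing; a text is assigned a category only when its terms hit the feature words of exactly one category; an unmatched term joins that category's feature words once it co-occurs with the category in at least `min_support` texts; and the weights 1.0 (original feature word) and 0.5 (added feature word) are placeholders. All function names and sample data are hypothetical, and the final model-optimization step is not shown.

```python
from collections import Counter, defaultdict
from typing import Optional


def preprocess(text: str) -> list[str]:
    # Hypothetical preprocessing: lowercase and split on whitespace.
    # Chinese address text would need a word segmenter here instead.
    return text.lower().split()


def match_category(terms: list[str], feature_words: dict[str, set[str]]) -> Optional[str]:
    """Return the one category whose feature words hit the terms, else None."""
    hits = {cat for cat, words in feature_words.items() if words & set(terms)}
    return next(iter(hits)) if len(hits) == 1 else None


def expand_feature_words(term_sets, feature_words, min_support=2):
    """Add unmatched terms that co-occur with a category in >= min_support texts."""
    support = defaultdict(Counter)  # category -> counts of candidate terms
    for terms in term_sets:
        category = match_category(terms, feature_words)
        if category is None:
            continue
        for t in set(terms) - feature_words[category]:
            support[category][t] += 1
    expanded = {cat: set(words) for cat, words in feature_words.items()}
    for cat, counter in support.items():
        expanded[cat] |= {t for t, n in counter.items() if n >= min_support}
    return expanded


def filter_and_weight(terms, original, expanded):
    """Keep only terms in the expanded set; weight original words 1.0, added ones 0.5."""
    original_words = set().union(*original.values())
    all_words = set().union(*expanded.values())
    return {t: (1.0 if t in original_words else 0.5) for t in terms if t in all_words}


feature_words = {"internet": {"tmall", "taobao"}, "logistics": {"courier", "warehouse"}}
texts = [
    "tmall store xixi",
    "taobao store wensan",
    "courier station xixi",
    "warehouse station binjiang",
    "store binjiang",
]
term_sets = [preprocess(t) for t in texts]
expanded = expand_feature_words(term_sets, feature_words)
weighted = [filter_and_weight(ts, feature_words, expanded) for ts in term_sets]
for text, w in zip(texts, weighted):
    print(f"{text!r:32} -> {w}")
```

In this toy run, "store binjiang" matches no original feature word, but after expansion it keeps "store" with weight 0.5, so it now carries a feature a downstream model could use.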

Description

Technical field

[0001] The present application relates to the field of computer technology, in particular to a text classification model optimization method and device.

Background

[0002] In traditional technology, texts are usually classified by one of the following two methods:

[0003] The first method is rule-based. Commonly used category keywords, whose corresponding categories are known, are collected in advance. When the text to be classified matches a certain category keyword, the text is classified into the category corresponding to that keyword. However, this method has significant limitations. When a text matches no category keyword, it cannot be classified at all. Furthermore, this method often cannot classify text accurately. For example, assume that the pre-collected category keyword is "Tmall" and its corresponding category is the Internet industry. Since the text "Tmall Service Station" cont...
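The rule-based method in [0003] can be sketched in a few lines, using the "Tmall" keyword from the example. The function name and the second sample text are hypothetical. The first call shows that any text containing the keyword is assigned its category regardless of the rest of the text; the second shows that a text matching no keyword cannot be classified.

```python
from typing import Optional


def classify_by_keywords(text: str, category_keywords: dict[str, str]) -> Optional[str]:
    """Rule-based classification: return the category of the first keyword
    found as a substring of the text, or None when no keyword matches."""
    for keyword, category in category_keywords.items():
        if keyword in text:
            return category
    return None


category_keywords = {"Tmall": "Internet industry"}

print(classify_by_keywords("Tmall Service Station", category_keywords))  # Internet industry
print(classify_by_keywords("Xixi Wetland Park", category_keywords))      # None
```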

Application Information

IPC (8): G06F17/30; G06K9/62
CPC: G06F16/355; G06F18/24
Inventors: 陈帅 (Chen Shuai), 徐峰 (Xu Feng), 陈明星 (Chen Mingxing), 郑霖 (Zheng Lin), 陈弢 (Chen Tao)
Owner: ADVANCED NEW TECH CO LTD