Text classification method

A text classification method, applied in the field of data classification, that addresses the problem of low classification accuracy and achieves the effects of reducing feature dimensions and improving speed.

Inactive Publication Date: 2016-11-09
量子云未来(北京)信息科技有限公司 +1

AI Technical Summary

Problems solved by technology

[0004] In view of the above analysis, the present invention aims to provide a text classification method that solves the problem that existing text classification methods require the participation of domain experts and are easily affected by subjective human judgment, resulting in low classification accuracy.

Examples

Embodiment Construction

[0024] Preferred embodiments of the present invention will be specifically described below in conjunction with the accompanying drawings, wherein the accompanying drawings constitute a part of the application and are used together with the embodiments of the present invention to explain the principles of the present invention.

[0025] A specific embodiment of the present invention discloses a method for text classification of express delivery comments, which specifically includes the following steps:

[0026] Express delivery comments are randomly collected from the network as a text collection. Multiple staff members label each comment in the collection with one of five categories: very fast, fast, slow, very slow, and invalid. After the labels are tallied, the final category of each comment is determined by the number of times each category was assigned to it. Then, according to the ratio of training sample:test set = 10...
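
The tallying step in [0026] can be sketched as a majority vote over the labels that the staff members assign to each comment. This is only an illustration under stated assumptions: the function name `final_category`, the sample comment IDs, and the rule of returning `None` on a tie or an empty vote list are hypothetical, since the text does not say how ties are resolved.

```python
from collections import Counter

def final_category(votes):
    """Return the label assigned most often, or None if there is no clear winner."""
    if not votes:
        return None  # assumption: unlabeled comments are left undecided
    counts = Counter(votes).most_common()
    # Assumption: a tie between the top two labels is left for manual review.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]

# Hypothetical annotations: comment id -> labels given by individual staff members.
annotations = {
    "comment_1": ["fast", "fast", "slow"],
    "comment_2": ["slow", "invalid", "slow"],
    "comment_3": ["fast", "slow"],
}

labels = {cid: final_category(votes) for cid, votes in annotations.items()}
print(labels)
```

Comments that come back as `None` could be sent to an additional annotator before the collection is split into training and test sets.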

Abstract

The invention relates to a text classification method. The method comprises the following steps: a set of texts pre-labeled by category is obtained as training samples, and the texts in the training samples are preprocessed to obtain feature word sets for training; feature words are extracted to obtain a feature word dictionary; the feature word dictionary is used to generate feature vectors for the texts in the training samples, yielding the feature vector set of the training samples; an SVM classifier is trained using the feature vectors; the texts to be classified are preprocessed to obtain their feature word sets; feature vectors of the texts to be classified are generated according to the feature word dictionary; and these feature vectors are input into the trained SVM classifier to obtain the categories of the texts to be classified.
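
The dictionary and feature vector steps of the abstract can be sketched as follows. This is a minimal illustration, not the patent's procedure: the helper names `preprocess`, `build_dictionary`, and `to_vector` are hypothetical, the regex tokenizer stands in for whatever preprocessing the method uses (Chinese text would need word segmentation), and every word is kept in the dictionary because the feature word extraction rule is not given in the visible text. The SVM training step is omitted; the resulting vectors would be passed to an SVM implementation of choice.

```python
import re

def preprocess(text):
    """Lowercase the text and split it into word tokens (assumed preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_dictionary(token_lists):
    """Map each distinct word to a column index, in sorted order."""
    vocab = sorted({tok for toks in token_lists for tok in toks})
    return {word: i for i, word in enumerate(vocab)}

def to_vector(tokens, dictionary):
    """Count occurrences of dictionary words; words outside the dictionary are ignored."""
    vec = [0] * len(dictionary)
    for tok in tokens:
        idx = dictionary.get(tok)
        if idx is not None:
            vec[idx] += 1
    return vec

# Hypothetical training texts; in the method these are pre-labeled by category.
train_texts = ["Arrived fast, very fast", "Delivery was slow"]
train_tokens = [preprocess(t) for t in train_texts]
dictionary = build_dictionary(train_tokens)
train_vectors = [to_vector(t, dictionary) for t in train_tokens]

# A text to be classified is vectorized with the same dictionary.
new_vector = to_vector(preprocess("slow and late delivery"), dictionary)
print(dictionary, train_vectors, new_vector)
```

Reusing the training dictionary for new texts keeps the vector columns aligned with what the classifier saw during training, which is why the abstract generates the vectors of the texts to be classified "according to the feature word dictionary".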

Description

Technical field

[0001] The invention relates to the technical field of data classification, in particular to a method for text classification.

Background technique

[0002] Text information is a kind of data that exists widely in many fields, and using classification models to classify text has a broad application market. When classifying text, the quality of feature extraction has a great impact on classification accuracy. Using all words as feature words has two adverse effects: 1. the feature dimension is too high and the features are sparse; 2. many words occur across all categories and discriminate poorly between them, so using them as features degrades classification performance. It is therefore necessary to select feature words for the text. Because feature words differ from field to field, there is no general set of feature words, and the common approach is to have domain experts select them. The method of sele...

Application Information

IPC(8): G06F17/30
CPC: G06F16/35
Inventor 李甫
Owner 量子云未来(北京)信息科技有限公司