Classification method of short text

A short text classification method, applied in text database clustering/classification, unstructured text data retrieval, and special data processing applications, that addresses the unbalanced class distribution of text datasets and the high-dimensional sparsity of short texts.

Active Publication Date: 2017-11-21
TONGJI UNIV

AI Technical Summary

Problems solved by technology

In real application scenarios, text datasets often have a clearly unbalanced class distribution, and short texts are naturally high-dimensional and sparse. Both properties pose challenges to existing classification algorithms.



Examples


Embodiment Construction

[0038] The specific embodiments of the present invention will be described in further detail below in conjunction with the drawings and embodiments. The following examples are used to illustrate the present invention, but not to limit the scope of the present invention.

[0039] The short text classification method of the present invention combines a dimensionality reduction algorithm with a weighted under-sampling SVM algorithm to deal with the problems of high-dimensional sparsity and class imbalance in text classification. The method separates the two classes of samples with a hyperplane, calculates the geometric distance between each majority-class sample and the hyperplane, and divides the samples into multiple subdomains according to that distance. Each subdomain is given a different weight: the farther a subdomain is from the hyperplane, the smaller its weight. In the under...
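The distance and subdomain steps described above can be sketched as follows. This is a minimal illustration, not the patented procedure: the hyperplane parameters `w` and `b`, the equal-width subdomain boundaries, the linearly decreasing weights, and all function names are assumptions, and the samples are assumed to belong to the class being under-sampled (taken here to be the majority class).

```python
import math


def geometric_distance(x, w, b):
    """Geometric distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def assign_subdomains(distances, n_subdomains):
    """Map each distance to a subdomain index; index 0 is nearest the hyperplane.

    Equal-width intervals over the observed distance range are an assumption.
    """
    lo, hi = min(distances), max(distances)
    width = (hi - lo) / n_subdomains
    if width == 0:
        width = 1.0  # all samples are equidistant; put them in subdomain 0
    return [min(int((d - lo) / width), n_subdomains - 1) for d in distances]


def subdomain_weights(n_subdomains):
    """Weights that shrink with distance (linear decay is an assumption)."""
    return [(n_subdomains - k) / n_subdomains for k in range(n_subdomains)]


# Hypothetical 2-D majority-class samples and hyperplane 3x + 4y - 5 = 0.
w, b = [3.0, 4.0], -5.0
majority = [[1.0, 1.0], [3.0, 4.0], [5.0, 5.0], [7.0, 6.0]]

distances = [geometric_distance(x, w, b) for x in majority]
bins = assign_subdomains(distances, 3)
weights = subdomain_weights(3)
```

In this sketch the sample nearest the hyperplane falls into subdomain 0, which carries the largest weight, so it is the most likely to survive under-sampling.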



Abstract

The invention relates to a classification method for short text. In the method, a hyperplane separates the two classes of samples, and the geometric distance between each majority-class sample and the hyperplane is calculated. The samples are divided into multiple subdomains according to this distance, and each subdomain is given its own weight, which decreases as the distance from the subdomain to the hyperplane increases. In the under-sampling stage, the data are under-sampled according to these weights, and the resulting samples are passed to an SVM algorithm for classification. The method effectively addresses the problems of high-dimensional sparsity and class imbalance in text classification.
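The under-sampling stage can be sketched as below, assuming each majority-class sample has already been assigned to a subdomain. The source does not state how weights translate into sample counts, so the quota rule used here (proportional to weight times subdomain size) is an assumption, as are the function name `weighted_undersample` and the parameters `target_size` and `seed`.

```python
import random


def weighted_undersample(subdomain_of, weights, target_size, seed=0):
    """Return sorted indices of the majority-class samples to keep.

    subdomain_of[i] is the subdomain index of majority sample i, and
    weights[k] is the (positive) weight of subdomain k. Each subdomain
    gets a quota proportional to weight * size (an assumed rule), and
    samples are drawn without replacement inside each subdomain.
    """
    rng = random.Random(seed)

    # Group sample indices by subdomain.
    members = {}
    for idx, k in enumerate(subdomain_of):
        members.setdefault(k, []).append(idx)

    mass = {k: weights[k] * len(idxs) for k, idxs in members.items()}
    total = sum(mass.values())

    selected = []
    for k, idxs in sorted(members.items()):
        quota = min(len(idxs), round(target_size * mass[k] / total))
        selected.extend(rng.sample(idxs, quota))
    return sorted(selected)


# Hypothetical data: 12 majority samples, 4 per subdomain,
# with subdomain 0 nearest the hyperplane and weighted highest.
subdomain_of = [0] * 4 + [1] * 4 + [2] * 4
weights = [1.0, 0.5, 0.25]
kept = weighted_undersample(subdomain_of, weights, target_size=7)
```

The kept majority-class samples, together with all minority-class samples, would then form the training set for the SVM. One possible choice is scikit-learn's `sklearn.svm.SVC`, although the source does not name a library.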

Description

Technical field

[0001] The invention relates to a short text classification method, which belongs to the field of machine learning and data mining.

Background

[0002] In recent years, big data and artificial intelligence technologies have developed rapidly, and speech and image recognition, natural language processing, and knowledge graphs have become hot research areas. Text categorization is the most typical problem in the field of machine learning and data mining. Many classification algorithms exist for it, such as naive Bayes, K-nearest neighbor (K-NN), neural networks, and the support vector machine (SVM). SVM is a representative classification method with strong generalization ability, based on statistical learning theory. It aims at minimizing structural risk and overcomes the curse of dimensionality through the introduction of kernel functions, and has become a classic candidate for text classification p...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/35, G06F16/355
Inventors: 康琦, 张量
Owner: TONGJI UNIV