Text classification method and device

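A text classification technique, applied in fields such as instruments, computing, and electric digital data processing, that addresses the problems of heavy computation and poor classification accuracy, reducing the amount of computation and improving accuracy.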

Inactive Publication Date: 2011-08-03
ALIBABA GRP HLDG LTD

AI Technical Summary

Problems solved by technology

[0004] In the prior art, the similarity between each text vector and all the feature vectors of every candidate category must be calculated, and each calculation is measured by the cosine of the included angle, so the amount of calculation is very large. In addition, the existing technology places no constraint on the semantics of the text, so classification accuracy is limited.
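To make the cost described in [0004] concrete, here is a minimal sketch of a generic cosine-similarity classifier. It is not taken from the patent: the function names, the layout of `feature_vecs` (a hypothetical mapping from category to a list of feature vectors), and the choice of the best-scoring feature vector per category are all assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the included angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def best_category(text_vec, feature_vecs):
    """Compare the text vector with every feature vector of every category.

    feature_vecs is a hypothetical dict: category name -> list of feature vectors.
    """
    scores = {
        category: max(cosine_similarity(text_vec, v) for v in vectors)
        for category, vectors in feature_vecs.items()
    }
    return max(scores, key=scores.get)
```

Each comparison needs a dot product and two norms, and the comparison is repeated for every feature vector of every candidate category, which is where the computation accumulates.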

Embodiment Construction

[0023] In the embodiment of the present application, a spherical space model is constructed in advance, and texts are classified based on that model. During classification, the distances between the vectors of the words in the text and the vector of each category are calculated, so as to determine the category into which the text should be classified. The embodiment thus implements text classification with significantly less computation than the included-angle cosine algorithm of the prior art. In addition, the spherical space model takes unit length as its radius, so the sum of the squares of a word's normalized word-vector components over the categories is also the unit length. This is equivalent to fixing the semantic information of each word at unit length, which constrains the amount of semantic information, so it c...
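The unit-length constraint in [0023] can be written out as follows. The notation is an assumption for illustration: $f_k(w)$ is the frequency of word $w$ in category $k$, $n$ is the number of categories, and $v_k(w)$ is the normalized component. Dividing by the Euclidean norm is one normalization that satisfies the stated sum-of-squares property; the excerpt does not spell out the exact formula.

```latex
v_k(w) = \frac{f_k(w)}{\sqrt{\sum_{j=1}^{n} f_j(w)^2}},
\qquad
\sum_{k=1}^{n} v_k(w)^2 = 1
```

Under this reading, every word vector lies on the surface of the unit sphere, so no single word contributes more than unit length to the text's summed vector.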

Abstract

The invention discloses a text classification method that implements text classification, simplifies the classification operation, and improves classification accuracy. The method comprises the following steps: segmenting the obtained text content to obtain a plurality of words; for each of the words, determining the word vector of the word in a spherical space model, wherein the word vector comprises normalized word frequency values obtained by normalizing the word's frequency value in each category, the spherical space model is a multi-dimensional spherical model with unit length as its radius, the dimensionality of the spherical space equals the number of categories, and each category corresponds to a category vector in the spherical space; for each category, determining the distance from the sum of the word vectors of the plurality of words to the category vector of that category; and assigning the text to the category corresponding to the shortest distance. The invention also discloses a device for implementing the method.
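The steps in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: the frequency table `WORD_FREQ` is made-up data, segmentation is plain whitespace splitting, each category vector is assumed to be the unit vector along that category's axis, and distance is assumed to be Euclidean. The abstract does not specify any of these choices.

```python
import math

# Hypothetical categories and per-category word frequencies (illustrative only).
CATEGORIES = ["sports", "finance"]
WORD_FREQ = {
    "goal":   {"sports": 8, "finance": 1},
    "match":  {"sports": 6, "finance": 0},
    "stock":  {"sports": 0, "finance": 9},
    "market": {"sports": 1, "finance": 7},
}

def word_vector(word):
    """Normalize a word's per-category frequencies onto the unit sphere."""
    freqs = [WORD_FREQ.get(word, {}).get(c, 0) for c in CATEGORIES]
    norm = math.sqrt(sum(f * f for f in freqs))
    # Assumption: unknown words contribute a zero vector.
    return [f / norm for f in freqs] if norm else [0.0] * len(CATEGORIES)

def classify(text):
    """Sum the word vectors and pick the category at the shortest distance."""
    total = [0.0] * len(CATEGORIES)
    for word in text.lower().split():  # assumption: whitespace segmentation
        for i, x in enumerate(word_vector(word)):
            total[i] += x
    distances = {}
    for i, category in enumerate(CATEGORIES):
        # Assumption: the category vector is the unit vector on its own axis.
        category_vec = [1.0 if j == i else 0.0 for j in range(len(CATEGORIES))]
        distances[category] = math.dist(total, category_vec)
    return min(distances, key=distances.get)
```

In this sketch a text needs one vector sum and one distance per category, rather than a cosine comparison against every feature vector of every category.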

Description

Technical field

[0001] The present application relates to the field of computers and communications, and in particular to a method and device for text classification.

Background

[0002] Text classification is an important part of text mining. It refers to assigning a category to each document in a document collection according to predefined subject categories. Classifying documents with an automatic text classification system helps people find the information and knowledge they need. Classification is regarded as the most basic form of cognition of information. Traditional document classification research has produced rich results and reached a considerable practical level. However, with the rapid growth of text information, especially the surge of online text on the Internet, automatic text classification has become a key technology for processing and organizing large amounts of document data. Text classification is now widely used in va...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/21, G06F17/30
CPC: G06F17/30707, G06F16/353
Inventor: 孙翔
Owner: ALIBABA GRP HLDG LTD