Automatic text classification method based on classification concept space

An automatic text classification method, applied in the field of content and information analysis and processing. It addresses two problems: a high-dimensional vector space cannot accurately describe text, and the transformation threshold of an orthogonal concept space cannot be measured or set. Its effects are to overcome non-orthogonal characteristics and to achieve high classification efficiency.

Status: Inactive
Publication Date: 2007-03-28
NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT
Cites: 0; cited by: 7

AI Technical Summary

Problems solved by technology

[0003] So far, various research reports have shown that the high-dimensional vector space based on word orthogonality cannot accurately describe the text, and the orthogonal concept space obtained through matrix transformation also has problems such as the inability to measure and set the transformation threshold.

Examples

Embodiment Construction

[0021] This document uses $\bar{d} = \langle tf_1, tf_2, \ldots, tf_n \rangle$ to denote the term frequency vector of a document, where $tf_j$ is the occurrence frequency of word $j$ in document $d$. It uses $\bar{C}_m = \langle tcf_1, tcf_2, \ldots, tcf_n \rangle$ to denote the term frequency vector of category $m$, where $tcf_n$ is the occurrence frequency of the $n$th word in the $m$th category.
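The notation in [0021] can be illustrated with a minimal sketch. It assumes documents are already tokenized into word lists and that a fixed, ordered vocabulary of n words is given; the function names and sample words are hypothetical and do not come from the patent.

```python
from collections import Counter

def term_frequency_vector(tokens, vocabulary):
    """Return <tf_1, ..., tf_n>: how often each vocabulary word occurs in one document."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

def category_frequency_vector(category_docs, vocabulary):
    """Return <tcf_1, ..., tcf_n>: how often each vocabulary word occurs across
    all training documents of one category."""
    counts = Counter()
    for tokens in category_docs:
        counts.update(tokens)
    return [counts[word] for word in vocabulary]

# Hypothetical vocabulary and documents, for illustration only.
vocabulary = ["network", "security", "text"]
d = term_frequency_vector(["text", "network", "text"], vocabulary)
C_m = category_frequency_vector(
    [["network", "security"], ["network", "text"]], vocabulary
)
```

Both vectors share the same word order, so component j of either vector always refers to the same word.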

[0022] The method steps of Fig. 1 are as follows:

[0023] During the training phase,

[0024] Step...

Abstract

The method is divided into a training phase and a classification phase and includes the following steps:
(1) construct the class-word matrix data;
(2) build an inverse class frequency data table from that matrix;
(3) build the effective word set from that table;
(4) rebuild the class-word matrix data from the effective word set;
(5) build an inverse word frequency data table for each class from the rebuilt matrix;
(6) build vector representations of the words in the class concept space from the class-word matrix and the inverse word frequency data table;
(7) construct the vector of the document to be classified in the class concept space from the word frequencies and class frequencies;
(8) determine the class of the document from the magnitude of each component of its vector.
The invention is suitable for information classification, filtering, monitoring, and similar tasks.
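The two phases listed in the abstract can be sketched roughly as follows. The text available here does not give the patent's formulas, so every weighting in this sketch is an assumption: the inverse class frequency is taken as log(M / number of classes containing the word), the effective word set keeps words whose inverse class frequency exceeds a threshold, and steps (4) to (6) are collapsed into a single weighting of each word's share per class. The names `train`, `classify`, and `min_icf` are hypothetical.

```python
import math
from collections import Counter

def train(labeled_docs, min_icf=0.0):
    """labeled_docs: list of (class_label, token_list) pairs."""
    classes = sorted({label for label, _ in labeled_docs})
    # Step (1): class-word matrix, class_word[c][w] = frequency of word w in class c.
    class_word = {c: Counter() for c in classes}
    for label, tokens in labeled_docs:
        class_word[label].update(tokens)
    vocab = set().union(*class_word.values())
    # Step (2), ASSUMED form: icf(w) = log(M / number of classes containing w).
    icf = {
        w: math.log(len(classes) / sum(1 for c in classes if class_word[c][w] > 0))
        for w in vocab
    }
    # Step (3), ASSUMED rule: keep words whose icf exceeds the threshold.
    effective = {w for w in vocab if icf[w] > min_icf}
    # Steps (4)-(6), ASSUMED and simplified: one component per class, equal to the
    # word's share of occurrences in that class, weighted by its icf.
    word_vectors = {}
    for w in effective:
        total = sum(class_word[c][w] for c in classes)
        word_vectors[w] = [icf[w] * class_word[c][w] / total for c in classes]
    return classes, word_vectors

def classify(tokens, classes, word_vectors):
    # Step (7): document vector = term-frequency-weighted sum of word vectors.
    doc_vec = [0.0] * len(classes)
    for word, tf in Counter(tokens).items():
        vec = word_vectors.get(word)
        if vec is None:
            continue  # word is outside the effective word set
        for i, value in enumerate(vec):
            doc_vec[i] += tf * value
    # Step (8): the largest component determines the class.
    best = max(range(len(classes)), key=doc_vec.__getitem__)
    return classes[best], doc_vec

# Hypothetical toy training data, for illustration only.
training = [
    ("sports", ["ball", "goal", "team", "the"]),
    ("finance", ["bank", "stock", "the", "market"]),
]
classes, word_vectors = train(training)
label, doc_vec = classify(["the", "ball", "team"], classes, word_vectors)
```

In this sketch each dimension of the space is a class rather than a word, so the document vector has as many components as there are classes and classification reduces to picking the largest component. A word that occurs in every class gets an inverse class frequency of zero and is excluded from the effective word set.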

Description

Technical field

[0001] The invention belongs to the field of content and information analysis and processing, and in particular relates to an automatic text classification method based on a category concept space.

Background art

[0002] Automatic text classification is a technology by which a computer automatically assigns a large number of documents to given categories. It is based on the vector space model, in which the vector space is high-dimensional, with words or transformed concepts as its dimensions. Various classification methods are applied in this space to classify documents.

[0003] So far, various research reports have shown that the high-dimensional vector space based on word orthogonality cannot accurately describe the text, and the orthogonal concept space obtained through matrix transformation also has problems such as the inability to measure and set the transformation threshold. Therefore,...

Application Information

IPC(8): G06F17/30
Inventor: 鲁松
Owner: NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT