A Computer Text Classification System

A computer text classification technology, applied in text database clustering/classification, unstructured text data retrieval, special data processing applications, etc. It aims to reduce time and space complexity, ensure efficiency, and improve accuracy.

Active Publication Date: 2019-05-10
JIANGSU UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to provide a computer text classification system that solves technical problems such as sparse text space representation and serious redundancy of text feature items.

Examples

Embodiment 1

[0052] As shown in Figure 1, Embodiment 1 provides a computer text classification system, including:

[0053] a text preprocessing module, a text formalization module, a text weight calculation module, a model training module, and a noise reduction module.

[0054] In the text preprocessing module, a dual method is used to remove stop words. A text typically conveys its content through content words such as nouns, verbs, and adjectives, whereas function words and other words that appear frequently but do not indicate the content of the text are called stop words. Because stop words do not carry the actual meaning of the text, they contribute nothing to text classification; on the contrary, they increase the time and space complexity of the classification algorithm. Therefore, in order to reduce the storage space and improve the classification efficiency and classification accuracy of the text classification algorithm, it is nece...
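
The stop-word idea described above can be illustrated with a minimal sketch. It is not the "dual method" of the embodiment, whose details are not given in this excerpt; it only shows a plain stop-list lookup. The stop list, the function name `remove_stop_words`, and the sample sentence are all assumptions made for illustration.

```python
# Hypothetical stop list for illustration only; the source does not
# specify which list is used, and a real system would load a fuller one.
STOP_WORDS = frozenset({"the", "a", "an", "of", "and", "to", "is", "in", "on"})


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not in the stop list, preserving order."""
    return [t for t in tokens if t.lower() not in stop_words]


tokens = "The cat sat on the mat and looked to the door".split()
content_tokens = remove_stop_words(tokens)

# Fewer distinct terms means a smaller feature space to store and process.
print(content_tokens)
print(len(set(tokens)), "distinct terms before,", len(set(content_tokens)), "after")
```

Using a `frozenset` keeps each membership test at constant expected time, so filtering stays linear in the number of tokens.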

Embodiment 2

[0058] Text classification assigns a large number of text documents to one or more categories, so that each category represents a different conceptual theme. It is essentially a pattern classification task, so pattern classification algorithms can be applied to it. However, because text classification relies on natural language processing and is closely tied to the semantics of documents, it has many characteristics that distinguish it from ordinary pattern classification tasks.

[0059] In a high-dimensional feature space, extracting document features yields a large number of candidate features. If words are used as document features, even a small training document set will generally generate tens of thousands of candidate features. If items are used as features, even more candidate features will be generated. Feature semantic correlation: one solution to avoid bad selection results is to assume that most features are independent of each othe...

Embodiment 3

[0062] As shown in Figure 2, Embodiment 3 provides a computer text classification system, including:

[0063] a text preprocessing module, a text feature extraction module, a text training processing module, a classification processing module, a text type labeling module, and an effect improvement module, connected in sequence.

[0064] Specifically, the text preprocessing module is adapted to remove punctuation marks and spaces from the input text, segment it into a word set, and remove meaningless words, thereby forming a simplified word set.
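
The preprocessing step can be sketched roughly as follows. This assumes whitespace-delimited text such as English; text without word delimiters (for example Chinese) would need a dedicated word segmenter, which is not shown. The name `preprocess` and the small list of "meaningless" words are hypothetical and do not come from the source.

```python
import re

# Hypothetical list of "meaningless" words, for illustration only.
MEANINGLESS_WORDS = frozenset({"the", "a", "an", "is", "of", "and", "to"})


def preprocess(text, meaningless=MEANINGLESS_WORDS):
    """Strip punctuation and extra spaces, split into words, drop filler words."""
    # Replace every run of non-word characters (punctuation, whitespace)
    # with a single space, then split on whitespace.
    cleaned = re.sub(r"[^\w]+", " ", text.lower())
    words = cleaned.split()
    # Keep order and duplicates so later steps can count frequencies.
    return [w for w in words if w not in meaningless]


simplified = preprocess("Hello, world! The sky is blue.")
print(simplified)
```

The result is kept as an ordered list rather than a Python `set` so that word frequencies remain available to the feature extraction step.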

[0065] Specifically, the text feature extraction module is adapted to generate a subset of feature words from the simplified word set, and to obtain a mapping table between the feature words and their frequencies of occurrence.
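
A mapping table between feature words and frequencies can be sketched with `collections.Counter`. The source does not say how the subset of feature words is chosen, so keeping the `max_features` most frequent words is an assumption here, as is the name `build_frequency_table`.

```python
from collections import Counter


def build_frequency_table(words, max_features=None):
    """Map each selected feature word to its number of occurrences.

    Keeping only the `max_features` most frequent words is an assumed
    selection rule; the source does not state the actual criterion.
    """
    counts = Counter(words)
    # most_common(None) returns every word, sorted by descending count.
    return dict(counts.most_common(max_features))


words = ["text", "class", "text", "model", "text", "class"]
table = build_frequency_table(words, max_features=2)
print(table)
```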

[0066] Specifically, the text training processing module is adapted to process the mapping table; that is, it randomly selects other texts and calculates the inverse text frequency in...
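
Because the paragraph above is truncated, the exact weighting used by the embodiment is not known. The following is only a sketch of one common way to compute an inverse text (document) frequency against a set of reference texts and combine it with the frequency table. The smoothed formula, the function names, and the sample data are assumptions.

```python
import math


def inverse_document_frequency(term, reference_docs):
    """Smoothed IDF: log((1 + N) / (1 + df)) + 1.

    This smoothed variant is an assumption; it avoids division by zero
    when a feature word appears in none of the reference texts.
    """
    n_docs = len(reference_docs)
    df = sum(1 for doc in reference_docs if term in doc)
    return math.log((1 + n_docs) / (1 + df)) + 1.0


def tfidf_weights(freq_table, reference_docs):
    """Weight each feature word by its frequency times its IDF."""
    return {
        term: tf * inverse_document_frequency(term, reference_docs)
        for term, tf in freq_table.items()
    }


# Hypothetical reference texts, each already reduced to a set of words.
reference_docs = [{"text", "model"}, {"text", "data"}, {"network", "data"}]
freq_table = {"text": 3, "classifier": 2}
weights = tfidf_weights(freq_table, reference_docs)
print(weights)
```

Words that are rare across the reference texts receive a larger IDF, so they can outweigh words that are frequent everywhere.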

Abstract

The invention relates to a computer text classification system and a text classification method thereof. The computer text classification system comprises a text preprocessing module, a text formalization module, a text weight calculation module, a model training module, and a noise reduction module. The system effectively reduces the time and space complexity of computer text classification, making classification quick, effective, and accurate.

Description

Technical field

[0001] The invention relates to a computer text classification system and a text classification method thereof.

Background technique

[0002] With the rapid development of information technology, especially the popularity of the Internet, computer texts are growing explosively, and people urgently need a system to organize and manage text information efficiently. As a key technology for organizing and processing large amounts of text information, text classification can largely solve the problem of messy information and has become an important research direction. At present, text classification systems are widely used in many fields and have made great progress. However, text classification also faces unprecedented challenges. Texts contain a large number of synonyms, which produces redundant text feature items, makes the text space extremely sparse, and thus causes great difficulties for text classification. This requires a...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35
CPC: G06F16/35
Inventors: 钱进, 吕萍
Owner: JIANGSU UNIV OF TECH