
Computer text classification system, system and text classification method thereof

Text classification and computer technology, applied to text database clustering/classification, computing, and unstructured text data retrieval. It addresses problems such as the text space representation coefficients and text feature redundancy, with the aim of reducing time and space complexity, improving accuracy, and preserving efficiency.

Active Publication Date: 2017-03-15
JIANGSU UNIV OF TECH
Cites: 5; Cited by: 6

AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to provide a computer text classification system that solves technical problems such as the text space representation coefficients and severe redundancy among text feature items.



Examples


Embodiment 1

[0052] As shown in Figure 1, Embodiment 1 provides a computer text classification system comprising:

[0053] A text preprocessing module, a text formalizing module, a text weight calculation module, a model training module, and a noise reduction module.

[0054] In the text preprocessing module, a dual method is used to remove stop words. Text typically conveys its content through substantive words such as nouns, verbs and adjectives, whereas function words and words that occur frequently without representing the content are called stop words. Because stop words do not carry the actual meaning of the text, they contribute nothing to text classification; on the contrary, they increase the time and space complexity of the classification algorithm. Therefore, to reduce storage space and improve the efficiency and accuracy of the classification algorithm, it is necessary to remove stop words from the t...
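
As a rough illustration of stop-word removal in general, the sketch below filters a token list against a small stop-word set. It does not reproduce the "dual method" named above, whose details are not given in this excerpt; the `STOP_WORDS` set and the `remove_stop_words` function are hypothetical names chosen for the example.

```python
# Hypothetical stop-word list; a real system would load a far larger one.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens with stop words dropped (case-insensitive match)."""
    return [t for t in tokens if t.lower() not in stop_words]

tokens = "The price of oil is rising in Asia".split()
content_words = remove_stop_words(tokens)
print(content_words)
```

Fewer tokens survive the filter, which is where the savings in storage and processing time described above would come from.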

Embodiment 2

[0058] Text classification divides a large number of text documents into one or more categories, so that each category represents a different conceptual topic. It is in effect a pattern classification task, so pattern classification algorithms can be applied to it. Because applying natural language processing to text classification is closely tied to the semantics of the document, the task has many characteristics that ordinary pattern classification tasks lack.

[0059] High-dimensional feature space: a large number of candidate features arise when extracting document features. If words are used as document features, even a small training document set will generally produce tens of thousands of candidate features. If items are used as features, even more candidate features will be generated. Feature semantic correlation: one way to avoid poor selection results is to assume that most of the features are indepen...

Embodiment 3

[0062] As shown in Figure 2, Embodiment 3 provides a computer text classification system comprising:

[0063] The text preprocessing module, the text feature extraction module, the text training processing module, the classification processing module, the text type marking module and the effect improvement module are connected in sequence.

[0064] Specifically, the text preprocessing module is configured to remove punctuation marks and spaces from the input text, split it into a word set, and remove meaningless words, thereby forming a simplified word set.
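
The preprocessing step in [0064] could be sketched roughly as follows. This is a minimal illustration that assumes whitespace-delimited text (a language such as Chinese would need a word segmenter instead); the `preprocess` function and the `MEANINGLESS` word set are hypothetical and not taken from the patent.

```python
import re

# Hypothetical set of "meaningless" words to discard.
MEANINGLESS = {"the", "a", "of", "and", "is"}

def preprocess(text, meaningless=MEANINGLESS):
    """Strip punctuation, split into words, and drop meaningless words."""
    # Replace each run of non-word characters (punctuation, spaces) with one space.
    cleaned = re.sub(r"[^\w]+", " ", text.lower())
    return [w for w in cleaned.split() if w not in meaningless]

simplified = preprocess("The cat, and the dog: a tale of friendship!")
print(simplified)
```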

[0065] Specifically, the text feature extraction module is configured to generate a subset of feature words from the simplified word set and to obtain a mapping table between the feature words and their frequencies of occurrence.
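
One plausible shape for the mapping table in [0065] is a dictionary from feature word to count. The sketch below assumes the feature subset is simply the most frequent words; the excerpt does not say how the patent selects features, so the `max_features` cutoff and the `build_frequency_table` name are assumptions made for illustration.

```python
from collections import Counter

def build_frequency_table(words, max_features=2):
    """Map each selected feature word to its number of occurrences.

    Assumption: the feature subset is the `max_features` most frequent words.
    """
    counts = Counter(words)
    return dict(counts.most_common(max_features))

words = ["stock", "market", "stock", "bank", "market", "stock", "rain"]
table = build_frequency_table(words)
print(table)
```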

[0066] Specifically, the text training processing module is configured to process the mapping table; that is, other texts are randomly selected, the inverse text frequency in...
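
Because the paragraph above is cut off, the exact computation is not recoverable from this excerpt. For orientation only, the sketch below applies a common textbook form of inverse document frequency, idf = log(N / (1 + df)), to a word-frequency table using a set of other texts. The formula variant, the function names, and the sample data are all assumptions and may differ from the patent's method.

```python
import math

def inverse_document_frequency(word, documents):
    """idf = log(N / (1 + df)), where df is the number of documents containing the word."""
    df = sum(1 for doc in documents if word in doc)
    return math.log(len(documents) / (1 + df))

def weight_table(freq_table, documents):
    """Scale each word's frequency by its inverse document frequency."""
    return {w: tf * inverse_document_frequency(w, documents)
            for w, tf in freq_table.items()}

# Hypothetical "other texts", each reduced to a set of words.
other_texts = [{"stock", "bank"}, {"rain", "wind"}, {"market", "stock"}, {"sun"}]
weights = weight_table({"stock": 3, "market": 2}, other_texts)
print(weights)
```

In this toy data "market" ends up with a higher weight than "stock" despite its lower raw count, because it appears in fewer of the other texts.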



Abstract

The invention relates to a computer text classification system and a text classification method thereof. The computer text classification system comprises a text preprocessing module, a text formalizing module, a text weight calculation module, a model training module and a noise reduction module. The system effectively reduces the time and space complexity of computer text classification, making classification fast, effective and accurate.

Description

Technical field

[0001] The invention relates to a computer text classification system and a text classification method thereof.

Background

[0002] With the rapid development of information technology, especially the popularization of the Internet, computer text is growing explosively, and people urgently need a system to organize and manage text information efficiently. As a key technology for organizing and processing large amounts of text information, text classification can largely solve the problem of information clutter. It has great practical significance for the efficient management and effective use of information, and has become an important research direction in the field of data mining. Text classification systems are now widely used in many fields and have made great progress. However, text classification also faces unprecedented challenges. There are also a large number of synonyms in the text, resulting i...


Application Information

IPC(8): G06F17/30
CPC: G06F16/35
Inventors: 钱进, 吕萍
Owner: JIANGSU UNIV OF TECH