Computer text classification system and text classification method thereof
Relates to text classification and computer technology, with applications in text database clustering/classification, computing, and unstructured text data retrieval. It addresses problems such as sparse text-space representation and redundant text features, with the effects of reducing time and space complexity, improving accuracy, and ensuring efficiency.
Examples
Embodiment 1
[0052] As shown in Figure 1, Embodiment 1 provides a computer text classification system, comprising:
[0053] a text preprocessing module, a text formalization module, a text weight calculation module, a model training module, and a noise reduction module.
[0054] In the text preprocessing module, a dual method is used to remove stop words. A text usually conveys its content through content words such as nouns, verbs, and adjectives, whereas function words, and words that appear frequently in texts without representing their content, are called stop words. Because stop words do not carry the actual meaning of the text, they contribute nothing to text classification; on the contrary, they increase the time and space complexity of the classification algorithm. Therefore, to reduce storage space and improve the efficiency and accuracy of the text classification algorithm, stop words must be removed from the t...
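The excerpt does not describe how the "dual method" works, so the following is only a minimal sketch of the basic list-based step it builds on: tokens found in a stop-word list are dropped before classification. The stop-word list `STOP_WORDS` and the function name `remove_stop_words` are hypothetical, and the example assumes whitespace-delimited English text.

```python
# Hypothetical stop-word list for illustration only; the patent's own list
# and the details of its "dual method" are not given in this excerpt.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "that"}


def remove_stop_words(tokens: list[str], stop_words: set[str]) -> list[str]:
    """Return the tokens that are not stop words (case-insensitive match)."""
    return [t for t in tokens if t.lower() not in stop_words]


if __name__ == "__main__":
    tokens = "The model classifies the text of a document in the archive".split()
    kept = remove_stop_words(tokens, STOP_WORDS)
    print(kept)  # ['model', 'classifies', 'text', 'document', 'archive']
    print(f"{len(tokens)} tokens -> {len(kept)} tokens")  # 11 tokens -> 5 tokens
```

The drop from 11 tokens to 5 illustrates the storage and time saving the paragraph describes: every later stage (feature extraction, weighting, training) works on the shorter list.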
Embodiment 2
[0058] Text classification divides a large number of text documents into one or more categories, each representing a different conceptual topic. Text classification is essentially a pattern classification task, so pattern classification algorithms can be applied to it. However, because text classification with natural language processing is closely tied to the semantics of the document, it has many distinctive characteristics compared with ordinary pattern classification tasks.
[0059] In a high-dimensional feature space, extracting document features produces a large number of candidate features. If words are used as document features, even a small training document set generally yields tens of thousands of candidate features; if items are used as features, even more candidates are generated. Regarding semantic correlation between features, one way to avoid poor selection results is to assume that most of the features are indepen...
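The paragraph is cut off at the independence assumption, so the patent's own use of it is not visible here. As an illustration only, the best-known classifier built on that assumption is multinomial naive Bayes, sketched below: each word contributes its own log-probability regardless of the other words. The function names, the add-one smoothing, and the toy training data are all assumptions for this sketch, not the patent's method.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs: list[tuple[list[str], str]]):
    """Collect class document counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(tokens, class_counts, word_counts, vocab):
    """Return the class maximising log P(c) + sum over words of log P(w | c).

    Summing independent per-word terms is exactly the feature-independence
    assumption: no term depends on which other words appear.
    """
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        n_words = sum(word_counts[label].values())
        for w in tokens:
            # Add-one smoothing keeps unseen words from zeroing the probability.
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Hypothetical toy training set.
    train = [
        (["goal", "match", "team"], "sports"),
        (["election", "vote", "party"], "politics"),
        (["team", "win", "league"], "sports"),
        (["vote", "policy", "minister"], "politics"),
    ]
    model = train_nb(train)
    print(predict_nb(["team", "goal"], *model))  # sports
```

The independence assumption is what keeps this tractable in the high-dimensional space the paragraph describes: the model stores one count per (class, word) pair instead of statistics over word combinations.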
Embodiment 3
[0062] As shown in Figure 2, Embodiment 3 provides a computer text classification system, comprising:
[0063] a text preprocessing module, a text feature extraction module, a text training processing module, a classification processing module, a text type marking module, and an effect improvement module, connected in sequence.
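"Connected in sequence" means the output of each module is the input of the next. A minimal sketch of that wiring follows, assuming each module can be modeled as a single function. The stage functions here are hypothetical placeholders that only record their own names, so the execution order is visible; they do not implement the modules.

```python
from functools import reduce
from typing import Any, Callable

Stage = Callable[[Any], Any]


def run_pipeline(stages: list[Stage], data: Any) -> Any:
    """Feed the output of each stage into the next, in list order."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


def make_stage(name: str) -> Stage:
    """Build a hypothetical placeholder stage that appends its name to a trace."""
    def stage(trace: list[str]) -> list[str]:
        return trace + [name]
    return stage


MODULES = [
    "text preprocessing",
    "text feature extraction",
    "text training processing",
    "classification processing",
    "text type marking",
    "effect improvement",
]

if __name__ == "__main__":
    pipeline = [make_stage(m) for m in MODULES]
    print(run_pipeline(pipeline, []))  # the six module names, in order
```

Keeping the stages as a plain list makes the order explicit and lets a single module be swapped out without touching the others.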
[0064] Specifically, the text preprocessing module is adapted to remove punctuation marks and spaces from the input text, split it into a word set, and remove meaningless words, thereby forming a simplified word set.
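A minimal sketch of this preprocessing step, assuming whitespace-delimited English text and ASCII punctuation. Chinese input, which the original filing likely targets, would need a word segmenter instead of `str.split()`, and that is not shown. The `MEANINGLESS` word list is hypothetical. The function returns a word list rather than a set because the next module needs occurrence counts; `set()` of the result gives the simplified word set.

```python
import string

# Hypothetical list of "meaningless" words; the excerpt does not specify one.
MEANINGLESS = {"the", "a", "an", "of", "and", "or", "is", "are", "to", "in"}


def preprocess(text: str, meaningless: set[str] = MEANINGLESS) -> list[str]:
    """Remove punctuation and spaces, split into words, drop meaningless words."""
    # A translation table built with a third argument deletes those characters.
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    # str.split() with no argument splits on any run of whitespace, so no
    # empty strings or stray spaces survive.
    words = no_punct.lower().split()
    return [w for w in words if w not in meaningless]


if __name__ == "__main__":
    words = preprocess("The cat, and the dog!  The cat sleeps.")
    print(words)       # ['cat', 'dog', 'cat', 'sleeps']
    print(set(words))  # the simplified word set (distinct words)
```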
[0065] Specifically, the text feature extraction module is adapted to generate a subset of feature words from the simplified word set and to obtain a mapping table between the feature words and their frequencies of occurrence.
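The mapping table between feature words and their frequencies can be sketched as a dictionary built with `collections.Counter`. How the feature-word subset is chosen is not stated in the excerpt; keeping the `top_k` most frequent words is an assumption made for this sketch, and `top_k` and `feature_frequency_table` are hypothetical names.

```python
from collections import Counter


def feature_frequency_table(words: list[str], top_k: int) -> dict[str, int]:
    """Map each of the top_k most frequent words to its occurrence count.

    Choosing the feature-word subset by raw frequency is an assumption made
    for this sketch; the excerpt does not state the selection criterion.
    """
    counts = Counter(words)
    # most_common(k) returns (word, count) pairs sorted by descending count.
    return dict(counts.most_common(top_k))


if __name__ == "__main__":
    words = ["text", "model", "text", "class", "model", "text", "data"]
    print(feature_frequency_table(words, top_k=2))  # {'text': 3, 'model': 2}
```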
[0066] Specifically, the text training processing module is adapted to process the mapping table; that is, other texts are randomly selected, and the inverse document frequency in...
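The paragraph is truncated, so the exact formula is unknown. The following is a minimal sketch under stated assumptions: a random sample of reference texts is drawn, the inverse document frequency (IDF) of each feature word is computed over that sample, and the frequency table is then weighted as TF-IDF. The smoothed variant used here, $\mathrm{idf}(w) = \ln\frac{1+N}{1+\mathrm{df}(w)} + 1$, is the one scikit-learn applies by default; it is a swapped-in choice, not necessarily the patent's. `sample_size`, `seed`, and both function names are hypothetical.

```python
import math
import random


def idf_from_sample(feature_words: list[str], corpus: list[list[str]],
                    sample_size: int, seed: int = 0) -> dict[str, float]:
    """Smoothed IDF of each feature word over a random sample of texts."""
    rng = random.Random(seed)
    sample = rng.sample(corpus, min(sample_size, len(corpus)))
    n = len(sample)
    doc_sets = [set(doc) for doc in sample]  # df counts texts, not occurrences
    idf = {}
    for w in feature_words:
        df = sum(1 for doc in doc_sets if w in doc)
        idf[w] = math.log((1 + n) / (1 + df)) + 1
    return idf


def tf_idf(freq_table: dict[str, int], idf: dict[str, float]) -> dict[str, float]:
    """Weight each feature word's raw frequency by its IDF."""
    return {w: tf * idf[w] for w, tf in freq_table.items()}


if __name__ == "__main__":
    corpus = [["text", "model"], ["text"], ["data"]]
    idf = idf_from_sample(["text", "data"], corpus, sample_size=3)
    print(tf_idf({"text": 3, "data": 1}, idf))
```

Words that occur in many of the sampled texts get a lower IDF, so a frequent but uninformative word is down-weighted relative to a rarer, more distinctive one.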