Feature extraction method for text categorization based on improved mutual information and entropy
A text classification and feature extraction technology, applied in special data processing applications, instruments, electronic digital data processing, etc. It addresses the problem that the accuracy and recall of text classification still need to be improved.
Embodiment Construction
[0043] For convenience of description, assume the following application example: an overwhelming volume of news is published on the Internet every day, and we want to determine the main subject of a given online news document, that is, to determine the category of the document. During document classification, the feature extraction method proposed by the present invention can be used to extract features and determine the text vectors, after which a classifier can be used for text classification.
[0044] The specific embodiment of the present invention is as follows:
[0045] (1) Manually collect a certain number of articles for each category from the Internet to serve as the training data set for the text classification system;
[0046] (2) Preprocess these articles: segment them into words, remove stop words to obtain the feature words, count the term frequency and inverse document frequency, calculate the weight of each feature word according to TF-IDF, and express each article as two-tuple as a mult...
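The preprocessing in step (2) can be illustrated with a minimal Python sketch. It is not the patent's implementation: the stop-word list, the whitespace-based word segmentation, the function names `tokenize` and `tf_idf_vectors`, and the plain `tf * log(N / df)` weighting are all assumptions made for illustration. Because the source text is truncated at this point, the output shape used here (one list of (feature word, weight) two-tuples per article) is also an assumption.

```python
import math
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "is", "on", "and", "in"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words.

    Whitespace splitting stands in for a real word segmenter (assumption).
    """
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def tf_idf_vectors(docs):
    """Return one list of (feature_word, weight) two-tuples per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each word.
    df = Counter(w for tokens in tokenized for w in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vec = []
        for word, count in sorted(counts.items()):
            tf = count / total                  # term frequency in this document
            idf = math.log(n_docs / df[word])   # inverse document frequency
            vec.append((word, tf * idf))
        vectors.append(vec)
    return vectors


if __name__ == "__main__":
    corpus = [
        "the market rises on trade news",
        "the team wins the match",
        "trade talks and the market",
    ]
    for vec in tf_idf_vectors(corpus):
        print(vec)
```

With this weighting, a word that occurs in every article receives weight zero, which is why smoothed IDF variants are often used in practice.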