Text classification method based on WordNet and latent semantic analysis

A text classification technology based on semantic analysis, applied in the computer field, that addresses problems such as classification that does not consider semantics

Active Publication Date: 2015-11-11
BEIJING UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0007] Although the above method tries to find the best feature matrix, it does not consider semantics or the influence of synonyms and hyponyms on the feature matrix from the outset, and it uses only the LSA method for feature extraction. Current research, however, shows that two feature extraction methods used together outperform a single method.
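For orientation, the LSA-only feature extraction that this paragraph criticizes can be sketched as a truncated singular value decomposition of a term-document matrix. This is a minimal illustration and not the patent's implementation: the function name `lsa_features`, the toy matrix, and the choice of `k` are all assumptions made for the example.

```python
import numpy as np

def lsa_features(term_doc, k):
    """Project documents onto the top-k latent dimensions via truncated SVD.

    term_doc: array of shape (n_terms, n_docs) holding term frequencies.
    Returns an array of shape (n_docs, k), one latent vector per document.
    """
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    k = min(k, len(s))
    # Scale the right singular vectors by the singular values, then
    # transpose so that each row describes one document.
    return (np.diag(s[:k]) @ vt[:k, :]).T

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy term-document matrix (hypothetical data): rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "flower", "petal"]
term_doc = np.array([
    [2, 0, 0],   # car
    [0, 2, 0],   # automobile
    [1, 1, 0],   # engine
    [0, 0, 2],   # flower
    [0, 0, 1],   # petal
], dtype=float)

doc_vectors = lsa_features(term_doc, k=2)
sim_01 = cosine(doc_vectors[0], doc_vectors[1])  # documents about cars
sim_02 = cosine(doc_vectors[0], doc_vectors[2])  # car document vs. flower document
```

In this toy case the two car documents are brought together in the latent space only because they share the term "engine"; plain LSA has no direct knowledge that "car" and "automobile" are synonyms, which is the gap the paragraph above points to.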

Embodiment Construction

[0034] The features and exemplary embodiments of various aspects of the present invention will be described in detail below. The following description covers many specific details in order to provide a comprehensive understanding of the present invention. However, it is obvious to those skilled in the art that the present invention can be implemented without some of these specific details. The following description of the embodiments is only to provide a clearer understanding of the present invention by showing examples of the present invention. The present invention is by no means limited to any specific configuration and algorithm proposed below, but covers any modification, replacement and improvement of related elements, components and algorithms without departing from the spirit of the present invention.

[0035] Because the traditional text classification methods described above do not handle synonymy among multiple words well, the present invention prop...

Abstract

A text classification method based on WordNet and latent semantic analysis, relating to the computer field. The method considers the synonyms, hypernyms and hyponyms of the words in a text and increases the word frequencies of those synonyms, hypernyms and hyponyms according to their similarity, which reduces the influence of synonymy among multiple words on classification. Unlike the common approach of applying a single feature extraction method to one feature matrix, the method obtains multiple feature matrices by adjusting the WordNet invocation parameters and uses a genetic algorithm (GA) to assist latent semantic analysis (LSA) in completing feature extraction, yielding better feature matrices and thereby improving classification performance.
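The frequency adjustment described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the `RELATED` table is a hypothetical stand-in for real WordNet lookups, the similarity values are invented, and the `min_sim` threshold is an assumed example of an "invocation parameter" (the abstract does not say which parameters are adjusted). The GA-assisted step is not sketched because the abstract does not describe it.

```python
from collections import Counter

# Hypothetical stand-in for WordNet: word -> [(related word, similarity)].
# In practice these relations and scores would come from a WordNet interface.
RELATED = {
    "car": [("automobile", 1.0), ("vehicle", 0.8)],
    "automobile": [("car", 1.0), ("vehicle", 0.8)],
    "rose": [("flower", 0.8)],
}

def expand_counts(tokens, related=RELATED, min_sim=0.5):
    """Raise the frequency of related words in proportion to similarity.

    tokens:  list of words in one document.
    min_sim: assumed tuning parameter; relations below it are ignored.
    Returns a dict mapping word -> adjusted frequency.
    """
    counts = Counter(tokens)
    expanded = {word: float(freq) for word, freq in counts.items()}
    for word, freq in counts.items():
        for other, sim in related.get(word, []):
            if sim >= min_sim:
                # Each occurrence of `word` contributes `sim` to `other`.
                expanded[other] = expanded.get(other, 0.0) + freq * sim
    return expanded

doc = ["car", "car", "rose"]
loose = expand_counts(doc, min_sim=0.5)   # includes hypernym-like relations
strict = expand_counts(doc, min_sim=0.9)  # keeps only near-synonyms
```

Calling `expand_counts` with different `min_sim` values produces different frequency vectors for the same document, which is one plausible reading of how several feature matrices could be obtained from one corpus before feature extraction.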

Description

Technical field

[0001] The present invention relates to the computer field, and more specifically to a text classification method based on WordNet and latent semantic analysis.

Background technique

[0002] Text categorization is the process of automatically determining the category of a text from its content under a given classification system, and of assigning documents to categories according to pre-specified standards, so that users can both browse documents conveniently and query them by category. Before the 1990s, the dominant approach to text classification was based on knowledge engineering, in which professionals classified documents manually. Manual classification is very time-consuming and inefficient. Since the 1990s, numerous statistical and machine learning methods have been applied to automatic text classification, and research on text classification technology has attracted great interest among researchers. At present, research on...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/355
Inventors: 赵旭, 李建强, 刘璐, 许泽文, 莫豪文
Owner: BEIJING UNIV OF TECH