Document semantic representation method based on thematic word class similarities and text classification method and device
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- INST OF INFORMATION ENG CAS
- Publication Date
- 2018-09-28
Abstract
Description
Technical field
[0001] The invention belongs to the field of information technology, and in particular relates to a document semantic representation method, a text classification method, and a corresponding device based on thematic word class similarity.
Background art
[0002] Text vector representation is one of the key technologies in the fields of text mining and natural language processing. A good document semantic representation method can improve the performance of tasks such as information retrieval and text classification.
[0003] The present invention is a document semantic representation method based on thematic word class similarity. It is proposed as an improvement on the bag-of-words model, whose representations are high-dimensional, sparse, and lack semantic information. At present, the document representation methods based on the bag-of-words model are as follows:
[0004] 1) The traditional bag-of-words (BOW) model represents the frequency of words...
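As an illustration of the word-frequency representation named in [0004], the following is a minimal sketch, assuming whitespace tokenization and a toy two-document corpus. The function name `bow_vectors` and the example sentences are hypothetical and do not come from the patent. The sketch also shows why such vectors tend to be sparse: each document uses only a small part of the shared vocabulary.

```python
from collections import Counter


def bow_vectors(docs):
    """Return (vocabulary, count vectors) for whitespace-tokenized documents.

    Illustrative helper, not part of the patent text.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # The vocabulary is the sorted set of all distinct tokens in the corpus.
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0] * len(vocab)
        # Each dimension holds the frequency of one vocabulary word.
        for tok, count in Counter(toks).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocab, vectors


# Toy corpus (assumed for illustration only).
docs = ["the cat sat on the mat", "stock prices fell sharply"]
vocab, vectors = bow_vectors(docs)

# Fraction of zero entries across all document vectors.
sparsity = sum(v == 0 for vec in vectors for v in vec) / (len(vocab) * len(docs))
```

Even in this two-document example, half of the vector entries are zero, and the two documents share no non-zero dimension, so the count vectors alone say nothing about how the documents relate semantically.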