A method for realizing Chinese text classification and related equipment
The invention relates to Chinese text classification technology, applied in fields such as text database clustering/classification, unstructured text data retrieval, and semantic tool creation. It addresses problems such as frequent spelling errors and achieves the effects of high accuracy and a single dimension.
Examples
Embodiment approach 1
[0030] Figure 1 is a schematic flowchart of the method for realizing Chinese text classification provided by Embodiment 1 of the present invention. As shown in Figure 1, the method includes:
[0031] Step 101, using the Chinese pinyin sequence to expand the semantics of the Chinese short text, and using word vectors to establish a character mapping matrix and a word-level mapping matrix;
[0032] Step 102, performing convolution and down-sampling operations on the character mapping matrix and the word-level mapping matrix to automatically extract the local feature vectors of the Chinese short text;
[0033] Step 103, concatenating and fusing the local feature vectors, and then feeding the result into a normalized Softmax classifier to classify the Chinese short text.
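The convolution and down-sampling of Step 102 can be sketched as follows. This is a minimal illustration and not the patent's implementation: it assumes the down-sampling is max pooling over the whole feature map, and the matrix values, the kernel, and the function names `conv_over_rows` and `max_pool` are made up for the example.

```python
def conv_over_rows(matrix, kernel):
    """Slide a (window x dim) kernel down the rows of a (length x dim) matrix."""
    window = len(kernel)
    feature_map = []
    for start in range(len(matrix) - window + 1):
        total = 0.0
        for offset in range(window):
            row = matrix[start + offset]
            weights = kernel[offset]
            total += sum(x * w for x, w in zip(row, weights))
        feature_map.append(total)
    return feature_map


def max_pool(feature_map):
    """Down-sample a feature map to its strongest response (assumed pooling)."""
    return max(feature_map)


# Toy 4 x 2 mapping matrix: 4 units, 2-dimensional vectors (made-up values).
mapping_matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# One hypothetical kernel spanning a window of 2 units.
kernel = [[1.0, 0.0], [0.0, 1.0]]

feature_map = conv_over_rows(mapping_matrix, kernel)
local_feature = max_pool(feature_map)
```

In a full model, several kernels of different window sizes would each yield one pooled value, and those values together would form the local feature vector.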
[0034] Wherein, using the Chinese pinyin sequence to expand the semantics of the Chinese short text, and using word vectors to establish the character mapping matrix and the word-level mapping matrix, includes:...
Embodiment 1
[0048] Figure 2 is a schematic flowchart of the method for realizing Chinese text classification provided by Embodiment 1 of the present invention. As shown in Figure 2, the method includes:
[0049] Step 201, using the Chinese pinyin sequence to expand the semantics of the original text, and using word vectors to establish a character-level and word-level double-input matrix;
[0050] Wherein, the double-input matrix refers to the character mapping matrix $W_C$ and the phrase mapping matrix $W_P$.
[0051] Step 202, automatically extracting the local feature vectors of the text from the input matrices through convolution and down-sampling operations;
[0052] Step 203, feeding the concatenated and fused feature vectors into the Softmax classifier to classify the Chinese short text.
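The fusion and classification in Step 203 can be illustrated with a short sketch. It assumes that fusion is plain concatenation of the character-level and phrase-level feature vectors, followed by one linear layer and a Softmax; the weights, biases, feature values, and the name `classify` are hypothetical and chosen only for the example.

```python
import math


def softmax(scores):
    """Normalize raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(char_features, phrase_features, weights, biases):
    """Concatenate both feature vectors, score each class, apply Softmax."""
    fused = char_features + phrase_features  # list concatenation
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    return probs.index(max(probs)), probs


# Made-up local features from the character-level and phrase-level branches.
char_features = [2.0, 0.5]
phrase_features = [1.0]
# Hypothetical classifier parameters for two classes.
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
biases = [0.0, 0.0]

label, probs = classify(char_features, phrase_features, weights, biases)
```

Here the class scores are 2.0 and 1.5, so the first class receives the higher probability.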
Embodiment 2
[0054] Figure 3 is a schematic flowchart of the specific implementation of step 201 in Embodiment 1 of the present invention. As shown in Figure 3, step 201 in Embodiment 1 includes:
[0055] Step 301, preprocessing the text, including removing the large number of meaningless symbols while retaining mixed comments;
[0056] Wherein, the mixed comments may be comments in Chinese, English, or other languages.
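A minimal sketch of the preprocessing in Step 301 follows. The source does not define which symbols count as meaningless, so this example assumes that anything other than letters, digits, and whitespace (in any script) can be dropped; the function name `preprocess` and the sample comment are illustrative only.

```python
import re

# Matches any character that is not a Unicode word character or whitespace,
# plus the underscore (which \w would otherwise keep).
_SYMBOLS = re.compile(r"[^\w\s]|_")


def preprocess(text):
    """Strip symbol noise while keeping Chinese, English, and other letters."""
    cleaned = _SYMBOLS.sub(" ", text)
    return " ".join(cleaned.split())  # collapse repeated whitespace


comment = "这个手机很好!!! very good ★★★ #@~"
cleaned_comment = preprocess(comment)
```

Because `\w` is Unicode-aware for `str` patterns in Python 3, Chinese characters and Latin letters both survive, which matches the requirement to retain mixed-language comments.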
[0057] Step 302, using the set of word embedding vectors obtained by training on a large-scale corpus, denoted VT, to produce a vectorized representation of each component unit in CF and PF, thereby obtaining the character mapping matrix $W_C$ and the phrase mapping matrix $W_P$.
[0058] Wherein, the character-level feature (Char Level Feature, CF) is the pinyin representation sequence, and the word-level feature (Phrase Level Feature, PF) is the phrase representation sequence.
[0059] The calculation formula is as follows:
[0060] $W_C = VT \cdot \mathrm{idx}(CF)$, $W_P = VT \cdot \mathrm{idx}($...
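The formula above is truncated in the source, so the following is only one plausible reading: `idx` is assumed to map each unit of a sequence to its row index in VT, so that the product amounts to stacking the corresponding embedding rows. Only the character-level case is shown. The vocabulary, the embedding values, the `<unk>` fallback, and the helper name `mapping_matrix` are all hypothetical.

```python
# Hypothetical vocabulary: pinyin unit -> row index in VT.
VOCAB = {"<unk>": 0, "shou": 1, "ji": 2, "hen": 3, "hao": 4}

# Hypothetical embedding table VT, one 2-dimensional row per unit.
VT = [
    [0.0, 0.0],  # <unk>
    [0.1, 0.2],  # shou
    [0.3, 0.4],  # ji
    [0.5, 0.6],  # hen
    [0.7, 0.8],  # hao
]


def idx(units):
    """Map each unit to its row index, falling back to <unk> (assumption)."""
    return [VOCAB.get(unit, VOCAB["<unk>"]) for unit in units]


def mapping_matrix(units):
    """Stack the embedding rows selected by idx(units)."""
    return [VT[i] for i in idx(units)]


# CF: assumed pinyin sequence for the text "手机很好".
CF = ["shou", "ji", "hen", "hao"]
W_C = mapping_matrix(CF)
```

Each row of `W_C` is the embedding of one pinyin unit, in sequence order, which is the shape a convolution over the sequence would expect.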