A text classification method and text classification device

A text classification technology in the field of text processing. It addresses problems such as limited room for algorithm improvement, data sparseness, and difficulty in achieving domain transfer, and it achieves good domain adaptability.

Active Publication Date: 2021-12-07
CHINA UNIONPAY

AI Technical Summary

Problems solved by technology

[0003] However, both statistical learning methods and deep learning methods have flaws. The former relies heavily on feature selection, which limits the room for improving subsequent algorithms, and its discretized features often cause problems such as data sparseness and semantic gaps. The latter has a black-box structure: end-to-end learning methods generalize poorly, make domain transfer difficult, and depend heavily on the size of the training data.




Embodiment Construction

[0054] Some embodiments of the invention are introduced below to provide a basic understanding of the invention. They are not intended to identify key or critical elements of the invention or to delimit its scope of protection.

[0055] To address shortcomings of existing text classification methods, such as data sparseness and poor model generalization, this solution proposes a text classification method and text classification system based on multi-dimensional feature selection. The main technical idea of the present invention is to first perform conventional NLP preprocessing on the user dialogue text, such as word segmentation, part-of-speech tagging, and stop word removal; then extract n-gram features, word embedding features, and dependency syntactic relation triplet features from the dialogue text; concatenate these features and input them into the neural network classification system; and finally the probability corresponding to the classification l...
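The feature construction described in [0055] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: word segmentation, part-of-speech tagging, and dependency parsing are assumed to have been done by an external tool; the stop-word list, embedding table, and dependency relation labels are hypothetical toy data; and the n-gram and triplet features are mapped to fixed-size vectors with feature hashing, a technique the source does not specify.

```python
import zlib
from typing import Dict, List, Sequence, Tuple

# Hypothetical stop-word list; a real system would load a curated list.
STOP_WORDS = {"的", "了", "吗", "呢"}


def remove_stop_words(tokens: Sequence[str]) -> List[str]:
    """Drop stop words from an already-segmented token list."""
    return [t for t in tokens if t not in STOP_WORDS]


def hashed_bag(items: Sequence[str], dim: int) -> List[float]:
    """Map string features to a fixed-size count vector (CRC32 feature hashing)."""
    vec = [0.0] * dim
    for item in items:
        vec[zlib.crc32(item.encode("utf-8")) % dim] += 1.0
    return vec


def ngram_features(tokens: Sequence[str], n_max: int = 2, dim: int = 16) -> List[float]:
    """Hashed counts of word n-grams for n = 1..n_max."""
    grams = [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]
    return hashed_bag(grams, dim)


def embedding_features(
    tokens: Sequence[str], table: Dict[str, List[float]], dim: int
) -> List[float]:
    """Average of the word vectors; out-of-vocabulary tokens are skipped."""
    vecs = [table[t] for t in tokens if t in table]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def triplet_features(triplets: Sequence[Tuple[str, str, str]], dim: int = 8) -> List[float]:
    """Hashed counts of (head, relation, dependent) dependency triplets."""
    return hashed_bag(["|".join(t) for t in triplets], dim)


def build_feature_vector(
    tokens: Sequence[str],
    triplets: Sequence[Tuple[str, str, str]],
    table: Dict[str, List[float]],
    emb_dim: int,
) -> List[float]:
    """Concatenate ("splice") the three feature groups into one input vector."""
    kept = remove_stop_words(tokens)
    return (
        ngram_features(kept)
        + embedding_features(kept, table, emb_dim)
        + triplet_features(triplets)
    )


# Usage with hypothetical data: tokens from a segmenter, triplets from a parser.
tokens = ["我", "的", "信用卡", "被", "冻结", "了"]
triplets = [("冻结", "SBV", "信用卡"), ("信用卡", "ATT", "我")]
table = {"信用卡": [0.1, 0.3, -0.2], "冻结": [0.5, -0.1, 0.0]}
vec = build_feature_vector(tokens, triplets, table, emb_dim=3)
```

With the defaults, the vector has 16 hashed n-gram dimensions, `emb_dim` embedding dimensions, and 8 hashed triplet dimensions; hashing trades occasional collisions for a fixed input size that a neural classifier can consume directly.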



Abstract

The invention relates to a text classification method and a text classification device. The method comprises the following steps: an NLP preprocessing step, which analyzes the user dialogue text with natural language processing methods to obtain a word set and semantic annotation results for the user dialogue text; a multidimensional feature selection step, which combines the word set and the semantic annotation results according to various rules to obtain a vectorized representation of the semantic information contained in the user dialogue text; and a classification step, which calculates probability estimates of the user dialogue classes from the result of the multidimensional feature selection step. The text classification method and text classification system of the present invention can integrate the advantages of statistical and deep learning methods and, through multi-dimensional feature selection, can realize a text classification solution oriented to customer needs.
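The classification step can be illustrated with a small feed-forward network that maps the spliced feature vector to per-class probability estimates. The source does not give the network architecture, so this is only a sketch: the single ReLU hidden layer, the softmax output, the weights, and the class labels below are all hypothetical, and in practice the weights would be learned (for example, by back-propagation) rather than written by hand.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def dense(x: Sequence[float], w: Matrix, b: Sequence[float]) -> List[float]:
    """Compute W x + b, where W has one row per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v: Sequence[float]) -> List[float]:
    """Element-wise ReLU activation."""
    return [max(0.0, a) for a in v]


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(z)
    exps = [math.exp(a - m) for a in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(
    x: Sequence[float],
    w1: Matrix, b1: Sequence[float],
    w2: Matrix, b2: Sequence[float],
    labels: Sequence[str],
) -> Dict[str, float]:
    """Return a probability estimate for each class label."""
    hidden = relu(dense(x, w1, b1))
    return dict(zip(labels, softmax(dense(hidden, w2, b2))))


# Usage with hand-picked, hypothetical weights and labels.
x = [1.0, 0.0, 2.0]
w1 = [[0.5, 0.1, 0.2], [-0.3, 0.8, 0.1]]
b1 = [0.0, 0.1]
w2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
b2 = [0.0, 0.0, 0.0]
labels = ["card_service", "payment_issue", "other"]
probs = classify(x, w1, b1, w2, b2, labels)
```

Because softmax normalizes the output logits, the returned values sum to 1 and can be read directly as the probability estimates the classification step produces; the predicted class is the label with the largest value.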

Description

Technical Field

[0001] The invention relates to text processing technology, in particular to a text classification method and a text classification device.

Background

[0002] At present, implementations of text classification technology fall mainly into statistical learning methods and deep learning methods. The former is based mainly on feature selection: word- and sentence-level features of the text are selected using indicators such as TF-IDF, PMI, and the chi-square value to obtain a feature vector representing the text, and a machine learning model then estimates the probability of each label from this feature vector, which serves as the final classification criterion. The latter is based mainly on model construction: it takes the discrete information of the text as input, passes it through the serial and parallel structure of a multi-layer neural network, updates the network weights with the back-propagation algorithm, and directly obtains the ...
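The feature-selection indicators named in [0002] can be computed as in the following sketch. It assumes document-level counts on a tiny hypothetical labeled corpus and uses one common unsmoothed TF-IDF variant, term frequency times log(N/df); practical systems often add smoothing, and the source does not say which variants are meant. The chi-square statistic and PMI are computed from the 2x2 table of term presence versus class membership.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Doc = Tuple[List[str], str]  # (tokens, label)


def tf_idf(doc: Sequence[str], corpus: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Unsmoothed TF-IDF: (count / doc length) * log(N / df).

    Assumes `doc` is itself in `corpus`, so every document frequency is >= 1.
    """
    n_docs = len(corpus)
    scores = {}
    for term, count in Counter(doc).items():
        df = sum(1 for d in corpus if term in d)
        scores[term] = (count / len(doc)) * math.log(n_docs / df)
    return scores


def contingency(term: str, label: str, docs: Sequence[Doc]) -> Tuple[int, int, int, int]:
    """Document counts (A, B, C, D): term&class, term&other, no-term&class, no-term&other."""
    a = b = c = d = 0
    for tokens, y in docs:
        has_term, in_class = term in tokens, y == label
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(term: str, label: str, docs: Sequence[Doc]) -> float:
    """Chi-square statistic N(AD - BC)^2 / ((A+C)(B+D)(A+B)(C+D))."""
    a, b, c, d = contingency(term, label, docs)
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def pmi(term: str, label: str, docs: Sequence[Doc]) -> float:
    """PMI = log(P(term, class) / (P(term) P(class))) from document counts."""
    a, b, c, d = contingency(term, label, docs)
    if a == 0:
        return float("-inf")  # term never co-occurs with the class
    n = a + b + c + d
    return math.log(a * n / ((a + b) * (a + c)))


# Hypothetical toy corpus of segmented dialogue texts with class labels.
docs = [
    (["card", "frozen"], "card"),
    (["card", "limit"], "card"),
    (["refund", "delay"], "payment"),
    (["payment", "failed"], "payment"),
]
scores = tf_idf(docs[0][0], [tokens for tokens, _ in docs])
```

On this toy corpus, "card" appears in every "card" document and in no other, so `chi_square("card", "card", docs)` reaches 4.0 and its PMI is log 2; a statistical pipeline would keep the highest-scoring terms per class as the text's feature vector.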


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35; G06F16/36; G06F40/30; G06K9/62
CPC: G06F16/35; G06F16/36; G06F40/30; G06F18/241
Inventors: 佘萧寒, 姜梦晓, 万四爽, 费志军, 王宇, 张莉敏, 张琦, 邱雪涛, 乐旭, 刘想
Owner: CHINA UNIONPAY