Text classification method and text classification device

A text classification method and device, applied in the field of text processing, that address problems such as limited room for algorithmic improvement, the semantic gap, and data sparseness, and achieve good domain adaptability.

Active Publication Date: 2018-07-20
CHINA UNIONPAY

AI Technical Summary

Problems solved by technology

[0003] However, both statistical learning methods and deep learning methods have their flaws. The former's over-reliance on feature selection limits the room for improvement of subsequent algorithms, and discretized features often lead to problems ...



Embodiment Construction

[0054] Introduced below are some of the various embodiments of the invention, intended to provide a basic understanding of the invention. It is not intended to identify key or critical elements of the invention or to delineate the scope of protection.

[0055] The purpose of this scheme is to propose a text classification method and text classification system based on multi-dimensional feature selection, addressing problems of existing text classification methods such as data sparseness and weak model generalization. The main technical idea of the present invention is to first perform conventional NLP preprocessing on the user dialogue text, such as word segmentation, part-of-speech tagging, and stop-word removal, and then extract the n-gram features, word embedding features, and dependency syntactic relation triple features from the dialogue text. These features are concatenated and input into the neural network classification system, and finally the probability corresponding to the classification l...
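
The feature concatenation described in [0055] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the vector sizes, and the use of a hashing trick to turn n-grams and dependency triples into fixed-length count vectors are all hypothetical choices, and the embedding table is a toy stand-in for a trained word embedding model. The tokens and dependency triples are assumed to come from an upstream NLP preprocessing step.

```python
import zlib


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def hashed_bag(items, dim):
    """Count items into a fixed-length vector via a stable hash (assumed trick)."""
    vec = [0.0] * dim
    for item in items:
        vec[zlib.crc32(item.encode("utf-8")) % dim] += 1.0
    return vec


def mean_embedding(tokens, table, dim):
    """Average the embeddings of the tokens found in a (toy) embedding table."""
    rows = [table[t] for t in tokens if t in table]
    if not rows:
        return [0.0] * dim
    return [sum(col) / len(rows) for col in zip(*rows)]


def build_features(tokens, triples, table, emb_dim=3, ngram_dim=8, dep_dim=4):
    """Concatenate n-gram, word embedding, and dependency-triple features."""
    grams = ngrams(tokens, 1) + ngrams(tokens, 2)
    dep_items = ["|".join(t) for t in triples]  # (head, relation, dependent)
    return (
        hashed_bag(grams, ngram_dim)
        + mean_embedding(tokens, table, emb_dim)
        + hashed_bag(dep_items, dep_dim)
    )


# Hypothetical preprocessed dialogue turn and parser output.
tokens = ["card", "payment", "failed"]
triples = [("failed", "nsubj", "payment"), ("payment", "compound", "card")]
table = {"card": [1.0, 0.0, 0.0], "payment": [0.0, 1.0, 0.0]}
features = build_features(tokens, triples, table)
```

The resulting list is a single vector whose three segments keep the feature types separable, which is one straightforward way to feed heterogeneous features into a downstream classifier.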


Abstract

The invention relates to a text classification method and a text classification device. The method comprises the following steps: an NLP (Natural Language Processing) preprocessing step, in which the user dialogue text is analyzed with natural language processing methods to obtain a word set and semantic labeling results for the user dialogue text; a multi-dimensional feature selection step, in which the word set and the semantic labeling results are combined according to a plurality of rules to obtain a vectorized representation of the semantic information contained in the user dialogue text; and a classification step, in which probability estimates are calculated for the user dialogue classes from the result of the multi-dimensional feature selection step. According to the text classification method and the text classification system of the invention, the advantages of statistical methods and deep learning methods can be combined, and a customer-demand-oriented text classification solution can be realized through multi-dimensional feature selection.
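
The classification step in the abstract, which turns a vectorized representation into per-class probability estimates, can be illustrated with a small sketch. The abstract does not specify the network here, so a single linear layer followed by a softmax is used as a stand-in; the function names, the weights, and the class labels are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1 (numerically stable)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, biases, labels):
    """Score each class with one linear layer, then apply softmax."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return dict(zip(labels, softmax(scores)))


# Hypothetical two-class example with hand-picked weights.
probs = classify(
    features=[1.0, 0.0],
    weights=[[2.0, 0.0], [0.0, 0.0]],
    biases=[0.0, 0.0],
    labels=["billing", "refund"],
)
```

In a trained system the weights would be learned, for example by back-propagation; here they are fixed only to show how the probability estimates are produced.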

Description

Technical field

[0001] The invention relates to text processing technology, in particular to a text classification method and a text classification device.

Background

[0002] At present, text classification technology is mainly implemented by statistical learning methods or deep learning methods. The former are mainly based on feature selection: word-level and sentence-level features of the text are selected through indicators such as TF-IDF, PMI, and the chi-square value to obtain a feature vector representing the text, a machine learning method then derives the probability of each label from this feature vector, and that probability serves as the final classification criterion. The latter are mainly based on model construction: the discrete information of the text is taken as input and passed through the serial and parallel structures of a multi-layer neural network, with the back-propagation algorithm updating the network weights, to directly get the ...
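
As an illustration of the statistical indicators named in [0002], the sketch below computes TF-IDF scores for tokenized documents. It assumes one common variant, term frequency normalized by document length multiplied by log(N / document frequency); the source does not say which variant is intended, and the function name and sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical tokenized dialogue texts.
docs = [["card", "lost"], ["card", "refund"]]
scores = tf_idf(docs)
```

A term that appears in every document, such as "card" here, scores zero, which is why such indicators are used to pick out the more discriminative words as features.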


Application Information

IPC(8): G06F17/30, G06F17/27, G06K9/62
CPC: G06F16/35, G06F16/36, G06F40/30, G06F18/241
Inventors: 佘萧寒, 姜梦晓, 万四爽, 费志军, 王宇, 张莉敏, 张琦, 邱雪涛, 乐旭, 刘想
Owner: CHINA UNIONPAY