Method and device for classifying texts, storage medium and processor

A text classification technology, applied in the computer field, that addresses problems in short-text classification such as few usable words, a sparse feature matrix, and missing semantic and word-order information, and thereby improves classification results.

Active Publication Date: 2020-04-10
BEIJING GRIDSUM TECH CO LTD

AI Technical Summary

Problems solved by technology

However, because queries are generally short and irregular, very few words remain after preprocessing such as word segmentation and stop-word removal. The resulting feature matrix is very sparse and also lacks information such as semantics and word order, so classification results are not ideal.
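
The sparsity problem described above can be illustrated with a small sketch. The queries, the vocabulary, and the helper name `bag_of_words` are hypothetical and chosen only for illustration; the source does not give a concrete feature construction beyond word counts and TF-IDF-style features.

```python
from collections import Counter

# Hypothetical queries, already segmented and with stop words removed.
queries = [
    ["reset", "password"],
    ["refund", "order"],
    ["track", "order", "status"],
    ["change", "password", "email"],
]

# The vocabulary is every distinct word seen in this toy corpus.
vocab = sorted({word for query in queries for word in query})

def bag_of_words(tokens, vocab):
    """Return a term-count row with one column per vocabulary word."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]

matrix = [bag_of_words(query, vocab) for query in queries]

# Fraction of cells that are zero: short queries fill very few columns.
nonzero = sum(1 for row in matrix for value in row if value != 0)
total = len(matrix) * len(vocab)
sparsity = 1 - nonzero / total

# Word order is lost: both orderings map to the same row.
order_ignored = bag_of_words(["order", "refund"], vocab) == bag_of_words(["refund", "order"], vocab)
```

Even with only eight vocabulary words, most cells are zero, and the share of zeros grows as the vocabulary grows while the queries stay short.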


Embodiment Construction

[0025] Specific implementations of the embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be understood that the specific implementations described here only illustrate and explain the embodiments of the present invention and are not intended to limit them.

[0026] One aspect of the embodiments of the present invention provides a method for classifying text. Figure 1 is a flowchart of a method for classifying text provided by an embodiment of the present invention. As shown in Figure 1, the method includes the following steps.

[0027] In step S10, word segmentation is performed on the text to be classified. The text to be classified may be a short text.
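
The source does not say which segmenter step S10 uses. As an illustration only, the sketch below uses forward maximum matching, a simple dictionary-based technique, over a hypothetical four-word dictionary; the function name `forward_max_match` and the `max_len` parameter are assumptions and not part of the described method.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy longest-match-first segmentation.

    Characters that match no dictionary entry become single-character words.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words

# Hypothetical dictionary: "how", "change", "password", "login".
toy_dictionary = {"怎么", "修改", "密码", "登录"}

# A short query meaning roughly "how to change the login password".
segments = forward_max_match("怎么修改登录密码", toy_dictionary)
```

A production system would more likely rely on a trained segmenter; the point here is only that step S10 turns a raw string into a list of words for the later steps.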

[0028] In step S11, the word vector corresponding to each word obtained by segmenting the text to be classified is determined based on the word vector model, and the word vectors correspon...


Abstract

The embodiment of the invention provides a method and a device for classifying texts, a storage medium and a processor, and belongs to the technical field of computers. The method comprises the steps of: performing word segmentation on a to-be-classified text; determining, based on a word vector model, a word vector corresponding to each word obtained by the word segmentation, and forming a matrix from the word vectors corresponding to the words belonging to one sentence; processing each matrix based on a sentence vector model to obtain a sentence vector corresponding to each matrix; and processing each sentence vector based on a sentence classification model to obtain a category score vector corresponding to each sentence vector, and determining the type of the corresponding sentence vector according to each category score vector, so as to classify the to-be-classified text. This overcomes the defects that, when short texts are classified, the constructed word frequency or feature matrix is very sparse and the relationships between words are ignored, and the effect of classifying the texts is improved.
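
The data flow in the abstract (words, word vectors, per-sentence matrix, sentence vector, category score vector, category) can be sketched as follows. Everything concrete here is an assumption made for illustration: the three-dimensional vectors, the category names, and the weights are invented, mean pooling stands in for the unspecified sentence vector model, and a linear layer with softmax stands in for the unspecified sentence classification model. Mean pooling in particular does not capture word order, which a trained sentence vector model presumably would.

```python
import math

# Hypothetical word vector model: a lookup table of 3-dimensional vectors.
WORD_VECTORS = {
    "reset":    [0.9, 0.1, 0.0],
    "password": [0.8, 0.2, 0.1],
    "refund":   [0.1, 0.9, 0.0],
    "order":    [0.0, 0.8, 0.2],
}
UNKNOWN = [0.0, 0.0, 0.0]

# Hypothetical categories and classifier parameters (one weight row per category).
CATEGORIES = ["account", "billing"]
WEIGHTS = [[2.0, -1.0, 0.0], [-1.0, 2.0, 0.0]]
BIAS = [0.0, 0.0]

def sentence_matrix(words):
    """Stack the word vectors of one sentence into a matrix (one row per word)."""
    return [WORD_VECTORS.get(word, UNKNOWN) for word in words]

def sentence_vector(matrix):
    """Placeholder sentence vector model: mean pooling over the rows."""
    if not matrix:
        return list(UNKNOWN)
    dim = len(matrix[0])
    return [sum(row[d] for row in matrix) / len(matrix) for d in range(dim)]

def category_scores(vector):
    """Placeholder classification model: linear layer followed by softmax."""
    logits = [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(WEIGHTS, BIAS)
    ]
    peak = max(logits)
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(words):
    """Return the highest-scoring category and the full category score vector."""
    scores = category_scores(sentence_vector(sentence_matrix(words)))
    best = max(range(len(scores)), key=scores.__getitem__)
    return CATEGORIES[best], scores

label_a, scores_a = classify(["reset", "password"])
label_b, scores_b = classify(["refund", "order"])
```

Because every word maps to a dense vector, the input to the classifier has a fixed, small dimension regardless of vocabulary size, which is how this kind of pipeline avoids the sparse word-frequency matrix.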

Description

Technical field

[0001] The present invention relates to the field of computer technology, and in particular to a method and device for classifying text, a storage medium and a processor.

Background

[0002] Text classification is a basic task in natural language processing. It includes sentence-level and chapter-level text classification, that is, short text classification and long text classification. Text classification is widely used; common application scenarios include spam classification, sentiment analysis, and news topic classification. Short text classification is mainly applied in automatic question answering systems and in query classification in search engines. Traditional text classification methods first preprocess the text, then extract features, and then select and train a classifier. Text preprocessing usually consists of segmenting the text, removing stop words, part-of-speech tagging, and so on. Conventional features usually use TF-IDF, and also i...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06F40/289
CPC: Y02D10/00
Inventor: 戚成琳
Owner: BEIJING GRIDSUM TECH CO LTD