Short text classification method and apparatus

A short text classification method, applied in special data processing applications, instruments, electronic digital data processing, etc. It addresses the sparse features, heavy noise, and resulting inaccurate classification of short texts, overcoming feature sparsity and reducing complexity.

Active Publication Date: 2016-08-31
NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT

AI Technical Summary

Problems solved by technology

[0005] The present invention provides an efficient short text classification method and device that address the inaccurate classification caused, in the prior art, by the sparse features and heavy noise of short texts.



Embodiment Construction

[0046] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0047] The embodiment of the present invention provides a short text classification method, including the following steps:

[0048] Step 1, perform word segmentation preprocessing on the short text to be classified, and obtain the extended words of each word obtained by word segmentation;

[0049] Step 2, obtain the weight value of each word and of its extended words according to the pre-constructed word item set;

[0050] Step 3, according to the weig...
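Steps 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: whitespace splitting stands in for a real word segmenter, and `EXTENSIONS`, `TERM_WEIGHTS`, and the `EXT_DISCOUNT` factor are hypothetical names and values that do not come from the source. The source does not say how extended words are found or how their weights are combined with those of the original words.

```python
from collections import defaultdict

# Hypothetical extension table: word -> related ("extended") words.
EXTENSIONS = {
    "phone": ["mobile", "handset"],
    "cheap": ["affordable"],
}

# Hypothetical pre-constructed word item set: term -> weight.
TERM_WEIGHTS = {
    "phone": 0.9, "mobile": 0.8, "handset": 0.6,
    "cheap": 0.7, "affordable": 0.5,
}

EXT_DISCOUNT = 0.5  # assumption: extended words count less than original words


def segment(text):
    """Stand-in for word segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def expand(words):
    """Pair each segmented word with its extended words (step 1)."""
    return [(w, EXTENSIONS.get(w, [])) for w in words]


def weigh(expanded):
    """Look up weights for words and extended words in the term set (step 2)."""
    features = defaultdict(float)
    for word, extended in expanded:
        if word in TERM_WEIGHTS:
            features[word] += TERM_WEIGHTS[word]
        for ext in extended:
            if ext in TERM_WEIGHTS:
                features[ext] += EXT_DISCOUNT * TERM_WEIGHTS[ext]
    return dict(features)


features = weigh(expand(segment("Cheap phone arrived today")))
```

Words missing from the term set ("arrived", "today") contribute nothing, while the extended words add extra non-zero features, which is how expansion can counter the sparsity of a short text.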

Abstract

The invention discloses a short text classification method and apparatus. The method comprises: performing word segmentation preprocessing on a short text to be classified and obtaining the extended words of each word produced by the segmentation; obtaining the weight values of each word and of its extended words according to a pre-constructed lexical item set; using the weight values with a plurality of SVM classification models to obtain the probability that the short text belongs to each type; and determining the type of the short text according to a preset probability classification model. The method addresses the feature sparsity of short texts, effectively reduces the complexity introduced by using multiple classification models, and better meets practical application requirements.
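The last two stages described in the abstract, per-type probabilities from several SVM classification models followed by a decision under a preset probability rule, might look like the sketch below. Everything named here is an assumption made for illustration: the linear models in `MODELS`, the sigmoid calibration constants (in the style of Platt scaling), and the `THRESHOLD` fallback rule. The source does not describe how the SVMs are trained or what the preset probability classification model actually is.

```python
import math

# Hypothetical one-vs-rest linear SVMs: type -> (term weights, bias).
MODELS = {
    "shopping": ({"phone": 1.2, "cheap": 1.0, "mobile": 0.5}, -0.8),
    "sports": ({"match": 1.5, "goal": 1.1}, -0.8),
}

PLATT_A, PLATT_B = -2.0, 0.0  # assumed sigmoid calibration constants
THRESHOLD = 0.6               # assumed preset probability cut-off


def decision_value(features, model):
    """Score of a linear SVM for one type: w . x + b."""
    weights, bias = model
    return sum(weights.get(t, 0.0) * v for t, v in features.items()) + bias


def probability(score):
    """Platt-style mapping from an SVM score to a probability."""
    return 1.0 / (1.0 + math.exp(PLATT_A * score + PLATT_B))


def classify(features):
    """Score the text against every type, then apply the threshold rule."""
    probs = {label: probability(decision_value(features, m))
             for label, m in MODELS.items()}
    best = max(probs, key=probs.get)
    label = best if probs[best] >= THRESHOLD else "unknown"
    return label, probs


label, probs = classify({"phone": 0.9, "cheap": 0.7, "mobile": 0.4})
```

Keeping one binary model per type and deferring the final choice to a separate rule means the decision policy can be changed without retraining the models; the threshold used here is only one possible policy.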

Description

Technical field

[0001] The invention relates to the field of computer natural language processing, and in particular to a short text classification method and device.

Background

[0002] With the rapid development of network technology, the Internet has become the carrier of massive amounts of information, and user-created content has become an important data source on the Internet. In particular, since the spread of mobile applications for microblogging (Weibo), messaging (WeChat), and shopping, the number of short texts such as Weibo posts, WeChat messages, QQ chats, and product reviews has grown explosively. Short texts in their various forms have become a channel for exchanging information and expressing emotion across all sectors of society in China, and have profoundly changed how hundreds of millions of Chinese people communicate and live.

[0003] The amount of short text data is extremely large, and the data contains people's various views and positions on various social phenome...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06K9/62
CPC: G06F16/353, G06F18/2411, G06F18/2415
Inventor: 佟玲玲, 杜翠兰, 钮艳, 李鹏霄, 易立, 段东圣, 查奇文, 刘晓辉, 柳毅
Owner: NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT