
A short text classification method and device

A short text classification method and technology, applied in text database clustering/classification, unstructured text data retrieval, instruments, etc. It addresses the sparse, noisy features of short texts and the inaccurate classification they cause, reducing model complexity and overcoming feature sparsity.

Active Publication Date: 2019-09-10
NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0005] The present invention provides an efficient short text classification method and device to solve the technical problem in the prior art of inaccurate classification caused by sparse short-text features and heavy noise.



Examples


Embodiment Construction

[0046] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0047] The embodiment of the present invention provides a short text classification method, including the following steps:

[0048] Step 1, perform word segmentation preprocessing on the short text to be classified, and obtain the extended words of each word obtained by word segmentation;

[0049] Step 2, obtain the weight value of each word and its extended words according to the constructed word item set;

[0050] Step 3, according to the weight value, use ...
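Steps 1 and 2 can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the patent's implementation: whitespace splitting stands in for a real word segmenter, `EXPANSIONS` is a hypothetical table of extended words, `WORD_ITEM_WEIGHTS` is a hypothetical pre-built word item set, and words missing from that set are simply dropped.

```python
# Hypothetical expansion table: each word maps to its "extended words".
EXPANSIONS = {
    "phone": ["mobile", "handset"],
    "cheap": ["affordable"],
}

# Hypothetical pre-built word item set: word -> weight value.
WORD_ITEM_WEIGHTS = {
    "phone": 0.9, "mobile": 0.8, "handset": 0.7,
    "cheap": 0.6, "affordable": 0.5,
}


def segment(text):
    """Stand-in for word segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def expand(words):
    """Return each word followed by its extended words, in order, without duplicates."""
    seen, result = set(), []
    for word in words:
        for candidate in [word, *EXPANSIONS.get(word, [])]:
            if candidate not in seen:
                seen.add(candidate)
                result.append(candidate)
    return result


def weight_vector(text):
    """Look up the weight of every word and extended word present in the word item set."""
    return {
        word: WORD_ITEM_WEIGHTS[word]
        for word in expand(segment(text))
        if word in WORD_ITEM_WEIGHTS
    }


features = weight_vector("Cheap phone")
```

Expanding each word before the weight lookup is what gives a two-word input five non-zero features, which is how the expansion step counters feature sparsity in this sketch.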


Abstract

The invention discloses a short text classification method and device. The method includes: performing word segmentation preprocessing on the short text to be classified and obtaining the extended words of each word produced by segmentation; obtaining the weight value of each word and its extended words according to a pre-built word item set; using a multi-category SVM classification model, according to the weight values, to obtain the probability that the short text belongs to each category; and determining the category to which the short text belongs according to a preset probability classification model. The method overcomes the problem of sparse short-text features, effectively reduces the complexity of adopting multi-classification models, and is better suited to practical applications.
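The last two stages of the abstract (per-category probabilities, then a decision rule) might look like the sketch below. It is a stand-in under stated assumptions, not the patented model: the hypothetical `CATEGORY_MODELS` holds hand-written linear scorers in place of a trained multi-category SVM, softmax is an assumed way to turn scores into probabilities, and a single `threshold` is an assumed form of the "preset probability classification model", whose details the source does not give.

```python
import math

# Hypothetical per-category linear scorers: (word -> coefficient, bias).
# In practice these would come from a trained multi-category SVM.
CATEGORY_MODELS = {
    "shopping": ({"cheap": 2.0, "phone": 1.0, "affordable": 1.5}, -0.5),
    "sports": ({"match": 2.0, "goal": 1.5}, -0.5),
}


def decision_scores(features):
    """Compute one linear score per category from weighted word features."""
    scores = {}
    for category, (coefs, bias) in CATEGORY_MODELS.items():
        scores[category] = bias + sum(
            coefs.get(word, 0.0) * value for word, value in features.items()
        )
    return scores


def to_probabilities(scores):
    """Softmax over the scores (an assumed normalization, not the patent's)."""
    peak = max(scores.values())
    exps = {c: math.exp(s - peak) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def classify(features, threshold=0.6):
    """Return (category, probabilities); category is None below the threshold."""
    probs = to_probabilities(decision_scores(features))
    best = max(probs, key=probs.get)
    return (best if probs[best] >= threshold else None), probs


label, probs = classify({"cheap": 0.6, "phone": 0.9})
```

Returning `None` when no category clears the threshold keeps low-confidence short texts out of every class rather than forcing a guess, which is one plausible reading of a probability-based final decision.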

Description

Technical field

[0001] The invention relates to the field of computer natural language processing, in particular to a short text classification method and device.

Background technique

[0002] With the rapid development of network technology, the Internet has become the carrier of massive information, and the content created by users has become an important data source on the Internet. Especially after the promotion of mobile applications such as Weibo, WeChat, and shopping, the number of short texts based on Weibo, WeChat, QQ chat, and product reviews is growing explosively. Various forms of short texts have become channels of information communication and means of emotional communication for all walks of life in our country, and have profoundly changed the communication methods and living habits of hundreds of millions of Chinese.

[0003] The amount of short text data is extremely large, and the data contains people's various views and positions on various social phenome...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06K9/62
CPC: G06F16/353, G06F18/2411, G06F18/2415
Inventor: 佟玲玲, 杜翠兰, 钮艳, 李鹏霄, 易立, 段东圣, 查奇文, 刘晓辉, 柳毅
Owner: NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT