
Short text classification method and device

A short text classification method and related technology, applied in text database clustering/classification, unstructured text data retrieval, character and pattern recognition, etc. It addresses problems such as short texts not being classified correctly and the category system not being extensible for short texts, and it achieves improved classification efficiency and accuracy.

Active Publication Date: 2020-04-07
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0004] However, existing technical solutions are limited by the contextual mapping relationship between short texts and external corpora. When the mapping between a short text and the external corpus is inaccurate, the accuracy of short text classification suffers.
In addition, the accuracy of short text classification depends on the accuracy of the external corpus's own classification.
At present, the category system for classifying short texts must be established in advance from an external corpus and cannot be extended for short texts. Every time a short text is classified, it must be mapped to a larger external corpus, which can only be computed offline, so short texts cannot be classified in real time. When the data in short texts is unevenly distributed, the mapping between short texts and the external corpus is seriously affected, and correct classification can become impossible.
[0005] No effective solution has yet been proposed for the problem that related technologies can only classify short texts offline.



Examples


Embodiment 1

[0030] According to an embodiment of the present invention, an embodiment of a short text classification method is provided. It should be noted that the steps shown in the flowcharts of the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.

[0031] The method embodiment provided in Embodiment 1 of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a computer terminal as an example, Figure 2 is a block diagram of the hardware structure of a computer terminal for the short text classification method according to the embodiment of the present invention. As shown in Figure 2, the computer terminal 100 may include one or more (only one is shown in the figure)...

Embodiment 2

[0108] According to an embodiment of the present invention, a device for implementing the above short text classification method is also provided. Figure 11 is a schematic diagram of a short text classification device according to the first embodiment of the present invention. As shown in Figure 11, the device includes: a word segmentation unit 10, an extraction unit 20, a vector unit 30, a clustering unit 40, and a classification unit 50.

[0109] The word segmentation unit 10 is configured to perform word segmentation processing on the target short text to obtain the word segmentation of the target short text.

[0110] The extraction unit 20 is configured to extract keywords of the target short text according to the word segmentation of the target short text.

[0111] The vector unit 30 is configured to perform vectorization processing on the target short text according to the keywords of the target short text to obtain the vectorized short text.

[0112] The clu...
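
The first three units described above ([0109] to [0111]) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the source does not say how segmentation, keyword ranking, or vectorization are performed, so the regex tokenizer, the frequency-based ranking, the binary bag-of-keywords vector, the `STOP_WORDS` set, and all function names here are assumptions.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the source does not specify one.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "for"}


def segment(text):
    """Word segmentation unit (assumed): split text into lowercase word tokens.

    A regex split stands in for a real segmenter, which Chinese text would need.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_keywords(tokens, top_k=3):
    """Extraction unit (assumed): keep the top_k most frequent non-stop-word tokens."""
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    # Sort by descending count, then alphabetically, so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


def vectorize(keywords, vocabulary):
    """Vector unit (assumed): binary bag-of-keywords vector over a fixed vocabulary."""
    present = set(keywords)
    return [1.0 if term in present else 0.0 for term in vocabulary]


tokens = segment("The price of the new phone is high")
keywords = extract_keywords(tokens)
vocabulary = ["camera", "high", "new", "phone", "price"]  # hypothetical vocabulary
vector = vectorize(keywords, vocabulary)
```

Keeping each unit as a separate function mirrors the unit boundaries in the device description, so any one stage could be swapped for a stronger method without touching the others.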

Embodiment 3

[0132] The embodiment of the invention also provides a storage medium. Optionally, in this embodiment, the above-mentioned storage medium may be used to store the program code executed by the short text classification method in the above-mentioned embodiment.

[0133] Optionally, in this embodiment, the foregoing storage medium may be located in at least one network device among multiple network devices of the computer network.

[0134] Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps:

[0135] The first step is to perform word segmentation processing on the target short text to obtain the word segmentation of the target short text.

[0136] The second step is to extract the keywords of the target short text according to the word segmentation of the target short text.

[0137] The third step is to perform vectorization processing on the target short text according to the keywords of the target short tex...



Abstract

The invention discloses a short text classification method and device. The method includes: performing word segmentation on the target short text to obtain its segmented words; extracting keywords of the target short text from the segmented words; vectorizing the target short text according to its keywords to obtain a vectorized short text; performing cluster calculation on the vectorized short text to obtain a clustering result; and classifying the target short text according to the clustering result. The invention solves the technical problem in the related art that short texts can only be classified offline.
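
The last two steps of the abstract, cluster calculation and classification according to the clustering result, might look like the sketch below. The source does not name a clustering algorithm, so plain k-means with Euclidean distance is used here purely as an assumption; the function names, the initialization from the first `k` vectors, and the toy two-dimensional vectors are all hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(vector, centroids):
    """Index of the centroid closest to the given vector."""
    return min(range(len(centroids)), key=lambda i: distance(vector, centroids[i]))


def k_means(vectors, k, iterations=10):
    """Plain k-means (assumed algorithm; the source does not specify one)."""
    # Assumption: deterministic initialization from the first k vectors.
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for i, group in enumerate(groups):
            if group:  # keep the previous centroid if a cluster is empty
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


# Toy vectorized short texts (hypothetical data).
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
centroids = k_means(vectors, k=2)
labels = [nearest(v, centroids) for v in vectors]

# Classify a new short text by assigning it to the nearest existing cluster.
new_label = nearest([0.8, 0.2], centroids)
```

Assigning a new vector to its nearest centroid needs no lookup in an external corpus, which is one plausible way to read the abstract's claim about moving beyond offline-only classification.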

Description

Technical field

[0001] The present invention relates to the field of text classification, in particular to a short text classification method and device.

Background

[0002] At present, because short texts have a loose structure, irregular grammar, and a large proportion of stop words, classification methods for long texts are often not applicable. Existing short text classification schemes mainly perform feature expansion based on the characteristics of the short text itself. For example, using a distributed representation, the words in the short text are first projected into an external corpus with a semantic similarity model, and the short text is then enriched with contextual information. The external corpus is a large text corpus. Although the classification accuracy of this kind of short text method has been improved to a certain extent, there are great limitations in feature expansion by only using the characteristics ...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F40/289, G06K9/62
CPC: G06F16/35, G06F40/289, G06F18/24155
Inventor: 钟黎
Owner: TENCENT TECH (SHENZHEN) CO LTD