Short text clustering equipment and short text clustering method

A short text clustering technology, applied in special data processing applications, instruments, and electronic digital data processing. It addresses the inability of prior methods to discover and use the topics intrinsic to short texts, which degrades classification or clustering, and thereby achieves more accurate clustering.

Active Publication Date: 2012-12-19
数据堂(北京)科技股份有限公司

AI Technical Summary

Problems solved by technology

Therefore, the method described in Reference 1 cannot discover or make use of the new topics intrinsic to the short texts, which degrades the quality of classification or clustering.

Embodiment Construction

[0054] In the following, the principle and implementation of the present invention will become apparent by describing specific embodiments of the present invention in conjunction with the accompanying drawings. It should be noted that the present invention should not be limited to the specific examples described below. In addition, detailed descriptions of well-known technologies not directly related to the present invention are omitted for brevity.

[0055] FIG. 3 is a block diagram showing a short text clustering device 30 according to an embodiment of the present invention. As shown in FIG. 3, the short text clustering device 30 includes a topic analysis unit 310, a vector generation unit 320 and a clustering unit 330.
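
The three-unit structure of device 30 can be pictured as a simple pipeline. The following is only a minimal sketch under stated assumptions: the class name `ShortTextClusteringDevice`, the field names, the `run` method and the callable signatures are hypothetical and do not come from the patent, which names only the three units and their reference numerals.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: for one short text, its probabilities over the
# auxiliary-set topics and over the short-text-set topics.
TopicProbs = Tuple[List[float], List[float]]


@dataclass
class ShortTextClusteringDevice:
    """Illustrative wiring of units 310, 320 and 330 (names are assumptions)."""

    # Unit 310: analyses both text sets, returns one TopicProbs per short text.
    topic_analysis: Callable[[Sequence[str], Sequence[str]], List[TopicProbs]]
    # Unit 320: turns one short text's topic probabilities into a vector.
    vector_generation: Callable[[TopicProbs], List[float]]
    # Unit 330: assigns a cluster label to each vector.
    clustering: Callable[[List[List[float]]], List[int]]

    def run(self, auxiliary_texts: Sequence[str],
            short_texts: Sequence[str]) -> List[int]:
        per_text = self.topic_analysis(auxiliary_texts, short_texts)
        vectors = [self.vector_generation(probs) for probs in per_text]
        return self.clustering(vectors)
```

Keeping each unit as a pluggable callable mirrors the block diagram: any topic model or clustering algorithm can be substituted without changing how the units are connected.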

[0056] The topic analysis unit 310 performs topic analysis on each text in the auxiliary text set and the short text set to obtain their respective topics. In a specific embodiment, the topic analysis unit 310 adopts, as shown in FIG. 4, the DLDA mod...

Abstract

The invention provides a short text clustering device comprising a topic analysis unit, a vector generation unit and a clustering unit. The topic analysis unit performs topic analysis on each text in an auxiliary text set and a short text set, thereby obtaining, for each short text in the short text set, the probabilities that it corresponds to the topics of the auxiliary text set and to the topics of the short text set. The vector generation unit normalizes these probabilities for each short text to generate a vector. The clustering unit clusters the short texts in the short text set based on the generated vectors. The invention further provides a short text clustering method. With the device and the method, the topics of the auxiliary texts and the topics of the short texts can be discovered independently, so that the short texts are clustered more accurately.
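
The vector generation and clustering steps can be illustrated with a small sketch. It rests on assumptions not stated in the text above: normalization is taken to mean dividing the concatenated probabilities by their sum, and clustering is done with a basic k-means using farthest-point initialization. The function names `make_vector` and `kmeans` and the sample probabilities are hypothetical; the topic probabilities themselves would come from the topic analysis unit.

```python
import math
from typing import List, Sequence


def make_vector(aux_probs: Sequence[float],
                short_probs: Sequence[float]) -> List[float]:
    """Concatenate both probability lists and scale them to sum to 1.

    Assumption: sum-normalization; the source does not specify the formula.
    """
    combined = list(aux_probs) + list(short_probs)
    total = sum(combined)
    if total <= 0:
        raise ValueError("topic probabilities must have a positive sum")
    return [p / total for p in combined]


def kmeans(vectors: Sequence[Sequence[float]], k: int,
           iterations: int = 20) -> List[int]:
    """Basic k-means (an assumed choice of clustering algorithm)."""
    # Farthest-point initialization: start from the first vector, then
    # repeatedly add the vector farthest from all chosen centroids.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


# Made-up probabilities for four short texts: (auxiliary topics, short topics).
vectors = [
    make_vector([0.9, 0.1], [0.8, 0.2]),
    make_vector([0.8, 0.2], [0.9, 0.1]),
    make_vector([0.1, 0.9], [0.2, 0.8]),
    make_vector([0.2, 0.8], [0.1, 0.9]),
]
labels = kmeans(vectors, k=2)
print(labels)
```

Because each vector carries both the auxiliary-set and the short-text-set topic probabilities, two short texts land in the same cluster only when they agree on both kinds of topics.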

Description

Technical field

[0001] The invention relates to the field of natural language processing, and in particular to a short text clustering device and method.

Background art

[0002] With the wide application of SMS, Weibo, search engines, online advertisements and the like, people use short texts more and more frequently. These texts are usually short. For example, a text message cannot exceed 70 characters, and the results returned by search engines are usually only a few dozen words long.

[0003] Short texts differ greatly from long texts (such as news). In a long text, a topic can be described in full, so a reader can learn almost everything about the topic from that text. In a short text, by contrast, the number of words is limited, so usually only the core content of the topic is described and much of the related information is omitted.

[0004] Traditional text mining methods are usually aimed at long texts, but they wil...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
Inventor: 赵凯, 胡长建, 王大亮, 许洪志
Owner: 数据堂(北京)科技股份有限公司