Text clustering method based on weak supervised deep learning

A text clustering and deep learning technology, applied in the field of text clustering based on weakly supervised deep learning. It addresses the large proportion of input errors, similar words, and irrelevant words in click text, and the unreasonably high cost of cleaning them manually. Its effects are improved image recognition accuracy, improved representation, and enhanced description.

Inactive Publication Date: 2019-04-05
HANGZHOU DIANZI UNIV
3 Cites, 16 Cited by

AI Technical Summary

Problems solved by technology

[0004] The present invention has been tested extensively on Microsoft's click dataset, Clickture. The dataset contains more than 480,000 texts, a very large proportion of which contain input errors, similar words, or irrelevant words. Cleaning and screening them manually would be prohibitively expensive and unreasonable.



Examples


Embodiment Construction

[0048] The present invention will be further described in detail below in conjunction with the accompanying drawings.

[0049] As shown in Figure 1, the present invention provides a text clustering method based on weakly supervised deep learning.

[0050] Step (1) starts from an image dataset with text click data. Using image visual information and category labels, together with image amplification and image clustering operations, it constructs an image-class click feature matrix for each text, as follows:
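The core of step (1), turning per-image clicks into a per-class click vector for each text, can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `(text, image_id, click_count)` triple format, the `image_labels` mapping, and the function name `build_class_click_matrix` are all assumptions, and the image amplification and image clustering operations are omitted.

```python
from collections import defaultdict


def build_class_click_matrix(clicks, image_labels, num_classes):
    """Aggregate per-image click counts into a per-class click vector per text.

    clicks: iterable of (text, image_id, click_count) triples (assumed format).
    image_labels: dict mapping image_id -> class index in [0, num_classes).
    Returns a dict mapping each text to a list of length num_classes.
    """
    matrix = defaultdict(lambda: [0] * num_classes)
    for text, image_id, count in clicks:
        label = image_labels.get(image_id)
        if label is None:
            continue  # skip images that have no category label
        matrix[text][label] += count
    return dict(matrix)


# Toy click log: img1 and img2 belong to class 0, img3 to class 1,
# and img4 has no label, so its clicks are ignored.
clicks = [
    ("husky puppy", "img1", 3),
    ("husky puppy", "img2", 1),
    ("husky puppy", "img3", 2),
    ("corgi", "img3", 5),
    ("corgi", "img4", 4),
]
image_labels = {"img1": 0, "img2": 0, "img3": 1}
class_clicks = build_class_click_matrix(clicks, image_labels, num_classes=2)
```

Each row of the resulting matrix describes a text by how its clicks are distributed over image categories, which is the representation that the later clustering steps operate on.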

[0051] 1-1. We conducted experiments on the dog classification dataset (Clickture-Dog) provided by Microsoft. In addition, we constructed a bird classification dataset (Bird) from the Clickture dataset, and fully verified our method on both datasets. We cleaned the raw data. Take the Clickture-Dog dataset as an example: it has 344 dog categories. We filter out the categories with fewer than 5 pictures, which leaves 283 groups with a total of 95041 pic...
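The category filtering described above (dropping categories with fewer than 5 pictures) might look like the sketch below. The `image_labels` mapping and the function name `filter_small_categories` are hypothetical; only the threshold of 5 comes from the text.

```python
from collections import Counter


def filter_small_categories(image_labels, min_images=5):
    """Drop every image whose category has fewer than min_images images.

    image_labels: dict mapping image_id -> category name (assumed format).
    Returns a new dict containing only images from the retained categories.
    """
    counts = Counter(image_labels.values())
    kept = {category for category, n in counts.items() if n >= min_images}
    return {img: c for img, c in image_labels.items() if c in kept}


# Toy data: "beagle" has 5 images and is kept; "pug" has 2 and is dropped.
toy_labels = {f"b{i}": "beagle" for i in range(5)}
toy_labels.update({"p0": "pug", "p1": "pug"})
filtered = filter_small_categories(toy_labels, min_images=5)
```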



Abstract

The invention discloses a text clustering method based on weakly supervised deep learning. The method comprises the following steps: (1) using an image dataset with text click information, image visual information, and image category labels, applying image amplification and clustering to construct an image-category click feature matrix for each text; (2) applying a sorting and propagation method to the initial class click matrix to obtain a smooth image click feature map, performing text clustering on the feature map to obtain initial text categories, and initializing the text weights using click priors; (3) building a deep text clustering model that learns deep text features by minimizing the intra-class mean square error; (4) jointly optimizing the deep model and the text weights with a weakly supervised learning method, iteratively updating both; (5) extracting deep text features with the deep text model and clustering the texts using the K-means method. The method is highly general and effectively addresses the semantic gap in image recognition.
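To make the roles of the text weights, the intra-class mean square error, and K-means concrete, here is a small sketch of weighted K-means on fixed feature vectors. It is a stand-in under stated assumptions, not the patent's deep model: the features are given rather than learned, the weights are fixed rather than jointly optimized, and the way the weights enter the centroid update (as a weighted mean) is an assumption. The function names are hypothetical.

```python
def squared_distance(p, c):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def weighted_kmeans(points, weights, centers, iterations=10):
    """Lloyd-style K-means in which each point pulls its centroid
    in proportion to its weight (assumed weighting scheme)."""
    centers = [list(c) for c in centers]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            dists = [squared_distance(p, c) for c in centers]
            assignments[i] = dists.index(min(dists))
        # Update step: weighted mean of the points in each cluster.
        for k in range(len(centers)):
            members = [i for i, a in enumerate(assignments) if a == k]
            total = sum(weights[i] for i in members)
            if total == 0:
                continue  # keep the previous center for an empty cluster
            centers[k] = [
                sum(weights[i] * points[i][d] for i in members) / total
                for d in range(len(centers[k]))
            ]
    return assignments, centers


def weighted_intra_class_mse(points, weights, assignments, centers):
    """Weighted mean of squared distances from points to their centers."""
    error = sum(
        w * squared_distance(p, centers[k])
        for p, w, k in zip(points, weights, assignments)
    )
    return error / sum(weights)


# Toy features: two well-separated groups; the second point has weight 3.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
weights = [1.0, 3.0, 1.0, 1.0]
initial_centers = [[0.0, 0.0], [10.0, 10.0]]
assignments, centers = weighted_kmeans(points, weights, initial_centers)
mse = weighted_intra_class_mse(points, weights, assignments, centers)
```

In this toy run the heavier point drags the first centroid toward itself, which illustrates why down-weighting noisy texts would reduce their influence on the learned clusters.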

Description

Technical field

[0001] The invention relates to the fields of fine-grained image classification and text clustering, and in particular to a text clustering method based on weakly supervised deep learning.

Background technique

[0002] Text is a word, phrase, or sentence that carries semantic information. In life and work, people often use text to express their thoughts and feelings. The explosive growth of text information makes text clustering especially important. The present invention proposes an improved text clustering method and applies it to fine-grained image classification to learn compact text representations for images. This method alleviates the "semantic gap" problem of images to a certain extent and improves the accuracy of image recognition.

[0003] Recently, researchers proposed modeling the relationship between text and images using click data. Click data is images and corresponding text feedback information collected by search engines (such as G...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/08, G06N3/045, G06F18/23213, G06F18/214
Inventor: 谭敏, 俞俊, 张海超
Owner: HANGZHOU DIANZI UNIV