Label extraction method based on short text clustering technology

A short text clustering and label extraction technique, applied in the field of information processing, that addresses the incomplete semantics and illogical labels of existing techniques and achieves accurate, stable clustering with semantically complete labels.

Active Publication Date: 2020-07-14
BEIJING ZHICHI BOCHUANG TECH

AI Technical Summary

Problems solved by technology

[0009] Another purpose of the present invention is to provide a label extraction method based on short text clustering which, while clustering the short texts, can represent the meanings of all categories of short text. The la...


Examples

Embodiment Construction

[0041] The present invention will be described in detail below in conjunction with the accompanying drawings, so that those of ordinary skill in the art can implement it after referring to this specification.

[0042] As shown in Figure 1, a label extraction method based on short text clustering includes: S1, extracting all useful words from the short texts;

[0043] S2, according to the text features of the useful words of each short text, using word2vec to calculate the similarity between the short texts; that is, assuming that each short text is a cluster center, calculating the similarity between each cluster center and all other short texts, and if the similarity is greater than the preset threshold T1, assigning the corresponding short text to that cluster center;

[0044] S3, performing the first pruning on each cluster formed in S2;

[0045] S4, performing a merge operation on all clusters after the first pruning;

[0046] S5, carry out the second prunin...
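Steps S1 and S2 can be illustrated with a minimal sketch. Several things here are assumptions rather than details from the patent text: the stop-word list used to pick "useful" words, the toy two-dimensional vectors standing in for a trained word2vec model, the averaging of word vectors into a text vector, and the use of cosine similarity. Only the idea of treating every short text as a candidate cluster center and attaching texts whose similarity exceeds T1 comes from the description above.

```python
import math

# Hypothetical stop-word list; the source does not say how "useful" words are chosen.
STOP_WORDS = {"the", "a", "is", "my", "to", "how", "do", "i"}

# Toy 2-D vectors standing in for a trained word2vec model (illustrative values only).
TOY_VECTORS = {
    "refund": (1.0, 0.1),
    "money": (0.9, 0.2),
    "return": (0.95, 0.15),
    "password": (0.1, 1.0),
    "login": (0.2, 0.9),
    "reset": (0.15, 0.95),
}


def useful_words(text):
    """S1 (assumed form): keep tokens that are not stop words and have a vector."""
    return [w for w in text.lower().split()
            if w not in STOP_WORDS and w in TOY_VECTORS]


def text_vector(words):
    """Average the word vectors; averaging is an assumption, not from the source."""
    if not words:
        return None
    dims = len(next(iter(TOY_VECTORS.values())))
    return tuple(sum(TOY_VECTORS[w][d] for w in words) / len(words)
                 for d in range(dims))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def initial_clusters(texts, t1):
    """S2: treat every text as a center and attach texts with similarity > t1."""
    vectors = [text_vector(useful_words(t)) for t in texts]
    clusters = {}
    for i, vi in enumerate(vectors):
        members = []
        if vi is not None:
            for j, vj in enumerate(vectors):
                if j != i and vj is not None and cosine(vi, vj) > t1:
                    members.append(j)
        clusters[i] = members
    return clusters


texts = [
    "how do i return my money",
    "refund money",
    "reset my password",
    "login password reset",
]
clusters = initial_clusters(texts, t1=0.9)
print(clusters)
```

Because every text starts as its own center, the result contains one overlapping cluster per text. The later pruning and merging steps exist to reduce that redundancy.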

Abstract

The invention discloses a label extraction method based on short text clustering. The label extraction method comprises: S1, extracting all useful words from the short texts; S2, calculating the similarity among the short texts using word2vec according to the text features of the useful words of each short text; S3, assuming that each short text is a cluster center, calculating the similarity between each cluster center and all other short texts, and if the similarity is greater than a preset threshold T1, assigning the corresponding short text to that cluster center; S4, pruning each cluster formed in S3 for the first time; S5, merging all the clusters after the first pruning; S6, performing a second pruning on the clusters merged in S5; and S7, extracting the cluster center of each cluster after the second pruning as the label of that cluster. Labels that represent the meanings of all categories of short text and are semantically complete are generated while the short texts are clustered. This addresses the dependence of existing clustering techniques on the choice of center points, as well as the incomplete semantics and illogical results of existing label extraction techniques.
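The merging step (S5) and the label extraction step (S7) of the abstract can be sketched as follows. The abstract does not state the merge criterion, so the Jaccard overlap test, the hypothetical threshold `t_merge`, and the rule that the larger cluster's center survives a merge are all assumptions made for illustration. The pruning steps (S4, S6) are omitted because their details are not given in this text. Only the final rule, using the text at the cluster center as the cluster's label, follows the abstract.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets of member indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_clusters(clusters, t_merge):
    """S5 (assumed criterion): greedily merge clusters whose members overlap.

    clusters maps a center index to the set of member indices, center included.
    """
    # Visit larger clusters first so that their centers survive a merge.
    ordered = sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    merged = {}
    for center, members in ordered:
        for kept_members in merged.values():
            if jaccard(members, kept_members) > t_merge:
                kept_members |= members  # absorb into the existing cluster
                break
        else:
            merged[center] = set(members)
    return merged


def extract_labels(texts, merged):
    """S7: use the text at each cluster center as the label of that cluster."""
    return {texts[c]: sorted(texts[i] for i in m) for c, m in merged.items()}


texts = [
    "how do i return my money",
    "refund money",
    "reset my password",
    "login password reset",
]
# One overlapping cluster per text, as produced by the initial clustering step.
clusters = {0: {0, 1}, 1: {0, 1}, 2: {2, 3}, 3: {2, 3}}
merged = merge_clusters(clusters, t_merge=0.5)
labels = extract_labels(texts, merged)
for label, members in labels.items():
    print(label, "->", members)
```

Because the label is an actual short text from the data rather than a bag of extracted keywords, it reads as a complete phrase, which is the property the abstract emphasizes.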

Description

Technical field

[0001] The invention relates to the technical field of information processing, in particular to a label extraction method based on short text clustering.

Background art

[0002] With the development of the Internet and information technology, the volume of online information is growing exponentially. In particular, the rise of network platforms such as Weibo has caused another explosion of short text information. Short text data is sparse in information but focused in topic, and cannot simply be discarded as spam. Obtaining useful information from large volumes of short text data requires an effective method for improving short text clustering and hotspot discovery. At present, many Internet platforms plan their labels manually, which is not only time-consuming and laborious but also has great limitations. For example, the coverage of manually customized labels is limited and can only contain text with fixed meanings. The text needs to ...

Application Information

IPC(8): G06F16/35, G06F40/279
CPC: G06F16/35
Inventor: 郑赛乾, 吴立楠, 吴科
Owner: BEIJING ZHICHI BOCHUANG TECH