Text clustering method and device, electronic equipment and storage medium

A text clustering technology, applied in the field of big data, that addresses the complex calculation and low efficiency of traditional text clustering methods and achieves simple operation and efficient clustering.

Pending Publication Date: 2021-11-26
大箴(杭州)科技有限公司

AI Technical Summary

Problems solved by technology

[0005] In view of the above problems, the present invention proposes a text clustering method and device, electronic equipment, and a storage medium, to at least solve the technical problems of complex calculation and low efficiency in traditional text clustering methods.



Detailed Description of Embodiments

[0031] Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present invention and to fully convey the scope of the present invention to those skilled in the art.

[0032] It should be noted that the terms "first" and "second" in the description and claims of the present invention and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It is to be understood that such terms are interchangeable under appropriate circumstances such that the embodiments of the invention described herein can be practiced in sequences other than those ...


Abstract

The invention provides a text clustering method and device, electronic equipment, and a storage medium. The method comprises the following steps: generating a corresponding hash signature for each text in a corpus; searching the corpus for multiple similar text pairs based on the hash signatures; constructing an association relationship among the similar text pairs, wherein any two connected texts in the association relationship form one similar text pair; and clustering the texts in the corpus by using the association relationship and the text identification codes corresponding to the texts, to obtain one or more category clusters for the corpus. The text clustering method and device solve technical problems of traditional text clustering methods such as complex calculation and low efficiency.
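
The four steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the abstract does not say which hash signature is used, so a SimHash-style fingerprint is assumed here, and the function names (`simhash`, `similar_pairs`, `cluster_by_pairs`, `cluster_texts`), the Hamming-distance threshold, and the choice of the smallest text identification code as the cluster label are all hypothetical.

```python
import hashlib
from itertools import combinations

BITS = 64  # assumed signature width


def simhash(text: str) -> int:
    """Return a 64-bit SimHash-style fingerprint (an assumed signature type)."""
    weights = [0] * BITS
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:BITS // 8], "big")
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A bit is set when more tokens voted 1 than 0 at that position.
    return sum(1 << i for i, w in enumerate(weights) if w > 0)


def similar_pairs(signatures: dict, max_distance: int = 3) -> list:
    """Find ID pairs whose signatures differ in at most max_distance bits.

    Brute-force comparison, for illustration only.
    """
    pairs = []
    for (id_a, sig_a), (id_b, sig_b) in combinations(sorted(signatures.items()), 2):
        if bin(sig_a ^ sig_b).count("1") <= max_distance:
            pairs.append((id_a, id_b))
    return pairs


def cluster_by_pairs(text_ids, pairs) -> dict:
    """Treat pairs as graph edges and return connected components.

    Each cluster is keyed by the smallest text ID it contains (assumption).
    """
    parent = {t: t for t in text_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the larger root under the smaller one.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    clusters = {}
    for t in sorted(text_ids):
        clusters.setdefault(find(t), []).append(t)
    return clusters


def cluster_texts(corpus: dict) -> dict:
    """Run all steps: signatures, similar pairs, association graph, clusters."""
    signatures = {text_id: simhash(text) for text_id, text in corpus.items()}
    return cluster_by_pairs(corpus.keys(), similar_pairs(signatures))
```

Because the similar pairs are treated as edges of a graph, texts that are only indirectly similar (A resembles B, B resembles C) end up in the same category cluster. The pairwise search in this sketch is quadratic in the corpus size; a large corpus would need an index over the signatures instead, which the sketch does not attempt.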

Description

Technical field

[0001] The present invention relates to the field of big data, in particular to a text clustering method and device, electronic equipment, and a storage medium.

Background

[0002] Today, with the rapid increase in the amount of information on the Internet, there are more and more large-scale cluster computing resources. Text data is one of the important carriers of Internet data, and the information it carries is extremely rich. How to extract the most valuable text from large-scale cluster computing resources has become a key issue. Commonly used approaches include text clustering: for example, search engines use it to remove highly similar, repetitive texts so that a varied set of texts can be extracted and presented. In addition, text clustering is widely applied in scenarios such as spam detection and recommendation systems.

[0003] However, in the application of lar...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06F16/33, G06F40/289, G06K9/62
CPC: G06F16/35, G06F16/3344, G06F40/289, G06F18/23
Inventor: 迟明航
Owner: 大箴(杭州)科技有限公司