Processing method for text clustering, server and system

A processing method for text clustering, in the field of electrical digital data processing and special data processing applications. It addresses the high resource consumption, network bottlenecks and large memory overhead of existing approaches, and reduces memory use, resource consumption, processing time and the volume of data handled.

Active Publication Date: 2016-11-23
SHENZHEN TENCENT COMP SYST CO LTD

AI Technical Summary

Problems solved by technology

[0011] In view of this, the present invention provides a text clustering processing method and system, which ar...



Embodiment Construction

[0040] To further explain the technical means adopted by the present invention to achieve its intended purpose, and their effects, the specific implementation, structure, features and effects of the invention are described in detail below with reference to the accompanying drawings and preferred embodiments.

[0041] Figure 1a is a schematic diagram of the server system in an embodiment of the present invention. The system includes a first server and a plurality of second servers. The first server randomly assigns a topic from the topic set to each word of each text in the preprocessed text set, distributes the texts with assigned topics to the plurality of second servers, establishes an initial mapping relationship for each word in the texts distributed to each second server, and sends these mapping relationships to the respective second servers. The initial mapping relationship includes...
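
The first-server steps described in [0041] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the round-robin distribution policy, and the shape of the mapping (word to per-topic counts within one shard) are all hypothetical, since the source text is truncated before it says what the initial mapping relationship contains.

```python
import random
from collections import defaultdict


def assign_random_topics(texts, num_topics, rng):
    """Pair every word occurrence in every text with a uniformly random topic id."""
    return [[(word, rng.randrange(num_topics)) for word in text] for text in texts]


def distribute(assigned_texts, num_workers):
    """Split texts across second servers round-robin (assumed policy, not from the source)."""
    shards = [[] for _ in range(num_workers)]
    for i, text in enumerate(assigned_texts):
        shards[i % num_workers].append(text)
    return shards


def build_initial_mapping(shard, num_topics):
    """Hypothetical mapping for one shard: word -> list of per-topic occurrence counts."""
    mapping = defaultdict(lambda: [0] * num_topics)
    for text in shard:
        for word, topic in text:
            mapping[word][topic] += 1
    return dict(mapping)


# Toy preprocessed text set: each text is a list of tokens.
texts = [["apple", "banana", "apple"], ["goal", "match"], ["banana", "goal"]]
num_topics, num_workers = 2, 2

rng = random.Random(0)  # fixed seed so the sketch is repeatable
assigned = assign_random_topics(texts, num_topics, rng)
shards = distribute(assigned, num_workers)
mappings = [build_initial_mapping(shard, num_topics) for shard in shards]
```

Each second server in this sketch would receive only `shards[i]` and `mappings[i]`, so it holds counts only for the words that actually occur in its own texts.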


Abstract

The invention provides a processing method for text clustering, a server and a system. In the method, one topic from a topic set is randomly assigned to each word in each text of a preprocessed text set; the texts with assigned topics are distributed to multiple second servers; an initial mapping relationship is established for each word in the texts distributed to each second server; and the clustering results of the texts are determined from the updated topic of each word, as fed back by the second servers. Each second server computes the updated topic of each word by sampling with an improved Gibbs sampling algorithm, based on the initial mapping relationship of the words on that server. By determining the word mapping relationship and using a densely structured matrix together with the improved Gibbs sampling algorithm, the method reduces the data volume processed by the second servers and their memory consumption, and avoids network bottlenecks.
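
For orientation, the sampling step on a second server can be illustrated with the standard collapsed Gibbs sampler for a topic model. This is a sketch of the textbook algorithm only: the abstract does not describe what its "improved" variant changes, so nothing here should be read as the patented method. The hyperparameters `alpha` and `beta`, the integer word ids, and the rule of labelling each text with its most frequent topic are assumptions made for the example.

```python
import random


def gibbs_sweep(docs, z, n_dk, n_kw, n_k, vocab_size, alpha, beta, rng):
    """One sweep of standard collapsed Gibbs sampling over all word occurrences."""
    num_topics = len(n_k)
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            old = z[d][i]
            # Remove the current assignment from all counts.
            n_dk[d][old] -= 1
            n_kw[old][w] -= 1
            n_k[old] -= 1
            # Unnormalized conditional probability of each topic for this word.
            weights = [
                (n_dk[d][k] + alpha) * (n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta)
                for k in range(num_topics)
            ]
            new = rng.choices(range(num_topics), weights=weights)[0]
            # Record the updated topic and restore the counts.
            z[d][i] = new
            n_dk[d][new] += 1
            n_kw[new][w] += 1
            n_k[new] += 1


def cluster_labels(n_dk):
    """Assumed clustering rule: label each text with its most frequent topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in n_dk]


rng = random.Random(0)
docs = [[0, 1, 0], [2, 3], [1, 2]]  # texts as lists of integer word ids
K, V = 2, 4                          # number of topics, vocabulary size

# Random initial topic per word, plus the count tables derived from it.
z = [[rng.randrange(K) for _ in doc] for doc in docs]
n_dk = [[0] * K for _ in docs]       # text-by-topic counts
n_kw = [[0] * V for _ in range(K)]   # topic-by-word counts, stored densely
n_k = [0] * K                        # total words per topic
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        n_dk[d][k] += 1
        n_kw[k][w] += 1
        n_k[k] += 1

for _ in range(20):
    gibbs_sweep(docs, z, n_dk, n_kw, n_k, V, alpha=0.1, beta=0.01, rng=rng)

labels = cluster_labels(n_dk)
```

The count tables are updated in place, so after any number of sweeps they stay consistent with the per-word topics in `z`, which is what a second server would feed back to the first server.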

Description

Technical field

[0001] The invention relates to the field of text clustering, in particular to a text clustering processing method, server and system.

Background

[0002] With the spread and development of Internet and database technology, people can easily acquire and store large amounts of data, most of which exists in the form of text. Text clustering can organize, summarize and navigate text information, which helps users accurately obtain the information they need from large text collections. It has therefore received widespread attention in recent years.

[0003] Text clustering algorithms are a major class of text data mining methods in fields such as machine learning and information retrieval, and one of the main ways of addressing Internet text information overload. Their purpose is to organize Internet text collections according to the principle of "like flock tog...


Application Information

IPC(8): G06F17/30
Inventor 邓雪娇陆中振
Owner SHENZHEN TENCENT COMP SYST CO LTD