Parallel data processing method based on latent Dirichlet allocation model

A technique based on the latent Dirichlet allocation model, applied in the fields of electric digital data processing, special data processing applications, and instruments. It can address problems such as high data sparsity, heavy information loss, and poor suitability for text information processing.
CN101359333A · Inactive · Publication Date: 2009-02-04 · INST OF SOFTWARE - CHINESE ACAD OF SCI

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
INST OF SOFTWARE - CHINESE ACAD OF SCI
Publication Date
2009-02-04
Estimated Expiration
Not applicable · inactive patent

Abstract

The invention discloses a parallel data processing method based on the latent Dirichlet allocation (LDA) model, belonging to the field of data mining. The method comprises three schemes: multi-process parallel processing, multi-thread parallel processing, and hybrid multi-process multi-thread processing. In all three schemes, the data DM to be processed is divided into data segments of equal or unequal length, and each data segment is assigned an index. Each process or thread uses the index to process its corresponding data segment, obtaining the topic information of each data item and generating a local sufficient statistic. Once the whole of DM has been processed, the local sufficient statistics are merged into a global sufficient statistic, from which the current model Mi is estimated; this repeats until the model converges. The method can exploit both the multi-core parallel architecture of a single computer and the large-scale parallel capability of a multi-computer cluster to process large text collections at high speed, while effectively reducing memory usage during parallel processing.
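The segment, process, merge, and re-estimate loop described in the abstract can be sketched as follows. This is a minimal illustration of the multi-thread scheme only, not the patented implementation. All function names (`split_indices`, `local_stats`, `merge_stats`, `estimate_model`, `fit`) are hypothetical, and the per-token topic weighting is a deliberately simplified stand-in for LDA's actual per-document inference, which the abstract does not spell out.

```python
from concurrent.futures import ThreadPoolExecutor


def split_indices(n_docs, n_segments):
    """Return (start, end) index pairs that cover range(n_docs) in near-equal segments."""
    base, extra = divmod(n_docs, n_segments)
    bounds, start = [], 0
    for s in range(n_segments):
        end = start + base + (1 if s < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def local_stats(docs, segment, beta):
    """Local sufficient statistic for one segment: expected topic-word counts.

    Simplification (an assumption of this sketch): each token's topic weights
    are proportional to the current topic-word probabilities beta[k][w].
    """
    start, end = segment
    n_topics, vocab = len(beta), len(beta[0])
    stats = [[0.0] * vocab for _ in range(n_topics)]
    for doc in docs[start:end]:          # only the indexed slice is touched
        for w in doc:
            weights = [beta[k][w] for k in range(n_topics)]
            total = sum(weights)
            for k in range(n_topics):
                stats[k][w] += weights[k] / total
    return stats


def merge_stats(all_stats):
    """Sum the local statistics element-wise into the global statistic."""
    n_topics, vocab = len(all_stats[0]), len(all_stats[0][0])
    merged = [[0.0] * vocab for _ in range(n_topics)]
    for stats in all_stats:
        for k in range(n_topics):
            for w in range(vocab):
                merged[k][w] += stats[k][w]
    return merged


def estimate_model(stats, smoothing=0.01):
    """Re-estimate topic-word probabilities from the global statistic."""
    beta = []
    for row in stats:
        total = sum(row) + smoothing * len(row)
        beta.append([(c + smoothing) / total for c in row])
    return beta


def fit(docs, beta, n_workers=2, max_iters=50, tol=1e-6):
    """Iterate: process segments in parallel, merge, re-estimate, until convergence."""
    segments = split_indices(len(docs), n_workers)
    for _ in range(max_iters):
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            partial = list(pool.map(lambda seg: local_stats(docs, seg, beta), segments))
        new_beta = estimate_model(merge_stats(partial))
        change = max(abs(a - b) for ra, rb in zip(new_beta, beta) for a, b in zip(ra, rb))
        beta = new_beta
        if change < tol:
            break
    return beta


# Toy corpus: each document is a list of word ids from a 4-word vocabulary.
docs = [[0, 1, 0], [2, 3], [0, 0, 1], [3, 2, 2]]
beta0 = [[0.4, 0.3, 0.2, 0.1], [0.1, 0.2, 0.3, 0.4]]
model = fit(docs, beta0, n_workers=2)
```

Passing index pairs rather than copies of the data is what keeps per-worker memory low in this sketch: every worker reads its slice of the shared corpus. In CPython, threads do not run pure-Python code in parallel, so a multi-process variant would be needed for real speedup on CPU-bound work.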

Description

Technical field

[0001] The invention relates to a text data mining method, in particular to an efficient data processing method based on latent topic text representation, and belongs to the field of computer data mining.

Background art

[0002] Computer data mining

[0003] Computer data mining refers to the intelligent information processing that uses computers to extract valid, useful, and understandable information or knowledge from large amounts of data. Early computer data mining focused mainly on mining regular numerical data in database systems. With the continuous expansion of the Internet and the growing richness of its applications, computer data mining has gradually turned to Internet information processing. The data carried on the Internet differs greatly from the data in database systems: first, the data on the Internet is mainly text written in natural language, while the data in database systems is mainly numerical; second...
