Distributed nonnegative matrix decomposition method

A distributed non-negative matrix decomposition technology, applied in the field of distributed non-negative matrix decomposition. It addresses problems such as lock waiting, which degrades algorithm convergence and execution efficiency, and achieves a good decomposition effect, reduced lock waiting time, and improved convergence.

Status: Inactive; publication date: 2017-01-04
CENT SOUTH UNIV
Citations: 2 cited references; cited by 11

AI Technical Summary

Problems solved by technology

Traditional parallel SGD algorithms face an important problem: lock waiting. For example, in the DSGD algorithm all nodes must wait for the slowest node to finish processing before entering the next iteration, which slows convergence and reduces execution efficiency.
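To make the lock-waiting problem concrete, the sketch below assumes the usual DSGD layout (the matrix cut into S×S blocks, with stratum k holding blocks (i, (i + k) mod S), so no two blocks in a stratum share rows or columns) and hypothetical per-node processing times. It computes how long the faster nodes sit idle at the barrier that ends each sub-epoch. The function names and timing figures are illustrative assumptions, not taken from the patent.

```python
def dsgd_strata(s):
    """Return the s strata of an s x s block grid; stratum k holds blocks (i, (i + k) % s)."""
    return [[(i, (i + k) % s) for i in range(s)] for k in range(s)]


def is_independent(stratum):
    """Blocks are independent if no two share a row block or a column block."""
    rows = [r for r, _ in stratum]
    cols = [c for _, c in stratum]
    return len(set(rows)) == len(rows) and len(set(cols)) == len(cols)


def barrier_idle_time(node_times):
    """Every node waits for the slowest one, so idle time is the sum of (slowest - t)."""
    slowest = max(node_times)
    return sum(slowest - t for t in node_times)


if __name__ == "__main__":
    strata = dsgd_strata(3)
    print(all(is_independent(st) for st in strata))  # True
    # Hypothetical per-node times (seconds) for one sub-epoch on skewed blocks.
    print(barrier_idle_time([4.0, 5.0, 9.0]))  # 9.0
```

With skewed blocks, the idle time grows with the gap between the slowest node and the rest, which is why reducing data skew shortens lock waiting.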

Embodiment Construction

[0025] The concrete implementation process of the present invention is as follows:

[0026] Step 1: Read the original matrix data from the file system into the Spark platform and create a resilient distributed dataset (RDD) through SparkContext. Apply a map operation to the data in the RDD, mapping each row of data "row_id, col_id, value" to a numerical triplet (row_id, col_id, value), which yields a new RDD (RDD1). Then compute statistics over RDD1 to obtain the basic information of the matrix, matrix_info: the number of entries in each column and each row and the total number of entries (total_col, total_row, total_N), as well as the maximum row id (max_row) and the maximum column id (max_col).
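Step 1 can be sketched in plain Python, with ordinary lists standing in for Spark RDDs and list comprehensions standing in for map(). The function names (parse_triplet, matrix_info) and the dictionary layout are hypothetical; on Spark, the parsing step would correspond roughly to sc.textFile(path).map(parse_triplet).

```python
from collections import Counter


def parse_triplet(line):
    """Map one text row "row_id, col_id, value" to a (row_id, col_id, value) tuple."""
    row_id, col_id, value = (field.strip() for field in line.split(","))
    return int(row_id), int(col_id), float(value)


def matrix_info(triplets):
    """Collect the Step 1 statistics (hypothetical dict layout)."""
    total_row = Counter(r for r, _, _ in triplets)  # number of entries in each row
    total_col = Counter(c for _, c, _ in triplets)  # number of entries in each column
    return {
        "total_row": dict(total_row),
        "total_col": dict(total_col),
        "total_N": len(triplets),   # total number of entries
        "max_row": max(total_row),  # largest row id seen
        "max_col": max(total_col),  # largest column id seen
    }


if __name__ == "__main__":
    lines = ["0, 0, 5.0", "0, 2, 3.0", "1, 1, 4.0", "2, 0, 1.0"]
    rdd1 = [parse_triplet(line) for line in lines]  # stand-in for RDD1
    info = matrix_info(rdd1)
    print(info["total_N"], info["max_row"], info["max_col"])  # 4 2 2
```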

[0027] Step 2: Divide the data into blocks. If the number of computing nodes is S, the number of generated data blocks is 2S×2S, and 2S mutually independent patterns are generated in advance. Read the data in RDD1 and perform a map() operation to generate the pattern RDD (RDD2), th...
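The text of Step 2 is truncated before it defines the patterns, so the following is only a sketch under stated assumptions. Row and column ids are cut into 2S contiguous ranges that each hold roughly the same number of entries (using the per-row and per-column counts from Step 1, one plausible way to limit data skew), and pattern p is assumed to be the diagonal {(i, (i + p) mod 2S)}, whose 2S blocks share no row range or column range. With S nodes, each pattern then gives every node two blocks. All function names are hypothetical.

```python
from bisect import bisect_right


def balanced_boundaries(counts, max_id, parts):
    """Cut ids 0..max_id into `parts` contiguous ranges with roughly equal entry counts.

    `counts` maps id -> number of entries (e.g. total_row or total_col from Step 1).
    Returns the first id of every range after the first (hypothetical balancing rule).
    """
    total = sum(counts.get(i, 0) for i in range(max_id + 1))
    boundaries, running = [], 0
    for i in range(max_id + 1):
        running += counts.get(i, 0)
        # Close a range each time the running count reaches the next equal share.
        while len(boundaries) < parts - 1 and running >= total * (len(boundaries) + 1) / parts:
            boundaries.append(i + 1)
    return boundaries


def block_of(triplet, row_bounds, col_bounds):
    """Return the (row_block, col_block) key of one (row_id, col_id, value) triplet."""
    r, c, _ = triplet
    return bisect_right(row_bounds, r), bisect_right(col_bounds, c)


def independent_patterns(parts):
    """Pattern p = blocks (i, (i + p) % parts); no two blocks share a row or column range."""
    return [[(i, (i + p) % parts) for i in range(parts)] for p in range(parts)]


def assign_to_nodes(pattern, s):
    """Deal the blocks of one pattern round-robin over s nodes (2 blocks each when parts = 2s)."""
    return {node: pattern[node::s] for node in range(s)}


if __name__ == "__main__":
    S = 2            # number of computing nodes
    parts = 2 * S    # 2S x 2S blocks and 2S patterns
    rdd1 = [(0, 0, 5.0), (1, 3, 2.0), (2, 1, 4.0), (3, 2, 1.0)]
    row_counts = {0: 1, 1: 1, 2: 1, 3: 1}
    col_counts = {0: 1, 1: 1, 2: 1, 3: 1}
    rb = balanced_boundaries(row_counts, 3, parts)
    cb = balanced_boundaries(col_counts, 3, parts)
    rdd2 = [(block_of(t, rb, cb), t) for t in rdd1]  # stand-in for a block-keyed RDD2
    print(rdd2)
    print(assign_to_nodes(independent_patterns(parts)[0], S))
```

Contiguous ranges cannot split a single very dense row or column, so this balancing rule only evens out skew spread across many ids.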

Abstract

The invention discloses a distributed nonnegative matrix decomposition method. It improves the existing distributed matrix decomposition algorithm DSGD (Distributed Stochastic Gradient Descent) by reducing data skew during distributed program execution, which effectively lowers the time spent on lock waiting in DSGD and thus improves execution efficiency. On this basis, nonnegativity control and dynamic step-size selection are added to design the distributed nonnegative matrix decomposition algorithm, which is implemented on the Spark platform. The method achieves high execution efficiency and good convergence, and can be applied in fields such as collaborative filtering.
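The abstract names two additions to DSGD, nonnegativity control and a dynamic step size, without giving formulas in the visible text. One common way to realise both, assumed here rather than taken from the patent, is to project each SGD update onto the nonnegative orthant (clip at zero) and to adapt the step size with the "bold driver" heuristic: grow it slightly after an epoch that lowers the loss, halve it after one that raises it. The sketch runs on a single machine over (row_id, col_id, value) triplets; the function names, rank, and constants are illustrative.

```python
import random


def loss(triplets, W, H):
    """Squared error over the observed entries only (H[j] holds column j of H)."""
    return sum((x - sum(wk * hk for wk, hk in zip(W[i], H[j]))) ** 2 for i, j, x in triplets)


def nonneg_sgd(triplets, n_rows, n_cols, rank=2, epochs=50, step=0.01, seed=0):
    rng = random.Random(seed)
    # Nonnegative random initialisation; H is stored transposed, one rank-vector per column.
    W = [[rng.random() for _ in range(rank)] for _ in range(n_rows)]
    H = [[rng.random() for _ in range(rank)] for _ in range(n_cols)]
    history = [loss(triplets, W, H)]
    order = list(triplets)
    for _ in range(epochs):
        rng.shuffle(order)
        for i, j, x in order:
            err = x - sum(wk * hk for wk, hk in zip(W[i], H[j]))
            for k in range(rank):
                w, h = W[i][k], H[j][k]
                # Gradient step on (x - w.h)^2 / 2, then projection onto [0, inf).
                W[i][k] = max(0.0, w + step * err * h)
                H[j][k] = max(0.0, h + step * err * w)
        history.append(loss(triplets, W, H))
        # Bold-driver step size: reward progress, back off after an increase.
        step = step * 1.05 if history[-1] < history[-2] else step * 0.5
    return W, H, history


if __name__ == "__main__":
    data = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 4.0)]
    W, H, history = nonneg_sgd(data, n_rows=3, n_cols=3)
    print(round(history[0], 3), round(history[-1], 3))
```

In a distributed version along the lines of DSGD, each node would run the inner loop only on the blocks it holds for the current pattern, and the step-size rule would be applied once per full pass.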

Description

Technical field

[0001] The invention relates to distributed machine learning technology, in particular to a distributed non-negative matrix decomposition method.

Background

[0002] With the rapid development of Internet information technology, a large amount of network data is generated every day, and quickly and effectively extracting useful information from massive data is becoming increasingly important. Information on the Internet includes text data such as user purchase records, news, and WeChat messages. For convenience of processing, this information is usually represented in the form of a matrix.

[0003] Non-negative matrix factorization (NMF) decomposes a matrix X into two non-negative factor matrices W and H such that the product of W and H is as close as possible to the original matrix X. NMF was proposed by D. D. Lee and H. S. Seung in Nature in 1999 and applied to face recognition. At present, it is used by more and more data scientists ...
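For reference, the factorization described in [0003] can be written as the optimization problem below. The squared Frobenius norm is one common choice of objective (Lee and Seung also studied a divergence-based one), and the patent's exact loss is not visible in this excerpt. When X is stored as sparse triplets, as in Step 1, the sum is taken only over the set Ω of observed entries.

```latex
X \approx WH, \qquad X \in \mathbb{R}_{\ge 0}^{m \times n},\; W \in \mathbb{R}_{\ge 0}^{m \times r},\; H \in \mathbb{R}_{\ge 0}^{r \times n}

\min_{W \ge 0,\; H \ge 0} \; \| X - WH \|_F^2
  = \sum_{i=1}^{m} \sum_{j=1}^{n} \Big( X_{ij} - \sum_{k=1}^{r} W_{ik} H_{kj} \Big)^2

\min_{W \ge 0,\; H \ge 0} \; \sum_{(i,j) \in \Omega} \Big( X_{ij} - \sum_{k=1}^{r} W_{ik} H_{kj} \Big)^2
```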

Application Information

Patent type & authority: Application (China)
IPC(8): G06F17/16
CPC: G06F17/16
Inventors: 高琰 (Gao Yan), 邢小兵 (Xing Xiaobing), 顾磊 (Gu Lei), 张绍兴 (Zhang Shaoxing)
Owner: CENT SOUTH UNIV (Central South University)