Distributed pattern recognition training method and system

A pattern recognition technology, applied in the field of distributed pattern recognition training systems and methods, addressing the problems that training speech recognition systems requires a large amount of data, that many applications of speech recognition likewise require large amounts of data, and that most pattern recognition methods cannot cope with the enormous quantity of data potentially available.

Active Publication Date: 2006-01-19
AURILAB
11 Cites · 157 Cited by

AI Technical Summary

Problems solved by technology

Training these speech recognition systems requires a large amount of data.
In addition to the speech recognition itself, some applications of speech recognition also require large amounts of data.
Fortunately, an enormous quantity of data is potentially available.
Unfortunately, most pattern recognition methods are not able to cope with such enormous quantities of data.


Examples


first embodiment

[0050] Secondly, the first embodiment is capable of using a continuing, on-going data collection process. In many applications, this on-going data collection takes place at many, physically separated sites. In such a case, it is more efficient if much of the processing can be done locally at the data collection sites.

[0051] In large scale implementations, each peripheral site, and especially the central processing node, may itself be a large networked data center with many processors. The functional characteristic that distinguishes the peripheral data sites from the central node is that, as part of the first embodiment, peripheral sites do not need to communicate directly with each other; they need to communicate only with the central node. Within any one site or the central node, all the processors within a multiprocessor implementation may communicate with each other without restriction.
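As a minimal sketch of this star topology (all class and method names below are illustrative assumptions, not taken from the patent), each peripheral site holds only its locally collected data and exchanges messages exclusively with the central node:

```python
# Star (hub-and-spoke) communication pattern: peripheral sites talk only
# to the central node, never to each other.

from typing import List


class PeripheralSite:
    """Holds locally collected data; talks only to the central node."""

    def __init__(self, local_data: List[float]):
        self.local_data = local_data
        self.weights: List[float] = []

    def receive_candidate(self, weights: List[float]) -> None:
        self.weights = weights

    def local_statistic(self) -> float:
        # Placeholder for a locally computed summary statistic
        # (e.g., expression (0.7) in the second embodiment).
        return sum(w * x for x in self.local_data for w in self.weights)


class CentralNode:
    """The hub: the only node every peripheral site communicates with."""

    def __init__(self, sites: List[PeripheralSite]):
        self.sites = sites            # the only links in the topology
        self.weights = [1.0]

    def training_iteration(self) -> float:
        for site in self.sites:       # broadcast the candidate solution
            site.receive_candidate(self.weights)
        # Aggregate the statistics each site sends back.
        return sum(site.local_statistic() for site in self.sites)


node = CentralNode([PeripheralSite([1.0, 2.0]), PeripheralSite([3.0])])
print(node.training_iteration())  # 6.0
```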

[0052] The first embodiment permits a large collection of data observations at ...

second embodiment

[0108] In block 330, in the second embodiment, the functionals φj(.) have already been communicated, so a candidate solution can be specified just by communicating its set of weights {wj}. The central node communicates the weights to the peripheral sites.

[0109] In block 340, a plurality of data items are obtained at each peripheral site. In the second embodiment, data collection is an on-going process, so that in a given iteration there may be new data items at some peripheral sites that were not known during previous iterations.

[0110] In block 350, which is an instance of block 150 from FIG. 1, statistics are computed at the peripheral sites, to be communicated back to the central node. In particular, for each peripheral node P, block 350 computes for the second embodiment shown in FIG. 3 the quantity

$$\sum_{i \in I_P} g'\!\left( w_j\, \varphi_j(\vec{x}_i)\, y_i - d_i \right) \frac{\partial \xi_i}{\partial w_j} \tag{0.7}$$

[0111] Essentially, expression (0.7) tells the central node the net change in the objective function for a change in the weight wj, s...
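A hedged sketch of how one peripheral site might evaluate expression (0.7) over its local index set I_P: the excerpt does not define the smoothing function g or the slack derivative ∂ξi/∂wj, so both are supplied as assumed callables here, and all concrete values are illustrative.

```python
import numpy as np


def site_statistic(j, w, phi, X, y, d, g_prime, dxi_dw):
    """Local sum over this site's data items (cf. expression (0.7)):
    sum_i g'(w_j * phi_j(x_i) * y_i - d_i) * d(xi_i)/d(w_j)."""
    return sum(
        g_prime(w[j] * phi[j](x_i) * y_i - d_i) * dxi_dw(x_i, y_i, d_i, j)
        for x_i, y_i, d_i in zip(X, y, d)
    )


# Illustrative inputs -- every definition below is an assumption:
phi = [lambda x: x, lambda x: x ** 2]            # assumed functionals phi_j
w = np.array([0.5, -0.2])                        # current weights {w_j}
g_prime = lambda z: 1.0 / (1.0 + np.exp(-z))     # assumed smooth g'
dxi_dw = lambda x, yi, di, j: -phi[j](x) * yi    # assumed slack derivative
print(site_statistic(0, w, phi, X=[1.0, 2.0], y=[1, -1], d=[1.0, 1.0],
                     g_prime=g_prime, dxi_dw=dxi_dw))
```

Each site returns only this scalar per weight, so the raw data items never leave the peripheral site.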

third embodiment

[0119] In the third embodiment, the optimization problem takes the form

$$\begin{aligned} \text{Minimize: } & E = \sum_j w_j + C \sum_i \xi_i \\ \text{Subject to: } & \forall i:\ \sum_j w_j\, a_{i,j} + \xi_i \ge d_i;\quad \xi_i \ge 0;\quad a_{i,j} = \varphi_j(\vec{x}_i)\, y_i \end{aligned} \tag{0.9}$$

[0120] Thus, the optimization problem is a (primal) linear programming problem whose dual is given by

$$\begin{aligned} \text{Maximize: } & D = \sum_i d_i\, \lambda_i \\ \text{Subject to: } & \forall j:\ \sum_i \lambda_i\, a_{i,j} \le 1;\quad \forall i:\ 0 \le \lambda_i \le C;\quad a_{i,j} = \varphi_j(\vec{x}_i)\, y_i \end{aligned} \tag{0.10}$$
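As a hedged illustration, once the matrix a_{i,j} = φj(x→i)·yi is assembled, the primal (0.9) can be handed to a stock LP solver. The sketch below uses scipy.optimize.linprog, an assumed tooling choice rather than anything from the patent; the bound wj ≥ 0 is inferred from the dual constraint Σi λi ai,j ≤ 1 in (0.10).

```python
import numpy as np
from scipy.optimize import linprog


def solve_primal(A, d, C):
    """A[i, j] = phi_j(x_i) * y_i; d = margin targets; C = slack penalty.
    Decision variables are stacked as [w_1..w_J, xi_1..xi_N]."""
    N, J = A.shape
    c = np.concatenate([np.ones(J), C * np.ones(N)])  # sum_j w_j + C sum_i xi_i
    # sum_j w_j A[i, j] + xi_i >= d_i   <=>   -A w - I xi <= -d
    A_ub = np.hstack([-A, -np.eye(N)])
    res = linprog(c, A_ub=A_ub, b_ub=-d, bounds=[(0, None)] * (J + N))
    return res.x[:J], res.x[J:]  # weights {w_j}, slacks {xi_i}


# Tiny illustrative instance (values are arbitrary):
A = np.array([[1.0, 0.2], [0.5, 1.0], [-0.3, 0.8]])
w, xi = solve_primal(A, d=np.ones(3), C=10.0)
print(w, xi)
```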

[0121] Block 110 of FIG. 4 is the same as block 110 of FIG. 1. It provides for data communication between a central node and a plurality of peripheral sites in order to compute solutions to a series of problems of the form (0.9) and (0.10).

[0122] In block 415, at least one kernel function is communicated from the central node to the peripheral sites. In this embodiment, the kernel functions are communicated so that each peripheral site can form new functionals from the data items obtained locally at that site.
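A minimal sketch of that idea (the RBF kernel and all names here are illustrative assumptions): once a kernel K has been received from the central node, a peripheral site can mint a new functional φ(x) = K(x_item, x) from each locally obtained data item, without ever shipping the raw data off-site.

```python
import numpy as np


def rbf_kernel(u, v, gamma=0.5):
    """K(u, v) = exp(-gamma * ||u - v||^2), an assumed kernel choice."""
    diff = np.asarray(u) - np.asarray(v)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def make_functional(kernel, center):
    """Turn one locally observed data item into a functional phi(.)."""
    return lambda x: kernel(center, x)


# A peripheral site builds functionals from its own observations:
local_items = [np.array([0.0, 1.0]), np.array([2.0, -1.0])]
functionals = [make_functional(rbf_kernel, item) for item in local_items]
print([phi(np.array([1.0, 0.0])) for phi in functionals])
```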

[0123] In block 420, an initial set of da...



Abstract

A distributed pattern recognition training method includes providing data communication between at least one central pattern analysis node and a plurality of peripheral data analysis sites. The method also includes communicating a plurality of kernel-based pattern elements from the at least one central pattern analysis node to the plurality of peripheral data analysis sites. The method further includes performing a plurality of iterations of pattern template training at each of the plurality of peripheral data analysis sites.

Description

RELATED APPLICATIONS

[0001] This application claims priority to provisional patent application 60/587,874, entitled “Distributed Pattern Recognition Training System and Method,” filed Jul. 15, 2004, which is incorporated herein by reference in its entirety.

BACKGROUND OF THE INVENTION

[0002] A. Field of the Invention

[0003] The invention relates to a distributed pattern recognition training system and method.

[0004] B. Description of the Related Art

[0005] In recent years, speech recognition systems have become capable of recognizing very large vocabularies, exceeding 200,000 words in some cases. Training these speech recognition systems requires a large amount of data. Thousands of hours of spoken training data may be used to train the acoustic models for a large-vocabulary speech recognizer, and billions of words of text may be used to train the language context models. In addition to the speech recognition itself, some applications of speech recognition also require large amou...


Application Information

Patent Type & Authority: Application (United States)
IPC(8): G10L15/28
CPC: G10L15/063; G10L15/30; G10L15/08
Inventor: BAKER, JAMES K.
Owner: AURILAB