A neural network embedding system for text-independent speaker verification
A technology combining speaker verification and neural networks, applied in the fields of biological neural network models, neural architectures, neural learning methods, etc.
Embodiment
[0020] A neural network embedding system for text-independent speaker verification, as shown in Figure 1, includes a feed-forward DNN that computes speaker embeddings from variable-length segments. The structure is based on an end-to-end system. However, end-to-end approaches require a large amount of in-domain data to be effective, so the end-to-end loss is replaced by a multi-class cross-entropy objective. Additionally, a separately trained PLDA backend is used to compare pairs of embeddings. This enables the DNN and the similarity measure to be trained on potentially different datasets. The network can be implemented using the nnet3 neural network library in the Kaldi speech recognition toolkit. The input features are 20-dimensional MFCCs with a frame length of 25 ms, mean-normalized over a sliding window of up to 3 seconds; the same energy-based VAD from Section 2 filters out non-speech frames; instead of stacking frames at the input, the short-term temporal con...
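The feature preparation described above (sliding-window normalization followed by energy-based VAD) can be illustrated with a minimal sketch. It is not the Kaldi implementation: the function names `sliding_mean_normalize` and `energy_vad` are hypothetical, the 10 ms frame shift (so that 3 seconds is about 300 frames) is an assumption not stated in the text, and the VAD here is a plain per-frame energy threshold rather than the toolkit's actual decision rule.

```python
from typing import List

Frame = List[float]  # one feature vector, e.g. 20 MFCC coefficients


def sliding_mean_normalize(frames: List[Frame], window: int = 300) -> List[Frame]:
    """Subtract from each frame the per-dimension mean of a window around it.

    `window` is in frames; 300 corresponds to 3 s only under the assumed
    10 ms frame shift. Near the edges (or for short inputs) the window is
    shifted or shrunk, so it covers "up to" `window` frames.
    """
    n = len(frames)
    if n == 0:
        return []
    dim = len(frames[0])
    half = window // 2
    normalized = []
    for t in range(n):
        lo = max(0, t - half)
        hi = min(n, lo + window)
        lo = max(0, hi - window)  # shift left if we ran off the end
        count = hi - lo
        mean = [sum(frames[i][d] for i in range(lo, hi)) / count for d in range(dim)]
        normalized.append([frames[t][d] - mean[d] for d in range(dim)])
    return normalized


def energy_vad(energies: List[float], threshold: float) -> List[bool]:
    """Mark a frame as speech when its energy exceeds a fixed threshold.

    Simplified stand-in for an energy-based VAD; the threshold value is
    an illustrative assumption.
    """
    return [e > threshold for e in energies]


def select_speech(frames: List[Frame], speech_mask: List[bool]) -> List[Frame]:
    """Keep only the frames flagged as speech."""
    return [f for f, keep in zip(frames, speech_mask) if keep]
```

A possible usage: normalize the MFCC frames first, compute the speech mask from per-frame energies, then pass `select_speech(normalized, mask)` to the embedding network. Normalizing before dropping frames keeps the window aligned with real time, which is a design choice of this sketch rather than something the text specifies.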