Convolutional long-and-short-term-memory end-to-end deep neural network for voice cheating detection

A deep neural network and convolutional neural network technology, applied to end-to-end deep neural networks based on convolutional long short-term memory. It addresses the poor match between the optimal features and the terminal classifier, simplifies the pipeline, and improves detection results.

Inactive Publication Date: 2017-06-20
AISPEECH CO LTD

AI Technical Summary

Problems solved by technology

[0009] In the prior art, features can only be extracted outside the task, so they are not the optimal features for the spoofing task, and the terminal classifier is poorly matched to the detected features. In view of these defects, a method for voice spoof

Examples

Embodiment Construction

[0023] As shown in Figure 1, this embodiment relates to an end-to-end speech deception detection system based on convolutional long short-term memory. The system comprises a CLDNN that serves jointly as feature extractor and classifier; the CLDNN consists of a CNN, an LSTM and a DNN.

[0024] In this embodiment, the classifier is trained on the raw waveform. Torch is used as the deep learning library for this model, and the RNN package must be installed to provide the LSTM model.

[0025] Each input raw waveform file is first divided into 560 frames of equal size, which is equivalent to a 35 ms frame window. Adjacent frames overlap by 17.5 ms (i.e. a 50% overlap rate).
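
The framing step in [0025] can be sketched in plain Python as follows. This is a minimal illustration, not the patent's implementation (which uses Torch). It assumes that the figure 560 is the frame length in samples and that the audio is sampled at 16 kHz, since 560 / 16000 = 35 ms; the source does not state the sampling rate. The names `FRAME_LEN`, `HOP_LEN` and `split_into_frames` are hypothetical, and dropping the trailing partial frame is also an assumption.

```python
from typing import List, Sequence

# Assumption: 560 is the frame length in samples (35 ms at an assumed 16 kHz).
FRAME_LEN = 560
# 50% overlap means each frame starts half a frame after the previous one
# (280 samples, i.e. 17.5 ms at the assumed 16 kHz).
HOP_LEN = FRAME_LEN // 2


def split_into_frames(
    samples: Sequence[float],
    frame_len: int = FRAME_LEN,
    hop_len: int = HOP_LEN,
) -> List[List[float]]:
    """Split a waveform into equal-size, overlapping frames.

    Trailing samples that do not fill a whole frame are dropped
    (an assumption; the source does not say how the tail is handled).
    """
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame_len and hop_len must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(list(samples[start:start + frame_len]))
        start += hop_len
    return frames


# Usage: 1400 samples give frames starting at 0, 280, 560 and 840.
waveform = [0.0] * 1400
print(len(split_into_frames(waveform)))  # 4
```

With a 50% overlap, the second half of each frame is identical to the first half of the next one, so every sample except those at the edges of the file is seen twice by the network.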

[0026] The CLDNN in this embodiment comprises: a first CNN layer with 64 feature maps for standard feature extraction, a second CNN layer with 128 feature maps, third and fourth LSTM layers with 128 nodes for label prediction, and DNN layers serving as the direct classifier of the neural network.
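
The layer stack in [0026] can be written down as a plain configuration, which makes the order and widths of the layers easy to check. This is a descriptive sketch only, not a trainable model. `LayerSpec`, `CLDNN_STACK` and `describe` are hypothetical names. The sketch assumes that the second CNN layer is also used for feature extraction and that each LSTM layer has 128 nodes. The source gives neither the number nor the width of the DNN layers, so the DNN is a single entry with an unknown width.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LayerSpec:
    kind: str              # "cnn", "lstm" or "dnn"
    role: str              # purpose of the layer as described in the text
    width: Optional[int]   # feature maps or nodes; None if not given in the source


# Layer order as listed in [0026]; widths are the values stated there.
CLDNN_STACK: List[LayerSpec] = [
    LayerSpec("cnn", "feature extraction", 64),
    LayerSpec("cnn", "feature extraction", 128),   # role assumed
    LayerSpec("lstm", "label prediction", 128),    # 128 nodes per layer assumed
    LayerSpec("lstm", "label prediction", 128),
    LayerSpec("dnn", "direct classifier", None),   # size not stated in the source
]


def describe(stack: List[LayerSpec]) -> str:
    """Return a one-line summary of the stack, with '?' for unknown widths."""
    parts = []
    for layer in stack:
        width = "?" if layer.width is None else str(layer.width)
        parts.append(f"{layer.kind}({width})")
    return " -> ".join(parts)


print(describe(CLDNN_STACK))
# cnn(64) -> cnn(128) -> lstm(128) -> lstm(128) -> dnn(?)
```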

[0027] This embod...

Abstract

The invention is a convolutional long short-term memory end-to-end deep neural network for voice cheating (spoofing) detection. The deep neural network comprises a convolutional neural network front end with long short-term memory sequence mapping and a neural network direct classifier. The front end includes at least two convolutional neural networks (CNNs) for standard feature extraction and at least one CNN for label prediction; the CNN for label prediction performs label prediction in a long short-term memory sequence manner. The invention examines the combined capacity of all existing features, so that a separate feature extraction process is avoided, and the architecture adapts more readily to different tasks.

Description

Technical field

[0001] The invention relates to a technology in the field of speech processing, in particular to a convolutional long short-term memory based end-to-end deep neural network (CLDNN) for speech deception detection with raw waveform input.

Background technique

[0002] Spoof detection is a branch of speaker verification that distinguishes between real (human) and artificial (deceptive) spoken utterances. The main purpose of deception detection is to compute a score for each utterance and use the score to distinguish between these two (deceptive, human) utterance categories. The score is used to calculate a threshold by which an utterance can be classified as genuine (if its score is greater than the defined threshold) or deceptive (if its score is below the threshold).

[0003] Detecting deceptive speech requires features: artificial vectors aiming to represent a given utterance in a lower dimensional space, uniqueness being paramount. In traditional speech-r...
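
The score-and-threshold decision described in [0002] can be sketched as follows. This is a minimal illustration with a hypothetical function name. The source only covers scores strictly above or below the threshold, so treating a score equal to the threshold as deceptive is an assumption made here.

```python
def classify_utterance(score: float, threshold: float) -> str:
    """Label an utterance from its detection score.

    A score greater than the threshold means genuine; a lower score means
    deceptive. A score equal to the threshold is treated as deceptive,
    which is an assumption not stated in the source.
    """
    return "genuine" if score > threshold else "deceptive"


# Usage with made-up scores and a made-up threshold.
for score in (0.9, 0.2):
    print(score, classify_utterance(score, threshold=0.5))
```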

Application Information

IPC(8): G06N3/04
CPC: G06N3/045
Inventor: 钱彦旻, 俞凯, D·海因里希
Owner: AISPEECH CO LTD