
Vocal print recognition method based on self-attention and transfer learning

A voiceprint recognition technology based on self-attention and transfer learning, applied in speech analysis, instruments, and similar fields. It addresses the low accuracy of traditional voiceprint recognition and its lack of generalization to real-world applications, achieving strong generalization ability while reducing the amount of audio data required.

Active Publication Date: 2020-02-28
CHINA SCI INTELLICLOUD TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] At present, voiceprint recognition based on traditional methods has low accuracy, while voiceprint recognition based on deep learning relies too heavily on massive, high-dimensional, high-quality voice data. Both are vulnerable to the influence of environmental noise, reverberation, and audio channels, and both lack the generalization ability needed for real-world applications.



Examples


Embodiment 1

[0029] A voiceprint recognition method based on self-attention and transfer learning: obtain open-source English speech data and construct a first-level basic data set; obtain open-source Chinese speech data and construct a second-level basic data set; collect voice data from the application scenario and construct an application scenario data set. As shown in Figure 6, a first-level basic model is trained on the first-level basic data set using the self-attention model; then, on the second-level basic data set, the first-level basic model is transferred and fine-tuned to obtain a second-level basic model; finally, on the specific application scenario data, the second-level basic model is transferred and fine-tuned to obtain the final model suited to the specific application scenario. This cascade fine-tuning not only learns robustness to noise, reverberation, and channels, but also learns the pronunciation characteristics of Chinese and a recognition ability better adapted to the real application scenario.
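To make the cascade concrete, the sketch below runs the same training loop three times on one self-attention speaker network, lowering the learning rate at each stage. Everything here is an illustrative assumption rather than the patent's implementation: the toy SelfAttentionSpeakerNet, the pre-built english_loader / chinese_loader / scenario_loader data loaders, the speaker counts, and all hyperparameters.

```python
# Minimal sketch of cascade transfer learning (PyTorch). All names and
# hyperparameters are illustrative assumptions, not taken from the patent.
import copy
import torch
import torch.nn as nn


class SelfAttentionSpeakerNet(nn.Module):
    """Toy stand-in for a self-attention speaker-embedding model."""

    def __init__(self, n_mels=40, dim=64, n_speakers=1000):
        super().__init__()
        self.proj = nn.Linear(n_mels, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(dim, n_speakers)

    def forward(self, x):                # x: (batch, frames, n_mels)
        h = self.proj(x)
        h, _ = self.attn(h, h, h)        # self-attention across frames
        return self.head(h.mean(dim=1))  # mean-pool frames -> speaker logits


def fine_tune(model, loader, epochs, lr, device="cpu"):
    """One training pass; reused unchanged at every stage of the cascade."""
    model = model.to(device).train()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for feats, speaker_ids in loader:
            opt.zero_grad()
            loss_fn(model(feats.to(device)), speaker_ids.to(device)).backward()
            opt.step()
    return model


def transfer(model, n_speakers):
    """Copy the trained network but re-initialize its classification head,
    since each stage's data set contains a different speaker population."""
    m = copy.deepcopy(model)
    m.head = nn.Linear(m.head.in_features, n_speakers)
    return m


# Stage 1: first-level basic model on the open-source English data.
base = fine_tune(SelfAttentionSpeakerNet(), english_loader, epochs=50, lr=1e-3)
# Stage 2: second-level basic model, fine-tuned on the Chinese data.
second = fine_tune(transfer(base, 500), chinese_loader, epochs=20, lr=1e-4)
# Stage 3: final model, fine-tuned on the small application-scenario data.
final = fine_tune(transfer(second, 50), scenario_loader, epochs=10, lr=1e-5)
```

Lowering the learning rate at each stage is a common transfer-learning choice: it preserves the robustness learned from the large upstream corpora while adapting to the smaller downstream set.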

Embodiment 2

[0032] This embodiment builds on Embodiment 1. Massive open-source English voice data (sitw, voxceleb1, voxceleb2, etc.) is obtained to build the first-level voiceprint basic data set; because this data is collected under unconstrained conditions, it provides good robustness to noise, reverberation, and channel variation.

[0033] A large amount of open-source Chinese speech data (aishell, primewords, st-cmds, thchs30, etc.) is obtained to construct the second-level voiceprint basic data set; as a Chinese data set, it better captures the pronunciation characteristics of Chinese.

[0034] A small amount of voice data is collected in the application scenario to build the application-scenario voiceprint data set; because this data is recorded in the real application scenario, it better matches actual deployment conditions.

[0035] Other parts of this embodiment are the same as those of Embodiment 1, so details are not repeated here.
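For concreteness, the three tiers could be recorded in a small manifest like the one below; the corpus names are those listed in this embodiment, while the layout and the placeholder scenario entry are illustrative assumptions.

```python
# Hypothetical manifest for the three-tier data sets; corpus names are from
# the text, the structure and placeholder entry are illustrative assumptions.
DATASETS = {
    # Large, unconstrained English corpora: noise/reverb/channel robustness.
    "first_level_english": ["sitw", "voxceleb1", "voxceleb2"],
    # Large Chinese corpora: Chinese pronunciation characteristics.
    "second_level_chinese": ["aishell", "primewords", "st-cmds", "thchs30"],
    # Small set recorded in the target deployment environment.
    "application_scenario": ["in_situ_recordings"],
}
```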

Embodiment 3

[0037] This embodiment is optimized on the basis of Embodiment 1 or 2. As shown in Figures 1 and 2, data augmentation in both the time domain and the frequency domain is performed on the first-level basic data set, the second-level basic data set, and the application scenario data set. As shown in Figure 1, the audio data is augmented in the time domain: rhythm and pitch are controlled, the audio speed is adjusted, and random noise is added. As shown in Figure 2, the audio data is augmented in the frequency domain: Vocal Tract Length Perturbation (VTLP) applies a random distortion factor to the spectral features of each utterance.
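The time-domain step could look like the following sketch, using librosa's time-stretch and pitch-shift effects plus additive white noise; the parameter ranges and the input file name are illustrative assumptions, not values from the patent.

```python
# Sketch of time-domain augmentation: tempo/speed stretch, pitch shift,
# additive random noise. Parameter ranges are illustrative assumptions.
import numpy as np
import librosa


def augment_time_domain(y, sr, rng):
    # Adjust speed/tempo by a random stretch factor.
    y = librosa.effects.time_stretch(y, rate=rng.uniform(0.9, 1.1))
    # Shift pitch by up to +/- 2 semitones.
    y = librosa.effects.pitch_shift(y, sr=sr, n_steps=rng.uniform(-2.0, 2.0))
    # Add low-amplitude white noise.
    return y + rng.uniform(0.001, 0.01) * rng.standard_normal(len(y))


y, sr = librosa.load("utterance.wav", sr=16000)  # hypothetical input file
y_aug = augment_time_domain(y, sr, np.random.default_rng(0))
```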

[0038] The invention obtains English and Chinese public data sets, collects a small application-scenario data set, and augments all of them along the two dimensions of time domain and frequency domain, which expands the generalization ability of the model while reducing the amount of audio data that must be collected.
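The frequency-domain step can be sketched as follows: a random distortion factor alpha is drawn per utterance, and the spectrogram's frequency axis is warped with a piecewise-linear VTLP mapping (in the style of Jaitly and Hinton, 2013). The boundary frequency, warp range, and NumPy implementation are illustrative assumptions.

```python
# Sketch of Vocal Tract Length Perturbation on a magnitude spectrogram.
# Boundary frequency (4800 Hz) and warp range are illustrative assumptions.
import numpy as np


def vtlp_warp(freqs, alpha, f_max, f_hl=4800.0):
    """Piecewise-linear VTLP frequency mapping with warp factor alpha."""
    f_b = f_hl * min(alpha, 1.0) / alpha  # breakpoint; assumes f_hl < f_max
    return np.where(
        freqs <= f_b,
        freqs * alpha,
        f_max - (f_max - f_b * alpha) * (f_max - freqs) / (f_max - f_b),
    )


def apply_vtlp(spec, sr, alpha):
    """Warp a (freq_bins x frames) spectrogram along its frequency axis."""
    freqs = np.linspace(0.0, sr / 2.0, spec.shape[0])
    warped = vtlp_warp(freqs, alpha, f_max=sr / 2.0)
    out = np.empty_like(spec)
    for t in range(spec.shape[1]):
        # Warped spectrum on the regular grid = original spectrum sampled
        # at the inverse-warped frequencies (linear interpolation).
        out[:, t] = np.interp(freqs, warped, spec[:, t])
    return out


rng = np.random.default_rng(0)
alpha = rng.uniform(0.9, 1.1)   # random distortion factor per utterance
# spec_aug = apply_vtlp(np.abs(spectrogram), sr=16000, alpha=alpha)
```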



Abstract

The invention discloses a vocal print recognition method based on self-attention and transfer learning. The method comprises the steps that open-source English speech data are obtained and a first-level basic data set is constructed; open-source Chinese speech data are obtained and a second-level basic data set is constructed; application-scenario speech data are collected and an application scenario data set is constructed; a first-level basic model is trained based on an attention model and the first-level basic data set; then, on the second-level basic data set, transfer fine-tuning is carried out on the first-level basic model to obtain a second-level basic model; and finally, on the concrete application scenario data, the second-level basic model is transferred and fine-tuned to obtain a final model adapted to the concrete application scenario. The method learns robustness to noise, reverberation, and channels, learns Chinese pronunciation features and a recognition ability better adapted to the real application scenario, and thus better meets the demands of real-scene applications.

Description

Technical field

[0001] The invention belongs to the technical field of voiceprint recognition, and in particular relates to a voiceprint recognition method based on self-attention and transfer learning.

Background technique

[0002] Biometric technology is an identification technology that relies on human body characteristics for identity verification. Because these characteristics cannot be lost or forgotten and are unique, invariant, hard to counterfeit, and convenient to use, the technology is widely applied in access control, time attendance, finance, public safety, and terminal electronic equipment.

[0003] Voiceprint recognition, as a kind of biometric identification, is a service that identifies a speaker based on the speaker's vocal characteristics. The identification is independent of accent and language, is non-contact, and is realized in a natural way, so it has received extensive attention and application in recent years.


Application Information

Patent Type & Authority: Application (China)
IPC(8): G10L17/00; G10L17/20
CPC: G10L17/00; G10L17/20
Inventor: 高登科
Owner: CHINA SCI INTELLICLOUD TECH CO LTD