Synthetic speech detection method based on speech segmentation

A synthetic speech detection technology, applied in speech analysis, speech recognition, instruments, and similar fields, that addresses the high threat synthetic speech poses to ASV systems and improves detection accuracy.

Active Publication Date: 2021-06-22

UNIV OF ELECTRONIC SCI & TECH OF CHINA


Problems solved by technology

A converted-speech attack works on a principle similar to that of a synthetic-speech attack, and it poses a greater threat to the automatic speaker verification (ASV) system.

Both attacks also appear in other application scenarios of speech recognition technology, such as phone fraud.


Embodiment Construction

[0038] To help those skilled in the art understand the technical content of the present invention, it is further explained below with reference to the accompanying drawings.

[0039] The present invention is divided into a training stage and a deployment stage. The training stage is carried out on the server. The deployment stage follows once the training stage is complete: the data produced in the training stage is deployed on the voice device.

[0040] The training phase mainly includes two parts: data processing and model training.

[0041] Step A, data preprocessing, mainly processes the input raw voice signal: it detects the sampling rate, performs endpoint detection (finding the beginning and end of the voice signal), and performs voice framing (a voice signal can be treated as approximately short-term stationary within 10-30 ms, so it is divided into segments for analysis)...
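The framing and endpoint-detection steps in [0041] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the 25 ms frame length and 10 ms hop (chosen from within the 10-30 ms range the text mentions), and the use of a fixed short-time energy threshold for endpoint detection are all assumptions.

```python
def frame_signal(samples, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split a list of samples into overlapping fixed-length frames.

    frame_ms and hop_ms are illustrative defaults (assumptions); the source
    only says speech is roughly stationary within 10-30 ms.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop must be at least one sample long")
    frames = []
    # Only full frames are kept; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def detect_endpoints(frames, energy_threshold):
    """Return (first, last) indices of frames above the energy threshold.

    Energy thresholding is an assumed, simple endpoint detector; the source
    does not say which method it uses. Returns None if no frame is active.
    """
    active = [i for i, f in enumerate(frames)
              if short_time_energy(f) > energy_threshold]
    if not active:
        return None
    return active[0], active[-1]
```

Frames between the two returned indices would be treated as the speech region; in practice the threshold would need tuning to the recording conditions.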


Abstract

The invention discloses a synthetic speech detection method based on speech segmentation, belongs to the field of speech detection, and aims to solve the problem of low detection precision in the prior art. The method comprises the following steps of: extracting two kinds of features in an audio, namely a CQCC feature of a voiced segment of the audio and an average zero-crossing rate feature of a silent (mute) segment of the audio; and adopting two GMM models to fit the two kinds of features respectively, giving different weights to the two GMM models, carrying out testing, and finding the most appropriate weight. The detection precision of synthetic speech is obviously improved.
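The silent-segment feature and the weighted combination of the two models can be sketched as below. The abstract does not state how the two GMM outputs are combined or how the weight is found, so this sketch assumes each model yields a score whose positive values favor genuine speech (for example a log-likelihood ratio), that the scores are combined as a weighted sum, and that the weight is chosen by a simple grid search on labeled test data. All function names are hypothetical, and CQCC extraction and GMM fitting are not shown.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def average_zcr(frames):
    """Average zero-crossing rate over the frames of a silent segment."""
    return sum(zero_crossing_rate(f) for f in frames) / len(frames)


def fused_score(voiced_score, silent_score, weight):
    """Weighted sum of the two model scores (assumed fusion rule)."""
    return weight * voiced_score + (1.0 - weight) * silent_score


def search_weight(voiced_scores, silent_scores, labels, steps=10):
    """Grid-search the weight in [0, 1] that maximizes accuracy.

    labels[i] is True for genuine speech. A fused score above 0 is
    classified as genuine (assumed decision threshold).
    """
    best_weight, best_accuracy = 0.0, -1.0
    for i in range(steps + 1):
        weight = i / steps
        correct = sum(
            (fused_score(v, s, weight) > 0) == label
            for v, s, label in zip(voiced_scores, silent_scores, labels)
        )
        accuracy = correct / len(labels)
        if accuracy > best_accuracy:  # keep the first weight reaching the best accuracy
            best_weight, best_accuracy = weight, accuracy
    return best_weight, best_accuracy
```

Here `voiced_scores` would come from the model fitted to CQCC features of voiced segments and `silent_scores` from the model fitted to the average zero-crossing rate of silent segments.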

Description

Technical field

[0001] The invention belongs to the field of voice detection, and in particular relates to a synthetic voice detection technology.

Background technique

[0002] With the development of artificial intelligence, embedded devices have undergone tremendous changes. The application of image recognition and face unlocking in embedded devices greatly facilitates production and daily life. Speech recognition, as a representative of acoustic artificial intelligence, is increasingly widely used in embedded devices, for example in voice assistants and voiceprint unlocking. Speech recognition technology enables computers to convert voice signals into corresponding text or commands through recognition and analysis. Automatic Speaker Verification (ASV) is a speech recognition technology that identifies individuals by distinguishing the voiceprint features of human speech. In many cases, ASV technology can replace traditional password aut...


Application Information


IPC(8): G10L15/02, G10L15/04, G10L15/06, G10L25/24

CPC: G10L15/02, G10L15/04, G10L15/063, G10L25/24

Inventor: 詹瑾瑜, 江维, 蒲治北, 杨永佳, 边晨, 雷洪, 江昱呈, 于安泰

Owner: UNIV OF ELECTRONIC SCI & TECH OF CHINA
