Speaker separating model training method, two-speaker separation method, and related equipment

A speaker separation and model training technology, applied in the field of biometrics, addresses two problems: the poor speaker separation effect of existing methods, and the inability of a single Gaussian model to describe the data distribution of different speakers. It thereby improves separation accuracy, achieves a better separation effect, and reduces the risk of degraded performance.

Active Publication Date: 2018-11-06
PING AN TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0004] Traditional speaker separation technology that uses the Bayesian Information Criterion (BIC) as a similarity measure can achieve good results in short dialogue separation tasks. As the dialogue length increases, however, BIC's single Gaussian model is no longer sufficient to describe the distribution of data from different speakers, so its speaker separation effect is poor.
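For context, the BIC-based similarity measure mentioned above can be sketched as follows. This is a minimal illustration of the commonly used ΔBIC form with one full-covariance Gaussian per segment; the source does not give the formula, so the exact expression, the function names `delta_bic` and `_logdet_cov`, and the `penalty_weight` parameter are assumptions made for this example.

```python
import numpy as np


def _logdet_cov(frames):
    """Log-determinant of the maximum-likelihood covariance of the frames."""
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(seg_x, seg_y, penalty_weight=1.0):
    """Delta-BIC between two feature segments (rows are frames).

    Assumed form: each segment is modelled by a single Gaussian.
    A positive value favours two separate models (different speakers);
    a negative value favours one merged model (same speaker).
    """
    seg_x = np.asarray(seg_x, dtype=np.float64)
    seg_y = np.asarray(seg_y, dtype=np.float64)
    n_x, dim = seg_x.shape
    n_y = seg_y.shape[0]
    n = n_x + n_y
    merged = np.vstack([seg_x, seg_y])

    # Likelihood gain from modelling the two segments separately.
    gain = 0.5 * (
        n * _logdet_cov(merged)
        - n_x * _logdet_cov(seg_x)
        - n_y * _logdet_cov(seg_y)
    )
    # Complexity penalty for the extra mean vector and covariance matrix.
    n_params = dim + dim * (dim + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return gain - penalty
```

Because each segment is summarized by one mean and one covariance, the measure cannot capture the multi-modal feature distributions that build up over long conversations, which is the limitation paragraph [0004] describes.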


Examples


Embodiment 1

[0061] FIG. 1 is a flow chart of the speaker separation model training method provided by Embodiment 1 of the present invention. Depending on requirements, the order of the steps in the flow chart may be changed, and some steps may be omitted.

[0062] S11. Acquire multiple pieces of audio data of multiple people.

[0063] In this embodiment, the acquisition of the plurality of audio data may include the following two methods:

[0064] (1) An audio device (for example, a tape recorder, etc.) is set in advance, and the voices of multiple people are recorded on the spot through the audio device to obtain audio data.

[0065] (2) Obtain a plurality of audio data from the audio data set.

[0066] The audio data set is an open-source data set, such as the UBM data set or the TV data set. The open-source audio data set is dedicated to training the speaker separation model and to testing the accuracy of the trained speaker separation model. The UBM data set and TV data set c...

Embodiment 2

[0103] FIG. 2 is a flow chart of the two-speaker separation method provided by Embodiment 2 of the present invention. Depending on requirements, the order of the steps in the flow chart may be changed, and some steps may be omitted.

[0104] S21. Perform preprocessing on the speech signal to be separated.

[0105] In this embodiment, the process of preprocessing the speech signal to be separated includes:

[0106] 1) Pre-emphasis processing

[0107] In this embodiment, a digital filter may be used to perform pre-emphasis processing on the speech signal to be separated, so as to boost the high-frequency part of the speech signal, as shown in formula (2-1):

[0108] $$\tilde{S}(n) = S(n) - a \cdot S(n-1) \tag{2-1}$$

[0109] where $S(n)$ is the speech signal to be separated, $a$ is the pre-emphasis coefficient (generally $a = 0.95$), and $\tilde{S}(n)$ is the speech signal after pre-emphasis processing.
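The pre-emphasis step can be sketched in a few lines of Python. This is a minimal illustration that assumes the usual first-order filter, in which each output sample is the current sample minus a times the previous sample; the function name `pre_emphasis` and the choice to pass the first sample through unchanged are assumptions of this example, not details taken from the patent.

```python
import numpy as np


def pre_emphasis(signal, a=0.95):
    """Apply a first-order pre-emphasis filter: y[n] = s[n] - a * s[n-1].

    The first sample has no predecessor, so it is passed through
    unchanged (an assumption made for this sketch).
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size == 0:
        return signal.copy()
    return np.concatenate(([signal[0]], signal[1:] - a * signal[:-1]))
```

With a = 0.95, a constant (low-frequency) input is almost entirely suppressed after the first sample, while rapid sample-to-sample changes are preserved, which is how the filter boosts the high-frequency part of the signal relative to the rest.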

[0110] Due to factors such as the human vocal organ itself and the equipment for collecting voice signals...

Embodiment 3

[0143] FIG. 4 is a functional block diagram of a preferred embodiment of the speaker separation model training device of the present invention.

[0144] In some embodiments, the speaker separation model training device 40 runs in a terminal. The speaker separation model training device 40 may include a plurality of functional modules composed of program code segments. The program code of each segment in the speaker separation model training device 40 can be stored in a memory and executed by at least one processor to train the speaker separation model (see FIG. 1 and its related description for details).

[0145] In this embodiment, the speaker separation model training device 40 of the terminal can be divided into multiple functional modules according to the functions it performs. The functional modules may include: an acquisition module 401, a preprocessing module 402, a feature extraction module 403, a training module 404, a calculation m...


Abstract

Provided is a speaker separation model training method. The method includes the steps of: obtaining and preprocessing multiple pieces of audio data; extracting audio features from the preprocessed audio data; inputting the audio features into a preset neural network for training to obtain vector features; calculating a first similarity between a first vector feature and a second vector feature of a first speaker; calculating a second similarity between the first vector feature of the first speaker and a third vector feature of a second speaker; and calculating a loss function value from the first similarity and the second similarity. When the loss function value is less than or equal to a preset loss function threshold, the training process of the speaker separation model ends and the parameters in the model are updated. The invention also provides a two-speaker separation method, a terminal and a storage medium. The model's ability to extract features from the input voice data is substantially enhanced, the accuracy of two-speaker separation is improved, and a better separation effect is achieved, especially in separation tasks with long dialogues.
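The similarity and loss steps summarized in the abstract can be illustrated with a short sketch. The abstract does not state which similarity measure or loss function is used, so the cosine similarity, the hinge-style margin loss, the `margin` value, and all function names below are assumptions chosen only to show the shape of the computation.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors (assumed measure)."""
    u = np.asarray(u, dtype=np.float64)
    v = np.asarray(v, dtype=np.float64)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def separation_loss(first_similarity, second_similarity, margin=0.5):
    """Hypothetical margin loss over the two similarities.

    It is zero once the same-speaker similarity exceeds the
    different-speaker similarity by at least `margin`.
    """
    return max(0.0, margin - first_similarity + second_similarity)


def training_finished(loss_value, loss_threshold):
    """Stopping rule from the abstract: loss <= preset threshold."""
    return loss_value <= loss_threshold
```

Here `first_similarity` compares two vector features of the first speaker and `second_similarity` compares a vector feature of the first speaker with one of the second speaker, so a well-trained model should drive the first value up and the second down.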

Description

Technical field

[0001] The invention relates to the technical field of biometric identification, in particular to a speaker separation model training method, a two-speaker separation method, a terminal and a storage medium.

Background art

[0002] With the continuous improvement of audio processing technology, extracting the voices of specific people of interest from massive data such as telephone recordings, news broadcasts and conference recordings has become a research hotspot. Speaker separation technology refers to the process of automatically dividing and labeling the speech in a multi-person conversation according to the speaker; that is, it solves the problem of "who speaks when".

[0003] Two-speaker separation refers to splitting a recording in which two speakers take turns speaking on the same audio track into two audio tracks, each of which contains the speech of only one of the speakers. Two-speaker separation is widely used in many fie...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L17/00, G10L17/04, G10L21/0272
CPC: G10L17/00, G10L17/04, G10L21/0272, G10L17/06, G10L17/18, G10L21/0208
Inventor: 赵峰, 王健宗, 肖京
Owner: PING AN TECH (SHENZHEN) CO LTD