Speaker segmentation model optimization method, speaker segmentation method and speaker segmentation device

A speaker segmentation model optimization technology, applied in the fields of neural learning methods, design optimization/simulation, biological neural network models, etc.

Pending Publication Date: 2022-06-28
BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD

AI Technical Summary

Problems solved by technology

[0008] By providing a speaker segmentation model optimization method, a speaker segmentation method, and a device, the embodiments of the present application solve the technical problem that prior-art speaker segmentation models yield low-accuracy speaker segmentation points when the audio contains two or more speakers, and achieve the technical effect of improving the accuracy of speaker segmentation points.

Examples

Embodiment 1

[0088] This embodiment provides an optimization method for a speaker segmentation model, which is applied to a UIS-RNN system and is used to optimize the GRU (Gated Recurrent Unit) model in that system.

[0089] As shown in Figure 1, the optimization method of the speaker segmentation model (taking one update of the GRU parameters as an example) includes:

[0090] Step S101: Acquire first voice stream data and second voice stream data, where the first voice stream data corresponds to the first speaker, and the second voice stream data corresponds to the second speaker.

[0091] In a specific implementation process, the voice stream data of at least two different speakers can be collected in real time.

[0092] For example, if the first speaker is speaker a and the second speaker is speaker b, the first voice stream data of speaker a and the second voice stream data of speaker b can be collected in real time.

[0093] Step S102: Based on the first voi...

Embodiment 2

[0146] Based on the same inventive concept, this embodiment provides a speaker segmentation method which, as shown in Figure 3, includes:

[0147] Step S201: Acquire third voice stream data.

[0148] Step S202: Input the third voice stream data into the target speaker segmentation model to obtain a speaker segmentation result, where the target speaker segmentation model is obtained based on any implementation of Embodiment 1.

[0149] In a specific implementation process, the third voice stream data may be voice stream data collected in real time. After the third voice stream data is input into the target speaker segmentation model, the model can identify the different speakers in the third voice stream data and segment the third voice stream data based on those identities, obtaining the speech segment data corresponding to each speaker.
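As a minimal illustration of the segmentation result described in [0149], the sketch below assumes that the target speaker segmentation model emits one speaker label per fixed-length frame of the voice stream. That output format is an assumption, not something stated in the text above, and the names `labels_to_segments` and `frame_seconds` are hypothetical. The sketch only shows how per-frame labels could be merged into per-speaker speech segments; it does not reproduce the model itself.

```python
from typing import List, Tuple

# (speaker label, start time in seconds, end time in seconds)
Segment = Tuple[int, float, float]


def labels_to_segments(labels: List[int], frame_seconds: float) -> List[Segment]:
    """Merge consecutive frames with the same speaker label into segments.

    Assumption: `labels[i]` is the (hypothetical) per-frame speaker label
    produced by a segmentation model, and every frame lasts `frame_seconds`.
    """
    segments: List[Segment] = []
    if not labels:
        return segments

    start = 0  # index of the first frame of the current segment
    for i in range(1, len(labels) + 1):
        # Close the current segment at the end of the stream or when
        # the speaker label changes (a speaker segmentation point).
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start * frame_seconds, i * frame_seconds))
            start = i
    return segments


example_segments = labels_to_segments([0, 0, 1, 1, 1, 0], frame_seconds=0.5)
```

Each boundary between two adjacent segments corresponds to a speaker segmentation point, which is the quantity whose accuracy the optimization in Embodiment 1 aims to improve.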

[0150] The technical solutions in the a...

Embodiment 3

[0153] Based on the same inventive concept, this embodiment provides an optimization device for a speaker segmentation model which, as shown in Figure 4, includes:

[0154] a first acquiring unit 301, configured to acquire first voice stream data and second voice stream data, where the first voice stream data corresponds to the first speaker, and the second voice stream data corresponds to the second speaker;

[0155] a first obtaining unit 302, configured to obtain a first error function term of a target contrast error function based on the first voice stream data, where the first error function term is an object to be minimized;

[0156] a second obtaining unit 303, configured to obtain a second error function term of the target contrast error function based on the first voice stream data and the second voice stream data, where the second error function term is an object to be maximized;

[0157] a third obtaining unit 304, configured to obtain the target contrast error functi...

Abstract

The invention discloses a speaker segmentation model optimization method and device. The method comprises: obtaining first voice stream data and second voice stream data, the first voice stream data corresponding to a first speaker and the second voice stream data corresponding to a second speaker; obtaining a first error function term of a target contrast error function based on the first voice stream data, the first error function term being an object to be minimized; obtaining a second error function term of the target contrast error function based on the first voice stream data and the second voice stream data, the second error function term being an object to be maximized; obtaining the target contrast error function based on the first error function term and the second error function term; and adjusting the model parameters of the original speaker segmentation model based on the target contrast error function to obtain a target speaker segmentation model. This can improve the accuracy of speaker segmentation points. The invention also discloses a speaker segmentation method and device.
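The excerpt does not give the exact form of the two error function terms, so the following is only a rough sketch of the general idea under stated assumptions: the model is assumed to map each speaker's voice stream data to a list of embedding vectors, the first term is assumed to be the spread of the first speaker's embeddings around their own mean, and the second term is assumed to be the squared distance between the two speakers' mean embeddings. All function names and the `weight` parameter are hypothetical. Combining the terms as "first minus second" means that minimizing the combined value minimizes the first term and maximizes the second.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def first_error_term(first: Sequence[Vector]) -> float:
    """Assumed first term (to be minimized): average squared distance of the
    first speaker's embeddings from their own mean."""
    center = mean_vector(first)
    return sum(squared_distance(e, center) for e in first) / len(first)


def second_error_term(first: Sequence[Vector], second: Sequence[Vector]) -> float:
    """Assumed second term (to be maximized): squared distance between the
    mean embeddings of the two speakers."""
    return squared_distance(mean_vector(first), mean_vector(second))


def target_contrast_error(first: Sequence[Vector], second: Sequence[Vector],
                          weight: float = 1.0) -> float:
    """Assumed combination: lower values mean a tighter first speaker
    and better separated speakers."""
    return first_error_term(first) - weight * second_error_term(first, second)


# Toy embeddings standing in for model outputs on the two voice streams.
speaker_a = [[0.0, 0.0], [0.0, 2.0]]
speaker_b = [[3.0, 1.0], [3.0, 1.0]]
loss = target_contrast_error(speaker_a, speaker_b)
```

In an actual training loop the model parameters would be adjusted (for example by gradient descent) to reduce this value; that step is omitted here because the excerpt does not specify the update rule.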

Description

Technical field

[0001] The present invention relates to the technical field of speaker segmentation, and in particular to an optimization method for a speaker segmentation model, a speaker segmentation method, and a device.

Background

[0002] Real-time speaker segmentation technology can automatically determine the identity of the speaker in a real-time voice stream and indicate when each speaker is speaking. It is in high demand in conferences, interviews, and similar scenarios, and is a hot research topic in the voice industry.

[0003] The common framework for speaker segmentation first segments the speech and then clusters the speech segments. This framework cannot handle real-time voice streams, so it can only serve as a back-end system for non-real-time speaker segmentation.

[0004] The emergence of UIS-RNN (Unbounded Interleaved-State Recurrent Neural Network, Unbounded Interleaved...

Application Information

IPC(8): G06F30/27, G06N3/08
CPC: G06F30/27, G06N3/08
Inventor: 姚升余, 潘逸倩
Owner: BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD