A method and system for judging the number of speakers

A technology for judging the number of speakers, applied in the field of speech signal processing. It addresses inaccurate estimates of the number of speakers, removes the step-size limitation of segmentation, and improves both the accuracy of the speaker count and the resulting speech recognition.

Active Publication Date: 2019-07-09
IFLYTEK CO LTD

AI Technical Summary

Problems solved by technology

[0004] Embodiments of the present invention provide a method and system for judging the number of speakers. They address the problem that the prior art judges the number of speakers inaccurately in dual-speaker or multi-speaker scenarios, especially for long audio, and thereby improve the accuracy of the speaker count.

Examples

Embodiment 1

[0078] Figure 2 shows a flowchart of a method for judging the number of speakers provided by an embodiment of the present invention. The method includes the following steps:

[0079] Step S01, receiving a voice signal.

[0080] In this embodiment, a voice signal is received through a device such as a microphone. The voice signal can be the real-time pronunciation of the speaker, or a voice signal saved by a recording device, etc. Of course, it can also be a voice signal transmitted by communication equipment, such as a mobile phone, a teleconferencing system, and the like.

[0081] In practical applications, endpoint detection must be performed on the received speech signal. Endpoint detection means determining the start and end points of speech within a signal segment that contains speech. Effective endpoint detection not only minimizes processing time but also removes noise interference from silent segments. In this embod...
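The endpoint detection idea in [0081] can be illustrated with a minimal sketch. The source text is truncated before it names a specific detector, so the short-time energy criterion below is an assumption chosen for illustration, and the function names, `frame_len`, and `threshold` are hypothetical.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_endpoints(samples, frame_len=160, threshold=0.01):
    """Return (start, end) sample indices of the speech region, or None.

    Assumption: a frame counts as speech when its energy exceeds
    `threshold`. The start point is the first such frame and the end
    point is the end of the last such frame.
    """
    energies = frame_energies(samples, frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None  # no frame looks like speech
    return active[0] * frame_len, (active[-1] + 1) * frame_len


# Synthetic signal: silence, a louder stretch, then silence again.
signal = [0.0] * 320 + [0.5] * 480 + [0.0] * 160
endpoints = detect_endpoints(signal)
```

Trimming the signal to the returned range drops the leading and trailing silent frames, which is the processing-time and noise benefit the paragraph describes.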

Embodiment 2

[0115] A method for judging the number of speakers as described in Embodiment 1, differing in that, to eliminate the influence of channel interference on judging the similarity between speech signal classes, this embodiment uses probabilistic linear discriminant analysis (PLDA) to remove channel interference information, thereby improving the accuracy of the similarity judgment.

[0116] Step S11 to step S15 are the same as the first embodiment, and will not be described in detail here.

[0117] Step S16, calculation process: calculate and compare the similarity between different speech signal classes according to the speech signal features of each segmented signal segment in the speech signal class after re-segmentation.

[0118] In this embodiment, the PLDA technology is used to remove channel interference information. Specifically, the part representing the channel ...

Embodiment 3

[0132] A method for judging the number of speakers as described in Embodiment 2, differing in that, to further improve the accuracy of judging the similarity between speech signal classes, this embodiment uses probabilistic linear discriminant analysis (PLDA) to calculate a PLDA score between each pair of speech signal classes and judges their similarity from that score. The larger the PLDA score, the more likely it is that the two speech signal classes being compared are judged to be the same class.
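As a rough illustration of how a PLDA score can express "same class or not", the sketch below computes a log-likelihood ratio under a simplified two-covariance model with diagonal covariances. This model form, the zero-mean assumption, and the names `plda_llr`, `between_var`, and `within_var` are assumptions made for the example; the source does not specify the PLDA variant or how its parameters are trained.

```python
import math


def plda_llr(x1, x2, between_var, within_var):
    """Log-likelihood ratio: same class versus different classes.

    Assumed model, per dimension: x = y + e, with y ~ N(0, between_var)
    shared by vectors of the same class and e ~ N(0, within_var) drawn
    independently for each vector. Inputs are assumed mean-centered.
    """
    score = 0.0
    for a, b, bv, wv in zip(x1, x2, between_var, within_var):
        total = bv + wv
        # Same class: joint covariance [[total, bv], [bv, total]].
        det = total * total - bv * bv
        log_same = -0.5 * math.log(det) - 0.5 * (
            total * a * a - 2.0 * bv * a * b + total * b * b
        ) / det
        # Different classes: the two vectors are independent.
        log_diff = -0.5 * math.log(total * total) - 0.5 * (a * a + b * b) / total
        score += log_same - log_diff  # the 2*pi constants cancel
    return score


similar = plda_llr([1.0], [1.0], [1.0], [0.1])
dissimilar = plda_llr([1.0], [-1.0], [1.0], [0.1])
```

With these toy parameters, `similar` is positive and `dissimilar` is negative, matching the statement that a larger score indicates a higher likelihood that the two classes are the same.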

[0133] Step S11 to step S15 are the same as the second embodiment, and will not be described in detail here.

[0134] Step S16, calculation process: calculate and compare the similarity between different speech signa...

Abstract

The invention discloses a method and system for determining the number of speakers. The method comprises: receiving a voice signal; extracting voice signal features from the voice signal; segmenting the voice signal according to those features to obtain signal segments; clustering the signal segments into a specified number of voice signal categories; re-segmenting the voice signal according to the voice signal features of each signal segment in the voice signal categories; performing a computing process that computes and compares the similarity between different voice signal categories according to the voice signal features of each re-segmented signal segment in those categories; and, once the computing process finishes, determining the number of speakers from the computed result. Because the voice signal is re-segmented, the method and system eliminate the effect of the step-length restriction that limits voice signal segmentation in the prior art, and computing and comparing the similarity between voice signal categories improves the accuracy of the subsequent speaker count.
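The final two steps of the abstract, comparing category similarity and deriving a speaker count, can be sketched as follows. The abstract does not state the decision rule, so the greedy merge loop, the cosine similarity on mean feature vectors, and the `threshold` value are illustrative assumptions rather than the patented procedure.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def count_speakers(categories, threshold=0.9):
    """Merge the most similar pair of categories until no pair reaches
    `threshold`; the number of remaining categories is the estimate.

    `categories` is a list of categories, each a list of per-segment
    feature vectors (hypothetical input layout).
    """
    cats = [list(c) for c in categories if c]
    while len(cats) > 1:
        means = [mean_vector(c) for c in cats]
        best_score, best_i, best_j = max(
            (cosine(means[i], means[j]), i, j)
            for i in range(len(cats))
            for j in range(i + 1, len(cats))
        )
        if best_score < threshold:
            break  # every remaining pair is judged to be different speakers
        cats[best_i].extend(cats[best_j])
        del cats[best_j]
    return len(cats)


# Three initial categories; the first and third have nearly identical features.
initial = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
    [[1.0, 0.05]],
]
estimated = count_speakers(initial)
```

Starting from a deliberately generous number of categories and merging downward means an over-split speaker can be recovered by the similarity comparison, which is one plausible reading of why the abstract clusters to a specified number first.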

Description

Technical field

[0001] The invention relates to the field of speech signal processing, and in particular to a method and system for judging the number of speakers.

Background

[0002] With the continuous development of speech signal processing technology, its targets have gradually come to include dual-speaker scenes, such as telephone recordings, and even multi-speaker scenes, such as meeting recordings. In addition, the data being processed has gradually extended from short audio of several seconds to tens of seconds to long audio of tens of minutes or even hours. For dual-speaker or multi-speaker scenarios, especially with long audio, the recognition quality of voice recordings is closely tied to the quality of speaker separation. Accurately judging the number of speakers helps analyze the recording scenario and optimize speaker separation, so as to formulate correspond...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G10L15/02, G10L15/14, G10L17/02
Inventor: 何山, 殷兵, 潘青华, 胡国平, 胡郁, 刘庆峰
Owner: IFLYTEK CO LTD