Speaker counting method and system

A speaker counting technology, applied in the field of voice detection, that addresses two problems: hand-designed traditional acoustic features cannot capture speech features that are highly correlated with speaker counting or aliased (overlapped) speech detection, and such features tend to fall into local optima.

Active Publication Date: 2022-05-13
AISPEECH CO LTD

AI Technical Summary

Problems solved by technology

[0005] The invention aims to solve at least the following problems in the prior art: traditional hand-designed acoustic features cannot extract speech features that are highly correlated with speaker counting or aliased (overlapped) speech detection, and such hand-designed features tend to fall into local optima.



Embodiment Construction

[0044] To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

[0045] Figure 1 shows a flow chart of a speaker counting method provided by an embodiment of the present invention, which includes the following steps:

[0046] S11: Establish an end-to-end speaker counting model based on a deep convolutional neural network;

[0047] S12: Using the original audio waveform as an input of the end-to-end spea...
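Steps S11 and S12 can be illustrated with a minimal sketch of the forward pass of a classifier that consumes raw waveform samples directly. This is not the patented architecture, which the excerpt does not specify: the single convolution layer, the filter count, the kernel width, the stride, the class layout (one class per count from 0 to `max_speakers`) and the untrained random weights are all assumptions chosen to keep the example small.

```python
import random

def conv1d(signal, kernels, stride):
    """Valid 1-D convolution over raw samples; one output channel per kernel."""
    width = len(kernels[0])
    return [
        [sum(k[j] * signal[i + j] for j in range(width))
         for i in range(0, len(signal) - width + 1, stride)]
        for k in kernels
    ]

def relu(channels):
    return [[max(0.0, v) for v in ch] for ch in channels]

def global_avg_pool(channels):
    """Collapse the time axis so any waveform length maps to a fixed-size vector."""
    return [sum(ch) / len(ch) for ch in channels]

def linear(features, weights, bias):
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def build_model(num_filters=4, kernel_width=16, max_speakers=4, seed=0):
    """Hypothetical, untrained model: random filters plus a linear classifier."""
    rng = random.Random(seed)
    kernels = [[rng.uniform(-0.1, 0.1) for _ in range(kernel_width)]
               for _ in range(num_filters)]
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(num_filters)]
               for _ in range(max_speakers + 1)]  # classes 0..max_speakers
    bias = [0.0] * (max_speakers + 1)
    return {"kernels": kernels, "weights": weights, "bias": bias}

def count_speakers(model, waveform, stride=8):
    """Map raw waveform samples to (predicted speaker count, class logits)."""
    feats = global_avg_pool(relu(conv1d(waveform, model["kernels"], stride)))
    logits = linear(feats, model["weights"], model["bias"])
    count = max(range(len(logits)), key=logits.__getitem__)
    return count, logits

# Usage: 0.1 s of synthetic noise standing in for 16 kHz audio (assumed rate).
rng = random.Random(1)
waveform = [rng.uniform(-1.0, 1.0) for _ in range(1600)]
count, logits = count_speakers(build_model(), waveform)
```

With random weights the predicted count is meaningless; the point is only that no hand-designed feature extraction sits between the waveform and the network.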


Abstract

Embodiments of the present invention provide a speaker counting method. The method includes: establishing an end-to-end speaker counting model based on a deep convolutional neural network; using the original audio waveform as the input of the end-to-end speaker counting model; and determining the number of speakers according to the output of the end-to-end speaker counting model. Embodiments of the present invention also provide a speaker counting system, as well as an optimization method and system for an aliased speech detection model. The embodiments perform end-to-end aliased (overlapped) speech detection and speaker counting on the original waveform input, using the neural network to extract deep features directly from the original speech for the subsequent tasks. Features that match the corresponding tasks are therefore easier to obtain, and the determination is more accurate. The approach is better suited to real-life scenes in which multiple people speak at the same time, and it provides additional information for the back-end speech processing system, thereby promoting the recognition, separation and enhancement of aliased speech.
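One way to read "determining the number of speakers according to the output" is as a decoding step over per-class scores. The sketch below assumes the model emits one logit per speaker count (index 0 for silence, 1 for a single speaker, and so on) and treats any count of two or more as aliased speech. The class layout and the `overlap_threshold` parameter are illustrative assumptions, not details taken from the patent.

```python
import math

def softmax(logits):
    """Numerically stable softmax over class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits, overlap_threshold=0.5):
    """Return (speaker count, P(aliased speech), aliased-speech flag).

    Assumes class index == number of simultaneous speakers, so the
    probability of aliased speech is the mass on classes >= 2.
    """
    probs = softmax(logits)
    count = max(range(len(probs)), key=probs.__getitem__)
    p_overlap = sum(probs[2:])
    return count, p_overlap, p_overlap >= overlap_threshold

# Usage: logits that strongly favour the "two speakers" class.
count, p_overlap, is_overlap = decode([0.0, 0.0, 5.0])
```

Under this assumed layout, the same output vector serves both tasks named in the abstract: the arg-max gives the count, and the summed tail gives an aliased-speech score.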

Description

Technical field

[0001] The invention relates to the field of voice detection, in particular to a speaker counting method and system.

Background art

[0002] Although intelligent speech technology is constantly developing, the performance of speech processing systems still degrades severely under complex conditions, for example a cocktail-party scene in which multiple talkers speak at the same time and other background noise is present. In this case, if the number of speakers is known in advance, the performance of overlapped speech processing can be improved significantly, so accurate overlapped speech detection and speaker counting are very useful for subsequent speech detection and recognition. To address these problems, a speaker counting method based on a convolutional neural network is usually used, whose input is hand-designed acoustic features, including the speech signal envelope, histogram, Mel-frequency cepstral coefficients, ...
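For contrast with the raw-waveform approach, the following sketch shows what two of the hand-designed features named above might look like in their simplest form: a frame-wise amplitude envelope and a histogram over it. The frame length, hop size, bin count and the peak-of-absolute-value definition of the envelope are assumptions for illustration; the prior-art systems referred to in the text may define these features differently.

```python
def amplitude_envelope(signal, frame_len=160, hop=80):
    """Frame-wise peak of |x|: a simple hand-designed envelope feature."""
    return [
        max(abs(v) for v in signal[i:i + frame_len])
        for i in range(0, len(signal) - frame_len + 1, hop)
    ]

def histogram(values, bins=5, lo=0.0, hi=1.0):
    """Count values per equal-width bin; out-of-range values are clamped."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1
    return counts

# Usage on a toy signal with tiny frames so the result is easy to follow.
env = amplitude_envelope([0, 1, -2, 3], frame_len=2, hop=2)  # → [1, 3]
hist = histogram([0.0, 0.1, 0.5, 1.0], bins=2)               # → [2, 2]
```

Every choice here (frame size, bin edges, the envelope definition itself) is fixed by the designer before any training, which is the limitation the invention aims to remove by learning features from the waveform.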


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/62, G06N3/04, G10L25/30, G10L25/51
CPC: G10L25/30, G10L25/51, G06N3/045, G06F18/214
Inventor: 钱彦旻, 张王优, 孙曼, 王岚
Owner AISPEECH CO LTD