Microphone array and binocular camera-based speaker positioning and recognizing method

A technology combining a binocular camera and a microphone array, applied in the fields of positioning, character and pattern recognition, and image analysis. It addresses the limited shooting area of a camera and the increased hardware cost and software resource usage that come with adding cameras, and it achieves reduced software overhead, a high refresh rate, and accurate and reliable recognition results.

Active Publication Date: 2018-11-02
SOUTHEAST UNIV

AI Technical Summary

Problems solved by technology

Because of the characteristics of a camera, the objects that can be located and identified are limited to the area the camera can capture.
The shooting area of a single camera is very limited, and adding more cameras greatly increases the hardware cost and the software resources consumed during image processing.



Embodiment Construction

[0021] As shown in Figure 1, a speaker localization and recognition method based on a microphone array and a binocular camera includes the following steps:

[0022] (1) Set up the microphone array, the binocular camera, and the rotating platform that carries the binocular camera; build the target face database and the i-vector database; and train the image-based face recognition model and the audio-based speaker recognition model;

[0023] (2) Fix the binocular camera and the microphone array on the rotating platform, and calculate the confidences w_V and w_A of the face recognition model and the speaker recognition model, respectively, in the current environment;

[0024] (3) The microphone array first records a segment of audio and calculates its average power, which is taken as the ambient power;

[0025] (4) When the microphone array detects that the difference between the current power and the ambient power is greater than a certain threshold, and the duration is greater than a certain threshol...
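Steps (3) and (4) can be illustrated with a minimal sketch. It assumes that power is measured as the mean squared amplitude of each frame and that duration is counted in consecutive frames. The names `average_power`, `detect_speech_onset`, `power_margin`, and `min_frames` are hypothetical, and the threshold values are placeholders; the source does not give them, and what the method does after detection is not shown in this excerpt.

```python
def average_power(samples):
    """Mean squared amplitude of a block of audio samples."""
    return sum(s * s for s in samples) / len(samples)


def detect_speech_onset(frames, ambient_power, power_margin, min_frames):
    """Return the index of the first frame in a run of at least `min_frames`
    consecutive frames whose power exceeds the ambient power by more than
    `power_margin`, or None if there is no such run."""
    run_start = None
    for i, frame in enumerate(frames):
        if average_power(frame) - ambient_power > power_margin:
            if run_start is None:
                run_start = i
            if i - run_start + 1 >= min_frames:
                return run_start
        else:
            # The run was interrupted, so start counting again.
            run_start = None
    return None


# Step (3): estimate the ambient power from an initial quiet recording.
quiet_recording = [0.01, -0.02, 0.015, -0.01] * 4
ambient = average_power(quiet_recording)

# Step (4): three quiet frames followed by three loud frames.
quiet_frame = [0.01, -0.01, 0.02, -0.02]
loud_frame = [0.5, -0.4, 0.6, -0.5]
frames = [quiet_frame] * 3 + [loud_frame] * 3
onset = detect_speech_onset(frames, ambient, power_margin=0.05, min_frames=2)
print(onset)
```

Requiring a minimum run length keeps short noises such as a door slam from triggering the detector, which is presumably why the step combines a power threshold with a duration threshold.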


Abstract

The invention discloses a speaker positioning and recognition method based on a microphone array and a binocular camera. A microphone array and the TDOA method, which occupy relatively few resources, are used for rough positioning of the speaker; accurate positioning is then carried out with a binocular camera, which has higher precision but a narrow field of view. Software overhead is thereby greatly reduced while accurate 360-degree omnidirectional sound source positioning is still achieved, so the system can obtain a higher refresh rate or execute other tasks at idle time. In addition, a dynamically adjusted weighted average is used to balance the recognition results from sound and image, so that the recognition results finally output by the system are more accurate and reliable.
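The weighted-average fusion of the two recognition results might look like the following minimal sketch. It assumes each model returns a score per enrolled identity and that the two model confidences from step (2) are used directly as weights, here called `w_v` (face model) and `w_a` (speaker model). The function names, the score dictionaries, and the normalisation are assumptions for illustration; the excerpt does not show how the weights are adjusted dynamically.

```python
def fuse_scores(face_scores, voice_scores, w_v, w_a):
    """Combine per-identity scores with a normalised weighted average.

    An identity missing from one model is treated as scoring 0.0 there.
    """
    total = w_v + w_a
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    identities = set(face_scores) | set(voice_scores)
    return {
        name: (w_v * face_scores.get(name, 0.0)
               + w_a * voice_scores.get(name, 0.0)) / total
        for name in identities
    }


def best_identity(fused):
    """Return the identity with the highest fused score."""
    return max(fused, key=fused.get)


# Hypothetical scores: the two models disagree about who is speaking.
face = {"alice": 0.6, "bob": 0.4}
voice = {"alice": 0.3, "bob": 0.7}

# Good lighting, noisy room: trust the face model more.
fused = fuse_scores(face, voice, w_v=0.8, w_a=0.2)
speaker = best_identity(fused)
print(speaker)
```

With the weights reversed (`w_v=0.2`, `w_a=0.8`) the same scores select the other identity, which is the trade-off the abstract describes: the environment decides which modality dominates.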

Description

Technical field

[0001] The invention relates to the technical field of fusing sound and image information, in particular to a speaker positioning and recognition method based on a microphone array and a binocular camera.

Background

[0002] At present, sound source localization technology based on microphone arrays is relatively mature, and products based on it are already on the market, such as Amazon's Alexa and iFLYTEK's six-microphone ring array voice positioning and recognition module. The most commonly used sound source localization method is TDOA (Time Difference of Arrival), which uses GCC (Generalized Cross Correlation) to estimate the differences in the time at which the sound reaches the different microphones in the array, and then applies geometric positioning, using the positions of the microphones in the array, to determine the position of the source. However, the performance of the GCC method will decrease under reverberant ...
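The TDOA idea for a single microphone pair can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: it uses plain time-domain cross-correlation, which corresponds to GCC with no frequency weighting (practical systems often apply a weighting such as PHAT in the frequency domain), and it converts the delay to an angle with the standard far-field formula. The function names, the 16 kHz sample rate, and the 0.1 m microphone spacing are hypothetical.

```python
import math


def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag in samples by which sig_b trails sig_a, taken as the
    peak of the cross-correlation over lags in [-max_lag, max_lag]."""
    n = min(len(sig_a), len(sig_b))
    best_lag, best_value = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        value = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                value += sig_a[i] * sig_b[j]
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


def arrival_angle(delay_samples, sample_rate, mic_spacing, speed_of_sound=343.0):
    """Far-field angle of arrival in radians, measured from broadside,
    for one microphone pair separated by `mic_spacing` metres."""
    tau = delay_samples / sample_rate
    # Clamp to the valid asin domain in case noise gives an impossible delay.
    ratio = max(-1.0, min(1.0, speed_of_sound * tau / mic_spacing))
    return math.asin(ratio)


# A short pulse that reaches microphone B three samples after microphone A.
sig_a = [0.0] * 20
sig_a[5:8] = [1.0, 0.5, 0.25]
sig_b = [0.0] * 3 + sig_a[:-3]

delay = estimate_delay(sig_a, sig_b, max_lag=5)
angle = arrival_angle(delay, sample_rate=16000, mic_spacing=0.1)
print(delay, math.degrees(angle))
```

One pair only constrains the direction to a cone around the pair's axis; an array combines the delays from several pairs with the known microphone geometry to resolve the source position, as the paragraph above describes.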


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06T7/70, G06K9/00, G01S5/22, G10L17/04, G06N3/04
CPC: G06T7/70, G10L17/04, G01S5/22, G06V40/161, G06N3/045
Inventor: 莫凌飞, 李英昊, 厉叶
Owner: SOUTHEAST UNIV