
Audio signal processing method and device and electronic equipment

An audio signal processing technology, applied in the field of audio signal recognition, that addresses the high error rate of manual transcription. It helps guarantee real-time performance, improves recording efficiency and accuracy, and reduces workload.

Pending Publication Date: 2022-04-22
ALIBABA GRP HLDG LTD

AI Technical Summary

Problems solved by technology

[0003] Although some speech recognition products exist in the prior art, they usually only convert the collected speech signals into text. A clerk is still required to segment the text string, mark the specific speaker information, and so on. A lot of manual work therefore remains, and the error rate is still high.



Examples

Embodiment 1

[0078] Embodiment 1 provides an audio signal processing method for the multi-person speaking scene shown in Figure 1. Referring to Figure 2, the method may specifically include:

[0079] S210: Perform speech recognition and sound source localization on the audio signal collected in a multi-person speaking scene; wherein, when performing sound source localization on the audio signal, the following processing is performed for each signal frame in the audio signal:

[0080] Obtaining the direction-of-arrival (DOA) spectrogram information of the current signal frame and of a target number of signal frames before and after it to form a matrix spectrogram, and smoothing the matrix spectrogram;

[0081] Determining the sound source localization result of the current signal frame according to the angle corresponding to the value that satisfies the target condition in the smoothed DOA spectrogram corresponding to the current signal frame.
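The per-frame processing in [0080] and [0081] can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes the DOA spectrogram is already available as an array of shape (frames, angles), that "smoothing" is a plain average over the neighboring frames, and that the "target condition" is the maximum value. The function name localize_frames and the parameter k (the target number of frames on each side) are hypothetical.

```python
import numpy as np

def localize_frames(doa_spec, k=2, angles=None):
    """Estimate one source angle per frame from a DOA spectrogram.

    doa_spec: array of shape (n_frames, n_angles), one DOA spectrum per frame.
    k: number of frames taken before and after the current frame (assumed).
    angles: angle in degrees for each column; defaults to an even 0..360 grid.
    """
    doa_spec = np.asarray(doa_spec, dtype=float)
    n_frames, n_angles = doa_spec.shape
    if angles is None:
        angles = np.arange(n_angles) * (360.0 / n_angles)
    result = np.empty(n_frames)
    for t in range(n_frames):
        # Matrix spectrogram: current frame plus up to k neighbors per side
        # (the window is simply truncated at the start and end of the signal).
        lo, hi = max(0, t - k), min(n_frames, t + k + 1)
        window = doa_spec[lo:hi]
        # Assumed smoothing: average each angle bin over the window.
        smoothed = window.mean(axis=0)
        # Assumed target condition: the largest smoothed value.
        result[t] = angles[np.argmax(smoothed)]
    return result
```

With this kind of smoothing, a single frame whose raw spectrum peaks at a spurious angle is outvoted by its neighbors, which is one plausible reason for forming the matrix spectrogram before picking the angle.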

[0082] Among them, with regard to ...

Embodiment 2

[0105] Embodiment 1 above describes an information processing method for a specific multi-person speaking scene. It involves a specific sound source localization method that can also be used in other application scenarios. For this reason, Embodiment 2 of the present application separately provides a sound source localization method. Referring to Figure 4, the method may specifically include:

[0106] S410: Determine the audio signal to be processed;

[0107] There may be many kinds of audio signals to be processed, for example, they may be audio signals collected in real time in a certain scene, or they may be recording results, and so on.

[0108] S420: Obtain DOA spectrogram information of the current signal frame in the audio signal and of a target number of signal frames before and after it;

[0109] S430: Perform smoothing processing on a matrix spectrogram composed of direction-of-arrival spectrogram information of the signal frame and signa...

Embodiment 3

[0112] Embodiment 3 provides a specific application solution for a conference scene in which multiple people speak. Specifically, it provides a method for generating meeting minutes. Referring to Figure 5, the method may include:

[0113] S510: Perform speech recognition and sound source localization on the audio signal collected in a conference scene where multiple people speak; wherein, when performing sound source localization on the audio signal, the following processing is performed for each signal frame in the audio signal:

[0114] Obtaining the DOA spectrogram information of the current signal frame and of a target number of signal frames before and after it to form a matrix spectrogram, and smoothing the matrix spectrogram;

[0115] Determine the sound source localization result of the current signal frame according to the angle corresponding to the value satisfying the target condition in the smoothed DOA spectrogram corresponding to the...


Abstract

Embodiments of the invention disclose an audio signal processing method and apparatus, and an electronic device. The method comprises: performing speech recognition and sound source localization on an audio signal acquired in a multi-person speaking scene, wherein, during sound source localization, direction-of-arrival (DOA) spectrogram information of a current signal frame and of a target number of signal frames before and after it is acquired to form a matrix spectrogram, and the matrix spectrogram is smoothed; determining a sound source localization result for the current signal frame according to the angle corresponding to a value that meets a target condition in the smoothed DOA spectrogram of the current signal frame; and determining where speaker change events occur according to the sound source localization results of a plurality of signal frames, and segmenting the text obtained by speech recognition at those positions. According to the embodiments, the efficiency and accuracy of conference recording can be improved, and the workload of the recording staff is reduced.
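The last step of the abstract, locating speaker change events from per-frame localization results and splitting the recognized text there, might look roughly like the sketch below. The abstract does not say how a change event is detected, so everything specific here is an assumption for illustration: a change is declared when min_run consecutive frames all differ from the current speaker's angle by more than threshold_deg, and each recognized word carries a start time in seconds. All function and parameter names are hypothetical.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def find_change_points(frame_angles, threshold_deg=30.0, min_run=3):
    """Return frame indices at which the speaker is assumed to change.

    A change is declared at frame i when frames i .. i+min_run-1 all lie
    more than threshold_deg away from the current reference angle.
    """
    changes = []
    if len(frame_angles) == 0:
        return changes
    ref = frame_angles[0]
    i, n = 1, len(frame_angles)
    while i < n:
        run = frame_angles[i:i + min_run]
        if len(run) == min_run and all(
            angle_diff(a, ref) > threshold_deg for a in run
        ):
            changes.append(i)
            ref = frame_angles[i]  # new speaker becomes the reference
            i += min_run
        else:
            i += 1
    return changes

def split_transcript(words, change_frames, frame_hop_s=0.01):
    """Split (word, start_time_s) pairs into one text segment per speaker turn."""
    boundaries = [f * frame_hop_s for f in change_frames]
    segments = [[] for _ in range(len(boundaries) + 1)]
    for text, start_s in words:
        # A word belongs to the segment after every boundary it has passed.
        idx = sum(1 for b in boundaries if start_s >= b)
        segments[idx].append(text)
    return [" ".join(s) for s in segments]
```

Requiring several consecutive deviating frames is a simple design choice to keep one noisy localization result from being mistaken for a new speaker.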

Description

Technical field

[0001] The present application relates to the technical field of audio signal recognition, and in particular to an audio signal processing method, an apparatus, and an electronic device.

Background

[0002] In conferences, court hearings, and other scenarios where many people speak, the content usually needs to be recorded. Traditionally, a dedicated clerk records on the spot, noting both what was said and who said it, typically by typing the speech as heard into a computer. This places high demands on the clerk's professional skill and concentration. Whenever the speaker changes or someone interrupts ("snatches the talk"), the clerk must update the speaker information promptly while continuing to record the speech. Therefore, there may be cases ...


Application Information

IPC(8): G10L15/26, G06F40/289, G01S5/22
CPC: G10L15/26, G06F40/289, G01S5/22
Inventor: 郑斯奇, 索宏彬
Owner ALIBABA GRP HLDG LTD