Speech signal detection method, device, equipment and storage medium

A speech signal detection technology, applied in the fields of speech analysis, instruments, etc. It addresses the problems of low accuracy, difficulty in detecting the start endpoint of a speech signal, and failure to detect the end endpoint, and achieves the effects of adapting to background noise and improving accuracy.

Active Publication Date: 2021-10-15
北京如布科技有限公司

AI Technical Summary

Problems solved by technology

[0004] However, traditional VAD detection methods cannot adapt to background noise, and their accuracy is low.
With energy as the feature, if the energy at the beginning of the signal is relatively large, it is difficult for a traditional VAD detection method to detect the start endpoint of the speech signal.
In addition, if the overall background noise becomes very large after the start endpoint has been detected, the energy value rarely falls below the preset threshold, so the end endpoint cannot be detected.

Examples

Embodiment 1

[0035] Figure 1A is a flow chart of a speech signal detection method provided by Embodiment 1 of the present invention. This embodiment is applicable to accurately detecting a speech signal in an audio signal that contains noise. The method can be executed by the device provided in the embodiments of the present invention. The device can be implemented in software and/or hardware, and can either be integrated into a computing device or serve as a standalone device. Referring to Figure 1A, the method may specifically include:

[0036] S101. Acquire an audio signal, where the audio signal includes a voice signal.

[0037] In this embodiment, the audio signal may be obtained in real time from a recording device, an audio collection device such as a microphone, a communication device, or an audio storage device. The voice signal refers to an effective signal in the audio signal, which may specifically be a voice signal that needs to occupy call resources. ...

Embodiment 2

[0062] Figure 2A is a flow chart of a speech signal detection method provided by Embodiment 2 of the present invention. On the basis of Embodiment 1 above, this embodiment further explains in detail how the long-term feature value and the short-term feature value are determined from the feature value of each frame of the audio signal. Referring to Figure 2A, the method may specifically include:

[0063] S201. Acquire an audio signal, where the audio signal includes a voice signal.

[0064] S202. Extract the feature value of each frame of the audio signal.

[0065] Specifically, after the audio signal is acquired, the feature value of each frame can be extracted through a VAD detection method. Optionally, the feature value of each frame may be any one of time-domain energy, time-domain zero-crossing rate, logarithmic energy, spectral entropy, frequency-domain subband, and frequency-domain variance, selected according to actual conditions.
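The per-frame feature extraction in S202 can be illustrated with a minimal Python sketch. It covers only three of the listed features (time-domain energy, logarithmic energy, and time-domain zero-crossing rate). The function names, the non-overlapping framing, and the `eps` floor in the logarithm are assumptions made for illustration; the source does not specify a frame length, overlap, or exact formulas.

```python
import math
from typing import Callable, List, Sequence


def split_frames(samples: Sequence[float], frame_len: int) -> List[Sequence[float]]:
    """Split samples into non-overlapping frames (assumed framing);
    a trailing partial frame is dropped."""
    if frame_len < 1:
        raise ValueError("frame_len must be at least 1")
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def time_domain_energy(frame: Sequence[float]) -> float:
    """Sum of squared sample values in the frame."""
    return sum(x * x for x in frame)


def log_energy(frame: Sequence[float], eps: float = 1e-10) -> float:
    """Base-10 logarithm of the frame energy; eps (assumed) avoids log(0)."""
    return math.log10(time_domain_energy(frame) + eps)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def extract_features(samples: Sequence[float], frame_len: int,
                     feature: Callable[[Sequence[float]], float] = time_domain_energy
                     ) -> List[float]:
    """Apply one chosen feature function to every frame."""
    return [feature(frame) for frame in split_frames(samples, frame_len)]
```

Any one of the three functions can be passed as `feature`, which mirrors the text's statement that a single feature is selected according to actual conditions.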

...

Embodiment 3

[0079] Figure 3 is a flow chart of a speech signal detection method provided by Embodiment 3 of the present invention. On the basis of the above embodiments, this embodiment further explains in detail how the start point of the speech signal is determined. Referring to Figure 3, the method may specifically include:

[0080] S301. Acquire an audio signal, where the audio signal includes a voice signal.

[0081] S302. Determine a long-term feature value and a short-term feature value according to the feature value of each frame of the audio signal.

[0082] S303. If the feature value of the current frame is greater than the long-term feature value or the short-term feature value, and the feature value of every frame within the first duration starting from the current frame is also greater than the long-term feature value or the short-term feature value, then the current frame is taken as the start point of the speech signal.
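The start-point rule in S303 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `find_start_point` is hypothetical, the first duration is expressed as a frame count, and the long-term and short-term feature values are assumed to be fixed scalars for the scan (the source does not show how they are computed or updated).

```python
from typing import Optional, Sequence


def find_start_point(features: Sequence[float],
                     long_term: float,
                     short_term: float,
                     first_duration_frames: int) -> Optional[int]:
    """Return the index of the first frame that satisfies the S303 condition,
    or None if no frame does.

    A frame qualifies when its feature value, and that of every frame in the
    first-duration window starting at it, exceeds the long-term OR the
    short-term value. With scalar thresholds (an assumption), "greater than
    A or B" is the same as "greater than min(A, B)".
    """
    if first_duration_frames < 1:
        raise ValueError("first_duration_frames must be at least 1")
    threshold = min(long_term, short_term)
    n = first_duration_frames
    for i in range(len(features) - n + 1):
        # The window includes the current frame itself.
        if all(f > threshold for f in features[i:i + n]):
            return i
    return None
```

Requiring the whole window to stay above the threshold means a single loud frame, such as a click, does not trigger a start point.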

[0083] In order to ensure ...

Abstract

The embodiments of the invention disclose a speech signal detection method, device, equipment and storage medium. The method includes: acquiring an audio signal, wherein the audio signal includes a speech signal; determining a long-term feature value and a short-term feature value according to the feature value of each frame of the audio signal; determining the start point of the speech signal according to the long-term feature value, the short-term feature value and the feature value of the current frame; starting from the frame corresponding to the start point of the speech signal, taking the feature value of any frame after a first duration as a valley value and, with that frame as the end of the window, determining a peak value according to the feature values of the frames within a second duration, wherein the first duration is longer than the second duration; and determining the end point of the speech signal according to the peak value and the valley value. The technical solutions provided by the embodiments of the present invention can adapt to background noise and improve the accuracy of VAD detection.
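The peak and valley step of the abstract can be sketched in Python under explicit assumptions. The abstract states that the candidate frame's feature value is the valley and that the peak comes from the frames in the second-duration window ending at that frame, but it does not state how the peak is computed or how the two are compared. In this sketch the peak is assumed to be the window maximum and the end point is assumed to be the first frame whose valley falls below `ratio * peak`; the function name `find_end_point` and the `ratio` parameter are hypothetical.

```python
from typing import Optional, Sequence


def find_end_point(features: Sequence[float],
                   start: int,
                   first_duration_frames: int,
                   second_duration_frames: int,
                   ratio: float = 0.5) -> Optional[int]:
    """Scan frames after the first duration from `start` and return the first
    frame judged to be the end point, or None if none is found.

    Assumptions (not from the source): peak = max over the second-duration
    window ending at the candidate frame; end point when valley < ratio * peak.
    """
    if second_duration_frames < 1:
        raise ValueError("second_duration_frames must be at least 1")
    if first_duration_frames <= second_duration_frames:
        # The source requires the first duration to exceed the second.
        raise ValueError("first duration must be longer than second duration")
    for i in range(start + first_duration_frames, len(features)):
        valley = features[i]
        window = features[i - second_duration_frames + 1:i + 1]
        peak = max(window)
        if valley < ratio * peak:
            return i
    return None
```

Because the peak is taken from recent frames rather than from a fixed preset threshold, the comparison is relative to the current signal level, which is one plausible way to tolerate background noise that rises after the start point.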

Description

Technical field

[0001] Embodiments of the present invention relate to the technical field of voice signal processing, and in particular, to a voice signal detection method, device, equipment, and storage medium.

Background

[0002] With the development of artificial intelligence, speech recognition technology is becoming more and more mature, and is widely used to detect user speech. Among them, voice activity detection (Voice Activity Detection, VAD), also known as voice endpoint detection, is used to detect the presence or absence of voice in a noisy environment, and is an important link before voice recognition.

[0003] At present, VAD detection basically revolves around extracting and using speech features (such as: time-domain energy, time-domain zero-crossing rate, logarithmic energy, spectral entropy, frequency-domain subband or frequency-domain variance, etc.). The traditional VAD detection method is: the first few frames of the audio signal collected by ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L25/87, G10L25/84, G10L25/03
CPC: G10L25/03, G10L25/84, G10L25/87
Inventor: 刘东强, 徐燃, 雷宇
Owner: 北京如布科技有限公司