Broadband background noise and voice separation detection system and method

A background noise and speech detection technology, applied in the field of speech analysis. It addresses the misjudgment of noise as speech and the poor adaptability of existing methods, including poor adaptability to quiet environments, and achieves improved accuracy and a good detection effect.

Active Publication Date: 2017-03-15
成都启英泰伦科技有限公司

AI-Extracted Technical Summary

Problems solved by technology

[0003] The current mainstream methods of automatic voice endpoint detection rely on the short-term energy and the zero-crossing rate in the time domain, and on the mean square error of the frequency band energy in the frequency domain. Specifically, the short-term energy, zero-crossing rate, or mean square error of the frequency band energy is computed and then compared with an empirical threshold. Experiments have shown that comparing short-term energy or zero-crossing rate alone is not suitable for noisy environments, especially when the application environment changes, and even within the same environment the background noise also changes. The frequency band energy mean square error method, in turn, is not suitable for quiet environments.
[0004] Speech detection can also be performed separately according to the change of the average sound energy in the time domain and frequ...

Abstract

The invention relates to the field of information processing technology and sensing signal processing, and more particularly to a broadband background noise and voice separation detection system. The system comprises a current frame time-domain energy calculation circuit, a background noise calculation circuit, a time-domain voice detection long-and-short-time average energy comparison circuit, a frequency-domain voice detection long-and-short-time frequency-domain energy comparison circuit, a background noise comparison circuit, a sub-band energy distribution uniformity voice detection circuit, and a voice frame number counting circuit. The invention also discloses a broadband background noise and voice separation detection method. The system and the method use a three-level voice detection scheme, which gives a very good detection effect on both high-frequency and low-frequency background noise as well as on occasional intermittent line noise, and greatly improves the accuracy of voice detection in complicated noise environments.

Application Domain

Speech analysis

Technology Topic

Frame time, Three level

Image

  • Broadband background noise and voice separation detection system and method

Examples

Example Embodiment

[0032] The present invention is described in further detail below in conjunction with examples and specific implementations. This should not be understood as limiting the scope of the above subject matter of the present invention to the following examples. All technologies implemented on the basis of the content of the present invention fall within the scope of the present invention.
[0033] As shown in Figure 1, a broadband background noise and speech separation detection system comprises: a current frame time-frequency domain energy calculation circuit; a background noise calculation circuit connected to the current frame time-frequency domain energy calculation circuit; a time-domain speech detection long-and-short-time average energy comparison circuit and a frequency-domain speech detection long-and-short-time frequency-domain energy comparison circuit; a background noise comparison circuit connected to the background noise calculation circuit, the time-domain speech detection long-and-short-time average energy comparison circuit, and the frequency-domain speech detection long-and-short-time frequency-domain energy comparison circuit; a sub-band energy distribution uniformity voice detection circuit connected to both the time-domain and the frequency-domain comparison circuits; and a voice frame number counting circuit connected to the sub-band energy distribution uniformity voice detection circuit. The background noise calculation circuit is also connected to the sub-band energy distribution uniformity voice detection circuit, the voice frame number counting circuit, the time-domain speech detection long-and-short-time average energy comparison circuit, and the frequency-domain speech detection long-and-short-time frequency-domain energy comparison circuit. The voice frame number counting circuit consists of a time-width filter, which counts the number of voice frames. In this embodiment there is one time-width filter, and it is a voice frame counter.
[0034] As shown in Figure 2, a broadband background noise and speech separation detection method includes the following eleven steps:
[0035] Step 1: Load the sound data. The sound data is speech data in the time domain and is processed frame by frame. The frame duration is configurable, usually between 10 milliseconds and 50 milliseconds;
[0036] Step 2: Calculate the time-domain short-term energy and the time-domain long-term average energy. The time-domain short-term energy is the sum of the energy of the current frame of speech data in the time domain. The time-domain long-term average energy is obtained by accumulating the time-domain short-term energy over multiple frames and dividing by the number of frames accumulated;
[0037] Step 3: Perform an FFT (fast Fourier transform) on the current frame of speech data in the time domain, transforming it into sub-band speech data in the frequency domain;
[0038] Step 4: Calculate the frequency-domain short-term energy and the frequency-domain long-term average energy. The frequency-domain short-term energy is obtained by accumulating, for the current frame of sub-band speech data in the frequency domain, the sub-band energies within the frequency range where the main energy of the human voice is distributed. The frequency-domain long-term average energy is obtained by accumulating the frequency-domain short-term energy over multiple frames and dividing by the number of frames accumulated;
[0039] Step 5: Accumulate the background noise. The time-domain short-term energy of non-speech frames is sent to the background noise estimation unit for accumulation, and a new background noise value is output each time the accumulation reaches a set number of frames;
[0040] Step 6: Compare the background noise with the preset threshold 1. If it is greater than threshold 1, proceed to step 7; if it is less than threshold 1, proceed to step 8;
[0041] Step 7: Perform frequency-domain speech detection, which compares the frequency-domain short-term energy with the frequency-domain long-term average energy. If the frequency-domain short-term energy exceeds the frequency-domain long-term average energy by a certain margin, the frame is speech; otherwise it is non-speech. If it is speech, proceed to step 9; if it is non-speech, proceed to step 5 and step 11;
[0042] Step 8: Perform time-domain speech detection, which compares the time-domain short-term energy with the time-domain long-term average energy. If the time-domain short-term energy exceeds the time-domain long-term average energy by a certain margin, the frame is speech; otherwise it is non-speech. If it is speech, proceed to step 9; if it is non-speech, proceed to step 5 and step 11;
[0043] Step 9: Perform frequency-domain sub-band energy distribution uniformity detection. If the detected uniformity is high, the frame is speech; if it is low, the frame is non-speech. If it is speech, proceed to step 10; if it is non-speech, proceed to step 5 and step 11;
[0044] Step 10: The time-width filter counts the speech frames produced by step 9, that is, the number of consecutive frames in which the data is speech, and compares the count with the preset threshold 2. If the count is greater than threshold 2, the data is speech and processing goes directly to step 11; if the count is less than threshold 2, the data is non-speech and processing goes to step 5 and step 11;
[0045] Step 11: Output the detection result; the detection ends.
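The energy calculations in steps 1 to 4 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the circuit described in the patent: the function names, the 8 kHz sample rate, the 10 ms frame, and the 300 to 3400 Hz voice band are all chosen for the example, and a naive DFT stands in for the FFT so that the block needs only the standard library.

```python
import cmath
import math

def split_frames(samples, frame_len):
    """Step 1: cut the signal into non-overlapping frames (a short tail is dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def short_term_energy(frame):
    """Step 2: time-domain short-term energy, the sum of squared samples."""
    return sum(x * x for x in frame)

def long_term_average(energies):
    """Steps 2 and 4: accumulated short-term energies divided by the frame count."""
    return sum(energies) / len(energies)

def dft_bin_energies(frame):
    """Step 3: naive DFT (stand-in for an FFT); returns real^2 + imag^2 per bin."""
    n = len(frame)
    energies = []
    for k in range(n // 2 + 1):
        c = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
        energies.append(c.real ** 2 + c.imag ** 2)
    return energies

def voice_band_energy(bin_energies, sample_rate, frame_len,
                      low_hz=300.0, high_hz=3400.0):
    """Step 4: accumulate bin energies inside an ASSUMED voice band."""
    hz_per_bin = sample_rate / frame_len
    return sum(e for k, e in enumerate(bin_energies)
               if low_hz <= k * hz_per_bin <= high_hz)

# Example input: a 1 kHz tone at 8 kHz, three 10 ms frames of 80 samples each.
SAMPLE_RATE, FRAME_LEN = 8000, 80
signal = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(240)]
frames = split_frames(signal, FRAME_LEN)
time_energies = [short_term_energy(f) for f in frames]
freq_energies = [voice_band_energy(dft_bin_energies(f), SAMPLE_RATE, FRAME_LEN)
                 for f in frames]
time_long = long_term_average(time_energies)
freq_long = long_term_average(freq_energies)
```

How many frames feed the long-term averages is not stated in the text; the sketch simply averages whatever list it is given.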
[0046] Whenever any of steps 7 to 10 determines the result to be non-speech, step 5 is run on the non-speech data to generate the new background noise.
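The background noise update of step 5 and the detector selection of steps 6 to 8 might look like the sketch below. Several details are assumptions rather than statements from the text: the class and function names, the use of an average as the published noise value, the number of frames per update, and the multiplicative `ratio` used to model "exceeds by a certain margin".

```python
class BackgroundNoiseEstimator:
    """Step 5: accumulate non-speech frame energies and publish a new
    background noise value every `update_frames` frames (assumed parameter)."""

    def __init__(self, update_frames=4, initial_noise=0.0):
        self.update_frames = update_frames
        self.noise = initial_noise
        self._acc = 0.0
        self._count = 0

    def add_non_speech(self, time_energy):
        """Feed the time-domain short-term energy of one non-speech frame."""
        self._acc += time_energy
        self._count += 1
        if self._count == self.update_frames:
            # Publishing the average of the accumulated energies is an assumption.
            self.noise = self._acc / self._count
            self._acc, self._count = 0.0, 0
        return self.noise


def first_stage_is_speech(noise, threshold_1, time_short, time_long,
                          freq_short, freq_long, ratio=2.0):
    """Steps 6 to 8: choose the detector from the noise level, then compare
    the short-term energy with `ratio` times the long-term average."""
    if noise > threshold_1:
        # Step 7: noisy environment, use the frequency-domain comparison.
        return freq_short > ratio * freq_long
    # Step 8: quiet environment, use the time-domain comparison.
    return time_short > ratio * time_long


est = BackgroundNoiseEstimator(update_frames=2)
est.add_non_speech(1.0)   # only one frame so far, estimate stays at 0.0
est.add_non_speech(3.0)   # two frames accumulated, estimate becomes 2.0
noisy = first_stage_is_speech(est.noise, 1.0, 100.0, 1.0, 1.0, 1.0)
quiet = first_stage_is_speech(0.5, 1.0, 100.0, 1.0, 1.0, 1.0)
```

Here `noisy` uses the frequency-domain energies because the estimate exceeds threshold 1, while `quiet` falls back to the time-domain energies.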
[0047] In this embodiment, the calculation process of step 3 is as follows:
[0048] Assuming that the number of frequency-domain sub-bands is N, the average sub-band energy is Eavg = Etotal / N = (E1 + E2 + ... + EN) / N, where Eavg is the average sub-band energy, Etotal is the sum of all sub-band energies, and Ei is the energy of the i-th sub-band, i = 1, 2, ..., N. In the frequency domain, the sub-band energy is equal to the sum of the square of the real part and the square of the imaginary part.
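A small Python sketch of the sub-band energy and its average is given below. Grouping FFT bins into equal-width sub-bands is an assumption made for the example (the text does not say how bins map to sub-bands), and the function names are hypothetical.

```python
def subband_energies(spectrum, num_subbands):
    """Sum real^2 + imag^2 over the bins of each sub-band.

    Equal-width grouping is an assumption; bins left over after the last
    full sub-band are ignored.
    """
    per_band = len(spectrum) // num_subbands
    if per_band == 0:
        raise ValueError("need at least one bin per sub-band")
    return [sum(c.real ** 2 + c.imag ** 2
                for c in spectrum[b * per_band:(b + 1) * per_band])
            for b in range(num_subbands)]


def average_subband_energy(energies):
    """Eavg = Etotal / N, with Etotal the sum of all sub-band energies."""
    e_total = sum(energies)
    return e_total / len(energies)


spectrum = [3 + 4j, 0j, 1 + 0j, 0 + 1j]    # four complex bins
energies = subband_energies(spectrum, 2)   # [25.0, 2.0]
e_avg = average_subband_energy(energies)   # 13.5
```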
[0049] In this embodiment, the calculation process of step 9 is as follows:
[0050] The non-uniformity is found with the mean square error method. Let the energy of each sub-band be Ei; the non-uniformity is then nU = ((E1 - Eavg)^2 + (E2 - Eavg)^2 + ... + (EN - Eavg)^2) / N, where nU is the non-uniformity. With the threshold Th_nu set as the non-uniformity threshold, the frame is judged as speech when nU < Th_nu, in line with step 9 where high uniformity means speech; otherwise it is non-speech.
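The mean square error check can be sketched in Python as follows. The formula in the source is incomplete, so this assumes the usual definition (mean of the squared deviations from the average sub-band energy) and assumes that a value below Th_nu means speech, which matches the rule in step 9 that high uniformity indicates speech. The function names are hypothetical.

```python
def non_uniformity_mse(energies):
    """nU as the mean squared deviation of sub-band energies from their average."""
    e_avg = sum(energies) / len(energies)
    return sum((e - e_avg) ** 2 for e in energies) / len(energies)


def is_speech_by_uniformity(energies, th_nu):
    """Assumed decision rule: low non-uniformity (nU < Th_nu) means speech."""
    return non_uniformity_mse(energies) < th_nu


flat = [4.0, 4.0, 4.0, 4.0]      # energy spread evenly over the sub-bands
peaky = [16.0, 0.0, 0.0, 0.0]    # all energy in one sub-band, e.g. a tone
nu_flat = non_uniformity_mse(flat)     # 0.0
nu_peaky = non_uniformity_mse(peaky)   # 48.0
```

Because nU grows with the square of the energies, a fixed threshold only makes sense for a known signal level; the text does not describe any normalization.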
[0051] In other embodiments, the following two methods can be used for calculation:
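[0052] 1. Using the absolute value of the difference from the average, the formula is nU = (|E1 - Eavg| + |E2 - Eavg| + ... + |EN - Eavg|) / N, where nU is the non-uniformity and the threshold Th_nu is the non-uniformity threshold. The frame is judged as speech when nU < Th_nu; otherwise it is non-speech.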
[0053] 2. Count the sub-bands whose energy is close to the average sub-band energy. If many sub-bands have energy near the average, the frame is speech; otherwise it is non-speech. The specific formula is as follows, if: |Ei-Eavg| Th_u, the frame is judged as speech; otherwise it is non-speech.
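Both alternative methods can be sketched in Python. The formulas in the source are incomplete, so the details here are assumptions: method 1 is taken to be the mean absolute deviation from the average, and method 2 is read as counting the sub-bands within a tolerance of the average and comparing that count with Th_u. The tolerance `th_dev` and all function names are hypothetical and do not appear in the text.

```python
def non_uniformity_abs(energies):
    """Method 1 (assumed form): mean absolute deviation from the average."""
    e_avg = sum(energies) / len(energies)
    return sum(abs(e - e_avg) for e in energies) / len(energies)


def count_near_average(energies, th_dev):
    """Method 2: number of sub-bands with |Ei - Eavg| < th_dev (hypothetical tolerance)."""
    e_avg = sum(energies) / len(energies)
    return sum(1 for e in energies if abs(e - e_avg) < th_dev)


def is_speech_by_count(energies, th_dev, th_u):
    """Assumed rule: speech when more than th_u sub-bands lie near the average."""
    return count_near_average(energies, th_dev) > th_u


even = [4.0, 5.0, 3.0, 4.0]
peaky = [16.0, 0.0, 0.0, 0.0]
abs_even = non_uniformity_abs(even)     # 0.5
abs_peaky = non_uniformity_abs(peaky)   # 6.0
```

With `th_dev=1.5` and `th_u=2`, all four sub-bands of `even` count as near the average, while none of the sub-bands of `peaky` do.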
[0054] The detailed calculation process of step 10 in this embodiment is as follows:
[0055] A voice frame counter is set up, with an initial value of 0. It is cleared to zero on a non-voice frame and incremented by 1 on a voice frame. On a transition from a non-voice frame to a voice frame, the sequence number of the first voice frame is recorded as the start address of the voice segment. When the value of the voice frame counter exceeds threshold 2, the consecutive frames starting from the first voice frame are judged as voice frames until a non-voice frame appears. If a non-voice frame arrives while the counter value is still less than threshold 2, the preceding voice frames are also judged as non-voice frames.
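The counter logic above can be sketched as an offline Python function that takes the per-frame decisions from step 9 and returns the filtered decisions. The function name is hypothetical, and treating a run whose length equals threshold 2 as non-voice is an assumption, since the text only covers the "greater than" and "less than" cases.

```python
def time_width_filter(frame_flags, threshold_2):
    """Keep only runs of voice frames longer than threshold_2 frames.

    frame_flags: list of booleans, True for a frame judged as voice in step 9.
    Returns a list of the same length with short runs turned into non-voice.
    """
    result = [False] * len(frame_flags)
    counter = 0   # voice frame counter, starts at 0
    start = 0     # index of the first voice frame of the current run
    for i, is_voice in enumerate(frame_flags):
        if is_voice:
            if counter == 0:
                start = i                  # non-voice to voice transition
            counter += 1
            if counter == threshold_2 + 1:
                # Run just became long enough: confirm it from its first frame.
                for j in range(start, i + 1):
                    result[j] = True
            elif counter > threshold_2 + 1:
                result[i] = True           # run already confirmed, extend it
        else:
            counter = 0                    # unconfirmed frames stay non-voice
    return result


flags = [True, True, False, True, True, True, True, False]
filtered = time_width_filter(flags, threshold_2=2)
# filtered == [False, False, False, True, True, True, True, False]
```

The first run has only two voice frames, which does not exceed threshold 2, so it is rejected; the second run of four frames is accepted from its first frame.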
