Method for testing silent frame

A silent-frame and speech-frame technology, applied in speech analysis, instruments, digital transmission systems, and similar fields, that addresses inaccurate silent frame detection and improves detection accuracy and precision.

Active Publication Date: 2008-07-02
TENCENT TECH (SHENZHEN) CO LTD

AI-Extracted Technical Summary

Problems solved by technology

[0011] In view of this, the main purpose of the present invention is to provide a method for detecting silent frames, to eliminate...

Method used

In step 101, the data frame is divided into data sub-frames using the minimum processing duration as the unit. On the one hand, data frames with different packing durations can thus be divided into data sub-frames of the same size for silent frame detection, which gives a uniform silent frame detection accuracy. On the other hand, because the embodiment of the present invention divides the data sub-frames using the minimum processing duration as the unit, silent frame detection is considerably more accurate than detection performed on sub-frames divided using a longer duration, and errors are avoided;

Abstract

The invention discloses a method for detecting silent frames. A background noise energy value variable is set; whether the variable needs to be changed is judged according to the minimum energy of the sampling points in the currently obtained data frame, and the variable is adjusted accordingly when a change is needed; the variable is then used to detect silent frames. Because the invention adopts a background noise energy value variable and changes its value in real time according to current needs, it can adapt to the current environment and device requirements of silent frame detection, which makes silent frame detection more accurate.

Application Domain

Speech analysis; Data switching networks

Technology Topic

Time changes; Background noise

Example Embodiment

[0049] The present invention is a method for detecting a silent frame. The method sets a background noise energy value variable, determines whether this variable needs to be changed according to the minimum energy of the sampling points in the currently obtained data frame, adjusts the variable accordingly when a change is needed, and then uses the variable to perform silent frame detection. Because the present invention adopts a background noise energy value variable and changes its value in real time according to current needs, it can adapt to the current environment and the device on which silent frame detection runs, which makes silent frame detection more accurate.
[0050] The present invention will be described in detail below in conjunction with the accompanying drawings.
[0051] Referring to Figure 1, the present invention is implemented through the following steps:
[0052] Step 101: Divide the current data frame into data sub-frames;
[0053] Step 102: From the current data frame, select a data sub-frame that has not yet undergone silent frame detection as the currently detected data sub-frame, sample the data sub-frame, and calculate the minimum energy value and the maximum energy value among its sampling points;
[0054] Step 103: Compare the minimum energy value with the background noise energy value variable and determine from the comparison result whether the background noise energy value variable currently needs to be changed; if so, increase or decrease the background noise energy value variable accordingly;
[0055] Step 104: Calculate the upper limit of the noise energy value from the current value of the background noise energy value variable, and determine whether the maximum energy value from step 102 is less than the upper limit; if so, the currently detected data sub-frame is a silent frame, otherwise it is a speech frame;
[0056] Step 105: Determine whether the current data frame contains other data sub-frames that have not yet undergone silent frame detection. If so, return to step 102 until all data sub-frames in the data frame have been detected; otherwise, go to step 106;
[0057] Step 106: Determine whether the data sub-frames in the current data frame are all silent frames; if so, the current data frame is a silent frame, otherwise it is a speech frame.
[0058] The calculation of the maximum energy value in step 102 may instead be performed at any point after step 102 and before the judgment in step 104, without affecting the implementation of the present invention.
[0059] The specific implementation of the above steps will be described below in conjunction with specific examples.
[0060] In the embodiment of the present invention, the voice stream has a sampling rate of 8000 Hz and is quantized as signed 16-bit samples. In other embodiments of the present invention, silent frame detection may also be performed on voice streams with other sampling rates and quantization levels, without affecting the implementation of the present invention.
[0061] (1) The concrete realization of step 101:
[0062] In the embodiment of the present invention, in order to perform silent frame detection more accurately, a data frame is divided into multiple data sub-frames using a minimum processing duration as the unit. The embodiment uses 5 milliseconds as the minimum processing duration. Since the data stream is sampled at 8000 Hz with 16-bit quantization, a 5 millisecond data sub-frame contains 5×8000/1000=40 sample points. In other embodiments of the present invention, other durations may also be used as the unit for dividing a data frame into data sub-frames, without affecting the implementation of the present invention;
[0063] Dividing the data frame into data sub-frames in units of the minimum processing duration in step 101 has two benefits. On the one hand, data frames with different packing durations can be divided into data sub-frames of the same size for silent frame detection, which gives a uniform silent frame detection accuracy. On the other hand, because the embodiment of the present invention uses the minimum processing duration as the unit, silent frame detection is considerably more accurate than detection performed on data sub-frames divided using a longer duration, and errors are avoided;
[0064] If uniform detection accuracy and improved detection accuracy as described above are not a concern, the entire data frame may also be treated as a single data sub-frame. In this case the data frame itself is the detected data sub-frame, and the silent frame detection of the data frame can still be carried out by executing the subsequent steps, without affecting the implementation of the present invention;
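By way of illustration only, the division described for step 101 can be sketched in Python as follows. The function name `split_into_subframes` and the decision to drop a trailing partial sub-frame are assumptions made for this sketch; the text specifies only the 5 millisecond unit and the 8000 Hz sampling rate.

```python
def split_into_subframes(samples, sample_rate_hz=8000, subframe_ms=5):
    """Split a data frame (a sequence of PCM samples) into equal sub-frames."""
    # 5 ms at 8000 Hz -> 5 * 8000 / 1000 = 40 sample points per sub-frame.
    n = subframe_ms * sample_rate_hz // 1000
    if n <= 0:
        raise ValueError("a sub-frame must contain at least one sample")
    # Assumption (not stated in the text): a trailing partial sub-frame is dropped.
    return [list(samples[i:i + n]) for i in range(0, len(samples) - n + 1, n)]


frame_20ms = list(range(160))           # a 20 ms frame at 8000 Hz
subframes = split_into_subframes(frame_20ms)
print(len(subframes), len(subframes[0]))  # prints: 4 40
```

Frames packed with different durations (for example 20 ms or 30 ms) yield different numbers of sub-frames, but every sub-frame has the same 40 sample points, which is the uniformity the step relies on.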
[0065] (2) The concrete realization of step 102:
[0066] In the embodiment of the present invention, data sub-frames are selected in sequence from the current data frame as the currently detected data sub-frame. The current data sub-frame is sampled, and the minimum energy value and the maximum energy value among its sampling points are then calculated in one of the following ways:
[0067] Method 1: Cross-subtract the absolute amplitude values of adjacent sampling points, calculate the energy of each sampling point from the absolute value of the difference, and then compare the energies of the sampling points to obtain the maximum energy value and the minimum energy value. This is implemented as follows:
[0068] Step a1: Take the absolute value of the amplitude of every sample point in the data sub-frame and calculate the average of these absolute values. Then subtract the absolute amplitude values of each pair of adjacent sample points and take the absolute value of the difference; this absolute value is used as the output value of one of the two adjacent sample points that produced it. Finally, calculate the difference between the average absolute amplitude of all sample points and the absolute amplitude of the first (or last) sample point, and take the absolute value of this difference as the output value of the first (or last) sample point;
[0069] In the embodiment of the present invention, the absolute value of the difference obtained by the subtraction is used as the output value of the later of the two adjacent sample points. For the first sample point, the difference between the amplitude of that sample point and the average absolute amplitude of all sample points is calculated, and the absolute value of this difference is used as the output value of the first sample point. In other embodiments of the present invention, the absolute value of the difference may instead be used as the output value of the earlier of the two adjacent sample points. In that case, for the last sample point, the difference between that sample point and the average absolute amplitude of all sample points is calculated, and the absolute value of this difference is used as the output value of the last sample point;
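The cross-subtraction of step a1 may be easier to follow as code. The following Python sketch follows the variant used in the embodiment, in which each difference is assigned to the later of the two adjacent sample points and the first sample point is compared against the average. The function name `cross_subtract` is hypothetical, and the use of the absolute amplitude for the first sample point follows the wording of step a1.

```python
def cross_subtract(samples):
    """Return the per-sample output values of step a1 (embodiment variant)."""
    if not samples:
        return []
    mags = [abs(s) for s in samples]     # absolute amplitude of each sample
    mean_mag = sum(mags) / len(mags)     # average absolute amplitude
    # First sample point: | mean - |x[0]| |
    outputs = [abs(mean_mag - mags[0])]
    # Each later sample point: | |x[i]| - |x[i-1]| |
    for prev, cur in zip(mags, mags[1:]):
        outputs.append(abs(cur - prev))
    return outputs


print(cross_subtract([3, -5, 4, -4]))  # prints: [1.0, 2, 1, 0]
```

In this example the absolute amplitudes are 3, 5, 4, 4 with an average of 4, so the first output is |4 − 3| = 1 and the remaining outputs are the absolute differences between neighbours.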
[0070] Step a2: Using the output value of each sample point obtained in step a1, calculate the energy value of each sample point according to formula (1);
[0071] Formula (1):
[0072] In formula (1), Energy_i represents the energy value of the i-th sample point, output[i] represents the output value of the i-th sample point, and i is a natural number denoting the sample point index;
[0073] Step a3: Compare the energy values of all sample points to obtain the maximum energy value and the minimum energy value;
[0074] In the embodiment of the present invention, the energy values of all sample points are first calculated in step a2 and then compared in step a3. In other embodiments of the present invention, the sample points may instead be processed one at a time: calculate the energy value of the current sample point according to formula (1), then compare it with the current minimum energy value and the current maximum energy value. If the energy value of the current sample point is less than the current minimum energy value, it is assigned to the current minimum energy value; if it is greater than the current maximum energy value, it is assigned to the current maximum energy value. This calculation and comparison is repeated until all sample points have been processed.
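The one-pass alternative described above can be sketched as follows. The function name `running_min_max` is hypothetical, and initialising both trackers with the first energy value is an assumption of this sketch; the text does not say how the current minimum and maximum start out.

```python
def running_min_max(energies):
    """Track the minimum and maximum energy value in a single pass."""
    it = iter(energies)
    try:
        first = next(it)
    except StopIteration:
        raise ValueError("at least one energy value is required") from None
    # Assumption: both trackers start from the first sample point's energy.
    cur_min = cur_max = first
    for e in it:
        if e < cur_min:      # new minimum found
            cur_min = e
        if e > cur_max:      # new maximum found
            cur_max = e
    return cur_min, cur_max


print(running_min_max([3, 1, 4, 1, 5]))  # prints: (1, 5)
```

Because the function accepts any iterable, the energy values can be produced lazily, one sample point at a time, without storing the whole sub-frame's energies first.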
[0075] Because the absolute amplitude values of the sample points are cross-subtracted, subsequent processing is based on the subtle changes between adjacent samples. This helps mask the differences between sound cards and allows silence to be judged more accurately in the presence of background noise.
[0076] In step 102, besides calculating the minimum energy value and the maximum energy value according to Method 1 above, Method 2 may also be used.
[0077] Method 2:
[0078] Take the absolute value of the amplitude of each sample point in the data sub-frame, use it to calculate the energy value of each sample point according to formula (2), and then compare the energy values of the sample points to obtain the minimum energy value and the maximum energy value;
[0079] Formula (2):
[0080] In formula (2), Energy_i represents the energy value of the i-th sample point, extent[i] represents the absolute value of the amplitude of the i-th sample point, and i is a natural number denoting the sample point index;
[0081] In Method 2, the energy values of the sample points may be compared with each other in the same manner as in Method 1, without affecting the implementation of the present invention.
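Method 2 can be sketched as below. Formula (2) itself is not reproduced in this text, so the conversion `energy_db` is purely a placeholder assumption (a 20·log10 decibel mapping, chosen only because the threshold in step 104 is stated in dB); it is passed in as a parameter so that the actual formula can be substituted. Both function names are hypothetical.

```python
import math


def energy_db(magnitude):
    """Hypothetical stand-in for formula (2); NOT taken from the source."""
    # Assumption: decibel mapping, with the magnitude floored at 1 to avoid log(0).
    return 20.0 * math.log10(max(magnitude, 1))


def min_max_energy_method2(samples, energy_fn=energy_db):
    """Method 2: energy from each sample's absolute amplitude, then min and max."""
    energies = [energy_fn(abs(s)) for s in samples]
    return min(energies), max(energies)


lo, hi = min_max_energy_method2([0, -10, 100, -1000])
print(round(lo, 6), round(hi, 6))
```

Unlike Method 1, no neighbouring samples are involved here: each sample point's energy depends only on its own absolute amplitude.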
[0082] (3) The concrete realization of step 103:
[0083] Determine whether the minimum energy value of the sample points in the data sub-frame is less than or equal to the current value of the background noise energy value variable. If so, the current background noise energy value is too large, and the minimum energy value is assigned to the background noise energy value variable. Also determine whether the minimum energy value of the sample points in the data sub-frame is greater than the current value of the background noise energy value variable. If so, the current background noise energy value is too small to reflect the current background conditions correctly, and the background noise energy value is increased;
[0084] In the embodiment of the present invention, in order to make the judgment more accurate, whether the minimum energy value is greater than the background noise energy value variable may be judged in the following manner:
[0085] Determine whether the minimum energy value of the sample points remains greater than the background noise energy value variable throughout a preset time; if so, increase the background noise energy value. The preset time corresponds to a certain number of data sub-frames. The purpose of using the preset time for this judgment is as follows:
[0086] The currently detected data frame may be either a speech frame or a silent frame. If it is a silent frame, the energy values of its sample points contain only the energy of the background noise, so the minimum energy value of the sample points in the silent frame can be compared with the background noise energy value, and the background noise energy value can be updated according to the comparison result. If the currently detected data frame is a speech frame, the energy values of its sample points contain not only the energy of the background noise but also that of the speech itself, so the minimum energy value of the sample points in the speech frame cannot correctly reflect the current background noise. Comparing the minimum energy value in a speech frame with the background noise energy value should therefore be avoided. Considering that a person engaged in voice communication cannot speak without interruption for an extended period, the present invention presets a time corresponding to the maximum length of continuous human speech, and then determines whether the minimum energy value of the sample points in the data frame remains greater than the background noise energy value throughout this period. If so, the minimum energy value of the sample points is greater than the background noise energy value even during silence, and the background noise energy value is increased accordingly. Otherwise, if the minimum energy value of the sample points in the data frame is greater than the background noise energy value only at a certain moment, the data frame may be a speech frame, and the background noise energy value is not increased;
[0087] In the embodiment of the present invention, in order to allow a smooth transition of the background noise energy value, the background noise energy value is increased as follows:
[0088] If the minimum energy value of the sample points of the data sub-frames remains greater than the background noise energy value variable throughout the preset period, the background noise energy value variable is increased by 1 and the timer is restarted. If the minimum energy value of the sample points of the data sub-frames is still greater than the background noise energy value variable throughout the next period, the variable is increased by 1 again, and so on, which achieves a smooth transition of the background noise energy value. In the embodiment of the present invention, the background noise energy value is increased by 1 each time. In other embodiments of the present invention, another constant may be added each time, without affecting the implementation of the present invention.
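The update rule of step 103 can be sketched as a small state machine. The class name `BackgroundNoiseTracker`, the counting of the preset time in sub-frames, and the example values are assumptions for illustration; the text gives no numeric value for the preset time or for the initial background noise energy value.

```python
class BackgroundNoiseTracker:
    """Adaptive background noise energy value (sketch of step 103)."""

    def __init__(self, initial_noise, hold_subframes, step=1):
        self.noise = initial_noise            # background noise energy value variable
        self.hold_subframes = hold_subframes  # preset time, counted in sub-frames (assumed)
        self.step = step                      # increment; 1 in the embodiment
        self._above_count = 0                 # consecutive sub-frames with min > noise

    def update(self, min_energy):
        if min_energy <= self.noise:
            # Estimate is too large: drop straight to the observed minimum.
            self.noise = min_energy
            self._above_count = 0
        else:
            # Estimate may be too small: raise it only after a sustained period.
            self._above_count += 1
            if self._above_count >= self.hold_subframes:
                self.noise += self.step
                self._above_count = 0         # restart the timing
        return self.noise


tracker = BackgroundNoiseTracker(initial_noise=10, hold_subframes=3)
print([tracker.update(e) for e in (12, 12, 12, 5)])  # prints: [10, 10, 11, 5]
```

The asymmetry is deliberate: decreases take effect immediately, while increases are applied in small steps and only after the minimum energy has stayed above the estimate for the whole preset period, so that a burst of speech does not inflate the noise estimate.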
[0089] (4) The concrete realization of step 104:
[0090] The background noise energy value variable represents only the lower limit of the background noise energy value, that is, the pure background sound. During a silent period there may be other types of noise in addition to the background sound, so the influence of the energy of these other types of noise should be taken into account in the silence detection process. An upper limit of the background noise energy value therefore needs to be calculated and used to distinguish normal speech from silence. In the embodiment of the present invention, a preset noise energy threshold constant is added to the background noise energy value variable to obtain the upper limit of the noise energy value. The value of the noise energy threshold constant can be determined through practical tests; in the embodiment of the present invention, it is 10 dB;
[0091] After the upper limit of the background noise energy value has been calculated, the maximum energy value is compared with the upper limit. In the embodiment of the present invention, for more accurate detection, it is determined whether the maximum energy value remains less than the upper limit throughout a preset time; if so, the currently detected data sub-frame is determined to be a silent frame. In the embodiment of the present invention, this preset time is 500 milliseconds.
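The decision of step 104 can be sketched as follows, using the 10 dB threshold constant and the 500 millisecond preset time from the embodiment. The class name `SilenceDetector`, and the interpretation of the preset time as a count of consecutive 5 millisecond sub-frames (500 / 5 = 100), are assumptions of this sketch.

```python
class SilenceDetector:
    """Sub-frame silence decision (sketch of step 104)."""

    def __init__(self, threshold_db=10, hold_ms=500, subframe_ms=5):
        self.threshold_db = threshold_db
        # Assumption: the preset time is measured in consecutive sub-frames.
        self.hold_subframes = hold_ms // subframe_ms   # 500 / 5 = 100
        self._below_count = 0

    def is_silent(self, max_energy, noise_floor):
        # Upper limit = background noise energy value + threshold constant.
        upper_limit = noise_floor + self.threshold_db
        if max_energy < upper_limit:
            self._below_count += 1
        else:
            self._below_count = 0                      # speech resets the run
        # Silent only once the maximum has stayed below the limit long enough.
        return self._below_count >= self.hold_subframes


detector = SilenceDetector(hold_ms=15)                 # 3 sub-frames, for a short demo
print([detector.is_silent(m, noise_floor=0) for m in (5, 5, 5, 20, 5)])
# prints: [False, False, True, False, False]
```

Requiring a sustained run below the upper limit keeps short pauses between syllables from being classified as silence.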
[0092] (5) The concrete realization of step 106:
[0093] When a data frame is divided into multiple data sub-frames, step 106 determines whether all data sub-frames in the data frame are silent frames; if so, the data frame is a silent frame. When the data frame is treated as a single data sub-frame, step 106 determines whether the data frame itself, as a data sub-frame, is a silent frame; if so, the data frame is determined to be a silent frame.
[0094] The above descriptions are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.
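The frame-level rule of step 106 reduces to a single check over the per-sub-frame results. In this Python sketch the function name `frame_is_silent` is hypothetical, and rejecting an empty list of results is an assumption; the text does not discuss a frame with no sub-frames.

```python
def frame_is_silent(subframe_flags):
    """A data frame is silent only if every one of its sub-frames is silent."""
    flags = list(subframe_flags)
    if not flags:
        # Assumption: a frame with no sub-frames is treated as an error.
        raise ValueError("a data frame must contain at least one sub-frame")
    return all(flags)


print(frame_is_silent([True, True, True, True]))   # prints: True
print(frame_is_silent([True, False, True, True]))  # prints: False
print(frame_is_silent([True]))                     # single sub-frame case: True
```

The last call corresponds to the case where the whole data frame is treated as one data sub-frame.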
