An Objective Evaluation Method of Speech Quality Based on Deep Neural Network

An objective evaluation technology based on a deep neural network, applied in speech analysis, instruments, etc. It addresses three problems of existing methods: they cannot evaluate variable-speed speech, they evaluate wideband signals less well than narrowband signals, and they cannot evaluate the quality of a target speech signal without a reference. The stated effect is accurate evaluation results.

Active Publication Date: 2019-12-17
INST OF ACOUSTICS CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0004] Although the PESQ method has great advantages over other methods, it still has limitations: a) it relies on a clean reference signal and cannot evaluate the quality of the target speech signal without one; b) it evaluates wideband signals less well than narrowband signals; c) even when a reference signal is available, it cannot evaluate variable-speed speech.

Examples

Embodiment 1

[0093] To test the network model, the present invention makes predictions on test set data. The test set uses the test part of the TIMIT speech library, which contains a total of 1672 clean speech sentences. Its speakers are different from those in the network training set. Noisy speech is generated from the test set at the same signal-to-noise ratios as the training set, but with noise types different from the noise samples used in training. In the experiment, the test data are predicted at each signal-to-noise ratio and compared with the scores obtained by the actual PESQ algorithm. The results are as follows (each value in the table is the average over all data at the same signal-to-noise ratio):
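The noisy test material described above is built by adding noise to clean speech at a fixed signal-to-noise ratio. The following is a minimal sketch of that mixing step, assuming the SNR is defined as the ratio of average clean-speech power to average noise power over the whole utterance (the source does not state the exact definition). The function names `mix_at_snr` and `measured_snr_db` are illustrative and do not come from the patent.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then add it.

    Assumption: SNR is computed from average power over the whole utterance.
    """
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_clean == 0 or p_noise == 0:
        raise ValueError("clean and noise signals must have non-zero power")
    # Gain g such that 10 * log10(p_clean / (g^2 * p_noise)) == snr_db.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """Recover the SNR of a mixture, given the clean signal it was built from."""
    p_clean = sum(c * c for c in clean) / len(clean)
    p_noise = sum((y - c) ** 2 for c, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_clean / p_noise)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-ins for a speech sentence and a noise sample.
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
    noise = [random.gauss(0.0, 1.0) for _ in range(1600)]
    noisy = mix_at_snr(clean, noise, snr_db=10)
    print(round(measured_snr_db(clean, noisy), 3))
```

Repeating the call for each SNR value in the test grid would give one noisy version of every sentence per condition, which matches the per-SNR averaging used in the table.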

[0094] Table 1 Distribution table of predicted results and actual PESQ results

[0095]
SNR / dB           -40    -30    -20    -10    0      10     20     30     40
Predicted result   1.15   1.13   1.18   1.37   1.89   2.56   3.32   3.92   4.31
PESQ result        0.94   0.94   0.99   1.27   1.81   ...
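One simple way to summarize how closely the predictions in Table 1 follow PESQ is the mean absolute difference between the two rows. The sketch below uses only the five SNR conditions for which the excerpt gives both values (-40 dB to 0 dB); the remaining PESQ entries are truncated in the source and are not filled in. The metric and the function name `mean_absolute_difference` are chosen for illustration and are not stated in the patent.

```python
def mean_absolute_difference(predicted, reference):
    """Average of |predicted - reference| over paired scores."""
    if len(predicted) != len(reference):
        raise ValueError("score lists must have the same length")
    if not predicted:
        raise ValueError("score lists must not be empty")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Values copied from Table 1, limited to the columns where both rows are available.
snr_db = [-40, -30, -20, -10, 0]
predicted = [1.15, 1.13, 1.18, 1.37, 1.89]
pesq = [0.94, 0.94, 0.99, 1.27, 1.81]

mad = mean_absolute_difference(predicted, pesq)
print(f"Mean absolute difference over {len(snr_db)} SNR conditions: {mad:.3f}")
```

Over these five conditions the difference averages about 0.15 on the PESQ scale, and it shrinks as the SNR rises from -40 dB to 0 dB.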

Embodiment 2

[0098] The PESQ algorithm cannot objectively evaluate the quality of variable-speed speech even when a reference signal is available. To show that the present invention is applicable to this situation, a 3.5-second noisy speech signal with a signal-to-noise ratio of 10 dB is selected and evaluated. Its time-domain waveform is shown in Figure 4, the 1.5x variable-speed version of the signal is shown in Figure 5, and the corresponding clean speech signals of the two are shown in Figures 6 and 7. Before the speed change, the score given by the actual PESQ algorithm is 2.18 and the score predicted by the algorithm is 2.26. After the speed change, the actual PESQ score is 2.07 and the predicted score is 2.16, which is essentially unchanged, indicating that the present invention is applicable to this situation.

Embodiment 3

[0100] To examine how well the present invention generalizes to Chinese speech, part of the THCH30 corpus is added to the original training set. Because Embodiment 1 shows that the actual PESQ score does not differ significantly among noisy speech below -20 dB, the SNR range selected in this example is -30 dB to 40 dB. The final prediction results for the TIMIT test speech and the THCH30 test speech are shown in Figures 8 and 9.

Abstract

The invention discloses a deep-neural-network-based objective evaluation method for speech quality. The method comprises: step one, constructing and training a deep neural network, using data generated from three kinds of speech features of noisy speech as the input of the network and the PESQ score of the actual target speech as its output target; and step two, feeding a q-frame speech feature vector of the speech to be evaluated into the trained deep neural network and outputting a quality evaluation score for that speech. With this method, the target speech quality can be evaluated objectively when only the target speech signal is available, and the resulting evaluation is highly correlated with the evaluation made by the actual PESQ algorithm. The method is suitable for variable-speed speech, avoiding the situation in which speech quality cannot be evaluated objectively because the speech speed has changed, and the evaluation result is accurate. The speech quality can be evaluated directly without a clean reference signal.
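The "q-frame speech feature vector" in step two can be pictured as a sliding window over per-frame features. The sketch below assumes that the q frames are consecutive and are concatenated into one input vector per window; the excerpt does not specify how the frames are combined, nor which three feature types are used, so each frame here is simply a list of numbers that stands in for the already-computed features. The name `stack_frames` is illustrative.

```python
def stack_frames(frames, q):
    """Concatenate every run of q consecutive frames into one input vector.

    Assumption: the network input is q adjacent per-frame feature vectors
    joined end to end, producing len(frames) - q + 1 windows.
    """
    if q < 1:
        raise ValueError("q must be at least 1")
    if len(frames) < q:
        return []
    return [
        [value for frame in frames[start:start + q] for value in frame]
        for start in range(len(frames) - q + 1)
    ]


if __name__ == "__main__":
    # Four frames with two placeholder feature values each.
    frames = [[0, 1], [2, 3], [4, 5], [6, 7]]
    for window in stack_frames(frames, q=3):
        print(window)
```

Each resulting window would be one input to the trained network; how per-window outputs are combined into a single score for an utterance is not described in this excerpt.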

Description

Technical field

[0001] The invention belongs to the field of speech quality evaluation, and in particular relates to an objective speech quality evaluation method based on a deep neural network, which can objectively evaluate target speech quality without a reference speech signal.

Background technique

[0002] Speech signals are inevitably distorted by transmission and processing, especially with the emergence of various speech algorithms such as speech coding, speech enhancement, speech synthesis and channel transmission. Although these algorithms meet the needs of certain applications, they also inevitably damage the speech, and the types of distortion they introduce differ from one algorithm to another, so their impact on speech quality is hard to predict. Speech quality is an important index for testing the performance of speech equipment and algorithms, so how to evaluate speech quality is particularly important.

[0003] At present, there are man...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L25/60, G10L25/30
CPC: G10L25/30, G10L25/60
Inventor: 李国腾, 彭任华, 郑成诗, 李晓东
Owner: INST OF ACOUSTICS CHINESE ACAD OF SCI