Method and system for eliminating channel differences in voice interaction, electronic device, and medium
A technology for eliminating channel differences in voice interaction, applicable to voice analysis, voice recognition, instruments, and related fields. It addresses problems such as the degradation of back-end recognition performance, and achieves the effect of eliminating channel differences and improving accuracy.
Examples
Embodiment 1
[0078] This embodiment provides a method for eliminating channel differences in voice interaction which, as shown in Figure 2, includes:
[0079] During the training phase of the speech model:
[0080] Step S101: extract cepstral features from the training corpus in each scenario.
[0081] Here, different scenarios can be defined for different places, such as offices, squares, homes, subway stations, etc. The training corpus can be recorded in each of these scenarios.
[0082] Step S102: calculate the cepstral mean of the background environment signal in the corresponding scenario from the cepstral features.
[0083] Step S103: subtract the cepstral mean of the background environment signal from the cepstral features of the speech signal in the training corpus to obtain a normalized cepstral sequence, and use that cepstral sequence to train the speech model; here, the speech signal includes the background environment signal.
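Steps S102 and S103 can be illustrated with a minimal sketch. It assumes the cepstral features from step S101 are already available as one list of coefficients per frame, and that a per-frame flag marks which frames contain only the background environment signal. How those frames are identified is not stated in this excerpt, so the `is_background` mask and both function names are hypothetical.

```python
from typing import List, Sequence

Frame = List[float]  # one cepstral feature vector per frame


def background_cepstral_mean(frames: Sequence[Frame],
                             is_background: Sequence[bool]) -> Frame:
    """Step S102 (sketch): average the cepstral vectors of the frames
    flagged as background for one scenario. The mask is an assumption."""
    selected = [f for f, bg in zip(frames, is_background) if bg]
    if not selected:
        raise ValueError("no background frames available for this scenario")
    dim = len(selected[0])
    return [sum(f[k] for f in selected) / len(selected) for k in range(dim)]


def normalize_cepstra(frames: Sequence[Frame], mean: Frame) -> List[Frame]:
    """Step S103 (sketch): subtract the background cepstral mean from
    every frame to obtain the normalized cepstral sequence."""
    return [[c - m for c, m in zip(f, mean)] for f in frames]


# Toy data: three 2-dimensional frames, the first two flagged as background.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
mask = [True, True, False]
mean = background_cepstral_mean(frames, mask)
normalized = normalize_cepstra(frames, mean)
```

In this sketch the mean is computed once per scenario and then applied to all frames of that scenario, so the normalized sequence, rather than the raw features, is what would be passed to model training.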
[0084] The process of o...
Embodiment 2
[0129] This embodiment provides a system 400 for eliminating channel differences in voice interaction. As shown in Figure 4, it includes: a first extraction module 411, a first calculation module 412 and a first normalization module 413 for the speech model training stage, and a second extraction module 421, a second calculation module 422 and a second normalization module 423 for the speech model use stage.
[0130] The first extraction module 411 is used to extract cepstral features from the training corpus in each scenario.
[0131] The first calculation module 412 is configured to calculate the cepstral mean of the background environment signal in the corresponding scenario from the cepstral features.
[0132] The first normalization module 413 is used to subtract the cepstral mean of the background environment signal from the cepstral features of the speech signal in the training corpus to obtain a normalized cepstral sequence, and use ...
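The module layout of system 400 can be pictured as two stages, each wiring an extraction, a calculation and a normalization module in sequence. The sketch below is only a structural illustration: the `Stage` class, the function names and the toy extractor are hypothetical, the extractor is a stand-in and not real cepstral analysis, and the mean is taken over all frames for brevity. It also assumes the use-stage modules 421 to 423 mirror the training-stage modules, which this excerpt does not spell out.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = List[float]


@dataclass
class Stage:
    """Three modules chained in order (hypothetical structure)."""
    extract: Callable[[Sequence[float]], List[Frame]]        # 411 / 421
    background_mean: Callable[[List[Frame]], Frame]          # 412 / 422
    normalize: Callable[[List[Frame], Frame], List[Frame]]   # 413 / 423

    def run(self, signal: Sequence[float]) -> List[Frame]:
        feats = self.extract(signal)
        mean = self.background_mean(feats)
        return self.normalize(feats, mean)


def toy_extract(signal: Sequence[float]) -> List[Frame]:
    # Stand-in extractor: groups samples into 2-value frames.
    return [list(signal[i:i + 2]) for i in range(0, len(signal) - 1, 2)]


def mean_of_frames(frames: List[Frame]) -> Frame:
    # Stand-in calculation: mean over all frames (an assumption).
    dim = len(frames[0])
    return [sum(f[k] for f in frames) / len(frames) for k in range(dim)]


def subtract_mean(frames: List[Frame], mean: Frame) -> List[Frame]:
    return [[c - m for c, m in zip(f, mean)] for f in frames]


training_stage = Stage(toy_extract, mean_of_frames, subtract_mean)
use_stage = Stage(toy_extract, mean_of_frames, subtract_mean)
result = use_stage.run([1.0, 2.0, 3.0, 4.0])
```

Keeping the three modules as separate callables makes it possible to apply the same normalization path in both stages, so that features seen at use time are processed the same way as those used for training.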
Embodiment 3
[0144] Figure 5 is a schematic structural diagram of an electronic device provided in this embodiment. The electronic device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the method for eliminating channel differences in voice interaction of Embodiment 1 is implemented. The electronic device 3 shown in Figure 5 is only an example, and should not impose any limitation on the functions and scope of application of the embodiments of the present invention.
[0145] The electronic device 3 may take the form of a general-purpose computing device; for example, it may be a server device. Components of the electronic device 3 may include, but are not limited to: the at least one processor 4 mentioned above, the at least one memory 5 mentioned above, and a bus 6 connecting the different system components (including the memory 5 and the processor 4).
[0146] The bus 6 includes a data bus, an address bus and a cont...