Sound source identification method and device, server and storage medium

A sound source identification technology, applied in instruments, speech analysis, etc. It addresses low voiceprint description accuracy, high model maintenance cost, and heavy resource usage, and achieves easy calculation, easy storage, and a strongly individual representation.

Status: Inactive; Publication Date: 2020-04-21
铭迅(北京)信息技术有限公司

AI Technical Summary

Problems solved by technology

When voiceprint recognition must handle multiple voice signals, a model has to be established for each voice signal, so the amount of calculation is very large. On server clusters this takes up more resources, and maintaining a large number of models is also costly.
In addition, MFCC feature extraction screens out a lot of effective information, which reduces the accuracy of the voiceprint description of the voice signal. As a result, the accuracy of separating and classifying multiple voice signals generated by multiple sound sources is relatively low.



Examples


Embodiment 1

[0059] Figure 1 is a flow chart of the sound source identification method provided by Embodiment 1 of the present invention, which specifically includes the following steps:

[0060] Step 110: acquire spectrograms of the speech signals, where each spectrogram is a speech spectrogram or an energy spectrogram, there are at least two speech signals, and the spectrograms are in one-to-one correspondence with the speech signals;

[0061] In this embodiment, each speech signal corresponds to a spectrogram.

[0062] For example, when the spectrogram is a speech spectrogram, the time-domain function of the speech signal can be obtained first and then subjected to a Fourier transform or Laplace transform to obtain a frequency-domain function. Preferably, the frequency-domain function is obtained by a short-time Fourier transform (STFT) of the time-domain function; the diagram drawn from the frequency-domain function is then the speech spectrogram....
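The STFT route described in [0062] can be illustrated with a minimal sketch. It is not the patent's implementation: the function names `stft_magnitude` and `energy_spectrogram`, the Hann window, and the frame length and hop size are all assumptions chosen for illustration, and the energy spectrogram is taken here to be the squared magnitude. A direct DFT is used so the example needs only the standard library; a real system would more likely use an FFT routine.

```python
import cmath
import math

def stft_magnitude(samples, frame_len=64, hop=32):
    """Short-time Fourier transform magnitudes, one row per frame.

    Each row holds frame_len // 2 + 1 non-negative frequency bins.
    frame_len and hop are illustrative defaults, not values from the patent.
    """
    # Hann window (an assumed choice) to reduce leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of bin k for this windowed frame.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc))
        rows.append(row)
    return rows

def energy_spectrogram(samples, frame_len=64, hop=32):
    """Squared STFT magnitudes (assumed definition of the energy spectrogram)."""
    return [[m * m for m in row]
            for row in stft_magnitude(samples, frame_len, hop)]
```

Plotting the rows against time (frame index) and frequency (bin index) gives the kind of time-frequency image that the method feeds to the identity encoding model, one image per speech signal.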

Embodiment 2

[0091] Figure 8 is a flow chart of the sound source identification method provided by Embodiment 2 of the present invention, which specifically includes the following steps:

[0092] Step 210: Acquire a sound signal from the collected recording, where the recording has at least two sound sources.

[0093] In this embodiment, the recording is a recording of at least two people speaking, preferably a telephone recording of at least two people speaking. For example, when a user calls the service provider, the recording is the telephone recording of the service provider's customer service agent and the user.

[0094] Step 220: Filter out the silent segments in the sound signal, and divide the sound signal into at least two speech signals.

[0095] In this embodiment, when at least two people are talking, one person usually starts speaking only after the other has finished and has been heard. The voice signal obtained after...
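Step 220 can be sketched as follows. The excerpt does not say how silence is detected, so this example assumes a simple frame-energy threshold; the function name `split_on_silence` and the parameters `frame_len`, `threshold`, and `min_silence_frames` are hypothetical and would need tuning for real telephone audio.

```python
def split_on_silence(samples, frame_len=160, threshold=0.01,
                     min_silence_frames=3):
    """Return (start, end) sample indices of the non-silent segments.

    A frame is treated as silent when its mean squared amplitude is below
    `threshold` (an assumed criterion). A segment is closed once
    `min_silence_frames` consecutive silent frames have been seen.
    Samples after the last full frame are ignored.
    """
    segments = []
    seg_start = None   # sample index where the current segment began
    silent_run = 0     # consecutive silent frames inside a segment
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy >= threshold:
            if seg_start is None:
                seg_start = i * frame_len
            silent_run = 0
        elif seg_start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # The segment ends where the silent run began.
                seg_end = (i + 1 - silent_run) * frame_len
                segments.append((seg_start, seg_end))
                seg_start = None
                silent_run = 0
    if seg_start is not None:
        # Close a segment still open at the end, trimming trailing silence.
        segments.append((seg_start, (n_frames - silent_run) * frame_len))
    return segments
```

Each returned pair delimits one speech signal. Because speakers in a conversation tend to take turns, a pause long enough to end a segment often coincides with a change of speaker, which is why the resulting segments are useful units for the later grouping step.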

Embodiment 3

[0123] The sound source identification device provided by this embodiment of the present invention can execute the sound source identification method provided by any embodiment of the present invention. Referring to Figure 11, the sound source identification device 3 specifically includes:

[0124] A spectrogram acquisition module 31, configured to acquire spectrograms of the speech signals, where each spectrogram is a speech spectrogram or an energy spectrogram, there are at least two speech signals, and the spectrograms are in one-to-one correspondence with the speech signals;

[0125] An identity encoding vector acquisition module 32, configured to input the spectrogram into an identity encoding model to obtain an identity encoding vector of the spectrogram;

[0126] A speech signal summarization module 33, configured to group together the speech signals corresponding to at least one same sound source according to the identity encoding vectors.
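The grouping performed by module 33 can be sketched as below. This excerpt does not state how identity encoding vectors are compared, so the sketch assumes cosine similarity with a fixed threshold and a greedy assignment; `cosine_similarity`, `group_by_identity`, and the threshold value are hypothetical, and the identity encoding model itself is not reproduced (the vectors are taken as given).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def group_by_identity(vectors, threshold=0.8):
    """Greedily group vector indices that appear to share a sound source.

    Each vector joins the first group whose first member is at least
    `threshold` similar to it; otherwise it starts a new group.
    The threshold is an assumed value, not one given in the source.
    """
    groups = []
    for idx, vec in enumerate(vectors):
        for group in groups:
            if cosine_similarity(vectors[group[0]], vec) >= threshold:
                group.append(idx)
                break
        else:
            groups.append([idx])
    return groups
```

Each group lists the indices of the speech signals attributed to one sound source. Comparing fixed-length vectors in this way avoids training and maintaining a separate model per voice signal, which is the cost the method sets out to reduce.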

[0127] In an alternate embodiment, see Figure 12 ...


Abstract

The invention discloses a sound source identification method and device, a server and a storage medium. The sound source identification method comprises the following steps: spectrograms of voice signals are acquired, where each spectrogram is a speech spectrogram or an energy spectrogram, there are at least two voice signals, and the spectrograms are in one-to-one correspondence with the voice signals; the spectrograms are input into an identity encoding model to obtain identity encoding vectors of the spectrograms; and the voice signals corresponding to at least one same sound source are grouped together according to the identity encoding vectors. This technical scheme reduces the computational difficulty, resource consumption, and storage footprint of identifying and grouping the voice signals that correspond to the same sound source, based on the identity encoding vectors of the spectrograms corresponding to the users' voice signals. It also improves the accuracy of separating and classifying the voice signals generated by multiple sound sources.

Description

Technical field

[0001] Embodiments of the present invention relate to sound source identification technology, and in particular to a sound source identification method, device, server, and storage medium.

Background

[0002] In common conversation scenarios, especially in telephone services, the service company retains telephone recordings as a record of its communication with customers. When analyzing and processing telephone recordings, more attention is usually paid to the customer's voice; in financial services in particular, further analysis of the customer's voice signal can also play a role in user identity authentication. It is therefore very important to separate and aggregate the voice of each customer in a telephone recording. A person's voiceprint (used to express voice features) is as unique as the face, fingerprint, iris, and other biometric characteristics, so different speakers can be distinguished according to their voiceprints...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G10L17/06, G10L17/02, G10L25/03, G10L25/48, G10L21/0272
CPC: G10L17/06, G10L17/02, G10L25/03, G10L25/48, G10L21/0272
Inventor: 杨楠
Owner: 铭迅(北京)信息技术有限公司