
Real-time role separation transcription method, device and system

A role-separation technology based on speech segments, applied in speech analysis, speech recognition, instruments, etc. It addresses the problems of the speaking turn being mistakenly assigned to one side, poor recognition results, and the inability to handle simultaneous speech, and achieves the effect of reducing load.

Active Publication Date: 2021-03-19
北京快鱼电子股份公司

AI Technical Summary

Problems solved by technology

[0003] Current front-end voice collection is usually either based on deep learning or dependent on front-end recognition equipment. Deep-learning-based solutions require a local or cloud speech recognition server to transcribe the voice streams collected in real time into text. In addition to recognition, role classification often has to be processed in the cloud as well, which makes this an end-to-end, one-stop solution. It is not suited to simple two-party conversations and is generally suited to meetings in which multiple people speak. It also places high demands on the hardware configuration of the cloud or local speech recognition engine, and the accuracy of role classification depends on how often each speaker talks: if, in some scenario, a speaker only says "yes" or "okay" and says nothing else during the whole conversation, the accuracy of role classification is very low. This approach therefore applies a large, all-encompassing scheme to a specific problem, and its defect is that it cannot be tailored to the specific situation at hand.
[0004] Solutions that rely on front-end recognition equipment generally use microphone arrays to form two pickup directions 180 degrees apart. During transcription the two speakers must talk face to face, with the device placed midway on the line between them, which feels unfriendly. The sound collected at the two ends is compared in real time, and the voice stream is generally assigned to a role by choosing the side with the higher real-time volume; such schemes often cannot handle simultaneous speech. The scheme judges the energy of the speakers at both ends and assigns the voice to whichever side is louder. If one side produces loud noise rather than speech, the "right to speak" is mistakenly given to that side. If the speakers at both ends talk at the same time, the strategy is to always select the louder side and hold it until the end, which is a crude approach. This scheme does not anticipate that "simultaneous speaking" needs special treatment, cannot give it such treatment, and therefore cannot obtain a good recognition result.
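The volume-comparison rule criticized in [0004] can be sketched in a few lines to show how a loud noise burst captures the speaking turn. This is a minimal illustration, not code from the patent: the function names, the 16 kHz sample rate, the frame length, and the synthetic signals are all assumptions.

```python
import math

def rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def louder_side(left_frame, right_frame):
    """Baseline rule: whichever channel is louder is assigned the speech."""
    return "left" if rms(left_frame) >= rms(right_frame) else "right"

# Hypothetical frames: a quiet 200 Hz tone standing in for speech on the left,
# and a loud alternating burst standing in for non-speech noise on the right.
speech = [0.1 * math.sin(2 * math.pi * 200 * n / 16000) for n in range(320)]
noise = [0.8 if n % 2 == 0 else -0.8 for n in range(320)]

print(louder_side(speech, noise))
```

Because the rule looks only at energy, the noisy side is selected even though it contains no speech, which is the failure mode the paragraph describes.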



Examples


Embodiment 1

[0062] As shown in Figure 1 and Figure 2, the present invention provides a real-time role-separation transcription method comprising the following steps:

[0063] S100: Install a sound collection device with a directional microphone at the side between the two speakers, and collect the left channel sound signal and the right channel sound signal respectively;

[0064] S200: Detect whether the left channel sound signal and the right channel sound signal contain a speech segment; if a speech segment is detected, extract the corresponding left channel speech segment and right channel speech segment;

[0065] As shown in Figure 3, the specific steps for detecting whether the left channel sound signal and the right channel sound signal contain a speech segment are: extracting the fundamental frequency and subband energy from the left channel sound signal and the right channel sound signal; according to the high-dimensional featu...
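The two features named in [0065] can be illustrated with a small frame-level sketch. The source text is truncated before it states how the features are combined, so the decision rule below (an energy threshold plus a periodicity check) is purely an assumption, as are all function names, the pitch search range, and the thresholds. Full-band energy is used in place of subband energy to keep the example short.

```python
import math

def frame_energy(frame):
    """Mean squared amplitude of one frame (full band, for simplicity)."""
    return sum(x * x for x in frame) / len(frame)

def estimate_f0(frame, fs, fmin=80.0, fmax=400.0, min_corr=0.5):
    """Autocorrelation pitch estimate in Hz; 0.0 if no clear periodicity."""
    r0 = sum(x * x for x in frame)
    if r0 == 0.0:
        return 0.0
    best_lag, best_r = 0, 0.0
    # Search only lags that correspond to the assumed pitch range.
    for lag in range(int(fs / fmax), int(fs / fmin) + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag == 0 or best_r / r0 < min_corr:
        return 0.0
    return fs / best_lag

def is_speech_frame(frame, fs, energy_threshold=1e-4):
    """Assumed rule: enough energy and a detectable fundamental frequency."""
    return frame_energy(frame) > energy_threshold and estimate_f0(frame, fs) > 0.0

fs = 8000  # assumed sample rate
tone = [0.3 * math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
silence = [0.0] * 400
print(estimate_f0(tone, fs), is_speech_frame(tone, fs), is_speech_frame(silence, fs))
```

In a two-channel setting the same check would be run on the left and right frames separately, and a segment would be extracted from both channels whenever either one is flagged.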

Embodiment 2

[0102] As shown in Figure 8, the present invention provides a real-time role-separation transcription device, which includes a sound collection device, a voice activity detection (VAD) module, a single-sided speech judgment module, a clustering module, a separation module, and a sending module;

[0103] The sound collection device includes a directional microphone used to collect the left channel sound signal and the right channel sound signal respectively. The directional microphone includes a left channel and a right channel, which diverge at an angle of 90 to 120 degrees with a spacing of 10 cm to 15 cm; the angle between each channel and the vertical direction is 40 to 60 degrees.
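As a rough worked example of what the 10 cm to 15 cm spacing in [0103] implies, the largest possible arrival-time difference between the two channels is the spacing divided by the speed of sound. The sketch below converts that bound to samples. The 16 kHz sample rate and the 343 m/s speed of sound are assumptions that do not come from the source, and the function name is hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees Celsius

def max_delay_samples(spacing_m, sample_rate_hz):
    """Upper bound on the inter-channel delay, in samples, for a given spacing."""
    return spacing_m / SPEED_OF_SOUND_M_S * sample_rate_hz

for spacing_cm in (10, 15):
    delay = max_delay_samples(spacing_cm / 100, 16000)
    print(f"{spacing_cm} cm spacing: at most {delay:.2f} samples of delay")
```

Under these assumptions the bound is about 4.7 samples at 10 cm and about 7 samples at 15 cm, so any phase-difference or delay search between the channels only needs to cover a handful of samples.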

[0104] The voice activity detection module is used to detect whether the left channel sound signal and the right channel sound signal contain a voice segment; if a voice segment is detected, then...

Embodiment 3

[0121] As shown in Figure 12, the present invention provides a real-time role-separation transcription system, which includes a processor, a left-side speech recognition engine, a right-side speech recognition engine, a network card, and the real-time role-separation transcription device provided in Embodiment 2. The processor is connected to the real-time role-separation transcription device and to the network card, and the network card is connected to the left-side speech recognition engine and the right-side speech recognition engine respectively.

[0122] The real-time role-separation transcription device sends the left-side voice signal to the left-side speech recognition engine and the right-side voice signal to the right-side speech recognition engine. Compared with previous transcription systems that sent only one signal to the engine, the present invention inputs the different signals from the two ends independently into the speech recognition engines; both systems have thei...
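The routing described in [0122], where each side's signal goes to its own recognition engine, can be sketched as a small dispatcher. `RecordingEngine` is a hypothetical stand-in for whatever client the real engines expose, and the `(side, audio)` tuple format is an assumption made for illustration only.

```python
class RecordingEngine:
    """Hypothetical stand-in for a speech recognition engine client."""

    def __init__(self):
        self.received = []

    def send(self, audio):
        # A real client would transmit the audio; here it is only recorded.
        self.received.append(audio)

def route_segments(segments, left_engine, right_engine):
    """Send each (side, audio) segment to the engine dedicated to that side."""
    engines = {"left": left_engine, "right": right_engine}
    for side, audio in segments:
        if side not in engines:
            raise ValueError(f"unknown side: {side!r}")
        engines[side].send(audio)

left_engine, right_engine = RecordingEngine(), RecordingEngine()
route_segments(
    [("left", b"L1"), ("right", b"R1"), ("left", b"L2")],
    left_engine,
    right_engine,
)
print(left_engine.received, right_engine.received)
```

Keeping one engine per side means each transcript is already attributed to a role when it comes back, with no speaker labelling needed on the recognition side.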


Abstract

The invention discloses a real-time role separation transcription method, which comprises the following steps: detecting whether an acquired left channel sound signal and an acquired right channel sound signal contain a voice segment, and if a voice segment is detected, extracting the left channel voice segment and the right channel voice segment corresponding to it; judging whether the voice segment is single-side speech or double-side speech based on the phase difference and the amplitude difference of the left channel voice segment and the right channel voice segment and on fundamental frequency detection; if the voice segment is single-side speech, judging whether the speaker is located on the left side or the right side; if the speaker is located on the left side, clustering the left channel voice segments to form a left clustering center; if the speaker is located on the right side, clustering the right channel voice segments to form a right clustering center; if the voice segment is double-side speech, separating the left-side and right-side voice signals contained in the left channel voice segment and the right channel voice segment; and sending the separated left-side voice signal and right-side voice signal to a voice recognition engine. According to the method, roles can be accurately separated. The invention further discloses a real-time role separation transcription device and system.
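The single-side versus double-side judgment in the abstract can be illustrated with a simplified sketch that uses only two of the three cues: the amplitude difference (as a level difference in dB) and the phase difference (approximated by the cross-correlation lag between channels). Fundamental frequency detection, clustering, and separation are not shown. All names, the 3 dB threshold, and the 7-sample lag range are assumptions, not values from the patent.

```python
import math
import random

def rms(x):
    """Root-mean-square level of a segment."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def level_difference_db(left, right, eps=1e-12):
    """Left-minus-right level in dB; positive means the left channel is louder."""
    return 20.0 * math.log10((rms(left) + eps) / (rms(right) + eps))

def best_lag(left, right, max_lag):
    """Lag maximising cross-correlation; positive means right lags left."""
    def xcorr(lag):
        if lag >= 0:
            return sum(left[n] * right[n + lag] for n in range(len(left) - lag))
        return sum(left[n - lag] * right[n] for n in range(len(right) + lag))
    return max(range(-max_lag, max_lag + 1), key=xcorr)

def classify_segment(left, right, max_lag=7, min_db=3.0):
    """Return 'left', 'right', or 'both' using assumed, illustrative thresholds."""
    diff_db = level_difference_db(left, right)
    lag = best_lag(left, right, max_lag)
    if diff_db >= min_db and lag >= 0:
        return "left"
    if diff_db <= -min_db and lag <= 0:
        return "right"
    return "both"

# Synthetic check: a source on the left reaches the right channel
# three samples later and attenuated.
rng = random.Random(0)
src = [rng.uniform(-1, 1) for _ in range(320)]
delayed = [0.0] * 3 + [0.4 * v for v in src[:-3]]
print(classify_segment(src, delayed))
```

With this synthetic input both cues agree, so the segment should be attributed to the left side; when the cues disagree or the levels are close, the sketch falls back to treating the segment as double-side speech, which in the described method would be passed to the separation step.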

Description

Technical field

[0001] The invention relates to the technical field of speech recognition, in particular to a real-time role-separation transcription method, device and system.

Background

[0002] Nowadays, in one-to-one window service, the requirements on customer service quality are getting higher and higher. In such settings (such as telecom business halls and automobile 4S stores), the customer service agent and the customer are usually located on the inside and outside of the service counter or window and carry out a one-on-one service dialogue. There are now various assessments of customer service quality, covering service wording, use of professional terminology, service attitude, emotion, and whether customers are guided correctly. The conversation content is collected clearly, and the back end performs speech recognition on it. Back-end processing generally uses a real-time transcription system, and then organizes and analyzes the tr...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L15/04, G10L15/08, G10L15/26, G10L25/03, G10L25/51
CPC: G10L15/04, G10L15/08, G10L25/03, G10L25/51
Inventor: 袁斌
Owner: 北京快鱼电子股份公司