
A real-time role-based transcription method, device and system

A role-based speech segment technology, applied in speech analysis, speech recognition, instruments and related fields. It addresses problems such as the "speaking right" being assigned to the wrong side by mistake, a device placement that feels unfriendly, and poor recognition results, and it achieves the effect of reducing load.

Active Publication Date: 2021-07-20
北京快鱼电子股份公司

AI Technical Summary

Problems solved by technology

[0003] Current front-end voice collection is usually either based on deep learning or relies on front-end recognition equipment. Deep-learning-based solutions require a local or cloud speech recognition server to transcribe the voice streams collected in real time into text. In addition to recognition, role classification often also has to be done in the cloud, which makes this an end-to-end, one-stop solution. It is not well suited to simple two-party conversations and is generally suited to meetings in which multiple people speak. It also places high demands on the hardware configuration of the cloud or local speech recognition engine, and the accuracy of role classification depends on how often a speaker talks: if in some scenario a speaker only says "yes" or "okay" and says nothing else during the whole conversation, the accuracy of role classification is very low. This approach therefore applies a large, all-encompassing scheme to a specific problem, and has the defect that it cannot handle specific local situations in a targeted way.
[0004] Solutions based on front-end recognition equipment generally use a microphone array that forms two pickup directions 180 degrees apart. During transcription the two speakers have to talk face to face, with the device placed at the midpoint of the line between them, which feels unfriendly. The voices collected at the two ends are compared in real time, and the voice stream is generally attributed to the role whose real-time volume is higher; there is often no way to handle the case where both parties speak at once. The scheme judges the energy at the two ends, and whichever side is louder is assigned the voice. If one side produces loud noise rather than speech, the "speaking right" is taken by that side by mistake. If the speakers at both ends speak at the same time, the scheme always selects the louder side and keeps it until the end, which is a crude strategy. The scheme does not recognize that "simultaneous speaking" needs special treatment, cannot treat it specially, and therefore cannot obtain a good recognition result.
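
The failure mode described in [0004] can be reproduced with a minimal sketch of a "louder side wins" rule. This is not code from the patent; the function names, the frame length and the synthetic signals below are assumptions chosen only to illustrate why a purely energy-based rule hands the turn to loud non-speech noise.

```python
import math

def rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def loudest_side(left_frame, right_frame):
    """Prior-art style rule: assign the frame to whichever channel is louder."""
    return "left" if rms(left_frame) >= rms(right_frame) else "right"

# Hypothetical 20 ms frames at 16 kHz:
# quiet voiced-like tone on the left, loud non-speech noise on the right.
left_speech = [0.1 * math.sin(2 * math.pi * 200 * n / 16000) for n in range(320)]
right_noise = [0.5 if n % 2 == 0 else -0.5 for n in range(320)]

# The rule has no notion of "speech", so the noisy side is selected.
print(loudest_side(left_speech, right_noise))
```

Because the rule compares levels only, any detector of this kind needs an additional check that the louder channel actually contains speech, which is the gap the paragraph points out.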



Examples


Embodiment 1

[0062] As shown in Figure 1 and Figure 2, the present invention provides a real-time role-based transcription method comprising the following steps:

[0063] S100: Install a sound collection device with a directional microphone at the side between the two speakers, and collect the left-channel sound signal and the right-channel sound signal respectively;

[0064] S200: Detect whether the left-channel sound signal and the right-channel sound signal contain a speech segment; if a speech segment is detected, extract the corresponding left-channel speech segment and right-channel speech segment;

[0065] As shown in Figure 3, the specific steps for detecting whether the left-channel sound signal and the right-channel sound signal contain a speech segment are: extracting the fundamental frequency and subband energy from the left-channel sound signal and the right-channel sound signal; according to the high-dimensiona...
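
As a rough illustration of the two features named in [0065], the sketch below estimates a fundamental frequency and per-band energies for a single frame. The patent text is truncated here and does not say how these features are computed or combined, so everything in the block is an assumption: the autocorrelation pitch estimate, the naive DFT, the band edges, the search range of 80 to 400 Hz, and the function names are all illustrative stand-ins.

```python
import cmath
import math

def fundamental_frequency(frame, fs, fmin=80.0, fmax=400.0):
    """Estimate pitch as the lag that maximizes the autocorrelation (assumed method)."""
    lo, hi = int(fs / fmax), int(fs / fmin)
    best_lag, best_r = lo, float("-inf")
    for lag in range(lo, hi + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag

def subband_energies(frame, fs, edges=(0, 1000, 4000, 8000)):
    """Sum DFT power inside each [edges[i], edges[i+1]) band (edges are illustrative)."""
    n = len(frame)
    energies = [0.0] * (len(edges) - 1)
    for k in range(n // 2 + 1):
        freq = k * fs / n
        # Naive DFT bin; adequate for one short frame in a sketch.
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for i in range(len(energies)):
            if edges[i] <= freq < edges[i + 1]:
                energies[i] += abs(x_k) ** 2
    return energies

# Hypothetical 40 ms frame at 16 kHz containing a 200 Hz tone.
fs = 16000
frame = [math.sin(2 * math.pi * 200 * n / fs) for n in range(640)]
f0 = fundamental_frequency(frame, fs)
bands = subband_energies(frame, fs)
print(f0, bands.index(max(bands)))
```

A frame is required to be longer than the largest lag searched (here `fs / fmin` samples), otherwise the autocorrelation sum is empty for the longest lags.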

Embodiment 2

[0102] As shown in Figure 8, the present invention provides a real-time role-based transcription device, which includes a sound collection device, a voice activity detection (VAD) module, a single-sided speech judgment module, a clustering module, a separation module, and a sending module;

[0103] The sound collection device includes a directional microphone used to collect the left-channel sound signal and the right-channel sound signal respectively. The directional microphone includes a left channel and a right channel; the left and right channels diverge at an angle of 90 to 120 degrees with a spacing of 10 cm to 15 cm, and the angle between each channel and the vertical direction is 40 to 60 degrees.

[0104] The voice activity detection module is used to detect whether the left-channel sound signal and the right-channel sound signal contain a speech segment; if a speech segment is detected, t...

Embodiment 3

[0121] As shown in Figure 12, the present invention provides a real-time role-based transcription system, which includes a processor, a left-channel speech recognition engine, a right-channel speech recognition engine, a network card, and the real-time role-based transcription device provided in Embodiment 2. The processor is connected to the real-time role-based transcription device and to the network card, and the network card is connected to the left-channel speech recognition engine and the right-channel speech recognition engine respectively.

[0122] The real-time role-based transcription device sends the left speech signal to the left-channel speech recognition engine and the right speech signal to the right-channel speech recognition engine. Compared with previous transcription systems that sent only one signal to the engine, the present invention inputs the different signals from the two ends into the speech recognition engines independently; both systems have t...


Abstract

The invention discloses a real-time role-based transcription method. The method detects whether the collected left-channel and right-channel sound signals contain a speech segment and, if one is detected, extracts the corresponding left-channel speech segment and right-channel speech segment. Based on the phase difference, amplitude difference and fundamental frequency detection of the left-channel and right-channel speech segments, it judges whether the speech is single-sided or double-sided. If it is single-sided speech, it judges whether the speaker is on the left side or the right side: if the speaker is on the left, the left-channel speech segments are clustered to form the left cluster center; if the speaker is on the right, the right-channel speech segments are clustered to form the right cluster center. If it is double-sided speech, the left-side and right-side speech signals contained in the left-channel and right-channel speech segments are separated. The separated left speech signal and right speech signal are then sent to the speech recognition engine. The method can achieve accurate separation of roles. The invention also discloses a real-time role-based transcription device and system.
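
The branch structure in the abstract can be sketched as a small dispatch routine. This is only an outline under stated assumptions: the abstract says the single-sided versus double-sided judgment uses phase difference, amplitude difference and fundamental frequency, but gives no formulas, so the stand-in rule below uses amplitude difference alone with a hypothetical 6 dB margin. The names `level_db`, `classify_segment` and `route` are invented for illustration, and clustering and separation are represented only by the bucket each segment is placed in.

```python
import math

def level_db(frame):
    """Frame level in dB (a small floor avoids taking the log of zero)."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)

def classify_segment(left, right, margin_db=6.0):
    """Stand-in rule using amplitude difference only; margin_db is hypothetical."""
    diff = level_db(left) - level_db(right)
    if diff > margin_db:
        return "left"
    if diff < -margin_db:
        return "right"
    return "both"

def route(left, right, buckets):
    """Mirror the abstract's branches for one detected speech segment."""
    side = classify_segment(left, right)
    if side == "left":
        buckets["left"].append(left)            # would update the left cluster center
    elif side == "right":
        buckets["right"].append(right)          # would update the right cluster center
    else:
        buckets["both"].append((left, right))   # would be passed to the separation step
    return side

buckets = {"left": [], "right": [], "both": []}
loud, quiet = [0.5] * 100, [0.05] * 100
print(route(loud, quiet, buckets), route(quiet, loud, buckets), route(loud, loud, buckets))
```

Keeping the classification separate from the routing means a rule that also uses phase difference and fundamental frequency could replace `classify_segment` without touching the dispatch logic.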

Description

Technical field

[0001] The invention relates to the technical field of speech recognition, and in particular to a real-time role-based transcription method, device and system.

Background

[0002] In one-to-one window services, the requirements on customer service quality keep rising. In such settings (for example telecom business halls and automobile 4S stores), the customer service agent and the customer are usually located on the inside and outside of the service counter or window and carry out a one-on-one conversation. Customer service quality is now assessed in various ways, including service wording, use of professional terminology, service attitude, emotion, and whether customers are guided correctly. This requires the conversation content to be collected clearly and passed to the back end for speech recognition. The back-end processing generally uses a real-time transcription system, and then organizes and analyzes the tr...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L15/04, G10L15/08, G10L15/26, G10L25/03, G10L25/51
CPC: G10L15/04, G10L15/08, G10L25/03, G10L25/51
Inventor: 袁斌
Owner: 北京快鱼电子股份公司