
Speaker recognition method and device and electronic equipment

A speaker recognition technology applied to speaker recognition methods, video processing methods, devices and electronic equipment. It addresses problems such as wrongly merged speech segments and wrong speaker identification, with the effect of improving speaker recognition accuracy.

Pending Publication Date: 2021-04-13
ALIBABA GRP HLDG LTD
Cites 5, Cited by 5

AI Technical Summary

Problems solved by technology

Even when speaker identification is correct, the merged speech segments may be wrong.
[0006] Alternatively, the speaker ids of the two sub-segments split above may be identified as id1 and id3. In that case, not only are the merged speech segments wrong, but the speakers' identities are also misidentified: a conversation that originally involved only two users is recognized as a conversation among three users.
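[0006] describes how a single mislabeled sub-segment corrupts both segment merging and the speaker count. The minimal Python sketch below makes that concrete under stated assumptions: segments are hypothetical `Segment(start, end, speaker)` records, and merging simply joins consecutive segments that share a speaker id. The source does not spell out its merging rule, so this is an illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float   # segment start time in seconds
    end: float     # segment end time in seconds
    speaker: str   # speaker id assigned by voiceprint recognition (hypothetical)


def merge_adjacent(segments: List[Segment]) -> List[Segment]:
    """Merge consecutive segments that carry the same speaker id."""
    merged: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if merged and merged[-1].speaker == seg.speaker:
            # Same speaker continues: extend the previous segment's end time.
            last = merged[-1]
            merged[-1] = Segment(last.start, max(last.end, seg.end), last.speaker)
        else:
            merged.append(Segment(seg.start, seg.end, seg.speaker))
    return merged


if __name__ == "__main__":
    # Hypothetical dialogue: A, B, A, then B's last turn split into two sub-segments.
    correct = [Segment(0, 4, "id1"), Segment(4, 8, "id2"), Segment(8, 12, "id1"),
               Segment(12, 14, "id2"), Segment(14, 16, "id2")]
    # Same split, but the two sub-segments are mislabeled id1 and id3.
    wrong = [Segment(0, 4, "id1"), Segment(4, 8, "id2"), Segment(8, 12, "id1"),
             Segment(12, 14, "id1"), Segment(14, 16, "id3")]
    for name, segs in (("correct", correct), ("wrong", wrong)):
        merged = merge_adjacent(segs)
        print(name, merged, "speakers:", sorted({s.speaker for s in merged}))
```

With correct labels the split turn rejoins into one id2 segment and two speakers remain. With the id1/id3 labels, A's turn absorbs part of B's speech (8 to 14 s) and a phantom third speaker appears.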

Examples

Embodiment 1

[0071] To address the low accuracy of speaker recognition in multi-speaker scenarios, where environmental noise and other factors interfere, an embodiment of the present application provides a speaker recognition tool that combines voiceprint features with face features. The recognition process is explained below with reference to specific examples.

[0072] The objects recognized in the embodiments of the present application are mainly video files containing both audio and images. After a video file to be recognized is obtained, its sound and images can first be separated, splitting the video file into an audio file from which voiceprint features can be extracted and an image file from which face features can be extracted. For example, the audio file and the image file can be separated from the video file with the multimedia processing tool FFmpeg (Fast Forward MPEG).
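[0072] names FFmpeg for separating sound and images. As a hedged sketch, the Python helpers below only build FFmpeg command lines; the file names, the 16 kHz mono WAV output and the one-frame-per-second sampling rate are assumptions chosen for illustration, not parameters from the source.

```python
import shlex
from pathlib import Path
from typing import List


def audio_extract_cmd(video: str, out_wav: str, sample_rate: int = 16000) -> List[str]:
    """FFmpeg command that writes the audio track as mono 16-bit PCM WAV."""
    # -vn drops the video stream; -ar/-ac resample to the (assumed) rate and mono.
    return ["ffmpeg", "-y", "-i", video, "-vn", "-acodec", "pcm_s16le",
            "-ar", str(sample_rate), "-ac", "1", out_wav]


def frame_extract_cmd(video: str, out_dir: str, fps: float = 1.0) -> List[str]:
    """FFmpeg command that samples still frames at a fixed rate."""
    # -an drops the audio stream; the fps filter keeps `fps` frames per second,
    # so frame number n (1-based) sits at roughly (n - 1) / fps seconds.
    pattern = str(Path(out_dir) / "frame_%06d.jpg")
    return ["ffmpeg", "-y", "-i", video, "-an", "-vf", f"fps={fps:g}", pattern]


if __name__ == "__main__":
    # Hypothetical input file; print the commands instead of running them.
    print(shlex.join(audio_extract_cmd("talk.mp4", "talk.wav")))
    print(shlex.join(frame_extract_cmd("talk.mp4", "frames")))
```

To execute a command, pass the returned list to `subprocess.run(cmd, check=True)`; the frame output directory must already exist. Sampling frames at a fixed rate keeps the mapping from image index to timestamp simple, which matters for the later time alignment.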

[0073] For audio files, voiceprint tracking technology can be used for voice se...

Embodiment 2

[0104] For live-broadcast speeches, existing technology can extract the key points of each speaker's speech only through manual analysis after the broadcast ends, and these must then be added to the video file by hand, which makes video processing costly in both labor and time.

[0105] Correspondingly, this embodiment provides a tool that can process video files automatically. As shown in Figure 1, it can include a client and a server. The client can be deployed on a terminal device associated with the user, and the server can be deployed on a cloud server, for example a live broadcast server. The speaker recognition tool provided in Embodiment 1 can serve as a function of the server, identifying the speakers in speech videos and determining the set of speech segments associated with each speaker.
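The server-side output in [0105], a set of speech segments per speaker, can be pictured as a simple grouping. In this minimal Python sketch the `(start, end, speaker)` tuple layout and the function names are hypothetical; the per-speaker totals are just one example of what a client might display once segments are grouped.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical segment layout: (start_seconds, end_seconds, speaker_name).
Segment = Tuple[float, float, str]
Span = Tuple[float, float]


def segments_by_speaker(segments: List[Segment]) -> Dict[str, List[Span]]:
    """Group recognized speech segments into one time-ordered list per speaker."""
    grouped: Dict[str, List[Span]] = defaultdict(list)
    for start, end, speaker in sorted(segments):  # tuples sort by start time first
        grouped[speaker].append((start, end))
    return dict(grouped)


def speaking_time(grouped: Dict[str, List[Span]]) -> Dict[str, float]:
    """Total seconds attributed to each speaker."""
    return {spk: sum(end - start for start, end in spans)
            for spk, spans in grouped.items()}


if __name__ == "__main__":
    segs = [(5.0, 9.0, "Host"), (0.0, 5.0, "Guest"), (9.0, 12.5, "Guest")]
    grouped = segments_by_speaker(segs)
    print(grouped)
    print(speaking_time(grouped))
```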

[0106] The processing of the video file is explained below with reference to the flow chart shown in Figure 2.

[0107] S101: Th...

Embodiment 3

[0169] Embodiment 3 corresponds to Embodiment 1 and provides a speaker recognition method. Referring to Figure 5, the method may specifically include:

[0170] S201: Separate an audio file and an image file from the video file to be recognized;

[0171] S202: Perform voice segmentation on the audio file to obtain start and end time information and speaker identification information corresponding to at least one speech segment, and perform face recognition on the image file to obtain face recognition results corresponding to different times;

[0172] S203: Align the speaker identification information and the face recognition results in time to determine the speaker corresponding to the at least one speech segment.
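Steps S201 to S203 leave the alignment rule open. One plausible reading, sketched below in Python, is a per-segment majority vote: every face identity recognized at a timestamp inside a speech segment counts as one vote, and the segment falls back to its voiceprint label when no usable face evidence exists. The data classes, the `min_votes` parameter and the voting rule are all assumptions for illustration, not the method claimed in the source.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SpeechSegment:
    start: float       # seconds, from voice segmentation (S202)
    end: float         # seconds
    voice_label: str   # speaker identification info from voiceprints (hypothetical)


# Hypothetical face result: (timestamp_seconds, recognized identity or None).
FaceResult = Tuple[float, Optional[str]]


def align(segments: List[SpeechSegment], faces: List[FaceResult],
          min_votes: int = 1) -> List[Tuple[SpeechSegment, str]]:
    """Assign each speech segment the face identity seen most often during it."""
    results = []
    for seg in segments:
        # Collect face identities observed while this segment is being spoken.
        votes = Counter(name for t, name in faces
                        if name is not None and seg.start <= t < seg.end)
        top = votes.most_common(1)
        if top and top[0][1] >= min_votes:
            speaker = top[0][0]
        else:
            # Not enough face evidence: fall back to the voiceprint label.
            speaker = seg.voice_label
        results.append((seg, speaker))
    return results


if __name__ == "__main__":
    segs = [SpeechSegment(0, 4, "spk0"), SpeechSegment(4, 8, "spk1"),
            SpeechSegment(8, 10, "spk2")]
    faces = [(0.5, "Alice"), (1.5, "Alice"), (2.5, "Bob"),
             (4.5, "Bob"), (5.5, "Bob"), (6.5, None)]
    for seg, who in align(segs, faces):
        print(f"{seg.start:>5.1f}-{seg.end:<5.1f} {who}")
```

A face on screen is not always the person talking (for example, a cutaway to the audience), which is why a threshold such as `min_votes` is worth having; how the source handles this case is not stated in the visible text.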

Abstract

Embodiments of the invention disclose a speaker recognition method, a device and electronic equipment. The method comprises: separating an audio file and an image file from a video file to be identified; performing voice segmentation on the audio file to obtain start-stop time information and speaker identification information corresponding to at least one voice segment, and performing face recognition on the image file to obtain face recognition results corresponding to different times; and aligning the speaker identification information with the face recognition results in time to determine the speaker corresponding to the at least one voice segment. The scheme can improve the accuracy of speaker recognition.

Description

Technical Field

[0001] The present application relates to the technical field of speech recognition, and in particular to a method, device and electronic device for speaker recognition, and a method, device and electronic device for video processing.

Background

[0002] Voiceprint recognition, also known as speaker recognition, uses computer speech processing technology to analyze and process speech signals in order to determine the identity of the speaker.

[0003] In scenarios such as live broadcasts and conferences that may involve multiple speakers, speaker recognition must not only identify who appears in the speech signal but also determine the specific speech segments corresponding to each person. Taking a dialogue between user A and user B as an example, speech segmentation can first be performed to find the points where speech passes from one speaker to another, dividing the entire dialogue into several speech segments, an...
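The segmentation step in [0003] looks for points where speech passes from one speaker to another. A simple distance-based approach, shown here only as a hedged sketch and not as the method of the source, compares speaker embeddings of consecutive fixed-length windows and declares a change point where their cosine similarity drops below a threshold. The embedding vectors, the window length and the 0.5 threshold are hypothetical; a real system would obtain embeddings from a voiceprint model.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def change_points(embeddings: List[Sequence[float]], window_s: float,
                  threshold: float = 0.5) -> List[float]:
    """Times (s) at which consecutive window embeddings stop looking alike."""
    points = []
    for i in range(1, len(embeddings)):
        if cosine(embeddings[i - 1], embeddings[i]) < threshold:
            points.append(i * window_s)  # boundary between window i-1 and window i
    return points


def split_segments(total_s: float, points: List[float]) -> List[Tuple[float, float]]:
    """Turn change points into (start, end) speech segments."""
    bounds = [0.0] + points + [total_s]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    # Toy 2-D "embeddings" for five 1-second windows: A, A, B, B, A.
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.0]]
    pts = change_points(emb, window_s=1.0)
    print(pts, split_segments(5.0, pts))
```

Each resulting segment would then be passed to speaker identification, which is the stage where the mislabeling described in [0006] can arise.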

Application Information

IPC (8): H04N21/233, H04N21/234, H04N21/439, H04N21/44, H04N21/81
CPC: H04N21/44008, H04N21/4394, H04N21/23418, H04N21/233, H04N21/8106
Inventors: 王全剑, 黄鹏, 李波
Owner: ALIBABA GRP HLDG LTD