A method and device for intercepting the voice of a target person in a video

The technology concerns processing a target person in video and is applied in fields such as equipment, computing, and selective content distribution. It addresses the problems that existing voice separation demands high audio clarity, that voice interception is difficult, and that voice interception efficiency is low.

Active Publication Date: 2021-08-24
SPEAKIN TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] The embodiments of the present application provide a method and device for intercepting the voice of a target person in a video. They address the technical problems that current voice separation algorithms place high demands on audio clarity and require noise reduction before voice separation, so that in a noisy environment the noise has a large impact, voice interception is difficult, and voice interception efficiency is low.

Examples

Embodiment Construction

[0039] To enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by persons of ordinary skill in the art from the embodiments in this application without creative effort fall within the scope of protection of this application.

[0040] This application provides a method and device for intercepting the voice of the target person in a video, which addresses the technical problems that current voice separation algorithms place high demands on audio clarity and that, in a noisy environment, the noise has a large impact, voice interception is difficult, and there is the technical problem of low e...

Abstract

The embodiments of the present application disclose a method and device for intercepting the voice of a target person in a video. A lip-shape voice activity detection model assigns a first mark to the video frames of an audio and video file in which the target person shows voice activity, and a second mark to the video frames in which the target person shows no voice activity, which yields a first mark sequence. The second start and end time points of the corresponding voice frames in the audio and video file are then determined from the first start and end time points of the video frames that carry the first mark in the first mark sequence. The corresponding voice segments are intercepted directly from the audio and video file according to the second start and end time points, which produces the voice segment file of the target person and thereby separates the target's voice. This addresses the technical problems that current voice separation algorithms place high demands on audio clarity and require noise reduction before voice separation, so that in a noisy environment the noise has a large impact, voice interception is difficult, and voice interception efficiency is low.
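The step from a mark sequence to intercepted audio can be illustrated with a minimal Python sketch. It is not the patent's implementation. It assumes the first mark is encoded as 1 and the second mark as 0, that the video has a constant frame rate, and that audio and video share one timeline, so the second start and end time points are the first ones converted to sample indices. The names `marks_to_time_segments`, `cut_audio`, `fps`, and `sample_rate` are hypothetical, and the lip-shape detection model itself is out of scope here.

```python
from typing import List, Tuple

VOICE = 1      # assumed encoding of the "first mark" (voice activity present)
NO_VOICE = 0   # assumed encoding of the "second mark" (no voice activity)


def marks_to_time_segments(marks: List[int], fps: float) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for each run of VOICE-marked frames."""
    segments = []
    run_start = None  # index of the first frame in the current VOICE run
    for i, mark in enumerate(marks):
        if mark == VOICE and run_start is None:
            run_start = i
        elif mark != VOICE and run_start is not None:
            # The run ends where the first non-VOICE frame begins.
            segments.append((run_start / fps, i / fps))
            run_start = None
    if run_start is not None:
        # The sequence ended while the target was still speaking.
        segments.append((run_start / fps, len(marks) / fps))
    return segments


def cut_audio(samples: List[float], sample_rate: int,
              segments: List[Tuple[float, float]]) -> List[float]:
    """Concatenate the audio samples that fall inside the given time segments."""
    out: List[float] = []
    for start, end in segments:
        first = int(round(start * sample_rate))
        last = min(int(round(end * sample_rate)), len(samples))
        out.extend(samples[first:last])
    return out
```

In this sketch, a mark sequence such as `[0, 1, 1, 0, 0, 1]` at 25 frames per second produces two segments, one covering frames 1 to 2 and one covering frame 5, and `cut_audio` keeps only the samples inside those two time ranges.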

Description

Technical field

[0001] The present application relates to the technical field of speech processing, and in particular to a method and device for intercepting the speech of a target person in a video.

Background

[0002] When public security authorities perform voiceprint identification, they need to compare against the voiceprint of a suspect's voice. Some collected audio files were recorded in noisy environments with many speakers, so the human voices in the audio must be separated to obtain the voice of the target person before the voiceprint can be extracted. Dedicated voice separation algorithms exist, but they place high demands on audio clarity and require noise reduction before voice separation. In a noisy environment the noise has a large impact, which makes voice interception difficult and inefficient.

Contents of the invention

[0003] The embod...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): H04N21/439, H04N21/845, G06K9/62, G06K9/00
Inventor: 郑棉洲, 吕莉丽
Owner: SPEAKIN TECH CO LTD