95 results about "Audio segmentation" patented technology

Sound device and sound control device

The sound device includes an audio-information output unit, an analysis unit, an audio-division-spectrum output unit, a noise-division-spectrum output unit and a correction unit. The analysis unit receives audio information from the audio-information output unit, and then outputs sound spectrum information. The noise-division-spectrum output unit outputs sound-volume information for each critical band width of a noise, and the audio-division-spectrum output unit outputs the sound-volume information for each critical band width of the sound-spectrum information. The correction unit corrects the information from the audio-division-spectrum output unit based on the information from the noise-division-spectrum output unit. The audio-signal properties can be well corrected corresponding to the auditory-sense properties of the human, and thus the audio sound, in which an uncomfortable feeling to the auditory sense of the human has been adequately controlled, can be transmitted to a user.
Owner:KAWASAKI HEAVY IND LTD

Apparatus and method for classification and segmentation of audio content, based on the audio signal

An apparatus for classifying an input audio signal into audio contents of a first and second class, comprising an audio segmentation module adapted to segment said input audio signal into segments of a predetermined length; a feature computation module adapted to calculate for the segments features characterizing said audio input signal; a threshold comparison module adapted to generate a feature vector for each of said one or more segments based on a plurality of predetermined thresholds, the thresholds including for each of the audio contents of the first class and of the second class a substantially near certainty threshold, a substantially high certainty threshold, and a substantially low certainty threshold; and a classification module adapted to analyze the feature vector and classify each one of said one or more segments as audio contents of the first class, of the second class, or as non-decisive audio contents.
Owner:WAVES AUDIO

Complex audio segmentation clustering method based on bottleneck feature

The invention discloses a complex audio segmentation clustering method based on a bottleneck feature. The method comprises the steps that a deep neural network with a bottleneck layer is constructed; a complex audio stream is read, and endpoint detection is carried out on the complex audio stream; the audio feature of a non-silent segment is extracted and input into the deep neural network; the bottleneck feature is extracted from the bottleneck layer of the deep neural network; the bottleneck feature is used as input, and an audio segmentation method based on the Bayesian information criterion is used, so that each audio segment contains only one kind of audio type and adjacent audio segments have different audio types; a spectral clustering algorithm is used to cluster segmented audio segments to acquire the number of audio types of complex audios; and the audio segments of the same audio type are merged together. According to the invention, the used bottleneck feature is a deep transform feature, can more effectively describe the feature difference of the complex audio type than a traditional audio feature, and acquires an excellent effect in complex audio segmentation clustering.
Owner:SOUTH CHINA UNIV OF TECH
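
The Bayesian information criterion (BIC) segmentation step mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: it assumes a one-dimensional feature per frame (the abstract uses multi-dimensional bottleneck features), models each side of a candidate boundary as a single Gaussian, and uses hypothetical names (`delta_bic`, `best_change_point`, `penalty_weight`, `min_len`) that do not come from the source.

```python
import math

def _log_var(xs, floor=1e-6):
    """Log of the maximum-likelihood variance of xs, floored to avoid log(0)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.log(max(var, floor))

def delta_bic(xs, split, penalty_weight=1.0):
    """Delta-BIC for splitting xs at index `split`, using 1-D Gaussian models.

    Positive values favor two separate Gaussians, i.e. a change point.
    """
    n, n1, n2 = len(xs), split, len(xs) - split
    # For d-dimensional full-covariance Gaussians the penalty is
    # 0.5 * (d + d * (d + 1) / 2) * log(n); with d = 1 this reduces to log(n).
    penalty = penalty_weight * math.log(n)
    return (0.5 * n * _log_var(xs)
            - 0.5 * n1 * _log_var(xs[:split])
            - 0.5 * n2 * _log_var(xs[split:])
            - penalty)

def best_change_point(xs, min_len=3, penalty_weight=1.0):
    """Return the split index with the largest positive delta-BIC, or None."""
    best_idx, best_score = None, 0.0
    for split in range(min_len, len(xs) - min_len + 1):
        score = delta_bic(xs, split, penalty_weight)
        if score > best_score:
            best_idx, best_score = split, score
    return best_idx

# Toy feature track: the level jumps from about 0 to about 10 at index 6.
changed = [0.0, 0.1, -0.1, 0.05, -0.05, 0.1, 10.0, 10.1, 9.9, 10.05, 9.95, 10.1]
print(best_change_point(changed))
```

In practice the search would be repeated over a sliding window, and the resulting segments would then be passed to the clustering stage that the abstract describes.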

Mega speaker identification (ID) system and corresponding methods therefor

A memory storing computer readable instructions for causing a processor associated with a mega speaker identification (ID) system to instantiate functions including an audio segmentation and classification function (F10) receiving general audio data (GAD) and generating segments, a feature extraction function (F12) receiving the segments and extracting features based on mel-frequency cepstral coefficients (MFCC) therefrom, a learning and clustering function (14) receiving the extracted features and reclassifying segments, when required, based on the extracted features, a matching and labeling function (16) assigning a speaker ID to speech signals within the GAD, and a database function for correlating the assigned speaker ID to the respective speech signals within the GAD. The audio segmentation and classification function can assign each segment to one of N audio signal classes including silence, single speaker speech, music, environmental noise, multiple speaker's speech, simultaneous speech and music, and speech and noise.
Owner:KONINKLIJKE PHILIPS ELECTRONICS NV

Effective Audio Segmentation and Classification

A method (400) and system (200) for classifying an audio signal are described. The method (400) operates by first receiving a sequence of audio frame feature data, each of the frame feature data characterising an audio frame along the audio segment. In response to receipt of each of the audio frame feature data, statistical data characterising the audio segment is updated with the received frame feature data. The received frame feature data is then discarded. A preliminary classification for the audio segment may be determined from the statistical data. Upon receipt of a notification of an end boundary of the audio segment, the audio segment is classified (410) based on the statistical data.
Owner:CANON KK
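
The update-then-discard idea in this abstract can be illustrated with running statistics that never store individual frames. The sketch below is an assumption-laden example rather than the patented method: it tracks the mean and variance of a single scalar feature with Welford's algorithm, and the class name `RunningSegmentStats`, the function `classify_segment`, and the energy-threshold rule are all hypothetical.

```python
class RunningSegmentStats:
    """Running mean and variance of a scalar frame feature (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, value):
        """Fold one frame feature into the statistics; the value is not stored."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Population variance of all values seen so far."""
        return self._m2 / self.count if self.count else 0.0

def classify_segment(frame_features, energy_threshold=0.5):
    """Hypothetical rule: label a segment by its mean frame energy."""
    stats = RunningSegmentStats()
    for value in frame_features:  # frames arrive one at a time
        stats.update(value)       # after this call the frame can be dropped
    label = "loud" if stats.mean >= energy_threshold else "quiet"
    return label, stats

label, stats = classify_segment([0.7, 0.8, 0.9])
print(label, stats.count)
```

Because only `count`, `mean` and the squared-deviation sum are kept, memory use stays constant regardless of segment length, and a preliminary label could be read from `stats` at any point before the end boundary arrives.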

Audio restoration apparatus and audio restoration method

An audio restoration apparatus which restores an audio to be restored having a missing audio part and being included in a mixed audio. The audio restoration apparatus includes: a mixed audio separation unit which extracts the audio to be restored included in the mixed audio; an audio structure analysis unit which generates at least one of a phoneme sequence, a character sequence and a musical note sequence of the missing audio part in the extracted audio to be restored, based on an audio structure knowledge database in which semantics of audio are registered; an unchanged audio characteristic domain analysis unit which segments the extracted audio to be restored into time domains in each of which an audio characteristic remains unchanged; an audio characteristic extraction unit which identifies a time domain where the missing audio part is located, from among the segmented time domains, and extracts audio characteristics of the identified time domain in the audio to be restored; and an audio restoration unit which restores the missing audio part in the audio to be restored, using the extracted audio characteristics and the generated one or more of phoneme sequence, character sequence and musical note sequence.
Owner:SOVEREIGN PEAK VENTURES LLC

Video music matching method and device, electronic equipment and computer readable medium

The invention provides a video music matching method and device, electronic equipment and a computer readable storage medium, and relates to the technical field of audio processing. The method comprises the following steps: acquiring a video to be matched with music and an audio for matching with music; respectively obtaining a video segmentation point of the video and an audio segmentation point of the audio; dividing the video into video clips according to the video segmentation points; dividing the audio into audio clips of which the number is the same as that of the video clips according to the audio segmentation points; adjusting the playing speed of each video clip or the playing speed of each audio clip, so that the playing durations of each video clip and each audio clip are the same in a one-to-one correspondence manner according to the playing sequence; and connecting the adjusted video clips according to a playing sequence to obtain a target video, connecting the adjusted audio clips according to the playing sequence to obtain a target audio, and jointly playing the target video and the target audio. Video picture features and music rhythm features in audios can be automatically and effectively combined, and the watching immersion of users is improved.
Owner:BEIJING BYTEDANCE NETWORK TECH CO LTD
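
The duration-matching step described above comes down to simple arithmetic: a clip played at speed factor s lasts (original duration) / s. The following sketch assumes the video branch is the one being retimed and that segmentation points are given in seconds; the function names `cut_durations` and `video_speed_factors` are hypothetical and not taken from the source.

```python
def cut_durations(total, cut_points):
    """Durations of the clips obtained by cutting [0, total] at cut_points."""
    edges = [0.0] + sorted(cut_points) + [total]
    return [b - a for a, b in zip(edges, edges[1:])]

def video_speed_factors(video_total, video_cuts, audio_total, audio_cuts):
    """Playback-speed factor per video clip so it lasts as long as its audio clip."""
    video = cut_durations(video_total, video_cuts)
    audio = cut_durations(audio_total, audio_cuts)
    if len(video) != len(audio):
        raise ValueError("video and audio must be cut into the same number of clips")
    # A factor above 1 speeds the clip up; a factor below 1 slows it down.
    return [v / a for v, a in zip(video, audio)]

# A 12 s video cut at 4 s and 6 s, matched to a 9 s audio cut at 2 s and 6 s.
factors = video_speed_factors(12.0, [4.0, 6.0], 9.0, [2.0, 6.0])
print(factors)  # [2.0, 0.5, 2.0]
```

Retiming the audio instead would use the reciprocal factors, which is the alternative the abstract also allows.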

Video processing method and device, electronic equipment and storage medium

Active | CN110213670A | Avoid too long questions | Avoid long questions | Selective content distribution | Feature vector | Scene segmentation
The invention provides a video processing method and device, electronic equipment and a storage medium. The video processing method comprises the steps of obtaining a to-be-processed video, and dividing the to-be-processed video into a plurality of units of to-be-processed videos; obtaining a scene feature vector and an audio feature vector corresponding to each unit of to-be-processed video; determining scene pre-segmentation points according to the scene feature vectors corresponding to every two adjacent units of to-be-processed videos, and determining audio pre-segmentation points according to the audio feature vectors corresponding to every two adjacent units of to-be-processed videos; performing scene segmentation on the to-be-processed video according to the scene pre-segmentation point, and searching a video clip of which the duration exceeds a set maximum duration threshold from video clips obtained by scene segmentation to serve as a to-be-segmented video clip; and carrying out audio segmentation on the to-be-segmented video clip according to the audio pre-segmentation point to obtain a segmented video clip. According to the invention, the accuracy of splitting is improved, and the requirements of users are better met.
Owner:BEIJING QIYI CENTURY SCI & TECH CO LTD
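
The two-stage splitting in this abstract (scene cuts first, then audio cuts only inside clips that are still too long) can be sketched as below. This is an illustrative reading under stated assumptions: clips are `(start, end)` pairs in seconds, the pre-segmentation points are already known, and `split_long_clips` and `max_duration` are hypothetical names.

```python
def split_long_clips(scene_clips, audio_points, max_duration):
    """Split clips longer than max_duration at the audio points inside them."""
    result = []
    for start, end in scene_clips:
        if end - start <= max_duration:
            result.append((start, end))  # short enough, keep the scene clip
            continue
        # Only audio pre-segmentation points strictly inside the clip apply.
        inner = sorted(p for p in audio_points if start < p < end)
        edges = [start] + inner + [end]
        result.extend(zip(edges, edges[1:]))
    return result

scene_clips = [(0, 10), (10, 40), (40, 45)]
audio_points = [5, 20, 30, 42]
print(split_long_clips(scene_clips, audio_points, max_duration=15))
# [(0, 10), (10, 20), (20, 30), (30, 40), (40, 45)]
```

Note that a long clip with no audio point inside it is returned unchanged in this sketch; the abstract does not say how that case is handled.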

Video structuring method based on multi-dimensional segmentation

The invention mainly provides a video structuring method based on multi-dimensional segmentation. The video structuring method specifically comprises the steps of 1, analyzing a video; 2, extracting a key frame in scene segmentation; 3, segmenting the scene based on the key frame; 4, segmenting the video audio; 5, performing semantic segmentation on the video; and 6, taking the information entropy as a segmentation rule of an objective function. According to the invention, after the same video segment is segmented in three dimensions of scene, sound and text, the segmentation rule is evaluated in the form of information entropy. Compared with other video structuring methods, the video is well segmented in the image dimension by combining the change of the pixels in the image sequence in the time domain, the correlation between the adjacent frames and the corresponding relation between the previous frame and the current frame, the key information of the video is reserved, and an effective video structuring method can be provided.
Owner:BEIJING UNIV OF POSTS & TELECOMM

Audio segmentation and classification

A portion of an audio signal is separated into multiple frames from which one or more different features are extracted. These different features are used, in combination with a set of rules, to classify the portion of the audio signal into one of multiple different classifications (for example, speech, non-speech, music, environment sound, silence, etc.). In one embodiment, these different features include one or more of line spectrum pairs (LSPs), a noise frame ratio, periodicity of particular bands, spectrum flux features, and energy distribution in one or more of the bands. The line spectrum pairs are also optionally used to segment the audio signal, identifying audio classification changes as well as speaker changes when the audio signal is speech.
Owner:MICROSOFT TECH LICENSING LLC

A system and method for quickly playing multimedia information

The invention discloses a system and a method, which can quickly play multimedia information. The invention includes: a quick audio decoding module used to quickly decode audio signal; a time domain audio segmentation module used to segment the pulse coding signal which is quickly decoded from the audio signal by the quick audio decoding module; a fast-forward and fast-backward sorting module used to sort the audio segments after the time domain audio segmentation module conducts segmentation; an audio time domain deleting and coupling module used to determine cross correlations among audio segments and further execute coupling and deletion after the fast-forward and fast-backward sorting module conducts sorting; and an audio-video separation module used to separate audio and video from the multimedia information before the audio signal is quickly decoded. The invention solves the problem during quick synchronous play that the accompaniment of high quality sound can not be realized when the multimedia information is fast forward and fast backward.
Owner:COMMUNICATION UNIVERSITY OF CHINA

Segmentation clustering method and system for multi-person voice in complex environment

The invention discloses a segmentation clustering method and system for multi-person voice in a complex environment. The method comprises the following steps: acquiring multiple continuous multi-person speaking voice segment audios according to multi-person speaking audios; normalizing the multi-person speaking voice segment audios according to acoustic features to obtain normalized audios; acquiring multiple sections of to-be-processed audios; extracting voiceprint information characteristics of the multiple sections of to-be-processed audios; acquiring similarity scores among all the to-be-processed audio segments by setting scoring criteria; according to the similarity scores among all the to-be-processed audio segments, acquiring category labels of a plurality of persons through a multi-stage redundant clustering algorithm; and segmenting and clustering the multi-person speaking audios according to the category labels of the plurality of persons. By using the redundant clustering method, the cluster centers of target speakers become more dispersed and easier to distinguish. For unclear voice segments of a target speaker in a complex environment, better discrimination is achieved, so that the speaker classification error in a segmentation clustering task in the complex environment is reduced.
Owner:AISPEECH CO LTD

System and method using blind change detection for audio segmentation

A system, method and computer program product for performing blind change detection audio segmentation that combines hypothesized boundaries from several segmentation algorithms to achieve the final segmentation of the audio stream. Automatic segmentation of the audio streams according to the system and method of the invention may be used for many applications like speech recognition, speaker recognition, audio data mining, online audio indexing, and information retrieval systems, where the actual boundaries of the audio segments are required.
Owner:IBM CORP
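
The abstract does not say how the hypothesized boundaries are combined, so the sketch below shows only one plausible scheme, chosen here as an assumption: boundaries from different algorithms that fall within a time tolerance are grouped, and a group is kept when enough distinct algorithms agree. The function `fuse_boundaries` and its parameters `tolerance` and `min_votes` are hypothetical.

```python
def fuse_boundaries(hypotheses, tolerance=0.5, min_votes=2):
    """Fuse boundary times (seconds) proposed by several algorithms.

    hypotheses: one list of boundary times per segmentation algorithm.
    """
    # Tag every boundary with the index of the algorithm that proposed it.
    tagged = sorted((t, i) for i, times in enumerate(hypotheses) for t in times)
    clusters = []
    for t, source in tagged:
        # Chain a boundary onto the current cluster if it is close to the last one.
        if clusters and t - clusters[-1][-1][0] <= tolerance:
            clusters[-1].append((t, source))
        else:
            clusters.append([(t, source)])
    fused = []
    for cluster in clusters:
        voters = {source for _, source in cluster}
        if len(voters) >= min_votes:
            # Report the mean time of the agreeing hypotheses.
            fused.append(sum(t for t, _ in cluster) / len(cluster))
    return fused

hypotheses = [[1.0, 5.0, 9.0], [1.2, 5.4], [5.2, 12.0]]
fused = fuse_boundaries(hypotheses)
print(fused)
```

With these inputs the boundaries near 1 s and 5 s are supported by at least two algorithms and survive, while the isolated hypotheses at 9 s and 12 s are dropped.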

An audio classification method and system based on an SVM

The invention belongs to the technical field of audio data analysis, and discloses an audio classification method and system based on an SVM. The audio automatic classification and segmentation is an important means for extracting the structured information and semantic content from audio and is the basis for understanding, analyzing and retrieving the audio content. In essence, classification of the audio data is a pattern recognition problem and comprises two basic aspects of feature extraction selection and classification. How to extract the information most capable of representing audio signal characteristics from the audio signals is crucial for audio classification; the audio feature extraction can be based on an audio frame feature analysis and extraction method and an audio-based feature analysis and extraction method; and in a method of extracting these characteristics, characteristics of audio are extracted using time domain characteristics and frequency domain characteristics, respectively. An SVM-based audio classification algorithm has a good classification effect, and the smooth audio segmentation result is more accurate.
Owner:CHONGQING UNIV OF EDUCATION

Audio segmentation method and device

The embodiment of the invention discloses an audio segmentation method and device. The method comprises the steps of: extracting a target characteristic value of target audio according to a preset characteristic extraction algorithm; segmenting the target audio into a target voice part and a target mute part according to the target characteristic value; using the target characteristic value as an input parameter of a preset Gaussian model to obtain a posterior probability for the target audio; segmenting the target voice part according to the posterior probability and a preset classification model to obtain a target music part and a non-target music part, wherein the preset classification model is a classification model based on multi-characteristic fusion and context association; and generating a segmentation result for the target audio according to the target mute part, the target music part and the non-target music part. According to the invention, the audio can be segmented into a mute part, a music part and a non-music part.
Owner:BEIJING QIYI CENTURY SCI & TECH CO LTD

Data processing method and device, electronic equipment and storage medium

The embodiment of the invention discloses a data processing method and device, electronic equipment and a storage medium. The method comprises the following steps: collecting a video stream, and extracting audio data and video data from the video stream; segmenting the audio data to obtain an audio segmentation result; performing voice processing on each voice segment in the audio segmentation result to obtain a recognition result segment corresponding to the corresponding voice segment; adding a recognition result fragment corresponding to each voice fragment in the at least one voice fragment to a video fragment corresponding to the corresponding voice fragment determined from the video data; combining the at least one video clip added with the voice with the video clip corresponding to the at least one mute clip to obtain a target video stream; wherein the video clip corresponding to the voice clip is used for presenting when the recognition result clip corresponding to the voice clip is played, and the video clip corresponding to the mute clip is used for presenting when the mute clip is played.
Owner:GUANGDONG OPPO MOBILE TELECOMM CORP LTD

Paragraph association rule evaluation method based on multi-dimensional element video segmentation

The invention mainly provides a paragraph association rule evaluation method based on multi-dimensional element video segmentation. The paragraph association rule evaluation method specifically comprises the steps of: step 1, carrying out video analysis; step 2, extracting a key frame in scene segmentation; step 3, carrying out scene segmentation based on the key frame; step 4, carrying out video audio segmentation; step 5, performing semantic segmentation on the video; step 6, judging the paragraph association rule of the segmented video of the GNN network; and step 7, constructing an association network. According to the method, after the same video segment is subjected to multi-dimensional segmentation, a paragraph association rule construction mode is adopted to carry out matching on corresponding multi-dimensional elements. Compared with other paragraph association rule evaluation methods for video segmentation, the video is well segmented in the image dimension by combining the change of the pixels in the image sequence in the time domain and the correlation between the adjacent frames, the key information of the video is reserved, and an effective multi-dimensional element video segmentation paragraph association rule judgment method can be provided.
Owner:BEIJING UNIV OF POSTS & TELECOMM

Audio segmentation method and system

The invention relates to an audio segmentation method and system. The method comprises following steps: reading each audio frame of the audio data to be segmented, respectively extracting features of each audio frame to obtain audio signal features corresponding to each audio frame; inputting the audio signal features into a pre-trained audio classifier, respectively calculating probability values of audio frames corresponding to the audio signal features belonging to audio classes, and obtaining a target audio category to which the audio frame corresponding to the audio signal feature belongs according to the probability values; performing audio-segmentation on the audio data according to a target audio category to which each audio frame belongs. The audio segmentation method and system can segment audio data into small pieces, and the audio segmentation accuracy is high.
Owner:GUANGZHOU SHIYUAN ELECTRONICS CO LTD
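
The last two steps of this abstract (pick the most probable class per frame, then cut wherever the class changes) can be sketched as follows. The classifier itself is out of scope here: the example assumes its per-frame probabilities are already available as dictionaries, and the names `frames_to_segments` and `frame_duration` are hypothetical.

```python
from itertools import groupby

def frames_to_segments(frame_probs, frame_duration=0.01):
    """Turn per-frame class probabilities into (start, end, label) segments.

    frame_probs: list of dicts mapping class name -> probability, one per frame.
    """
    # Target category of each frame: the class with the highest probability.
    labels = [max(probs, key=probs.get) for probs in frame_probs]
    segments, index = [], 0
    for label, run in groupby(labels):  # runs of consecutive equal labels
        length = len(list(run))
        start = index * frame_duration
        end = (index + length) * frame_duration
        segments.append((start, end, label))
        index += length
    return segments

frame_probs = [
    {"speech": 0.9, "music": 0.1},
    {"speech": 0.7, "music": 0.3},
    {"speech": 0.2, "music": 0.8},
    {"speech": 0.4, "music": 0.6},
    {"speech": 0.6, "music": 0.4},
]
segments = frames_to_segments(frame_probs, frame_duration=1.0)
print(segments)
# [(0.0, 2.0, 'speech'), (2.0, 4.0, 'music'), (4.0, 5.0, 'speech')]
```

A real system would probably smooth the frame labels before grouping, since a single misclassified frame creates a spurious one-frame segment in this sketch.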

Video resource synthesis method and device, storage medium and electronic device

Active | CN112188307A | Improve shooting efficiency | Solve technical problems with low shooting efficiency | Selective content distribution | Computer graphics (images) | Synthesis methods
The invention discloses a video resource synthesis method and device, a storage medium and an electronic device. The video resource synthesis method comprises the steps of: acquiring an audio segmentation request on a client, wherein the audio segmentation request is used for requesting to segment a target audio resource into a plurality of audio clips; segmenting the target audio resource into a plurality of audio clips in response to the audio segmentation request on the client; under the condition that a video shooting request is received, sequentially shooting a plurality of video clips in one-to-one correspondence with the plurality of audio clips on the client in response to the video shooting request, wherein the playing duration of each video clip in the plurality of video clips is the same as the playing duration of the audio clip corresponding to each video clip; and synthesizing the target audio resource or the plurality of audio clips and the plurality of video clips through the client to obtain the target video resource. According to the video resource synthesis method and the device, the technical problem of low video shooting efficiency in related technologies is solved.
Owner:TENCENT TECH (SHENZHEN) CO LTD

In-vehicle safety monitoring and help seeking system based on automobile data recorder and crying recognition

The invention belongs to the technical field of safety monitoring, and discloses an in-vehicle safety monitoring and help seeking system based on an automobile data recorder and crying recognition. The system comprises an audio input module which is used for collecting a basic cry sample, wherein when an automobile flames out, the automobile data recorder triggers a parking monitoring state, a child safety monitoring help seeking system is started, and a built-in audio receiver receives a sound signal, carries out audio segmentation processing and stores audio clips, and a voice recognition module carries out next-step recognition; the voice recognition module which is used for confirming and recognizing the acquired audio signal; and a signal sending module which is used for sending a help seeking message to a vehicle owner through a wireless network of the automobile data recorder so as to complete the external output function of the help seeking signal. According to the invention, the automobile data recorder widely used in the existing automobile or school bus is fully utilized, a mobile phone dialing system and a crying recognition system are integrated, and the system can be integrated with the Internet of Vehicles, so that corresponding responsible persons can be timely notified when crying occurs, so that trapped infants in the automobile can be timely discovered.
Owner:WUHAN UNIV OF SCI & TECH

Audio message segmentation method and device, storage medium and electronic equipment

The invention relates to an audio message segmentation method and device, a storage medium and electronic equipment. It addresses the technical problems in the prior art that positioning a specific part of an audio message and controlling its broadcast progress are difficult to operate and have low accuracy. The method comprises the steps that it is determined, through a preset human voice recognition algorithm, whether a first audio message received by an instant messaging application contains human voices; if it is determined that the first audio message has human voices, the first audio message is converted into one or more segments of second audio messages through the preset audio segmentation algorithm, and the second audio messages are composed of continuous human voices. By dividing the audio messages into one or more audio messages only containing continuous human voices, a user can precisely position and repeatedly receive paragraphs containing human voices in the audio messages, the control difficulty of the audio message broadcasting progress is lowered, and the user experience is improved.
Owner:BEIJING SANKUAI ONLINE TECH CO LTD

Audio segmentation method based on signal energy spike identification

The invention relates to an audio segmentation method based on signal energy spike identification. The audio segmentation method comprises the steps of carrying out short-time Fourier transform on an input audio signal, and converting the input audio signal into a power spectrum matrix; extracting intermediate frequency energy characteristics based on a power spectrum; carrying out peak identification on the extracted intermediate frequency energy characteristics; carrying out error division correction on the signal after peak identification; and outputting a time coordinate of the division point of the audio signal. The audio segmentation method does not need to set a threshold value, does not need to be trained in advance, analysis can be realized on the basis of audio signals in real time quickly and accurately, the method can be deployed at the edge end, other operating parameters do not need to be accessed, and parameter-free dynamic segmentation is basically realized.
Owner:CYBERINSIGHT TECH CO LTD

Audio segmentation based on spatial metadata

A method of encoding adaptive audio, comprising receiving N objects and associated spatial metadata that describes the continuing motion of these objects, and partitioning the audio into segments based on the spatial metadata. The method encodes adaptive audio having objects and channel beds by capturing a continuing motion of a number N objects in a time-varying matrix trajectory comprising a sequence of matrices, coding coefficients of the time-varying matrix trajectory in spatial metadata to be transmitted via a high-definition audio format for rendering the adaptive audio through a number M output channels, and segmenting the sequence of matrices into a plurality of sub-segments based on the spatial metadata, wherein the plurality of sub-segments are configured to facilitate coding of one or more characteristics of the adaptive audio.
Owner:DOLBY LAB LICENSING CORP

Montage video determination method, device and equipment and storage medium

The embodiment of the invention discloses a montage video determination method and device, equipment and a storage medium. The method comprises the following steps: according to a preset background audio, sequentially determining a first video clip set corresponding to each audio clip, wherein the audio clips are obtained by segmenting a preset background audio according to the audio key points; determining a first video feature representation data set corresponding to each first video clip set based on the first video features, the first video features comprising at least two video features; according to each first video feature representation data set, determining each first video clip used for montage, wherein every two adjacent first video clips belong to a first video clip set corresponding to two adjacent audio clips; and sequentially splicing the adjacent first video clips by using a video splicing algorithm to obtain a target montage video. According to the technical scheme of the embodiment of the invention, the coherence of the montage video and the visual experience of a user are improved.
Owner:BEIJING BYTEDANCE NETWORK TECH CO LTD

Power equipment environment noise identification method based on time domain and frequency domain self-similarity

The invention discloses a power equipment environment noise identification method based on time domain and frequency domain self-similarity. The method comprises the following steps: firstly, acquiring an operation sound signal of power equipment to be monitored; segmenting the collected audio into minute-level recording samples, setting a proper frame length, framing each sample, and extracting time domain and frequency domain features of each frame; and performing similarity analysis on the features by utilizing a clustering-based similarity analysis method, and considering that the samples which can only be clustered into one class cluster have time domain and frequency domain self-similarity characteristics, otherwise, the samples do not have similarity. When the recording sample has time domain and frequency domain self-similarity, the recording sample is reserved; otherwise, the recording sample is rejected. According to the method, the recording samples without time domain and frequency domain self-similarity noise interference can be effectively recognized and eliminated, effective samples are screened out, and support is provided for subsequent recognition of the operation state of the power equipment based on sound signals.
Owner:CHANGSHA UNIVERSITY OF SCIENCE AND TECHNOLOGY