Automatic audio summary generation method and device

Automatic generation and audio technology, applied in audio data retrieval, audio data browsing/visualization, character and pattern recognition, etc., addressing the problem of inaccurate audio summary descriptions.
CN112784094A · Active · Publication Date: 2021-05-11 · AISPEECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: AISPEECH CO LTD
Publication Date: 2021-05-11

Abstract

The invention discloses a method and device for automatically generating audio summaries. The method comprises: pre-training a sound event detection model that comprises an audio feature extraction part and an output part; using the audio feature extraction part as the audio encoder of an automatic audio summary generation model; and training the automatic audio summary generation model end to end. According to the scheme provided by the embodiment of the invention, pre-training and transfer learning on the sound event detection task yield a better audio encoder, so that more accurate audio summary descriptions are generated. A corresponding text description can be generated for any new audio, so an audio-text database can be built automatically, which can support practical applications such as audio retrieval engines driven by natural language in unrestricted form.
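The three steps in the abstract can be illustrated with a minimal structural sketch. It is not the patent's implementation: the class names (`AudioEncoder`, `SoundEventDetector`, `CaptionModel`), the helper `build_caption_model`, the layer sizes, and the plain-Python random weights are all assumptions chosen for illustration, and both training loops are left out.

```python
import copy
import random


def random_matrix(rows, cols, seed):
    """Small random weight matrix; stands in for real learned parameters."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class AudioEncoder:
    """Audio feature extraction part: one hidden vector per input frame."""

    def __init__(self, n_in, n_hidden, seed):
        self.weights = random_matrix(n_hidden, n_in, seed)

    def __call__(self, frames):
        # Linear projection followed by ReLU, applied frame by frame.
        return [
            [max(0.0, sum(w * x for w, x in zip(row, frame))) for row in self.weights]
            for frame in frames
        ]


class SoundEventDetector:
    """Pre-training model: feature extraction part plus an output part."""

    def __init__(self, n_in, n_hidden, n_events):
        self.encoder = AudioEncoder(n_in, n_hidden, seed=1)
        self.output = random_matrix(n_events, n_hidden, seed=2)

    def __call__(self, frames):
        hidden = self.encoder(frames)
        # Mean-pool over time, then score each sound event class.
        pooled = [sum(col) / len(hidden) for col in zip(*hidden)]
        return [sum(w * h for w, h in zip(row, pooled)) for row in self.output]


class CaptionModel:
    """Captioning model: pre-trained encoder plus a freshly initialised decoder."""

    def __init__(self, encoder, n_hidden, vocab_size):
        self.encoder = encoder
        self.decoder = random_matrix(vocab_size, n_hidden, seed=3)


def build_caption_model(sed_model, n_hidden, vocab_size):
    # Keep only the feature extraction part; the event output part is dropped.
    encoder = copy.deepcopy(sed_model.encoder)
    return CaptionModel(encoder, n_hidden, vocab_size)


sed = SoundEventDetector(n_in=4, n_hidden=8, n_events=3)
# ... pre-train `sed` on sound event detection here (omitted) ...
captioner = build_caption_model(sed, n_hidden=8, vocab_size=10)
# ... then train `captioner` end to end, encoder and decoder together (omitted) ...
```

The point of the sketch is the hand-off: the captioning model starts from the encoder weights learned during sound event detection, while the detection output part is discarded and the decoder starts from scratch.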

Description

Technical Field

[0001] The invention belongs to the technical field of audio summarization, and in particular relates to a method and device for automatically generating audio summaries.

Background

[0002] In related technologies, automated audio captioning (AAC) aims to generate a summary description of an audio clip. An audio summary can describe many concepts, ranging from local information such as sound events to global information such as the acoustic scene. Currently, the mainstream AAC approach is an end-to-end encoder-decoder structure, in the hope that the encoder can automatically learn all the concepts embedded in the audio.

[0003] In the automatic audio summary generation task, given an input audio clip, an encoder encodes the audio into a sequence of vectors, and a decoder then decodes the encoded vectors into a natural language summary. In the process of implementing the present application, the inventors found that the gener...
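The encoder-decoder flow described in [0003] can be sketched as follows. This is a toy illustration under stated assumptions, not the model from the patent: the vocabulary `VOCAB`, the functions `encode` and `decode`, the mean-pooled context, and greedy decoding are all hypothetical choices, and the weights are random and untrained, so the produced word sequence is meaningless.

```python
import math
import random

# Hypothetical toy vocabulary; index 0 is "<eos>" and doubles as the start token.
VOCAB = ["<eos>", "a", "dog", "barks", "rain", "falls"]


def encode(frames, enc_weights):
    """Encoder: turn each audio frame into one vector (projection + tanh)."""
    return [
        [math.tanh(sum(w * x for w, x in zip(row, frame))) for row in enc_weights]
        for frame in frames
    ]


def decode(encoded, dec_weights, max_len=5):
    """Greedy decoder: pick the best-scoring word given context and previous word."""
    # Summarise the encoded sequence as its mean vector (a simplification).
    context = [sum(col) / len(encoded) for col in zip(*encoded)]
    words, prev = [], 0
    for _ in range(max_len):
        scores = [
            sum(w * c for w, c in zip(dec_weights[prev][v], context))
            for v in range(len(VOCAB))
        ]
        prev = max(range(len(VOCAB)), key=scores.__getitem__)
        if prev == 0:  # "<eos>" ends the summary
            break
        words.append(VOCAB[prev])
    return " ".join(words)


rng = random.Random(0)
n_in, n_hidden = 4, 6
enc_weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
dec_weights = [
    [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in VOCAB] for _ in VOCAB
]
frames = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(10)]
summary = decode(encode(frames, enc_weights), dec_weights)
```

In a trained system the encoder and decoder weights would be learned jointly from audio-caption pairs; here they only show how the sequence of encoded vectors feeds the word-by-word decoding loop.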

Claims
