An audio segment transcoding method and device

By employing an audio segmentation transcoding method and utilizing the HLS-TS container format and transcoding constraints, the inefficiency of high-bitrate audio sources and the problem of silent data in distributed transcoding systems are solved. This achieves an efficient and low-complexity transcoding process, improving the continuity of audio files.

CN116543779B · Active · Publication Date: 2026-06-30 · HUNAN HAPPLY SUNSHINE INTERACTIVE ENTERTAINMENT MEDIA CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HUNAN HAPPLY SUNSHINE INTERACTIVE ENTERTAINMENT MEDIA CO LTD
Filing Date
2023-06-15
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing distributed transcoding systems suffer from low execution efficiency when transcoding high-bitrate, long-duration audio sources in their entirety, and segmented transcoding introduces silence data at the junctions of segments, disrupting the listener's continuous listening experience.

Method used

An audio segmentation transcoding method is adopted. A first transcoding constraint T is determined, and the audio is segmented and transcoded so that a second transcoding constraint is also satisfied. Using the HLS-TS container format, specific TS segments are discarded and the remaining ones are spliced, which avoids introducing silence data.

Benefits of technology

It improves transcoding efficiency, reduces computational complexity and storage resource consumption, eliminates silence defects at segment transitions, and improves the listener's continuous listening experience.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses an audio segmentation transcoding method and apparatus, comprising: segmenting the current audio into segments under a first transcoding constraint T, each segment containing a preset number of TS segments, each segment satisfying a second transcoding constraint ensuring that adjacent segments generate overlapping areas of two TS segments; transcoding each segment; discarding the tail TS segment of the first segment, discarding the head and tail TS segments of the middle segments, and discarding the head TS segment of the tail segment, thus obtaining target TS segments; and concatenating the target TS segments according to time order to obtain the target audio. The above process removes TS segments at segment transitions, avoiding the introduction of silent data at these transitions. Furthermore, it operates at the bitstream layer, eliminating the need to construct new source segments, resulting in low computational complexity and improved transcoding efficiency.

Description

Technical Field

[0001] This invention relates to the field of data processing technology, and in particular to an audio segmentation transcoding method and apparatus.

Background Technology

[0002] Lossy audio coding standards such as AAC, MP3, and Opus are currently the most widely used audio coding standards. They all use overlapping time/frequency transforms, which introduce extra silence at the beginning and end of each encoded file. In a distributed transcoding system, audio is transcoded in segments. If the transcoded segments are directly concatenated into the final file, this silence data appears at the segment junctions, producing slight stuttering artifacts that disrupt the listener's continuous listening experience.

[0003] Currently, most distributed transcoding systems transcode audio as a whole, thus intentionally or unintentionally circumventing this problem. However, transcoding a high-bitrate, long-duration audio source as a whole reduces task execution efficiency and becomes a bottleneck for the distributed transcoding system.

Summary of the Invention

[0004] In view of this, the present invention provides an audio segmentation transcoding method and apparatus to solve the problem that most existing distributed transcoding systems transcode audio segments as a whole, which affects the execution efficiency of high-bitrate, long-duration audio sources and becomes a bottleneck for distributed transcoding systems. The specific solution is as follows:

[0005] An audio segmentation transcoding method includes:

[0006] Determine a first transcoding constraint T associated with the current audio, wherein, under the first transcoding constraint T, the TS segment duration of the current audio is an integer multiple of T, and the segment start time and segment transcoding duration of the current audio are integer multiples of the TS segment duration;

[0007] Under the condition that the first transcoding constraint T is satisfied, the current audio is segmented to obtain each segment, wherein each segment contains a preset number of TS segments, each segment satisfies the second transcoding constraint, the second transcoding constraint ensures that adjacent segments generate an overlapping area of two TS segments, and each segment is in HLS-TS container format;

[0008] Each paragraph is transcoded. For the first paragraph after transcoding, the tail TS segment is discarded. For the middle paragraphs, the head and tail TS segments are discarded. For the tail paragraph, the head TS segment is discarded, thus obtaining each target TS segment.

[0009] The target TS segments are spliced together in chronological order to obtain the target audio.

[0010] Optionally, in the above method, determining the first transcoding constraint T associated with the current audio includes:

[0011] Obtain the encoded frame length and transcoding sampling rate of the current audio;

[0012] Set the value of parameter n such that transcoding sampling rate / 10^n is an integer, where n is a positive integer determined based on the transcoding sampling rate;

[0013] Based on T = encoded frame length / 10^n, determine each of the candidate values for the first transcoding constraint T;

[0014] Choose any one of the candidate values that is less than the source duration of the current audio as the first transcoding constraint T. Under the first transcoding constraint T, the TS segment duration is an integer multiple of T, and the segment start time and the segment transcoding duration of the current audio are integer multiples of the TS segment duration.

[0015] Optionally, the above method may also include: setting the transcoding duration of the paragraph.

[0016] Setting the paragraph transcoding duration includes:

[0017] Obtain the source duration and number of segments of the current audio; divide the current audio into segments based on the source duration and number of segments to obtain the initial segment transcoding duration;

[0018] The initial segment transcoding duration is adjusted based on the TS segment duration so that the segment transcoding duration is set to an integer multiple of the TS segment duration.

[0019] Optionally, in the above method, segmenting the current audio to obtain individual segments includes:

[0020] The start time of each paragraph is determined based on the paragraph transcoding duration.

[0021] The current audio is segmented based on the start time and the segment transcoding duration. Each segment is then further divided based on the TS segment duration to obtain the individual paragraphs.

[0022] Optionally, in the above method, determining the start time of each paragraph based on the paragraph transcoding duration includes:

[0023] Set the paragraph start time for the first paragraph to 0;

[0024] The start time of each remaining paragraph is the end time of the previous paragraph minus twice the TS segment duration.

[0025] Optionally, in the above method, transcoding each paragraph and, for the transcoded paragraphs, discarding the tail TS segment of the first paragraph, the head and tail TS segments of the middle paragraphs, and the head TS segment of the tail paragraph to obtain each target TS segment includes:

[0026] Each paragraph is transcoded based on a preset transcoding standard to obtain transcoded paragraphs.

[0027] Identify the first paragraph, middle paragraphs, and last paragraph in each of the transcoded paragraphs;

[0028] For the first paragraph, discard the tail TS fragment; for the middle paragraph, discard the head and tail TS fragments; for the tail paragraph, discard the head TS fragment, thus obtaining each target TS fragment.

[0029] An audio segmentation transcoding device, comprising:

[0030] The determining module is used to determine the first transcoding constraint T associated with the current audio, wherein the TS segment duration of the current audio in the first transcoding constraint T is an integer multiple of the first transcoding constraint T, and the segment start time and segment transcoding duration of the current audio are integer multiples of the TS segment duration;

[0031] The segmentation module is used to segment the current audio into segments under the condition of satisfying the first transcoding constraint T, thereby obtaining each segment. Each segment contains a preset number of TS segments. Each segment satisfies the second transcoding constraint, which ensures that adjacent segments generate an overlapping area of two TS segments. Each segment is in HLS-TS container format.

[0032] The processing module is used to transcode each paragraph and, for the transcoded paragraphs, to discard the tail TS segment of the first paragraph, the head and tail TS segments of the middle paragraphs, and the head TS segment of the tail paragraph, so as to obtain each target TS segment.

[0033] The splicing module is used to splice the various target TS segments in chronological order to obtain the target audio.

[0034] Optionally, in the aforementioned apparatus, the determining module includes:

[0035] The acquisition unit is used to acquire the encoded frame length and transcoding sampling rate of the current audio.

[0036] The first determining unit is used to set the value of parameter n such that transcoding sampling rate / 10^n is an integer, where n is a positive integer determined based on the transcoding sampling rate.

[0037] The second determining unit is used to determine, based on T = encoded frame length / 10^n, each of the candidate values for the first transcoding constraint T;

[0038] The selection unit is used to select any one of the candidate values that is less than the source duration of the current audio as the first transcoding constraint T, wherein, under the first transcoding constraint T, the TS segment duration is an integer multiple of T, and the segment start time and the segment transcoding duration of the current audio are integer multiples of the TS segment duration.

[0039] Optionally, the determining module in the aforementioned apparatus further includes a setting unit, wherein the setting unit includes:

[0040] The acquisition subunit is used to acquire the source duration and the number of segments of the current audio, and to divide the current audio into segments based on the source duration and the number of segments to obtain the initial segment transcoding duration;

[0041] The adjustment subunit is used to adjust the initial segment transcoding duration based on the TS segment duration, so that the segment transcoding duration is set to an integer multiple of the TS segment duration.

[0042] Optionally, the processing module in the aforementioned apparatus includes:

[0043] The transcoding unit is used to transcode each paragraph based on a preset transcoding standard to obtain transcoded paragraphs.

[0044] The identification unit is used to identify the first paragraph, middle paragraphs, and last paragraph in each of the transcoded paragraphs;

[0045] The processing unit is configured to discard the tail TS fragment for the first paragraph, discard the head and tail TS fragments for the middle paragraph, and discard the head TS fragment for the tail paragraph, thereby obtaining each target TS fragment.

[0046] Compared with the prior art, the present invention has the following advantages:

[0047] This invention discloses an audio segmentation transcoding method and apparatus, comprising: under the condition of satisfying a first transcoding constraint T, segmenting the current audio based on the source duration and the number of segments, and determining the start time and transcoding duration of each segment to obtain each segment, each segment containing a preset number of TS segments and satisfying a second transcoding constraint, which ensures that adjacent segments generate an overlapping area of two TS segments; transcoding each segment; for the transcoded segments, discarding the tail TS segment of the first segment, the head and tail TS segments of the middle segments, and the head TS segment of the tail segment, to obtain each target TS segment; and concatenating the target TS segments in time order to obtain the target audio. The above process deletes TS segments at the segment transitions, avoiding the introduction of the aforementioned silence data at those transitions, and operates at the bitstream layer without needing to construct new source segments, resulting in low computational complexity and improved transcoding efficiency.

Attached Figure Description

[0048] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0049] Figure 1 is a flowchart of an audio segmentation transcoding method disclosed in an embodiment of the present invention;

[0050] Figure 2 is a schematic diagram illustrating the generation of an overlap of two TS segments between paragraphs, as disclosed in an embodiment of the present invention;

[0051] Figure 3 is a schematic diagram of a multi-segment splicing method disclosed in an embodiment of this application;

[0052] Figure 4 is a structural block diagram of an audio segmentation transcoding device disclosed in an embodiment of the present invention.

Detailed Implementation

[0053] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0054] This invention discloses an audio segmentation transcoding method and apparatus, applied in the process of audio segmentation transcoding. In the prior art, with the continuous evolution of the Internet and mobile networks, there are more and more media transmission platforms, and correspondingly, these platforms support a wider range of media formats. Different platforms have different requirements for content formats, including variations in parameters such as encoding standards, resolution, frame rate, and container formats. Audio and video transcoding technology is the process of converting one audio and video encoding format or attribute into another or more audio and video encoding formats or attributes. Audio and video transcoding is a computationally intensive process, and distributed computing technology is often used to improve transcoding efficiency. In a distributed cluster, audio and video are physically or logically divided into multiple segments, and parallel transcoding is performed on multiple machines. The method described in this embodiment focuses on the audio transcoding portion of distributed transcoding.

[0055] Lossy audio coding standards such as AAC, MP3, and Opus are currently the most widely used audio coding standards. They all use overlapping time/frequency transforms, which introduce extra silence at the beginning and end of each encoded file. In a distributed transcoding system, audio is transcoded in segments. If the transcoded segments are directly concatenated into the final file, this silence data appears at the segment junctions, producing slight stuttering artifacts that disrupt the listener's continuous listening experience.

[0056] Most distributed transcoding systems currently use segmented transcoding for video and whole-file transcoding for audio, thus intentionally or unintentionally avoiding this problem. However, whole-file transcoding of high-bitrate, long-duration audio sources reduces task execution efficiency and becomes a bottleneck for distributed transcoding systems. Some commercial systems reduce how often the problem occurs by lengthening the audio segment duration, but this does not fundamentally solve the problem of silence data being introduced at segment transitions. Based on the above problems, this invention provides an audio segmented transcoding method that uses HLS-TS (HTTP Live Streaming based on TS) as an intermediate container to solve the stuttering artifacts introduced by segmented audio transcoding. The method operates at the bitstream layer: it does not construct new source segments, and after transcoding it merges segments directly, without packet-level processing, to output the finished file. It has the advantages of low computational complexity, low storage resource consumption, and easy engineering implementation. HLS is an HTTP-based adaptive bitrate streaming protocol that is widely supported by media players, web browsers, mobile devices, and streaming media servers, and plays an important role in live and on-demand audio and video applications. HLS consists of an m3u8 index file and media segment files. A common media segment file is the Transport Stream (TS) segment, in which case the format is called HLS-TS. TS is a standard digital container format defined in the MPEG-2 standard, used for transmitting and storing audio, video, and program and system information data.

[0057] HLS-TS manages media segments through a text index and offers controllable segment duration, providing a solid foundation for this solution. The design ensures that the HLS-TS files produced by transcoding each paragraph of the source overlap by an integer number of TS segments at paragraph junctions. Combined with the text index, this enables rapid deduplication and concatenation. Furthermore, HLS-TS is a universal protocol widely supported by tools such as ffmpeg, which reduces the difficulty of engineering implementation.

[0058] The execution flow of the method is shown in Figure 1 and includes the following steps:

[0059] S101. Determine the first transcoding constraint T associated with the current audio, where,

[0060] In the first transcoding constraint T, the TS segment duration of the current audio is an integer multiple of the first transcoding constraint T, and the segment start time and segment transcoding duration of the current audio are integer multiples of the TS segment duration;

[0061] In this embodiment of the invention, the encoded frame length, transcoding sampling rate, segment start time, segment transcoding duration, and TS segment duration of the current audio are obtained. The encoded frame length, transcoding sampling rate, and TS segment duration are related through the first transcoding constraint T. Preferably, in this embodiment, the TS segment duration is set equal to the first transcoding constraint T, i.e.:

[0062] TS fragmentation duration S = first transcoding constraint T

[0063] The process for obtaining the paragraph transcoding duration is as follows:

[0064] First, the source duration of the current audio and the number of segments are obtained. The number of segments is generally determined by the scheduling layer of the distributed system based on system resources; when resources are plentiful, dividing the audio into more segments increases parallelism and produces the transcoded file faster. The current audio is divided based on the source duration and the number of segments to obtain the initial segment transcoding duration: initial segment transcoding duration = FLOOR(source duration / number of segments). The initial segment transcoding duration is then adjusted based on the TS segment duration S to obtain the final segment transcoding duration: segment transcoding duration = CEIL(initial segment transcoding duration / S) * S. Here, FLOOR rounds down, CEIL rounds up, and S is the TS segment duration.
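The FLOOR/CEIL adjustment above can be sketched in Python as follows. This is only a minimal illustration: the function name `paragraph_duration` and its parameters are hypothetical, and `Decimal` is used (an implementation choice, not something the text prescribes) so that values such as 10.24 do not pick up binary floating-point error.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR

def paragraph_duration(source_duration, num_paragraphs, ts_duration):
    """Return the segment transcoding duration in seconds as a Decimal.

    Hypothetical helper: names and signature are illustrative only.
    """
    source = Decimal(str(source_duration))
    s = Decimal(str(ts_duration))
    # Initial duration = FLOOR(source duration / number of segments)
    initial = (source / num_paragraphs).to_integral_value(rounding=ROUND_FLOOR)
    # Final duration = CEIL(initial / S) * S, i.e. a whole number of TS segments
    ts_count = (initial / s).to_integral_value(rounding=ROUND_CEILING)
    return ts_count * s

print(paragraph_duration(1800, 5, "10.24"))  # 368.64
```

Rounding up to a whole number of TS segments is what keeps every paragraph boundary aligned to a TS segment boundary.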

[0065] Furthermore, the process for determining the start time of the paragraph is as follows:

[0066] The paragraph start time for the first paragraph is 0;

[0067] The start time of the remaining paragraphs is the end time of the previous paragraph minus twice the duration of the TS segment.

[0068] Under the first transcoding constraint, the TS segment duration is an integer multiple of T:

[0069] T = FrameSize / 10^n

[0070] where SampleRate is the transcoding sampling rate; FrameSize is the encoded frame length; and n is a positive integer satisfying: SampleRate / 10^n is an integer. Since the transcoding sampling rate is known, at least one value of parameter n can be determined. Among these values, a value of n is selected according to the specific application scenario. Based on the values of n and the encoded frame length, the candidate values of the first transcoding constraint T are determined. Among these candidate values, any one less than the source duration of the current audio is selected as the first transcoding constraint T. Under the first transcoding constraint T, the TS segment duration is set to an integer multiple of T, and the segment start time and segment transcoding duration of the current audio are set to integer multiples of the TS segment duration.

[0071] Taking AAC-LC at 44.1 kHz as an example, SampleRate is 44100 and FrameSize is 1024. The possible values for n are 1 and 2. If n is 1, T is 102.4 seconds; if n is 2, T is 10.24 seconds.
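The candidate values of T in this example can be enumerated with a short Python sketch. It assumes, consistent with the AAC-LC example, that the condition on n is that SampleRate / 10^n is an integer; the function name `candidate_constraints` is hypothetical.

```python
from fractions import Fraction

def candidate_constraints(sample_rate, frame_size, source_duration):
    """Return (n, T) pairs with T = frame_size / 10**n, in seconds.

    Assumption: n is valid when sample_rate / 10**n is an integer,
    and only T values below the source duration are kept.
    """
    candidates = []
    n = 1
    while sample_rate % 10**n == 0:
        t = Fraction(frame_size, 10**n)
        if t < source_duration:
            candidates.append((n, float(t)))
        n += 1
    return candidates

print(candidate_constraints(44100, 1024, 1800))  # [(1, 102.4), (2, 10.24)]
```

With this choice, T seconds always hold a whole number of encoded frames (for n = 2, 10.24 s at 44100 Hz is 441 frames of 1024 samples), which is presumably why the constraint takes this form.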

[0072] S102. Under the condition of satisfying the first transcoding constraint T, the current audio is segmented to obtain each segment, wherein each segment contains a preset number of TS segments, each segment satisfies the second transcoding constraint, the second transcoding constraint ensures that adjacent segments generate an overlapping area of two TS segments, and each segment is in HLS-TS container format.

[0073] In this embodiment of the invention, under the condition of satisfying the first transcoding constraint, the current audio is segmented to obtain the paragraphs. Each paragraph contains a preset number of TS segments, where the preset number equals the ratio of the paragraph transcoding duration to the TS segment duration. Each paragraph satisfies a second transcoding constraint, which ensures that adjacent paragraphs generate an overlapping area of two TS segments, and each paragraph is in HLS-TS container format. As shown in Figure 2, the first paragraph 0.m3u8 includes 0.ts, 1.ts, 2.ts, 3.ts, 4.ts, and 5.ts; the second paragraph (middle paragraph) 1.m3u8 includes 0.ts, 1.ts, 2.ts, 3.ts, 4.ts, and 5.ts; and the third paragraph (tail paragraph) 2.m3u8 includes 0.ts, 1.ts, 2.ts, 3.ts, 4.ts, and 5.ts. In Figure 2, the start time of the second paragraph 1.m3u8 is the start time of the second-to-last TS segment of the first paragraph 0.m3u8, which creates an overlap of two TS segments. The duration of the tail paragraph is not subject to the first transcoding constraint. Each paragraph is transcoded into an HLS-TS container, and the m3u8 file is the list file of that container. Assuming the first paragraph duration is T1 and the TS segment duration is S1, the second paragraph start time is T1 - 2*S1.

[0074] S103. Transcode each paragraph. For the first paragraph after transcoding, discard the tail TS segment. For the middle paragraph, discard the head and tail TS segments. For the tail paragraph, discard the head TS segment to obtain each target TS segment.

[0075] In this embodiment of the invention, each paragraph is transcoded according to the target encoding standard, bitrate, and other parameters, and output in HLS-TS container format. The processing of the transcoded paragraphs is performed only at the bitstream layer, without re-transcoding or packet-level filtering. The splicing rule is that the first paragraph discards its tail TS segment, each middle paragraph discards its head and tail TS segments, and the tail paragraph discards its head TS segment, yielding the target TS segments. As shown in Figure 3, transcoding the source file produces an intermediate M3U8 file that includes a first paragraph 0.m3u8, a second paragraph 1.m3u8, and a third paragraph 2.m3u8. The first paragraph 0.m3u8 includes 0.ts, 1.ts, 2.ts, 3.ts, 4.ts, and 5.ts. The second paragraph (middle paragraph) 1.m3u8 includes 0.ts, 1.ts, 2.ts, 3.ts, 4.ts, and 5.ts. Furthermore, each intermediate segment is at least one... When there are multiple middle paragraphs, the same processing is performed on each of them. The third paragraph (tail paragraph) 2.m3u8 includes 0.ts, 1.ts, 2.ts, 3.ts, 4.ts, and 5.ts. The 5.ts (tail TS segment) in the first paragraph 0.m3u8 is discarded, the 0.ts (head TS segment) and 5.ts (tail TS segment) in the second paragraph 1.m3u8 are discarded, and the 0.ts (head TS segment) in the third paragraph 2.m3u8 is discarded, to obtain the target TS segments.
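The discard rule described above operates purely on lists of TS file names, which a few lines of Python can illustrate. The function name `select_target_segments` and the `<paragraph>_<index>.ts` naming are assumptions made for this sketch (the naming mirrors the ffmpeg examples later in the text so that names stay unique across paragraphs).

```python
def select_target_segments(paragraphs):
    """Apply the head/tail discard rule to per-paragraph TS file lists.

    `paragraphs` is a list of lists of TS file names in time order.
    Hypothetical helper for illustration only.
    """
    targets = []
    last = len(paragraphs) - 1
    for index, ts_files in enumerate(paragraphs):
        start = 0 if index == 0 else 1                      # keep head only in the first paragraph
        end = len(ts_files) if index == last else len(ts_files) - 1  # keep tail only in the last
        targets.extend(ts_files[start:end])
    return targets

# Three paragraphs of six TS segments each, as in Figure 3.
paragraphs = [[f"{p}_{i}.ts" for i in range(6)] for p in range(3)]
targets = select_target_segments(paragraphs)
print(len(targets))  # 14
```

Because adjacent paragraphs overlap by exactly two TS segments, dropping one tail and one head segment at each junction leaves the timeline covered once, with no gap and no duplicate.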

[0076] S104. The target TS segments are spliced together in chronological order to obtain the target audio.

[0077] In this embodiment of the invention, as shown in Figure 3, the target TS segments are concatenated in chronological order to form a finished file (the target audio). The finished file can be MP4, TS, HLS-TS, etc. Preferably, the output container format is not restricted and can be chosen at concatenation time.

[0078] This invention discloses an audio segmentation transcoding method and apparatus, comprising: under the condition of satisfying a first transcoding constraint T, segmenting the current audio based on the source duration and the number of segments, and determining the start time and transcoding duration of each segment to obtain each segment, each segment containing a preset number of TS segments and satisfying a second transcoding constraint, which ensures that adjacent segments generate an overlapping area of two TS segments; transcoding each segment; for the transcoded segments, discarding the tail TS segment of the first segment, the head and tail TS segments of the middle segments, and the head TS segment of the tail segment, to obtain each target TS segment; and concatenating the target TS segments in time order to obtain the target audio. The above process deletes TS segments at the segment transitions, avoiding the introduction of the aforementioned silence data at those transitions, and operates at the bitstream layer without needing to construct new source segments, resulting in low computational complexity and improved transcoding efficiency. This invention introduces the HLS-TS container as an intermediate container. By specifying the segment start time, segment duration, and HLS-TS TS segment duration according to the defined rules, the HLS-TS files generated during transcoding are time-aligned on whole TS segments, creating favorable conditions for subsequent splicing and merging. The process eliminates the need to create new PCM source files for transcoding, reducing memory overhead and implementation complexity. Finally, because the HLS-TS list file is plain text, the final file can be spliced and output simply and efficiently. The method eliminates the need for cumbersome source reconstruction and packet-level filtering, significantly lowering the engineering implementation threshold. It supports various target sampling rates and target container formats. Extensive testing shows that this method can completely eliminate imperfections at the audio splicing points.

[0079] To illustrate the segmented transcoding method described above, consider an audio source that is 30 minutes long. The external resource requires transcoding in 5 segments, with a transcoding sampling rate of 44.1kHz. The target container is MP4, and ffmpeg is used for AAC audio transcoding. It should be noted that this invention does not specify the number or duration of segments; it only requires that the segment start time, segment transcoding duration, and TS segment duration meet the first transcoding constraint.

[0080] A. Calculate the start time and duration of the paragraph.

[0081] Based on the first transcoding constraint, the TS segment duration for HLS-TS is selected as 10.24 seconds. The transcoding duration of every paragraph except the tail paragraph is calculated as follows:

[0082] Initial paragraph transcoding duration: FLOOR(1800 / 5) = 360 seconds, where FLOOR rounds down.

[0083] Paragraph transcoding duration: CEIL(360 / 10.24) * 10.24 = 368.64 seconds, where CEIL rounds up.

[0084] In this way, we can obtain the start time and duration of each paragraph.

[0085] The first paragraph starts at 0.00 seconds and lasts for 368.64 seconds, ranging from [0.00, 368.64].

[0086] The second paragraph overlaps the end of the first paragraph by two TS segments, so its start time is:

[0087] 368.64 - 2 * 10.24 = 348.16 seconds; its duration is 368.64 seconds, giving the range [348.16, 716.80].

[0088] The third paragraph overlaps the end of the second paragraph by two TS segments, so its start time is:

[0089] 716.80 - 2 * 10.24 = 696.32 seconds; its duration is 368.64 seconds, giving the range [696.32, 1064.96].

[0090] The fourth paragraph overlaps the end of the third paragraph by two TS segments, so its start time is:

[0091] 1064.96 - 2 * 10.24 = 1044.48 seconds; its duration is 368.64 seconds, giving the range [1044.48, 1413.12].

[0092] The fifth paragraph overlaps the end of the fourth paragraph by two TS segments, so its start time is:

[0093] 1413.12 - 2 * 10.24 = 1392.64 seconds, giving the range [1392.64, 1800.00].
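The five ranges above follow one recurrence: each paragraph starts two TS segments before the previous one ends, and the last paragraph runs to the end of the source. A minimal Python sketch, with the hypothetical function name `paragraph_ranges` and `Decimal` arithmetic chosen to keep the hundredths exact:

```python
from decimal import Decimal

def paragraph_ranges(source_duration, paragraph_duration, ts_duration, num_paragraphs):
    """Return (start, end) pairs in seconds for each paragraph.

    Hypothetical helper; the last paragraph is simply cut at the source end.
    """
    source = Decimal(str(source_duration))
    length = Decimal(str(paragraph_duration))
    overlap = 2 * Decimal(str(ts_duration))    # two TS segments of overlap
    ranges = []
    start = Decimal("0")
    for index in range(num_paragraphs):
        is_last = index == num_paragraphs - 1
        end = source if is_last else start + length
        ranges.append((start, end))
        start = end - overlap                   # next paragraph start
    return ranges

ranges = paragraph_ranges(1800, "368.64", "10.24", 5)
for start, end in ranges:
    print(f"[{start:.2f}, {end:.2f}]")
```

Run with the example values, this prints the same five ranges as listed in paragraphs [0085] to [0093].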

[0094] B. Use ffmpeg to transcode and generate HLS-TS files for each segment. Examples of transcoding commands are shown below; you can choose the transcoding parameters or tools according to your actual needs.

[0095] First paragraph transcoding command:

[0096] ffmpeg -i source.ts -muxdelay 0 -vn -t 368.64 -acodec aac -b:a 128k -strict -2 -ac 2 -ar 44100 -y -hls_time 10.24 -hls_segment_type mpegts -start_number 0

[0097] -hls_list_size 0 -hls_segment_filename 0_%d.ts 0.m3u8

[0098] Second paragraph transcoding command:

[0099] ffmpeg -i source.ts -muxdelay 0 -vn -ss 348.16 -t 368.64 -acodec aac -b:a 128k -strict -2 -ac 2 -ar 44100 -y -hls_time 10.24 -hls_segment_type mpegts -start_number 0 -hls_list_size 0 -hls_segment_filename 1_%d.ts 1.m3u8

[0100] In the first transcoding command, the `-i` parameter specifies the source file to be transcoded, which is `source.ts` in this case. `-muxdelay 0` makes the output timestamps start from 0. `-vn` disables video output, so that only audio data is processed. The `-t` parameter sets the transcoding duration; if the `-ss` parameter is not specified, transcoding starts from 0.00 seconds. The `-hls_time` parameter sets the target duration (in seconds) of each TS slice in the output HLS-TS container. The `-acodec` parameter specifies the target encoding standard, `-b:a` the target bitrate, `-ac` the target number of channels, and `-ar` the target sampling rate. The generated HLS index file is `0.m3u8`, and `-hls_segment_filename` names the TS slices `0_0.ts`, `0_1.ts`, and so on. The second command is similar, except that it adds the `-ss` parameter to specify the start time of the transcoding. Both commands decode the source audio file from the specified time, encode it according to the target requirements, and encapsulate the resulting bitstream in an HLS-TS container. The output is an HLS-TS file with the required start time, duration, and target transcoding format.
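The two commands differ only in the `-ss` value and in the output names, so they can be generated from the segment plan. The following Python sketch is one possible way to do that; the helper name `build_hls_command` is hypothetical, and the flags are exactly those of the example commands above, which may be replaced according to actual needs.

```python
def build_hls_command(source, index, start, duration, ts_duration="10.24"):
    """Return the ffmpeg argument list for one paragraph.

    Times are passed as strings (in seconds) so they appear on the
    command line exactly as planned.
    """
    cmd = ["ffmpeg", "-i", source, "-muxdelay", "0", "-vn"]
    if index > 0:
        # Later paragraphs seek to their start time; the first starts at 0.00.
        cmd += ["-ss", start]
    cmd += [
        "-t", duration,
        "-acodec", "aac", "-b:a", "128k", "-strict", "-2",
        "-ac", "2", "-ar", "44100", "-y",
        "-hls_time", ts_duration,
        "-hls_segment_type", "mpegts",
        "-start_number", "0",
        "-hls_list_size", "0",
        "-hls_segment_filename", f"{index}_%d.ts",
        f"{index}.m3u8",
    ]
    return cmd


first = build_hls_command("source.ts", 0, "0.00", "368.64")
second = build_hls_command("source.ts", 1, "348.16", "368.64")
print(" ".join(first))
print(" ".join(second))
```

On a machine with ffmpeg installed, each list could be passed to `subprocess.run(cmd, check=True)`. Because the paragraphs are independent, the commands can run on different workers in parallel.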

[0101] C. Filter and merge the various m3u8 list files after transcoding.

[0102] The m3u8 file of the first paragraph contains 36 TS segments (368.64 / 10.24 = 36); its last TS segment is discarded. The second to fourth paragraphs also contain 36 TS segments each; their first and last TS segments are discarded. For the last paragraph, the first TS segment is discarded. The remaining TS segments are then merged with the following ffmpeg command to output the final MP4 file:

[0103] ffmpeg -f concat -safe 0 -i filelist.txt -c copy -y output.mp4

[0104] This command reads the media files listed in the filelist.txt file, concatenates them sequentially according to the list, and outputs the file output.mp4. This concatenation process does not transcode the media files in the list; it only copies and concatenates the bitstreams. The text file filelist.txt stores the paths to the retained TS segment files in order. output.mp4 is the final output file; the merging process does not involve transcoding, only changes to the container encapsulation.
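A minimal sketch of step C follows, assuming the TS files are named `<paragraph>_<n>.ts` as in the transcoding commands above. The helper names `retained_segments` and `concat_list` are hypothetical. The count of 40 TS files for the last paragraph is an assumption derived from 407.36 / 10.24 rounded up, not a figure stated in the text.

```python
def retained_segments(segment_counts):
    """Return the TS file names to keep, in playback order.

    segment_counts[i] is the number of TS files produced for paragraph i.
    """
    last = len(segment_counts) - 1
    kept = []
    for para, count in enumerate(segment_counts):
        # Drop the head TS segment of every paragraph except the first.
        first_kept = 0 if para == 0 else 1
        # Drop the tail TS segment of every paragraph except the last.
        end = count if para == last else count - 1
        kept.extend(f"{para}_{n}.ts" for n in range(first_kept, end))
    return kept


def concat_list(paths):
    """Format paths as input for ffmpeg's concat demuxer (one 'file' line each)."""
    return "".join(f"file '{path}'\n" for path in paths)


kept = retained_segments([36, 36, 36, 36, 40])
with open("filelist.txt", "w", encoding="utf-8") as handle:
    handle.write(concat_list(kept))
```

The resulting `filelist.txt` is what the merge command reads through `-i filelist.txt`.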

[0105] The embodiments of the present invention introduce an intermediate container, HLS-TS, and constrain the start time, duration, and TS segment duration of each segment. This greatly reduces the engineering complexity of eliminating defects at the junctions between separately transcoded segments. The method does not require new PCM source segments to be constructed, which saves storage resources.

[0106] Based on the above audio segment transcoding method, an embodiment of the invention provides an audio segment transcoding device, whose structural block diagram is shown in Figure 4. The device includes:

[0107] The determining module 201, the segmentation module 202, the processing module 203, and the splicing module 204.

[0108] Wherein,

[0109] The determining module 201 is used to determine a first transcoding constraint T associated with the current audio, wherein, under the first transcoding constraint T, the TS segment duration of the current audio is an integer multiple of T, and the segment start time and segment transcoding duration of the current audio are integer multiples of the TS segment duration;

[0110] The segmentation module 202 is used to segment the current audio, under the condition that the first transcoding constraint T is satisfied, to obtain the segments. Each segment contains a preset number of TS segments and satisfies the second transcoding constraint, which ensures that adjacent segments have an overlapping area of two TS segments. Each segment is in HLS-TS container format.

[0111] The processing module 203 is used to transcode each paragraph and, among the transcoded paragraphs, to discard the tail TS segment of the first paragraph, the head and tail TS segments of each middle paragraph, and the head TS segment of the tail paragraph, so as to obtain the target TS segments.

[0112] The splicing module 204 is used to splice the various target TS segments in time order to obtain the target audio.

[0113] This invention discloses an audio segment transcoding device which, under the condition that the first transcoding constraint T is satisfied, segments the current audio based on the source duration and the number of segments, determining the start time and transcoding duration of each segment. Each segment contains a preset number of TS segments and satisfies the second transcoding constraint, which ensures that adjacent segments have an overlapping area of two TS segments. The device transcodes each segment, then discards the tail TS segment of the first segment, the head and tail TS segments of the middle segments, and the head TS segment of the tail segment to obtain the target TS segments, and concatenates the target TS segments in time order to obtain the target audio. This process deletes the TS segments at the segment transitions, which avoids introducing silence data there. It also operates at the bitstream layer without constructing new source segments, resulting in low computational complexity and improved transcoding efficiency.
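That discarding the boundary TS segments leaves neither a gap nor a repeat can be checked with a little arithmetic. The Python sketch below uses the figures of the worked example (1800-second source, 10.24-second TS segments); the helper name `retained_ranges` is hypothetical, and exact fractions are used to avoid floating-point error.

```python
from fractions import Fraction


def retained_ranges(starts, ends, ts):
    """Time range kept from each segment after dropping its boundary TS segments."""
    last = len(starts) - 1
    ranges = []
    for i, (start, end) in enumerate(zip(starts, ends)):
        # Every segment but the first loses its head TS segment.
        lo = start if i == 0 else start + ts
        # Every segment but the last loses its tail TS segment.
        hi = end if i == last else end - ts
        ranges.append((lo, hi))
    return ranges


ts = Fraction("10.24")
starts = [Fraction(x) for x in ("0", "348.16", "696.32", "1044.48", "1392.64")]
ends = [Fraction(x) for x in ("368.64", "716.80", "1064.96", "1413.12", "1800")]

ranges = retained_ranges(starts, ends, ts)
# Each retained range must end exactly where the next one begins.
gapless = all(ranges[i][1] == ranges[i + 1][0] for i in range(len(ranges) - 1))
print(gapless, [(float(lo), float(hi)) for lo, hi in ranges])
```

Because the overlap is two TS segments wide and each side gives up one of them, every junction falls on the TS boundary in the middle of the overlap.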

[0114] In this embodiment of the invention, the determining module 201 includes:

[0115] The acquisition unit 205, the first determination unit 206, the second determination unit 207, and the selection unit 208.

[0116] Wherein,

[0117] The acquisition unit 205 is used to acquire the encoded frame length and transcoding sampling rate of the current audio.

[0118] The first determining unit 206 is used to determine the value of parameter n based on the transcoding sampling rate, wherein transcoding sampling rate / 10^n is a positive integer;

[0119] The second determining unit 207 is used to determine each candidate value of the first transcoding constraint T based on T = encoded frame length / 10^n, where n is a positive integer;

[0120] The selection unit 208 is used to select any one of the candidate values that is less than the duration of the current audio source as the first transcoding constraint T, wherein, under the first transcoding constraint T, the TS segment duration is an integer multiple of T, and the segment start time and the segment transcoding duration of the current audio are integer multiples of the TS segment duration.
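The work of units 206 to 208 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the function name `candidate_constraints` is hypothetical, and the encoded frame length of 1024 samples is an assumed value for AAC that is consistent with the 10.24-second TS segment duration and 44100 Hz sampling rate used in the example.

```python
from fractions import Fraction


def candidate_constraints(frame_length, sample_rate, source_duration):
    """List candidate values of T = frame_length / 10^n, in seconds.

    n ranges over the positive integers for which sample_rate / 10^n is
    an integer; only candidates shorter than the source are kept.
    """
    candidates = []
    n = 1
    # Once 10^n stops dividing the sampling rate, no larger n can divide it.
    while sample_rate % (10 ** n) == 0:
        t = Fraction(frame_length, 10 ** n)
        if t < source_duration:
            candidates.append(t)
        n += 1
    return candidates


# Assumed inputs: 1024-sample frames, 44100 Hz, 1800-second source.
for t in candidate_constraints(1024, 44100, 1800):
    print(float(t))
```

For these inputs the candidates are 102.4 and 10.24 seconds. Either one spans a whole number of encoded frames at 44100 Hz (for example, 10.24 * 44100 = 441 * 1024 samples), which is why segment boundaries placed at multiples of T line up with frame boundaries.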

[0121] In this embodiment of the invention, the determining module 201 further includes a setting unit 209, wherein the setting unit 209 includes:

[0122] The acquisition subunit 210 and the adjustment subunit 211.

[0123] Wherein,

[0124] The acquisition subunit 210 is used to acquire the source duration and the number of segments of the current audio, and to divide the current audio into segments based on the source duration and the number of segments to obtain the initial segment transcoding duration.

[0125] The adjustment subunit 211 is used to adjust the initial segment transcoding duration based on the TS segment duration, so that the segment transcoding duration is set to an integer multiple of the TS segment duration.

[0126] In this embodiment of the invention, the processing module 203 includes:

[0127] The transcoding unit 212, the identification unit 213, and the processing unit 214.

[0128] Wherein,

[0129] The transcoding unit 212 is used to transcode each paragraph based on a preset transcoding standard to obtain transcoded paragraphs.

[0130] The identification unit 213 is used to identify the first paragraph, the middle paragraph and the last paragraph in each of the transcoded paragraphs;

[0131] The processing unit 214 is used to discard the tail TS segment for the first paragraph, discard the head and tail TS segments for the middle paragraph, and discard the head TS segment for the tail paragraph, thereby obtaining each target TS segment.

[0132] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0133] This application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0134] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means, which implement the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0135] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0136] In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

[0137] Memory may include non-persistent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.

[0138] Computer-readable media includes both permanent and non-permanent, removable and non-removable media that can store information using any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.

[0139] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element.

[0141] The above are merely embodiments of this application and are not intended to limit the scope of this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of the claims of this application.

Claims

1. An audio segmentation transcoding method, characterized in that it includes: determining a first transcoding constraint T associated with the current audio, wherein, under the first transcoding constraint T, the TS segment duration of the current audio is an integer multiple of T, and the segment start time and segment transcoding duration of the current audio are integer multiples of the TS segment duration; under the condition that the first transcoding constraint T is satisfied, segmenting the current audio to obtain the segments, wherein each segment contains a preset number of TS segments, each segment satisfies a second transcoding constraint, the second transcoding constraint ensures that adjacent segments have an overlapping area of two TS segments, and each segment is in HLS-TS container format; transcoding each paragraph and, among the transcoded paragraphs, discarding the tail TS segment of the first paragraph, the head and tail TS segments of the middle paragraphs, and the head TS segment of the tail paragraph, thereby obtaining the target TS segments; and splicing the target TS segments together in chronological order to obtain the target audio; wherein determining the first transcoding constraint T associated with the current audio includes: obtaining the encoded frame length and transcoding sampling rate of the current audio; determining the value of parameter n based on the transcoding sampling rate, where n is a positive integer and transcoding sampling rate / 10^n is a positive integer; determining each candidate value of the first transcoding constraint T based on T = encoded frame length / 10^n; and selecting any one of the candidate values that is less than the duration of the current audio source as the first transcoding constraint T, wherein, under the first transcoding constraint T, the TS segment duration is an integer multiple of T, and the segment start time and the segment transcoding duration of the current audio are integer multiples of the TS segment duration.

2. The method according to claim 1, characterized in that it further includes setting the paragraph transcoding duration, which includes: obtaining the source duration and number of segments of the current audio; dividing the current audio into segments based on the source duration and number of segments to obtain the initial segment transcoding duration; and adjusting the initial segment transcoding duration based on the TS segment duration so that the segment transcoding duration is set to an integer multiple of the TS segment duration.

3. The method according to claim 2, characterized in that segmenting the current audio to obtain the segments includes: determining the start time of each paragraph based on the paragraph transcoding duration; segmenting the current audio based on the start time and the segment transcoding duration; and further dividing each resulting segment based on the TS segment duration to obtain the paragraphs.

4. The method according to claim 3, characterized in that determining the start time of each segment based on the segment transcoding duration includes: setting the start time of the first paragraph to 0; and setting the start time of each remaining paragraph to the end time of the previous paragraph minus twice the TS segment duration.

5. The method according to claim 1, characterized in that transcoding each paragraph, discarding the tail TS segment of the first paragraph, discarding the head and tail TS segments of the middle paragraphs, and discarding the head TS segment of the tail paragraph to obtain the target TS segments includes: transcoding each paragraph based on a preset transcoding standard to obtain the transcoded paragraphs; identifying the first paragraph, the middle paragraphs, and the last paragraph among the transcoded paragraphs; and, for the first paragraph, discarding the tail TS segment, for the middle paragraphs, discarding the head and tail TS segments, and for the tail paragraph, discarding the head TS segment, thereby obtaining the target TS segments.

6. An audio segmentation transcoding device, characterized in that it includes: a determining module, used to determine the first transcoding constraint T associated with the current audio, wherein, under the first transcoding constraint T, the TS segment duration of the current audio is an integer multiple of T, and the segment start time and segment transcoding duration of the current audio are integer multiples of the TS segment duration; a segmentation module, used to segment the current audio under the condition that the first transcoding constraint T is satisfied, thereby obtaining the segments, wherein each segment contains a preset number of TS segments, each segment satisfies the second transcoding constraint, the second transcoding constraint ensures that adjacent segments have an overlapping area of two TS segments, and each segment is in HLS-TS container format; a processing module, used to transcode each paragraph and, among the transcoded paragraphs, to discard the tail TS segment of the first paragraph, the head and tail TS segments of the middle paragraphs, and the head TS segment of the tail paragraph, so as to obtain the target TS segments; and a splicing module, used to splice the target TS segments in chronological order to obtain the target audio; wherein the determining module includes: an acquisition unit, used to acquire the encoded frame length and transcoding sampling rate of the current audio; a first determining unit, used to determine the value of parameter n based on the transcoding sampling rate, where n is a positive integer and transcoding sampling rate / 10^n is a positive integer; a second determining unit, used to determine each candidate value of the first transcoding constraint T based on T = encoded frame length / 10^n; and a selection unit, used to select any one of the candidate values that is less than the duration of the current audio source as the first transcoding constraint T, wherein, under the first transcoding constraint T, the TS segment duration is an integer multiple of T, and the segment start time and the segment transcoding duration of the current audio are integer multiples of the TS segment duration.

7. The apparatus according to claim 6, characterized in that the determining module further includes a setting unit, wherein the setting unit includes: an acquisition subunit, used to acquire the source duration and the number of segments of the current audio, and to divide the current audio into segments based on the source duration and the number of segments to obtain the initial segment transcoding duration; and an adjustment subunit, used to adjust the initial segment transcoding duration based on the TS segment duration, so that the segment transcoding duration is set to an integer multiple of the TS segment duration.

8. The apparatus according to claim 6, characterized in that the processing module includes: a transcoding unit, used to transcode each paragraph based on a preset transcoding standard to obtain the transcoded paragraphs; an identification unit, used to identify the first paragraph, the middle paragraphs, and the last paragraph among the transcoded paragraphs; and a processing unit, configured to discard the tail TS segment of the first paragraph, the head and tail TS segments of the middle paragraphs, and the head TS segment of the tail paragraph, thereby obtaining the target TS segments.