Audio playing control method and device, electronic equipment and storage medium

By layering the application into a cross-platform UI layer, a native playback layer, and a communication bridging layer, audio playback capabilities are pushed down to the native layer, while interface display and user interaction are handled in the cross-platform UI layer. This addresses the problems of response lag and audio-text synchronization deviation in cross-platform playback, improves user experience, and reduces development and maintenance costs.

CN122240056A · Pending · Publication Date: 2026-06-19 · HANGZHOU YUNYUEDU NETWORK CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
HANGZHOU YUNYUEDU NETWORK CO LTD
Filing Date
2026-04-17
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Cross-platform audio playback control suffers from lengthy and complex communication links in multi-platform scenarios, leading to delayed playback response and audio-text synchronization deviation, which affects user experience.

Method used

The application is layered into a cross-platform UI layer, a native playback layer, and a communication bridging layer. The native playback layer is responsible for playback control, the cross-platform UI layer focuses on interface display and user interaction, and the communication bridging layer is used for data forwarding and text matching. This achieves layered decoupling of each layer and shortens the communication link.

Benefits of technology

It effectively solves the problems of response lag and audio-text asynchrony in cross-platform playback, improves user experience, meets the needs of multi-platform reuse, and reduces development and maintenance costs.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention provides an audio playback control method, apparatus, electronic device, and storage medium. It receives an audio playback request for a target audio file through a cross-platform UI layer, and sends the audio resources of the target audio to the native playback layer via a communication bridging layer. The native playback layer obtains the audio data corresponding to the network playback address, plays the audio data, and sends playback progress data of the audio data to the communication bridging layer. The communication bridging layer determines the first target text corresponding to the playback progress data from the audio text, and sends the playback progress data and the identification information of the first target text to the cross-platform UI layer. The cross-platform UI layer controls the display of the target audio playback interface based on the playback progress data and the identification information of the first target text. This effectively solves the problems of delayed response and audio-text asynchrony in existing cross-platform playback technologies, while also meeting the needs of multi-platform reuse, ensuring the smoothness and accuracy of playback control.

Description

Technical Field

[0001] This invention relates to the field of communication technology, and in particular to an audio playback control method, apparatus, electronic device, and storage medium.

Background Technology

[0002] In audio-linked scenarios such as text reading aloud, audiobooks, and synchronized lyrics playback, in order to reduce the development and maintenance costs of applications on multiple platforms such as iOS and Android, the business logic of the target application, including interface rendering, user interaction response, and audio playback control, is usually deployed in a cross-platform development framework. Relying on the unified business code compilation and execution mechanism of the cross-platform framework, the UI interface and business logic can be integrated and reused across platforms, ensuring the consistency of interface interaction and functional logic across multiple platforms and reducing the workload of redundant development on both platforms.

[0003] However, the cross-platform code in the cross-platform framework cannot directly call the system audio player built into the terminal device to execute playback control commands. Therefore, after an operation command generated by user interaction is received, underlying audio data such as the playback timestamp and the audio player's running status must first be obtained through the native development code of the adapted platform and then forwarded to the cross-platform code to complete business logic calculations. After the cross-platform code generates the corresponding playback control command, the command must still be sent back to the native development code for final execution. As a result, the communication link is lengthy and complex and generates additional interaction overhead. On low-computing-power terminals, the communication latency is easily amplified, causing problems such as playback response lag and audio-text synchronization offset, which affect the user experience.

Summary of the Invention

[0004] In view of this, the purpose of the present invention is to provide an audio playback control method, device, electronic device and storage medium that shorten the communication link, reduce the interaction overhead between the cross-platform layer and the native layer, and reduce audio playback control latency, thereby solving the technical problems of playback response lag and audio-text synchronization deviation and improving the user experience.

[0005] In a first aspect, embodiments of the present invention provide an audio playback control method, the method comprising: receiving an audio playback request for a target audio through a cross-platform UI layer, and sending the audio resources of the target audio to a native playback layer through a communication bridging layer; wherein the audio resources include: a network playback address and audio text of the target audio; obtaining audio data corresponding to the network playback address through the native playback layer, playing the audio data, and sending playback progress data of the audio data to the communication bridging layer; determining a first target text corresponding to the playback progress data from the audio text through the communication bridging layer, and sending the playback progress data and the identification information of the first target text to the cross-platform UI layer; and controlling the display of a playback interface of the target audio through the cross-platform UI layer based on the playback progress data and the identification information of the first target text.

[0006] In a second aspect, embodiments of the present invention provide an audio playback control device, comprising: a first sending module, configured to receive an audio playback request for a target audio through a cross-platform UI layer, and send the audio resources of the target audio to a native playback layer through a communication bridging layer; wherein the audio resources include: a network playback address and audio text of the target audio; a second sending module, configured to obtain audio data corresponding to the network playback address through the native playback layer, play the audio data, and send playback progress data of the audio data to the communication bridging layer; a third sending module, configured to determine a first target text corresponding to the playback progress data from the audio text through the communication bridging layer, and send the playback progress data and the identification information of the first target text to the cross-platform UI layer; and a first display module, configured to control the display of the playback interface of the target audio through the cross-platform UI layer based on the playback progress data and the identification information of the first target text.

[0007] In a third aspect, embodiments of the present invention provide an electronic device, including a processor and a memory, wherein the memory stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to implement the above-described audio playback control method.

[0008] In a fourth aspect, embodiments of the present invention provide a storage medium storing machine-executable instructions. When the machine-executable instructions are invoked and executed by a processor, they cause the processor to implement the aforementioned audio playback control method.

[0009] The embodiments of the present invention bring the following beneficial effects: The aforementioned audio playback control method, apparatus, electronic device, and storage medium include the following steps: receiving an audio playback request for a target audio through a cross-platform UI layer, and sending the audio resources of the target audio to a native playback layer through a communication bridging layer; wherein the audio resources include: a network playback address and audio text of the target audio; obtaining audio data corresponding to the network playback address through the native playback layer, playing the audio data, and sending playback progress data of the audio data to the communication bridging layer; determining a first target text corresponding to the playback progress data from the audio text through the communication bridging layer, and sending the playback progress data and the identification information of the first target text to the cross-platform UI layer; and controlling the display of the playback interface of the target audio through the cross-platform UI layer based on the playback progress data and the identification information of the first target text.

[0010] This approach achieves layered decoupling of functions by dividing the application into a cross-platform UI layer, a native playback layer, and a communication bridging layer. Audio playback capabilities are pushed down to the native playback layer, reducing redundant interactions with the cross-platform layer. The native playback layer handles playback control, the cross-platform UI layer focuses on interface display and user interaction, and the communication bridging layer is used for data forwarding, text matching, and cross-layer collaboration. This avoids excessive coupling between layers, shortens communication links, reduces transmission latency, and effectively solves the problems of delayed response and audio-text asynchrony in existing cross-platform playback technologies. At the same time, it meets the needs of multi-platform reuse, ensuring smooth and accurate playback control and improving user experience.

[0011] Other features and advantages of the invention will be set forth in the description which follows, and will be apparent in part from the description, or may be learned by practicing the invention. The objects and other advantages of the invention are realized and obtained in accordance with the structures particularly pointed out in the description, claims and drawings.

[0012] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, preferred embodiments are described below in detail with reference to the accompanying drawings.

Description of the Drawings

[0013] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0014] Figure 1 is a flowchart of an audio playback control method provided in an embodiment of the present invention;

Figure 2 is an architecture diagram of a target application provided in an embodiment of the present invention;

Figure 3 is a schematic diagram of the structure of an audio playback control device provided in an embodiment of the present invention;

Figure 4 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present invention.

Detailed Implementation

[0015] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0016] First, the technical terms involved in this invention are explained below.

AVPlayer: A native audio and video playback framework provided by the iOS (iPhone Operating System) platform.

[0017] ExoPlayer: A commonly used high-performance audio and video playback framework on the Android platform.

[0018] Cross-platform: refers to a set of code or solutions that can run on different systems such as iOS and Android.

[0019] Audiobooks: Written content is read aloud in audio form, so that users can take in the content by listening.

[0020] Text reading aloud: During audio playback, the UI (User Interface) displays the text being read aloud in real time.

[0021] App (Application): refers to a software program that can run on smartphones, tablets, or other mobile devices.

[0022] Cross-platform development framework: refers to a software development framework that enables a single codebase to be compiled and run on multiple terminal platforms (such as iOS platform, Android platform, etc.). Through a unified interface description and logic encapsulation method, the application can maintain similar display effects and interaction logic on different operating systems.

[0023] In current audio-enabled scenarios such as text-based reading aloud, audiobooks, and synchronized lyrics playback, the following technical challenges are typically encountered when implementing these features across multiple platforms: Firstly, cross-platform functionality consistency and maintenance costs are significant issues. While developing audio players using native code for iOS and Android platforms may offer high execution efficiency, it requires repeated implementation and iterative maintenance on both platforms, easily leading to inconsistent functionality, high development costs, and cumbersome later maintenance, making it difficult to guarantee a unified user experience.

[0024] On the other hand, the application's interface rendering, user interaction response, and audio playback control logic can all be deployed within a cross-platform front-end development framework. Relying on the framework's unified business code compilation and execution mechanism, this achieves integrated cross-platform reuse of the UI and business logic, ensures consistency of interface interaction and functional logic across multiple platforms, and reduces the workload of redundant development on both platforms. However, the data flow chain is relatively long. Especially on low-configuration terminal devices, the latency caused by communication is further amplified, which can easily lead to problems such as delayed playback response and audio-text synchronization misalignment, affecting the user experience in audiobook scenarios.

[0025] Based on this, the audio playback control method, device, electronic device and storage medium provided by the embodiments of the present invention can be applied to audio linkage scenarios in mobile applications, such as audiobook listening, text reading aloud, and synchronized lyrics playback, and are especially suitable for scenarios that need to balance UI consistency across multiple platforms such as iOS and Android with smooth audio playback.

[0026] To facilitate understanding of this embodiment, a detailed description of the audio playback control method disclosed in this invention will be provided first. Here, the terminal device runs a target application, which can be a standalone player app or integrated as a functional module into other types of apps, such as an audio reading module in an educational app, an audio playback module in a live streaming app, or an audiobook module in a reading app. The target application adopts a cross-platform hybrid architecture, including a cross-platform UI layer, a communication bridging layer, and a native playback layer.

[0027] The cross-platform UI layer refers to the user interface and interaction logic layer built on a cross-platform development framework. The cross-platform code corresponding to the UI layer can be compiled and run on multiple terminal platforms, maintaining a consistent interface style and interactive experience across different operating systems. For example, the cross-platform development framework can be a framework with cross-platform compilation and rendering capabilities, such as Flutter, React Native, or UniApp. The native playback layer is built using the native code of the terminal device. Native code is written in a programming language supported by the terminal device's operating system and can directly call all system capabilities and low-level interfaces provided by the operating system, without runtime environment isolation or function call restrictions. For different operating systems, the native playback layer can be implemented using native code adapted to the corresponding platform and call the native player components provided by each platform, such as AVPlayer for iOS and ExoPlayer for Android, thereby fully leveraging the platform's native playback capabilities and improving the stability and efficiency of audio playback.

[0028] As shown in Figure 1, the above audio playback control method includes the following steps:

Step S102: Receive the audio playback request for the target audio through the cross-platform UI layer, and send the audio resources of the target audio to the native playback layer through the communication bridge layer; wherein the audio resources include: the network playback address of the target audio and the audio text.

The aforementioned audio playback request for the target audio can be generated when the user performs a playback operation on the target audio within the display interface provided by the cross-platform UI layer. For example, when the user clicks the playback control corresponding to the target audio in the display interface, the cross-platform UI layer determines that it has received an audio playback request for the target audio when it captures this user interaction event. The aforementioned network playback address refers to the resource access address corresponding to the target audio on the server.

[0029] After receiving an audio playback request for the target audio, the cross-platform UI layer determines the audio resources of the target audio. These audio resources include the network playback address and audio text of the target audio. The audio text is the text content corresponding to the target audio content. The audio text can be composed of multiple text segments, each corresponding to a different playback period of the target audio, so that the text segments can be displayed in sync with the playback progress during audio playback.
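As an illustration only, the audio resources described above could be modeled as follows. This is a minimal sketch, not the patent's data format: the names `TextSegment` and `AudioResource`, the millisecond unit, and the example address are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextSegment:
    """One text segment tied to a playback period of the target audio."""
    index: int      # position of the segment within the audio text
    start_ms: int   # playback start time (assumed unit: milliseconds)
    end_ms: int     # playback stop time (treated as exclusive here)
    text: str


@dataclass(frozen=True)
class AudioResource:
    """Audio resources passed from the UI layer toward the native playback layer."""
    play_url: str                # network playback address
    segments: List[TextSegment]  # audio text, already split into timed segments


resource = AudioResource(
    play_url="https://example.com/audio/chapter1.mp3",  # placeholder address
    segments=[
        TextSegment(0, 0, 2500, "First sentence of the chapter."),
        TextSegment(1, 2500, 5200, "Second sentence of the chapter."),
    ],
)
```

Keeping the segments contiguous (each start equal to the previous end) makes the later progress-to-text lookup unambiguous.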

[0030] The cross-platform UI layer first sends the audio resources of the target audio to the communication bridge layer through the communication channel provided by the cross-platform development framework, and then sends them to the native playback layer through the communication bridge layer, so that the native playback layer can load data based on the audio resources.

[0031] In this step, the cross-platform UI layer does not directly participate in the loading and playback logic of audio data; it is only responsible for initiating resource transfer and interface interaction.

[0032] Step S104: Obtain the audio data corresponding to the network playback address through the native playback layer, play the audio data, and send the playback progress data of the audio data to the communication bridging layer.

[0033] The playback progress data mentioned above may include one or more pieces of information such as the playback duration of the target audio at the current moment and the percentage of playback progress, which are used to accurately indicate the playback progress of the audio data.
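A minimal sketch of such a progress record is shown below. The class name `PlaybackProgress`, its field names, and the millisecond unit are assumptions made for illustration; the source only states that the data may carry the current playback duration and a progress percentage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaybackProgress:
    position_ms: int  # playback duration reached at the current moment
    duration_ms: int  # total length of the target audio

    @property
    def percent(self) -> float:
        """Progress percentage, clamped to the range 0..100."""
        if self.duration_ms <= 0:
            return 0.0  # duration not known yet (for example, still loading)
        ratio = self.position_ms / self.duration_ms
        return 100.0 * min(max(ratio, 0.0), 1.0)


progress = PlaybackProgress(position_ms=30_000, duration_ms=120_000)
```

Deriving the percentage from the two durations, rather than sending it separately, keeps the two values from drifting apart.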

[0034] During the audio data loading process, the native playback layer can perform streaming loading and local caching of the audio data, thereby enabling the download-while-playing function. Playback can be started without waiting for all audio data to be loaded, effectively shortening the audio startup latency and improving the overall efficiency of audio loading and playback.

[0035] When the native playback layer starts playing audio data, it can immediately send playback progress data to the communication bridge layer. Thereafter, at each preset time interval, the native playback layer automatically sends playback progress data to the communication bridge layer, keeping the playback progress synchronized between the native playback layer and the communication bridge layer. Subsequently, the cross-platform UI layer updates the playback progress display of the target audio playback interface through the communication bridge layer, and renders the corresponding audio text synchronously according to the playback progress.

[0036] In this step, audio playback, progress acquisition, and loading are all completed independently by the native playback layer, without relying on the cross-platform UI layer to participate in real-time logic processing, further reducing the frequency of cross-layer communication and ensuring the continuity and stability of the playback process.

[0037] Step S106: Determine the first target text corresponding to the playback progress data from the audio text through the communication bridging layer, and send the playback progress data and the identification information of the first target text to the cross-platform UI layer.

[0038] The aforementioned identification information may be the location information of the first target text in the audio text, the index information of the first target text, the start and end times of the playback of the first target text, or other information that uniquely identifies the text.

[0039] After receiving the playback progress data, the communication bridging layer determines the first target text corresponding to the playback progress data from the audio text. For example, the audio text may consist of multiple text segments, each corresponding to a different playback period of the target audio. Based on the playback period to which the playback progress data belongs, the corresponding first target text is obtained, and then the identification information of the first target text is sent to the cross-platform UI layer along with the playback progress data.
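The matching step can be sketched as a lookup over timed segments. The following is one possible approach, assuming segments are sorted by start time, do not overlap, and use millisecond times; the function name `find_segment_index` and the choice of binary search are illustrative and not taken from the source.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Each segment is (start_ms, end_ms, text); the list is sorted by start_ms.
Segment = Tuple[int, int, str]


def find_segment_index(segments: List[Segment], position_ms: int) -> Optional[int]:
    """Return the index of the segment whose playback period contains position_ms."""
    starts = [segment[0] for segment in segments]
    # Last segment that starts at or before the current position.
    i = bisect_right(starts, position_ms) - 1
    if i >= 0 and position_ms < segments[i][1]:
        return i
    return None  # position falls before the first segment or in a gap


segments = [(0, 2500, "a"), (2500, 5200, "b"), (6000, 8000, "c")]
```

Here the returned index plays the role of the identification information: only the index and the progress value need to travel to the UI layer, not the text itself.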

[0040] Step S108: Control the display of the target audio playback interface through the cross-platform UI layer based on the playback progress data and the identification information of the first target text.

[0041] After receiving the identifier information of the first target text and the playback progress data, the cross-platform UI layer can accurately locate and extract the corresponding first target text from the pre-stored audio text based on the identifier information. According to the received playback progress data, it determines the display state of the progress display controls within the playback interface, such as the fill ratio and corresponding position of the progress bar, allowing users to intuitively obtain the current audio playback progress.
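As a hedged sketch of this UI-side step, the function below turns the received identifier and progress data into a display state. It assumes the identifier is a segment index and that times are in milliseconds; `build_view_state` and the dictionary keys are hypothetical names.

```python
from typing import Dict, List, Optional


def build_view_state(audio_text: List[str], segment_index: Optional[int],
                     position_ms: int, duration_ms: int) -> Dict[str, object]:
    """Compute what the playback interface should show for one progress update."""
    # Fill ratio of the progress bar, clamped to 0..1.
    if duration_ms <= 0:
        fill = 0.0
    else:
        fill = min(max(position_ms / duration_ms, 0.0), 1.0)
    # Locate the text to highlight from the pre-stored audio text.
    highlighted = None
    if segment_index is not None and 0 <= segment_index < len(audio_text):
        highlighted = audio_text[segment_index]
    return {"highlighted_text": highlighted, "progress_fill": fill}


state = build_view_state(["first", "second"], 1, 3000, 12000)
```

The function performs no playback logic; it only maps incoming data to a view state, which matches the UI layer's role described in the text.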

[0042] Finally, the cross-platform UI layer controls the display of the playback interface of the target audio. Within this playback interface, the matched first target text is highlighted or presented in a prominent style, and the progress display control is shown at the same time. This achieves a synchronized linkage effect between the audio playback process and the text content display, fully presenting the real-time playback status of the target audio and enhancing the user interaction experience.

[0043] Here, the cross-platform UI layer only performs interface rendering and text display based on the data sent by the communication bridging layer, and does not participate in playback logic calculations or progress scheduling. If the application needs to run on different terminals, it only needs to adapt the native playback layer to the platform. The cross-platform UI layer and the communication bridging layer can be directly reused without repeated development, thereby significantly reducing the cost of multi-platform adaptation and improving development and iteration efficiency.
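Steps S102 to S108 can be sketched end to end as three small classes. This is a simplified model under stated assumptions: the class and method names are hypothetical, the native layer is a stand-in that replays scripted positions instead of playing audio, and all calls are synchronous.

```python
from typing import Callable, List, Optional, Tuple

Segment = Tuple[int, int, str]  # (start_ms, end_ms, text)
ProgressCallback = Callable[[int, Optional[int]], None]


class NativePlaybackLayer:
    """Stand-in for the native player; replays scripted positions, no real audio."""

    def __init__(self, scripted_positions_ms: List[int]) -> None:
        self.scripted_positions_ms = scripted_positions_ms

    def play(self, url: str, report: Callable[[int], None]) -> None:
        for position_ms in self.scripted_positions_ms:
            report(position_ms)  # S104: push playback progress to the bridge


class BridgeLayer:
    """Forwards resources downward and matches progress to text on the way up."""

    def __init__(self, native: NativePlaybackLayer) -> None:
        self.native = native

    def send_resources(self, url: str, segments: List[Segment],
                       notify_ui: ProgressCallback) -> None:
        def on_progress(position_ms: int) -> None:
            # S106: find the segment whose playback period contains the position.
            index = next((i for i, (start, end, _) in enumerate(segments)
                          if start <= position_ms < end), None)
            notify_ui(position_ms, index)

        self.native.play(url, on_progress)  # S102: hand resources to native layer


class UiLayer:
    """Renders only; it takes no part in playback logic."""

    def __init__(self, bridge: BridgeLayer) -> None:
        self.bridge = bridge
        self.segments: List[Segment] = []
        self.frames: List[Tuple[int, Optional[str]]] = []

    def request_play(self, url: str, segments: List[Segment]) -> None:
        self.segments = segments
        self.bridge.send_resources(url, segments, self.render)

    def render(self, position_ms: int, index: Optional[int]) -> None:
        # S108: look up the text by its identifier and record a "repaint".
        text = None if index is None else self.segments[index][2]
        self.frames.append((position_ms, text))


ui = UiLayer(BridgeLayer(NativePlaybackLayer([0, 500, 1500])))
ui.request_play("https://example.com/a.mp3",  # placeholder address
                [(0, 1000, "Hello,"), (1000, 2000, "world.")])
```

In this model only `NativePlaybackLayer` would need a platform-specific replacement; `BridgeLayer` and `UiLayer` stay unchanged, which mirrors the reuse argument made above.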

[0044] This approach divides the target application into a cross-platform UI layer, a native playback layer, and a communication bridging layer. The native playback layer handles playback control, while the cross-platform UI layer focuses on interface display and user interaction. This avoids excessive coupling between the code layers and shortens the communication links, ensuring high performance and stability of audio playback while maintaining consistent interface display across iOS, Android, and other platforms, effectively reducing development and maintenance costs. This solution innovatively pushes audio playback capabilities down to the native layer and elevates interface display and user interaction to the cross-platform UI layer. Through layered decoupling, it achieves an optimal balance between playback performance and development efficiency, and reliably synchronizes audio playback and text display.

[0045] The aforementioned audio playback control method, apparatus, electronic device, and storage medium include the following steps: receiving an audio playback request for a target audio through a cross-platform UI layer, and sending the audio resources of the target audio to a native playback layer through a communication bridging layer; wherein the audio resources include: a network playback address and audio text of the target audio; obtaining audio data corresponding to the network playback address through the native playback layer, playing the audio data, and sending playback progress data of the audio data to the communication bridging layer; determining a first target text corresponding to the playback progress data from the audio text through the communication bridging layer, and sending the playback progress data and the identification information of the first target text to the cross-platform UI layer; and controlling the display of the playback interface of the target audio through the cross-platform UI layer based on the playback progress data and the identification information of the first target text.

[0046] This approach achieves layered decoupling of functions by dividing the application into a cross-platform UI layer, a native playback layer, and a communication bridging layer. Audio playback capabilities are pushed down to the native playback layer, reducing redundant interactions with the cross-platform layer. The native playback layer handles playback control, the cross-platform UI layer focuses on interface display and user interaction, and the communication bridging layer is used for data forwarding, text matching, and cross-layer collaboration. This avoids excessive coupling between layers, shortens communication links, reduces transmission latency, and effectively solves the problems of delayed response and audio-text asynchrony in existing cross-platform playback technologies. At the same time, it meets the needs of multi-platform reuse, ensuring smooth and accurate playback control and improving user experience.

[0047] In one approach, an audio playback request for the target audio is received through a cross-platform UI layer to determine the audio resources of the target audio; the audio resources of the target audio are sent to a communication bridge layer through a communication channel provided by a cross-platform development framework; the audio resources are encapsulated through the communication bridge layer and then sent to the native playback layer.

[0048] Specifically, after receiving an audio playback request, the cross-platform UI layer can retrieve the network playback address of the target audio and the corresponding audio text from the server, and pass it to the communication bridge layer through the communication channel defined by the cross-platform development framework corresponding to the cross-platform UI layer. The communication bridge layer performs structured encapsulation of the received resource data to adapt it to the interface specification of the native playback layer, and then forwards it to the native playback layer to start the subsequent loading process.

[0049] Taking the cross-platform UI layer built by the Flutter cross-platform development framework as an example, data transmission can be achieved through the MethodChannel communication channel provided by the Flutter framework. The cross-platform UI layer sends audio resources to the communication bridge layer through the MethodChannel. After the communication bridge layer completes data parsing and structured encapsulation, it forwards the data to the native playback layer to perform audio loading and playback operations.
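The structured encapsulation performed by the bridge layer might look like the sketch below. It only mimics the general shape of a method-call message (a method name plus arguments) and does not use the Flutter API; the method name `loadAudio`, the field names, and the JSON encoding are assumptions.

```python
import json
from typing import Any, Dict, List


def encapsulate_load_request(play_url: str, segments: List[Dict[str, Any]]) -> str:
    """Wrap audio resources in a method-call style envelope for the native layer."""
    if not play_url:
        raise ValueError("play_url must not be empty")
    envelope = {
        "method": "loadAudio",  # hypothetical method name
        "arguments": {"url": play_url, "segments": segments},
    }
    # ensure_ascii=False keeps non-ASCII audio text readable in the payload.
    return json.dumps(envelope, ensure_ascii=False)


message = encapsulate_load_request(
    "https://example.com/a.mp3",  # placeholder address
    [{"index": 0, "startMs": 0, "endMs": 1000, "text": "你好"}],
)
```

Validating the payload at the bridge means a malformed request fails before it crosses into native code.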

[0050] In one approach, the audio resource is determined as follows: Obtain the network playback address and initial audio text of the target audio. Then, segment the initial audio text through the cross-platform UI layer to obtain audio text containing multiple text segments. Each text segment has a corresponding playback start and end time, which indicates the playback period of the audio corresponding to the text segment in the target audio.

[0051] The aforementioned playback start and end times include the playback start time and playback stop time, used to uniquely identify the playback time interval corresponding to the text segment in the target audio. That is, the content of the text segment corresponds to the audio content of the target audio during the playback period from the playback start time to the playback stop time.

[0052] In one implementation, after the cross-platform UI layer retrieves the network playback address of the target audio and the corresponding initial audio text from the server, it splits the initial audio text into multiple independent text segments according to the audio playback sequence, and binds a corresponding playback start and end time to each text segment. The playback start and end time corresponds to the playback period of the audio corresponding to the text segment in the target audio, so that each text segment corresponds precisely to the playback progress of the target audio, providing a data foundation for subsequent audio-text synchronization display.
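The source does not say where the timing information comes from. The sketch below assumes, purely for illustration, that the initial audio text arrives as sentences with start times and that the total audio duration is known; each segment's stop time is then taken from the next segment's start. The name `segment_audio_text` is hypothetical.

```python
from typing import List, Tuple


def segment_audio_text(timed_sentences: List[Tuple[int, str]],
                       duration_ms: int) -> List[Tuple[int, int, str]]:
    """Turn (start_ms, text) pairs into (start_ms, end_ms, text) segments."""
    ordered = sorted(timed_sentences, key=lambda item: item[0])
    segments = []
    for i, (start_ms, text) in enumerate(ordered):
        # A segment ends where the next one starts; the last ends with the audio.
        end_ms = ordered[i + 1][0] if i + 1 < len(ordered) else duration_ms
        segments.append((start_ms, end_ms, text))
    return segments


result = segment_audio_text([(2500, "second"), (0, "first")], duration_ms=5200)
```

Sorting first means the segments follow the audio playback sequence even if the input arrives out of order.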

[0053] The following embodiments provide a specific implementation method for playing audio data in the native playback layer.

[0054] In one approach, the network playback address is converted into a local cache proxy address through the native playback layer; based on the local cache proxy address, the audio data corresponding to the network playback address is streamed and cached; in response to the amount of cached audio data reaching a preset threshold, the audio data is played through the native playback layer.

[0055] The aforementioned local cache proxy address refers to a local access address created by the native playback layer on the terminal device to proxy network audio resources. The native playback layer receives audio playback requests through this local proxy address and completes the network retrieval, streaming caching, and local distribution of audio data in the background.

[0056] In other words, the native playback layer proxies and controls the audio data loading process by converting the network playback address into a local cache proxy address. While streaming and caching audio data based on the local cache proxy address, it does not need to wait for all audio data to be loaded. As long as the amount of audio data cached for the target audio reaches a preset threshold, the corresponding native audio player in the native playback layer can decode and play the audio data, thereby achieving a smooth playback effect of caching and playing simultaneously, effectively improving the response speed and stability of audio playback.
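As a minimal sketch of the two ideas above (address conversion and threshold-gated playback), the following uses a loopback URL that carries the original address as a query parameter. The URL layout, the port `8123`, and the names `to_proxy_address`, `from_proxy_address`, and `StreamCache` are hypothetical; the description does not define the proxy address format or the threshold value.

```python
from urllib.parse import parse_qs, quote, urlsplit


def to_proxy_address(network_url, port=8123):
    """Wrap a network playback address in a local proxy address (assumed layout)."""
    return f"http://127.0.0.1:{port}/proxy?url={quote(network_url, safe='')}"


def from_proxy_address(proxy_url):
    """Recover the original network playback address from the proxy address."""
    return parse_qs(urlsplit(proxy_url).query)["url"][0]


class StreamCache:
    """Counts cached bytes and reports when playback may start."""

    def __init__(self, threshold_bytes):
        self.threshold_bytes = threshold_bytes
        self.cached_bytes = 0

    def append(self, chunk):
        # Returns True once the cached amount reaches the preset threshold.
        self.cached_bytes += len(chunk)
        return self.ready_to_play()

    def ready_to_play(self):
        return self.cached_bytes >= self.threshold_bytes
```

In this sketch the player would be started the first time `append` returns `True`, while further chunks continue to arrive in the background.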

[0057] In one approach, the native playback layer sends the playback progress data to the communication bridging layer once every preset interval.

[0058] Specifically, the native playback layer can periodically send playback progress data to the communication bridge layer at preset intervals. In actual implementation, the native playback layer can start a progress timer with a 100ms interval, periodically obtain the playback progress data of the target audio through the timer, and continuously send the playback progress data to the communication bridge layer to ensure that the cross-platform UI layer can receive high-frequency and high-precision playback progress feedback, thus ensuring the smoothness of audio-text synchronization and interface refresh.
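The periodic reporting described above might be sketched as follows. To keep the example deterministic, the platform timer is replaced by an explicit `on_clock` call; the class name `ProgressBroadcaster` and the `positionMs` field are illustrative assumptions, while the 100 ms interval comes from the description.

```python
class ProgressBroadcaster:
    """Forwards at most one progress sample per interval."""

    def __init__(self, send, interval_ms=100):
        self.send = send                # callable that delivers to the bridging layer
        self.interval_ms = interval_ms
        self._last_sent_ms = None

    def on_clock(self, now_ms, position_ms):
        """Called by a timer; returns True when a sample was sent."""
        due = (
            self._last_sent_ms is None
            or now_ms - self._last_sent_ms >= self.interval_ms
        )
        if due:
            self._last_sent_ms = now_ms
            self.send({"positionMs": position_ms})
        return due
```

A real native implementation would drive this from a platform timer rather than from manual calls.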

[0059] In one approach, the playback progress data includes the first playback duration of the target audio. The communication bridging layer determines, from the audio text, the first target text corresponding to the first playback duration, and sends the first playback duration and the identification information of the first target text to the cross-platform UI layer.

[0060] The first playback duration mentioned above refers to the current playback duration of the target audio.

[0061] In other words, the playback progress data includes the current playback duration of the target audio; the communication bridging layer matches the first target text corresponding to the playback duration from the audio text based on the playback duration, and sends the playback duration of the target audio and the identification information of the first target text together to the cross-platform UI layer.

[0062] The following embodiments provide a specific implementation for determining the first target text.

[0063] Specifically, the audio text contains multiple text segments, and each text segment has a corresponding playback start and end time indicating the playback period, within the target audio, of the audio corresponding to that text segment. The text segments are sorted by the playback start time in their playback start and end times. Based on the sorted text segments, a preset search method is used to determine the first text segment, namely the text segment whose playback start time is the latest among those no later than the first playback duration. The first text segment is determined as the first target text.

[0064] In one example, the audio text comprises multiple text segments, each with a corresponding playback start and end time. The playback start and end times uniquely identify the playback time interval of the text segment within the target audio. In other words, the content of the text segment corresponds to the audio content of the target audio during the playback period from the start time to the stop time.

[0065] The text segments are sorted from earliest to latest by playback start time. A binary search is then used to locate, among the sorted text segments, the first text segment, that is, the text segment whose playback start time is the latest among those no later than the first playback duration. This first text segment is then identified as the first target text. This method allows the first target text matching the current playback progress to be identified quickly and accurately, supporting real-time synchronization between audio playback and text display.

[0066] In one scenario, if the playback start time of every text segment is later than the first playback duration, it is determined that there is no matching text segment, and the cross-platform UI layer either displays no text content or displays a default prompt message.

[0067] In this method, the timestamp matching based on binary search can locate the first target text with fewer retrievals, effectively improving synchronization efficiency and ensuring the fluency and accuracy of text reading.
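The timestamp matching described above can be sketched with the standard-library `bisect` module. The function name `find_target_index` and the use of millisecond integers are assumptions of this sketch, and returning `None` stands in for the "no matching text segment" case.

```python
from bisect import bisect_right


def find_target_index(start_times_ms, position_ms):
    """Index of the segment with the latest start time <= position_ms.

    `start_times_ms` must already be sorted in ascending order.
    Returns None when every segment starts after `position_ms`.
    """
    # bisect_right gives the count of start times <= position_ms,
    # so the matching segment, if any, sits one slot to the left.
    i = bisect_right(start_times_ms, position_ms) - 1
    return i if i >= 0 else None
```

Because the start times are sorted once, each lookup needs a number of comparisons that grows logarithmically with the number of text segments.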

[0068] In one approach, the cross-platform UI layer determines the display state of a specified control in the playback interface based on the playback progress data, determines the first target text based on the identification information, and displays the first target text in the playback interface in a specified display format.

[0069] The specified controls mentioned above include a progress display control. The specified display format can be to highlight, enlarge, bold, color-change, or dynamically fade in the currently matched first target text, while displaying the remaining unmatched text segments in a normal format. This allows for a clear distinction of the text content corresponding to the current playback progress in the playback interface, achieving a synchronized display effect between audio and text.

[0070] After receiving the identification information of the first target text and the playback progress data, the cross-platform UI layer can locate and obtain the corresponding first target text from the audio text based on the identification information. The first target text is then highlighted or emphasized in the playback interface according to the specified display format. The remaining unmatched text segments in the audio text are displayed in a regular format to achieve a synchronized effect between audio playback and text display.

[0071] At the same time, the display status of the progress display control in the playback interface is updated according to the playback progress data. For example, the fill position of the progress bar is adjusted, and the display content of the current played time value and the total duration value of the target audio is refreshed, so that users can intuitively view the audio playback progress and overall duration information, thus fully presenting the real-time playback status of the target audio.
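As an illustration of the progress display update, the following sketch derives a progress bar fill fraction and formatted time labels from the playback progress data. The function names, the `mm:ss` label format, and the dictionary keys are assumptions; the description only states that the fill position and the time values are refreshed.

```python
def format_ms(ms):
    """Format a millisecond count as mm:ss."""
    minutes, seconds = divmod(ms // 1000, 60)
    return f"{minutes:02d}:{seconds:02d}"


def progress_view(position_ms, duration_ms):
    """Compute what the progress display control should show."""
    if duration_ms <= 0:
        fill = 0.0  # unknown or empty duration: show an empty bar
    else:
        # Clamp so that a late progress sample cannot overfill the bar.
        fill = min(max(position_ms / duration_ms, 0.0), 1.0)
    return {
        "fill": fill,
        "elapsed": format_ms(position_ms),
        "total": format_ms(duration_ms),
    }
```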

[0072] In one approach, when the native playback layer starts playing the audio data, the native playback layer sends an audio playback prompt to the cross-platform UI layer via the communication bridging layer, so that the cross-platform UI layer can determine, based on the audio playback prompt, the display state of the playback control controls in the playback interface of the target audio.

[0073] In other words, when the native playback layer starts audio playback, it sends an audio playback prompt along with the playback progress data. Both are forwarded to the cross-platform UI layer via the communication bridging layer, enabling the cross-platform UI layer to determine the display status of the playback control controls on the playback interface based on the audio playback prompt, thus ensuring that the interface display is consistent with the actual playback behavior.

[0074] In a specific implementation, as shown in Figure 2, the cross-platform UI layer is built on the Flutter cross-platform development framework and includes a playback control UI, a progress bar component, and a text follow-up / highlighting component; the communication bridging layer includes a playback status synchronizer, a progress time broadcaster, and a lyrics matching engine; the native playback layer includes an audio playback engine, a buffer manager, and a system integration module.

[0075] The cross-platform UI layer and the communication bridge layer communicate through the MethodChannel provided by the Flutter cross-platform development framework. The communication bridge layer is used to implement command and data interaction between the cross-platform UI layer and the native playback layer.

[0076] The cross-platform UI layer is responsible for providing a unified cross-platform user interface, enabling functions such as displaying playback control controls, playback progress display, text highlighting, automatic scrolling, click-to-jump, speed switching, timed shutdown, and sound switching. This cross-platform UI layer is only responsible for interface rendering and user interaction, and does not participate in the underlying logic such as audio loading, playback control, and progress calculation. It achieves data synchronization through interaction with the communication bridging layer, realizing the layered decoupling of interface logic and playback capabilities.

[0077] The communication bridging layer is primarily responsible for enabling efficient bidirectional communication between the native playback layer and the cross-platform UI layer, completing command forwarding, state synchronization, and data collaboration. In terms of the communication protocol, it receives playback status, playback progress data, buffering status, and error messages from the native layer, and receives control commands from the cross-platform UI layer, such as play, pause, stop, progress jump, playback rate setting, and timed stop, thus achieving reliable interaction and collaborative scheduling between the two layers.
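The bidirectional forwarding role of the communication bridging layer might be sketched as below. Only `setRate` is named in the description; every other method and event name, the message dictionaries, and the `Bridge` class itself are hypothetical placeholders chosen for this sketch, not the actual protocol.

```python
# Hypothetical command and event vocabularies.
UI_COMMANDS = {"play", "pause", "stop", "seek", "setRate", "sleepTimer"}
NATIVE_EVENTS = {"state", "progress", "buffering", "error"}


class Bridge:
    """Validates and forwards messages between the two layers."""

    def __init__(self, to_native, to_ui):
        self.to_native = to_native  # callable delivering to the native playback layer
        self.to_ui = to_ui          # callable delivering to the cross-platform UI layer

    def from_ui(self, method, args=None):
        if method not in UI_COMMANDS:
            raise ValueError(f"unknown command: {method}")
        self.to_native({"method": method, "args": args or {}})

    def from_native(self, event, payload=None):
        if event not in NATIVE_EVENTS:
            raise ValueError(f"unknown event: {event}")
        self.to_ui({"event": event, "payload": payload or {}})
```

Rejecting unknown names at the bridge keeps malformed messages from reaching either layer, which is one possible way to realize the "reliable interaction" mentioned above.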

[0078] The core responsibility of the native playback layer is to provide high-performance underlying audio playback capabilities and uniformly handle all system-related audio operations. Its core functions include playback state machine management, progress control, variable speed playback, audio session configuration, simultaneous download and playback, system media control center adaptation, audio focus management, and audio device routing switching. Each function is handled by a corresponding module and does not depend on the cross-platform UI layer, ensuring the stability and efficiency of the playback process.

[0079] In one embodiment, based on the layered architecture shown in Figure 2, the process of controlling the playback of the target audio is as follows. When a user clicks the play button for the target audio in the display interface provided by the cross-platform UI layer, the playback control UI of the cross-platform UI layer captures the click event. The cross-platform UI layer is thereby determined to have received an audio playback request for the target audio, and it retrieves the audio resources of the target audio from the server. The audio resources include the network playback address of the target audio and the corresponding audio text.

[0080] The cross-platform UI layer encapsulates the audio resources into an initialization instruction through the playback status synchronizer of the communication bridging layer, and sends it to the audio playback engine of the native playback layer to start the playback process.

[0081] After receiving the initialization instruction, the audio playback engine in the native playback layer converts the network playback address into a local cache proxy address through the buffer manager to achieve streaming loading of the audio data corresponding to the target audio. The playback state machine in the audio playback engine switches from the idle state to the buffering state to complete data preloading without needing to download the entire audio.

[0082] When the amount of audio data cached for the target audio reaches a preset threshold, the native playback layer calls the corresponding platform's native player (such as AVPlayer on iOS or ExoPlayer on Android) to start decoding and playing the audio data, and the playback state machine switches from the buffering state to the playback state.
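The playback state machine mentioned above might be modeled as follows. The idle, buffering, playing, and paused states follow the description, but the full transition table `ALLOWED` and the method names are assumptions assembled for this sketch; the description names only some of the transitions.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    BUFFERING = "buffering"
    PLAYING = "playing"
    PAUSED = "paused"


# Assumed transition table; the description names only some of these edges.
ALLOWED = {
    State.IDLE: {State.BUFFERING},
    State.BUFFERING: {State.PLAYING, State.PAUSED, State.IDLE},
    State.PLAYING: {State.BUFFERING, State.PAUSED, State.IDLE},
    State.PAUSED: {State.PLAYING, State.BUFFERING, State.IDLE},
}


class PlaybackStateMachine:
    def __init__(self):
        self.state = State.IDLE

    def transition(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        self.state = new_state

    def on_cached(self, cached_bytes, threshold_bytes):
        # Start playing as soon as enough data is cached while buffering.
        if self.state is State.BUFFERING and cached_bytes >= threshold_bytes:
            self.transition(State.PLAYING)
```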

[0083] When the native playback layer begins playing audio data, it can immediately send playback progress data to the communication bridging layer. Then, every preset interval, such as 100ms, the native playback layer automatically and periodically sends playback progress data to the communication bridging layer, achieving playback progress synchronization between the native playback layer and the communication bridging layer. The playback progress data includes one or more pieces of information such as the current playback duration of the target audio and the percentage of playback progress, used to accurately indicate the playback progress of the audio data.

[0084] After receiving the playback progress data, the lyrics matching engine of the communication bridging layer determines the first target text corresponding to the playback progress data from the audio text.

[0085] Subsequently, the communication bridging layer sends the identification information of the first target text along with the playback progress data to the cross-platform UI layer. The identification information may include: the position information of the first target text in the audio text, the index information of the first target text, the start and end times of the playback of the first target text, etc.

[0086] After receiving the playback progress data and the identifier information of the first target text, the cross-platform UI layer controls the playback interface of the target audio. The text follow-up / highlighting component determines the first target text based on the received identifier information, sets the corresponding text to the highlight style, and automatically scrolls the first target text to the specified position on the playback interface. At the same time, the progress bar component updates the progress display according to the playback progress data, realizing a synchronized audio-visual text follow-up effect.

[0087] During continuous playback, the lyrics matching engine performs incremental matching every 100ms: lyrics highlighting is only triggered when the identifier information changes, in order to reduce unnecessary interface redraws.
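The incremental matching described above can be sketched as a matcher that remembers the last reported segment and notifies the UI only on change. The class name `IncrementalMatcher`, the callback shape, and the use of a segment index as the identifier information are assumptions of this sketch.

```python
from bisect import bisect_right


class IncrementalMatcher:
    """Emits a highlight update only when the matched segment changes."""

    def __init__(self, start_times_ms, on_highlight):
        self.start_times_ms = start_times_ms  # sorted ascending
        self.on_highlight = on_highlight      # callable taking a segment index or None
        self._current = None

    def on_progress(self, position_ms):
        i = bisect_right(self.start_times_ms, position_ms) - 1
        index = i if i >= 0 else None
        if index != self._current:
            # Identifier changed: this is the only case that triggers a redraw.
            self._current = index
            self.on_highlight(index)
```

With segments starting at 0 ms and 300 ms and progress reported every 100 ms, only two highlight updates are emitted over the first half second, even though five progress samples arrive.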

[0088] In addition, the system integration module of the native playback layer updates the audio information of the lock screen and notification bar at regular intervals through the system media control center, including the target audio name, album art, current playback duration and total duration, so that the system-level interface and the playback status within the application are consistent.

[0089] By layering the target application into a cross-platform UI layer, a native playback layer, and a communication bridging layer, this approach maintains the performance and stability of audio playback while keeping the interface display consistent across platforms such as iOS and Android, which reduces development and maintenance costs. Audio playback capabilities are pushed down to the native layer, while interface display and user interaction are handled in the cross-platform UI layer. Through this layered decoupling, the solution balances playback performance against development efficiency and keeps audio playback and text display synchronized.

[0090] The following examples illustrate the variable speed playback control process for target audio.

[0091] In one approach, after the display of the target audio playback interface is controlled using the playback progress data and the identification information of the first target text, a variable-speed playback request for the target audio is received through the cross-platform UI layer, and the playback rate contained in the request is determined. The playback rate is sent to the native playback layer through the communication bridging layer, and the native playback layer plays the audio data at that playback rate. The native playback layer sends playback progress data of the audio data to the communication bridging layer every preset time interval. The communication bridging layer determines the second target text corresponding to the playback progress data from the audio text, and sends the playback progress data and the identification information of the second target text to the cross-platform UI layer. The cross-platform UI layer updates the playback interface of the target audio based on the playback progress data and the identification information of the second target text.

[0092] In other words, if a user initiates a speed adjustment operation while the audio is playing normally and the text is displayed synchronously, the cross-platform UI layer passes the set playback rate to the native playback layer via the communication bridging layer. The native playback layer adjusts the playback speed according to this rate and continues to report the playback progress data at the preset interval. The communication bridging layer matches the corresponding second target text according to the updated playback progress data, and then sends the playback progress data and the identification information of the second target text back to the cross-platform UI layer to refresh the playback interface of the target audio, so as to maintain the synchronization of audio playback and text display in the speed adjustment scenario.

[0093] In one specific embodiment, based on the layered architecture shown in Figure 2, the variable speed playback control process for the target audio is as follows. When a user selects 1.5x playback speed in the playback interface of the target audio, the cross-platform UI layer receives the variable-speed playback request for the target audio. The cross-platform UI layer encapsulates the playback rate of 1.5 into a speed adjustment instruction of type setRate through the playback status synchronizer of the communication bridging layer and sends it to the native playback layer, so as to convert the user's speed adjustment operation into a structured control instruction that conforms to the interface specification.

[0094] After receiving the setRate speed change command, the native playback layer verifies that the playback rate of 1.5 is within the preset valid range [0.5, 2.0]. It then calls the native player interface of the terminal device so that the audio playback engine adjusts the playback rate to 1.5. On iOS, the speed change is achieved by setting the `rate` property of `AVPlayer`; on Android, it is achieved by setting the `PlaybackParameters` of `ExoPlayer`. The new playback rate takes effect in real time without interrupting playback or reloading audio, and the current rate is persistently stored so that the playback rate remains consistent when switching audio.
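A minimal sketch of the setRate handling follows. The `setRate` name and the valid range [0.5, 2.0] come from the description; the message dictionaries, the `apply_rate` callback (standing in for the platform player call), and the `store` dictionary (standing in for persistent storage) are assumptions of this sketch.

```python
MIN_RATE, MAX_RATE = 0.5, 2.0  # preset valid range from the description


def build_set_rate_command(rate):
    """UI side: wrap the chosen rate in a structured control instruction."""
    return {"method": "setRate", "args": {"rate": float(rate)}}


def handle_set_rate(command, apply_rate, store):
    """Native side: validate, apply, and persist the playback rate."""
    rate = command["args"]["rate"]
    if not (MIN_RATE <= rate <= MAX_RATE):
        return {"ok": False, "error": f"rate {rate} outside [{MIN_RATE}, {MAX_RATE}]"}
    apply_rate(rate)              # placeholder for the native player call
    store["playbackRate"] = rate  # placeholder for persistent storage
    return {"ok": True, "rate": rate}
```

Rejecting an out-of-range rate before touching the player leaves both the current playback and the stored rate unchanged.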

[0095] After the speed change takes effect, the native playback layer still reports playback progress data to the communication bridging layer at preset intervals of 100ms using a timer. Based on the playback progress data, the communication bridging layer matches the corresponding text segments using a binary search method to ensure that changes in playback speed do not affect the accuracy of text matching.

[0096] After receiving playback progress data, the cross-platform UI layer synchronously refreshes the playback interface of the target audio based on the playback progress data, ensuring that the text reading rhythm is synchronized with the audio playback after speed adjustment. In addition, the native playback layer also synchronizes the current playback rate to the system media control center, updating the playback rate and progress display in the lock screen and notification bar, so that the system-level interface and the in-app playback status are consistent.

[0097] The following examples illustrate the control flow for adjusting the playback progress of the target audio.

[0098] In one approach, after the display of the target audio playback interface is controlled using the playback progress data and the identification information of the first target text, in response to the cross-platform UI layer receiving a progress jump instruction, the second playback duration to which the progress jump instruction jumps is determined. The second playback duration is sent to the native playback layer via the communication bridging layer, and the native playback layer controls the target audio to play from the second playback duration. The communication bridging layer determines the third target text corresponding to the second playback duration from the audio text, and sends the second playback duration and the identification information of the third target text to the cross-platform UI layer. The cross-platform UI layer updates the target audio playback interface based on the second playback duration and the identification information of the third target text.

[0099] The aforementioned progress jump instruction can be generated by the user performing actions such as dragging or clicking on the progress display control in the playback interface provided by the cross-platform UI layer. For example, when dragging the playback progress bar to the target position in the playback interface of the target audio, the cross-platform UI layer determines that it has received a progress jump instruction when it captures this user interaction event.

[0100] In one specific embodiment, based on the layered architecture shown in Figure 2, the progress jump control process for the target audio is as follows. When a user drags the playback progress bar to the target position in the target audio playback interface, the cross-platform UI layer receives the progress jump instruction, determines the second playback duration indicated by the target position, encapsulates it, and sends it to the native playback layer through the communication bridging layer. Specifically, the cross-platform UI layer can send the encapsulated content to the native playback layer through the playback status synchronizer, so that the progress jump instruction is passed across layers to the native playback layer, which actually controls the playback progress.

[0101] After receiving the second playback duration, the audio playback engine in the native playback layer calls the corresponding platform's native player interface to perform a jump operation. On iOS, this is done through the `seek(to:)` method of `AVPlayer`, and on Android, it is done through the `seekTo()` method of `ExoPlayer`. During the jump, the audio playback engine in the native playback layer can briefly enter the buffering state to load the audio data segment corresponding to the second playback duration. The local cache proxy module in the native playback layer checks whether the audio data segment corresponding to the second playback duration has been cached. If it has been cached, it is read directly; otherwise, the module requests the corresponding audio data segment from the server.

[0102] After the playback duration jumps to the second playback duration, the audio playback engine in the native playback layer resumes the audio playback state and controls the playback of audio data from the second playback duration.

[0103] After the playback duration jumps to the second playback duration, the communication bridging layer immediately performs binary search matching based on the second playback duration to locate the corresponding third target text, namely the text segment whose playback start time is the latest among those no later than the second playback duration. The communication bridging layer verifies that the second playback duration falls within the playback start and end time of the third target text, thereby determining the identification information of the third target text. It then sends the second playback duration and the identification information of the third target text to the cross-platform UI layer.
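The matching step after a jump, including the interval check, might look like the following sketch. The function name `match_after_seek`, the `(start_ms, end_ms)` tuple layout, and the half-open interval convention (start inclusive, end exclusive) are assumptions; the description does not state how the interval boundaries are treated.

```python
from bisect import bisect_right


def match_after_seek(segments, seek_ms):
    """Return the index of the segment containing seek_ms, or None.

    `segments` is a list of (start_ms, end_ms) tuples sorted by start_ms.
    """
    starts = [start for start, _ in segments]
    i = bisect_right(starts, seek_ms) - 1  # latest start <= seek_ms
    if i < 0:
        return None
    start_ms, end_ms = segments[i]
    # Verification: the jump target must fall inside the candidate's interval.
    # Assumed convention: start inclusive, end exclusive.
    return i if start_ms <= seek_ms < end_ms else None
```

The verification matters when the text has gaps: a jump into a stretch of audio with no text yields no match rather than the preceding segment.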

[0104] The cross-platform UI layer refreshes the playback interface of the target audio based on the second playback duration and the identification information of the third target text. Specifically, it updates the progress bar position and the displayed playback duration in the playback interface, sets the third target text to a highlighted style, displays the remaining unmatched text segments in the audio text in a normal format, and automatically scrolls to the corresponding text position, so that the playback interface display is consistent with the playback state of the native playback layer after the jump.

[0105] Furthermore, after the cross-platform UI layer updates the playback interface of the target audio based on the second playback duration and the identification information of the third target text, playback progress data is sent to the communication bridging layer through the native playback layer every preset duration; the communication bridging layer determines the fourth target text corresponding to the playback progress data from the audio text, and sends the playback progress data and the identification information of the fourth target text to the cross-platform UI layer; the cross-platform UI layer updates the playback interface of the target audio based on the playback progress data and the identification information of the fourth target text.

[0106] After updating the playback interface of the target audio using the second playback duration and the identifier information of the third target text, the native playback layer sends playback progress data to the communication bridging layer every preset interval. The lyrics matching engine in the communication bridging layer continues to determine the corresponding fourth target text using the playback progress data and sends the playback progress data and the identifier information of the fourth target text to the cross-platform UI layer. The cross-platform UI layer synchronously updates the progress bar display status and highlights the currently playing text to ensure accurate synchronization between audio and text.

[0107] The following examples illustrate the playback control process in scenarios involving audio output device switching and interruption.

[0108] Specifically, after controlling the display of the target audio playback interface based on playback progress data and the identification information of the first target text, in response to the native playback layer receiving a target event, the playback of audio data is stopped, and the third playback duration of the target audio is determined; wherein, the target event includes: an audio output device disconnection event, and / or an audio focus loss event; the native playback layer sends a play / pause prompt and playback progress data to the communication bridging layer; the communication bridging layer determines the fifth target text corresponding to the playback progress data from the audio text, and sends the playback progress data, play / pause prompt, and identification information of the fifth target text to the cross-platform UI layer; the cross-platform UI layer updates the display state of the specified controls in the target audio playback interface based on the play / pause prompt and playback progress data, and keeps the fifth target text displayed according to the specified display format.

[0109] The third playback duration mentioned above refers to the playback duration of the target audio at the moment the native playback layer receives the target event. The aforementioned audio output device disconnection event can be a physical disconnection or connection failure of the audio output device. For example, if the audio output device is headphones, the audio output device disconnection event could be a Bluetooth headphone disconnection or a wired headphone being unplugged.

[0110] The aforementioned audio focus loss event refers to a situation where the terminal device's operating system allocates audio focus to another application process, and the audio focus does not belong to the target application. As a result, the target application loses control of audio playback. For example, an incoming call, alarm clock, voice call, or other application initiates audio playback, triggering audio focus preemption.

[0111] Specifically, when the terminal device's operating system detects that the wired headphones have been unplugged or that the target application has lost control of audio playback, it sends a corresponding notification to the native playback layer so that the native playback layer can recognize the audio output device disconnection event or the audio focus loss event. Subsequently, the native playback layer stops playing audio data, determines the current playback duration of the target audio as the third playback duration, and sends a play / pause prompt and playback progress data to the communication bridging layer. After the communication bridging layer determines the fifth target text corresponding to the playback progress data, it sends the play / pause prompt, the identification information of the fifth target text, and the playback progress data to the cross-platform UI layer. Upon receiving them, the cross-platform UI layer updates the display state of the playback control controls in the target audio playback interface. For example, it switches the play button to the pause state, keeps the display position of the progress bar unchanged, and keeps the fifth target text continuously displayed according to the specified display format.

[0112] Furthermore, in response to receiving a request to resume playback of the target audio, the native playback layer controls the target audio to play from the third playback duration; the playback progress data of the audio data is sent to the communication bridging layer; the communication bridging layer determines the sixth target text corresponding to the playback progress data from the audio text, and sends the playback progress data and the identification information of the sixth target text to the cross-platform UI layer; the cross-platform UI layer updates the playback interface of the target audio based on the playback progress data and the identification information of the sixth target text.

[0113] For example, in one scenario, after the terminal device's operating system recognizes that the audio output path has been reconnected or the audio focus has been reassigned to the target application, it sends a corresponding notification to the native playback layer. Based on this, the native playback layer recognizes and receives a request to resume playback of the target audio.

[0114] In another scenario, when a user clicks on the playback control controls in the playback interface displayed by the cross-platform UI layer, switching it from paused to play, the cross-platform UI layer sends a corresponding resume playback notification to the native playback layer through the communication bridge layer. Upon receiving this notification, the native playback layer confirms that it has received the resume playback request for the target audio.

[0115] The native playback layer controls the target audio to play from the third playback duration; sends playback progress data of the audio data to the communication bridging layer; determines the sixth target text corresponding to the playback progress data from the audio text through the communication bridging layer, and sends the playback progress data and the identification information of the sixth target text to the cross-platform UI layer.

[0116] Finally, the playback interface of the target audio is updated through the cross-platform UI layer based on the playback progress data and the identification information of the sixth target text.

[0117] In one embodiment, the process of controlling the playback of the target audio is illustrated based on the layered architecture shown in Figure 2. Taking wired headphones as an example, when the terminal device's operating system detects that the wired headphones have been unplugged, it sends an audio output device disconnection notification to the native playback layer. For instance, on iOS the notification is delivered via `AVAudioSession.routeChangeNotification`, and on Android it is delivered via the `AudioManager.ACTION_AUDIO_BECOMING_NOISY` broadcast. The native playback layer identifies the event type as an audio output device disconnection event.

[0118] Upon detecting an audio output device disconnection event, the native playback layer immediately invokes the audio playback engine to perform a pause operation. The playback state machine within the audio playback engine transitions from the playback state to the paused state. The audio output is automatically switched back to the device's speaker; however, because the native playback layer has already paused playback, no sound is emitted. This prevents audio from being played through the speaker after the headphones are unplugged, thus protecting user privacy.

[0119] Then, the native playback layer sends a play / pause prompt and playback progress data to the communication bridging layer. The playback progress data records the playback duration of the target audio at the current moment. After the communication bridging layer determines the fifth target text corresponding to the playback progress data from the audio text, it sends the identifier information of the fifth target text and the playback progress data to the cross-platform UI layer, synchronizing the pause state caused by unplugging the headphones to the cross-platform UI layer in real time.

[0120] Then, the cross-platform UI layer switches the playback control in the target audio playback interface to the icon indicating the paused state, while the text-following and highlighting component remains unchanged and the fifth target text is still highlighted. The goal is to ensure that the playback interface provided by the cross-platform UI layer accurately reflects the current paused state and keeps the corresponding text highlighted so that the user can seamlessly resume playback.

[0121] When the user taps the playback control in the playback interface again, the cross-platform UI layer receives a request to resume playback of the target audio. The resume playback request is sent to the native playback layer through the communication bridging layer. The native playback layer resumes playback from the third playback duration and sends playback progress data at preset time intervals. Based on the data forwarded by the communication bridging layer, the cross-platform UI layer updates the playback progress and the corresponding text display, achieving continuous synchronized display of audio and text after playback is resumed.
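The pause-and-resume behaviour described in paragraphs [0118] to [0121] can be sketched as a small state machine. This is a minimal illustration, not the patented implementation: the class `NativePlayer`, its method names, and the message dictionaries are hypothetical, and real playback engines on iOS or Android expose different interfaces.

```python
from dataclasses import dataclass
from enum import Enum


class PlayerState(Enum):
    PLAYING = "playing"
    PAUSED = "paused"


@dataclass
class NativePlayer:
    """Hypothetical stand-in for the audio playback engine in the native layer."""
    position_ms: int = 0
    state: PlayerState = PlayerState.PAUSED
    resume_from_ms: int = 0  # plays the role of the "third playback duration"

    def play(self) -> None:
        self.state = PlayerState.PLAYING

    def tick(self, elapsed_ms: int) -> None:
        # The position only advances while the engine is in the playing state.
        if self.state is PlayerState.PLAYING:
            self.position_ms += elapsed_ms

    def on_target_event(self) -> dict:
        # Output device disconnected or audio focus lost: pause at once and
        # remember where playback stopped.
        self.state = PlayerState.PAUSED
        self.resume_from_ms = self.position_ms
        return {"prompt": "paused", "position_ms": self.position_ms}

    def on_resume_request(self) -> dict:
        # Resume from the remembered position, not from the beginning.
        self.position_ms = self.resume_from_ms
        self.state = PlayerState.PLAYING
        return {"prompt": "playing", "position_ms": self.position_ms}
```

In this sketch the dictionaries returned by `on_target_event` and `on_resume_request` stand for the play/pause prompt and playback progress data that the native playback layer would hand to the communication bridging layer.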

[0122] The target application also supports switching between multiple audio files while maintaining the continuity of audio playback during the switching process.

[0123] For example, during the playback of the target audio, after the cross-platform UI layer receives the user's operation to play a first audio, it sends a playback stop prompt and the audio resource of the first audio to the native playback layer via the communication bridging layer. After receiving the playback stop prompt and the audio resource, the native playback layer stops playing the target audio, re-initializes the playback task, and obtains the audio data corresponding to the network playback address of the first audio. When enough audio data has been buffered, the audio playback engine in the native playback layer enters the playback state and controls the playback of the first audio. The native playback layer then periodically sends playback progress data to the communication bridging layer, driven by a preset 100 ms timer in the native playback layer.

[0124] After receiving the playback progress data, the communication bridging layer determines the target text from the audio text of the first audio file and immediately sends the target text's identification information and playback progress data to the cross-platform UI layer.

[0125] When the cross-platform UI layer receives an operation to play the first audio, it clears the text-following and highlighting component currently displayed on the playback interface, together with the locally cached audio text, loads the audio text of the first audio, and redraws the text-following and highlighting component.

[0126] Finally, after receiving the target text's identifier information and playback progress data, the cross-platform UI layer highlights the audio text corresponding to the playback progress data, determines the display state of the playback progress control, and ensures that the audio and text are synchronized after switching.
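The UI-side handling of an audio switch in paragraphs [0125] and [0126] might look roughly like the following. The `PlaybackScreen` class and its fields are hypothetical names introduced for illustration; an actual cross-platform framework would bind this state to widgets.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PlaybackScreen:
    """Hypothetical model of the playback interface state in the UI layer."""
    audio_id: Optional[str] = None
    segments: dict = field(default_factory=dict)  # segment id -> segment text
    highlighted_id: Optional[int] = None
    position_ms: int = 0

    def switch_audio(self, audio_id: str, segments: dict) -> None:
        # Drop the cached text and highlight of the previous audio,
        # then load the text of the newly selected audio.
        self.audio_id = audio_id
        self.segments = dict(segments)
        self.highlighted_id = None
        self.position_ms = 0

    def on_progress(self, position_ms: int, segment_id: int) -> None:
        # Update the progress control and highlight the identified segment.
        self.position_ms = position_ms
        if segment_id in self.segments:
            self.highlighted_id = segment_id

    def highlighted_text(self) -> Optional[str]:
        return self.segments.get(self.highlighted_id)
```

Resetting the highlight inside `switch_audio` is one simple way to make sure no text from the previous audio stays highlighted before the first progress update for the new audio arrives.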

[0127] Corresponding to the above method embodiments, Figure 3 shows an audio playback control device, which includes: a first sending module 302, used to receive an audio playback request for the target audio through the cross-platform UI layer, and send the audio resources of the target audio to the native playback layer through the communication bridging layer, wherein the audio resources include the network playback address of the target audio and the audio text; a second sending module 304, used to obtain the audio data corresponding to the network playback address through the native playback layer, play the audio data, and send the playback progress data of the audio data to the communication bridging layer; a third sending module 306, used to determine the first target text corresponding to the playback progress data from the audio text through the communication bridging layer, and send the playback progress data and the identification information of the first target text to the cross-platform UI layer; and a first display module 308, used to control the display of the target audio playback interface through the cross-platform UI layer based on the playback progress data and the identification information of the first target text.

[0128] This approach achieves layered decoupling of functions by dividing the application into a cross-platform UI layer, a native playback layer, and a communication bridging layer. Audio playback capabilities are pushed down to the native playback layer, reducing redundant data interaction in the cross-platform layer. This allows the native playback layer to focus on playback control, the cross-platform UI layer to focus on interface display and user interaction, and the communication bridging layer to focus on data forwarding, text matching, and cross-layer collaboration. This avoids excessive coupling between the code layers, shortens the communication link, and reduces transmission latency. It effectively solves the problems of delayed response and audio-text asynchrony in existing cross-platform playback technologies, while also meeting the needs of multi-platform reuse and ensuring the smoothness and accuracy of playback control.

[0129] The first sending module is used to receive audio playback requests for target audio through the cross-platform UI layer, determine the audio resources of the target audio, send the audio resources of the target audio to the communication bridge layer through the communication channel provided by the cross-platform development framework, encapsulate the audio resources through the communication bridge layer, and send the encapsulated audio resources to the native playback layer.
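As a rough illustration of the encapsulation step in paragraph [0129], the bridging layer could wrap the audio resources in a serialized message before handing them to the native playback layer. The JSON envelope, the `"play"` message type, and the field names below are assumptions made for this sketch; the source does not specify a wire format.

```python
import json


def encapsulate_audio_resource(play_url: str, segments: list) -> str:
    """Wrap the audio resources in a hypothetical JSON envelope."""
    message = {
        "type": "play",  # assumed message type name
        "payload": {"url": play_url, "segments": segments},
    }
    return json.dumps(message, ensure_ascii=False)


def decode_message(raw: str) -> tuple:
    """Decode the envelope on the receiving side into (type, payload)."""
    message = json.loads(raw)
    return message["type"], message["payload"]
```

A single typed envelope keeps the channel between the layers uniform, so further message types (pause, seek, rate change) could reuse the same decoding path.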

[0130] The first sending module is used to obtain the network playback address and initial audio text of the target audio. The initial audio text is segmented through the cross-platform UI layer to obtain audio text containing multiple text segments. Each text segment has a corresponding playback start and end time, which indicates the playback period of the audio corresponding to the text segment in the target audio.
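The segmentation in paragraph [0130] can be sketched as follows. The source does not say what form the initial audio text takes, so this example assumes, purely for illustration, LRC-style lines of the form `[mm:ss.xx]text`; the `TextSegment` type and `segment_text` function are hypothetical names.

```python
import re
from dataclasses import dataclass

# Assumed input format: one "[mm:ss.xx]text" entry per line.
_TIMESTAMP = re.compile(r"^\[(\d+):(\d{2})\.(\d{2})\](.*)$")


@dataclass(frozen=True)
class TextSegment:
    segment_id: int
    start_ms: int
    end_ms: int
    text: str


def segment_text(initial_text: str, total_ms: int) -> list:
    """Split timestamped text into segments with playback start and end times."""
    starts = []
    for line in initial_text.splitlines():
        match = _TIMESTAMP.match(line.strip())
        if match is None:
            continue  # skip lines without a timestamp
        minutes, seconds, hundredths, text = match.groups()
        start_ms = (int(minutes) * 60 + int(seconds)) * 1000 + int(hundredths) * 10
        starts.append((start_ms, text.strip()))
    starts.sort(key=lambda item: item[0])
    segments = []
    for index, (start_ms, text) in enumerate(starts):
        # A segment ends where the next one starts; the last ends with the audio.
        end_ms = starts[index + 1][0] if index + 1 < len(starts) else total_ms
        segments.append(TextSegment(index, start_ms, end_ms, text))
    return segments
```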

[0131] The second sending module is used to convert the network playback address into a local cache proxy address through the native playback layer; based on the local cache proxy address, it performs streaming loading and caching processing on the audio data corresponding to the network playback address; and in response to the amount of cached audio data reaching a preset threshold, it plays the audio data through the native playback layer.
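One way to picture the address conversion and buffering check in paragraph [0131] is shown below. The loopback host, the port `8123`, the `/stream` path, and the `src` query parameter are all assumptions for this sketch; the source only states that the network playback address is converted into a local cache proxy address and that playback starts once a preset amount of data is cached.

```python
from urllib.parse import parse_qs, quote, urlsplit


def to_proxy_address(network_url: str, port: int = 8123) -> str:
    """Rewrite a network address to a hypothetical loopback cache-proxy address."""
    return f"http://127.0.0.1:{port}/stream?src={quote(network_url, safe='')}"


def original_address(proxy_url: str) -> str:
    """Recover the network address that the proxy should fetch and cache."""
    return parse_qs(urlsplit(proxy_url).query)["src"][0]


def ready_to_play(cached_bytes: int, threshold_bytes: int) -> bool:
    """Playback may start once the cached amount reaches the preset threshold."""
    return cached_bytes >= threshold_bytes
```

Percent-encoding the whole original address keeps its own query string from being confused with the proxy's parameters.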

[0132] The second sending module is used to send playback progress data to the communication bridge layer through the native playback layer at preset intervals.

[0133] The aforementioned playback progress data includes: the first playback duration of the target audio; and a third sending module, which is used to determine the first target text corresponding to the first playback duration from the audio text through the communication bridging layer, and send the playback duration of the target audio and the identification information of the first target text to the cross-platform UI layer.

[0134] The aforementioned audio text contains multiple text segments; each text segment has a corresponding playback start and end time, which indicates the playback period of the audio corresponding to the text segment in the target audio; the third sending module is used to sort the text segments according to the playback start time in the playback start and end times; based on the sorted multiple text segments, a preset search method is used to determine the first text segment corresponding to the latest playback start time no later than the first playback duration; the first text segment is determined as the first target text.
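The lookup in paragraph [0134] can be sketched with a binary search over the sorted start times. The source only refers to "a preset search method"; binary search is used here as one plausible choice, and the `SegmentIndex` class is a hypothetical name.

```python
from bisect import bisect_right
from typing import Optional


class SegmentIndex:
    """Finds the segment whose start time is the latest one not after a position."""

    def __init__(self, segments):
        # segments: iterable of (segment_id, start_ms) pairs, in any order.
        ordered = sorted(segments, key=lambda item: item[1])
        self._ids = [segment_id for segment_id, _ in ordered]
        self._starts = [start_ms for _, start_ms in ordered]

    def find(self, position_ms: int) -> Optional[object]:
        # bisect_right returns the count of start times <= position_ms,
        # so the entry just before that index is the one we want.
        index = bisect_right(self._starts, position_ms) - 1
        return self._ids[index] if index >= 0 else None
```

For example, with segments starting at 0 ms, 2500 ms, and 5000 ms, a playback position of 2499 ms maps to the first segment and 2500 ms maps to the second. Sorting once and searching in logarithmic time matters because the lookup runs on every progress report.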

[0135] The first display module is used to determine the display state of a specified control in the playback interface based on playback progress data through the cross-platform UI layer; determine the first target text based on the identification information; and display the first target text in the playback interface in a specified display format.

[0136] The aforementioned device also includes a first prompt module, used to control, through the native playback layer, the communication bridging layer to send an audio playback prompt to the cross-platform UI layer, so that the cross-platform UI layer can determine the display state of the playback control in the playback interface of the target audio based on the audio playback prompt.

[0137] The aforementioned device further includes a first update module, configured to receive a variable-speed playback request for the target audio through the cross-platform UI layer, and determine the playback rate contained in the variable-speed playback request; send the playback rate to the native playback layer through the communication bridging layer; play the audio data according to the playback rate through the native playback layer; send playback progress data of the audio data to the communication bridging layer at preset intervals; determine the second target text corresponding to the playback progress data from the audio text through the communication bridging layer, and send the playback progress data and the identification information of the second target text to the cross-platform UI layer; and update the playback interface of the target audio through the cross-platform UI layer based on the playback progress data and the identification information of the second target text.
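To make the effect of variable-speed playback on the progress data concrete: if reports are sent at a fixed wall-clock interval, the media position advances by the interval multiplied by the playback rate between consecutive reports. The function below is a hypothetical sketch of that arithmetic, assuming an ideal timer with no drift.

```python
def progress_reports(start_ms: int, rate: float, interval_ms: int, count: int) -> list:
    """Media positions reported at each timer tick for a given playback rate."""
    if rate <= 0:
        raise ValueError("playback rate must be positive")
    # After `tick` intervals of wall-clock time, rate * interval_ms * tick
    # milliseconds of audio have been played.
    return [start_ms + round(rate * interval_ms * tick) for tick in range(1, count + 1)]
```

At 1.5x speed with a 100 ms interval, each report is 150 ms of audio further along, so the bridging layer's text lookup advances through the segments correspondingly faster without any change to the lookup itself.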

[0138] The aforementioned device further includes a second update module, configured to, in response to a progress jump instruction received by the cross-platform UI layer, determine the second playback duration to which the progress jump instruction jumps; send the second playback duration to the native playback layer via the communication bridging layer; control the target audio to play from the second playback duration via the native playback layer; determine the third target text corresponding to the second playback duration from the audio text via the communication bridging layer; send the second playback duration and the identification information of the third target text to the cross-platform UI layer; and update the playback interface of the target audio via the cross-platform UI layer based on the second playback duration and the identification information of the third target text.

[0139] The second update module is used to send playback progress data to the communication bridging layer through the native playback layer at preset intervals; determine the fourth target text corresponding to the playback progress data from the audio text through the communication bridging layer, and send the playback progress data and the identification information of the fourth target text to the cross-platform UI layer; and update the playback interface of the target audio through the cross-platform UI layer based on the playback progress data and the identification information of the fourth target text.

[0140] The aforementioned device further includes a third update module, used to respond to the native playback layer receiving a target event, stop playing audio data, and determine the third playback duration of the target audio; wherein the target event includes: a disconnection event of the audio output device, and / or, an audio focus loss event; sending a play / pause prompt and playback progress data to the communication bridging layer through the native playback layer; determining the fifth target text corresponding to the playback progress data from the audio text through the communication bridging layer, and sending the playback progress data, the play / pause prompt, and the identification information of the fifth target text to the cross-platform UI layer; updating the display state of the specified control in the playback interface of the target audio through the cross-platform UI layer based on the play / pause prompt and playback progress data, and maintaining the fifth target text displayed according to the specified display format.

[0141] The third update module is used to respond to a received request to resume playback of the target audio by controlling the target audio to play from the third playback duration through the native playback layer; sending playback progress data of the audio data to the communication bridging layer; determining the sixth target text corresponding to the playback progress data from the audio text through the communication bridging layer, and sending the playback progress data and the identification information of the sixth target text to the cross-platform UI layer; and updating the playback interface of the target audio through the cross-platform UI layer based on the playback progress data and the identification information of the sixth target text.

[0142] This embodiment also provides an electronic device, including a processor and a memory. The memory stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to implement the above-described audio playback control method.

[0143] As shown in Figure 4, the electronic device includes a processor 100 and a memory 101. The memory 101 stores machine-executable instructions that can be executed by the processor 100. The processor 100 executes the machine-executable instructions to implement the above-described audio playback control method.

[0144] Furthermore, the electronic device shown in Figure 4 also includes a bus 102 and a communication interface 103, with the processor 100, the communication interface 103, and the memory 101 connected via the bus 102.

[0145] The memory 101 may include high-speed random access memory (RAM) and may also include non-volatile memory, such as at least one disk storage device. Communication between this system network element and at least one other network element is achieved through at least one communication interface 103 (which can be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, etc. The bus 102 may be an ISA bus, a PCI bus, an EISA bus, etc. The bus can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, Figure 4 uses only a single double-headed arrow, but this does not imply that there is only one bus or one type of bus. The processor 100 may be an integrated circuit chip with signal processing capabilities. In implementation, each step of the above method can be completed by the integrated logic circuitry in the hardware of the processor 100 or by instructions in software form. The processor 100 can be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this invention. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this invention can be directly embodied as execution by a hardware decoding processor, or as execution by a combination of hardware and software modules in the decoding processor. The software module can reside in a storage medium well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. This storage medium is located in the memory 101, and the processor 100 reads the information from the memory 101 and, in conjunction with its hardware, completes the steps of the method described in the foregoing embodiments.

[0146] The processor in the aforementioned electronic device, by executing machine-executable instructions, can implement the following operations of the aforementioned audio playback control method: receiving an audio playback request for a target audio through a cross-platform UI layer, and sending the audio resources of the target audio to the native playback layer through a communication bridging layer; wherein, the audio resources include: the network playback address and audio text of the target audio; obtaining the audio data corresponding to the network playback address through the native playback layer, playing the audio data, and sending playback progress data of the audio data to the communication bridging layer; determining the first target text corresponding to the playback progress data from the audio text through the communication bridging layer, and sending the playback progress data and the identification information of the first target text to the cross-platform UI layer; and controlling the display of the playback interface of the target audio through the cross-platform UI layer based on the playback progress data and the identification information of the first target text.

[0147] This approach achieves layered decoupling of functions by dividing the application into a cross-platform UI layer, a native playback layer, and a communication bridging layer. Audio playback capabilities are pushed down to the native playback layer, reducing redundant data interaction in the cross-platform layer. This allows the native playback layer to focus on playback control, the cross-platform UI layer to focus on interface display and user interaction, and the communication bridging layer to focus on data forwarding, text matching, and cross-layer collaboration. This avoids excessive coupling between the code layers, shortens the communication link, and reduces transmission latency. It effectively solves the problems of delayed response and audio-text asynchrony in existing cross-platform playback technologies, while also meeting the needs of multi-platform reuse and ensuring the smoothness and accuracy of playback control.

[0148] The processor in the aforementioned electronic device can execute machine-executable instructions to implement the following operations of the aforementioned audio playback control method: receiving an audio playback request for a target audio through the cross-platform UI layer and determining the audio resources of the target audio; sending the audio resources of the target audio to the communication bridge layer through the communication channel provided by the cross-platform development framework; encapsulating the audio resources through the communication bridge layer and sending the encapsulated audio resources to the native playback layer.

[0149] The processor in the aforementioned electronic device can execute the following operations of the aforementioned audio playback control method by executing machine-executable instructions: obtaining the network playback address and initial audio text of the target audio, and segmenting the initial audio text through a cross-platform UI layer to obtain audio text containing multiple text segments; wherein, each text segment has a corresponding playback start and end time, and the playback start and end time is used to indicate the playback period of the audio corresponding to the text segment in the target audio.

[0150] The processor in the aforementioned electronic device can execute machine-executable instructions to perform the following operations of the audio playback control method: converting the network playback address into a local cache proxy address through the native playback layer; performing streaming loading and caching processing on the audio data corresponding to the network playback address based on the local cache proxy address; and playing the audio data through the native playback layer in response to the cached data amount reaching a preset threshold.

[0151] The processor in the aforementioned electronic device can execute machine-executable instructions to implement the following operation of the audio playback control method: every preset time interval, it sends playback progress data to the communication bridging layer through the native playback layer.

[0152] The aforementioned playback progress data includes: the first playback duration of the target audio; the processor in the aforementioned electronic device, by executing machine-executable instructions, can implement the following operation of the aforementioned audio playback control method: determining the first target text corresponding to the first playback duration from the audio text through the communication bridging layer, and sending the playback duration of the target audio and the identification information of the first target text to the cross-platform UI layer.

[0153] The aforementioned audio text comprises multiple text segments; each text segment has a corresponding playback start and end time, which indicates the playback period of the audio corresponding to the text segment within the target audio; the processor in the aforementioned electronic device, by executing machine-executable instructions, can implement the following operations of the aforementioned audio playback control method: sorting the text segments according to the playback start time in the playback start and end times; based on the sorted multiple text segments, determining the first text segment corresponding to the latest playback start time no later than the first playback duration using a preset search method; and determining the first text segment as the first target text.

[0154] The processor in the aforementioned electronic device can perform the following operations of the aforementioned audio playback control method by executing machine-executable instructions: determining the display state of a specified control in the playback interface based on playback progress data through a cross-platform UI layer; determining a first target text based on identification information; and displaying the first target text in the playback interface in a specified display format.

[0155] The processor in the aforementioned electronic device can execute machine-executable instructions to implement the following operations of the audio playback control method: controlling, through the native playback layer, the communication bridging layer to send an audio playback prompt to the cross-platform UI layer, so that the cross-platform UI layer can determine the display state of the playback control in the playback interface of the target audio based on the audio playback prompt.

[0156] The processor in the aforementioned electronic device, by executing machine-executable instructions, can implement the following operations of the aforementioned audio playback control method: receiving a variable-speed playback request for the target audio through the cross-platform UI layer, and determining the playback rate contained in the variable-speed playback request; sending the playback rate to the native playback layer through the communication bridging layer; playing the audio data according to the playback rate through the native playback layer; sending playback progress data of the audio data to the communication bridging layer at preset intervals; determining the second target text corresponding to the playback progress data from the audio text through the communication bridging layer, and sending the playback progress data and the identification information of the second target text to the cross-platform UI layer; and updating the playback interface of the target audio through the cross-platform UI layer based on the playback progress data and the identification information of the second target text.

[0157] The processor in the aforementioned electronic device, by executing machine-executable instructions, can implement the following operations of the aforementioned audio playback control method: in response to the cross-platform UI layer receiving a progress jump instruction, determining the second playback duration to which the progress jump instruction jumps; sending the second playback duration to the native playback layer via the communication bridging layer; controlling the target audio to play from the second playback duration via the native playback layer; determining the third target text corresponding to the second playback duration from the audio text via the communication bridging layer; sending the second playback duration and the identification information of the third target text to the cross-platform UI layer; and updating the playback interface of the target audio via the cross-platform UI layer based on the second playback duration and the identification information of the third target text.

[0158] The processor in the aforementioned electronic device can execute machine-executable instructions to implement the following operations of the audio playback control method: at preset time intervals, sending playback progress data to the communication bridging layer through the native playback layer; determining the fourth target text corresponding to the playback progress data from the audio text through the communication bridging layer, and sending the playback progress data and the identification information of the fourth target text to the cross-platform UI layer; and updating the playback interface of the target audio through the cross-platform UI layer based on the playback progress data and the identification information of the fourth target text.

[0159] The processor in the aforementioned electronic device, by executing machine-executable instructions, can implement the following operations of the aforementioned audio playback control method: in response to the native playback layer receiving a target event, stopping the playback of audio data and determining a third playback duration for the target audio; wherein the target event includes: a disconnection event of the audio output device, and / or an audio focus loss event; sending a play / pause prompt and playback progress data to the communication bridging layer through the native playback layer; determining the fifth target text corresponding to the playback progress data from the audio text through the communication bridging layer, and sending the playback progress data, the play / pause prompt, and the identification information of the fifth target text to the cross-platform UI layer; updating the display state of the specified control in the playback interface of the target audio through the cross-platform UI layer based on the play / pause prompt and playback progress data, and maintaining the fifth target text displayed according to the specified display format.

[0160] The processor in the aforementioned electronic device, by executing machine-executable instructions, can implement the following operations of the aforementioned audio playback control method: in response to receiving a request to resume playback of the target audio, controlling the target audio to play from the third playback duration through the native playback layer; sending playback progress data of the audio data to the communication bridging layer; determining the sixth target text corresponding to the playback progress data from the audio text through the communication bridging layer, and sending the playback progress data and the identification information of the sixth target text to the cross-platform UI layer; updating the playback interface of the target audio through the cross-platform UI layer based on the playback progress data and the identification information of the sixth target text.

[0161] The computer program product of the audio playback control method, apparatus, electronic device, and storage medium provided in the embodiments of the present invention includes a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the methods described in the preceding method embodiments. For specific implementation, please refer to the method embodiments, which will not be repeated here.

[0162] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, for the specific working processes of the system and apparatus described above, reference may be made to the corresponding processes in the foregoing method embodiments, which will not be repeated here.

[0163] Furthermore, in the description of the embodiments of the present invention, unless otherwise explicitly specified and limited, the terms "installation," "connection," and "linking" should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral connection; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; and they can refer to the internal connection of two components. Those skilled in the art can understand the specific meaning of the above terms in the present invention based on the specific circumstances.

[0164] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, essentially, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0165] In the description of this invention, it should be noted that the terms "center," "upper," "lower," "left," "right," "vertical," "horizontal," "inner," and "outer," etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings. They are used only for the convenience of describing the invention and for simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, they should not be construed as limitations on the invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and should not be construed as indicating or implying relative importance.

[0166] Finally, it should be noted that the above embodiments are merely specific implementations of the present invention, used to illustrate its technical solutions and not to limit them. The scope of protection of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed in the present invention, they can still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of the technical features. Such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be covered within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. An audio playback control method, characterized in that a terminal device runs a target application, the target application including a cross-platform UI layer, a native playback layer, and a communication bridging layer; the cross-platform UI layer is built on a cross-platform development framework; the native playback layer is built using the native code of the terminal device; the method includes: receiving, through the cross-platform UI layer, an audio playback request for a target audio, and sending the audio resources of the target audio to the native playback layer through the communication bridging layer, wherein the audio resources include the network playback address and the audio text of the target audio; obtaining, through the native playback layer, the audio data corresponding to the network playback address, playing the audio data, and sending the playback progress data of the audio data to the communication bridging layer; determining, through the communication bridging layer, the first target text corresponding to the playback progress data from the audio text, and sending the playback progress data and the identification information of the first target text to the cross-platform UI layer; and controlling, through the cross-platform UI layer, the display of the playback interface of the target audio based on the playback progress data and the identification information of the first target text.
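
By way of illustration only, the data flow recited in claim 1 can be sketched as three cooperating objects. This is a minimal sketch and not the applicant's implementation: Python stands in for both the cross-platform framework and the native code, no real audio is played, and every name (`TextSegment`, `NativePlaybackLayer`, `BridgingLayer`, `CrossPlatformUILayer`, `request_play`, `report`, `update`) is hypothetical. The text matching here uses a plain linear scan for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TextSegment:
    seg_id: int      # identification information of the segment
    start_ms: int
    end_ms: int
    text: str


class NativePlaybackLayer:
    """Stand-in for the native player: remembers a URL, plays no real audio."""

    def __init__(self) -> None:
        self.url: Optional[str] = None
        self.on_progress: Optional[Callable[[int], None]] = None

    def play(self, url: str) -> None:
        # A real player would fetch and decode the stream at this address.
        self.url = url

    def report(self, position_ms: int) -> None:
        # Simulates one playback progress report sent to the bridging layer.
        if self.on_progress is not None:
            self.on_progress(position_ms)


class CrossPlatformUILayer:
    """Holds display state only; it never talks to the player directly."""

    def __init__(self) -> None:
        self.position_ms = 0
        self.highlighted_id: Optional[int] = None

    def request_play(self, bridge: "BridgingLayer", url: str,
                     segments: List[TextSegment]) -> None:
        # The playback request is handed to the bridging layer, not the player.
        bridge.start(url, segments)

    def update(self, position_ms: int, seg_id: Optional[int]) -> None:
        self.position_ms = position_ms
        self.highlighted_id = seg_id


class BridgingLayer:
    """Forwards resources to the player and matches progress to text."""

    def __init__(self, native: NativePlaybackLayer,
                 ui: CrossPlatformUILayer) -> None:
        self.native = native
        self.ui = ui
        self.segments: List[TextSegment] = []
        native.on_progress = self._on_progress

    def start(self, url: str, segments: List[TextSegment]) -> None:
        self.segments = sorted(segments, key=lambda s: s.start_ms)
        self.native.play(url)

    def _on_progress(self, position_ms: int) -> None:
        # Linear scan for the last segment that has already started.
        current: Optional[TextSegment] = None
        for seg in self.segments:
            if seg.start_ms > position_ms:
                break
            current = seg
        self.ui.update(position_ms, current.seg_id if current else None)


native, ui = NativePlaybackLayer(), CrossPlatformUILayer()
bridge = BridgingLayer(native, ui)
ui.request_play(bridge, "https://example.com/a.mp3", [
    TextSegment(0, 0, 2000, "Hello"),
    TextSegment(1, 2000, 5000, "world"),
])
native.report(2500)  # the UI now holds position 2500 and segment id 1
```

In this sketch the UI layer receives only a position and a segment identifier, which mirrors the separation between display and playback control described in the claim.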

2. The method according to claim 1, characterized in that the step of receiving, through the cross-platform UI layer, an audio playback request for a target audio and sending the audio resources of the target audio to the native playback layer through the communication bridging layer includes: receiving, through the cross-platform UI layer, the audio playback request for the target audio and determining the audio resources of the target audio; sending the audio resources of the target audio to the communication bridging layer through the communication channel provided by the cross-platform development framework; and encapsulating the audio resources through the communication bridging layer and then sending them to the native playback layer.
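
The encapsulation step of claim 2 might, for example, wrap the audio resources in a single serialized message. The claim does not specify a message format, so the JSON envelope, the `"play"` method name, and the `encapsulate` and `unpack` helpers below are all assumptions made for illustration.

```python
import json
from typing import Any, Dict, List


def encapsulate(url: str, segments: List[Dict[str, Any]]) -> str:
    """Wrap the audio resources in one message for the native layer."""
    message = {
        "method": "play",  # hypothetical command name
        "payload": {"url": url, "segments": segments},
    }
    return json.dumps(message, ensure_ascii=False)


def unpack(raw: str) -> Dict[str, Any]:
    """Native-side decoding; rejects messages it does not understand."""
    message = json.loads(raw)
    if message.get("method") != "play":
        raise ValueError("unsupported method")
    return message["payload"]


raw = encapsulate(
    "https://example.com/a.mp3",
    [{"id": 0, "start_ms": 0, "end_ms": 2000, "text": "Hello"}],
)
payload = unpack(raw)  # a dict with the "url" and "segments" keys
```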

3. The method according to claim 2, characterized in that the step of determining the audio resources of the target audio includes: obtaining the network playback address and the initial audio text of the target audio; and segmenting the initial audio text through the cross-platform UI layer to obtain audio text containing multiple text segments, wherein each text segment has corresponding playback start and end times, and the playback start and end times indicate the playback period, within the target audio, of the audio corresponding to the text segment.
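
As one possible reading of the segmentation in claim 3, the sketch below assumes that the initial audio text arrives as timestamped lines of the form `[mm:ss.xx]text` (an LRC-like layout, which the claim does not specify) and that each segment ends where the next one starts, with the last segment ending at the total duration. The names `TextSegment` and `segment_text` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Assumed input line format: "[mm:ss.xx]text"
_LINE = re.compile(r"\[(\d+):(\d+(?:\.\d+)?)\](.*)")


@dataclass
class TextSegment:
    seg_id: int
    start_ms: int
    end_ms: int
    text: str


def segment_text(initial_text: str, duration_ms: int) -> List[TextSegment]:
    """Split timestamped text into segments with start and end times."""
    timed: List[Tuple[int, str]] = []
    for line in initial_text.splitlines():
        match = _LINE.match(line.strip())
        if match is None:
            continue  # skip lines that carry no timestamp
        minutes, seconds, text = match.groups()
        start_ms = round((int(minutes) * 60 + float(seconds)) * 1000)
        timed.append((start_ms, text.strip()))
    timed.sort(key=lambda item: item[0])

    segments: List[TextSegment] = []
    for i, (start_ms, text) in enumerate(timed):
        # A segment ends where the next begins; the last ends with the audio.
        end_ms = timed[i + 1][0] if i + 1 < len(timed) else duration_ms
        segments.append(TextSegment(i, start_ms, end_ms, text))
    return segments
```

For instance, under these assumptions `segment_text("[00:00.00]Hello\n[00:02.50]World", 6000)` yields two segments covering 0 to 2500 ms and 2500 to 6000 ms.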

4. The method according to claim 1, characterized in that the step of sending the playback progress data of the audio data to the communication bridging layer includes: sending the playback progress data from the native playback layer to the communication bridging layer at preset intervals.
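
Reporting at preset intervals, as in claim 4, can be sketched with a background thread that wakes up once per interval. The class name `ProgressReporter`, its callbacks, and the 0.2 s default interval are assumptions for illustration; the claim does not state an interval value or a threading model.

```python
import threading
import time
from typing import Callable, Optional


class ProgressReporter:
    """Calls send(position_ms) once per interval until stopped."""

    def __init__(self, get_position_ms: Callable[[], int],
                 send: Callable[[int], None],
                 interval_s: float = 0.2) -> None:
        self._get_position_ms = get_position_ms
        self._send = send
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called,
        # so the loop body runs once per interval until the reporter stops.
        while not self._stop.wait(self._interval_s):
            self._send(self._get_position_ms())
```

A shorter interval tightens audio-text alignment at the cost of more messages crossing the bridging layer, so the value is a tuning choice rather than something fixed by the claim.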

5. The method according to claim 1, characterized in that the playback progress data includes a first playback duration of the target audio; the step of determining the first target text corresponding to the playback progress data from the audio text through the communication bridging layer, and sending the playback progress data and the identification information of the first target text to the cross-platform UI layer includes: determining, through the communication bridging layer, the first target text corresponding to the first playback duration from the audio text, and sending the first playback duration of the target audio and the identification information of the first target text to the cross-platform UI layer.

6. The method according to claim 5, characterized in that the audio text contains multiple text segments; each text segment has corresponding playback start and end times, which indicate the playback period, within the target audio, of the audio corresponding to the text segment; the step of determining the first target text corresponding to the first playback duration from the audio text through the communication bridging layer includes: sorting the text segments by the playback start time in their playback start and end times; determining from the sorted text segments, using a preset search method, a first text segment whose playback start time is the latest one that is no later than the first playback duration; and taking the first text segment as the first target text.
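
The lookup in claim 6 can be illustrated with a binary search over the sorted start times. The claim only says "a preset search method", so binary search is an assumption here, as are the tuple layout and the function name `find_target_segment`.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (segment_id, start_ms, end_ms); this layout is an illustrative assumption.
Segment = Tuple[int, int, int]


def find_target_segment(segments: List[Segment],
                        played_ms: int) -> Optional[int]:
    """Return the id of the segment with the latest start <= played_ms."""
    ordered = sorted(segments, key=lambda s: s[1])  # sort by start time
    starts = [s[1] for s in ordered]
    # bisect_right counts the start times that are <= played_ms, so the
    # element just before that insertion point is the wanted segment.
    idx = bisect_right(starts, played_ms) - 1
    return ordered[idx][0] if idx >= 0 else None


segments = [(2, 5000, 8000), (0, 0, 2000), (1, 2000, 5000)]
target_id = find_target_segment(segments, 2000)  # segment 1 starts at 2000 ms
```

The function sorts on every call to stay self-contained; a bridging layer that receives frequent progress reports would more plausibly sort once when the audio text arrives and reuse the sorted list.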

7. The method according to claim 1, characterized in that the step of controlling the display of the playback interface of the target audio through the cross-platform UI layer based on the playback progress data and the identification information of the first target text includes: determining, through the cross-platform UI layer, the display state of a specified control in the playback interface based on the playback progress data; and determining the first target text based on the identification information, and displaying the first target text in the playback interface in a specified display format.
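
For the display step of claim 7, the sketch below assumes that the specified control is a progress bar and that the specified display format is a highlight flag on the current line. Neither is stated in the claim, and `ViewState` and `build_view_state` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ViewState:
    progress: float                 # fill fraction of the progress bar, 0..1
    lines: List[Tuple[str, bool]]   # (text, highlighted) in segment-id order


def build_view_state(position_ms: int, duration_ms: int,
                     segments: Dict[int, str],
                     target_id: Optional[int]) -> ViewState:
    """Derive what the playback interface should show from one update."""
    if duration_ms <= 0:
        progress = 0.0
    else:
        # Clamp so that a late or early report cannot overflow the bar.
        progress = min(max(position_ms / duration_ms, 0.0), 1.0)
    lines = [(text, seg_id == target_id)
             for seg_id, text in sorted(segments.items())]
    return ViewState(progress, lines)


state = build_view_state(1500, 6000, {0: "Hello", 1: "World"}, 1)
```

Because the UI layer receives only an identifier, it can look the text up in its own copy of the segments without the text itself crossing the bridging layer on every update.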

8. An audio playback control device, characterized in that a terminal device runs a target application, the target application including a cross-platform UI layer, a native playback layer, and a communication bridging layer; the cross-platform UI layer is built on a cross-platform development framework; the native playback layer is built using the native code of the terminal device; the audio playback control device includes: a first sending module, used to receive an audio playback request for a target audio through the cross-platform UI layer, and send the audio resources of the target audio to the native playback layer through the communication bridging layer, wherein the audio resources include the network playback address and the audio text of the target audio; a second sending module, used to obtain the audio data corresponding to the network playback address through the native playback layer, play the audio data, and send the playback progress data of the audio data to the communication bridging layer; a third sending module, used to determine the first target text corresponding to the playback progress data from the audio text through the communication bridging layer, and send the playback progress data and the identification information of the first target text to the cross-platform UI layer; and a first display module, used to control the display of the playback interface of the target audio through the cross-platform UI layer based on the playback progress data and the identification information of the first target text.

9. An electronic device, characterized in that the electronic device includes a processor and a memory, the memory storing machine-executable instructions that can be executed by the processor, and the processor executing the machine-executable instructions to implement the audio playback control method according to any one of claims 1-7.

10. A storage medium, characterized in that the storage medium stores machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement the audio playback control method according to any one of claims 1-7.