Audio and video data synchronization method and device based on DDS

By synchronizing audio and video data in vehicles using a DDS server, the problem of audio and video asynchrony is solved, and data synchronization playback and storage optimization are achieved.

CN116761025B · Active · Publication Date: 2026-06-12 · BEIJING HUAYUTONGSOFT TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: BEIJING HUAYUTONGSOFT TECH CO LTD
Filing Date: 2023-04-26
Publication Date: 2026-06-12

Application Information

Patent Timeline

26 Apr 2023: Application
12 Jun 2026: Publication

CN116761025B

IPC: H04N21/242; H04N21/43; H04N21/234; H04N21/44; H04N21/8547

CPC: H04N21/242; H04N21/4302; H04N21/23418; H04N21/44008; H04N21/8547

AI Tagging

Application Domain

Selective content distribution

AI Technical Summary

⚠Technical Problem

The audio and video inputs are located on different nodes in the vehicle, causing audio and video to be out of sync and affecting the user experience.

⚗Method used

Audio and video data are synchronized through a DDS server. The sending end decapsulates and de-protocols the stream, associates the avpacket data structures of the audio and video frames with message topics, and publishes them to the DDS server for the receiving end to parse. A minimum transmission tolerance time between audio and video frames is set to ensure synchronization.

🎯Benefits of technology

It enables synchronized playback of audio and video data across different nodes, enhancing the user experience while reducing data transmission volume and storage space.

✦ Generated by Eureka AI based on patent content.

Abstract

The application discloses a DDS-based audio and video data synchronization method and device, and relates to the technical field of vehicle communication. The method includes: decapsulating and de-protocoling an audio and video stream to obtain a first avpacket data structure corresponding to an audio frame and a second avpacket data structure corresponding to a video frame; associating the contents of the first avpacket data structure and the second avpacket data structure with the created message topics, respectively; and publishing the associated audio data message and video data message to a DDS server for a receiving end to subscribe to. The receiving end parses the audio data message and the video data message respectively to obtain the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame; if the time difference between the timestamp of the audio frame and the timestamp of the video frame is less than a preset tolerance time, the receiving end determines that the audio frame and the video frame are synchronized and plays the audio and video based on the first avpacket data structure and the second avpacket data structure. The application can ensure that audio and video data are synchronized.

Description

Technical Field

[0001] This invention relates to the field of vehicle communication technology, and more specifically, to a method and apparatus for synchronizing audio and video data based on DDS.

Background Technology

[0002] In recent years, with the development of intelligent driving technology, people's demands for in-vehicle entertainment systems have also increased. While fulfilling their basic transportation functions, vehicles also need to meet people's entertainment needs. Therefore, an increasing amount of audio and video data needs to be transmitted via in-vehicle networks.

[0003] Currently, when playing audio and video in a vehicle, the audio and video devices typically need to play simultaneously. However, because the video and audio devices in a vehicle are located on different nodes, and most vehicles have multiple audio nodes, communication delays between these nodes can lead to audio and video desynchronization, severely impacting the user experience.

Summary of the Invention

[0004] This invention provides a method and apparatus for synchronizing audio and video data based on DDS, which mainly ensures the synchronization of audio and video data at different nodes of a vehicle.

[0005] According to a first aspect of the present invention, a DDS-based audio and video data synchronization method is provided, applied at a transmitting end, comprising:

[0006] Obtain the audio and video streams to be sent;

[0007] The audio and video streams are decapsulated and de-protocoled to obtain the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame;

[0008] The contents of the first avpacket data structure and the second avpacket data structure are associated with the created message topic to obtain audio data messages and video data messages;

[0009] The audio data message and the video data message are published to the DDS server for the receiving end to subscribe to. The receiving end parses the audio data message and the video data message respectively to obtain a first avpacket data structure corresponding to the audio frame and a second avpacket data structure corresponding to the video frame. If the time difference between the timestamp of the audio frame and the timestamp of the video frame is less than a preset tolerance time, it is determined that the audio frame and the video frame are synchronized. The first avpacket data structure and the second avpacket data structure are then parsed to obtain parsed video data and parsed audio data. Based on the parsed video data and the parsed audio data, the audio and video are played.

[0010] According to a second aspect of the present invention, another DDS-based audio and video data synchronization method is provided, applied at a receiving end, comprising:

[0011] In response to the audio data message and the video data message published by the sending end to the DDS server, the audio data message and the video data message are parsed respectively to obtain a first avpacket data structure corresponding to the audio frame and a second avpacket data structure corresponding to the video frame; wherein the sending end decapsulates and de-protocols the audio and video streams to obtain the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame, and associates the contents of the first and second avpacket data structures with the created message topic to obtain the audio data message and the video data message;

[0012] Determine the time difference between the timestamp of the audio frame and the timestamp of the video frame;

[0013] If the time difference is less than the preset tolerance time, then the audio frame is determined to be synchronized with the video frame, and the first avpacket data structure and the second avpacket data structure are parsed respectively to obtain the parsed video data and the parsed audio data.

[0014] Based on the parsed video data and the parsed audio data, the audio and video are played.

[0015] According to a third aspect of the present invention, a DDS-based audio and video data synchronization device is provided, comprising:

[0016] The acquisition unit is used to acquire the audio and video streams to be sent.

[0017] The decapsulation and de-protocol unit is used to decapsulate and de-protocol the audio and video streams to obtain the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame.

[0018] The association unit is used to associate the contents of the first avpacket data structure and the second avpacket data structure with the created message topic to obtain audio data messages and video data messages respectively;

[0019] The publishing unit is used to publish the audio data message and the video data message to the DDS server for the receiving end to subscribe to the message; wherein, the receiving end is used to parse the audio data message and the video data message respectively to obtain a first avpacket data structure corresponding to the audio frame and a second avpacket data structure corresponding to the video frame. If the time difference between the timestamp of the audio frame and the timestamp of the video frame is less than a preset tolerance time, it is determined that the audio frame and the video frame are synchronized, and the first avpacket data structure and the second avpacket data structure are parsed respectively to obtain parsed video data and parsed audio data. Based on the parsed video data and the parsed audio data, the audio and video are played.

[0020] According to a fourth aspect of the present invention, another DDS-based audio and video data synchronization device is provided, comprising:

[0021] The first parsing unit is used to respond to audio data messages and video data messages sent by the sending end in the DDS server, and to parse the audio data messages and video data messages respectively to obtain a first avpacket data structure corresponding to the audio frame and a second avpacket data structure corresponding to the video frame; wherein, the sending end is used to decapsulate and deprotocol the audio and video streams to obtain the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame, and to associate the contents of the first avpacket data structure and the second avpacket data structure with the created message topic to obtain the audio data message and the video data message;

[0022] A determining unit is configured to determine the time difference between the timestamp of the audio frame and the timestamp of the video frame;

[0023] The second parsing unit is used to determine that the audio frame is synchronized with the video frame if the time difference is less than a preset tolerance time, and to parse the first avpacket data structure and the second avpacket data structure respectively to obtain the parsed video data and the parsed audio data.

[0024] The playback unit is used to play audio and video based on the parsed video data and the parsed audio data.

[0025] According to a fifth aspect of the present invention, a computer-readable storage medium is provided having a computer program stored thereon, which, when executed by a processor, implements the steps of a DDS-based audio and video data synchronization method.

[0026] According to a sixth aspect of the present invention, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of a DDS-based audio and video data synchronization method.

[0027] According to a seventh aspect of the present invention, another computer-readable storage medium is provided, having a computer program stored thereon that, when executed by a processor, implements the steps of a DDS-based audio and video data synchronization method.

[0028] According to an eighth aspect of the present invention, another electronic device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of a DDS-based audio and video data synchronization method.

[0029] The innovative aspects of this invention include:

[0030] 1. One of the innovative aspects of this invention is that it utilizes a DDS server to set a certain transmission strategy to ensure the synchronization of audio and video data between different nodes.

[0031] 2. One of the innovative aspects of this invention is that only the contents of the avpacket data structure are transmitted at the sending end, and the avpacket data structure is parsed at the receiving end, thereby reducing the amount of data transmitted and saving storage space.

[0032] 3. Redefining the content of the avpacket data structure based on the data type defined in the DDS server to ensure smooth data transmission is one of the innovations of this embodiment.

[0033] This invention provides a DDS-based audio and video data synchronization method and apparatus. Compared with existing technologies, the invention acquires the audio and video streams to be sent, and decapsulates and de-protocols them to obtain a first avpacket data structure corresponding to the audio frame and a second avpacket data structure corresponding to the video frame. The contents of the first and second avpacket data structures are then associated with a created message topic to obtain an audio data message and a video data message. Finally, the audio data message and the video data message are published to the DDS server for the receiving end to subscribe to. The receiving end parses the audio data message and the video data message to obtain the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame. If the time difference between the timestamp of the audio frame and the timestamp of the video frame is less than a preset tolerance time, the audio frame and the video frame are determined to be synchronized. The first and second avpacket data structures are then parsed to obtain parsed video data and parsed audio data, and the audio and video are played based on the parsed data. Therefore, by setting the minimum transmission tolerance time between audio and video data through the DDS server, the present invention can ensure synchronous playback of audio and video data at different nodes, thereby enhancing the user experience. In addition, because the present invention transmits only the contents of the avpacket data structure at the sending end and parses the avpacket data structure at the receiving end, it reduces the amount of data transmitted and saves storage space.

[0034] The above description is only an overview of the technical solution of this application. In order to better understand the technical means of this application and to implement it in accordance with the contents of the specification, and to make the above and other objects, features and advantages of this application more obvious and understandable, the following are specific embodiments of this application.

Attached Figure Description

[0035] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0036] Figure 1 is a flowchart of a DDS-based audio and video data synchronization method provided by an embodiment of the present invention.

[0037] Figure 2 is a flowchart of another DDS-based audio and video data synchronization method provided by an embodiment of the present invention.

[0038] Figure 3 is a schematic diagram of a DDS-based audio and video data synchronization device provided by an embodiment of the present invention.

[0039] Figure 4 is a schematic diagram of another DDS-based audio and video data synchronization device provided by an embodiment of the present invention.

[0040] Figure 5 is a schematic diagram of the physical structure of an electronic device provided by an embodiment of the present invention.

[0041] Figure 6 is a schematic diagram of the physical structure of another electronic device provided by an embodiment of the present invention.

Detailed Implementation

[0042] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0043] It should be noted that the terms "comprising" and "having," and any variations thereof, in the embodiments and drawings of this invention are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the steps or units listed, but may optionally include steps or units not listed, or may optionally include other steps or units inherent to these processes, methods, products, or devices.

[0044] Because the video and audio ends of a vehicle are located on different nodes, and most vehicles have multiple audio ends, communication delays between nodes can lead to audio and video desynchronization, severely impacting the user experience.

[0045] To overcome the above-mentioned shortcomings, embodiments of the present invention provide a DDS-based audio and video data synchronization method applied at the transmitting end. As shown in Figure 1, the method includes:

[0046] Step 101: Obtain the audio and video streams to be sent.

[0047] The audio and video streams to be sent come from an audio and video file, which contains both an audio stream and a video stream.

[0048] The embodiments of the present invention are mainly applicable to scenarios where audio and video data from different nodes within a vehicle are played synchronously.

[0049] In this embodiment of the invention, the sending end and the receiving end communicate by message subscription. That is, the sending end publishes audio and video data messages to the DDS server (DDS communication middleware), and the receiving end receives the messages published by the sending end by subscribing to them. The DDS specification adopts the publish/subscribe data distribution model, emphasizes data-centric communication, and provides rich QoS (Quality of Service) policies. Compared with RTP (Real-time Transport Protocol), the DDS communication middleware offers excellent transmission performance for audio and video files as well as other files. Furthermore, because the DDS communication middleware uses a publish/subscribe model, it can support access by numerous users and nodes. In addition, the DDS communication middleware has good cross-platform compatibility and is simple and convenient to deploy. Moreover, compared with the traditional Ethernet transmission protocol SOME/IP, the DDS communication middleware provides highly flexible, real-time, reliable, and secure message transmission services, which can not only improve system interconnection capabilities but also meet various information exchange and sharing needs within the system.
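The publish/subscribe pattern described above can be illustrated with a small in-process sketch. The ToyBus class and the topic names audio_frames and video_frames are hypothetical stand-ins introduced only for illustration; they are not a DDS API, and a real DDS middleware would add discovery, QoS policies, and network transport.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class ToyBus:
    """In-process stand-in for the DDS middleware (hypothetical, not a DDS API)."""

    def __init__(self) -> None:
        # Maps a topic name to the callbacks of its subscribers.
        self._subscribers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        """Register a receiving-end callback for one topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, sample: Any) -> int:
        """Deliver a sample to every subscriber of the topic; return how many got it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(sample)
        return len(callbacks)


bus = ToyBus()
received_audio: List[Any] = []
received_video: List[Any] = []

# The receiving end subscribes to each topic separately.
bus.subscribe("audio_frames", received_audio.append)
bus.subscribe("video_frames", received_video.append)

# The sending end publishes without knowing who is listening.
bus.publish("audio_frames", {"pts": 0})
bus.publish("video_frames", {"pts": 0})
```

The point of the sketch is the decoupling: the publisher addresses a topic, not a node, which is why several audio nodes can subscribe to the same audio topic.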

[0050] In this embodiment of the invention, to enable synchronized playback of audio and video data from different nodes, the audio and video streams must first be de-protocoled and decapsulated at the sending end using the ffmpeg tool. Specifically, all components of the ffmpeg tool are first registered, and then the audio and video file is opened to obtain the audio and video streams, so that the ffmpeg tool can be used to de-protocol and decapsulate them.

[0051] Step 102: Decapsulate and deprotocol the audio and video streams to obtain the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame.

[0052] To reduce data transmission volume and save storage space, this embodiment of the invention only decapsulates and de-protocols the audio and video streams at the sending end, and performs the actual parsing at the receiving end. Specifically, the ffmpeg tool is used to de-protocol and decapsulate the audio and video streams, obtaining a first avpacket data structure corresponding to each of the multiple audio frames and a second avpacket data structure corresponding to each of the multiple video frames. The first and second avpacket data structures are not parsed at the sending end.

[0053] Step 103: Associate the contents of the first avpacket data structure and the second avpacket data structure with the created message topic to obtain audio data messages and video data messages.

[0054] In this embodiment of the invention, after obtaining the first avpacket data structure corresponding to multiple audio frames and the second avpacket data structure corresponding to multiple video frames, the sending end reads one frame of audio data (first avpacket data structure) and one frame of video data (second avpacket data structure), and creates a corresponding message topic. Then, it associates the contents of the first avpacket data structure and the second avpacket data structure with the created corresponding message topic to obtain the audio data message and video data message to be sent.

[0055] Step 104: Publish the audio data message and the video data message to the DDS server for the receiving end to subscribe to the message.

[0056] The receiving end is configured to parse the audio data message and the video data message respectively to obtain a first avpacket data structure corresponding to the audio frame and a second avpacket data structure corresponding to the video frame. If the time difference between the timestamp of the audio frame and the timestamp of the video frame is less than a preset tolerance time, it is determined that the audio frame and the video frame are synchronized. The first avpacket data structure and the second avpacket data structure are then parsed to obtain parsed video data and parsed audio data. Based on the parsed video data and the parsed audio data, the audio and video are played.

[0057] In this embodiment of the invention, after generating the audio and video data messages, the sending end creates a datawriter corresponding to the audio data message and a datawriter corresponding to the video data message. Based on the created datawriters, it sends the audio and video data messages to the DDS server. Receiving ends that subscribe to the message topic can then receive these audio and video data messages. The sending end then reads the next frame of audio data (first avpacket data structure) and the next frame of video data (second avpacket data structure) and sends the corresponding messages, repeating until all audio and video frames have been read, after which it releases resources.
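The send loop described in this step (read one audio frame and one video frame, publish each on its own topic, repeat until both sources are exhausted) might be sketched as follows. The function send_all, the publish callback, and the topic names are hypothetical; in a real system the callback would be backed by the audio and video datawriters.

```python
from itertools import zip_longest
from typing import Callable, Dict, Iterable, List, Tuple

Frame = Dict[str, int]
Publish = Callable[[str, Frame], None]


def send_all(
    audio_frames: Iterable[Frame],
    video_frames: Iterable[Frame],
    publish: Publish,
    audio_topic: str = "audio_frames",   # hypothetical topic name
    video_topic: str = "video_frames",   # hypothetical topic name
) -> int:
    """Publish one audio frame and one video frame per iteration until both run out."""
    sent = 0
    # zip_longest keeps going when one stream has more frames than the other.
    for audio, video in zip_longest(audio_frames, video_frames):
        if audio is not None:
            publish(audio_topic, audio)
            sent += 1
        if video is not None:
            publish(video_topic, video)
            sent += 1
    return sent


log: List[Tuple[str, int]] = []
total = send_all(
    audio_frames=[{"pts": 0}, {"pts": 1024}],
    video_frames=[{"pts": 0}],
    publish=lambda topic, frame: log.append((topic, frame["pts"])),
)
```

Here the audio source has one more frame than the video source, so the last iteration publishes audio only.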

[0058] In one optional implementation of this disclosure, the step of decapsulating and de-protocoling the audio and video streams to obtain a first avpacket data structure corresponding to the audio frame and a second avpacket data structure corresponding to the video frame includes: decapsulating the audio and video streams based on their stream types to obtain an audio stream and a video stream; de-protocoling the audio stream based on the audio protocol to obtain a first avpacket data structure corresponding to the audio frame; and de-protocoling the video stream based on the video protocol to obtain a second avpacket data structure corresponding to the video frame.

[0059] Specifically, after the audio and video streams are acquired, they are first decapsulated, that is, decomposed into an audio stream and a video stream. The audio stream and the video stream are then de-protocoled separately. The audio stream is de-protocoled according to the audio protocol to obtain the first avpacket data structure corresponding to each of the multiple audio frames. Similarly, the video stream is de-protocoled according to the video protocol to obtain the second avpacket data structure corresponding to each of the multiple video frames.
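The split into an audio stream and a video stream can be pictured with a pure-Python sketch. It assumes, as in FFmpeg demuxing, that every packet carries a stream_index identifying the stream it belongs to; the dictionary-based packets, the stream_types map, and the function name are hypothetical simplifications, not FFmpeg calls.

```python
from typing import Dict, Iterable, List, Tuple

Packet = Dict[str, int]


def split_by_stream_type(
    packets: Iterable[Packet],
    stream_types: Dict[int, str],
) -> Tuple[List[Packet], List[Packet]]:
    """Route each still-encoded packet to the audio or video list by its stream type."""
    audio: List[Packet] = []
    video: List[Packet] = []
    for packet in packets:
        kind = stream_types.get(packet["stream_index"])
        if kind == "audio":
            audio.append(packet)
        elif kind == "video":
            video.append(packet)
        # Packets of any other stream type (for example subtitles) are ignored here.
    return audio, video


# Hypothetical container layout: stream 0 is video, stream 1 is audio.
stream_types = {0: "video", 1: "audio", 2: "subtitle"}
interleaved = [
    {"stream_index": 0, "pts": 0},
    {"stream_index": 1, "pts": 0},
    {"stream_index": 1, "pts": 1024},
    {"stream_index": 2, "pts": 0},
]
audio_packets, video_packets = split_by_stream_type(interleaved, stream_types)
```

The packets stay encoded throughout; only their routing changes, which matches the statement that the avpacket data structures are not parsed at the sending end.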

[0060] Compared to directly parsing data at the sending end, the embodiments of the present invention can greatly reduce the amount of data transmitted and the data transmission time, and also save storage space.

[0061] In one optional implementation of this disclosure, the step of associating the contents of the first avpacket data structure and the second avpacket data structure with the created message topic to obtain audio data messages and video data messages includes: redefining the contents of the first avpacket data structure and the second avpacket data structure based on the data types defined in the DDS server to obtain audio frame data and video frame data; and associating the audio frame data and the video frame data with the created message topic to generate audio data messages and video data messages.

[0062] Since the DDS server can only transmit basic data types, such as string, int, and char, it is necessary to redefine the contents of the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame according to the data types defined in the DDS server, so as to obtain audio frame data and video frame data that meet the data type definition requirements. The contents of the redefined avpacket data structure are shown in the table below:

[0063]

[0064]

[0065] The redefined audio and video frame data are then associated with the message topic to obtain audio data messages and video data messages.
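As an illustration of redefining a packet with basic data types only, the sketch below flattens a packet into integers plus a byte sequence and restores it again. The field list (pts, dts, stream_index, flags, duration, pos, data) is an assumption taken from the public FFmpeg AVPacket struct, not from the table referenced above, and the struct-based wire layout is a hypothetical stand-in for a DDS type definition.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketSample:
    """Pointer-free view of a packet (field choice is an assumption, see lead-in)."""
    pts: int           # presentation timestamp, in time_base units
    dts: int           # decoding timestamp, in time_base units
    stream_index: int
    flags: int
    duration: int
    pos: int           # byte position in the source file, -1 if unknown
    data: bytes        # still-encoded payload


# Little-endian header: pts, dts, stream_index, flags, duration, pos, payload size.
_HEADER = struct.Struct("<qqiiqqI")


def encode(sample: PacketSample) -> bytes:
    """Serialize the sample as a fixed header followed by the payload bytes."""
    header = _HEADER.pack(
        sample.pts, sample.dts, sample.stream_index, sample.flags,
        sample.duration, sample.pos, len(sample.data),
    )
    return header + sample.data


def decode(buffer: bytes) -> PacketSample:
    """Rebuild the sample on the receiving end; reject truncated payloads."""
    pts, dts, stream_index, flags, duration, pos, size = _HEADER.unpack_from(buffer, 0)
    data = bytes(buffer[_HEADER.size:_HEADER.size + size])
    if len(data) != size:
        raise ValueError("truncated payload")
    return PacketSample(pts, dts, stream_index, flags, duration, pos, data)


sample = PacketSample(pts=3600, dts=3600, stream_index=0, flags=1,
                      duration=3600, pos=-1, data=b"\x00\x00\x01\x65")
wire = encode(sample)
restored = decode(wire)
```

Carrying the payload length explicitly replaces the pointer-plus-size pair of the original structure, which is the kind of change needed when pointers cannot be transmitted.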

[0066] Furthermore, the receiving end needs decoding context information, i.e., the AVCodecContext data structure, to decode the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame. The sending end therefore needs to send the decoding context information (AVCodecContext data structure) to the receiving end before sending the contents of the avpacket data structure, so that the receiving end can decode based on it. Specifically, since the AVCodecContext data structure contains many pointer-type members that the DDS server cannot transmit, its contents must also be redefined according to the data types defined in the DDS server; the redefined decoding context information is then associated with the created message topic and sent to the DDS server. A receiving end subscribing to the message topic can receive this decoding context information and use it to decode the avpacket data structure. The contents of the redefined AVCodecContext data structure are shown in the table below:

[0067]

[0068]

[0069] When a custom data structure contains the above data content, it can be restored into an avpacket data structure and an AVCodecContext data structure at the receiving end so that it can be recognized by the ffmpeg tool.

[0070] It should be noted that the AVCodecContext data structure corresponding to the audio frame and the AVCodecContext data structure corresponding to the video frame only need to be sent once, unlike the avpacket data structure which needs to be sent repeatedly.

[0071] In one optional embodiment of this disclosure, the step of publishing the audio data message and the video data message to the DDS server for message subscription by the receiving end includes: determining an audio transmission interval based on the timestamp of the audio frame and the timestamp of the previous audio frame corresponding to the audio frame, and determining a video transmission interval based on the timestamp of the video frame and the timestamp of the previous video frame corresponding to the video frame; adjusting the video transmission interval based on the audio transmission interval to obtain an adjusted video transmission interval; and publishing the audio data message to the DDS server for message subscription by the receiving end based on the audio transmission interval, and publishing the video data message to the DDS server for message subscription by the receiving end based on the adjusted video transmission interval.

[0072] Specifically, when sending data messages, the transmission interval between two adjacent audio frames (i.e., the audio transmission interval) can be determined based on the timestamp in the first avpacket data structure. Similarly, the transmission interval between two adjacent video frames (i.e., the video transmission interval) can be determined based on the timestamp in the second avpacket data structure. Then, based on the determined audio and video transmission intervals, audio and video data messages are sent. Furthermore, to ensure audio and video data synchronization, the video transmission interval can be adjusted at the sending end based on the audio transmission interval.
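A minimal sketch of the interval calculation follows. It assumes timestamps are expressed in time_base units, as in FFmpeg, so that an interval in seconds is the timestamp difference multiplied by the time base. The text does not state how the video interval is adjusted, so the rule in adjusted_video_interval (lengthen the wait when the video timeline is ahead of audio, shorten it when behind, never below zero) is purely an assumption for illustration.

```python
from fractions import Fraction


def interval_seconds(prev_ts: int, ts: int, time_base: Fraction) -> Fraction:
    """Transmission interval between two adjacent frames of one stream, in seconds."""
    return (ts - prev_ts) * time_base


def adjusted_video_interval(
    video_interval: Fraction,
    audio_clock: Fraction,
    video_clock: Fraction,
) -> Fraction:
    """Assumed rule: absorb the drift between the video and audio timelines.

    audio_clock and video_clock are the media times (seconds) already sent.
    """
    drift = video_clock - audio_clock  # positive when video is ahead of audio
    return max(Fraction(0), video_interval + drift)


# Example values (assumptions): 1024-sample audio frames at 44100 Hz,
# 25 fps video with a 1/90000 time base.
audio_interval = interval_seconds(0, 1024, Fraction(1, 44100))
video_interval = interval_seconds(0, 3600, Fraction(1, 90000))

# Video timeline is 10 ms ahead of audio, so the next video frame waits longer.
slowed = adjusted_video_interval(video_interval, Fraction(0), Fraction(1, 100))
# Video timeline is 100 ms behind audio, so the next video frame goes out at once.
rushed = adjusted_video_interval(video_interval, Fraction(1, 10), Fraction(0))
```

Exact fractions are used instead of floats so that repeated interval sums do not accumulate rounding error.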

[0073] This invention provides a DDS-based audio and video data synchronization method. By setting the minimum transmission tolerance time between audio and video data through a DDS server, it can ensure synchronized playback of audio and video data at different nodes, enhancing the user experience. Furthermore, since this invention only transmits the content of the avpacket data structure at the sending end and parses the avpacket data structure at the receiving end, it can reduce the amount of data transmitted and save storage space.

[0074] Furthermore, this embodiment of the invention also provides another DDS-based audio and video data synchronization method, applied at the receiving end. As shown in Figure 2, the method includes:

[0075] Step 201: In response to the audio data message and video data message sent by the sending end in the DDS server, parse the audio data message and the video data message respectively to obtain the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame.

[0076] The sending end is used to decapsulate and deprotocol the audio and video streams to obtain a first avpacket data structure corresponding to the audio frame and a second avpacket data structure corresponding to the video frame. The contents of the first avpacket data structure and the second avpacket data structure are then associated with the created message topic to obtain an audio data message and a video data message.

[0077] Specifically, when the sending end sends audio data messages and video data messages to the DDS server, the receiving end can subscribe to the audio data messages and video data messages. By parsing the above-mentioned topic messages, it can obtain the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame.

[0078] Step 202: Determine the time difference between the timestamp of the audio frame and the timestamp of the video frame.

[0079] In this embodiment of the invention, the timestamp of the audio frame can be determined based on the presentation timestamp (pts) and decoding timestamp (dts) in the first avpacket data structure. Similarly, the timestamp of the video frame can be determined based on the presentation timestamp (pts) and decoding timestamp (dts) in the second avpacket data structure. Subtracting the timestamp of the audio frame from the timestamp of the video frame then yields the transmission time difference between the audio frame and the video frame.
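The timestamp comparison might be sketched as below. The text does not say how the two timestamps are combined, so the rule used here (take pts, fall back to dts when pts is missing) is an assumption; AV_NOPTS_VALUE mirrors the FFmpeg constant for a missing timestamp, and the time bases and values are example assumptions.

```python
from fractions import Fraction

# FFmpeg marks a missing timestamp with AV_NOPTS_VALUE, which equals INT64_MIN.
AV_NOPTS_VALUE = -(2 ** 63)


def frame_time_ms(pts: int, dts: int, time_base: Fraction) -> Fraction:
    """Frame timestamp in milliseconds: pts if present, otherwise dts (assumed rule)."""
    ticks = pts if pts != AV_NOPTS_VALUE else dts
    if ticks == AV_NOPTS_VALUE:
        raise ValueError("packet carries neither pts nor dts")
    return ticks * time_base * 1000


def time_difference_ms(video_ms: Fraction, audio_ms: Fraction) -> Fraction:
    """Video timestamp minus audio timestamp; positive means video is ahead."""
    return video_ms - audio_ms


# Audio packet at 1000 ms; video packet without pts whose dts maps to 1010 ms.
audio_ms = frame_time_ms(pts=44100, dts=44100, time_base=Fraction(1, 44100))
video_ms = frame_time_ms(pts=AV_NOPTS_VALUE, dts=90900, time_base=Fraction(1, 90000))
difference = time_difference_ms(video_ms, audio_ms)
```

Converting both streams to milliseconds first matters because audio and video packets usually use different time bases, so raw timestamp values are not directly comparable.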

[0080] Step 203: If the time difference is less than the preset tolerance time, then the audio frame is determined to be synchronized with the video frame, and the first avpacket data structure and the second avpacket data structure are parsed respectively to obtain the parsed video data and the parsed audio data.

[0081] The preset tolerance time can be set according to actual business needs.

[0082] In this embodiment of the invention, if the time difference is greater than or equal to the tolerance time, and the timestamp of the video frame is less than the timestamp of the audio frame, then the video frame is discarded; if the time difference is greater than or equal to the tolerance time, and the timestamp of the video frame is greater than the timestamp of the audio frame, then video playback is delayed.

[0083] For example, if the preset tolerance time is 15ms and the transmission time difference between the audio frame and the video frame is 10ms, then since this time difference is less than 15ms, it can be determined that the audio frame and the video frame are synchronized. At this time, the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame can be parsed respectively to obtain the parsed audio data acc and the parsed video data yuv.

[0084] As another example, if the preset tolerance time is 15ms and the transmission time difference between the audio frame and the video frame is 17ms, then since this time difference is greater than 15ms, it can be determined that the audio frame and the video frame are out of sync. In this case, if the timestamp of the video frame is less than the timestamp of the audio frame, the video is behind, and video frames can be discarded until the time difference between the timestamp of the video frame and the timestamp of the audio frame is less than the preset tolerance time. If the timestamp of the video frame is greater than the timestamp of the audio frame, the video is ahead, and the playback of the video frame needs to be delayed.
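The decision rule of Step 203 and the two examples above can be sketched as a small function. The names `SyncAction` and `decide` are illustrative only, and comparing the absolute value of the difference with the tolerance time is an assumption of this sketch.

```python
from enum import Enum


class SyncAction(Enum):
    PLAY = "play"                # frames are considered synchronized
    DROP_VIDEO = "drop_video"    # video is behind audio: discard the video frame
    DELAY_VIDEO = "delay_video"  # video is ahead of audio: delay video playback


def decide(audio_ts_ms: float, video_ts_ms: float,
           tolerance_ms: float = 15.0) -> SyncAction:
    """Classify one audio/video frame pair against a positive tolerance time."""
    diff = abs(video_ts_ms - audio_ts_ms)  # assumption: magnitude is compared
    if diff < tolerance_ms:
        return SyncAction.PLAY
    if video_ts_ms < audio_ts_ms:
        return SyncAction.DROP_VIDEO
    return SyncAction.DELAY_VIDEO


print(decide(1000, 1010))  # SyncAction.PLAY (10 ms < 15 ms)
print(decide(1000, 983))   # SyncAction.DROP_VIDEO (17 ms, video behind)
print(decide(1000, 1017))  # SyncAction.DELAY_VIDEO (17 ms, video ahead)
```

A difference exactly equal to the tolerance time is treated as out of sync, matching the "greater than or equal to" wording of paragraph [0082].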

[0085] Furthermore, since the receiving end needs to parse the avpacket data structure based on the decoding context information (AVCodecContext data structure), before receiving the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame, the receiving end will first receive a topic message containing the AVCodecContext data structure, so as to parse the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame based on the content in the AVCodecContext data structure.
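The ordering requirement described above, in which the decoding context arrives before any avpacket, can be illustrated with a minimal receiver. The `Receiver` class and the dictionary-based messages are hypothetical, and rejecting packets that arrive before the context is an assumption of this sketch, since the text does not say how that case is handled.

```python
class Receiver:
    """Hypothetical receiver that needs decoding context before parsing packets."""

    def __init__(self):
        self.codec_context = None  # filled by the AVCodecContext topic message
        self.parsed = []           # (codec name, pts) pairs that were accepted

    def on_context(self, context: dict) -> None:
        """Store the decoding context carried by the context topic message."""
        self.codec_context = context

    def on_packet(self, packet: dict) -> bool:
        """Parse a packet only if the context is known; report whether it was used."""
        if self.codec_context is None:
            return False  # assumption: packets without a context are skipped
        self.parsed.append((self.codec_context["codec"], packet["pts"]))
        return True


receiver = Receiver()
print(receiver.on_packet({"pts": 0}))   # False, no context yet
receiver.on_context({"codec": "h264"})
print(receiver.on_packet({"pts": 40}))  # True
```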

[0086] Step 204: Play the audio and video based on the parsed video data and the parsed audio data.

[0087] In this embodiment of the invention, after the receiving end obtains the parsed audio data (acc) and the parsed video data (yuv), it can play the audio and video on different audio and video nodes.

[0088] In one optional embodiment of this disclosure, the step of parsing the audio data message and the video data message respectively to obtain a first avpacket data structure corresponding to the audio frame and a second avpacket data structure corresponding to the video frame includes: parsing the audio data message and the video data message respectively to obtain audio frame data and video frame data that meet the data type definition requirements; and performing structure restoration on the audio frame data and the video frame data respectively to obtain the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame.

[0089] Specifically, the receiving end parses the subscribed topic message to obtain the redefined audio frame data and video frame data. Since the redefined audio frame data and video frame data are basic data types, they cannot be recognized by the ffmpeg tool. Therefore, it is necessary to restore their structure, that is, to restore the audio frame data to the first avpacket data structure and the video frame data to the second avpacket data structure.
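The round trip between an avpacket and the basic data types carried in a topic message can be sketched as follows. The `AvPacketLike` class, its field selection, and the dictionary layout of the frame data are assumptions made for illustration; the actual data type definition used with the DDS server is not given in the text.

```python
from dataclasses import dataclass


@dataclass
class AvPacketLike:
    """Hypothetical stand-in for the avpacket fields that are transmitted."""
    pts: int
    dts: int
    stream_index: int
    data: bytes


def to_frame_data(pkt: AvPacketLike) -> dict:
    """Sender side: redefine the packet content using only basic data types."""
    return {
        "pts": pkt.pts,
        "dts": pkt.dts,
        "stream_index": pkt.stream_index,
        "size": len(pkt.data),
        "payload": list(pkt.data),  # sequence of octets
    }


def restore_packet(frame_data: dict) -> AvPacketLike:
    """Receiver side: restore the structure from the basic data types."""
    payload = bytes(frame_data["payload"][: frame_data["size"]])
    return AvPacketLike(
        pts=frame_data["pts"],
        dts=frame_data["dts"],
        stream_index=frame_data["stream_index"],
        data=payload,
    )


original = AvPacketLike(pts=3600, dts=3600, stream_index=1, data=b"\x00\x01\x02")
restored = restore_packet(to_frame_data(original))
print(restored == original)  # True
```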

[0090] In one optional implementation of this disclosure, the DDS server can use a keep_last QoS strategy. That is, when network fluctuations occur and multiple audio or video frames arrive at the receiving end simultaneously, the receiving end only plays the latest frame and discards the others to ensure audio and video data synchronization. Based on this, the method further includes: when multiple audio data messages or video data messages are subscribed to, determining the latest audio or video frame based on the timestamps of the audio frames contained in the multiple audio data messages or the timestamps of the video frames contained in the multiple video data messages; and playing the audio or video based on the avpacket data structure corresponding to the latest audio frame or the avpacket data structure corresponding to the latest video frame.
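Selecting the latest frame when several messages arrive together can be sketched as below. The message layout, a dictionary with a `pts` key, is hypothetical, and the function only illustrates the selection by timestamp, not the QoS configuration of the DDS server itself.

```python
from typing import Optional


def latest_frame(messages: list) -> Optional[dict]:
    """Return the message with the largest timestamp, or None if there are none."""
    if not messages:
        return None
    return max(messages, key=lambda m: m["pts"])


# Three video messages delivered in one burst after a network fluctuation.
burst = [{"pts": 40}, {"pts": 120}, {"pts": 80}]
print(latest_frame(burst))  # {'pts': 120}
```

Only the returned frame would be played; the other frames in the burst are discarded.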

[0091] Furthermore, embodiments of the present invention can also invoke the listener inside the DDS server to listen for data. Once there is a data message, the receiving end can obtain the data and play it, thereby ensuring that there is no playback delay.
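The listener approach can be illustrated with an in-process stand-in. `FakeReader` is not a DDS API; it only mimics the pattern in which a registered callback runs as soon as a data message is delivered, so that the receiving end does not have to poll for data.

```python
from typing import Callable, List


class FakeReader:
    """Stand-in for a data reader with a listener; not a real DDS class."""

    def __init__(self):
        self._listeners: List[Callable[[dict], None]] = []

    def set_listener(self, callback: Callable[[dict], None]) -> None:
        """Register a callback to run for every delivered sample."""
        self._listeners.append(callback)

    def deliver(self, sample: dict) -> None:
        """Simulate the arrival of a data message: notify listeners at once."""
        for callback in self._listeners:
            callback(sample)


played = []  # frames handed to playback, in arrival order
reader = FakeReader()
reader.set_listener(played.append)
reader.deliver({"pts": 40})
print(played)  # [{'pts': 40}]
```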

[0092] Another audio and video data synchronization method based on DDS provided in this embodiment of the invention can ensure synchronized playback of audio and video data at different nodes by setting the minimum transmission tolerance time between audio and video data through the DDS server, thereby enhancing the user experience. In addition, since this invention only transmits the content of the avpacket data structure at the sending end and parses the avpacket data structure at the receiving end, it can reduce the amount of data transmitted and save storage space.

[0093] Furthermore, as a specific implementation of the method shown in Figure 1, embodiments of the present invention provide a DDS-based audio and video data synchronization device. As shown in Figure 3, the device includes: an acquisition unit 31, a decapsulation and protocol unit 32, an association unit 33, and a publishing unit 34.

[0094] The acquisition unit 31 can be used to acquire the audio and video bitstream to be sent.

[0095] The decapsulation and protocol unit 32 can be used to decapsulate and deprotocol the audio and video streams to obtain the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame.

[0096] The association unit 33 can be used to associate the contents of the first avpacket data structure and the second avpacket data structure with the created message topic to obtain audio data messages and video data messages.

[0097] The publishing unit 34 can be used to publish the audio data message and the video data message to the DDS server for the receiving end to subscribe to the message; wherein, the receiving end is used to parse the audio data message and the video data message respectively to obtain a first avpacket data structure corresponding to the audio frame and a second avpacket data structure corresponding to the video frame. If the time difference between the timestamp of the audio frame and the timestamp of the video frame is less than a preset tolerance time, it is determined that the audio frame and the video frame are synchronized, and the first avpacket data structure and the second avpacket data structure are parsed respectively to obtain parsed video data and parsed audio data. Based on the parsed video data and the parsed audio data, the audio and video are played.

[0098] In specific application scenarios, the decapsulation and protocol unit 32 can be specifically used to decapsulate the audio and video streams based on the stream type corresponding to the audio and video streams to obtain audio streams and video streams; to de-protocol the audio stream based on the audio protocol to obtain the first avpacket data structure corresponding to the audio frame; and to de-protocol the video stream based on the video protocol to obtain the second avpacket data structure corresponding to the video frame.

[0099] In specific application scenarios, the association unit 33 can be specifically used to redefine the content in the first avpacket data structure and the second avpacket data structure based on the data types defined in the DDS server, respectively, to obtain audio frame data and video frame data; and associate the audio frame data and the video frame data with the created message topics respectively to generate audio data messages and video data messages.

[0100] In a specific application scenario, the publishing unit 34 can be specifically used to determine the audio transmission interval based on the timestamp of the audio frame and the timestamp of the previous audio frame corresponding to the audio frame, and to determine the video transmission interval based on the timestamp of the video frame and the timestamp of the previous video frame corresponding to the video frame; adjust the video transmission interval based on the audio transmission interval to obtain the adjusted video transmission interval; publish the audio data message to the DDS server for the receiving end to subscribe to the message based on the audio transmission interval, and publish the video data message to the DDS server for the receiving end to subscribe to the message based on the adjusted video transmission interval.
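The interval calculation performed by the publishing unit can be sketched as follows. Computing each interval as the difference between consecutive timestamps follows the text; the adjustment rule, which snaps the video interval to the nearest whole multiple of the audio interval, is purely a hypothetical choice for illustration, because the text does not state how the adjustment is made.

```python
def transmission_interval(ts_ms: float, prev_ts_ms: float) -> float:
    """Interval between a frame and the previous frame of the same stream."""
    return ts_ms - prev_ts_ms


def adjust_video_interval(audio_interval_ms: float,
                          video_interval_ms: float) -> float:
    """Hypothetical rule: align video publication with audio publication ticks."""
    if audio_interval_ms <= 0:
        raise ValueError("audio interval must be positive")
    multiples = max(1, round(video_interval_ms / audio_interval_ms))
    return multiples * audio_interval_ms


audio_interval = transmission_interval(1020.0, 1000.0)  # 20.0 ms
video_interval = transmission_interval(1033.0, 1000.0)  # 33.0 ms
adjusted = adjust_video_interval(audio_interval, video_interval)
print(adjusted)  # 40.0
```

Under this assumed rule, audio messages would be published every 20 ms and video messages every 40 ms, so each video publication coincides with an audio publication.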

[0101] It should be noted that, for other descriptions of the functional modules involved in the DDS-based audio and video data synchronization device provided in this embodiment of the invention, reference can be made to the corresponding description of the method shown in Figure 1, which will not be repeated here.

[0102] Based on the method shown in Figure 1, this embodiment of the invention correspondingly also provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the following steps: acquiring the audio and video streams to be sent; decapsulating and deprotocoling the audio and video streams to obtain a first avpacket data structure corresponding to the audio frame and a second avpacket data structure corresponding to the video frame; associating the contents of the first avpacket data structure and the second avpacket data structure with the created message topic, respectively, to obtain an audio data message and a video data message; and publishing the audio data message and the video data message to the DDS server for the receiving end to subscribe to; wherein the receiving end is used to parse the audio data message and the video data message respectively to obtain a first avpacket data structure corresponding to the audio frame and a second avpacket data structure corresponding to the video frame. If the time difference between the timestamp of the audio frame and the timestamp of the video frame is less than a preset tolerance time, it is determined that the audio frame and the video frame are synchronized, and the first avpacket data structure and the second avpacket data structure are parsed respectively to obtain parsed video data and parsed audio data. Based on the parsed video data and the parsed audio data, the audio and video are played.

[0103] Based on the method shown in Figure 1 and the device embodiment shown in Figure 3, this embodiment of the invention also provides a physical structural diagram of an electronic device. As shown in Figure 5, the electronic device includes: a processor 51, a memory 52, and a computer program stored in the memory 52 and executable on the processor 51. Both the memory 52 and the processor 51 are mounted on a bus 53. When the processor 51 executes the program, it performs the following steps: acquiring the audio and video streams to be sent; decapsulating and deprotocoling the audio and video streams to obtain a first avpacket data structure corresponding to the audio frame and a second avpacket data structure corresponding to the video frame; associating the contents of the first avpacket data structure and the second avpacket data structure with a created message topic to obtain an audio data message and a video data message; and publishing the audio data message and the video data message to the DDS server for the receiving end to subscribe to. The receiving end parses the audio data message and the video data message respectively to obtain a first avpacket data structure corresponding to the audio frame and a second avpacket data structure corresponding to the video frame. If the time difference between the timestamp of the audio frame and the timestamp of the video frame is less than a preset tolerance time, it is determined that the audio frame and the video frame are synchronized. The first avpacket data structure and the second avpacket data structure are then parsed to obtain parsed video data and parsed audio data. Based on the parsed video data and the parsed audio data, the audio and video are played.

[0104] Furthermore, as a specific implementation of the method shown in Figure 2, embodiments of the present invention provide a DDS-based audio and video data synchronization device. As shown in Figure 4, the device includes: a first parsing unit 41, a determining unit 42, a second parsing unit 43, and a playback unit 44.

[0105] The first parsing unit 41 can be used to respond to audio data messages and video data messages sent by the sending end in the DDS server, and parse the audio data messages and video data messages respectively to obtain a first avpacket data structure corresponding to the audio frame and a second avpacket data structure corresponding to the video frame; wherein, the sending end is used to decapsulate and deprotocol the audio and video streams to obtain the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame, and associate the contents of the first avpacket data structure and the second avpacket data structure with the created message topic to obtain the audio data message and the video data message.

[0106] The determining unit 42 can be used to determine the time difference between the timestamp of the audio frame and the timestamp of the video frame.

[0107] The second parsing unit 43 can be used to determine that the audio frame is synchronized with the video frame if the time difference is less than a preset tolerance time, and to parse the first avpacket data structure and the second avpacket data structure respectively to obtain the parsed video data and the parsed audio data.

[0108] The playback unit 44 can be used to play audio and video based on the parsed video data and the parsed audio data.

[0109] In a specific application scenario, the first parsing unit 41 can be specifically used to parse the audio data message and the video data message respectively to obtain audio frame data and video frame data that meet the data type definition requirements; and to perform structural restoration on the audio frame data and the video frame data respectively to obtain the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame.

[0110] In specific application scenarios, the device may also include a discard unit.

[0111] The discard unit can be used to discard a video frame if the time difference is greater than or equal to the tolerance time and the timestamp of the video frame is less than the timestamp of the audio frame.

[0112] The playback unit 44 can also be used to delay video playback if the time difference is greater than or equal to the tolerance time and the timestamp of the video frame is greater than the timestamp of the audio frame.

[0113] In specific application scenarios, the determining unit 42 can also be used to determine the latest audio frame or video frame based on the timestamps of the audio frames contained in the multiple audio data messages or the timestamps of the video frames contained in the multiple video data messages when multiple audio data messages or video data messages are subscribed to.

[0114] The playback unit 44 can also be used to play audio or video according to the avpacket data structure corresponding to the latest audio frame or the avpacket data structure corresponding to the latest video frame.

[0115] It should be noted that, for other descriptions of the functional modules involved in the DDS-based audio and video data synchronization device provided in this embodiment of the invention, reference can be made to the corresponding description of the method shown in Figure 2, which will not be repeated here.

[0116] Based on the method shown in Figure 2, this embodiment of the invention correspondingly also provides a computer-readable storage medium storing a computer program that, when executed by a processor, performs the following steps: in response to audio data messages and video data messages sent by the sending end in the DDS server, parsing the audio data messages and video data messages respectively to obtain a first avpacket data structure corresponding to the audio frame and a second avpacket data structure corresponding to the video frame; wherein the sending end is used to decapsulate and deprotocol the audio and video streams to obtain the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame, and to associate the contents of the first avpacket data structure and the second avpacket data structure with a created message topic to obtain the audio data message and the video data message; determining the time difference between the timestamp of the audio frame and the timestamp of the video frame; if the time difference is less than a preset tolerance time, determining that the audio frame and the video frame are synchronized, and parsing the first avpacket data structure and the second avpacket data structure respectively to obtain parsed video data and parsed audio data; and playing the audio and video based on the parsed video data and the parsed audio data.

[0117] Based on the method shown in Figure 2 and the device embodiment shown in Figure 4, this embodiment of the invention also provides a physical structural diagram of an electronic device. As shown in Figure 6, the electronic device includes: a processor 61, a memory 62, and a computer program stored in the memory 62 and executable on the processor 61. Both the memory 62 and the processor 61 are mounted on a bus 63. When the processor 61 executes the program, it performs the following steps: in response to audio data messages and video data messages sent by the sending end in the DDS server, parsing the audio data messages and the video data messages respectively to obtain a first avpacket data structure corresponding to the audio frame and a second avpacket data structure corresponding to the video frame; wherein the sending end is used to decapsulate and deprotocol the audio and video streams to obtain the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame, and to associate the contents of the first and second avpacket data structures with the created message topic to obtain audio data messages and video data messages; determining the time difference between the timestamp of the audio frame and the timestamp of the video frame; if the time difference is less than a preset tolerance time, determining that the audio frame is synchronized with the video frame, and parsing the first and second avpacket data structures respectively to obtain parsed video data and parsed audio data; and playing the audio and video based on the parsed video data and the parsed audio data.

[0118] This invention sets the minimum transmission tolerance time between audio and video data through a DDS server, which can ensure synchronous playback of audio and video data at different nodes and enhance user experience. In addition, since this invention only transmits the content of the avpacket data structure at the sending end and parses the avpacket data structure at the receiving end, it can reduce the amount of data transmitted and save storage space.

[0119] Those skilled in the art will understand that the accompanying drawings are merely schematic diagrams of one embodiment, and the modules or processes shown in the drawings are not necessarily essential for implementing the present invention.

[0120] Those skilled in the art will understand that the modules in the apparatus of the embodiments can be distributed in the apparatus of the embodiments as described in the embodiments, or they can be located in one or more devices different from this embodiment with corresponding changes. The modules of the above embodiments can be combined into one module, or they can be further divided into multiple sub-modules.

[0121] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for synchronizing audio and video data based on DDS, characterized in that the method is applied to the sending end and includes: obtaining the audio and video streams to be sent; decapsulating and deprotocoling the audio and video streams to obtain a first avpacket data structure corresponding to the audio frame and a second avpacket data structure corresponding to the video frame; associating the contents of the first avpacket data structure and the second avpacket data structure with the created message topic to obtain an audio data message and a video data message; and publishing the audio data message and the video data message to the DDS server for the receiving end to subscribe to; wherein the receiving end parses the audio data message and the video data message respectively to obtain a first avpacket data structure corresponding to the audio frame and a second avpacket data structure corresponding to the video frame; if the time difference between the timestamp of the audio frame and the timestamp of the video frame is less than a preset tolerance time, it is determined that the audio frame and the video frame are synchronized, and the first avpacket data structure and the second avpacket data structure are parsed to obtain parsed video data and parsed audio data; and the audio and video are played based on the parsed video data and the parsed audio data.

2. The method according to claim 1, characterized in that the step of decapsulating and deprotocoling the audio and video streams to obtain the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame includes: decapsulating the audio and video streams based on the stream type corresponding to the audio and video streams to obtain an audio stream and a video stream; deprotocoling the audio stream based on the audio protocol to obtain the first avpacket data structure corresponding to the audio frame, and deprotocoling the video stream based on the video protocol to obtain the second avpacket data structure corresponding to the video frame; and/or the step of associating the contents of the first avpacket data structure and the second avpacket data structure with the created message topic to obtain audio data messages and video data messages includes: redefining the contents of the first avpacket data structure and the second avpacket data structure based on the data types defined in the DDS server to obtain audio frame data and video frame data; and associating the audio frame data and the video frame data with the created message topic respectively to generate audio data messages and video data messages; and/or the step of publishing the audio data message and the video data message to the DDS server for the receiving end to subscribe to includes: determining the audio transmission interval based on the timestamp of the audio frame and the timestamp of the previous audio frame corresponding to the audio frame, and determining the video transmission interval based on the timestamp of the video frame and the timestamp of the previous video frame corresponding to the video frame; adjusting the video transmission interval based on the audio transmission interval to obtain the adjusted video transmission interval; and publishing the audio data message to the DDS server based on the audio transmission interval for the receiving end to subscribe to, and publishing the video data message to the DDS server based on the adjusted video transmission interval for the receiving end to subscribe to.

3. A method for synchronizing audio and video data based on DDS, characterized in that the method is applied to the receiving end and includes: in response to audio data messages and video data messages sent by the sending end in the DDS server, parsing the audio data messages and video data messages respectively to obtain a first avpacket data structure corresponding to the audio frame and a second avpacket data structure corresponding to the video frame; wherein the sending end decapsulates and deprotocols the audio and video streams to obtain the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame, and associates the contents of the first and second avpacket data structures with the created message topic to obtain the audio data message and the video data message; determining the time difference between the timestamp of the audio frame and the timestamp of the video frame; if the time difference is less than the preset tolerance time, determining that the audio frame is synchronized with the video frame, and parsing the first avpacket data structure and the second avpacket data structure respectively to obtain the parsed video data and the parsed audio data; and playing the audio and video based on the parsed video data and the parsed audio data.

4. The method according to claim 3, characterized in that the step of parsing the audio data message and the video data message respectively to obtain the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame includes: parsing the audio data message and the video data message respectively to obtain audio frame data and video frame data that meet the data type definition requirements; and performing structure restoration on the audio frame data and the video frame data respectively to obtain the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame; and/or the method further includes: if the time difference is greater than or equal to the tolerance time and the timestamp of the video frame is less than the timestamp of the audio frame, discarding the video frame; and if the time difference is greater than or equal to the tolerance time and the timestamp of the video frame is greater than the timestamp of the audio frame, delaying video playback; and/or the method further includes: when multiple audio data messages or video data messages are subscribed to, determining the latest audio frame or video frame based on the timestamps of the audio frames contained in the multiple audio data messages or the timestamps of the video frames contained in the multiple video data messages; and playing audio or video based on the avpacket data structure corresponding to the latest audio frame or the avpacket data structure corresponding to the latest video frame.

5. A DDS-based audio and video data synchronization device, characterized in that the device includes: an acquisition unit, used to acquire the audio and video streams to be sent; a decapsulation and protocol unit, used to decapsulate and deprotocol the audio and video streams to obtain the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame; an association unit, used to associate the contents of the first avpacket data structure and the second avpacket data structure with the created message topic to obtain audio data messages and video data messages respectively; and a publishing unit, used to publish the audio data message and the video data message to the DDS server for the receiving end to subscribe to; wherein the receiving end is used to parse the audio data message and the video data message respectively to obtain a first avpacket data structure corresponding to the audio frame and a second avpacket data structure corresponding to the video frame; if the time difference between the timestamp of the audio frame and the timestamp of the video frame is less than a preset tolerance time, it is determined that the audio frame and the video frame are synchronized, and the first avpacket data structure and the second avpacket data structure are parsed respectively to obtain parsed video data and parsed audio data; and the audio and video are played based on the parsed video data and the parsed audio data.

6. A DDS-based audio and video data synchronization device, characterized in that the device includes: a first parsing unit, used to respond to audio data messages and video data messages sent by the sending end in the DDS server, and to parse the audio data messages and video data messages respectively to obtain a first avpacket data structure corresponding to the audio frame and a second avpacket data structure corresponding to the video frame; wherein the sending end is used to decapsulate and deprotocol the audio and video streams to obtain the first avpacket data structure corresponding to the audio frame and the second avpacket data structure corresponding to the video frame, and to associate the contents of the first avpacket data structure and the second avpacket data structure with the created message topic to obtain the audio data message and the video data message; a determining unit, used to determine the time difference between the timestamp of the audio frame and the timestamp of the video frame; a second parsing unit, used to determine that the audio frame is synchronized with the video frame if the time difference is less than a preset tolerance time, and to parse the first avpacket data structure and the second avpacket data structure respectively to obtain the parsed video data and the parsed audio data; and a playback unit, used to play audio and video based on the parsed video data and the parsed audio data.

7. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 2.

8. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the computer program is executed by the processor, it implements the steps of the method according to any one of claims 1 to 2.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 3 to 4.

10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the computer program is executed by the processor, it implements the steps of the method according to any one of claims 3 to 4.