An audio and video synchronous acquisition method, device and equipment
By triggering the synchronous acquisition of audio-visual data and visible light data through a hardware synchronization signal generator, the problem of insufficient audio-visual synchronization accuracy in existing technologies is solved, and high-precision audio-visual data synchronization is achieved, which is applicable to fields such as industrial inspection, security, and autonomous driving.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHANGHAI THERMAL IMAGING TECH CO LTD
- Filing Date
- 2026-03-17
- Publication Date
- 2026-06-19
AI Technical Summary
Existing audio-visual synchronization acquisition technology cannot achieve high-precision time synchronization, resulting in a serious deviation between audio-visual positioning and visual images, making accurate correlation analysis impossible and limiting the application effect and reliability of the technology.
A hardware synchronization signal generator is used to generate a global hard trigger signal. By binding the hardware-level trigger with the timestamp, it is ensured that the audio-visual data and visible light data are acquired at the same time and associated with the same reference timestamp, thus eliminating the uncertainty caused by software scheduling and transmission delay.
It has reduced the synchronization error between acoustic and visible light image data from tens of milliseconds to less than 1 millisecond, or even to the microsecond level, improving the stability and accuracy of synchronization and enabling accurate capture and analysis of high-speed transient acoustic events.
Figure CN122245340A_ABST
Description
Technical Field
[0001] This invention relates to the field of data acquisition technology, and in particular to a method, apparatus and equipment for simultaneous audio and video acquisition.
Background Technology
[0002] Acoustic imaging technology uses microphone arrays to collect sound signals and processes them through algorithms such as beamforming to visualize the source and intensity of sound as images. It is widely used in fields such as gas leak detection, abnormal noise location, and fault prediction. Visible light cameras provide intuitive visual scene information.
[0003] In practical applications, such as equipment status monitoring or security inspection, it is often necessary to analyze acoustic and optical images simultaneously and accurately map the location of the sound source to the corresponding location on the visible light screen. This requires that the two types of data be highly synchronized in time.
[0004] Existing synchronization solutions mostly employ software synchronization, which involves timestamping both types of data at the operating system level. However, due to factors such as operating system scheduling, data transmission delays, and differences in sensor exposure and acquisition timing, the error in this software synchronization is typically on the order of tens or even hundreds of milliseconds. For high-speed or transient events (such as mechanical impacts, discharges, and rapid leaks), such errors can lead to significant deviations between acoustic-image localization and visual images, making accurate correlation analysis impossible and severely limiting the application effectiveness and reliability of the technology. Existing technical solutions cannot achieve high-precision time synchronization between raw acoustic-image data and visible light data.
Summary of the Invention
[0005] This invention provides a method, apparatus, and device for synchronous audio-visual acquisition to solve the problem that audio-visual data and visible light data cannot achieve high-precision time synchronization.
[0006] According to one aspect of the present invention, a method for synchronous audio-visual acquisition is provided, applied to a system-on-a-chip (SoC). The SoC includes a first interface, a second interface, and a hardware synchronization signal generator. The first interface is connected to a camera module via a first synchronization signal path, and the second interface is connected to an audio-visual sensor module via a second synchronization signal path. The method includes:
in response to a data acquisition request, generating a global hard trigger signal through the hardware synchronization signal generator;
synchronously sending the global hard trigger signal to the camera module and the audio-visual sensor module through the first synchronization signal path and the second synchronization signal path, respectively, so that the camera module acquires image data and the audio-visual sensor module acquires acoustic data at the same time;
receiving the image data acquired by the camera module through the first interface, and receiving the acoustic data acquired by the audio-visual sensor module through the second interface;
determining, through the hardware synchronization signal generator, the reference timestamp corresponding to the image data and the acoustic data;
constructing an audio-visual data pair based on the reference timestamp, the image data, and the acoustic data.
[0007] According to another aspect of the present invention, a synchronous audio-visual acquisition device is provided, configured in a system-on-a-chip (SoC). The SoC includes a first interface, a second interface, and a hardware synchronization signal generator. The first interface is connected to a camera module via a first synchronization signal path, and the second interface is connected to an audio-visual sensor module via a second synchronization signal path. The device includes:
a hard trigger signal generation module, used to generate a global hard trigger signal through the hardware synchronization signal generator in response to a data acquisition request;
a hard trigger signal sending module, used to synchronously send the global hard trigger signal to the camera module and the audio-visual sensor module through the first synchronization signal path and the second synchronization signal path, respectively, so that the camera module acquires image data and the audio-visual sensor module acquires acoustic data at the same time;
a data acquisition module, used to receive the image data acquired by the camera module through the first interface and the acoustic data acquired by the audio-visual sensor module through the second interface;
a timestamp determination module, used to determine, through the hardware synchronization signal generator, the reference timestamp corresponding to the image data and the acoustic data;
a data pair acquisition module, used to construct an audio-visual data pair based on the reference timestamp, the image data, and the acoustic data.
[0008] According to another aspect of the present invention, an electronic device is provided, the electronic device comprising: At least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, the computer program being executed by the at least one processor to enable the at least one processor to perform the audio-visual synchronization acquisition method according to any embodiment of the present invention.
[0009] According to another aspect of the present invention, a computer-readable storage medium is provided, the computer-readable storage medium storing computer instructions for causing a processor to execute and implement the audio-visual synchronization acquisition method according to any embodiment of the present invention.
[0010] According to another aspect of the present invention, a computer program product is provided, comprising a computer program / instructions that, when executed by a processor, implement the audio-visual synchronization acquisition method as described in any embodiment of the present invention.
[0011] This invention employs a hardware-based global hard trigger mechanism. A global hard trigger signal triggers the synchronous acquisition of visible light image data and acoustic data. By binding hardware-level triggering with timestamps, the visible light image data and the acoustic data are acquired at the same moment and associated with the same reference timestamp, which ensures high-precision audio-visual synchronization. This eliminates, at the physical level, the uncertainties caused by software scheduling and transmission delays, reducing the synchronization error between acoustic data and visible light image data from tens of milliseconds to less than one millisecond, or even down to the microsecond level. The hardware-based synchronization mechanism is unaffected by operating system load, software interrupts, or other factors, and is significantly more stable than software synchronization schemes. This high-precision synchronization facilitates accurate capture and analysis of high-speed transient acoustic events, and can be widely applied in industrial inspection, security, autonomous driving, and scientific research.
[0012] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of the present invention, nor is it intended to limit the scope of the invention. Other features of the invention will become readily apparent from the following description.
Attached Figure Description
[0013] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0014] Figure 1 is a flowchart of a method for synchronous audio-visual acquisition provided in an embodiment of the present invention;
Figure 2 is a flowchart of another method for synchronous audio-visual acquisition provided in an embodiment of the present invention;
Figure 3 is a schematic diagram of the structure of a synchronous audio-visual acquisition device provided in an embodiment of the present invention;
Figure 4 is a schematic diagram of the structure of an electronic device that implements the audio-visual synchronous acquisition method of this invention.
Detailed Implementation
[0015] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.
[0016] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0017] Furthermore, it should be noted that the information collected in the technical solution of this invention is information and data authorized by the user or fully authorized by all parties, and the collection, storage, use, processing, transmission, provision, disclosure and application of related data all comply with the relevant laws, regulations and standards of relevant countries and regions, necessary confidentiality measures have been taken, and public order and good morals are not violated. Corresponding operation entry points are provided for users to choose to authorize or refuse.
[0018] Figure 1 is a flowchart of a synchronous audio-visual acquisition method provided by an embodiment of the present invention. This embodiment is applicable to the synchronous acquisition of image data and acoustic data. The method can be executed by an audio-visual synchronous acquisition device, which can be implemented in hardware and / or software. This device can be configured in an electronic device with corresponding data processing capabilities, such as a system-on-a-chip (SoC). The SoC includes a first interface, a second interface, and a hardware synchronization signal generator. The first interface is connected to a camera module through a first synchronization signal path, and the second interface is connected to an audio-visual sensor module through a second synchronization signal path. As shown in Figure 1, the method includes:
S110. In response to a data acquisition request, generate a global hard trigger signal through the hardware synchronization signal generator.
[0019] The data acquisition request is the instruction that triggers data acquisition. The hardware synchronization signal generator is used for timing and generating a global hard trigger signal. The global hard trigger signal is a pulse signal generated by the hardware synchronization signal generator, which is used to trigger the camera module and the audio-visual sensor module to synchronously perform data acquisition operations.
[0020] Specifically, the data acquisition request can be an external hardware request. For example, an external hardware request is generated when a press of an external button is detected. The external button can be the "Start Acquisition" button on the SoC. In response to the data acquisition request, the hardware synchronization signal generator is controlled to generate a global hard trigger signal, which triggers the camera module and the audio-visual sensor module to perform data acquisition synchronously.
[0021] S120. The global hard trigger signal is sent synchronously to the camera module and the audio-visual sensor module through the first synchronization signal path and the second synchronization signal path respectively, so that the camera module can collect image data and the audio-visual sensor module can collect acoustic data at the same time.
[0022] The first synchronization signal path is the hardware transmission path connecting the first interface of the system-on-chip (SoC) and the camera module. The second synchronization signal path is the hardware transmission path connecting the second interface of the SoC and the audio-visual sensor module. The camera module is a modular component used to acquire visible light video streams and output visible light image data. The audio-visual sensor module is a modular component used to acquire audio signals and output acoustic data. The audio-visual sensor module may include a microphone array. Image data is raw data reflecting the optical information of the target scene. The target scene is the scene for which data is to be acquired. Acoustic data is raw data reflecting the acoustic information of the target scene.
[0023] Specifically, a global hard trigger signal (such as a rising edge pulse) is synchronously sent to the camera module and the audio-visual sensor module through the first synchronization signal path and the second synchronization signal path, respectively, so that the camera module acquires image data and the audio-visual sensor module acquires acoustic data at the same time. This hardware-based global hard trigger mechanism eliminates the uncertainty caused by software scheduling and transmission delays at the physical level, greatly reducing the synchronization error between image data and acoustic data, and laying the hardware foundation for subsequent timestamp binding and synchronization matching.
[0024] S130: Receive image data collected by the camera module through the first interface, and receive acoustic data collected by the audio-visual sensor module through the second interface.
[0025] The first interface is a system-on-chip (SoC) integrated hardware data receiving interface that matches the data output interface of the camera module. This interface receives image data acquired by the camera module under a global hard trigger signal and transmits the image data to the SoC's internal components (such as memory or processing units). The second interface is a system-on-chip (SoC) integrated hardware data receiving interface that matches the data output interface of the audio-visual sensor module. This interface receives acoustic data acquired by the audio-visual sensor module under a global hard trigger signal and transmits the acoustic data to the SoC's internal components (such as memory or processing units).
[0026] Specifically, after the camera module acquires image data, it sends the image data to the first interface of the SoC through its own data output interface, and the SoC receives the image data through the first interface. Similarly, after the audio-visual sensor module acquires acoustic data, it sends the raw acoustic data to the second interface of the SoC through its own data output interface, and the SoC receives the acoustic data through the second interface. This allows the acquired data to be received efficiently and without interference, and works together with the synchronization signal paths to ensure the synchronization of the subsequent data pairs.
[0027] S140. Determine the reference timestamps corresponding to the image data and acoustic data through a hardware synchronization signal generator.
[0028] The reference timestamp is the time information used to indicate the moment the data was collected. The precision of the reference timestamp is on the order of microseconds.
[0029] Specifically, the SoC receives a frame of image data from the camera module and immediately assigns a reference timestamp to that frame via the hardware synchronization signal generator. This reference timestamp has microsecond-level precision. At the same time, the SoC receives a frame of acoustic data (corresponding to the same global hard trigger signal) from the audio-visual sensor module and assigns that frame the exact same reference timestamp.
[0030] S150. Construct an audio-visual data pair based on the reference timestamp, image data, and acoustic data.
[0031] The audio-visual data pair is a structured data set consisting of image data and acoustic data that share the same reference timestamp.
[0032] Specifically, image data and acoustic data with the same reference timestamp are combined to obtain an audio-visual data pair. For example, based on the reference timestamp, image data, and acoustic data, an audio-visual data pair with strict temporal matching (both timestamps are T0) is obtained, and the audio-visual data pair includes both image data and acoustic data. This achieves high-precision time synchronization between image data and acoustic data.
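Steps S110 to S150 can be illustrated with a small software simulation. This is only a sketch under stated assumptions: the class and function names (`SimulatedSyncGenerator`, `SimulatedModule`, `acquire_pair`) are hypothetical, the counter is assumed to tick once per microsecond, and real hardware would deliver the trigger over physical lines rather than a method call.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AudioVisualPair:
    """Image data and acoustic data that share one reference timestamp."""
    timestamp_us: int
    image: bytes
    acoustic: bytes


class SimulatedModule:
    """Stand-in for the camera module or the audio-visual sensor module."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._captures = 0

    def on_trigger(self) -> bytes:
        # A real module would expose its sensor or sample its microphones here.
        self._captures += 1
        return f"{self.name}-{self._captures}".encode()


class SimulatedSyncGenerator:
    """Stand-in for the hardware synchronization signal generator."""

    def __init__(self) -> None:
        self.counter_us = 0  # free-running counter, 1 tick = 1 us (assumed)
        self.latched_us = 0  # counter value captured at the last trigger

    def advance(self, ticks: int) -> None:
        self.counter_us += ticks

    def fire(self, modules: List[SimulatedModule]) -> List[bytes]:
        # S110/S120: latch the counter and trigger every module on one edge.
        self.latched_us = self.counter_us
        return [module.on_trigger() for module in modules]


def acquire_pair(gen: SimulatedSyncGenerator,
                 camera: SimulatedModule,
                 sensor: SimulatedModule) -> AudioVisualPair:
    image, acoustic = gen.fire([camera, sensor])  # S110 to S130
    t0 = gen.latched_us                           # S140: one shared timestamp
    return AudioVisualPair(t0, image, acoustic)   # S150


gen = SimulatedSyncGenerator()
camera = SimulatedModule("image")
sensor = SimulatedModule("acoustic")
gen.advance(50_000)
pair = acquire_pair(gen, camera, sensor)
print(pair)
```

Because both payloads are tagged from the single latched value, the pair cannot carry two different timestamps, which is the property the text calls strict temporal matching.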
[0033] Optionally, the hardware synchronization signal generator is connected via physical lines to the hardware trigger pins of the camera module and the audio-visual sensor module.
[0034] The physical lines are hardware transmission lines made of conductive material and used to transmit the global hard trigger signal. A hardware trigger pin is the physical pin used to receive the global hard trigger signal.
[0035] Specifically, the physical lines are the actual hardware connections linking the hardware synchronization signal generator (on the SoC side) to the camera module and the audio-visual sensor module (on the module side). They are not a virtual software channel, but a tangible physical carrier dedicated to transmitting the global hard trigger signal. The physical lines include a first synchronization signal path and a second synchronization signal path. The first synchronization signal path connects the SoC to the hardware trigger pin of the camera module, and the second synchronization signal path connects the SoC to the hardware trigger pin of the audio-visual sensor module. The two paths run in parallel and independently, and are of equal length (or calibrated through delay compensation). The global hard trigger signal is synchronously transmitted to the hardware trigger pins of the camera module and the audio-visual sensor module via the physical lines. Upon detecting the global hard trigger signal, both modules immediately start data acquisition (image data and acoustic data, respectively).
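Where the two paths cannot be made exactly equal in length, the delay compensation mentioned above amounts to delaying the faster path by the measured difference. The following is a minimal sketch, assuming the per-path propagation delays have already been measured; the function names and the picosecond values are hypothetical and are not taken from the patent.

```python
from typing import Dict


def compensation_delays_ps(path_delays_ps: Dict[str, int]) -> Dict[str, int]:
    """Extra delay to add on each path so all trigger edges arrive together."""
    slowest = max(path_delays_ps.values())
    return {name: slowest - delay for name, delay in path_delays_ps.items()}


def residual_skew_ps(path_delays_ps: Dict[str, int],
                     compensation_ps: Dict[str, int]) -> int:
    """Arrival-time spread across paths after compensation is applied."""
    arrivals = [path_delays_ps[name] + compensation_ps[name]
                for name in path_delays_ps]
    return max(arrivals) - min(arrivals)


# Hypothetical measured delays for the two synchronization signal paths.
measured = {"camera": 1200, "audio_visual_sensor": 1850}
compensation = compensation_delays_ps(measured)
print(compensation, residual_skew_ps(measured, compensation))
```

Integer picoseconds are used here only to keep the arithmetic exact; the unit and resolution of a real calibration would depend on the hardware.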
[0036] Optionally, the global hard trigger signal includes a single pulse or a periodic pulse; wherein, the single pulse is used to instruct the camera module and the audio-visual sensor module to perform a single data acquisition; and the periodic pulse is used to instruct the camera module and the audio-visual sensor module to perform continuous data acquisition.
[0037] Here, a single pulse is a pulse that is output only once. A periodic pulse is a pulse that is output continuously according to a preset period. Single data acquisition means that, triggered by a single pulse, the camera module and the audio-visual sensor module complete one acquisition of image data and acoustic data at the same moment. Continuous data acquisition means that, triggered by periodic pulses, the camera module and the audio-visual sensor module complete a sequence of acquisitions of image data and acoustic data, one at the moment of each pulse.
[0038] Specifically, a single pulse is a one-time hardware trigger signal generated by the hardware synchronization signal generator, with a preset pulse width and active level. A single pulse stops immediately after being generated once, with no repeated output, and is used only to trigger the camera module and the audio-visual sensor module to perform a single data acquisition operation. For example, when the user clicks the "Start Acquisition" button, the hardware synchronization signal generator outputs a single pulse, and the two modules simultaneously acquire one frame of image data and one set of acoustic data, which form one audio-visual data pair before acquisition stops. A periodic pulse is a repetitive hardware trigger signal generated by the hardware synchronization signal generator, with a preset pulse width, active level, and fixed period. A periodic pulse is output continuously at the preset period until the acquisition task is completed, and is used to trigger the camera module and the audio-visual sensor module to perform continuous data acquisition. For example, when the user clicks the "Start Acquisition" button, the hardware synchronization signal generator outputs pulses at a 10 ms period. Each time a pulse is output, the two modules synchronously acquire one set of data, until a "Stop Acquisition" command is received, ultimately forming multiple audio-visual data pairs sorted by timestamp.
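The difference between the two trigger modes can be sketched as two schedules of pulse times. This is an illustrative model only: the generator functions are hypothetical, times are expressed in microseconds, and the "Stop Acquisition" command is modeled simply as the consumer ceasing to pull pulses.

```python
from itertools import count, islice
from typing import Iterator


def single_pulse(start_us: int) -> Iterator[int]:
    """One trigger pulse, then nothing more."""
    yield start_us


def periodic_pulses(start_us: int, period_us: int) -> Iterator[int]:
    """Trigger pulses at a fixed period until the consumer stops pulling."""
    if period_us <= 0:
        raise ValueError("period_us must be positive")
    for n in count():
        yield start_us + n * period_us


# Single acquisition: exactly one trigger time.
print(list(single_pulse(0)))

# Continuous acquisition at the 10 ms period used in the example above;
# taking four pulses stands in for receiving a stop command after four.
print(list(islice(periodic_pulses(0, 10_000), 4)))
```

Each emitted time corresponds to one audio-visual data pair, so the continuous schedule naturally yields pairs ordered by timestamp.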
[0039] This invention employs a hardware-based global hard trigger mechanism. A global hard trigger signal triggers the synchronous acquisition of visible light image data and acoustic data. By binding hardware-level triggering with timestamps, the visible light image data and the acoustic data are acquired at the same moment and associated with the same reference timestamp, which ensures high-precision audio-visual synchronization. This eliminates, at the physical level, the uncertainties caused by software scheduling and transmission delays, reducing the synchronization error between acoustic data and visible light image data from tens of milliseconds to less than one millisecond, or even down to the microsecond level. The hardware-based synchronization mechanism is unaffected by operating system load, software interrupts, or other factors, and is significantly more stable than software synchronization schemes. This high-precision synchronization facilitates accurate capture and analysis of high-speed transient acoustic events, and can be widely applied in industrial inspection, security, autonomous driving, and scientific research.
[0040] Figure 2 is a flowchart of another audio-visual synchronization acquisition method provided by an embodiment of the present invention. Based on the above embodiments, this embodiment refines the step of "determining the reference timestamps corresponding to image data and acoustic data through a hardware synchronization signal generator" and provides an optional implementation. As shown in Figure 2, the method includes:
S210. In response to a data acquisition request, generate a global hard trigger signal through the hardware synchronization signal generator.
[0041] S220: The global hard trigger signal is synchronously sent to the camera module and the audio-visual sensor module through the first synchronization signal path and the second synchronization signal path, respectively, so that the camera module can collect image data and the audio-visual sensor module can collect acoustic data at the same time.
[0042] S230: Receive image data collected by the camera module through the first interface, and receive acoustic data collected by the audio-visual sensor module through the second interface.
[0043] S240. Determine the reference timestamps corresponding to the image data and acoustic data through a hardware synchronization signal generator.
[0044] S250. Construct an audio-visual data pair based on the reference timestamp, image data, and acoustic data.
[0045] Optionally, the hardware synchronization signal generator consists of the system-on-chip's general-purpose input / output ports and clock counter.
[0046] The general purpose input / output (GPIO) ports are general-purpose hardware pins integrated into the system-on-chip (SoC) and have programmable configuration capabilities. The clock counter is an internal hardware timing unit within the SoC. Driven by a reference clock signal, the clock counter automatically accumulates the number of clock pulses according to a preset counting rule (such as incrementing from 0).
[0047] Specifically, the general-purpose input / output ports and the clock counter of the SoC are bound together in hardware to form the hardware synchronization signal generator. This hardware synchronization signal generator generates, transmits, and latches trigger signals; it can precisely control the trigger timing, stably transmit the trigger signal, and record the trigger time, providing core support for audio-visual synchronization.
[0048] Optionally, before responding to a data acquisition request, the method further includes: after the SoC is powered on, performing initialization configuration on the hardware synchronization signal generator and controlling the hardware synchronization signal generator to generate a reference clock signal; and starting the clock counter, wherein the clock counter accumulates counts based on the reference clock signal.
[0049] Power-on refers to connecting the SoC to power so that it transitions from a powered-off state to an operational state. Initialization configuration uses hardware configuration control logic to preset the core operating parameters of the hardware synchronization signal generator. The reference clock signal is the time scale generated by the hardware synchronization signal generator; figuratively, it is the system's "atomic clock," with a fixed rhythm and extremely high precision. Both the counting of the clock counter and the period of the trigger signal are based on it. The reference clock signal provides a unified time reference for the generation timing of global hard trigger signals and for the calculation of the reference timestamp.
[0050] Specifically, after the SoC is powered on, the hardware synchronization signal generator is configured with its operating rules so that it produces a precise time scale (the reference clock signal). The associated clock counter is then started and accumulates counts in step with this time scale. This takes the hardware synchronization signal generator from a "powered on and ready" state to a "precisely operating according to rules" state, providing a stable time scale and clear operating rules for the entire subsequent process of responding to data acquisition requests, generating synchronization trigger signals, latching count values, and generating timestamps. This lays the foundation for ultimately achieving high-precision synchronization of audio-visual data.
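The ordering constraint described here (configure first, start the counter, and only then accept acquisition requests) can be captured in a toy state model. The class and method names below are hypothetical and the model ignores real register-level configuration; it only shows why a request arriving before initialization has no valid time base.

```python
class SyncGeneratorModel:
    """Toy model of the power-on sequence of the synchronization generator."""

    def __init__(self) -> None:
        self.clock_hz = 0
        self.counter_running = False
        self.count = 0

    def power_on_init(self, clock_hz: int) -> None:
        # Initialization configuration: fix the reference clock rate,
        # then start the counter from zero.
        if clock_hz <= 0:
            raise ValueError("clock_hz must be positive")
        self.clock_hz = clock_hz
        self.count = 0
        self.counter_running = True

    def tick(self, pulses: int = 1) -> None:
        # The counter accumulates only after it has been started.
        if self.counter_running:
            self.count += pulses

    def handle_acquisition_request(self) -> int:
        # A request is valid only after initialization; the returned value
        # models the count latched at the trigger instant.
        if not self.counter_running:
            raise RuntimeError("generator not initialized")
        return self.count


model = SyncGeneratorModel()
model.power_on_init(clock_hz=1_000_000)
model.tick(50_000)
print(model.handle_acquisition_request())
```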
[0051] Optionally, a hardware synchronization signal generator is used to determine the reference timestamps corresponding to the image data and acoustic data, including: obtaining the current count value of the clock counter; and determining the reference timestamps corresponding to the image data and acoustic data based on the current count value.
[0052] The current count value is the total number of pulses accumulated by the clock counter at the instant the global hard trigger signal triggers the acquisition.
[0053] Specifically, at the instant the global hard trigger signal triggers data acquisition, the hardware synchronization signal generator automatically latches the current count value of the clock counter and binds this value to the image data and acoustic data acquired at that trigger moment. In other words, the current count value serves as the reference timestamp for the image data and acoustic data acquired at that trigger moment. For example, with a 1 MHz reference clock the counter increments by 1 every 1 μs; if the latched current count value is 50000, the reference timestamp is 50000 μs. Each current count value uniquely corresponds to one set of image data and acoustic data acquired at the same time, ensuring that their reference timestamps are completely consistent and thus achieving synchronization matching.
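The conversion from a latched count to a timestamp is a single division by the clock rate. The sketch below reproduces the 1 MHz example from the text; the helper for counter wraparound is an added assumption (the source does not state a counter width, and 32 bits is used purely for illustration).

```python
def count_to_timestamp_us(count: int, clock_hz: int) -> float:
    """Convert a latched counter value to microseconds since counter start."""
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    return count * 1_000_000 / clock_hz


def elapsed_ticks(earlier: int, later: int, counter_bits: int = 32) -> int:
    """Ticks between two latched values, tolerating one counter wraparound.

    The counter width is an assumption made for this sketch.
    """
    return (later - earlier) % (1 << counter_bits)


# Example from the text: 1 MHz reference clock, latched count of 50000.
print(count_to_timestamp_us(50_000, 1_000_000))

# Interval between two triggers that straddle a 32-bit wraparound.
print(elapsed_ticks(0xFFFF_FFF0, 0x0000_0010))
```

A faster reference clock gives finer timestamp resolution at the cost of earlier wraparound for a given counter width, which is why interval arithmetic is done modulo the counter range.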
[0054] Optionally, after constructing the audio-visual data pair based on the reference timestamp, image data, and acoustic data, the method further includes: processing the acoustic data using an acoustic imaging algorithm to generate an acoustic image; and constructing an acoustic image pair based on the acoustic image and the image data with the same reference timestamp.
[0055] Acoustic imaging algorithms transform raw acoustic data into image-based data with spatial coordinate information and quantized acoustic intensity. An acoustic image is image-based data with spatial coordinate information and quantized acoustic intensity. An acoustic image pair is a structured image set consisting of a frame of image data (i.e., a visible light image) and a frame of acoustic image corresponding to the same reference timestamp.
[0056] Specifically, the acoustic data is processed using an acoustic imaging algorithm to generate an acoustic image that carries the same reference timestamp; an acoustic image pair is then constructed from the image data and the acoustic image that share that reference timestamp. This yields a synchronized, visualized image pair covering both the visual and the acoustic dimension, so that the acoustic image and the visible light image can be displayed together, laying the foundation for acoustic-image fusion and synchronous analysis.
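The patent does not fix a particular acoustic imaging algorithm. As one common example of the beamforming family mentioned in the background, the sketch below applies delay-and-sum beamforming to a uniform linear microphone array and produces a one-dimensional "acoustic image" of power versus direction. The array geometry, sample rate, far-field assumption, whole-sample delays, and all function names are assumptions made for illustration.

```python
import math
from typing import List, Sequence

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature


def steering_delays(num_mics: int, spacing_m: float,
                    angle_deg: float, sample_rate_hz: int) -> List[int]:
    """Arrival delay per microphone, in whole samples, for a far-field source."""
    s = math.sin(math.radians(angle_deg))
    return [round(m * spacing_m * s / SPEED_OF_SOUND * sample_rate_hz)
            for m in range(num_mics)]


def delay_and_sum_power(channels: Sequence[Sequence[float]],
                        delays: Sequence[int]) -> float:
    """Align the channels by the given delays, sum them, and return the power."""
    n_samples = len(channels[0])
    power = 0.0
    for n in range(n_samples):
        acc = 0.0
        for samples, delay in zip(channels, delays):
            idx = n + delay
            if 0 <= idx < n_samples:
                acc += samples[idx]
        power += acc * acc
    return power


def acoustic_map(channels: Sequence[Sequence[float]], spacing_m: float,
                 sample_rate_hz: int,
                 angles_deg: Sequence[float]) -> List[float]:
    """Beamformed power for each candidate direction."""
    return [delay_and_sum_power(
                channels,
                steering_delays(len(channels), spacing_m, a, sample_rate_hz))
            for a in angles_deg]


# Synthetic test signal: an impulse arriving from 30 degrees at 4 microphones.
fs, spacing, num_mics, length = 48_000, 0.05, 4, 128
channels = [[0.0] * length for _ in range(num_mics)]
for m, delay in enumerate(steering_delays(num_mics, spacing, 30.0, fs)):
    channels[m][64 + delay] = 1.0

angles = [-90, -60, -30, 0, 30, 60, 90]
powers = acoustic_map(channels, spacing, fs, angles)
print(angles[powers.index(max(powers))])
```

Pairing the resulting map with the visible light frame that carries the same reference timestamp gives the acoustic image pair described above; a practical system would scan a two-dimensional grid and use fractional delays or frequency-domain steering.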
[0057] Optionally, the system-on-chip may also include a memory, and further include: storing audio-visual data pairs in the memory.
[0058] The memory is used to store program instructions and collected data.
[0059] Specifically, after obtaining the audio-visual data pair, the audio-visual data pair will be completely stored in the memory for subsequent reading, processing or retrieval.
[0060] This invention generates a reference clock signal by initializing a hardware synchronization signal generator upon power-on. A clock counter accumulates counts based on this reference clock, ensuring a unified and stable hardware-level time scale for timestamp generation. A global hard trigger signal is simultaneously sent to the camera and audio-visual sensor through dual independent synchronization paths, triggering both to acquire data at the same moment. Combined with the timestamp generation method of "latching the counter value at the moment of triggering," the acquisition of image data and acoustic data is synchronized, and timestamps are bound from the same source, solving the problem of audio-visual misalignment (such as asynchronous image and sound, or mismatch between sound source location and visual target). The memory stores complete audio-visual data pairs, preserving the original image and acoustic data for subsequent data quality verification and algorithm optimization. While ensuring high data accuracy and reliability, this invention enhances engineering practicality and application value, and can be widely adapted to the audio-visual synchronization acquisition needs in fields such as industrial inspection, security, autonomous driving, and scientific research.
[0061] Figure 3 is a schematic diagram of a synchronous audio-visual acquisition device provided in an embodiment of the present invention. This embodiment is applicable to situations requiring simultaneous acquisition of image data and acoustic data. The device can be implemented in hardware and / or software and can be configured in an electronic device with corresponding data processing capabilities, such as a system-on-a-chip (SoC). The SoC includes a first interface, a second interface, and a hardware synchronization signal generator. The first interface is connected to a camera module via a first synchronization signal path, and the second interface is connected to an audio-visual sensor module via a second synchronization signal path. As shown in Figure 3, the device includes: a hard trigger signal generation module 310, used to generate a global hard trigger signal through the hardware synchronization signal generator in response to a data acquisition request; a hard trigger signal sending module 320, used to synchronously send the global hard trigger signal to the camera module and the audio-visual sensor module through the first synchronization signal path and the second synchronization signal path respectively, so that the camera module acquires image data and the audio-visual sensor module acquires acoustic data at the same time; a data acquisition module 330, used to receive the image data acquired by the camera module through the first interface and the acoustic data acquired by the audio-visual sensor module through the second interface; a timestamp determination module 340, used to determine the reference timestamp corresponding to the image data and the acoustic data through the hardware synchronization signal generator; and a data pair acquisition module 350, used to construct audio-visual data pairs based on the reference timestamp, the image data, and the acoustic data.
[0062] This invention employs a hardware-based global hard trigger mechanism. A global hard trigger signal triggers the synchronous acquisition of visible light image data and acoustic data. By binding the hardware-level trigger to the timestamp, the visible light image data and the acoustic data are acquired at the same moment and associated with the same reference timestamp, which ensures high-precision audio-visual synchronization. This eliminates, at the physical level, the uncertainties caused by software scheduling and transmission delays, reducing the synchronization error between the acoustic data and the visible light image data from tens of milliseconds to less than one millisecond, or even down to the microsecond level. The hardware-based synchronization mechanism is unaffected by operating system load, software interrupts, or other factors, and is significantly more stable than software synchronization schemes. This high-precision synchronization facilitates accurate capture and analysis of high-speed transient acoustic events, and can be widely applied in industrial inspection, security, autonomous driving, and scientific research.
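The cooperation of modules 310 to 350 can be illustrated with a software stand-in. This is a sketch under stated assumptions, not the device itself: `HardwareSyncGenerator`, `AudioVisualPair`, and `acquire_once` are hypothetical names, the camera and sensor are modeled as plain callables, and the two trigger paths are pulsed one after the other in software, whereas in the device they are driven simultaneously by hardware.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class HardwareSyncGenerator:
    """Software stand-in for the GPIO-plus-clock-counter generator (hypothetical)."""
    counter: int = 0   # free-running tick count
    latched: int = 0   # count captured at the most recent trigger

    def tick(self, n: int = 1) -> None:
        """Advance the clock counter by n reference clock ticks."""
        self.counter += n

    def fire(self, paths: List[Callable[[], None]]) -> None:
        """Latch the current count, then pulse every synchronization path."""
        self.latched = self.counter
        for pulse in paths:
            pulse()


@dataclass(frozen=True)
class AudioVisualPair:
    timestamp: int
    image: bytes
    acoustic: bytes


def acquire_once(
    gen: HardwareSyncGenerator,
    camera: Callable[[], bytes],
    sensor: Callable[[], bytes],
) -> AudioVisualPair:
    """Trigger both sources, collect their data, and bind the latched count."""
    captured: Dict[str, bytes] = {}
    # Modules 310/320: generate the trigger and send it down both paths.
    gen.fire([
        lambda: captured.__setitem__("image", camera()),      # first path
        lambda: captured.__setitem__("acoustic", sensor()),   # second path
    ])
    # Modules 330/340/350: receive data, read the timestamp, build the pair.
    return AudioVisualPair(gen.latched, captured["image"], captured["acoustic"])


gen = HardwareSyncGenerator()
gen.tick(50000)
print(acquire_once(gen, lambda: b"img", lambda: b"pcm"))
```

The point of the sketch is the ordering: the count is latched at the trigger instant, before any data is transferred, so later transfer delays cannot change the timestamp that ends up in the pair.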
[0063] Optionally, the hardware synchronization signal generator consists of the system-on-chip's general-purpose input / output ports and clock counter.
[0064] Optionally, the device further includes: an initialization configuration module, used to initialize and configure the hardware synchronization signal generator after the system-on-chip is powered on, control the hardware synchronization signal generator to generate a reference clock signal, and start the clock counter; wherein the clock counter accumulates counts based on the reference clock signal.
[0065] Optionally, the timestamp determination module 340 includes: The current count value determination unit is used to obtain the current count value of the clock counter; The reference timestamp determination unit is used to determine the reference timestamps corresponding to the image data and acoustic data based on the current count value.
[0066] Optionally, the apparatus further includes: an image pair construction module for processing acoustic data using an acoustic imaging algorithm to generate an acoustic image; and constructing an acoustic image pair based on the acoustic image and image data with the same reference timestamp.
[0067] Optionally, the hardware synchronization signal generator is connected via physical lines to the hardware trigger pins of the camera module and the audio-visual sensor module.
[0068] Optionally, the global hard trigger signal includes a single pulse or a periodic pulse; wherein, the single pulse is used to instruct the camera module and the audio-visual sensor module to perform a single data acquisition; and the periodic pulse is used to instruct the camera module and the audio-visual sensor module to perform continuous data acquisition.
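The two trigger modes can be sketched as sequences of counter values at which a pulse is emitted. The function names and the period of 33333 ticks (roughly 30 triggers per second at an assumed 1 MHz reference clock) are illustrative assumptions and do not appear in this disclosure.

```python
from itertools import count, islice
from typing import Iterator


def single_pulse(now: int) -> Iterator[int]:
    """Single-pulse mode: exactly one trigger, at the current counter value."""
    yield now


def periodic_pulses(start: int, period_ticks: int) -> Iterator[int]:
    """Periodic mode: an unbounded stream of triggers, one every period_ticks."""
    if period_ticks <= 0:
        raise ValueError("period_ticks must be positive")
    return count(start, period_ticks)


# One acquisition at tick 50000.
print(list(single_pulse(50000)))

# Continuous acquisition; take only the first three trigger instants here.
print(list(islice(periodic_pulses(0, 33333), 3)))
```

In periodic mode each emitted value is also the reference timestamp of the data pair acquired at that pulse, so consecutive pairs are spaced by exactly one period in counter ticks.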
[0069] Optionally, the system-on-chip also includes a memory, and the device further includes a data storage module for storing audio-visual data pairs in the memory.
[0070] The audio-visual synchronous acquisition device provided in the embodiments of the present invention can execute the audio-visual synchronous acquisition method provided in any embodiment of the present invention, and has the corresponding functional modules and beneficial effects of the method.
[0071] According to embodiments of the present invention, the present invention also provides an electronic device, a readable storage medium, and a computer program product.
[0072] Figure 4 shows a schematic diagram of an electronic device 10 that can be used to implement embodiments of the present invention. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device can also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the invention described and / or claimed herein.
[0073] As shown in Figure 4, the electronic device 10 includes at least one processor 11 and a memory, such as a read-only memory (ROM) 12 or a random access memory (RAM) 13, communicatively connected to the at least one processor 11. The memory stores computer programs executable by the at least one processor. The processor 11 can perform various appropriate actions and processes based on the computer program stored in the ROM 12 or loaded from the storage unit 18 into the RAM 13. The RAM 13 can also store various programs and data required for the operation of the electronic device 10. The processor 11, ROM 12, and RAM 13 are interconnected via a bus 14. An input / output (I / O) interface 15 is also connected to the bus 14.
[0074] Multiple components in electronic device 10 are connected to I / O interface 15, including: input unit 16, such as keyboard, mouse, etc.; output unit 17, such as various types of displays, speakers, etc.; storage unit 18, such as disk, optical disk, etc.; and communication unit 19, such as network card, modem, wireless transceiver, etc. Communication unit 19 allows electronic device 10 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.
[0075] Processor 11 can be a variety of general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. Processor 11 performs the various methods and processes described above, such as the audio-visual synchronization acquisition method.
[0076] In some embodiments, the audio-visual synchronization acquisition method may be implemented as a computer program tangibly contained in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and / or installed on electronic device 10 via ROM 12 and / or communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the audio-visual synchronization acquisition method described above may be performed. Alternatively, in other embodiments, processor 11 may be configured to perform the audio-visual synchronization acquisition method by any other suitable means (e.g., by means of firmware).
[0077] Various embodiments of the systems and techniques described herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and / or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.
[0078] Computer programs used to implement the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing device, such that when executed by the processor, the computer programs cause the functions / operations specified in the flowcharts and / or block diagrams to be performed. The computer programs may be executed entirely on a machine, partly on a machine, as a standalone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
[0079] In the context of this invention, a computer-readable storage medium can be a tangible medium that may contain or store a computer program for use by or in conjunction with an instruction execution system, apparatus, or device. A computer-readable storage medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination thereof. Alternatively, a computer-readable storage medium may be a machine-readable signal medium. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.
[0080] To provide interaction with a user, the systems and techniques described herein can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the electronic device. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).
[0081] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as data servers), or computing systems that include middleware components (e.g., application servers), or computing systems that include frontend components (e.g., user computers with graphical user interfaces or web browsers through which users can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., communication networks). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.
[0082] A computing system can include clients and servers. Clients and servers are generally geographically separated and typically interact via communication networks. The client-server relationship is created by computer programs running on the respective computers and having a client-server relationship with each other. The server can be a cloud server, also known as a cloud computing server or cloud host, which is a host product within the cloud computing service system to address the shortcomings of traditional physical hosts and virtual private servers, such as high management difficulty and weak business scalability.
[0083] It should be understood that the various forms of processes shown above can be used, with steps reordered, added, or deleted. For example, the steps described in this invention can be executed in parallel, sequentially, or in different orders, as long as the desired result of the technical solution of this invention can be achieved, and this is not limited herein.
[0084] The specific embodiments described above do not constitute a limitation on the scope of protection of this invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this invention should be included within the scope of protection of this invention.
Claims
1. A method for synchronous audio-visual acquisition, characterized in that, The method is applied to a system-on-a-chip (SoC), which includes a first interface, a second interface, and a hardware synchronization signal generator. The first interface is connected to a camera module via a first synchronization signal path, and the second interface is connected to an audio-visual sensor module via a second synchronization signal path. The method includes: In response to a data acquisition request, a global hard trigger signal is generated through the hardware synchronization signal generator; The global hard trigger signal is synchronously sent to the camera module and the audio-visual sensor module through the first synchronization signal path and the second synchronization signal path, respectively, so that the camera module acquires image data and the audio-visual sensor module acquires acoustic data at the same time; The image data acquired by the camera module is received through the first interface, and the acoustic data acquired by the audio-visual sensor module is received through the second interface; The reference timestamp corresponding to the image data and the acoustic data is determined through the hardware synchronization signal generator; Based on the reference timestamp, the image data, and the acoustic data, an audio-visual data pair is constructed.
2. The method according to claim 1, characterized in that, The hardware synchronization signal generator consists of a system-on-a-chip general-purpose input / output port and a clock counter.
3. The method according to claim 2, characterized in that, Prior to responding to the data acquisition request, the method further includes: After the system-on-chip is powered on, the hardware synchronization signal generator is initialized and configured, the hardware synchronization signal generator is controlled to generate a reference clock signal, and the clock counter is started; wherein the clock counter accumulates counts based on the reference clock signal.
4. The method according to claim 2, characterized in that, The step of determining the reference timestamps corresponding to the image data and the acoustic data through the hardware synchronization signal generator includes: Obtain the current count value of the clock counter; Based on the current count value, determine the reference timestamps corresponding to the image data and the acoustic data.
5. The method according to claim 1, characterized in that, After constructing the audio-visual data pair based on the reference timestamp, the image data, and the acoustic data, the method further includes: The acoustic data is processed using an acoustic imaging algorithm to generate an acoustic image; Based on the acoustic image and the image data with the same reference timestamp, an acoustic image pair is constructed.
6. The method according to claim 1, characterized in that, The hardware synchronization signal generator is connected to the hardware trigger pins of the camera module and the audio-visual sensor module via physical lines.
7. The method according to claim 1, characterized in that, The global hard trigger signal includes a single pulse or a periodic pulse; wherein, the single pulse is used to instruct the camera module and the audio-visual sensor module to perform a single data acquisition; the periodic pulse is used to instruct the camera module and the audio-visual sensor module to perform continuous data acquisition.
8. The method according to claim 1, characterized in that, The system-on-chip further includes a memory, and the method further includes: The audio-visual data pairs are stored in the memory.
9. A synchronous audio-visual acquisition device, characterized in that, Configured in a system-on-a-chip, the system-on-a-chip includes a first interface, a second interface, and a hardware synchronization signal generator. The first interface is connected to a camera module via a first synchronization signal path, and the second interface is connected to an audio-visual sensor module via a second synchronization signal path. The device includes: The hard trigger signal generation module is used to generate a global hard trigger signal through the hardware synchronization signal generator in response to a data acquisition request; The hard trigger signal sending module is used to synchronously send the global hard trigger signal to the camera module and the audio-visual sensor module through the first synchronization signal path and the second synchronization signal path respectively, so that the camera module acquires image data and the audio-visual sensor module acquires acoustic data at the same time; The data acquisition module is used to receive the image data acquired by the camera module through the first interface and the acoustic data acquired by the audio-visual sensor module through the second interface; The timestamp determination module is used to determine the reference timestamp corresponding to the image data and the acoustic data through the hardware synchronization signal generator; The data pair acquisition module is used to construct an audio-visual data pair based on the reference timestamp, the image data, and the acoustic data.
10. An electronic device, characterized in that, The electronic device includes: At least one processor; and a memory communicatively connected to the at least one processor; The memory stores a computer program that can be executed by the at least one processor, which is then executed by the at least one processor to enable the at least one processor to perform the audio-visual synchronization acquisition method according to any one of claims 1-8.