Data processing method and related apparatus
By performing multi-domain transformation and information fusion of feature extraction networks on electromagnetic signals, the problem of low accuracy in electromagnetic signal environmental perception is solved, achieving higher perception accuracy and robustness.
Patent Information
- Authority / Receiving Office
- WO
- Patent Type
- Applications
- Current Assignee / Owner
- HUAWEI TECH CO LTD
- Filing Date
- 2025-07-03
- Publication Date
- 2026-07-02
AI Technical Summary
In existing technologies, when using electromagnetic signals for environmental perception, the information carried is limited, resulting in low accuracy of the perception results.
By transforming electromagnetic signals into different domains using various signal processing methods, processing multiple electromagnetic signals using the same feature extraction network, and combining machine learning models for information fusion and interaction, the accuracy and robustness of the perception results can be improved.
By using multi-view and multi-modal signal processing, the accuracy and robustness of environmental perception are improved, and the ability to interpret electromagnetic signals is enhanced.
Figure CN2025106775_02072026_PF_FP_ABST
Abstract
Description
A data processing method and related apparatus
[0001] This application claims priority to Chinese Patent Application No. 202411644845.4, filed on November 15, 2024, entitled "A Data Processing Method and Related Apparatus", the entire contents of which are incorporated herein by reference.
Technical Field
[0002] This application relates to the field of artificial intelligence (AI) technology, and more particularly to a data processing method, a computer-readable storage medium, and a computer program product.
Background
[0003] Artificial intelligence (AI) is the theory, methods, technology, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to achieve optimal results. In other words, AI is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, enabling them to possess the functions of perception, reasoning, and decision-making.
[0004] In environmental perception tasks, the external environment can be sensed through electromagnetic signals such as those from optical fibers, WiFi channels, and wireless base station communication channels. For example, the state of scattering objects in the external environment can be sensed by analyzing the scattered signals from optical fibers and the pilot echo signals from WiFi or wireless air interfaces, thereby enabling perception tasks such as classifying and detecting external events.
[0005] In existing technologies, when using electromagnetic signals for environmental perception, the accuracy of the environmental perception results is low because certain types of electromagnetic signals carry limited information or only carry information required for environmental perception in a specific scenario.
Summary of the Invention
[0006] In a first aspect, this application provides a data processing method, the method comprising: acquiring electromagnetic signals collected from an environment; applying multiple different signal processing methods to the electromagnetic signals to obtain multiple processed electromagnetic signals, wherein each signal processing method performs signal transformation in at least one of the frequency domain, spatial domain, and time domain; and obtaining a perception result of the environment based on the multiple processed electromagnetic signals through a machine learning model, wherein the machine learning model includes a feature extraction network (e.g., a backbone network), and the machine learning model uses the same feature extraction network to process each of the multiple processed electromagnetic signals.
[0007] Here, "same" can mean that the feature extraction networks are completely identical or partially identical, for example, with the parts that contribute most to the processing result being identical. Equivalently, the machine learning model processes the multiple processed electromagnetic signals with the same feature extraction network, or the feature extraction networks applied to the different processed electromagnetic signals share parameters; that is, one feature extraction network extracts features from all of the processed electromagnetic signals.
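To make parameter sharing concrete, the following is a minimal Python sketch, assuming a toy one-layer encoder in place of a real backbone. `SharedEncoder` and the view names are hypothetical illustrations, not the applicant's network; the point is only that a single weight set is applied to every processed signal.

```python
import random


class SharedEncoder:
    """Hypothetical feature extractor whose single parameter set is reused for every view."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        # One weight matrix (out_dim x in_dim); there is no per-view copy.
        self.weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
                        for _ in range(out_dim)]

    def __call__(self, x):
        # Linear projection followed by a ReLU non-linearity.
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in self.weights]


# Two processed views of the same raw signal (illustrative values only).
views = {
    "short_time_variance": [0.1, 0.4, 0.2, 0.3],
    "psd": [1.0, 0.5, 0.25, 0.125],
}
encoder = SharedEncoder(in_dim=4, out_dim=3)
features = {name: encoder(view) for name, view in views.items()}
```

Partial sharing, the other reading of "same" above, could be sketched by keeping `weights` shared and giving each view its own small bias or adapter.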
[0008] In this embodiment, various signal processing methods can be used to obtain different perspectives of the original electromagnetic signal. Performing multiple transformations on the original signal is equivalent to characterizing it from different perspectives, thus obtaining a richer characterization of the original channel sensing signal. This facilitates the analysis and modeling of the original signal, thereby improving the accuracy of subsequent environmental perception results. Furthermore, using the same feature extraction network to process multiple electromagnetic signals allows for the fusion and interaction of different electromagnetic signals. This enables the use of information from one electromagnetic signal to enhance the interpretation of another, thereby improving overall accuracy and robustness.
[0009] The above steps may constitute the model inference process or the forward propagation of model training. In the training case, the machine learning model can further be updated based on the perception results and the corresponding labels.
[0010] In one possible implementation, the signal processing method includes one or more of the following: short-time variance calculation, power spectral density calculation, short-time Fourier transform, and signal statistics methods.
[0011] Due to the sparsity and high dimensionality of electromagnetic channel sensing signals, current small-model schemes often struggle to classify certain types of intrusion signals with high accuracy. In this case, the advantage of performing multiple transformations on the original signal is that, through time-frequency-spatial feature enhancement transformations, the characteristics of the current signal can be perceived from multiple perspectives. For example, intensity signals are very good at distinguishing subtle intrusion signals, while phase signals perform better in classification tasks for most signals.
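The transformations listed in [0010] can be sketched in plain Python as follows. This is an illustrative assumption of how each view might be computed: the window length, hop size, Hann window, and chosen statistics are all hypothetical parameters, and a naive DFT is used only to keep the example dependency-free, whereas a practical system would use an FFT library.

```python
import cmath
import math


def short_time_variance(x, win, hop):
    """Variance of each sliding window of length `win`, advanced by `hop` samples."""
    out = []
    for start in range(0, len(x) - win + 1, hop):
        seg = x[start:start + win]
        mu = sum(seg) / win
        out.append(sum((s - mu) ** 2 for s in seg) / win)
    return out


def dft(x):
    """Naive O(N^2) discrete Fourier transform (adequate for a short sketch)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def power_spectral_density(x):
    """One-sided periodogram |X[k]|^2 / N, not scaled by a sample rate."""
    n = len(x)
    spectrum = dft(x)
    return [abs(spectrum[k]) ** 2 / n for k in range(n // 2 + 1)]


def stft_magnitude(x, win, hop):
    """Magnitude STFT using a periodic Hann window; one row per frame."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / win) for i in range(win)]
    frames = []
    for start in range(0, len(x) - win + 1, hop):
        frame = [x[start + i] * hann[i] for i in range(win)]
        frames.append([abs(c) for c in dft(frame)[: win // 2 + 1]])
    return frames


def signal_statistics(x):
    """A few global statistics of the raw signal."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return {"mean": mean, "std": std, "peak_to_peak": max(x) - min(x)}


def multi_view(x, win=8, hop=4):
    """Apply each transformation to the same raw signal to get several views."""
    return {
        "short_time_variance": short_time_variance(x, win, hop),
        "psd": power_spectral_density(x),
        "stft": stft_magnitude(x, win, hop),
        "statistics": signal_statistics(x),
    }


# A 32-sample test tone completing 4 cycles (illustrative input).
signal = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
views = multi_view(signal)
```

Each entry of `views` is one "perspective" on the same raw signal; a spatial-domain transform (for example, across positions along a fiber) would be added to `multi_view` the same way.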
[0012] In one possible implementation, the method further includes: mapping each of the plurality of processed electromagnetic signals to a token representation to obtain a plurality of token representations; the step of obtaining the perception result of the environment based on the plurality of processed electromagnetic signals through a machine learning model includes: obtaining the perception result of the environment based on the plurality of token representations through a machine learning model.
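One possible way to map each processed signal to a token representation is sketched below, assuming average pooling to a fixed token length followed by standardization. The pooling scheme and the `dim` parameter are hypothetical; a learned linear projection is an equally plausible choice.

```python
import math


def flatten(view):
    """Flatten a processed view (list, list of frames, or dict of statistics) to floats."""
    if isinstance(view, dict):
        return [float(v) for v in view.values()]
    flat = []
    for item in view:
        if isinstance(item, (list, tuple)):
            flat.extend(float(v) for v in item)
        else:
            flat.append(float(item))
    return flat


def pool_to_dim(values, dim):
    """Average-pool `values` into `dim` bins so every view yields the same token size."""
    n = len(values)
    token = []
    for i in range(dim):
        lo = i * n // dim
        hi = max((i + 1) * n // dim, lo + 1)  # at least one sample per bin
        seg = values[lo:hi]
        token.append(sum(seg) / len(seg))
    return token


def standardize(token):
    """Zero-mean, unit-variance scaling of one token."""
    mu = sum(token) / len(token)
    sd = math.sqrt(sum((t - mu) ** 2 for t in token) / len(token))
    if sd < 1e-12:  # near-constant token: only center it
        sd = 1.0
    return [(t - mu) / sd for t in token]


def to_tokens(views, dim=8):
    """Map each processed signal to one token representation of length `dim`."""
    return [standardize(pool_to_dim(flatten(v), dim)) for v in views.values()]


views = {
    "short_time_variance": [1, 2, 3, 4, 5, 6, 7, 8],
    "stft": [[1, 2], [3, 4]],
    "statistics": {"mean": 0.0, "std": 1.0, "peak_to_peak": 2.0},
}
tokens = to_tokens(views, dim=4)
```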
[0013] In one possible implementation, the machine learning model further includes a task network; the step of obtaining the perception result of the environment through the machine learning model based on the multiple token representations includes: performing information interaction and fusion between the multiple token representations through the feature extraction network to obtain multiple feature representations; and obtaining the perception result of the environment through the task network based on the multiple feature representations.
[0014] In one possible implementation, the feature extraction network is a transformer model.
[0015] The Transformer model can use an attention mechanism to automatically infer the feature interaction relationships between tokens across multiple modalities (time-frequency-space perspective), thereby achieving better inference accuracy.
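The interaction, fusion, and task-network steps of [0013] to [0015] can be illustrated with a single-head scaled dot-product attention layer. The sketch below is a toy stand-in for a transformer: the random weights, the residual-plus-mean-pooling fusion, and the `TinyFusionModel` name are assumptions for illustration, and nothing here is trained.

```python
import math
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)]
            for _ in range(rows)]


class TinyFusionModel:
    """Hypothetical single-head self-attention encoder followed by a linear task head."""

    def __init__(self, dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.wq = rand_matrix(dim, dim, rng)
        self.wk = rand_matrix(dim, dim, rng)
        self.wv = rand_matrix(dim, dim, rng)
        self.head = rand_matrix(num_classes, dim, rng)  # task network

    def encode(self, tokens):
        """Scaled dot-product attention: every view token attends to all view tokens."""
        q = [matvec(self.wq, t) for t in tokens]
        k = [matvec(self.wk, t) for t in tokens]
        v = [matvec(self.wv, t) for t in tokens]
        scale = math.sqrt(self.dim)
        features, attention = [], []
        for qi, ti in zip(q, tokens):
            weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
            attention.append(weights)
            mixed = [sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(self.dim)]
            features.append([x + m for x, m in zip(ti, mixed)])  # residual connection
        return features, attention

    def predict(self, tokens):
        """Fuse view features by mean pooling, then classify with the task head."""
        features, attention = self.encode(tokens)
        pooled = [sum(f[d] for f in features) / len(features) for d in range(self.dim)]
        return softmax(matvec(self.head, pooled)), attention


model = TinyFusionModel(dim=4, num_classes=2)
view_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
probs, attention = model.predict(view_tokens)
```

Each row of `attention` shows how strongly one view draws on the others, which is the kind of cross-view interaction the attention mechanism is meant to capture.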
[0016] In one possible implementation, the machine learning model is a visual language model, and the method further includes: mapping the plurality of processed electromagnetic signals into image data; and obtaining the perception result of the environment through the visual language model based on the plurality of token representations of the plurality of processed electromagnetic signals and the image data.
[0017] The embodiments of this application can use a pre-trained visual language model, which already provides basic capabilities in visual feature extraction and logical reasoning. The multi-view electromagnetic sensing signals are then visualized, and the model's perception of the visualized signals is combined with the original multi-view electromagnetic sensing signals to jointly perform electromagnetic sensing signal perception, contextual reasoning, and other tasks.
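Mapping processed signals into image data could be as simple as rendering a 2-D view (such as STFT magnitudes) as a grayscale picture. The sketch below assumes log compression, min-max scaling to 8 bits, and the plain-text PGM format purely to keep the example dependency-free; the actual visualization used with the visual language model is not specified here.

```python
import math


def to_grayscale(matrix, log_scale=True):
    """Min-max scale a 2-D array (e.g. STFT magnitudes) to 8-bit gray levels."""
    vals = [[math.log1p(max(v, 0.0)) if log_scale else v for v in row] for row in matrix]
    flat = [v for row in vals for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat image
    return [[round(255 * (v - lo) / span) for v in row] for row in vals]


def to_pgm(pixels):
    """Serialize gray levels as a plain-text (P2) PGM image."""
    height, width = len(pixels), len(pixels[0])
    lines = ["P2", f"{width} {height}", "255"]
    lines += [" ".join(str(p) for p in row) for row in pixels]
    return "\n".join(lines) + "\n"


stft_frames = [[0.0, 1.0], [3.0, 7.0]]  # illustrative magnitudes: 2 frames x 2 bins
pixels = to_grayscale(stft_frames)
image_text = to_pgm(pixels)
```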
[0018] In one possible implementation, obtaining the environmental perception result through the visual language model based on the multiple token representations of the multiple processed electromagnetic signals and the image data includes: obtaining a prompt, the prompt indicating one or more of the following: performing signal analysis on the electromagnetic signals, performing signal analysis on the multiple processed electromagnetic signals, and determining the environmental perception result;
[0019] Based on the prompts, the multiple token representations of the multiple processed electromagnetic signals, and the image data, the perception result of the environment is obtained through the visual language model.
[0020] In one possible implementation, the method further includes: acquiring sensor signals, of a type different from electromagnetic signals, collected from the environment; and obtaining processed sensor signals from the sensor signals through a signal processing method. The step of obtaining the perception result of the environment based on the plurality of processed electromagnetic signals through a machine learning model then includes: obtaining the perception result of the environment based on the plurality of processed electromagnetic signals and the processed sensor signals through the machine learning model, wherein the same feature extraction network (e.g., a backbone network) is used to process both the plurality of processed electromagnetic signals and the processed sensor signals.
[0021] In this embodiment, features can also be extracted from data of different modalities (e.g., fiber optic data and other sensor data such as temperature and humidity) through the same feature extraction network (e.g., a backbone network). This multimodal fusion capability enables the use of information from one modality to enhance the interpretation of another modality, thereby improving the overall accuracy and robustness.
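A minimal sketch of placing sensor data in the same token sequence as the electromagnetic views follows. It assumes that each sensor channel is summarized by its mean and net change, that tokens are zero-padded to a common size, and that a simple additive modality marker distinguishes the two modalities; all of these, including `MODALITY_IDS`, are hypothetical choices.

```python
MODALITY_IDS = {"electromagnetic": 0, "sensor": 1}  # hypothetical modality indices


def sensor_features(readings):
    """Summarize each slow sensor channel by its mean and net change (sorted by name)."""
    feats = []
    for name in sorted(readings):
        series = readings[name]
        feats.extend([sum(series) / len(series), series[-1] - series[0]])
    return feats


def fit_dim(vec, dim):
    """Zero-pad or truncate so every token has the shared encoder's input size."""
    return (list(vec) + [0.0] * dim)[:dim]


def build_sequence(em_tokens, readings, dim):
    """One token sequence holding all electromagnetic views plus a sensor token."""
    seq = [(MODALITY_IDS["electromagnetic"], fit_dim(t, dim)) for t in em_tokens]
    seq.append((MODALITY_IDS["sensor"], fit_dim(sensor_features(readings), dim)))
    return seq


def add_modality_embedding(seq):
    """Mark each token's modality by adding 1.0 to the component indexed by its modality id."""
    out = []
    for mid, vec in seq:
        tagged = list(vec)
        tagged[mid] += 1.0
        out.append(tagged)
    return out


em_tokens = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0]]
readings = {"temperature_c": [20.0, 22.0], "humidity_pct": [50.0, 40.0]}
sequence = build_sequence(em_tokens, readings, dim=4)
encoder_input = add_modality_embedding(sequence)
```

`encoder_input` would then be passed, as one sequence, to a single shared feature extraction network.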
[0022] In one possible implementation, the method further includes: pre-training the machine learning model based on the perception result.
[0023] For data from different modalities (also referred to as domains), a domain-specific multi-view transformation is applied to the channel measurement signals of each domain. After multi-view data of the electromagnetic measurement signals is obtained for each domain, the multi-view token sequences of the cross-domain electromagnetic measurement signals are fed into the model for training. Through this cross-domain training, the model learns the multi-view interaction patterns that are common across domains, which improves its signal perception and processing capabilities in each domain.
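The cross-domain training data flow could look like the sketch below: each domain applies its own view functions, and training batches then mix samples from all domains. The view functions, the `DOMAIN_VIEWS` table, and the shuffling strategy are illustrative assumptions; the model update itself is omitted.

```python
import random


# Hypothetical view functions; real systems would use each domain's own transforms.
def energy_view(x):
    return [sum(v * v for v in x) / len(x)]


def mean_view(x):
    return [sum(x) / len(x)]


def diff_view(x):
    return [b - a for a, b in zip(x, x[1:])]


DOMAIN_VIEWS = {  # domain-specific multi-view transformation sets (assumed)
    "fiber": [energy_view, diff_view],
    "wifi": [mean_view, energy_view],
}


def to_multi_view(domain, signal):
    """Apply the transform set registered for this domain."""
    return [view(signal) for view in DOMAIN_VIEWS[domain]]


def cross_domain_batches(raw, batch_size, seed=0):
    """Yield shuffled batches mixing (domain, multi-view tokens, label) from all domains."""
    pool = [(domain, to_multi_view(domain, signal), label)
            for domain, samples in raw.items() for signal, label in samples]
    random.Random(seed).shuffle(pool)
    for start in range(0, len(pool), batch_size):
        yield pool[start:start + batch_size]


raw = {
    "fiber": [([1.0, 2.0, 3.0], 0), ([0.0, 0.0, 1.0], 1)],
    "wifi": [([2.0, 2.0], 0)],
}
batches = list(cross_domain_batches(raw, batch_size=2))
```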
[0024] In one possible implementation, obtaining the perception result of the environment through the visual language model based on the prompt, the multiple token representations of the multiple processed electromagnetic signals, and the image data includes: obtaining the analysis result of the electromagnetic signals in the time dimension or spatial dimension, and the perception result of the environment, based on the prompt, the multiple token representations of the multiple processed electromagnetic signals, and the image data, through the visual language model.
[0025] Taking intrusion detection as an example, the model can use a language-like medium (natural language, vibration signals, audio, formatted text, etc.) to logically analyze the temporal or spatial context of the signal, and then determine the temporal and spatial behavioral state (intrusion or non-intrusion) of the perceived signal.
[0026] In one possible implementation, the prompt may further include at least one of the following: a prediction of the perception result of the environment, and a description of the environment.
[0027] Before analysis or monitoring, the model can take prompts input by external users; for example, engineers can enter preliminary judgments of the perception result, and conditions (e.g., ambient temperature and humidity, nearby construction activity, weather) can be supplied so that the model produces more accurate results. After analysis or monitoring, the model can report to users whether risks or anomalies are present within a specific time or spatial span of the monitored scenario.
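A prompt carrying the task instructions, the optional prediction, and the optional environment description might be assembled as below. The template text and the `<signal_tokens ...>` and `<image>` placeholders are hypothetical; how tokens and images are actually injected depends on the specific visual language model.

```python
TASKS = {  # hypothetical instruction templates for the three tasks a prompt may indicate
    "analyze_raw": "Analyze the raw electromagnetic signal.",
    "analyze_views": "Analyze each processed view of the signal.",
    "perceive": "Determine the perception result for the monitored environment.",
}


def build_prompt(tasks, view_names, prediction=None, environment=None):
    """Assemble a text prompt; <...> markers stand for where tokens and the image are injected."""
    lines = ["You are given multi-view electromagnetic sensing data."]
    lines += [f"<signal_tokens view={name}>" for name in view_names]
    lines.append("<image>")
    if environment:  # optional description of the environment
        conditions = ", ".join(f"{k}: {v}" for k, v in sorted(environment.items()))
        lines.append(f"Environment conditions: {conditions}.")
    if prediction:  # optional engineer's prediction of the perception result
        lines.append(f"Engineer's preliminary judgment: {prediction}.")
    lines += [TASKS[t] for t in tasks]
    return "\n".join(lines)


prompt = build_prompt(
    ["analyze_views", "perceive"],
    ["psd", "stft"],
    prediction="possible intrusion",
    environment={"temperature": "31 C", "construction": "none"},
)
```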
[0028] In one possible implementation, the perception result of the environment is specifically the perception result of the environment in the spatial dimension or the temporal dimension.
[0029] In a second aspect, this application provides a data processing apparatus, the apparatus comprising:
[0030] an acquisition module, configured to acquire electromagnetic signals collected from an environment;
[0031] a signal processing module, configured to obtain multiple processed electromagnetic signals from the electromagnetic signals through multiple different signal processing methods, wherein each signal processing method performs signal transformation in at least one of the frequency domain, spatial domain, and time domain; and
[0032] a machine learning module, configured to obtain the perception result of the environment based on the plurality of processed electromagnetic signals through a machine learning model, wherein the machine learning model includes a feature extraction network (e.g., a backbone network), and the same feature extraction network is used to process each of the plurality of processed electromagnetic signals.
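The three modules can be wired together as in the following sketch. The class names mirror the module names above, but the callables passed in (the signal source, the transforms, the feature extractor `sum`, and the threshold task head) are hypothetical stand-ins chosen only to keep the example runnable.

```python
from typing import Callable, Dict, List

Signal = List[float]


class AcquisitionModule:
    """Acquires electromagnetic signals from some source (hypothetical callable)."""

    def __init__(self, source: Callable[[], Signal]):
        self.source = source

    def acquire(self) -> Signal:
        return self.source()


class SignalProcessingModule:
    """Applies several different transformations to obtain processed signals."""

    def __init__(self, transforms: Dict[str, Callable[[Signal], Signal]]):
        self.transforms = transforms

    def process(self, signal: Signal) -> Dict[str, Signal]:
        return {name: f(signal) for name, f in self.transforms.items()}


class MachineLearningModule:
    """Runs ONE shared feature extractor over every view, then a task head."""

    def __init__(self, feature_extractor: Callable[[Signal], float],
                 task_head: Callable[[List[float]], str]):
        self.feature_extractor = feature_extractor
        self.task_head = task_head

    def perceive(self, views: Dict[str, Signal]) -> str:
        features = [self.feature_extractor(v) for v in views.values()]
        return self.task_head(features)


class DataProcessingApparatus:
    def __init__(self, acquisition, processing, learning):
        self.acquisition = acquisition
        self.processing = processing
        self.learning = learning

    def run(self) -> str:
        return self.learning.perceive(self.processing.process(self.acquisition.acquire()))


apparatus = DataProcessingApparatus(
    AcquisitionModule(lambda: [0.0, 1.0, 0.0, -1.0]),
    SignalProcessingModule({
        "magnitude": lambda s: [abs(v) for v in s],
        "power": lambda s: [v * v for v in s],
    }),
    MachineLearningModule(
        feature_extractor=sum,
        task_head=lambda feats: "event" if max(feats) > 1.5 else "quiet",
    ),
)
result = apparatus.run()
```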
[0033] In one possible implementation, the multiple different signal processing methods include one or more of the following:
[0034] short-time variance calculation, power spectral density calculation, short-time Fourier transform, and signal statistics.
[0035] In one possible implementation, the machine learning module is configured to:
[0036] map each of the multiple processed electromagnetic signals to a token representation to obtain multiple token representations; and
[0037] obtain the perception result of the environment through the machine learning model based on the multiple token representations.
[0038] In one possible implementation, the machine learning model further includes a task network, and the machine learning module is configured to:
[0039] perform, through the feature extraction network, information interaction and fusion among the multiple token representations to obtain multiple feature representations; and
[0040] obtain the perception result of the environment through the task network based on the multiple feature representations.
[0041] In one possible implementation, the feature extraction network is a transformer model.
[0042] In one possible implementation, the machine learning model is a visual language model, and the apparatus further includes:
[0043] a mapping module, configured to map the multiple processed electromagnetic signals into image data;
[0044] and the machine learning module is configured to:
[0045] obtain the perception result of the environment through the visual language model based on the multiple token representations of the multiple processed electromagnetic signals and the image data.
[0046] In one possible implementation, the machine learning module is used for:
[0047] obtain a prompt, the prompt indicating one or more of the following: performing signal analysis on the electromagnetic signals, performing signal analysis on the multiple processed electromagnetic signals, and determining the perception result of the environment; and
[0048] obtain the perception result of the environment through the visual language model based on the prompt, the multiple token representations of the multiple processed electromagnetic signals, and the image data.
[0049] In one possible implementation, the acquisition module is further configured to:
[0050] acquire sensor signals, of a type different from electromagnetic signals, collected from the environment; and
[0051] obtain processed sensor signals from the sensor signals through a signal processing method;
[0052] and the machine learning module is configured to:
[0053] obtain the perception result of the environment through the machine learning model based on the multiple processed electromagnetic signals and the processed sensor signals, wherein the same feature extraction network (e.g., a backbone network) is used to process both the multiple processed electromagnetic signals and the processed sensor signals.
[0054] In one possible implementation, the machine learning module is used for:
[0055] obtain, through the visual language model and based on the prompt, the multiple token representations of the multiple processed electromagnetic signals, and the image data, the analysis results of the electromagnetic signals in the time or spatial dimension as well as the perception result of the environment.
[0056] In one possible implementation, the prompt may further include at least one of the following: a prediction of the perception result of the environment, and a description of the environment.
[0057] In one possible implementation, the perception result of the environment is specifically the perception result of the environment in the spatial dimension or the temporal dimension.
[0058] In one possible implementation, the machine learning module is further used for:
[0059] pre-train the machine learning model based on the perception result.
[0060] In a third aspect, this application provides a data processing apparatus, which may include a memory, a processor, and a bus system, wherein the memory is configured to store a program, and the processor is configured to execute the program in the memory to perform the method according to the first aspect or any optional implementation thereof.
[0061] In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to perform the method according to the first aspect or any optional implementation thereof.
[0062] In a fifth aspect, embodiments of this application provide a computer program that, when run on a computer, causes the computer to perform the method according to the first aspect or any optional implementation thereof.
[0063] In a sixth aspect, this application provides a chip system including a processor configured to support implementation of the functions involved in the foregoing aspects, for example, transmitting or processing the data or information involved in the foregoing methods. In one possible design, the chip system further includes a memory configured to store program instructions and data necessary for the execution device or training device. The chip system may consist of chips, or may include chips and other discrete devices.
Brief Description of the Drawings
[0064] To illustrate the technical solutions of the embodiments of this application more clearly, the accompanying drawings used in the embodiments are briefly described below.
[0065] Figure 1 is a schematic diagram of an application architecture provided in an embodiment of this application;
[0066] Figures 2 to 7 are schematic diagrams of an application architecture provided in an embodiment of this application;
[0067] Figure 8 is a schematic diagram of a data processing method provided in an embodiment of this application;
[0068] Figure 9 is a schematic diagram of a data processing method provided in an embodiment of this application;
[0069] Figure 10 is a schematic diagram of a data processing method provided in an embodiment of this application;
[0070] Figure 11 is a schematic diagram of an application architecture according to an embodiment of this application;
[0071] Figure 12 is a schematic diagram of the structure of an image processing device provided in an embodiment of this application;
[0072] Figure 13 is a schematic diagram of a device provided in an embodiment of this application;
[0073] Figure 14 is a schematic diagram of a device provided in an embodiment of this application;
[0074] Figure 15 is a schematic diagram of a chip provided in an embodiment of this application.
Detailed Description of Embodiments
[0075] The embodiments of this application are described below with reference to the accompanying drawings. The terminology used in the implementation section of this application is for explaining specific embodiments only and is not intended to limit the scope of this application.
[0076] Those skilled in the art will recognize that, as technology evolves and new scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
[0077] The terms "first," "second," etc., used in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such terms are interchangeable where appropriate; this is merely a way of distinguishing objects with the same attributes in the embodiments of this application. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion, so that a process, method, system, product, or apparatus that comprises a series of elements is not necessarily limited to those elements, but may include other elements not explicitly listed or inherent to those processes, methods, products, or apparatuses.
[0078] First, the overall workflow of an artificial intelligence system is described with reference to Figure 1, which is a structural diagram of the main framework of artificial intelligence. The framework is elaborated along two dimensions: the "Intelligent Information Chain" (horizontal axis) and the "IT Value Chain" (vertical axis). The "Intelligent Information Chain" reflects a series of processes from data acquisition to processing, for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, data is condensed step by step from data to information, to knowledge, and finally to wisdom. The "IT Value Chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (the technologies that provide and process it) up to the industrial ecosystem of the system.
[0079] (1) Infrastructure
[0080] Infrastructure provides computing power to support artificial intelligence systems, enabling communication with the external world and providing support through a basic platform. This communication occurs through sensors; computing power is provided by intelligent chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); and the basic platform includes distributed computing frameworks and related platform guarantees and support, which may include cloud storage and computing, interconnected networks, etc. For example, sensors communicate with the outside world to acquire data, and this data is provided to intelligent chips in the distributed computing system provided by the basic platform for computation.
[0081] (2) Data
[0082] The layer above the infrastructure is data, which represents the data sources in the field of artificial intelligence. The data involves graphics, images, voice, text, and IoT data from traditional devices, including business data from existing systems and sensor data such as force, displacement, liquid level, temperature, and humidity.
[0083] (3) Data processing
[0084] Data processing typically includes methods such as data training, machine learning, deep learning, search, reasoning, and decision-making.
[0085] Among them, machine learning and deep learning can perform intelligent information modeling, extraction, preprocessing, and training on data, including symbolization and formalization.
[0086] Reasoning refers to the process in which, in a computer or intelligent system, the machine thinks and solves problems by simulating human intelligent reasoning, based on reasoning control strategies and using formalized information. Typical functions include search and matching.
[0087] Decision-making refers to the process of making decisions based on intelligent information after reasoning, and it typically provides functions such as classification, sorting, and prediction.
[0088] (4) General ability
[0089] After the data processing mentioned above, the results of the data processing can be used to form some general capabilities, such as algorithms or a general system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
[0090] (5) Intelligent Products and Industry Applications
[0091] Intelligent products and industry applications refer to products and applications of artificial intelligence systems in various fields. They are the encapsulation of overall artificial intelligence solutions, productizing intelligent information decision-making and realizing practical applications. Their application areas mainly include: intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, smart cities, etc.
[0092] This application can be applied to, but is not limited to, the field of natural language processing within artificial intelligence; specifically, it can be applied to neural network search and neural network inference in that field. Several application scenarios that have been implemented in products are introduced below.
[0093] To better understand the solutions of the embodiments of this application, the possible application scenarios of the embodiments of this application will be briefly introduced below with reference to Figures 2 to 5.
[0094] I. Environmental Awareness Applications
[0095] The product form of this application embodiment can be an environmental awareness application. Environmental awareness applications can run on terminal devices or cloud-based servers.
[0096] In one possible implementation, referring to Figure 2, an environment-aware application can perform an environment-aware task and obtain the processing result.
[0097] For example, the external environment can be sensed through communication optical fibers, WiFi channels, and wireless base station communication channels. This is mainly achieved by analyzing the scattered signals from optical fibers and the pilot echo signals from WiFi or wireless air interfaces to sense the state of scattering objects in the external environment, and then performing sensing tasks such as classifying and detecting external events.
[0098] In one possible implementation, a user can open an environmental awareness application installed on a terminal device and input data such as electromagnetic signals. The environmental awareness application can process the input data using the method provided in the embodiments of this application, or using a model trained by that method, and present the processing results to the user (the presentation method may include, but is not limited to, displaying, playing, saving, and uploading to the cloud).
[0099] In one possible implementation, a user can open an environmental awareness application installed on a terminal device and input data such as electromagnetic signals. The environmental awareness application can then send the electromagnetic signals and other data to a cloud-based server. The cloud-based server processes the input electromagnetic signals and other data using a model trained by the method provided in this application embodiment and sends the processing results back to the terminal device. The terminal device can then present the processing results to the user (the presentation method may include, but is not limited to, displaying, playing, saving, or uploading to the cloud).
[0100] The following sections will introduce the environment-aware application in this application from the perspectives of functional architecture and product architecture that implements the functions.
[0101] Referring to Figure 2, which is a schematic diagram of the functional architecture of the environment-aware application in an embodiment of this application:
[0102] In one possible implementation, as shown in Figure 2, the environment-aware application 102 may receive input parameters 101 (e.g., data including electromagnetic signals) and generate a processing result 103. The environment-aware application 102 may execute on, for example, at least one computer system, and includes computer code that, when executed by one or more computers, causes the computers to run a model trained by the method provided in the embodiments of this application.
[0103] Referring to Figure 3, which is a schematic diagram of the physical architecture of the environment-aware application in an embodiment of this application:
[0104] Referring to Figure 3, which illustrates a system architecture, the system may include a terminal 100 and a server 200. The server 200 may include one or more servers (Figure 3 uses one server as an example), and the server 200 can provide environmental awareness functionality for one or more terminals.
[0105] The terminal 100 may have an environmental awareness application installed, or a webpage related to the environmental awareness function opened. The application and webpage can provide an interface. The terminal 100 can receive relevant parameters input by the user on the environmental awareness function interface and send the parameters to the server 200. The server 200 can obtain the processing result based on the received parameters and return the processing result to the terminal 100.
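The terminal-server exchange of [0105] can be illustrated with JSON messages. The message schema (`task`, `signal`, `result`, `error`) and the threshold stand-in for the trained model are assumptions; transport (for example, HTTP) is omitted so the example runs without a network.

```python
import json


def terminal_build_request(signal, task="intrusion_detection"):
    """Terminal side: package user-supplied parameters (hypothetical schema)."""
    return json.dumps({"task": task, "signal": signal})


def server_handle(request_body, perceive):
    """Server side: decode parameters, run perception, encode the processing result."""
    try:
        request = json.loads(request_body)
        result = perceive(request["signal"])
    except (ValueError, KeyError) as exc:  # malformed JSON, missing field, empty signal
        return json.dumps({"error": str(exc)})
    return json.dumps({"task": request.get("task"), "result": result})


def terminal_present(response_body):
    """Terminal side: turn the returned result into display text."""
    response = json.loads(response_body)
    if "error" in response:
        return "error: " + response["error"]
    return f"{response['task']}: {response['result']}"


def threshold_perception(signal):  # stand-in for the trained model
    return "intrusion" if max(signal) > 0.5 else "normal"


reply = server_handle(terminal_build_request([0.1, 0.9, 0.2]), threshold_perception)
```

When the terminal processes data on its own, as [0106] allows, `server_handle` would simply run locally instead of on the server.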
[0106] It should be understood that in some optional implementations, the terminal 100 can also obtain the processing result from the received parameters on its own, without cooperation from the server. The embodiments of this application impose no limitation on this.
[0107] The product form of terminal 100 in Figure 3 is described below.
[0108] The terminal 100 in this application embodiment can be a mobile phone, tablet computer, wearable device, vehicle device, augmented reality (AR) / virtual reality (VR) device, laptop computer, ultra-mobile personal computer (UMPC), netbook, personal digital assistant (PDA), etc., and this application embodiment does not impose any restrictions on it.
[0109] Figure 4 shows a schematic diagram of an optional hardware structure for terminal 100.
[0110] Referring to Figure 4, terminal 100 may include components such as a radio frequency unit 110, a memory 120, an input unit 130, a display unit 140, a camera 150 (optional), an audio circuit 160 (optional), a speaker 161 (optional), a microphone 162 (optional), a processor 170, an external interface 180, and a power supply 190. Those skilled in the art will understand that Figure 4 is merely an example of a terminal or multi-functional device and does not constitute a limitation on the terminal or multi-functional device; it may include more or fewer components than illustrated, or combine certain components, or use different components.
[0111] The input unit 130 can be used to receive input numerical or character information, and to generate key signal inputs related to user settings and function control of the portable multi-functional device. Specifically, the input unit 130 may include a touchscreen 131 (optional) and/or other input devices 132. The touchscreen 131 can collect touch operations performed by the user on or near it (such as operations performed with a finger, knuckle, stylus, or any other suitable object on or near the touchscreen), and drive the corresponding connected apparatus according to a preset program. The touchscreen can detect the user's touch actions, convert them into touch signals, send the touch signals to the processor 170, and receive and execute commands sent by the processor 170; the touch signal includes at least touch point coordinate information. The touchscreen 131 can provide an input interface and an output interface between the terminal 100 and the user. In addition, the touchscreen can be implemented using various types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touchscreen 131, the input unit 130 may also include other input devices. Specifically, the other input devices 132 may include, but are not limited to, one or more of the following: a physical keyboard, function keys (such as volume control buttons and power buttons), a trackball, a mouse, a joystick, and the like.
[0112] Other input devices 132 can receive input images.
[0113] The display unit 140 can be used to display information input by the user or information provided to the user, various menus of the terminal 100, interactive interfaces, file display, and/or playback of any multimedia file. In this embodiment, the display unit 140 can be used to display the interface of an environment-aware application, processing results, and the like.
[0114] The memory 120 can be used to store instructions and data. The memory 120 may primarily include an instruction storage area and a data storage area. The data storage area can store various types of data, such as multimedia files and text. The instruction storage area can store software units such as operating systems, applications, and instructions required for at least one function, or subsets or extended sets thereof. It may also include non-volatile random access memory. It provides the processor 170 with hardware, software, and data resources for managing the computing device, supporting control software and applications. It is also used for storing multimedia files, as well as storing running programs and applications.
[0115] The processor 170 is the control center of the terminal 100. It connects various parts of the terminal 100 via various interfaces and lines. By running or executing instructions stored in the memory 120 and calling data stored in the memory 120, it performs various functions and processes data of the terminal 100, thereby controlling the terminal device as a whole. Optionally, the processor 170 may include one or more processing units; preferably, the processor 170 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, and applications, and the modem processor mainly handles wireless communication. It is understood that the modem processor may not be integrated into the processor 170. In some embodiments, the processor and memory can be implemented on a single chip; in some embodiments, they can also be implemented separately on independent chips. The processor 170 can also be used to generate corresponding operation control signals, send them to the corresponding components of the computing processing device, read and process data in the software, especially read and process data and programs in the memory 120, so that the various functional modules therein perform corresponding functions, thereby controlling the corresponding components to act according to the instructions.
[0116] The memory 120 can be used to store software code related to the data processing method. The processor 170 can execute the steps of the data processing method, and can also schedule other units (such as the above-mentioned input unit 130 and display unit 140) to implement the corresponding functions.
[0117] The radio frequency unit 110 (optional) can be used for receiving and transmitting signals during information transmission or calls. For example, it can receive downlink information from the base station and process it for the processor 170; additionally, it can transmit uplink data to the base station. Typically, the RF circuit includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier (LNA), a duplexer, etc. Furthermore, the radio frequency unit 110 can also communicate wirelessly with network devices and other devices. This wireless communication can use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), etc.
[0118] In this embodiment of the application, the radio frequency unit 110 can send an image to the server 200 and receive the processing result sent by the server 200.
[0119] It should be understood that the radio frequency unit 110 is optional and can be replaced with other communication interfaces, such as a network port.
[0120] The terminal 100 also includes a power supply 190 (such as a battery) that supplies power to various components. Preferably, the power supply can be logically connected to the processor 170 through a power management system, thereby enabling functions such as charging, discharging, and power consumption management through the power management system.
[0121] Terminal 100 also includes an external interface 180, which can be a standard Micro USB interface or a multi-pin connector, which can be used to connect terminal 100 to other devices for communication or to connect a charger to charge terminal 100.
[0122] Although not shown, terminal 100 may also include a flash, a wireless fidelity (WiFi) module, a Bluetooth module, and sensors with various functions, which will not be described in detail here. Some or all of the methods described below can be applied to terminal 100 as shown in Figure 4.
[0123] The product form of the server 200 in Figure 3 is described below:
[0124] Figure 5 provides a schematic diagram of the structure of a server 200. As shown in Figure 5, the server 200 includes a bus 201, a processor 202, a communication interface 203, and a memory 204. The processor 202, the memory 204, and the communication interface 203 communicate with each other via the bus 201.
[0125] Bus 201 can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. Buses can be categorized as address buses, data buses, control buses, etc. For ease of representation, only one thick line is used in Figure 5, but this does not indicate that there is only one bus or one type of bus.
[0126] The processor 202 can be any one or more of the following processors: central processing unit (CPU), graphics processing unit (GPU), microprocessor (MP), or digital signal processor (DSP).
[0127] Memory 204 may include volatile memory, such as random access memory (RAM). Memory 204 may also include non-volatile memory, such as read-only memory (ROM), flash memory, hard disk drive (HDD), or solid state drive (SSD).
[0128] The memory 204 can be used to store software code related to the data processing method. The processor 202 can execute the steps of the data processing method, and can also schedule other units to implement the corresponding functions.
[0129] It should be understood that the aforementioned terminal 100 and server 200 can be centralized or distributed devices. The processors (e.g., processor 170 and processor 202) in the aforementioned terminal 100 and server 200 can be hardware circuits (such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), general-purpose processors, digital signal processors (DSPs), microprocessors or microcontrollers, etc.) or combinations of these hardware circuits. For example, the processor can be a hardware system with instruction execution capabilities, such as a CPU or DSP, or a hardware system without instruction execution capabilities, such as an ASIC or FPGA, or a combination of the aforementioned hardware systems without instruction execution capabilities and hardware systems with instruction execution capabilities.
[0130] It should be understood that the steps related to the model inference process in the embodiments of this application involve AI-related operations. When performing AI operations, the instruction execution architecture of the terminal device and the server is not limited to the processor-memory architecture described above. The system architecture provided in the embodiments of this application will be described in detail below with reference to Figure 6.
[0131] Figure 6 is a schematic diagram of the system architecture provided in an embodiment of this application. As shown in Figure 6, the system architecture 500 includes an execution device 510, a training device 520, a database 530, a client device 540, a data storage system 550, and a data acquisition device 560.
[0132] The execution device 510 includes a calculation module 511, an I/O interface 512, a preprocessing module 513, and a preprocessing module 514. The calculation module 511 may include a target model/rule 501, while the preprocessing modules 513 and 514 are optional.
[0133] Among them, the execution device 510 can be the terminal device or server of the aforementioned environment-aware application.
[0134] The data acquisition device 560 is used to collect training samples. Training samples can be images, etc. After collecting the training samples, the data acquisition device 560 stores these training samples in the database 530.
[0135] The training device 520 can train the neural network to be trained (e.g., the graph neural network in the embodiments of this application) based on the training samples maintained in the database 530, to obtain the target model/rule 501.
[0136] It should be understood that the training device 520 can perform a pre-training process on the neural network to be trained based on the training samples maintained in the database 530, or fine-tune the model based on the pre-training.
[0137] It should be noted that in practical applications, the training samples maintained in database 530 may not all come from the data acquisition device 560; they may also be received from other devices. Furthermore, it should be noted that training device 520 may not necessarily train the target model/rule 501 entirely based on the training samples maintained in database 530; it may also obtain training samples from the cloud or other sources for model training. The above description should not be construed as limiting the embodiments of this application.
[0138] The target model/rule 501 trained by the training device 520 can be applied to different systems or devices, such as the execution device 510 shown in Figure 6. The execution device 510 can be a terminal, such as a mobile terminal, tablet computer, laptop computer, augmented reality (AR)/virtual reality (VR) device, vehicle terminal, etc., or it can be a server, etc.
[0139] Specifically, the training device 520 can transfer the trained model to the execution device 510.
[0140] In Figure 6, the execution device 510 is configured with an input/output (I/O) interface 512 for data interaction with external devices. Users can input data (such as images in the embodiments of this application) into the I/O interface 512 through the client device 540.
[0141] Preprocessing modules 513 and 514 are used to preprocess the input data received from the I/O interface 512. It should be understood that preprocessing modules 513 and 514 may be absent, or only one preprocessing module may be used. When preprocessing modules 513 and 514 are absent, the calculation module 511 can be used directly to process the input data.
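A minimal sketch of the optional preprocessing described in paragraph [0141], assuming the input is a list of real-valued samples. The preprocessing steps (`remove_mean`, `scale_to_unit_peak`) and the energy calculation standing in for the calculation module 511 are hypothetical examples chosen for illustration; the application does not specify what preprocessing modules 513 and 514 actually do.

```python
from typing import Callable, List, Sequence

Signal = List[float]
Stage = Callable[[Signal], Signal]


def remove_mean(x: Signal) -> Signal:
    """Hypothetical preprocessing step: subtract the mean (DC component)."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def scale_to_unit_peak(x: Signal) -> Signal:
    """Hypothetical preprocessing step: normalize so the largest magnitude is 1."""
    peak = max(abs(v) for v in x) or 1.0  # avoid dividing by zero for an all-zero signal
    return [v / peak for v in x]


def energy(x: Signal) -> float:
    """Stand-in for the calculation module 511: total signal energy."""
    return sum(v * v for v in x)


def run_execution_device(data: Signal, preprocessors: Sequence[Stage],
                         calculate: Callable[[Signal], float]) -> float:
    """Apply zero, one, or two preprocessing stages, then the calculation module."""
    for stage in preprocessors:  # an empty sequence sends the input straight to calculation
        data = stage(data)
    return calculate(data)
```

With `raw = [1.0, 3.0, 5.0]`, passing no preprocessors gives an energy of 35.0, while `[remove_mean]` first centers the signal to `[-2.0, 0.0, 2.0]` and gives 8.0.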
[0142] While the execution device 510 preprocesses the input data, or while the calculation module 511 of the execution device 510 performs calculations or other related processing, the execution device 510 can call data, code, and the like in the data storage system 550 for the corresponding processing, and can also store the data, instructions, and the like obtained from that processing in the data storage system 550.
[0143] Finally, the I/O interface 512 provides the processing result to the client device 540, thereby providing it to the user.
[0144] In the scenario shown in Figure 6, the user can provide input data manually through the interface provided by the I/O interface 512. Alternatively, the client device 540 can automatically send input data to the I/O interface 512; if the client device 540 requires user authorization to send input data automatically, the user can set the corresponding permission in the client device 540. The user can view the output results of the execution device 510 on the client device 540, and the results can be presented as a display, sound, animation, or in another form. The client device 540 can also act as a data acquisition terminal, collecting the input data fed into the I/O interface 512 and the output results produced by the I/O interface 512, as shown in the figure, and storing them in the database 530 as new sample data. Alternatively, the input data and output results of the I/O interface 512, as shown in the figure, can be stored directly in the database 530 as new sample data without going through the client device 540.
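The sample-collection loop in paragraph [0144] can be sketched as below. `SampleDatabase`, `serve_and_collect`, and the `collect` flag (standing in for the user permission set on the client device 540) are hypothetical names; the sketch only shows input/output pairs being stored as new samples, not any particular storage backend.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SampleDatabase:
    """Hypothetical stand-in for database 530: stores (input, output) pairs."""
    samples: List[Tuple[list, str]] = field(default_factory=list)

    def add(self, x: list, y: str) -> None:
        self.samples.append((list(x), y))


def serve_and_collect(model: Callable[[list], str], inputs: List[list],
                      db: SampleDatabase, collect: bool = True) -> List[str]:
    """Return a result for every input and, if collection is authorized, keep each pair."""
    results = []
    for x in inputs:
        y = model(x)      # processing result returned through the I/O interface
        results.append(y)
        if collect:       # e.g. a permission the user set on the client device
            db.add(x, y)
    return results
```

Because the stored outputs are the model's own predictions, a practical system would likely need some form of review before treating them as training labels; the source does not describe that step.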
[0145] It is worth noting that Figure 6 is merely a schematic diagram of a system architecture provided in an embodiment of this application. The positional relationships between the devices, components, modules, etc. shown in the figure do not constitute any limitation. For example, in Figure 6, the data storage system 550 is an external memory relative to the execution device 510. In other cases, the data storage system 550 can also be placed in the execution device 510. It should be understood that the aforementioned execution device 510 can be deployed in the client device 540.
[0146] The following section describes the more detailed architecture of the execution entity of the data processing method in the embodiments of this application.
[0147] The system architecture provided in this application embodiment will be described in detail below with reference to Figure 6. Figure 6 is a schematic diagram of the system architecture provided in this application embodiment. As shown in Figure 6, the system architecture 500 includes an execution device 510, a training device 520, a database 530, a client device 540, a data storage system 550, and a data acquisition device 560.
[0148] The execution device 510 includes a calculation module 511, an I/O interface 512, a preprocessing module 513, and a preprocessing module 514. The calculation module 511 may include a target model/rule 501, while the preprocessing modules 513 and 514 are optional.
[0149] The data acquisition device 560 is used to collect training samples. Training samples can be data such as electromagnetic signals; in this embodiment, the training samples are the data used to train multiple candidate neural networks. After collecting the training samples, the data acquisition device 560 stores them in the database 530.
[0150] The training device 520 can construct multiple candidate neural networks based on the search space maintained in the database 530, and train these neural networks based on the training samples to search for and obtain the target model/rule 501. In this embodiment, the target model/rule 501 can be the target neural network.
[0151] It should be noted that in practical applications, the training samples maintained in database 530 may not all come from the data acquisition device 560; they may also be received from other devices. Furthermore, it should be noted that training device 520 may not necessarily train the target model/rule 501 entirely based on the training samples maintained in database 530; it may also obtain training samples from the cloud or other sources for model training. The above description should not be construed as limiting the embodiments of this application.
[0152] The target model/rule 501 trained by the training device 520 can be applied to different systems or devices, such as the execution device 510 shown in Figure 6. The execution device 510 can be a terminal, such as a mobile terminal, tablet computer, laptop computer, augmented reality (AR)/virtual reality (VR) device, vehicle terminal, etc., or it can be a server or cloud, etc.
[0153] Specifically, the training device 520 can transmit the target neural network to the execution device 510.
[0154] In Figure 6, the execution device 510 is configured with an input/output (I/O) interface 512 for data interaction with external devices. Users can input data to the I/O interface 512 through the client device 540.
[0155] Preprocessing modules 513 and 514 are used to preprocess the input data received from the I/O interface 512. It should be understood that preprocessing modules 513 and 514 may be absent, or only one preprocessing module may be used. When preprocessing modules 513 and 514 are absent, the calculation module 511 can be used directly to process the input data.
[0156] While the execution device 510 preprocesses the input data, or while the calculation module 511 of the execution device 510 performs calculations or other related processing, the execution device 510 can call data, code, and the like in the data storage system 550 for the corresponding processing, and can also store the data, instructions, and the like obtained from that processing in the data storage system 550.
[0157] Finally, the I / O interface 512 presents the processing results (such as the environmental perception results in this embodiment) to the client device 540, thereby providing them to the user.
[0158] From the inference side of the model:
[0159] In this embodiment, the computing module 511 of the execution device 510 can obtain the code stored in the data storage system 550 to implement the data processing method in this embodiment.
[0160] In this embodiment of the application, the computing module 511 of the execution device 510 may include hardware circuits (such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), general-purpose processors, digital signal processors (DSPs), microprocessors or microcontrollers, etc.) or combinations of these hardware circuits. For example, the computing module 511 may be a hardware system with instruction execution capabilities, such as a CPU or DSP, or a hardware system without instruction execution capabilities, such as an ASIC or FPGA, or a combination of the aforementioned hardware systems without instruction execution capabilities and hardware systems with instruction execution capabilities.
[0161] Specifically, the computing module 511 of the execution device 510 can be a hardware system with the function of executing instructions. The data processing method provided in this application embodiment can be software code stored in the memory. The computing module 511 of the execution device 510 can obtain the software code from the memory and execute the obtained software code to implement the data processing method provided in this application embodiment.
[0162] It should be understood that the computing module 511 of the execution device 510 can be a combination of a hardware system without the function of executing instructions and a hardware system with the function of executing instructions. Some steps of the data processing method provided in the embodiments of this application can also be implemented by the hardware system without the function of executing instructions in the computing module 511 of the execution device 510, which is not limited here.
[0163] From the training side of the model:
[0164] In this embodiment, the training device 520 can obtain the code stored in the memory (not shown in Figure 6, which can be integrated into the training device 520 or deployed separately from the training device 520) to implement the data processing method in this embodiment.
[0165] In this embodiment of the application, the training device 520 may include hardware circuits (such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), general-purpose processors, digital signal processors (DSPs), microprocessors or microcontrollers, etc.) or combinations of these hardware circuits. For example, the training device 520 may be a hardware system with instruction execution capabilities, such as a CPU or DSP, or a hardware system without instruction execution capabilities, such as an ASIC or FPGA, or a combination of the aforementioned hardware systems without instruction execution capabilities and hardware systems with instruction execution capabilities.
[0166] Specifically, the training device 520 can be a hardware system with instruction execution capabilities. The data processing method provided in this application embodiment can be software code stored in a memory. The training device 520 can retrieve the software code from the memory and execute the retrieved software code to implement the data processing method provided in this application embodiment.
[0167] It should be understood that the training device 520 can be a combination of a hardware system without the function of executing instructions and a hardware system with the function of executing instructions. Some steps of the data processing method provided in the embodiments of this application can also be implemented by the hardware system in the training device 520 without the function of executing instructions, which is not limited here.
[0168] II. Cloud services with environment awareness capabilities provided by the server:
[0169] In one possible implementation, the server can provide environment-aware services to the client side through an application programming interface (API).
[0170] In this process, the terminal device can send relevant parameters (such as image data) to the server through the API provided by the cloud. The server can obtain the processing results based on the received parameters and return the processing results to the terminal.
[0171] The description of the terminal and server can be found in the above embodiments, and will not be repeated here.
[0172] Figure 7 illustrates the process of using an environment-aware cloud service provided by a cloud platform.
[0173] 1. Activate and purchase environmental sensing services.
[0174] 2. Users can download the software development kit (SDK) corresponding to the environment awareness service. Cloud platforms usually provide multiple development versions of the SDK for users to choose from according to their development environment, such as Java, Python, PHP, and Android versions.
[0175] 3. After downloading the corresponding version of the SDK to their local machine according to their needs, users can import the SDK project into their local development environment, configure and debug it in the local development environment, and develop other functions in the local development environment to form an application that integrates environmental awareness capabilities.
[0176] 4. When an application needs to perform environmental sensing, it can trigger an API call for this function. The application then sends an API request to the running instance of the environmental sensing service in the cloud environment. The API request carries the input data, and the running instance in the cloud environment processes the input electromagnetic signals and other data to obtain the processing result.
[0177] 5. The cloud environment returns the processing result to the application, thus completing a call to the environment awareness function.
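Steps 4 and 5 can be illustrated with an in-process sketch that uses JSON strings in place of real HTTP calls. The request fields (`task`, `signal`), the `cloud_instance` handler, and the threshold-based `intrusion_detector` are all assumptions for illustration; they are not the API of any actual cloud platform or SDK.

```python
import json
from typing import Callable, Dict, List


def make_request(signal: List[float], task: str) -> str:
    """Build a hypothetical JSON request body carrying the input data (step 4)."""
    return json.dumps({"task": task, "signal": signal})


def intrusion_detector(signal: List[float]) -> str:
    """Placeholder model: flag an intrusion when any sample exceeds a fixed threshold."""
    return "intrusion" if max(abs(v) for v in signal) > 1.0 else "normal"


def cloud_instance(body: str, models: Dict[str, Callable[[List[float]], str]]) -> str:
    """Hypothetical running instance: parse the request, run the model, return JSON (step 5)."""
    req = json.loads(body)
    model = models.get(req["task"])
    if model is None:
        return json.dumps({"error": "unknown task"})
    return json.dumps({"result": model(req["signal"])})
```

For example, `cloud_instance(make_request([0.2, 1.5], "intrusion"), {"intrusion": intrusion_detector})` returns a JSON body whose `result` field is `"intrusion"`.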
[0178] Since the embodiments of this application involve a large number of neural network applications, for ease of understanding, the relevant terms and concepts such as neural networks involved in the embodiments of this application will be introduced below.
[0179] (1) Neural Network
[0180] A neural network can be composed of neural units. A neural unit can be an operation unit that takes $x_s$ and an intercept of 1 as inputs, and its output can be:
$h_{W,b}(x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$
[0181] where s = 1, 2, ..., n, n is a natural number greater than 1, $W_s$ is the weight of $x_s$, and b is the bias of the neural unit. f is the activation function of the neural unit, used to introduce nonlinear characteristics into the neural network so as to convert the input signal of the neural unit into an output signal. The output signal of this activation function can be used as the input of the next convolutional layer, and the activation function can be the sigmoid function. A neural network is formed by connecting many such neural units together; that is, the output of one neural unit can be the input of another. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of that local receptive field, which can be a region composed of several neural units.
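The neural-unit formula in paragraphs [0180] and [0181] can be checked with a short Python sketch that uses the sigmoid activation mentioned above. The function names are illustrative and not taken from the application.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Sigmoid activation f(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def neural_unit(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Output f(sum_s W_s * x_s + b) of a single neural unit with sigmoid f."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    z = sum(ws * xs for ws, xs in zip(w, x)) + b
    return sigmoid(z)
```

For instance, `neural_unit([1.0, 2.0], [0.5, -0.25], 0.0)` computes a weighted sum of 0 and returns 0.5.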
[0182] (2) Loss Function
[0183] When training a deep neural network, the output of the network is expected to be as close as possible to the value that is actually desired. The predicted value of the current network is therefore compared with the desired target value, and the weight vector of each layer is updated according to the difference between them (there is usually an initialization process before the first update, in which parameters are preconfigured for each layer). For example, if the predicted value is too high, the weight vector is adjusted so that the network predicts a lower value. This adjustment continues until the deep neural network can predict the target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the loss function or objective function, an important equation used to measure that difference. Taking the loss function as an example, a higher output value (loss) indicates a greater difference, so training the deep neural network becomes a process of reducing this loss as much as possible.
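As one concrete example of a loss function, the mean squared error below grows as predictions move away from targets. The application does not state which loss function is used; mean squared error is chosen here only as a common illustration.

```python
from typing import Sequence


def mse_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error: larger when predictions are farther from the targets."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
```

A perfect prediction gives a loss of 0.0, and `mse_loss([3.0], [1.0])` gives 4.0, so minimizing this value drives the predictions toward the targets.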
[0184] (3) Backpropagation algorithm
[0185] A convolutional neural network can use the backpropagation (BP) algorithm during training to correct the parameters of the initial model, so that the error loss of the model becomes smaller and smaller. Specifically, forward propagation of the input signal to the output produces an error loss, and the parameters of the initial model are updated by propagating this error loss information backward, so that the error loss converges. The backpropagation algorithm is a backward pass driven by the error loss, aimed at obtaining the optimal model parameters, such as the weight matrices.
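A minimal sketch of backpropagation for a single sigmoid neural unit trained with a squared-error loss and per-sample gradient descent. The learning rate, epoch count, and toy dataset are arbitrary assumptions; the point is only to show the forward pass producing an error loss and the chain rule propagating it back to update the weights and bias.

```python
import math
from typing import List, Tuple


def train_single_neuron(data: List[Tuple[List[float], float]], lr: float = 0.5,
                        epochs: int = 200) -> Tuple[List[float], float, List[float]]:
    """Fit y ~ sigmoid(w.x + b) by gradient descent on 0.5 * (p - y)^2.

    Returns the weights, the bias, and the mean loss recorded in each epoch.
    """
    w = [0.0] * len(data[0][0])
    b = 0.0
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, y in data:
            # Forward pass: prediction and its error loss.
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            total += 0.5 * (p - y) ** 2
            # Backward pass (chain rule): dL/dz = (p - y) * p * (1 - p).
            dz = (p - y) * p * (1.0 - p)
            w = [wi - lr * dz * xi for wi, xi in zip(w, x)]
            b -= lr * dz
        history.append(total / len(data))
    return w, b, history
```

On the toy dataset `[([0.0], 0.0), ([1.0], 1.0)]`, the recorded loss falls over the epochs and the unit learns a positive weight with a negative bias, so it separates the two inputs.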
[0186] (4) Deep Neural Networks
[0187] A deep neural network (DNN), also called a multilayer neural network, can be understood as a neural network with many hidden layers; there is no specific threshold for "many." Based on the position of the layers, the layers of a DNN can be divided into three types: the input layer, hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the layers in between are hidden layers. Adjacent layers are fully connected, meaning that every neuron in the i-th layer is connected to every neuron in the (i+1)-th layer. Although a DNN appears complex, the operation of each layer is quite simple: it is an affine transformation followed by an activation function, $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, W is the weight matrix (also called coefficients), and α() is the activation function. Each layer simply applies this operation to the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, the numbers of coefficients W and offset vectors $\vec{b}$ are large. These parameters are defined in a DNN as follows, taking the coefficient W as an example. Assume a three-layer DNN: the linear coefficient from the 4th neuron in the second layer to the 2nd neuron in the third layer is defined as $W_{24}^{3}$, where the superscript 3 denotes the layer in which the coefficient W resides, and the subscripts correspond to index 2 in the output (third) layer and index 4 in the input (second) layer. In summary, the coefficient from the k-th neuron in layer L-1 to the j-th neuron in layer L is defined as $W_{jk}^{L}$. Note that the input layer has no W parameter. In a deep neural network, more hidden layers allow the network to better represent complex real-world situations. Theoretically, a model with more parameters has higher complexity and greater "capacity," which means it can perform more complex learning tasks.
Training a deep neural network is the process of learning the weight matrices; its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors W of many layers).
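The per-layer operation $\vec{y} = \alpha(W\vec{x} + \vec{b})$ and the $W_{jk}^{L}$ indexing can be sketched as follows. The sketch assumes tanh as the activation function α and uses 0-based Python indices, so the coefficient $W_{24}^{3}$ (from the 4th neuron of the second layer to the 2nd neuron of the third layer) corresponds to `W[1][3]` in the weight matrix of the third layer. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def layer_forward(W: Matrix, b: List[float], x: List[float]) -> List[float]:
    """y = alpha(W x + b) with alpha = tanh; W[j][k] links input neuron k to output neuron j."""
    if any(len(row) != len(x) for row in W) or len(b) != len(W):
        raise ValueError("shape mismatch between W, b, and x")
    return [math.tanh(sum(W[j][k] * x[k] for k in range(len(x))) + b[j])
            for j in range(len(W))]


def dnn_forward(layers: Sequence[Tuple[Matrix, List[float]]], x: List[float]) -> List[float]:
    """Apply each (W, b) layer in turn; the input layer itself carries no W."""
    for W, b in layers:
        x = layer_forward(W, b, x)
    return x
```

Setting only `W[1][3] = 1.0` in a 2x4 matrix and feeding an input whose 4th component is 0.5 makes only the 2nd output neuron respond, with value tanh(0.5).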
[0188] In environmental perception tasks, the external environment can be sensed through communication channels such as optical fibers, WiFi channels, and wireless base stations. This is primarily achieved by analyzing the scattered signals from optical fibers and the pilot echo signals from WiFi or wireless air interfaces to perceive the state of scattering objects in the external environment, thereby enabling tasks such as classifying and detecting external events.
[0189] In existing technologies, only the original channel sensing signal, or a single transformed view of it, is used as the input feature of the neural network. The limited information carried by such an input results in low accuracy of the environmental perception results.
[0190] Furthermore, existing technical solutions usually train and run a model for one specific channel sensing task. As a result, when a new channel sensing task is first launched, the dataset is often insufficient, which makes immediate deployment difficult. This technical solution addresses these two problems through multi-view channel signal enhancement and pre-training on multiple channel sensing tasks, respectively, to improve the capability of electromagnetic channel sensing tasks.
[0191] To address the aforementioned problems, this application provides a data processing method. Referring to Figure 8, which is a schematic illustration of an embodiment of the data processing method provided by this application, the data processing method provided by this application may include:
[0192] 901. Acquire an electromagnetic signal obtained by sampling the environment.
[0193] The electromagnetic signals may be, for example, wireless pilot signals, router detection signals, or fiber optic channel vibration signals.
[0194] 902. Based on the electromagnetic signal, obtain multiple processed electromagnetic signals through multiple different signal processing methods, wherein each signal processing method performs a signal transformation in at least one of the frequency domain, the spatial domain, or the time domain.
[0195] In this embodiment, various signal processing methods can be used to obtain information about the original electromagnetic signal from different perspectives. Performing various transformations on the original signal is equivalent to characterizing it from different perspectives, which yields a richer characterization of the original channel sensing signal. This facilitates analysis and modeling of the original signal and thus improves the accuracy of subsequent environmental perception results.
[0196] In one possible implementation, the signal processing methods include one or more of the following: short-time variance calculation, power spectral density calculation, short-time Fourier transform, and signal statistics. That is, a "perspective" refers to a Fourier transform, short-time variance transform, or statistical method applied to the original signal. Performing various transformations on the original signal is equivalent to characterizing the original high-dimensional tensor signal from different perspectives. This helps convert high-dimensional data matrices, which are difficult for engineers to interpret, into energy distribution signals or statistical description signals that can be understood from different perspectives, thereby providing different perspectives on the original channel sensing signal and facilitating its analysis and modeling.
[0197] Due to the sparsity and high dimensionality of electromagnetic channel sensing signals, current small-model schemes often struggle to classify certain types of intrusion signals with high accuracy. In this case, the advantage of performing multiple transformations on the original signal is that, through time-frequency-spatial feature enhancement transformations, the characteristics of the current signal can be perceived from multiple perspectives. For example, intensity signals are very good at distinguishing subtle intrusion signals, while phase signals perform better in classification tasks for most signals.
[0198] For example, after obtaining the fiber optic sensing signal, the original signal can be characterized from multiple perspectives such as time, frequency, and space by calculating the short-time variance and Fourier transform. This allows the characteristics of the current intrusion signal to be perceived from multiple perspectives. Several typical and effective mode (perspective) transformation methods are as follows:
[0199] a) Short-time variance: For the signal in each spatial channel, the variance of the signal within a time window is calculated, and the window is slid along the time axis to obtain values over the entire time dimension. Calculating the short-time variance suppresses the sharply fluctuating parts of the original signal, which helps reveal intrusion signals that are submerged in the background signal.
[0200] b) Power spectral density of phase or intensity signals: By calculating the power spectral density of the phase or intensity signal at each position along the spatial dimension, the energy distribution of the vibration signal across frequency bands at each spatial location within a 10 s time interval can be obtained. This frequency-domain energy distribution is crucial for detecting the frequency-domain response of certain special intrusion signals.
[0201] c) Short-time Fourier transform: The two transformation schemes above (short-time variance and power spectral density) focus on time-domain enhancement and frequency-domain enhancement, respectively. However, buried fiber optic sensing signals are highly sparse; in particular, intrusion signals are sparsely distributed along the time dimension of the entire signal. A sliding window can therefore be used to calculate the frequency-domain power spectral density of the signal within each window, yielding a signal that captures both the time domain and the frequency domain. Typically, the short-time Fourier result of the column with the strongest energy can be selected, which is crucial for transforming certain regular intrusion signals or intrusion signals accompanied by strong noise.
[0202] d) Statistical information along a certain dimension of the original signal: Compute energy or numerical statistics of the original one-dimensional or two-dimensional signal along that dimension, and use the resulting statistical signal as a view signal to be tokenized.
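The four view transforms a) to d) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the input is assumed to be a real-valued array of shape (n_channels, n_samples) sampled at `fs` Hz, the power spectral density is a plain one-sided periodogram, the short-time Fourier transform uses a Hann window, "strongest energy column" is interpreted as the spatial channel with the largest total energy, and the chosen statistics and all function names are hypothetical.

```python
import numpy as np


def short_time_variance(x, win, hop):
    """a) Sliding-window variance along time, per spatial channel.
    x: (n_channels, n_samples) -> (n_channels, n_windows)."""
    starts = range(0, x.shape[1] - win + 1, hop)
    return np.stack([x[:, s:s + win].var(axis=1) for s in starts], axis=1)


def power_spectral_density(x, fs):
    """b) One-sided periodogram per spatial channel.
    Returns (freqs, psd) with psd of shape (n_channels, n_samples // 2 + 1)."""
    n = x.shape[1]
    psd = np.abs(np.fft.rfft(x, axis=1)) ** 2 / (fs * n)
    if n % 2 == 0:
        psd[:, 1:-1] *= 2.0  # double all bins except DC and Nyquist
    else:
        psd[:, 1:] *= 2.0    # no Nyquist bin when n is odd
    return np.fft.rfftfreq(n, d=1.0 / fs), psd


def stft_power(x1d, win, hop):
    """c) Short-time power spectrum of one channel with a Hann window.
    Returns (n_frames, win // 2 + 1)."""
    w = np.hanning(win)
    frames = np.array([x1d[s:s + win] * w for s in range(0, len(x1d) - win + 1, hop)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def strongest_channel_stft(x, win, hop):
    """c) Select the channel with the highest total energy and return its STFT."""
    idx = int(np.argmax((x ** 2).sum(axis=1)))
    return idx, stft_power(x[idx], win, hop)


def dimension_statistics(x, axis=1):
    """d) Per-channel energy, mean, standard deviation and peak magnitude
    computed along one dimension; output (n_channels, 4) when axis=1."""
    return np.stack(
        [(x ** 2).sum(axis=axis), x.mean(axis=axis), x.std(axis=axis), np.abs(x).max(axis=axis)],
        axis=-1,
    )
```

Each returned array can then be treated as one view and tokenized separately; window length and hop size would in practice be tuned to the sensing system.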
[0203] In one possible implementation, each of the plurality of processed electromagnetic signals can be mapped to a token representation to obtain a plurality of token representations.
[0204] Before being processed by the machine learning model, the electromagnetic signals can be converted into a series of feature tokens.
[0205] Taking the Transformer model as an example of the machine learning model, this step is similar to dividing an image into small patches in image processing. It allows the model to capture subtle changes in the signal, which is crucial for detecting minute environmental changes. After multimodal features are obtained through several view transformation methods, a tokenization scheme based on a convolutional neural network can be used to tokenize the features of each view. The tokens from all modalities are then fed together into the Transformer self-attention model, and the model's prediction output is obtained by leveraging the attention mechanism of the large-parameter Transformer and the memory characteristics of its FeedForward layers.
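A convolutional tokenizer of the kind described above can be sketched as a strided 1-D convolution whose kernel size equals its stride, which is the signal analogue of cutting an image into patches. This is a hypothetical sketch: each view is assumed to be a (channels, time) array with its own tokenizer weights, and the learnable per-view embedding added to every token (so the shared model can tell views apart) is an assumption not stated in the application.

```python
import numpy as np


def conv_tokenize(view, weight, bias):
    """Strided convolution with kernel size == stride == patch.

    view: (c, t); weight: (d_model, c, patch); bias: (d_model,).
    Returns (t // patch, d_model); a trailing remainder shorter than a patch is dropped.
    """
    c, t = view.shape
    d, c_w, patch = weight.shape
    assert c == c_w, "weight must match the number of view channels"
    n_tokens = t // patch
    # (c, n_tokens, patch) -> (n_tokens, c, patch): one patch per token
    patches = view[:, : n_tokens * patch].reshape(c, n_tokens, patch).transpose(1, 0, 2)
    return np.einsum("ncp,dcp->nd", patches, weight) + bias


def build_token_sequence(views, tokenizers, view_embeddings):
    """Tokenize every view and concatenate the tokens into one sequence
    for the shared feature extraction network."""
    seqs = []
    for name, v in views.items():
        w, b = tokenizers[name]
        seqs.append(conv_tokenize(v, w, b) + view_embeddings[name])
    return np.concatenate(seqs, axis=0)
```

Because the tokenizers only need to agree on the token width `d_model`, views with different shapes (for example a short-time variance map and a PSD map) can share one token sequence.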
[0206] 903. Based on the plurality of processed electromagnetic signals, obtain a perception result of the environment through a machine learning model, wherein the machine learning model includes a feature extraction network (e.g., a backbone network), and the same feature extraction network is used to process each of the plurality of processed electromagnetic signals.
[0207] In one possible implementation, the perception result of the environment can be obtained based on the multiple token representations through a machine learning model.
[0208] In one possible implementation, the machine learning model further includes a task network. Information interaction and fusion among the multiple token representations can be performed through the feature extraction network to obtain multiple feature representations, and the perception result of the environment is then obtained through the task network based on the multiple feature representations.
[0209] In one possible implementation, the feature extraction network is a transformer model.
[0210] The Transformer model can use an attention mechanism to automatically infer the feature interaction relationships between tokens across multiple modalities (time-frequency-space perspective), thereby achieving better inference accuracy.
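A shared Transformer backbone followed by a task network can be sketched in NumPy as below. This is an illustrative sketch, not the application's model: the single-head pre-norm encoder layer, the ReLU feed-forward block, the mean-pooling classification head, the random initialization and all names are assumptions. The key point it shows is that tokens from every view sit in one sequence, so the same weights process all views and each attention row spans tokens of every view.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def init_layer(d, d_ff, rng):
    """Random parameters for one encoder layer (hypothetical initialization)."""
    s = d ** -0.5
    return {
        "Wq": rng.normal(0, s, (d, d)), "Wk": rng.normal(0, s, (d, d)),
        "Wv": rng.normal(0, s, (d, d)), "Wo": rng.normal(0, s, (d, d)),
        "W1": rng.normal(0, s, (d, d_ff)), "b1": np.zeros(d_ff),
        "W2": rng.normal(0, d_ff ** -0.5, (d_ff, d)), "b2": np.zeros(d),
    }


def encoder_layer(tokens, p):
    """Pre-norm encoder layer: single-head self-attention, then feed-forward.
    tokens: (n_tokens, d) holding the tokens of ALL views in one sequence."""
    h = layer_norm(tokens)
    q, k, v = h @ p["Wq"], h @ p["Wk"], h @ p["Wv"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n_tokens, n_tokens)
    x = tokens + (attn @ v) @ p["Wo"]               # residual connection
    h = layer_norm(x)
    ff = np.maximum(h @ p["W1"] + p["b1"], 0.0) @ p["W2"] + p["b2"]
    return x + ff, attn


def perceive(tokens, backbone, head_W, head_b):
    """Shared feature extraction network followed by a task network."""
    x = tokens
    for p in backbone:
        x, _ = encoder_layer(x, p)
    pooled = x.mean(axis=0)                    # fuse the feature representations
    return softmax(pooled @ head_W + head_b)   # e.g. probabilities over event classes
```

Mean pooling is only one option for the task network; a dedicated classification token or a per-position detection head would fit the same structure.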
[0211] In one possible implementation, sensor signals of a type other than electromagnetic signals, collected from the environment, can also be acquired; processed sensor signals are obtained from the sensor signals through a signal processing method; and the perception result of the environment is obtained through the machine learning model based on the plurality of processed electromagnetic signals and the processed sensor signals, wherein the same feature extraction network (e.g., backbone network) is used to process both the plurality of processed electromagnetic signals and the processed sensor signals.
[0212] In this embodiment, features can also be extracted from data of different modalities (e.g., fiber optic data and other sensor data such as temperature and humidity) through the same feature extraction network (e.g., a backbone network). This multimodal fusion capability enables the use of information from one modality to enhance the interpretation of another modality, thereby improving the overall accuracy and robustness.
[0213] In this embodiment, environmental perception can also be performed based on data from different modalities (e.g., fiber optic data and other sensor data such as temperature and humidity). This multimodal fusion capability enables the use of information from one modality to enhance the interpretation of another modality, thereby improving overall accuracy and robustness.
[0214] For example, referring to Figure 9, which is a structural schematic of the attention module of the Transformer, the attention module can use tokens of multimodal data and a multi-head self-attention mechanism to fuse feature information from multiple modalities, thereby automatically inferring the interactions between complex features (e.g., interfering background signals and intrusion signals).
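Multi-head self-attention over a mixed sequence of electromagnetic view tokens and a sensor token can be sketched as follows. The attention computation is the standard multi-head formulation; the way sensor readings (e.g. temperature and humidity) become a token, through a linear projection plus a modality embedding, is a hypothetical choice for illustration, as are all parameter names.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """Standard multi-head self-attention over a (n_tokens, d) sequence."""
    n, d = x.shape
    assert d % n_heads == 0, "model width must be divisible by the number of heads"
    dh = d // n_heads

    def split(t):  # (n, d) -> (n_heads, n, dh)
        return t.reshape(n, n_heads, dh).transpose(1, 0, 2)

    q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (n_heads, n, n)
    heads = (attn @ v).transpose(1, 0, 2).reshape(n, d)     # concatenate heads
    return heads @ Wo, attn


def sensor_token(readings, W, b, modality_emb):
    """Hypothetical linear projection of scalar sensor readings into one token."""
    return (np.asarray(readings, dtype=float) @ W + b + modality_emb)[None, :]


def fuse_modalities(em_tokens, readings, p, n_heads=4):
    """Append the sensor token to the EM view tokens and fuse everything with
    the same attention weights; returns fused tokens and attention maps."""
    s = sensor_token(readings, p["Ws"], p["bs"], p["emb_s"])
    x = np.concatenate([em_tokens, s], axis=0)
    out, attn = multi_head_self_attention(x, p["Wq"], p["Wk"], p["Wv"], p["Wo"], n_heads)
    return x + out, attn  # residual connection
```

Each head produces its own attention map, so different heads can specialize, for instance one attending mainly from intrusion-like tokens to background tokens and another to the sensor token.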
[0215] In one possible implementation, the machine learning model can also be pre-trained based on the perception results.
[0216] For data from different modalities (also referred to as domains), a domain-specific multi-view transformation is performed on the channel measurement signals of each domain. After the multi-view token sequences of the electromagnetic measurement signals of each domain are obtained, the multi-view token sequences of the cross-domain electromagnetic measurement signals are fed into the Transformer model for training. Through cross-domain training, the model learns what multi-view interaction has in common across domains, thereby improving signal perception and processing in each domain.
[0217] For example, referring to Figure 10, Figure 10 illustrates the processing flow of model input data for collaborative training based on cross-domain electromagnetic sensing signals, including:
[0218] Step 1: Extract multi-view signals from the electromagnetic sensing signals of each domain, using the prior view feature extraction methods specific to that domain. Then, during training of the neural network model, data from multiple domains are used simultaneously. The base model can be trained in a self-supervised or supervised manner, so that a single base model acquires generalized cross-domain sensing capabilities.
[0219] Step 2: For downstream use, in each individual electromagnetic sensing domain or in a completely new electromagnetic signal sensing domain, fine-tune the model on the downstream task using the multi-view data unique to that domain, further improving how quickly the task adapts to that domain.
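The data side of Step 1, in which each domain applies its own view transforms and samples from all domains are mixed into the same training batches, can be sketched as below. This is a hypothetical pipeline for illustration: the domain names, the particular view functions registered per domain, and the linear-interpolation resampling of every view to a common length are all assumptions, and the model update itself is omitted.

```python
from typing import Callable, Dict, List

import numpy as np

ViewFn = Callable[[np.ndarray], np.ndarray]

# Hypothetical domain-specific "prior" view extractors.
DOMAIN_VIEWS: Dict[str, List[ViewFn]] = {
    "fiber": [
        lambda x: x.var(axis=1),                                 # per-channel variance
        lambda x: np.abs(np.fft.rfft(x, axis=1)).mean(axis=0),   # mean magnitude spectrum
    ],
    "wifi": [
        lambda x: np.abs(x).mean(axis=0),                        # mean amplitude per subcarrier
        lambda x: np.angle(x).std(axis=0),                       # phase spread per subcarrier
    ],
}


def multi_view_tokens(x, view_fns, n_tokens):
    """Apply each view transform and resample its flattened output to
    n_tokens values, giving an array of shape (n_views, n_tokens)."""
    rows = []
    for fn in view_fns:
        v = np.asarray(fn(x), dtype=float).ravel()
        pos = np.linspace(0, len(v) - 1, n_tokens)
        rows.append(np.interp(pos, np.arange(len(v)), v))
    return np.stack(rows)


def mixed_domain_batches(datasets, domain_views, batch_size, n_tokens, rng):
    """Yield shuffled batches that mix samples from all domains (Step 1)."""
    pool = [(d, i) for d, samples in datasets.items() for i in range(len(samples))]
    order = rng.permutation(len(pool))
    for start in range(0, len(order), batch_size):
        batch = [pool[j] for j in order[start:start + batch_size]]
        yield [(d, multi_view_tokens(datasets[d][i], domain_views[d], n_tokens)) for d, i in batch]
```

For Step 2, the same helpers could be reused with a new domain's own view list and a small labeled set to fine-tune the pre-trained model.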
[0220] A feature of this embodiment is that it draws on the fundamental theory underlying electromagnetic signal sensing to construct a base model applicable to all electromagnetic channel sensing tasks. Through a pre-training and fine-tuning paradigm, the base model acquires a strong ability to extract multi-view features of electromagnetic channel signals.
[0221] Because the base model is trained on a variety of electromagnetic channel signals, it generalizes well: it can perform better on electromagnetic sensing tasks it has already been trained on, and it can quickly acquire few-shot capability on electromagnetic channel sensing domain tasks it has never seen before.
[0222] In one possible implementation, the machine learning model is a visual language model. The multiple processed electromagnetic signals can further be mapped into image data, and the perception result of the environment is obtained through the visual language model based on the multiple token representations of the multiple processed electromagnetic signals and the image data.
[0223] In one possible implementation, a prompt can be obtained, which indicates one or more of the following: performing signal analysis on the electromagnetic signal, performing signal analysis on the plurality of processed electromagnetic signals, and determining the perception result of the environment; based on the prompt, the plurality of token representations of the plurality of processed electromagnetic signals, and the image data, the perception result of the environment is obtained through the visual language model.
[0224] This application embodiment can utilize a pre-trained visual language model to obtain basic capabilities for visual features and logical reasoning. Then, the multi-view electromagnetic sensing signals are visualized, and the visual language model's ability to perceive the visualized signals, combined with the original multi-view electromagnetic sensing signals, is used to collaboratively perform electromagnetic sensing signal perception, contextual reasoning, and other tasks.
[0225] In one possible implementation, obtaining the perception result of the environment through the visual language model based on the prompt, the multiple token representations of the multiple processed electromagnetic signals, and the image data includes: obtaining the analysis result of the electromagnetic signals in the time dimension or spatial dimension, and the perception result of the environment, based on the prompt, the multiple token representations of the multiple processed electromagnetic signals, and the image data, through the visual language model.
[0226] Taking intrusion detection as an example, the model can use a language-like medium (natural language, vibration signals, audio, formatted text, etc.) to logically analyze the temporal or spatial context of the signal, and then determine the temporal and spatial behavioral state (intrusion or non-intrusion) of the perceived signal.
[0227] In one possible implementation, the prompt may further include at least one of the following: a prediction of the perception result of the environment, and a description of the environment.
[0228] Before the model performs analysis or monitoring, it can interact with prompts input by external users, allowing engineers to input pre-judgments of the perception results. In addition, conditional statements (e.g., ambient temperature and humidity, external construction conditions, or weather conditions) can be input to enable the model to provide more accurate results. After the model performs analysis or monitoring, it can report to users whether risks or anomalies are present within a specific time span or spatial span, for example, in the monitored scenario.
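Assembling such a prompt from a task instruction, an optional engineer pre-judgment and a set of environmental conditions can be sketched as follows. The field names, line wording and ordering are a hypothetical template for illustration; the application does not specify a prompt format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PerceptionPrompt:
    """Hypothetical prompt template for the visual language model."""
    task: str                                   # what the model should analyze or determine
    prejudgment: Optional[str] = None           # engineer's expected perception result
    conditions: Dict[str, str] = field(default_factory=dict)  # e.g. weather, construction

    def render(self) -> str:
        lines = [f"Task: {self.task}"]
        if self.prejudgment:
            lines.append(f"Engineer's pre-judgment: {self.prejudgment}")
        for name, value in self.conditions.items():
            lines.append(f"Condition - {name}: {value}")
        lines.append("Report any risks or anomalies with their time span and spatial span.")
        return "\n".join(lines)
```

Keeping the conditions as key-value pairs makes it easy to add or omit context per deployment without changing the template.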
[0229] In one possible implementation, the perception result of the environment is specifically the perception result of the environment in the spatial dimension or the temporal dimension.
[0230] For example, referring to Figure 11, Figure 11 illustrates a schematic diagram of data processing using a pre-trained visual language model.
[0231] Step 1. First, visualize the electromagnetic channel sensing signal or its view-transformed signal. The visualization includes converting the original single-channel data into three-channel data, resizing the high-dimensional matrix signal to an image size (such as 224x224), and normalizing the matrix values to [0, 1] so that they are consistent with the input of the vision model.
[0232] Step 2. Input the visualized electromagnetic channel sensing multi-view signals directly into the VLM model. Because the VLM model has been pre-trained on visual signals, it can directly describe the characteristics of the current electromagnetic channel sensing multi-view signals in language.
[0233] Step 3. In further training, (a) the original electromagnetic signal, (b) the visualized signal, and (c) a natural-language problem description are fed into the VLM model. The tokens of the electromagnetic signal must first be processed by an adapter before being fed into the pre-trained VLM model.
[0234] Step 4. The model, extended to accept electromagnetic channel sensing signals, is then trained using paired electromagnetic channel sensing data. When the VLM model is trained, the visualized electromagnetic signal serves as an intermediate modality that acts as a bridge and, together with predefined visual descriptions of the electromagnetic signals, helps the model understand the electromagnetic signal tokens.
[0235] Step 5. In actual use, input a series of electromagnetic channel sensing signals and let the trained VLM model describe what the current electromagnetic channel sensing signals represent and what the current scenario might be.
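The visualization in Step 1 (single-channel to three-channel, resizing to an image size such as 224x224, and normalization to [0, 1]) can be sketched with NumPy alone. This is a minimal sketch under assumptions: nearest-neighbour resampling and global min-max normalization are illustrative choices, the output layout (height, width, 3) is assumed, and a real pipeline would more likely use an image library's resize and the vision model's own preprocessing.

```python
import numpy as np


def visualize_for_vlm(signal, size=(224, 224)):
    """Map a 1-D or 2-D sensing signal to an (H, W, 3) image with values in [0, 1]."""
    m = np.asarray(signal, dtype=float)
    if m.ndim == 1:
        m = m[None, :]  # treat a 1-D signal as a single-row matrix
    # Nearest-neighbour resampling of rows and columns to the target size.
    rows = np.round(np.linspace(0, m.shape[0] - 1, size[0])).astype(int)
    cols = np.round(np.linspace(0, m.shape[1] - 1, size[1])).astype(int)
    m = m[np.ix_(rows, cols)]
    # Global min-max normalization; a constant input maps to all zeros.
    lo, hi = m.min(), m.max()
    m = (m - lo) / (hi - lo) if hi > lo else np.zeros_like(m)
    # Replicate the single channel into three identical channels.
    return np.repeat(m[:, :, None], 3, axis=2)
```

Replicating one channel three times keeps the image faithful to the signal; mapping values through a color map is an alternative that some vision models may interpret more easily.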
[0236] This application uses a pre-trained visual language model (VLM model), which possesses visual understanding, image description, and natural-language logical reasoning capabilities. After the multi-view features of electromagnetic signals are visualized, the VLM model immediately gains a coarse perception capability for the visualized electromagnetic channel sensing signals. An adapter can then be trained to enable the VLM model to further understand the multi-view electromagnetic channel sensing data itself (rather than only its visualized form).
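An adapter that maps electromagnetic signal tokens into the VLM's embedding space, and the assembly of the combined input sequence, can be sketched as below. This is a hypothetical sketch: the two-layer MLP with a tanh-approximated GELU, the ordering [image tokens, adapted electromagnetic tokens, text tokens] and the segment ids are assumptions chosen for illustration; in such a setup the pre-trained VLM would typically stay frozen while the adapter weights are trained.

```python
import numpy as np


def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def init_adapter(d_em, d_vlm, d_hidden, rng):
    """Random parameters for a hypothetical two-layer MLP adapter."""
    return {
        "W1": rng.normal(0, d_em ** -0.5, (d_em, d_hidden)), "b1": np.zeros(d_hidden),
        "W2": rng.normal(0, d_hidden ** -0.5, (d_hidden, d_vlm)), "b2": np.zeros(d_vlm),
    }


def adapter(em_tokens, p):
    """Project (n, d_em) electromagnetic tokens to (n, d_vlm) VLM embeddings."""
    return gelu(em_tokens @ p["W1"] + p["b1"]) @ p["W2"] + p["b2"]


def assemble_vlm_input(image_tokens, em_tokens, text_tokens, p):
    """Concatenate image, adapted EM and text tokens; segment ids mark the source
    of each position (0 = image, 1 = electromagnetic signal, 2 = text)."""
    adapted = adapter(em_tokens, p)
    seq = np.concatenate([image_tokens, adapted, text_tokens], axis=0)
    segment = np.concatenate([
        np.zeros(len(image_tokens), dtype=int),
        np.ones(len(adapted), dtype=int),
        np.full(len(text_tokens), 2, dtype=int),
    ])
    return seq, segment
```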
[0237] The embodiments of this application can quickly and effectively enable a pre-trained VLM model to understand electromagnetic channel sensing signals. Furthermore, the technical solution can directly utilize the visual perception capabilities and language-based logical reasoning capabilities of the pre-trained VLM model, laying a solid technical foundation for reasoning about electromagnetic channel signals with temporal and spatial contexts.
[0238] Referring to Figure 12, Figure 12 is a schematic diagram of an embodiment of a data processing apparatus provided in this application. As shown in Figure 12, the data processing apparatus 1200 provided in this embodiment of the application may include:
[0239] The acquisition module 1201 is configured to acquire electromagnetic signals collected from the environment;
[0240] The description of the acquisition module 1201 can be found in the description of step 901 in the above embodiments, and will not be repeated here.
[0241] The signal processing module 1202 is configured to obtain, based on the electromagnetic signals, multiple processed electromagnetic signals through multiple different signal processing devices, wherein each signal processing device performs a signal transformation in at least one of the frequency domain, the spatial domain, or the time domain.
[0242] The description of the signal processing module 1202 can be found in the description of step 902 in the above embodiments, and will not be repeated here.
[0243] The machine learning module 1203 is configured to obtain the perception result of the environment through a machine learning model based on the plurality of processed electromagnetic signals, wherein the machine learning model includes a feature extraction network (e.g., a backbone network), and the same feature extraction network is used to process each of the plurality of processed electromagnetic signals.
[0244] The description of the machine learning module 1203 can be found in the description of step 903 in the above embodiment, and will not be repeated here.
[0245] In one possible implementation, the plurality of different signal processing devices include one or more of the following:
[0246] Devices for short-time variance calculation, power spectral density calculation, short-time Fourier transform, and signal statistics.
[0247] In one possible implementation, the machine learning module is used for:
[0248] Each of the multiple processed electromagnetic signals is mapped to a token representation to obtain multiple token representations;
[0249] Based on the multiple token representations, a machine learning model is used to obtain the perception results of the environment.
[0250] In one possible implementation, the machine learning model further includes: a task network; the machine learning module is used for:
[0251] Based on the multiple token representations, the feature extraction network performs information interaction and fusion among the multiple token representations to obtain multiple feature representations;
[0252] Based on the multiple feature representations, the perception results of the environment are obtained through the task network.
[0253] In one possible implementation, the feature extraction network is a transformer model.
[0254] In one possible implementation, the machine learning model is a visual language model, and the apparatus further includes:
[0255] The mapping module is used to map the multiple processed electromagnetic signals into image data;
[0256] The machine learning module is used for:
[0257] Based on the multiple token representations of the multiple processed electromagnetic signals and the image data, the perception result of the environment is obtained through the visual language model.
[0258] In one possible implementation, the machine learning module is used for:
[0259] Obtain a prompt, which indicates one or more of the following: perform signal analysis on the electromagnetic signal, perform signal analysis on the multiple processed electromagnetic signals, and determine the perception result of the environment;
[0260] Based on the prompts, the multiple token representations of the multiple processed electromagnetic signals, and the image data, the perception result of the environment is obtained through the visual language model, wherein the feature extraction network (e.g., backbone network) used in processing the multiple processed electromagnetic signals and the processed sensor signals through the machine learning model is the same.
[0261] In one possible implementation, the machine learning module is used for:
[0262] Based on the prompts, the multiple token representations of the multiple processed electromagnetic signals, and the image data, the visual language model is used to obtain the analysis results of the electromagnetic signals in the time or spatial dimensions, as well as the perception results of the environment.
[0263] In one possible implementation, the prompt may further include at least one of the following: a prediction of the perception result of the environment, and a description of the environment.
[0264] In one possible implementation, the perception result of the environment is specifically the perception result of the environment in the spatial dimension or the temporal dimension.
[0265] In one possible implementation, the acquisition module is further configured to:
[0266] Acquire sensor signals of a type other than electromagnetic signals, collected from the environment;
[0267] Based on the sensor signal, a processed sensor signal is obtained through a signal processing method;
[0268] The machine learning module is used for:
[0269] Based on the multiple processed electromagnetic signals and the processed sensor signals, a machine learning model is used to obtain the perception results of the environment.
[0270] In one possible implementation, the machine learning module is further used for:
[0271] Based on the perception results, the machine learning model is pre-trained.
[0272] The following describes a terminal device provided in an embodiment of this application. Please refer to Figure 13, which is a structural schematic diagram of a terminal device provided in an embodiment of this application. The terminal device 1300 can specifically be a virtual reality (VR) device, a mobile phone, a tablet, a laptop, a smart wearable device, a monitoring data processing device, or a server, etc., and is not limited here. Specifically, the terminal device 1300 includes: a receiver 1301, a transmitter 1302, a processor 1303, and a memory 1304 (the number of processors 1303 in the terminal device 1300 can be one or more; Figure 13 shows one processor as an example). The processor 1303 may include an application processor 13031 and a communication processor 13032. In some embodiments of this application, the receiver 1301, transmitter 1302, processor 1303, and memory 1304 can be connected via a bus or other means.
[0273] Memory 1304 may include read-only memory and random access memory, and provides instructions and data to processor 1303. A portion of memory 1304 may also include non-volatile random access memory (NVRAM). Memory 1304 stores processor-executable operation instructions, executable modules, or data structures, or subsets or extended sets thereof, where the operation instructions may include various operation instructions for implementing various operations.
[0274] Processor 1303 controls the operation of the terminal device. In specific applications, the various components of the terminal device are coupled together through a bus system, which may include not only the data bus, but also power buses, control buses, and status signal buses. However, for clarity, all buses are referred to as the bus system in the diagram.
[0275] The methods disclosed in the embodiments of this application can be applied to or implemented by the processor 1303. The processor 1303 can be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above method can be completed by the integrated logic circuits in the hardware of the processor 1303 or by instructions in software form. The processor 1303 can be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. The processor 1303 can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application can be directly embodied in the execution of a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software module can reside in a mature storage medium in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. This storage medium is located in memory 1304. Processor 1303 reads the information in memory 1304 and, in conjunction with its hardware, completes the steps of the above method.
[0276] Receiver 1301 can be used to receive input digital or character information, and to generate signal inputs related to the settings and function control of the terminal device. Transmitter 1302 can be used to output digital or character information; transmitter 1302 can also be used to send instructions to the disk group to modify the data in the disk group.
[0277] In one embodiment of this application, the processor 1303 is used to execute the data processing method executed by the terminal device in the above embodiment.
[0278] This application embodiment also provides a server. Please refer to Figure 14. Figure 14 is a schematic diagram of a server structure provided in this application embodiment. The server 1400 can be deployed with the device described in the embodiment corresponding to Figure 12. Specifically, the server 1400 is implemented by one or more servers. The server 1400 can vary significantly due to different configurations or performance. It can include one or more central processing units (CPUs) 1414 (e.g., one or more processors) and memory 1432, and one or more storage media 1430 (e.g., one or more mass storage devices) for storing application programs 1442 or data 1444. The memory 1432 and storage media 1430 can be temporary or persistent storage. The program stored in the storage media 1430 can include one or more modules (not shown in the figure), and each module can include a series of instruction operations on the server. Furthermore, the CPU 1414 can be configured to communicate with the storage media 1430 and execute a series of instruction operations in the storage media 1430 on the server 1400.
[0279] Server 1400 may also include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, one or more input / output interfaces 1458; or, one or more operating systems 1441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
[0280] In this embodiment, the central processing unit 1414 is configured to execute the method in the embodiment corresponding to Figure 8.
[0281] This application also provides a computer program product that, when run on a computer, causes the computer to perform the steps performed by the apparatus in the foregoing embodiments.
[0282] This application also provides a computer-readable storage medium storing a program for signal processing which, when run on a computer, causes the computer to perform the steps performed by the apparatus in the foregoing embodiments.
[0283] The execution device, server, or terminal device provided in the embodiments of this application can specifically be a chip. The chip includes a processing unit and a communication unit. The processing unit can be, for example, a processor, and the communication unit can be, for example, an input / output interface, pins, or circuits. The processing unit can execute computer-executable instructions stored in a storage unit to cause the chip within the execution device to execute the data processing method described in the above embodiments, or to cause the chip within the server to execute the data processing method described in the above embodiments. Optionally, the storage unit is a storage unit within the chip, such as a register or cache. Alternatively, the storage unit can be a storage unit located outside the chip within the wireless access device, such as a read-only memory (ROM), another type of static storage device capable of storing static information and instructions, or a random access memory (RAM).
[0284] Specifically, please refer to Figure 15, which is a schematic diagram of a chip structure provided in an embodiment of this application. This chip can be represented as a neural network processor (NPU) 1500. The NPU 1500 is mounted as a coprocessor on the host CPU, and tasks are assigned by the host CPU. The core part of the NPU is the arithmetic circuit 1503, which is controlled by the controller 1504 to extract matrix data from the memory and perform multiplication operations.
[0285] In some implementations, the arithmetic circuit 1503 internally includes multiple processing engines (PEs). In some implementations, the arithmetic circuit 1503 is a two-dimensional systolic array. The arithmetic circuit 1503 can also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1503 is a general-purpose matrix processor.
[0286] For example, suppose we have an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit retrieves the corresponding data of matrix B from the weight memory 1502 and caches it in each PE of the arithmetic circuit. The arithmetic circuit retrieves the data of matrix A from the input memory 1501 and performs matrix operations with matrix B. The partial result or the final result of the obtained matrix is stored in the accumulator 1508.
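The accumulation of partial results described above can be mimicked in software by a K-tiled matrix multiplication: a slice of the weight matrix B is held (like the weights cached in the PEs) while the matching columns of A stream through, and each partial product is added into an accumulator. This is only a software analogy under stated assumptions; the tile size and function name are hypothetical and do not describe the actual dataflow of arithmetic circuit 1503.

```python
import numpy as np


def tiled_matmul_with_accumulator(A, B, tile_k=4):
    """Compute C = A @ B by accumulating partial products over K-slices."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    acc = np.zeros((m, n))              # plays the role of the accumulator
    for k0 in range(0, k, tile_k):
        b_tile = B[k0:k0 + tile_k]      # weight slice held while inputs stream
        a_tile = A[:, k0:k0 + tile_k]   # matching input columns
        acc += a_tile @ b_tile          # partial result accumulated
    return acc                          # final result once all slices are added
```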
[0287] Unified memory 1506 is used to store input and output data. Weight data is directly transferred to weight memory 1502 via Direct Memory Access Controller (DMAC) 1505. Input data is also transferred to unified memory 1506 via DMAC.
[0288] The bus interface unit (BIU) 1510 is used for interaction between the AXI bus and both the DMAC 1505 and the instruction fetch buffer (IFB) 1509.
[0289] Specifically, the BIU 1510 is used by the instruction fetch buffer 1509 to fetch instructions from the external memory, and by the direct memory access controller 1505 to fetch the original data of the input matrix A or the weight matrix B from the external memory.
[0290] The DMAC is mainly used to move input data from the external memory (DDR) to the unified memory 1506, to move weight data to the weight memory 1502, or to move input data to the input memory 1501.
[0291] The vector computation unit 1507 includes multiple arithmetic processing units and, when needed, further processes the output of the arithmetic circuit 1503, for example through vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparisons. It is mainly used for computation in the non-convolutional and non-fully-connected layers of a neural network, such as batch normalization, pixel-level summation, and upsampling of feature planes.
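The three example operations named for the vector computation unit can be sketched in NumPy to show what each one computes on a feature map. This is an illustrative sketch under stated assumptions, not the unit's instruction set: feature maps are assumed to use an (N, C, H, W) layout, batch normalization is shown in its training-mode form (batch statistics rather than running statistics), and nearest-neighbour repetition is chosen as one possible upsampling method.

```python
import numpy as np


def batch_norm(x: np.ndarray, gamma: np.ndarray, beta: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each channel of an (N, C, H, W) map over the batch and spatial axes."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # per-channel scale and shift
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)


def pixel_sum(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Pixel-level (element-wise) summation of two feature maps of equal shape."""
    if x.shape != y.shape:
        raise ValueError("feature maps must have the same shape")
    return x + y


def upsample_nearest(x: np.ndarray, factor: int = 2) -> np.ndarray:
    """Upsample the spatial axes (H, W) by repeating each pixel `factor` times."""
    return x.repeat(factor, axis=2).repeat(factor, axis=3)
```

All three are element-wise or per-channel reductions with no weight matrix involved, which is why they are a natural fit for a vector unit rather than the matrix-oriented arithmetic circuit.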
[0292] In some implementations, the vector computation unit 1507 can store the processed output vector in the unified memory 1506. For example, the vector computation unit 1507 can apply a linear or nonlinear function to the output of the arithmetic circuit 1503, for example by performing linear interpolation on a feature plane extracted by a convolutional layer, or by applying the function to a vector of accumulated values, to generate activation values. In some implementations, the vector computation unit 1507 generates normalized values, pixel-level summed values, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 1503, for example, in a subsequent layer of the neural network.
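The loop described in the preceding paragraph (accumulated values turned into activations, stored, then fed back as the input of the next layer) can be modelled as a two-layer forward pass. This is a minimal sketch under assumptions: the function names are hypothetical, the dictionary only stands in for the unified memory 1506, and the choice of ReLU, identity, and sigmoid as example functions is illustrative, since the description says only that a linear or nonlinear function may be applied.

```python
import numpy as np


def vector_unit_activate(accumulated: np.ndarray, kind: str = "relu") -> np.ndarray:
    """Apply an example linear or nonlinear function to accumulated values."""
    if kind == "relu":
        return np.maximum(accumulated, 0)
    if kind == "linear":
        return accumulated
    if kind == "sigmoid":
        return 1.0 / (1.0 + np.exp(-accumulated))
    raise ValueError(f"unsupported activation: {kind}")


def two_layer_forward(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    unified_memory = {}  # hypothetical stand-in for the unified memory 1506
    acc1 = x @ w1        # arithmetic circuit output, summed in the accumulator
    unified_memory["act1"] = vector_unit_activate(acc1, "relu")
    # stored activations become the activation input of the next layer
    acc2 = unified_memory["act1"] @ w2
    return vector_unit_activate(acc2, "linear")
```

For example, with `x = [[1.0, -2.0]]`, an identity `w1`, and `w2 = [[1.0], [1.0]]`, the ReLU zeroes the negative entry before the second matrix multiplication, so the result is `[[1.0]]`.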
[0293] The instruction fetch buffer 1509 connected to the controller 1504 is used to store instructions used by the controller 1504.
[0294] The unified memory 1506, the input memory 1501, the weight memory 1502, and the instruction fetch buffer 1509 are all on-chip memories. The external memory is private to this NPU hardware architecture.
[0295] The processor mentioned above can be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control the execution of the above program.
[0296] It should also be noted that the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. In addition, in the device embodiment drawings provided in this application, the connection relationship between modules indicates that they have a communication connection, which can be implemented as one or more communication buses or signal lines.
[0297] From the above description of the embodiments, a person skilled in the art can clearly understand that this application may be implemented by software plus necessary general-purpose hardware, or by special-purpose hardware including application-specific integrated circuits, special-purpose CPUs, special-purpose memories, special-purpose components, and the like. Generally, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structure used to implement the same function may vary, for example, an analog circuit, a digital circuit, or a special-purpose circuit. However, for this application, a software implementation is the better choice in most cases. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc, and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of this application.
[0298] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product.
[0299] The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, over a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or in a wireless manner (for example, by infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state disk (SSD)).
Claims
1. A data processing method, characterized in that the method comprises: acquiring an electromagnetic signal collected from an environment; obtaining, based on the electromagnetic signal, multiple processed electromagnetic signals by using multiple different signal processing methods, wherein each signal processing method performs signal transformation in at least one of a frequency domain, a spatial domain, or a time domain; and obtaining a perception result of the environment based on the multiple processed electromagnetic signals by using a machine learning model, wherein the machine learning model comprises a feature extraction network, and the same feature extraction network is used to process each of the multiple processed electromagnetic signals.
2. The method according to claim 1, characterized in that the signal processing methods comprise one or more of the following: short-time variance calculation, power spectral density calculation, short-time Fourier transform, and signal statistics calculation.
3. The method according to claim 1 or 2, characterized in that the method further comprises: mapping each of the multiple processed electromagnetic signals to a token representation to obtain multiple token representations; and the obtaining a perception result of the environment based on the multiple processed electromagnetic signals by using a machine learning model comprises: obtaining the perception result of the environment based on the multiple token representations by using the machine learning model.
4. The method according to claim 3, characterized in that the machine learning model further comprises a task network, and the obtaining the perception result of the environment based on the multiple token representations by using the machine learning model comprises: performing, by the feature extraction network, information interaction and fusion among the multiple token representations to obtain multiple feature representations; and obtaining the perception result of the environment based on the multiple feature representations by using the task network.
5. The method according to claim 4, characterized in that the feature extraction network is a transformer model.
6. The method according to any one of claims 3 to 5, characterized in that the machine learning model is a visual language model, and the method further comprises: mapping the multiple processed electromagnetic signals into image data; and obtaining the perception result of the environment based on the multiple token representations of the multiple processed electromagnetic signals and the image data by using the visual language model.
7. The method according to claim 6, characterized in that the obtaining the perception result of the environment based on the multiple token representations of the multiple processed electromagnetic signals and the image data by using the visual language model comprises: obtaining a prompt, wherein the prompt indicates one or more of the following: performing signal analysis on the electromagnetic signal, performing signal analysis on the multiple processed electromagnetic signals, and determining the perception result of the environment; and obtaining the perception result of the environment based on the prompt, the multiple token representations of the multiple processed electromagnetic signals, and the image data by using the visual language model.
8. The method according to claim 7, characterized in that the obtaining the perception result of the environment based on the prompt, the multiple token representations of the multiple processed electromagnetic signals, and the image data by using the visual language model comprises: obtaining, based on the prompt, the multiple token representations of the multiple processed electromagnetic signals, and the image data by using the visual language model, an analysis result of the electromagnetic signal in the time dimension or the spatial dimension, as well as the perception result of the environment.
9. The method according to claim 7 or 8, characterized in that the prompt further comprises at least one of the following: predictive information about the perception result of the environment, and descriptive information about the environment.
10. The method according to any one of claims 1 to 9, characterized in that the perception result of the environment is a perception result of the environment in the spatial dimension or the temporal dimension.
11. The method according to any one of claims 1 to 10, characterized in that the method further comprises: acquiring a sensor signal collected from the environment, the sensor signal being different from the electromagnetic signal; and obtaining a processed sensor signal from the sensor signal by using a signal processing method; and the obtaining a perception result of the environment based on the multiple processed electromagnetic signals by using a machine learning model comprises: obtaining the perception result of the environment based on the multiple processed electromagnetic signals and the processed sensor signal by using the machine learning model, wherein the same feature extraction network is used to process the multiple processed electromagnetic signals and the processed sensor signal.
12. The method according to claim 11, characterized in that the method further comprises: pre-training the machine learning model based on the perception result.
13. A data processing apparatus, characterized in that the apparatus comprises: an acquisition module, configured to acquire an electromagnetic signal collected from an environment; a signal processing module, configured to obtain, based on the electromagnetic signal, multiple processed electromagnetic signals by using multiple different signal processing devices, wherein each signal processing device performs signal transformation in at least one of a frequency domain, a spatial domain, or a time domain; and a machine learning module, configured to obtain a perception result of the environment based on the multiple processed electromagnetic signals by using a machine learning model, wherein the machine learning model comprises a feature extraction network, and the same feature extraction network is used to process each of the multiple processed electromagnetic signals.
14. The apparatus according to claim 13, characterized in that the multiple different signal processing devices comprise one or more of the following: devices for short-time variance calculation, power spectral density calculation, short-time Fourier transform, and signal statistics calculation.
15. The apparatus according to claim 13 or 14, characterized in that the machine learning module is configured to: map each of the multiple processed electromagnetic signals to a token representation to obtain multiple token representations; and obtain the perception result of the environment based on the multiple token representations by using the machine learning model.
16. The apparatus according to claim 15, characterized in that the machine learning model further comprises a task network, and the machine learning module is configured to: perform, by using the feature extraction network, information interaction and fusion among the multiple token representations to obtain multiple feature representations; and obtain the perception result of the environment based on the multiple feature representations by using the task network.
17. The apparatus according to claim 16, characterized in that the feature extraction network is a transformer model.
18. The apparatus according to any one of claims 15 to 17, characterized in that the machine learning model is a visual language model, and the apparatus further comprises: a mapping module, configured to map the multiple processed electromagnetic signals into image data; and the machine learning module is configured to obtain the perception result of the environment based on the multiple token representations of the multiple processed electromagnetic signals and the image data by using the visual language model.
19. The apparatus according to claim 18, characterized in that the machine learning module is configured to: obtain a prompt, wherein the prompt indicates one or more of the following: performing signal analysis on the electromagnetic signal, performing signal analysis on the multiple processed electromagnetic signals, and determining the perception result of the environment; and obtain the perception result of the environment based on the prompt, the multiple token representations of the multiple processed electromagnetic signals, and the image data by using the visual language model.
20. The apparatus according to any one of claims 13 to 19, characterized in that the acquisition module is further configured to: acquire a sensor signal collected from the environment, the sensor signal being different from the electromagnetic signal; and obtain a processed sensor signal from the sensor signal by using a signal processing method; and the machine learning module is configured to obtain the perception result of the environment based on the multiple processed electromagnetic signals and the processed sensor signal by using the machine learning model, wherein the same feature extraction network is used to process the multiple processed electromagnetic signals and the processed sensor signal.
21. The apparatus according to claim 20, characterized in that the machine learning module is further configured to pre-train the machine learning model based on the perception result.
22. A computer storage medium, characterized in that the computer storage medium stores one or more instructions that, when executed by one or more computers, cause the one or more computers to perform the method according to any one of claims 1 to 12.
23. A computer program product, characterized in that the computer program product comprises computer-readable instructions that, when executed on a computer device, cause the computer device to perform the method according to any one of claims 1 to 12.
24. A system, characterized in that the system comprises at least one processor and at least one memory, wherein the at least one processor and the at least one memory are connected to and communicate with each other via a communication bus; the at least one memory is configured to store code; and the at least one processor is configured to execute the code to perform the method according to any one of claims 1 to 12.
25. A chip, characterized in that the chip comprises at least one processing unit and an interface circuit, wherein the interface circuit is configured to provide program instructions or data to the at least one processing unit, and the at least one processing unit is configured to execute the program instructions to implement the method according to any one of claims 1 to 12.
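The pipeline recited in claims 1 to 5 (several domain transforms of one signal, one token per processed signal, a single shared feature extraction network that fuses the tokens, and a task network) can be illustrated with a toy NumPy sketch. Everything here is an assumption for illustration and is not the applicant's implementation: the frame length and hop, the token width `D`, the log-compressed random linear projection used as the token mapping, the single-head self-attention block standing in for a transformer, and the mean-pool plus linear task head are all hypothetical, and the weights are random and untrained, so the output logits carry no meaning.

```python
import numpy as np

D = 16  # hypothetical token width


def frames(x, win=64, hop=32):
    # split a 1-D signal into overlapping frames of length `win`
    n = 1 + (len(x) - win) // hop
    return np.stack([x[i * hop : i * hop + win] for i in range(n)])


def short_time_variance(x):
    return frames(x).var(axis=1)                  # time domain


def power_spectral_density(x):
    return np.abs(np.fft.rfft(x)) ** 2 / len(x)   # frequency domain (periodogram)


def stft_magnitude(x, win=64):
    # time-frequency domain: Hann-windowed frames -> magnitude spectra
    return np.abs(np.fft.rfft(frames(x, win) * np.hanning(win), axis=1))


def signal_statistics(x):
    mu, sd = x.mean(), x.std()
    z = (x - mu) / sd
    return np.array([mu, sd, (z ** 3).mean(), (z ** 4).mean(), np.abs(x).max()])


def multi_domain_views(x):
    return [short_time_variance(x), power_spectral_density(x),
            stft_magnitude(x), signal_statistics(x)]


def init_params(view_sizes, n_classes=3, seed=0):
    # random, untrained weights: only the shapes are meaningful
    rng = np.random.default_rng(seed)
    proj = [rng.normal(size=(s, D)) / np.sqrt(s) for s in view_sizes]
    attn = [rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3)]
    head = rng.normal(size=(D, n_classes)) / np.sqrt(D)
    return {"proj": proj, "attn": attn, "head": head}


def to_tokens(views, proj):
    # one token per processed signal: log-compress, flatten, project to D dims
    return np.stack([np.log1p(np.abs(v.ravel())) @ p for v, p in zip(views, proj)])


def layer_norm(h, eps=1e-5):
    return (h - h.mean(-1, keepdims=True)) / np.sqrt(h.var(-1, keepdims=True) + eps)


def shared_attention_block(tokens, wq, wk, wv):
    # every token attends to every token: information interaction and fusion
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(D)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return layer_norm(tokens + weights @ v)


def perceive(x, params):
    tokens = to_tokens(multi_domain_views(x), params["proj"])
    features = shared_attention_block(tokens, *params["attn"])  # one network for all views
    logits = features.mean(axis=0) @ params["head"]           # task network
    return features, logits
```

The point the sketch tries to make concrete is that the transforms produce views of very different sizes, and the per-view token mapping brings them to a common width so that one set of feature extraction weights can process all of them together. A processed sensor signal (claim 11) could, under the same assumptions, be appended as one more token.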