An intelligent cabin environment self-adaptive adjustment method and related device

By combining a heterogeneous dual-stream temporal prediction method with convolutional and recurrent neural networks and a linear Transformer, the invention addresses the regulation of both transient changes and long-term comfort patterns in intelligent cockpits, achieving efficient multimodal environmental control and improving the response speed and personalized adaptation of cockpit environment regulation.

CN122308115A · Pending · Publication Date: 2026-06-30 · SOUTH CHINA UNIV OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SOUTH CHINA UNIV OF TECH
Filing Date
2026-05-29
Publication Date
2026-06-30

Abstract

This application provides an intelligent cockpit environment adaptive adjustment method and related equipment, belonging to the field of intelligent cockpits and deep learning time series prediction. The method includes: acquiring multimodal sensor data within the cockpit in real time and preprocessing it to obtain an input sequence tensor; inputting the input sequence tensor into parallel dual-stream networks, extracting local features of transient environmental changes through convolutional and recurrent neural networks, and extracting global features of long-term comfort patterns through a low-rank approximate linear Transformer; calculating a fusion coefficient and dynamically weighting and fusing the local and global features; and inputting the fused features into a multi-task decoder to generate parallel control commands for air conditioning, ambient lighting, audio, and seat accessories, which are output to the corresponding actuators to adjust the physical environment of the cockpit. Through heterogeneous dual-stream adaptive fusion, this application suppresses temperature overshoot while accounting for personalized comfort patterns over long journeys, improving the response speed and comfort of cockpit environment adjustment.

Description

Technical Field

[0001] This application relates to the field of intelligent cockpit control, and in particular to an intelligent cockpit environment adaptive adjustment method and related equipment.

Background Technology

[0002] With the development of new energy vehicles and intelligent connected vehicles, intelligent cockpits have become a core component for enhancing passenger comfort and immersive experiences. The cockpit environment adaptive adjustment system needs to process two types of timing signals simultaneously: The first category is transient events, such as sudden drops in temperature and humidity caused by opening a car door, passengers sitting down, sudden changes in the angle of direct sunlight, and airflow impacts when a car window is opened. These events occur on a timescale of milliseconds to seconds, requiring the control system to have extremely high responsiveness; otherwise, it can lead to uncomfortable experiences such as temperature overshoot, sudden changes in airflow, and abrupt changes in lighting conditions.

[0003] The second category is long-term comfort patterns, such as daytime / nighttime temperature preferences, cabin atmosphere settings during commuting, and personalized needs at different stages of the journey (departure, driving, and imminent arrival). These patterns typically span from tens of minutes to several hours, requiring the system to have long-term memory and the ability to extract periodic patterns.

[0004] In existing technologies, traditional control strategies (such as PID control and fuzzy-logic rule bases) struggle to balance fast response against overall comfort, and generally suffer from the following problems: adjustment lag that leaves passengers waiting too long; significant temperature overshoot (such as excessively cold air conditioning when boarding in summer); and insufficient personalization, failing to adapt to the preferences of different passengers.

[0005] In recent years, deep learning-based time series prediction models have been introduced into the field of cockpit control. However, existing single-network recurrent architectures (such as LSTM and GRU) suffer from vanishing gradients when faced with extremely long look-back windows. The standard Transformer can capture long-range dependencies, but its computational complexity grows quadratically with the sequence length, which makes it difficult to deploy on cockpit domain controllers with limited computing power, and it responds to local abrupt events with an inherent lag. In addition, when heterogeneous networks (such as LSTM and Transformer) are trained jointly, their convergence speeds are often mismatched, making it difficult for the model to converge to the global optimum.

Summary of the Invention

[0006] The main objective of this application is to propose an intelligent cockpit environment adaptive adjustment method, system, electronic device, storage medium, and program product based on heterogeneous dual-stream timing prediction, which can achieve millisecond-level compensation for transient changes and high-precision tracking of long-term comfort patterns, and ultimately generate multimodal cockpit environment control commands.

[0007] To achieve the above objectives, one aspect of this application proposes an intelligent cockpit environment adaptive adjustment method, the method comprising:
collecting multimodal sensing data within the intelligent cockpit in real time, the multimodal sensing data including at least in-vehicle temperature, light intensity, passenger status, and vehicle status data, and preprocessing the multimodal sensing data to obtain an input sequence tensor;
inputting the input sequence tensor into a parallel two-stream network for feature extraction, the two-stream network including a first-stream network and a second-stream network, wherein the first-stream network extracts local temporal feature vectors reflecting transient environmental changes through convolutional and recurrent neural networks, and the second-stream network extracts global dependency feature vectors reflecting long-term comfort patterns through a linear Transformer (Linformer) based on low-rank matrix approximation;
calculating a fusion coefficient based on a learnable scalar parameter, and dynamically weighting and fusing the local temporal feature vector and the global dependency feature vector according to the fusion coefficient to obtain a fused feature representation vector;
inputting the fused feature representation vector into a multi-task decoder to generate at least one of the following cockpit environment control commands: air conditioning control command, ambient lighting control command, audio control command, and seat accessory control command;
outputting the cockpit environment control commands to the corresponding cockpit actuators to adjust the physical environment inside the cockpit.

[0008] In some embodiments, the first-stream network includes a one-dimensional convolutional layer (1D-CNN) and a bidirectional long short-term memory network (BiLSTM) connected in sequence, wherein the one-dimensional convolutional layer performs a convolution operation on the input sequence tensor to extract transient event features, including a door opening, a passenger sitting down, and a sudden change in the angle of sunlight; and the bidirectional long short-term memory network processes the feature sequence obtained after the convolution operation and outputs a local temporal feature vector containing contextual information.

[0009] In some embodiments, the second-stream network is a Linformer network, and the step of extracting a global dependency feature vector reflecting long-term comfort patterns using a linear Transformer based on low-rank matrix approximation includes: performing a linear mapping on the input sequence tensor to generate a query matrix, a key matrix, and a value matrix; projecting the key matrix and the value matrix onto a low-dimensional space along the sequence length dimension using two low-rank projection matrices to obtain compressed key and value matrices; and calculating linear attention based on the query matrix, the compressed key matrix, and the compressed value matrix, and outputting a global dependency feature vector representing the trip-stage preferences and the personalized user profile.

[0010] In some embodiments, the fusion coefficients are calculated by setting a learnable scalar parameter. The scalar parameter is initialized to a preset negative value, and the fusion coefficient is obtained by mapping using the Sigmoid function. The initial fusion coefficient is less than 0.5, which makes the prediction dominated by the globally dependent feature vector in the early stage of training. As the gradient backpropagates during training, the learnable scalar parameter is automatically increased, gradually increasing the weight of the local temporal feature vector in the fusion.

[0011] In some embodiments, the preset negative value is -2.0, corresponding to an initial fusion coefficient of approximately 0.12.

[0012] In some embodiments, the method further includes a dynamic optimization step of the fusion weights based on prediction error feedback, wherein the dynamic optimization of the fusion weights based on prediction error feedback includes: The mean square error (MSE) between the predicted value and the actual value output by the multi-task decoder is used as the global loss function; During backpropagation, the gradient of the global loss function with respect to the learnable scalar parameters is continuously calculated. Based on the feedback from the gradient, in the early stage of training, the local feature weights are kept at a low level to prioritize optimizing the global trend fitting. In the later stage of training, when the global loss decreases gradually, the gradient automatically drives the learnable scalar parameter to increase, thereby improving the local feature weights.

[0013] This dynamic optimization process enables the model to automatically adapt to the intensity of transient disturbances and the stability of long-term patterns under different driving scenarios.

[0014] In some embodiments, the multi-task decoder is a multilayer perceptron (MLP), and the parallel outputs of the multilayer perceptron include: target temperature, airflow level, air outlet mode, internal and external circulation mode; light brightness, color temperature, color mode; volume, sound field mode; seat heating level, ventilation level, and fragrance release concentration.

[0015] In some embodiments, the multi-task decoder also outputs predicted passenger thermal comfort values (such as the PMV index) for future time periods to achieve a closed-loop evaluation of the cabin environment control effect.

[0016] To achieve the above objectives, another aspect of this application proposes an intelligent cockpit environment adaptive adjustment system, comprising:
Data acquisition and preprocessing module: used to acquire multimodal sensing data in the intelligent cockpit in real time and preprocess the multimodal sensing data to obtain an input sequence tensor, the multimodal sensing data including in-vehicle temperature, light intensity, passenger status, and vehicle status data;
Dual-stream heterogeneous feature extraction module: includes a first feature extraction unit and a second feature extraction unit arranged in parallel, wherein the first feature extraction unit uses convolutional and recurrent neural networks to extract local temporal feature vectors that reflect transient environmental changes, and the second feature extraction unit uses a linear Transformer based on low-rank matrix approximation to extract global dependency feature vectors that reflect long-term comfort patterns;
Adaptive fusion module: used to calculate a fusion coefficient based on a learnable scalar parameter, dynamically weight and fuse the local temporal feature vector and the global dependency feature vector according to the fusion coefficient, and output the fused feature representation vector;
Multi-task control command generation module: used to decode the fused feature representation vector and generate at least one cockpit environment control command in parallel, the control command including one or more of the following: air conditioning control command, ambient lighting control command, audio control command, and seat accessory control command;
Command output interface: used to send the cockpit environment control commands to the corresponding cockpit actuators to adjust the physical environment inside the cockpit.

[0017] To achieve the above objectives, another aspect of this application provides an electronic device, which includes a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the above-described method.

[0018] To achieve the above objectives, another aspect of the embodiments of this application proposes a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described method.

[0019] The embodiments of this application include at least the following beneficial effects: This application provides an intelligent cockpit environment adaptive adjustment method, system, electronic device, storage medium, and program product based on heterogeneous dual-stream timing prediction. By combining a parallel high-frequency micro-sensing module and macro-trend insight module with adaptive gating fusion, the solution achieves millisecond-level compensation for transient changes and high-precision tracking of long-term comfort patterns, and finally generates multimodal cockpit environment control commands.

Attached Figure Description

[0020] Figure 1 is a schematic diagram of the overall architecture of the intelligent cockpit environment adaptive adjustment system based on heterogeneous dual-stream timing prediction in this application.

[0021] Figure 2 is a flowchart illustrating the training process of the asymmetric-initialization dual-stream adaptive temporal prediction algorithm described in this application.

[0022] Figure 3 is a comparison chart of the overall performance of the model in this application against existing baseline models (Bi-LSTM, pure Linformer) on the test set.

[0023] Figure 4 is a comparison chart of the model's local predictions and the actual values over the last 100 steps of the test set, showing the response accuracy at the turning points of peaks and troughs.

[0024] Figure 5 is a bar chart comparing the mean squared error (MSE) of the model in this application with that of the pure Linformer model.

[0025] Figure 6 illustrates the final fit and the error gap of the model in this application on the test set.

[0026] Figure 7 is a schematic diagram of the hardware structure of the electronic device provided in the embodiments of this application.

[0027] Figure 8 is a flowchart of the steps of the intelligent cockpit environment adaptive adjustment method provided in the embodiments of this application.

Detailed Implementation

[0028] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to limit it. In the following description, when referring to the accompanying drawings, unless otherwise indicated, the same numbers in different drawings represent the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with those of this application; they are merely examples of apparatuses and methods consistent with some aspects of the embodiments of this application as detailed in the appended claims.

[0029] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of this application only and is not intended to limit this application.

[0030] As shown in Figure 8, this embodiment provides a method for adaptive adjustment of the intelligent cockpit environment, which specifically includes the following steps:
Step S1: Collect multimodal sensing data in the intelligent cockpit in real time. The multimodal sensing data includes at least the in-vehicle temperature, light intensity, passenger status, and vehicle status data. The multimodal sensing data is preprocessed to obtain the input sequence tensor.
Step S2: Input the input sequence tensor into a parallel two-stream network for feature extraction. The two-stream network includes a first-stream network and a second-stream network. The first-stream network extracts local temporal feature vectors reflecting transient environmental changes through convolutional and recurrent neural networks. The second-stream network extracts global dependency feature vectors reflecting long-term comfort patterns through a linear Transformer (Linformer) based on low-rank matrix approximation.
Step S3: Calculate the fusion coefficient based on a learnable scalar parameter, and dynamically weight and fuse the local temporal feature vector and the global dependency feature vector according to the fusion coefficient to obtain the fused feature representation vector.
Step S4: Input the fused feature representation vector into the multi-task decoder to generate at least one of the following cockpit environment control commands: air conditioning control command, ambient lighting control command, audio control command, and seat accessory control command.
Step S5: Output the cockpit environment control commands to the corresponding cockpit actuators to adjust the physical environment inside the cockpit.

[0031] The solutions of the embodiments of this application will be described in detail below with reference to the accompanying drawings and specific application examples.

[0032] (1) Intelligent cockpit environment adaptive adjustment system architecture
This embodiment provides an intelligent cockpit environment adaptive adjustment system based on heterogeneous dual-stream timing prediction, the overall architecture of which is shown in Figure 1.

[0033] 1.1) Data Acquisition and Preprocessing
The system collects multimodal time-series data in real time through an onboard sensor network, including:
a) Environmental parameters: multi-point in-vehicle temperature (accuracy […]), outside temperature, and light intensity (0-100000 lux);
b) Passenger status: seat pressure sensor (detects passenger seating position) and in-vehicle infrared camera (outputs passenger count and posture, after anonymization);
c) Vehicle status: vehicle speed, door status (open / closed), window opening, current air conditioning mode, and seat heating / ventilation feedback value;
d) Trip information: trip stage (departure / driving / approaching arrival) and estimated remaining time from the navigation system.

[0034] All sensor data are sampled synchronously at a frequency between 10 Hz and 100 Hz. After normalization and imputation of missing values, a sliding window (window length L = 120 time steps, configurable, corresponding to 12 seconds at 10 Hz down to 1.2 seconds at 100 Hz) forms the input sequence tensor $X \in \mathbb{R}^{B \times L \times C}$, where $B$ is the batch size, $L$ is the sequence length, and $C$ is the number of input channels (in this embodiment, $C$ = […]).
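
The preprocessing described above can be sketched for a single sensor channel as follows. This is only an illustration under stated assumptions: the source does not say which normalization or imputation rule is used, so min-max scaling and forward-fill are assumed here, and all function names are hypothetical.

```python
from typing import List, Optional

WINDOW_LEN = 120  # L = 120 time steps, as in this embodiment


def forward_fill(samples: List[Optional[float]]) -> List[float]:
    """Replace missing readings (None) with the last valid reading.

    Leading gaps are filled with the first valid reading.
    Forward-fill is an assumed imputation rule, not taken from the source.
    """
    first_valid = next((s for s in samples if s is not None), 0.0)
    filled, last = [], first_valid
    for s in samples:
        if s is not None:
            last = s
        filled.append(last)
    return filled


def min_max_normalize(samples: List[float], lo: float, hi: float) -> List[float]:
    """Scale readings from the sensor range [lo, hi] into [0, 1] (assumed rule)."""
    span = hi - lo
    return [(s - lo) / span for s in samples]


def sliding_windows(samples: List[float], length: int = WINDOW_LEN,
                    step: int = 1) -> List[List[float]]:
    """Cut one channel into overlapping windows of `length` time steps."""
    return [samples[i:i + length] for i in range(0, len(samples) - length + 1, step)]


# Example: light intensity (0-100000 lux) with two dropped samples.
raw = [50000.0, None, 52000.0, None] + [60000.0] * 118
clean = min_max_normalize(forward_fill(raw), 0.0, 100000.0)
windows = sliding_windows(clean)

# Window duration in seconds at the two ends of the sampling range.
window_seconds = {hz: WINDOW_LEN / hz for hz in (10, 100)}
```

Stacking the windows of all C channels, and then batching B of them, would give a tensor of the shape B × L × C described in the paragraph above.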

[0035] 1.2) Dual-stream heterogeneous feature extraction
1.2.1) Left branch (Branch A in Figure 1, corresponding to the high-frequency micro-sensing module)
The input tensor $X$ first passes through a one-dimensional convolutional layer. This embodiment uses three parallel convolution kernels (sizes 3, 5, and 7), each with a set number of output channels, a stride of 1, and "same" padding, so that the convolution extracts short-term fluctuation features at different time scales. Its mathematical expression is:

$$c_t^{(k)} = \mathrm{ReLU}\left(\sum_{i=1}^{k} W_i^{(k)}\, x_{t+i-1} + b^{(k)}\right)$$

[0036] where $k$ is the size of the convolution kernel, $W_i^{(k)}$ is the weight matrix of the kernel at the $i$-th position, $x_{t+i-1}$ is a local slice vector of the input sequence, $b^{(k)}$ is a learnable bias vector, and $\mathrm{ReLU}$ is the rectified linear unit activation function. This process extracts short-term fluctuation features and achieves preliminary feature dimensionality reduction. The outputs of the three branches are concatenated along the channel dimension to obtain the feature map $F_{\mathrm{conv}}$.
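
A minimal pure-Python sketch of the multi-scale convolution described above (three parallel kernels of sizes 3, 5, and 7, stride 1, "same" padding, ReLU, channel-wise concatenation). The fixed averaging weights and the helper names are hypothetical; in the embodiment the weights are learned and the number of output channels per kernel is not recoverable from the text.

```python
from typing import List

Vector = List[float]


def conv1d_same(x: List[Vector], weights: List[List[Vector]], bias: Vector) -> List[Vector]:
    """1D convolution over time with stride 1, zero 'same' padding, and ReLU.

    x:       L time steps, each a vector of C input channels
    weights: weights[i][o] is the length-C weight row for kernel tap i, output channel o
    bias:    one bias per output channel
    """
    k, n_out, n_in = len(weights), len(bias), len(x[0])
    pad = k // 2  # odd kernels (3, 5, 7) are centred on the current time step
    zero = [0.0] * n_in
    padded = [zero] * pad + x + [zero] * pad
    out = []
    for t in range(len(x)):
        row = []
        for o in range(n_out):
            acc = bias[o]
            for i in range(k):
                acc += sum(w * v for w, v in zip(weights[i][o], padded[t + i]))
            row.append(max(0.0, acc))  # ReLU
        out.append(row)
    return out


def multi_scale_features(x: List[Vector], branches) -> List[Vector]:
    """Run every (weights, bias) branch and concatenate along the channel axis."""
    outputs = [conv1d_same(x, w, b) for w, b in branches]
    return [sum((branch[t] for branch in outputs), []) for t in range(len(x))]


def averaging_branch(k: int, n_in: int):
    """Hypothetical fixed weights: one output channel that averages k time steps."""
    return [[[1.0 / (k * n_in)] * n_in] for _ in range(k)], [0.0]


# A step change (for example, a door opening) in a single-channel signal.
signal = [[0.0]] * 5 + [[1.0]] * 5
features = multi_scale_features(signal, [averaging_branch(k, 1) for k in (3, 5, 7)])
```

With these averaging kernels, the narrow kernel reacts to the step most sharply and the wide kernel most smoothly, which is the intuition behind extracting fluctuation features at several time scales.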

[0037] The feature map $F_{\mathrm{conv}}$ output by the convolutional layer is then fed into a bidirectional long short-term memory (BiLSTM) network. The BiLSTM consists of two independent LSTM layers that process the sequence in the forward and backward directions, respectively. The state update equations of a single LSTM unit are:

[0038] $$f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right), \qquad i_t = \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right), \qquad o_t = \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right)$$

[0039] $$\tilde{C}_t = \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right)$$

[0040] $$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

[0041] $$h_t = o_t \odot \tanh(C_t)$$

[0042] In the above formulas, $x_t$ is the input vector at time step $t$ (i.e., the output of the convolutional layer, $F_{\mathrm{conv}}$, at step $t$); $h_{t-1}$ is the hidden state vector from the previous time step; $f_t$, $i_t$, and $h_t$ are the forget gate, the input gate, and the hidden state at the current time step, respectively, and $o_t$ is the output gate; $\tilde{C}_t$ and $C_t$ are the candidate cell state and the final cell state vector, respectively; $\sigma$ is the Sigmoid activation function; $\tanh$ is the hyperbolic tangent activation function; and $\odot$ is the Hadamard product. The bidirectional mechanism computes the forward hidden state $\overrightarrow{h}_t$ and the backward hidden state $\overleftarrow{h}_t$ separately; their concatenation yields the local temporal feature vector $H_{\mathrm{local}} = [\overrightarrow{h}_t; \overleftarrow{h}_t]$ containing contextual information. This embodiment takes a hidden size of 64 per direction, so the feature dimension is 128. This feature vector contains the temporal context of transient events such as a door opening, the impact of a passenger sitting down, and a sudden change in sunlight.
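
The LSTM update and the bidirectional concatenation can be sketched in plain Python as below. This uses the standard LSTM cell formulation; the zero-valued parameters are placeholders chosen only so the example runs quickly, and the per-direction hidden size of 64 is inferred from the stated 128-dimensional output rather than quoted from the source.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(w: Matrix, b: Vector, v: Vector) -> Vector:
    """Compute w @ v + b for a small dense layer."""
    return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(w, b)]


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector,
              params: Dict[str, Tuple[Matrix, Vector]]) -> Tuple[Vector, Vector]:
    """One LSTM update; every gate sees the concatenation [h_prev, x_t]."""
    z = h_prev + x_t
    f = [sigmoid(a) for a in affine(*params["f"], z)]        # forget gate
    i = [sigmoid(a) for a in affine(*params["i"], z)]        # input gate
    o = [sigmoid(a) for a in affine(*params["o"], z)]        # output gate
    c_hat = [math.tanh(a) for a in affine(*params["c"], z)]  # candidate cell state
    c_t = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, c_hat)]
    h_t = [oj * math.tanh(cj) for oj, cj in zip(o, c_t)]
    return h_t, c_t


def run_direction(xs: List[Vector], params, hidden: int) -> List[Vector]:
    """Unroll one LSTM layer over the sequence, starting from zero states."""
    h, c, states = [0.0] * hidden, [0.0] * hidden, []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, params)
        states.append(h)
    return states


def bilstm(xs: List[Vector], fwd_params, bwd_params, hidden: int) -> List[Vector]:
    """Concatenate forward and backward hidden states at every time step."""
    fwd = run_direction(xs, fwd_params, hidden)
    bwd = run_direction(xs[::-1], bwd_params, hidden)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


HIDDEN = 64  # per direction (inferred), so the concatenated feature has 128 dimensions
zero_params = {g: ([[0.0] * (HIDDEN + 1) for _ in range(HIDDEN)], [0.0] * HIDDEN)
               for g in "fioc"}
outputs = bilstm([[0.2], [0.9], [0.4]], zero_params, zero_params, HIDDEN)
```

Each time step of `outputs` is a 128-dimensional vector, matching the feature dimension stated above; a trained model would of course use learned, non-zero parameters.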

[0043] 1.2.2) Right-side branch (Branch B in Figure 1, corresponding to the macro-trend insight module)
This embodiment employs the Linformer architecture to reduce computational complexity. The input tensor $X$ is first linearly projected to obtain the query $Q$, key $K$, and value $V$, each of dimension $L \times d$ ($d = 128$). The computational complexity of self-attention in a standard Transformer is $O(L^2 d)$. Although this is still acceptable at $L = 120$, this embodiment introduces low-rank projection matrices $E, F \in \mathbb{R}^{k \times L}$, where $k = 32$, to demonstrate scalability to long sequences. Linear attention is calculated as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q\,(E K)^{\top}}{\sqrt{d}}\right) F V$$
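
A small sketch of low-rank (Linformer-style) attention in plain Python, to make the shape bookkeeping concrete. The toy sizes, the fixed chunk-averaging projection, and the helper names are assumptions for illustration; in Linformer the projection matrices are learned, and the embodiment uses L = 120, d = 128, k = 32.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    out = []
    for row in a:
        m = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def linformer_attention(q: Matrix, k: Matrix, v: Matrix, e: Matrix, f: Matrix) -> Matrix:
    """softmax(Q (E K)^T / sqrt(d)) (F V), with E and F of shape (k_proj, L)."""
    d = len(q[0])
    k_small = matmul(e, k)  # (k_proj, d): keys compressed along the sequence axis
    v_small = matmul(f, v)  # (k_proj, d): values compressed the same way
    scores = matmul(q, transpose(k_small))  # (L, k_proj) instead of (L, L)
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    return matmul(softmax_rows(scores), v_small)  # (L, d)


def pooling_projection(seq_len: int, k_proj: int) -> Matrix:
    """Hypothetical fixed projection that averages contiguous chunks of the sequence."""
    chunk = seq_len // k_proj
    return [[1.0 / chunk if r * chunk <= c < (r + 1) * chunk else 0.0
             for c in range(seq_len)] for r in range(k_proj)]


L_STEPS, D_MODEL, K_PROJ = 8, 4, 2  # toy sizes only
x = [[math.sin(0.3 * t + j) for j in range(D_MODEL)] for t in range(L_STEPS)]
proj = pooling_projection(L_STEPS, K_PROJ)
out = linformer_attention(x, x, x, proj, proj)  # Q = K = V = x for brevity

# Number of attention scores per head: quadratic vs. linear in sequence length.
score_entries = {"standard": L_STEPS * L_STEPS, "linformer": L_STEPS * K_PROJ}
```

The score matrix has L × k entries rather than L × L, which is where the linear scaling in sequence length comes from.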

[0044] After six Linformer encoder layers, the output is the global dependency feature vector $H_{\mathrm{global}}$. This feature vector characterizes the long-term patterns of the cabin environment, including the comfort temperature baseline at different times of day, preferences at different stages of the journey, and personalized user profiles (e.g., user A prefers 22℃ + cool color temperature + light music, while user B prefers 24℃ + warm color temperature + white noise).

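[0045] 1.3) Adaptive Feature Fusion
The adaptive fusion module introduces a learnable scalar parameter $\lambda$. The fusion coefficient $\alpha$ is calculated using the Sigmoid function:

$$\alpha = \sigma(\lambda) = \frac{1}{1 + e^{-\lambda}}$$

[0046] The final hybrid feature representation vector $H_{\mathrm{fused}}$ is the weighted sum of the output features of the two branches:

$$H_{\mathrm{fused}} = \alpha\, H_{\mathrm{local}} + (1 - \alpha)\, H_{\mathrm{global}}$$

[0047] where $H_{\mathrm{local}}$ (the local temporal feature vector of the left branch) and $H_{\mathrm{global}}$ (the global dependency feature vector of the right branch) have been unified to the same dimension $d = 128$ through linear projection.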

[0048] 1.3.1) Asymmetric initialization strategy: Before model training begins, the feature fusion parameter $\lambda$ is initialized to a preset negative value, such as -2.0. The initial fusion coefficient given by the Sigmoid function is then $\alpha = \sigma(-2.0) \approx 0.12$. This means that in the early stages of training, the macro-trend branch (the right-hand Linformer) accounts for approximately 88% of the fused features output by the model, while the local micro branch (the left-hand CNN-BiLSTM) accounts for only about 12%.
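
The initial weighting can be checked numerically with a few lines of Python. This is a sketch of the gating arithmetic only; the names `lam`, `fuse`, and the two-element feature vectors are illustrative and not taken from the source.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fuse(h_local: List[float], h_global: List[float], lam: float) -> List[float]:
    """Gated fusion: alpha * h_local + (1 - alpha) * h_global, with alpha = sigmoid(lam)."""
    alpha = sigmoid(lam)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(h_local, h_global)]


LAMBDA_INIT = -2.0                    # preset negative initial value
alpha_init = sigmoid(LAMBDA_INIT)     # weight of the local (CNN-BiLSTM) branch
global_share = 1.0 - alpha_init       # weight of the global (Linformer) branch

# With h_local = 1 and h_global = 0, the fused value equals the local weight.
fused = fuse([1.0, 1.0], [0.0, 0.0], LAMBDA_INIT)
```

Rounded to two decimals, `alpha_init` is 0.12 and `global_share` is 0.88, matching the 12% / 88% split stated above.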

[0049] This asymmetric initialization artificially constructs a "warm-up mode": in the early stages of training, the model is optimized primarily through the Transformer branch, which generalizes better and is better at capturing global patterns, so that the LSTM branch, which is prone to overfitting, does not prematurely interfere with the learning of global patterns. As training progresses, gradient descent automatically adjusts the value of the fusion parameter $\lambda$; as the fusion coefficient $\alpha$ gradually increases, local details are gradually introduced.

[0050] 1.3.2) Dynamic optimization based on error feedback: Model training uses the mean squared error (MSE) between predicted and true values as the loss function:

$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$

[0051] where $N$ is the total number of samples, $y_i$ is the actual label scalar value, and $\hat{y}_i$ is the predicted scalar value. All learnable parameters are jointly optimized through backpropagation.

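[0052] The update of the fusion parameter $\lambda$ is governed strictly by the chain rule of backpropagation. Specifically, with fusion coefficient $\alpha = \sigma(\lambda)$ and fused feature $H_{\mathrm{fused}} = \alpha H_{\mathrm{local}} + (1 - \alpha) H_{\mathrm{global}}$, the gradient of the loss function $\mathcal{L}$ with respect to $\lambda$ is:

$$\frac{\partial \mathcal{L}}{\partial \lambda} = \left(\frac{\partial \mathcal{L}}{\partial H_{\mathrm{fused}}}\right)^{\top} \frac{\partial H_{\mathrm{fused}}}{\partial \alpha} \cdot \frac{\partial \alpha}{\partial \lambda} = \left(\frac{\partial \mathcal{L}}{\partial H_{\mathrm{fused}}}\right)^{\top} \left(H_{\mathrm{local}} - H_{\mathrm{global}}\right) \cdot \alpha\,(1 - \alpha)$$

[0053] Changes in the loss function $\mathcal{L}$ therefore directly drive the update direction and step size of $\lambda$.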
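
As a sanity check on the chain rule, the following sketch compares an analytic gradient with a central finite difference. It is deliberately simplified and makes assumptions not in the source: the branch features are scalars, the decoder is the identity, and the loss is a single squared error.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(lam, h_local, h_global, target):
    """Squared error of a scalar prediction made directly from the fused feature."""
    alpha = sigmoid(lam)
    y_hat = alpha * h_local + (1.0 - alpha) * h_global
    return (y_hat - target) ** 2


def grad_lambda(lam, h_local, h_global, target):
    """Chain rule: dL/dlam = dL/dy_hat * dy_hat/dalpha * dalpha/dlam."""
    alpha = sigmoid(lam)
    y_hat = alpha * h_local + (1.0 - alpha) * h_global
    return 2.0 * (y_hat - target) * (h_local - h_global) * alpha * (1.0 - alpha)


# Hypothetical values: the target sits closer to the local feature than the global one.
lam, h_local, h_global, target = -2.0, 0.9, 0.4, 0.8
analytic = grad_lambda(lam, h_local, h_global, target)

eps = 1e-6
numeric = (loss(lam + eps, h_local, h_global, target)
           - loss(lam - eps, h_local, h_global, target)) / (2 * eps)
```

In this example the gradient is negative, so a gradient-descent step increases the fusion parameter and shifts weight toward the local branch, which is the behaviour the text describes.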

[0054] The physical meaning of this gradient is as follows: a) In the early stages of training, the global prediction error is relatively large. This error is mainly used to drive the weight update of the Linformer branch in order to quickly fit the macro trend.

[0055] b) As training progresses, the global error gradually decreases, and the error gradient drives the fusion parameter $\lambda$ more strongly, increasing it in the positive direction. As the fusion coefficient $\alpha$ increases, the gate of the left branch opens smoothly, and local micro-features are introduced for fine-tuning.

[0056] This error-driven dynamic optimization process ensures from a mathematical perspective that the final output model parameters are in the optimal fusion ratio state under the current time series dataset, realizing adaptive evolution of "macro-oriented first, then micro-fine-tuning" without any manual intervention in hyperparameters.

[0057] 1.4) Generation of multi-task control instructions
The fused feature vector $H_{\mathrm{fused}}$ is fed to a multi-task decoder, which is a three-layer multilayer perceptron (MLP) with hidden layer dimensions of 256 and 128. The output layer has multiple branches:
a) Air conditioning branch: outputs target temperature (continuous value, 22~26℃), fan speed level (integer, 0-5), air outlet mode (one-hot: face / feet / defrost), and internal / external circulation (binary);
b) Ambient light branch: outputs brightness (continuous value, 0-100%), color temperature (warm white / cool white mixing coefficient), color mode (static / dynamic / breathing), and scene theme (commuting / resting / exercise);
c) Audio branch: outputs volume (integer, 0-30) and sound field mode (surround / front / rear);
d) Seat accessory branch: outputs seat heating level (0-3), ventilation level (0-3), and fragrance release concentration (0-100%).
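
How raw decoder outputs might be mapped onto the command ranges listed above can be sketched as follows. The dictionary keys, the rounding and clamping rules, and the sign convention for the circulation flag are all assumptions for illustration; the source specifies only the output ranges and categories.

```python
from typing import Any, Dict, List


def clamp(v: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, v))


def argmax(logits: List[float]) -> int:
    return max(range(len(logits)), key=lambda i: logits[i])


AIR_OUTLET_MODES = ["face", "feet", "defrost"]
SOUND_FIELD_MODES = ["surround", "front", "rear"]


def decode_commands(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map raw decoder head outputs onto the documented command ranges."""
    return {
        "target_temp_c": clamp(raw["temp"], 22.0, 26.0),
        "fan_level": int(clamp(round(raw["fan"]), 0, 5)),
        "air_outlet_mode": AIR_OUTLET_MODES[argmax(raw["outlet_logits"])],
        # Assumed convention: a positive logit selects internal circulation.
        "internal_circulation": raw["circulation_logit"] > 0.0,
        "light_brightness_pct": clamp(raw["brightness"], 0.0, 100.0),
        "volume": int(clamp(round(raw["volume"]), 0, 30)),
        "sound_field": SOUND_FIELD_MODES[argmax(raw["sound_logits"])],
        "seat_heat_level": int(clamp(round(raw["seat_heat"]), 0, 3)),
        "seat_vent_level": int(clamp(round(raw["seat_vent"]), 0, 3)),
        "fragrance_pct": clamp(raw["fragrance"], 0.0, 100.0),
    }


# Hypothetical raw outputs, some of them outside the valid ranges.
commands = decode_commands({
    "temp": 27.3, "fan": 3.6, "outlet_logits": [0.1, 2.0, -1.0],
    "circulation_logit": 0.7, "brightness": 45.0, "volume": 31.2,
    "sound_logits": [1.5, 0.2, 0.1], "seat_heat": -0.4, "seat_vent": 2.2,
    "fragrance": 120.0,
})
```

Clamping at this stage keeps out-of-range predictions from reaching the actuators, whatever the decoder produces.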

[0058] In addition, the decoder also outputs a predicted value of passenger thermal comfort for the next minute (predicted PMV index) for closed-loop evaluation.

[0059] 1.5) Command Output and Execution
The generated control commands are sent to the corresponding cabin actuators via CAN bus or Ethernet: air conditioning controller, ambient lighting driver, audio amplifier, seat heating and ventilation module, and fragrance release device. The actuators adjust the physical environment according to the commands, forming a closed loop.

[0060] (2) Model training and deployment
Training data: 2000 hours of cabin sensor data and subjective passenger comfort ratings (1-7 Likert scale) were collected from 50 volunteers in real-world road environments. The data were divided into training, validation, and test sets in an 8:1:1 ratio.
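
An 8:1:1 split can be sketched as below. The source does not say at what granularity the data were split; splitting by volunteer (so that one person's data never appears in two sets) and the fixed shuffle seed are assumptions made here.

```python
import random
from typing import List, Sequence, Tuple


def split_8_1_1(items: Sequence, seed: int = 0) -> Tuple[List, List, List]:
    """Shuffle with a fixed seed, then cut into 80 % / 10 % / 10 %."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


volunteer_ids = list(range(50))  # 50 volunteers, as in the training data
train_ids, val_ids, test_ids = split_8_1_1(volunteer_ids)
```

For 50 volunteers this yields 40, 5, and 5 identifiers for the training, validation, and test sets.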

[0061] Training configuration: The hardware platform is an NVIDIA Orin embedded GPU (200 TOPS), and the software framework is PyTorch. The optimizer used is AdamW, with an initial learning rate of 1e-3, a batch size of 64, and 100 training epochs. The loss function is a weighted sum of the losses for each task, where mean squared error is used for PMV prediction and cross-entropy loss is used for classification tasks.
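
The weighted multi-task loss can be illustrated with one regression term and one classification term. The task weights `w_reg` and `w_cls` are placeholders, since the source does not state the weighting, and only two of the tasks are shown.

```python
import math
from typing import List


def mse(pred: List[float], target: List[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits: List[float], label: int) -> float:
    """Negative log-probability of the true class under softmax(logits)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def total_loss(pmv_pred: List[float], pmv_true: List[float],
               outlet_logits: List[float], outlet_label: int,
               w_reg: float = 1.0, w_cls: float = 0.5) -> float:
    """Weighted sum of a regression loss (PMV) and a classification loss (outlet mode)."""
    return (w_reg * mse(pmv_pred, pmv_true)
            + w_cls * cross_entropy(outlet_logits, outlet_label))


# Hypothetical predictions for two samples of PMV and one air-outlet classification.
loss_value = total_loss([0.2, -0.1], [0.0, 0.0], [2.0, 0.5, -1.0], 0)
```

In a full model each decoder branch would contribute its own term, with continuous outputs using MSE and categorical outputs using cross-entropy.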

[0062] Training process: Following asymmetric initialization (fusion parameter set to -2.0), in the first 20 epochs the fusion coefficient $\alpha$ remained between 0.12 and 0.18, indicating that the model relied primarily on the right-hand branch, and the MSE rapidly decreased from 0.1 to 0.01. From epochs 21 to 60, $\alpha$ gradually rose to 0.3~0.5 as the local branch began to compensate for local abrupt changes. In epochs 61 to 100, $\alpha$ stabilized at 0.6~0.75, and the final MSE reached 0.00006.

[0063] Comparative experiment: The control groups were: 1) a pure Bi-LSTM model; 2) a pure Linformer model; 3) a simple concatenation fusion model (without adaptive weights); 4) a fixed-weight fusion model (fusion coefficient held constant). The model in this embodiment significantly outperforms the control groups in prediction accuracy (lowest MSE), abrupt-change response delay (temperature response time after door opening reduced from 3.2 s to 0.8 s), and passenger satisfaction (average rating 6.2 vs 4.5).

[0064] Deployment Testing: The trained model was quantized to INT8 precision and deployed on the cockpit domain controller (SiliconDrive X9 series) of a mass-produced new energy vehicle. Field tests showed that in high-temperature summer conditions, the system could activate maximum cooling mode within 0.5 seconds after the door was opened, preventing temperature overshoot; in tunnel entry and exit scenarios, the ambient lighting brightness transitioned smoothly without noticeable jumps; and after different passengers boarded the vehicle, the system could automatically switch to the passenger's preferred temperature and fragrance settings within 30 seconds.
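
The source states only that the model was quantized to INT8; it does not describe the scheme. As one common possibility, symmetric per-tensor quantization can be sketched as follows (function names and example weights are hypothetical).

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    return [scale * v for v in q]


weights = [0.50, -0.25, 0.125, -1.0, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The reconstruction error per weight is bounded by half a quantization step (`scale / 2`), which is the accuracy cost traded for smaller memory footprint and faster integer arithmetic on the domain controller.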

[0065] (3) Algorithm Flow Description
Figure 2 shows the training process of the dual-stream adaptive temporal prediction algorithm based on asymmetric initialization in this embodiment. The specific steps are as follows:
Step 1: Acquire and preprocess multimodal sensing data to obtain the input tensor $X$;
Step 2: Feed $X$ into both streams of the dual-stream network. The left stream extracts the local features $H_{\mathrm{local}}$ using CNN-BiLSTM; the right stream extracts the global features $H_{\mathrm{global}}$ using Linformer;
Step 3: Calculate the fusion coefficient $\alpha = \sigma(\lambda)$, where the fusion parameter $\lambda$ is initialized to -2.0;
Step 4: Calculate the mixed features $H_{\mathrm{fused}} = \alpha H_{\mathrm{local}} + (1 - \alpha) H_{\mathrm{global}}$;
Step 5: Feed $H_{\mathrm{fused}}$ into the multi-task MLP decoder to generate the predicted control instructions $\hat{y}$;
Step 6: Calculate the loss $\mathcal{L} = \mathrm{MSE}(y, \hat{y})$ + auxiliary losses;
Step 7: Backpropagate and update the network parameters, including $\lambda$;
Step 8: Check whether $\alpha$ has risen adaptively and $\mathcal{L}$ has converged. If not, repeat Steps 1-7; if so, end the training.
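
The way the fusion parameter rises under gradient descent can be illustrated with a toy loop. This is a heavily simplified sketch, not the training procedure of the embodiment: the branch outputs are fixed scalars per time step, only the fusion parameter is trained, the decoder is the identity, and the learning rate and data values are invented for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Toy data: the target mixes sharp spikes (local) with a slow trend (global).
h_global = [0.40, 0.42, 0.44, 0.46, 0.48, 0.50]
h_local = [0.40, 0.90, 0.44, 0.10, 0.48, 0.95]
target = [0.7 * a + 0.3 * b for a, b in zip(h_local, h_global)]

lam, lr = -2.0, 2.0  # asymmetric initialization; lr is a hypothetical value
history = []
for epoch in range(200):
    alpha = sigmoid(lam)                                      # Step 3
    preds = [alpha * a + (1.0 - alpha) * b                    # Step 4 (identity decoder)
             for a, b in zip(h_local, h_global)]
    mse = sum((p - y) ** 2 for p, y in zip(preds, target)) / len(target)  # Step 6
    # Step 7: dL/dlam = mean(2 * (pred - y) * (h_local - h_global)) * alpha * (1 - alpha)
    grad = sum(2.0 * (p - y) * (a - b)
               for p, y, a, b in zip(preds, target, h_local, h_global)) / len(target)
    grad *= alpha * (1.0 - alpha)
    lam -= lr * grad
    history.append((alpha, mse))
```

Because the toy target depends mostly on the local features, the recorded fusion coefficient climbs steadily from about 0.12 toward 0.7 while the loss falls, mirroring the "global first, local later" behaviour described in the training steps.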

[0066] (4) Experimental examples To further verify the actual technical effect of this embodiment, a quantitative evaluation was conducted on a dataset collected from real-world smart cockpit scenarios.

[0067] Figure 3 shows the overall performance comparison on the test set. The prediction curve of the pure Bi-LSTM model (green dashed line) flattens toward a straight line due to vanishing gradients; the pure Linformer model (blue) tracks the trend but lags at local fluctuations; the prediction curve of the Ada-HDNet model of this application (red) closely overlaps the true values (black).

[0068] Figure 4 is a magnified view of the last 100 steps of the test set. At the turning points between peaks and troughs, the model of this application responds significantly faster than Linformer, thanks to the compensation of high-frequency abrupt-change features by the CNN-BiLSTM branch on the left.

[0069] Figure 5 is a bar chart comparing the errors. The mean squared error (MSE) of the model of this application is 0.00006, which is lower than the 0.00007 of the pure Linformer model, a relative reduction in MSE of about 14.3%.
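The relative reduction follows directly from the two reported MSE values. The short check below reproduces the arithmetic; the function name `relative_reduction` is a hypothetical helper, not part of this application.

```python
def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


mse_linformer = 0.00007   # pure Linformer model, as reported
mse_ours = 0.00006        # dual-stream model, as reported
reduction = relative_reduction(mse_linformer, mse_ours)
print(f"{reduction:.1%}")  # prints 14.3%
```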

[0070] Figure 6 shows the model's fit over the entire test set. The red prediction curve almost completely overlaps the black true curve, and the light red error band is very narrow, indicating that the model has very high prediction confidence.

[0071] (5) Advantages and beneficial effects: Compared with the prior art, this application has the following beneficial effects: 1) Millisecond-level feedforward compensation for transient events: The left-hand CNN-BiLSTM branch can accurately capture abrupt signal changes caused by events such as a door opening, a passenger sitting down, and sudden changes in sunlight. Through adaptive gating, it automatically increases the weight of local features at the moment the event is triggered, enabling actuators such as air conditioners and seat heaters to respond quickly in a feedforward manner, greatly shortening the response time and eliminating temperature overshoot.

[0072] 2) Full-journey modeling of long-term comfort patterns: The right-side Linformer branch utilizes low-rank linear attention to process sensor data for the entire journey (several hours) under the condition of limited computing power of the cockpit controller, learns the comfort preferences and user personal profiles at different stages of the journey, and avoids the adjustment deviation caused by the "no memory" of traditional controllers.
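The low-rank linear attention used by the right-side branch can be sketched in plain Python as follows. This is an illustrative single-head version under assumptions not stated in this paragraph: the weight and projection matrices are random rather than trained, scores are scaled by the square root of the feature dimension, and all names (`linformer_attention`, `proj_k`, `proj_v`, and the sizes `n`, `d`, `k`) are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax_rows(m):
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def linformer_attention(x, w_q, w_k, w_v, proj_k, proj_v):
    """x is n x d, w_* are d x d, proj_* are k x n; returns (n x d output, n x k weights)."""
    queries = matmul(x, w_q)                      # n x d
    keys = matmul(x, w_k)                         # n x d
    values = matmul(x, w_v)                       # n x d
    keys_c = matmul(proj_k, keys)                 # k x d, compressed along sequence length
    values_c = matmul(proj_v, values)             # k x d
    d = len(queries[0])
    scores = matmul(queries, transpose(keys_c))   # n x k instead of n x n
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = softmax_rows(scores)
    return matmul(weights, values_c), weights


rng = random.Random(0)
n, d, k = 8, 4, 2   # sequence length, feature size, compressed length (toy values)
x = rand_matrix(n, d, rng)
out, weights = linformer_attention(
    x,
    rand_matrix(d, d, rng), rand_matrix(d, d, rng), rand_matrix(d, d, rng),
    rand_matrix(k, n, rng), rand_matrix(k, n, rng),
)
print(len(out), len(out[0]), len(weights[0]))  # prints 8 4 2
```

The attention map has n x k entries rather than n x n, which is what keeps memory and compute manageable when the sequence covers a journey of several hours.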

[0073] 3) Adaptive Feature Fusion and Dynamic Weight Optimization: Based on a single learnable scalar parameter, the Sigmoid-gated fusion structure is extremely lightweight and has a clear physical meaning. Combined with asymmetric initialization (the scalar parameter is initialized to -2.0) and driven by automatic optimization through the MSE gradient, the model is dominated by the global branch in the early stage of training to ensure convergence on the macro-level patterns, and automatically increases the weight of the local branch in the middle and late stages to refine the fit, completely eliminating the problem of asynchronous convergence in the joint training of heterogeneous networks.
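The effect of the asymmetric initialization can be worked out explicitly. The notation below is assumed for illustration and is not taken from the original text: alpha is the learnable scalar, g the fusion coefficient, and g is taken to weight the local branch, a convention chosen to agree with the statement that the global branch dominates early in training.

```latex
g = \sigma(\alpha) = \frac{1}{1 + e^{-\alpha}}, \qquad
h_{\text{fused}} = g\, h_{\text{local}} + (1 - g)\, h_{\text{global}}

\alpha_0 = -2.0 \;\Rightarrow\; g_0 = \frac{1}{1 + e^{2}} \approx 0.119

\frac{\partial L}{\partial \alpha}
  = g\,(1 - g)\,\left\langle \frac{\partial L}{\partial h_{\text{fused}}},\;
    h_{\text{local}} - h_{\text{global}} \right\rangle
```

Under these assumptions the local branch carries roughly 12% of the weight at initialization and the global branch roughly 88%. The gradient with respect to alpha is scaled by g(1 - g), so the scalar moves slowly while g is near 0 and faster as g approaches 0.5.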

[0074] 4) Matched computational complexity, suitable for embedded deployment: Using Linformer instead of the standard Transformer reduces the complexity of the right-hand branch from quadratic to linear in the sequence length, so that its computational resource consumption matches that of the left-hand CNN-BiLSTM branch, enabling the entire model to run in real time on edge devices such as smart cockpit domain controllers.
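A rough operation count shows why the low-rank projection matters for long sequences. The sketch below counts only the multiply-accumulate operations of the attention products themselves and omits the query, key, and value mappings and the softmax, which are the same in both cases. The sizes `n`, `d`, and `k` are hypothetical and are not taken from this application.

```python
def standard_attention_macs(n, d):
    """Q @ K^T costs n*n*d and weights @ V costs n*n*d."""
    return 2 * n * n * d


def linformer_attention_macs(n, d, k):
    """Projecting K and V to k rows costs 2*k*n*d; the two attention products cost 2*n*k*d."""
    return 2 * k * n * d + 2 * n * k * d


n, d, k = 4096, 64, 64   # sequence length, feature size, compressed length
std = standard_attention_macs(n, d)
lin = linformer_attention_macs(n, d, k)
print(std, lin, std // lin)  # prints 2147483648 67108864 32
```

Doubling the sequence length quadruples the standard count but only doubles the Linformer count, as long as the compressed length k is held fixed.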

[0075] 5) Coordinated generation of multimodal control commands: The multi-task decoder outputs multiple control commands, such as air conditioning, ambient lighting, audio, and seat accessory commands, in parallel from the same fused feature vector. Through joint training, it learns the inherent coordination relationships among the multiple targets, avoiding the command conflicts that separate independent controllers may cause (such as simultaneous strong cooling and seat heating).
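The idea of decoding several command groups from one shared fused vector can be sketched as a shared hidden layer followed by one output head per task. This is a simplified illustration with untrained random weights: the class name, layer sizes, and output field names are hypothetical, and every output is treated as a raw regression value, whereas mode-type outputs would in practice need their own handling.

```python
import random

TASK_OUTPUTS = {
    "air_conditioning": ["target_temperature", "airflow_level", "outlet_mode", "circulation_mode"],
    "ambient_lighting": ["brightness", "color_temperature", "color_mode"],
    "audio": ["volume", "sound_field_mode"],
    "seat_accessories": ["heating_level", "ventilation_level", "fragrance_concentration"],
}


def linear(vec, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


def relu(vec):
    return [max(0.0, v) for v in vec]


class MultiTaskDecoder:
    """Shared hidden layer plus one linear head per command group."""

    def __init__(self, in_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_shared = init(hidden_dim, in_dim)
        self.b_shared = [0.0] * hidden_dim
        self.heads = {
            task: (init(len(names), hidden_dim), [0.0] * len(names))
            for task, names in TASK_OUTPUTS.items()
        }

    def decode(self, fused):
        # Every head reads the same hidden representation of the fused features.
        hidden = relu(linear(fused, self.w_shared, self.b_shared))
        commands = {}
        for task, (weights, bias) in self.heads.items():
            raw = linear(hidden, weights, bias)
            commands[task] = dict(zip(TASK_OUTPUTS[task], raw))
        return commands


decoder = MultiTaskDecoder(in_dim=8, hidden_dim=16)
commands = decoder.decode([0.1] * 8)
for task, values in commands.items():
    print(task, sorted(values))
```

Because all heads share one hidden layer, a joint training loss can penalize contradictory combinations across heads, which separate controllers cannot do.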

[0076] 6) Prediction-Control-Evaluation Closed Loop: The decoder additionally outputs predicted values of future passenger thermal comfort, enabling predictive evaluation of control effectiveness, providing feedback signals for upper-level decision-making, and forming a closed-loop optimization system.

[0077] In summary, the intelligent cockpit environment adaptive adjustment method based on heterogeneous dual-stream timing prediction provided in this application can be widely applied to intelligent cockpit domain controllers in various passenger cars, commercial vehicles, and special vehicles. Its core algorithm boasts advantages such as low computational complexity, high prediction accuracy, and fast response, and can be directly deployed on existing in-vehicle computing platforms without additional hardware modifications. Furthermore, this method can also be extended to fields requiring multimodal environment adaptive adjustment, such as smart homes and smart office environments, demonstrating significant industrial practical value.

[0078] This application embodiment also provides an intelligent cockpit environment adaptive adjustment system that can implement the above-mentioned method. The system includes:
Data acquisition and preprocessing module: used to acquire multimodal sensing data in the intelligent cockpit in real time and preprocess the multimodal sensing data to obtain an input sequence tensor, the multimodal sensing data including in-vehicle temperature, light intensity, passenger status, and vehicle status data;
Dual-stream heterogeneous feature extraction module: includes a first feature extraction unit and a second feature extraction unit arranged in parallel, wherein the first feature extraction unit uses convolutional and recurrent neural networks to extract local temporal feature vectors that reflect transient environmental changes, and the second feature extraction unit uses a linear Transformer based on low-rank matrix approximation to extract global dependency feature vectors that reflect long-term comfort patterns;
Adaptive fusion module: used to calculate a fusion coefficient based on a learnable scalar parameter, dynamically weight and fuse the local temporal feature vector and the global dependency feature vector according to the fusion coefficient, and output the fused feature expression vector;
Multi-task control command generation module: used to decode the fused feature expression vector and generate at least one cabin environment control command in parallel, the control command including one or more of the following: air conditioning control command, ambient lighting control command, audio control command, and seat accessory control command;
Command output interface: used to send the cockpit environment control commands to the corresponding cockpit actuators to adjust the physical environment inside the cockpit.

[0079] It is understood that the content of the above method embodiments is applicable to this system embodiment. The specific functions implemented in this system embodiment are the same as those in the above method embodiments, and the beneficial effects achieved are also the same as those achieved in the above method embodiments.

[0080] This application also provides an electronic device, which includes a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the above-described method. This electronic device can be any smart terminal, including tablet computers, in-vehicle computers, etc.

[0081] It is understood that the content of the above method embodiments is applicable to this device embodiment. The specific functions implemented by this device embodiment are the same as those of the above method embodiments, and the beneficial effects achieved are also the same as those achieved by the above method embodiments.

[0082] Referring to Figure 7, which illustrates the hardware structure of an electronic device according to another embodiment, the electronic device includes:
The processor 701, which can be implemented using a general-purpose CPU (Central Processing Unit), microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided in the embodiments of this application;
The memory 702, which can be implemented as a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 702 can store the operating system and other application programs. When the technical solutions provided in the embodiments of this specification are implemented through software or firmware, the relevant program code is stored in the memory 702 and is called by the processor 701 to execute the methods described in the embodiments of this application;
The input/output interface 703, which is used to implement information input and output;
The communication interface 704, which is used to enable communication and interaction between this device and other devices. Communication can be achieved through wired means (such as USB or Ethernet cable) or wireless means (such as mobile network, Wi-Fi, or Bluetooth);
The bus 705, which transmits information between the various components of the device (e.g., processor 701, memory 702, input/output interface 703, and communication interface 704).
The processor 701, memory 702, input/output interface 703, and communication interface 704 are connected to each other within the device via the bus 705.

[0083] This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described method.

[0084] It is understood that the content of the above method embodiments is applicable to this storage medium embodiment. The specific functions implemented in this storage medium embodiment are the same as those in the above method embodiments, and the beneficial effects achieved are also the same as those achieved in the above method embodiments.

[0085] Memory, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs and non-transitory computer-executable programs. Furthermore, memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory may optionally include memory remotely located relative to the processor, and these remote memories can be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

[0086] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the above-described method.

[0087] It is understood that the content of the above method embodiments is applicable to the embodiments of this program product. The specific functions implemented in the embodiments of this program product are the same as those in the above method embodiments, and the beneficial effects achieved are also the same as those achieved in the above method embodiments. The executable computer program code or "code" used to perform the various embodiments can be written in high-level programming languages such as C, C++, Python, Smalltalk, Java, JavaScript, Visual Basic, Structured Query Language (e.g., Transact-SQL), Perl, or in various other programming languages.

[0088] The embodiments described in this application are for the purpose of more clearly illustrating the technical solutions of the embodiments of this application, and do not constitute a limitation on the technical solutions provided by the embodiments of this application. As those skilled in the art will know, with the evolution of technology and the emergence of new application scenarios, the technical solutions provided by the embodiments of this application are also applicable to similar technical problems.

[0089] Those skilled in the art will understand that the technical solutions shown in the figures do not constitute a limitation on the embodiments of this application, and may include more or fewer steps than shown, combine certain steps, or include different steps.

[0090] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.

[0091] Those skilled in the art will understand that all or some of the steps in the methods disclosed above, as well as the functional modules / units in the systems and devices, can be implemented as software, firmware, hardware, or suitable combinations thereof.

[0092] The terms “first,” “second,” “third,” “fourth,” etc. (if present) in the specification and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms “comprising” and “having,” and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0093] It should be understood that in this application, "at least one (item)" means one or more, and "a plurality" means two or more. "And/or" is used to describe the relationship between related objects, indicating that three relationships can exist. For example, "A and/or B" can represent three cases: only A exists, only B exists, and both A and B exist simultaneously, where A and B can be singular or plural. The character "/" generally indicates that the preceding and following related objects are in an "or" relationship. "At least one (item) of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one (item) of a, b, or c can represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.

[0094] In the several embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units described above is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or other forms.

[0095] The units described above as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0096] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0097] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes multiple instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned storage medium includes various media capable of storing programs, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0098] The preferred embodiments of the present application have been described above with reference to the accompanying drawings, but this does not limit the scope of the claims of the present application. Any modifications, equivalent substitutions, and improvements made by those skilled in the art without departing from the scope and substance of the embodiments of the present application shall be within the scope of the claims of the present application.

Claims

1. A smart cabin environment adaptive adjustment method, characterized in that the method includes the following steps:
collecting multimodal sensing data within the intelligent cockpit, the multimodal sensing data including at least in-vehicle temperature, light intensity, passenger status, and vehicle status data;
preprocessing the multimodal sensing data to obtain an input sequence tensor;
inputting the input sequence tensor into each stream of a parallel dual-stream network for feature extraction, the dual-stream network including a first-stream network and a second-stream network, wherein the first-stream network extracts local temporal feature vectors reflecting transient environmental changes through convolutional and recurrent neural networks, and the second-stream network extracts global dependency feature vectors reflecting long-term comfort patterns through a linear Transformer based on low-rank matrix approximation;
calculating a fusion coefficient based on a learnable scalar parameter, and dynamically weighting and fusing the local temporal feature vector and the global dependency feature vector according to the fusion coefficient to obtain a fused feature expression vector;
inputting the fused feature expression vector into a multi-task decoder to generate at least one of the following cabin environment control commands: air conditioning control command, ambient lighting control command, audio control command, and seat accessory control command;
outputting the cockpit environment control commands to the corresponding cockpit actuators to adjust the physical environment inside the cockpit.

2. The method of claim 1, wherein the first-stream network comprises a one-dimensional convolutional layer and a bidirectional long short-term memory network connected in sequence, wherein: the one-dimensional convolutional layer performs a convolution operation on the input sequence tensor to extract features of transient events, the transient events including the opening of a car door, a passenger sitting down, or a sudden change in the angle of sunlight; and the bidirectional long short-term memory network processes the feature sequence obtained after the convolution operation and outputs a local temporal feature vector containing contextual information.

3. The method of claim 1, wherein the second-stream network is a Linformer network, and extracting global dependency feature vectors reflecting long-term comfort patterns through a linear Transformer based on low-rank matrix approximation includes: performing a linear mapping on the input sequence tensor to generate a query matrix, a key matrix, and a value matrix; projecting the key matrix and the value matrix onto a low-dimensional space along the sequence length dimension using two low-rank projection matrices to obtain a compressed key matrix and a compressed value matrix; and calculating linear attention based on the query matrix, the compressed key matrix, and the compressed value matrix, and outputting a global dependency feature vector representing trip-stage preferences and the user's personalized profile.

4. The method according to claim 1, characterized in that the fusion coefficient is calculated by setting a learnable scalar parameter, initializing the scalar parameter to a preset negative value, and mapping the scalar parameter to the fusion coefficient through a Sigmoid function, wherein the initial fusion coefficient is less than 0.5, so that the model is dominated by the global dependency feature vector in the early stage of training; as gradients backpropagate during training, the learnable scalar parameter is automatically increased, gradually increasing the weight of the local temporal feature vector in the fusion.

5. The method according to claim 1, characterized in that the method further includes a dynamic optimization step for the fusion weights based on prediction error feedback, wherein the dynamic optimization of the fusion weights based on prediction error feedback includes: using the mean squared error between the predicted value output by the multi-task decoder and the actual value as the global loss function; continuously calculating, during backpropagation, the gradient of the global loss function with respect to the learnable scalar parameter; and, based on the feedback from the gradient, keeping the weight of the local features below a preset weight threshold in the early stage of training to prioritize the fitting of the global trend, while in the later stage of training, as the decrease in the global loss slows, the gradient automatically drives the learnable scalar parameter to increase, thereby raising the weight of the local features.

6. The method according to claim 1, characterized in that the multi-task decoder is a multi-layer perceptron (MLP), and the parallel outputs of the multi-layer perceptron include: target temperature, airflow level, air outlet mode, internal and external circulation mode; light brightness, color temperature, color mode; volume, sound field mode; seat heating level, ventilation level, and fragrance release concentration.

7. The method according to claim 1, characterized in that the multi-task decoder also outputs passenger thermal comfort predictions for future time periods to achieve a closed-loop evaluation of the cabin environment control effect.

8. An electronic device, characterized in that the electronic device includes a memory and a processor, the memory storing a computer program, and the processor executing the computer program to implement the method according to any one of claims 1 to 7.

9. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the method of any one of claims 1 to 7.

10. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the method of any one of claims 1 to 7.