Real-time pricing method and system for supply chain carbon footprint based on multi-modal data

CN122243558A · Pending · Publication Date: 2026-06-19 · FUZHOU UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
FUZHOU UNIV
Filing Date
2026-05-21
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing supply chain carbon management methods are unable to reflect dynamic changes in the production process in real time, and carbon emission reduction targets are disconnected from economic benefit targets. The lack of an effective dynamic pricing mechanism results in a lack of economic impetus for carbon reduction strategies.

Method used

By acquiring multimodal sensor data and value stream data, performing timestamp alignment and feature fusion, a carbon stream-value stream coupled differential equation and correlation matrix are established. A multi-level carbon pricing strategy is generated using a multi-agent reinforcement learning model (MADDPG), and the pricing strategy is optimized through a closed-loop feedback mechanism.

Benefits of technology

It quantifies the carbon flow path entropy at each node of the supply chain, dynamically generates multi-level carbon pricing strategies, balances carbon emission reduction with economic benefits, and achieves real-time monitoring and dynamic pricing of carbon footprint.

Generated by Eureka AI based on patent content.

Figure CN122243558A_ABST

Abstract

This invention discloses a real-time carbon footprint pricing method and system for the supply chain based on multimodal data. The method includes: acquiring multimodal sensor data and value stream data from each node in the supply chain; performing timestamp alignment and feature fusion to generate a carbon flow-value flow coupled data sequence; establishing a carbon flow-value flow coupled differential equation and constructing a carbon flow-value flow correlation matrix based on the carbon flow-value flow coupled data sequence; calculating the carbon flow path entropy for each node in the supply chain; inputting the carbon flow path entropy, carbon flow value, and external carbon price into a multi-agent reinforcement learning model using the MADDPG algorithm to output a multi-level carbon pricing strategy; and distributing and executing the multi-level carbon pricing strategy and obtaining actual data feedback to update the model parameters. This invention quantifies carbon flow uncertainty by introducing carbon flow path entropy and utilizes multi-agent game theory to achieve multi-level carbon pricing, effectively balancing carbon emission reduction and economic benefits.

Description

Technical Field

[0001] This invention relates to the field of data processing technology, specifically to a method and system for real-time pricing of supply chain carbon footprint based on multimodal data.

Background Technology

[0002] With increasing global focus on climate change, supply chain carbon footprint management has become a critical aspect of business operations. Currently, supply chain carbon management methods largely rely on static carbon emission accounting models, which struggle to reflect dynamic changes in the production process in real time. Furthermore, they often separate carbon reduction targets from economic benefit objectives, resulting in a lack of economic incentives for carbon reduction strategies. To address this challenge, some research attempts to incorporate multi-source data fusion and artificial intelligence algorithms for optimization. For example, they analyze the relationship between carbon flow and value flow by constructing correlation models, or use reinforcement learning to generate adjustment strategies. However, these methods often focus on local optimization at the production workshop level, and the generated strategies are often macroscopic, lacking specific and executable dynamic pricing mechanisms for different supply chain levels (such as within enterprises, between enterprises, and for final products). This makes it difficult to effectively guide collaboration among various nodes to achieve a global balance between carbon reduction and economic benefits.

Summary of the Invention

[0003] In view of the above problems, the present invention provides a method and system for real-time pricing of supply chain carbon footprint based on multimodal data. By quantifying the uncertainty of carbon flow paths and utilizing multi-agent game theory, the invention achieves automatic generation of multi-level carbon pricing strategies.

[0004] To achieve the above objectives, in a first aspect, this application provides a real-time pricing method for supply chain carbon footprint based on multimodal data, comprising:

[0005] Acquire multimodal sensor data and value stream data from each node of the supply chain. The multimodal sensor data includes at least energy consumption data, vibration data, location data, and temperature data, while the value stream data includes at least process cost data, resource utilization rate data, and unit energy consumption revenue data.

[0006] Timestamp alignment and feature fusion are performed on the multimodal sensor data and value stream data to generate a carbon flow-value flow coupled data sequence;

[0007] Based on the carbon flow-value flow coupled data sequence, a carbon flow-value flow coupled differential equation is established. The carbon flow-value flow coupled differential equation is used to describe the real-time interaction between carbon flow intensity and value flow intensity, and a carbon flow-value flow correlation matrix is constructed.

[0008] Based on the carbon flow-value flow correlation matrix, the carbon flow path entropy of each node in the supply chain is calculated. The carbon flow path entropy is used to quantify the uncertainty and complexity of the carbon flow path.

[0009] The carbon flow path entropy, carbon flow value, and external carbon price are input into a multi-agent reinforcement learning model. The multi-agent reinforcement learning model uses the MADDPG algorithm to output a multi-level carbon pricing strategy. The multi-level carbon pricing strategy includes at least intra-enterprise carbon tax, inter-enterprise carbon trading price, and final product carbon price.

[0010] The multi-tiered carbon pricing strategy is deployed to the supply chain system for execution, and the actual carbon emission data, cost data, and value stream data are obtained after execution.

[0011] The actual carbon emission data, cost data, and value stream data are fed back into the carbon flow-value flow correlation matrix and the multi-agent reinforcement learning model to update the model parameters and generate an optimized pricing strategy.

[0012] Furthermore, performing timestamp alignment and feature fusion on the multimodal sensor data and value stream data to generate the carbon flow-value flow coupled data sequence includes:

[0013] Outlier removal and missing value interpolation are performed on the energy consumption data, vibration data, position data, and temperature data in the multimodal sensor data to generate a multimodal cleaned data sequence;

[0014] For each type of data in the multimodal cleaned data sequence, a dynamic time warping algorithm is used to align the timestamps, so that all data sequences are mapped to a unified time reference axis, generating a multimodal synchronized data sequence;
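The dynamic time warping step can be illustrated with a minimal sketch. It assumes scalar samples and an absolute-difference local cost, and it aligns one series against a single reference series; the function name `dtw_path` and these choices are illustrative assumptions, not details given in the patent text.

```python
def dtw_path(a, b):
    """Return (total_cost, warping_path) aligning sequence a to sequence b.

    Assumption: scalar samples and absolute difference as the local cost.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end to recover the matched index pairs.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


# Hypothetical example: a sensor series that lags the reference by one sample.
total, path = dtw_path([0, 1, 2, 3], [0, 0, 1, 2, 3])
```

Each pair in `path` maps a sample index of the first series to an index on the reference axis, which is one way to place several sensor streams on a common time base.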

[0015] Sliding window feature extraction is performed on the multimodal synchronous data sequence. Within each sliding window, the sliding mean and sliding variance of energy consumption data, the root mean square value and peak factor of vibration data, the moving distance and average velocity of location data, and the heating rate and temperature difference amplitude of temperature data are calculated to generate a multimodal feature vector sequence.

[0016] The process cost data, resource utilization rate data and unit energy consumption revenue data in the value stream data are normalized respectively to generate a standardized value stream data sequence.

[0017] Sliding window feature extraction is performed on the standardized value stream data sequence. Within each sliding window, the sliding mean and sliding standard deviation of process cost data, the sliding mean and sliding slope of resource utilization rate data, and the sliding mean and sliding coefficient of variation of unit energy consumption revenue data are calculated to generate a value stream feature vector sequence.

[0018] The multimodal feature vector sequence and the value stream feature vector sequence are aligned window by window on the time axis, and the aligned feature vectors are then concatenated and their dimensions reduced to generate a carbon flow-value flow coupled data sequence.
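The window-level feature extraction and concatenation described above can be sketched as follows. This is a simplified illustration under stated assumptions: only the energy and vibration features are computed on the sensor side (location and temperature features are omitted), min-max scaling stands in for the unspecified normalization, the sliding slope is a least-squares slope over the sample index, and the dimension-reduction step is left out. All function names are hypothetical.

```python
import math


def sliding_windows(seq, size, step=1):
    """Split a sequence into overlapping windows of a fixed size."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def sensor_features(energy, vibration):
    """Energy mean/variance plus vibration RMS and peak (crest) factor."""
    rms = math.sqrt(mean([v * v for v in vibration]))
    peak = max(abs(v) for v in vibration)
    return [mean(energy), variance(energy), rms, peak / rms if rms else 0.0]


def min_max(xs):
    """Min-max scaling to [0, 1]; an assumed choice of normalization."""
    lo, hi = min(xs), max(xs)
    return [0.0 for _ in xs] if hi == lo else [(x - lo) / (hi - lo) for x in xs]


def slope(ys):
    """Least-squares slope of ys against the sample index."""
    n = len(ys)
    x_mean, y_mean = (n - 1) / 2, mean(ys)
    denom = sum((i - x_mean) ** 2 for i in range(n))
    if denom == 0:
        return 0.0
    return sum((i - x_mean) * (y - y_mean) for i, y in enumerate(ys)) / denom


def value_features(cost, utilization, revenue):
    """Cost mean/std, utilization mean/slope, revenue mean/coefficient of variation."""
    rev_mean = mean(revenue)
    cv = math.sqrt(variance(revenue)) / rev_mean if rev_mean else 0.0
    return [mean(cost), math.sqrt(variance(cost)),
            mean(utilization), slope(utilization), rev_mean, cv]


def coupled_vector(sensor_window, value_window):
    """Concatenate the two feature vectors of one time-aligned window."""
    return sensor_features(**sensor_window) + value_features(**value_window)
```

Applying `coupled_vector` to each pair of aligned windows yields one 10-element vector per window in this reduced setting; a real pipeline would add the remaining features and a dimension-reduction step.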

[0019] Furthermore, based on the carbon flow-value flow coupled data sequence, a carbon flow-value flow coupled differential equation is established, including:

[0020] The carbon flow intensity sequence and value flow intensity sequence of each supply chain node are extracted from the carbon flow-value flow coupled data sequence. The carbon flow intensity sequence represents the carbon emissions of the node per unit time, and the value flow intensity sequence represents the economic output value of the node per unit time.

[0021] First-order difference operations are performed on the carbon flow intensity sequence and the value flow intensity sequence to generate the carbon flow intensity change rate sequence and the value flow intensity change rate sequence, respectively.

[0022] Construct a carbon flow-value flow coupled differential equation, which takes the following form:

[0023] The rate of change of value flow intensity equals the product of the value conversion efficiency factor and the rate of change of carbon flow intensity, plus the sum of the products of each external variable and its corresponding dynamic coupling coefficient, plus the product of the external control factor and the value flow intensity.

[0024] Among them, the value conversion efficiency factor is calculated by the overall equipment efficiency and the process yield rate, the external variables include at least the external carbon price and energy price, the dynamic coupling coefficient is calculated in real time by the sliding cross-correlation analysis between the carbon flow intensity sequence and the external variable sequence, and the external control factor is obtained by fitting historical data.
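Written symbolically, the verbal form above corresponds to the following equation. The symbols are assumptions introduced here for readability: C(t) is the carbon flow intensity, V(t) the value flow intensity, η(t) the value conversion efficiency factor, X_k(t) the k-th external variable (for example the external carbon price or the energy price), β_k(t) its dynamic coupling coefficient, and κ the external control factor. The patent text does not state how η is computed from the overall equipment efficiency and the process yield rate, so no functional form is given.

```latex
\frac{\mathrm{d}V(t)}{\mathrm{d}t}
  = \eta(t)\,\frac{\mathrm{d}C(t)}{\mathrm{d}t}
  + \sum_{k=1}^{K} \beta_k(t)\,X_k(t)
  + \kappa\,V(t)
```

With first-order differences in place of derivatives, the discrete form is ΔV_t = η_t ΔC_t + Σ_k β_{k,t} X_{k,t} + κ V_t.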

[0025] The carbon flow intensity change rate sequence, value flow intensity change rate sequence, external variable sequence, and the calculated value conversion efficiency factor, dynamic coupling coefficient, and external control factor are substituted into the carbon flow-value flow coupled differential equation. The equation is then subjected to parameter identification and residual verification to generate the calibrated carbon flow-value flow coupled differential equation.
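The residual verification step can be sketched for the discrete form of the equation. This is a minimal illustration that treats the value conversion efficiency factor, the coupling coefficients, and the external control factor as constants over the sequence, which is a simplification of the time-varying coefficients described above; the names `first_difference` and `equation_residuals` are hypothetical.

```python
def first_difference(seq):
    """First-order difference: seq[t + 1] - seq[t]."""
    return [b - a for a, b in zip(seq, seq[1:])]


def equation_residuals(carbon, value, externals, eta, betas, kappa):
    """Residuals of dV = eta * dC + sum_k beta_k * X_k + kappa * V at each step.

    carbon, value: intensity sequences of equal length
    externals:     list of external-variable sequences (same length as value)
    eta, betas, kappa: coefficients, assumed constant here for simplicity
    """
    d_carbon = first_difference(carbon)
    d_value = first_difference(value)
    residuals = []
    for t in range(len(d_value)):
        external_term = sum(b * x[t] for b, x in zip(betas, externals))
        predicted = eta * d_carbon[t] + external_term + kappa * value[t]
        residuals.append(d_value[t] - predicted)
    return residuals


# Hypothetical data that satisfies the equation exactly, so residuals are zero.
res = equation_residuals(
    carbon=[0, 1, 2, 3],
    value=[0.0, 2.5, 5.0, 7.5],
    externals=[[1, 1, 1, 1]],
    eta=2.0,
    betas=[0.5],
    kappa=0.0,
)
```

Small residuals over held-out data would suggest the identified parameters describe the coupling adequately; large or structured residuals would suggest re-identification.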

[0026] Furthermore, constructing the carbon flow-value flow correlation matrix includes:

[0027] Based on the calibrated carbon flow-value flow coupled differential equation, each supply chain node is classified into carbon flow type and value type. The carbon flow type includes direct carbon emissions, indirect carbon emissions and implicit carbon emissions, and the value type includes output value, material loss value and energy consumption value.

[0028] Using carbon flow type as the row dimension of the matrix and value type as the column dimension of the matrix, an initial carbon flow-value flow association matrix is constructed, with each element in the initial carbon flow-value flow association matrix initialized to zero.

[0029] Extract the carbon flow type value and value type value of each supply chain node at each time stamp from the carbon flow-value flow coupled data sequence, and map the carbon flow type value and value type value to the corresponding row and column of the initial carbon flow-value flow association matrix according to the time stamp to generate a sparse-filled version of the carbon flow-value flow association matrix.

[0030] For each element in the sparsely filled version, a weighted moving average algorithm within a sliding time window is used for smoothing to generate a smoothed filled version of the carbon flow-value flow correlation matrix.

[0031] Based on the dynamic coupling coefficients in the calibrated carbon flow-value flow coupled differential equation, each element in the smooth-filled version is dynamically corrected. The dynamic correction multiplies the element value in the smooth-filled version by the dynamic coupling coefficient between the corresponding carbon flow type and the corresponding value type, generating a dynamically updated version of the carbon flow-value flow association matrix.
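The smoothing and dynamic correction steps above can be sketched on small matrices. The sketch assumes the sparse-filled matrices within one sliding time window are available as a list ordered from oldest to newest, that the weights are supplied by the caller, and that the coupling coefficients are given as a matrix with one entry per (carbon flow type, value type) pair. The function names are hypothetical.

```python
def weighted_moving_average(history, weights):
    """Element-wise weighted average of the matrices in one time window.

    history: list of equally sized matrices, oldest first
    weights: one non-negative weight per matrix (not all zero)
    """
    total = sum(weights)
    rows, cols = len(history[0]), len(history[0][0])
    return [
        [sum(w * m[r][c] for w, m in zip(weights, history)) / total
         for c in range(cols)]
        for r in range(rows)
    ]


def apply_coupling(matrix, coupling):
    """Multiply each element by the coupling coefficient of its row/column pair."""
    return [
        [value * coeff for value, coeff in zip(row, coeff_row)]
        for row, coeff_row in zip(matrix, coupling)
    ]


# Hypothetical 2 x 2 example (the patent's matrix would be 3 x 3).
window = [[[0, 0], [0, 0]], [[2, 4], [6, 8]]]
smoothed = weighted_moving_average(window, weights=[1, 1])
updated = apply_coupling(smoothed, [[1.0, 0.5], [2.0, 1.0]])
```

Giving larger weights to more recent matrices would make the smoothed matrix track recent behavior more closely; the weighting scheme is not specified in the text.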

[0032] Furthermore, based on the carbon flow-value flow correlation matrix, the carbon flow path entropy of each node in the supply chain is calculated, including:

[0033] Extract the set of carbon flow transmission paths for each supply chain node from the carbon flow-value flow correlation matrix. Each carbon flow transmission path in the set is formed by connecting the starting node, intermediate node and ending node in chronological order, and corresponds to a carbon flow transmission value.

[0034] For each carbon flow transmission path, calculate the proportion of its carbon flow transmission volume to the total carbon flow transmission volume of that starting node, and use this as the carbon flow proportion of that carbon flow transmission path.

[0035] Substituting the carbon flow percentage into the Shannon entropy calculation formula, the path entropy component of each carbon flow transmission path is generated. The Shannon entropy calculation formula is that the path entropy component is equal to the negative of the product of the carbon flow percentage and the logarithm of the carbon flow percentage.

[0036] The path entropy components of all carbon flow transmission paths at the same supply chain node are summed to generate the carbon flow path entropy of that node. The carbon flow path entropy is used to quantify the uncertainty and dispersion of the carbon flow transmission path at that node.

[0037] Repeat the above steps for all nodes in the supply chain to generate a carbon flow path entropy sequence for each node, and then time-align the carbon flow path entropy sequence with the carbon flow-value flow association matrix to generate a time-stamped carbon flow path entropy sequence.
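The path entropy calculation for one node can be sketched as below. With p_i denoting the carbon flow proportion of path i, the node entropy is H = -Σ p_i log p_i. The natural logarithm is an assumption, since the text does not state the base, and paths with zero transmission volume are treated as contributing zero.

```python
import math


def path_entropy(volumes):
    """Shannon entropy of the carbon flow proportions of one node's paths.

    volumes: carbon flow transmission volume of each path leaving the node.
    Assumption: natural logarithm; zero-volume paths contribute nothing.
    """
    total = sum(volumes)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for v in volumes:
        if v > 0:
            p = v / total               # carbon flow proportion of this path
            entropy -= p * math.log(p)  # path entropy component
    return entropy


# A node with a single path has zero entropy; an even split maximizes it.
single = path_entropy([5.0])
even_split = path_entropy([1.0, 1.0, 1.0, 1.0])
```

A node whose carbon flow is concentrated on one path scores near zero, while a node that spreads its flow evenly over n paths reaches the maximum log n.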

[0038] Furthermore, time-aligning the carbon flow path entropy sequence with the carbon flow-value flow association matrix to generate a time-stamped carbon flow path entropy sequence includes:

[0039] Each carbon flow path entropy value in the carbon flow path entropy sequence is assigned a corresponding timestamp identifier. The timestamp identifier is consistent with the timestamp of the carbon flow type value and the value type value of the corresponding node in the carbon flow-value flow association matrix, thus forming the alignment of the carbon flow path entropy and the carbon flow-value flow association matrix in the time dimension.

[0040] The sliding window cross-validation method is used to calculate the Pearson correlation coefficient and Spearman rank correlation coefficient between the carbon flow path entropy value and the value of each element in the carbon flow-value flow association matrix in each sliding window, thereby generating a sequence of correlation coefficients between the carbon flow path entropy and the carbon flow-value flow association matrix.

[0041] Carbon flow path entropy values in time windows whose correlation coefficient is lower than the preset correlation threshold are marked as outliers and replaced with the weighted average of the carbon flow path entropy values in the adjacent valid windows before and after that time window, thus generating a corrected carbon flow path entropy sequence.

[0042] The corrected carbon flow path entropy sequence is vector-concatenated with the carbon flow type value and value type value corresponding to the timestamp in the carbon flow-value flow association matrix to generate a time-stamped carbon flow path entropy sequence. Each element in the time-stamped carbon flow path entropy sequence includes a timestamp, a carbon flow path entropy value, a direct carbon emission value, an indirect carbon emission value, an implicit carbon emission value, an output value, a material loss value, and an energy consumption value.
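The outlier correction applied to low-correlation windows can be sketched as follows. The sketch assumes one entropy value and one correlation coefficient per window, uses equal weights for the preceding and following valid windows, and falls back to the single available neighbor at the sequence boundaries; these choices and the name `correct_entropy` are illustrative assumptions.

```python
def correct_entropy(entropy, correlation, threshold):
    """Replace entropy values from low-correlation windows.

    entropy:     one carbon flow path entropy value per window
    correlation: one correlation coefficient per window
    threshold:   windows with correlation below this are treated as outliers
    Assumption: equal weights for the nearest valid windows on each side.
    """
    valid = [c >= threshold for c in correlation]
    corrected = []
    for i, ok in enumerate(valid):
        if ok:
            corrected.append(entropy[i])
            continue
        # Nearest valid window before and after the flagged window, if any.
        prev = next((entropy[j] for j in range(i - 1, -1, -1) if valid[j]), None)
        nxt = next((entropy[j] for j in range(i + 1, len(entropy)) if valid[j]), None)
        neighbors = [v for v in (prev, nxt) if v is not None]
        # Keep the original value when no valid neighbor exists at all.
        corrected.append(sum(neighbors) / len(neighbors) if neighbors else entropy[i])
    return corrected


# Hypothetical example: the middle window has a low correlation coefficient.
fixed = correct_entropy([1.0, 9.0, 3.0], [0.9, 0.1, 0.8], threshold=0.5)
```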

[0043] Furthermore, the carbon flow path entropy, carbon flow value, and external carbon price are input into a multi-agent reinforcement learning model. This model employs the MADDPG algorithm to output a multi-level carbon pricing strategy, including:

[0044] Each enterprise in the supply chain is defined as an agent in a multi-agent reinforcement learning model. An independent state space, action space and reward function are constructed for each agent. The state space includes at least the carbon flow path entropy, carbon flow value, external carbon price, inventory level and order backlog of the corresponding node of the agent. The action space includes at least the carbon tax adjustment range within the enterprise, the carbon trading price adjustment range between enterprises and the carbon price adjustment range of the final product. The reward function is the sum of negative total carbon emissions, negative total cost and negative carbon flow path entropy penalty term.

[0045] For each agent, a policy network and a value network are constructed. The policy network is used to output actions based on the current state, and the value network is used to evaluate the expected cumulative reward of performing actions in the current state. Both the policy network and the value network adopt a fully connected neural network structure and include batch normalization layers and residual connections.

[0046] A centralized training and distributed execution framework is used to train the policy network and the value network. Centralized training means that during the training phase, each agent's value network is provided with the state and action information of all agents. Distributed execution means that during the execution phase, each agent's policy network outputs actions based solely on its own state information.

[0047] During training, historical experience data is randomly sampled from the experience replay buffer. The historical experience data includes at least the current state, current action, immediate reward, next state, and training termination flag.

[0048] Historical experience data is input into the value network to calculate the target value, and the value network parameters are updated by minimizing the mean square error between the value network output and the target value.

[0049] The policy network parameters are updated by maximizing the expected cumulative reward output by the value network.

[0050] After training, the carbon flow path entropy, carbon flow value and external carbon price of each node are input into the policy network of each agent in real time. The policy network of each agent outputs the corresponding enterprise carbon tax value, inter-enterprise carbon trading quotation value and final product carbon price value, which are combined to generate a multi-level carbon pricing strategy.
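The experience replay buffer used during training can be sketched with the standard library. The five stored fields follow the text (current state, current action, immediate reward, next state, termination flag); the capacity, the optional seed, and the class and field names are assumptions for illustration.

```python
import random
from collections import deque, namedtuple

# One stored experience; for multi-agent training each field would typically
# hold the values of all agents (an assumption about the storage layout).
Transition = namedtuple("Transition", "state action reward next_state done")


class ReplayBuffer:
    """Fixed-capacity buffer with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        self._buffer = deque(maxlen=capacity)  # oldest entries are dropped first
        self._rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self._buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw batch_size distinct transitions uniformly at random."""
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)


# Hypothetical usage with scalar placeholders for states and actions.
buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.push(state=step, action=0.1, reward=-1.0, next_state=step + 1, done=False)
batch = buffer.sample(2)
```

Sampling uniformly from stored experience breaks the temporal correlation between consecutive transitions, which is the usual motivation for a replay buffer in off-policy methods such as MADDPG.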

[0051] Furthermore, the reward function is the sum of negative total carbon emissions, negative total cost, and negative carbon flow path entropy penalty terms, including:

[0052] The negative total carbon emissions are defined as the negative of the difference between the total carbon emissions of the corresponding node of the agent in the current time step and the preset carbon emission benchmark value. The total carbon emissions are obtained by summing the direct carbon emission values, indirect carbon emission values and implicit carbon emission values in the carbon flow-value flow correlation matrix. The preset carbon emission benchmark value is dynamically updated according to the moving average of the historical carbon emission data of the node.

[0053] The negative total cost is defined as the negative of the sum of the operating cost and carbon trading cost of the corresponding node of the agent in the current time step. The operating cost includes at least the energy consumption cost and the material loss cost. The carbon trading cost is calculated based on the product of the carbon trading volume of the node in the current time step and the external carbon price.

[0054] The negative carbon flow path entropy penalty term is defined as the negative of the deviation between the carbon flow path entropy of the corresponding node of the agent in the current time step and the preset carbon flow path entropy threshold. The preset carbon flow path entropy threshold is dynamically calculated based on the historical moving average and moving standard deviation of the carbon flow path entropy of all nodes in the supply chain. When the carbon flow path entropy exceeds the preset threshold, the absolute value of the penalty term increases, so as to guide the agent to give priority to pricing strategies with lower carbon flow path uncertainty.

[0055] The agent's immediate reward is generated by weighted summing of negative total carbon emissions, negative total cost, and negative carbon flow path entropy penalty. The weight coefficients of the weighted sum are adaptively adjusted during training using a Bayesian optimization algorithm.
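The three reward terms and their weighted sum can be sketched as below. Several details are assumptions made for illustration: the entropy threshold is taken as the historical mean plus k standard deviations, the entropy penalty is applied only to the excess above the threshold, and the weights are fixed inputs rather than being tuned by Bayesian optimization. Function and parameter names are hypothetical.

```python
from statistics import mean, pstdev


def entropy_threshold(history, k=1.0):
    """Assumed form of the dynamic threshold: moving mean plus k moving std."""
    return mean(history) + k * pstdev(history)


def immediate_reward(emissions, baseline, operating_cost, traded_volume,
                     carbon_price, entropy, threshold,
                     weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three negative terms for one agent and time step."""
    # Negative of (total emissions - benchmark value).
    r_emission = -(emissions - baseline)
    # Negative of (operating cost + carbon trading cost).
    r_cost = -(operating_cost + traded_volume * carbon_price)
    # Penalty grows once entropy exceeds the threshold (assumed hinge form).
    r_entropy = -max(0.0, entropy - threshold)
    w_emission, w_cost, w_entropy = weights
    return w_emission * r_emission + w_cost * r_cost + w_entropy * r_entropy


# Hypothetical numbers: 2 units over the benchmark, cost 5 + 2 * 3, entropy 0.5 over.
r = immediate_reward(emissions=12.0, baseline=10.0, operating_cost=5.0,
                     traded_volume=2.0, carbon_price=3.0,
                     entropy=1.5, threshold=1.0)
```

In practice the three terms have different units and scales, so some normalization before weighting would likely be needed; the text leaves this to the adaptive weight adjustment.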

[0056] Furthermore, a centralized training and distributed execution framework is used to train the policy network and the value network, including:

[0057] During the centralized training phase, a central value network set is constructed, which contains the same number of central value networks as the number of agents. The input of each central value network is a concatenated vector of the state information of all agents and a concatenated vector of the action information of all agents, and the output is the expected cumulative reward estimate of the corresponding agent.

[0058] In each training iteration of the centralized training phase, a batch of historical experience data is sampled from the experience replay buffer. The historical experience data includes the current state, current action, immediate reward, next state and training termination flag of all agents. The current state and current action are input into the central value network set to calculate the expected cumulative reward estimate in the current state.

[0059] Input the next state and the next action into the central value network set to calculate the target cumulative reward valuation in the next state;

[0060] The parameters of the central value network set are updated by minimizing the mean square error between the expected cumulative reward valuation and the target cumulative reward valuation.

[0061] In each training iteration of the centralized training phase, the current state of each agent is input into the corresponding policy network, the current action is output, and the current action is input into the corresponding central value network to calculate the expected cumulative reward valuation.

[0062] The parameters of the agent's policy network are updated by maximizing the expected cumulative reward estimate using the gradient ascent method.

[0063] During the distributed execution phase, the central value network set is removed from each agent, and each agent retains only its own policy network. Each policy network independently outputs actions based on its own node's carbon flow path entropy, carbon flow value, and external carbon price. These actions include adjustments to the carbon tax within the enterprise, adjustments to the carbon trading price between enterprises, and adjustments to the carbon price of the final product.
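The value network and policy network updates of the centralized training phase match the structure of the standard MADDPG formulation, which can be written as follows. The notation is an assumption introduced here: s_j and a_j are the state and action of agent j, r_i the immediate reward of agent i, d the termination flag, γ a discount factor, Q_i the central value network of agent i with parameters θ_i, μ_i its policy network with parameters φ_i, and primed networks denote target copies. The patent text does not mention a discount factor or target networks explicitly; they are taken from the standard algorithm.

```latex
% Target value for agent i, using next states and next actions of all agents
y_i = r_i + \gamma\,(1 - d)\,
      Q_i'\!\left(s_1',\dots,s_N',\,a_1',\dots,a_N'\right),
\qquad a_j' = \mu_j'(s_j')

% Value network loss: mean squared error against the target
\mathcal{L}(\theta_i) = \mathbb{E}\!\left[
  \left(Q_i(s_1,\dots,s_N,\,a_1,\dots,a_N) - y_i\right)^2
\right]

% Policy gradient used for the gradient-ascent update of the policy network
\nabla_{\phi_i} J_i = \mathbb{E}\!\left[
  \nabla_{\phi_i}\mu_i(s_i)\,
  \nabla_{a_i} Q_i(s_1,\dots,s_N,\,a_1,\dots,a_N)\Big|_{a_i=\mu_i(s_i)}
\right]
```

Only Q_i needs the joint states and actions; μ_i depends on s_i alone, which is why the value networks can be discarded at execution time.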

[0064] In a second aspect, the present invention also provides a real-time pricing system for supply chain carbon footprint based on multimodal data, applicable to the method described in the first aspect. The system includes a multimodal data acquisition module, a data fusion module, a coupling analysis module, a path entropy calculation module, a pricing decision module, a strategy execution module, and a feedback update module. The multimodal data acquisition module is used to acquire multimodal sensor data and value stream data from each node in the supply chain. The multimodal sensor data includes at least energy consumption data, vibration data, location data, and temperature data, while the value stream data includes at least process cost data, resource utilization rate data, and unit energy consumption revenue data. The data fusion module is connected to the multimodal data acquisition module and is used to perform timestamp alignment and feature fusion on the multimodal sensor data and value stream data to generate a carbon flow-value flow coupled data sequence. The coupling analysis module is connected to the data fusion module and is used to establish a carbon flow-value flow coupled differential equation based on the carbon flow-value flow coupled data sequence and to construct the carbon flow-value flow association matrix. The path entropy calculation module is connected to the coupling analysis module and is used to calculate the carbon flow path entropy of each node in the supply chain based on the carbon flow-value flow association matrix. The pricing decision module is connected to the path entropy calculation module and is used to input the carbon flow path entropy, carbon flow value, and external carbon price into a multi-agent reinforcement learning model, which uses the MADDPG algorithm to output a multi-level carbon pricing strategy including at least intra-enterprise carbon tax, inter-enterprise carbon trading price, and final product carbon price. The strategy execution module is connected to the pricing decision module and is used to distribute the multi-level carbon pricing strategy to the supply chain system for execution and to obtain the actual carbon emission data, cost data, and value stream data after execution. The feedback update module is connected to the strategy execution module, the coupling analysis module, and the pricing decision module and is used to feed back the actual carbon emission data, cost data, and value stream data to the carbon flow-value flow association matrix and the multi-agent reinforcement learning model, update the model parameters, and generate an optimized pricing strategy.

[0065] Unlike existing technologies, the above technical solution provides a real-time carbon footprint pricing method and system for the supply chain based on multimodal data. This includes: acquiring multimodal sensor data and value stream data from each node in the supply chain, performing timestamp alignment and feature fusion to generate a carbon flow-value flow coupled data sequence; establishing a carbon flow-value flow coupled differential equation and constructing a carbon flow-value flow correlation matrix based on the carbon flow-value flow coupled data sequence, and then calculating the carbon flow path entropy of each node in the supply chain; inputting the carbon flow path entropy, carbon flow value, and external carbon price into a multi-agent reinforcement learning model using the MADDPG algorithm, and outputting a multi-level carbon pricing strategy; and distributing and executing the multi-level carbon pricing strategy and obtaining actual data feedback to update the model parameters. This invention quantifies carbon flow uncertainty by introducing carbon flow path entropy and utilizes multi-agent game theory to achieve multi-level carbon pricing, effectively balancing carbon emission reduction and economic benefits.

[0066] The above description of the invention is merely an overview of the technical solution of this application. In order to enable those skilled in the art to better understand the technical solution of this application and to implement it based on the description and drawings, and to make the above-mentioned objectives and other objectives, features and advantages of this application easier to understand, the following description is provided in conjunction with the specific embodiments and drawings of this application.

Description of the Drawings

[0067] The accompanying drawings are only used to illustrate the principles, implementation methods, applications, features, and effects of specific embodiments of the present invention and other related contents, and should not be considered as limitations on this application.

[0068] In the accompanying drawings of the specification:

[0069] Figure 1 is a schematic diagram illustrating steps S101 to S107 of the method described in a specific embodiment;

[0070] Figure 2 is a schematic diagram illustrating steps S201 to S204 of the method described in a specific embodiment;

[0071] Figure 3 is a schematic diagram illustrating steps S301 to S305 of the method described in a specific embodiment;

[0072] Figure 4 is a schematic diagram illustrating steps S401 to S407 of the method described in a specific embodiment;

[0073] Figure 5 is a schematic diagram of the structure of the real-time pricing system described in a specific embodiment.

[0074] The reference numerals used in the above figures are explained as follows:

[0075] 1. Real-time pricing system; 11. Multimodal data acquisition module; 12. Data fusion module; 13. Coupling analysis module; 14. Path entropy calculation module; 15. Pricing decision module; 16. Strategy execution module; 17. Feedback update module.

Detailed Implementation

[0076] To explain in detail the possible application scenarios, technical principles, implementable specific solutions, and achievable objectives and effects of this application, a detailed description is given below in conjunction with the listed specific embodiments and the accompanying drawings. The embodiments described herein are merely illustrative of the technical solutions of this application and are therefore not intended to limit the scope of protection of this application.

[0077] In this document, the term "embodiment" means that a specific feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The term "embodiment" appearing in various places throughout the specification does not necessarily refer to the same embodiment, nor does it specifically limit its independence or connection with other embodiments. In principle, in this application, as long as there are no technical contradictions or conflicts, the technical features mentioned in each embodiment can be combined in any way to form corresponding implementable technical solutions.

[0078] Unless otherwise defined, the technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application pertains; the use of related terms herein is merely for the purpose of describing particular embodiments and is not intended to limit this application.

[0079] In the description of this application, the term "and / or" describes the logical relationship between objects and covers three cases. For example, "A and / or B" means: A alone, B alone, or both A and B. Additionally, the character " / " in this document generally indicates an "or" relationship between the preceding and following objects.

[0080] In this application, terms such as “first” and “second” are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual quantity, hierarchy or order relationship between these entities or operations.

[0081] Without further limitations, the use of terms such as “comprising,” “including,” “having,” or other similar open-ended expressions in this application is intended to cover non-exclusive inclusion, which does not exclude the presence of additional elements in a process, method, or product that includes the stated elements, such that a process, method, or product that includes a list of elements may include not only those defined elements but also other elements not expressly listed, or elements inherent to such a process, method, or product.

[0082] The processor described in the embodiments of this application can be implemented by hardware, firmware, software, or a combination thereof. It can be a circuit, one or more of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a central processing unit (CPU), a controller, a microcontroller, or a microprocessor. It also includes other physical, biological, or chemical structures that can implement the same or equivalent functions as the processors listed above, such as biological neurons, quantum computing units, DNA computing units, etc., so that the processor can execute some or all of the steps in the computer program or method involved in the various embodiments of this application, or any combination of the steps mentioned therein.

[0083] The computer program involved in the embodiments can be stored in a computer device readable storage medium, which includes, but is not limited to, disks, magnetic tapes, magnetic cards, floppy disks, flash memory, optical disks, optical cards, read-only memory (ROM), random access memory (RAM), erasable programmable ROM (EPROM), and electrically erasable programmable ROM (EEPROM), etc., and also includes other biological, physical, or chemical structures that can achieve the same or equivalent functions as the storage media listed above, such as DNA, RNA, proteins, and other units with information storage capabilities. In specific embodiments, the storage medium involved can be one of the above-mentioned media types, or a combination of the above-mentioned media types. In different embodiments, the computer program involved in the embodiments can be centrally stored in a single medium, or distributed and stored in multiple media. The memory containing the computer device readable storage medium can be non-volatile memory or random access memory. These computer device readable storage media can be built into the device, or can be connected to the device involved in the embodiments as an external device or part of an external device. In some embodiments, the memory having a computer device readable storage medium is deployed locally; in other embodiments, the memory may be deployed remotely from the processor, for example, as a network-attached memory accessed via RF circuitry or an external port and a communication network, wherein the communication network may be the Internet, one or more intranets, a local area network (LAN), a wide area network (WAN), a storage area network (SAN), or a suitable combination thereof, as long as computer device access to the memory is enabled.
Furthermore, the computer program involved in the embodiments may be stored in plaintext / ciphertext form, or it may be designed as training data, integrated and recombined through model training and implicitly stored in the parameter states of a deep neural network or other machine learning model.

[0084] Please refer to Figure 1. In a first aspect, this embodiment provides a real-time pricing method for supply chain carbon footprint based on multimodal data, including:

[0085] S101. Obtain multimodal sensor data and value stream data from each node of the supply chain. The multimodal sensor data shall include at least energy consumption data, vibration data, location data, and temperature data. The value stream data shall include at least process cost data, resource utilization rate data, and unit energy consumption revenue data.

[0086] S102. Perform timestamp alignment and feature fusion on the multimodal sensor data and value stream data to generate a carbon flow-value flow coupled data sequence;

[0087] S103. Based on the carbon flow-value flow coupled data sequence, establish a carbon flow-value flow coupled differential equation. The carbon flow-value flow coupled differential equation is used to describe the real-time interaction between carbon flow intensity and value flow intensity, and construct a carbon flow-value flow correlation matrix.

[0088] S104. Based on the carbon flow-value flow correlation matrix, calculate the carbon flow path entropy of each node in the supply chain. The carbon flow path entropy is used to quantify the uncertainty and complexity of the carbon flow path.

[0089] S105. Input the carbon flow path entropy, carbon flow value and external carbon price into the multi-agent reinforcement learning model. The multi-agent reinforcement learning model adopts the MADDPG algorithm and outputs a multi-level carbon pricing strategy. The multi-level carbon pricing strategy includes at least the enterprise carbon tax, the inter-enterprise carbon trading price and the final product carbon price.

[0090] S106. Implement the multi-level carbon pricing strategy in the supply chain system and obtain the actual carbon emission data, cost data and value stream data after implementation;

[0091] S107. Feed back the actual carbon emission data, cost data and value stream data to the carbon flow-value flow correlation matrix and the multi-agent reinforcement learning model, update the model parameters and generate an optimized pricing strategy.

[0092] In step S101, the multimodal sensor data originates from a sensor network deployed at the nodes of the supply chain. Energy consumption data can be collected through smart meters or power monitoring devices and reflects the real-time energy consumption level of equipment or production lines. Vibration data is acquired by accelerometers and characterizes the stability of equipment operation. Location data is obtained from RFID or GPS positioning modules and tracks the spatial movement of materials or products within the supply chain. Temperature data comes from environmental or equipment temperature sensors and relates to the production environment or material storage conditions. Value stream data can be extracted by connecting to the Enterprise Resource Planning (ERP) system and the Manufacturing Execution System (MES). Process cost data summarizes the direct material, labor, and manufacturing costs of each production stage; resource utilization rate data measures the capacity utilization efficiency of equipment or production lines; and unit energy consumption revenue data directly links energy consumption with economic output.

[0093] In step S102, timestamp alignment addresses the timing discrepancies caused by inconsistent acquisition frequencies and unsynchronized clocks across multi-source data. By mapping the different data sequences to a unified time reference axis, subsequent fusion operations become comparable in the time dimension. Feature fusion then integrates and reorganizes the aligned heterogeneous data at the feature level, transforming the multimodal sensor data and value stream data from their respective independent feature spaces into a unified feature representation space. The generated carbon flow-value flow coupled data sequence binds carbon-related data in the physical dimension to value-related data in the economic dimension in both time and space. Each data point simultaneously includes the physical states at that moment, such as energy consumption, vibration, location, and temperature, and the corresponding economic states, such as cost, resource efficiency, and revenue, providing structured input data for subsequently establishing a dynamic relationship model between the two.

[0094] In step S103, the carbon flow-value flow coupled differential equation describes the dynamic relationship between carbon flow intensity and value flow intensity over time in differential form. Carbon flow intensity characterizes the carbon emissions per unit time at a node, while value flow intensity characterizes the economic output value per unit time at a node. The equation captures the real-time interaction by establishing a mathematical correlation between their rates of change. The carbon flow-value flow correlation matrix organizes the mapping relationship between different carbon flow types and different value types in a row-column structure. The element values in the matrix reflect the degree of contribution or influence weight of a specific carbon flow type to a specific value type.

[0095] In step S104, the carbon flow path entropy is calculated based on the carbon flow path information contained in the carbon flow-value flow correlation matrix. This index draws on the concept of entropy in information theory to measure the dispersion and uncertainty of the carbon flow transmission path between nodes in the supply chain. A higher carbon flow path entropy value indicates a more dispersed carbon flow distribution and more diverse path choices, while a lower value indicates that the carbon flow is concentrated on a few fixed paths. This step allows the complexity of the carbon flow path to serve as an input feature for subsequent decision-making models, providing quantitative information on the structural characteristics of the carbon flow for the formulation of pricing strategies.
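As a generic illustration of the information-theoretic idea described above, and not the specific calculation of this embodiment, the following sketch computes the Shannon entropy of the shares of carbon flow carried by each path leaving a node. The function name `path_entropy`, the use of base-2 logarithms, and the example flow values are assumptions made for illustration.

```python
import numpy as np

def path_entropy(path_flows):
    """Shannon entropy (in bits) of the carbon flow shares over the paths.

    `path_flows` holds the non-negative carbon flow carried by each path.
    Paths with zero flow contribute nothing to the entropy.
    """
    flows = np.asarray(path_flows, dtype=float)
    total = flows.sum()
    if total <= 0:
        return 0.0
    shares = flows[flows > 0] / total
    return float(-(shares * np.log2(shares)).sum())

# Evenly spread flow over four paths versus flow concentrated on one path.
dispersed = path_entropy([1.0, 1.0, 1.0, 1.0])
concentrated = path_entropy([9.7, 0.1, 0.1, 0.1])
```

Under these assumptions, the evenly dispersed case yields the higher entropy, which matches the qualitative reading given in the paragraph above.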

[0096] In step S105, the multi-agent reinforcement learning model models each enterprise in the supply chain as an independent decision-making agent. Each agent makes pricing decisions independently based on its observed local state information. The MADDPG algorithm serves as the training framework. During the training phase, the value network of each agent can learn using global information, while during the execution phase, the policy network of each agent relies only on its own local information for independent decision-making, thus balancing the needs of global collaborative optimization and local autonomous decision-making. In the output multi-level carbon pricing strategy, the enterprise-level carbon tax guides carbon reduction behavior at each internal stage, the inter-enterprise carbon trading price promotes the optimal allocation of carbon emission rights between upstream and downstream of the supply chain, and the final product carbon price transmits carbon cost signals to the end market. These three levels work together to form a complete carbon price transmission system.

[0097] In step S106, the multi-level carbon pricing strategy is distributed to the execution systems of each node in the supply chain through a standard interface, translating the pricing strategy into specific production scheduling instructions, trading instructions, or price tags. After execution, actual carbon emission data, cost data, and value stream data are collected in real time through sensor networks and business systems. These data accurately reflect the actual effects of the pricing strategy and constitute the input source for closed-loop feedback.

[0098] In step S107, the actual data collected after execution is fed back to the carbon flow-value flow correlation matrix and the multi-agent reinforcement learning model to update the model parameters. For the correlation matrix, the actual data is used to correct the values of each element in the matrix, so that it more accurately reflects the current real mapping relationship between carbon flow and value flow. For the reinforcement learning model, the actual data is stored as new experience samples in the experience replay buffer, which is used to update the parameters of the policy network and value network in subsequent training iterations. Through this continuous data-driven feedback mechanism, the pricing strategy can be dynamically adjusted according to changes in the supply chain operation status, realizing an upgrade from static pricing to adaptive pricing.
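The experience replay buffer mentioned above can be sketched as a fixed-capacity queue from which training batches are drawn at random. This is a minimal sketch: the class name `ReplayBuffer`, the tuple layout (state, action, reward, next state), and the capacity are illustrative assumptions and are not specified by this application.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of experience samples; oldest samples are dropped first."""

    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        # One experience sample built from the data observed after execution.
        self._items.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random batch without replacement, capped at the buffer size.
        k = min(batch_size, len(self._items))
        return random.sample(list(self._items), k)

    def __len__(self):
        return len(self._items)

buffer = ReplayBuffer(capacity=3)
for step in range(5):
    # Hypothetical values: state = path entropy, action = carbon price, reward = scalar.
    buffer.add(state=0.1 * step, action=50.0 + step, reward=-float(step), next_state=0.1 * (step + 1))
batch = buffer.sample(2)
```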

[0099] This embodiment lays a data foundation through the joint acquisition of multimodal sensor data and value stream data. A dynamic correlation model is established using carbon flow-value flow coupled differential equations and correlation matrices. The introduction of carbon flow path entropy provides quantitative characteristics of the uncertainty in carbon flow structure for pricing decisions. The MADDPG multi-agent reinforcement learning model achieves the collaborative generation of multi-level pricing strategies through a game-theoretic mechanism. This embodiment realizes real-time monitoring and dynamic pricing of the supply chain carbon footprint. A closed-loop feedback mechanism ensures that the model can continuously adapt to changes in the supply chain's operational status, achieving synergistic optimization of carbon reduction and economic objectives.

[0100] In some embodiments, performing timestamp alignment and feature fusion on the multimodal sensor data and value stream data to generate a carbon flow-value flow coupled data sequence includes:

[0101] Outlier removal and missing value interpolation are performed on the energy consumption data, vibration data, position data, and temperature data in the multimodal sensor data to generate a multimodal cleaned data sequence;

[0102] For each type of data in the multimodal cleaned data sequence, a dynamic time warping algorithm is used to align the timestamps, so that all data sequences are mapped to a unified time reference axis, generating a multimodal synchronized data sequence;

[0103] Sliding window feature extraction is performed on the multimodal synchronous data sequence. Within each sliding window, the sliding mean and sliding variance of energy consumption data, the root mean square value and peak factor of vibration data, the moving distance and average velocity of location data, and the heating rate and temperature difference amplitude of temperature data are calculated to generate a multimodal feature vector sequence.

[0104] The process cost data, resource utilization rate data and unit energy consumption revenue data in the value stream data are normalized respectively to generate a standardized value stream data sequence.

[0105] Sliding window feature extraction is performed on the value stream standardized data sequence. Within each sliding window, the sliding mean and sliding standard deviation of process cost data, the sliding mean and sliding slope of resource utilization rate data, and the sliding mean and sliding coefficient of variation of unit energy consumption revenue data are calculated to generate a value stream feature vector sequence.

[0106] The multimodal feature vector sequence and the value stream feature vector sequence are aligned window by window on the time axis, and the aligned feature vectors are then concatenated and their dimensions reduced to generate a carbon flow-value flow coupled data sequence.

[0107] In this embodiment, outlier removal can be based on statistical distribution or physical thresholds. For example, for energy consumption data, values falling outside the range of the historical moving average plus or minus three times the moving standard deviation are considered outliers; for vibration data, values exceeding the sensor's range or physically impossible values are marked as outliers; for location data, logically contradictory jumps or values exceeding preset geographic boundaries are considered outliers; and for temperature data, values exceeding the allowable range of the process specifications are considered outliers. Missing value imputation employs different strategies based on the duration and pattern of data loss. For brief interruptions, linear interpolation can be used, estimating the missing value by connecting adjacent valid data before and after the missing point; for continuous long-term losses, imputation based on the mean of historical data from the same period or weighted average imputation based on nearby similar operating conditions can be used.
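A minimal sketch of the three-sigma rule and the linear interpolation described above is given below. It assumes a trailing window for the moving statistics, so that an outlier does not inflate its own threshold; the window length of 5 and the sample readings are illustrative assumptions.

```python
import numpy as np

def remove_outliers_3sigma(values, window=5):
    """Replace samples outside the trailing mean +/- 3 * trailing std with NaN."""
    x = np.asarray(values, dtype=float)
    cleaned = x.copy()
    for i in range(window, len(x)):
        history = cleaned[i - window:i]
        history = history[~np.isnan(history)]  # ignore samples already rejected
        if len(history) < 2:
            continue
        mean, std = history.mean(), history.std()
        if std > 0 and abs(x[i] - mean) > 3 * std:
            cleaned[i] = np.nan
    return cleaned

def fill_short_gaps(values):
    """Linearly interpolate NaN samples from the neighbouring valid samples."""
    x = np.asarray(values, dtype=float)
    index = np.arange(len(x))
    valid = ~np.isnan(x)
    return np.interp(index, index[valid], x[valid])

# Hypothetical energy readings (kWh) with one spike at position 6.
energy_kwh = [10.0, 10.2, 9.9, 10.1, 10.0, 10.3, 55.0, 10.1, 9.9]
cleaned = remove_outliers_3sigma(energy_kwh)
filled = fill_short_gaps(cleaned)
```

The spike is rejected and then replaced by the midpoint of its two neighbours. Long gaps would need one of the other imputation strategies named in the paragraph above, which this sketch does not cover.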

[0108] The Dynamic Time Warping (DTW) algorithm constructs a cost matrix between two time series and uses dynamic programming to find an optimal warping path from the starting point to the ending point, minimizing the cumulative distance along this path. This allows the series to undergo non-linear scaling on the time axis to achieve the best match. In this scenario, the time axis of the sensor with the highest sampling frequency or the best data quality can be selected as the unified time reference axis, and the other sensor data sequences can be mapped to this reference axis using this algorithm.
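The cost matrix, dynamic programming recursion, and warping path described above can be sketched as follows. This is the textbook form of the algorithm with an absolute-difference local distance. The helper `map_to_reference`, which averages the samples that the path assigns to each reference index, is one possible way to map a sequence onto the reference axis and is an assumption of this sketch.

```python
import numpy as np

def dtw_path(ref, other):
    """Return the minimal cumulative distance and the warping path as index pairs."""
    ref = np.asarray(ref, dtype=float)
    other = np.asarray(other, dtype=float)
    n, m = len(ref), len(other)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(ref[i - 1] - other[j - 1])
            cost[i, j] = local + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    # Backtrack from the end point to the start point along the cheapest predecessors.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [(cost[i - 1, j - 1], i - 1, j - 1),
                      (cost[i - 1, j], i - 1, j),
                      (cost[i, j - 1], i, j - 1)]
        _, i, j = min(candidates, key=lambda c: c[0])
        path.append((i - 1, j - 1))
    path.reverse()
    return float(cost[n, m]), path

def map_to_reference(ref, other):
    """Resample `other` onto the index axis of `ref` using the warping path."""
    _, path = dtw_path(ref, other)
    other = np.asarray(other, dtype=float)
    sums = np.zeros(len(ref))
    counts = np.zeros(len(ref))
    for i, j in path:
        sums[i] += other[j]
        counts[i] += 1
    return sums / counts

reference = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]   # densely sampled sensor
sparse = [0.0, 2.0, 3.0, 1.0, 0.0]                # sensor with a lower sampling rate
aligned = map_to_reference(reference, sparse)
```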

[0109] The length and step size of the sliding window are determined based on the production cycle time and sensor sampling frequency. For example, it can be set to cover the duration of a complete production cycle, while the step size determines the density of feature extraction. Within each sliding window, the moving mean of energy consumption data reflects the average energy consumption level within that window, and the moving variance reflects the degree of energy consumption fluctuation. The root mean square value of vibration data characterizes the energy magnitude of the vibration signal, and the peak factor reflects the presence of impact components in the signal. The moving distance of position data is the sum of the changes in position coordinates within the window, and the average velocity is the moving distance divided by the window duration. The heating rate of temperature data is the average slope of temperature change within the window, and the temperature difference amplitude is the difference between the highest and lowest temperatures within the window.
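The per-window statistics listed above can be sketched as follows. The sketch assumes planar (x, y) position coordinates with the moving distance taken as the sum of Euclidean step lengths, and a uniform sampling interval for temperature; all function names are illustrative.

```python
import numpy as np

def window_starts(n_samples, length, step):
    """Start indices of all complete sliding windows."""
    return range(0, n_samples - length + 1, step)

def energy_features(w):
    """Moving mean and moving variance of energy consumption in one window."""
    w = np.asarray(w, dtype=float)
    return float(w.mean()), float(w.var())

def vibration_features(w):
    """Root mean square value and peak factor (peak / RMS) of a vibration window."""
    w = np.asarray(w, dtype=float)
    rms = float(np.sqrt(np.mean(w ** 2)))
    peak = float(np.max(np.abs(w)))
    return rms, (peak / rms if rms > 0 else 0.0)

def location_features(xy, duration_s):
    """Moving distance (sum of step lengths) and average velocity in one window."""
    xy = np.asarray(xy, dtype=float)
    step_lengths = np.linalg.norm(np.diff(xy, axis=0), axis=1)
    distance = float(step_lengths.sum())
    return distance, distance / duration_s

def temperature_features(w, dt_s):
    """Heating rate (average slope over the window) and temperature difference amplitude."""
    w = np.asarray(w, dtype=float)
    rate = float((w[-1] - w[0]) / ((len(w) - 1) * dt_s))
    return rate, float(w.max() - w.min())
```

For example, a window of position samples (0, 0), (3, 4), (6, 8) lasting 2 seconds gives a moving distance of 10 and an average velocity of 5 in the corresponding units.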

[0110] Normalization maps process cost data, resource utilization data, and unit energy consumption revenue data to a unified scale space. For example, min-max normalization can be used to linearly scale the data to the zero-to-one range, or Z-score standardization can be used to convert the data into a standard normal distribution with a mean of zero and a standard deviation of one.

[0111] The sliding window parameters used for value stream feature extraction are consistent with those of the multimodal data to ensure window-by-window alignment on the time axis. Within each sliding window, the moving mean of process cost data reflects the average cost level within that window, while the moving standard deviation reflects the fluctuation range of costs. The sliding slope of resource utilization data is obtained by linearly fitting the data points within the window, reflecting the trend of utilization changes. The sliding coefficient of variation of unit energy consumption revenue data is obtained by dividing the moving standard deviation by the moving mean, reflecting the relative volatility of revenue.
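The normalization options and the value stream window features described above can be sketched as follows. The function names are illustrative. One point worth noting as an assumption of this sketch: the coefficient of variation divides by the window mean, so it is better behaved on min-max scaled data, which is non-negative, than on Z-score data, whose mean is close to zero.

```python
import numpy as np

def min_max(x):
    """Linearly scale a sequence to the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def z_score(x):
    """Standardize a sequence to zero mean and unit standard deviation."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x)

def sliding_slope(w):
    """Slope of a least-squares straight line fitted to the window samples."""
    w = np.asarray(w, dtype=float)
    return float(np.polyfit(np.arange(len(w)), w, 1)[0])

def coefficient_of_variation(w):
    """Moving standard deviation divided by the moving mean of the window."""
    w = np.asarray(w, dtype=float)
    mean = w.mean()
    return float(w.std() / mean) if mean != 0 else 0.0

# Hypothetical resource utilization rates over one window.
utilization = min_max([0.62, 0.66, 0.70, 0.74])
trend = sliding_slope(utilization)
```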

[0112] Vector concatenation links the multimodal feature vector and the value stream feature vector end-to-end along the feature dimension to form a higher-dimensional joint feature vector. This vector contains both physical and economic state information within the time window. Dimension reduction is used to remove redundant information and reduce the computational burden of subsequent models. Principal component analysis can be used to extract the principal components through linear transformation, or an autoencoder can be used to learn a low-dimensional representation through nonlinear mapping. The resulting carbon flow-value flow coupled data sequence corresponds one-to-one with the original time window.
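A minimal sketch of the concatenation and the principal component analysis option is given below, using a singular value decomposition of the centered joint feature matrix. The feature counts (8 sensor features, 6 value stream features) follow from the two statistics per data type listed above; the number of retained components is an illustrative assumption.

```python
import numpy as np

def fuse_features(sensor_features, value_features, n_components):
    """Concatenate per-window feature vectors and project onto principal components.

    Both inputs have one row per time window, so the output keeps the
    one-to-one correspondence between rows and windows.
    """
    joint = np.hstack([sensor_features, value_features])
    centered = joint - joint.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
sensor = rng.normal(size=(20, 8))   # 20 windows, 8 multimodal features
value = rng.normal(size=(20, 6))    # 20 windows, 6 value stream features
coupled = fuse_features(sensor, value, n_components=3)
```

If the joint features have very different scales, standardizing each column before the decomposition may be preferable; that choice is left open here.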

[0113] This embodiment improves the quality of the multimodal sensor data through outlier removal and missing value imputation, and resolves the time asynchrony of multi-source data by using the dynamic time warping algorithm. Sliding window feature extraction yields statistical features with clear physical or economic meaning from the physical domain and the economic domain respectively, and vector concatenation followed by dimensionality reduction fuses the two types of features. The multimodal sensor data and value stream data are thereby transformed into a time-synchronized, structurally unified, and information-condensed carbon flow-value flow coupled data sequence, providing high-quality structured input for the subsequent establishment of a dynamic relationship model between carbon flow and value flow.

[0114] Please refer to Figure 2. In some embodiments, establishing a carbon flow-value flow coupled differential equation based on the carbon flow-value flow coupled data sequence includes:

[0115] S201. Extract the carbon flow intensity sequence and value flow intensity sequence of each supply chain node from the carbon flow-value flow coupled data sequence. The carbon flow intensity sequence represents the carbon emissions of the node per unit time, and the value flow intensity sequence represents the economic output value of the node per unit time.

[0116] S202. Perform first-order difference operations on the carbon flow intensity sequence and the value flow intensity sequence respectively to generate the carbon flow intensity change rate sequence and the value flow intensity change rate sequence.

[0117] S203. Construct a carbon flow-value flow coupled differential equation. The form of the carbon flow-value flow coupled differential equation is as follows:

[0118] The rate of change of value flow intensity equals the product of the value conversion efficiency factor and the rate of change of carbon flow intensity, plus the sum of the products of each external variable and its corresponding dynamic coupling coefficient, plus the product of the external control factor and the value flow intensity.

[0119] Here, the value conversion efficiency factor is calculated from the overall equipment efficiency and the process yield rate; the external variables include at least the external carbon price and the energy price; the dynamic coupling coefficient is calculated in real time by sliding cross-correlation analysis between the carbon flow intensity sequence and the external variable sequence; and the external control factor is obtained by fitting historical data.
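For readability, the verbal form of the equation in step S203 can be written symbolically as below. The symbols are introduced here for illustration only and do not appear in the original text: C(t) is the carbon flow intensity, V(t) the value flow intensity, η the value conversion efficiency factor, X_k(t) the k-th external variable (for example, external carbon price and energy price), β_k(t) its dynamic coupling coefficient, and γ the external control factor. The expression of η as a product follows the description of step S203 given in this embodiment.

```latex
\frac{dV(t)}{dt}
  = \eta(t)\,\frac{dC(t)}{dt}
  + \sum_{k} \beta_k(t)\,X_k(t)
  + \gamma\,V(t),
\qquad
\eta(t) = \mathrm{OEE}(t)\times Y(t)
```

Here OEE denotes the overall equipment efficiency and Y the process yield rate. In discrete time, the derivatives are replaced by the first-order differences produced in step S202.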

[0120] S204. Substitute the carbon flow intensity change rate sequence, value flow intensity change rate sequence, external variable sequence, and the calculated value conversion efficiency factor, dynamic coupling coefficient, and external control factor into the carbon flow-value flow coupled differential equation, perform parameter identification and residual verification on the equation, and generate the calibrated carbon flow-value flow coupled differential equation.

[0121] In step S201, the carbon flow intensity sequence can be obtained by multiplying the energy consumption data within each time window of the carbon flow-value flow coupled data sequence by the carbon emission factor corresponding to the energy type, and then dividing by the duration of the time window to obtain the carbon emissions per unit time. The carbon emission factor can be obtained from national or industry-issued carbon emission accounting guidelines based on the energy type (e.g., electricity, natural gas, coal). The value flow intensity sequence can be obtained by comprehensively converting the process cost data, resource utilization rate data, and unit energy consumption revenue data within each time window of the coupled data sequence to obtain the economic output value, and then dividing by the duration of the time window to obtain the economic output value per unit time. The process cost data reflects direct input, the resource utilization rate data reflects capacity utilization efficiency, and the unit energy consumption revenue data links energy consumption with economic output.

[0122] In step S202, the first-order difference operation generates a carbon flow intensity change rate sequence by calculating the difference between the carbon flow intensity value at the current timestamp and the carbon flow intensity value at the previous timestamp. Similarly, the same operation is performed on the value flow intensity sequence to generate a value flow intensity change rate sequence. The difference operation transforms the original sequence of horizontal quantities into the change quantity between adjacent time points, enabling the subsequent differential equation to describe the instantaneous change relationship between carbon flow intensity and value flow intensity.
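The carbon flow intensity calculation of step S201 and the first-order difference of step S202 can be sketched as follows. The emission factor of 0.5 kg CO2 per kWh is a hypothetical placeholder rather than a published value, and the window duration of one hour is likewise an illustrative assumption.

```python
import numpy as np

def carbon_flow_intensity(energy_kwh, factor_kg_per_kwh, window_h):
    """Carbon emissions per unit time: energy x emission factor / window duration."""
    return np.asarray(energy_kwh, dtype=float) * factor_kg_per_kwh / window_h

def change_rate(intensity):
    """First-order difference: value at the current timestamp minus the previous one."""
    return np.diff(np.asarray(intensity, dtype=float))

# Hypothetical energy consumption per one-hour window, in kWh.
intensity = carbon_flow_intensity([100.0, 120.0, 90.0], factor_kg_per_kwh=0.5, window_h=1.0)
rate = change_rate(intensity)
```

The difference sequence is one element shorter than the intensity sequence, so the other sequences entering the equation need to be trimmed to the same timestamps.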

[0123] In step S203, the overall equipment efficiency is the product of three indicators: equipment availability, performance efficiency, and quality rate. The process yield rate reflects the proportion of qualified products in the total output during the production process. The product of these two indicators represents the efficiency with which carbon inputs are transformed into effective economic output after equipment operation and the production process. Among the external variables, the external carbon price is obtained from real-time market data in the carbon trading market, and the energy price is obtained from electricity market or fuel market quotations.

[0124] The sliding cross-correlation analysis between the carbon flow intensity sequence and the external variable sequence calculates the correlation coefficient between the two sequences at different time delays within a fixed-length sliding window. The delay and coefficient value corresponding to the maximum absolute value of the correlation coefficient are taken as the dynamic coupling coefficient within that time window, so that the coupling coefficient adapts as the market environment and production status change. The external control factor is obtained by regression fitting of historical data on value flow intensity and on the intensity of external policy regulation, and reflects the direct multiplier effect of external regulatory measures, such as carbon tax rate adjustments and carbon emission quota allocation, on the value flow intensity.
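The sliding cross-correlation analysis described above can be sketched as follows. The sketch assumes Pearson correlation, a symmetric range of candidate delays, and a sign convention in which a positive lag means the external variable leads the carbon flow intensity; the window length and maximum lag are illustrative.

```python
import numpy as np

def lagged_correlation(a, b, lag):
    """Pearson correlation between a[t] and b[t - lag]."""
    if lag > 0:
        a, b = a[lag:], b[:-lag]
    elif lag < 0:
        a, b = a[:lag], b[-lag:]
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])

def dynamic_coupling(carbon, external, window, max_lag):
    """For each window, return (coefficient, lag) with the largest absolute correlation."""
    carbon = np.asarray(carbon, dtype=float)
    external = np.asarray(external, dtype=float)
    results = []
    for end in range(window, len(carbon) + 1):
        c = carbon[end - window:end]
        x = external[end - window:end]
        candidates = [(lagged_correlation(c, x, k), k) for k in range(-max_lag, max_lag + 1)]
        results.append(max(candidates, key=lambda pair: abs(pair[0])))
    return results

# Synthetic check: the carbon sequence follows the price sequence with a delay of 2 samples.
rng = np.random.default_rng(1)
price = rng.normal(size=80)
carbon = np.roll(price, 2)
coupling = dynamic_coupling(carbon, price, window=30, max_lag=4)
```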

[0125] In step S204, optimization algorithms such as least squares or gradient descent are used to identify the parameters of the carbon flow-value flow coupled differential equation, solving for the parameter combination that minimizes the sum of squared errors between the predicted and observed values. Subsequently, residual verification is used to evaluate model quality, calculating the root mean square error and mean absolute percentage error of the model's predicted residuals, and examining the autocorrelation of the residual sequence to determine whether the model adequately captures the dynamic patterns in the data. The equation that passes residual verification is the calibrated carbon flow-value flow coupled differential equation, which can be used to describe the real-time interaction between the carbon flow intensity and value flow intensity at the current supply chain node.
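The least squares option and the residual checks named above can be sketched as follows. For illustration, the sketch treats all coefficients of the discrete-time equation as free parameters of a linear regression; in the described method, the separately calculated value conversion efficiency factor and dynamic coupling coefficients could instead be held fixed or used as starting values. The function names and the synthetic data are assumptions.

```python
import numpy as np

def identify_parameters(dv, dc, external, v):
    """Least-squares fit of dv = eta * dc + external @ beta + gamma * v."""
    design = np.column_stack([dc, external, v])
    params, *_ = np.linalg.lstsq(design, dv, rcond=None)
    residuals = dv - design @ params
    return params, residuals

def residual_report(dv, residuals):
    """Root mean square error, mean absolute percentage error, lag-1 autocorrelation."""
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    nonzero = dv != 0
    mape = float(np.mean(np.abs(residuals[nonzero] / dv[nonzero])) * 100)
    centered = residuals - residuals.mean()
    denom = float(np.sum(centered ** 2))
    acf1 = float(np.sum(centered[1:] * centered[:-1]) / denom) if denom > 0 else 0.0
    return {"rmse": rmse, "mape_pct": mape, "acf_lag1": acf1}

# Synthetic data generated from known coefficients plus small noise.
rng = np.random.default_rng(2)
n = 200
dc = rng.normal(size=n)
external = rng.normal(size=(n, 2))      # e.g. carbon price and energy price
v = rng.normal(size=n)
true_params = np.array([0.8, 0.3, -0.2, 0.05])
dv = np.column_stack([dc, external, v]) @ true_params + rng.normal(scale=0.01, size=n)

params, residuals = identify_parameters(dv, dc, external, v)
report = residual_report(dv, residuals)
```

A lag-1 autocorrelation close to zero suggests that the residuals carry little remaining structure; a formal test such as Ljung-Box could be used instead, which this sketch does not implement.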

[0126] This embodiment constructs a carbon flow-value flow coupled differential equation, quantifying the interaction between carbon flow intensity and value flow intensity as a mathematical expression that includes a value conversion efficiency factor, a dynamic coupling coefficient, and an external control factor. The value conversion efficiency factor links the economic conversion efficiency of carbon reduction behavior to the quality of production and operation. The dynamic coupling coefficient is calculated in real time through sliding cross-correlation analysis, enabling the model to adapt to changes in the external market environment. The external control factor incorporates the impact of policy regulation into the model through historical data fitting. This differential equation describes the dynamic coupling relationship between carbon flow and value flow and provides a model foundation for the subsequent carbon flow path entropy calculation and multi-agent pricing decisions.

[0127] In some embodiments, constructing a carbon flow-value flow correlation matrix includes:

[0128] Based on the calibrated carbon flow-value flow coupled differential equation, each supply chain node is classified into carbon flow type and value type. The carbon flow type includes direct carbon emissions, indirect carbon emissions and implicit carbon emissions, and the value type includes output value, material loss value and energy consumption value.

[0129] Using carbon flow type as the row dimension of the matrix and value type as the column dimension of the matrix, an initial carbon flow-value flow correlation matrix is constructed, with each element in the initial carbon flow-value flow correlation matrix initialized to zero.

[0130] Extract the carbon flow type value and value type value of each supply chain node at each timestamp from the carbon flow-value flow coupled data sequence, and map the carbon flow type value and value type value to the corresponding row and column of the initial carbon flow-value flow correlation matrix according to the timestamp to generate a sparse-filled version of the carbon flow-value flow correlation matrix.

[0131] For each element in the sparsely filled version, a weighted moving average algorithm within a sliding time window is used for smoothing to generate a smoothed filled version of the carbon flow-value flow correlation matrix.

[0132] Based on the dynamic coupling coefficients in the calibrated carbon flow-value flow coupled differential equation, each element in the smooth-filled version is dynamically corrected. The dynamic correction multiplies the element value in the smooth-filled version by the dynamic coupling coefficient between the corresponding carbon flow type and the corresponding value type to generate a dynamically updated version of the carbon flow-value flow correlation matrix.

[0133] In this embodiment, the classification of direct carbon emissions, indirect carbon emissions, and implicit carbon emissions is determined based on the source attribution. Direct carbon emissions originate from the enterprise's own fixed combustion facilities or production processes. Indirect carbon emissions originate from upstream emissions corresponding to purchased electricity or heat. Implicit carbon emissions originate from embedded carbon generated in the upstream links of the supply chain by raw materials or components. Output value, material loss value, and energy consumption value are obtained from the cost accounting module of the enterprise resource planning system. Output value corresponds to the economic contribution of qualified products calculated based on internal transfer prices. Material loss value corresponds to the material cost consumed by waste or scrap materials. Energy consumption value corresponds to the energy cost of electricity, fuel, etc. consumed in the production process. The boundary between carbon flow type and value type can be determined by referring to the relative magnitude of each dynamic coupling coefficient in the calibrated carbon flow-value flow coupling differential equation. Carbon flow types and value type pairs with larger coupling coefficients have a stronger correlation and should be given priority in inclusion in the corresponding classification dimension.

[0134] The initial carbon flow-value flow correlation matrix is structured as a 3×3 matrix with carbon flow type as the row dimension and value type as the column dimension, and all elements are initialized to zero to await subsequent data filling. The row order of the matrix corresponds to the carbon flow type list, and the column order corresponds to the value type list, with the mapping relationship between row and column indices remaining unchanged throughout the processing. The sparse-filled version is obtained by filling the matrix with the carbon flow type values and value type values at each time point in the carbon flow-value flow coupled data sequence, according to their row and column indices. The carbon flow type values are filled into the corresponding rows, and the value type values are filled into the corresponding columns. The element value at the intersection of a row and a column represents the contribution of that carbon flow type to that value type at that moment. Since not all carbon flow types are associated with all value types at every time point, the filled matrix exhibits sparse characteristics, meaning that most elements are zero or close to zero.

[0135] The smooth-filled version uses a weighted moving average algorithm within a sliding time window to smooth each element in the sparse-filled version. The length of the sliding window is set according to the production cycle or data sampling frequency. The weights of each time point within the window are distributed in an exponential decay manner, with recent data assigned higher weights and older data assigned lower weights. This eliminates the impact of short-term random fluctuations on matrix elements and preserves stable correlation patterns. The dynamically updated version multiplies each element value in the smooth-filled version by the dynamic coupling coefficient between the corresponding carbon flow type and the corresponding value type. This dynamic coupling coefficient is extracted from the calibrated carbon flow-value flow coupling differential equation and reflects the sensitivity of the current carbon flow type change to the corresponding value type change. The dynamically corrected matrix elements incorporate real-time dynamic coupling information, enabling the correlation matrix to adaptively update with changes in the market environment and production status.
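The smoothing and dynamic correction described above can be illustrated with a minimal sketch. It assumes the sliding window is a list of sparse-filled matrices ordered from oldest to newest, that the exponential decay is controlled by a hypothetical parameter `decay`, and that the dynamic coupling coefficients are supplied as a matrix of the same shape. The function names are illustrative and do not come from the specification.

```python
def ewma_smooth(history, decay=0.5):
    """Exponentially weighted average of a window of matrices (oldest first)."""
    n = len(history)
    # The newest matrix gets weight 1, the one before it `decay`, then decay**2, ...
    weights = [decay ** (n - 1 - k) for k in range(n)]
    total = sum(weights)
    rows, cols = len(history[0]), len(history[0][0])
    return [
        [sum(w * m[i][j] for w, m in zip(weights, history)) / total
         for j in range(cols)]
        for i in range(rows)
    ]


def apply_coupling(smoothed, coupling):
    """Multiply each smoothed element by its dynamic coupling coefficient."""
    return [[s * c for s, c in zip(s_row, c_row)]
            for s_row, c_row in zip(smoothed, coupling)]


# Hypothetical two-step window: rows are carbon flow types, columns are value types.
window = [
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[3.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
]
coupling = [[2.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
smoothed = ewma_smooth(window, decay=0.5)
updated = apply_coupling(smoothed, coupling)
```

Normalizing by the sum of the weights keeps the smoothed element on the same scale as the raw data, so the coupling coefficient is the only factor that rescales it.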

[0136] This embodiment provides a structural framework for constructing the initial carbon-value flow correlation matrix by classifying carbon flow types and value types. Sparse padding maps time-series data to the matrix space, weighted moving average smoothing eliminates short-term noise, and dynamic coupling coefficient correction introduces real-time dynamic information. This carbon-value flow correlation matrix presents the complex mapping relationship between carbon flow and value flow in a structured manner, providing a quantitative basis for subsequent calculation of carbon flow path entropy. Simultaneously, its dynamic update mechanism ensures that the carbon-value flow correlation matrix can be continuously corrected as the supply chain operating status changes.

[0137] Please see Figure 3. In some embodiments, the carbon flow path entropy of each node in the supply chain is calculated based on the carbon flow-value flow correlation matrix, including:

[0138] S301. Extract the set of carbon flow transmission paths for each supply chain node from the carbon flow-value flow correlation matrix. Each carbon flow transmission path in the set of carbon flow transmission paths is formed by connecting the starting node, intermediate node and ending node in chronological order, and corresponds to a carbon flow transmission amount value.

[0139] S302. For each carbon flow transmission path, calculate the proportion of its carbon flow transmission volume to the total carbon flow transmission volume of the starting node, and use it as the carbon flow proportion of the carbon flow transmission path.

[0140] S303. Substitute the carbon flow proportion into the Shannon entropy calculation formula to generate the path entropy component of each carbon flow transmission path. Under the Shannon entropy calculation formula, the path entropy component equals the negative of the product of the carbon flow proportion and the logarithm of the carbon flow proportion.

[0141] S304. The path entropy components of all carbon flow transmission paths of the same supply chain node are summed to generate the carbon flow path entropy of that node. The carbon flow path entropy is used to quantify the uncertainty and dispersion of the carbon flow transmission path of that node.

[0142] S305. Repeat the above steps for all nodes in the supply chain to generate a carbon flow path entropy sequence for each node, and bind the carbon flow path entropy sequence with the carbon flow-value flow association matrix to generate a carbon flow path entropy sequence with a time tag.

[0143] In step S301, the set of carbon flow transmission paths can be obtained by treating the carbon flow-value flow association matrix as an adjacency matrix of a weighted directed graph and traversing all possible node sequences using a depth-first search algorithm. The starting node is the source node of the carbon flow output, the intermediate nodes are the transfer or processing nodes through which the carbon flow passes, and the terminating node is the node where the carbon flow finally converges or is consumed. Each path corresponds to a carbon flow transmission value, which is obtained by multiplying or summing the elements of the association matrix of each segment along the path. The specific calculation method is selected based on the physical model of carbon flow transmission, choosing either a product model or a summation model. During the path search process, cyclic paths containing duplicate nodes must be excluded, retaining only simple paths without loops.
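The path search in step S301 can be sketched as follows. This is only an illustration under stated assumptions: the adjacency matrix `adj` is a hypothetical square matrix indexed by node, a positive entry means a directed edge, and the product model is used for the transmission amount. The function name `simple_paths` is not from the specification.

```python
def simple_paths(adj, start, end):
    """Enumerate loop-free paths from start to end with product-model amounts."""
    results = []

    def dfs(node, path, amount):
        if node == end:
            results.append((list(path), amount))
            return
        for nxt, weight in enumerate(adj[node]):
            # Follow only existing edges and skip nodes already on the path,
            # which excludes cyclic paths.
            if weight > 0 and nxt not in path:
                path.append(nxt)
                dfs(nxt, path, amount * weight)
                path.pop()

    dfs(start, [start], 1.0)
    return results


# Hypothetical three-node graph: node 0 feeds node 2 directly and via node 1.
adj = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
paths = simple_paths(adj, 0, 2)
```

For a summation model, the running amount would start at 0 and `amount * weight` would become `amount + weight`; the traversal itself is unchanged.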

[0144] In step S302, the total carbon flow transmission amount of the starting node is obtained by summing the carbon flow transmission amounts of all output paths of that node. The carbon flow proportion is the ratio of the carbon flow transmission amount of each path to this total. The carbon flow proportion normalizes carbon flow transmission amounts of different orders of magnitude to the range of 0 to 1, so that the subsequent entropy calculation is independent of node size and reflects only the relative uniformity of the carbon flow distribution.

[0145] In step S303, the logarithm base in the Shannon entropy calculation formula may be either the natural constant e or 2. The choice of base changes only the absolute scale of the entropy value and does not affect the relative ordering. With base e, the path entropy component is obtained by multiplying the carbon flow proportion by its natural logarithm and then taking the negative. When the carbon flow proportion approaches zero, the path entropy component approaches zero. When the carbon flow proportions of a node's paths are evenly distributed, the sum of the path entropy components reaches its maximum.

[0146] In step S304, a larger carbon flow path entropy indicates that the carbon flow transmission path at that node is more dispersed and uncertain, while a smaller value indicates that the carbon flow is concentrated in a few fixed paths and has higher certainty. Carbon flow path entropy quantifies the complexity of the carbon flow path structure from an information theory perspective.
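Steps S302 to S304 can be illustrated with a short sketch, assuming the transmission amounts of all paths leaving one node are already available as a list. The function name `path_entropy` and the handling of an empty or zero total are assumptions made for the example.

```python
import math


def path_entropy(amounts, base=math.e):
    """Shannon entropy of the carbon flow proportions of one node's paths."""
    total = sum(amounts)
    if total <= 0:
        return 0.0  # assumption: a node with no outgoing carbon flow has zero entropy
    entropy = 0.0
    for amount in amounts:
        p = amount / total  # carbon flow proportion of this path
        if p > 0:  # a zero proportion contributes nothing (limit of -p*log p)
            entropy -= p * math.log(p, base)
    return entropy


concentrated = path_entropy([10.0])                    # all flow on one path
dispersed = path_entropy([1.0, 1.0, 1.0, 1.0], base=2)  # four equal paths
```

A node whose carbon flow follows a single path has zero entropy, while four equally loaded paths give two bits when base 2 is used.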

[0147] In step S305, the timestamp binding between the carbon flow path entropy sequence and the carbon flow-value flow association matrix is achieved by assigning a timestamp identifier corresponding to a time window to each carbon flow path entropy value. This timestamp identifier is consistent with the timestamp of the corresponding time window in the carbon flow-value flow association matrix, thereby forming a carbon flow path entropy sequence with timestamps. Each element in the sequence contains a timestamp, a carbon flow path entropy value, and the index information of the carbon flow-value flow association matrix under that time window.

[0148] This embodiment introduces Shannon entropy theory to quantitatively analyze the carbon flow path structure, transforming the path information implicit in the carbon flow-value flow correlation matrix into a measurable uncertainty index. The set of carbon flow transmission paths reflects the actual flow trajectory of carbon in the network. Normalization of carbon flow proportions eliminates differences in node size. The application of the Shannon entropy formula allows for a numerical expression of the dispersion of the path structure, while timestamp binding provides a time-aligned data foundation for subsequent time-series analysis and dynamic pricing. This carbon flow path entropy index provides a quantitative input on the carbon flow structural characteristics for multi-agent reinforcement learning models, enabling pricing strategies to perceive and respond to changes in the complexity and uncertainty of carbon flow paths.

[0149] In some embodiments, binding the carbon flow path entropy sequence to the carbon flow-value flow association matrix by timestamp to generate a time-stamped carbon flow path entropy sequence includes:

[0150] Each carbon flow path entropy value in the carbon flow path entropy sequence is assigned a corresponding timestamp identifier. The timestamp identifier is consistent with the timestamp of the carbon flow type value and the value type value of the corresponding node in the carbon flow-value flow association matrix, thus forming the alignment of the carbon flow path entropy and the carbon flow-value flow association matrix in the time dimension.

[0151] The sliding window cross-validation method is used to calculate the Pearson correlation coefficient and Spearman rank correlation coefficient between the carbon flow path entropy value and the value of each element in the carbon flow-value flow association matrix in each sliding window, thereby generating a sequence of correlation coefficients between the carbon flow path entropy and the carbon flow-value flow association matrix.

[0152] Carbon flow path entropy values whose correlation coefficient in the corresponding time window is lower than the preset correlation threshold are marked as outliers. Each outlier is replaced with the weighted average of the carbon flow path entropy values in the adjacent valid windows before and after that time window, thus generating a corrected carbon flow path entropy sequence.

[0153] The corrected carbon flow path entropy sequence is vector-concatenated with the carbon flow type value and value type value corresponding to the timestamp in the carbon flow-value flow association matrix to generate a time-tagged carbon flow path entropy sequence. Each element in the time-tagged carbon flow path entropy sequence includes a timestamp, a carbon flow path entropy value, a direct carbon emission value, an indirect carbon emission value, an implicit carbon emission value, an output value, a material loss value, and an energy consumption value.

[0154] In this embodiment, the timestamp identifier is directly inherited from the timestamp associated with the carbon flow type value and the value type value of the corresponding node in the carbon flow-value flow association matrix. It is usually the end time of the sliding window or the Unix timestamp recorded by the sensor system when the data is collected, thereby ensuring that the carbon flow path entropy and the carbon flow-value flow association matrix are accurately aligned in the time dimension.

[0155] The sliding window cross-validation method uses the same sliding window length as that used in generating the carbon flow-value flow coupled data sequence. Within each window, the Pearson correlation coefficient and Spearman rank correlation coefficient are calculated between the carbon flow path entropy value and the values of each element in the carbon flow-value flow association matrix. The Pearson correlation coefficient measures the degree of linear association, while the Spearman rank correlation coefficient measures the degree of monotonic association. Combining the two can comprehensively assess the consistency level between the carbon flow path entropy value and the carbon flow-value flow association matrix. The preset correlation threshold can be determined based on the statistical distribution of the absolute values of the correlation coefficients within all sliding windows in historical data. For example, the median of the absolute values of the correlation coefficients of all windows can be used. Windows with correlation coefficients below the preset threshold are considered to have an abnormal correlation between the carbon flow path entropy value and the carbon flow-value flow association matrix. Outlier replacement uses the weighted average of the carbon flow path entropy values within adjacent valid windows. The weights are distributed according to the reciprocal of the time interval from the outlier window, with closer valid windows assigned higher weights.
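The correlation check and the outlier replacement can be sketched in plain Python. The sketch assumes that each window has already been flagged valid or invalid by comparing its correlation coefficients with the preset threshold; how the Pearson and Spearman values are combined into one flag is not specified here and is left to the caller. All function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def ranks(values):
    """1-based ranks in which tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def replace_outliers(entropy, valid):
    """Replace invalid windows by an inverse-distance weighted average of the
    nearest valid window before and the nearest valid window after."""
    out = list(entropy)
    for i, ok in enumerate(valid):
        if ok:
            continue
        prev = next((j for j in range(i - 1, -1, -1) if valid[j]), None)
        nxt = next((j for j in range(i + 1, len(valid)) if valid[j]), None)
        neighbours = [j for j in (prev, nxt) if j is not None]
        if not neighbours:
            continue  # no valid window available; value is left unchanged
        weights = [1.0 / abs(i - j) for j in neighbours]
        out[i] = sum(w * entropy[j] for w, j in zip(weights, neighbours)) / sum(weights)
    return out


corrected = replace_outliers([1.0, 9.0, 9.0, 4.0], [True, False, False, True])
```

In the usage line, the two flagged windows are replaced by values that lean toward whichever valid neighbour is closer in time.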

[0156] Vector concatenation combines the carbon flow path entropy value at each timestamp in the corrected carbon flow path entropy sequence with the carbon flow type value and value type value at the corresponding timestamp in the carbon flow-value flow association matrix in a fixed order to form a composite vector.

[0157] This embodiment achieves time alignment between carbon flow path entropy and carbon flow-value flow association matrix through timestamp identification allocation. The sliding window cross-validation method, combined with Pearson correlation coefficient and Spearman rank correlation coefficient, comprehensively evaluates the consistency between carbon flow path entropy and carbon flow-value flow association matrix. The outlier labeling and replacement mechanism eliminates unreliable data caused by abnormal data collection or calculation errors. Vector concatenation integrates the corrected carbon flow path entropy with the carbon flow type values and value type values in the carbon flow-value flow association matrix into a unified time-labeled carbon flow path entropy sequence, providing a comprehensive input feature for subsequent multi-agent reinforcement learning models that simultaneously contains carbon flow path structure information and carbon flow-value flow mapping information.

[0158] Please see Figure 4. In some embodiments, carbon flow path entropy, carbon flow value, and external carbon price are input into a multi-agent reinforcement learning model. This model employs the MADDPG algorithm to output a multi-level carbon pricing strategy, including:

[0159] S401. Define each node enterprise in the supply chain as an agent in a multi-agent reinforcement learning model, and construct an independent state space, action space and reward function for each agent. The state space shall at least include the carbon flow path entropy, carbon flow value (which can be calculated by combining the carbon flow-value flow correlation matrix and the external carbon price), external carbon price, inventory level and order backlog of the corresponding node of the agent. The action space shall at least include the carbon tax adjustment range within the enterprise, the carbon trading price adjustment range between enterprises and the carbon price adjustment range of the final product. The reward function is the sum of negative total carbon emissions, negative total cost and negative carbon flow path entropy penalty term.

[0160] S402. Construct a policy network and a value network for each agent. The policy network is used to output actions based on the current state, and the value network is used to evaluate the expected cumulative reward of performing actions in the current state. Both the policy network and the value network adopt a fully connected neural network structure and include batch normalization layers and residual connections.

[0161] S403. A centralized training and distributed execution framework is used to train the policy network and the value network. Centralized training means that during the training phase, the value network of each agent is provided with the state information and action information of all agents. Distributed execution means that during the execution phase, the policy network of each agent only relies on its own state information to output actions.

[0162] S404. During training, historical experience data is randomly sampled from the experience replay buffer. The historical experience data includes at least the current state, current action, immediate reward, next state and training termination flag.

[0163] S405. Input historical experience data into the value network to calculate the target value, and update the value network parameters by minimizing the mean square error between the value network output and the target value.

[0164] S406. Update the policy network parameters by maximizing the expected cumulative reward of the value network output.

[0165] S407. After training, the carbon flow path entropy, carbon flow value and external carbon price of each node are input into the policy network of each agent in real time. The policy network of each agent outputs the corresponding enterprise carbon tax value, inter-enterprise carbon trading quotation value and final product carbon price value, and combines them to generate a multi-level carbon pricing strategy.

[0166] In step S401, the inventory level is obtained from the inventory management module of the enterprise resource planning system, reflecting the current inventory levels of raw materials, work-in-process, and finished goods at that node; the order backlog is obtained from the sales order management system, reflecting the total number of orders received but not yet delivered. Because the dimensions of the state space differ in units and scale, each dimension needs to be normalized. For example, carbon flow path entropy, carbon flow value, external carbon price, inventory level, and order backlog are each mapped to a unified scale using min-max normalization or Z-score standardization. Reasonable upper and lower bound constraints are set for the adjustment ranges of the intra-enterprise carbon tax, the inter-enterprise carbon trading price, and the final product carbon price in the action space to prevent the policy network output from exceeding the actual executable range. The weight coefficients of each component in the reward function can be initially set based on expert experience or historical data, for example, by determining the relative importance of each objective through the analytic hierarchy process (AHP).

[0167] In step S402, the input layer dimension of the policy network and the value network is determined by the state space dimension, and the output layer dimension is determined by the action space dimension. The number of hidden layers can be set to two or three, and the number of neurons in each layer is set according to the complexity of the state space and action space. The batch normalization layer is placed before the activation function of each hidden layer to standardize the input of each layer to accelerate training convergence. The residual connection directly adds the input of each hidden layer to the output of that layer, so that the gradient can be directly backpropagated to the shallow network.

[0168] In step S403, the centralized training phase provides each agent's value network with a concatenated vector of state and action information of all agents as input, enabling the value network to learn collaborative strategies from a global perspective; in the distributed execution phase, each agent's policy network outputs actions based solely on the local state information of its own node, which conforms to the limitation in actual deployment that nodes cannot obtain global information.

[0169] In step S404, the experience replay buffer stores historical experience data with a fixed capacity. When the buffer is full, the oldest data is replaced according to a first-in-first-out strategy. During sampling, a batch of historical experience data is extracted from the buffer by random sampling to update the network parameters and break the temporal correlation of the data.
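A fixed-capacity buffer with first-in-first-out replacement and random sampling can be sketched with the Python standard library. The class name `ReplayBuffer` and its method names are assumptions for illustration; the stored tuple follows the fields listed in step S404.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity experience store with FIFO replacement."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry when full.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks temporal correlation.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.add(state=t, action=0.0, reward=0.0, next_state=t + 1, done=False)
batch = buf.sample(2)
```

After five insertions into a buffer of capacity three, only the three most recent experiences remain available for sampling.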

[0170] In step S405, the target value is the sum of the immediate reward and the expected cumulative reward in the next state, where the latter is calculated by the target value network. The parameters of the target value network are gradually moved toward those of the current value network through soft updates. The value network parameters are updated by minimizing the mean square error between the value network output and the target value, so that the value network's estimate gradually approaches the actual cumulative reward.

[0171] In step S406, the policy network parameters can be updated by the gradient ascent method. The gradient of the value network output with respect to the policy network parameters is calculated through the action output by the policy network, and the policy network parameters are adjusted along the gradient direction to increase the expected cumulative reward output by the value network.

[0172] In step S407, the policy network of each agent outputs in real time the adjustment range of the enterprise's carbon tax, the adjustment range of the inter-enterprise carbon trading price, and the adjustment range of the final product carbon price. These are then added to the current pricing value and subjected to boundary constraint checks to generate a multi-level carbon pricing strategy.
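The final step of S407, adding each adjustment to the current price and checking it against its bounds, can be sketched as below. The dictionary keys, the numeric values, and the function name `apply_adjustments` are hypothetical and serve only to make the example concrete.

```python
def apply_adjustments(current, adjustments, bounds):
    """Add each adjustment to the current price and clip to [low, high]."""
    new_prices = {}
    for level, price in current.items():
        low, high = bounds[level]
        new_prices[level] = min(max(price + adjustments[level], low), high)
    return new_prices


# Hypothetical values for the three pricing levels.
current = {"internal_tax": 50.0, "inter_enterprise_price": 60.0, "product_price": 5.0}
adjustments = {"internal_tax": 5.0, "inter_enterprise_price": 100.0, "product_price": -10.0}
bounds = {
    "internal_tax": (0.0, 100.0),
    "inter_enterprise_price": (0.0, 120.0),
    "product_price": (0.0, 20.0),
}
strategy = apply_adjustments(current, adjustments, bounds)
```

Clipping after the addition means an oversized adjustment is truncated at the boundary rather than rejected, which is one possible reading of the boundary constraint check.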

[0173] This embodiment models the supply chain carbon pricing problem as a multi-agent collaborative decision-making problem using the MADDPG algorithm. The state space of each agent integrates carbon flow path structure information and operational status information, and the action space covers the complete pricing hierarchy from within the enterprise to between enterprises and to the final product. The centralized training and distributed execution framework takes into account the needs of global collaborative optimization and local autonomous decision-making, and realizes the adaptive generation of multi-level carbon pricing strategies.

[0174] In some embodiments, the reward function is the sum of negative total carbon emissions, negative total cost, and a negative carbon flow path entropy penalty term, including:

[0175] The negative total carbon emissions are defined as the negative of the difference between the total carbon emissions of the corresponding node of the agent in the current time step and the preset carbon emission benchmark value. The total carbon emissions are obtained by summing the direct carbon emission values, indirect carbon emission values and implicit carbon emission values in the carbon flow-value flow correlation matrix. The preset carbon emission benchmark value is dynamically updated according to the moving average of the historical carbon emission data of the node.

[0176] The negative total cost is defined as the negative of the sum of the operating cost and carbon trading cost of the corresponding node of the agent in the current time step. The operating cost includes at least the energy consumption cost and the material loss cost. The carbon trading cost is calculated based on the product of the carbon trading volume of the node in the current time step and the external carbon price.

[0177] The negative carbon path entropy penalty term is defined as the negative of the deviation between the carbon path entropy of the corresponding node of the agent in the current time step and the preset carbon path entropy threshold. The preset carbon path entropy threshold is dynamically calculated based on the historical moving average and moving standard deviation of the carbon path entropy of all nodes in the supply chain. When the carbon path entropy exceeds the preset carbon path entropy threshold, the absolute value of the negative carbon path entropy penalty term increases, so as to guide the agent to give priority to the pricing strategy with lower carbon path uncertainty.

[0178] The agent's immediate reward is generated by weighted summing of negative total carbon emissions, negative total cost, and negative carbon flow path entropy penalty. The weight coefficients of the weighted sum are adaptively adjusted during training using a Bayesian optimization algorithm.

[0179] In this embodiment, the preset carbon emission benchmark value is dynamically updated based on the moving average of the historical carbon emission data of that node. The sliding window length is consistent with the sliding window length used when generating the carbon flow-value flow coupled data sequence, so that the benchmark value can reflect the recent trend of carbon emission level changes. The negative total carbon emission amount is calculated by taking the negative of the difference between the total carbon emission amount and the preset carbon emission benchmark value. When the total carbon emission amount is lower than the benchmark value, the reward is positive; when it is higher than the benchmark value, the reward is negative, thereby guiding the agent to continuously reduce the carbon emission level.

[0180] In the negative total cost, operating costs include energy consumption costs and material loss costs, obtained from the energy management module and cost accounting module of the enterprise resource planning system, respectively; carbon trading costs are calculated by multiplying the carbon trading volume by the external carbon price, where the carbon trading volume reflects the net purchase or net sale of carbon emission rights at the current time step. The negative total cost is the negative sum of operating costs and carbon trading costs, resulting in a higher reward for lower total costs.

[0181] In the negative carbon path entropy penalty term, the preset carbon path entropy threshold is dynamically calculated based on the historical moving average and moving standard deviation of the carbon path entropy of all nodes in the supply chain. For example, it can be set as the mean plus one standard deviation. When the carbon path entropy exceeds the preset carbon path entropy threshold, the absolute value of the penalty term increases. The negative carbon path entropy penalty term negates the deviation between the carbon path entropy and the preset carbon path entropy threshold, so that the reward is positive when the carbon path entropy is below the threshold and negative when it is above the threshold.
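The three reward components and their weighted sum can be written out as a short sketch. The parameter names and the default weights of 1.0 are assumptions; in the described method the weights are tuned during training rather than fixed.

```python
def immediate_reward(total_emissions, emission_benchmark,
                     operating_cost, carbon_volume, external_price,
                     entropy, entropy_threshold,
                     weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three negative terms of the reward function."""
    # Positive when emissions are below the benchmark, negative when above.
    emission_term = -(total_emissions - emission_benchmark)
    # Carbon trading cost is traded volume times the external carbon price.
    cost_term = -(operating_cost + carbon_volume * external_price)
    # Positive when entropy is below the threshold, negative when above.
    entropy_term = -(entropy - entropy_threshold)
    w_emission, w_cost, w_entropy = weights
    return w_emission * emission_term + w_cost * cost_term + w_entropy * entropy_term


reward = immediate_reward(
    total_emissions=90.0, emission_benchmark=100.0,
    operating_cost=20.0, carbon_volume=2.0, external_price=5.0,
    entropy=1.0, entropy_threshold=1.5,
)
```

Because the three terms have different units, the weights also act as unit conversions, which is one reason the choice of weights matters for the balance between objectives.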

[0182] During training, the Bayesian optimization algorithm uses the weight coefficients of negative total carbon emissions, negative total cost, and negative carbon flow path entropy penalty as hyperparameters. By constructing a Gaussian process surrogate model to approximate the mapping relationship between the weight coefficients and the cumulative reward, it iteratively searches for the optimal weight combination that maximizes the cumulative reward within a preset search space, thereby achieving adaptive adjustment of the weights of each component of the reward function.

[0183] This embodiment incorporates three objectives—carbon emissions, cost, and carbon flow path entropy—into the reward function and introduces a Bayesian optimization algorithm to adaptively adjust the weight coefficients. This enables the multi-agent reinforcement learning model to dynamically balance the three objectives of carbon emission reduction, economic benefits, and carbon flow structure stability during training, guiding the agents to generate pricing strategies that take into account the interests of all parties.

[0184] In some embodiments, a centralized training and distributed execution framework is used to train the policy network and the value network, including:

[0185] During the centralized training phase, a central value network set is constructed, which contains the same number of central value networks as the number of agents. The input of each central value network is a concatenated vector of the state information of all agents and a concatenated vector of the action information of all agents, and the output is the expected cumulative reward estimate of the corresponding agent.

[0186] In each training iteration of the centralized training phase, a batch of historical experience data is sampled from the experience replay buffer. The historical experience data includes the current state, current action, immediate reward, next state and training termination flag of all agents. The current state and current action are input into the central value network set to calculate the expected cumulative reward estimate in the current state.

[0187] The next state and the next action are input into the central value network set to calculate the target cumulative reward valuation in the next state.

[0188] The parameters of the central value network set are updated by minimizing the mean square error between the expected cumulative reward valuation and the target cumulative reward valuation.

[0189] In each training iteration of the centralized training phase, the current state of each agent is input into the corresponding policy network, the current action is output, and the current action is input into the corresponding central value network to calculate the expected cumulative reward valuation.

[0190] The parameters of the agent's policy network are updated by maximizing the expected cumulative reward estimate using the gradient ascent method.

[0191] During the distributed execution phase, the central value network set is removed from each agent, and each agent retains only its own policy network. Each policy network independently outputs actions based on its own node's carbon flow path entropy, carbon flow value, and external carbon price. These actions include adjustments to the carbon tax within the enterprise, adjustments to the carbon trading price between enterprises, and adjustments to the carbon price of the final product.

[0192] In this embodiment, each central value network in the central value network set adopts a fully connected neural network structure similar to the policy network, but the input layer dimension is expanded to the sum of the state dimensions of all agents plus the sum of the action dimensions of all agents, thereby achieving the fusion of global information. The central value network set contains the same number of central value networks as the number of agents, with each central value network corresponding to one agent and outputting the expected cumulative reward valuation for that agent.

[0193] The batch size of historical experience data sampled from the experience replay buffer can be set to commonly used values such as 32 or 64, so that the sample size of a single iteration is sufficient to support statistically stable gradient calculation. The concatenated vector of the current state and the current action is input into the central value network, which outputs the expected cumulative reward estimate for the current state. The next action is obtained by inputting the next state into the policy network. The concatenated vector of the next state and the next action is then input into the central value network to obtain the cumulative reward estimate for the next state. The target cumulative reward estimate consists of the immediate reward plus this next-state estimate multiplied by a discount factor.

[0194] The parameters of the central value network set are updated by minimizing the mean squared error between the expected cumulative reward estimate and the target cumulative reward estimate. The parameters of the target value network are gradually moved toward those of the current central value network through soft updates to stabilize the training process. The parameters of the policy network are updated using gradient ascent. The gradient of the corresponding central value network output with respect to the policy network parameters is calculated through the action output by the policy network, and the policy network parameters are adjusted along the gradient direction to increase the expected cumulative reward estimate.
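The update rules described above correspond to the commonly published form of the MADDPG algorithm, which can be written as follows. The symbols are assumptions introduced for this illustration: s is the concatenated state of all N agents, o_i and a_i are the local state and action of agent i, Q_i and \mu_i are its central value network and policy network with parameters \theta_i and \phi_i, primes denote target networks, \gamma is the discount factor, d is the training termination flag, and \tau is the soft update rate. The use of target policy networks \mu_j' and of the factor (1 - d) follows the standard formulation and is not stated explicitly in this description.

```latex
% Target cumulative reward estimate for agent i
y_i = r_i + \gamma\,(1 - d)\, Q_i'\bigl(s', a_1', \dots, a_N'\bigr),
\qquad a_j' = \mu_j'(o_j')

% Central value network loss (mean squared error over the sampled batch)
L(\theta_i) = \mathbb{E}\Bigl[\bigl(Q_i(s, a_1, \dots, a_N) - y_i\bigr)^2\Bigr]

% Policy gradient used for gradient ascent
\nabla_{\phi_i} J = \mathbb{E}\Bigl[\nabla_{\phi_i}\mu_i(o_i)\,
  \nabla_{a_i} Q_i(s, a_1, \dots, a_N)\big|_{a_i = \mu_i(o_i)}\Bigr]

% Soft update of the target network parameters, with 0 < \tau \ll 1
\theta_i' \leftarrow \tau\,\theta_i + (1 - \tau)\,\theta_i'
```

With a small \tau, the target network changes slowly, which keeps the regression target y_i from shifting abruptly between iterations.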

[0195] In the distributed execution phase, the central value network set is removed from each agent. Each policy network independently outputs the adjustment range of carbon tax within the enterprise, the adjustment range of carbon trading price between enterprises, and the adjustment range of carbon price of final products based on the carbon flow path entropy, carbon flow value and external carbon price of its own node. Decisions can be made without communication between agents.

[0196] This embodiment achieves global information fusion in the centralized training phase through a central value network set, enabling each agent's value network to learn the policy trends of other agents and thus form a collaborative strategy during the training phase. After the central value network set is removed in the distributed execution phase, each agent makes independent decisions based solely on its own local information, taking into account both the needs of global collaborative optimization and local autonomous execution. This is suitable for actual deployment scenarios where nodes in the supply chain cannot share global information in real time.

[0197] Referring to Figure 5, in a second aspect, this embodiment also provides a real-time pricing system 1 for supply chain carbon footprint based on multimodal data, applicable to the method described in the first aspect. The system includes a multimodal data acquisition module 11, a data fusion module 12, a coupling analysis module 13, a path entropy calculation module 14, a pricing decision module 15, a strategy execution module 16, and a feedback update module 17. The multimodal data acquisition module 11 is used to acquire multimodal sensor data and value stream data from each node of the supply chain. The multimodal sensor data includes at least energy consumption data, vibration data, location data, and temperature data, and the value stream data includes at least process cost data, resource utilization rate data, and unit energy consumption revenue data. The data fusion module 12 is connected to the multimodal data acquisition module 11 and is used to perform timestamp alignment and feature fusion on the multimodal sensor data and value stream data to generate a carbon stream-value stream coupled data sequence. The coupling analysis module 13 is connected to the data fusion module 12 and is used to establish a carbon stream-value stream coupled differential equation based on the carbon stream-value stream coupled data sequence and to construct a carbon stream-value stream correlation matrix. The path entropy calculation module 14 is connected to the coupling analysis module 13 and is used to calculate the carbon stream path entropy of each node in the supply chain based on the carbon stream-value stream correlation matrix. The pricing decision module 15 is connected to the path entropy calculation module 14 and is used to input the carbon stream path entropy, carbon stream value, and external carbon price into a multi-agent reinforcement learning model, which uses the MADDPG algorithm to output a multi-level carbon pricing strategy including at least an intra-enterprise carbon tax, an inter-enterprise carbon trading price, and a final product carbon price. The strategy execution module 16 is connected to the pricing decision module 15 and is used to distribute the multi-level carbon pricing strategy to the supply chain system for execution and to obtain the actual carbon emission data, cost data, and value stream data after execution. The feedback update module 17 is connected to the strategy execution module 16, the coupling analysis module 13, and the pricing decision module 15, and is used to feed back the actual carbon emission data, cost data, and value stream data to the carbon stream-value stream correlation matrix and the multi-agent reinforcement learning model, update the model parameters, and generate an optimized pricing strategy.

[0198] This system acquires raw data from sensor networks and enterprise resource planning systems through a multimodal data acquisition module 11. A data fusion module 12 integrates heterogeneous data into a carbon flow-value flow coupled data sequence. A coupling analysis module 13 establishes carbon flow-value flow coupled differential equations and a carbon flow-value flow correlation matrix. A path entropy calculation module 14 quantifies the uncertainty of carbon flow paths based on the correlation matrix. A pricing decision module 15 generates a multi-level carbon pricing strategy using the MADDPG algorithm. A strategy execution module 16 distributes the pricing strategy and collects actual data after execution. A feedback update module 17 sends the actual data back to the coupling analysis module 13 and the pricing decision module 15 to update the model parameters. These modules are sequentially connected to form a complete link from data acquisition to closed-loop feedback, enabling real-time monitoring of the supply chain carbon footprint and adaptive generation of multi-level carbon pricing strategies.

[0199] By adopting the above technical solutions, this invention differs from existing technologies and possesses the following beneficial effects: It generates a carbon-value stream coupled data sequence through the joint acquisition of multimodal sensor data and value stream data, along with timestamp alignment and feature fusion, laying a data foundation for the dynamic correlation analysis of carbon and value streams; the carbon-value stream coupled differential equation and carbon-value stream correlation matrix established based on this data sequence quantify the real-time interaction relationship between carbon stream intensity and value stream intensity into an analytical mathematical expression, enabling a structured presentation of the coupling degree between carbon and value streams; and the carbon stream path entropy calculated based on the carbon-value stream correlation matrix quantifies the uncertainty and complexity of the carbon flow path from an information theory perspective, introducing a quantitative indicator of carbon flow structural characteristics for pricing decisions. Carbon flow path entropy, carbon flow value, and external carbon price are input into a multi-agent reinforcement learning model using the MADDPG algorithm. Through a collaborative game among the agents, a multi-level carbon pricing strategy is generated, encompassing intra-enterprise carbon taxes, inter-enterprise carbon trading prices, and final product carbon prices. This achieves a complete carbon price transmission system from within enterprises to between enterprises and then to final products. Actual carbon emission data, cost data, and value flow data after execution are fed back to the carbon flow-value flow correlation matrix and the multi-agent reinforcement learning model to update model parameters, forming a continuously iterative adaptive closed loop. 
This invention achieves real-time monitoring of the supply chain carbon footprint and adaptive generation of multi-level carbon pricing strategies, effectively balancing carbon reduction targets and economic benefits.

[0200] Finally, it should be noted that although the above embodiments have been described in the text and drawings of this application, this should not limit the scope of patent protection of this application. Any technical solutions that are based on the essential concept of this application and utilize the content described in the text and drawings of this application, resulting in equivalent structural or procedural substitutions or modifications, as well as the direct or indirect application of the technical solutions of the above embodiments to other related technical fields, are all included within the scope of patent protection of this application.

Claims

1. A real-time pricing method for supply chain carbon footprint based on multimodal data, characterized in that the method includes: Multimodal sensor data and value stream data are acquired from each node of the supply chain. The multimodal sensor data includes at least energy consumption data, vibration data, location data, and temperature data. The value stream data includes at least process cost data, resource utilization rate data, and unit energy consumption revenue data. The multimodal sensor data and value stream data are time-stamp aligned and feature-fused to generate a carbon stream-value stream coupled data sequence; Based on the carbon flow-value flow coupled data sequence, a carbon flow-value flow coupled differential equation is established and a carbon flow-value flow correlation matrix is constructed. The equation is used to describe the real-time interaction between carbon flow intensity and value flow intensity. Based on the carbon flow-value flow correlation matrix, the carbon flow path entropy of each node in the supply chain is calculated. The carbon flow path entropy is used to quantify the uncertainty and complexity of the carbon flow path. The carbon flow path entropy, carbon flow value, and external carbon price are input into a multi-agent reinforcement learning model. The multi-agent reinforcement learning model uses the MADDPG algorithm to output a multi-level carbon pricing strategy. The multi-level carbon pricing strategy includes at least intra-enterprise carbon tax, inter-enterprise carbon trading price, and final product carbon price. The multi-level carbon pricing strategy is deployed to the supply chain system for execution, and the actual carbon emission data, cost data, and value stream data after execution are obtained. 
The actual carbon emission data, cost data, and value stream data are fed back to the carbon stream-value stream correlation matrix and the multi-agent reinforcement learning model to update the model parameters and generate an optimized pricing strategy.

2. The real-time pricing method for supply chain carbon footprint based on multimodal data according to claim 1, characterized in that, The multimodal sensor data and value stream data are time-stamp aligned and feature-fused to generate a carbon stream-value stream coupled data sequence, including: The energy consumption data, vibration data, position data, and temperature data in the multimodal sensor data are subjected to outlier removal and missing value interpolation processing to generate a multimodal cleaned data sequence. For each type of data in the multimodal cleaned data sequence, a dynamic time warping algorithm is used to align the timestamps, so that all data sequences are mapped to a unified time reference axis, generating a multimodal synchronized data sequence; Sliding window feature extraction is performed on the multimodal synchronous data sequence. Within each sliding window, the sliding mean and sliding variance of energy consumption data, the root mean square value and peak factor of vibration data, the moving distance and average speed of location data, and the heating rate and temperature difference amplitude of temperature data are calculated to generate a multimodal feature vector sequence. The process cost data, resource utilization rate data and unit energy consumption revenue data in the value stream data are normalized respectively to generate a standardized value stream data sequence. Sliding window feature extraction is performed on the value stream standardized data sequence. Within each sliding window, the sliding mean and sliding standard deviation of process cost data, the sliding mean and sliding slope of resource utilization rate data, and the sliding mean and sliding coefficient of variation of unit energy consumption revenue data are calculated to generate a value stream feature vector sequence. 
The multimodal feature vector sequence and the value stream feature vector sequence are aligned window by window on the time axis, and the aligned feature vectors are concatenated and dimension reduced to generate a carbon stream-value stream coupled data sequence.
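The sliding-window feature extraction recited in this claim can be illustrated with a short sketch covering the energy consumption features (sliding mean and variance) and the vibration features (root mean square value and peak factor, taken here as peak absolute amplitude divided by RMS). The function names, the window size, the step, and the use of population variance are assumptions; the claim does not fix them.

```python
import math


def sliding_windows(values, size, step=1):
    """Split a synchronized data sequence into fixed-size sliding windows."""
    return [values[i:i + size] for i in range(0, len(values) - size + 1, step)]


def mean_and_variance(window):
    """Sliding mean and (population) variance, e.g. for energy consumption data."""
    mean = sum(window) / len(window)
    variance = sum((v - mean) ** 2 for v in window) / len(window)
    return mean, variance


def rms(window):
    """Root mean square value, e.g. for vibration data."""
    return math.sqrt(sum(v * v for v in window) / len(window))


def peak_factor(window):
    """Peak (crest) factor: peak absolute amplitude divided by the RMS value."""
    r = rms(window)
    return max(abs(v) for v in window) / r if r > 0 else 0.0


# Hypothetical vibration samples on the unified time axis.
vibration = [3.0, -3.0, 3.0, -3.0, 3.0]
features = [(rms(w), peak_factor(w)) for w in sliding_windows(vibration, size=4)]
```

The other per-window features in the claim (moving distance, average speed, heating rate, sliding slope, coefficient of variation) would follow the same pattern of one function applied to each window.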

3. The real-time pricing method for supply chain carbon footprint based on multimodal data according to claim 1, characterized in that, Based on the aforementioned carbon flow-value flow coupled data sequence, a carbon flow-value flow coupled differential equation is established, including: The carbon flow intensity sequence and value flow intensity sequence of each supply chain node are extracted from the carbon flow-value flow coupled data sequence. The carbon flow intensity sequence represents the carbon emissions of the node per unit time, and the value flow intensity sequence represents the economic output value of the node per unit time. Perform first-order difference operations on the carbon flow intensity sequence and the value flow intensity sequence respectively to generate a carbon flow intensity change rate sequence and a value flow intensity change rate sequence; Construct a carbon flow-value flow coupled differential equation, the form of which is: The rate of change of value flow intensity equals the product of the value conversion efficiency factor and the rate of change of carbon flow intensity, plus the sum of the products of each external variable and its corresponding dynamic coupling coefficient, plus the product of the external control factor and the value flow intensity. The value conversion efficiency factor is calculated by the overall equipment efficiency and the yield rate of the process. The external variables include at least the external carbon price and the energy price. The dynamic coupling coefficient is calculated in real time by the sliding cross-correlation analysis between the carbon flow intensity sequence and the external variable sequence. The external control factor is obtained by fitting historical data. 
The carbon flow intensity change rate sequence, value flow intensity change rate sequence, external variable sequence, and the calculated value conversion efficiency factor, dynamic coupling coefficient, and external control factor are substituted into the carbon flow-value flow coupled differential equation. The equation is then subjected to parameter identification and residual verification to generate the calibrated carbon flow-value flow coupled differential equation.
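The verbal form of the coupled differential equation in this claim can be written symbolically as below. The symbols are assumptions introduced for readability: $V$ is the value flow intensity, $C$ the carbon flow intensity, $\eta$ the value conversion efficiency factor, $X_k$ the $k$-th external variable (for example the external carbon price or the energy price), $\beta_k$ its dynamic coupling coefficient, and $\gamma$ the external control factor. The second line is the discrete counterpart obtained with the first-order differences described in the claim.

```latex
\frac{dV(t)}{dt} = \eta(t)\,\frac{dC(t)}{dt} + \sum_{k} \beta_k(t)\,X_k(t) + \gamma\,V(t)

\Delta V_t = \eta_t\,\Delta C_t + \sum_{k} \beta_{k,t}\,X_{k,t} + \gamma\,V_t,
\qquad \Delta V_t = V_t - V_{t-1},\quad \Delta C_t = C_t - C_{t-1}
```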

4. The real-time pricing method for supply chain carbon footprint based on multimodal data according to claim 3, characterized in that constructing the carbon stream-value stream correlation matrix includes: Based on the calibrated carbon flow-value flow coupled differential equation, each supply chain node is classified by carbon flow type and value type. The carbon flow type includes direct carbon emissions, indirect carbon emissions and implicit carbon emissions, and the value type includes output value, material loss value and energy consumption value. Using the carbon flow type as the row dimension of the matrix and the value type as the column dimension of the matrix, an initial carbon flow-value flow association matrix is constructed, and each element in the initial carbon flow-value flow association matrix is initialized to zero. The carbon flow type value and value type value of each supply chain node at each timestamp are extracted from the carbon flow-value flow coupled data sequence and mapped to the corresponding row and column of the initial carbon flow-value flow association matrix according to the timestamp to generate a sparse-filled version of the carbon flow-value flow association matrix. Each element in the sparse-filled version is smoothed using a weighted moving average algorithm within a sliding time window to generate a smooth-filled version of the carbon flow-value flow correlation matrix. Based on the dynamic coupling coefficient in the calibrated carbon flow-value flow coupled differential equation, each element in the smooth-filled version is dynamically corrected. The dynamic correction multiplies the element value in the smooth-filled version by the dynamic coupling coefficient between the corresponding carbon flow type and the corresponding value type to generate a dynamically updated version of the carbon flow-value flow association matrix.
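The smoothing and dynamic correction steps of this claim can be sketched for a small matrix as follows. The data layout (each matrix cell holding its recent history as a list), the weight vector, and the function names are assumptions made for illustration; the claim does not specify the window length or the weights.

```python
def weighted_moving_average(history, weights):
    """Weighted moving average over the most recent len(weights) samples."""
    recent = history[-len(weights):]
    used = weights[-len(recent):]
    return sum(x * w for x, w in zip(recent, used)) / sum(used)


def smooth_and_correct(history_matrix, weights, coupling):
    """Smooth each cell over its window, then scale it by its coupling coefficient.

    Rows are carbon flow types and columns are value types.
    """
    return [
        [weighted_moving_average(cell, weights) * coupling[i][j]
         for j, cell in enumerate(row)]
        for i, row in enumerate(history_matrix)
    ]


# Hypothetical 1 x 2 slice: direct emissions against output value and energy consumption value.
history = [[[2.0, 4.0], [1.0, 1.0]]]
coupling = [[2.0, 0.5]]
updated = smooth_and_correct(history, weights=[1.0, 3.0], coupling=coupling)
```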

5. The real-time pricing method for supply chain carbon footprint based on multimodal data according to claim 1, characterized in that, Based on the aforementioned carbon flow-value flow correlation matrix, the carbon flow path entropy of each node in the supply chain is calculated, including: Extract the set of carbon flow transmission paths for each supply chain node from the carbon flow-value flow correlation matrix. Each carbon flow transmission path in the set is formed by connecting the starting node, intermediate node and ending node in chronological order, and corresponds to a carbon flow transmission value. For each carbon flow transmission path, calculate the proportion of its carbon flow transmission volume to the total carbon flow transmission volume of that starting node, and use this as the carbon flow proportion of that carbon flow transmission path. Substituting the carbon flow percentage into the Shannon entropy calculation formula, a path entropy component for each carbon flow transmission path is generated. The Shannon entropy calculation formula states that the path entropy component is equal to the negative of the product of the carbon flow percentage and the logarithm of the carbon flow percentage. The path entropy components of all carbon flow transmission paths at the same supply chain node are summed to generate the carbon flow path entropy of that node. The carbon flow path entropy is used to quantify the uncertainty and dispersion of the carbon flow transmission path at that node. Repeat the above steps for all nodes in the supply chain to generate a carbon flow path entropy sequence for each node, and then timestamp the carbon flow path entropy sequence with the carbon flow-value flow association matrix to generate a time-stamped carbon flow path entropy sequence.
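The per-node entropy calculation in this claim follows the Shannon form $H = -\sum_i p_i \log p_i$, where $p_i$ is the carbon flow proportion of path $i$. A minimal sketch is given below; the natural logarithm, the function name `path_entropy`, and the sample transmission volumes are assumptions, since the claim does not fix the logarithm base.

```python
import math


def path_entropy(path_volumes):
    """Carbon flow path entropy of one node from its per-path transmission volumes."""
    total = sum(path_volumes)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for volume in path_volumes:
        if volume > 0:  # a path carrying no carbon flow contributes nothing
            share = volume / total
            entropy -= share * math.log(share)
    return entropy


# Hypothetical nodes: one with a single dominant path, one with evenly spread paths.
concentrated = path_entropy([9.0, 0.5, 0.5])
dispersed = path_entropy([1.0, 1.0, 1.0])
```

A node whose carbon flow is spread evenly over many paths gets a higher entropy than a node dominated by one path, which matches the claim's reading of entropy as a measure of uncertainty and dispersion.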

6. The real-time pricing method for supply chain carbon footprint based on multimodal data according to claim 5, characterized in that timestamping the carbon flow path entropy sequence with the carbon flow-value flow association matrix to generate a time-stamped carbon flow path entropy sequence includes: Each carbon flow path entropy value in the carbon flow path entropy sequence is assigned a corresponding timestamp identifier. The timestamp identifier is consistent with the timestamp of the carbon flow type value and the value type value of the corresponding node in the carbon flow-value flow association matrix, thus aligning the carbon flow path entropy and the carbon flow-value flow association matrix in the time dimension. A sliding window cross-validation method is used to calculate, in each sliding window, the Pearson correlation coefficient and Spearman rank correlation coefficient between the carbon flow path entropy value and the value of each element in the carbon flow-value flow association matrix, thereby generating a sequence of correlation coefficients between the carbon flow path entropy and the carbon flow-value flow association matrix. For each time window in which the absolute value of the correlation coefficient is lower than the preset correlation threshold, the carbon flow path entropy value is marked as an outlier and replaced with the weighted average of the carbon flow path entropy values in the adjacent valid windows before and after that time window, thereby generating a corrected carbon flow path entropy sequence. The corrected carbon flow path entropy sequence is vector-concatenated with the carbon flow type value and value type value corresponding to the timestamp in the carbon flow-value flow association matrix to generate the time-stamped carbon flow path entropy sequence. 
Each element in the time-stamped carbon flow path entropy sequence includes a timestamp, a carbon flow path entropy value, a direct carbon emission value, an indirect carbon emission value, an implicit carbon emission value, an output value, a material loss value, and an energy consumption value.

7. The real-time pricing method for supply chain carbon footprint based on multimodal data according to claim 1, characterized in that, The carbon flow path entropy, carbon flow value, and external carbon price are input into a multi-agent reinforcement learning model. This model employs the MADDPG algorithm and outputs a multi-level carbon pricing strategy, including: Each enterprise in the supply chain is defined as an agent in a multi-agent reinforcement learning model. An independent state space, action space, and reward function are constructed for each agent. The state space includes at least the carbon flow path entropy, carbon flow value, external carbon price, inventory level, and order backlog of the corresponding node of the agent. The action space includes at least the carbon tax adjustment range within the enterprise, the carbon trading price adjustment range between enterprises, and the carbon price adjustment range of the final product. The reward function is the sum of negative total carbon emissions, negative total cost, and negative carbon flow path entropy penalty term. For each agent, a policy network and a value network are constructed. The policy network is used to output an action based on the current state, and the value network is used to evaluate the expected cumulative reward of performing the action in the current state. Both the policy network and the value network adopt a fully connected neural network structure and include a batch normalization layer and residual connections. The policy network and value network are trained using a centralized training and distributed execution framework. The centralized training refers to providing each agent's value network with the state and action information of all agents during the training phase. The distributed execution refers to each agent's policy network outputting actions based solely on its own state information during the execution phase. 
During training, historical experience data is randomly sampled from the experience replay buffer. The historical experience data includes at least the current state, current action, immediate reward, next state, and training termination flag. The historical experience data is input into the value network to calculate the target value, and the value network parameters are updated by minimizing the mean square error between the value network output and the target value. The strategy network parameters are updated by maximizing the expected cumulative reward of the value network output; After training, the carbon flow path entropy, carbon flow value and external carbon price of each node are input into the policy network of each agent in real time. The policy network of each agent outputs the corresponding enterprise carbon tax value, inter-enterprise carbon trading quotation value and final product carbon price value, which are combined to generate a multi-level carbon pricing strategy.

8. The real-time pricing method for supply chain carbon footprint based on multimodal data according to claim 7, characterized in that the reward function being the sum of negative total carbon emissions, negative total cost, and negative carbon flow path entropy penalty term includes: The negative total carbon emissions are defined as the negative of the difference between the total carbon emissions of the corresponding node of the agent in the current time step and the preset carbon emission benchmark value. The total carbon emissions are obtained by summing the direct carbon emission values, indirect carbon emission values and implicit carbon emission values in the carbon flow-value flow correlation matrix. The preset carbon emission benchmark value is dynamically updated according to the moving average of the node's historical carbon emission data. The negative total cost is defined as the negative of the sum of the operating cost and carbon trading cost of the corresponding node of the agent in the current time step. The operating cost includes at least energy consumption cost and material loss cost, and the carbon trading cost is calculated as the product of the carbon trading volume of the node in the current time step and the external carbon price. The negative carbon flow path entropy penalty term is defined as the negative of the deviation between the carbon flow path entropy of the corresponding node of the agent in the current time step and the preset carbon flow path entropy threshold. The preset carbon flow path entropy threshold is dynamically calculated based on the historical moving average and moving standard deviation of the carbon flow path entropy of all nodes in the supply chain. When the carbon flow path entropy exceeds the preset carbon flow path entropy threshold, the absolute value of the negative carbon flow path entropy penalty term increases, so as to guide the agent to give priority to the pricing strategy with lower carbon flow path uncertainty. 
The agent's immediate reward is generated by weighted summation of the negative total carbon emissions, negative total cost, and negative carbon flow path entropy penalty term. The weight coefficients of the weighted summation are adaptively adjusted during training using a Bayesian optimization algorithm.
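The three reward terms and their weighted sum can be sketched as follows. This is an illustration under stated assumptions: the entropy penalty is taken as the plain negative of the deviation from the threshold, the weights are fixed rather than tuned by Bayesian optimization, and the benchmark and threshold are passed in rather than computed from moving statistics. All names and numbers are hypothetical.

```python
def immediate_reward(total_emissions, emission_benchmark,
                     operating_cost, carbon_trade_volume, external_carbon_price,
                     path_entropy, entropy_threshold,
                     weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three negative terms described in the claim."""
    # Negative of the gap between current emissions and the benchmark value.
    neg_emissions = -(total_emissions - emission_benchmark)
    # Negative of operating cost plus carbon trading cost (volume x external price).
    neg_cost = -(operating_cost + carbon_trade_volume * external_carbon_price)
    # Negative of the deviation of path entropy from its threshold (assumed form).
    neg_entropy_penalty = -(path_entropy - entropy_threshold)
    w_emissions, w_cost, w_entropy = weights
    return (w_emissions * neg_emissions
            + w_cost * neg_cost
            + w_entropy * neg_entropy_penalty)


# Hypothetical single time step for one agent.
reward = immediate_reward(
    total_emissions=12.0, emission_benchmark=10.0,
    operating_cost=3.0, carbon_trade_volume=2.0, external_carbon_price=0.5,
    path_entropy=1.5, entropy_threshold=1.0,
)
```

In a fuller implementation the `weights` tuple would be the quantity adjusted during training, and the benchmark and threshold would be refreshed from the node's historical data at each step.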

9. The real-time pricing method for supply chain carbon footprint based on multimodal data according to claim 7, characterized in that training the policy network and value network using a centralized training and distributed execution framework includes: During the centralized training phase, a central value network set is constructed, which contains as many central value networks as there are agents. The input of each central value network is a concatenated vector of the state information of all agents and a concatenated vector of the action information of all agents, and the output is the expected cumulative reward estimate of the corresponding agent. In each training iteration of the centralized training phase, a batch of historical experience data is sampled from the experience replay buffer. The historical experience data includes the current state, current action, immediate reward, next state, and training termination flag of all agents. The current state and current action are input into the central value network set to calculate the expected cumulative reward estimate in the current state. The next state and the next action are input into the central value network set to calculate the target cumulative reward estimate in the next state; The parameters of the central value network set are updated by minimizing the mean square error between the expected cumulative reward estimate and the target cumulative reward estimate; In each training iteration of the centralized training phase, the current state of each agent is input into the corresponding policy network, the current action is output, and the current action is input into the corresponding central value network to calculate the expected cumulative reward estimate. The parameters of the agent's policy network are updated by maximizing the expected cumulative reward estimate using the gradient ascent method. 
During the distributed execution phase, the central value network set is removed from each agent, and each agent retains only its own policy network. Each policy network independently outputs actions based on its own node's carbon flow path entropy, carbon flow value, and external carbon price. These actions include adjustments to the carbon tax within the enterprise, adjustments to the carbon trading price between enterprises, and adjustments to the carbon price of the final product.

10. A real-time pricing system for supply chain carbon footprint based on multimodal data, characterized in that the system is applicable to the method of any one of claims 1 to 9 and comprises: The multimodal data acquisition module is used to acquire multimodal sensor data and value stream data from each node of the supply chain. The multimodal sensor data includes at least energy consumption data, vibration data, location data, and temperature data, and the value stream data includes at least process cost data, resource utilization rate data, and unit energy consumption revenue data. The data fusion module, connected to the multimodal data acquisition module, is used to perform timestamp alignment and feature fusion on the multimodal sensor data and value stream data to generate a carbon stream-value stream coupled data sequence; The coupling analysis module, connected to the data fusion module, is used to establish a carbon flow-value flow coupled differential equation based on the carbon flow-value flow coupled data sequence and to construct a carbon flow-value flow correlation matrix. The path entropy calculation module, connected to the coupling analysis module, is used to calculate the carbon flow path entropy of each node in the supply chain based on the carbon flow-value flow correlation matrix. The pricing decision module is connected to the path entropy calculation module and is used to input the carbon flow path entropy, carbon flow value and external carbon price into the multi-agent reinforcement learning model. The multi-agent reinforcement learning model adopts the MADDPG algorithm and outputs a multi-level carbon pricing strategy. The multi-level carbon pricing strategy includes at least the intra-enterprise carbon tax, the inter-enterprise carbon trading price and the final product carbon price. 
The strategy execution module, connected to the pricing decision module, is used to distribute the multi-level carbon pricing strategy to the supply chain system for execution and to obtain the actual carbon emission data, cost data and value stream data after execution. The feedback update module is connected to the strategy execution module, the coupling analysis module, and the pricing decision module, respectively. It is used to feed back the actual carbon emission data, cost data, and value stream data to the carbon stream-value stream correlation matrix and the multi-agent reinforcement learning model, update the model parameters, and generate an optimized pricing strategy.