A method and system for optimizing the arrangement of hull structure stress sensors

By combining deep reinforcement learning with inverse finite element analysis, the arrangement of stress sensors on the FPSO hull structure is optimized, addressing the limited strategy space and insufficient consideration of engineering constraints in existing technology. This enables the deployment of high-precision, low-cost stress sensor networks and improves the robustness and feasibility of the monitoring system.

CN122242070A · Pending · Publication Date: 2026-06-19 · QILU UNIVERSITY OF TECHNOLOGY (SHANDONG ACADEMY OF SCIENCES)

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
QILU UNIVERSITY OF TECHNOLOGY (SHANDONG ACADEMY OF SCIENCES)
Filing Date
2026-05-20
Publication Date
2026-06-19

Abstract figure: CN122242070A_ABST

Abstract

This invention proposes a method and system for optimizing the arrangement of stress sensors on a ship's hull structure, belonging to the field of marine engineering structural health monitoring and intelligent sensing technology. The method includes: obtaining the structural mechanical characteristics and engineering constraints of candidate stress sensor locations in the hull structure of a floating production storage and offloading (FPSO) unit based on finite element analysis; constructing a reinforcement learning problem model and training environment based on these characteristics, where the training environment receives sensor layout adjustment instructions to update the layout and includes an inverse finite element solver for performance evaluation of the stress sensor layout schemes; iteratively optimizing the stress sensor layout scheme of the hull structure through interactive training between the reinforcement learning problem model and the environment until the training converges and outputs a stress sensor layout strategy that meets multi-objective optimization requirements. This invention can rapidly generate high-precision, low-cost, and engineering-feasible stress sensor layout schemes for hull structures under typical operating conditions and engineering constraints of FPSO units.

Description

Technical Field

[0001] This invention belongs to the field of marine engineering structural health monitoring and intelligent sensing technology, and particularly relates to a method and system for optimizing the arrangement of stress sensors for ship hull structures.

Background

[0002] The statements in this section are merely background information related to the present invention and do not necessarily constitute prior art.

[0003] Floating Production Storage and Offloading (FPSO) units are core equipment for deepwater and ultra-deepwater oil and gas development. During long-term service, FPSOs are typically in a fixed or semi-moored state, continuously subjected to complex marine environmental loads such as waves, wind, and currents. They also face structural load redistribution issues caused by frequent changes in crude oil loading and unloading conditions. Their hull structures are subjected to significant fatigue and cumulative damage over long periods, posing serious challenges to structural integrity and service safety. Therefore, structural health monitoring of FPSOs is necessary, and the effectiveness of this monitoring is highly dependent on the deployment scheme of the stress sensor network.

[0004] Existing reinforcement learning methods for sensor network optimization suffer from the curse of dimensionality in the action space and have difficulty embedding engineering constraints effectively. They lack a mechanism that deeply integrates physical evaluation into the training process, which results in slow convergence, a large number of invalid explorations, and an inability to solve the high-dimensional discrete optimization problem of FPSO stress sensor deployment.

Summary of the Invention

[0005] To overcome the shortcomings of the prior art, this invention proposes a method and system for optimizing the arrangement of stress sensors on hull structures, in order to solve the problems of limited strategy space, insufficient adaptability to complex working conditions, and insufficient consideration of engineering constraints in the optimization of stress sensor layout on FPSO hull structures.

[0006] To achieve the above objectives, one or more embodiments of the present invention provide the following technical solutions. In a first aspect, the present invention discloses a method for optimizing the arrangement of stress sensors on a ship hull structure, comprising: obtaining, based on finite element analysis, the structural mechanical characteristics and engineering constraint characteristics of candidate stress sensor locations in the hull structure of a floating production storage and offloading (FPSO) unit; constructing, based on these characteristics, a reinforcement learning problem model and a training environment, wherein the training environment receives sensor layout adjustment instructions to update the layout and includes an inverse finite element solver for evaluating the performance of stress sensor layout schemes; and iteratively optimizing the stress sensor layout scheme for the FPSO hull structure through interactive training between the reinforcement learning problem model and the environment, until the training converges and a stress sensor arrangement strategy meeting the multi-objective optimization requirements is output. The interactive training process includes: generating stress sensor layout adjustment decisions for the reinforcement learning problem model with a deep reinforcement learning policy network; executing the decisions and calling the inverse finite element solver to evaluate the performance of the resulting stress sensor layout; calculating a reward signal from the performance evaluation results; and updating the policy network parameters with a cascaded constraint-aware dueling double deep Q-network (CC-D3QN) algorithm, which comprises a global decision network, a region optimization network, a constraint projection layer, and a prioritized experience replay unit, until the policy network converges.

[0007] In a second aspect, the present invention discloses a system for optimizing the arrangement of stress sensors for ship hull structures, comprising: a hull structure feature pre-calculation module, configured to obtain, based on finite element analysis, the structural mechanical characteristics and engineering constraint characteristics of candidate stress sensor locations in the hull structure of an FPSO unit; a hull stress sensor layout simulation environment module, configured to construct a reinforcement learning problem model and a training environment based on these characteristics, wherein the training environment receives sensor layout adjustment instructions to update the layout and includes an inverse finite element solver for evaluating the performance of stress sensor layout schemes; and a multi-objective stress sensor layout optimization agent module, configured to iteratively optimize the stress sensor layout scheme for the FPSO hull structure through interactive training between the reinforcement learning problem model and the environment, until the training converges and a stress sensor arrangement strategy meeting the multi-objective optimization requirements is output. The interactive training process includes: generating stress sensor layout adjustment decisions for the reinforcement learning problem model with a deep reinforcement learning policy network; executing the decisions and calling the inverse finite element solver to evaluate the performance of the resulting stress sensor layout; calculating a reward signal from the performance evaluation results; and updating the policy network parameters with a cascaded constraint-aware dueling double deep Q-network (CC-D3QN) algorithm, which comprises a global decision network, a region optimization network, a constraint projection layer, and a prioritized experience replay unit, until the policy network converges.

[0008] In a third aspect, the present invention discloses an electronic device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the above method for optimizing the arrangement of stress sensors for ship hull structures.

[0009] In a fourth aspect, the present invention discloses a computer-readable storage medium storing computer instructions that, when executed by a processor, perform the steps of the above method for optimizing the arrangement of stress sensors for ship hull structures.

[0010] Compared with the prior art, the present invention has the following beneficial effects. It integrates an inverse finite element method (iFEM) solver into the FPSO hull structure stress sensor layout simulation environment, which enables efficient and accurate response reconstruction and error calculation for any stress sensor layout scheme without relying on precise load distributions or material parameters, and therefore provides reliable reward feedback. It also constructs a multi-objective reward function for the FPSO hull stress sensor layout optimization scenario, allowing the agent to autonomously find the best balance among stress reconstruction accuracy, cost, and feasibility. The CC-D3QN algorithm uses a two-stage cascaded reinforcement learning scheme: a global decision-making stage for the FPSO hull structure first generates a region-level allocation of stress sensor quantities, and a region-level location optimization stage then runs under those allocation constraints. This achieves more refined stress sensor quota allocation and location optimization, further improving the overall performance and engineering feasibility of the stress sensor network.

[0011] Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Description of the Drawings

[0012] The accompanying drawings, which form part of this invention, are used to provide a further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an improper limitation of the invention.

[0013] Figure 1 is an overall flowchart of the method for optimizing the arrangement of stress sensors in a ship hull structure according to Embodiment 1 of the present invention.

[0014] Figure 2 is a schematic diagram of the internal structure of the reinforcement learning simulation training environment according to Embodiment 1 of the present invention.

[0015] Figure 3 is a flowchart of the environmental physics assessment engine according to Embodiment 1 of the present invention.

[0016] Figure 4 is a schematic diagram of the structure of the CC-D3QN algorithm according to Embodiment 1 of the present invention.

[0017] Figure 5 is a schematic diagram of the optimized stress sensor arrangement for the FPSO hull structure according to Embodiment 1 of the present invention.

[0018] Figure 6 is a block diagram of the system for optimizing the arrangement of stress sensors for ship hull structures according to Embodiment 2 of the present invention.

Detailed Description

[0019] It should be noted that the following detailed descriptions are exemplary and intended to provide further illustration of the invention. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.

[0020] It should be noted that the terminology used herein is for the purpose of describing particular implementations only and is not intended to limit the exemplary implementations of the present invention.

[0021] Where there is no conflict, the embodiments and features in the embodiments of the present invention can be combined with each other.

[0022] With the rapid development of intelligent and green low-carbon technologies in marine engineering equipment, Floating Production Storage and Offloading (FPSO) units, as core equipment for deep-water and ultra-deep-water oil and gas development, have been widely used in offshore oil and gas production, storage, and offloading. To ensure the safe and stable operation of an FPSO throughout its lifecycle, real-time and accurate monitoring of its hull structural condition is of significant engineering importance. The monitoring efficiency of an FPSO hull structural health monitoring system depends heavily on the arrangement of the stress sensor network. The number of stress sensors and their spatial distribution within the hull structure directly determine whether the system can acquire the key mechanical information that reflects the structural stress state and damage evolution, and therefore affect the accuracy of structural response reconstruction and health assessment. It is therefore necessary to optimize the design of the stress sensor network systematically while ensuring engineering feasibility, so as to balance monitoring accuracy, system robustness, and deployment cost. However, formulating an optimal, engineering-practical stress sensor arrangement that accounts for both the complex geometry of the FPSO hull structure and its actual operation under multiple working conditions and coupled loads remains a pressing technical challenge.

[0023] Regarding the optimization of stress sensor network layout in FPSO hull structural health monitoring, existing traditional research methods can be mainly divided into three categories: The first category is layout methods based on mechanical priors and gradient analysis. These methods typically rely on finite element models to calculate the structural strain energy density or stress gradient distribution, thereby deploying sensors in high-sensitivity regions. However, this method is highly dependent on load conditions and boundary conditions, making it difficult to adapt to the complex and variable marine environment experienced by FPSOs during actual service. The second category is search methods based on intelligent optimization algorithms, such as genetic algorithms and particle swarm optimization, which determine the optimal combination of sensors through global search. However, their computational cost increases significantly with the increase in structural scale and the number of candidate sensors, and it is difficult to achieve coordinated optimization of stress sensors in multiple regions and at multiple levels in engineering applications. The third category is layout methods based on design specifications or engineering experience. These methods are simple to implement but lack systematic optimization basis, making it difficult to achieve an effective trade-off between monitoring performance and deployment cost.

[0024] As mentioned above, optimizing the deployment of sensor networks for FPSO hull structural stress monitoring faces multiple challenges. First, during long-term mooring and service, an FPSO must simultaneously withstand various complex conditions, including wave loads, mooring loads, and changes in production loading. The stress response of the hull structure therefore exhibits significant uncertainty, and existing optimization methods based on a single condition or simplified assumptions cannot generate robust, high-precision monitoring solutions. Second, the deployment of FPSO hull structural stress sensors is subject to stringent engineering constraints, including cabin space, installation accessibility, explosion protection, and tolerance to the marine environment. Existing methods usually handle these through post-hoc checks or simple elimination, which yields either solutions that perform well in theory but are impractical to install, or solutions that sacrifice too much performance to meet feasibility requirements. Finally, the optimization objectives for FPSO hull structural stress monitoring involve multi-dimensional conflicts among reconstruction accuracy, sensor quantity and cost, and engineering feasibility, and traditional single-objective or fixed-weight methods struggle to balance them flexibly.

[0025] Therefore, this invention proposes a method for optimizing the arrangement of stress sensors on FPSO hull structures based on deep reinforcement learning. Addressing the evaluation challenges caused by complex and variable operating conditions, the inverse finite element method is introduced to evaluate the performance of the hull structure's stress response. This allows for performance reconstruction of any sensor layout without needing to know the specific external load conditions simulated during the evaluation. Furthermore, a deep reinforcement learning environment for FPSO hull structure stress monitoring is constructed. This environment integrates prior information from structural mechanics with engineering constraints, defines discrete layouts and quantity adjustment actions, and establishes a multi-objective reward mechanism based on stress reconstruction error, cost, and constraint satisfaction. This enables the agent to learn autonomously through trial and error and interaction, thereby autonomously exploring a high-precision, low-cost, and highly feasible FPSO hull structure stress sensor arrangement strategy in a high-dimensional solution space.

[0026] Embodiment 1

In one or more embodiments, a method for optimizing the arrangement of stress sensors for ship hull structures is disclosed. As shown in Figure 1, the method includes the following steps.

Step S1: Based on finite element analysis, obtain the structural mechanical characteristics and engineering constraint characteristics of the candidate stress sensor locations in the hull structure of the floating production storage and offloading unit.

[0027] First, a set of candidate locations for stress sensors on the FPSO hull structure is determined. In this embodiment, the determination of candidate locations is based on the FPSO hull finite element model and structural design specifications. Specifically, the mesh nodes representing the main load-bearing components in the target FPSO three-dimensional finite element model are defined as the initial set of candidate locations. Further, according to the FPSO structural design specifications, nodes located in stress concentration or high-risk fatigue damage areas such as the intersection of transverse and longitudinal bulkheads in the cargo oil tank area, the upper module support structure area, the mooring equipment connection area, and the waterline fluctuation zone are selected from the initial set of candidate locations to form the final set of candidate locations for stress sensors used in optimization calculations.

[0028] Next, the structural mechanical feature vector of the candidate stress sensor locations is calculated. Specifically, based on the FPSO hull finite element model, at least two typical FPSO service conditions are defined, including a full-load oil storage condition and a ballast navigation condition. Finite element analysis is performed for each condition to obtain the structural stress response field, and the element strain energy density at each candidate location is extracted. Weight coefficients are assigned according to the occurrence probability or monitoring importance of each condition, and the strain energy densities of each location under all conditions are weighted and averaged to obtain a comprehensive strain energy density. The comprehensive strain energy densities of all locations are then normalized to generate the structural mechanical feature vector $\mathbf{S} = [s_1, s_2, \ldots, s_N]$, where $N$ is the number of candidate locations and $s_i$ represents the relative stress importance of the $i$-th candidate location across all typical working conditions; the larger the value, the more important stress monitoring is at that location.
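The weighting and normalization steps above can be sketched as follows. This is a minimal sketch, not the patented implementation: the function name, the toy strain energy densities and weights, and the choice of min-max normalization (the text says only "normalized") are assumptions.

```python
def structural_feature_vector(sed_by_condition, condition_weights):
    """Build the structural mechanical feature vector from per-condition SEDs.

    sed_by_condition[k][i]: element strain energy density (SED) at candidate
    location i under service condition k (e.g. full load, ballast).
    condition_weights[k]: occurrence probability or monitoring importance of k.
    """
    total = sum(condition_weights)
    weights = [w / total for w in condition_weights]  # normalize weights to sum to 1
    n_locations = len(sed_by_condition[0])
    # Comprehensive SED: weighted average over all service conditions.
    composite = [
        sum(w * row[i] for w, row in zip(weights, sed_by_condition))
        for i in range(n_locations)
    ]
    lo, hi = min(composite), max(composite)
    if hi == lo:  # every location equally important; avoid dividing by zero
        return [1.0] * n_locations
    # Assumed min-max normalization to [0, 1]; a larger value = more important.
    return [(e - lo) / (hi - lo) for e in composite]


# Toy example: 3 candidate locations, 2 service conditions.
sed = [
    [1.0, 4.0, 2.0],  # full-load oil storage condition
    [3.0, 2.0, 2.0],  # ballast condition
]
S = structural_feature_vector(sed, [0.6, 0.4])
```

Min-max scaling keeps the most important location at exactly 1 and the least important at 0; dividing by the maximum alone would be an equally plausible reading of "normalized".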

[0029] Next, the engineering constraint feature vector of the candidate stress sensor locations is quantified. Specifically, based on FPSO design drawings, outfitting specifications, and engineering maintenance experience, multiple installability evaluation criteria are set for each candidate location, including location accessibility, environmental tolerance, structural continuity, wiring convenience, and electromagnetic compatibility. Each candidate location is scored independently on each criterion on a scale from 0 to 1 (continuous or discrete levels), where a higher value indicates better installability under that criterion. The scores of each candidate location across all criteria are then combined, taking either the minimum or the weighted average of the sub-scores, to obtain the comprehensive installability score of that location. This yields the engineering constraint feature vector $\mathbf{C} = [c_1, c_2, \ldots, c_N]$, in one-to-one correspondence with the set of candidate locations, where $c_i$ represents the engineering feasibility of the $i$-th candidate location; a lower value indicates stronger constraints and poorer installability at that location.
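A minimal sketch of the score aggregation, assuming each candidate's criterion scores are already available as numbers in [0, 1]. The criterion identifiers, function names, and example scores are hypothetical; both aggregation options named in the text (minimum and weighted average) are shown.

```python
# Hypothetical identifiers for the five installability criteria listed above.
CRITERIA = (
    "accessibility",
    "environmental_tolerance",
    "structural_continuity",
    "wiring_convenience",
    "electromagnetic_compatibility",
)


def installability_score(scores, mode="min", weights=None):
    """Combine per-criterion scores in [0, 1] into one installability score."""
    values = [scores[name] for name in CRITERIA]
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("criterion scores must lie in [0, 1]")
    if mode == "min":
        # Conservative: the weakest criterion governs feasibility.
        return min(values)
    if mode == "weighted":
        weights = weights or {name: 1.0 for name in CRITERIA}
        total = sum(weights[name] for name in CRITERIA)
        return sum(weights[name] * scores[name] for name in CRITERIA) / total
    raise ValueError(f"unknown mode: {mode}")


def constraint_feature_vector(candidates, mode="min", weights=None):
    """One score per candidate location, in the same order as the candidates."""
    return [installability_score(s, mode, weights) for s in candidates]


loc_a = dict(zip(CRITERIA, (0.9, 0.8, 1.0, 0.7, 0.9)))
loc_b = dict(zip(CRITERIA, (0.3, 0.9, 0.8, 0.6, 1.0)))
C_min = constraint_feature_vector([loc_a, loc_b])                   # [0.7, 0.3]
C_avg = constraint_feature_vector([loc_a, loc_b], mode="weighted")  # about [0.86, 0.72]
```

The minimum is the stricter choice: one poor criterion (for example, an inaccessible location) makes the whole location poor, whereas the weighted average lets strong criteria compensate.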

[0030] Finally, the structural mechanical feature vector and the engineering constraint feature vector, together with the geometric coordinates of the corresponding candidate locations, are stored in a standardized data format as a fixed prior knowledge base from which state vectors are constructed during subsequent reinforcement learning training. In this embodiment, the prior knowledge base is saved before training begins and is then called directly, which avoids repeatedly invoking external simulation software and speeds up training.
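One possible shape for the fixed prior knowledge base, sketched with Python's standard `json` and `dataclasses` modules. JSON stands in for the unspecified "standardized data format". The `region_of` field (the pre-defined region of each location, needed by the later region-level steps) is an assumption, as are all names.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PriorKnowledgeBase:
    """Fixed per-location prior data, computed once before RL training."""
    coords: list     # [x, y, z] of each candidate location
    S: list          # structural mechanical feature vector
    C: list          # engineering constraint feature vector
    region_of: list  # hypothetical: pre-defined region index of each location

    def __post_init__(self):
        n = len(self.coords)
        if not (len(self.S) == len(self.C) == len(self.region_of) == n):
            raise ValueError("per-location arrays must all have the same length")

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "PriorKnowledgeBase":
        return cls(**json.loads(text))


kb = PriorKnowledgeBase(
    coords=[[0.0, 0.0, 0.0], [12.5, 3.0, 8.0]],
    S=[0.2, 0.8], C=[0.9, 0.4], region_of=[0, 1],
)
restored = PriorKnowledgeBase.from_json(kb.to_json())  # equal to kb
```

Freezing the dataclass and validating lengths once reflects the "fixed" nature of the base: training code reads it but never rewrites it.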

[0031] Step S2: Construct a reinforcement learning problem model and training environment based on the features. The training environment receives sensor layout adjustment instructions output by the policy network to update the layout and includes an inverse finite element solver for performance evaluation of stress sensor layout schemes.

[0032] Step S2-1: Construct a reinforcement learning problem model, specifically: model the stress sensor optimization layout problem as a two-stage cascaded reinforcement learning decision-making process including a global decision-making stage and a regional layout optimization stage; construct a state space including the current stress sensor layout information, the structural mechanical features, and engineering constraint features; construct an action space for adjusting the stress sensor layout; and construct a multi-objective reward function based on stress reconstruction performance and cost.

[0033] Specifically, a reinforcement learning problem model for optimizing the arrangement of stress sensors on an FPSO hull structure is constructed. The state space is defined as a multi-dimensional vector fusing the current stress sensor layout information with the corresponding hull structural mechanical features and engineering constraint features; the action space is defined as a discrete set of stress sensor layout adjustment operations; and the reward function is defined as a multi-objective function based on stress reconstruction error, network cost, and engineering constraint satisfaction. A corresponding deep reinforcement learning policy network is then built on this model. The specific steps are as follows.

Step S2-1-1: Based on the structural characteristics of the FPSO hull and the current layout information, construct the state space of the global decision-making layer.

[0034] Specifically, for each pre-defined region $R_j$ ($j = 1, 2, \ldots, K$) in the fixed prior knowledge base, a three-dimensional subvector $\mathbf{v}_j = [\rho_j, \bar{s}_j, \bar{c}_j]$ is constructed, where $\rho_j$ is the deployment density of stress sensors in the region; $\bar{s}_j$ is a fixed constant representing the inherent monitoring value of the region, obtained by averaging the pre-calculated structural mechanical feature values of the candidate locations in the region; and $\bar{c}_j$ is a fixed constant representing the inherent installation difficulty of the region, obtained by averaging the pre-calculated engineering constraint feature values of those locations. The subvectors of all regions are concatenated in region-index order to obtain the region feature and deployment state vector $\mathbf{V} = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_K]$.

[0035] Furthermore, a four-dimensional vector $\mathbf{g} = [\mu_s, \sigma_s^2, \mu_c, \sigma_c^2]$ is constructed from the feature values at the locations of all currently deployed stress sensors, where $\mu_s$ and $\sigma_s^2$ are the mean and variance of the hull structural stress importance values, and $\mu_c$ and $\sigma_c^2$ are the mean and variance of the engineering constraint values, of the deployed stress sensors.

[0036] The final global state vector is the concatenation of these two parts, $[\mathbf{V}, \mathbf{g}]$, with $3K + 4$ dimensions in total.
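The global state vector (three values per region followed by four layout statistics) can be assembled as sketched below. The sketch assumes that a region's deployment density is its number of deployed sensors divided by its number of candidate locations, that population variance is used, and that an empty layout maps to zeros; the function name and toy data are hypothetical.

```python
from statistics import mean, pvariance


def global_state(deployed, region_of, S, C, n_regions):
    """Global state: [density, mean S, mean C] per region, then 4 layout stats.

    deployed: set of candidate-location indices that currently hold a sensor.
    Returns a list of length 3 * n_regions + 4.
    """
    size = [0] * n_regions
    count = [0] * n_regions
    s_sum = [0.0] * n_regions
    c_sum = [0.0] * n_regions
    for i, r in enumerate(region_of):
        size[r] += 1
        s_sum[r] += S[i]
        c_sum[r] += C[i]
        if i in deployed:
            count[r] += 1
    V = []
    for r in range(n_regions):
        # Density changes with the layout; the two region means are fixed constants.
        V += [count[r] / size[r], s_sum[r] / size[r], c_sum[r] / size[r]]
    if deployed:
        s_dep = [S[i] for i in sorted(deployed)]
        c_dep = [C[i] for i in sorted(deployed)]
        g = [mean(s_dep), pvariance(s_dep), mean(c_dep), pvariance(c_dep)]
    else:
        g = [0.0, 0.0, 0.0, 0.0]  # assumed convention for an empty layout
    return V + g


state = global_state(
    deployed={1, 3}, region_of=[0, 0, 1, 1],
    S=[0.2, 0.8, 0.5, 1.0], C=[0.9, 0.4, 0.6, 0.8], n_regions=2,
)
```

Because the region means never change, only the densities and the four statistics need recomputing at each time step; the means could be cached from the prior knowledge base.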

[0037] Step S2-1-2: Define the action space of the global decision-making layer.

[0038] The action space of the global decision-making agent is defined as a low-dimensional discrete action set $\mathcal{A}_g$, each action of which corresponds to a predefined macro-adjustment strategy with clear engineering semantics.

[0039] Specifically, actions $a_1$ to $a_K$ are defined as adding one stress sensor to the corresponding region $R_j$ ($j = 1, \ldots, K$); action $a_{K+1}$ removes one sensor from the region with the largest number of deployed sensors; and action $a_{K+2}$ removes one stress sensor from the region with the most stringent engineering constraints. The action space therefore has $K + 2$ dimensions, where $K$ is the number of pre-defined regions.

[0040] Step S2-1-3: Based on the target area stress sensor quota obtained from the above global decision layer, construct the regional optimization layer.

[0041] Specifically, the deployment status of all candidate stress sensor locations within a region $R_j$ is represented as a binary deployment vector $\mathbf{b}_j = [b_1, b_2, \ldots, b_{n_j}]$, where $b_i = 1$ indicates that a sensor has been deployed at location $i$, $b_i = 0$ indicates that it has not, and $n_j$ is the total number of candidate locations in the region. This vector is defined as the state space of the agent within the region. The action space of the in-region optimization agent is a discrete set in which each action moves one deployed stress sensor to another undeployed location within the same region, so the number of deployed sensors in the region, $m_j$, remains equal to the quota determined by the global decision layer. With $m_j$ deployed and $n_j - m_j$ free locations, there are $m_j(n_j - m_j)$ distinct moves.
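The in-region move action can be sketched as below, assuming actions are encoded as (source, target) index pairs; the text does not specify the encoding, and the function names are hypothetical. Because a move never changes the number of deployed sensors, the quota set by the global layer is preserved automatically.

```python
from itertools import product


def swap_actions(b):
    """All in-region moves for a binary deployment vector b.

    Each action is a (source, target) pair: move the sensor at a deployed
    location to an undeployed location in the same region.
    """
    deployed = [i for i, v in enumerate(b) if v == 1]
    free = [i for i, v in enumerate(b) if v == 0]
    return list(product(deployed, free))  # m * (n - m) actions


def apply_swap(b, action):
    """Return a new deployment vector with one sensor moved."""
    src, dst = action
    if b[src] != 1 or b[dst] != 0:
        raise ValueError("source must be deployed and target must be free")
    new_b = list(b)
    new_b[src], new_b[dst] = 0, 1
    return new_b


b = [1, 0, 1, 0, 0]             # n = 5 candidates, m = 2 sensors
actions = swap_actions(b)        # 2 * 3 = 6 moves
b_next = apply_swap(b, (0, 4))   # [0, 0, 1, 0, 1]
```

A Q-network needs a fixed-size output, so a practical implementation might instead use a fixed $n \times n$ grid of (source, target) slots and mask out invalid pairs; that is likewise an assumption.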

[0042] To ensure that the agent's actions satisfy engineering constraints, a constraint projection layer is introduced. Based on the engineering constraint feature vector $\mathbf{C}$ of the candidate positions and a preset threshold $\tau$, an action mask matrix $\mathbf{M}$ is constructed: for candidate position $i$, if $c_i < \tau$, the corresponding mask entry $M_i$ is set to 0; otherwise it is set to 1, indicating a selectable action. The raw Q-values $Q(s, a)$ output by the policy network are combined element-wise with the mask $\mathbf{M}$ to obtain the effective Q-values, with the Q-values of infeasible actions set to negative infinity (or a very small value):

$$Q_{\text{eff}}(s, a) = \begin{cases} Q(s, a), & M_a = 1 \\ -\infty, & M_a = 0 \end{cases}$$

This ensures that the agent selects actions only at engineering-feasible candidate positions, giving it direct awareness of engineering constraints.
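A minimal sketch of the constraint projection layer, assuming each action is indexed by the candidate position it targets (for a (source, target) move encoding, one would mask a move by its target position). Setting masked Q-values to negative infinity, rather than multiplying them by zero, guarantees that a greedy argmax never picks an infeasible action. Names, the threshold, and values are illustrative.

```python
import math


def action_mask(C, tau):
    """Mask entry is 0 if the constraint value is below tau (infeasible), else 1."""
    return [0 if c < tau else 1 for c in C]


def project_q(q_values, mask):
    """Keep Q where the mask is 1; set Q of masked actions to -inf."""
    return [q if m == 1 else -math.inf for q, m in zip(q_values, mask)]


def greedy_feasible_action(q_values, mask):
    """Index of the best action among the engineering-feasible ones."""
    q_eff = project_q(q_values, mask)
    best = max(range(len(q_eff)), key=q_eff.__getitem__)
    if q_eff[best] == -math.inf:
        raise RuntimeError("no feasible action in this state")
    return best


C = [0.9, 0.2, 0.7, 0.1]
M = action_mask(C, tau=0.5)        # [1, 0, 1, 0]
q = [1.0, 5.0, 2.0, 9.0]           # raw policy-network outputs
a = greedy_feasible_action(q, M)   # 2: best Q among feasible positions
```

Position 3 has the highest raw Q-value but falls below the threshold, so the projected argmax moves to position 2.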

[0043] Step S2-1-4: Define the reward function for the two-stage cascaded reinforcement learning.

[0044] The global reward depends only on the inherent monitoring value, the inherent installation difficulty, and the sensor quantity quota of each region, and guides the regional quota decisions. The regional local reward is based on the local reconstruction error calculated by the inverse finite element method (iFEM), and guides the optimization of specific sensor locations. Because the two reward functions are independent of each other, the optimization is cascaded.

[0045] The stress sensor optimization placement process is modeled as two sequentially executed reinforcement learning decision-making stages, with the reward function for each stage defined independently.

[0046] The first stage is the global decision-making stage, which generates a stress sensor allocation scheme for each structural region. The reward function of this stage does not depend directly on the specific sensor locations. Instead, it is defined from the inherent monitoring value derived from the hull structural mechanical feature vector, the inherent installation difficulty derived from the engineering constraint feature vector, and a performance proxy index constructed from the sensor quantity quotas output by the agent. Specifically, the reward function for the global decision-making stage is:

[0047] where $R_g$ is the reward of the global decision-making stage; $N$ is the total number of stress sensors; $N_v$ is the quota assigned to regions that fail the engineering constraints, i.e., the number of sensors allocated to regions whose engineering constraint feature value is below the preset threshold $\tau$; $\lambda_k$ are weighting coefficients; $N_{\max}$ is the preset budget limit on the number of stress sensors; $\varepsilon$ is an extremely small constant that prevents division by zero; and $P$ is a performance evaluation index that approximates the rationality of the region-level sensor quantity configuration. $P$ is defined as the weighted sum of the ratios of the structural importance of each region to its stress sensor quantity quota, i.e.:

$$P = \sum_{j=1}^{K} w_j \, \frac{I_j}{n_j + \varepsilon}$$

[0048] Here, for each of the $K$ pre-divided regions, a structural stress importance index $I_j$ is pre-calculated by statistically aggregating, using either the mean or the variance, the structural mechanical feature values of the candidate sensor locations in that region; $n_j$ is the stress sensor quantity quota for region $j$ output by the global agent; and $w_j$ is the regional weight.
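The region-level quantities can be sketched as follows. The proxy index follows the verbal definition (a weighted sum of importance-to-quota ratios). Placing the small constant in the denominator, counting violations as the quota assigned to regions whose mean constraint value falls below the threshold, and all names and data are assumptions. The global reward that combines these terms is not sketched.

```python
from statistics import mean, pvariance


def importance_index(S, region_of, n_regions, stat="mean"):
    """Per-region importance: mean (or variance) of the candidates' S values."""
    groups = [[] for _ in range(n_regions)]
    for s, r in zip(S, region_of):
        groups[r].append(s)
    agg = mean if stat == "mean" else pvariance
    return [agg(g) for g in groups]


def proxy_index(I, quotas, weights, eps=1e-8):
    """P = sum_j w_j * I_j / (n_j + eps); eps placement is an assumption."""
    return sum(w * i / (n + eps) for w, i, n in zip(weights, I, quotas))


def violation_count(quotas, c_bar, tau):
    """Assumed N_v: sensors allocated to regions whose mean constraint < tau."""
    return sum(n for n, c in zip(quotas, c_bar) if c < tau)


I = importance_index([0.2, 0.8, 0.5, 1.0], region_of=[0, 0, 1, 1], n_regions=2)
P = proxy_index(I, quotas=[2, 3], weights=[1.0, 1.0])           # about 0.5
N_v = violation_count([2, 3], c_bar=[0.65, 0.45], tau=0.5)     # 3
```

Because the quotas are known before any sensor is placed, all three quantities are cheap to evaluate and require no call to the reconstruction solver, which matches the text's point that the global reward does not depend on specific locations.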

[0049] The second stage is the regional point location optimization stage, which optimizes the specific stress sensor locations under the given regional stress sensor quantity quota constraint. The reward function in this stage is directly defined based on the local structural response reconstruction error calculated by the inverse finite element method, and its form is:

[0050] where $R_r$ is the reward of the regional optimization stage, $\beta$ is a proportionality coefficient, and $e_j$ is the local stress reconstruction error. Specifically, for the target region $R_j$, the inverse finite element solver is invoked to calculate the local stress reconstruction error in that region. In this way, the two stages of the reinforcement learning model are optimized independently.
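The text does not detail the inverse finite element solver, so the sketch below uses a different, much simpler stand-in: a least-squares fit of assumed stress basis fields to the sensor readings, followed by an RMS error over the region. This is not iFEM. The reward form `-beta * e` is also an assumption; the text only says that the reward uses a proportionality coefficient and the local reconstruction error. NumPy is assumed to be available, and all names and data are hypothetical.

```python
import numpy as np


def reconstruct_field(Phi, sensor_idx, measured):
    """Least-squares stand-in for the iFEM solver (NOT iFEM itself).

    Phi: (n_locations, n_basis) matrix of assumed stress basis fields.
    Fit coefficients q so that Phi[sensor_idx] @ q ~= measured, then expand.
    """
    q, *_ = np.linalg.lstsq(Phi[sensor_idx], measured, rcond=None)
    return Phi @ q


def local_reward(Phi, sensor_idx, reference, beta=1.0):
    """Return (reward, e) where e is the RMS reconstruction error in the region."""
    recon = reconstruct_field(Phi, sensor_idx, reference[sensor_idx])
    e = float(np.sqrt(np.mean((recon - reference) ** 2)))
    return -beta * e, e  # assumed reward form: -beta * e


# Toy region: 4 locations, stress varies linearly along the region.
Phi = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
reference = Phi @ np.array([2.0, 0.5])                 # field [2.0, 2.5, 3.0, 3.5]
r_good, e_good = local_reward(Phi, [0, 3], reference)  # two sensors: e ~ 0
r_poor, e_poor = local_reward(Phi, [0], reference)     # one sensor: e ~ 0.935
```

With one sensor the fit is underdetermined and `lstsq` returns the minimum-norm solution (a flat field), so the reward correctly penalizes the sparser layout.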

[0051] This invention designs a multi-dimensional state space for stress monitoring of FPSO hull structures, which includes the current stress sensor layout state, the mechanical characteristics of the hull structure at candidate locations, and quantitative information on engineering constraints. This enables the agent to simultaneously perceive key stress areas of the structure, engineering feasibility conditions, and specific stress monitoring optimization preferences in each decision-making step, thereby significantly improving the accuracy and engineering feasibility of the stress sensor layout scheme.

[0052] Step S2-2: Based on the problem model, construct a reinforcement learning simulation training environment for optimizing the arrangement of stress sensors on the FPSO hull structure. The training environment integrates an inverse finite element solver, and its internal structure is shown in Figure 2. The simulation training environment receives the stress sensor layout adjustment instructions output by the policy network and returns the performance evaluation results of the corresponding layout. The output of the inverse finite element solver serves as the environment's numerical evaluation for constructing the reinforcement learning reward signal. The environment also supports the sequential invocation and independent training of multiple reinforcement learning decision stages.

[0053] The establishment of the environment includes the following steps: Step S2-2-1: Build the environment state manager.

[0054] The environment state manager is responsible for calculating and generating reinforcement learning state vectors at different levels in real time based on the current stress sensor layout and FPSO hull structural characteristics.

[0055] Specifically, when serving the global decision layer, the state manager computes the global state vector as defined in Step S2-1-1. First, the deployment density of each region is computed from the stress sensor layout at the current time step, the inherent monitoring value and inherent installation difficulty of each region are retrieved from the pre-stored feature database, and these quantities are concatenated to form the first part of the state vector. Next, the mean and variance of the feature values at the locations of all currently deployed stress sensors are computed to form the second part. Finally, the two parts are concatenated to obtain the global state. When serving the region optimization layer, the state manager generates a binary state vector (containing only 0s and 1s) from the deployment status of all candidate locations in the specified region.
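The state construction described above can be sketched as follows. This is a minimal sketch under stated assumptions: the deployment density is taken to be the fraction of a region's candidate locations that are occupied, each candidate location carries two feature values (structural stress importance and an engineering constraint value, as listed in [0077]), the variance is the population variance, and regions are ordered alphabetically. All names are hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List, Sequence


def global_state(
    layout: Dict[str, List[int]],                # region -> binary deployment vector
    value: Dict[str, float],                     # inherent monitoring value per region
    difficulty: Dict[str, float],                # inherent installation difficulty per region
    features: Dict[str, List[Sequence[float]]],  # region -> per-location feature tuples
) -> List[float]:
    regions = sorted(layout)  # fixed region ordering (assumption)
    # Part 1: density (assumed = occupied fraction), value, difficulty per region.
    part1: List[float] = []
    for r in regions:
        x = layout[r]
        part1 += [sum(x) / len(x), value[r], difficulty[r]]
    # Part 2: mean and variance of each feature over all deployed locations.
    deployed = [
        features[r][i] for r in regions for i, flag in enumerate(layout[r]) if flag == 1
    ]
    part2: List[float] = []
    for k in range(len(features[regions[0]][0])):
        col = [f[k] for f in deployed]
        part2 += [mean(col), pvariance(col)] if col else [0.0, 0.0]
    return part1 + part2  # concatenation of the two parts


def local_state(layout: Dict[str, List[int]], region: str) -> List[int]:
    """Binary (0/1) state of all candidate locations in one region."""
    return list(layout[region])


layout = {"A": [1, 0, 1, 0], "B": [0, 1]}
s = global_state(
    layout,
    value={"A": 0.8, "B": 0.5},
    difficulty={"A": 0.2, "B": 0.6},
    features={"A": [(1.0, 0.1), (2.0, 0.2), (3.0, 0.3), (4.0, 0.4)],
              "B": [(5.0, 0.5), (6.0, 0.6)]},
)
```

With two regions and two features per location, the global state has 3 x 2 + 2 x 2 = 10 entries, while each region agent sees only its own binary vector.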

[0056] Step S2-2-2: Build the environment action processor.

[0057] The environment action processor is responsible for receiving and executing the action instructions of the intelligent agent.

[0058] Specifically, for actions of the global decision layer, the action processor parses each action according to the definition in Step S2-1-2. If the action is ... (an increase action), the processor checks whether the current deployment quantity in the target region has reached its upper limit and whether the total budget still allows another sensor; if both conditions are met, the number of stress sensors in that region is increased by 1. If the action is ... (a decrease action), the processor identifies the regions that satisfy the relevant conditions and performs a decrement operation. If the action is ..., the processor examines the regions with deployed sensors, selects the region with the lowest ..., and reduces its quantity by 1. For actions of the region optimization layer, before executing an agent action, the action processor first computes the action mask M from the constraint feature vector C and feeds M back to the agent as environment state information; the constraint projection layer then uses this mask to mask the original Q-values output by the policy network. The action processor executes only actions that satisfy engineering constraints, so that execution is limited to feasible locations, and then updates the binary deployment state of the corresponding locations within the specified region.
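The increase and decrease checks of the global action processor can be sketched as below. The action encoding `("add" | "remove", region)`, the interpretation of the total budget as a maximum total sensor count, and the rule that a region must hold at least one sensor before a decrement are all assumptions made for illustration; the third action type in the text is not modeled because its selection metric is not specified.

```python
from typing import Dict, Tuple

Action = Tuple[str, str]  # hypothetical encoding: ("add" | "remove", region)


def apply_global_action(
    quota: Dict[str, int],   # current sensor count per region (updated in place)
    upper: Dict[str, int],   # per-region upper limit
    budget: int,             # assumed: maximum total number of sensors
    action: Action,
) -> bool:
    """Apply one quota adjustment; return True if it was executed."""
    kind, region = action
    if kind == "add":
        # Increase only if the region is below its cap and the total budget allows it.
        if quota[region] < upper[region] and sum(quota.values()) < budget:
            quota[region] += 1
            return True
        return False
    if kind == "remove":
        # Decrease only if the region still holds at least one sensor (assumed condition).
        if quota[region] > 0:
            quota[region] -= 1
            return True
        return False
    raise ValueError(f"unknown action type: {kind}")


q = {"A": 2, "B": 1}
limits = {"A": 2, "B": 3}
ok_a = apply_global_action(q, limits, 4, ("add", "A"))   # rejected: A is at its cap
ok_b = apply_global_action(q, limits, 4, ("add", "B"))   # accepted: total becomes 4
ok_b2 = apply_global_action(q, limits, 4, ("add", "B"))  # rejected: budget exhausted
```

Rejected actions leave the quota unchanged, so an infeasible global action simply has no effect on the layout.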

[0059] Step S2-2-3: Build the environment layout generator.

[0060] The environment layout generator receives quota instructions from the global agent. First, it adjusts the number of stress sensors in each region to the target value using deterministic rules. Then, it calls on the optimization agents in each region to finely adjust the point layout.

[0061] Specifically, its workflow consists of two phases. The first phase is quota alignment, triggered when an updated regional quota scheme is received. For each region, the generator obtains the region's current layout vector and calculates its current number of stress sensors. If the current number is below the target quota, the required number of additional unoccupied locations with the highest scores under the pre-stored comprehensive scoring rules are selected, and the corresponding entries of the layout vector are set to 1. The remaining cases (current number above or equal to the target) are not elaborated further. The second phase is layout fine-tuning, in which the region agent repeatedly calls the state manager and action processor to update the stress sensor layout. The layout generator schedules the two-phase process of all regions in parallel and finally aggregates all regional layout vectors into the overall ship layout.
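Quota alignment for a single region can be sketched as follows. The per-location `scores` stand in for the pre-stored comprehensive scoring rules, whose actual form is not given. The branch for a region holding more sensors than its quota (removing the lowest-scored deployed points) is an assumed mirror rule, since the text does not elaborate that case.

```python
from typing import List, Sequence


def align_quota(layout: List[int], scores: Sequence[float], target: int) -> List[int]:
    """Return a copy of a binary layout adjusted to hold exactly `target` sensors
    (or as many as there are candidate locations)."""
    x = list(layout)
    current = sum(x)
    if current < target:
        # Add sensors at the highest-scored free locations.
        free = sorted((i for i, v in enumerate(x) if v == 0), key=lambda i: scores[i], reverse=True)
        for i in free[: target - current]:
            x[i] = 1
    elif current > target:
        # Assumed mirror rule (not specified in the source): drop lowest-scored deployed points.
        used = sorted((i for i, v in enumerate(x) if v == 1), key=lambda i: scores[i])
        for i in used[: current - target]:
            x[i] = 0
    return x


scores = [0.5, 0.9, 0.1, 0.7, 0.3]
grown = align_quota([1, 0, 0, 0, 0], scores, target=3)
shrunk = align_quota(grown, scores, target=1)
```

The deterministic alignment only guarantees the quota; improving where the sensors sit is left to the region agent in the fine-tuning phase.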

[0062] Step S2-2-4: Build an environmental physics assessment engine.

[0063] The environmental physics assessment engine integrates an inverse finite element method (iFEM) solver, whose evaluation process is shown in Figure 3.

[0064] Specifically, the rapid iFEM solution process can be reduced mathematically to solving a linear system parameterized by the stress sensor layout:

$$\mathbf{K}(\mathbf{x})\,\mathbf{u} = \mathbf{f}(\mathbf{x}, \boldsymbol{\varepsilon}^{\mathrm{m}})$$

[0065] where $\mathbf{x}$ denotes the stress sensor layout, $\mathbf{K}(\mathbf{x})$ is the layout-dependent system matrix, $\mathbf{u}$ is the unknown vector being solved for (in iFEM, the nodal degrees of freedom), and $\mathbf{f}(\mathbf{x}, \boldsymbol{\varepsilon}^{\mathrm{m}})$ is the right-hand-side vector, which depends on the layout and the virtual measured strain $\boldsymbol{\varepsilon}^{\mathrm{m}}$. The environment generates the virtual measurement data from a pre-calculated theoretical strain field and a noise model. Because the solution involves only linear algebra, no iteration or repeated external finite element runs are needed, so the solver can be integrated efficiently into the reinforcement learning training environment.
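The layout-dependent linear solve can be sketched in a simplified form. This sketch does not reproduce the full iFEM element formulation; it assumes a generic regularized least-squares form in which a hypothetical strain-displacement matrix `B` maps nodal degrees of freedom to strains at candidate points, the sensor layout selects rows of `B`, and a small Tikhonov term `alpha` (also an assumption) keeps the system invertible when few sensors are placed. Under these assumptions K(x) = B_Sᵀ B_S + αI and f(x, ε^m) = B_Sᵀ ε^m.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: Sequence[float]) -> List[float]:
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-14:
            raise ValueError("singular system matrix")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ifem_solve(B: Matrix, sensors: Sequence[int], eps_meas: Sequence[float],
               alpha: float = 1e-8) -> List[float]:
    """Assemble K(x) = B_S^T B_S + alpha*I and f = B_S^T eps_meas, then solve K u = f."""
    n = len(B[0])
    K = [[alpha if i == j else 0.0 for j in range(n)] for i in range(n)]
    f = [0.0] * n
    for row_idx, e in zip(sensors, eps_meas):  # only rows selected by the layout contribute
        row = B[row_idx]
        for i in range(n):
            f[i] += row[i] * e
            for j in range(n):
                K[i][j] += row[i] * row[j]
    return solve_linear(K, f)


B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]  # toy 4-point, 2-DOF model
u_hat = ifem_solve(B, sensors=[2, 3], eps_meas=[1.0, 3.0])
strain_hat = [sum(bi * ui for bi, ui in zip(row, u_hat)) for row in B]
```

Only `K` and `f` change with the layout, so evaluating a new layout costs one small assembly and one dense solve rather than a new forward finite element run.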

[0066] Step S2-2-5: Construct an environment reward function calculator.

[0067] The environmental reward calculator calculates reward signals for each level based on the output of the physics assessment engine and the current layout information.

[0068] Specifically, for the global decision layer, the reward calculator obtains the evaluation indicators required by the formula defined in Step S2-1-4, such as the total number of stress sensors, and calculates the global reward. For the region optimization layer, the reward calculator obtains the local stress reconstruction error of the corresponding region and calculates the local reward according to the region-stage reward formula. All reward signals serve as immediate environment feedback on the corresponding agent's actions.

[0069] Furthermore, as part of the reinforcement learning simulation environment, the inverse finite element solver numerically reconstructs the stress response of the FPSO hull structure, without relying on external load information, from either the strain data measured by the stress sensors in the layout or virtual measurement data generated from the pre-calculated theoretical strain field and noise model, and it generates a reconstruction error index. This index is the environment's numerical evaluation result and is used to construct the reinforcement learning reward signal.

[0070] It should be understood that, before reinforcement learning training begins, the theoretical strain field of the FPSO under several typical operating conditions is pre-calculated by forward finite element simulation. During training, the agent outputs a stress sensor layout scheme. The simulation environment extracts the corresponding strain values from the theoretical strain field at the node coordinates of the sensor locations in the scheme and adds noise to simulate the physical errors of real sensors; the result is the virtual measurement data. From the current layout and the virtual measurement data, the environment obtains the reconstructed stress field by solving a system of linear algebraic equations. Finally, the root mean square error between the reconstructed field and the corresponding theoretical field from the forward simulation is calculated as the reconstruction error index; the smaller the error, the more accurate the reconstruction. This index is then used to calculate the reward.
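Generating virtual measurement data can be sketched as follows. The sketch assumes zero-mean additive Gaussian noise with a fixed standard deviation, and a dictionary layout of the pre-calculated theoretical strain fields keyed by operating condition and node id. The noise model, the condition name `"hogging"`, and the function names are hypothetical illustrations.

```python
import random
from typing import Dict, List, Sequence

StrainField = Dict[int, float]  # node id -> theoretical strain at that node


def virtual_measurement(
    theory: Dict[str, StrainField],  # operating condition -> pre-calculated strain field
    condition: str,
    sensor_nodes: Sequence[int],
    noise_std: float,
    rng: random.Random,
) -> List[float]:
    """Sample the theoretical strain at each sensor node and add Gaussian noise (assumed model)."""
    field = theory[condition]
    return [field[n] + rng.gauss(0.0, noise_std) for n in sensor_nodes]


def all_conditions(
    theory: Dict[str, StrainField], sensor_nodes: Sequence[int], noise_std: float, seed: int = 0
) -> Dict[str, List[float]]:
    """Virtual measurements for every pre-computed operating condition."""
    rng = random.Random(seed)
    return {c: virtual_measurement(theory, c, sensor_nodes, noise_std, rng) for c in sorted(theory)}


theory = {"hogging": {1: 1.0e-4, 2: -2.0e-4, 3: 5.0e-5}}
clean = virtual_measurement(theory, "hogging", [1, 3], 0.0, random.Random(0))
noisy = all_conditions(theory, [1, 2, 3], noise_std=1.0e-6, seed=42)
```

Seeding the random generator makes a given layout evaluation reproducible, which can help when comparing rewards across training runs.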

[0071] Step S3: Through interactive training between the reinforcement learning problem model and the environment, iteratively optimize the stress sensor layout scheme for the hull structure of the floating production storage and offloading (FPSO) unit until training converges and a stress sensor arrangement strategy that meets the multi-objective optimization requirements is output. The interactive training process includes: generating stress sensor layout adjustment decisions based on the reinforcement learning problem model using a deep reinforcement learning policy network; executing the decisions and calling the inverse finite element solver to evaluate the performance of the corresponding stress sensor layout; calculating a reward signal from the performance evaluation results; and updating the policy network parameters using a cascaded constraint-aware dueling double deep Q-network algorithm, which includes a global decision network, a region optimization network, a constraint projection layer, and a priority experience replay unit, until the policy network converges.

[0072] Specifically, an iterative "decision-optimization-evaluation-learning" loop is executed in the unified training environment constructed in Step S2-2. Based on the state space and the multi-objective reward function, a deep reinforcement learning policy network generates stress sensor layout adjustment decisions; these decisions are executed, and the inverse finite element solver is invoked to evaluate the performance of the adjusted layout. The cascaded constraint-aware dueling double deep Q-network (CC-D3QN) algorithm collaboratively trains the global decision layer agent and the region optimization layer agents through a cascaded decision mechanism. A reward signal is calculated from the performance evaluation results, and the policy network parameters are updated according to it until the policy network converges. After training, a stress sensor layout generation strategy for the FPSO hull structure that meets the multi-objective optimization requirements is output. The resulting stress sensor arrangement scheme is shown in Figure 5.

[0073] Furthermore, to solve the complex problem of optimizing the arrangement of stress sensors on the FPSO hull structure, this invention proposes, on the basis of the Dueling Double Deep Q-Network (D3QN) algorithm, a Cascaded Constraint-aware Dueling Double Deep Q-Network (CC-D3QN) algorithm as the agent's policy learning framework.

[0074] It should be understood that D3QN decomposes the Q-value function into a state value function and an action advantage function that are estimated separately, which keeps it sensitive to key high-value states in decision problems with large action spaces and significant differences between state values. This makes it particularly suitable for discrete optimization tasks such as stress sensor allocation and point relocation. In addition, D3QN adopts the Double Q-learning structure: when the target Q-value is computed, the online network selects the next action and an independent target network evaluates it. This decoupling mitigates the Q-value overestimation to which conventional DQN is prone in complex environments, improving the stability and convergence efficiency of training. These properties make D3QN a well-suited foundation for integrating FPSO hull structure features into a reinforcement learning decision framework.
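The Double Q-learning target mentioned above can be sketched for a single transition: the online network's Q-values at the next state pick the action, and the target network's Q-values score it, giving y = r + γ·Q_target(s', argmax_a Q_online(s', a)). Plain lists stand in for network outputs here; the function name and argument layout are illustrative.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    next_q_online: Sequence[float],  # Q_online(s', .) from the online network
    next_q_target: Sequence[float],  # Q_target(s', .) from the target network
    gamma: float,
    done: bool,
) -> float:
    """Double Q-learning target: select with the online net, evaluate with the target net."""
    if done:
        return reward  # no bootstrapping from a terminal state
    a_star = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[a_star]


y = double_dqn_target(1.0, [0.1, 0.5, 0.3], [10.0, 2.0, 3.0], gamma=0.9, done=False)
y_dqn = 1.0 + 0.9 * max([10.0, 2.0, 3.0])  # plain DQN target for comparison
```

In this example the plain DQN target picks up the inflated value 10.0, while the Double Q-learning target uses the target network's estimate for the action the online network actually prefers, which is the overestimation effect described above.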

[0075] However, optimizing the arrangement of stress sensors on FPSO hull structures still involves strong engineering constraints, a high proportion of invalid actions, and a high computational cost for inverse finite element physical evaluation. Training a conventional D3QN directly on this problem tends to produce a large amount of invalid exploration and repeated reuse of low-value samples, which reduces overall training efficiency.

[0076] To address these problems, this invention introduces an engineering-constraint-aware action space masking mechanism on top of D3QN and proposes the Cascaded Constraint-aware Dueling Double Deep Q-Network (CC-D3QN) algorithm, whose network structure is shown in Figure 4. While retaining D3QN's value and advantage decomposition, the algorithm incorporates engineering feasibility constraints in the action selection phase: candidate actions that violate sensor installation conditions or engineering specifications are removed from the action space, so the agent makes decisions only within the engineering-feasible domain, which reduces ineffective exploration and improves training efficiency. In this embodiment, the stress sensor layout optimization process is divided into multiple sequentially executed reinforcement learning decision stages, in which the output of each stage serves as a constraint or input parameter for the next, forming a cascaded optimization process. These decision stages include at least: a global decision stage for generating the stress sensor quantity allocation across the regions of the FPSO hull structure; and a regional layout optimization stage for optimizing the specific stress sensor installation locations under the constraints of that allocation.

[0077] The deep reinforcement learning policy network employs the Cascaded Constraint-aware Dueling Double Deep Q-Network (CC-D3QN) algorithm, which includes the following components. The global decision network generates global allocation actions from the global state vector in the state space; the global state vector includes, for each structural region, the stress sensor deployment density, the inherent monitoring value, the inherent installation difficulty, and the mean and variance of the structural stress importance and engineering constraint feature values. The region optimization network generates local adjustment actions from the local state vectors in the state space, subject to the global allocation actions; each local state vector is a binary vector characterizing the deployment state of every candidate stress sensor location in the corresponding structural region. The constraint projection layer, located at the action-value output of the region optimization network, applies feasibility masking to the action selection space based on engineering constraint features, prohibiting placement actions that violate engineering constraints. Its execution process is as follows. First, from the engineering constraint feature vector $\mathbf{C}$ of the candidate positions and a preset feasibility judgment threshold $\tau$, an action mask vector $\mathbf{M}$ with the same dimension as the action space is constructed. For the $i$-th action in the action space, the mask element is generated by the rule:

$$M_i = \begin{cases} 1, & \text{if action } i \text{ satisfies the feasibility criterion defined by } c_i \text{ and } \tau \\ 0, & \text{otherwise} \end{cases}$$

[0078] where $M_i$ is the mask element for the $i$-th action and $c_i$ is the corresponding entry of $\mathbf{C}$.

[0079] The constraint projection layer then receives the original state-action value vector $\mathbf{Q}(s,\cdot)$ output by the region optimization network and projects it with the action mask vector $\mathbf{M}$ to obtain the effective action value vector:

$$\tilde{\mathbf{Q}}(s,\cdot) = \mathbf{Q}(s,\cdot) \odot \mathbf{M}$$

[0080] where $\odot$ denotes element-wise multiplication. In practical engineering calculations, because a masked value of zero could still exceed negative Q-values, the Q-value of each action that violates engineering constraints is instead set to a very large negative penalty value. The state-action value of every masked (infeasible) action is thus driven to a minimum, so the reinforcement learning action selection never chooses actions that violate engineering constraints, and the network outputs effective policies only within the engineering-feasible domain.
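The mask construction and projection can be sketched as follows. The feasibility criterion is an assumption: here a candidate is treated as feasible when its constraint score c_i does not exceed the threshold τ, but the direction of the comparison depends on how the constraint features are defined. The penalty value `-1e9` is likewise a hypothetical choice. The example shows why plain element-wise multiplication is not enough when Q-values are negative.

```python
from typing import List, Sequence


def action_mask(constraint: Sequence[float], tau: float) -> List[int]:
    """M_i = 1 if action i is feasible, else 0 (assumed rule: c_i <= tau is feasible)."""
    return [1 if c <= tau else 0 for c in constraint]


def project_q(q: Sequence[float], mask: Sequence[int], penalty: float = -1e9) -> List[float]:
    """Keep feasible Q-values; replace infeasible ones with a large negative penalty."""
    return [qi if m == 1 else penalty for qi, m in zip(q, mask)]


def greedy_feasible(q: Sequence[float], mask: Sequence[int]) -> int:
    """Greedy action over the projected Q-values."""
    projected = project_q(q, mask)
    return max(range(len(projected)), key=lambda i: projected[i])


mask = action_mask([0.2, 0.9, 0.5], tau=0.6)  # [1, 0, 1]
q = [-3.0, 5.0, -1.0]
plain = [qi * m for qi, m in zip(q, mask)]    # Q * M -> [-3.0, 0.0, -1.0]
best = greedy_feasible(q, mask)               # index 2
```

With plain multiplication the infeasible action keeps a value of 0.0, which beats the negative feasible values and would be selected; the penalty form removes it from contention.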

[0081] The priority experience replay unit samples historical training samples by priority, using the reconstruction error index output by the inverse finite element solver, and corrects the resulting sampling bias with importance sampling weights. The global decision network and the region optimization network each have an independent experience replay pool and work together through policy cascading.

[0082] In this embodiment, both the global decision network and the regional optimization network adopt the standard Dueling Double Deep Q-Network (D3QN) architecture, which includes a shared feature extraction layer and a two-branch structure for estimating the state value function V(s) and the action advantage function A(s,a), respectively. The Q-value estimates of each action are obtained by combining them.
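The two-branch structure can be sketched as a toy forward pass. The sketch assumes the common mean-subtracted aggregation Q(s,a) = V(s) + A(s,a) - mean over a' of A(s,a'), since the text only says the two branches are combined. The layer sizes, random weight initialization, and class name `DuelingHead` are hypothetical, and no training is shown.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def linear(x: Sequence[float], w: Matrix, b: Sequence[float]) -> List[float]:
    """Fully connected layer: w @ x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def dueling_combine(value: float, adv: Sequence[float]) -> List[float]:
    """Q(s,a) = V(s) + A(s,a) - mean_a A(s,a) (assumed mean-subtracted aggregation)."""
    mean_adv = sum(adv) / len(adv)
    return [value + a - mean_adv for a in adv]


class DuelingHead:
    """Toy shared layer plus value and advantage branches (illustrative only)."""

    def __init__(self, n_in: int, n_hidden: int, n_actions: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_h, self.b_h = init(n_hidden, n_in), [0.0] * n_hidden
        self.w_v, self.b_v = init(1, n_hidden), [0.0]
        self.w_a, self.b_a = init(n_actions, n_hidden), [0.0] * n_actions

    def q_values(self, state: Sequence[float]) -> List[float]:
        h = [max(0.0, z) for z in linear(state, self.w_h, self.b_h)]  # shared ReLU features
        v = linear(h, self.w_v, self.b_v)[0]                          # state value V(s)
        adv = linear(h, self.w_a, self.b_a)                           # advantages A(s, a)
        return dueling_combine(v, adv)


head = DuelingHead(n_in=4, n_hidden=8, n_actions=3)
q = head.q_values([0.1, 0.2, 0.3, 0.4])
```

Subtracting the mean advantage makes the split between V and A identifiable: the average of the Q-values over actions equals V(s).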

[0083] The CC-D3QN algorithm proposed in this invention addresses the high-dimensional discrete nature and engineering constraints of FPSO sensor deployment. It hard-codes the masking of engineering constraints through the constraint projection layer, decouples global quota decisions from local point optimization through a policy cascade architecture, and accelerates learning convergence in key mechanical regions through physics-guided priority experience replay. Compared with the standard D3QN, it substantially reduces invalid exploration and improves training stability, making it suitable for optimization problems with complex engineering constraints such as FPSO sensor placement.

[0084] Furthermore, because inverse finite element solving incurs high computational overhead during training, this invention uses the stress reconstruction error index output by the iFEM solver to evaluate the importance of training samples and to calculate the priority of each experience sample. In subsequent experience replay sampling, historical experiences with a significant impact on key stress areas are preferentially used for network updates. This lets a limited number of physical evaluation results participate more effectively in policy learning, further improving the learning efficiency and decision-making performance of CC-D3QN on the FPSO stress sensor placement optimization problem.

[0085] The deployment of the CC-D3QN algorithm in FPSO scenarios comprises two layers of decision-making agents, both of which are CC-D3QN instances. The upper-layer global decision model addresses the issue of stress sensor quota allocation across different regions of the FPSO hull structure. Its state space consists of statistical feature vectors representing the macroscopic structural characteristics and engineering constraints of each region, while its action space involves adjusting the stress sensor quota for each region. The goal is to learn a balance strategy between accuracy, cost, and feasibility based on the importance of the regional structure and resource constraints. The lower layer consists of a set of parallel regional-level point optimization models. Each model corresponds to a structural region, and its state space describes the specific arrangement of stress sensors within that region. Its action space involves moving stress sensor points within the region. The goal is to improve the monitoring accuracy of the region under the given regional quota constraints from the upper layer by integrating an inverse finite element evaluation environment. The two layers of agents are cascaded through the regional stress sensor quota constraints and the layout generation process. The global decision layer does not directly rely on specific physical reconstruction results, while the lower-layer regional optimization layer performs refined optimization using the inverse finite element evaluation environment, thereby achieving synergistic improvement of the overall layout scheme.

[0086] The specific training process is as follows. At the start of training, the system initializes: a pre-computed feature library is loaded into the environment, and an initial stress sensor layout is generated randomly, region by region; at the same time, the parameters of the global and regional CC-D3QN networks and their corresponding target networks are initialized, and the respective experience replay pools are established. The system then enters the iterative training loop.

[0087] In each training step, the environment state manager first generates a state vector from the current ship layout and inputs it into the global decision network. The network selects an action with an ε-greedy strategy to increase or decrease the number of sensors in a specific region. The environment action processor parses and executes this action, updating the stress sensor quota of the target region.
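The ε-greedy selection step can be sketched as below. The optional `mask` argument is an assumption that reuses the feasibility mask of the constraint projection layer so that random exploration, like greedy selection, stays within allowed actions; for the global network it can simply be omitted.

```python
import random
from typing import Optional, Sequence


def epsilon_greedy(
    q: Sequence[float],
    epsilon: float,
    rng: random.Random,
    mask: Optional[Sequence[int]] = None,  # assumed: 1 = allowed, 0 = forbidden
) -> int:
    """With probability epsilon pick a random allowed action, else the best allowed one."""
    allowed = [i for i in range(len(q)) if mask is None or mask[i] == 1]
    if not allowed:
        raise ValueError("no allowed actions")
    if rng.random() < epsilon:
        return rng.choice(allowed)           # exploration
    return max(allowed, key=lambda i: q[i])  # exploitation


rng = random.Random(0)
greedy = epsilon_greedy([0.2, 0.9, 0.4], epsilon=0.0, rng=rng)
explored = [epsilon_greedy([0.2, 0.9, 0.4], 1.0, rng, mask=[1, 0, 1]) for _ in range(50)]
```

Passing an explicit `random.Random` instance keeps the exploration sequence reproducible for a given seed.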

[0088] Next, the environment layout generator launches the region optimization networks in parallel. The global agent (global decision network) and the region optimization agents (region optimization networks) use a cascaded training mechanism: the global agent selects actions on the regional quotas independently, and its training reward does not depend on the lower-level local stress reconstruction results; each region optimization agent trains independently after receiving the global quota constraint. For each region, the generator first adjusts the number of stress sensors quickly with deterministic rules according to the new target quota, so that the quota requirement is met, yielding an intermediate layout. It then calls the corresponding region optimization CC-D3QN network, which takes the region's local state vector as input, computes the action mask from the engineering constraint feature vector in the constraint projection layer, masks infeasible points, and outputs Q-values for the actions. The agent selects and executes actions accordingly, fine-tunes the layout over multiple steps, and outputs the optimized layout for that region. Once all regions are optimized, a new overall ship layout is generated.

[0089] Then, the environmental physics assessment engine calls the iFEM solver to perform reconstruction for the final overall ship layout output by the layout generator and calculates the local stress reconstruction error of each region.

[0090] Finally, the system enters the learning phase. The global experience and the local optimization experience of each region generated in this iteration are stored in the global replay pool and the corresponding regional replay pools, respectively. The priority experience replay unit calculates sample priorities from the reconstruction error index output by iFEM and samples a mini-batch of experience data from each replay pool accordingly. The implementation is as follows. First, the sample priorities are calculated: for the $i$-th experience sample in the pool, its priority $p_i$ is defined as the sum of the local stress reconstruction error corresponding to that sample and a very small constant:

$$p_i = E_i + \eta$$

[0091] where $E_i$ is the local stress reconstruction error corresponding to sample $i$, and $\eta$ is a very small constant that prevents a zero denominator in the sampling probability.

[0092] Then, mini-batches are extracted with a proportional sampling strategy. The probability that sample $i$ in the experience pool is sampled is:

$$P(i) = \frac{p_i^{\alpha}}{\sum_{k} p_k^{\alpha}}$$

[0093] where $p_i$ is the priority of sample $i$ and $\alpha$ is the hyperparameter that adjusts how strongly sampling depends on priority.

[0094] Finally, to eliminate the network update bias introduced by non-uniform probability sampling, the system computes an importance sampling weight for each sample in the extracted mini-batch:

$$w_i = \left(N \cdot P(i)\right)^{-\beta}$$

[0095] where $N$ is the total number of samples in the replay pool, $P(i)$ is the sampling probability of sample $i$, and $\beta$ is the annealing hyperparameter controlling the degree of bias correction.
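The priority, sampling-probability, and importance-weight steps can be sketched together. The sketch assumes the proportional form p_i = E_i + η, P(i) = p_i^α / Σ_k p_k^α and w_i = (N·P(i))^(-β), with E_i taken as each sample's local stress reconstruction error. Many prioritized replay implementations also divide the weights by their maximum for stability; the text does not mention this, so it is left out. Function names and default values are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def priorities(errors: Sequence[float], eta: float = 1e-6) -> List[float]:
    """p_i = E_i + eta: reconstruction error plus a small constant."""
    return [e + eta for e in errors]


def sampling_probs(p: Sequence[float], alpha: float) -> List[float]:
    """P(i) = p_i^alpha / sum_k p_k^alpha (proportional prioritization)."""
    scaled = [pi ** alpha for pi in p]
    total = sum(scaled)
    return [s / total for s in scaled]


def is_weights(probs: Sequence[float], indices: Sequence[int], beta: float) -> List[float]:
    """w_i = (N * P(i))^(-beta) for each sampled index."""
    n = len(probs)
    return [(n * probs[i]) ** (-beta) for i in indices]


def sample_batch(
    errors: Sequence[float], batch_size: int, alpha: float, beta: float,
    rng: random.Random, eta: float = 1e-6,
) -> Tuple[List[int], List[float]]:
    """Draw a mini-batch of indices with replacement and return their IS weights."""
    probs = sampling_probs(priorities(errors, eta), alpha)
    idx = rng.choices(range(len(probs)), weights=probs, k=batch_size)
    return idx, is_weights(probs, idx, beta)


probs = sampling_probs(priorities([1.0, 3.0], eta=0.0), alpha=1.0)  # [0.25, 0.75]
weights = is_weights(probs, [0, 1], beta=1.0)                       # [2.0, 0.666...]
idx, w = sample_batch([0.5, 0.1, 2.0, 0.05], batch_size=8, alpha=0.6, beta=0.4,
                      rng=random.Random(0))
```

Samples with larger reconstruction error are drawn more often but receive smaller weights, which offsets the bias of the non-uniform sampling when β approaches 1.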

[0096] In the subsequent update of the CC-D3QN network parameters, the system applies these importance sampling weights to the per-sample terms of the mean squared error loss between the predicted Q-values and the target Q-values computed by Double Q-learning, and then backpropagates the weighted loss.

[0097] Network parameters are updated with the CC-D3QN algorithm: the target Q-value is computed by Double Q-learning, backpropagation uses the mean squared error loss, and the target network parameters are synchronized by soft update. The exploration rate ε decays gradually as training progresses, so the agent shifts from extensive exploration to exploiting the learned policy.
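The update ingredients named above can be sketched on flat parameter lists: an importance-weighted mean squared error, a soft target update θ_target ← τ·θ_online + (1 - τ)·θ_target, and an exploration-rate schedule. The linear decay of ε and all default values (`eps_start`, `eps_end`, `decay_steps`) are assumptions, since the text only says ε diminishes gradually; gradient computation itself is omitted.

```python
from typing import List, Sequence


def weighted_mse(pred: Sequence[float], target: Sequence[float], weights: Sequence[float]) -> float:
    """Mean of w_i * (y_i - Q_i)^2 over the mini-batch."""
    return sum(w * (y - q) ** 2 for q, y, w in zip(pred, target, weights)) / len(pred)


def soft_update(target_params: List[float], online_params: Sequence[float], tau: float) -> None:
    """In place: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    for i, p in enumerate(online_params):
        target_params[i] = tau * p + (1.0 - tau) * target_params[i]


def decayed_epsilon(step: int, eps_start: float = 1.0, eps_end: float = 0.05,
                    decay_steps: int = 10_000) -> float:
    """Assumed linear decay from eps_start to eps_end, then held constant."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


loss = weighted_mse([1.0, 2.0], [2.0, 2.0], [0.5, 1.0])  # (0.5 * 1 + 0) / 2
theta_t = [0.0, 10.0]
soft_update(theta_t, [1.0, 0.0], tau=0.1)                # -> about [0.1, 9.0]
```

A small τ makes the target network trail the online network slowly, which keeps the Double Q-learning targets stable between updates.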

[0098] The above iterative process is repeated until the policy converges, a preset performance threshold is reached, or a preset number of training rounds is completed, at which point training ends and the parameters of the converged global network and the region optimization networks are saved. In practical deployment for a new target FPSO, only the trained CC-D3QN models and the hull structure stress characteristics of that FPSO need to be loaded; the system then runs an exploration-free automatic decision-making and optimization process and outputs, in a short time, a stress sensor placement scheme that meets the multiple objectives.

[0099] This invention introduces a deep reinforcement learning framework into the field of FPSO hull structure health monitoring. By constructing a reinforcement learning environment that integrates stress sensor layout, hull structure mechanical characteristics and engineering constraint information, the intelligent agent can autonomously learn stress sensor placement strategies that meet the needs of FPSO engineering applications under complex engineering constraints and multi-objective optimization conditions.

[0100] This embodiment provides a method for optimizing the arrangement of stress sensors on FPSO hull structures, aiming to solve the problem of optimizing stress sensor networks under the complex operating conditions and strict engineering constraints of FPSOs. Its core is an optimization framework that integrates inverse finite element physical evaluation with the cascaded constraint-aware dueling double deep Q-network (CC-D3QN) algorithm: a policy cascade is formed by global and regional two-level instances of the CC-D3QN algorithm; engineering constraints are hard-coded as masks through a constraint projection layer; and physics-guided efficient learning is achieved through the priority experience replay unit. This substantially improves training efficiency and engineering feasibility while meeting the multi-objective optimization requirements.

[0101] This invention provides a method and system, based on deep reinforcement learning, for optimizing the placement of stress sensors for stress health monitoring of FPSO hull structures. Addressing the design problem of stress sensor networks under the complex operating conditions and engineering constraints of FPSOs, it proposes an optimization framework integrating the inverse finite element method and deep reinforcement learning. The framework comprises two main parts: the first is a reinforcement learning simulation environment that includes an inverse finite element solver and an FPSO prior feature library; the second is a deep reinforcement learning agent designed for multi-objective optimization, with a state space integrating multi-source information, an action space defining layout and quantity adjustments, and a reward function balancing accuracy, cost, and feasibility. In this invention, the stress sensor placement problem of FPSO hull structures is first formulated as a sequential decision-making task. The agent is trained iteratively against the simulation environment: at each time step, the agent encodes a multi-dimensional state vector from the current layout, the queried mechanical and constraint features, and the optimization weights, and based on this state vector it selects and executes a layout adjustment action from the action space. The environment calls the inverse finite element solver to evaluate the stress response of the FPSO hull structure, computes the multi-objective reward from the stress reconstruction error, cost, and constraint satisfaction, and feeds it back to the agent. Using the CC-D3QN algorithm, the agent continuously optimizes its policy network from the resulting state-action-reward sequences.
After training convergence, the obtained policy can quickly generate corresponding high-precision, low-cost, and engineering-feasible optimal arrangement schemes for stress sensors on FPSO hull structures, based on different optimization objective weights.

[0102] Example 2. In one or more embodiments, a system for optimizing the arrangement of stress sensors for ship hull structures is disclosed. As shown in Figure 6, it specifically includes: a hull structure feature pre-calculation module, used to obtain, based on finite element analysis, the structural mechanical characteristics and engineering constraint characteristics of candidate stress sensor locations in the hull structure of a floating production storage and offloading unit; a hull stress sensor layout simulation environment module, used to construct a reinforcement learning problem model and training environment based on these features, where the training environment receives sensor layout adjustment instructions to update the layout and includes an inverse finite element solver for evaluating the performance of stress sensor layout schemes; and a multi-objective stress sensor layout optimization agent module, used to iteratively optimize the stress sensor layout scheme of the hull structure of the floating production storage and offloading unit through interactive training between the reinforcement learning problem model and the environment, until training converges and a stress sensor arrangement strategy that meets the multi-objective optimization requirements is output.
The interactive training process includes: generating stress sensor layout adjustment decisions based on the reinforcement learning problem model using a deep reinforcement learning policy network; executing the decisions and calling the inverse finite element solver to evaluate the performance of the stress sensor layout; calculating a reward signal from the performance evaluation results; and updating the policy network parameters using a cascaded constraint-aware dueling double deep Q-network algorithm, which includes a global decision network, a region optimization network, a constraint projection layer, and a priority experience replay unit, until the policy network converges.

[0103] Example 3 This embodiment provides an electronic device, including a memory and a processor, as well as computer instructions stored in the memory and running on the processor. When the computer instructions are executed by the processor, they complete the steps of the above-described method for optimizing the arrangement of stress sensors for ship structures.

[0104] Example 4 This embodiment provides a computer-readable storage medium for storing computer instructions, which, when executed by a processor, complete the steps of the above-described method for optimizing the arrangement of stress sensors for ship structures.

[0105] This invention is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each process and/or block of the flowcharts and/or block diagrams, and combinations of processes and/or blocks therein, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing apparatus produce a device for implementing the functions specified in one or more processes and/or one or more blocks of Figure 1.

[0106] These computer program instructions may also be stored in a computer-readable storage medium capable of directing a computer or other programmable data processing device to operate in a particular manner. The instructions stored in the computer-readable storage medium then produce an article of manufacture that includes instruction means implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram of Figure 1.

[0107] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps is performed on it to produce a computer-implemented process. The instructions executed on the computer or other programmable equipment thereby provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram of Figure 1.

[0108] Each of the above embodiments is described with a different emphasis. For parts not described in detail in one embodiment, refer to the relevant descriptions in the other embodiments.

[0109] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.

Claims

1. A method for optimizing the arrangement of stress sensors on a ship hull structure, characterized in that it comprises the following steps.

Obtaining, based on finite element analysis, the structural mechanical characteristics and engineering constraint characteristics of candidate stress sensor locations in the hull structure of a floating production storage and offloading (FPSO) unit.

Constructing a reinforcement learning problem model and a training environment based on these characteristics, wherein the training environment receives sensor layout adjustment instructions to update the layout and includes an inverse finite element solver for evaluating the performance of stress sensor layout schemes.

Iteratively optimizing the stress sensor layout scheme for the hull structure of the FPSO unit through interactive training between the reinforcement learning problem model and the environment, until the training converges and a stress sensor arrangement strategy meeting the multi-objective optimization requirements is output.

The interactive training process includes: generating stress sensor layout adjustment decisions from the reinforcement learning problem model using a deep reinforcement learning policy network; executing the decisions and calling the inverse finite element solver to evaluate the performance of the stress sensor layout; calculating a reward signal from the performance evaluation results; and updating the policy network parameters using a cascaded constraint-aware double dueling deep Q-network algorithm, until the policy network converges. This algorithm comprises a global decision network, a region optimization network, a constraint projection layer, and a prioritized experience replay unit.
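The source names the update rule a cascaded constraint-aware double dueling deep Q-network but does not give its equations. The sketch below assumes that the "double dueling" part follows the standard formulation. Under that assumption, a dueling head splits Q into a state value and mean-centred advantages. A double Q-learning target then lets the online network select the next action while the target network evaluates it. The two core computations are shown in plain Python. The function names are hypothetical, and a real implementation would operate on network outputs rather than lists.

```python
def dueling_q(value: float, advantages: list[float]) -> list[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]


def double_dqn_target(reward: float, next_q_online: list[float],
                      next_q_target: list[float], gamma: float = 0.99,
                      done: bool = False) -> float:
    """Double-DQN target: the online net picks the next action, the target net scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


print(dueling_q(1.0, [1.0, 2.0, 3.0]))                                 # [0.0, 1.0, 2.0]
print(double_dqn_target(1.0, [0.2, 0.9, 0.5], [10.0, 3.0, 7.0], 0.5))   # 2.5
```

In the second call, a plain DQN target would use the target network's own maximum (10.0) and give 6.0. Decoupling action selection from action evaluation is what curbs this overestimation.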

2. The method for optimizing the arrangement of stress sensors for ship hull structures as described in claim 1, characterized in that the reinforcement learning problem model is constructed as follows:

- the stress sensor layout optimization problem is modeled as a two-stage cascaded reinforcement learning decision process, comprising a global decision-making stage and a regional layout optimization stage;
- a state space is constructed that includes the current stress sensor layout information, the structural mechanical features, and the engineering constraint features;
- an action space for adjusting the stress sensor layout is constructed; and
- a multi-objective reward function based on stress reconstruction performance and cost is constructed.

3. The method for optimizing the arrangement of stress sensors for ship hull structures as described in claim 2, characterized in that constructing the multi-objective reward function based on stress reconstruction performance and cost includes the following two steps.

Constructing the reward function for the global decision-making stage based on the importance of the regional structures, the rationality of the allocation of stress sensor quantities, and the satisfaction of engineering constraints. Its terms denote:

- the reward of the global decision-making stage;
- the total number of stress sensors;
- the sensor quota allocated to regions where engineering constraints are not met;
- weighting coefficients;
- the upper limit of the stress sensor budget, which is a constant; and
- a performance evaluation metric characterizing the rationality of the number of sensors configured at the regional level.

Constructing the reward function for the regional layout optimization stage based on the hull stress response reconstruction error calculated by the inverse finite element solver. Its terms denote:

- the reward of the regional optimization stage;
- a proportionality coefficient; and
- the local stress reconstruction error.
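Claim 3 lists the terms of the two reward functions, but the formulas themselves are not reproduced in this text. Purely as an illustration, one plausible way to combine those terms is sketched below. The linear weighting is an assumption, as are the normalization of the violating quota by the total sensor count, the relative over-budget penalty, and all parameter names. None of these are the claimed formulas.

```python
def global_reward(rationality: float, n_total: int, n_violating: int,
                  n_budget: int, weights=(1.0, 1.0, 1.0)) -> float:
    """Hypothetical global-stage reward (functional form assumed, not from the source).

    rationality: regional-level metric for how reasonable the sensor counts are
                 (region importance is assumed to be folded into this metric)
    n_total:     total number of stress sensors allocated
    n_violating: quota allocated to regions that fail engineering constraints
    n_budget:    constant upper limit of the sensor budget
    """
    w1, w2, w3 = weights
    violation_share = n_violating / max(n_total, 1)
    over_budget = max(0, n_total - n_budget) / n_budget
    return w1 * rationality - w2 * violation_share - w3 * over_budget


def regional_reward(local_error: float, k: float = 1.0) -> float:
    """Hypothetical regional-stage reward: a scaled penalty on the local
    stress reconstruction error returned by the inverse FE solver."""
    return -k * local_error


print(global_reward(0.8, n_total=20, n_violating=2, n_budget=25))   # ≈ 0.7
print(global_reward(0.8, n_total=30, n_violating=0, n_budget=25))   # ≈ 0.6
print(regional_reward(0.05, k=10.0))                                 # ≈ -0.5
```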

4. The method for optimizing the arrangement of stress sensors for ship hull structures as described in claim 1, characterized in that the inverse finite element solver, as part of the reinforcement learning simulation environment, numerically reconstructs the stress response of the FPSO hull structure without relying on external load information, and outputs a reconstruction error index. The reconstruction uses one of two data sources: strain data measured by the stress sensors under the current layout, or virtual measurement data generated from pre-calculated theoretical strains and a noise model.
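Inverse finite element formulations reconstruct the displacement field, and hence the stress field, by minimizing a least-squares mismatch between model strains and measured strains. This is why no load input is needed. The toy below mimics that idea on a hypothetical 1-D field described by three assumed deformation modes. It fits mode amplitudes to the strains at the selected sensor positions, rebuilds the full field, and reports a relative L2 reconstruction error. The "virtual measurement" path of claim 4 is imitated by adding Gaussian noise to theoretical strains. The basis, the noise level, and the error metric are all illustrative choices, not the solver used in the patent.

```python
import math
import random


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def reconstruct(basis, sensors, measured):
    """Fit hypothetical mode amplitudes to strains at the sensor rows (normal
    equations), then expand them to the full strain field; no loads are used."""
    rows = [basis[i] for i in sensors]
    k = len(basis[0])
    ata = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    atb = [sum(r[p] * e for r, e in zip(rows, measured)) for p in range(k)]
    amps = solve(ata, atb)
    return [sum(b * a for b, a in zip(row, amps)) for row in basis]


def reconstruction_error(reference, reconstructed):
    """Relative L2 error between the reference and reconstructed strain fields."""
    num = math.sqrt(sum((t - r) ** 2 for t, r in zip(reference, reconstructed)))
    return num / math.sqrt(sum(t ** 2 for t in reference))


# Hypothetical 1-D example: 10 candidate points, 3 assumed deformation modes.
xs = [i / 9 for i in range(10)]
basis = [[1.0, x, x * x] for x in xs]
theory = [sum(b * a for b, a in zip(row, [1.0, -2.0, 0.5])) for row in basis]
sensors = [0, 4, 9]
rng = random.Random(1)
virtual = [theory[i] + rng.gauss(0.0, 0.01) for i in sensors]   # theory + noise model
error = reconstruction_error(theory, reconstruct(basis, sensors, virtual))
```

Repeating this comparison for different `sensors` lists shows how such an error index can rank candidate layouts inside the environment.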

5. The method for optimizing the arrangement of stress sensors for ship hull structures as described in claim 1, characterized in that the cascaded constraint-aware double dueling deep Q-network algorithm comprises:

- a global decision network, which generates global allocation actions from the global state vector in the state space;
- a region optimization network, which generates local adjustment actions from local state vectors in the state space, subject to the global allocation;
- a constraint projection layer, which applies feasibility masking to the action selection space of the region optimization network based on the engineering constraint characteristics, so that placement actions violating engineering constraints cannot be selected; and
- a prioritized experience replay unit, which performs importance sampling of historical training samples based on the reconstruction error index output by the inverse finite element solver.

The global decision network and the region optimization network each have an independent experience replay pool and cooperate through policy cascading.
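The prioritized experience replay unit and the independent replay pools of claim 5 can be sketched with a proportional prioritized buffer. This sketch rests on several assumptions. Priority is taken as the reconstruction error index plus a small constant, raised to a power `alpha`. Sampling probabilities and importance-sampling weights follow the usual proportional-PER formulas. The class name, the parameters, and the first-in-first-out eviction are hypothetical.

```python
import random


class PrioritizedReplay:
    """Minimal proportional prioritized replay pool (hypothetical sketch).

    Priorities come from the inverse-FE reconstruction error of each sample
    (an assumption based on claim 5), not from the TD error used in standard PER.
    """

    def __init__(self, capacity: int, alpha: float = 0.6, beta: float = 0.4,
                 eps: float = 1e-6, seed: int = 0):
        self.capacity, self.alpha, self.beta, self.eps = capacity, alpha, beta, eps
        self.items: list = []
        self.priorities: list[float] = []
        self.rng = random.Random(seed)

    def add(self, transition, recon_error: float) -> None:
        if len(self.items) >= self.capacity:   # evict the oldest sample
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        self.priorities.append((abs(recon_error) + self.eps) ** self.alpha)

    def probabilities(self) -> list[float]:
        total = sum(self.priorities)
        return [p / total for p in self.priorities]

    def sample(self, k: int):
        """Draw k samples with P(i) proportional to p_i; return normalized IS weights."""
        probs = self.probabilities()
        n = len(self.items)
        idx = self.rng.choices(range(n), weights=probs, k=k)
        weights = [(n * probs[i]) ** (-self.beta) for i in idx]
        max_w = max(weights)
        return [self.items[i] for i in idx], idx, [w / max_w for w in weights]


# Claim 5: the global and regional networks each keep an independent pool.
global_pool = PrioritizedReplay(capacity=10_000, seed=1)
regional_pool = PrioritizedReplay(capacity=10_000, seed=2)
```

Samples with larger reconstruction error are replayed more often. The importance-sampling weights scale down their contribution to the loss, which compensates for the non-uniform sampling.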

6. The method for optimizing the arrangement of stress sensors for ship hull structures as described in claim 5, characterized in that the global state vector includes, for each structural region:

- the stress sensor deployment density;
- the inherent monitoring value;
- the inherent installation difficulty;
- the mean and variance of the structural stress importance; and
- the mean and variance of the engineering constraint feature values.

The local state vector includes a binary vector indicating the deployment status of each candidate stress sensor location within the corresponding structural region.
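The following is a minimal sketch of how the state vectors in claim 6 might be assembled. It assumes that each region is described by its candidate indices and two scalar attributes. It also assumes that a per-candidate stress importance and a single scalar constraint value are already available from the finite element pre-calculation. The `Region` class, the feature order, and the use of population variance are illustrative choices.

```python
from dataclasses import dataclass
from statistics import fmean, pvariance


@dataclass
class Region:
    candidates: list[int]        # indices of this region's candidate positions
    monitoring_value: float      # assumed scalar "inherent monitoring value"
    install_difficulty: float    # assumed scalar "inherent installation difficulty"


def global_state(regions, layout, importance, constraint):
    """Concatenate 7 features per region: density, monitoring value, installation
    difficulty, mean/variance of stress importance, mean/variance of constraint value."""
    vec = []
    for r in regions:
        imp = [importance[c] for c in r.candidates]
        con = [constraint[c] for c in r.candidates]
        density = sum(layout[c] for c in r.candidates) / len(r.candidates)
        vec += [density, r.monitoring_value, r.install_difficulty,
                fmean(imp), pvariance(imp), fmean(con), pvariance(con)]
    return vec


def local_state(region, layout):
    """Binary deployment vector over one region's candidate positions."""
    return [1 if layout[c] else 0 for c in region.candidates]


regions = [Region([0, 1], 0.9, 0.2), Region([2, 3], 0.5, 0.7)]
layout = [1, 0, 0, 1]
print(local_state(regions[1], layout))   # [0, 1]
```

Under this layout, the global network sees a fixed-length vector (7 features times the number of regions). Each regional network sees only the binary vector of its own region.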

7. The method for optimizing the arrangement of stress sensors for ship hull structures as described in claim 5, characterized in that the constraint projection layer specifically:

- constructs, from the engineering constraint feature vectors of the candidate positions and preset thresholds, an action mask vector whose dimension matches that of the action space;
- determines whether the engineering constraint features of each candidate position meet the preset threshold, setting the corresponding mask element to the valid state for candidate positions that meet the threshold and to the invalid state for those that do not; and
- receives the original state-action value vector output by the region optimization network and projects it through the action mask vector to produce an effective action value vector.
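The masking and projection steps of claim 7 can be sketched as follows. The sketch assumes one action per candidate position. It also assumes a candidate is feasible when every component of its constraint feature vector is at or below the corresponding preset threshold; the claim does not state the direction of the comparison. Invalid actions are projected to negative infinity, so a greedy choice over the effective action value vector can never select them. The function names are hypothetical.

```python
import math


def action_mask(constraint_features, thresholds):
    """One mask element per candidate position (assumed action space).
    Valid iff every constraint feature is <= its preset threshold (assumed direction)."""
    return [all(f <= t for f, t in zip(feats, thresholds)) for feats in constraint_features]


def project(q_values, mask):
    """Feasibility projection: invalid actions get -inf so argmax never picks them."""
    if len(q_values) != len(mask):
        raise ValueError("mask must match the action-space dimension")
    return [q if ok else -math.inf for q, ok in zip(q_values, mask)]


def select_action(q_values, mask):
    """Greedy action over the effective (projected) action value vector."""
    if not any(mask):
        raise ValueError("no feasible placement action")
    projected = project(q_values, mask)
    return max(range(len(projected)), key=projected.__getitem__)


mask = action_mask([(0.2, 0.1), (0.9, 0.1), (0.3, 0.4)], thresholds=(0.5, 0.5))
print(mask)                                   # [True, False, True]
print(select_action([1.0, 5.0, 2.0], mask))   # 2
```

In the example, candidate 1 has the highest raw value but violates the first threshold, so the projected argmax falls on candidate 2.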

8. A system for optimizing the arrangement of stress sensors for ship hull structures, characterized in that it comprises the following modules.

A hull structure feature pre-calculation module, configured to obtain, based on finite element analysis, the structural mechanical characteristics and engineering constraint characteristics of candidate stress sensor locations in the hull structure of a floating production storage and offloading unit.

A hull stress sensor layout simulation environment module, configured to construct a reinforcement learning problem model and a training environment based on these characteristics, wherein the training environment receives sensor layout adjustment instructions to update the layout and includes an inverse finite element solver for evaluating the performance of stress sensor layout schemes.

A multi-objective stress sensor layout optimization agent module, configured to iteratively optimize the stress sensor layout scheme for the hull structure of the floating production storage and offloading unit through interactive training between the reinforcement learning problem model and the environment, until the training converges and a stress sensor arrangement strategy meeting the multi-objective optimization requirements is output.

The interactive training process includes: generating stress sensor layout adjustment decisions from the reinforcement learning problem model using a deep reinforcement learning policy network; executing the decisions and calling the inverse finite element solver to evaluate the performance of the stress sensor layout; calculating a reward signal from the performance evaluation results; and updating the policy network parameters using a cascaded constraint-aware double dueling deep Q-network algorithm, until the policy network converges. This algorithm comprises a global decision network, a region optimization network, a constraint projection layer, and a prioritized experience replay unit.

9. An electronic device, characterized in that it comprises a memory, a processor, and computer instructions stored in the memory and executable on the processor, which, when executed by the processor, perform the method for optimizing the arrangement of stress sensors for ship structures as described in any one of claims 1-7.

10. A computer-readable storage medium, characterized in that it stores computer instructions which, when executed by a processor, perform the method for optimizing the arrangement of stress sensors for ship structures as described in any one of claims 1-7.