Hybrid sub-connected reflector, communication system, parameter optimization method and related devices

By employing a subarray design with shared phase-shifting and shared amplification circuits in the hybrid sub-connected reflector, combined with an independent mode switching switch and energy harvesting circuit, the problem of low energy utilization in self-powered communication systems is solved, achieving high reliability and high throughput communication in complex environments.

CN122247462A · Pending · Publication Date: 2026-06-19 · GUANGZHOU UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
GUANGZHOU UNIVERSITY
Filing Date
2026-04-03
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing hybrid sub-connected reflector architectures make poor use of harvested energy in self-powered communication systems, which limits communication reliability and throughput. In particular, the system can stop working when environmental energy is scarce, and it cannot reach its full performance potential when energy is abundant.

Method used

The subarray design employs a shared phase-shifting circuit and a shared amplification circuit. Each reflection unit is equipped with an independent mode switching switch, enabling flexible switching between active and passive reflection modes. Combined with the energy harvesting circuit and battery unit, transmission parameters are optimized to improve energy utilization.

Benefits of technology

Significantly reduces hardware costs and static power consumption. The system maintains basic communication connectivity when energy is limited and releases its signal amplification potential when energy is sufficient, which improves communication reliability and throughput.

✦ Generated by Eureka AI based on patent content.

Figure CN122247462A_ABST

Abstract

This application proposes a hybrid sub-connected reflector, a communication system, a parameter optimization method, and related equipment. The hybrid sub-connected reflector comprises an energy harvesting circuit, a battery unit, and multiple sub-arrays. The energy harvesting circuit is connected to the battery unit. It harvests ambient energy and stores it in the battery unit to sustain the operation of the sub-arrays. Each sub-array includes a shared phase-shifting circuit, a shared amplification circuit, multiple reflective units, and a mode switching switch for each reflective unit. In each sub-array, all reflective units are connected to the shared phase-shifting circuit, and the corresponding mode switching switch is connected in series on the radio frequency branch of each reflective unit. One end of the mode switching switch is connected to the reflective unit or the shared phase-shifting circuit, and the other end is connected to the shared amplification circuit. The hybrid sub-connected reflector controls the corresponding mode switching switches based on a mode control signal, which improves the energy utilization rate of the hybrid sub-connected intelligent reflector.

Description

Technical Field

[0001] This application relates to the field of wireless communication technology, and in particular to hybrid sub-connected reflectors, communication systems, parameter optimization methods, and related equipment.

Background Technology

[0002] With the development of wireless communication technology, reconfigurable intelligent surfaces (RIS) have become a key technology for improving network coverage and spectral efficiency due to their ability to dynamically regulate the reflection and propagation environment of electromagnetic waves. However, traditional RIS typically rely on fixed power supplies, which greatly limits their flexible deployment in remote, disaster-prone, or high-risk areas without stable power grid coverage. To address this issue, self-powered RIS with environmental energy harvesting (EH) capabilities have emerged, achieving self-sufficiency by capturing energy from nature or the radio frequency environment. In practical deployments, purely passive RIS suffers from "multiplicative fading," resulting in limited link gain, while purely active RIS, although capable of amplifying signals, faces extremely high hardware power consumption. Therefore, hybrid RIS architectures, combining the advantages of both, have been widely studied. To further reduce the high hardware costs and static circuit power consumption caused by the requirement for independent circuitry in each unit of large-scale hybrid RIS, the industry has further proposed a sub-connected architecture. This involves grouping adjacent reflective units to share related radio frequency or circuit modules, thereby achieving a trade-off between power consumption and communication performance to some extent.

[0003] In related technologies, most mainstream sub-connection schemes only support a single operating mode of fully active or fully passive operation. Existing hybrid sub-connection architectures typically employ fixed physical divisions between active and passive units, or merely share power amplifier circuits while still requiring each reflector unit to maintain an independent phase-shifting circuit. Due to the randomness and time-varying nature of environmental energy acquisition and communication channels in self-powered scenarios, existing hybrid sub-connection architectures, with their fixed operating modes and static power consumption overhead from independent phase-shifting circuits, often fail to function properly when environmental energy is scarce because the collected energy is insufficient to drive the independent circuits. Conversely, when energy is abundant, they are limited by fixed component groupings, preventing them from unleashing their optimal performance potential. This results in low energy utilization of existing hybrid sub-connection RIS architectures in practical self-powered communication systems, thus limiting the communication reliability and throughput of self-powered communication systems.

Summary of the Invention

[0004] This application provides a hybrid sub-connected reflector, a communication system, a parameter optimization method, and related equipment, which can improve the energy utilization of the self-powered intelligent reflector, thereby improving the communication reliability and throughput of the self-powered communication system.

[0005] To achieve the above objectives, a first aspect of this application provides a hybrid sub-connected reflective surface. The surface comprises an energy harvesting circuit, a battery cell, and multiple subarrays. The energy harvesting circuit is connected to the battery cell. It harvests ambient energy and stores it in the battery cell to maintain the operation of the multiple subarrays. Each subarray includes a shared phase-shifting circuit, a shared amplification circuit, multiple reflection units, and a mode switching switch for each reflection unit. In each subarray, all the reflection units are connected to the shared phase-shifting circuit. A corresponding mode switching switch is connected in series on the radio frequency branch where each reflection unit is located. One end of the mode switching switch is connected to the reflection unit or the shared phase-shifting circuit, and the other end is connected to the shared amplification circuit. When the hybrid sub-connected reflective surface receives a mode control signal, it closes or opens the corresponding mode switching switch based on that signal. This switches the corresponding reflection unit between an active reflection mode and a passive reflection mode.

[0006] To achieve the above objectives, a second aspect of this application provides a communication system. The system comprises the hybrid sub-connected reflective surface according to the first aspect, a base station, and multiple mobile terminals. The base station communicates with each mobile terminal via the hybrid sub-connected reflective surface.

[0007] To achieve the above objectives, a third aspect of this application provides a method for optimizing the transmission parameters of the communication system according to the second aspect. The method comprises three acquisition steps for the current time frame. First, acquire the first channel state information between each mobile terminal and the base station, the second channel state information between the hybrid sub-connected reflector and the base station, and the third channel state information between each mobile terminal and the hybrid sub-connected reflector. Second, acquire the battery energy state information of the hybrid sub-connected reflector. Third, acquire a transmission parameter optimization model. The first, second, and third channel state information and the battery energy state information are input into the transmission parameter optimization model for data matching processing. The model outputs an optimized mode switching sequence, an optimized phase shift sequence, an optimized amplification coefficient sequence, an optimized transmit power, and an optimized auxiliary communication duration. The optimized receive beamforming of the base station is then solved from the optimized transmit power, the first, second, and third channel state information, the optimized phase shift sequence, the optimized amplification coefficient sequence, and the optimized mode switching sequence.
Based on the optimized auxiliary communication duration, the current time frame is divided into an auxiliary communication phase and a pure energy harvesting phase. During the auxiliary communication phase:
- the mode switching switches are opened and closed according to the optimized mode switching sequence;
- the shared phase-shifting circuits and shared amplification circuits are controlled according to the optimized phase shift sequence and the optimized amplification coefficient sequence;
- the mobile terminals send data to the base station using the optimized transmit power and the optimized receive beamforming.

During the pure energy harvesting phase, the hybrid sub-connected reflector harvests ambient energy, and the mobile terminals send data to the base station via the direct link.
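The frame division in the third aspect can be illustrated with a minimal Python sketch. The sketch makes several illustrative assumptions: the reflector draws constant power during the auxiliary communication phase, harvests at a constant rate only during the pure energy harvesting phase, and has a finite battery capacity. The names `plan_frame`, `ris_power`, `harvest_power`, and `capacity` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass
class FramePlan:
    aux_duration: float   # auxiliary communication phase length
    eh_duration: float    # pure energy harvesting phase length
    next_battery: float   # battery energy carried into the next frame


def plan_frame(frame_len, aux_duration, battery, ris_power, harvest_power, capacity):
    """Split one time frame and update the battery (hypothetical linear energy model)."""
    if not 0.0 <= aux_duration <= frame_len:
        raise ValueError("auxiliary duration must lie within the frame")
    eh_duration = frame_len - aux_duration
    consumed = ris_power * aux_duration          # reflector drive energy while assisting
    # Energy causality: the reflector may only spend energy already stored.
    if consumed > battery:
        raise ValueError("energy causality violated")
    harvested = harvest_power * eh_duration      # energy collected in the EH phase
    next_battery = min(capacity, battery - consumed + harvested)
    return FramePlan(aux_duration, eh_duration, next_battery)


plan = plan_frame(frame_len=1.0, aux_duration=0.4, battery=2.0,
                  ris_power=1.5, harvest_power=0.5, capacity=10.0)
```

In this example, 60% of the frame is spent harvesting. In the actual method, the auxiliary duration is an output of the optimization model rather than a fixed input.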

[0008] To achieve the above objectives, a fourth aspect of this application provides an electronic device, which includes a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the transmission parameter optimization method for the communication system as described in the third aspect.

[0009] To achieve the above objectives, a fifth aspect of the present application provides a storage medium, which is a computer-readable storage medium storing a computer program that, when executed by a processor, implements the transmission parameter optimization method for the communication system as described in the third aspect.

[0010] This application proposes a hybrid sub-connected reflector, a communication system, a parameter optimization method, and related equipment. The hybrid sub-connected reflector comprises an energy harvesting circuit, a battery cell, and multiple sub-arrays. The energy harvesting circuit is connected to the battery cell. It harvests ambient energy and stores it in the battery cell to maintain the operation of the sub-arrays. Each sub-array includes a shared phase-shifting circuit, a shared amplification circuit, multiple reflective units, and a mode switching switch for each reflective unit. In each sub-array, all reflective units are connected to the shared phase-shifting circuit, and a corresponding mode switching switch is connected in series on the radio frequency branch of each reflective unit. One end of the mode switching switch is connected to the reflective unit or the shared phase-shifting circuit, and the other end is connected to the shared amplification circuit. When the hybrid sub-connected reflector receives a mode control signal, it closes or opens the corresponding mode switching switch based on that signal. This switches the corresponding reflective unit between active reflection mode and passive reflection mode.

Existing technologies still require each reflective unit to maintain an independent phase-shifting circuit. The embodiments of this application remove this limitation by providing a shared phase-shifting circuit and a shared amplification circuit in each sub-array. This significantly reduces the hardware cost and static power consumption of large-scale arrays. It also avoids the risk of system paralysis caused by the excessive power consumption of the underlying drive circuitry when ambient energy is scarce.
At the same time, each reflective unit is given an independent mode switching switch. Each unit can therefore switch flexibly and independently between active and passive reflection modes according to the random, time-varying ambient energy harvesting status and communication channel quality. This overcomes the technical defects of existing sub-connection schemes, such as a single working mode or a fixed physical division of active and passive components. The resulting physical architecture combines "shared underlying circuitry with independently switched units". When energy is limited, the reflective surface switches units to passive mode, reducing power consumption while maintaining basic communication connectivity. When energy is sufficient, it activates more active units as needed, releasing the full signal amplification and phase modulation potential. This greatly improves the energy utilization rate of the hybrid sub-connected intelligent reflective surface architecture. It also ensures the communication reliability and long-term high-throughput performance of the self-powered communication system in complex environments.

[0011] Other features and advantages of this application will be set forth in the description which follows, and will be apparent in part from the description, or may be learned by practicing the application. The objectives and other advantages of this application may be realized and obtained by means of the structures particularly pointed out in the description, claims and drawings.

Attached Figure Description

[0012] Figure 1 is a schematic diagram of a communication system with a hybrid sub-connected reflective surface provided in an embodiment of this application.

[0013] Figure 2 is a flowchart of a method for optimizing transmission parameters of a communication system according to another embodiment of this application.

[0014] Figure 3 is a schematic diagram of the overall architecture of a transmission parameter optimization model based on the hierarchical hybrid proximal policy optimization (HPPO) algorithm, provided in another embodiment of this application.

[0015] Figure 4 is a schematic diagram of the hardware structure of an electronic device provided in another embodiment of this application.

Detailed Implementation

[0016] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0017] It should be noted that although functional modules are divided in the device schematic diagram and the logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in a different order than the module division in the device or the order in the flowchart.

[0018] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of this application only and is not intended to limit this application.

[0019] In this application, vectors are represented by bold lowercase letters, and matrices are represented by bold uppercase letters. $\mathbb{C}^{m \times n}$ denotes the space of $m \times n$ complex matrices. $(\cdot)^{T}$ and $(\cdot)^{H}$ denote the transpose and conjugate transpose of a vector or matrix, respectively. $\mathbb{E}[\cdot]$ denotes statistical expectation. $\mathrm{diag}(\cdot)$ denotes the diagonalization operation. $\mathbf{I}_{n}$ denotes the $n \times n$ identity matrix. $\|\cdot\|$ and $|\cdot|$ denote the norm of a matrix and the modulus of a complex number, respectively.

[0020] With the development of wireless communication technology, reconfigurable intelligent surfaces (RIS) have become a key technology for improving network coverage and spectral efficiency due to their ability to dynamically regulate the reflection and propagation environment of electromagnetic waves. However, traditional RIS typically rely on fixed power supplies, which greatly limits their flexible deployment in remote, disaster-prone, or high-risk areas without stable power grid coverage. To address this issue, self-powered RIS with environmental energy harvesting (EH) capabilities have emerged, achieving self-sufficiency by capturing energy from nature or the radio frequency environment. In practical deployments, purely passive RIS suffers from "multiplicative fading," resulting in limited link gain, while purely active RIS, although capable of amplifying signals, faces extremely high hardware power consumption. Therefore, hybrid RIS architectures, combining the advantages of both, have been widely studied. To further reduce the high hardware costs and static circuit power consumption caused by the requirement for independent circuitry in each unit of large-scale hybrid RIS, the industry has further proposed a sub-connected architecture. This involves grouping adjacent reflective units to share related radio frequency or circuit modules, thereby achieving a trade-off between power consumption and communication performance to some extent.

[0021] In related technologies, most mainstream sub-connection schemes only support a single operating mode of fully active or fully passive operation. Existing hybrid sub-connection architectures typically employ fixed physical divisions between active and passive units, or merely share power amplifier circuits while still requiring each reflector unit to maintain an independent phase-shifting circuit. Due to the randomness and time-varying nature of environmental energy acquisition and communication channels in self-powered scenarios, existing hybrid sub-connection architectures, with their fixed operating modes and static power consumption overhead from independent phase-shifting circuits, often fail to function properly when environmental energy is scarce because the collected energy is insufficient to drive the independent circuits. Conversely, when energy is abundant, they are limited by fixed component groupings, preventing them from unleashing their optimal performance potential. This results in low energy utilization of existing hybrid sub-connection RIS architectures in practical self-powered communication systems, thus limiting the communication reliability and throughput of self-powered communication systems.

[0022] The embodiments of this application aim to improve the energy utilization of self-powered intelligent reflectors, and thus the communication reliability and throughput of self-powered communication systems. Existing technologies still require each reflector unit to maintain an independent phase-shifting circuit. The embodiments remove this limitation by providing a shared phase-shifting circuit and a shared amplification circuit in each sub-array. This significantly reduces the hardware cost and static power consumption of large-scale arrays. It also avoids the risk of system paralysis caused by the excessive power consumption of the underlying drive circuitry when environmental energy is scarce. At the same time, each reflector unit is given an independent mode switching switch. Each unit can therefore switch flexibly and independently between active and passive reflection modes based on the random, time-varying environmental energy harvesting status and communication channel quality. This overcomes the technical defects of existing sub-connection schemes, such as a single working mode or a fixed physical division of active and passive components. The resulting physical architecture combines "shared underlying circuitry with independently switched units". When energy is limited, the reflector switches units to passive mode, reducing power consumption while maintaining basic communication connectivity. When energy is sufficient, it activates more active units as needed, releasing the full signal amplification and phase modulation potential. This greatly improves the energy utilization rate of the hybrid sub-connected intelligent reflector architecture. It also ensures the communication reliability and long-term high-throughput performance of the self-powered communication system in complex environments.

[0023] The hybrid sub-connected reflector, communication system, parameter optimization method, and related equipment proposed in the embodiments of this application are further described below. The description begins with the hybrid sub-connected reflector and a communication system that incorporates it. Refer to Figure 1, which is a schematic diagram of a communication system with a hybrid sub-connected reflective surface provided in an embodiment of this application. As shown in Figure 1, the communication system consists of a base station, multiple (e.g., J) users carrying mobile terminals, and a hybrid sub-connected reflector deployed between them. In this system, the communication link between the base station and each mobile terminal can be intelligently controlled and enhanced through the hybrid sub-connected reflector. The aims are to improve signal coverage and communication quality and to overcome the signal attenuation of traditional wireless propagation environments.

[0024] As shown in Figure 1, the hybrid sub-connected reflector includes an energy harvesting circuit, battery cells, and multiple sub-arrays. The energy harvesting circuit is connected to the battery cells to harvest and store ambient energy, sustaining the operation of the sub-arrays. The reconfigurable intelligent surface (RIS) is deployed in remote or poorly powered areas. Its energy harvesting (EH) circuitry captures natural or electromagnetic energy from the surrounding environment, including solar energy, wind energy, and radio frequency (RF) energy radiated by energy towers. The harvested energy is transferred to and stored in rechargeable battery cells, making the reflector self-sufficient in its role of assisting communication. In terms of physical array division, this self-powered reflector contains N programmable reflective units in total. These are divided into L sub-arrays of Q reflective units each, so that N = L × Q. The energy accumulated in the battery cells is strictly managed and is dedicated to the hardware drive consumption of the sub-arrays during subsequent signal reflection and amplification. The system's energy consumption within the current time frame is subject to an energy causality constraint: it cannot exceed the energy available in the battery cells.
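The array division N = L × Q can be made concrete with a short sketch. The sketch assumes that units are numbered contiguously, sub-array by sub-array, with 0-based indices. The function names are hypothetical.

```python
def partition_units(N, L):
    """Group N reflective units into L sub-arrays of Q = N // L units each."""
    if N % L != 0:
        raise ValueError("N must equal L * Q for an integer Q")
    Q = N // L
    return [list(range(l * Q, (l + 1) * Q)) for l in range(L)]


def locate_unit(n, Q):
    """Return (l, q): the sub-array index and in-array position of global unit n."""
    return divmod(n, Q)


groups = partition_units(N=8, L=2)   # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(locate_unit(5, Q=4))           # (1, 1)
```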

[0025] In some embodiments, the internal hardware configuration of the subarray includes a shared phase-shifting circuit, a shared amplification circuit, multiple reflection units, and a mode switching switch for each reflection unit. Here, the "shared phase-shifting circuit" is a radio frequency control module that adjusts the phase of the incident electromagnetic wave to change its reflection direction. The "shared amplification circuit" is a hardware module that provides power gain to the reflected signal, which has been attenuated by path loss.

[0026] In traditional fully-connected (FC) architectures, each reflective unit requires a dedicated phase shifter and amplifier. This leads to high hardware costs and extremely high static power consumption in large-scale arrays. To address this, this application lets the Q reflective units within a single subarray share one phase-shifting circuit and one amplification circuit. Despite this high degree of sharing, individual units must remain controllable. The hardware therefore provides each of the Q reflective units in the subarray with an independent mode switching switch, so that the operating state of each reflective unit can be controlled independently.
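The hardware saving described above follows from simple counting. A fully connected design needs one phase shifter and one amplifier per unit. The sub-connected design needs one of each per sub-array, plus one mode switching switch per unit. The sketch below only counts components and makes no claim about the cost or power of any individual component. The function name is hypothetical.

```python
def component_counts(N, Q):
    """Count RF components for N units; the sub-connected design groups Q units per sub-array."""
    if N % Q != 0:
        raise ValueError("N must be a multiple of Q")
    L = N // Q
    return {
        "fully_connected": {"phase_shifters": N, "amplifiers": N},
        "hybrid_sub_connected": {"phase_shifters": L, "amplifiers": L, "mode_switches": N},
    }


counts = component_counts(N=64, Q=8)
# fully connected: 64 phase shifters and 64 amplifiers
# hybrid sub-connected: 8 phase shifters, 8 amplifiers, and 64 mode switches
```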

[0027] The reflective unit's phase shift control and power amplification functions are decoupled at the hardware level. The right-hand part of Figure 1 is a schematic of each subarray in the hybrid sub-connected reflector. In each subarray, all reflective units maintain a basic connection with the shared phase-shifting circuit. The reflective units therefore retain basic electromagnetic wave phase modulation capability in any operating state. In addition, a corresponding mode switching switch is connected in series on the RF signal transmission branch of each reflective unit. One end of the mode switching switch is connected to the reflective unit or the shared phase-shifting circuit, while the other end converges on and connects to the shared amplification circuit. Under this "phase shift always on, amplification controlled" circuit topology, the mode switching switch acts as an independent gate. It determines whether the corresponding reflective unit is connected to the high-power amplification link. When the mode switching switch is open, the reflective unit is physically cut off from RF interaction with the shared amplification circuit. It still performs normal passive phase deflection through the shared phase-shifting circuit, and thus operates in the highly energy-efficient passive reflection mode. When the mode switching switch is closed, the RF signal of that branch is fully connected. The reflective unit then receives both phase shift modulation and the power gain of the shared amplifier, and thus operates in the high-performance active reflection mode.
This topology is consistent with the physical operating principles of the active and passive modes of intelligent reflective surfaces. It also gives the system the hardware flexibility to schedule freely between maintaining basic communication connectivity and releasing maximum communication gain.
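The "phase shift always on, amplification controlled" topology can be modeled per unit as a reflection coefficient. This sketch uses an illustrative model, not the application's own formula. A passive unit reflects with unit amplitude and the shared phase of its sub-array. An active unit additionally receives the shared amplification gain. A value of `1` in `modes` is assumed to mean a closed switch.

```python
import cmath
import math


def reflection_coefficients(phases, gains, modes):
    """Return the N diagonal entries of the reflection matrix, sub-array by sub-array.

    phases[l], gains[l]: shared phase (rad) and amplification of sub-array l
    modes[l][q]: 1 if unit q's switch is closed (active), 0 if open (passive)
    """
    coeffs = []
    for l, row in enumerate(modes):
        shared_phase = cmath.exp(1j * phases[l])   # every unit sees the shared phase
        for active in row:
            # closed switch: phase shift plus shared amplification; open: phase only
            coeffs.append(gains[l] * shared_phase if active else shared_phase)
    return coeffs


theta = reflection_coefficients(phases=[0.0, math.pi], gains=[2.0, 3.0],
                                modes=[[1, 0], [0, 1]])
# approximately [2, 1, -1, -3] (up to floating-point rounding)
```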

[0028] In some embodiments of dynamic switching control of the reflective unit's operating mode, the hybrid sub-connected reflector receives a mode control signal. Based on that signal, it closes or opens the corresponding mode switching switch, which switches the corresponding reflective unit between active reflection mode and passive reflection mode. In the solution of this application, the control signal is conveyed by a binary indicator variable. This variable issues the switching state command to the q-th reflective unit in the l-th subarray. When the indicator opens the corresponding mode switching switch, the reflective unit is cut off from RF energy interaction with the shared amplification circuit. It then operates in "Passive Mode", using only its own physical metamaterial properties. In this mode, the unit only changes the signal phase and does not consume the system's amplification power. When the indicator closes the mode switching switch, the reflective unit connects to the shared amplification circuit and enters "Active Mode". In this mode, the unit not only adjusts the phase but also substantially amplifies the power of the weak incident signal. The indicators of all units together form the mode indication matrix of the entire array. This matrix is the mode switching sequence, an L × Q binary matrix. This flexible dynamic switching mechanism enables the system to perform fine-grained environmental energy scheduling and resource allocation.
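One way to see why passive units save energy is to compute a static power budget from the mode indication matrix. The power model below is entirely hypothetical and is not specified by the application. It assumes:
- a per-sub-array phase-shifter power `p_ps`;
- a static amplifier power `p_pa`, drawn only when at least one unit in the sub-array is active;
- a per-closed-switch power `p_sw`.

The sketch also assumes that `1` marks an active unit.

```python
def static_power(modes, p_ps, p_pa, p_sw):
    """Static power for an L x Q binary mode indication matrix (hypothetical model)."""
    if len({len(row) for row in modes}) != 1:
        raise ValueError("every sub-array must have the same number Q of units")
    if any(x not in (0, 1) for row in modes for x in row):
        raise ValueError("mode indicators must be 0 (passive) or 1 (active)")
    L = len(modes)
    active_subarrays = sum(1 for row in modes if any(row))
    active_units = sum(sum(row) for row in modes)
    # Shared phase shifters are always on; a shared amplifier draws static power
    # only when its sub-array has at least one closed switch.
    return L * p_ps + active_subarrays * p_pa + active_units * p_sw


print(static_power([[1, 0, 0], [0, 0, 0]], p_ps=1.0, p_pa=5.0, p_sw=0.5))  # 7.5
print(static_power([[0, 0, 0], [0, 0, 0]], p_ps=1.0, p_pa=5.0, p_sw=0.5))  # 2.0
```

Under these assumptions, switching every unit to passive mode removes all amplifier and switch power, leaving only the always-on phase shifters.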

[0029] In some embodiments, the direct communication link between the mobile terminal and the base station often suffers severe fading due to obstructions such as urban buildings and terrain, leading to channel quality degradation. To overcome this physical limitation, the communication system deploys the aforementioned hybrid sub-connection reflector in space. During communication, when the uplink wireless signal transmitted by the mobile terminal reaches the reflector, the reflector utilizes the environmental energy it collects and stores, combined with the coordinated switching of its internal active and passive reflection units, to accurately reflect the reconstructed, phase-shifted, and partially amplified signal to the base station receiver. The base station then uses appropriate receive beamforming techniques such as linear minimum mean square error (MMSE) to simultaneously receive and process signals from both the reflected link and the direct link, thereby significantly improving the overall data throughput and communication reliability of the multi-user system.
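The base station's MMSE combining over the direct and reflected links can be sketched with NumPy. The signal model here is an illustrative assumption. User j's effective channel is its direct channel plus the cascaded reflector channel. Active units are assumed to add amplified thermal noise (variance `sigma_v2` per unit) that is forwarded to the base station. All function and argument names are hypothetical.

```python
import numpy as np


def mmse_combining(H_d, G, H_r, theta, active, p, sigma2, sigma_v2):
    """Linear MMSE receive combiners and per-user SINR at the base station.

    H_d: (M, J) direct channels; G: (M, N) reflector-to-BS channel;
    H_r: (N, J) user-to-reflector channels; theta: (N,) reflection coefficients;
    active: (N,) bool mask of closed switches; p: (J,) transmit powers.
    """
    M, J = H_d.shape
    H = H_d + G @ np.diag(theta) @ H_r              # effective channel per user (columns)
    GA = G @ np.diag(np.where(active, theta, 0))    # path of amplified reflector noise
    noise = sigma2 * np.eye(M) + sigma_v2 * GA @ GA.conj().T
    R = (H * p) @ H.conj().T + noise                # total received covariance
    W = np.linalg.solve(R, H)                       # MMSE combiner for user j is W[:, j]
    sinr = np.empty(J)
    for j in range(J):
        h = H[:, j]
        C = R - p[j] * np.outer(h, h.conj())        # interference-plus-noise covariance
        sinr[j] = np.real(h.conj() @ np.linalg.solve(C, h)) * p[j]
    return W, sinr


def cn(rng, *shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


rng = np.random.default_rng(0)
M, L, Q, J = 4, 2, 4, 2
N = L * Q
active = np.arange(N) < Q                           # first sub-array active (illustrative)
shared = np.repeat(rng.uniform(0, 2 * np.pi, L), Q) # one shared phase per sub-array
theta = np.where(active, 2.0, 1.0) * np.exp(1j * shared)
W, sinr = mmse_combining(cn(rng, M, J), cn(rng, M, N), cn(rng, N, J), theta,
                         active, p=np.ones(J), sigma2=1.0, sigma_v2=0.1)
rates = np.log2(1 + sinr)                           # achievable rate per user (bit/s/Hz)
```

The SINR is computed as `p_j h_j^H C_j^{-1} h_j`, which is the SINR achieved by the linear MMSE combiner. The columns of `W` equal `R^{-1} h_j`, which are proportional to `C_j^{-1} h_j`.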

[0030] This application proposes a hybrid sub-connected reflector and communication system that significantly reduces the high hardware costs and massive static circuit power consumption associated with large-scale antenna arrays by highly integrating shared phase-shifting and amplification circuits at the sub-array level. This innovative structure effectively avoids the risk of system power depletion due to excessive power consumption of the underlying radio frequency drive in harsh environments with limited energy harvesting. Furthermore, by connecting an independent mode-switching switch in series with each reflector unit on a shared architecture, the system achieves both extreme energy efficiency and a high degree of beam control freedom. The base station can intelligently and granularly control each reflector unit to flexibly and independently switch between passive reflection mode (low power, connectivity-preserving) and active reflection mode (high power, high gain) based on real-time fluctuations in environmental energy storage and dynamically changing channel transmission quality. This design, which combines highly shared underlying hardware with independent switching of top-level units, ensures that the device can survive under extreme conditions where energy is extremely limited, while also ensuring that more active units can be activated as needed when the ambient energy is abundant to unleash powerful signal amplification potential. This fundamentally maximizes the energy utilization of the system and significantly improves the long-term operational stability and overall network throughput performance of the self-powered communication system.

[0031] Building on the above description of the hybrid sub-connected reflector and communication system, a method for optimizing transmission parameters of a communication system provided by embodiments of this application is described below. The method can be applied to the base station of the communication system, to a processor connected to the base station, or to similar equipment. Refer to Figure 2, which is an optional flowchart of a method for optimizing transmission parameters of a communication system provided in an embodiment of this application. As shown in Figure 2, the method may include, but is not limited to, steps 100 to 600. This embodiment does not specifically limit the order of steps 100 to 600 in Figure 2. The order of steps can be adjusted, and steps can be removed or added according to actual needs.

Step 100: Obtain the first channel state information between each mobile terminal and the base station, the second channel state information between the hybrid sub-connected reflector and the base station, and the third channel state information between each mobile terminal and the hybrid sub-connected reflector in the current time frame. Also obtain the battery energy state information of the hybrid sub-connected reflector in the current time frame, and obtain the transmission parameter optimization model.

[0033] Step 100 is described in detail below.

[0034] In step 100 of some embodiments, the base station first obtains the global environmental state of the entire communication system at the start of the current time frame. Specifically, it obtains three kinds of channel state information for the current time frame. The first is between each mobile terminal and the base station. The second is between the hybrid sub-connected reflector and the base station. The third is between each mobile terminal and the hybrid sub-connected reflector. It also obtains the battery energy state information of the hybrid sub-connected reflector in the current time frame and the transmission parameter optimization model. Here, channel state information (CSI) characterizes the fading and phase shift experienced by radio signals propagating along different spatial paths. The first channel state information is the channel coefficient matrix of the direct link between each mobile terminal and the base station (e.g., ...). The second channel state information is the channel coefficient matrix of the reflection link between the hybrid sub-connected reflector and the base station (e.g., ...). The third channel state information is the channel coefficient vector of the incident link between each mobile terminal and the reflector (e.g., ...). The battery energy state information (e.g., ...) represents the remaining charge in the rechargeable battery at the start of the current time frame that is available to power the reflector. Finally, the system locally invokes a transmission parameter optimization model trained offline in advance. This is typically a neural network built on a deep reinforcement learning architecture such as the Hierarchical Hybrid Proximal Policy Optimization (HPPO) algorithm, and it is used for subsequent decision-making.

[0035] The process of obtaining the transmission parameter optimization model includes steps 110 to 150.

[0036] Step 110: Based on the phase shift parameter sequence, amplification factor parameter sequence, mode switching parameter sequence of the hybrid sub-connected reflector, as well as the transmit power parameter and beamforming parameter of each mobile terminal, construct the first decoding signal-to-noise ratio function in the auxiliary communication stage and the second decoding signal-to-noise ratio function in the pure energy harvesting stage.

[0037] Step 120: Based on the transmission power parameters, phase shift parameter sequence, amplification factor parameter sequence, mode switching parameter sequence, and auxiliary communication duration, obtain the battery energy update function of the hybrid sub-connected reflector.

[0038] Steps 110 to 120 are described in detail below.

[0039] In step 110 of some embodiments, two functions are constructed: the first decoding signal-to-noise ratio function for the auxiliary communication stage and the second decoding signal-to-noise ratio function for the pure energy harvesting stage. Both are based on the phase shift parameter sequence, amplification factor parameter sequence, and mode switching parameter sequence of the hybrid sub-connected reflector, as well as the transmit power parameter and beamforming parameter of each mobile terminal. The decoding signal-to-noise ratio is the core quantity that measures signal quality at the receiver and determines the achievable communication rate. In the multi-user setting below, it takes the form of a signal-to-interference-plus-noise ratio (SINR). Each time frame in this application is divided into two stages with entirely different working mechanisms, so a separate evaluation function is needed for each stage. In the auxiliary communication stage, the signal received by the base station is the superposition of the signal transmitted directly by the mobile terminal and the signal reflected and enhanced by the reflector. Three reflector parameters jointly determine its quality. The phase shift parameter sequence governs the phase alignment of the reflected signal. The amplification factor parameter sequence governs the power of the reflected signal. The mode switching parameter sequence determines which reflection units take part in energy-consuming amplification. The system combines these reflector parameters with the mobile terminal's transmit power parameters and the base station's beamforming parameters. From these it computes the ratio of signal power to interference-plus-noise power for this stage, which constitutes the first decoding signal-to-noise ratio function.
In the pure energy harvesting stage, the hybrid sub-connected reflector stops assisting communication and enters a power-off sleep state, so the base station receives signals only through the faded direct link. The signal-to-noise ratio then depends solely on the direct channel and on the mobile terminal's transmit power and beamforming parameters. The system constructs the second decoding signal-to-noise ratio function from these quantities.

[0040] In this application, a phase variable is defined for the elements of the l-th subarray in the t-th time frame. These phases form the phase shift matrix of the hybrid sub-connected reflector (RIS), i.e., the phase shift parameter sequence. Similarly, the amplification factor of the amplifier of the l-th subarray in the t-th time frame is defined, and these factors form the amplification factor parameter sequence. Each reflection unit is assigned the amplification factor value of its subarray. The RIS reflection coefficient matrix is then obtained in two steps. First, each unit's amplification factor is raised to the power of the corresponding entry of the indicator matrix. Second, the result is multiplied by the phase shift matrix. As a result, a unit whose indicator is 0 (passive mode) reflects with unit amplitude, and a unit whose indicator is 1 (active mode) applies the amplification factor. For the reflection units operating in active mode, a corresponding matrix is defined that amplifies the signal and the noise simultaneously. In the first stage of the t-th time frame (i.e., the auxiliary communication stage), the base station receives both the signals reflected by the RIS and the signals transmitted directly by the users. The signal received by the base station can therefore be expressed as shown in the following formula.

[0041]
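The reflection-coefficient construction described in [0040] can be sketched as follows. This is a minimal illustration under assumed notation, not the application's own formula. `theta` holds the shared phase of each subarray and `amp` its shared amplification factor. `mode` is the 0/1 mode-switch state of each unit, and `subarray_of` maps units to subarrays. These names and the use of NumPy are hypothetical.

```python
import numpy as np


def reflection_matrix(theta, amp, mode, subarray_of):
    """Diagonal reflection-coefficient matrix of a hybrid sub-connected RIS (sketch).

    theta       -- shared phase (rad) of each subarray, length L
    amp         -- shared amplification factor of each subarray, length L
    mode        -- mode-switch state of each unit: 1 = active, 0 = passive, length N
    subarray_of -- subarray index (0..L-1) of each unit, length N
    """
    theta = np.asarray(theta, dtype=float)
    amp = np.asarray(amp, dtype=float)
    mode = np.asarray(mode, dtype=int)
    idx = np.asarray(subarray_of, dtype=int)
    # Each unit inherits the phase and gain of its subarray's shared circuits.
    phase = np.exp(1j * theta[idx])
    # amp ** mode: passive units (mode 0) reflect with unit amplitude,
    # active units (mode 1) apply the shared amplification factor.
    gain = amp[idx] ** mode
    return np.diag(gain * phase)


# Two subarrays of two units each; only unit 0 is switched to active mode.
Psi = reflection_matrix([0.0, np.pi / 2], [4.0, 9.0], [1, 0, 0, 0], [0, 0, 1, 1])
```

Raising the gain to the 0/1 mode value is one compact way to express "amplify only if the switch is closed" without branching. The three passive units in the example keep unit amplitude even though their subarrays have amplification factors of 4 and 9.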

[0042] Here, the base station applies the first-stage beamforming parameters. Given the first-stage transmit power parameters, the signal-to-interference-plus-noise ratio (SINR) of user terminal j at the base station in the first stage of the t-th time frame can be expressed as the following formula. This is the first decoding signal-to-noise ratio function.

[0043]

[0044] Here, the two noise terms denote the noise power at the base station and the noise power at the intelligent reflector, respectively.
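The formula of [0043] is not reproduced in this text, so the sketch below assumes the standard uplink SINR form for an active-RIS-assisted link. The desired user's power through receive beamformer w_j is divided by the sum of three terms: multi-user interference, reflector-side noise amplified by the active units only, and base station noise. This is an illustrative assumption rather than the application's exact expression. All function and parameter names are hypothetical.

```python
import numpy as np


def sinr_stage1(W, p, H_d, G, H_r, Psi, mode, sigma2_bs, sigma2_ris):
    """Per-user uplink SINR in the auxiliary communication stage (assumed model).

    W    (M, J) receive beamformers at the base station, one column per user
    p    (J,)   user transmit powers
    H_d  (M, J) direct user-to-BS channels (first CSI)
    G    (M, N) reflector-to-BS channel (second CSI)
    H_r  (N, J) user-to-reflector channels (third CSI)
    Psi  (N, N) diagonal reflection-coefficient matrix
    mode (N,)   1 = active unit, 0 = passive unit
    """
    p = np.asarray(p, dtype=float)
    H_eff = H_d + G @ Psi @ H_r                              # effective channel of every user
    Psi_act = Psi @ np.diag(np.asarray(mode, dtype=float))   # only active units amplify noise
    sinr = np.empty(H_eff.shape[1])
    for j in range(H_eff.shape[1]):
        w = W[:, j]
        power = p * np.abs(w.conj() @ H_eff) ** 2            # each user's power through w
        interference = power.sum() - power[j]
        ris_noise = sigma2_ris * np.linalg.norm(w.conj() @ G @ Psi_act) ** 2
        bs_noise = sigma2_bs * np.linalg.norm(w) ** 2
        sinr[j] = power[j] / (interference + ris_noise + bs_noise)
    return sinr


one = np.array([[1.0]])
active = sinr_stage1(one, [1.0], one, one, one, np.array([[2.0]]), [1], 1.0, 1.0)
passive = sinr_stage1(one, [1.0], one, one, one, np.array([[1.0]]), [0], 1.0, 1.0)
```

The scalar example shows the trade-off that motivates mode switching. An active unit with gain 2 raises the effective channel from 2 to 3, but it also injects amplified reflector noise. In this toy case the active configuration (SINR 1.8) actually performs worse than the passive one (SINR 4.0).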

[0045] In the second stage (i.e., the RIS pure energy harvesting stage), the base station only receives signals directly transmitted by the user terminal, and the received signals are shown in the following formula.

[0046]

[0047] Here, the second-stage beamforming parameters and the second-stage transmit power parameters are used. The SINR of user terminal j at the base station in the second stage of the t-th time frame is given by the following formula. This is the second decoding signal-to-noise ratio function.

[0048]
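In the pure energy harvesting stage the reflector is idle. Under the same assumed SINR form as above, the RIS terms therefore drop out and only the direct link remains. The following is a minimal sketch with hypothetical names; it is not the formula of [0048] itself.

```python
import numpy as np


def sinr_stage2(W, p, H_d, sigma2_bs):
    """Per-user SINR over the direct link only (reflector asleep, assumed model).

    W   (M, J) second-stage receive beamformers
    p   (J,)   second-stage transmit powers
    H_d (M, J) direct user-to-BS channels (first CSI)
    """
    p = np.asarray(p, dtype=float)
    sinr = np.empty(H_d.shape[1])
    for j in range(H_d.shape[1]):
        w = W[:, j]
        power = p * np.abs(w.conj() @ H_d) ** 2          # each user's power through w
        interference = power.sum() - power[j]
        sinr[j] = power[j] / (interference + sigma2_bs * np.linalg.norm(w) ** 2)
    return sinr


# Two users on orthogonal direct channels: no inter-user interference.
s2 = sinr_stage2(np.eye(2), [1.0, 2.0], np.eye(2), 0.5)
```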

[0049] Within time frame t, the total throughput obtained by the base station from J user terminals can be expressed as shown in the following formula.

[0050]
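Following the description in [0075], the per-frame throughput can be sketched as the Shannon rate of each stage weighted by that stage's duration and summed over users. This is a sketch under that assumption. `tau` is the auxiliary communication duration and `T` the frame length. `bandwidth` is a hypothetical normalization parameter; with 1.0 the result is in bit/Hz.

```python
import numpy as np


def frame_throughput(sinr1, sinr2, tau, T, bandwidth=1.0):
    """Data delivered in one frame: tau*sum log2(1+SINR1) + (T-tau)*sum log2(1+SINR2)."""
    if not 0.0 <= tau <= T:
        raise ValueError("auxiliary communication duration must lie in [0, T]")
    r1 = np.log2(1.0 + np.asarray(sinr1, dtype=float))   # rate with reflector assistance
    r2 = np.log2(1.0 + np.asarray(sinr2, dtype=float))   # rate over the direct link only
    return bandwidth * (tau * r1.sum() + (T - tau) * r2.sum())


thr = frame_throughput([3.0, 7.0], [1.0, 1.0], tau=0.4, T=1.0)
```

Here the assisted stage delivers 2 + 3 = 5 bit/s/Hz for 0.4 of the frame. The direct-link stage delivers 1 + 1 = 2 bit/s/Hz for the remaining 0.6, so the total is 3.2.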

[0051] In step 120 of some embodiments, a battery energy update function for the hybrid sub-connected reflector is obtained based on the transmit power parameter, phase shift parameter sequence, amplification factor parameter sequence, mode switching parameter sequence, and auxiliary communication duration. Stable operation of a self-powered communication system depends on precise power management, which requires the system to track battery charging and discharging accurately. The battery energy update function gives the remaining usable battery energy at the end of the current time frame. In operation, battery energy is consumed mainly during the auxiliary communication stage. This consumption has two parts. The first is the static hardware power needed to keep the shared phase-shifting circuits and shared amplification circuits running. The second is the dynamic power needed to amplify the radio frequency signal. Its magnitude is jointly determined by four factors: the number of active units (set by the mode switching parameter sequence), the amplification factors (set by the amplification factor parameter sequence), the phase shift parameter sequence, and the incident signal strength (which depends on the mobile terminals' transmit power parameters). Multiplying the resulting instantaneous power by the auxiliary communication duration gives the total energy consumption term. The system then constructs the battery energy update function, which captures the causal flow of energy through the system. It starts from the remaining battery energy of the previous time frame, adds the total ambient energy captured by the energy harvesting circuit in the current time frame, and subtracts the energy consumption term. The details are described below.

[0052] The battery energy update function of the hybrid sub-connected reflector is obtained based on the transmission power parameter, phase shift parameter sequence, amplification factor parameter sequence, mode switch parameter sequence and auxiliary communication duration, including the following steps 121 to 124.

[0053] Step 121: Based on the static power consumption of the phase-shifting circuit and the static power consumption of the amplifier in the shared amplifier circuit, and combined with the sub-array activation state indicated by the mode switch parameter sequence, calculate the total static circuit power consumption of the hybrid sub-connected reflector.

[0054] Step 122: Based on the amplifier conversion efficiency, third channel state information, transmit power parameters, phase shift parameter sequence, amplification factor parameter sequence, and input noise variance, calculate the dynamic amplification power consumption of the hybrid sub-connected reflector.

[0055] Step 123: Add the total static circuit power consumption to the dynamic amplification power consumption, and then multiply by the auxiliary communication duration to obtain the energy consumption item for the current time frame.

[0056] Step 124: Based on the remaining battery energy in the previous time frame, the environmental energy acquisition items and energy consumption items in the current time frame, and the maximum battery capacity, perform boundary constraint calculations to obtain the battery energy update function.

[0057] Steps 121 to 124 are described in detail below.

[0058] In step 121 of some embodiments, the system calculates the total static circuit power consumption of the hybrid sub-connected reflector. It uses the static power consumption of the phase-shifting circuit and the static power consumption of the shared amplifier circuit, combined with the sub-array activation state indicated by the mode switching parameter sequence. In a self-powered communication system, a circuit incurs an inherent baseline energy loss as soon as it is powered on; this is referred to as static power consumption. In the hybrid sub-connected architecture of this application, the reflection units within each sub-array physically share one set of phase-shifting and amplification hardware. Here, "phase-shifting circuit static power consumption" is the fixed loss of the RF network that maintains the phase modulation function. "Amplifier static power consumption" is the inherent baseline power of the shared amplifier circuit in standby or operation. In this step, the processor parses the mode switching parameter sequence to determine whether each reflection unit's switch is closed or open, and from this determines the activation state of each sub-array. Specifically, suppose the mode switch of at least one reflection unit in a sub-array is closed, i.e., that unit operates in active reflection mode. The shared amplifier circuit of that sub-array is then considered activated, and the static power consumption of the sub-array includes both the phase-shifting circuit static power and the amplifier static power. Conversely, if the mode switches of all units in the sub-array are open (passive reflection mode), the sub-array incurs only the static power consumption of the phase-shifting circuit.
By iterating through and accumulating the power consumption of all subarrays under these conditions, the system can accurately calculate the total static circuit power consumption of the entire physical reflective array under the current control command.

[0059] In this application, the energy model defines an indicator that marks the presence of an active unit in the l-th subarray, i.e., the activation state of the subarray. It is given by the following formula.

[0060]

[0061] Specifically, the indicator equals 1 if at least one reflecting element (RE) in the l-th subarray is in active mode, and 0 otherwise. The static circuit power consumption of the l-th subarray is therefore the static power of its phase-shifting circuit plus the static power of its amplifier weighted by this indicator. The total static circuit power consumption of the hybrid sub-connected reflector is the sum of the static circuit power consumption of all subarrays.
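The static-power rule of [0058] to [0061] can be sketched as follows. The subarray activation indicator is the maximum of its units' mode-switch states. Every subarray pays its phase-shifting static power, and only activated subarrays also pay the amplifier static power. The names `p_ps` and `p_amp` are hypothetical labels for the two static power values.

```python
import numpy as np


def static_power(mode, subarray_of, num_subarrays, p_ps, p_amp):
    """Total static circuit power of the reflector and the per-subarray activation indicator."""
    mode = np.asarray(mode, dtype=int)
    idx = np.asarray(subarray_of, dtype=int)
    # active[l] = 1 if any unit of subarray l has its mode switch closed, else 0.
    active = np.zeros(num_subarrays, dtype=int)
    np.maximum.at(active, idx, mode)
    # Phase-shifting static power for every subarray; amplifier static power only if activated.
    total = num_subarrays * p_ps + p_amp * active.sum()
    return total, active


# Three subarrays of two units; subarrays 0 and 2 each contain one active unit.
total, active = static_power([1, 0, 0, 0, 0, 1], [0, 0, 1, 1, 2, 2], 3, p_ps=0.01, p_amp=0.05)
```

`np.maximum.at` performs an unbuffered "any active unit" reduction per subarray. Turning on a second unit in an already activated subarray therefore adds no static power. This is the cost advantage of sharing one amplifier per subarray.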

[0062] In step 122 of some embodiments, the system calculates the dynamic amplification power consumption of the hybrid sub-connected reflector. The inputs are the amplifier conversion efficiency, the third channel state information, the transmit power parameters, the phase shift parameter sequence, the amplification factor parameter sequence, and the input noise variance. Static power consumption keeps the circuit running. Dynamic amplification power consumption, by contrast, is the additional power drawn in the energy conversion process when the shared amplifier circuit actually amplifies a weak radio frequency signal. In active reflection mode, the strength of the radio frequency signal arriving at the reflector depends directly on two things: the transmit power parameters of the mobile terminals, and the third channel state information (i.e., the fading of the incident link from each user to the reflector). The input signal also inevitably contains thermal noise, represented mathematically by the input noise variance. The system first computes the received power of the incident signal from these quantities. It then applies the phase shift parameter sequence and amplification factor parameter sequence issued by the controller to derive the total radio frequency power expected at the output of the amplifier circuits. Finally, the hardware incurs losses when converting DC power into radiated RF power. To account for this, the system introduces a preset amplifier conversion efficiency, defined as the ratio of RF output power to DC power consumed. Dividing the RF output power by this efficiency yields the dynamic amplification power actually consumed by the array when amplifying the signal.

[0063] In this application, the circuit power consumption of the RIS is given by the following formula. It comprises the power consumed by the phase-shifting circuits and the amplification circuits, as well as the amplification power consumed by the active units.

[0064]
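A minimal sketch of the dynamic amplification term described in [0062] follows. It assumes the common active-RIS model in which the amplifiers' RF output power consists of two parts: the amplified incident signal power sum_j p_j ||Psi_act h_r,j||^2 and the amplified input noise sigma^2 ||Psi_act||_F^2. Dividing by the conversion efficiency eta converts this output power to DC power. The exact expression of [0064] is not reproduced here, and the function and parameter names are hypothetical.

```python
import numpy as np


def dynamic_power(Psi, mode, H_r, p, sigma2_in, eta):
    """DC power drawn by the active units' amplifiers (assumed model).

    Psi       (N, N) diagonal reflection-coefficient matrix
    mode      (N,)   1 = active unit, 0 = passive unit
    H_r       (N, J) user-to-reflector channels (third CSI)
    p         (J,)   user transmit powers
    sigma2_in        input noise variance at the reflector
    eta              amplifier conversion efficiency (RF output / DC input)
    """
    Psi_act = Psi @ np.diag(np.asarray(mode, dtype=float))    # keep active units only
    p = np.asarray(p, dtype=float)
    # Amplified signal power summed over users: sum_j p_j * ||Psi_act h_r,j||^2
    signal_out = np.sum(p * np.linalg.norm(Psi_act @ H_r, axis=0) ** 2)
    # Amplified input noise: sigma^2 * ||Psi_act||_F^2
    noise_out = sigma2_in * np.linalg.norm(Psi_act, "fro") ** 2
    return (signal_out + noise_out) / eta


Psi = np.diag([2.0, 1j])           # unit 0 amplifies by 2, unit 1 is passive with phase pi/2
P_dyn = dynamic_power(Psi, [1, 0], np.ones((2, 1)), [0.5], sigma2_in=0.1, eta=0.8)
```

Under this model the phase shifts have unit magnitude, so they do not change the output power; only the amplification factors of the active units and the incident power matter. In the example, (0.5 x 4 + 0.1 x 4) / 0.8 = 3.0.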

[0065] In step 123 of some embodiments, the system adds the total static circuit power consumption to the dynamic amplification power consumption and multiplies the sum by the auxiliary communication duration. Because energy is the product of power and time, this gives the energy consumption item for the current time frame. The processor first adds the two power components. The total static circuit power consumption keeps the underlying hardware running, and the dynamic amplification power consumption provides the RF signal gain. Their sum is the total instantaneous operating power of the hybrid sub-connected reflector in the working state. The system then takes the auxiliary communication duration, i.e., the time that the adaptive communication protocol of this application allocates to signal reflection and amplification. Multiplying the total instantaneous operating power by this duration gives the total energy the self-powered reflector array draws from its internal battery for communication enhancement during the auxiliary communication window of the current time frame. This is the energy consumption item output by this step.
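Step 123 reduces to a single product, sketched below for completeness. The function name and the units (watts and seconds, giving joules) are illustrative assumptions.

```python
def energy_consumption(p_static, p_dynamic, tau):
    """Energy consumption item of one frame: (P_static + P_dynamic) * tau.

    With powers in watts and tau in seconds, the result is in joules.
    """
    if tau < 0:
        raise ValueError("auxiliary communication duration must be non-negative")
    return (p_static + p_dynamic) * tau


# e.g. 0.13 W static + 3.0 W dynamic over a 0.4 s auxiliary window
e_cons = energy_consumption(0.13, 3.0, 0.4)
```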

[0066] In step 124 of some embodiments, the system performs a boundary-constrained calculation to obtain the battery energy update function. The inputs are the remaining battery energy of the previous time frame, the environmental energy acquisition item and energy consumption item of the current time frame, and the maximum battery capacity. The battery of a self-powered device behaves like a reservoir in dynamic balance: the level of stored energy determines whether the system keeps operating or shuts down. The battery energy update function models this cross-frame energy flow. In deriving it, the system first reads the energy actually retained at the end of the previous time frame, i.e., the remaining battery energy of the previous time frame. It then adds the environmental energy acquisition item, which represents energy inflow. This is the energy captured from the natural or radio frequency environment by the energy harvesting circuit within the current time frame. It also subtracts the energy consumption item, which represents energy outflow: the energy lost to driving the circuits and amplifying the signal. The model must also respect the storage limit of the real hardware. The system therefore introduces the battery's maximum capacity as an upper limit and takes the smaller of the computed energy and this capacity. If the computed stored energy exceeds the battery's physical capacity, the excess is discarded and the stored energy is truncated to the maximum capacity. The resulting closed-form battery energy update function predicts the battery energy actually available at the end of the current time frame.

[0067] In this application, let E(t) represent the remaining charge of the rechargeable battery of the self-powered RIS at the beginning of the t-th time frame, i.e., the state of energy. Assume that in the (t+1)-th time frame, the RIS can only consume the charge accumulated in the t-th time frame and before. Therefore, the charge consumption of the RIS in the t-th time frame must not exceed its initial battery charge, as shown in the following formula.

[0068]

[0069] Further derivation shows that the charge of the rechargeable battery at the beginning of the (t+1)th time frame can be expressed as the following formula, which is the battery energy update function.

[0070]

[0071] Here, the harvested-energy term denotes the amount of energy collected within the t-th time frame, and the capacity term denotes the battery's maximum capacity.
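The energy causality constraint of [0067] and the update function of [0069] to [0071] can be sketched as follows. The argument names are hypothetical; `e_now` plays the role of E(t).

```python
def battery_update(e_now, e_cons, e_harv, e_max):
    """Battery energy at the start of frame t+1 (sketch).

    e_now  -- E(t), energy stored at the start of frame t
    e_cons -- energy consumption item of frame t
    e_harv -- energy harvested during frame t (usable only from frame t+1)
    e_max  -- maximum battery capacity
    """
    # Energy causality: frame t may only spend what was stored before it began.
    if e_cons > e_now:
        raise ValueError("energy causality violated: consumption exceeds stored energy")
    # Energy above the physical capacity is discarded.
    return min(e_now - e_cons + e_harv, e_max)


print(battery_update(5.0, 2.0, 1.0, 10.0))   # prints 4.0
print(battery_update(9.5, 0.5, 3.0, 10.0))   # prints 10.0 (truncated at capacity)
```

Harvested energy enters only the next frame's budget, so the causality check compares consumption with `e_now` alone and not with `e_now + e_harv`.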

[0072] Steps 121 to 124 decompose the power consumption of the architecture into two parts. The static part is tied to the activation state of each subarray. The dynamic amplification part depends on the channel characteristics and the amplification gains. Integrating the total power over the auxiliary communication duration allocated by the time-domain schedule reproduces the power consumption of this hardware architecture under different combinations of actions. The system also incorporates the harvested ambient energy and bounds the result by the maximum battery capacity. This yields a battery energy update function that obeys energy conservation and the storage limit of the hardware. The function serves as an accurate simulation environment that provides reliable state feedback for the long-term scheduling performed by the subsequent deep reinforcement learning algorithm. It ensures that any parameter optimization strategy output by the algorithm stays within the actual capacity of the battery hardware. This supports long-term, uninterrupted, and stable operation of the self-powered communication system in energy-constrained environments.

[0073] Step 130: Based on the first decoding signal-to-noise ratio function, the second decoding signal-to-noise ratio function, and the auxiliary communication duration, construct a transmission parameter optimization problem with the goal of maximizing the long-term throughput of the system.

[0074] Step 130 will be described in detail below.

[0075] In step 130 of some embodiments, a transmission parameter optimization problem is constructed based on the first decoding signal-to-noise ratio (SNR) function, the second decoding SNR function, and the auxiliary communication duration, with the goal of maximizing the long-term system throughput. System throughput refers to the amount of effective data successfully transmitted by the network per unit time, and is the most direct indicator for evaluating the capacity and service quality of a communication system. To achieve optimal network performance, the system needs to comprehensively consider the transmission rates of different communication stages. Specifically, based on Shannon's theorem, the system uses the aforementioned first and second decoding SNR functions to calculate the instantaneous data transmission rates in the auxiliary communication stage and the pure energy harvesting stage, respectively. Subsequently, using the auxiliary communication duration and remaining duration as time weights, the data rates of these two stages are weighted and summed over the entire time frame to obtain the total system data transmission volume for the current single frame. Furthermore, considering the randomness of energy acquisition in a self-powered environment, pursuing the optimality of a single communication often leads to subsequent power depletion and disconnection. Therefore, the system performs cumulative expectation calculation on the total data transmission volume over multiple consecutive time frames, and combines the physical constraint that battery energy cannot be overdrawn and the maximum power of the device to rigorously construct a complex nonlinear transmission parameter optimization problem at the mathematical level, which aims to maximize the long-term throughput of the system, as described below.

[0076] The process involves constructing a transmission parameter optimization problem with the goal of maximizing the long-term throughput of the system, based on the first decoding signal-to-noise ratio function, the second decoding signal-to-noise ratio function, and the auxiliary communication duration. This includes the following steps 131 to 135.

[0077] Step 131: Calculate the energy consumption of the hybrid sub-connected reflector in the current time frame, obtain the current available energy of the battery, and construct energy causal constraints based on the numerical relationship between the energy consumption and the current available energy of the battery.

[0078] Step 132: Obtain the auxiliary communication duration and the total duration of the current time frame. Based on the numerical relationship between the auxiliary communication duration and the current time frame, construct the duration boundary constraints.

[0079] Step 133: Based on the first decoding signal-to-noise ratio function and the second decoding signal-to-noise ratio function, calculate the total amount of data uploaded by a single user in the auxiliary communication stage and the pure energy harvesting stage, respectively, and construct service quality constraints based on the numerical relationship between the total amount of data uploaded by a single user and the minimum service quality threshold.

[0080] Step 134: Obtain the maximum transmit power threshold, the maximum amplification threshold, and the discrete operating space. Based on the numerical relationship between the transmit power parameter and the maximum transmit power threshold, the numerical relationship between the amplification coefficient parameter sequence and the maximum amplification threshold, and the numerical relationship between the mode switch parameter sequence and the discrete operating space, construct the physical constraints of the device.

[0081] Step 135: Based on the first decoding signal-to-noise ratio function, the second decoding signal-to-noise ratio function, the auxiliary communication duration, the energy causal constraint, the duration boundary constraint, the quality of service constraint, and the equipment physical constraint, construct a transmission parameter optimization problem with the goal of maximizing the long-term throughput of the system.

[0082] Steps 131 to 135 are described in detail below.

[0083] In step 131 of some embodiments, the system calculates the energy consumption of the hybrid sub-connected reflector in the current time frame, obtains the current available battery energy, and constructs an energy causal constraint based on the numerical relationship between the energy consumption and the current available battery energy. In a self-powered communication system, the "energy causal constraint" is the core physical law ensuring the system's continuous operation. Its essence is that the amount of electricity consumed by the system at any given time must not exceed the amount of electricity it has already acquired and stored. Specifically, in this step, the processor first calculates the overall energy consumption of the hybrid sub-connected reflector during the auxiliary communication phase to maintain the shared phase-shifting circuit, the shared amplification circuit, and the operation of each reflector unit in active mode, based on decision parameters. Then, the system reads the current available battery energy, which is the actual remaining physical amount of electricity in the battery unit at the end of the previous time frame. By strictly limiting the energy consumption to be less than or equal to the current available battery energy in the mathematical model, the system successfully constructs the energy causal constraint. This constraint mathematically restricts the subsequent parameter optimization, fundamentally avoiding the serious risk of excessive discharge caused by the algorithm's pursuit of extreme performance in a single communication, which could lead to a complete power outage and system crash of the reflector.

[0084] In step 132 of some embodiments, the system obtains the auxiliary communication duration and the total duration of the current time frame, and constructs duration boundary constraints based on the numerical relationship between the auxiliary communication duration and the current time frame. Under the adaptive auxiliary communication protocol proposed in this application, a complete fixed time frame is dynamically divided into two different stages: one for communication and the other for energy harvesting. This requires that the division of time parameters must conform to the basic physical time evolution logic. The system extracts the dynamic optimization variable output by the intelligent model, namely the auxiliary communication duration, and simultaneously obtains the total duration of the current time frame, which is predefined in the system parameters. By limiting the auxiliary communication duration to be greater than or equal to zero and simultaneously less than or equal to the total duration of the current time frame (i.e., mathematically expressed as duration greater than or equal to 0 and less than or equal to T), the system constructs duration boundary constraints. The existence of this constraint ensures that the scheduling and allocation of time resources by the system are physically reasonable and meaningful, avoiding the generation of invalid action instructions that exceed the actual physical frame length limit when the optimization algorithm explores the state space.

[0085] In step 133 of some embodiments, the system calculates the total amount of data uploaded by a single user during the assisted communication phase and the pure energy harvesting phase, respectively, based on the first decoding signal-to-noise ratio (SNR) function and the second decoding SNR function. Based on the numerical relationship between the total amount of data uploaded by a single user and the minimum quality of service (QoS) threshold, a QoS constraint is constructed. In multi-user wireless communication networks, QoS is a crucial indicator for measuring whether each mobile terminal can obtain a basic communication experience. It prevents network resources from being excessively tilted to maximize total throughput, causing some edge users to be "starved" and disconnected. In this step, the system uses the first decoding SNR function combined with the assisted communication duration to calculate the amount of data transmitted by the mobile terminal with reflector enhancement assistance; simultaneously, it uses the second decoding SNR function combined with the duration of the pure energy harvesting phase to calculate the amount of data transmitted by the terminal solely through the direct link. Adding these two parts yields the total amount of data uploaded by a single user within that time frame. Subsequently, the system obtains a preset minimum service quality threshold and constructs a service quality constraint by limiting the total amount of data uploaded by each mobile terminal to be greater than or equal to this minimum service quality threshold. This forces the network to take into account and guarantee the basic communication connectivity needs of all individual users while pursuing the maximization of the overall system throughput.
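The quality of service check of step 133 can be sketched as follows. It assumes that each user's data volume is the duration-weighted sum of the Shannon rates of the two stages, as described in [0085]. `d_min` stands for the minimum service quality threshold, and `bandwidth` is a hypothetical normalization parameter. Both names are illustrative.

```python
import numpy as np


def qos_satisfied(sinr1, sinr2, tau, T, d_min, bandwidth=1.0):
    """Return (all users meet d_min, per-user data volume) for one frame."""
    r1 = np.log2(1.0 + np.asarray(sinr1, dtype=float))   # reflector-assisted stage
    r2 = np.log2(1.0 + np.asarray(sinr2, dtype=float))   # direct-link-only stage
    data = bandwidth * (tau * r1 + (T - tau) * r2)       # per-user total for the frame
    return bool(np.all(data >= d_min)), data


ok, data = qos_satisfied([3.0, 1.0], [1.0, 0.0], tau=0.5, T=1.0, d_min=0.5)
```

The constraint is checked per user, not on the sum. In the example, user 1 receives only 0.5 bit/Hz, so raising `d_min` to 0.6 would make the frame infeasible even though the total throughput is unchanged.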

[0086] In step 134 of some embodiments, the system obtains the maximum transmit power threshold, the maximum amplification threshold, and the discrete operating space. It then constructs the device physical constraints from three numerical relationships:

- between the transmit power parameter and the maximum transmit power threshold,
- between the amplification factor parameter sequence and the maximum amplification threshold,
- between the mode switch parameter sequence and the discrete operating space.

Deployed hardware is bounded by the physical limits of its components and cannot output radio frequency energy or gain without limit. The transmit power parameter of the mobile terminal's radio frequency front end is limited by the power amplifier. It cannot exceed the maximum transmit power threshold permitted by the hardware. The shared amplification circuit in the hybrid sub-connected reflector is limited by the saturation region of its electronic components. Its amplification factor parameter sequence therefore cannot exceed the maximum amplification threshold. Each mode switching switch has only two physical states, closed and open. Its mode switch parameter sequence must therefore be restricted to a discrete operating space containing only the values 0 and 1. Bounding each continuous or discrete parameter by its physical limit gives the device physical constraints. These constraints ensure that every control strategy the system finally outputs lies within the execution capability of the real hardware.
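The function below sketches a feasibility check for the device physical constraints of step 134. Its name and argument layout are hypothetical. The non-negativity check on transmit power is an added assumption; the application itself states only the upper bounds and the {0, 1} switch space.

```python
import numpy as np


def device_constraints_ok(p_tx, amp, switches, p_max, rho_max):
    """Hypothetical check of the device physical constraints of step 134.

    p_tx:     transmit powers of the mobile terminals
    amp:      amplification factors of the shared amplification circuits
    switches: mode switch states, one per reflective unit
    """
    p_tx = np.asarray(p_tx, dtype=float)
    amp = np.asarray(amp, dtype=float)
    switches = np.asarray(switches)
    # Power must stay within the hardware limit (non-negativity is assumed)
    power_ok = np.all((p_tx >= 0.0) & (p_tx <= p_max))
    # Amplification is capped by the saturation limit of the shared amplifier
    amp_ok = np.all(amp <= rho_max)
    # A switch is either open or closed: only 0 and 1 are allowed
    switch_ok = np.all(np.isin(switches, (0, 1)))
    return bool(power_ok and amp_ok and switch_ok)
```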

[0087] In step 135 of some embodiments, the system constructs a transmission parameter optimization problem aimed at maximizing the long-term throughput of the system. The problem is built from the following elements:

- the first decoding signal-to-noise ratio (SNR) function and the second decoding SNR function,
- the auxiliary communication duration,
- the energy causality constraint,
- the duration boundary constraint,
- the quality of service (QoS) constraint,
- the device physical constraints.

With the preliminary modeling complete, the system assembles these models and constraints into one problem. The first decoding SNR function is weighted by the auxiliary communication duration, and the second by the pure energy harvesting phase duration. Together they characterize the total data transmitted within a single frame. The expectation of this quantity is then accumulated over multiple time frames to give the objective function, namely maximizing the long-term throughput of the system. The constraints established in the previous steps serve as the hard boundaries of this objective function. The result is a comprehensive and realistic transmission parameter optimization problem. It provides an accurate virtual twin environment for the subsequent solution algorithm.
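For orientation, the general shape of the problem assembled in step 135 can be written as below. This is a hedged sketch, not formula (P1) of this application. Several elements are assumptions introduced here:

- the per-user data D_j(t) and the duration τ_t of frame t,
- the first- and second-phase decoding SNRs γ^(1)_{j,t} and γ^(2)_{j,t},
- the bandwidth B and the action a_t,
- the Shannon-rate form and the time-average form of the long-term objective.

```latex
\begin{aligned}
D_j(t) &= B\Big[\tau_t \log_2\big(1+\gamma^{(1)}_{j,t}\big)
        + (T-\tau_t)\log_2\big(1+\gamma^{(2)}_{j,t}\big)\Big],\\
\max_{\{\mathbf{a}_t\}}\ & \lim_{N\to\infty}\frac{1}{N}\,
  \mathbb{E}\Big[\sum_{t=1}^{N}\sum_{j=1}^{J} D_j(t)\Big]\\
\text{s.t.}\ & \text{energy causality constraint (battery energy update)},\\
 & 0 \le \tau_t \le T,\\
 & D_j(t) \ge Q_{\min},\quad j = 1,\dots,J,\\
 & \text{device physical constraints}.
\end{aligned}
```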

[0088] In this application, the goal of the self-powered RIS-assisted uplink communication system is to maximize its long-term throughput. This throughput is the mathematical expectation of the total amount of information uploaded by all users to the base station. The maximization is subject to the RIS energy consumption constraints and the users' upload QoS requirements. The optimization variables are:

- the operating mode indicator vector,
- the amplifier amplification factor vector,
- the phase shifts of the L groups of reflection units,
- the first-stage base station beamforming vectors,
- the second-stage base station beamforming vectors,
- the transmit powers of the J users in the first phase,
- the transmit powers of the J users in the second phase,
- the communication duration.

Collecting these variables, the corresponding transmission parameter optimization problem can be expressed by the following formula.

[0089]

[0090] In problem (P1), C1 to C9 are the constraints described above:

- C1 describes the energy state evolution of the rechargeable battery.
- C2 is the energy consumption constraint for time frame t.
- C3 is the discrete constraint on mode switching.
- C4 is the discrete constraint on the phase shift values.
- C5 bounds the amplification factor by its maximum value ρ_max.
- C6 sets the upper limit p_max on user transmit power.
- C7 enforces the QoS requirement through Q_min, the minimum number of data bits that each user must upload to the base station in each time frame.
- C8 imposes the beamforming normalization constraint at the base station during the reception phase.
- C9 specifies the duration constraint on RIS-assisted communication within each time frame.

[0091] Steps 131 to 135 tie the abstract long-term throughput objective to the concrete physical environment. That environment includes the dynamic flow of battery energy, the power limits of hardware devices, the irreversible progression of physical time, and the basic communication rights of each user. As a result, the mathematical model reproduces the operating boundaries of the real hybrid sub-connected reflector system. The energy, duration, QoS, and device physical constraints are decomposed and constructed separately. This keeps the policies later optimized by the deep reinforcement learning model consistent with physical laws. It avoids policies that are "theoretically optimal but would crash the hardware". It also lets the self-powered communication system robustly and fairly exploit the full potential of the subarray-shared hardware without overdrawing its limited battery energy. Communication reliability and long-term network throughput are thereby maximized.

[0092] Step 140: Decouple the transmission parameter optimization problem into a lower-level beamforming optimization sub-problem and an upper-level resource joint optimization sub-problem based on the beamforming parameters.

[0093] Step 140 will be described in detail below.

[0094] In step 140 of some embodiments, the transmission parameter optimization problem is decoupled, based on the beamforming parameters, into two sub-problems. These are a lower-level beamforming optimization sub-problem and an upper-level resource joint optimization sub-problem. The problem constructed in the preceding steps involves many variables of different types. Some are continuous (such as amplification factor, transmit power, duration, and beamforming). Others are discrete (such as the Boolean mode switching states). This mixture makes the problem highly non-convex and difficult to solve directly with traditional algorithms. To reduce computational complexity and make the problem solvable, the system adopts a decoupling strategy. It breaks a large, tightly coupled problem into smaller interconnected problems that can each be solved with a different method.

The lower-level sub-problem contains the beamforming parameters. These apply only to the base station receiver and are continuous. Given the other network parameters, this sub-problem has a closed-form optimal solution under a standard criterion such as minimum mean square error (MMSE). The remaining control variables are closely tied to the allocation of reflector resources. They are phase shift, amplification, switching, duration, and transmit power. These are grouped into the upper-level sub-problem, which is solved by long-term policy exploration with artificial intelligence algorithms. This greatly simplifies the solution of the optimization problem.

[0095] In this application's scheme, problem (P1) has three complicating features: the quantity … in C1 is random, energy scheduling across time frames is coupled, and some optimization variables are discrete. Problem (P1) is therefore a high-dimensional stochastic programming and mixed-integer programming problem. Moreover, the optimization variables are numerous and tightly coupled with one another in constraints C2 and C7. This makes (P1) difficult to solve with traditional optimization methods.

[0096] To address this problem, this application proposes a hierarchical deep reinforcement learning approach. Problem (P1) is decomposed into two sub-problems: an upper-level resource joint optimization sub-problem and a lower-level beamforming optimization sub-problem. For given upper-level variables, the lower-level sub-problem optimizes the base station receive beamforming vectors. The upper-level sub-problem then optimizes the remaining variables, based on the optimal solution of the lower-level sub-problem. These remaining variables are the operating modes, amplification factors, phase shifts, user transmit powers, and auxiliary communication duration.

[0097] When the upper-level variables are given, problem (P1) reduces to the lower-level beamforming optimization sub-problem shown in formula (P1-1) below.

[0098]

[0099] Problem (P1-1) can be further transformed into the following deterministic problem. This is the lower-level beamforming optimization sub-problem (P1-1').

[0100]

[0101] Problem (P1-1') is a classic convex optimization problem whose solution can be obtained analytically. The MMSE-based receive beamforming vector of user j in the first stage (i.e., the auxiliary communication stage) is given by the following formula.

[0102]

[0103]

[0104] The MMSE-based receive beamforming vector of user j in the second stage (i.e., the pure energy harvesting stage) is given by the following formula.

[0105]

[0106]
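The standard MMSE (SINR-maximizing) receive combiner can be sketched with numpy as below. The sketch assumes each user's effective channel to the base station is known. In the first stage this would be the RIS-assisted channel plus the direct link. In the second stage it would be the direct link only. The sketch ignores the noise amplified and forwarded by the active reflection units, which the formulas of this application may include. It is therefore not necessarily the exact expression used here. The combiners are scaled to unit norm, in the spirit of the normalization constraint C8. Function names are hypothetical.

```python
import numpy as np


def mmse_receive_beamformers(H, p, noise_var):
    """MMSE receive combiners, one per user.

    H: (M, J) complex array; column j is the effective channel of user j
       at the M base station antennas (assumed known).
    p: (J,) transmit powers of the J users.
    noise_var: receiver noise power.
    Returns W of shape (M, J) with unit-norm columns.
    """
    H = np.asarray(H, dtype=complex)
    p = np.asarray(p, dtype=float)
    M, J = H.shape
    # Total received covariance: sum_i p_i h_i h_i^H + noise_var * I
    R = (H * p) @ H.conj().T + noise_var * np.eye(M)
    W = np.empty((M, J), dtype=complex)
    for j in range(J):
        h = H[:, j]
        # Remove user j's own term to get its interference-plus-noise covariance
        R_j = R - p[j] * np.outer(h, h.conj())
        w = np.linalg.solve(R_j, h)
        W[:, j] = w / np.linalg.norm(w)  # unit norm, cf. constraint C8
    return W


def sinr(H, p, noise_var, W):
    """Post-combining SINR of every user for the combiners W."""
    H = np.asarray(H, dtype=complex)
    p = np.asarray(p, dtype=float)
    G = np.abs(W.conj().T @ H) ** 2  # G[j, i] = |w_j^H h_i|^2
    signal = np.diag(G) * p
    interference = G @ p - signal
    noise = noise_var * np.sum(np.abs(W) ** 2, axis=0)
    return signal / (interference + noise)
```

Because the MMSE combiner maximizes each user's SINR, it should never do worse than simple maximum-ratio combining (`H / np.linalg.norm(H, axis=0)`) on the same channels.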

[0107] Substituting the solution of the lower-level sub-problem (P1-1) into problem (P1) yields the upper-level resource joint optimization sub-problem shown in formula (P1-2) below.

[0108]

[0109] Problem (P1-2) is still a complex dynamic programming and mixed-integer programming problem, so this application solves it with deep reinforcement learning (DRL). First, problem (P1-2) is modeled as a Markov decision process (MDP). The action space of this MDP mixes discrete and continuous actions. To handle this, the application proposes a hybrid proximal policy optimization (HPPO) DRL algorithm based on a hybrid actor-critic (H-AC) neural network architecture. The algorithm solves the decision process and trains the deep neural networks. The trained networks then provide the solution to problem (P1-2), as described below.

[0110] Step 150: Construct an initial transmission parameter optimization model based on the upper-level resource joint optimization sub-problem and the battery energy update function, and perform multiple rounds of training on the initial transmission parameter optimization model to obtain the transmission parameter optimization model.

[0111] Step 150 will be described in detail below.

[0112] In step 150 of some embodiments, an initial transmission parameter optimization model is constructed based on the upper-level resource joint optimization sub-problem and the battery energy update function. The initial model is trained over multiple rounds to obtain the final transmission parameter optimization model. The upper-level sub-problem requires energy scheduling across time frames and has the typical characteristics of a Markov decision process (MDP). The system therefore solves it with deep reinforcement learning (DRL). First, the system combines the battery state and the real-time channel state into the environment state space; the battery state is described by the battery energy update function. The throughput target is used as the reinforcement learning reward function. On this basis, the system constructs an initial model containing deep neural networks, such as actor networks and a critic network. In the simulation training environment, the initial model then repeatedly explores actions, observes states, and collects rewards. Over many rounds of training and experience replay, the network parameters are refined by gradient updates until the model converges. The resulting model adapts to the environment and supports online inference. For the system state observed at any time, it can immediately output a set of optimized resource configuration parameter sequences.

[0113] Constructing the initial transmission parameter optimization model based on the upper-level resource joint optimization sub-problem and the battery energy update function includes the following steps 151 to 155.

[0114] Step 151: Determine the state space based on the first channel state information, the second channel state information, the third channel state information, and the energy state output by the battery energy update function.

[0115] Step 152: Determine the action space based on the amplification factor parameter sequence, transmission power parameter, auxiliary communication duration, mode switch parameter sequence, and phase shift parameter sequence.

[0116] Step 153: Based on the system's long-term throughput and quality of service constraints in the upper-level resource joint optimization subproblem, construct the throughput reward function.

[0117] Step 154: Determine the initial commenter parameters of the commenter network, the initial first actor parameters of the first actor network, and the initial second actor parameters of the second actor network.

[0118] Step 155: Based on the state space, action space, throughput reward function, initial commenter parameters, initial first actor parameters, and initial second actor parameters, obtain the initial transmission parameter optimization model.

[0119] Steps 151 to 155 are described in detail below.

[0120] In this application, deep reinforcement learning is used to solve the upper-level resource joint optimization sub-problem (P1-2), and the base station acts as the agent that learns the optimal policy. To model (P1-2) as a Markov decision process, three elements must be defined: the state space, the action space, and the reward function r. The state space is the set of all possible states, and the action space is the set of all possible actions. For problem (P1-2), these elements are described as follows.

[0121] In step 151 of some embodiments, the system determines the state space from four inputs: the first channel state information, the second channel state information, the third channel state information, and the energy state output by the battery energy update function. In a deep reinforcement learning (DRL) framework, the state space is the set of environmental features that the agent can observe before making a decision. Here the agent is the control unit of the base station. Because this application deals with dynamic resource allocation across time frames, the system must track changes in the physical environment.

The three kinds of channel state information describe the wireless environment and reflect the propagation fading of electromagnetic waves:

- the first channel state information covers the direct link,
- the second channel state information covers the reflection link,
- the third channel state information covers the incident link.

The latest available battery energy, output by the battery energy update function, is the energy state. It serves as the observation feature of the hardware. The system concatenates and normalizes these multi-dimensional dynamic variables to construct the complete state space. This lets the model observe both the communication capability and the remaining energy of the system in the current time frame.

[0122] In the scheme of this application, the state space should contain as much environmental information as possible. In the self-powered RIS-assisted communication system studied here, the state consists of two parts: the channel state information (CSI) of all channels and the available energy state of the rechargeable battery. Accordingly, the instantaneous state at the t-th time step of training is defined as the CSI of all channels together with the available energy state of the rechargeable battery, i.e., … .
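The concatenation and normalization of step 151 could look like the sketch below. It assumes complex channel matrices and a known battery capacity for normalizing the energy state. The array shapes, the max-magnitude scaling, and `build_state` are illustrative assumptions; the application does not fix a normalization rule.

```python
import numpy as np


def build_state(h_direct, h_reflect, h_incident, battery_energy, battery_capacity):
    """Hypothetical state builder for step 151.

    h_direct, h_reflect, h_incident: complex CSI arrays of the direct,
    reflection and incident links; battery_energy: available energy.
    """
    parts = []
    for h in (h_direct, h_reflect, h_incident):
        h = np.asarray(h, dtype=complex).ravel()
        # Neural networks take real inputs: split each complex gain in two
        parts.extend([h.real, h.imag])
    csi = np.concatenate(parts)
    scale = np.max(np.abs(csi))
    if scale > 0:
        csi = csi / scale  # largest CSI magnitude becomes 1
    energy = np.array([battery_energy / battery_capacity])  # fraction of capacity
    return np.concatenate([csi, energy]).astype(np.float32)
```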

[0123] In step 152 of some embodiments, the system determines the action space from five parameters: the amplification factor parameter sequence, the transmission power parameter, the auxiliary communication duration, the mode switching parameter sequence, and the phase shift parameter sequence. Complementing the state space, the action space is the set of all legal decisions the agent can take after observing the environment state. This application proposes a hybrid sub-connected architecture and an adaptive timing protocol. Its decision variables are therefore high-dimensional and mix discrete and continuous types.

The continuous variables are:

- the amplification factor parameter sequence, which regulates the reflector hardware,
- the transmission power parameter of the terminal transmitter,
- the auxiliary communication duration, which governs time-domain scheduling.

The discrete variables, some of them Boolean, are:

- the mode switching parameter sequence, which controls the on/off state of the mode switching switches,
- the quantized phase shift parameter sequence.

By specifying the physical limits and dimensions of these parameters, the system establishes a hybrid action space. This space defines the boundaries of the model's control over the communication hardware and timing.

[0124] In this application, the optimization variables of the upper-level resource joint optimization sub-problem (P1-2) that relate to the action space are:

- the operating mode indicator vector,
- the amplification coefficient vector,
- the phase shift matrix,
- the first-stage base station beamforming vectors,
- the second-stage base station beamforming vectors,
- the first-phase user transmit power vector,
- the second-phase user transmit power vector,
- the RIS-assisted communication duration.

Five of these map onto the parameters of step 152:

- the operating mode indicator vector corresponds to the mode switching parameter sequence,
- the amplification coefficient vector corresponds to the amplification factor parameter sequence,
- the phase shift matrix corresponds to the phase shift parameter sequence,
- the user transmit power vectors correspond to the transmission power parameter,
- the RIS-assisted communication duration corresponds to the auxiliary communication duration.

Therefore, the instantaneous action at the t-th time step can be defined as … .

[0125] Furthermore, to satisfy the constraints of problem (P1-2), the action taken at the t-th time step should meet the following requirements:

1) To satisfy constraints C2 and C9, the RIS-assisted communication duration should satisfy … .
2) To satisfy constraint C3, every element of the operating mode indicator vector should be 0 or 1.
3) To satisfy constraint C4, every phase shift should take a value from the discrete phase shift set.
4) To satisfy constraint C5, every element of the amplification coefficient vector should not exceed ρ_max.
5) To satisfy constraint C6, every element of the first-phase and second-phase user transmit power vectors should not exceed p_max.

Constraint C7 is handled in the definition of the reward function. This ensures that the actions taken by the agent after learning the optimal policy satisfy it.
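Requirements 2) to 5) above can be enforced by construction when raw actor outputs are mapped to an action. The sketch below is hypothetical throughout and rests on these assumptions:

- the continuous outputs lie in [-1, 1] (for example after a tanh) and are rescaled linearly,
- there is one phase index and one amplification factor per reflection-unit group,
- the phase set is 2^b uniformly spaced levels,
- the amplification lower bound is 0, because the application states only the upper bound ρ_max.

The energy constraint C2 is not handled here, and the duration is only kept inside [0, T].

```python
import numpy as np


def map_to_feasible(mode_bits, phase_idx, cont, n_users, T, rho_max, p_max, phase_bits):
    """Hypothetical mapping of raw actor outputs onto the constraint sets.

    mode_bits: integers from the discrete actor, one per reflective unit
    phase_idx: integers, one per reflection-unit group (L groups)
    cont:      continuous outputs in [-1, 1], laid out as
               [amplification (L) | powers phase 1 (J) | powers phase 2 (J) | duration]
    """
    L = len(phase_idx)
    mode = (np.asarray(mode_bits) > 0).astype(int)                  # C3: values in {0, 1}
    levels = 2 ** phase_bits
    phase = 2.0 * np.pi * (np.asarray(phase_idx) % levels) / levels  # C4: discrete phases
    u = (np.clip(np.asarray(cont, dtype=float), -1.0, 1.0) + 1.0) / 2.0  # into [0, 1]
    if u.size != L + 2 * n_users + 1:
        raise ValueError("unexpected number of continuous outputs")
    amp = rho_max * u[:L]                                  # C5: at most rho_max
    p1 = p_max * u[L:L + n_users]                          # C6: first-phase powers
    p2 = p_max * u[L + n_users:L + 2 * n_users]            # C6: second-phase powers
    tau = T * u[L + 2 * n_users]                           # C9 part: 0 <= tau <= T
    return {"mode": mode, "phase": phase, "amp": amp,
            "p1": p1, "p2": p2, "tau": float(tau)}
```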

[0126] In step 153 of some embodiments, the system constructs a throughput reward function based on the long-term throughput and the QoS constraint in the upper-level resource joint optimization sub-problem. In reinforcement learning, the reward function is the only evaluation signal guiding the evolution of the neural network. It acts as the "compass" of model training. The core objective of this scheme is to maximize the long-term throughput of the system. The total data rate obtained after performing an action therefore forms the basic positive part of the reward. However, the model could sacrifice users with poor edge channels in pursuit of maximum total throughput, converging to a "selfish" locally optimal strategy. To prevent this, the system turns the QoS constraint into a penalty term within the reward. If an action causes the transmission of any mobile terminal to fall below the minimum QoS threshold, the system applies a severe numerical penalty. For example, the reward may be set to zero or to a large negative value. This combination of positive incentives and penalties forms the throughput reward function. It forces the model to balance global rate maximization against fairness for individual users during exploration.

[0127] In this application, when deep reinforcement learning is used to find the optimal solution of an optimization problem, the reward function is usually defined from that problem's objective function. The objective of the upper-level resource joint optimization sub-problem (P1-2) is the long-term throughput of the system. The immediate reward at the t-th time step is therefore the amount of information received by the base station in the t-th time frame, which follows from the objective function of (P1-2). The action taken at the t-th time step may violate constraint C7. The throughput reward function of the MDP for problem (P1-2) therefore accounts for this, as defined by the following formula.

[0128]
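The reward sketch below follows the description of step 153. The total bits received in the frame form the positive reward. Any QoS violation replaces it with a fixed penalty value, zero by default or a large negative number. The exact reward formula of this application is not reproduced here. The function name and the replace-with-penalty rule are assumptions.

```python
import numpy as np


def throughput_reward(bits_per_user, q_min, penalty=0.0):
    """Hypothetical reward: total bits received by the base station in the
    frame, replaced by `penalty` if any user misses the QoS threshold."""
    bits = np.asarray(bits_per_user, dtype=float)
    if np.any(bits < q_min):
        return float(penalty)  # constraint C7 violated
    return float(bits.sum())
```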

[0129] In step 154 of some embodiments, the system determines three sets of initial parameters:

- the initial commenter parameters of the commenter network,
- the initial first actor parameters of the first actor network,
- the initial second actor parameters of the second actor network.

For a mixed action space with both discrete and continuous variables, a single conventional neural network struggles to output both types of decision values stably at the same time. This application therefore uses a dual-actor architecture, as in Hybrid Proximal Policy Optimization (HPPO). The first actor network outputs discrete actions, such as mode switching and phase shifts. The second actor network outputs continuous actions, such as amplification factor, transmit power, and duration. A commenter (critic) network evaluates the value of the current state and guides the gradient updates of both actor networks. In this step, the system initializes the weights and biases of the three networks, either randomly or from prior knowledge. This establishes the three sets of initial parameters and provides an untrained initial model for subsequent training.

[0130] In step 155 of some embodiments, the system obtains an initial transmission parameter optimization model from six components: the state space, the action space, the throughput reward function, the initial commenter parameters, the initial first actor parameters, and the initial second actor parameters. Once all core reinforcement learning elements are defined and initialized, the system assembles them in software:

- the state space serves as the data input of the neural networks,
- the action space serves as their decision output,
- the throughput reward function serves as the feedback signal that evaluates the model output,
- the three neural networks holding the initial parameters serve as the inference engine.

Binding these components together in a virtual simulation environment yields a structurally complete but untrained initial transmission parameter optimization model. This model already contains the full physical and algorithmic logic it needs. It can interact with the digital twin environment of the communication system, learn by trial and error, and receive feedback.

[0131] Specifically, in this embodiment, the upper-level sub-problem jointly optimizes the RIS phase shifts, operating modes, amplification factors, and auxiliary communication duration. Because its action space is hybrid, it is solved with a Hybrid Proximal Policy Optimization (HPPO) algorithm.

[0132] The HPPO algorithm is a deep reinforcement learning method based on policy gradient. Its network architecture includes a Critic network for estimating the state-value function and two independent Actor networks: a discrete actor network for outputting discrete action policies and a continuous actor network for outputting continuous action policies.

[0133] Each network starts from its own set of initial parameters. The discrete actor network (i.e., the first actor network) uses the initial first actor parameters. The continuous actor network (i.e., the second actor network) uses the initial second actor parameters. The critic network (i.e., the commenter network) uses the initial commenter parameters. The policy parameters and the value-function parameters are all updated with gradient-based methods.

[0134] Specifically, let the state-value function of the agent be … . It is the expected cumulative reward obtained by following the current policy from the current state. The core of the HPPO algorithm is the construction of a surrogate objective function for policy updates. The optimization problems for the discrete actor network and the continuous actor network can be expressed by the following formula.

[0135]

[0136] In this formula, the expectation is taken over time frames. The probability ratio between the new policy and the old policy is the probability of the taken action under the new policy divided by its probability under the old policy. In this embodiment, the advantage function is the temporal-difference residual computed by the critic network, as shown in the following formula.

[0137]

[0138] In this formula, the terms are the immediate reward of the current time frame, the discount factor, and the critic's state-value estimates for the next state and for the current state, respectively.
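The one-step temporal-difference residual described above has the standard form: the immediate reward, plus the discount factor times the next-state value estimate, minus the current-state value estimate. The sketch below computes it over a mini-batch. Its names are hypothetical, and the default discount factor of 0.99 is illustrative.

```python
import numpy as np


def td_advantage(rewards, values, next_values, gamma=0.99):
    """One-step TD residual used as the advantage:
    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    r = np.asarray(rewards, dtype=float)
    v = np.asarray(values, dtype=float)          # critic estimates for current states
    v_next = np.asarray(next_values, dtype=float)  # critic estimates for next states
    return r + gamma * v_next - v
```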

[0139] To improve training stability, the HPPO algorithm of this application uses a truncation (clipping) mechanism that keeps each update between the old and new policies small. Specifically, a truncation function limits the probability ratio to a range set by a clipping hyperparameter. Whether the advantage function is positive or negative, this mechanism prevents the target policy from deviating too far from the behavior policy during gradient updates. In addition, the policy entropy, weighted by an entropy coefficient, is added to the objective function. This encourages the agent to try different actions, so the algorithm explores sufficiently during training and avoids getting trapped in local optima.
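The clipped surrogate objective with an entropy bonus can be sketched for a single actor network as below. It only needs the log-probabilities of the stored actions, so the same function could serve both the discrete and the continuous actor. The clip range 0.2 and entropy coefficient 0.01 are illustrative values, not parameters given in this application. The function name is hypothetical.

```python
import numpy as np


def clipped_surrogate(logp_new, logp_old, advantages, entropy,
                      clip_eps=0.2, ent_coef=0.01):
    """PPO-style clipped surrogate objective (to be maximized) for one actor.

    logp_new / logp_old: log-probabilities of the stored actions under the
    new and old policy; entropy: per-sample policy entropy.
    """
    ratio = np.exp(np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # The elementwise minimum removes any gain from pushing the ratio past the clip range
    return float(np.mean(np.minimum(unclipped, clipped))
                 + ent_coef * np.mean(np.asarray(entropy, dtype=float)))
```

With a positive advantage, a ratio of 1.5 is credited only as 1.2. With a negative advantage, the unclipped (more pessimistic) value is kept.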

[0140] In the parameter update phase, three networks are updated by mini-batch stochastic gradient descent (SGD): the discrete actor network (i.e., the first actor network), the continuous actor network (i.e., the second actor network), and the critic network (i.e., the commenter network). At each update, a mini-batch of transition records is randomly drawn from the experience replay pool and used to update the network parameters. The parameters of the two actor networks, including the initial first actor parameters and the initial second actor parameters, are updated by maximizing their respective objective functions, as shown in the following formula.

[0141]

[0142] In this formula, the step size is the learning rate of the corresponding actor network, and the gradient is taken with respect to that network's parameters. The critic network's parameters, i.e., the initial commenter parameters, are updated by minimizing the mean squared error between the estimated values and the target values, as shown in the following formula.

[0143]

[0144] In this formula, the step size is the learning rate of the critic network, and the target state value is given by … .

[0145] Through steps 151 to 155, this application maps complex physical communication conditions into the state space of a Markov decision process. Examples are channel fading and battery charge and discharge. It also maps the high-dimensional controls of the hybrid sub-connected hardware into a hybrid action space. Together these mappings form a rigorous mathematical bridge between artificial intelligence and the underlying physical hardware. Separate first and second actor networks handle discrete and continuous actions independently, combined with a fairness-aware throughput reward function. This addresses, at the level of the algorithm architecture, a known limitation: traditional convex optimization algorithms cannot solve this high-dimensional mixed-integer nonlinear programming (MINLP) problem. The self-powered communication system thus gains a core framework capable of self-learning. The final trained model can then output optimized global parameter configuration strategies robustly and rapidly, even in complex multi-user networks with very limited energy.

[0146] The process of training the initial transmission parameter optimization model multiple times to obtain the transmission parameter optimization model includes the following steps 156 to 1510.

[0147] Step 156: In each training time step, input the current state information into the first actor network and the second actor network respectively, and sample the current discrete action and the current continuous action respectively.

[0148] Step 157: Calculate the current beamforming vector based on the current discrete action and the current continuous action, combined with the lower-level beamforming optimization subproblem.

[0149] Step 158: Apply the current discrete action, the current continuous action, and the current beamforming vector to the simulation environment, obtain the current throughput reward calculated based on the throughput reward function, and obtain the updated state information for the next time step.

Step 159: Store the transition tuple containing the current state information, action information, reward information, and updated state information in the experience replay pool.

[0151] Step 1510: When the time step reaches the update cycle, draw a mini-batch of samples from the experience replay pool. Based on the advantage function, the clipping function, and the policy entropy constraint function, perform gradient updates on the initial critic parameters, the initial first actor parameters, and the initial second actor parameters, respectively. Continue until the model converges, and output the transmission parameter optimization model.

[0152] Steps 156 to 1510 are described in detail below.

[0153] In step 156 of some embodiments, in each training time step, the current state information is input into the first actor network and the second actor network, and the current discrete action and the current continuous action are sampled, respectively. In the iterative training of deep reinforcement learning, a "time step" is the smallest time scale of one complete interaction between the model and the environment. At the beginning of each time step, the system synchronously feeds the current state information, including the current channel state and the remaining battery energy, into the dual-actor architecture as the underlying feature data.

The first actor network handles discrete decisions. Its output layer produces a probability distribution over the discrete action space. The system samples from this distribution to obtain the current discrete action, which controls the switching state of each unit of the hybrid sub-connected reflector and the discrete phase shift.

At the same time, the second actor network handles continuous decisions. Its output layer typically produces the mean and standard deviation of a continuous action distribution (such as a Gaussian distribution). The system samples from it to obtain the current continuous action, which sets the amplification factor, the transmit power, and the auxiliary communication duration.

This parallel sampling mechanism ensures that control parameters of different kinds are generated synchronously and independently within the same time step.
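The parallel sampling performed by the two actor heads can be sketched as follows. It assumes a categorical head for the discrete action and a diagonal Gaussian head for the continuous action. The logits, means, and bounds stand in for network outputs; all names, sizes, and bounds are hypothetical.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sample_discrete(logits, rng):
    """First-actor head: one categorical sample per row of logits, plus the joint log-probability."""
    probs = softmax(logits)
    idx = np.array([rng.choice(probs.shape[-1], p=p) for p in probs])
    logp = np.log(probs[np.arange(len(idx)), idx]).sum()
    return idx, logp

def sample_continuous(mean, log_std, low, high, rng):
    """Second-actor head: diagonal Gaussian sample, clipped to the feasible box [low, high]."""
    std = np.exp(log_std)
    raw = mean + std * rng.standard_normal(mean.shape)
    # Gaussian log-density of the unclipped sample (clipping is applied only to the executed action)
    logp = -0.5 * (((raw - mean) / std) ** 2 + 2 * log_std + np.log(2 * np.pi))
    return np.clip(raw, low, high), logp.sum()

rng = np.random.default_rng(1)
modes, lp_d = sample_discrete(np.zeros((8, 2)), rng)    # 8 units: passive (0) / active (1)
phases, lp_p = sample_discrete(np.zeros((2, 4)), rng)   # 2 subarrays, 4 phase levels each
low = np.zeros(5)
high = np.array([4.0, 4.0, 1.0, 1.0, 1.0])              # 2 amp factors, 2 powers, 1 duration
cont, lp_c = sample_continuous(np.full(5, 0.5), np.full(5, -1.0), low, high, rng)
```

Keeping the two heads separate means each log-probability can later enter its own PPO ratio. That matches the separation of the first and second actor networks described above.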

[0154] In step 157 of some embodiments, the current beamforming vector is calculated based on the current discrete action and the current continuous action, combined with the lower-level beamforming optimization subproblem. This application employs a layered decoupling strategy in its algorithm design. After the first and second actor networks output the current system resource configuration parameters (i.e., the upper-level reflector control variables, time division variables, and terminal transmit power), these complex non-convex variables are considered known constants within the current time step. The system then substitutes these determined current discrete and continuous actions into the lower-level mathematical model. At this point, the originally highly complex joint optimization problem naturally degenerates into a pure convex optimization subproblem concerning only the beamforming of the base station receiver. The processor uses standard convex optimization criteria such as minimum mean square error (MMSE) to analytically solve this lower-level beamforming optimization subproblem, thereby quickly and accurately calculating the current beamforming vector matching the upper-level action. This step effectively compensates for the insufficient accuracy of pure neural networks when dealing with high-precision continuous convex space solutions.
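The lower-level subproblem can be illustrated with the standard MMSE receive combiner. For fixed reflector settings and transmit powers, it maximizes each user's SINR. The sketch below treats `H` as the already-formed effective uplink channel (direct plus reflector-assisted), which is an assumption. It compares the MMSE combiner against a matched-filter (MRC) baseline. The function names and parameters are hypothetical, not the exact formulation of problem (P1-1).

```python
import numpy as np

def mmse_combiners(H, p, noise_var):
    """Unit-norm MMSE receive combiners. H: (Nr, K) effective uplink channels, p: (K,) powers."""
    nr = H.shape[0]
    # Received covariance: sum_j p_j h_j h_j^H + sigma^2 I
    R = (H * p) @ H.conj().T + noise_var * np.eye(nr)
    W = np.linalg.solve(R, H)              # column k is R^{-1} h_k
    return W / np.linalg.norm(W, axis=0, keepdims=True)

def sinr(W, H, p, noise_var):
    """Per-user SINR after linear combining with the columns of W."""
    G = np.abs(W.conj().T @ H) ** 2        # G[k, j] = |w_k^H h_j|^2
    signal = np.diag(G) * p
    interference = (G * p).sum(axis=1) - signal
    noise = noise_var * np.sum(np.abs(W) ** 2, axis=0)
    return signal / (interference + noise)

rng = np.random.default_rng(2)
H = (rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))) / np.sqrt(2)
p = np.array([1.0, 0.5, 2.0])
W_mmse = mmse_combiners(H, p, noise_var=0.1)
W_mrc = H / np.linalg.norm(H, axis=0, keepdims=True)   # matched-filter baseline
print(sinr(W_mmse, H, p, 0.1) >= sinr(W_mrc, H, p, 0.1))
```

Because the solution is closed form, it costs one linear solve per time step. This is why fixing the upper-level actions first makes the receive beamforming cheap to compute.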

[0155] In step 158 of some embodiments, the current discrete action, the current continuous action, and the current beamforming vector are applied to the simulation environment. This yields the current throughput reward, calculated with the throughput reward function, and the updated state information for the next time step.

After obtaining the complete set of control parameters, the system applies all three parts of the action together to a pre-constructed mathematical simulation environment of the communication system. This environment models real electromagnetic wave propagation and the physical process of battery charging and discharging. After the action is applied, the simulation environment uses the throughput reward function predefined in step 153 to evaluate the total data transmission rate achieved by the current action combination, subject to the quality-of-service requirements of all users. It then feeds back a numerical reward signal, namely the current throughput reward.

At the same time, as the communication environment evolves over time and energy is consumed, the channel parameter matrix in the simulation environment evolves randomly. The available battery energy is debited and recharged according to the battery energy update function. The system reads the evolved feature parameters, which constitute the updated state information for the next time step.
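The exact throughput reward function (defined in step 153) and battery energy update function are not reproduced in this section. The sketch below is therefore a hypothetical stand-in, not the application's formulas. It makes four assumptions:

- The reward is the sum rate when every user meets a minimum rate `r_min`, and a fixed penalty otherwise.
- Energy spent in a frame cannot exceed the stored energy (energy causality).
- Harvested energy is added after consumption.
- Storage is capped at a capacity `b_max`.

```python
import numpy as np

def battery_update(b, e_harvest, e_consume, b_max):
    """Hypothetical battery update: spend (bounded by stored energy), then harvest, capped at b_max."""
    feasible = e_consume <= b              # energy causality check
    spent = min(e_consume, b)
    return min(b_max, b - spent + e_harvest), feasible

def throughput_reward(sinrs, r_min, penalty=-1.0):
    """Hypothetical QoS-aware reward: sum rate if all users reach r_min, else a penalty."""
    rates = np.log2(1.0 + np.asarray(sinrs, dtype=float))
    if np.all(rates >= r_min):
        return float(rates.sum())
    return penalty

b_next, ok = battery_update(b=5.0, e_harvest=2.0, e_consume=3.0, b_max=10.0)
r = throughput_reward([1.0, 3.0], r_min=0.5)   # rates of 1 and 2 bit/s/Hz
```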

[0156] In step 159 of some embodiments, a transition tuple containing the current state information, action information, reward information, and updated state information is stored in the experience replay buffer. In deep reinforcement learning, training the networks directly on consecutively generated data tends to trap the model in poor local optima and makes convergence difficult. To break the temporal correlation between data, the system introduces an "experience replay buffer" mechanism.

Specifically, the system packages four elements into a complete "transition tuple" with a standard data structure:

- the current state information obtained in the preceding steps;
- the action information, consisting of the discrete action, the continuous action, and the beamforming vector;
- the reward information, namely the current throughput reward;
- the updated state information after environmental evolution.

The processor stores this transition tuple as an independent experience sample in an experience replay buffer of preset capacity. As training proceeds, the buffer accumulates a large amount of interaction data covering both good and poor strategies. This provides rich, approximately independent training samples for subsequent neural network parameter updates.
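A minimal replay buffer holding such transition tuples might look like the sketch below. The class and field names are hypothetical. The bounded `deque` silently drops the oldest transition once the preset capacity is reached, and `sample` draws uniformly without replacement.

```python
import random
from collections import deque, namedtuple

# Hypothetical transition layout: (state, action, reward, next_state)
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])

class ReplayBuffer:
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # oldest transitions dropped when full

    def push(self, *args):
        self.buffer.append(Transition(*args))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)   # uniform, without replacement

    def clear(self):
        self.buffer.clear()

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.push(t, ("discrete", "continuous", "beamforming"), float(t), t + 1)
batch = buf.sample(2)
```

Algorithm 1 (below) clears the pool after each update, which is typical of on-policy PPO. The `clear` method is included for that purpose.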

[0157] In step 1510 of some embodiments, when the time step reaches the update cycle, a mini-batch of samples is drawn from the experience replay pool. Based on the advantage function, the clipping function, and the policy entropy constraint function, gradient updates are performed on the initial critic parameters, the initial first actor parameters, and the initial second actor parameters, respectively. This continues until the model converges and the transmission parameter optimization model is output.

During training, the model does not update its parameters after every step. Learning starts only when the accumulated time steps reach the preset update cycle. The system then randomly draws a batch of historical data (i.e., a mini-batch) from the experience replay pool and feeds it into the networks. Three functions govern the update:

- The "advantage function" evaluates how much better a given action is than the expected value of the state.
- The "clipping function" is the core of the proximal policy optimization (PPO) algorithm. It limits the magnitude of each policy update, preventing an excessively large gradient step from destroying an existing good policy.
- The "policy entropy constraint function" maintains the randomness of the network output. It encourages the model to keep exploring the action space and avoid premature convergence.

The system uses these functions to compute the overall loss and obtains the gradients through backpropagation. It then performs multiple rounds of updates to the weights and biases of the critic network, the first actor network, and the second actor network.
This cycle of interaction, sampling, and gradient updating continues for tens of thousands to hundreds of thousands of iterations until the loss function of the neural networks stabilizes and converges. The result is a transmission parameter optimization model with strong decision-making capability.
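The clipping and entropy terms can be illustrated with the standard PPO clipped surrogate objective. The sketch below only evaluates the loss values with NumPy; in practice an autodiff framework would differentiate them. The coefficients `clip_eps` and `ent_coef` and the function names are hypothetical defaults, not values given in this application.

```python
import numpy as np

def ppo_actor_loss(logp_new, logp_old, adv, entropy, clip_eps=0.2, ent_coef=0.01):
    """Negative clipped surrogate objective with an entropy bonus (to be minimized)."""
    ratio = np.exp(logp_new - logp_old)                  # pi_new(a|s) / pi_old(a|s)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    surrogate = np.minimum(ratio * adv, clipped * adv)   # pessimistic (lower) bound
    return -(surrogate.mean() + ent_coef * entropy.mean())

def critic_loss(values, targets):
    """Mean squared error between predicted state values and target values."""
    return np.mean((targets - values) ** 2)

# Hybrid case: each actor gets its own loss from its own log-probabilities,
# while both share the advantage estimates produced with the critic.
adv = np.array([1.0, -1.0, 2.0])
same = np.zeros(3)
loss_discrete = ppo_actor_loss(same, same, adv, entropy=np.full(3, 0.5))
```

With a probability ratio of 2 and a positive advantage, the clipped term caps the objective at 1.2 (for `clip_eps=0.2`). This is how the clipping function limits the size of a single policy update.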

[0158] Refer to Figure 3, a schematic diagram of the overall architecture of a transmission parameter optimization model based on the hierarchical hybrid proximal policy optimization (HPPO) algorithm, provided in an embodiment of this application. As shown in Figure 3, the architecture consists mainly of two parts that together form a closed loop. The first is a communication and energy physics simulation environment. The second is a deep reinforcement learning agent (i.e., a dual-actor-critic framework). At each training time step, the physics simulation environment outputs comprehensive state information based on the current physical state of the system. Internally, it includes channel and energy evolution models and a battery update model. Along with the state, it outputs a throughput reward used to evaluate the merits of the current strategy. After receiving this state information, the agent inputs it as a low-level observation feature into its internal deep neural networks for forward feature extraction and policy inference.

[0159] Furthermore, to effectively overcome the challenge of high-dimensional mixed action space containing discrete and continuous variables in hybrid sub-connected self-powered systems, the model employs a hierarchical design with parameter decoupling. Specifically, the first actor network (Actor 1) is dedicated to inferring and outputting discrete actions, mainly including the mode switching state and discrete phase shift parameters of the reflection unit; the second actor network (Actor 2) is dedicated to inferring and outputting continuous actions, mainly including the continuous amplification coefficient of the shared amplifier circuit, the transmission power of the mobile terminal, and the auxiliary communication duration for dynamic timing scheduling; simultaneously, the critic network continuously evaluates the value of the current state to guide the gradient update of the parameters of the two actor networks. Subsequently, the action parameters output by the actor networks are fed into the lower-level receive beamforming optimization module, where the optimal receive beamforming vector at the base station is analytically solved using convex optimization criteria. Ultimately, discrete actions, continuous actions, and beamforming vectors are combined to form a complete action tuple for the current time step, which is then applied back to the simulation environment to drive the physical channel and battery state to evolve to the next time step. Through continuous interaction, sampling, and experience replay, the global optimization goal of maximizing the long-term throughput of the system is finally achieved.

[0160] The following is an example of an update training algorithm for an initial transmission parameter optimization model provided in an embodiment of this application.

[0161] Algorithm 1: Hierarchical HPPO algorithm. Input: reward discount factor, experience replay pool capacity, and network learning rate.

[0162] Initialize: discrete actor network parameters, continuous actor network parameters, critic network parameters, and an experience replay pool of the given capacity.

[0163] Output: the parameters of the trained actor networks.

[0164]
1) for episode = 1 to E do
2)   Initialize the environment and obtain the initial state;
3)   for t = 1 to T do
4)     …;
5)     Based on the current state, the discrete actor network and the continuous actor network each output an action policy; the discrete action and the continuous action are then obtained by sampling;
6)     Based on the sampled actions, analytically calculate the optimal beamforming vector of the inner subproblem (P1-1) to obtain all optimization variables;
7)     Execute the actions, interact with the environment, observe the state of the next time slot, and calculate the reward;
8)     Store the experience in the experience replay pool;
9)     if … then
10)      Use … to calculate the advantage function and the target state value from the stored experience;
11)      for epoch = 1 to … do
12)        Shuffle the experience;
13)        repeat
14)          Draw a mini-batch of experience from the pool; update the actor parameters and the critic parameters;
15)        until all experience has been traversed
16)      end for
17)      Clear the experience replay pool;
18)    end if
19)  end for
20) end for
As shown in the pseudocode above, the transmission parameter optimization model is trained with a hierarchical hybrid proximal policy optimization (HPPO) algorithm. In the initial stage, the system first constructs an experience replay pool to break the temporal correlation of the data (…) and initializes it to an empty state. It then initializes the internal parameters of the three core neural networks in the deep reinforcement learning framework. Initial weight parameters are assigned to the critic network, to the first actor network that processes discrete actions, and to the second actor network that processes continuous actions.
This initialization provides an unbiased starting point and a value-assessment baseline for the subsequent agent-environment interaction and gradient descent.
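Step 10 of Algorithm 1 computes the advantage function and the target state value from the stored experience, but the estimator itself is not specified. The sketch below uses generalized advantage estimation (GAE), a common companion to PPO that is swapped in here as an assumption. It treats the scheduling task as continuing, with no terminal states. The names `gamma` and `lam` are hypothetical parameter names.

```python
import numpy as np

def gae(rewards, values, next_values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for a continuing task (no terminal states)."""
    adv = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_values[t] - values[t]   # one-step TD error
        running = delta + gamma * lam * running
        adv[t] = running
    targets = adv + values          # target state values for the critic regression
    return adv, targets

adv, targets = gae(np.array([1.0, 1.0]), np.zeros(2), np.zeros(2), gamma=0.5, lam=1.0)
```

Setting `lam=0` reduces the estimate to the one-step temporal-difference error. Setting `lam=1` gives discounted returns minus the critic's value.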

[0165] After entering the formal training loop, the algorithm runs a total of E training rounds (episodes) and performs interactive simulation for T time steps (steps) in each round. At the beginning of each time step t, the system first observes the comprehensive state information output by the current virtual communication environment. This state information characterizes the current cascaded channel fading and the available energy of the battery at the bottom layer of the reflector. Based on the observed state …, the system uses the dual-actor architecture to sample actions synchronously and independently:

- The first actor network, following its current policy, outputs the current discrete action. This includes the mode switching state and the discrete phase shift parameters of each reflection unit.
- The second actor network, following its current policy, outputs the current continuous action. This includes the amplification factor of the shared amplifier circuit, the transmit power of the mobile terminal, and the auxiliary communication duration within the time frame.

[0166] After obtaining the action decisions from the upper layer, the algorithm executes the lower-level solution step to decouple the parameters. The system treats the sampled discrete action and continuous action as known constants and substitutes them into the lower-level beamforming optimization subproblem (i.e., problem (P1-1)). Using standard convex optimization criteria, it analytically calculates the optimal receive beamforming vector at the base station side for the current time step.

The system then applies the complete set of actions (the discrete action, the continuous action, and the beamforming vector) to the simulation environment, driving the environment to evolve to the next state. During this interaction, the system calculates the current throughput reward produced by executing the actions and observes the updated state information at the next time step. It stores the resulting transition tuple, containing state, action, reward, and next state, in the experience replay pool, so that interaction experience accumulates continuously.

[0167] As time steps progress, the algorithm periodically triggers an iterative update of the network parameters. Specifically, the system checks the time step t against the preset update period. When the update condition is met (i.e., …, where F is the capacity of the experience replay pool), the system randomly draws a batch of sample data from the experience replay pool.

Using these randomly drawn samples, the system calculates the advantage function. Combined with a surrogate objective function with a clipping mechanism, this yields the policy gradient, which is applied to the parameters of the first actor network and the second actor network, respectively, for gradient ascent updates. At the same time, the system computes the gradient of the value loss function and applies it to the parameters of the critic network for a gradient descent update.

Through this alternating cycle of "multi-step interactive sampling followed by periodic centralized updates", the algorithm achieves robust convergence of the network parameters while maintaining exploration efficiency. The cycle runs until all training rounds are complete. The algorithm finally outputs a transmission parameter optimization model capable of optimal timing scheduling and resource allocation.
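The "periodic centralized update" can be sketched as a trigger check plus a shuffled mini-batch iterator that visits every stored sample once per epoch. This corresponds to steps 11 to 16 of Algorithm 1. The exact trigger condition is elided in the text. The modulo check below, updating every `period` steps, is only one hypothetical reading. The function names are also illustrative.

```python
import numpy as np

def is_update_step(t, period):
    """Hypothetical trigger: update every `period` time steps (e.g. once the pool of capacity F is full)."""
    return t > 0 and t % period == 0

def minibatch_indices(n, batch_size, epochs, rng):
    """Yield shuffled mini-batch indices; every stored sample is visited once per epoch."""
    for _ in range(epochs):
        perm = rng.permutation(n)                 # shuffle the stored experience
        for start in range(0, n, batch_size):     # traverse all experience
            yield perm[start:start + batch_size]

rng = np.random.default_rng(3)
batches = list(minibatch_indices(n=10, batch_size=4, epochs=2, rng=rng))
```

Each yielded index array would select one mini-batch for the actor and critic loss computations. After the last epoch, the pool is cleared, as in step 17.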

[0168] Steps 156 to 1510 above construct a specialized hierarchical deep reinforcement learning training process. It tightly couples precise convex-optimization solutions from the traditional communication field (lower-layer beamforming) with the strong non-convex exploration capability of artificial intelligence (the upper-layer dual-actor networks). This overcomes the complex nonlinear programming problem involving high-dimensional mixed discrete and continuous variables in hybrid sub-connected reflector systems.

The process introduces an experience replay mechanism, advantage function evaluation, and a PPO algorithm with clipping constraints. Together these maintain high exploration efficiency while achieving stable convergence of the model parameters. More importantly, the entire massive and complex process of strategy trial-and-error and parameter optimization is completed in the offline simulation training stage. As a result, once the final transmission parameter optimization model is deployed at an actual base station, it can output optimal control commands for fluctuating real-time channel and energy conditions with only millisecond-level forward inference. This greatly reduces the online computation latency of the self-powered communication system and ensures high throughput and high reliability.

[0169] Through steps 110 to 150 above, the proposed solution achieves intelligent, fine-grained parameter optimization for the highly challenging hybrid sub-connected self-powered communication system. First, it uses rigorous mathematical modeling (a two-stage signal-to-noise ratio function and an energy update function) to fully digitize the complex physical communication environment and energy flow mechanism. This ensures that the optimization never departs from the actual physical causal relationships.

Furthermore, by combining "parameter decoupling" with "multi-round deep reinforcement learning training", the solution overcomes the computational explosion caused by high-dimensional mixed discrete and continuous variables. A non-convex optimization problem that would otherwise require lengthy online solving becomes millisecond-level inference by a trained neural network. This greatly reduces the decision latency of base stations in actual network deployment. It also enables the system to balance energy reserves and network capacity intelligently, with a long-term perspective spanning multiple time frames. The result fully releases the performance potential of the proposed hardware architecture and ensures sustained, stable, high-rate transmission under harsh conditions.

[0170] Step 200: Input the first channel state information, the second channel state information, the third channel state information, and the battery energy state information into the transmission parameter optimization model for data matching processing to obtain the output optimized mode switching sequence, optimized phase shift sequence, optimized amplification coefficient sequence, optimized transmission power, and optimized auxiliary communication duration.

[0171] Step 200 is described in detail below.

[0172] In step 200 of some embodiments, the base station combines the acquired multi-dimensional state features into environmental state variables. It inputs the first channel state information, the second channel state information, the third channel state information, and the battery energy state information into the transmission parameter optimization model for data matching processing. The model outputs the optimized mode switching sequence, optimized phase shift sequence, optimized amplification coefficient sequence, optimized transmit power, and optimized auxiliary communication duration.

This "data matching processing" is essentially forward mapping and policy inference of the environmental state features in a deep neural network model. After inference, the model outputs multiple action decision commands for the current time frame in parallel:

- the optimized mode switching sequence (e.g., indicator variables), which controls the independent on/off state of each reflection unit;
- the optimized phase shift sequence, which controls the shared phase shift angle of each subarray;
- the optimized amplification coefficient sequence, which controls the shared amplification factor of each subarray;
- the optimized transmit power of each mobile terminal.

In addition, the model outputs a key time-domain control parameter, namely the optimized auxiliary communication duration, used to guide the subsequent time-frame partitioning. Together, these provide a complete set of parameters for the physical scheduling of subsequent resources.
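A deployed model must map raw network outputs onto physically valid control parameters. The sketch below shows one way to do this. Several details are hypothetical and not taken from this application:

- uniform `n_bits`-bit phase levels;
- sigmoid squashing of the continuous outputs;
- bounds `a_max` and `p_max`;
- expressing the auxiliary communication duration as a fraction `tau` of the frame.

```python
import numpy as np

def decode_action(mode_idx, phase_idx, cont_raw, n_bits, a_max, p_max, n_sub, n_users):
    """Map raw policy outputs to hypothetical physical control parameters."""
    modes = np.asarray(mode_idx).astype(bool)                  # True: switch closed (active mode)
    phases = 2 * np.pi * np.asarray(phase_idx) / 2 ** n_bits   # uniform n_bits-bit phase levels
    sig = 1.0 / (1.0 + np.exp(-np.asarray(cont_raw, dtype=float)))  # squash to (0, 1)
    amp = 1.0 + (a_max - 1.0) * sig[:n_sub]                    # shared amplification in (1, a_max)
    power = p_max * sig[n_sub:n_sub + n_users]                 # terminal transmit power in (0, p_max)
    tau = sig[n_sub + n_users]                                 # auxiliary duration as a frame fraction
    return {"modes": modes, "phases": phases, "amp": amp, "power": power, "tau": tau}

out = decode_action(mode_idx=[1, 0, 0, 1, 1, 0, 1, 0], phase_idx=[1, 3],
                    cont_raw=np.zeros(5), n_bits=2, a_max=5.0, p_max=1.0,
                    n_sub=2, n_users=2)
```

Squashing keeps every decoded parameter strictly inside its assumed range regardless of the raw output. Clipping a sampled value, as a Gaussian head might do, is an equally plausible alternative.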

[0173] Step 300: Solve based on optimized transmit power, first channel state information, second channel state information, third channel state information, optimized phase shift sequence, optimized amplification coefficient sequence, and optimized mode switching sequence to obtain the optimized receive beamforming of the base station.

[0174] Step 300 is described in detail below.

[0175] In step 300 of some embodiments, the system solves for the optimized receive beamforming of the base station based on the following inputs:

- the optimized transmit power;
- the first, second, and third channel state information;
- the optimized phase shift sequence;
- the optimized amplification coefficient sequence;
- the optimized mode switching sequence.

In multi-antenna communication systems, "receive beamforming" is a spatial filtering technique. The base station adjusts the complex weighting coefficient of each receive antenna in its array to steer the spatial receive gain toward the target signal and suppress interference. In this step, the transmission parameter optimization model has already output the upper-level reflector configuration parameters and the transmit power of the mobile terminals, and these are held fixed. The beamforming design at the base station therefore reduces to a lower-level convex optimization subproblem. The system typically solves it analytically using a criterion such as linear minimum mean square error (MMSE). This quickly and accurately yields the optimal receive beamforming vectors of the base station for both the auxiliary communication phase and the pure energy harvesting phase, achieving optimal reception of the spatially cascaded signals.

[0176] Step 400: Divide the current time frame into an auxiliary communication phase and a pure energy harvesting phase based on the optimized auxiliary communication duration.

[0177] Step 400 is described in detail below.

[0178] In step 400 of some embodiments, the system divides the current time frame into an auxiliary communication phase and a pure energy harvesting phase based on the optimized auxiliary communication duration. This is the key timing scheduling action that implements the adaptive auxiliary communication protocol of this application. Traditional intelligent reflector systems by default operate continuously at full load throughout the entire time frame, which can easily deplete the energy of self-powered devices.

In this embodiment, by contrast, the system uses the optimized auxiliary communication duration output by the transmission parameter optimization model to divide a complete time frame of fixed total length (e.g., T) along the time dimension:

- The first time window, whose length is set by the optimized auxiliary communication duration (…), is configured as the auxiliary communication phase, dedicated to signal reflection and amplification.
- The remaining time window (of length T - …) is configured strictly as a pure energy harvesting phase, used only to replenish battery energy from the external environment.

This effectively decouples communication performance and energy accumulation in the time dimension.
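The effect of the frame partition on throughput and energy can be illustrated with a simple accounting sketch. It makes four assumptions:

- `t_aux` is in the same time units as the frame length `T`.
- The reflector draws a constant power `p_ris` only during the auxiliary communication phase.
- Energy is harvested at a constant power `p_harvest` only during the pure energy harvesting phase.
- `r_aux` and `r_direct` are the average sum rates of the two phases.

All names and the constant-power model are hypothetical.

```python
def frame_schedule(T, t_aux, r_aux, r_direct, p_harvest, p_ris):
    """Hypothetical per-frame accounting for the auxiliary / pure-harvesting split."""
    if not 0.0 <= t_aux <= T:
        raise ValueError("auxiliary duration must lie in [0, T]")
    t_eh = T - t_aux                                 # pure energy harvesting window
    bits = r_aux * t_aux + r_direct * t_eh           # direct link keeps carrying data in both phases
    return {
        "t_aux": t_aux,
        "t_eh": t_eh,
        "avg_rate": bits / T,                        # frame-average sum rate
        "e_harvest": p_harvest * t_eh,               # energy recharged into the battery
        "e_consume": p_ris * t_aux,                  # energy spent by the reflector
    }

plan = frame_schedule(T=1.0, t_aux=0.25, r_aux=8.0, r_direct=2.0, p_harvest=4.0, p_ris=2.0)
```

Lengthening `t_aux` raises the frame-average rate but shifts the energy balance toward consumption. This is the trade-off the optimized auxiliary communication duration is meant to resolve.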

[0179] Step 500: During the auxiliary communication phase, the opening and closing of the mode switching switch is controlled based on the optimized mode switch sequence, the shared phase shift circuit and the shared amplifier circuit are controlled based on the optimized phase shift sequence and the optimized amplification coefficient sequence, and the mobile terminal is controlled to send data to the base station based on the optimized transmit power and the optimized receive beamforming.

[0180] Step 500 is described in detail below.

[0181] In step 500 of some embodiments, during the auxiliary communication phase, the system does three things:

- It controls the opening and closing of the mode switching switches based on the optimized mode switching sequence.
- It controls the shared phase shift circuit and the shared amplifier circuit based on the optimized phase shift sequence and the optimized amplification coefficient sequence.
- It controls the mobile terminal to send data to the base station based on the optimized transmit power and the optimized receive beamforming.

Specifically, throughout the auxiliary communication time window, the reflector hardware is in a fully scheduled operating state. The underlying hardware sets each mode switching switch according to that unit's Boolean value in the optimized mode switching sequence:

- closed: the reflection unit connects to the shared amplifier circuit and enters active reflection mode;
- open: the reflection unit disconnects from the amplification link, retains only the phase shift, and enters passive reflection mode.

At the same time, the shared phase shift circuit and the shared amplifier circuit apply the calculated optimal phase shift and power gain to the incident signal according to the sequence parameters. On the terminal side, each mobile terminal transmits its uplink signal at the configured optimized transmit power. The base station applies the optimized receive beamforming vector for spatial filtering. It thereby jointly receives and processes the signals arriving over both the direct link and the reflector-enhanced link.
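The switch semantics described above can be expressed as per-unit reflection coefficients under a common simplified model, assumed here. A closed switch (active) multiplies the incident signal by its subarray's shared gain, and an open switch (passive) reflects with unit amplitude. In both cases the unit applies its subarray's shared phase. Amplifier noise and hardware impairments are ignored. The function names, the contiguous subarray layout, and the cascaded-channel form `h_d + G diag(phi) h_r` are hypothetical.

```python
import numpy as np

def reflection_coeffs(modes, phases, amps, sub_size):
    """Per-unit coefficients: shared phase per subarray, shared gain only when the switch is closed."""
    modes = np.asarray(modes, dtype=bool)
    sub = np.arange(len(modes)) // sub_size      # subarray index of each unit (contiguous layout)
    gain = np.where(modes, amps[sub], 1.0)       # closed switch -> shared amplifier; open -> passive
    return gain * np.exp(1j * phases[sub])

def effective_channel(h_d, G, h_r, phi):
    """Direct link plus reflector-assisted link: h_d + G diag(phi) h_r."""
    return h_d + G @ (phi * h_r)

phi = reflection_coeffs(modes=[1, 0, 0, 1], phases=np.array([0.0, np.pi]),
                        amps=np.array([2.0, 3.0]), sub_size=2)
h_eff = effective_channel(np.zeros(2), np.ones((2, 4)), np.ones(4), phi)
```

Because phase and gain are indexed by subarray, only one phase shifter and one amplifier per subarray are needed. The per-unit switch alone decides whether that unit's signal passes through the amplifier.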

[0182] Step 600: During the pure energy harvesting phase, control the hybrid sub-connected reflector to harvest ambient energy, and control the mobile terminal to send data to the base station via a direct link.

[0183] Step 600 is described in detail below.

[0184] In step 600 of some embodiments, during the pure energy harvesting phase, the system controls the hybrid sub-connected reflector to harvest ambient energy and controls the mobile terminal to send data to the base station via the direct link. During this subsequent time window (of length T - …), the goal is to maximize energy accumulation for later use. The control unit therefore commands the hybrid sub-connected reflector to suspend all electromagnetic-wave reflection and amplification assistance functions and enter a dormant, powered-down state. All mode switching switches are forced open, and the power consumption of the phase shift circuit and the amplifier circuit drops to zero. The energy harvesting circuit of the reflector is dedicated to capturing energy from the natural or radio-frequency environment and recharging the battery unit.

During this period, the mobile terminal does not interrupt its uplink transmission but continues to transmit at the allocated transmit power. The base station switches synchronously to the other set of receive beamforming vectors, computed for this phase. The communication system therefore relies solely on the direct link, subject to the natural fading of the physical environment, for basic data transmission and reception.

[0185] Through steps 100 to 600 above, the method provided in this application achieves a dynamic balance between the long-term throughput target of the communication system and the energy causality constraints of self-powered operation. The system acquires global state features and uses a pre-trained optimization model. It thereby transforms the high-dimensional, strongly coupled non-convex resource allocation problem into fast online inference plus a lower-level convex beamforming solution, greatly reducing the online computational overhead and decision latency at the base station.

More importantly, the system combines a cross-time-domain adaptive timing partitioning mechanism with independent control of the active/passive mode of each reflection unit in the underlying hybrid sub-connected architecture. This lets it intelligently schedule and reserve energy across time frames. The method eliminates the risk of equipment paralysis due to over-discharge when ambient energy is scarce. It also fully exploits the hardware potential under strict battery self-sufficiency conditions, thereby maximizing the long-term average transmission rate and communication reliability of the uplink multi-user communication network.

[0186] The hybrid sub-connected reflector, communication system, parameter optimization method, and related equipment provided in this application achieve a strong balance between very low power consumption and high flexibility in the physical hardware architecture. In existing technologies, hybrid reconfigurable intelligent surfaces (RIS) face a dilemma:

- A fully connected architecture leads to high hardware cost and very high static power consumption.
- Alternatively, shared power amplifiers reduce power consumption somewhat, but the lack of a shared phase-shifter mechanism and relatively fixed operating modes result in poor beam-control flexibility. The large phase-shifter circuit matrix also still keeps baseline power consumption high.

In contrast, this application adopts a subarray-level design with a shared phase-shifting circuit and a shared amplification circuit, significantly reducing hardware layout cost and overall static power consumption. More importantly, within this highly shared circuit architecture, the solution introduces an independent mode switching switch for each reflection unit. This gives each physical unit the ability to switch dynamically between active and passive reflection modes. The hybrid design combines a highly shared underlying layer with independent per-unit switching at the top layer. It is more energy-efficient than existing fixed-mode technologies. It also lets the system adjust its reflector control strategy flexibly according to the real-time battery energy state, achieving higher communication gain under extremely limited energy supply.

[0187] Furthermore, at the level of time-domain communication scheduling, the adaptive assisted communication protocol proposed in this application removes the fixed-duration limitation of traditional communication mechanisms. Existing RIS-assisted communication technologies typically assume that the reflector remains operational throughout the entire communication time frame. Such a static protocol cannot adapt to the randomness and fluctuation of ambient energy harvesting in self-powered scenarios: when energy is scarce, it can quickly deplete the battery and cause a system crash, and if a conservative strategy is adopted instead, overall energy utilization efficiency is low. To address this problem, this solution introduces the auxiliary communication duration within a single frame into the communication protocol as a controllable, optimizable variable. The system can therefore dynamically determine, from the current remaining battery energy and the real-time channel state, how long auxiliary communication is maintained within the current time frame.
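As a minimal illustration of treating the auxiliary communication duration as an optimizable variable, the Python sketch below computes one user's data volume in a frame of length T split into an assisted phase of length tau and a direct-link phase of length T - tau, checks the duration boundary and a minimum quality of service threshold, and applies a simple battery-limited rule for tau. The rule, the Shannon-rate throughput model, and all names are assumptions for illustration; in this application, tau is chosen by the trained transmission parameter optimization model, not by this heuristic.

```python
import math


def max_affordable_duration(battery: float, power: float, frame: float) -> float:
    """Longest assisted duration the stored energy can sustain (heuristic).

    battery: energy available at the start of the frame (J)
    power:   total reflector power while assisting (W)
    frame:   total frame duration T (s)
    """
    if power <= 0.0:
        return frame
    return min(frame, battery / power)


def user_data(tau: float, frame: float, snr_aux: float, snr_direct: float,
              bandwidth: float = 1.0) -> float:
    """Data of one user in a frame: tau seconds with the reflector assisting,
    then (frame - tau) seconds over the direct link only (Shannon rates)."""
    if not 0.0 <= tau <= frame:
        raise ValueError("auxiliary duration must lie in [0, frame]")
    return bandwidth * (tau * math.log2(1.0 + snr_aux)
                        + (frame - tau) * math.log2(1.0 + snr_direct))


def meets_qos(data: float, min_data: float) -> bool:
    """Quality of service check against a per-user minimum data threshold."""
    return data >= min_data


if __name__ == "__main__":
    tau = max_affordable_duration(battery=0.2, power=0.5, frame=1.0)
    data = user_data(tau, 1.0, snr_aux=15.0, snr_direct=3.0, bandwidth=1e6)
    print(tau, data, meets_qos(data, min_data=2.5e6))
```

With these illustrative numbers the battery can sustain only 0.4 s of assistance, and the remaining 0.6 s falls back to the direct link instead of leaving the user without service.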

[0188] In summary, by integrating the hybrid sub-connected hardware architecture at the physical layer with the cross-frame dynamic energy scheduling mechanism at the protocol layer, this design builds a robust closed-loop energy management system for self-powered communication networks. The mechanism avoids equipment power outages caused by sharp fluctuations in ambient energy, and in harsh energy-constrained scenarios it achieves a significantly higher long-term average data throughput for the entire communication system than traditional fixed-duration operating protocols. This enhances the practical deployment value and network reliability of self-powered intelligent reflectors in complex areas without grid coverage.

[0189] This application also provides an electronic device, including: at least one memory; at least one processor; and at least one program, where the program is stored in the memory and the processor executes the at least one program to implement the transmission parameter optimization method of the communication system described above in this application. The electronic device can be any smart terminal, including a mobile phone, a tablet computer, a personal digital assistant (PDA), an in-vehicle computer, etc.

[0190] Please refer to Figure 4, which illustrates the hardware structure of an electronic device according to another embodiment. The electronic device includes the following components. The processor 401 can be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and executes the relevant programs to implement the technical solutions provided in the embodiments of this application. The memory 402 can be implemented as ROM (Read-Only Memory), a static storage device, a dynamic storage device, or RAM (Random Access Memory). The memory 402 can store the operating system and other application programs. When the technical solutions provided in the embodiments of this application are implemented in software or firmware, the relevant program code is stored in the memory 402, and the processor 401 calls it to execute the transmission parameter optimization method of the communication system of the embodiments of this application. The input / output interface 403 is used for information input and output. The communication interface 404 is used for communication and interaction between this device and other devices, by wired means (such as USB or an Ethernet cable) or by wireless means (such as a mobile network, Wi-Fi, or Bluetooth). The bus 405 transfers information between the components of the device (e.g., the processor 401, the memory 402, the input / output interface 403, and the communication interface 404). The processor 401, the memory 402, the input / output interface 403, and the communication interface 404 are communicatively connected to one another within the device via the bus 405.

[0191] An embodiment of this application also provides a storage medium, which is a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, it implements the above-described method for optimizing transmission parameters of the communication system.

[0192] As a non-transitory computer-readable storage medium, memory can be used to store non-transitory software programs and non-transitory computer-executable programs. In addition, memory may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, memory may optionally include memory located remotely from the processor, and such remote memory can be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

[0193] The embodiments described in this application are for the purpose of more clearly illustrating the technical solutions of the embodiments of this application, and do not constitute a limitation on the technical solutions provided by the embodiments of this application. As those skilled in the art will know, with the evolution of technology and the emergence of new application scenarios, the technical solutions provided by the embodiments of this application are also applicable to similar technical problems.

[0194] Those skilled in the art will understand that the technical solutions shown in the figures do not constitute a limitation on the embodiments of this application, which may include more or fewer steps than shown, combine certain steps, or use different steps.

[0195] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.

[0196] Those skilled in the art will understand that all or some of the steps in the methods disclosed above, as well as the functional modules / units in the systems and devices, can be implemented as software, firmware, hardware, or suitable combinations thereof.

[0197] In the several embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units described above is only a logical functional division, and in actual implementation there may be other ways of dividing them. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. The mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or in other forms.

[0198] The units described above as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0199] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0200] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes multiple instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned storage medium includes various media capable of storing programs, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0201] The preferred embodiments of the present application have been described above with reference to the accompanying drawings, but this does not limit the scope of the claims of the present application. Any modifications, equivalent substitutions, and improvements made by those skilled in the art without departing from the scope and substance of the embodiments of the present application shall be within the scope of the claims of the present application.

Claims

1. A hybrid sub-connected reflective surface, characterized in that it comprises: an energy harvesting circuit, a battery cell, and multiple subarrays, wherein the energy harvesting circuit is connected to the battery cell and is used to harvest ambient energy and store it in the battery cell to maintain the operation of the multiple subarrays; each of the subarrays includes a shared phase-shifting circuit, a shared amplification circuit, multiple reflection units, and a mode switching switch corresponding to each of the reflection units; in each of the subarrays, all the reflection units are connected to the shared phase-shifting circuit, a corresponding mode switching switch is connected in series on the radio frequency branch where each reflection unit is located, one end of the mode switching switch is connected to the reflection unit or the shared phase-shifting circuit, and the other end of the mode switching switch is connected to the shared amplification circuit; and when the hybrid sub-connected reflective surface receives a mode control signal, it controls the corresponding mode switching switch to close or open based on the mode control signal, so as to switch the corresponding reflection unit between an active reflection mode and a passive reflection mode.

2. A communication system, characterized in that it comprises: the hybrid sub-connected reflective surface according to claim 1; and a base station and multiple mobile terminals, wherein the base station communicates with each of the mobile terminals via the hybrid sub-connected reflective surface.

3. A method for optimizing transmission parameters of a communication system, characterized in that the communication system is the communication system according to claim 2, and the method comprises: acquiring, for the current time frame, first channel state information between each mobile terminal and the base station, second channel state information between the hybrid sub-connected reflector and the base station, and third channel state information between each mobile terminal and the hybrid sub-connected reflector; acquiring battery energy state information of the hybrid sub-connected reflector in the current time frame; acquiring a transmission parameter optimization model; inputting the first channel state information, the second channel state information, the third channel state information, and the battery energy state information into the transmission parameter optimization model for data matching processing, to obtain an output optimized mode switching sequence, optimized phase shift sequence, optimized amplification coefficient sequence, optimized transmit power, and optimized auxiliary communication duration; and solving for the optimized receive beamforming of the base station based on the optimized transmit power, the first channel state information, the second channel state information, the third channel state information, the optimized phase shift sequence, the optimized amplification coefficient sequence, and the optimized mode switching sequence;
dividing the current time frame into an auxiliary communication phase and a pure energy harvesting phase based on the optimized auxiliary communication duration; during the auxiliary communication phase, controlling the opening and closing of the mode switching switches based on the optimized mode switching sequence, controlling the shared phase-shifting circuits and the shared amplification circuits based on the optimized phase shift sequence and the optimized amplification coefficient sequence, and controlling the mobile terminals to send data to the base station based on the optimized transmit power and the optimized receive beamforming; and during the pure energy harvesting phase, controlling the hybrid sub-connected reflector to harvest ambient energy and controlling the mobile terminals to send data to the base station via the direct link.

4. The method for optimizing transmission parameters of a communication system according to claim 3, characterized in that acquiring the transmission parameter optimization model comprises: constructing, based on the phase shift parameter sequence, the amplification factor parameter sequence, and the mode switching parameter sequence of the hybrid sub-connected reflector and the transmit power parameter and beamforming parameter of each mobile terminal, a first decoding signal-to-noise ratio function for the auxiliary communication phase and a second decoding signal-to-noise ratio function for the pure energy harvesting phase, respectively; obtaining the battery energy update function of the hybrid sub-connected reflector based on the transmit power parameters, the phase shift parameter sequence, the amplification factor parameter sequence, the mode switching parameter sequence, and the auxiliary communication duration; constructing, based on the first decoding signal-to-noise ratio function, the second decoding signal-to-noise ratio function, and the auxiliary communication duration, a transmission parameter optimization problem with the goal of maximizing the long-term throughput of the system; decoupling, based on the beamforming parameters, the transmission parameter optimization problem into a lower-level beamforming optimization sub-problem and an upper-level joint resource optimization sub-problem; and constructing an initial transmission parameter optimization model based on the upper-level joint resource optimization sub-problem and the battery energy update function, and training the initial transmission parameter optimization model over multiple rounds to obtain the transmission parameter optimization model.

5. The method for optimizing transmission parameters of a communication system according to claim 4, characterized in that obtaining the battery energy update function of the hybrid sub-connected reflector based on the transmit power parameters, the phase shift parameter sequence, the amplification factor parameter sequence, the mode switching parameter sequence, and the auxiliary communication duration comprises: calculating the total static circuit power consumption of the hybrid sub-connected reflector based on the static power consumption of the phase-shifting circuit and the static power consumption of the amplifier in the shared amplification circuit, combined with the sub-array activation state indicated by the mode switching parameter sequence; calculating the dynamic amplification power consumption of the hybrid sub-connected reflector based on the amplifier conversion efficiency, the third channel state information, the transmit power parameters, the phase shift parameter sequence, the amplification factor parameter sequence, and the input noise variance; adding the total static circuit power consumption to the dynamic amplification power consumption and multiplying the sum by the auxiliary communication duration to obtain the energy consumption term for the current time frame; and obtaining the battery energy update function by a boundary-constrained calculation based on the remaining battery energy in the previous time frame, the ambient energy harvesting term in the current time frame, the energy consumption term, and the maximum battery capacity.

6. The method for optimizing transmission parameters of a communication system according to claim 4, characterized in that constructing the transmission parameter optimization problem with the goal of maximizing the long-term throughput of the system, based on the first decoding signal-to-noise ratio function, the second decoding signal-to-noise ratio function, and the auxiliary communication duration, comprises: calculating the energy consumption of the hybrid sub-connected reflector in the current time frame, obtaining the currently available battery energy, and constructing an energy causality constraint based on the numerical relationship between the energy consumption and the currently available battery energy; obtaining the auxiliary communication duration and the total duration of the current time frame, and constructing a duration boundary constraint based on the numerical relationship between the auxiliary communication duration and the total duration of the current time frame; calculating, based on the first decoding signal-to-noise ratio function and the second decoding signal-to-noise ratio function, the total amount of data uploaded by a single user in the auxiliary communication phase and in the pure energy harvesting phase, respectively, and constructing a quality of service constraint based on the numerical relationship between the total amount of data uploaded by a single user and the minimum quality of service threshold; obtaining the maximum transmit power threshold, the maximum amplification threshold, and the discrete operating space, and constructing device physical constraints based on the numerical relationship between the transmit power parameter and the maximum transmit power threshold, between the amplification coefficient parameter sequence and the maximum amplification threshold, and between the mode switching parameter sequence and the discrete operating space;
and constructing, based on the first decoding signal-to-noise ratio function, the second decoding signal-to-noise ratio function, the auxiliary communication duration, the energy causality constraint, the duration boundary constraint, the quality of service constraint, and the device physical constraints, the transmission parameter optimization problem with the goal of maximizing the long-term throughput of the system.

7. The method for optimizing transmission parameters of a communication system according to claim 4, characterized in that constructing the initial transmission parameter optimization model based on the upper-level joint resource optimization sub-problem and the battery energy update function comprises: determining a state space based on the first channel state information, the second channel state information, the third channel state information, and the energy state output by the battery energy update function; determining an action space based on the amplification factor parameter sequence, the transmit power parameter, the auxiliary communication duration, the mode switching parameter sequence, and the phase shift parameter sequence; constructing a throughput reward function based on the long-term system throughput and the quality of service constraint in the upper-level joint resource optimization sub-problem; determining initial critic parameters of a critic network, initial first actor parameters of a first actor network, and initial second actor parameters of a second actor network; and obtaining the initial transmission parameter optimization model based on the state space, the action space, the throughput reward function, the initial critic parameters, the initial first actor parameters, and the initial second actor parameters.

8. The method for optimizing transmission parameters of a communication system according to claim 7, characterized in that training the initial transmission parameter optimization model over multiple rounds to obtain the transmission parameter optimization model comprises: at each training time step, inputting the current state information into the first actor network and the second actor network, respectively, and sampling a current discrete action and a current continuous action, respectively; calculating the current beamforming vector from the current discrete action and the current continuous action by solving the lower-level beamforming optimization sub-problem; applying the current discrete action, the current continuous action, and the current beamforming vector to the simulation environment to obtain the current throughput reward calculated with the throughput reward function and the updated state information for the next time step; storing a transition tuple containing the current state information, action information, reward information, and updated state information in an experience replay pool; and, when the time step reaches the update period, sampling a mini-batch from the experience replay pool and performing gradient updates on the critic parameters, the first actor parameters, and the second actor parameters based on the advantage function, the clipping function, and the policy entropy constraint function, until the model converges and the transmission parameter optimization model is output.

9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, when the processor executes the computer program, it implements the method for optimizing transmission parameters of the communication system according to any one of claims 3 to 8.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the method for optimizing transmission parameters of the communication system according to any one of claims 3 to 8.
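Claims 7 and 8 describe a critic network and two actor networks. One actor samples the discrete mode-switching action and the other samples the continuous actions, and the networks are trained with an advantage function, a clipping (cutoff) function, and a policy-entropy term. This structure is consistent with proximal policy optimization (PPO) over a hybrid action space, although the application does not name the algorithm. Under that assumption, the Python sketch below shows only the loss arithmetic: generalized advantage estimation (GAE), the clipped surrogate applied separately to each actor's log-probabilities with a shared advantage, and a squared-error critic loss. The hyperparameter values (gamma, lambda, clip 0.2, entropy coefficient 0.01) and all names are hypothetical, and network definitions and gradient steps are omitted.

```python
import math
from typing import List, Sequence


def gae(rewards: Sequence[float], values: Sequence[float], next_value: float,
        gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized advantage estimates for one trajectory without terminal states.

    values[t] is the critic's estimate V(s_t); next_value is V(s_T).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        v_next = values[t + 1] if t + 1 < len(values) else next_value
        delta = rewards[t] + gamma * v_next - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def clipped_actor_loss(logp_new: Sequence[float], logp_old: Sequence[float],
                       advantages: Sequence[float], entropy: float,
                       clip_eps: float = 0.2, ent_coef: float = 0.01) -> float:
    """PPO clipped surrogate (negated, for minimisation) minus an entropy bonus.

    Call once with the discrete actor's log-probabilities and once with the
    continuous actor's, using the same advantages.
    """
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)  # pessimistic bound
    return -total / len(advantages) - ent_coef * entropy


def critic_loss(values: Sequence[float], returns: Sequence[float]) -> float:
    """Mean squared error between critic values and empirical returns."""
    return sum((v - r) ** 2 for v, r in zip(values, returns)) / len(values)


if __name__ == "__main__":
    rewards, values = [1.0, 1.0], [0.5, 0.5]
    adv = gae(rewards, values, next_value=0.0, gamma=1.0, lam=1.0)
    returns = [a + v for a, v in zip(adv, values)]
    print(adv, critic_loss(values, returns))
    print(clipped_actor_loss([0.0, 0.0], [0.0, 0.0], adv, entropy=0.0))
```

In a real training loop, these losses would be computed on tensors so that an automatic-differentiation framework can produce the gradient updates for the critic and both actors.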