A multi-stage optimal decision-making method based on local information target defense game
By employing a multi-stage decision-making method for local information target defense games, and combining the perception range and the influence of obstacles, the deployment location of defenders is optimized, solving the problem of deployment difficulties of existing strategies in complex environments and improving the capture success rate of defenders in real-world scenarios.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TONGJI UNIV
- Filing Date
- 2023-06-21
- Publication Date
- 2026-06-16
AI Technical Summary
Existing target defense game strategies are difficult to deploy in complex environments, cannot effectively take into account the perception range and the impact of obstacles, resulting in high computational costs and difficulty in applying them in real-world scenarios.
A multi-stage optimal decision-making method based on local information is adopted. The method involves cooperation between the target and the defender to share perception information, and is divided into a deployment stage, an asymmetric information stage, and a participation stage. By combining perception limitations and the influence of obstacles, the deployment position of the defender is optimized to increase the probability of winning.
It improves the success rate of the defender in capturing in complex environments, enhances the practical application effect of the defense strategy, and verifies the effectiveness of the decision through numerical examples.
Figure CN116797430B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the technical field of target defense games, and in particular to a multi-stage optimal decision-making method for target defense games based on local information.
Background Technology
[0002] With the continuous development of artificial intelligence technology, more and more security-related tasks can be completed independently or collaboratively by robots, drones, and similar platforms. When the scenario involves detection, interception, or pursuit, it can often be described as a pursuit-evasion game. The target defense game, a variant of the pursuit-evasion game, is a three-player game involving the target, the intruder, and the defender. Compared with the pursuit-evasion game, the intruder must not only evade the defender's pursuit but also aims to breach the target. The defender aims to complete the capture within a designated area before the intruder breaks through. The target can typically be modeled as a fixed point, a fixed area, or a dynamic player that can cooperate with the defender. Most existing target defense game strategies build on the pursuit-evasion game, are often abstracted mathematically as a differential game, and are solved using the Hamilton-Jacobi-Bellman-Isaacs (HJI) equation. However, the computational cost of solving the HJI equation grows exponentially with the dimensionality of the problem. Adaptive dynamic programming and reinforcement learning have since been applied to this problem, but because the objective function does not contain an integral term, it remains very difficult to handle.
[0003] Existing solutions based on the concept of advantageous regions have addressed various problems across different scenarios, but most discussions are limited and do not comprehensively consider how perception range and obstacles affect the target defense game. Existing solutions are largely based on traditional pursuit-evasion methods, which suffer from restricted research scenarios and simplistic deployment methods, and cannot be applied in practice in complex environments. Furthermore, assuming that the advantageous region can be obtained from an Apollonius circle or a Cartesian oval keeps the problem low-dimensional and overly idealized. Therefore, the present invention comprehensively considers the limits on information acquisition imposed by the perception range, together with deployment methods better suited to security-related problems when obstacles are present.
Summary of the Invention
[0004] To address the shortcomings of existing technologies, the present invention aims to provide a multi-stage optimal decision-making method based on local information target defense game theory, which can increase the defender's advantageous angle for ensuring capture and improve the defender's probability of winning. To achieve the above-mentioned objective and other advantages of the present invention, a multi-stage optimal decision-making method based on local information target defense game theory is provided, comprising:
[0005] The target's perception range, the defender's perception range, and the intruder's perception range, wherein the defender shares the perception information acquired within the target's perception domain by cooperating with the target;
[0006] The game process includes the deployment phase, the asymmetric information phase, and the participation phase;
[0007] When the intruder is not within the target's perception range and the defender is not within the intruder's perception range, the game is in the deployment phase. During the deployment phase, neither the intruder nor the defender can obtain information about the other due to perception limitations.
[0008] When the intruder is within the target's perception range, and the intruder cannot obtain information about the defender while the defender possesses all the information about the intruder, it is in the asymmetric information stage.
[0009] When both the defender and the intruder possess complete information about the opponent, the game is in the participation phase. If the intruder can escape the target's perception domain, the game can re-enter the deployment phase. If the intruder's area of advantage intersects with the target area, the intruder can successfully break through. If the defender arrives at the defensive position before the intruder reaches the key position, or has already arrived there in advance, the defender can definitely complete the capture.
[0010] Preferably, during the participation phase, the defender uses a strategy to lure the intruder deep enough, while the intruder's strategy is to determine whether the Apollonius circle generated by the intruder and the defender intersects with the target area at a critical moment. If it does, the intruder chooses to continue breaking through regardless of the defender's actions; if there is no intersection, the intruder immediately escapes; or if the defender cannot reach the optimal defensive position, the target area will be breached by the intruder, and the intruder can only decide to escape when the defender reaches the optimal defensive position.
[0011] Preferably, when in the asymmetric information stage, if the intruder does not sense the line segment obstacle during the intrusion process, it will continue to intrude radially until the next stage; if the intruder senses the existence of the obstacle during the intrusion process, the intruder does not make a direct judgment, but waits until it senses the endpoint of the obstacle to make a decision to turn to the endpoint, and after reaching the endpoint, it will reselect the radial direction to intrude until the next stage.
[0012] Preferably, in the asymmetric information stage, the angle of the intruder at the start of the asymmetric stage is $\theta(t_{asym})$ and the angle corresponding to the midpoint of the obstacle is $\theta_o$. With the distance from the center of the obstacle to the target given, the angle between an endpoint of the line-segment obstacle and the center of the obstacle is denoted $\beta$. The relative angle between the intrusion angle and the obstacle center angle is $\Delta\theta = \theta(t_{asym}) - \theta_o$. When the relative angle satisfies $|\Delta\theta| > \beta$, the defender's optimal defensive position is at the same radial angle as the intrusion angle; when $\Delta\theta \in [-\beta, \beta]$, the defender's optimal deployment position moves to the radial angle of the obstacle endpoint closer to the intrusion angle, which is consistent with the angle at which the intruder re-intrudes radially after bypassing the obstacle.
[0013] Preferably, when the relative angle Δθ∈[-β, β], and the intrusion angle and the defense angle are in different ranges, the intruder's intrusion method is different, namely: when Δθ=0, the intruder enters perpendicular to the center of the obstacle; when Δθ∈[0, β], the intruder intrudes to the left of the center of the obstacle; when Δθ∈[-β, 0], the intruder intrudes to the right of the center of the obstacle.
[0014] Preferably, during the deployment phase, since the intruder is intruding for the first time and has no information about any obstacles, it selects its intrusion angle at random; because the intrusion angle is random, the outcome of the game is probabilistic. The defender's goal is to find the optimal deployment radius and start the game from an advantageous position.
[0015] Compared with existing technologies, the advantages of this invention are: it clarifies the critical value for advance deployment that gives the defender more advantage; it increases the complexity of the problem by introducing perception constraints and obstacles, bringing the studied problem closer to real-world problems and making multi-stage decision-making more feasible in practice; it studies the impact of obstacles on each stage of the game and on the game outcome, proposes multi-stage decisions to ensure the defender's advantage, and verifies through numerical examples that the proposed decisions defend more effectively.
Description of the Drawings
[0016] Figure 1 is a schematic diagram of a local information target defense game scenario with a line-segment obstacle, for the method according to the present invention;
[0017] Figure 2 is a flowchart of the method according to the present invention;
[0018] Figure 3 is a schematic diagram of the Apollonius circle used in the method according to the present invention;
[0019] Figure 4 is a positional relationship diagram at the key moment of the method according to the present invention;
[0020] Figure 5 illustrates the situation in which the intruder intrudes perpendicular to the center of the obstacle, for the method according to the present invention;
[0021] Figure 6 is a diagram of the key angles for ensuring capture in the method according to the present invention;
[0022] Figure 7 is a diagram of the special positions corresponding to Note 2 of the method according to the present invention;
[0023] Figure 8 is a snapshot of the method according to the present invention without considering line-segment obstacles;
[0024] Figure 9 is a snapshot of the method according to the present invention when line-segment obstacles are considered.
Detailed Implementation
[0025] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0026] Referring to Figures 1-9, a multi-stage optimal decision-making method based on local information target defense game includes: the target's perception range, the defender's perception range and the intruder's perception range, wherein the defender shares the perception information obtained within the target's perception domain by cooperating with the target;
[0027] The game process includes the deployment phase, the asymmetric information phase, and the participation phase.
[0028] As shown in Figure 1, the target is a circular region with radius $r_0$, and the target's sensing range (TSR) is a ring-shaped region of width $\rho_T$ surrounding the target. The defender itself lacks sensory capabilities, but can cooperate with the target to share sensory information acquired within the target's sensory domain. The length of the line-segment obstacle is $l_o$. The center of the obstacle lies at a fixed distance from the center of the target, and the circle of that radius centered on the target is tangent to the obstacle. The intruder's perception range is a circular area of radius $\rho_I$.
[0029] The objective defense game problem under consideration involves a variety of parameters, and different parameters may correspond to different game scenarios. The selection of some parameters may even directly determine the outcome for one side. The following four assumptions are defined:
[0030] Assumption 1: Assume the speed ratio of the intruder to the defender is v, and v < 1.
[0031] Assumption 2: Assume that the parameters satisfy: $\rho_T < r_0 v$.
[0032] Assumption 3: Assume the parameters satisfy:
[0033] Assumption 4: Assume the parameters satisfy:
[0034] Assumption 1 clarifies the type of game and guarantees the defender a chance of winning; Assumption 2 limits the size of the TSR, preventing the defender from always winning; Assumption 3 restricts some possible strategies of the intruder and also prevents the intruder from being unable to perceive the obstacle outside the TSR; Assumption 4 ensures that even if the intruder's radial direction towards the target center points at the obstacle center, it can still perceive the obstacle's endpoints and formulate a strategy to change its intrusion direction. These four assumptions form the basis for the subsequent discussion of the game process. The game is analyzed backward to develop a multi-stage decision-making method, and the initial settings are optimized to give the defender a greater winning advantage.
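As a minimal sketch, the two assumptions whose inequalities are stated explicitly (Assumptions 1 and 2) can be checked for a given parameter set. The class name `GameParams`, the function `check_assumptions`, and the extra requirement `v > 0` are illustrative assumptions, not part of the patent text; Assumptions 3 and 4 are not checked because their formulas are not reproduced in this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameParams:
    """Hypothetical container for the scalar game parameters."""
    r0: float      # target radius
    v: float       # intruder speed divided by defender speed
    rho_T: float   # width of the target sensing range (TSR)
    rho_I: float   # intruder perception radius


def check_assumptions(p: GameParams) -> dict:
    """Evaluate Assumptions 1 and 2 only; 3 and 4 are not covered here."""
    return {
        # Assumption 1: v < 1 (v > 0 is added here as an obvious sanity check).
        "assumption_1": 0.0 < p.v < 1.0,
        # Assumption 2: rho_T < r0 * v.
        "assumption_2": p.rho_T < p.r0 * p.v,
    }


# Parameter values taken from the numerical example in the description.
example = GameParams(r0=1.0, v=0.85, rho_T=0.6, rho_I=0.1)
result = check_assumptions(example)
```

With the numerical-example values, both checks pass, since 0.6 < 1 × 0.85.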
[0035] The participation phase begins at time $t_{eng}$. Entry into the participation phase is marked by the intruder being within the TSR and the defender being within the intruder's perception range, which can be described as:
[0036] $r_0 < \|x_I(t)\| < r_0 + \rho_T \quad (1)$
[0037] $\|x_I(t) - x_D(t)\| < \rho_I \quad (2)$
[0038] This requires a method commonly used to describe the dominant region when the capture radius is zero: the Apollonius circle. Specifically, given two fixed points on a plane, the set of points whose distances to the two fixed points are in a constant ratio $k$ constitutes the Apollonius circle, where $k > 0$ and $k \neq 1$. A schematic diagram is shown in Figure 3.
[0039] The game problem under consideration matches the definition of the Apollonius circle. Since there is no communication time delay, both players move for the same time, so the speed ratio $v$ of the two players can be regarded as the distance ratio $k$. Let the center of the Apollonius circle be $x_A$ and its radius be $r_A$. When the effects of obstacles are not considered, the Apollonius circle can be defined by the following formulas:
[0040]
[0041]
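The two formulas for the center and radius are not reproduced in this text. As a hedged sketch, the standard closed form of the Apollonius circle for a distance ratio $v < 1$ is $x_A = (x_I - v^2 x_D)/(1 - v^2)$ and $r_A = v\|x_I - x_D\|/(1 - v^2)$; whether the patent's equations (3) and (4) use exactly this notation is an assumption. The function name `apollonius_circle` is illustrative.

```python
import math


def apollonius_circle(x_I, x_D, v):
    """Circle of points P with |P - x_I| / |P - x_D| = v, for 0 < v < 1.

    x_I, x_D: 2D positions (x, y) of intruder and defender.
    Returns (center, radius). Standard textbook form, assumed to match
    the patent's equations (3) and (4).
    """
    if not 0.0 < v < 1.0:
        raise ValueError("speed ratio v must satisfy 0 < v < 1")
    k = 1.0 - v * v
    center = ((x_I[0] - v * v * x_D[0]) / k,
              (x_I[1] - v * v * x_D[1]) / k)
    radius = v * math.dist(x_I, x_D) / k
    return center, radius


center, radius = apollonius_circle((2.0, 0.0), (1.0, 0.0), 0.5)
```

Because the intruder is the slower player, the circle encloses the intruder's position; points inside it are those the intruder can reach before the defender.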
[0042] Since the length of the obstacle is finite, through subsequent deployment positioning, capture must occur after the intruder bypasses the obstacle, and escape must also be based on sensing the defender after bypassing the obstacle. That is, the escape trajectory will not pass through the obstacle, but will be in the radial direction corresponding to a certain endpoint. The existence of the obstacle does not affect the judgment of the game outcome by the geometric relationship of the Apollonius circle. Judging the positional relationship between the Apollonius circle and the target and TSR boundary can provide several sufficient conditions for the participation phase, as follows:
[0043] 1) If the Apollonius circle intersects the interior of the target:
[0044] $\|x_A\| - r_A < r_0 \quad (5)$
[0045] then, as long as Assumptions 1-4 are satisfied, there must exist an intruder strategy that can break through the target;
[0046] 2) If some point of the Apollonius circle lies outside the TSR:
[0047] $\|x_A\| + r_A > r_0 + \rho_T \quad (6)$
[0048] then, as long as Assumptions 1-4 are satisfied, there must exist an intruder strategy that can escape the TSR and bring the game back to the deployment phase;
[0049] 3) If the Apollonius circle does not intersect the target region:
[0050] $\|x_A\| - r_A \geq r_0 \quad (7)$
[0051] then, as long as Assumptions 1-4 are satisfied, there must exist a defender strategy that can prevent intrusion. Based on the above conditions, the following can be deduced:
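The three geometric tests above can be sketched as a small helper. The breach and prevention tests use the nearest point of the circle to the target center, $\|x_A\| - r_A$. For the escape test this sketch uses the farthest point, $\|x_A\| + r_A$, following the prose statement that some point of the circle lies outside the TSR; that reading is an assumption. The function name and dictionary keys are illustrative, and Assumptions 1-4 are taken as given rather than checked.

```python
def participation_outcomes(center_norm, radius, r0, rho_T):
    """Evaluate the sufficient conditions of the participation phase.

    center_norm: ||x_A||, distance of the Apollonius circle center
                 from the target center.
    radius:      r_A, radius of the Apollonius circle.
    """
    nearest = center_norm - radius    # closest point of the circle to the target center
    farthest = center_norm + radius   # farthest point (assumed reading of the escape test)
    return {
        "intruder_can_breach": nearest < r0,            # condition 1)
        "intruder_can_escape": farthest > r0 + rho_T,   # condition 2), assumption
        "defender_can_prevent": nearest >= r0,          # condition 3)
    }


outcome = participation_outcomes(center_norm=1.3, radius=0.2, r0=1.0, rho_T=0.6)
```

Conditions 1) and 3) are complementary, so exactly one of the first and third flags is true for any input.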
[0052] Corollary 1: If the game parameters satisfy […], there is always an intruder strategy that allows the intruder to either break through or escape;
[0053] Corollary 2: If the game parameters satisfy […], there is always a defender strategy that can achieve capture.
[0054] The asymmetric information phase begins at time $t_{asym}$. Entry into the asymmetric information phase is marked by the intruder entering the TSR. In this phase, the intruder has no information about the defender, while the defender, through sharing with the target, has all of the intruder's information. Specifically, this can be described as:
[0055] $r_0 \leq \|x_I(t)\| \leq r_0 + \rho_T \quad (8)$
[0056] $\|x_I(t) - x_D(t)\| > \rho_I \quad (9)$
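The phase conditions can be combined into one classifier, sketched below under some assumptions: positions are measured from the target center, the boundary case $\|x_I - x_D\| = \rho_I$ is assigned to the asymmetric information phase, any intruder position outside the TSR is treated as the deployment phase, and the label `"breached"` for an intruder inside the target is a hypothetical addition. The function name `game_phase` is illustrative.

```python
import math


def game_phase(x_I, x_D, r0, rho_T, rho_I):
    """Classify the current phase from the intruder and defender positions.

    x_I, x_D: 2D positions (x, y) relative to the target center.
    """
    r_I = math.hypot(x_I[0], x_I[1])
    if r_I < r0:
        return "breached"      # hypothetical label, not a phase named in the text
    if r_I > r0 + rho_T:
        return "deployment"    # intruder outside the TSR: no mutual information
    # Intruder inside the TSR: the defender knows the intruder's state.
    defender_sensed = math.dist(x_I, x_D) < rho_I
    return "participation" if defender_sensed else "asymmetric"


phase = game_phase((1.5, 0.0), (1.2, 0.0), r0=1.0, rho_T=0.6, rho_I=0.1)
```

Treating every position outside the TSR as deployment relies on the defender staying far enough inside the TSR that the intruder cannot sense it from outside.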
[0057] Since the intruder only knows the location of the target's center and has no information about the defender, the intruder's strategy can only be to intrude radially toward the target's center, minimizing the duration of the asymmetric information phase. If the intruder encounters an obstacle during the intrusion, it must devise a strategy to reach one end of the obstacle before resuming radial intrusion.
[0058] The intruder's strategy during the asymmetric information phase is as follows:
[0059] If no line segment obstacles are encountered during the intrusion, the intrusion continues radially until the next stage;
[0060] If the intruder senses the presence of an obstacle during the intrusion process, it does not turn immediately. Instead, it waits until it senses an endpoint of the obstacle and then decides to turn toward that endpoint. After reaching the endpoint, it reselects the radial direction and intrudes until the next stage.
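The intruder's rule in this phase can be sketched as a heading function. Several details are assumptions made for illustration: an endpoint counts as sensed when it is within distance $\rho_I$, the nearest sensed endpoint is chosen if both are sensed, the caller tracks whether an endpoint has already been reached (`rounded_endpoint`), and the intruder is never exactly at the target center. The function name `intruder_heading` is hypothetical.

```python
import math


def intruder_heading(x_I, endpoints, rho_I, rounded_endpoint=False, tol=1e-9):
    """Unit heading of the intruder during the asymmetric information phase.

    x_I:              intruder position (x, y) relative to the target center.
    endpoints:        the two endpoints of the line-segment obstacle.
    rounded_endpoint: True once the intruder has already reached an endpoint.
    """
    def unit(dx, dy):
        n = math.hypot(dx, dy)
        return (dx / n, dy / n)

    radial_inward = unit(-x_I[0], -x_I[1])
    if rounded_endpoint:
        return radial_inward      # after the endpoint: radial intrusion again
    sensed = [e for e in endpoints if math.dist(x_I, e) <= rho_I]
    if not sensed:
        return radial_inward      # no endpoint sensed yet: keep intruding radially
    goal = min(sensed, key=lambda e: math.dist(x_I, e))
    if math.dist(x_I, goal) <= tol:
        return radial_inward      # standing on the endpoint: resume radial intrusion
    return unit(goal[0] - x_I[0], goal[1] - x_I[1])


obstacle = ((1.3, 0.05), (1.3, -0.3))
far_heading = intruder_heading((1.5, 0.0), obstacle, rho_I=0.1)
near_heading = intruder_heading((1.35, 0.0), obstacle, rho_I=0.1)
```

At (1.5, 0) neither endpoint is within range, so the heading stays radial; at (1.35, 0) the upper endpoint is sensed and the heading turns toward it.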
[0061] Assume that all coordinates involved are determined by the defender's position at the start of the asymmetric information phase. Specifically, the direction of the defender's position at $t_{asym}$ is taken as the positive direction of the horizontal axis, and the distance of the defender from the origin at this moment is taken as the deployment radius $R_{dep}$. The following constraint can be imposed in advance to simplify subsequent calculations:
[0062] $R_{dep} < r_0 + \rho_T - \rho_I \quad (10)$
[0063] There exists a critical moment that satisfies the following conditions:
[0064] a) The Apollonius circle formed by the speed ratio of the intruder and the defender is tangent to the target boundary at this moment;
[0065] b) The intruder can sense the defender at this very moment;
[0066] c) The intruder and the defender are currently in the same radial direction.
[0067] The positional relationship at the critical moment is shown in Figure 4. If the intruder is not affected by obstacles throughout the intrusion process, the defender's best position at this critical moment and the intruder's corresponding location lie on the same radial line, and the radii of the defender and the intruder can be represented as follows:
[0068]
[0069]
[0070] When the intruder encounters obstacles during the intrusion, the defender's best defensive position changes. Assume the intruder's angle at the start of the asymmetric phase is $\theta(t_{asym})$ and the angle corresponding to the midpoint of the obstacle is $\theta_o$. With the distance from the center of the obstacle to the target given, the angle between an endpoint of the line-segment obstacle and the center of the obstacle is denoted $\beta$. The relative angle between the intrusion angle and the obstacle center angle is $\Delta\theta = \theta(t_{asym}) - \theta_o$. When the relative angle satisfies $|\Delta\theta| > \beta$, the defender's optimal defensive position is at the same radial angle as the intrusion angle; when $\Delta\theta \in [-\beta, \beta]$, the defender's optimal deployment position moves to the radial angle of the obstacle endpoint closer to the intrusion angle, which is consistent with the angle at which the intruder re-intrudes radially after bypassing the obstacle.
[0071] When the relative angle Δθ∈[-β, β], the intruder's intrusion method differs depending on the range of the intrusion angle and the defense angle. Specifically: when Δθ=0, the intruder enters perpendicular to the center of the obstacle; when Δθ∈[0, β], the intruder invades to the left of the center of the obstacle; and when Δθ∈[-β, 0], the intruder invades to the right of the center of the obstacle.
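The case distinction on $\Delta\theta$ can be sketched as follows. This is an illustration under stated assumptions: the two obstacle endpoints are taken to lie at radial angles $\theta_o \pm \beta$, the relative angle is wrapped to $[-\pi, \pi)$, and for the perpendicular case $\Delta\theta = 0$ no endpoint is selected because both are equally close. The function names are hypothetical.

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def defender_radial_angle(theta_asym, theta_o, beta):
    """Return (intrusion mode, radial angle of the defender's optimal position).

    theta_asym: intruder angle at the start of the asymmetric phase.
    theta_o:    angle of the obstacle midpoint.
    beta:       angle between an obstacle endpoint and the obstacle center.
    """
    d_theta = wrap_angle(theta_asym - theta_o)
    if abs(d_theta) > beta:
        return "unobstructed", theta_asym    # obstacle has no effect
    if d_theta == 0.0:
        # Perpendicular intrusion: both endpoints are equally close, so the
        # endpoint choice is left open in this sketch.
        return "perpendicular", None
    if d_theta > 0.0:
        return "left", wrap_angle(theta_o + beta)    # assumed endpoint angle
    return "right", wrap_angle(theta_o - beta)       # assumed endpoint angle


mode, angle = defender_radial_angle(theta_asym=0.1, theta_o=0.0, beta=0.3)
```

In this example the intruder starts slightly to one side of the obstacle center, so the defender's radial angle moves to the nearer endpoint rather than tracking the intrusion angle.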
[0072] When the obstacle has no effect on the intrusion process, a quantity $l_D$ can be defined, which indicates the furthest distance the defender can travel to reach the optimal defensive position just before the next phase begins:
[0073]
[0074] The initial angle of capture is ensured through multiple lemmas:
[0075] Lemma 1: When Corollaries 1-2 are satisfied, the parameters for determining the deployment location can be selected as follows:
[0076]
[0077] When $|\theta(t_{asym})| \leq \Theta(R_{dep})$, the defender can definitely complete the capture, where:
[0078]
[0079] When obstacles affect the intrusion process, the intruder's specific intrusion trajectory changes. Figure 5 describes the change in the optimal defense position when the intruder intrudes perpendicular to the obstacle. In this case, the intruder's trajectory can be divided into three parts: $x_1$ is the length of the intruder's radial movement from the intrusion angle until it senses the endpoint of the line-segment obstacle; $x_2$ is the length of the intruder's movement towards the endpoint after detection; and $x_3$ is the length of the intruder's path from the endpoint to the new critical position corresponding to the endpoint. With this change in the intruder's trajectory, $l_D$ changes to:
[0080]
[0081] where $x_2 = \rho_I$. The values of $x_2$ and $x_3$ do not change with $\Delta\theta$ and depend only on the initial settings, whereas $x_1$ depends on $\Delta\theta$, and its expression differs depending on whether the intrusion is perpendicular or to the left or right.
[0082] Let l1 represent the distance required from the optimal deployment radius to the optimal defense position when the intrusion angle is closer to the endpoint of the obstacle on the left, and l2 represent the situation on the right. l1 is always less than l2, and l1 and l2 can be expressed as follows:
[0083]
[0084]
[0085] Corollary 3: Assuming that the line-segment obstacle has no effect on the intrusion angle, i.e., $|\Delta\theta| > \beta$, the optimal deployment radius should be:
[0086]
[0087] The maximum deployment angle that can guarantee the defender's victory is:
[0088]
[0089] The defender's winning probability can be described as:
[0090]
[0091] When Assumption 2 is satisfied, $P_D < 0.5$, because the $\cos^{-1}$ term is positive. However, if Assumption 2 is not satisfied, then $P_D = 1$. The presence of obstacles affects the optimal defense position, but since the length of the obstacle is limited, the probability of encountering it during the intrusion is relatively small. Therefore, the optimal deployment position obtained without considering line-segment obstacles can be used directly as the optimal deployment position when line-segment obstacles exist.
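The patent's expression for $P_D$ is not reproduced in this text. As a hedged illustration only: if the intrusion angle is assumed to be uniformly distributed on $[-\pi, \pi)$ and the defender is guaranteed to win whenever $|\theta(t_{asym})| \leq \Theta_{max}$, then the guaranteed winning probability is $\Theta_{max}/\pi$. Both the uniform distribution and the function names below are assumptions; under them, $P_D < 0.5$ corresponds to $\Theta_{max} < \pi/2$.

```python
import math
import random


def defender_win_probability(theta_max):
    """P(|theta| <= theta_max) for theta uniform on [-pi, pi)."""
    theta_max = min(max(theta_max, 0.0), math.pi)   # clamp to a valid angle
    return theta_max / math.pi


def monte_carlo_estimate(theta_max, trials=100_000, seed=0):
    """Empirical check of the same probability with sampled intrusion angles."""
    rng = random.Random(seed)
    wins = sum(abs(rng.uniform(-math.pi, math.pi)) <= theta_max
               for _ in range(trials))
    return wins / trials


p_exact = defender_win_probability(1.0)
p_sampled = monte_carlo_estimate(1.0)
```

The sampled estimate should agree with the closed form to within sampling error, which is a quick way to sanity-check any alternative distribution of intrusion angles.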
[0092] Having determined the optimal deployment locations in the presence of obstacles, l1 and l2 are further optimized into l1′ and l2′:
[0093]
[0094]
[0095] Lemma 2: When the intrusion is not perpendicular to the center of the obstacle, analyzing the relationship between $l_D'$, $l_1'$, and $l_2'$ yields two critical angles $\Delta\theta_l$ and $\Delta\theta_r$, located on either side of the line-segment obstacle. The specific solution is as follows:
[0096]
[0097]
[0098] where […] holds for $i = 1, 2$, and $b_1$ and $b_2$ are the solutions of equations (24) and (25), respectively.
[0099] The critical angle on the left has different solutions depending on the range in which $l_D'$ lies, as follows:
[0100]
[0101] When the condition […] is satisfied, the critical angle on the right has different solutions depending on the range in which $l_D'$ lies, as follows:
[0102]
[0103] If not satisfied:
[0104]
[0105] Lemma 3: When the intrusion is perpendicular to the center of the obstacle, analyzing the relationship between $l_D'$, $l_1'$, and $l_2'$ in the same way yields two critical angles $\Delta\theta_l$ and $\Delta\theta_r$, located on either side of the line-segment obstacle. In this case $x_1$ has an analytical solution, given as follows:
[0106]
[0107] The critical angle on the left has different solutions depending on the range in which $l_D'$ lies, as follows:
[0108]
[0109] When the condition […] is satisfied, the critical angle on the right has different solutions depending on the range in which $l_D'$ lies, as follows:
[0110]
[0111] If not satisfied:
[0112]
[0113] Theorem 1: Due to the presence of obstacles, the defender has an additional chance of winning outside the optimal deployment angle. The increase in the defender's winning probability can be described as follows, where the possible values of $\Delta\theta_l$ and $\Delta\theta_r$ are given in Lemmas 2-3:
[0114]
[0115] Note 1: If […], there are two special positions that can directly increase the advantage angle, in which case the defender's winning probability is:
[0116]
[0117] Note 2: As stated in Lemmas 2-3 and Note 1, when the conditions […] and […] are satisfied, the advantage angle is continuous; if these conditions are not met, the advantage angle is discontinuous.
[0118] Note 3: In the presence of obstacles, the intruder has the possibility of escaping from the TSR. However, since the game is not over, the relative position of the obstacles will not change. When the intruder invades again, they can do so from a position far away from the obstacles. In other words, the advantage that obstacles provide to the defender's winning probability is only effective once.
[0119] When line-segment obstacles are not considered, the parameters are selected as follows: the radius of the target area is $r_0 = 1$, the speed ratio of the intruder to the defender is $v = 0.85$, the width of the target perception field is $\rho_T = 0.6$, the intruder's perception radius is $\rho_I = 0.1$, and the intrusion angle is $\theta(t_{asym}) = 0.77$. Based on the parameter relationships described above, the optimal deployment radius and the maximum deployment angle can be obtained from these parameters. An intrusion angle smaller than the maximum deployment angle ensures that the defender can complete the capture.
[0120] The number of devices and processing scale described herein are for the purpose of simplifying the description of the invention, and applications, modifications and variations thereof will be apparent to those skilled in the art.
[0121] Although embodiments of the present invention have been disclosed above, they are not limited to the applications listed in the specification and embodiments. They can be applied to various fields suitable for the present invention. For those skilled in the art, other modifications can be easily made. Therefore, without departing from the general concept defined by the claims and their equivalents, the present invention is not limited to the specific details and illustrations shown and described herein.
Claims
1. A multi-stage optimal decision-making method based on local information target defense game, characterized in that it comprises: the target's perception range, the defender's perception range, and the intruder's perception range, wherein the defender shares the perception information acquired within the target's perception domain by cooperating with the target; the game process includes the deployment phase, the asymmetric information phase, and the participation phase; when the intruder is not within the target's perception range and the defender is not within the intruder's perception range, the game is in the deployment phase, during which neither the intruder nor the defender can obtain information about the other due to perception limitations; when the intruder is within the target's perception range but the defender is not within the intruder's perception range, the game is in the asymmetric information stage, in which the intruder cannot obtain information about the defender, while the defender possesses all the information about the intruder; when both the defender and the intruder have all the information about the opponent, the game is in the participation phase, in which, if the intruder can escape the target's perception domain, the game can re-enter the deployment phase; if the intruder's area of advantage intersects with the target area, the intruder can successfully break through; and if the defender arrives at the key position or has already arrived at the defensive position in advance, the defender can definitely complete the capture;
in the asymmetric information stage, if the intruder does not perceive the line-segment obstacle during the intrusion process, it continues to intrude radially until the next stage; when the intruder senses the existence of the obstacle during the intrusion process, the intruder does not turn immediately, but waits until it senses an endpoint of the obstacle and then decides to turn toward that endpoint, and after reaching the endpoint, it reselects the radial direction and intrudes until the next stage; in the asymmetric information stage, the angle of the intruder at the beginning of the asymmetric stage is $\theta(t_{asym})$, the angle corresponding to the midpoint of the obstacle is $\theta_o$, and, with the distance from the center of the obstacle to the target given, the angle between an endpoint of the line-segment obstacle and the center of the obstacle is $\beta$; the relative angle between the intrusion angle and the obstacle center angle is $\Delta\theta = \theta(t_{asym}) - \theta_o$; when the relative angle satisfies $|\Delta\theta| > \beta$, the defender's optimal defensive position is at the same radial angle as the intrusion angle; when the relative angle satisfies $\Delta\theta \in [-\beta, \beta]$, the defender's optimal deployment position moves to the radial angle of the obstacle endpoint closer to the intrusion angle, which is consistent with the angle at which the intruder re-intrudes radially after bypassing the obstacle; when the relative angle satisfies $\Delta\theta \in [-\beta, \beta]$ and the intrusion angle and the defense angle are in different ranges, the intruder's intrusion method differs, namely: when $\Delta\theta = 0$, the intruder intrudes perpendicular to the center of the obstacle; when $\Delta\theta \in [0, \beta]$, the intruder intrudes to the left of the center of the obstacle; when $\Delta\theta \in [-\beta, 0]$, the intruder intrudes to the right of the center of the obstacle.
2. The multi-stage optimal decision-making method for target defense game based on local information as described in claim 1, characterized in that, during the participation phase, the defender lures the intruder deep into the target's perception domain, while the intruder's strategy is to determine whether the Apollonius circle generated by the intruder and the defender intersects with the target area at the key moment; if it does, the intruder continues its breakthrough regardless of the defender's actions; if it does not, the intruder immediately escapes; alternatively, if the defender fails to reach the optimal defensive position, the target area will be breached by the intruder, and the intruder decides to escape only if the defender reaches the optimal defensive position.
3. The multi-stage optimal decision-making method for target defense game based on local information as described in claim 1, characterized in that, during the deployment phase, the intruder, intruding for the first time, has no information about any obstacles and selects its intrusion angle at random; because the intrusion angle is random, the outcome of the game is probabilistic; the defender's goal is to find the optimal deployment radius and begin the game from an advantageous position.