A multi-agent resource security game simulation and deduction method and system based on a large language model
A multi-agent resource security game simulation and deduction method based on a large language model addresses the information processing lag and insufficient behavioral modeling of existing technologies. It enables real-time monitoring and multi-dimensional strategy simulation of the iron ore supply chain, improving the response speed and rigor of resource security decisions.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- INST OF MINERAL RESOURCES CHINESE ACAD OF GEOLOGICAL SCI
- Filing Date
- 2026-02-25
- Publication Date
- 2026-06-12
AI Technical Summary
Existing resource security analysis models are unable to process multi-source information in real time, simulate the game behavior of multi-party intelligent agents, lack multi-dimensional factor fusion analysis, and rely on experience to deduce strategies, making it difficult to cope with the dynamic and politicized challenges of the iron ore supply chain.
A multi-agent resource security game simulation and deduction method based on a large language model is adopted. By constructing a collaborative system of information perception, opponent analysis, self-analysis and comprehensive deduction, the system can realize real-time monitoring, in-depth analysis and strategy simulation of the security situation of the iron ore supply chain, including event deconstruction, behavior modeling and multi-dimensional cross-validation.
It enables the automatic collection and analysis of massive amounts of multi-source information, improves the speed and efficiency of resource security monitoring and response, generates more realistic projection results, and assists decision-makers in making scientific and comprehensive strategic judgments.
Figure CN122196401A_ABST
Description
Technical Field
[0001] This invention relates to the field of artificial intelligence simulation and decision support technology, specifically to a multi-agent resource security game simulation and deduction method and system based on a large language model.
Background Technology
[0002] As a mature virtual simulation decision support tool, sand table exercises have been widely used in military, public policy, business, and crisis management fields. Their core value lies in enabling decision-makers to preview the possible outcomes of various strategies before taking action by building models, integrating data, and simulating environments, thereby minimizing risks and improving the quality of decision-making.
[0003] Sand table exercises originated in the military field, where they play a crucial role in strategic planning and military exercises. By simulating combat scenarios and predicting the outcomes of different military actions, military leaders can better plan and execute strategies, which helps reduce potential military risks and threats to national security. Sand table exercises are also used in the medical field, particularly in epidemiological research: by simulating disease transmission and vaccination strategies, medical professionals can better understand the effectiveness of different control measures and manage infectious disease outbreaks more effectively. In public policy, crisis management, and other fields, sand table exercises are used to simulate the effects of different response strategies, which helps prepare for and respond to emergencies and protect public safety and well-being. In the field of mineral resources, research on electronic sand table exercises has reportedly yielded practical results abroad. For example, the 2005 US National Planning Scenarios report indicated that a sophisticated system simulation platform had been established, capable of simulating and extrapolating various major events and helping decision-makers predict the effectiveness of emergency plans. The 2019 "Crimson Contagion" simulation exercise lasted eight months, involved multiple federal departments and agencies, and tested the ability to respond to emergencies; exercises of this kind have reportedly also been applied to simulating rare earth supply chain disruptions. The National Minerals Information Center (NMIC) of the U.S. Geological Survey (USGS) tracks global production and trade data in real time and has built interactive maps and models of "net import dependence" and a "supply chain fragility index".
[0004] Domestic strategic simulation technology is mainly used in the military and meteorological fields. In the resource field it is still at an early stage of development: existing systems cannot conduct complex digital sand table simulations and cannot quickly provide a basis for decision-making.
[0005] In the field of resource security, particularly the security of the iron ore supply chain, existing analytical models are mostly based on static economic data and historical trends, making it difficult to cope with the increasingly dynamic and politicized international competitive environment. Current analytical methods have the following main shortcomings:
[0006] Information processing lag: Relying on manual collection and analysis of massive amounts of news, policy and market reports from multiple sources results in slow response times and difficulty in capturing instantaneous dynamics.
[0007] Lack of behavioral modeling: Traditional models struggle to simulate the complex strategic intentions and interactive behaviors of key stakeholders, such as resource-rich governments, multinational mining companies, and domestic regulatory agencies.
[0008] Single-dimension deduction: Analysis often focuses on one aspect, such as economics or logistics, and lacks a comprehensive framework that integrates and analyzes multiple factors, such as geopolitics, industrial policy, corporate behavior, and market sentiment.
[0009] Strategy generation relies on experience: the deduced coping strategies depend heavily on the personal experience of experts and lack the ability to systematically and automatically evaluate and generate strategies based on massive historical events and theoretical frameworks.
[0010] Therefore, there is a need for an intelligent method that can process multi-source information in real time, simulate the game behavior of multi-party intelligent agents, and integrate theoretical guidance for dynamic strategy deduction, in order to address the security challenges faced by strategic resources such as iron ore, including concentrated supply, vulnerable transportation, and lack of pricing power.
Summary of the Invention
[0011] To address the aforementioned issues, this invention proposes a multi-agent resource security game simulation and deduction method and system based on a large language model. By constructing a collaborative system comprising multiple agent modules including information perception, opponent analysis, self-analysis, and comprehensive deduction, it enables real-time monitoring, in-depth analysis, behavior prediction, and strategy simulation of the iron ore supply chain security situation.
[0012] To achieve the above objectives, the present invention adopts the following technical solution:
[0013] In one aspect, a multi-agent resource security game simulation and deduction method based on a large language model includes the following steps:
[0014] S1. Building a behavioral knowledge base based on event deconstruction
[0015] S11. Continuously acquire unstructured text data from multiple sources, covering domains related to both the adversary and our side;
[0016] S12. An odd-round voting mechanism is used for LLM semantic filtering. An odd number of independent judgments are performed on the same text, and the text that is determined to be relevant by more than half of the judgments is vectorized.
[0017] S13. Use the recursive K-Means algorithm to perform dynamic clustering analysis on the vectorized text. Take all data points as the initial cluster, iteratively select the sub-cluster with the largest sum of squared errors for K-Means bisection, until the preset number of clusters is reached, and construct dynamic information databases for the adversary and our side respectively.
[0018] S14. Define the key strategic actions in the dynamic information database as "events" and deconstruct them into a "context-decision-outcome" triple structure;
[0019] S15. Based on step S14, construct an “event-triple” labeled dataset, and perform supervised fine-tuning on the pre-trained language model based on the dataset to enable it to have the ability to automatically deconstruct events.
[0020] S16. Define the event behavior triples experienced by a specific subject on the timeline as the subject's "historical behavior sequence". By deconstructing historical events, generate historical behavior triples and store them in the historical behavior database. At the same time, encode each triple into a semantic vector for similarity retrieval.
[0021] S2. Agent Behavior Modeling and Prediction Based on Historical Behavior Consistency
[0022] S21. Adversarial Modeling: Through the first intelligent agent, preliminary behaviors are extracted from the adversary's historical event chain, and logical consistency is checked and corrected based on its historical behavior database to generate an optimized adversary behavior sequence and adversary behavior prediction.
[0023] S22. Our Modeling: Through the second intelligent agent, a preliminary response strategy is extracted from our historical event chain. Based on our historical behavior database, the preliminary strategy is checked for logical consistency and corrected to generate an optimized sequence of our behavior.
[0024] S3. Comprehensive Strategy Deduction Based on Multi-Dimensional Cross-Validation
[0025] Receive the output of S2 and execute the following steps through a third agent:
[0026] S31. Event Dimension Evaluation: Using the first dedicated prompt word operator, evaluate the impact of current events, historically relevant events, and future predicted events on candidate inference strategies respectively;
[0027] S32. Strategy Dimension Evaluation: Using the second dedicated prompt word operator, evaluate the potential impact of candidate inference strategies and historically relevant strategies on the current event situation;
[0028] S33. Theoretical Anchoring: Retrieve relevant theoretical basis from the strategic theory database to constrain the direction of candidate strategy generation;
[0029] S34. Strategy Synthesis Output: Integrate the evaluation and anchoring results from steps S31 to S33, and generate a deduction report through comprehensive reasoning using a large language model;
[0030] The evaluation process in steps S31 and S32 adopts a proportional-integral control method, with the evaluation of the current event corresponding to the proportional control link and the evaluation of historical events corresponding to the integral control link.
[0031] Furthermore, in step S1, the discrimination instruction provided to the large language model explicitly constrains its output format to be a predefined structured tag pair: "###Yes###" and "###No###".
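The odd-round voting of step S12, combined with the structured tag pair above, can be sketched as follows. This is a minimal illustration, not the applicant's implementation: `judge` is a hypothetical callable standing in for one large language model call, and counting an output with no recognizable tag as a non-relevant vote is an assumption the source does not state.

```python
import re
from typing import Callable, Optional

# The tag pair is taken from the text: "###Yes###" and "###No###".
TAG_PATTERN = re.compile(r"###(Yes|No)###")


def parse_verdict(raw_output: str) -> Optional[bool]:
    """Return True or False for a well-formed tag, None when no tag is found."""
    match = TAG_PATTERN.search(raw_output)
    if match is None:
        return None
    return match.group(1) == "Yes"


def is_relevant(text: str, judge: Callable[[str], str], rounds: int = 3) -> bool:
    """Run an odd number of independent judgments and take the majority.

    `judge` is a hypothetical stand-in for a single LLM call that returns
    raw model output containing one of the two tags.
    """
    if rounds < 1 or rounds % 2 == 0:
        raise ValueError("rounds must be a positive odd number")
    yes_votes = 0
    for _ in range(rounds):
        # Assumption: malformed output (None) is counted as "not relevant".
        if parse_verdict(judge(text)) is True:
            yes_votes += 1
    return yes_votes > rounds // 2
```

An odd round count guarantees that a strict majority always exists, so no tie-breaking rule is needed.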
[0032] Furthermore, the recursive K-Means clustering method described in step S1 specifically includes the following steps:
[0033] a. Initial cluster formation: Treat all vectorized text data points as the same cluster;
[0034] b. Segmentation and Selection: The K-Means algorithm is used to segment the current cluster into two sub-clusters, and the sum of squared errors of each sub-cluster is calculated. The sub-cluster with the largest sum of squared errors is selected as the next cluster to be segmented.
[0035] c. Iteration: Repeat step b until the number of clusters meets the preset value.
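Steps a to c can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions: the deterministic seeding of the two centers (farthest point from the cluster mean, then the point farthest from that one) and the iteration cap are choices made here for reproducibility, not details from the source. Following the wording of step S13, the cluster with the largest sum of squared errors among all current clusters is the one bisected next.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> List[float]:
    return [sum(col) / len(points) for col in zip(*points)]


def sse(points: List[Point]) -> float:
    """Sum of squared errors of a cluster around its centroid."""
    center = centroid(points)
    return sum(sq_dist(p, center) for p in points)


def two_means(points: List[Point], iters: int = 20) -> Tuple[List[Point], List[Point]]:
    """Split one cluster in two with K-Means (K=2).

    Seeding is an assumption made for determinism, not taken from the source.
    """
    mean = centroid(points)
    first = max(points, key=lambda p: sq_dist(p, mean))
    second = max(points, key=lambda p: sq_dist(p, first))
    centers = [list(first), list(second)]
    left: List[Point] = list(points)
    right: List[Point] = []
    for _ in range(iters):
        left = [p for p in points if sq_dist(p, centers[0]) <= sq_dist(p, centers[1])]
        right = [p for p in points if sq_dist(p, centers[0]) > sq_dist(p, centers[1])]
        if not left or not right:
            break
        new_centers = [centroid(left), centroid(right)]
        if new_centers == centers:
            break
        centers = new_centers
    return left, right


def bisecting_kmeans(points: List[Point], k: int) -> List[List[Point]]:
    clusters: List[List[Point]] = [list(points)]      # step a: one initial cluster
    while len(clusters) < k:                          # step c: repeat until k clusters
        worst = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        left, right = two_means(clusters[worst])      # step b: bisect largest-SSE cluster
        if not left or not right:
            break                                     # cluster cannot be split further
        clusters[worst:worst + 1] = [left, right]
    return clusters
```

For example, `bisecting_kmeans([(0, 0), (0, 1), (10, 10), (10, 11), (50, 50), (50, 51)], 3)` separates the three visually obvious pairs.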
[0036] Furthermore, in step S2, the historical event chain is constructed through the following steps: using a large language model to score and summarize the strategic relevance of relevant Chinese texts in the dynamic information database, and archiving the event summaries with scores higher than the threshold in chronological order.
[0037] Furthermore, in step S2, the generation of the adversary's behavior sequence includes:
[0038] Event chain construction: Based on the current events of adversarial entities, the large language model is used to retrieve and extract related historical events from their historical event chains to form an event chain containing temporal and logical connections;
[0039] Behavior extraction: Using a large language model, extract the initial behavior objects taken by the adversarial entity from the event chain;
[0040] Behavior correction: The preliminary behavior object is compared with the behavior patterns recorded in the historical behavior database of the adversary entity. The logical consistency is evaluated by a large language model, and the parts that do not conform are corrected or re-inferred to generate optimized behavior objects as behavior sequences.
[0041] Furthermore, in step S2, the generation of our action sequence includes:
[0042] Strategy context extraction: Utilize a large language model to analyze the historical event chain of our side, capture the development context of events, and extract preliminary response strategy objects for the current situation;
[0043] Strategy Revision and Anchoring: The preliminary response strategy objects are compared with the strategy patterns recorded in our historical behavior database and with the pre-set strategic principles; the large language model is used to evaluate whether they are in line with our core interests and strategic continuity, and unreasonable parts are revised to generate optimized response strategy objects as our behavior sequence.
[0044] Strategy archiving: Update the optimized response strategy object to the historical behavior database.
[0045] Furthermore, for newly acquired text data, it is categorized into strategic themes within the dynamic information repository through the following steps:
[0046] Vectorize the text data into a feature vector;
[0047] Calculate the distance between the feature vector and the cluster center vectors corresponding to each strategic theme in the dynamic information database;
[0048] The text data is assigned to the strategic theme corresponding to the cluster center that is closest to its feature vector.
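The three steps above amount to a nearest-centroid assignment, sketched below. The source only says "distance", so the Euclidean metric is an assumption, and the theme names in the usage note are invented for illustration.

```python
import math
from typing import Dict, Sequence


def assign_theme(vector: Sequence[float],
                 theme_centers: Dict[str, Sequence[float]]) -> str:
    """Return the strategic theme whose cluster center is closest to `vector`.

    Euclidean distance is an assumption; the source does not name the metric.
    """
    if not theme_centers:
        raise ValueError("at least one cluster center is required")
    return min(theme_centers,
               key=lambda name: math.dist(vector, theme_centers[name]))
```

With hypothetical centers `{"export_policy": [1.0, 0.0], "shipping": [0.0, 1.0]}`, the vector `[0.9, 0.2]` is assigned to `"export_policy"`.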
[0049] Furthermore, the event dimension evaluation in step S31 is performed through the first dedicated prompt word operator, whose inputs include: the current event object, historical related event objects, future predicted event objects, and candidate inference strategy objects;
[0050] The strategy dimension evaluation described in step S32 is performed through the second dedicated prompt word operator, whose inputs include: the current strategy object, historical related strategy objects, and the current event situation object.
[0051] In another aspect, a multi-agent resource security game simulation and deduction system based on the above method includes:
[0052] The information acquisition and processing module continuously acquires text data from multiple sources in the target domain and performs the following processing:
[0053] (a) An odd-round voting mechanism is used for semantic filtering by a large language model. An odd number of independent judgments are performed on the same text, and text that is judged relevant in more than half of the judgments is adopted.
[0054] (b) Vectorize the adopted text;
[0055] (c) The recursive K-Means algorithm is used to perform dynamic clustering analysis on vectorized text. All data points are used as the initial clusters. The sub-clusters with the largest sum of squared errors are selected iteratively for K-Means bisection until the preset number of clusters is reached. The dynamic information databases of the adversary and our side are constructed and updated respectively.
[0056] (d) Define key strategic actions in the dynamic information database as events, deconstruct the events into “context-decision-outcome” triples through a fine-tuned pre-trained language model, generate subject historical behavior triples and store them in the historical behavior database, and encode the triples into semantic vectors for similarity retrieval.
[0057] At least one adversary analysis agent, connected to the information acquisition and processing module, is used to extract relevant information about the pre-set adversary from the basic information database, analyze and simulate the adversary's behavior patterns, perform logical consistency verification and correction based on the adversary's historical behavior database, and output the adversary's behavior sequence and behavior prediction.
[0058] At least one friendly analysis agent, connected to the information acquisition and processing module, is used to extract relevant information about our side from the basic information database, analyze and simulate our response behavior, perform logical consistency verification and correction based on our historical behavior database, and output our behavior sequence.
[0059] The comprehensive deduction module, connected to both the adversary analysis agent and the friendly analysis agent, is used to receive the adversary's behavior sequence and behavior prediction and our behavior sequence, and includes:
[0060] The strategy analysis submodule is configured to perform event-dimensional evaluation based on the proportional-integral control concept through the first dedicated prompt word operator, and evaluate the impact of the current event, historical related events and future predicted events on the candidate strategy respectively;
[0061] The event evaluation submodule is configured to perform a strategy dimension evaluation based on the proportional-integral control concept through a second dedicated prompt word operator, and to evaluate the potential impact of candidate strategies and historical related strategies on the current event situation.
[0062] The theory anchoring submodule is configured to retrieve relevant theoretical basis from a pre-built strategic theory library and impose directional constraints on candidate strategies;
[0063] The strategy synthesis output submodule is configured to integrate the outputs of the strategy analysis submodule, the event evaluation submodule, and the theory anchoring submodule, and perform comprehensive reasoning through a large language model to generate a deduction report.
[0064] The information acquisition and processing module, the adversary analysis agent, the friendly analysis agent, and the comprehensive deduction module communicate and exchange data through a predefined structured data interface. The data objects transmitted by the structured data interface include at least event objects, behavior objects, strategy objects, and evaluation operator objects.
[0065] Furthermore, the resource security simulation is a simulation of iron ore supply chain security. The adversary side includes government regulatory agencies and mining companies in major iron ore exporting countries, while our side includes government agencies, industry associations, and mining companies in iron ore importing countries.
[0066] The beneficial effects of this invention are:
[0067] It enables the automatic collection, analysis, and simulation of massive amounts of multi-source information, greatly improving the speed and efficiency of resource security monitoring and response.
[0068] By simulating multi-party interactions in the real world through a multi-agent architecture, the simulation results are made closer to complex real-world games.
[0069] It generates strategic recommendations based on data, history, theory, and multi-dimensional assessment to assist decision-makers in making more scientific and comprehensive judgments.
[0070] The system architecture and intelligent agent design are universal and can be adapted to security simulation scenarios for other strategic resources, such as oil and rare earths, by changing the data source, theoretical library and simulation objects.
[0071] The present invention will be further described in detail below with reference to the accompanying drawings and specific implementation methods.
Attached Figure Description
[0072] Figure 1 is a flowchart of the method of the present invention;
[0073] Figure 2 is a schematic flowchart of the basic disk creation process of the present invention;
[0074] Figure 3 is a schematic diagram of the logical relationships of the adversary analysis module of the present invention;
[0075] Figure 4 is a schematic diagram of the logical relationships of our analysis module of the present invention;
[0076] Figure 5 is a schematic diagram of the logical relationships of the comprehensive deduction module of the present invention;
[0077] Figure 6 is a schematic diagram of the logical relationships of the strategy analysis submodule of the present invention;
[0078] Figure 7 is a schematic diagram of the logical relationships of the event evaluation submodule of the present invention;
[0079] Figure 8 is a schematic diagram of the logical relationships of the theory anchoring submodule of the present invention;
[0080] Figure 9 is a schematic diagram of the logical relationships of the strategy synthesis output submodule of the present invention;
[0081] Figure 10 is a schematic diagram of the multi-agent game simulation and deduction architecture of the present invention.
Detailed Implementation
[0082] Example 1: This embodiment is a simulation and deduction method for multi-agent resource security game based on a large language model, including the following steps:
[0083] S1. Building a Behavioral Knowledge Base Based on Event Deconstruction
[0084] Unstructured text data is continuously acquired from multiple preset information sources, including domains related to both the adversary and our side; semantic understanding and relevance filtering are performed on the text data using a large language model (LLM); an odd-round voting mechanism is adopted, performing an odd number of independent judgments on the same text, and if the text is judged to be relevant more than half of the times, it is adopted.
[0085] The adopted text is vectorized; the recursive K-Means algorithm is used to perform dynamic clustering analysis on the vectorized text, and dynamic information databases reflecting the evolution of strategic themes of the adversary and our side are constructed respectively. The recursive K-Means algorithm takes all data points as the initial cluster, iteratively selects the sub-cluster with the largest sum of squared errors for K-Means bisection, until the preset number of clusters is reached.
[0086] The recursive K-Means clustering method specifically includes the following steps:
[0087] a. Initial cluster formation: Treat all vectorized text data points as the same cluster;
[0088] b. Segmentation and Selection: The K-Means algorithm is used to segment the current cluster into two sub-clusters, and the sum of squared errors of each sub-cluster is calculated. The sub-cluster with the largest sum of squared errors is selected as the next cluster to be segmented.
[0089] c. Iteration: Repeat step b until the number of clusters meets the preset value.
[0090] The event is deconstructed into a "context-decision-outcome" triple structure. Context refers to the background, triggers, and evolution of the event; decision refers to the action strategies taken by the event's subject to cope with the context; and outcome refers to the final state of the event before it enters the next stage or transforms into other event forms. Each element of the triple is described in detail using natural language text, forming the basic unit of event behavior.
[0091] Key strategic actions in the dynamic information database are defined as "events." An "event" refers to a key action or situational change initiated by the adversary or by our side, with a clear timeline and strategic impact; it serves as the basic analytical unit for subsequent behavioral modeling. A pre-trained large language model is first used to deconstruct events in unstructured texts such as news materials and historical archives, generating "context-decision-outcome" triples. An "event-triple" labeled dataset is constructed from these deconstruction results. A lightweight pre-trained language model is then fine-tuned with supervision on this dataset so that it can deconstruct events automatically. The fine-tuned model is used to batch process historical event texts and output structured event behavior triples.
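One way to picture a record of the "event-triple" labeled dataset is a prompt/completion pair serialized as one JSON line, as sketched below. The record layout, the field names `prompt` and `completion`, and the instruction text in `PROMPT_TEMPLATE` are all assumptions for illustration; the source does not specify the dataset format or the fine-tuning toolchain.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical instruction text; the source does not give the actual prompt.
PROMPT_TEMPLATE = (
    "Deconstruct the following event into context, decision and outcome.\n"
    "Event: {event}"
)


@dataclass
class EventTriple:
    """The "context-decision-outcome" triple, each element in natural language."""
    context: str
    decision: str
    outcome: str


def to_training_record(event_text: str, triple: EventTriple) -> str:
    """Serialize one labeled example as a single JSON line."""
    record = {
        "prompt": PROMPT_TEMPLATE.format(event=event_text),
        "completion": json.dumps(asdict(triple), ensure_ascii=False),
    }
    return json.dumps(record, ensure_ascii=False)
```

Each line produced this way pairs raw event text with its triple, which is the shape most supervised fine-tuning pipelines expect, although the exact keys vary by framework.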
[0092] A series of event behavior triples experienced by a specific subject over a timeline is defined as the subject's "historical behavior sequence." The historical event material is deconstructed using the aforementioned fine-tuned model to generate the subject's historical behavior triples, which are then stored in the historical behavior database. Simultaneously, each triple is encoded into a semantic vector using a vectorization model for subsequent similarity retrieval.
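Similarity retrieval over the encoded triples can be sketched as a top-k search by cosine similarity. The source does not name the similarity measure or the vectorization model, so cosine similarity and the in-memory list used as the "historical behavior database" are assumptions here.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_similar(query: Sequence[float],
                  store: List[Tuple[str, Sequence[float]]],
                  k: int = 2) -> List[str]:
    """Return the ids of the k stored triples most similar to `query`.

    `store` holds (triple_id, semantic_vector) pairs; both the id scheme
    and the storage layout are hypothetical.
    """
    ranked = sorted(store, key=lambda item: cosine(query, item[1]), reverse=True)
    return [triple_id for triple_id, _ in ranked[:k]]
```

A production system would normally delegate this search to a vector index; the linear scan only illustrates the retrieval criterion.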
[0093] S2. Agent Behavior Modeling and Prediction Based on Historical Behavior Consistency
[0094] Adversarial modeling: Through the first intelligent agent, preliminary behaviors are extracted from the adversary's historical event chain using a large language model, and the logical consistency of these preliminary behaviors is checked and corrected based on the historical behavior database to generate an optimized adversary behavior sequence; at the same time, adversary behavior predictions are generated based on the historical behavior database.
[0095] The steps for generating adversary behavior sequences are as follows:
[0096] Using a large language model, extract the initial behavior objects (O_ACTR) exhibited by the adversary in the current cycle from the adversary's historical event chain (O_EVENTH).
[0097] The preliminary behavior object (O_ACTR) is compared with the behavior patterns recorded in the adversary's historical behavior database (O_ACTH);
[0098] Based on the comparison, the large language model is used to evaluate whether the preliminary behavior object conforms to the consistent strategic intent and behavioral logic of the adversary, and the parts that are evaluated as inconsistent, contradictory or abnormal are corrected or re-inferred.
[0099] The output is an optimized action object (O_ACT) that conforms to historical logical consistency after correction, which serves as the action sequence of the adversary.
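The extract, compare, and correct flow above can be sketched as a small loop. Both callables are hypothetical stand-ins for large language model calls, since the source does not specify prompts or interfaces; only the control flow (keep consistent behaviors, re-infer inconsistent ones) follows the text.

```python
from typing import Callable, List


def refine_behaviors(
    preliminary: List[str],                        # O_ACTR: preliminary behavior objects
    history: List[str],                            # O_ACTH: recorded behavior patterns
    is_consistent: Callable[[str, List[str]], bool],
    revise: Callable[[str, List[str]], str],
) -> List[str]:
    """Return O_ACT: behaviors that pass, or are corrected to pass, the check.

    `is_consistent` and `revise` are hypothetical wrappers around LLM calls.
    """
    refined: List[str] = []
    for behavior in preliminary:
        if is_consistent(behavior, history):
            refined.append(behavior)
        else:
            # Inconsistent, contradictory or abnormal parts are re-inferred.
            refined.append(revise(behavior, history))
    return refined
```

The same shape would apply to our side's strategy revision, with S_ACTR, S_ACTH and S_ACT in place of the adversary objects.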
[0100] Our modeling: Through the second intelligent agent, a large language model is used to extract preliminary response strategies from our historical event chain, and the logical consistency of the preliminary strategies is verified and corrected based on our historical behavior database to generate an optimized sequence of our behavior.
[0101] The steps to generate our action sequence are as follows:
[0102] Using a large language model, extract preliminary response strategy objects (S_ACTR) for the current situation from our historical event chain (S_EVENTH).
[0103] The initial response strategy object (S_ACTR) is compared with the strategy patterns and pre-set strategic principles recorded in our historical behavior database (S_ACTH);
[0104] Based on the comparison, the large language model is used to evaluate whether the preliminary response strategy object is in line with our core interests, strategic resolve and policy continuity, and to revise or optimize the parts that are deemed unreasonable.
[0105] The output is an optimized response strategy object (S_ACT) that conforms to strategic consistency and effectiveness, and serves as the sequence of our actions.
[0106] The historical event chain is constructed through the following steps: using a large language model to score and summarize the strategic relevance of relevant Chinese texts in the dynamic information database, and archiving the event summaries with scores higher than the threshold in chronological order.
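The threshold-and-archive step can be sketched as a filter followed by a chronological sort. The score is assumed to come from an earlier large language model call; the record fields and the numeric scale are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class ScoredSummary:
    when: date       # date of the underlying text
    summary: str     # LLM-generated event summary
    score: float     # LLM-assigned strategic relevance (scale is an assumption)


def build_event_chain(items: List[ScoredSummary], threshold: float) -> List[ScoredSummary]:
    """Keep summaries scoring strictly above the threshold, oldest first."""
    kept = [item for item in items if item.score > threshold]
    return sorted(kept, key=lambda item: item.when)
```

Sorting by date after filtering yields the chronological archive that the agents in step S2 read as the historical event chain.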
[0107] S3. Comprehensive Strategy Deduction Based on Multi-Dimensional Cross-Validation
[0108] Receive the output of S2 and execute the steps through the third agent:
[0109] S31. Event Dimension Evaluation: Using the first dedicated prompt word operator, evaluate the impact of the current event, historically relevant events, and future predicted events on the candidate inference strategy. Its inputs include: the current event object (O_EVENT, S_EVENT), the historically relevant event object (O_EVENTHS, S_EVENTHS), the future predicted event object (O_EVENTP, S_EVENTP), and the candidate inference strategy object (S_ACT).
[0110] S32. Strategy Dimension Evaluation: Using the second dedicated prompt word operator, evaluate the potential impact of candidate inferred strategies and historically relevant strategies on the current event situation. Its inputs include: the current strategy object (O_ACT, S_ACT), historically relevant strategy objects (O_ACTHS, S_ACTHS), and the current event situation object (S_EVENT).
[0111] S33. Theoretical Anchoring: Retrieve relevant theoretical basis from a pre-constructed strategic theory library to constrain the generation direction of candidate inference strategies;
[0112] S34. Strategy Synthesis Output: Integrate the evaluation and anchoring results from steps S31 to S33, perform comprehensive reasoning through a large language model, and generate a deduction report;
[0113] The evaluation process in steps S31 and S32 adopts the proportional-integral (PI) control method, with the evaluation of the current event corresponding to the proportional control link and the evaluation of the historical event corresponding to the integral control link.
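The proportional-integral analogy can be made concrete with a numeric sketch: the current event contributes a proportional term and the accumulated historical events contribute an integral term. In the source the evaluation is carried out through prompt word operators rather than arithmetic, so the numeric impact scores, the gains `kp` and `ki`, and the linear combination itself are assumptions made only to illustrate the control idea.

```python
from typing import Iterable


def pi_evaluation(current_impact: float,
                  historical_impacts: Iterable[float],
                  kp: float = 1.0,
                  ki: float = 0.2) -> float:
    """Combine current and historical impact in the style of a PI controller.

    current_impact      -> proportional link (reacts to the present event)
    historical_impacts  -> integral link (accumulated effect of past events)
    kp, ki              -> hypothetical gains, not given in the source
    """
    proportional = kp * current_impact
    integral = ki * sum(historical_impacts)
    return proportional + integral
```

For instance, `pi_evaluation(0.5, [0.25, -0.5, 0.75], kp=1.0, ki=0.5)` gives `0.75`: the current event contributes 0.5 and the accumulated history adds 0.25.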
[0114] Example 2:
[0115] This embodiment is a multi-agent resource security game simulation and deduction system based on a large language model. The system includes:
[0116] The information acquisition and processing module is used to acquire text data from multi-source information covering both the adversary and our side, and to perform the following processing:
(a) Semantic filtering by a large language model using an odd-round voting mechanism: an odd number of independent judgments are performed on the same text, and text judged relevant in more than half of the judgments is adopted;
(b) Vectorizing the adopted text;
(c) Using a recursive K-Means algorithm to perform dynamic clustering analysis on the vectorized text, taking all data points as the initial cluster and iteratively selecting the sub-cluster with the largest sum of squared errors for K-Means bisection until the preset number of clusters is reached, and constructing and updating the dynamic information databases of the adversary and our side respectively;
(d) Defining key strategic actions in the dynamic information database as events, deconstructing events into "context-decision-outcome" triples through a fine-tuned lightweight pre-trained language model, generating subject historical behavior triples and storing them in the historical behavior database, and encoding the triples into semantic vectors for similarity retrieval.
[0117] The adversary analysis agent, connected to the information acquisition and processing module, is used to extract information from the adversary's dynamic information database, extract the adversary's preliminary behavior through a large language model, and perform logical consistency verification and correction based on the adversary's historical behavior database, and output the adversary's behavior sequence and behavior prediction.
[0118] Our analytical agent, connected to the information acquisition and processing module, is used to extract information from our dynamic information database, extract our initial response strategy through a large language model, perform logical consistency verification and correction based on our historical behavior database, and output our behavior sequence.
[0119] The integrated inference module, connected to both the adversary analysis agent and our analysis agent, is used to receive the adversary's behavior sequence, the adversary's behavior prediction, and our behavior sequence, and includes:
[0120] The strategy analysis submodule is configured to perform event-dimensional evaluation based on the proportional-integral control concept through the first dedicated prompt word operator, and evaluate the impact of the current event, historical related events and future predicted events on the candidate strategy respectively;
[0121] The event evaluation submodule is configured to perform a strategy dimension evaluation based on the proportional-integral control concept through a second dedicated prompt word operator, and to evaluate the potential impact of candidate strategies and historical related strategies on the current event situation.
[0122] The theory anchoring submodule is configured to retrieve relevant theoretical basis from a pre-built strategic theory library and impose directional constraints on candidate strategies;
[0123] The strategy synthesis output submodule is configured to integrate the outputs of the strategy analysis submodule, the event evaluation submodule, and the theory anchoring submodule, and perform comprehensive reasoning through a large language model to generate a deduction report.
[0124] The information acquisition and processing module, the adversary analysis agent, our analysis agent, and the integrated inference module communicate and exchange data through a predefined structured data interface. The data objects transmitted by the structured data interface include at least event objects, behavior objects, strategy objects, and evaluation operator objects.
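A minimal sketch of what such a structured data interface could look like is given below. The class names, field names, and the JSON envelope with a type tag are assumptions made for illustration; the document only states that event, behavior, strategy, and evaluation operator objects are exchanged.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class EventObject:
    subject: str   # hypothetical tag: "O" for the adversary, "S" for our side
    context: str
    decision: str
    outcome: str


@dataclass
class BehaviorObject:
    subject: str
    actions: List[str] = field(default_factory=list)


@dataclass
class StrategyObject:
    subject: str
    description: str


@dataclass
class EvaluationOperatorObject:
    name: str        # e.g. "EVENT_ACT_EFP"
    conclusion: str  # text returned by the LLM for this operator


# Registry used by the receiver to rebuild the right class from the type tag.
_TYPES = {
    cls.__name__: cls
    for cls in (EventObject, BehaviorObject, StrategyObject, EvaluationOperatorObject)
}


def to_message(obj) -> str:
    """Serialize an interface object to JSON with an explicit type tag."""
    return json.dumps(
        {"type": type(obj).__name__, "payload": asdict(obj)}, ensure_ascii=False
    )


def from_message(message: str):
    """Rebuild the interface object described by a JSON message."""
    data = json.loads(message)
    return _TYPES[data["type"]](**data["payload"])
```

Tagging each message with its type lets every module validate what it receives before passing it to a prompt.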
[0125] Example 3
[0126] Based on the above embodiments, this embodiment takes iron ore resource security simulation as an example. For ease of reading, the code names and meanings of the intelligent agent behaviors in this embodiment are listed below:
[0127]
[0128] S1. Information Acquisition and Knowledge Base Construction
[0129] (1) Establish the basic framework of system perception
[0130] Information gathering for both our side and the opposing side: This can be achieved using the Python requests + BeautifulSoup library. Based on preset information sources (defined target websites), such as major news websites, official websites of major steel giants, and official websites of relevant government departments, all news articles within a preset time range can be continuously or periodically scraped to form a raw article database (A_DB) with dates, titles, and article content. Then, information filtering driven by LLM is used: using prompts, articles related to iron ore resources are filtered from the data in A_DB to form a core information database (I_DB).
[0131] In addition, information gathering on our side includes forming the theoretical database THEO_DB from our relevant policies and theoretical research. Using the iron ore-related articles in I_DB, text vectorization converts the text into vectors, and clustering algorithms then classify the information, ultimately forming a basic framework with different descriptive perspectives, as shown in Figure 2.
[0132] (2) Model filtering
[0133] The data filtering employed techniques including word segmentation models, self-attention mechanisms, and context prediction.
[0134] Word segmentation model: During training, each individual word unit is transformed into a unit. This forms a set of mapping rules:
[0135] ,
[0136] Under this set of rules, all word units are grouped together according to their respective meanings, where NN represents the mapping method for that meaning. Then, the text in the news information is segmented into multiple word unit vectors according to the rules in NN. The segmented vectors are then recombined to form the input matrix. After the matrix undergoes position encoding, it is then transformed by three linear transformations: Q, K, and V, to form the three matrices of Query, Key, and Value for a self-attention mechanism.
[0137] A large model is used to extract various features from news information. These features are then processed through multiple fully connected layers and residual connections to form an encoding result. For some models, the encoding result is the output; for others, it undergoes further processing through multiple multi-head attention mechanisms and fully connected layers before finally forming the output. The relationship between the output and input results can be analyzed using the conditional probability formula.
[0138] $P_i = P(s_i \mid \text{inputs}, s_1, s_2, \ldots, s_{i-1})$,
[0139] An LLM based on a probabilistic model is prone to "hallucination" in its generated responses. To reduce LLM hallucination, the relevance judgment for each news item is repeated an odd number of times, and the item is adopted only if it is judged relevant in more than half of the rounds. Furthermore, the output formats "###Yes###" and "###No###" are specified to the LLM in the dialogue history to make the answers more structured.
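The odd-round voting described above can be sketched as follows. The function name `is_relevant`, the `ask_llm` callable (a stand-in for a real model call), and the rule that a reply containing neither tag counts as "not relevant" are assumptions for illustration.

```python
from typing import Callable

YES, NO = "###Yes###", "###No###"


def is_relevant(text: str, ask_llm: Callable[[str], str], rounds: int = 3) -> bool:
    """Run an odd number of independent judgments and take the majority."""
    if rounds < 1 or rounds % 2 == 0:
        raise ValueError("rounds must be a positive odd number")
    yes_votes = 0
    for _ in range(rounds):
        reply = ask_llm(text)
        # Count a vote only when the reply carries the "yes" tag unambiguously.
        if YES in reply and NO not in reply:
            yes_votes += 1
    # Adopt the text when more than half of the rounds judged it relevant.
    return yes_votes > rounds // 2
```

Because the number of rounds is odd, a tie cannot occur, so every article receives a definite decision.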
[0140] (3) Text vectorization processing
[0141] Neural network-based text vectorization technology is based on word vectorization. The principle is that "words that are close in distance have similar meanings." Therefore, the text vectorization of iron ore news can be abstracted into the following steps:
[0142] a. Break down the article into words in a vector model;
[0143] b. The input vector formed by the vocabulary is processed by a neural network, and the result is a fixed-length vector representing the features of the entire article.
[0144] (4) Clustering mechanism
[0145] A fixed-length vector representing the full text features of the article is then used to calculate the cluster coordinates formed during the establishment of the basic framework. The category with the shortest distance is taken as the strategic key point of the article.
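The nearest-cluster assignment described above can be sketched as follows. The function name `nearest_theme`, the use of Euclidean distance, and the theme labels in the usage note are assumptions; the document does not name the distance measure.

```python
import math
from typing import Dict, Sequence


def nearest_theme(vector: Sequence[float],
                  centroids: Dict[str, Sequence[float]]) -> str:
    """Return the label of the cluster center closest to the article vector."""
    return min(centroids, key=lambda name: math.dist(vector, centroids[name]))
```

For example, with hypothetical centers `{"supply": [0.0, 0.0], "pricing": [10.0, 10.0]}`, the vector `[1.0, 2.0]` is assigned to `"supply"`.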
[0146] This embodiment uses the unsupervised recursive K-Means clustering method to cluster the vectorized news articles. The recursive K-Means algorithm starts with a cluster containing all data points, then continuously divides this cluster into two sub-clusters, and applies the K-Means algorithm to each sub-cluster. This process is repeated until the preset number of clusters is reached.
[0147] At the beginning of the algorithm, all data points are treated as a single cluster. This cluster is then split into two subclusters by the K-Means algorithm, while calculating the sum of squared errors for each subcluster. Next, the algorithm selects the subcluster with the largest SSE and splits it into two subclusters again. This process is repeated until the number of clusters reaches a preset value.
[0148] The expression for the sum of squared errors (SSE) is:
[0149] $\mathrm{SSE} = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2$
[0150] where $k$ is the number of clusters, $C_i$ is the $i$-th cluster, $x$ is a vectorized text, and $\mu_i$ is the centroid of the $i$-th cluster. Each cluster is initialized from randomly selected text vectors; during iteration, once a centroid has been formed, the cluster is reconstructed around it. The advantages of this algorithm are its accuracy and stability, its ability to handle large-scale datasets, and its fast convergence.
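The recursive K-Means procedure described above (often called bisecting K-Means) can be sketched in pure Python as follows. The function names, the fixed random seed, the iteration cap, and the fallback for a degenerate split are assumptions for illustration; a production system would more likely rely on a numerical library.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sse(points: List[Point]) -> float:
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) ** 2 for p in points)


def two_means(points: List[Point], rng: random.Random,
              iters: int = 20) -> Tuple[List[Point], List[Point]]:
    """Split one cluster into two with plain K-Means (k = 2)."""
    centers = rng.sample(points, 2)  # random initial centers
    left, right = list(points), []
    for _ in range(iters):
        left, right = [], []
        for p in points:
            closer_to_first = math.dist(p, centers[0]) <= math.dist(p, centers[1])
            (left if closer_to_first else right).append(p)
        if not left or not right:
            # Degenerate split (e.g. duplicate points): peel off one point.
            return list(points[:-1]), list(points[-1:])
        new_centers = [centroid(left), centroid(right)]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return left, right


def bisecting_kmeans(points: List[Point], k: int, seed: int = 0) -> List[List[Point]]:
    """Start from one cluster; repeatedly bisect the cluster with the largest SSE."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        candidates = [i for i, c in enumerate(clusters) if len(c) >= 2]
        if not candidates:
            break  # nothing left that can be split
        worst = max(candidates, key=lambda i: sse(clusters[i]))
        left, right = two_means(clusters.pop(worst), rng)
        clusters += [left, right]
    return clusters
```

Selecting the sub-cluster with the largest SSE means each split targets the least compact group, which is the selection rule stated in the text.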
[0151] S2. Behavioral Modeling of Both Sides Based on Historical Behavior Consistency
[0152] This section uses the term "game players" to refer to the two opposing sides—the adversary and our side.
[0153] As shown in Figures 3 and 4, this embodiment adopts a dual-channel symmetrical architecture to achieve a comprehensive understanding of both sides. Each side of the symmetrical architecture includes:
[0154] (1) Event Archiving Engine: Extract relevant content from crawled news, articles, and other information, and use the fine-tuned LLM from S1 to transcribe this content into a "context-decision-outcome" triple structure and store it in the historical event database. The engine also provides prediction results based on relevant historical events. The "decision" and "outcome" are updated after a new decision is made and the event changes.
[0155] (2) Behavioral analysis and prediction engine: The LLM first makes a preliminary decision on the current situation based on its own knowledge, then retrieves the historical events related to that situation and decision, and obtains the historical strategies from them.
[0156] The specific workflow is as follows:
[0157] I. Event Archiving
[0158] a. Obtain the articles of the game's main players (the opposing side's article O_GD1, or our side's article S_GD).
[0159] b. Filter out the content of the current game event from the articles of the game players.
[0160] c. Using the methods described in S14-S17, summarize and transcribe the game events into a triplet structure, namely O_EVENT (the opponent's current round event) or S_EVENT (our own current round event).
[0161] d. Historical events with a triple structure are stored in the historical event database, completing the archiving process. This forms historical events, either O_EVENTH (adversary's historical events) or S_EVENTH (our own historical events).
[0162] e. Combining the historical events of the game players with the current events of the game players, use LLM to predict the development of events, i.e., O_EVENTP (adversary event prediction) or S_EVENTP (our event prediction).
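The archiving workflow above, together with the similarity retrieval it relies on, can be sketched as follows. The class and method names are hypothetical, and the bag-of-words `embed` function is a toy stand-in for the semantic vector encoder, which the document does not specify.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EventTriple:
    context: str
    decision: str
    outcome: str

    def text(self) -> str:
        return f"{self.context} {self.decision} {self.outcome}"


def embed(text: str) -> Counter:
    """Toy stand-in for a semantic encoder: lower-cased word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class HistoricalEventDB:
    """Stores triples with their vectors, e.g. O_EVENTH or S_EVENTH."""

    def __init__(self) -> None:
        self._items: List[Tuple[EventTriple, Counter]] = []

    def archive(self, triple: EventTriple) -> None:
        self._items.append((triple, embed(triple.text())))

    def retrieve(self, query: str, top_k: int = 1) -> List[EventTriple]:
        """Return the top_k archived triples most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [triple for triple, _ in ranked[:top_k]]
```

The retrieved triples would then be passed to the LLM together with the current-round event to produce the event prediction (O_EVENTP or S_EVENTP).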
[0163] II. Behavioral Analysis and Prediction
[0164] The event archiving engine provides the current round events for the game players, namely O_EVENT (opponent's current round event) or S_EVENT (our current round event). In this phase, the following processing is performed on these two sets of operators:
[0165] a. Behavioral analysis and prediction engine: Without using the historical events of the game players (i.e., O_EVENTH (adversary's historical events) or S_EVENTH (our own historical events)), it initially fits the response strategy for this round based on the capabilities of LLM (adversary's response strategy for this round O_ACTR, or our response strategy for this round S_ACTR).
[0166] b. Based on the capabilities of LLM, evaluate the response strategy fitted in the previous step for this round and obtain the optimized response strategy for this round.
[0167] c. From the relevant historical events of the game players, focusing on the strategy part of the triple (the opponent's response strategy history O_ACTH or our response strategy history S_ACTH), LLM extracts the historical relevant strategies with optimal value (the opponent's historical relevant strategies O_ACTHS or our historical relevant strategies S_ACTHS).
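Steps a to c above can be sketched as a three-call pipeline. The function name `derive_response`, the prompt wording, and the `llm` callable (a stand-in for a real model call) are assumptions; only the ordering of the steps and the operator names follow the text.

```python
from typing import Callable, Dict, List


def derive_response(event: str, strategy_history: List[str],
                    llm: Callable[[str], str], side: str = "S") -> Dict[str, str]:
    """Return the preliminary, optimized, and historically relevant strategies."""
    # Step a: fit a preliminary strategy from the current-round event only,
    # deliberately without showing the model any historical events.
    act_r = llm(f"Current event:\n{event}\nPropose a preliminary response strategy.")

    # Step b: have the LLM evaluate and refine its own preliminary strategy.
    act = llm(
        f"Current event:\n{event}\nPreliminary strategy:\n{act_r}\n"
        "Evaluate this strategy and return an optimized version."
    )

    # Step c: select the most valuable strategies from the strategy history.
    history = "\n".join(f"- {h}" for h in strategy_history)
    act_hs = llm(
        f"Current event:\n{event}\nHistorical strategies:\n{history}\n"
        "Select the historical strategies most relevant to the current event."
    )
    return {f"{side}_ACTR": act_r, f"{side}_ACT": act, f"{side}_ACTHS": act_hs}
```

Calling the function with `side="O"` yields the adversary-side keys O_ACTR, O_ACT, and O_ACTHS.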
[0168] S3. Comprehensive Strategy Deduction Based on Multi-Dimensional Cross-Validation
[0169] The adversary analysis module and our own analysis module have extracted the development context of events based on historical and current events, and proposed preliminary response strategies. To make the strategy specification process more comprehensive, the strategy analysis submodule will have the LLM evaluate the current strategy based on the event and its related history; then, based on previously used strategies, it will evaluate the impact of the current strategy on the development of events. This forms cross-validation and incorporates theoretical support, making the decision-making process more comprehensive, rather than solely relying on the generation capabilities of the language model itself.
[0170] Receive the output of S2 and execute the following steps through the third intelligent agent (integration module):
[0171] S31. Event Dimension Evaluation: Used to evaluate the impact of events on strategies, corresponding to the strategy analysis submodule. As shown in Figures 5 and 6, the submodule is configured to perform event-dimension evaluation based on the proportional-integral (PI) control concept through the first dedicated prompt word operator, evaluating the impact of the current event, historically relevant events, and future predicted events on the candidate strategy.
[0172] This example separates the evaluation of historical events and current events on the strategy. The output of this separate evaluation serves as a component of the prompt words in subsequent modules, becoming the operator prompt words. This embodiment has a first dedicated prompt word operator that can be implemented collaboratively by two prompt word operators: one for evaluating the impact of the current event on the current strategy (EVENT_ACT_EFP), and the other for evaluating the impact of related historical events and the current event on the current strategy (EVENT_ACT_EFI). The outputs of both operators together constitute the conclusion of the time-dimensional evaluation.
[0173] (1) The impact of the current event on the current strategy EVENT_ACT_EFP
[0174] Using the current events O_EVENT and S_EVENT and our current strategy S_ACT as prompts, the LLM evaluates the impact of the events currently occurring on both the adversary's side and our side on our response strategy.
[0175] (2) The impact of relevant historical events on the current strategy EVENT_ACT_EFI
[0176] Using the adversary's relevant historical events O_EVENTHS, our relevant historical events S_EVENTHS, and our current strategy S_ACT as prompts, the LLM evaluates the impact of historical events on our current response strategy.
[0177] (3) The impact of the event's development on the current strategy EVENT_ACT_EFD
[0178] Using the adversary's event development prediction O_EVENTP, our event development prediction S_EVENTP, and our current strategy S_ACT as prompts, the LLM evaluates how events are likely to develop and the impact of that development on our current response strategy.
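The three event-dimension operators above can be sketched as prompt templates over named inputs. The operator-to-input mapping follows the text; the function names, the prompt layout, and the `llm` callable are assumptions for illustration.

```python
from typing import Callable, Dict

# Inputs of each event-dimension operator, as listed in the text.
EVENT_OPERATORS = {
    "EVENT_ACT_EFP": ("O_EVENT", "S_EVENT", "S_ACT"),      # current events
    "EVENT_ACT_EFI": ("O_EVENTHS", "S_EVENTHS", "S_ACT"),  # relevant history
    "EVENT_ACT_EFD": ("O_EVENTP", "S_EVENTP", "S_ACT"),    # predicted development
}


def build_prompt(operator: str, state: Dict[str, str]) -> str:
    """Assemble the prompt for one operator from its named inputs."""
    blocks = [f"[{name}]\n{state[name]}" for name in EVENT_OPERATORS[operator]]
    return (
        f"Operator: {operator}\n"
        + "\n".join(blocks)
        + "\nAssess how the events above affect our current strategy S_ACT."
    )


def run_event_dimension(state: Dict[str, str],
                        llm: Callable[[str], str]) -> Dict[str, str]:
    """Run all three operators and collect their conclusions by name."""
    return {op: llm(build_prompt(op, state)) for op in EVENT_OPERATORS}
```

Keeping the three evaluations in separate prompts means each conclusion can be reused on its own as an operator prompt in the later synthesis step.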
[0179] S32. Strategy Dimension Evaluation: Used to evaluate the impact of strategies on events, corresponding to the event evaluation submodule. As shown in Figures 5 and 7, the potential impact of candidate inference strategies and historically relevant strategies on the current event situation is evaluated using the second dedicated prompt word operator.
[0180] The operator can be implemented through the cooperation of the following sub-operators:
[0181] (1) Policy reaction evaluation sub-operator ACT_EVENT_EFP:
[0182] The impact of current response strategies on the current event. Its inputs include the opponent's current response strategy (O_ACT) and our current response strategy (S_ACT). This sub-operator is used to analyze the direct and potential impacts on the current event situation if this set of strategies is implemented.
[0183] (2) Historical strategy reference evaluation sub-operator ACT_EVENT_EFI:
[0184] The impact of historical response strategies on the current event. Its inputs include the historically relevant response strategies of both the adversary and ourselves (O_ACTHS, S_ACTHS), and our current event situation (S_EVENT). This sub-operator is used to evaluate the lessons or warnings that similar historical strategies can provide for handling the current event.
[0185] The outputs of the first and second dedicated prompt word operators are used together as key inputs to the subsequent policy synthesis steps, achieving cross-validation from both event and policy dimensions.
[0186] The evaluation process for S31 and S32 draws on the concept of "model-based predictive control" from automatic control theory. In some automatic control scenarios, the theoretical basis of closed-loop control heavily relies on the sampling frequency and accuracy of feedback data. However, actual feedback data often falls short of the ideal state—high-precision sensors are prone to introducing noise interference, which in turn affects control accuracy. To mitigate this problem, it is usually necessary to explicitly model the controlled object and then execute subsequent control commands based on the reliability of the model's predicted output.
[0187] Although this embodiment does not fall under the category of traditional automatic control, it faces similar challenges: the inherent lag in decision-making mechanisms and event responses during discrete game processes makes it difficult for game participants to effectively assess the rationality of their decisions at the moment of decision-making. Such problems are highly dependent on modeling methods.
[0188] To this end, this embodiment constructs an implicit "model" architecture consisting of a large language model and a historical event database. By using LLM to identify and represent the evolutionary patterns of "context," "decision," and "outcome" in historical events, it provides traceable historical evidence for current decisions, thereby enabling the prediction of decision rationality in the absence of explicit model support.
[0189] Based on this, the system continuously tracks and provides feedback on the events handled by the game players, dynamically revising the historical event database according to the actual evolution results, so that the event modeling continuously approaches the real situation. This mechanism has a dual advantage: on the one hand, it can respond quickly to sudden situations, ensuring the agility of the simulation; on the other hand, by accumulating and correcting historical experience, it can achieve in-depth optimization of the simulation process, ultimately generating a deduction strategy that is both agile and robust.
[0190] S33. Theoretical Anchoring: Retrieve relevant theoretical basis from a pre-constructed strategic theory library to constrain the generation direction of candidate inference strategies.
[0191] To ensure that decision-making aligns with the agent's core principles, a dedicated module is needed to reflect the agent's initial intentions, in addition to the consideration of past, present, and future factors. A theoretical module therefore provides theoretical support for decision-making through prompts. Corresponding to the theory anchoring submodule, relevant historical events (S_EVENTHS), the agent's response strategy (S_ACT), and pre-stored theoretical terms (THEO) are used as prompts, so that the LLM can identify relevant theoretical support (THEOS) as the basis for further decision-making, as shown in Figures 5 and 8.
[0192] S34. Strategy Synthesis Output: Integrate the evaluation and anchoring results from steps S31 to S33, perform comprehensive reasoning through a large language model, and generate a deduction report.
[0193] As shown in Figures 5 and 9, the strategy synthesis output submodule is configured to integrate the outputs of the strategy analysis submodule, event evaluation submodule, and theory anchoring submodule, and perform comprehensive reasoning through a large language model to generate a deduction report.
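The synthesis step can be sketched as assembling one prompt from the three upstream outputs and passing it to the model. The function name `synthesize_report`, the section headings, and the `llm` callable are assumptions for illustration; the document does not give the prompt layout.

```python
from typing import Callable, Dict, List


def synthesize_report(event_eval: Dict[str, str],
                      strategy_eval: Dict[str, str],
                      theory: List[str],
                      llm: Callable[[str], str]) -> str:
    """Merge S31, S32, and S33 outputs into one prompt and request a report."""
    sections = []
    for title, block in (
        ("Event-dimension evaluation (S31)", event_eval),
        ("Strategy-dimension evaluation (S32)", strategy_eval),
    ):
        body = "\n".join(f"- {name}: {text}" for name, text in block.items())
        sections.append(f"{title}\n{body}")
    # Theoretical support (THEOS) selected in S33 constrains the direction.
    sections.append(
        "Theoretical anchoring (S33)\n" + "\n".join(f"- {t}" for t in theory)
    )
    prompt = (
        "\n\n".join(sections)
        + "\n\nIntegrate the sections above into a deduction report."
    )
    return llm(prompt)
```

Passing the operator conclusions by name keeps the report traceable to the evaluation that produced each statement.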
[0194] Example 4
[0195] Based on the above embodiments, this embodiment can combine several intelligent agents, each collecting information from its own and the opponent's information pools, analyzing each other's information, and proposing countermeasures based on their respective behaviors, thus forming a game test, as shown in Figure 10.
Claims
1. A simulation and deduction method for multi-agent resource security game based on a large language model, characterized in that it includes the following steps:
S1. Building a Behavioral Knowledge Base Based on Event Deconstruction
S11. Continuously acquire unstructured text data from multiple sources, including those from the adversary and our target domain;
S12. Use an odd-round voting mechanism for LLM semantic filtering: perform an odd number of independent judgments on the same text, and vectorize the text that is determined to be relevant by more than half of the judgments;
S13. Use the recursive K-Means algorithm to perform dynamic clustering analysis on the vectorized text: take all data points as the initial cluster, iteratively select the sub-cluster with the largest sum of squared errors for K-Means bisection until the preset number of clusters is reached, and construct dynamic information databases for the adversary and our side respectively;
S14. Define the key strategic actions in the dynamic information database as "events" and deconstruct them into a "context-decision-outcome" triple structure;
S15. Based on step S14, construct an "event-triple" labeled dataset, and perform supervised fine-tuning on the pre-trained language model based on the dataset so that it can automatically deconstruct events;
S16. Define the event behavior triples experienced by a specific subject on the timeline as the subject's "historical behavior sequence"; by deconstructing historical events, generate historical behavior triples and store them in the historical behavior database, and at the same time encode each triple into a semantic vector for similarity retrieval;
S2. Agent Behavior Modeling and Prediction Based on Historical Behavior Consistency
S21. Adversary Modeling: Through the first intelligent agent, preliminary behaviors are extracted from the adversary's historical event chain, and logical consistency is checked and corrected based on its historical behavior database to generate an optimized adversary behavior sequence and adversary behavior prediction;
S22. Our Modeling: Through the second intelligent agent, a preliminary response strategy is extracted from our historical event chain; based on our historical behavior database, the preliminary strategy is checked for logical consistency and corrected to generate an optimized sequence of our behavior;
S3. Comprehensive Strategy Deduction Based on Multi-Dimensional Cross-Validation
Receive the output of S2 and execute the following through a third agent:
S31. Event Dimension Evaluation: Using the first dedicated prompt word operator, evaluate the impact of current events, historically relevant events, and future predicted events on candidate inference strategies respectively;
S32. Strategy Dimension Evaluation: Using the second dedicated prompt word operator, evaluate the potential impact of candidate inference strategies and historically relevant strategies on the current event situation;
S33. Theoretical Anchoring: Retrieve relevant theoretical basis from the strategic theory database to constrain the direction of candidate strategy generation;
S34. Strategy Synthesis Output: Integrate the evaluation and anchoring results from steps S31 to S33, and generate a deduction report through comprehensive reasoning using a large language model;
The evaluation process in steps S31 and S32 adopts a proportional-integral control method, with the evaluation of the current event corresponding to the proportional control link and the evaluation of historical events corresponding to the integral control link.
2. The simulation and deduction method for multi-agent resource security game based on a large language model according to claim 1, characterized in that, In step S1, the discrimination instruction provided to the large language model explicitly constrains its output format to be a predefined structured tag pair: "###Yes###" and "###No###".
3. The simulation and deduction method for multi-agent resource security game based on a large language model according to claim 1, characterized in that, The recursive K-Means clustering method described in step S1 specifically includes the following steps: a. Initial cluster formation: Treat all vectorized text data points as the same cluster; b. Segmentation and Selection: The K-Means algorithm is used to segment the current cluster into two sub-clusters, and the sum of squared errors of each sub-cluster is calculated. The sub-cluster with the largest sum of squared errors is selected as the next cluster to be segmented. c. Iteration: Repeat step b until the number of clusters meets the preset value.
4. The simulation and deduction method for multi-agent resource security game based on a large language model according to claim 1, characterized in that, In step S2, the historical event chain is constructed through the following steps: using a large language model to score and summarize the strategic relevance of relevant Chinese texts in the dynamic information database, and archiving the event summaries with scores higher than the threshold in chronological order.
5. The simulation and deduction method for multi-agent resource security game based on a large language model according to claim 1, characterized in that, In step S2, the generation of the adversary's action sequence includes: Event chain construction: Based on the current events of adversarial entities, the large language model is used to retrieve and extract related historical events from their historical event chains to form an event chain containing temporal and logical connections; Behavior extraction: Using a large language model, extract the initial behavior objects taken by the adversarial entity from the event chain; Behavior correction: The preliminary behavior object is compared with the behavior patterns recorded in the historical behavior database of the adversary entity. The logical consistency is evaluated by a large language model, and the parts that do not conform are corrected or re-inferred to generate optimized behavior objects as behavior sequences.
6. The simulation and deduction method for multi-agent resource security game based on a large language model according to claim 1, characterized in that, in step S2, the generation of our action sequence includes: Strategy context extraction: Utilize a large language model to analyze the historical event chain of our side, capture the development context of events, and extract preliminary response strategy objects for the current situation; Strategy Revision and Anchoring: The preliminary response strategy objects are compared with the strategy patterns recorded in our historical behavior database and with the pre-set strategic principles; the large language model is used to evaluate whether they are in line with our core interests and strategic continuity, and unreasonable parts are revised to generate optimized response strategy objects as our behavior sequence; Strategy archiving: Update the optimized response strategy objects to the historical behavior database.
7. The simulation and deduction method for multi-agent resource security game based on a large language model according to claim 1, characterized in that, For newly acquired text data, it is categorized into strategic themes within the dynamic information repository through the following steps: Vectorize the text data into a feature vector; Calculate the distance between the feature vector and the cluster center vectors corresponding to each strategic theme in the dynamic information database; The text data is assigned to the strategic theme corresponding to the cluster center that is closest to its feature vector.
8. The simulation and deduction method for multi-agent resource security game based on a large language model according to claim 1, characterized in that: The event dimension evaluation in step S31 is performed through the first dedicated prompt word operator, whose inputs include: current event object, historical related event object, future predicted event object, and candidate inference strategy object; The strategy dimension evaluation described in step S32 is performed through the second dedicated prompt word operator, whose inputs include: the current strategy object, historical related strategy objects, and the current event situation object.
9. A multi-agent resource security game simulation and deduction system for implementing the method of any one of claims 1 to 8, characterized in that it includes:
The information acquisition and processing module, which continuously acquires text data from multiple sources in the target domain and performs the following processing:
(a) An odd-round voting mechanism is used for semantic filtering with a large language model: an odd number of independent judgments are performed on the same text, and the text that is judged as relevant more than half of the time is adopted;
(b) The adopted text is vectorized;
(c) The recursive K-Means algorithm is used to perform dynamic clustering analysis on the vectorized text: all data points are used as the initial cluster, the sub-cluster with the largest sum of squared errors is selected iteratively for K-Means bisection until the preset number of clusters is reached, and the dynamic information databases of the adversary and our side are constructed and updated respectively;
(d) Key strategic actions in the dynamic information database are defined as events, the events are deconstructed into "context-decision-outcome" triples through a fine-tuned pre-trained language model, subject historical behavior triples are generated and stored in the historical behavior database, and the triples are encoded into semantic vectors for similarity retrieval;
At least one adversary analysis agent, connected to the information acquisition and processing module, which is used to extract relevant information about the pre-set adversary from the basic information database, analyze and simulate the adversary's behavior patterns, perform logical consistency verification and correction based on the adversary's historical behavior database, and output the adversary's behavior sequence and behavior prediction;
At least one analysis agent for our side, connected to the information acquisition and processing module, which extracts relevant information about our side from the basic information database, analyzes and simulates our response behavior, performs logical consistency verification and correction based on our historical behavior database, and outputs our behavior sequence;
The integration module, connected to both the adversary analysis agent and our analysis agent, which is used to receive the adversary's behavior sequence, the adversary's behavior prediction, and our behavior sequence, and includes:
The strategy analysis submodule, configured to perform event-dimension evaluation based on the proportional-integral control concept through the first dedicated prompt word operator, and to evaluate the impact of the current event, historical related events, and future predicted events on the candidate strategy respectively;
The event evaluation submodule, configured to perform strategy-dimension evaluation based on the proportional-integral control concept through a second dedicated prompt word operator, and to evaluate the potential impact of candidate strategies and historical related strategies on the current event situation;
The theory anchoring submodule, configured to retrieve relevant theoretical basis from a pre-built strategic theory library and impose directional constraints on candidate strategies;
The strategy synthesis output submodule, configured to integrate the outputs of the strategy analysis submodule, the event evaluation submodule, and the theory anchoring submodule, and to perform comprehensive reasoning through a large language model to generate a deduction report;
The information acquisition and processing module, the adversary analysis agent, our analysis agent, and the integration module communicate and exchange data through a predefined structured data interface. The data objects transmitted by the structured data interface include at least event objects, behavior objects, strategy objects, and evaluation operator objects.
10. The method according to claim 1 or the system according to claim 9, characterized in that, The resource security game refers to the iron ore resource supply chain security game. The opposing parties include government regulatory agencies and mining companies in major iron ore exporting countries, while the "our" parties include government agencies, industry associations, and mining companies in iron ore importing countries.