Data generation method and system based on social simulation experiment, and electronic device
By conducting chain-of-thought reasoning analysis and social simulation experiments on the target research topic, the role profiles and behavioral constraints of the experimental subjects are determined, which solves the problem of insufficient authenticity of simulation data in social science research in existing technologies and realizes the generation of experimental data with strong research orientation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- TIANJIN UNIV
- Filing Date
- 2026-05-22
- Publication Date
- 2026-06-19
AI Technical Summary
Existing deep research intelligent agent systems are unable to produce insightful and innovative research results in social science research, and existing social simulation technologies lack clear research guidance, causing the simulation process to deviate from the research objectives and the simulation data to lack authenticity.
By conducting chain-of-thought reasoning analysis on the target research topic, we determine the role profile, social behavioral constraints, and experimental path of the experimental subjects. We then use a pre-constructed behavioral dataset to retrieve target behavioral data that matches the role profile and simulate the execution of the experimental path under social behavioral constraints, generating reference data for scientific research writing.
Ensure that the simulation process closely follows the research topic, increase the authenticity and research orientation of the experimental data, and provide reference data that aligns with original research for subsequent scientific writing.
Figure CN122242303A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the fields of data processing and artificial intelligence technology, specifically to a data generation method, system, and electronic device based on social simulation experiments.
Background Technology
[0002] With the breakthrough development of artificial intelligence technology, it has demonstrated remarkable capabilities in information integration and multi-agent collaboration. This potential is gradually permeating the field of academic research, giving rise to explorations in automated scientific research, data analysis, and even one-stop scientific paper writing. For example, existing deep research agent systems retrieve, extract, and integrate massive amounts of literature from the internet and local knowledge bases, ultimately generating academic texts. While this "retrieval-generation" mechanism demonstrates strong auxiliary capabilities in information-intensive tasks, quickly summarizing existing knowledge for researchers, it fails to produce insightful and innovative research results, thus failing to provide references for social science research.
[0003] In realizing the concept of this application, it was found that some studies have attempted to simulate human social behavior and group dynamics in specific scenarios by constructing multi-agent virtual environments with simple preset behavioral rules and interaction logic, thereby providing quantifiable simulation data for social science research. However, the lack of clear research guidance and memory mechanisms causes the simulation process to deviate from the research objectives and the simulation data to lack authenticity.
Summary of the Invention
[0004] In view of this, this application provides a data generation method, system, and electronic device based on social simulation experiments.
[0005] The first aspect of this application provides a data generation method based on social simulation experiments, comprising: performing chain-of-thought reasoning analysis on the research path of a target research topic to obtain the target research path; when the target research path includes sub-paths that require the execution of social simulation experiments, performing reasoning analysis on the experimental design of the social simulation experiments according to a predetermined standardized experimental description protocol to determine the role profile, social behavioral constraints, and experimental path of the experimental subjects required to complete the social simulation experiments; retrieving target behavioral data matching the role profile from a pre-constructed behavioral dataset, and determining the target behavioral data as the initial memory of the experimental subjects; and using the experimental subjects to simulate the execution of the experimental path based on the initial memory under social behavioral constraints to generate experimental data, so as to use the experimental data as reference data for scientific research writing.
[0006] According to an embodiment of the present application, retrieving target behavior data matching the role profile from a pre-constructed behavior dataset includes: determining query information for querying role features based on the semantic information of the role profile; screening multiple candidate behavior data matching the query information from the behavior dataset; and randomly selecting one of the multiple candidate behavior data and rewriting the selected behavior data into target behavior data aligned with the target research topic.
[0007] According to an embodiment of the present application, a predetermined standardized experiment description protocol includes the subject type of the experimental subject, a list of attributes, environmental constraints, experimental purposes, and the time interval of experimental behaviors; according to the predetermined standardized experiment description protocol, reasoning and analyzing the experimental design of a social simulation experiment to determine the role portrait, social behavior constraints, and experimental path of the experimental subject required to complete the social simulation experiment, including: generating a role portrait of the subject type using a large language model based on the subject type, list of attributes, experimental purposes, and environmental constraints, where the role portrait includes target attributes matching the subject type and role behavior tendencies inferred based on the experimental purposes; constructing a constraint path generation instruction according to the role portrait, environmental constraints, experimental purposes, and the time interval of experimental behaviors; inputting the constraint path generation instruction into the large language model to output social behavior constraints and an experimental path matching the role portrait.
[0008] According to an embodiment of the present application, there are multiple experimental subjects, and the experimental path includes N sub-paths, where N is an integer greater than or equal to 2; the process of the experimental subject simulating the execution of the experimental path based on the initial memory under social behavior constraints includes: for each experimental subject: performing forgetting and activation processing on the nth historical memory based on the Ebbinghaus forgetting law to obtain the (n + 1)th target memory that is not forgotten and activated, where 0 < n ≤ N and n is an integer, and when n = 1, the historical memory is the initial memory; determining the (n + 1)th target memory as the prompt information for the action decision required to execute the (n + 1)th sub-path; in the case where it is determined that the experimental subject needs to interact with other experimental subjects, screening target experimental subjects whose distance from the experimental subject meets the preset conditions from other experimental subjects and interacting with them; generating the (n + 1)th action decision based on the prompt information, the interaction content obtained from the interaction, the attributes of the experimental subject, and the social behavior constraints, and simulating to obtain the (n + 1)th simulation result based on the (n + 1)th action decision; updating the attributes of the experimental subject based on the (n + 1)th simulation result and the (n + 1)th action decision, and determining the (n + 1)th simulation result and the (n + 1)th action decision as the (n + 1)th historical memory until the Nth simulation result is obtained.
[0009] According to an embodiment of this application, based on the Ebbinghaus forgetting curve, the nth historical memory is subjected to forgetting and activation processing to obtain the (n+1)th target memory that is not forgotten and is activated. This includes: calculating the basic memory strength based on the time interval between the nth and (n-1)th historical memories using a nonlinear fitting function constructed using the Ebbinghaus forgetting curve; determining the intensity increment for memory activation of the (n-1)th historical memory based on the semantic similarity between the nth and (n-1)th historical memories; determining the activated memory strength based on the intensity increment and the basic memory strength; determining the forgetting probability, which has a mapping relationship with the activated memory strength, as the probability that the nth historical memory is forgotten; and determining the (n+1)th target memory from the nth historical memory based on the probability that the nth historical memory is forgotten.
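The forgetting and activation processing described above can be sketched as follows. This is a minimal illustration, not the fitted function of the application: the exponential retention form, the `stability` and `gain` parameters, and the values in `FORGET_TABLE` are all assumptions, since the nonlinear fitting function and the mapping table of Figure 3 are not reproduced in this text.

```python
import math

def base_strength(dt_hours, stability=24.0):
    """Basic memory strength from the time interval.
    Assumed Ebbinghaus-style form: R = exp(-dt / stability)."""
    return math.exp(-dt_hours / stability)

def activated_strength(base, similarity, gain=0.5):
    """Add an increment proportional to semantic similarity (assumed linear),
    capping the activated strength at 1.0."""
    return min(1.0, base + gain * max(0.0, similarity))

# Hypothetical mapping table: (minimum strength, forgetting probability).
FORGET_TABLE = [(0.8, 0.05), (0.5, 0.3), (0.2, 0.6), (0.0, 0.9)]

def forgetting_probability(strength):
    """Look up the forgetting probability mapped to an activated strength."""
    for min_strength, prob in FORGET_TABLE:
        if strength >= min_strength:
            return prob
    return FORGET_TABLE[-1][1]

def retain(memories, rng):
    """Keep the memories that are not forgotten.
    memories: list of (text, dt_hours, similarity) tuples.
    rng: any object with a random() method returning a float in [0, 1)."""
    kept = []
    for text, dt_hours, similarity in memories:
        strength = activated_strength(base_strength(dt_hours), similarity)
        if rng.random() >= forgetting_probability(strength):
            kept.append(text)
    return kept
```

Passing `random.Random(seed)` as `rng` makes a simulation run reproducible; the memories returned by `retain` would then serve as the prompt information for the next action decision.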
[0010] According to an embodiment of this application, the experimental data includes N simulation results and N action decisions corresponding to each experimental subject; the method further includes: analyzing the N simulation results and N action decisions corresponding to each experimental subject based on multiple agents instantiated from expert roles with different analytical perspectives, and obtaining analysis results; generating structured text based on the analysis results and a writing type that matches the target research topic.
[0011] According to an embodiment of this application, a chain-of-thought reasoning analysis is performed on the research path of the target research topic to obtain the target research path, including: inputting the target research topic into multiple agents to perform the i-th step of the reasoning task, and outputting their respective candidate sub-research paths, where i is an integer greater than 2; if all the candidate sub-research paths meet the predetermined evaluation conditions, the i-th sub-research path of the i-th step is determined from the multiple candidate sub-research paths based on a voting mechanism; based on the i-th sub-research path, the multiple agents again perform the (i+1)-th step of the reasoning task and determine the (i+1)-th sub-research path of the (i+1)-th step; until the semantic similarity between the (i+1)-th sub-research path and the i-th sub-research path meets the threshold, the execution of the reasoning task is stopped, and the target research path is obtained.
[0012] According to embodiments of this application, the predetermined evaluation conditions include at least one of the following: topic relevance condition, adaptability condition, practicality condition, and coherence of thought condition; the topic relevance condition includes that the semantic relevance between the candidate sub-research path and the target research topic is higher than a preset relevance threshold; the adaptability condition includes that the adaptability of the research methods and research content targeted by the candidate sub-research path is higher than a preset adaptability threshold; the practicality condition includes that the practicality score of the research content targeted by the candidate sub-research path is higher than a preset practicality threshold; and the coherence of thought condition includes that the degree of coherence between the candidate sub-research path and the sub-research path determined by historical reasoning is higher than a preset coherence threshold.
[0013] The second aspect of this application provides a data generation system based on social simulation experiments, comprising: a first reasoning module for performing chain-of-thought reasoning analysis on the research path of a target research topic to obtain the target research path; a second reasoning module for performing reasoning analysis on the experimental design of the social simulation experiment according to a predetermined standardized experimental description protocol, when the target research path includes sub-paths that require the execution of a social simulation experiment, to determine the role profile, social behavioral constraints, and experimental path of the experimental subject required to complete the social simulation experiment; a determination module for retrieving target behavioral data matching the role profile from a pre-constructed behavioral dataset and determining the target behavioral data as the initial memory of the experimental subject; and a generation module for using the experimental subject to simulate the execution of the experimental path based on the initial memory under social behavioral constraints to generate experimental data, so that the experimental data can be used as reference data for scientific research writing.
[0014] A third aspect of this application provides an electronic device, comprising: one or more processors; and a memory for storing one or more computer programs, wherein the one or more processors execute the one or more computer programs to implement the aforementioned data generation method based on a social simulation experiment.
[0015] According to embodiments of this application, a sub-path sequence from research questions to experimental investigations can be obtained through chain-of-thought reasoning analysis, enabling refined analysis of the target research path for the target research topic. When a sub-path requires the execution of a social simulation experiment, by determining the role profile, social behavioral constraints, and experimental path of the experimental subjects, clear identity characteristics, behavioral rules, and execution routes can be assigned to them. This at least partially solves the problem that existing social simulation technologies are prone to deviating from research objectives due to a lack of research orientation, ensuring that the simulation process always stays closely aligned with the research topic. Based on this, target behavioral data matching the role profile is retrieved from the behavioral dataset and identified as the initial memory of the experimental subjects. Through a retrieval enhancement mechanism, personalized prior knowledge related to the research topic is injected into each experimental subject, enabling them to possess a cognitive foundation matching their role before the simulation execution. This increases the authenticity and research orientation of the experimental data, providing more relevant reference data for subsequent scientific writing, thus facilitating original research.
Attached Figure Description
[0016] The above and other objects, features and advantages of this application will become clearer from the following description of embodiments of this application with reference to the accompanying drawings.
[0017] Figure 1 illustrates an application scenario of a data generation method, system, and electronic device based on a social simulation experiment according to an embodiment of this application.
[0018] Figure 2 shows a flowchart of a data generation method based on a social simulation experiment according to an embodiment of this application.
[0019] Figure 3 shows a mapping table of memory strength and forgetting probability according to an embodiment of this application.
[0020] Figure 4 shows an architecture diagram of a data generation method based on a social simulation experiment according to an embodiment of this application.
[0021] Figure 5 shows a block diagram of a data generation system based on a social simulation experiment according to an embodiment of this application.
[0022] Figure 6 shows a block diagram of an electronic device suitable for implementing a data generation method based on a social simulation experiment, according to an embodiment of this application.
Detailed Implementation
[0023] The embodiments of this application will now be described with reference to the accompanying drawings. However, it should be understood that these descriptions are exemplary only and are not intended to limit the scope of this application. In the following detailed description, numerous specific details are set forth to provide a thorough understanding of the embodiments of this application for ease of explanation. However, it will be apparent that one or more embodiments may be implemented without these specific details. Furthermore, descriptions of well-known structures and technologies are omitted in the following description to avoid unnecessarily obscuring the concepts of this application.
[0024] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of this application. The terms “comprising,” “including,” etc., as used herein indicate the presence of features, steps, operations, and / or components, but do not exclude the presence or addition of one or more other features, steps, operations, or components.
[0025] All terms used herein (including technical and scientific terms) have the meanings commonly understood by those skilled in the art, unless otherwise defined. It should be noted that the terms used herein are to be interpreted in a manner consistent with the context of this specification, and not in an idealized or overly rigid way.
[0026] When using expressions such as "at least one of A, B and C", they should generally be interpreted in accordance with the meaning that is commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" should include, but is not limited to, a system having A alone, a system having B alone, a system having C alone, a system having A and B, a system having A and C, a system having B and C, and / or a system having A, B and C, etc.).
[0027] Figure 1 illustrates an application scenario of a data generation method, system, and electronic device based on a social simulation experiment according to an embodiment of this application.
[0028] As shown in Figure 1, application scenario 100 according to this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 serves as a medium for providing a communication link between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.
[0029] Users can use the first terminal device 101, the second terminal device 102, and the third terminal device 103 to interact with the server 105 via the network 104 to receive or send messages, etc. Various communication client applications can be installed on the first terminal device 101, the second terminal device 102, and the third terminal device 103, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients, social media platform software, etc. (for example only).
[0030] The first terminal device 101, the second terminal device 102, and the third terminal device 103 can be various electronic devices with displays and support web browsing, including but not limited to smartphones, tablets, laptops, and desktop computers.
[0031] Server 105 can be a server that provides various services, such as a backend management server that supports websites browsed by users using the first terminal device 101, the second terminal device 102, and the third terminal device 103 (this is just an example). The backend management server can analyze and process data such as received user requests, and feed back the processing results (such as web pages, information, or data obtained or generated according to user requests) to the terminal devices.
[0032] It should be noted that the data generation method based on social simulation experiments provided in this application embodiment can generally be executed by server 105. Correspondingly, the data generation device based on social simulation experiments provided in this application embodiment can generally be located in server 105. The data generation method based on social simulation experiments provided in this application embodiment can also be executed by a server or server cluster that is different from server 105 and capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and / or server 105. Correspondingly, the data generation device for social simulation experiments provided in this application embodiment can also be located in a server or server cluster that is different from server 105 and capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and / or server 105.
[0033] It should be understood that the number of terminal devices, networks, and servers shown in Figure 1 is merely illustrative. Depending on implementation needs, any number of terminal devices, networks, and servers can be included.
[0034] The data generation method based on social simulation experiments in this application is described in detail below with reference to Figure 2.
[0035] Figure 2 shows a flowchart of a data generation method based on a social simulation experiment according to an embodiment of this application.
[0036] As shown in Figure 2, the data generation method 200 based on social simulation experiments in this embodiment includes steps S210 to S240.
[0037] In step S210, chain-of-thought reasoning analysis is performed on the research path of the target research topic to obtain the target research path.
[0038] Chain-of-thought reasoning analysis can be a process of breaking down a complex problem into multiple intermediate thinking steps and then gradually deriving a complete research path. The target research path can be a complete research process composed of sub-tasks arranged in order of execution.
[0039] For example, based on the target research topic input by the user, the target research path can be derived step by step from the problem to be solved or the research topic direction.
[0040] In step S220, if the target research path includes a sub-path that requires the execution of a social simulation experiment, the experimental design of the social simulation experiment is analyzed by reasoning according to a predetermined standardized experimental description protocol to determine the role profiles, social behavioral constraints, and experimental paths of the experimental subjects required to complete the social simulation experiment.
[0041] For example, according to a predetermined standardized experimental description protocol, the various elements of a social simulation experiment can be systematically planned, and the three types of elements required to complete the experiment can be determined through reasoning analysis: the role profile of the experimental subject, that is, a comprehensive description of the identity characteristics, behavioral tendencies and attributes of the intelligent agents participating in the simulation; social behavioral constraints, that is, the behavioral rules and boundary conditions that the experimental subject must follow during the simulation; and the experimental path, that is, the sequence of target nodes that the experimental subject must execute in sequence during the simulation.
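As an illustration of how the protocol could be represented in software, the sketch below models the five protocol fields named in the summary (subject type, attribute list, environmental constraints, experimental purpose, time interval) as a data class and assembles a constraint path generation instruction from them. The class name, field names, and instruction wording are hypothetical; the application does not prescribe a concrete format, and the role profile is assumed to have been produced by a large language model beforehand.

```python
from dataclasses import dataclass

@dataclass
class ExperimentProtocol:
    """Hypothetical container for the standardized experiment description."""
    subject_type: str
    attributes: list
    environment_constraints: list
    purpose: str
    time_interval: str

def build_constraint_path_instruction(protocol, role_profile):
    """Assemble the text instruction that would be sent to a large language
    model to obtain social behavior constraints and an experimental path."""
    lines = [
        f"Role profile: {role_profile}",
        f"Subject type: {protocol.subject_type}",
        "Environmental constraints: " + "; ".join(protocol.environment_constraints),
        f"Experimental purpose: {protocol.purpose}",
        f"Time interval of experimental behaviors: {protocol.time_interval}",
        "Output the social behavior constraints and an ordered experimental path.",
    ]
    return "\n".join(lines)
```

Keeping the protocol as structured data rather than free text makes it straightforward to check that every required field is present before any model call is made.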
[0042] In step S230, target behavioral data matching the role profile is retrieved from the pre-constructed behavioral dataset, and the target behavioral data is identified as the initial memory of the experimental subject.
[0043] From a pre-built structured database containing a large number of real or simulated behavioral records, specific target behavioral data that matches the characteristics of the role profile can be filtered out through semantic matching or feature similarity calculation, and this data can be identified as the initial memory that the experimental subject possesses before the simulation starts.
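The retrieval step can be sketched as follows, under loudly stated assumptions: a bag-of-words cosine similarity stands in for the semantic matching model, and a simple topic prefix stands in for the rewriting that would normally be done by a large language model. All function names are hypothetical.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Stand-in for a semantic embedding: lowercase token counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve_candidates(role_profile, dataset, top_k=3):
    """Return the top_k behavior records most similar to the role profile."""
    query = bag_of_words(role_profile)
    ranked = sorted(dataset, key=lambda rec: cosine(query, bag_of_words(rec)),
                    reverse=True)
    return ranked[:top_k]

def initial_memory(role_profile, dataset, topic, rng, top_k=3):
    """Randomly pick one candidate and align it with the research topic.
    The prefix below is a placeholder for LLM-based rewriting."""
    chosen = rng.choice(retrieve_candidates(role_profile, dataset, top_k))
    return f"[topic: {topic}] {chosen}"
```

Here `rng` can be a `random.Random` instance; choosing randomly among several good matches, rather than always taking the best one, gives different experimental subjects with the same profile distinct starting memories.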
[0044] In step S240, the experimental subjects simulate the process of executing the experimental path based on their initial memories under social behavioral constraints, generating experimental data so that the experimental data can be used as reference data for scientific research writing.
[0045] Within the boundaries of social behavioral constraints, and based on initial memories as a cognitive starting point, experimental subjects sequentially complete the experimental path in a virtual environment according to preset rules, generating qualitative and quantitative results during the simulation process, including but not limited to the action decisions, state changes, and interaction records of each experimental subject.
[0046] Scientific writing can include, but is not limited to, academic papers and research reports.
[0047] According to embodiments of this application, a sub-path sequence from research questions to experimental investigations can be obtained through chain-of-thought reasoning analysis, enabling refined analysis of the target research path for the target research topic. When a sub-path requires the execution of a social simulation experiment, by determining the role profile, social behavioral constraints, and experimental path of the experimental subjects, clear identity characteristics, behavioral rules, and execution routes can be assigned to them. This at least partially solves the problem that existing social simulation technologies are prone to deviating from research objectives due to a lack of research orientation, ensuring that the simulation process always stays closely aligned with the research topic. Based on this, target behavioral data matching the role profile is retrieved from the behavioral dataset and identified as the initial memory of the experimental subjects. Through a retrieval enhancement mechanism, personalized prior knowledge related to the research topic is injected into each experimental subject, enabling them to possess a cognitive foundation matching their role before the simulation execution. This increases the authenticity and research orientation of the experimental data, providing more relevant reference data for subsequent scientific writing, thus facilitating original research.
[0048] In the embodiments of this application, when performing step S210 shown in Figure 2, the data generation method based on social simulation experiments further includes: inputting the target research topic into multiple agents to perform the i-th step of the reasoning task, and outputting their respective candidate sub-research paths, where i is an integer greater than 2; if all candidate sub-research paths meet predetermined evaluation conditions, determining the i-th sub-research path of the i-th step from the multiple candidate sub-research paths based on a voting mechanism; based on the i-th sub-research path, multiple agents again perform the (i+1)-th step of the reasoning task and determine the (i+1)-th sub-research path of the (i+1)-th step; until the semantic similarity between the (i+1)-th sub-research path and the i-th sub-research path meets a threshold, stopping the execution of the reasoning task and obtaining the target research path.
[0049] The target research topic is input into multiple agents. During the i-th step of the reasoning task, each agent outputs its corresponding candidate sub-research path. Each candidate sub-research path can contain candidate inference nodes, denoted as $n_i$, where i is an integer greater than 2.
[0050] Each candidate inference node $n_i$ can be evaluated through a scoring function $S(n_i)$, and the evaluation result is compared with a threshold $\tau$. If the evaluation result is below the threshold $\tau$, the agent regenerates the candidate inference node; if the evaluation result is not lower than the threshold $\tau$, the candidate inference node is saved and recorded as a member of the retained set $\hat{N}_i$.
[0051] The voting mechanism can be defined by a consensus evaluation function $C(n)$, which is used to quantify the level of consensus of a candidate inference node $n \in \hat{N}_i$ so that the optimal node can be selected from $\hat{N}_i$. Specifically, the consensus evaluation function $C(n)$ is as shown in equation (1):
[0052] $$C(n) = \sum_{n' \in \hat{N}_i,\; n' \neq n} A(n, n') \qquad (1),$$
[0053] where the function $A(n, n')$ is used to evaluate the alignment between two candidate inference nodes $n$ and $n'$.
[0054] For example, the alignment between two candidate inference nodes can be measured using semantic similarity. For instance, the text content of each candidate inference node can be converted into a semantic vector, the cosine similarity between any two node vectors can be calculated, and the average or sum of the similarities between a node and all other nodes can be used as the alignment of that node.
[0055] In each step, the candidate inference node with the highest alignment is found as the optimal inference node by maximizing the objective function. For example, in step i, the optimal inference node $n_i^{*}$ is obtained as shown in equation (2) below:
[0056] $$n_i^{*} = \arg\max_{n \in \hat{N}_i} C(n) \qquad (2),$$
[0057] where $\hat{N}_i$ is the set of candidate inference nodes retained after initial screening in step i.
[0058] The i-th sub-research path containing the optimal inference node $n_i^{*}$ is input to the multiple agents, which then perform the (i+1)-th step of the reasoning task to obtain the (i+1)-th sub-research path. This reasoning task is repeated until the semantic similarity between the (i+1)-th and i-th sub-research paths meets a threshold. At this point, the reasoning task is stopped, and the target research path is obtained.
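The voting step of equations (1) and (2) can be illustrated with a short sketch. It assumes the "sum of similarities" variant of alignment mentioned above and uses token-count vectors as a stand-in for semantic vectors; in practice an embedding model would replace `embed`. The function names are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Stand-in for a semantic vector: lowercase token counts."""
    return Counter(text.lower().split())

def alignment(a, b):
    """Cosine similarity between the vectors of two candidate nodes."""
    va, vb = embed(a), embed(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * \
           math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def consensus(index, nodes):
    """Equation (1): sum of alignments between one node and all others."""
    return sum(alignment(nodes[index], other)
               for k, other in enumerate(nodes) if k != index)

def select_optimal(nodes):
    """Equation (2): the retained node with the highest consensus.
    Ties are resolved in favor of the earliest node."""
    best = max(range(len(nodes)), key=lambda k: consensus(k, nodes))
    return nodes[best]
```

A node that most other agents roughly agree with accumulates the highest consensus value, so an outlier proposal is unlikely to be selected even if it individually passed the scoring threshold.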
[0059] This application does not specify a particular method for determining whether the semantic similarity between the (i+1)th sub-research path and the ith sub-research path meets a threshold. For example, the content output by the agent can be input into a large model, and a prompt word can be assigned to the large model. The prompt word could be: "You are a decision-making model specifically designed to determine whether reasoning in a thought chain should terminate. Task: Based on the user's original question and the currently completed reasoning steps, determine whether the reasoning has reached a final answer, whether it is complete enough, and whether further thinking is unnecessary. If the semantic similarity between the last step of the reasoning and the previous steps is high, then the reasoning is considered to have ended. User question: {question}. Completed reasoning steps: {cot_steps}. The output requirements are as follows, strictly adhering to the JSON format: 1. IsEnd: boolean type, returns True if the reasoning has ended, otherwise returns False."
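A termination check along these lines could be wired up as sketched below. The prompt text is abridged from the example above, `llm` is a hypothetical callable that takes a prompt string and returns the model's raw reply, and the fallback parser is an assumption added because a reply written with `True`/`False` (as the prompt wording suggests) is not valid JSON.

```python
import json
import re

PROMPT_TEMPLATE = (
    "You are a decision-making model that determines whether reasoning in a "
    "thought chain should terminate.\n"
    "User question: {question}\n"
    "Completed reasoning steps:\n{cot_steps}\n"
    "Answer strictly in JSON with one field, IsEnd (boolean)."
)

def build_prompt(question, steps):
    """Fill the template with the question and the numbered reasoning steps."""
    numbered = "\n".join(f"{k}. {s}" for k, s in enumerate(steps, start=1))
    return PROMPT_TEMPLATE.format(question=question, cot_steps=numbered)

def parse_is_end(raw):
    """Extract the IsEnd flag; tolerate Python-style True/False in the reply."""
    try:
        data = json.loads(raw)
        if isinstance(data, dict) and isinstance(data.get("IsEnd"), bool):
            return data["IsEnd"]
    except json.JSONDecodeError:
        pass
    match = re.search(r'"IsEnd"\s*:\s*"?(true|false)"?', raw, flags=re.IGNORECASE)
    return bool(match) and match.group(1).lower() == "true"

def should_stop(question, steps, llm):
    """Ask the model whether reasoning has converged; default to False."""
    return parse_is_end(llm(build_prompt(question, steps)))
```

Defaulting to `False` on an unparseable reply keeps the reasoning loop running rather than ending it on a malformed answer, so a separate cap on the number of steps would still be advisable.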
[0060] Multiple agents reason in parallel on the research topic, determining the optimal path for the current step through evaluation and voting, and iterating until the semantic similarity of adjacent steps reaches a certain threshold, thus forming the target path. This divergent reasoning cultivates the agents' ability to generate original experiential data, reduces reliance on existing knowledge, and expands the exploration space for automated scientific research task planning.
[0061] In the embodiments of this application, in step S210 shown in Figure 2, the predetermined evaluation conditions include at least one of the following: topic association condition, suitability condition, practicality condition, and coherence of thought condition; the topic association condition includes that the semantic association between the candidate sub-research path and the target research topic is higher than a preset association threshold; the suitability condition includes that the suitability of the research methods and research content targeted by the candidate sub-research path is higher than a preset suitability threshold; the practicality condition includes that the practicality score of the research content targeted by the candidate sub-research path is higher than a preset practicality threshold; the coherence of thought condition includes that the degree of coherence between the candidate sub-research path and the sub-research path determined by historical reasoning is higher than a preset coherence threshold.
[0062] The predetermined evaluation conditions can serve as the evaluation dimensions used by the scoring function when evaluating each candidate inference node.
[0063] The topic association condition can be determined by comparing the text representation of the candidate sub-research path with the text representation of the target research topic through a semantic similarity calculation model. The higher the similarity, the more the candidate path fits the topic, and the higher the semantic association score. Only when the semantic association score is higher than the preset association threshold is the candidate sub-research path determined to meet the topic association condition.
[0064] The suitability condition can be established by constructing a research method-content suitability assessment model based on a semantic understanding of the research method classification and the research content characteristics targeted by the candidate sub-research path. The suitability result is calculated, and the candidate sub-research path is deemed to meet the suitability condition only when the suitability result is higher than the preset suitability threshold. For example, for research content that needs to explore the mechanisms of social dynamic evolution, the multi-agent social simulation method has a higher suitability than a simple literature review.
[0065] The practicality condition can cover social value and application potential. A candidate sub-research path is deemed to meet the practicality condition only when the practicality score of the research content it targets is higher than the preset practicality threshold.
[0066] The coherence of thought condition can be determined by calculating the semantic similarity between the text representation of the current candidate sub-research path and the sub-research path or paths selected in the previous step or steps. A high semantic similarity indicates a high degree of coherence. The candidate sub-research path is determined to meet the coherence of thought condition only when the degree of coherence is higher than the preset coherence threshold.
[0067] By introducing multi-dimensional evaluation conditions with thresholds, a mechanism for actively screening candidate sub-research paths was constructed. This mechanism can effectively filter low-quality, irrelevant, or logically disjointed candidate paths while maintaining the breadth of divergent exploration, thereby improving the relevance and reliability of task planning results.
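The threshold-based screening can be illustrated with a short sketch. It assumes each candidate sub-research path has already been scored on the four dimensions by some external scoring function; the threshold values, the score keys, and the names Thresholds, meets_conditions, and screen are hypothetical. The text allows "at least one" of the conditions to be used, whereas this sketch applies all four.

```python
from dataclasses import dataclass

CONDITIONS = ("association", "suitability", "practicality", "coherence")


@dataclass(frozen=True)
class Thresholds:
    """Preset thresholds; the numeric defaults are illustrative only."""
    association: float = 0.6
    suitability: float = 0.5
    practicality: float = 0.5
    coherence: float = 0.5


def meets_conditions(scores: dict[str, float], thresholds: Thresholds) -> bool:
    """A path passes only if every score is strictly higher than its threshold."""
    return all(scores[name] > getattr(thresholds, name) for name in CONDITIONS)


def screen(candidates: dict[str, dict[str, float]],
           thresholds: Thresholds) -> list[str]:
    """Keep the candidate sub-research paths that satisfy all conditions."""
    return [path for path, scores in candidates.items()
            if meets_conditions(scores, thresholds)]
```

The strict comparison mirrors the "higher than a preset threshold" wording; a score exactly equal to its threshold is filtered out.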
[0068] In the embodiments of this application, the predetermined standardized experimental description protocol includes the subject type, attribute list, environmental constraints, experimental purpose, and experimental behavior interval duration of the experimental subject. For step S220 shown in Figure 2, performing reasoning analysis on the experimental design of the social simulation experiment according to the predetermined standardized experimental description protocol, to determine the role profile of the experimental subject, the social behavioral constraints, and the experimental path required to complete the social simulation experiment, includes: using a large language model to generate a role profile for the subject type based on the subject type, attribute list, experimental purpose, and environmental constraints, where the role profile includes target attributes matching the subject type and role behavioral tendencies inferred from the experimental purpose; constructing a constraint path generation instruction based on the role profile, environmental constraints, experimental purpose, and experimental behavior interval duration; and inputting the constraint path generation instruction into the large language model, which outputs social behavioral constraints and an experimental path that match the role profile.
[0069] A predefined standardized experimental description protocol can consist of three parts: an Overview section that clearly defines the core research question of the social experiment, the type of experimental subjects, the experimental objective, and the boundaries of the simulation scenario; a Design Concepts section that systematically elaborates on the behavioral basis and attribute hierarchy of the experimental agents, bridging the Overview and Details sections; and a Details section that further details the specific aspects of the social simulation, specifying environmental constraints, attribute lists, and the duration of experimental behavior intervals. This predefined standardized experimental description protocol serves as the top-level specification for the experiment, ensuring a high degree of consistency between the social simulation process and the experimental design.
[0070] The Overview section can be used to determine the subject type and experimental objective. The Details section can be used to determine the attribute list, environmental constraints, and the duration of the experimental behavior intervals.
[0071] The subject type represents key social entities, primarily describing which social entities are needed to complete the simulation, and which roles or types of social entities are required. The attribute list represents the attributes of social entities. Based on the overview and design concepts defined by the predefined standardized experimental description protocol, a corresponding attribute list is output for each social entity. Each item in the attribute list represents an attribute, including its name and description. These attributes should be quantifiable, meaning their values can be represented by specific numerical values, and they should have a substantial impact on the social entity's operation in the simulator. The attribute list should be diverse, typically containing approximately 15 attributes that significantly affect the simulator's operation. Note that attributes refer to the attributes of individuals within the social simulator, not the attributes of the system as a whole. The experimental objective represents the modeling purpose, primarily outputting the final goal and final state of this social simulation, which may include, but is not limited to, the number of simulation rounds and the number of social entities reaching a specific state. Environmental constraints describe the specific social environmental limitations in the social simulator. Based on the overview and design concepts of the predefined standardized experimental description protocol, combined with the research topic and question-and-answer content from the dialogue history, a summary text is generated to characterize the social environmental constraints of the social simulator.
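The fields of the standardized experimental description protocol can be pictured as a small data structure. The sketch below is an assumption about one possible layout, not the format used by the application: the class and field names are hypothetical, and the example values are invented purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """One quantifiable attribute of a social entity (name plus description)."""
    name: str
    description: str


@dataclass
class ExperimentProtocol:
    """Hypothetical container for the fields the protocol is said to carry."""
    # Overview section
    subject_type: str
    experiment_purpose: str
    # Details section
    attributes: list[Attribute] = field(default_factory=list)
    environmental_constraints: str = ""
    behavior_interval: str = ""

    def overview(self) -> dict[str, str]:
        return {"subject_type": self.subject_type,
                "experiment_purpose": self.experiment_purpose}

    def attribute_names(self) -> list[str]:
        return [a.name for a in self.attributes]


# Invented example values, for illustration only.
protocol = ExperimentProtocol(
    subject_type="resident",
    experiment_purpose="stop after 10 rounds or when all residents have decided",
    attributes=[Attribute("trust", "trust in neighbours, 0 to 1")],
    environmental_constraints="a single closed community",
    behavior_interval="1 simulated day",
)
```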
[0072] Based on the four types of information mentioned above (subject type, attribute list, experimental purpose, and environmental constraints), a structured role profile can be generated by using a large code model to create social entity classes for social simulation experiments. This profile includes not only target attributes corresponding to the subject type, such as an individual's social identity, resource endowment, and behavioral preferences, but also role behavioral tendencies inferred from the experimental purpose, such as cooperation tendency, risk preference, and decision-making strategies.
[0073] The experimental behavior interval can represent the interval between two actions of a social entity, outputting specific social time and providing a basis for the social entity's actions. The role profile, environmental constraints, experimental purpose, and experimental behavior interval are integrated into structured prompt text to construct constraint path generation instructions. These instructions, along with the prompt text, are input into a large language model to obtain social behavior constraints and experimental paths matching the role profile. Social behavior constraints can be the behavioral rules and boundary conditions that a social entity must follow during the simulation, typically divided into multiple levels according to the strength of the constraints, used to regulate the agent's decision space during the simulation. The experimental path can be the sequence of nodes that a social entity must execute sequentially during the simulation.
[0074] Based on the subject type, attribute list, experimental purpose, and environmental constraints, a role profile matching the subject type is automatically generated, ensuring that each experimental subject has an initial setting highly aligned with the research topic before the simulation begins. Based on the role profile, environmental constraints, experimental purpose, and experimental behavior interval duration, constraint path generation instructions are constructed, and a large language model outputs multi-level social behavioral constraints and research-oriented experimental paths. This achieves automatic generation of social simulation experimental designs, improves the efficiency and standardization of experimental design, ensures that the simulation process remains closely aligned with the research topic, and gives the generated experimental data a clear research orientation and the dynamism of real-world scenarios, providing a reliable foundation of original empirical data for subsequent structured writing.
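As a rough illustration of how a constraint path generation instruction might be assembled, the sketch below joins the four inputs into one structured prompt. The prompt layout, the task wording, and the function name build_constraint_path_instruction are assumptions; the application does not fix a format.

```python
def build_constraint_path_instruction(role_profile: dict, environment: str,
                                      purpose: str, interval: str) -> str:
    """Assemble the four inputs into one structured prompt (layout is assumed)."""
    profile_lines = "\n".join(f"- {k}: {v}" for k, v in role_profile.items())
    return (
        "Role profile:\n" + profile_lines + "\n"
        f"Environmental constraints: {environment}\n"
        f"Experimental purpose: {purpose}\n"
        f"Behavior interval: {interval}\n"
        "Task: output (1) social behavioral constraints grouped by strength "
        "and (2) an ordered experimental path of nodes, matching the role profile."
    )
```

The returned string would then be passed to the large language model; that call is outside the scope of this sketch.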
[0075] In the embodiments of this application, for step S230 shown in Figure 2, retrieving target behavioral data matching the role profile from the pre-constructed behavioral dataset includes: determining query information for querying role features based on the semantic information of the role profile; filtering multiple candidate behavioral data that match the query information from the behavioral dataset; and randomly selecting one piece of behavioral data from the multiple candidate behavioral data and rewriting it into target behavioral data aligned with the target research topic.
[0076] The pre-built behavioral dataset can be based on six first-level disciplines: economics, political science, law, sociology, psychology, and history. These disciplines are further subdivided into representative second-level disciplines, systematically constructing a multi-dimensional and scalable social behavior dataset containing nearly 300,000 records.
[0077] The semantic representation of the k-th role profile is denoted $p_k$. A semantic embedding model maps it to a semantic query vector $q_k = f_{\mathrm{emb}}(p_k)$, where $f_{\mathrm{emb}}$ is the mapping function of the semantic embedding model, yielding a vectorized representation of the role features that can be used for queries. Let the domain behavior dataset be $D$, and let $D_k$ denote the domain behavior data corresponding to the k-th role profile. With $q_k$ as the query vector, data is retrieved from $D_k$ through cosine similarity, the behavior data semantically most similar to the role profile is filtered out, and the top $K$ entries in descending order of similarity form the candidate behavior dataset $C_k$. One piece of behavior data $b_k$ is randomly selected from $C_k$ and input, together with the research topic $T$, into the large language model, which abstracts and rewrites it into target behavior data that matches the role profile and aligns with the research topic, serving as the initial memory $m_k^0$ of the experimental subject, that is, $m_k^0 = f_{\mathrm{LLM}}(b_k, T)$, where $f_{\mathrm{LLM}}$ denotes the generation mapping function of the large language model (LLM).
[0078] By constructing a "retrieval-random screening-rewriting alignment" mechanism, initial memories with diversity, authenticity, and research orientation are injected into the experimental subject, which at least partially solves the problems of the lack of initial memories of agents, disconnection from the research topic, and insufficient individual differences in traditional social simulations, and lays a foundation for the authenticity, diversity, and research orientation of subsequent simulation experiments.
[0079] In the embodiments of the present application, for step S240 as shown in Figure 2 , there are multiple experimental subjects, and the experimental path includes N sub-paths, where N is an integer greater than or equal to 2; the process of the experimental subject simulating the execution of the experimental path based on the initial memory under social behavior constraints includes: for each experimental subject: based on the Ebbinghaus forgetting law, perform forgetting and activation processing on the n-th historical memory to obtain the (n + 1)-th target memory that is not forgotten and is activated, where 0 < n ≤ N and n is an integer, and when n = 1, the historical memory is the initial memory; determine the (n + 1)-th target memory as the prompt information for the action decision required to execute the (n + 1)-th sub-path; in the case where it is determined that the experimental subject needs to interact with other experimental subjects, screen out target experimental subjects that satisfy the preset conditions from other experimental subjects and interact with them; generate the (n + 1)-th action decision according to the prompt information, the interaction content obtained from the interaction, the attributes of the experimental subject, and the social behavior constraints, and based on the (n + 1)-th action decision, simulate to obtain the (n + 1)-th simulation result; update the attributes of the experimental subject based on the (n + 1)-th simulation result and the (n + 1)-th action decision, and determine the (n + 1)-th simulation result and the (n + 1)-th action decision as the (n + 1)-th historical memory until the N-th simulation result is obtained.
[0080] For each experimental subject, a memory mechanism based on the Ebbinghaus forgetting curve is used to process and activate their historical memories. Specifically, the system acquires the subject's nth historical memory (when n=1, this is the initial memory). The basic memory strength is calculated by nonlinearly fitting the time interval between the nth and (n-1)th historical memories, and the memory activation strength increment is calculated by combining the semantic similarity between the two. The overall activated memory strength is obtained. The forgetting probability of the nth historical memory is determined according to the mapping relationship between memory strength and forgetting probability. The (n+1)th target memory that has not been forgotten and has been activated is selected as the prompt information for the subject to make action decisions for the (n+1)th sub-path.
[0081] Based on the nodes of the current experimental path and its own attributes, the experimental subject decides whether to interact with other experimental subjects. If interaction is determined to be necessary, it randomly selects a nearby target experimental subject from among the other experimental subjects according to a preset abstract social distance, autonomously sends textual interactive content to the target subject, and waits for the other party's response to complete the interaction process.
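One way to picture the choice of an interaction partner is shown below. The sketch assumes the abstract social distance is available as a nested mapping and that "nearby" means a distance no greater than a cutoff; both the data layout and the name pick_partner are hypothetical.

```python
import random
from typing import Optional


def pick_partner(subject: str, distances: dict[str, dict[str, float]],
                 max_distance: float, rng: random.Random) -> Optional[str]:
    """Randomly choose one other subject within the preset social distance."""
    nearby = sorted(other for other, d in distances.get(subject, {}).items()
                    if d <= max_distance)
    return rng.choice(nearby) if nearby else None
```

Returning None when no subject is close enough lets the caller skip the interaction for that step.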
[0082] After acquiring the interactive content, the experimental subject generates the (n+1)th action decision based on the aforementioned prompts, the interactive content obtained, their own attributes, and social behavioral constraints. The system further uses a large-scale model of automated experimental design to determine whether the action decision violates the preset social constraints. If it violates the constraints, the action decision is regenerated; if it meets the constraints, the (n+1)th simulation result generated by the action decision is simulated.
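The check-and-regenerate step can be sketched as a bounded retry loop. The generator and the constraint checker are passed in as callables standing in for the large models; the attempt cap and the name decide_within_constraints are assumptions, since the text does not say how many regenerations are allowed.

```python
from typing import Callable


def decide_within_constraints(generate: Callable[[int], str],
                              violates: Callable[[str], bool],
                              max_attempts: int = 3) -> str:
    """Regenerate the action decision until it satisfies the social constraints."""
    for attempt in range(1, max_attempts + 1):
        decision = generate(attempt)
        if not violates(decision):
            return decision
    # The cap is an assumption added to avoid an endless loop.
    raise RuntimeError("no compliant decision within max_attempts")
```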
[0083] The experimental subject fine-tunes its own attribute data values based on the (n+1)th simulation result and the (n+1)th action decision, and stores the action decision and simulation result as the (n+1)th historical memory in the memory module to complete the state update of this iteration.
[0084] The system repeats the above steps, allowing all experimental subjects to execute iteratively in sequence until the simulation process of all N sub-paths is completed.
[0085] Through the aforementioned iterative interaction and decision-making mechanism, the automated operation of the multi-agent social simulation experiment was achieved, providing authentic, coherent, and research-oriented original empirical data for subsequent data analysis and structured writing.
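The per-subject iteration over the N sub-paths can be summarized in a compact skeleton. This is only a sketch: memory recall, decision-making, and simulation are injected as callables, and interaction with other subjects and attribute updates are deliberately omitted to keep the loop visible. The name run_subject and the string format used for stored memories are hypothetical.

```python
from typing import Callable


def run_subject(initial_memory: str, n_steps: int,
                recall: Callable[[list[str]], list[str]],
                decide: Callable[[int, list[str]], str],
                simulate: Callable[[str], str]) -> list[tuple[str, str]]:
    """Run one experimental subject through n_steps sub-paths."""
    memories = [initial_memory]            # history; index 0 is the initial memory
    records: list[tuple[str, str]] = []
    for step in range(1, n_steps + 1):
        prompt = recall(memories)          # forgetting and activation would go here
        decision = decide(step, prompt)    # action decision for this sub-path
        result = simulate(decision)        # simulation result of the decision
        memories.append(f"{decision} => {result}")  # becomes the next historical memory
        records.append((decision, result))
    return records
```

The returned list pairs each action decision with its simulation result, which corresponds to the experimental data collected per subject.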
[0086] In the embodiments of this application, for step S240 shown in Figure 2, performing forgetting and activation processing on the nth historical memory based on the Ebbinghaus forgetting curve to obtain the (n+1)th target memory that is not forgotten and is activated includes: calculating the basic memory strength using a nonlinear fitting function constructed from the Ebbinghaus forgetting curve, based on the time interval between the nth and (n-1)th historical memories; determining the intensity increment for memory activation of the (n-1)th historical memory based on the semantic similarity between the nth and (n-1)th historical memories; determining the activated memory strength based on the intensity increment and the basic memory strength; determining the forgetting probability, which has a mapping relationship with the activated memory strength, as the probability that the nth historical memory is forgotten; and determining the (n+1)th target memory from the nth historical memory based on the probability that the nth historical memory is forgotten.
[0087] For each experimental subject, the system acquires its nth historical memory and (n-1)th historical memory. Based on the relative time interval between the nth and (n-1)th historical memories, a basic memory strength quantification function is constructed by nonlinearly fitting the Ebbinghaus forgetting curve, and the basic memory strength of the (n-1)th historical memory is calculated.
[0088] Furthermore, a memory activation mechanism is introduced. The text content of the nth historical memory and the (n-1)th historical memory is transformed into semantic vectors through a semantic embedding model. The cosine similarity is calculated. The higher the similarity, the greater the intensity increment applied.
[0089] The strength of the activated memory is calculated by combining the base memory strength of the (n-1)th historical memory with the strength increment brought by semantic similarity.
[0090] Based on the activated memory strength, and through a preset mapping relationship between memory strength and forgetting probability, the probability of forgetting the nth historical memory is determined. The forgetting probability represents how likely that memory will be forgotten in the agent's next action. If the forgetting probability of the nth historical memory is low, then the memory is retained as the target memory; if the forgetting probability is high, then the memory is filtered out. The subset of memories that are not forgotten and are activated serves as prompting information for the experimental subject's next action decision.
[0091] According to embodiments of this application, for an agent at any moment $t$, the memory sequence is denoted $M_t$, in which $m_0$ represents the initial memory. The relative time interval between any historical memory $m_i$ and a new memory $m_{\mathrm{new}}$ is denoted $\Delta t_i$. A base memory strength quantization function $S(\Delta t_i)$ is constructed by nonlinearly fitting the Ebbinghaus curve and represents the base memory strength corresponding to any historical memory. Whenever a new memory $m_{\mathrm{new}}$ is generated, this function is used to calculate the base strength of all historical memories. A memory activation mechanism is further introduced: based on the semantic similarity between the new memory $m_{\mathrm{new}}$ and a historical memory $m_i$, a random increment is applied to the strength of $m_i$, as shown in equation (3):
[0092] (3).
[0093] Figure 3 shows a mapping table of memory strength and forgetting probability according to an embodiment of this application. Based on the activated memory strength, the forgetting probability of a memory is determined through the mapping relationship shown in Figure 3. The subset of memories that remains after forgetting and filtering serves as an effective basis for the agent's decision-making at the next moment.
[0094] By constructing a forgetting and activation processing mechanism based on adjacent memories, the experimental subjects were able to simulate the natural forgetting and associative activation process of humans, which effectively improved the authenticity and coherence of the experimental subjects' behavior in the social simulation experiment, and provided more credible original experience data for the subsequent structured writing module.
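The forgetting and activation mechanism can be illustrated with a small numerical sketch. Every concrete form here is an assumption: an exponential decay stands in for the nonlinear fit of the Ebbinghaus curve, the activation increment is a random fraction scaled by the similarity, and FORGET_TABLE is a made-up step table, not the mapping of Figure 3. The function names are hypothetical as well.

```python
import math
import random
from typing import Sequence


def base_strength(dt: float, stability: float = 5.0) -> float:
    """Base memory strength after a time interval dt (assumed exponential decay)."""
    return math.exp(-dt / stability)


def activated_strength(base: float, similarity: float, rng: random.Random,
                       max_boost: float = 0.5) -> float:
    """Add a random increment that grows with semantic similarity, capped at 1."""
    boost = max_boost * max(similarity, 0.0) * rng.random()
    return min(1.0, base + boost)


# (minimum strength, forgetting probability); hypothetical values only.
FORGET_TABLE = [(0.8, 0.05), (0.5, 0.3), (0.2, 0.6)]


def forgetting_probability(strength: float) -> float:
    """Map activated strength to a forgetting probability via the step table."""
    for floor, prob in FORGET_TABLE:
        if strength >= floor:
            return prob
    return 0.9


def recall(memories: Sequence[tuple[str, float]], now: float,
           similarities: Sequence[float], rng: random.Random) -> list[str]:
    """Keep each (text, timestamp) memory unless it is sampled as forgotten."""
    kept = []
    for (text, stamp), sim in zip(memories, similarities):
        strength = activated_strength(base_strength(now - stamp), sim, rng)
        if rng.random() >= forgetting_probability(strength):
            kept.append(text)
    return kept
```

The surviving memories returned by recall play the role of the prompt information for the next action decision. Older and less related memories are more likely to drop out, while recent or semantically related ones tend to stay.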
[0095] In the embodiments of this application, for step S240 shown in Figure 2, the experimental data includes N simulation results and N action decisions corresponding to each experimental subject. The method further includes: analyzing the N simulation results and N action decisions corresponding to each experimental subject based on multiple agents instantiated from expert roles with different analytical perspectives, to obtain analysis results; and generating structured text based on the analysis results and the writing type matching the target research topic.
[0096] The data generated from the social simulation experiment includes N simulation results and N action decisions for each experimental subject. The simulation results record the qualitative or quantitative outcomes generated by the experimental subject during the execution of each sub-research path, while the action decisions record the specific behaviors taken by the experimental subject in each step.
[0097] The method of multi-agent question-and-answer debate is used to comprehensively analyze the N simulation results and N action decisions corresponding to each experimental subject. This method can be achieved by having multiple agents with different professional backgrounds or analytical perspectives ask questions, answer questions, debate and reach consensus.
[0098] Based on the predefined writing steps and formats of four styles—academic papers, research reports, policy briefs, and review papers—the system automatically selects the most suitable writing type according to the target research topic and content characteristics. Then, in accordance with the corresponding writing steps and format specifications, it integrates the research background, literature review, experimental design, simulation process, analysis results, and other phased content generated during the task planning process into a final research report with a complete structure and standardized format.
[0099] The original experimental data generated from social simulation experiments are analyzed in depth through multi-agent question-and-answer debates. This process extracts the emerging behavioral patterns, process mechanisms, and key research findings while ensuring the authenticity of the experimental results. The experimental data is automatically transformed into scientific research texts that conform to academic norms, achieving full automation from data generation to paper writing.
[0100] Figure 4 shows an architecture diagram of a data generation method based on a social simulation experiment according to an embodiment of this application.
[0101] As shown in Figure 4, based on the target research topic input by the user, a target research path is generated through chain-of-thought reasoning. It is then determined whether the target research path includes a sub-path requiring a social simulation experiment. If not, the process ends; if so, the role profile, social behavioral constraints, and experimental path required to complete the social simulation experiment are determined. Target behavioral data matching the role profile is retrieved from a pre-built behavioral dataset and determined as the experimental subject's initial memory. The process of the experimental subject executing the experimental path based on this initial memory under the social behavioral constraints is simulated, generating experimental data. The generated experimental data serves as reference data for scientific writing. After the reference data is generated, it can be further analyzed and combined with a writing type template matching the target research topic to generate structured text, thus completing the scientific writing for user reference.
[0102] Figure 5 shows a block diagram of a data generation system based on a social simulation experiment according to an embodiment of this application.
[0103] As shown in Figure 5, the data generation system 500 based on social simulation experiments includes a first reasoning module 510, a second reasoning module 520, a determination module 530, and a generation module 540.
[0104] The first reasoning module 510 is used to perform chain-of-thought reasoning analysis on the research path of the target research topic to obtain the target research path.
[0105] The second reasoning module 520 is used to perform reasoning analysis on the experimental design of the social simulation experiment according to a predetermined standardized experimental description protocol when the target research path includes a sub-path that requires the execution of a social simulation experiment, in order to determine the role profile of the experimental subject, social behavioral constraints and experimental path required to complete the social simulation experiment.
[0106] The determination module 530 is used to retrieve target behavior data that matches the role profile from the pre-constructed behavior dataset, and determine the target behavior data as the initial memory of the experimental subject.
[0107] The generation module 540 is used to generate experimental data by simulating the process of the experimental subject executing the experimental path based on the initial memory under the social behavioral constraints, so as to use the experimental data as reference data for scientific research writing.
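The division of labor among the four modules can be sketched as a simple composition. This is an illustrative assumption about how the modules might be wired, with each module reduced to a callable; the class name DataGenerationSystem, the needs_simulation predicate, and the stub lambdas in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DataGenerationSystem:
    """Hypothetical wiring of modules 510 to 540 as plain callables."""
    first_reasoning: Callable[[str], list[str]]    # 510: topic -> target research path
    needs_simulation: Callable[[str], bool]        # does a sub-path need a simulation?
    second_reasoning: Callable[[str], dict]        # 520: sub-path -> experimental design
    determination: Callable[[dict], str]           # 530: design -> initial memory
    generation: Callable[[dict, str], list]        # 540: design + memory -> data

    def run(self, topic: str) -> Optional[list]:
        path = self.first_reasoning(topic)
        sub_path = next((s for s in path if self.needs_simulation(s)), None)
        if sub_path is None:
            return None  # no social simulation experiment is required
        design = self.second_reasoning(sub_path)
        memory = self.determination(design)
        return self.generation(design, memory)


# Stub modules, for illustration only.
system = DataGenerationSystem(
    first_reasoning=lambda t: ["review literature", "simulate: " + t],
    needs_simulation=lambda s: s.startswith("simulate"),
    second_reasoning=lambda s: {"profile": "resident", "path": ["n1", "n2"]},
    determination=lambda d: "memory of " + d["profile"],
    generation=lambda d, m: [(node, m) for node in d["path"]],
)
```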
[0108] According to an embodiment of this application, the first reasoning module includes a third reasoning submodule, a first determining submodule, a second determining submodule, and a stopping submodule.
[0109] The third reasoning submodule is used to input the target research topic into multiple agents and perform the i-th step of the reasoning task, outputting their respective candidate sub-research paths, where i is an integer greater than 2. The first determination submodule is used to determine the i-th sub-research path for the i-th step from the multiple candidate sub-research paths based on a voting mechanism, provided that all candidate sub-research paths meet predetermined evaluation conditions. The second determination submodule is used to have multiple agents perform the (i+1)-th step of the reasoning task again based on the i-th sub-research path, and determine the (i+1)-th sub-research path for the (i+1)-th step. The stopping submodule is used to stop executing the reasoning task and obtain the target research path when the semantic similarity between the (i+1)-th and i-th sub-research paths meets a threshold.
[0110] According to embodiments of this application, the predetermined standardized experiment description protocol includes the subject type, attribute list, environmental constraints, experiment purpose, and experiment behavior interval duration of the experiment subject. The second reasoning module includes a character profile generation submodule, a construction instruction submodule, and a constraint path generation submodule.
[0111] The role profile generation submodule generates a role profile for the subject type based on the subject type, attribute list, experimental purpose, and environmental constraints, using a large language model. The role profile includes target attributes matching the subject type and role behavioral tendencies inferred from the experimental purpose. The instruction construction submodule constructs constraint path generation instructions based on the role profile, environmental constraints, experimental purpose, and experimental behavior interval duration. The constraint path generation submodule inputs these instructions into the large language model and outputs social behavioral constraints and experimental paths that match the role profile.
[0112] According to an embodiment of the present application, the determination module includes a retrieval sub-module, a screening sub-module, and a memory initialization sub-module.
[0113] The retrieval sub-module is used to determine query information for querying role features based on the semantic information of the role profile. The screening sub-module is used to screen multiple candidate behavior data that match the query information from the behavior dataset. The memory initialization sub-module is used to randomly select one piece of behavior data from the multiple candidate behavior data and rewrite it into target behavior data aligned with the target research topic.
[0114] According to an embodiment of the present application, there are multiple experimental subjects, and the experimental path includes N sub-paths, where N is an integer greater than or equal to 2. The generation module includes a processing sub-module, an information determination sub-module, an interaction sub-module, a simulation module, and a simulation result determination sub-module.
[0115] The processing sub-module is used to perform forgetting and activation processing on the nth historical memory based on the Ebbinghaus forgetting law to obtain the (n + 1)th target memory that is not forgotten and is activated, where 0 < n ≤ N and n is an integer. When n = 1, the historical memory is the initial memory. The information determination sub-module is used to determine the (n + 1)th target memory as the prompt information for the action decision required to execute the (n + 1)th sub-path. The interaction sub-module is used to screen target experimental subjects that meet the preset conditions from other experimental subjects and perform interactions when it is determined that the experimental subject needs to interact with other experimental subjects. The simulation module is used to generate the (n + 1)th action decision according to the prompt information, the interaction content obtained from the interaction, the attributes of the experimental subject, and the social behavior constraints, and simulate to obtain the (n + 1)th simulation result based on the (n + 1)th action decision. The simulation result determination sub-module is used to update the attributes of the experimental subject based on the (n + 1)th simulation result and the (n + 1)th action decision, and determine the (n + 1)th simulation result and the (n + 1)th action decision as the (n + 1)th historical memory until the Nth simulation result is obtained. The processing sub-module, the information determination sub-module, the interaction sub-module, the simulation module, and the simulation result determination sub-module can be repeatedly executed for each experimental subject.
[0116] According to an embodiment of this application, based on the Ebbinghaus forgetting curve, the nth historical memory is subjected to forgetting and activation processing to obtain the (n+1)th target memory that is not forgotten and is activated. This includes: calculating the basic memory strength based on the time interval between the nth and (n-1)th historical memories using a nonlinear fitting function constructed using the Ebbinghaus forgetting curve; determining the intensity increment for memory activation of the (n-1)th historical memory based on the semantic similarity between the nth and (n-1)th historical memories; determining the activated memory strength based on the intensity increment and the basic memory strength; determining the forgetting probability, which has a mapping relationship with the activated memory strength, as the probability that the nth historical memory is forgotten; and determining the (n+1)th target memory from the nth historical memory based on the probability that the nth historical memory is forgotten.
[0117] According to embodiments of this application, the experimental data includes N simulation results and N action decisions corresponding to each experimental subject. The data generation system 500 based on social simulation experiments further includes an analysis module and a text generation module. The analysis module is used to analyze the N simulation results and N action decisions corresponding to each experimental subject based on multiple agents instantiated from expert roles with different analytical perspectives, to obtain analysis results. The text generation module is used to generate structured text based on the analysis results and a writing type matching the target research topic.
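As a rough illustration of the analysis and text generation modules, the sketch below treats each expert role as a plain Python callable and assembles the findings into labeled text. This is an assumption-laden simplification: in the described system the experts would be agents (for example, language-model-backed), and the names `analyze`, `render`, the two expert roles, and the sample records are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, str]  # (action decision, simulation result)
Records = Dict[str, List[Record]]


def analyze(records: Records, experts: Dict[str, Callable[[Records], str]]) -> Dict[str, str]:
    """Run every expert role over the same experimental data."""
    return {role: expert(records) for role, expert in experts.items()}


def render(findings: Dict[str, str], writing_type: str) -> str:
    """Assemble the findings into simple structured text for the given writing type."""
    lines = [writing_type]
    for role, finding in findings.items():
        lines.append(f"[{role}] {finding}")
    return "\n".join(lines)


records = {
    "alice": [("take bus", "on time"), ("take bus", "late")],
    "bob": [("drive", "on time"), ("drive", "on time")],
}
experts = {
    # Each lambda stands in for an agent with its own analytical perspective.
    "statistician": lambda data:
        f"{sum(len(r) for r in data.values())} decisions from {len(data)} subjects",
    "outcome analyst": lambda data:
        "outcomes: " + ", ".join(sorted({res for recs in data.values() for _, res in recs})),
}
findings = analyze(records, experts)
report = render(findings, "Research report")
print(report)
```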
[0118] According to embodiments of this application, any plurality of the first reasoning module 510, the second reasoning module 520, the determination module 530, and the generation module 540 may be combined into one module, or any one of these modules may be split into multiple modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. According to embodiments of this application, at least one of the first reasoning module 510, the second reasoning module 520, the determination module 530, and the generation module 540 may be at least partially implemented as hardware circuitry, such as a field-programmable gate array (FPGA), a programmable logic array (PLA), a system-on-a-chip, a system-on-a-substrate, a system-on-package, an application-specific integrated circuit (ASIC), or any other reasonable means of integrating or packaging circuitry, or may be implemented in any one of software, hardware, and firmware, or in a suitable combination of any of them. Alternatively, at least one of the first reasoning module 510, the second reasoning module 520, the determination module 530, and the generation module 540 may be implemented at least partially as a computer program module that performs the corresponding functions when run.
[0119] It should be noted that the data generation system based on social simulation experiments in the embodiments of this application corresponds to the data generation method based on social simulation experiments in the embodiments of this application. For a detailed description of the data generation system based on social simulation experiments, please refer to the data generation method based on social simulation experiments section, which will not be repeated here.
[0120] Figure 6 shows a block diagram of an electronic device suitable for implementing a data generation method based on social simulation experiments, according to an embodiment of this application.
[0121] As shown in Figure 6, an electronic device 600 according to an embodiment of this application includes a processor 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The processor 601 may include, for example, a general-purpose microprocessor (e.g., a CPU), an instruction set processor and / or an associated chipset and / or a special-purpose microprocessor (e.g., an application-specific integrated circuit (ASIC)). The processor 601 may also include onboard memory for caching purposes. The processor 601 may include a single processing unit or multiple processing units for performing different actions of the method flow according to an embodiment of this application.
[0122] RAM 603 stores various programs and data required for the operation of electronic device 600. Processor 601, ROM 602, and RAM 603 are interconnected via bus 604. Processor 601 executes various operations of the method flow according to embodiments of this application by executing programs in ROM 602 and / or RAM 603. It should be noted that the programs may also be stored in one or more memories other than ROM 602 and RAM 603. Processor 601 may also execute various operations of the method flow according to embodiments of this application by executing programs stored in said one or more memories.
[0123] According to embodiments of this application, the electronic device 600 may further include an input / output (I / O) interface 605, which is also connected to a bus 604. The electronic device 600 may also include one or more of the following components connected to the input / output (I / O) interface 605: an input section 606 including a keyboard, mouse, etc.; an output section 607 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and a speaker, etc.; a storage section 608 including a hard disk, etc.; and a communication section 609 including a network interface card such as a local area network (LAN) card, modem, etc. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the input / output (I / O) interface 605 as needed. A removable medium 611, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., is installed on the drive 610 as needed so that computer programs read from it can be installed into the storage section 608 as needed.
[0124] This application also provides a computer-readable storage medium, which may be included in the device / apparatus / system described in the above embodiments; or it may exist independently and not assembled into the device / apparatus / system. The computer-readable storage medium carries one or more programs, which, when executed, implement the method according to the embodiments of this application.
[0125] According to embodiments of this application, the computer-readable storage medium can be a non-volatile computer-readable storage medium, including but not limited to: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this application, the computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. For example, according to embodiments of this application, the computer-readable storage medium may include the ROM 602 and / or RAM 603 described above and / or one or more memories other than ROM 602 and RAM 603.
[0126] Embodiments of this application also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowchart. When the computer program product is run on a computer system, the program code is used to cause the computer system to implement the methods provided in the embodiments of this application.
[0127] When the computer program is executed by the processor 601, it performs the functions defined in the system / apparatus of this application embodiment. According to the embodiments of this application, the systems, apparatuses, modules, units, etc., described above can be implemented by computer program modules.
[0128] In one embodiment, the computer program may rely on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of signals over a network medium, and downloaded and installed via the communication section 609, and / or installed from the removable medium 611. The program code contained in the computer program can be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination thereof.
[0129] In such an embodiment, the computer program can be downloaded and installed from a network via the communication section 609, and / or installed from the removable medium 611. When the computer program is executed by the processor 601, it performs the functions defined in the system of this application embodiment. According to the embodiments of this application, the systems, devices, apparatuses, modules, units, etc., described above can be implemented by computer program modules.
[0130] According to embodiments of this application, program code for executing the computer programs provided in the embodiments of this application can be written in any combination of one or more programming languages. Specifically, these computer programs can be implemented using high-level procedural and / or object-oriented programming languages, and / or assembly / machine languages. Programming languages include, but are not limited to, Java, C++, Python, C, or similar programming languages. The program code can be executed entirely on the user's computing device, partially on the user's computing device and partially on a remote computing device, or entirely on a remote computing device or server. In cases involving remote computing devices, the remote computing device can be connected to the user's computing device via any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (e.g., via the Internet using an Internet service provider).
[0131] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and combinations of blocks in a block diagram or flowchart, may be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.
[0132] Those skilled in the art will understand that the features described in the various embodiments of this application can be combined and / or integrated in various ways, even if such combinations or integrations are not explicitly described in this application. In particular, the features described in the various embodiments of this application can be combined and / or integrated in various ways without departing from the spirit and teachings of this application. All such combinations and / or integrations fall within the scope of this application.
[0133] The embodiments of this application have been described above. However, these embodiments are merely illustrative and not intended to limit the scope of this application. Although various embodiments have been described above, this does not mean that the measures in the various embodiments cannot be used advantageously in combination. Without departing from the scope of this application, those skilled in the art can make various substitutions and modifications, all of which should fall within the scope of this application.
Claims
1. A data generation method based on social simulation experiments, characterized in that the method includes: performing chain-of-thought reasoning analysis on the research path of the target research topic to obtain the target research path; in the case where the target research path includes a sub-path that requires performing a social simulation experiment, reasoning about and analyzing the experimental design of the social simulation experiment according to a predetermined standardized experiment description protocol, and determining the role portrait, social behavior constraints, and experimental path of the experimental subjects required to complete the social simulation experiment; retrieving target behavior data matching the role portrait from a pre-constructed behavior dataset and determining the target behavior data as the initial memory of the experimental subject; and generating experimental data from the process in which the experimental subject simulates execution of the experimental path based on the initial memory under the social behavior constraints, so as to use the experimental data as reference data for scientific research writing.
2. The method according to claim 1, characterized in that the retrieving of the target behavior data matching the role portrait from the pre-constructed behavior dataset includes: determining query information for querying role characteristics based on the semantic information of the role portrait; screening multiple candidate behavior data matching the query information from the behavior dataset; and randomly selecting one of the multiple candidate behavior data and rewriting the selected behavior data into the target behavior data aligned with the target research topic.
3. The method according to claim 1, characterized in that, The predetermined standardized experiment description protocol includes the subject type, attribute list, environmental constraints, experimental purpose, and experimental behavior interval duration of the experimental subject; The reasoning and analyzing the experimental design of the social simulation experiment according to the predetermined standardized experiment description protocol to determine the role portrait, social behavior constraints, and experimental path of the experimental subjects required to complete the social simulation experiment includes: Based on the subject type, the attribute list, the experimental purpose, and the environmental constraints, using a large language model to generate a role portrait of the subject type, where the role portrait includes target attributes matching the subject type and role behavior tendencies inferred based on the experimental purpose; Constructing a constraint path generation instruction according to the role portrait, the environmental constraints, the experimental purpose, and the experimental behavior interval duration; Inputting the constraint path generation instruction into the large language model to output the social behavior constraints and the experimental path matching the role portrait.
4. The method according to claim 1, characterized in that there are multiple experimental subjects, and the experimental path includes N sub-paths, where N is an integer greater than or equal to 2; the process of the experimental subject simulating the execution of the experimental path based on the initial memory under the social behavior constraints includes, for each experimental subject: performing forgetting and activation processing on the nth historical memory based on the Ebbinghaus forgetting curve to obtain the (n + 1)th target memory that is not forgotten and is activated, where 0 < n ≤ N and n is an integer, and when n = 1, the historical memory is the initial memory; determining the (n + 1)th target memory as the prompt information for the action decision required to execute the (n + 1)th sub-path; if it is determined that the experimental subject needs to interact with other experimental subjects, selecting, from the other experimental subjects, target experimental subjects whose distance from the experimental subject meets the preset conditions, and carrying out the interaction; generating the (n + 1)th action decision based on the prompt information, the interaction content obtained from the interaction, the attributes of the experimental subject, and the social behavior constraints, and simulating to obtain the (n + 1)th simulation result based on the (n + 1)th action decision; and updating the attributes of the experimental subject based on the (n + 1)th simulation result and the (n + 1)th action decision, and determining the (n + 1)th simulation result and the (n + 1)th action decision as the (n + 1)th historical memory, until the Nth simulation result is obtained.
5. The method according to claim 4, characterized in that performing forgetting and activation processing on the nth historical memory based on the Ebbinghaus forgetting curve to obtain the (n + 1)th target memory that is not forgotten and is activated includes: calculating the basic memory strength from the time interval between the nth historical memory and the (n - 1)th historical memory, using a nonlinear fitting function constructed from the Ebbinghaus forgetting curve; determining the intensity increment of memory activation for the (n - 1)th historical memory based on the semantic similarity between the nth historical memory and the (n - 1)th historical memory; determining the activated memory strength based on the intensity increment and the basic memory strength; determining the forgetting probability, which has a mapping relationship with the activated memory strength, as the probability that the nth historical memory is forgotten; and determining the (n + 1)th target memory from the nth historical memory based on the probability that the nth historical memory is forgotten.
6. The method according to claim 4, characterized in that, The experimental data includes N simulation results and N action decisions for each experimental subject; The method further includes: Based on multiple agents instantiated from expert roles with different analytical perspectives, the N simulation results and N action decisions corresponding to each experimental subject are analyzed to obtain the analysis results. Based on the analysis results and the writing type that matches the target research topic, structured text is generated.
7. The method according to claim 1, characterized in that, The research path for the target research topic is analyzed using a chain-of-thought reasoning approach to obtain the target research path, which includes: The target research topic is input into multiple intelligent agents to perform the i-th step of the reasoning task, and the corresponding candidate sub-research paths are output, where i is an integer greater than 2; If all of the candidate sub-research paths meet the predetermined evaluation conditions, the i-th sub-research path for the i-th step is determined from the candidate sub-research paths based on a voting mechanism. Based on the i-th sub-research path, the multiple agents execute the (i+1)-th step of the reasoning task again and determine the (i+1)-th sub-research path of the (i+1)-th step. The reasoning task is stopped when the semantic similarity between the (i+1)th sub-research path and the ith sub-research path meets the threshold, thus obtaining the target research path.
8. The method according to claim 7, characterized in that the predetermined evaluation conditions include at least one of the following: a topic association condition, an adaptability condition, a practicality condition, and a coherence condition; the topic association condition includes that the semantic association between the candidate sub-research path and the target research topic is higher than a preset association threshold; the adaptability condition includes that the adaptability between the research methods and the research content targeted by the candidate sub-research path is higher than a preset adaptability threshold; the practicality condition includes that the practicality score of the research content targeted by the candidate sub-research path is higher than a preset practicality threshold; and the coherence condition includes that the degree of coherence between the candidate sub-research path and the sub-research path determined by historical reasoning is higher than a preset coherence threshold.
9. A data generation system based on social simulation experiments, characterized in that the system includes: a first reasoning module, used to perform chain-of-thought reasoning analysis on the research path of the target research topic to obtain the target research path; a second reasoning module, used to perform reasoning analysis on the experimental design of the social simulation experiment according to a predetermined standardized experiment description protocol when the target research path includes a sub-path that requires the execution of a social simulation experiment, in order to determine the role profile, social behavioral constraints, and experimental path of the experimental subject required to complete the social simulation experiment; a determination module, used to retrieve target behavior data that matches the role profile from a pre-constructed behavior dataset, and determine the target behavior data as the initial memory of the experimental subject; and a generation module, used to generate experimental data from the process in which the experimental subject simulates execution of the experimental path based on the initial memory under the social behavioral constraints, so that the experimental data can be used as reference data for scientific research writing.
10. An electronic device, comprising: one or more processors; and a memory, used to store one or more computer programs, characterized in that the one or more processors execute the one or more computer programs to implement the steps of the method according to any one of claims 1 to 8.