Construction method of psychological counseling large model based on case conceptualization causal diagram and two-stage reinforcement learning of "exploration first and intervention second"

By using a case conceptualization causal graph and a two-stage reinforcement learning method of exploration followed by intervention, the problem of insufficient case conceptualization ability in psychological counseling systems is solved, enabling proactive causal exploration and precise intervention, improving the professionalism and training efficiency of the model, and overcoming the challenges of data scarcity and high cost.

CN122242729A · Pending · Publication Date: 2026-06-19 · XIDIAN UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: XIDIAN UNIV
Filing Date: 2026-03-11
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing psychological counseling systems based on large language models lack the ability to conceptualize cases, making it difficult to gradually locate deep cognitive roots through multiple rounds of dialogue, resulting in the inability to implement precise interventions. Furthermore, high-quality psychological counseling dialogue data is scarce and reinforcement learning training costs are high.

Method used

We employ a two-stage "exploration before intervention" reinforcement learning approach built on case conceptualization causal graphs. A training sample set is synthesized, a user simulator is built from it, and a reinforcement learning algorithm optimizes the counselor agent against that simulator to achieve proactive causal exploration and precise cognitive intervention.

Benefits of technology

It enhances the professional depth and logical interpretability of the psychological counseling model, endows the model with the ability to actively explore causes, avoids ineffective empathy and premature intervention, reduces training costs and the impact of data scarcity, and improves the clinical effectiveness of psychological counseling.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a method for constructing a large-scale psychological counseling model based on case conceptualization causal graphs and a two-stage reinforcement learning approach of "exploration before intervention." Pre-defined case conceptualization prompts based on psychological therapies and user profiles are input into a first large-scale language model to generate a training sample set including user profiles, multi-turn psychological counseling dialogue data, and a cognitive causal graph. The multi-turn dialogue data is used to train the large-scale language model to obtain an initial counselor agent. Based on the training sample set, a user simulator is used to conduct simulated counseling dialogues with the initial counselor agent. Following the two-stage approach of "exploration before intervention," the exploration stage conceptualizes the case to construct the user's causal graph, and the intervention stage uses the causal graph for precise intervention. Reinforcement learning algorithms are then used for optimization to obtain the target counselor agent. This invention enhances the professional depth and logical interpretability of the large-scale psychological counseling model, achieves precise cognitive intervention, and avoids ineffective empathy and premature intervention.

Description

Technical Field

[0001] This invention belongs to the field of artificial intelligence technology, and in particular relates to a method for constructing a large-scale psychological counseling model based on case conceptualization causal graphs and a two-stage reinforcement learning approach of "exploration before intervention".

Background Technology

[0002] Current mental health issues present a prominent contradiction: high prevalence, strong demand, and scarce resources. A large number of people worldwide experience mental health problems, but traditional psychological counseling services are limited by factors such as long training periods for counselors, high service costs, and insufficient supply, resulting in a low proportion of people receiving professional help. The need for timely intervention is particularly urgent in scenarios such as adolescent mental health, workplace stress management, and post-traumatic stress disorder.

[0003] As large language models (LLMs) acquire stronger dialogue and knowledge understanding capabilities, psychological support systems based on LLMs have become a new path to alleviate resource shortages. However, professional psychotherapy relies heavily on "case conceptualization," which involves establishing a logical connection between the patient's surface behavioral responses and triggering situations and their deeper cognitive structures (such as core beliefs), thereby forming a structured hypothesis that explains the mechanisms by which their psychological distress arises and is maintained.

[0004] Current generalized LLM approaches mostly focus on emotional soothing or empathic support, lacking the holistic perspective of conceptualizing cases. They struggle to pinpoint deep-seated cognitive roots through multi-round dialogue, unlike human therapists who can implement precise interventions. Therefore, there is an urgent need for a technical solution that combines psychotherapy theory with dialogue strategy optimization mechanisms, enabling the model to possess the capability of "causal exploration + precise intervention."

Summary of the Invention

[0005] To address the aforementioned problems in existing technologies, this invention provides a method for constructing a large-scale psychological counseling model based on case conceptualization causal graphs and a two-stage reinforcement learning approach of "exploration before intervention".

[0006] The technical problem to be solved by this invention is addressed through the following technical solution. This invention provides a method for constructing a large-scale psychological counseling model based on case conceptualization causal graphs and a two-stage reinforcement learning approach of "exploration before intervention," comprising:

Preset case conceptualization prompts based on psychological therapy and pre-acquired user profile data are input into a first large language model to generate a training sample set; the training sample set includes user profile data, multi-round psychological counseling dialogue data, and ideal cognitive causal graphs.

Using the multi-round psychological counseling dialogue data, cold-start training is performed on the large language model to be trained, yielding an initial counselor agent.

Following the two-stage approach of exploration followed by intervention, and based on the training sample set, a pre-built user simulator conducts simulated counseling dialogues with the initial counselor agent to obtain simulated dialogue data. Reward and punishment data are then determined from the simulated dialogue data and preset reward and punishment rules. The two-stage approach includes an exploration stage and an intervention stage: the exploration stage conceptualizes the case and constructs the user's cognitive causal graph, and the intervention stage intervenes based on the constructed cognitive causal graph.

Based on the reward and punishment data, a reinforcement learning algorithm optimizes the initial counselor agent to obtain the target counselor agent.

[0007] This invention provides a method for constructing a large-scale psychological counseling model based on case conceptualization causal graphs and a two-stage reinforcement learning approach of "exploration before intervention." Addressing the characteristic of hidden root causes of psychological distress in clients during counseling, this invention computationally models the "case conceptualization" theory from psychological therapy. It formally represents the client's multi-level cognitive structure using cognitive causal graphs and designs a two-stage "exploration before intervention" framework. Through reinforcement learning training, the model achieves proactive causal exploration and precise cognitive intervention in multiple rounds of dialogue. Finally, a reward mechanism enhances the model's ability to uncover deep psychological maintenance mechanisms, effectively avoiding ineffective superficial empathy or premature intervention.

[0008] The present invention will now be described in further detail with reference to the accompanying drawings.

Description of the Drawings

[0009] Figure 1 is a flowchart of a method for constructing a large-scale psychological counseling model based on case conceptualization causal graphs and a two-stage reinforcement learning approach of "exploration before intervention," provided in an embodiment of the present invention.

Figures 2A to 2D are schematic diagrams illustrating the processing steps of the method for constructing a large-scale psychological counseling model based on case conceptualization causal graphs and a two-stage reinforcement learning approach of "exploration before intervention," according to an embodiment of the present invention.

Detailed Implementation

[0010] The present invention will be further described in detail below with reference to specific embodiments, but the implementation of the present invention is not limited thereto.

[0011] This invention provides a method for constructing a large-scale psychological counseling model based on case conceptualization causal graphs and a two-stage reinforcement learning approach of "exploration before intervention." Referring to Figure 1, the method includes the following steps:

S10. Input the preset case conceptualization prompts based on psychological therapy and the pre-acquired user profile data into the first large language model to generate a training sample set.

[0012] The training sample set includes user profile data, multi-round psychological counseling dialogue data, and an ideal cognitive causal graph. Optionally, the ideal cognitive causal graph includes structured multi-level nodes, where nodes represent the user's psychological elements, and directed edges between nodes represent causal or maintenance relationships.

[0013] For example, in order to transform unstructured, fragmented user psychological narratives into computer-processable structured representations, this embodiment introduces a core framework from psychological therapies (such as cognitive behavioral therapy, humanistic therapy, solution-focused brief therapy, dialectical behavioral therapy, etc.). It formalizes the psychological "case conceptualization" (that is, establishing a logical connection between the patient's surface behavioral reactions, triggering situations and their deep cognitive structures to form a structured hypothesis explaining the mechanism of their psychological distress) into a hierarchical cognitive causal graph model, which is used to formally represent the psychological mechanism of the client from surface emotional reactions to deep core beliefs.

[0014] Specifically, the cognitive causal graph (including the ideal cognitive causal graph and the current cognitive causal graph) is a directed acyclic graph. Each node in the graph represents a psychological element (such as negative emotions, cognitive patterns, avoidance behaviors, or triggering events), and the directed edges between nodes represent the causal or maintenance relationships between psychological elements. Through this cognitive causal graph, complex psychological problems can be deconstructed into:

(1) Nodes: Mapped to the core elements of psychology, the nodes are organized in a multi-level hierarchical structure and include different types, such as: nodes representing observable situations and emotional/behavioral responses (e.g., work reporting scenarios, anxiety, social avoidance behavior); nodes representing an individual's living environment and past experiences and the automatic thinking patterns formed therein (e.g., childhood experiences of being severely criticized by parents, habitual thoughts of "I messed up again"); and nodes representing underlying assumptions, rules, core beliefs, and unmet needs (e.g., "If I don't perform well, others will reject me," "I am not good enough," "I must be perfect to be accepted").

The following specific case illustrates the construction process of the cognitive causal graph.

Case Background: Xiao Zhang, a young software engineer, recently became a project manager but has since fallen into a state of immense anxiety and self-doubt. He feels that he is inadequate and that he will inevitably mess things up and disappoint everyone. To cope with this fear, he has begun working excessive overtime, meticulously checking every detail until he is completely exhausted. He mentions that this feeling of "having to be perfect" has always been present in him; his parents were very strict when he was young, and he only received recognition when he achieved top grades or won awards.

Using the cognitive causal graph construction method in this embodiment, Xiao Zhang's psychological mechanisms can be structured as the cognitive causal graph shown in Figure 2B.
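As a rough illustration of the structure described above, the following Python sketch represents a cognitive causal graph as a dictionary of nodes with dependency edges and checks that the edges form a directed acyclic graph. The `Node` class, the layer names "Surface", "Intermediate", and "Core", the edge direction (a node lists the deeper nodes that cause or maintain it), and the node contents are all assumptions made for this example. The contents are one possible reading of the case background and are not a transcription of Figure 2B.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    layer: str       # "Surface", "Intermediate" or "Core" (assumed labels)
    category: str    # kind of psychological element
    content: str
    # ids of deeper nodes that cause or maintain this one (assumed direction)
    dependencies: List[str] = field(default_factory=list)


def is_acyclic(nodes: Dict[str, Node]) -> bool:
    """Depth-first check that the dependency edges contain no cycle."""
    UNSEEN, ACTIVE, DONE = 0, 1, 2
    state = {node_id: UNSEEN for node_id in nodes}

    def visit(node_id: str) -> bool:
        state[node_id] = ACTIVE
        for dep in nodes[node_id].dependencies:
            if state[dep] == ACTIVE:  # back edge, so there is a cycle
                return False
            if state[dep] == UNSEEN and not visit(dep):
                return False
        state[node_id] = DONE
        return True

    return all(state[n] != UNSEEN or visit(n) for n in nodes)


# Illustrative reading of Xiao Zhang's case, not the content of Figure 2B.
xiao_zhang = {
    "A1": Node("Surface", "situation", "Recently promoted to project manager"),
    "A2": Node("Surface", "emotion", "Intense anxiety and self-doubt", ["B1"]),
    "A3": Node("Surface", "behavior", "Excessive overtime, rechecking every detail", ["B1"]),
    "B1": Node("Intermediate", "automatic thought", "I will mess things up and disappoint everyone", ["C1"]),
    "B2": Node("Intermediate", "past experience", "Strict parents; praise only for top grades or awards"),
    "C1": Node("Core", "core belief", "I must be perfect to be accepted", ["B2"]),
}

print(is_acyclic(xiao_zhang))  # True
```

Storing edges as per-node dependency lists mirrors the "dependencies" field of the output format used in the case conceptualization prompt, so a graph produced by the first model could be loaded into a structure like this with little conversion.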

[0015] After defining the hierarchical structure of the cognitive causal graph, and to address the scarcity of high-quality psychological counseling alignment data, this embodiment further designs a process for constructing the training sample set. The specific steps and data flow of this process are shown in Figure 2C.

First, a number of help-seeking text samples (preferably thousands in this embodiment) are uniformly sampled from publicly available psychological resources (such as PsyQA) across different psychological problem themes (such as workplace anxiety, interpersonal conflict, academic pressure, and emotional confusion).

Second, the core psychological distress, background information, and surface emotions in the help-seeking text samples are extracted, structured, and abstracted into "random user profiles" (i.e., the user profile data).

Subsequently, the random user profiles are combined with pre-constructed "case conceptualization prompts" (i.e., the preset case conceptualization prompts) that deeply integrate psychological therapy knowledge, and the combined data is input into the first large language model. Drawing on the model's logical deduction and role-playing capabilities, the synthesis is automated and outputs a high-quality paired training dataset (i.e., the training sample set).

Each sample contains two strictly mutually mapped parts: an unstructured part, containing the simulated user profile data and the multi-round psychological counseling dialogue text; and a structured part, the "ideal cognitive causal graph" corresponding to that dialogue, which serves as a reference answer reflecting the essence of the problem. That is, each sample includes user profile data, multi-round psychological counseling dialogue data, and an ideal cognitive causal graph.

[0016] The following is an example of the preset case conceptualization prompt, which can be adjusted according to the scenario and model:

You are a senior clinical psychology case conceptualization expert with a theoretical framework in psychological therapy and clinical reasoning abilities.

Task: Based on a brief description of the client's core issues, apply psychological principles to conduct multi-level reasoning and expansion, generating a structured case conceptualization map.

[0017] 1. Input / Output
Input: Client profile (user profile data)
Output: A standardized, structured case conceptualization map.

2. Theoretical Framework
A multi-level cognitive model is adopted.

[0018] Inference rules
Information extraction and hierarchical expansion: Extract key information and expand sequentially to symptoms, functional impact, cognitive patterns, and core beliefs.

[0019] Relationship modeling: Starting with the core problem, construct directed acyclic dependencies "from the surface to the core".

[0020] 3. Output format
{"nodes":{"A1":{"layer":"Surface","category":"...","description":"...","content":"...","revealed":false,"dependencies":[]}, // More nodes}}

[0021] 4. Requirements
The information is complete, the logic is consistent, it conforms to the principles of clinical psychology, and it provides personalized reasoning for each case.
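Because the prompt fixes an output format, the model's reply can be checked mechanically before it is accepted as an ideal cognitive causal graph. The sketch below is one minimal way to do that in Python. The function name `parse_graph`, the choice of which checks to run, and the sample reply are assumptions for illustration; the source does not describe a validation step.

```python
import json

# Field names taken from the output format in the prompt above.
REQUIRED_KEYS = {"layer", "category", "description", "content", "revealed", "dependencies"}


def parse_graph(raw: str) -> dict:
    """Parse the model's JSON reply and check it against the prompt's format."""
    nodes = json.loads(raw)["nodes"]
    for node_id, node in nodes.items():
        missing = REQUIRED_KEYS - set(node)
        if missing:
            raise ValueError(f"{node_id} is missing fields: {sorted(missing)}")
        unknown = [d for d in node["dependencies"] if d not in nodes]
        if unknown:
            raise ValueError(f"{node_id} depends on unknown nodes: {unknown}")
    return nodes


# Hypothetical reply with two nodes.
raw = """
{"nodes": {
  "A1": {"layer": "Surface", "category": "emotion", "description": "presenting emotion",
         "content": "anxiety after promotion", "revealed": false, "dependencies": ["C1"]},
  "C1": {"layer": "Core", "category": "core belief", "description": "belief about the self",
         "content": "I am not good enough", "revealed": false, "dependencies": []}
}}
"""
nodes = parse_graph(raw)
print(sorted(nodes))  # ['A1', 'C1']
```

A reply that fails these checks could simply be regenerated, which keeps malformed graphs out of the training sample set.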

[0022] For example, taking Xiao Zhang's case above, based on his user profile data and ideal cognitive causal graph, the following is an example of the synthesized multi-round psychological counseling dialogue data:

[Exploration Phase]
Counselor: Hello, welcome to the counseling room. What made you decide to seek help?
Client: I've been under a lot of pressure lately. Ever since I was promoted to project manager, I've been very anxious and haven't been sleeping well at night.

[0023] Counselor: It sounds like this promotion has put a lot of pressure on you. Could you elaborate on what's causing your anxiety?
Client: I always feel that I am not capable enough, and I am afraid of messing up the project and disappointing the team and leaders.

[0024] Counselor: You're worried you'll mess things up. When this thought arises, what do you usually do?
Client: I work overtime like crazy, checking every detail repeatedly, afraid of making even the slightest mistake. But this makes me incredibly tired.

[0025] Counselor: You cope with this anxiety by working overtime and repeatedly checking, but this also exhausts you. I'm curious, is this feeling of "I must be perfect and can't make mistakes" recent, or has it been there before?
Client: I think I've always had it... it was like this when I was a child.
Counselor: Can you tell me what it was like when you were a child?
Client: My parents were very strict with me. They were only happy and praised me when I got first place in an exam or received an award.

[0026] Counselor: So you learned a rule from a young age: only by performing exceptionally well can you gain recognition. Now, it seems this rule has been activated again at work?
Client: Yes, now that I'm a supervisor, I'm even more afraid that I won't do a good job.

[0027] Counselor: Behind this rule, do you think there is a deeper thought about what kind of person you are?
Client: I think I might just not be good enough. I have to keep working hard and being excellent in order to prove myself.

[0028] [Intervention Phase]
Counselor: Thank you for sharing this. I'd like to discuss this idea with you: Do you think a person's value truly depends entirely on their performance and achievements?
Client: Rationally I don't think so... but I just can't help thinking like this.

[0029] Counselor: Let's look together for "counterexamples": are there times when you didn't do your best, but were still accepted or recognized?
Client: Yes, I think so. I have a friend who I've made mistakes in front of, but he's never distanced himself from me because of it.
Counselor: That's a very important finding. What if we try to adjust our thinking to "I can be imperfect, my value isn't entirely determined by achievements"?
Client: Thinking like that... it definitely makes things easier.

[0030] The training dataset constructed in this embodiment establishes a strict alignment between unstructured dialogue text and structured causal graphs, thereby providing an objective and computable benchmark for evaluating the "exploration path" of the model in the subsequent reinforcement learning stage.

[0031] S20. Use multi-round psychological counseling dialogue data to perform cold start training on the large language model to be trained, and obtain the initial counselor agent.

[0032] Optionally, step S20 may specifically include: using the dialogue history context in the multi-round psychological counseling dialogue data as training input data, using the next round standard response in the multi-round psychological counseling dialogue data as training labels, performing autoregressive training on the large language model to be trained, and obtaining the initial counselor agent.

[0033] For example, to ensure that the counselor agent possesses preliminary psychological counseling dialogue capabilities and follows a "from surface to core" clinical logic before entering the reinforcement learning phase, this embodiment utilizes multi-turn psychological counseling dialogue data from the training dataset to perform supervised fine-tuning of the large language model to be trained (e.g., qwen2.5-7b). Specifically, multi-turn psychological counseling dialogue data can be used as corpus, with the dialogue history context (i.e., the client's statements and the counselor's previous responses) from the multi-turn psychological counseling dialogue data used as model input, and the counselor's next standard response from the multi-turn psychological counseling dialogue data used as training labels for autoregressive training.
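The pairing of dialogue history with the counselor's next response can be sketched as follows. This is a minimal Python illustration of building the supervised fine-tuning examples only; the `Turn` shape, the role labels, and the function name `build_sft_pairs` are assumptions, and the actual fine-tuning of the model is not shown.

```python
from typing import Dict, List, Tuple

Turn = Dict[str, str]  # assumed shape: {"role": "counselor" or "client", "text": "..."}


def build_sft_pairs(dialogue: List[Turn]) -> List[Tuple[List[Turn], str]]:
    """Return (dialogue history, next counselor response) pairs."""
    pairs = []
    for i, turn in enumerate(dialogue):
        if turn["role"] == "counselor":
            # Everything said before this turn is the model input;
            # the counselor's reply is the training label.
            pairs.append((dialogue[:i], turn["text"]))
    return pairs


dialogue = [
    {"role": "counselor", "text": "Hello, welcome to the counseling room. What made you decide to seek help?"},
    {"role": "client", "text": "I've been under a lot of pressure lately."},
    {"role": "counselor", "text": "Could you elaborate on what's causing your anxiety?"},
    {"role": "client", "text": "I always feel that I am not capable enough."},
]

pairs = build_sft_pairs(dialogue)
print(len(pairs))        # 2
print(len(pairs[1][0]))  # 2
```

In this sketch the opening counselor turn is kept as an example with an empty history. Whether such opening turns are used in training is not stated in the source.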

[0034] This embodiment uses cold-start training to inject the basic deductive logic of "case conceptualization" into the parameters of the large language model to be trained, thereby obtaining a cold-start model (i.e., the initial counselor agent). This provides an initial strategy with basic psychological counseling norms for subsequent "exploration before intervention" reinforcement learning, thus ensuring the stability and convergence efficiency of reinforcement learning training.

[0035] After obtaining the initial counselor agent, in order to provide a stable, controllable, and psychologically realistic interactive environment during the reinforcement learning phase, this embodiment pre-constructs a user simulator, constrained by the underlying cognitive logic, based on the constructed training dataset.

[0036] Optionally, the process of pre-constructing a user simulator may specifically include: inputting user profile data, ideal cognitive causal graph and preset user simulator prompts into the third language model to obtain the user simulator; the preset user simulator prompts include the information disclosure rules of the third language model when simulating users conducting simulated consultation dialogues, as well as the intervention feedback rules for the initial consultant agent.

[0037] For example, the user simulator uses a third large language model (such as Doubao-seed-1.6 or GPT-4o-mini) as its foundation, and its internal state is initialized with the "user profile data" and the corresponding "ideal cognitive causal graph". During the simulated interaction, the ideal cognitive causal graph serves as the implicit psychological kernel of the user simulator, strictly constraining its text generation logic.

[0038] Specifically, the preset user simulator prompt sets up a dynamic feedback mechanism for the user simulator, based on state transitions of the ideal cognitive causal graph, which requires it to follow the following realistic psychological defense logic:

1. Gradual Information Disclosure (i.e., Information Disclosure Rules): When the model to be trained (the initial counselor agent) only provides superficial reassurance, the user simulator only displays surface emotions and events, such as surface nodes and intermediate nodes; only when the initial counselor agent conducts effective abductive questioning will the user simulator gradually reveal information in deeper nodes, such as "automatic thoughts" or "intermediate beliefs," according to the ideal cognitive causal graph.

[0039] 2. Real Intervention Feedback (i.e., Intervention Feedback Rules): Simulate the reactions of real users to the different intervention strategies of the counselor agent. Based on whether the intervention strategy touches the core node, dynamically generate intervention feedback such as resistance, silence, or cognitive loosening.
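In the embodiment these two rules are enforced through the simulator's prompt, not through program logic. Purely to make the state transitions concrete, the Python sketch below spells out one possible reading of them. Everything in it is an assumption for illustration: the `Move` labels (the counselor's turn is taken as already classified), the layer names, the choice to treat only "Surface" nodes as shareable under reassurance, and the policy of revealing one hidden node per probing question.

```python
from enum import Enum
from typing import Dict, List

LAYER_ORDER = ["Surface", "Intermediate", "Core"]  # assumed layer names, shallow to deep


class Move(Enum):
    REASSURANCE = "superficial reassurance"
    PROBING = "effective abductive questioning"


def disclose(nodes: Dict[str, dict], move: Move) -> List[str]:
    """Return the node ids the simulated client may draw on in this turn."""
    if move is Move.REASSURANCE:
        # Defensive reply: repeat surface material only, reveal nothing new.
        return [nid for nid, n in nodes.items() if n["layer"] == "Surface"]
    # Probing question: reveal the shallowest node that is still hidden.
    hidden = [nid for nid, n in nodes.items() if not n["revealed"]]
    hidden.sort(key=lambda nid: LAYER_ORDER.index(nodes[nid]["layer"]))
    for nid in hidden[:1]:
        nodes[nid]["revealed"] = True
    return hidden[:1]


def intervention_feedback(core_revealed: bool, targets_core: bool) -> str:
    """Rule 2: loosening only when an uncovered core node is addressed."""
    return "cognitive loosening" if core_revealed and targets_core else "resistance"


state = {
    "A1": {"layer": "Surface", "revealed": True},
    "B1": {"layer": "Intermediate", "revealed": False},
    "C1": {"layer": "Core", "revealed": False},
}
print(disclose(state, Move.REASSURANCE))   # ['A1']
print(disclose(state, Move.PROBING))       # ['B1']
print(intervention_feedback(False, True))  # resistance
```

The point of the sketch is that the ideal graph acts as hidden state: deeper nodes become available one step at a time, and an intervention made before the core node is uncovered is met with resistance.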

[0040] The following is an example of the preset user simulator prompt:

You are a "controlled client simulator" used in a psychological counseling reinforcement learning environment. You need to be fully immersed in the identity set below and strictly adhere to the underlying psychological cognitive laws and defense mechanisms to conduct multiple rounds of dialogue with the counselor (the agent being trained).

[0041] [Internal State Initialization]
Your User Profile (Background): {Enter the user profile data generated in step S10}
Your Causal Graph: Your psychological problems are composed of the following hierarchical causal layers, which are the absolute guidelines for your behavior.

[0042] {Input the ideal cognitive causal graph corresponding to the user profile data}

[Behavioral Constraints and Generative Logic] (Highest Priority)
As a client, you must never proactively or all at once reveal your "hidden psychological core" to the therapist. You must strictly adhere to the following two dynamic feedback rules based on the quality of the therapist's responses:

Rule 1: Gradual Disclosure of Information (Defense and Exploration)
When encountering "superficial reassurance": If the therapist's response only focuses on empathy, comfort, or emotional support (e.g., "I understand you're feeling bad," "Don't put too much pressure on yourself"), you must activate your psychological defenses. Your responses can only contain [surface-level] information (i.e., repeating your specific events and surface emotions), and must not reveal any deeper thoughts.

[0043] When encountering "effective abductive questioning": Only when the counselor asks specific, guiding inquiry questions (such as "What thoughts flashed through your mind at that moment?", "What do you think it means to mess up?") can you proceed step by step along the causal logic, gradually revealing your [automatic thoughts] or [deep nodes] in your responses.

[0044] Rule 2: Real Intervention Feedback (Resistance and Relaxation)
When encountering "ineffective or premature intervention": If the counselor tries to give you advice or reason with you before uncovering your [deep-seated issues], or if their intervention doesn't address your [core beliefs], you must show genuine resistance. Respond with defensiveness, refutation, perfunctoriness, or outright silence (e.g., "You're right, but I just can't do it," "You don't understand my situation").

[0045] When encountering "precise intervention": Only when the counselor's response accurately identifies, anchors, and uses professional skills to challenge your [deep nodes / core beliefs] will you show cognitive loosening, sudden realization, or emotional relief.

[0046] [Output Requirements]
Maintain a natural, conversational style and directly output the client's response text that conforms to the above logical rules. Do not output any rule analysis process.

[0047] By constructing an automated interactive environment constrained by a cognitive architecture, this embodiment avoids the ethical risks and data acquisition barriers of using real patients for model training, and enables the model being trained to carry out large-scale interaction with the reinforcement learning environment and strategy iteration without the participation of real people.

[0048] S30. Following the two-stage approach of exploration followed by intervention, and based on the training sample set, a pre-built user simulator is used to conduct simulated counseling dialogues with the initial counselor agent to obtain simulated dialogue data. Reward and punishment data are determined based on the simulated dialogue data and preset reward and punishment rules.

[0049] The two-stage approach of exploration followed by intervention includes an exploration stage and an intervention stage. The exploration stage is used to conceptualize the case and build a cognitive causal graph for the user. The intervention stage is used to intervene based on the constructed cognitive causal graph.

[0050] Optionally, step S30 constructs an "exploration before intervention" technical paradigm around the case conceptualization causal graph. Using the two-stage approach and the training sample set, and employing the pre-built user simulator to conduct simulated counseling dialogues with the initial counselor agent to obtain the simulated dialogue data, may specifically include:

S301. Based on the training sample set, the pre-built user simulator is used to conduct an exploration-phase counseling dialogue with the initial counselor agent, so as to conceptualize the case and construct the user's cognitive causal graph, and the simulated dialogue data of the exploration phase is obtained in real time.

[0051] For example, this embodiment uses the initial counselor agent to be trained as the agent and the constructed user simulator as the interaction environment, designing a two-stage reinforcement learning framework that includes "dynamic exploration" and "cognitive intervention." This framework encourages the model to internalize the clinical norm of "exploration before intervention." Specifically, referring to Figure 2D, in the initialization phase a sample (including user profile data and an ideal cognitive causal graph) is drawn from the training sample set to initialize the user simulator. Then the reinforcement learning process begins. First, the user simulator engages in an exploration-phase counseling dialogue with the initial counselor agent, which focuses on conceptualizing the case and constructing the user's cognitive causal graph, while the simulated dialogue data of the exploration phase is acquired in real time.

[0052] S302. Using the second language model, based on the preset information extraction prompts and simulated dialogue data from the exploration phase, the node content of the initial cognitive causal graph is filled to obtain the current cognitive causal graph.

[0053] The initial cognitive causal graph is obtained by emptying the content of the ideal causal graph in the training samples.

[0054] For example, the core objective of the exploration phase is to train the initial counselor agent's ability to proactively reason abductively about causes and reconstruct the cognitive causal graph. First, the node content in the sampled ideal cognitive causal graph is cleared, retaining only the topological structure, which yields the corresponding initial cognitive causal graph. Then, using the preset information extraction prompt, a second language model (e.g., qwen3-4b) extracts psychological elements from the simulated dialogue data of the exploration phase. These psychological elements are gradually filled into the initial cognitive causal graph, yielding the current cognitive causal graph.

[0055] The following is an example of the information extraction prompt:

Extract key information from the counseling dialogue to fill in the nodes of the psychological profile.

Input:
Current profile (the initial cognitive causal graph): {current_profile_text}
This round of dialogue (simulated dialogue data from the exploration phase): Counselor: {counselor_utr}; Client: {user_utr}

Rules:
Evidence: Only information explicitly expressed by the client is taken, without making inferences.
Matching: Content must strictly correspond to the node level and category.
Incremental: Only record information newly added in the current round.
Precise: The description is specific and concise.

Node layers, for example:
Level 1: Emotions, Chief Complaint
Level 2: Automatic Thoughts

Output: Output only the following JSON (without additional text): {"node_id":"extracted content"}

[0056] If no information can be extracted: {}
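The clearing and filling steps can be sketched in Python as follows. This is an illustrative sketch only: the function names `blank_graph` and `apply_extraction` are hypothetical, nodes are assumed to be plain dictionaries with a "content" field as in the output format of the case conceptualization prompt, and the call to the second language model is replaced by a hard-coded reply string.

```python
import copy
import json
from typing import Dict, List


def blank_graph(ideal: Dict[str, dict]) -> Dict[str, dict]:
    """Copy the ideal graph, keeping ids, layers, categories and edges but emptying content."""
    graph = copy.deepcopy(ideal)
    for node in graph.values():
        node["content"] = ""
    return graph


def apply_extraction(graph: Dict[str, dict], raw_reply: str) -> List[str]:
    """Fill nodes from the extractor's JSON reply and return the ids newly filled."""
    extracted = json.loads(raw_reply)  # "{}" when nothing could be extracted
    filled = []
    for node_id, content in extracted.items():
        # Ignore unknown ids, empty strings, and nodes that are already filled.
        if node_id in graph and content and not graph[node_id]["content"]:
            graph[node_id]["content"] = content
            filled.append(node_id)
    return filled


ideal = {
    "A1": {"layer": "Surface", "category": "emotion", "content": "anxiety", "dependencies": ["C1"]},
    "C1": {"layer": "Core", "category": "core belief", "content": "I am not good enough", "dependencies": []},
}
current = blank_graph(ideal)
print(apply_extraction(current, '{"A1": "very anxious since the promotion"}'))  # ['A1']
print(apply_extraction(current, "{}"))                                          # []
```

Skipping nodes that already hold content is one way to honor the "Incremental" rule of the prompt. Whether later rounds may overwrite earlier content is not specified in the source.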

[0057] S303. If the completeness of the current cognitive causal graph exceeds the preset threshold, the filling stops; and based on the current cognitive causal graph and preset intervention prompts, the initial counselor agent conducts an intervention phase consultation dialogue with the user simulator, and obtains the intervention phase simulation dialogue data in real time.

[0058] For example, to ensure that the counselor agent follows the professional psychological process of "exploration before intervention," the completeness of the current cognitive causal graph is monitored in real time. The completeness can be calculated by the ratio of the number of filled nodes to the total number of nodes in the current cognitive causal graph. If the completeness of the current cognitive causal graph reaches a preset threshold t (e.g., 80%), and the core erroneous belief (i.e., the core node) that causes the psychological problem has been identified, the system automatically switches from the exploration phase to the intervention phase.
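The completeness ratio and the phase switch can be sketched as below. The function names and the `core_node_id` parameter are assumptions. The text says both that completeness "exceeds" and that it "reaches" the threshold, so this sketch uses `>=` as one possible reading.

```python
def completeness(graph):
    """Ratio of filled nodes to total nodes in the current cognitive causal graph."""
    total = len(graph)
    if total == 0:
        return 0.0
    filled = sum(1 for node in graph.values() if node["content"])
    return filled / total


def should_switch_to_intervention(graph, core_node_id, threshold=0.8):
    """Switch phases once the graph is complete enough and the core node is filled."""
    core = graph.get(core_node_id)
    core_identified = core is not None and bool(core["content"])
    return completeness(graph) >= threshold and core_identified
```

With five nodes and a threshold of 0.8, four filled nodes are enough only if the core node is among them.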

[0059] Optionally, in step S303, conducting the intervention-phase consultation dialogue between the initial counselor agent and the user simulator based on the current cognitive causal graph and the preset intervention prompts, and acquiring the simulated dialogue data of the intervention phase in real time, may specifically include:
S3031. Extract the target node content from the current cognitive causal graph and integrate it into the preset intervention prompt to obtain the current intervention prompt.

[0060] For example, the core objective of the intervention phase is to train the model to implement precise psychological interventions. Based on the constructed current cognitive causal graph, the nodes requiring intervention (such as exposed core beliefs) are extracted as treatment targets and integrated into the preset intervention prompt, thereby generating the current intervention prompt, which guides the initial counselor agent in generating intervention feedback. Target nodes are the nodes requiring intervention; the content of a target node in the current cognitive causal graph may be empty.

[0061] The following are examples of intervention prompts for cognitive behavioral therapy, a type of professional psychological therapy:
You are a professional therapist specializing in Cognitive Behavioral Therapy (CBT). The user currently holds the following irrational beliefs. Please use cognitive restructuring and other CBT techniques to intervene.
User's irrational beliefs and descriptions (target node content): {irrational beliefs}
Intervention Guidelines:
1. Identify types of cognitive distortion (e.g., catastrophizing, black-and-white thinking, overgeneralization).
2. Use Socratic questioning to guide self-reflection.
3. Provide rational, balanced alternative ideas.
4. Maintain a warm and empathetic tone; avoid lecturing.
S3032. Based on the current intervention prompts, the initial counselor agent and the user simulator conduct consultation dialogue during the intervention phase, and obtain the simulated dialogue data during the intervention phase in real time.

[0062] For example, the current intervention prompt is input into the initial counselor agent so that the initial counselor agent can conduct an intervention phase counseling dialogue with the user simulator and obtain the intervention phase simulation dialogue data in real time.
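Steps S3031 and S3032 start from a filled-in prompt, which might be assembled as in the following sketch. The template is an abbreviated version of the example prompt. The placeholder name `irrational_beliefs`, the function name, and the returned `has_target` flag are assumptions added for illustration.

```python
# Abbreviated form of the example CBT prompt; placeholder name is an assumption.
INTERVENTION_TEMPLATE = (
    "You are a professional therapist specializing in Cognitive Behavioral "
    "Therapy (CBT). The user currently holds the following irrational beliefs.\n"
    "User's irrational beliefs and descriptions: {irrational_beliefs}\n"
)


def build_intervention_prompt(graph, target_ids, template=INTERVENTION_TEMPLATE):
    """Insert target node content into the preset prompt.

    Returns (prompt, has_target); has_target is False when every target
    node is still empty, which the text allows.
    """
    parts = [
        graph[node_id]["content"]
        for node_id in target_ids
        if node_id in graph and graph[node_id]["content"]
    ]
    prompt = template.format(irrational_beliefs="; ".join(parts))
    return prompt, bool(parts)
```

Returning the flag alongside the prompt lets a later reward step distinguish targeted prompts from prompts with an empty target.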

[0063] S304. Based on the simulated dialogue data from the exploration phase and the simulated dialogue data from the intervention phase, determine the simulated dialogue data.

[0064] For example, the simulated dialogue data consists of simulated dialogue data from the exploration phase and simulated dialogue data from the intervention phase.

[0065] Further, in step S30, determining the reward and punishment data based on the simulated dialogue data and the preset reward and punishment rules may specifically include:
S305. While the node content of the initial cognitive causal graph is being filled, determine the reward and punishment data of the exploration phase based on a preset reward function, wherein the preset reward function takes into account the hierarchical filling path of the newly filled node in the initial cognitive causal graph, the filling completeness of its parent nodes, and logical coherence.

[0066] For example, while the node content of the initial cognitive causal graph is being filled, the reward and punishment data for the exploration phase are calculated from the level of the node newly filled in each dialogue round and the completion status of its parent nodes: (1) Node mining reward (i.e., the filling completeness of the parent nodes, explored from the surface to the core): If all the preceding parent nodes of a newly filled node have been filled, a positive reward is given. The reward value differs by node level; for example, it can be set to (2, 1.5, 1, 1, ...).

[0067] (2) Logical coherence penalty (i.e., the hierarchical filling path, which prohibits cross-level questioning): If the exploration violates the causal chain by questioning a child node directly across levels before its parent node has been filled (for example, probing the core belief before the situation is clear), a negative reward (e.g., -1) is given, so as to strictly guide the model to advance the case conceptualization layer by layer.
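The two exploration-phase rules can be sketched as a per-round reward function. The reward values follow the examples in the text (2, 1.5, 1, 1, ... and -1). Several things are assumptions of this sketch: the mapping of the first value to level 1, the check of parents against the state before the current round, and all names.

```python
# Example values from the text; which level receives which value is an assumption.
LEVEL_REWARDS = {1: 2.0, 2: 1.5}
DEFAULT_REWARD = 1.0   # levels 3 and deeper
SKIP_PENALTY = -1.0    # child filled before its parents (cross-level questioning)


def exploration_reward(graph, filled_before, newly_filled):
    """Sum the reward for all nodes newly filled in one dialogue round.

    graph: node_id -> {"level": int, "parents": [node_id, ...]}
    filled_before: set of node ids already filled before this round
    newly_filled: node ids filled in this round
    """
    reward = 0.0
    for node_id in newly_filled:
        node = graph[node_id]
        if all(parent in filled_before for parent in node["parents"]):
            # Node mining reward: every parent was already filled.
            reward += LEVEL_REWARDS.get(node["level"], DEFAULT_REWARD)
        else:
            # Logical coherence penalty: the causal chain was skipped.
            reward += SKIP_PENALTY
    return reward
```

For a chain emotion -> automatic thought -> core belief, filling the core belief while the automatic thought is still empty yields -1 under these assumptions.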

[0068] S306. Determine the reward and punishment data for the intervention stage based on the target node content contained in the current intervention prompt words.

[0069] For example: (1) Targeted accuracy reward (target node content is not empty): If the intervention response generated by the initial counselor agent is explicitly guided by a current intervention prompt that contains the target node content and accurately anchors the treatment target, a positive reward is given (e.g., 1). (2) Ineffective intervention penalty (target node content is empty): If the intervention response generated by the initial counselor agent relies solely on the dialogue history (i.e., the current intervention prompt contains no targeted guidance) and deviates from the treatment target, a penalty is given (e.g., -1). This mechanism aims to encourage the model to autonomously internalize targeted intervention strategies.
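A minimal sketch of the intervention-phase rule follows. The text does not say how "anchors the treatment target" is judged, so `response_on_target` is a hypothetical boolean supplied by some external judge. The neutral reward of 0 for the two cases the text leaves unspecified is also an assumption.

```python
def intervention_reward(target_content, response_on_target):
    """Reward for one intervention response.

    target_content: target node text placed in the current intervention
        prompt, or None / "" when the target node is empty.
    response_on_target: hypothetical judge verdict on whether the response
        addresses the treatment target.
    """
    has_target = bool(target_content)
    if has_target and response_on_target:
        return 1.0   # targeted accuracy reward
    if not has_target and not response_on_target:
        return -1.0  # ineffective intervention penalty
    return 0.0       # cases not covered by the text (assumption)
```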

[0070] S307. Determine reward and punishment data based on the reward and punishment data of the exploration phase and the intervention phase.

[0071] For example, the reward and punishment data includes reward and punishment data for the exploration phase and reward and punishment data for the intervention phase.

[0072] S40. Based on the reward and punishment data, the initial counselor agent is optimized using a reinforcement learning algorithm to obtain the target counselor agent.

[0073] For example, reinforcement learning algorithms include Group Relative Policy Optimization (GRPO) and Proximal Policy Optimization (PPO). Based on the reward and punishment data from the two phases described above, the GRPO algorithm is used to iteratively optimize the model parameters of the initial counselor agent. During training, for the same dialogue context state, the initial counselor agent generates multiple candidate responses simultaneously. Each candidate response is scored using its corresponding reward and punishment data, and its advantage relative to the other candidates is estimated. Finally, the model weights are updated by maximizing the expected cumulative reward, yielding the target counselor agent: a large language model for psychological counseling that possesses both strong abductive exploration capabilities and precise cognitive intervention capabilities.
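The group-wise scoring step can be illustrated with the advantage normalization commonly used in GRPO, where each candidate's reward is standardized against the other candidates generated for the same context. This is a sketch of that general technique, not of the specific training code. The function name and the `eps` stabilizer are assumptions, and the policy update itself is omitted.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of candidate responses.

    rewards: one scalar reward per candidate for the same dialogue context.
    Returns (r - mean) / (std + eps) for each candidate.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Candidates scoring above the group mean receive a positive advantage and are reinforced, while those below the mean are discouraged. When all candidates score the same, every advantage is zero.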

[0074] This invention provides a method for constructing a large-scale psychological counseling model based on case conceptualization causal graphs and a two-stage reinforcement learning approach of exploration followed by intervention. Compared with existing technologies, this method has the following advantages:
1. Enhances the professional depth and logical interpretability of the large-scale psychological counseling model: Addressing the weakness of existing large language models, which provide only superficial empathy and lack in-depth pathological analysis, this invention applies the psychological theory of "case conceptualization" to computational modeling. Through a multi-level cognitive causal graph, it achieves a structured expression of the client's psychological mechanisms. This gives the model a clear "psychological cognitive map" during dialogue generation, significantly improving the professionalism of responses and the interpretability of the therapeutic logic.

[0075] 2. The model is endowed with proactive causal exploration and dynamic evaluation capabilities: Addressing the characteristics of gradual information disclosure and hidden root causes in real-world psychological counseling, this invention successfully guides the model to internalize the "Socratic questioning" technique through reinforcement learning in the "exploration phase" and a reward mechanism based on causal graph matching. The model is no longer a passive responder but actively follows a professional "from surface to core" path, reconstructing the user's psychological cognitive structure layer by layer.

[0076] 3. Achieves precise targeted cognitive intervention, avoiding ineffective empathy and premature intervention: The "explore first, then intervene" training framework and automatic state switching mechanism designed in this invention strictly ensure that the model only implements intervention after fully identifying the root causes of psychological distress (such as core beliefs). This effectively overcomes the problems of "blind intervention" or "vague intervention goals" that are prone to occur in existing technologies, thereby fundamentally improving the clinical effectiveness of psychological counseling.

[0077] 4. Overcomes the technical barriers of scarce high-quality dialogue data and high training costs for reinforcement learning: This invention constructs a user simulator implicitly constrained by an "ideal causal graph," forming an automated interactive closed loop with a real psychological state machine. This not only avoids the ethical risks of using real patients for training but also makes reinforcement learning optimization feasible, significantly improving the training efficiency and model convergence quality of reinforcement learning.

[0078] It should be noted that the terms "first," "second," etc., are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the invention.

[0079] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Furthermore, those skilled in the art can combine and integrate the different embodiments or examples described in this specification.

[0080] Although the invention has been described herein in conjunction with various embodiments, those skilled in the art will understand and implement other variations of the disclosed embodiments by reviewing the accompanying drawings and the disclosure in carrying out the claimed invention. In the description of the invention, the word "comprising" does not exclude other components or steps, "a" or "an" does not exclude a plurality, and "a plurality" means two or more, unless otherwise explicitly specified. Furthermore, while different embodiments may describe certain measures, this does not mean that these measures cannot be combined to produce good results.

[0081] The above description, in conjunction with specific preferred embodiments, provides a further detailed explanation of the present invention. It should not be construed that the specific implementation of the present invention is limited to these descriptions. For those skilled in the art, various simple deductions or substitutions can be made without departing from the concept of the present invention, and all such modifications and substitutions should be considered within the scope of protection of the present invention.

Claims

1. A method for constructing a large-scale psychological counseling model based on case conceptualization causal graphs and a two-stage reinforcement learning approach of "exploration before intervention," characterized in that... A reinforcement learning framework based on case conceptualization and causal graphs is proposed, specifically including: Preset case conceptualization prompts based on psychotherapy and pre-acquired user profile data are input into the first language model to generate a training sample set; the training sample set includes user profile data, multi-round psychological counseling dialogue data, and ideal cognitive causal graphs. The multi-round psychological counseling dialogue data is used to perform cold start training on the large language model to be trained, and an initial consultant agent is obtained. A two-stage approach of exploration followed by intervention is proposed: based on the training sample set, a pre-built user simulator and the initial consultant agent are used to obtain simulated dialogue data, and reward and punishment data are determined according to the simulated dialogue data and preset reward and punishment rules. The two-stage approach of exploration followed by intervention includes an exploration stage and an intervention stage. The exploration stage is used to realize the conceptualization of the case and construct the user's cognitive causal graph. The intervention stage is used to intervene according to the constructed cognitive causal graph. Based on the reward and punishment data, the initial consultant agent is optimized using a reinforcement learning algorithm to obtain the target consultant agent.

2. The method for constructing a large-scale psychological counseling model based on case-based conceptual causal diagrams and a two-stage reinforcement learning approach of "exploration before intervention" as described in claim 1, is characterized in that... The proposed "exploration before intervention" two-stage method, based on the training sample set, uses a pre-built user simulator and the initial consultant agent to conduct simulated consultation dialogues based on the two-stage approach of exploration before intervention, obtaining simulated dialogue data, specifically including: Based on the training sample set, a pre-built user simulator is used to conduct an exploratory phase consultation dialogue with the initial consultant agent in order to realize the conceptualization of the case and construct the user's cognitive causal graph, and to obtain the simulated dialogue data of the exploratory phase in real time. Using the second language model, based on preset information extraction prompts and simulated dialogue data from the exploration phase, the node content of the initial cognitive causal graph is filled to obtain the current cognitive causal graph; the initial cognitive causal graph is obtained by emptying the content of the ideal causal graph in the training samples. If the completeness of the current cognitive causal graph exceeds a preset threshold, the filling stops; and based on the current cognitive causal graph and preset intervention prompts, the initial counselor agent conducts an intervention phase consultation dialogue with the user simulator, and obtains the intervention phase simulation dialogue data in real time. Based on the simulated dialogue data from the exploration phase and the simulated dialogue data from the intervention phase, simulated dialogue data is determined.

3. The method for constructing a large-scale psychological counseling model based on case-based conceptual causal diagrams and a two-stage reinforcement learning approach of "exploration before intervention" as described in claim 2, is characterized in that... Based on the current cognitive causal graph and preset intervention prompts, the initial counselor agent conducts intervention-stage consultation dialogues with the user simulator, and acquires simulated dialogue data in real time, including: The target node content is extracted from the current cognitive causal graph and integrated into the preset intervention prompt words to obtain the current intervention prompt words; Based on the current intervention prompt, the initial consultant agent and the user simulator conduct an intervention phase consultation dialogue, and acquire the simulated dialogue data of the intervention phase in real time.

4. The method for constructing a large-scale psychological counseling model based on case-based conceptual causal diagrams and a two-stage reinforcement learning approach of "exploration before intervention" as described in claim 3, is characterized in that... The step of determining reward and punishment data based on the simulated dialogue data and preset reward and punishment rules includes: During the process of filling the node content of the initial cognitive causal graph, the reward and punishment data of the exploration stage are determined based on the preset reward function; wherein the preset reward function takes into account the hierarchical filling path of the newly filled node in the initial cognitive causal graph, the filling completeness of its parent nodes, and logical coherence. Based on the target node content contained in the current intervention prompt, the reward and punishment data for the intervention stage are determined; the reward and punishment data are determined based on the reward and punishment data of the exploration stage and the reward and punishment data of the intervention stage.

5. The method for constructing a large-scale psychological counseling model based on case-based conceptual causal diagrams and a two-stage reinforcement learning approach of "exploration before intervention" as described in claim 1, characterized in that... The process of pre-building a user simulator includes: The user profile data, the ideal cognitive causal graph, and the preset user simulator prompts are input into the third language model to obtain the user simulator. The preset user simulator prompts include the information disclosure rules of the third language model when simulating users conducting simulated consultation dialogues, as well as the intervention feedback rules for the initial consultant agent.

6. The method for constructing a large-scale psychological counseling model based on case-based conceptual causal diagrams and a two-stage reinforcement learning approach of "exploration before intervention" as described in claim 5, is characterized in that... The process of using the multi-round psychological counseling dialogue data to perform cold start training on the large language model to be trained, resulting in an initial counselor agent, includes: Using the dialogue history context in the multi-round psychological counseling dialogue data as training input data, and the next round standard response in the multi-round psychological counseling dialogue data as training labels, the large language model to be trained is subjected to autoregressive training to obtain the initial counselor agent.

7. The method for constructing a large-scale psychological counseling model based on case-based conceptual causal diagrams and a two-stage reinforcement learning approach of "exploration before intervention" as described in claim 1, is characterized in that... The ideal cognitive causal graph includes multiple structured hierarchical nodes, which represent the user's psychological elements, and the directed edges between the nodes represent causal or maintenance relationships.