Training method and device for an elderly psychological large language model based on instruction synthesis
Elderly psychological corpus data are acquired, transformed with an expert rule set, and preprocessed, then combined with preset instruction templates and a safety reward model for training. This addresses the emotional misjudgment and safety problems that large language models exhibit with elderly users, and enables efficient and safe training for elderly psychological counseling.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- TSINGHUA UNIVERSITY
- Filing Date
- 2026-01-29
- Publication Date
- 2026-06-19
AI Technical Summary
Existing large language models lack data on elderly psychological adjustment during training, making it difficult to understand the dialects and traditional cultural elements of elderly users, easily misjudging emotions, and lacking alignment with expert experience, thus failing to provide safe and targeted psychological counseling.
By acquiring psychological language data of the elderly, transforming and preprocessing it using expert rule sets, and training it with preset instruction templates and a safety reward model, we can achieve alignment with expert experience and improve safety.
It improves the ability to understand the expressions of the elderly, reduces misjudgment of emotions, ensures the safety and reliability of model output, and adapts to the special needs of elderly scenarios.
Figure CN122245294A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of large language model technology, and in particular to a training method and apparatus for an elderly psychological large language model based on instruction synthesis.
Background Technology
[0002] In modern society, the number of elderly people living alone, empty-nest elderly, and those with chronic diseases continues to rise, and psychological problems such as anxiety, loneliness, and mood disorders are becoming increasingly complex and prevalent. The limited number of grassroots psychological service outlets and the uneven distribution of professional resources make it difficult to guarantee the accessibility and responsiveness of psychological counseling for the elderly. Medical institutions and community service centers urgently need a smart assistant that can be online for extended periods, understands the elderly's communication habits, and provides warm feedback to alleviate the supply-demand imbalance and improve the coverage and resilience of mental health services.
[0003] Mainstream large-scale models primarily use publicly available internet text as their training corpus, lacking dialogue data related to elderly psychological adjustment and companionship / comfort. This makes the models prone to misinterpreting emotions and ignoring potential risk signals. Elderly users often use dialects, slang, or ambiguous expressions and have a high reliance on traditional cultural elements, making it difficult for general-purpose models to maintain empathetic dialogue. Geriatric psychology involves interdisciplinary knowledge of aging physiology, cognitive decline, and family structure; existing models lack the ability to align with expert experience databases, leading to potentially biased and formulaic advice. Responding to sensitive emotions requires more rigorous safety instructions and review processes; general-purpose models typically only offer general safety strategies for the general public and cannot cover the specific risk points in elderly scenarios.
Summary of the Invention
[0004] This invention provides a method and apparatus for training an elderly psychological large language model based on instruction synthesis. It addresses the following shortcomings of the prior art. The training corpora of mainstream large models consist primarily of publicly available internet text and lack dialogue data on elderly psychological adjustment and companionship / comfort, so the models easily misjudge emotions and ignore potential risk signals. Elderly users often use dialects, colloquialisms, or ambiguous expressions and rely heavily on traditional cultural elements, making it difficult for general-purpose models to maintain empathetic dialogue. Geriatric psychology involves interdisciplinary knowledge of aging physiology, cognitive decline, and family structure; existing models cannot align with expert experience databases, so their suggestions are potentially biased or formulaic. Responding to sensitive emotions requires more rigorous safety instructions and review processes, whereas general-purpose models typically offer only generic safety strategies for the general public and cannot cover the specific risk points of elderly scenarios. The technical solution of this invention acquires elderly psychological corpus data rather than publicly available internet text, which reduces emotional misjudgment and improves understanding of the expressions commonly used by the elderly. The corpus data are transformed through an expert rule set that covers the interdisciplinary knowledge relevant to the elderly and, combined with the subsequent alignment training, this achieves alignment with expert experience. Data preprocessing improves the safety and reliability of the model, and the synthesis of target instructions improves the accuracy of the training data. Together, these steps enable safe and rapid training of an elderly psychological large language model for elderly scenarios.
[0005] This invention provides a training method for an elderly psychological large language model based on instruction synthesis, comprising the following steps.
[0006] Acquire elderly psychological corpus data, and transform the elderly psychological corpus data based on an expert rule set to obtain rule-transformed data;
Preprocess the rule-transformed data to obtain target elderly psychological data;
Determine multiple target instructions based on the target elderly psychological data and a preset instruction template;
Pre-train a general large language model based on the target elderly psychological data to obtain a pre-trained large language model;
Perform supervised fine-tuning on the pre-trained large language model based on all the target instructions to obtain a supervised fine-tuned large language model;
Perform alignment training on the supervised fine-tuned large language model based on the expert rule set, and introduce a safety reward model to penalize unsafe behaviors during the alignment training process, thereby obtaining the elderly psychological large language model.
[0007] According to the present invention, a training method for an elderly psychological large language model based on instruction synthesis is provided, wherein the data preprocessing includes desensitization processing, corpus cleaning processing, and annotation processing; The process of preprocessing the rule-transformed data to obtain the target elderly psychological data includes: The desensitization process is performed on the rule-transformed data to obtain desensitized data; For each desensitized corpus in the desensitized data, the desensitized corpus is cleaned to obtain cleaned corpus; scoring prompts are constructed based on the cleaned corpus, and the scoring prompts are input into a first scoring model to obtain a cleaning score corresponding to the cleaned corpus output by the first scoring model; cleaned data is determined based on cleaned corpus whose cleaning score is greater than a first preset threshold; the first scoring model is a general large language model; The cleaned data is then labeled to obtain the target elderly psychological data.
[0008] According to the present invention, a training method for an elderly psychological large language model based on instruction synthesis is provided, wherein performing the desensitization processing on the rule-transformed data to obtain desensitized data includes: for each rule-transformed corpus in the rule-transformed data, mapping the rule-transformed corpus using a feature mask set to obtain the desensitized corpus corresponding to the rule-transformed corpus; and determining the desensitized data based on all of the desensitized corpora.
[0009] According to the present invention, a method for training a large language model of elderly psychology based on instruction synthesis is provided, wherein the determination of multiple target instructions based on the target elderly psychological data and a preset instruction template includes: Multiple basic instructions are generated based on the target elderly psychological data and the preset instruction template; For each of the basic instructions, the basic instructions are input into the scorer to obtain the instruction score corresponding to the basic instruction output by the scorer; The basic instructions whose instruction scores are greater than the second preset threshold are identified as instructions to be expanded. All the instructions to be expanded are expanded to obtain the target instructions.
[0010] According to the present invention, a training method for an elderly psychological large language model based on instruction synthesis is provided, wherein generating multiple basic instructions based on the target elderly psychological data and the preset instruction template includes: inputting the target elderly psychological data into an entity relation extraction large language model to obtain the target entities and target relations output by the entity relation extraction large language model, the entity relation extraction large language model being obtained by fine-tuning a general large language model on a vocabulary of the elderly psychological domain; constructing an elderly psychology knowledge graph based on the target entities and the target relations; and generating the basic instructions based on the elderly psychology knowledge graph and the preset instruction template.
[0011] According to the present invention, a training method for an elderly psychological large language model based on instruction synthesis is provided, wherein the elderly psychological corpus data includes multiple elderly psychological corpora, and expanding all the instructions to be expanded to obtain the target instructions includes: for each instruction to be expanded, inputting the instruction to be expanded into an expansion model to obtain the corresponding expanded instruction output by the expansion model, the expansion model being obtained by training a general large language model on high-confidence safety instructions provided by experts; for each expanded instruction, constructing an instruction scoring prompt based on the expanded instruction and the corresponding elderly psychological corpus, and inputting the instruction scoring prompt into a second scoring model to obtain a consistency score output by the second scoring model, the second scoring model being a general large language model; and determining the expanded instructions whose consistency score is greater than a third preset threshold as the target instructions.
[0012] This invention also provides a training device for an elderly psychological large language model based on instruction synthesis, comprising the following modules:
The acquisition module is used to acquire elderly psychological corpus data, and to transform the elderly psychological corpus data based on an expert rule set to obtain rule-transformed data;
The preprocessing module is used to preprocess the rule-transformed data to obtain the target elderly psychological data;
The instruction module is used to determine multiple target instructions based on the target elderly psychological data and a preset instruction template;
The pre-training module is used to pre-train a general large language model based on the target elderly psychological data to obtain a pre-trained large language model;
The supervised fine-tuning module is used to perform supervised fine-tuning on the pre-trained large language model based on all the target instructions to obtain a supervised fine-tuned large language model;
The alignment training module is used to perform alignment training on the supervised fine-tuned large language model based on the expert rule set, and to introduce a safety reward model that penalizes unsafe behaviors during the alignment training process, thereby obtaining the elderly psychological large language model.
[0013] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the training method for an elderly psychological large language model based on instruction synthesis as described above.
[0014] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the training method for an elderly psychological large language model based on instruction synthesis as described above.
[0015] The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the training method for an elderly psychological large language model based on instruction synthesis as described above.
[0016] This invention provides a method and apparatus for training an elderly psychological large language model based on instruction synthesis. The method involves acquiring elderly psychological corpus data; transforming the corpus data using an expert rule set to obtain rule-transformed data; preprocessing the rule-transformed data to obtain target elderly psychological data; determining multiple target instructions based on the target elderly psychological data and a preset instruction template; pre-training a general large language model using the target elderly psychological data to obtain a pre-trained large language model; performing supervised fine-tuning on the pre-trained large language model based on all target instructions to obtain a supervised fine-tuned large language model; and performing alignment training on the supervised fine-tuned large language model based on the expert rule set while introducing a safety reward model to penalize unsafe behaviors during the alignment training process, ultimately obtaining the elderly psychological large language model. This invention uses elderly psychological corpus data instead of publicly available internet text, reducing misjudgment of emotions and improving understanding of common expressions used by the elderly. The transformation of the corpus data using an expert rule set draws on the interdisciplinary knowledge relevant to the elderly, and the subsequent alignment training achieves alignment with expert experience. Data preprocessing improves the safety and reliability of the model, and the synthesis of target instructions improves the accuracy of the training data. Ultimately, this invention achieves safe and rapid training of an elderly psychological large language model for elderly scenarios.
Attached Figure Description
[0017] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0018] Figure 1 is a flowchart of the training method for an elderly psychological large language model based on instruction synthesis provided by the present invention.
[0019] Figure 2 is a schematic diagram of the structure of the training device for an elderly psychological large language model based on instruction synthesis provided by the present invention.
[0020] Figure 3 is a schematic diagram of the structure of the electronic device provided by the present invention.
Detailed Implementation
[0021] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.
[0022] To address the aforementioned problems in the prior art, this invention provides a training method for an elderly psychological large language model based on instruction synthesis. Figure 1 is a flowchart of this training method. As shown in Figure 1, the method includes the following steps 110 to 160.
[0023] Step 110: Obtain elderly psychological corpus data, and transform the elderly psychological corpus data based on the expert rule set to obtain rule-transformed data.
[0024] Specifically, elderly psychological corpus data can be acquired. The data can be collected from psychological adjustment and comfort records, expert knowledge bases, traditional cultural materials, and public policy guidelines, and multimodal inputs such as text, audio transcriptions, and structured records can be supported.
[0025] Furthermore, the corpus data on elderly psychology can be transformed based on expert rule sets to obtain rule-transformed data. The expert rule set can be pre-constructed by collecting experiential rules, dialogue strategies, and crisis intervention procedures from elderly psychology experts through structured questionnaires.
[0026] Step 120: Perform data preprocessing on the rule-transformed data to obtain the target elderly psychological data.
[0027] Specifically, data preprocessing can be performed on the rule-transformed data to obtain the target elderly psychological data. This data preprocessing can include desensitization, corpus cleaning, and annotation. Desensitization can be a hybrid process of reversible or irreversible desensitization; for example, token replacement and hash pseudo-identifiers can be used for sensitive fields such as name, address, and medical history to ensure that the sample cannot be traced back to a specific individual.
[0028] In one embodiment, the data preprocessing includes desensitization, corpus cleaning, and annotation. The process of preprocessing the rule-transformed data to obtain the target elderly psychological data includes: The desensitization process is performed on the rule-transformed data to obtain desensitized data; For each desensitized corpus in the desensitized data, the desensitized corpus is cleaned to obtain cleaned corpus; scoring prompts are constructed based on the cleaned corpus, and the scoring prompts are input into a first scoring model to obtain a cleaning score corresponding to the cleaned corpus output by the first scoring model; cleaned data is determined based on cleaned corpus whose cleaning score is greater than a first preset threshold; the first scoring model is a general large language model; The cleaned data is then labeled to obtain the target elderly psychological data.
[0029] Specifically, the rule-transformed data can be desensitized to obtain desensitized data. The desensitized data includes multiple desensitized corpora, each of which can then be cleaned to obtain a cleaned corpus. The cleaning can, for example, involve inputting the desensitized corpus into a general large language model, which detects and removes noisy statements, hate speech, and discriminatory content, and corrects the spelling and punctuation of colloquial expressions. A scoring prompt can then be constructed from each cleaned corpus and input into a first scoring model to obtain the cleaning score of that cleaned corpus. The cleaned data is then determined from the cleaned corpora whose cleaning score exceeds a first preset threshold. The first preset threshold can be set in advance as needed.
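The score-and-filter step described above can be sketched as follows. This is only an illustrative sketch: the prompt wording, the function names, and the `toy_score` stand-in for the first scoring model are assumptions, since the text does not specify the prompt or the model interface.

```python
from typing import Callable


def build_scoring_prompt(corpus: str) -> str:
    # Hypothetical prompt wording; the source does not give the actual prompt.
    return (
        "Rate the following cleaned dialogue from 0 to 1 for fluency and "
        "absence of noise or discriminatory content.\n\n" + corpus
    )


def filter_by_cleaning_score(
    cleaned_corpora: list[str],
    score_fn: Callable[[str], float],
    threshold: float,
) -> list[str]:
    """Keep corpora whose cleaning score is strictly greater than the threshold."""
    return [
        c for c in cleaned_corpora
        if score_fn(build_scoring_prompt(c)) > threshold
    ]


def toy_score(prompt: str) -> float:
    # Stand-in for the first scoring model (a general large language model
    # in the text): it simply penalises leftover noise markers.
    return 0.2 if "###" in prompt else 0.9


corpora = ["I feel lonely since my son moved away.", "### ad ### buy now"]
kept = filter_by_cleaning_score(corpora, toy_score, threshold=0.6)
print(kept)
```

In practice `score_fn` would wrap a call to whichever general large language model serves as the first scoring model; the strict comparison mirrors the "greater than a first preset threshold" wording.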
[0030] Furthermore, the cleaned data can be labeled to obtain the target elderly psychological data. This labeling process involves adding emotion tags, role identities, dialogue stages, risk levels, and other information to each cleaned corpus. In other words, each corpus in the final target elderly psychological data corresponds to annotated information such as emotion tags, role identities, dialogue stages, and risk levels.
[0031] In the above embodiments, data preprocessing improves the security, data quality, and reliability of the target elderly psychological data, laying the foundation for model training.
[0032] In one embodiment, performing the desensitization processing on the rule-transformed data to obtain desensitized data includes: for each rule-transformed corpus in the rule-transformed data, mapping the rule-transformed corpus using a feature mask set to obtain the desensitized corpus corresponding to the rule-transformed corpus; and determining the desensitized data based on all of the desensitized corpora.
[0033] Specifically, each rule-transformed corpus in the rule-transformed data can be mapped using a feature mask set $M$ to obtain the corresponding desensitized corpus. This process can be represented by the following formula:
$$\tilde{x}_i = f_M(x_i)$$
where $\tilde{x}_i$ denotes the $i$-th desensitized corpus, $x_i$ denotes the $i$-th rule-transformed corpus, and $f_M$ denotes the mapping defined by the feature mask set.
[0034] Furthermore, all the desensitized corpora can constitute desensitized data.
[0035] In the above embodiments, the rule-transformed corpus is mapped by a feature mask set, thereby achieving desensitization processing of the rule-transformed corpus and protecting the privacy and security of the elderly.
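A minimal sketch of such a mask-set mapping is given below, combining token replacement with hash pseudo-identifiers as mentioned for the desensitization step. The regular expressions, the salt, the placeholder tags, and the sample sentence are all hypothetical; the text does not specify the contents of the feature mask set.

```python
import hashlib
import re

# Hypothetical feature mask set: each entry pairs a regex for a sensitive
# field with the placeholder tag that replaces it.
FEATURE_MASKS = [
    (re.compile(r"\b\d{11}\b"), "PHONE"),
    (re.compile(r"\b\d{1,4} [A-Z][a-z]+ (?:Road|Street)\b"), "ADDRESS"),
]


def pseudo_id(value: str, salt: str = "demo-salt") -> str:
    """Irreversible pseudo-identifier: truncated salted SHA-256 digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:8]


def desensitize(text: str, names: list[str]) -> str:
    """Map one rule-transformed corpus to its desensitized form."""
    for pattern, tag in FEATURE_MASKS:
        text = pattern.sub(f"[{tag}]", text)
    for name in names:
        # Names become stable hashed tokens so that turns by the same
        # speaker stay linked without exposing the identity.
        text = text.replace(name, f"[NAME_{pseudo_id(name)}]")
    return text


sample = "Grandma Wang Guilan at 12 Jiefang Road, phone 13800001111, cannot sleep."
masked = desensitize(sample, names=["Wang Guilan"])
print(masked)
```

Because the hash is salted and truncated, the same name always maps to the same token within a dataset, while the original value cannot be read back from the token.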
[0036] Step 130: Determine multiple target instructions based on the target elderly psychological data and the preset instruction template.
[0037] Specifically, multiple target instructions can be determined based on target elderly psychological data and preset instruction templates. These target instructions are used to train the elderly psychological language model.
[0038] In one embodiment, determining multiple target instructions based on the target elderly psychological data and a preset instruction template includes: Multiple basic instructions are generated based on the target elderly psychological data and the preset instruction template; For each of the basic instructions, the basic instructions are input into the scorer to obtain the instruction score corresponding to the basic instruction output by the scorer; The basic instructions whose instruction scores are greater than the second preset threshold are identified as instructions to be expanded. All the instructions to be expanded are expanded to obtain the target instructions.
[0039] Specifically, multiple basic instructions can be generated based on target elderly psychological data and preset instruction templates. This process can be rule-driven or automatically generated based on a small instruction model. The preset instruction templates can include scenario templates such as companionship and comfort, emotion regulation, crisis intervention, and family communication, and include fields such as goals, roles, tone, constraints, and evaluation indicators. The generated basic instructions can also include chain-of-thought extensions of multi-turn dialogue examples, injecting historical turns, key signals, and expected response styles into the basic instructions.
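The rule-driven variant of this step can be illustrated with a small template object. The sketch below assumes a template holding the fields named above (goal, role, tone, constraints, evaluation indicators); the class name, the rendering layout, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InstructionTemplate:
    """Preset instruction template with the fields named in the text."""
    scenario: str      # e.g. companionship and comfort, crisis intervention
    goal: str
    role: str
    tone: str
    constraints: str
    evaluation: str    # evaluation indicators

    def render(self, utterance: str, emotion: str, risk_level: str) -> str:
        """Fill the template with one annotated corpus entry."""
        return "\n".join([
            f"Scenario: {self.scenario}",
            f"Role: {self.role}",
            f"Goal: {self.goal}",
            f"Tone: {self.tone}",
            f"Constraints: {self.constraints}",
            f"Evaluation: {self.evaluation}",
            f"User ({emotion}, risk={risk_level}): {utterance}",
        ])


comfort = InstructionTemplate(
    scenario="companionship and comfort",
    goal="ease loneliness and encourage social contact",
    role="patient companion familiar with the user's habits",
    tone="warm, slow-paced, plain wording",
    constraints="no medical diagnosis; escalate if risk is high",
    evaluation="empathy, safety, naturalness",
)
basic_instruction = comfort.render(
    "Nobody visits me since my wife passed away.", "lonely", "low"
)
print(basic_instruction)
```

The emotion tag and risk level passed to `render` correspond to the annotations added during preprocessing, so each annotated corpus entry can yield one basic instruction per applicable scenario template.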
[0040] Furthermore, a scorer can be pre-built; the scorer can itself be based on a general large language model. Each basic instruction is input into the scorer, and the scorer outputs the instruction score corresponding to that basic instruction. The instruction score can include four dimensions: professionalism, safety, naturalness, and semantic consistency. For example, the instruction score of the $i$-th basic instruction $q_i$ can be expressed as $s_i = (s_i^{\text{pro}}, s_i^{\text{safe}}, s_i^{\text{nat}}, s_i^{\text{sem}})$, where $s_i^{\text{pro}}$ denotes the professionalism sub-score, $s_i^{\text{safe}}$ denotes the safety sub-score, $s_i^{\text{nat}}$ denotes the naturalness sub-score, and $s_i^{\text{sem}}$ denotes the semantic consistency sub-score. An overall score can also be determined for each basic instruction, $S_i = \sum_{k} w_k \, s_i^{k}$, where $k$ ranges over the scoring dimensions and $w_k$ denotes the corresponding weight.
[0041] After obtaining the instruction score for each basic instruction, basic instructions with an instruction score greater than a second preset threshold are identified as instructions to be expanded. Basic instructions with an instruction score less than or equal to the second preset threshold can be sent back for revision. The second preset threshold can be set in advance as needed; this embodiment of the invention does not impose specific limitations on it.
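As a worked illustration of the weighted overall score and the threshold routing, the sketch below uses made-up weights, sub-scores, and a threshold of 0.75; none of these values come from the text.

```python
# Hypothetical weights for the four scoring dimensions (they sum to 1).
WEIGHTS = {
    "professionalism": 0.3,
    "safety": 0.4,
    "naturalness": 0.1,
    "consistency": 0.2,
}


def overall_score(sub_scores: dict[str, float],
                  weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the per-dimension sub-scores."""
    return sum(weights[k] * sub_scores[k] for k in weights)


def route(scored: list[tuple[str, dict[str, float]]], threshold: float):
    """Split instructions into those to expand and those sent back for revision."""
    to_expand, to_revise = [], []
    for text, subs in scored:
        target = to_expand if overall_score(subs) > threshold else to_revise
        target.append(text)
    return to_expand, to_revise


scored = [
    ("Comfort a user who misses a late spouse.",
     {"professionalism": 0.9, "safety": 0.95, "naturalness": 0.8, "consistency": 0.9}),
    ("Tell the user to stop taking their pills.",
     {"professionalism": 0.8, "safety": 0.3, "naturalness": 0.9, "consistency": 0.7}),
]
to_expand, to_revise = route(scored, threshold=0.75)
print(to_expand, to_revise)
```

With these numbers the first instruction scores about 0.91 and is kept for expansion, while the second scores about 0.59, mainly because of its low safety sub-score, and is returned for revision.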
[0042] After obtaining multiple instructions to be expanded, all instructions can be expanded to obtain the target instructions.
[0043] In the above embodiments, the dynamic instruction extension framework can generate high-quality instructions in real time according to the scenario, and continuously optimize through a scoring system, significantly improving the generalization ability of the subsequently trained model in companionship, consultation and crisis scenarios.
[0044] In one embodiment, generating multiple basic instructions based on the target elderly psychological data and the preset instruction template includes: inputting the target elderly psychological data into an entity relation extraction large language model to obtain the target entities and target relations output by the entity relation extraction large language model, the entity relation extraction large language model being obtained by fine-tuning a general large language model on a vocabulary of the elderly psychological domain; constructing an elderly psychology knowledge graph based on the target entities and the target relations; and generating the basic instructions based on the elderly psychology knowledge graph and the preset instruction template.
[0045] Specifically, besides directly generating multiple basic instructions based on the target elderly psychological data and preset instruction templates, one can first construct a knowledge graph from the target elderly psychological data and then combine it with the preset instruction templates to generate basic instructions. That is, the target elderly psychological data can be input into an entity relation extraction large language model to obtain the target entities and target relations that it outputs. Target entities may include emotional states, accompanying symptoms, intervention strategies, and family relationships, while target relations may include relations such as "trigger factor-emotional response" and "symptom-suggestion." The entity relation extraction large language model is obtained by fine-tuning a general large language model on a vocabulary of the elderly psychological domain.
[0046] Furthermore, the extracted target entities and target relations can be mapped into a graph database to form an elderly psychology knowledge graph. Basic instructions can then be generated based on the elderly psychology knowledge graph and the preset instruction templates.
[0047] In the above embodiments, the basic instructions obtained by knowledge graph representation can further ensure that the model accurately understands the emotions, dialects and cultural cues of elderly users, reducing misunderstandings and awkward responses.
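The path from extracted triples to basic instructions can be sketched as below. For brevity this sketch keeps the graph in an in-memory adjacency list instead of a graph database, and the triples and the template wording are invented examples rather than output of the entity relation extraction model.

```python
from collections import defaultdict

# Hypothetical (head entity, relation, tail entity) triples, standing in for
# the output of the entity relation extraction large language model.
triples = [
    ("living alone", "trigger factor-emotional response", "loneliness"),
    ("loneliness", "symptom-suggestion", "daily phone call with family"),
    ("insomnia", "symptom-suggestion", "regular bedtime routine"),
]


def build_graph(triples):
    """Adjacency list: head entity -> list of (relation, tail entity)."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph


# Hypothetical preset instruction template for the "symptom-suggestion" relation.
TEMPLATE = (
    "The user shows signs of '{head}'. As a gentle companion, respond with "
    "empathy and suggest: {tail}."
)


def generate_basic_instructions(graph, relation):
    """Render one basic instruction per edge carrying the given relation."""
    return [
        TEMPLATE.format(head=head, tail=tail)
        for head, edges in graph.items()
        for rel, tail in edges
        if rel == relation
    ]


graph = build_graph(triples)
instructions = generate_basic_instructions(graph, "symptom-suggestion")
for line in instructions:
    print(line)
```

Grounding each instruction in an explicit edge of the graph makes it straightforward to trace a generated instruction back to the entities and relation that produced it.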
[0048] In one embodiment, the elderly psychological corpus data includes multiple elderly psychological corpora, and expanding all the instructions to be expanded to obtain the target instructions includes: for each instruction to be expanded, inputting the instruction to be expanded into an expansion model to obtain the corresponding expanded instruction output by the expansion model, the expansion model being obtained by training a general large language model on high-confidence safety instructions provided by experts; for each expanded instruction, constructing an instruction scoring prompt based on the expanded instruction and the corresponding elderly psychological corpus, and inputting the instruction scoring prompt into a second scoring model to obtain a consistency score output by the second scoring model, the second scoring model being a general large language model; and determining the expanded instructions whose consistency score is greater than a third preset threshold as the target instructions.
[0049] Specifically, each instruction to be expanded can be input into the expansion model to obtain the corresponding expanded instruction output by the model. The expanded instructions can cover sub-topics such as medication reminders, emergency crises, and privacy protection. The expansion model is obtained by training a general large language model on high-confidence safety instructions provided by experts.
[0050] Furthermore, for each expanded instruction, instruction scoring prompts can be constructed based on the expanded instruction and its corresponding elderly psychological corpus. These prompts aim to perform a consistency comparison between the expanded instruction and its corresponding elderly psychological corpus (raw data). The instruction scoring prompts are then input into a second scoring model to obtain the consistency score output by the second scoring model. The second scoring model is a general large language model.
[0051] After obtaining multiple consistency scores, the expanded instructions with consistency scores greater than a third preset threshold can be identified as target instructions. The third preset threshold is set in advance as needed, and this embodiment does not impose a specific limitation. Expanded instructions with consistency scores less than or equal to the third preset threshold can be revised. For example, multi-model cross-evaluation or formal verification tools can be introduced to automatically correct the consistency of unqualified expanded instructions.
[0052] In the above embodiments, the closed loop of instruction expansion and consistency-score self-checking can quickly expand safety policies while maintaining consistency, so that the trained model outputs more robust and compliant recommendations when facing sensitive topics.
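One way to picture this closed loop is an expand, score, and revise cycle with a bounded number of rounds. The sketch below is an assumption-laden illustration: `toy_expand` and `toy_consistency` are simple stand-ins for the expansion model and the second scoring model, and the keyword-overlap score and retry limit are not taken from the text.

```python
from typing import Callable, Optional


def expand_with_self_check(
    instruction: str,
    source_corpus: str,
    expand_fn: Callable[[str, Optional[str]], str],
    consistency_fn: Callable[[str, str], float],
    threshold: float,
    max_rounds: int = 3,
) -> Optional[str]:
    """Return an expanded instruction whose consistency score exceeds the
    threshold, revising up to max_rounds times; None if none qualifies."""
    candidate = expand_fn(instruction, None)
    for _ in range(max_rounds):
        if consistency_fn(candidate, source_corpus) > threshold:
            return candidate
        candidate = expand_fn(instruction, candidate)  # ask for a revision
    return None


def toy_expand(instruction: str, previous: Optional[str]) -> str:
    # Stand-in for the expansion model.
    if previous is None:
        return instruction + " Also cover medication reminders."
    return previous + " Stay grounded in what the user said about sleep."


def toy_consistency(candidate: str, corpus: str) -> float:
    # Stand-in for the second scoring model: share of the corpus topics
    # that the candidate instruction also mentions.
    topics = [k for k in ("sleep", "medication") if k in corpus.lower()]
    if not topics:
        return 0.0
    return sum(k in candidate.lower() for k in topics) / len(topics)


corpus = "I forget my medication and cannot sleep at night."
target = expand_with_self_check(
    "Comfort the user.", corpus, toy_expand, toy_consistency, threshold=0.8
)
print(target)
```

Here the first expansion mentions only medication (score 0.5), so it is revised once; the revision also covers sleep (score 1.0) and is accepted as a target instruction.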
[0053] Step 140: Based on the target elderly psychological data, pre-train the general large language model to obtain the pre-trained large language model.
[0054] Specifically, a general large language model can be used as the foundation. This general large language model can be, for example, DeepSeek or Qwen. The general large language model can then be pre-trained on the target elderly psychological data to obtain a pre-trained large language model that is adapted to the language style and narrative rhythm of the elderly.
[0055] Step 150: Perform supervised fine-tuning on the pre-trained large language model based on all the target instructions to obtain a supervised fine-tuned large language model.
[0056] Specifically, supervised fine-tuning of the pre-trained large language model can be performed based on all target instructions to obtain a supervised fine-tuned large language model. The loss function $\mathcal{L}_{\text{SFT}}$ of the supervised fine-tuning process can be expressed by the following formula:
$$\mathcal{L}_{\text{SFT}}(\theta) = -\sum_{t} \log p_\theta\left(y_t \mid y_{<t}, x\right)$$
where $t$ denotes the discrete time step in the text sequence generated by the large language model during supervised fine-tuning; $p_\theta$ denotes the conditional probability distribution, at time step $t$, of the pre-trained large language model to be optimized, whose parameters are $\theta$; $y_t$ denotes the $t$-th token of the target answer, generated at time step $t$; $y_{<t}$ denotes the word sequence of the target answer generated before time step $t$; and $x$ denotes the input target instruction.
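As a small worked example of this kind of loss, the sketch below sums the negative log-probabilities that a model assigns to the tokens of one target answer. The three probabilities are made-up values chosen so that the result is easy to verify by hand; a real training loop would obtain them from the model's output distribution.

```python
import math


def sft_loss(token_probs: list[float]) -> float:
    """Negative log-likelihood of a target answer, given the probability the
    model assigns to each target token conditioned on the instruction and
    the preceding target tokens."""
    return -sum(math.log(p) for p in token_probs)


# Hypothetical per-token probabilities for a three-token target answer.
probs = [0.5, 0.25, 0.8]
loss = sft_loss(probs)
print(round(loss, 4))
```

Since 0.5 × 0.25 × 0.8 = 0.1, the loss equals −ln 0.1 = ln 10 ≈ 2.3026; raising the probability of any target token lowers the loss.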
[0057] The focus of supervised fine-tuning is to improve the model's ability to understand emotions, express reassurance, and reference cultural elements.
[0058] Step 160: Based on the expert rule set, perform alignment training on the supervised fine-tuning large language model, and introduce a safety reward model to punish unsafe behaviors during the alignment training process, thereby obtaining the elderly psychological large language model.
[0059] Specifically, alignment training can be performed on the supervised fine-tuned large language model based on the expert rule set, and a safety reward model can be introduced to penalize unsafe behaviors during alignment training, thus obtaining the elderly psychological large language model. Unsafe behaviors can include inappropriate suggestions and indifferent responses. The overall objective of the alignment training process is expressed as:

$$\max_{\theta}\; \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_{\theta}(\cdot \mid x)}\left[ R(x, y) \right] - \beta\, \mathrm{KL}\left( \pi_{\theta}(y \mid x) \,\|\, \pi_{\mathrm{SFT}}(y \mid x) \right)$$

where $\theta$ represents the parameters of the elderly psychological large language model to be optimized; $\mathbb{E}_{x \sim \mathcal{D}}$ represents the mathematical expectation over input sequences $x$ sampled from the elderly psychological corpus distribution $\mathcal{D}$; $R$ represents the safety reward model; $\beta$ controls the degree of deviation from the initial policy; $\mathrm{KL}$ is the Kullback-Leibler divergence, which measures the difference between two probability distributions; $\pi_{\theta}(y \mid x)$ represents the probability distribution of answers generated by the elderly psychological large language model currently being trained, given input $x$; and $\pi_{\mathrm{SFT}}(y \mid x)$ represents the probability distribution of answers generated by the supervised fine-tuned large language model, given input $x$.
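The trade-off between the safety reward and the KL penalty described above can be sketched for a toy case with two candidate answers. This is only an illustration under stated assumptions: the answer labels, reward values, and the coefficient `beta=0.1` are hypothetical, and the distributions are small discrete tables rather than real model outputs.

```python
import math
from typing import Dict


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) for two discrete distributions over the same answers."""
    return sum(pv * math.log(pv / q[a]) for a, pv in p.items() if pv > 0.0)


def alignment_objective(
    policy: Dict[str, float],
    reference: Dict[str, float],
    reward: Dict[str, float],
    beta: float,
) -> float:
    """Expected safety reward under the policy minus beta * KL(policy || reference)."""
    expected_reward = sum(policy[a] * reward[a] for a in policy)
    return expected_reward - beta * kl_divergence(policy, reference)


# Hypothetical values: the reference plays the role of the supervised
# fine-tuned model, and the reward table plays the role of the safety reward model.
reference = {"warm reply": 0.5, "indifferent reply": 0.5}
reward = {"warm reply": 1.0, "indifferent reply": -1.0}
shifted = {"warm reply": 0.9, "indifferent reply": 0.1}

baseline_value = alignment_objective(reference, reference, reward, beta=0.1)
shifted_value = alignment_objective(shifted, reference, reward, beta=0.1)
```

In this toy setting, moving probability toward the rewarded answer raises the objective even after the KL penalty is subtracted, while a larger `beta` would keep the policy closer to the reference.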
[0060] It should be noted that weight labels, such as scene weight, dialect weight, and risk weight, can be set during training and used to calculate the sampling probability of each sample.
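The text does not specify how the weight labels are combined. One possible reading, shown here purely as an assumption, is to multiply the scene, dialect, and risk weights of each sample and normalize the products into probabilities. The function name `sampling_probabilities` and the weight values are hypothetical.

```python
from typing import Dict, List


def sampling_probabilities(samples: List[Dict[str, float]]) -> List[float]:
    """Turn per-sample weight labels into normalized sampling probabilities.

    Assumption: the combined weight is the product of the three labels.
    """
    raw = [s["scene"] * s["dialect"] * s["risk"] for s in samples]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical weight labels: the second sample is dialect-heavy,
# the third is a high-risk dialogue.
samples = [
    {"scene": 1.0, "dialect": 1.0, "risk": 1.0},
    {"scene": 1.0, "dialect": 2.0, "risk": 1.0},
    {"scene": 1.0, "dialect": 1.0, "risk": 2.0},
]
probs = sampling_probabilities(samples)
```

Under this assumed scheme, up-weighted dialect or risk samples are drawn more often during training than ordinary samples.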
[0061] The resulting elderly psychological large language model can be applied to the following scenarios: (1) Psychological companionship in communities or nursing homes: the model can provide continuous dialogue at night, when on-duty staffing is limited, and push manual intervention reminders when the risk model detects negative emotions. (2) Queue management for remote psychological counseling: an online platform can use the model to conduct pre-interviews and preliminarily screen visitors' psychological level, improving the efficiency of expert scheduling and resource allocation. (3) Family care counseling: the model provides communication suggestions for family members based on traditional cultural sayings and expert experience, helping to ease intergenerational communication barriers. (4) Policy education and compliance reminders: drawing on a policy guideline corpus, the model can automatically rewrite psychological service guidelines, medical insurance policies, and similar content in plain language, improving the effectiveness of outreach.
[0062] Optionally, after the elderly psychological large language model is obtained, it can be deployed on an inference service layer that supports heterogeneous computing power. This layer exposes capabilities such as emotion understanding, dialogue generation, and risk warning through an Application Programming Interface (API), and can incorporate interactive auxiliary functions such as conversation memory, speech rate adjustment, and dialect prompts. While the model is running, dialogue logs and user feedback can be collected in real time to trigger quality monitoring and anomaly alarms, and compliant data can flow back to the data governance layer to support subsequent incremental training. Because the inference service layer supports API calls and interpretable feedback, it is easy to integrate with terminals such as communities, elderly care institutions, and online consultation platforms, which simplifies deployment. When deployed in a specific region, the model can be trained collaboratively with local medical systems through federated learning or privacy-preserving computation, so that raw data does not leave its domain. The invention thus forms a pipeline of data governance, instruction synthesis, and model training, further reducing the cost of manual intervention and shortening the training cycle.
[0063] This invention provides a training method for an elderly psychological large language model based on instruction synthesis. The method acquires elderly psychological corpus data and transforms it based on an expert rule set to obtain rule-transformed data; preprocesses the rule-transformed data to obtain target elderly psychological data; determines multiple target instructions based on the target elderly psychological data and a preset instruction template; pre-trains a general large language model based on the target elderly psychological data to obtain a pre-trained large language model; performs supervised fine-tuning on the pre-trained large language model based on all target instructions to obtain a supervised fine-tuned large language model; and performs alignment training on the supervised fine-tuned large language model based on the expert rule set, introducing a safety reward model to penalize unsafe behaviors during alignment training, to obtain the elderly psychological large language model. Because the method uses elderly psychological corpus data rather than publicly available internet text, it reduces misjudgment of emotions and improves understanding of expressions commonly used by the elderly. Transforming the corpus data with the expert rule set brings in interdisciplinary knowledge about the elderly, and the subsequent alignment training aligns the model with expert experience. Data preprocessing improves the safety and reliability of the model, and the synthesis of target instructions improves the accuracy of the training data. Ultimately, the method achieves safe and rapid training of an elderly psychological large language model for elderly scenarios.
[0064] The following describes the training device for an elderly psychological large language model based on instruction synthesis provided by the present invention. The device described below and the method described above may be referred to in correspondence with each other.
[0065] Figure 2 is a schematic structural diagram of the training device for an elderly psychological large language model based on instruction synthesis provided by the present invention. As shown in Figure 2, the training device 200 includes the following modules: the acquisition module 210, used to acquire elderly psychological corpus data and transform the elderly psychological corpus data based on the expert rule set to obtain rule-transformed data; the preprocessing module 220, used to preprocess the rule-transformed data to obtain target elderly psychological data; the instruction module 230, used to determine multiple target instructions based on the target elderly psychological data and a preset instruction template; the pre-training module 240, used to pre-train the general large language model based on the target elderly psychological data to obtain the pre-trained large language model; the supervised fine-tuning module 250, used to perform supervised fine-tuning on the pre-trained large language model based on all the target instructions to obtain a supervised fine-tuned large language model; and the alignment training module 260, used to perform alignment training on the supervised fine-tuned large language model based on the expert rule set, and to introduce a safety reward model to penalize unsafe behaviors during the alignment training process, so as to obtain the elderly psychological large language model.
[0066] In one embodiment, the data preprocessing includes desensitization, corpus cleaning, and annotation; the preprocessing module 220 is specifically used for: The desensitization process is performed on the rule-transformed data to obtain desensitized data; For each desensitized corpus in the desensitized data, the desensitized corpus is cleaned to obtain cleaned corpus; scoring prompts are constructed based on the cleaned corpus, and the scoring prompts are input into a first scoring model to obtain a cleaning score corresponding to the cleaned corpus output by the first scoring model; cleaned data is determined based on cleaned corpus whose cleaning score is greater than a first preset threshold; the first scoring model is a general large language model; The cleaned data is then labeled to obtain the target elderly psychological data.
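The preprocessing order described above (desensitize, clean, score, filter by threshold, annotate) can be sketched as a small Python pipeline. All helper functions here are toy stand-ins introduced for illustration: in the embodiment the scorer is a general large language model, whereas `toy_score` below is a simple length heuristic, and the prompt wording and threshold value are assumptions.

```python
from typing import Callable, Dict, List


def preprocess(
    rule_transformed: List[str],
    desensitize: Callable[[str], str],
    clean: Callable[[str], str],
    score: Callable[[str], float],
    annotate: Callable[[str], Dict[str, str]],
    threshold: float,
) -> List[Dict[str, str]]:
    """Desensitize, clean, score, filter, then annotate each corpus."""
    kept: List[str] = []
    for text in rule_transformed:
        cleaned = clean(desensitize(text))
        # Build a scoring prompt from the cleaned corpus (wording is hypothetical).
        prompt = f"Rate the quality of this corpus from 0 to 1:\n{cleaned}"
        if score(prompt) > threshold:  # keep only scores above the first threshold
            kept.append(cleaned)
    return [annotate(text) for text in kept]


def toy_desensitize(text: str) -> str:
    return text.replace("Mr. Wang", "[NAME]")


def toy_clean(text: str) -> str:
    return " ".join(text.split())  # collapse repeated whitespace


def toy_score(prompt: str) -> float:
    corpus = prompt.split("\n", 1)[1]
    return 0.9 if len(corpus) > 10 else 0.1  # stand-in for the first scoring model


def toy_annotate(text: str) -> Dict[str, str]:
    return {"text": text, "label": "unlabeled"}


data = ["Mr. Wang   feels lonely   at night", "ok"]
result = preprocess(data, toy_desensitize, toy_clean, toy_score, toy_annotate, 0.5)
```

Passing the stages in as functions keeps the sketch independent of any particular scoring model or annotation scheme.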
[0067] In one embodiment, the preprocessing module 220 is further configured to: for each rule-transformed corpus in the rule-transformed data, map the rule-transformed corpus using a feature mask set to obtain the desensitized corpus corresponding to that rule-transformed corpus; and determine the desensitized data based on all of the desensitized corpora.
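The text does not define the contents of the feature mask set. As one hedged illustration, the sketch below treats it as a list of regular-expression patterns, each mapped to a placeholder token. The two patterns (an 11-digit phone number and an 18-character ID number) and the placeholder names are assumptions, not the patented mask set.

```python
import re

# Hypothetical feature mask set: (compiled pattern, placeholder) pairs.
FEATURE_MASKS = [
    (re.compile(r"\b\d{11}\b"), "[PHONE]"),      # 11-digit phone number
    (re.compile(r"\b\d{17}[\dXx]\b"), "[ID]"),   # 18-character ID number
]


def desensitize(text: str) -> str:
    """Map every sensitive span matched by the mask set to its placeholder."""
    for pattern, placeholder in FEATURE_MASKS:
        text = pattern.sub(placeholder, text)
    return text


masked = desensitize("Call 13800138000 about ID 11010519491231002X")
```

The word-boundary anchors keep the phone pattern from matching a substring of the longer ID number.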
[0068] In one embodiment, instruction module 230 is specifically used for: Multiple basic instructions are generated based on the target elderly psychological data and the preset instruction template; For each of the basic instructions, the basic instructions are input into the scorer to obtain the instruction score corresponding to the basic instruction output by the scorer; The basic instructions whose instruction scores are greater than the second preset threshold are identified as instructions to be expanded. All the instructions to be expanded are expanded to obtain the target instructions.
[0069] In one embodiment, the instruction module 230 is further configured to: input the target elderly psychological data into the entity-relation extraction large language model to obtain the target entities and target relations output by the entity-relation extraction large language model, where the entity-relation extraction large language model is obtained by fine-tuning a general large language model on the vocabulary of the elderly psychological domain; construct an elderly psychological knowledge graph based on the target entities and the target relations; and generate the basic instructions based on the elderly psychological knowledge graph and the preset instruction template.
[0070] In one embodiment, the elderly psychological corpus data includes multiple elderly psychological corpora; the instruction module 230 is further configured to: for each instruction to be expanded, input the instruction to be expanded into the expansion model to obtain the corresponding expanded instruction output by the expansion model, where the expansion model is obtained by training a general large language model on high-confidence safety instructions provided by experts; for each expanded instruction, construct an instruction scoring prompt based on the expanded instruction and the corresponding elderly psychological corpus, and input the instruction scoring prompt into a second scoring model to obtain the consistency score output by the second scoring model, where the second scoring model is a general large language model; and determine the expanded instructions whose consistency scores are greater than the third preset threshold as the target instructions.
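The step from extracted entities and relations to basic instructions can be sketched as follows. This is a minimal illustration assuming the extraction model returns (head, relation, tail) triples; the triples, the adjacency-list graph layout, and the wording of `TEMPLATE` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)
Graph = Dict[str, List[Tuple[str, str]]]

# Hypothetical preset instruction template.
TEMPLATE = (
    "An elderly user mentions {head}, which {relation} {tail}. "
    "Respond with warmth and suggest a safe next step."
)


def build_graph(triples: List[Triple]) -> Graph:
    """Store the knowledge graph as an adjacency list keyed by head entity."""
    graph: Graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return dict(graph)


def generate_basic_instructions(graph: Graph) -> List[str]:
    """Fill the template once for every edge in the graph."""
    return [
        TEMPLATE.format(head=head, relation=relation, tail=tail)
        for head, edges in graph.items()
        for relation, tail in edges
    ]


# Hypothetical output of the entity-relation extraction model.
triples = [
    ("living alone", "may lead to", "loneliness"),
    ("insomnia", "is associated with", "anxiety"),
]
instructions = generate_basic_instructions(build_graph(triples))
```

Generating one instruction per edge ties each basic instruction to a specific relation in the graph, which is what the later scoring step evaluates.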
[0071] This invention provides a training device for a large-scale language model of elderly psychology based on instruction synthesis. The device acquires elderly psychological language data, transforms it based on an expert rule set to obtain rule-transformed data, preprocesses the rule-transformed data to obtain target elderly psychological data, determines multiple target instructions based on the target elderly psychological data and a preset instruction template, pre-trains a general large-scale language model based on the target elderly psychological data to obtain a pre-trained large-scale language model, performs supervised fine-tuning on the pre-trained large-scale language model based on all target instructions, and performs alignment training on the supervised fine-tuned large-scale language model based on the expert rule set, introducing a safety reward model to penalize unsafe behaviors during the alignment training process, thus obtaining the large-scale language model of elderly psychology. This invention uses elderly psychological language data instead of publicly available internet text, reducing misjudgment of emotions and improving the understanding of commonly used expressions by the elderly. The transformation of the elderly psychological language data through the expert rule set involves various cross-disciplinary knowledge of the elderly, and the subsequent alignment training achieves alignment with expert experience. Data preprocessing improves the safety and reliability of the model, and the synthesis of target instructions improves the accuracy of the training data. Ultimately, this invention achieves safe and rapid training of a large-scale language model of elderly psychology for elderly scenarios.
[0072] Figure 3 is a schematic diagram of the physical structure of an electronic device. As shown in Figure 3, the electronic device may include: a processor 310, a communications interface 320, a memory 330, and a communication bus 340, wherein the processor 310, the communications interface 320, and the memory 330 communicate with each other via the communication bus 340. The processor 310 can call logical instructions from the memory 330 to execute a training method for an elderly psychological large language model based on instruction synthesis. This method includes: acquiring elderly psychological corpus data, and transforming the elderly psychological corpus data based on an expert rule set to obtain rule-transformed data; preprocessing the rule-transformed data to obtain target elderly psychological data; determining multiple target instructions based on the target elderly psychological data and a preset instruction template; pre-training a general large language model based on the target elderly psychological data to obtain a pre-trained large language model; performing supervised fine-tuning on the pre-trained large language model based on all the target instructions to obtain a supervised fine-tuned large language model; and performing alignment training on the supervised fine-tuned large language model based on the expert rule set, and introducing a safety reward model to penalize unsafe behaviors during the alignment training process, thus obtaining an elderly psychological large language model.
[0073] Furthermore, the logical instructions in the aforementioned memory 330 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, essentially, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0074] On the other hand, the present invention also provides a computer program product, the computer program product comprising a computer program that can be stored on a non-transitory computer-readable storage medium, wherein when the computer program is executed by a processor, the computer is able to execute the instruction synthesis-based training method for a large language model of the elderly's psychology provided by the above methods, the method comprising: Acquire elderly psychological language data, and transform the elderly psychological language data based on expert rule set to obtain rule-transformed data; The rule-transformed data is preprocessed to obtain the target elderly psychological data; Multiple target instructions are determined based on the target elderly psychological data and the preset instruction template; Based on the target elderly psychological data, a general large language model is pre-trained to obtain a pre-trained large language model; The pre-trained large language model is supervised fine-tuned based on all the target instructions to obtain a supervised fine-tuned large language model. The supervised fine-tuning large language model is aligned and trained based on the expert rule set, and a safety reward model is introduced to punish unsafe behaviors during the alignment training process, thus obtaining an elderly psychological large language model.
[0075] In another aspect, the present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the training method for an elderly psychological large language model based on instruction synthesis provided by the above methods, the method comprising: acquiring elderly psychological corpus data, and transforming the elderly psychological corpus data based on an expert rule set to obtain rule-transformed data; preprocessing the rule-transformed data to obtain target elderly psychological data; determining multiple target instructions based on the target elderly psychological data and a preset instruction template; pre-training a general large language model based on the target elderly psychological data to obtain a pre-trained large language model; performing supervised fine-tuning on the pre-trained large language model based on all the target instructions to obtain a supervised fine-tuned large language model; and performing alignment training on the supervised fine-tuned large language model based on the expert rule set, and introducing a safety reward model to penalize unsafe behaviors during the alignment training process, thus obtaining an elderly psychological large language model.
[0076] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0077] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0078] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A training method for an elderly psychological large language model based on instruction synthesis, characterized in that the method comprises: acquiring elderly psychological corpus data, and transforming the elderly psychological corpus data based on an expert rule set to obtain rule-transformed data; preprocessing the rule-transformed data to obtain target elderly psychological data; determining multiple target instructions based on the target elderly psychological data and a preset instruction template; pre-training a general large language model based on the target elderly psychological data to obtain a pre-trained large language model; performing supervised fine-tuning on the pre-trained large language model based on all the target instructions to obtain a supervised fine-tuned large language model; and performing alignment training on the supervised fine-tuned large language model based on the expert rule set, and introducing a safety reward model to penalize unsafe behaviors during the alignment training process, thus obtaining an elderly psychological large language model.
2. The training method for an elderly psychological large language model based on instruction synthesis according to claim 1, characterized in that, The data preprocessing includes desensitization, corpus cleaning, and annotation. The process of preprocessing the rule-transformed data to obtain the target elderly psychological data includes: The desensitization process is performed on the rule-transformed data to obtain desensitized data; For each desensitized corpus in the desensitized data, the desensitized corpus is cleaned to obtain cleaned corpus; scoring prompts are constructed based on the cleaned corpus, and the scoring prompts are input into a first scoring model to obtain a cleaning score corresponding to the cleaned corpus output by the first scoring model; cleaned data is determined based on cleaned corpus whose cleaning score is greater than a first preset threshold; the first scoring model is a general large language model; The cleaned data is then labeled to obtain the target elderly psychological data.
3. The training method for an elderly psychological large language model based on instruction synthesis according to claim 2, characterized in that performing the desensitization on the rule-transformed data to obtain desensitized data comprises: for each rule-transformed corpus in the rule-transformed data, mapping the rule-transformed corpus using a feature mask set to obtain the desensitized corpus corresponding to that rule-transformed corpus; and determining the desensitized data based on all of the desensitized corpora.
4. The method for training a large-scale psychological language model of the elderly based on instruction synthesis according to claim 1, characterized in that, The process involves determining multiple target instructions based on the target elderly psychological data and a preset instruction template, including: Multiple basic instructions are generated based on the target elderly psychological data and the preset instruction template; For each of the basic instructions, the basic instructions are input into the scorer to obtain the instruction score corresponding to the basic instruction output by the scorer; The basic instructions whose instruction scores are greater than the second preset threshold are identified as instructions to be expanded. All the instructions to be expanded are expanded to obtain the target instructions.
5. The training method for an elderly psychological large language model based on instruction synthesis according to claim 4, characterized in that generating multiple basic instructions based on the target elderly psychological data and the preset instruction template comprises: inputting the target elderly psychological data into the entity-relation extraction large language model to obtain the target entities and target relations output by the entity-relation extraction large language model, where the entity-relation extraction large language model is obtained by fine-tuning a general large language model on the vocabulary of the elderly psychological domain; constructing an elderly psychological knowledge graph based on the target entities and the target relations; and generating the basic instructions based on the elderly psychological knowledge graph and the preset instruction template.
6. The training method for an elderly psychological large language model based on instruction synthesis according to claim 4, characterized in that the elderly psychological corpus data includes multiple elderly psychological corpora, and expanding all the instructions to be expanded to obtain the target instructions comprises: for each instruction to be expanded, inputting the instruction to be expanded into the expansion model to obtain the corresponding expanded instruction output by the expansion model, where the expansion model is obtained by training a general large language model on high-confidence safety instructions provided by experts; for each expanded instruction, constructing an instruction scoring prompt based on the expanded instruction and the corresponding elderly psychological corpus, and inputting the instruction scoring prompt into a second scoring model to obtain the consistency score output by the second scoring model, where the second scoring model is a general large language model; and determining the expanded instructions whose consistency scores are greater than the third preset threshold as the target instructions.
7. A training device for an elderly psychological large language model based on instruction synthesis, characterized in that the device comprises: an acquisition module, used to acquire elderly psychological corpus data and transform the elderly psychological corpus data based on an expert rule set to obtain rule-transformed data; a preprocessing module, used to preprocess the rule-transformed data to obtain target elderly psychological data; an instruction module, used to determine multiple target instructions based on the target elderly psychological data and a preset instruction template; a pre-training module, used to pre-train a general large language model based on the target elderly psychological data to obtain a pre-trained large language model; a supervised fine-tuning module, used to perform supervised fine-tuning on the pre-trained large language model based on all the target instructions to obtain a supervised fine-tuned large language model; and an alignment training module, used to perform alignment training on the supervised fine-tuned large language model based on the expert rule set, and to introduce a safety reward model to penalize unsafe behaviors during the alignment training process, thereby obtaining an elderly psychological large language model.
8. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the training method for an elderly psychological large language model based on instruction synthesis according to any one of claims 1 to 6.
9. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the training method for an elderly psychological large language model based on instruction synthesis according to any one of claims 1 to 6.
10. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the training method for an elderly psychological large language model based on instruction synthesis according to any one of claims 1 to 6.