Methods and devices for automatic generation and evaluation of psychological scales for a target population
By constructing an item knowledge base, using a large model to generate customized psychological scales, and combining this with multi-dimensional automatic evaluation, the invention addresses the lack of personalization in traditional scales. This achieves efficient, low-cost scale generation and evaluation and improves measurement accuracy and controllability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHANDONG UNIV
- Filing Date
- 2026-04-29
- Publication Date
- 2026-06-30
AI Technical Summary
Traditional psychological scales lack the ability to be customized for specific groups and situations, resulting in insufficient measurement accuracy and validity. Furthermore, the generation process relies on human experience, which is costly and time-consuming.
We construct an item knowledge base, extract expert factors and psychological constructs using a large model, generate customized psychological scales, and introduce a multi-dimensional automatic evaluation system by combining group characteristics and contextual information to ensure that the scales meet the requirements of psychometrics.
It achieves personalized and professional scale generation, reduces costs and time investment, improves measurement accuracy and controllability, forms a closed-loop optimization mechanism, ensures that the scale matches the subject's situation, and meets the requirements of psychometrics.
Figure CN122117191B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of healthcare informatics, specifically relating to a method and apparatus for automatically generating and evaluating psychological scales for a subject population.
Background Technology
[0002] Mental health issues are widespread globally, and their incidence and social impact continue to rise. Psychological scales, as important tools for mental health assessment and prevention, can enable early identification and tiered screening of individual risks, thereby reducing the long-term impact of mental disorders. However, traditional psychological testing scales are usually uniformly constructed by experts around specific psychological problems. Different test groups often use the same set of scales, with relatively generic item descriptions, emphasizing universality while neglecting individual differences. Furthermore, the occurrence of mental health problems is closely related to specific work and life situations, and the situations in which different groups are situated vary significantly. Due to the lack of customization capabilities for specific groups and situations, traditional scales struggle to accurately reflect the true experiences of test takers, thus affecting the accuracy of responses and the validity of the measurement. Therefore, it is necessary to design differentiated and diversified psychological testing scales based on the characteristics of the test taker groups to improve the relevance and effectiveness of the measurement.
[0003] The current development process for psychological scales heavily relies on the involvement of domain experts, resulting in long development cycles and high costs. With the advancement of artificial intelligence and natural language processing technologies, automated scale generation is gradually emerging as a promising alternative. The core challenges of automated psychological scale generation can be summarized into three issues: 1) How to ensure that the generated scale conforms to psychological measurement principles; 2) How to generate personalized scales for different subject groups; and 3) How to automatically evaluate the generated psychological scales.
[0004] Regarding the automatic generation of psychological scales, several patent documents have been published in the existing technical field. One approach is to summarize existing scale items into a question bank, and then generate new scales by extracting items from the question bank and rewriting the scenarios. Chinese patent document CN120613057A discloses an artificial intelligence-based method for generating and assisting in the assessment of psychological scales. This method automatically generates and optimizes psychological scales by constructing and updating a knowledge base for the psychological assessment industry and combining it with a large model, thereby assisting professionals in conducting psychological assessments and improving assessment efficiency and accuracy. Chinese patent document CN120878087A proposes a scenario-adaptive MBTI scale construction method. This method uses a large model to analyze the scenario information input by the user, generates a weight matrix for the four dimensions of MBTI, and uses this matrix to select candidate items from the question bank. An optimization algorithm is then used to determine the optimal combination of items, ultimately forming an MBTI psychological scale that matches a specific scenario. These methods mainly expand or reorganize existing scale resources, and their generation process essentially still relies on the reuse and rewriting of question bank items, lacking a contextualized and personalized generation mechanism for specific groups. Furthermore, the item extraction or combination generation methods focus on item-level optimization without constraining and validating the overall structure of the scale. This makes it difficult to ensure that the generated scale meets psychometric requirements in terms of factor structure and construct consistency, thus making it difficult to guarantee the structural validity of the scale.
[0005] Another approach focuses on subject characteristics to generate customized psychological scales that match individual traits. For example, Chinese patent document CN120356627A collects multimodal data from users, including text, voice, facial expressions, and drawings. It then uses machine learning algorithms and probabilistic graphical models to model users' psychological states and optimizes the calculation process through variational inference, outputting quantitative assessment results for each psychological dimension. Based on this, the scale generation module dynamically adjusts the item content, dimension weights, and option settings according to the analysis results, and optimizes the item combination by combining historical data to improve the scale's relevance and discrimination. However, while this type of method can generate personalized scales based on the subject's multidimensional characteristics, the quality verification of the generated results still mainly relies on the model's internal predictions or subsequent manual interpretation, lacking an independent and systematic automatic evaluation index system. In large-scale or continuous generation scenarios, relying on manual review or actual test feedback for quality verification is difficult to scale and cannot form a closed-loop optimization mechanism.
Summary of the Invention
[0006] To address the aforementioned technical problems, this invention provides a method and apparatus for the automatic generation and evaluation of psychological scales for different subject groups. The method is based on expert-designed expert factors, utilizes a large model to perform semantic analysis and structural induction on existing psychological scales, extracts psychological constructs and construct dimensions, and forms a knowledge base of psychological scale items. Based on this knowledge base, and given the scale design requirements such as target psychological constructs and subject group characteristics, the invention combines the item knowledge base with the structural validity constraint mechanism of the expert-designed psychological scales to achieve the generation of customized psychological scales for different subject groups. Furthermore, it proposes a multi-dimensional automatic evaluation system for psychological scales that meets psychometric requirements, covering multiple different dimensions and providing feedback information for scale content optimization.
[0007] To achieve the above objectives, the present invention adopts the following technical solution:
[0008] The method for automatically generating and evaluating psychological scales for a given group of test subjects includes the following steps:
[0009] S1. Constructing an item knowledge base:
[0010] S11. For an expert-designed psychological scale, the large model automatically extracts construct dimensions based on the expert factors and the psychological construct of that scale, generating a set of construct dimensions.
[0011] S12. Based on the generated set of construct dimensions, a construct dimension assignment prompt is built so that the large model automatically labels the construct dimension to which each scale item in the expert-designed psychological scale belongs. At the same time, a scoring tendency is introduced: based on a scoring tendency judgment prompt, the large model is guided to analyze the scoring tendency of each scale item in the expert-designed psychological scale. Each scale item, its psychological construct, its construct dimension, and its scoring tendency form a tuple, and the tuples are stored in the item knowledge base, completing its construction;
[0012] S2. Based on the psychological construct of the expected scale items and the item knowledge base constructed in step S1, group attributes and item format attributes are combined to design a customized item generation prompt that guides the large model to generate a customized psychological scale;
[0013] S3. Based on psychometric theory and scale design principles, evaluate the psychological scale generated in step S2.
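The tuple stored in step S12 can be pictured as a small record type. The following is a minimal sketch, not the patented implementation: the class names `ItemRecord` and `ItemKnowledgeBase`, the in-memory dictionary storage, and the sample item texts and dimension labels are all illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass

VALID_TENDENCIES = {"positive", "negative", "neutral"}


@dataclass(frozen=True)
class ItemRecord:
    """One knowledge-base tuple: item, construct, construct dimension, scoring tendency."""
    item: str
    construct: str
    dimension: str
    tendency: str


class ItemKnowledgeBase:
    """Hypothetical in-memory store, indexed by psychological construct."""

    def __init__(self):
        self._by_construct = defaultdict(list)

    def add(self, record):
        if record.tendency not in VALID_TENDENCIES:
            raise ValueError(f"unknown scoring tendency: {record.tendency!r}")
        self._by_construct[record.construct].append(record)

    def retrieve(self, construct):
        # Return a copy so callers cannot mutate the stored list.
        return list(self._by_construct.get(construct, []))


# Illustrative records; the item texts and dimension labels are made up.
kb = ItemKnowledgeBase()
kb.add(ItemRecord("I feel left out.", "loneliness", "social exclusion", "positive"))
kb.add(ItemRecord("I have people I can talk to.", "loneliness", "social support", "negative"))
```

Indexing by construct mirrors step S2, which retrieves items by the psychological construct of the expected scale items.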
[0014] Preferably, the large model mentioned in step S11 is any one of the GPT series model, Qwen series model, and DeepSeek series model.
[0015] Preferably, in step S11, a psychological construct induction prompt is designed, and the construct dimensions are extracted using a large model based on the designed psychological construct induction prompt and the expert factors.
[0016] More preferably, the psychological construct induction prompt consists of an induction requirement module and an induction example module. The induction requirement module includes the scale name, psychological construct, test purpose, expert factors, and scale items of the scale to be summarized; the induction example module includes already summarized psychological constructs and examples of their corresponding construct dimensions.
[0017] In step S11, the psychological construct induction prompt is input into the large model, guiding it to jointly analyze the scale items of the expert-designed psychological scale, the scale information (including the scale name, test purpose, and psychological construct), and the corresponding expert factors. Based on the semantic content of the scale items, the potential correlations between scale items, and the psychological meaning of the expert factors, the large model then summarizes the construct dimensions that represent different aspects of the psychological construct.
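The two-module structure of the induction prompt can be sketched as plain string assembly. This is an illustration under stated assumptions: the function name `build_induction_prompt`, the prompt wording, and the placeholder items and expert factors are hypothetical, and the actual prompt design table is not reproduced in this text.

```python
def build_induction_prompt(scale_name, construct, purpose, expert_factors, items, example):
    """Assemble an induction requirement module followed by an induction example module."""
    requirement = "\n".join([
        "You are a psychology expert. Summarize the construct dimensions of the scale below.",
        f"Scale name: {scale_name}",
        f"Psychological construct: {construct}",
        f"Test purpose: {purpose}",
        "Expert factors: " + "; ".join(expert_factors),
        "Scale items:",
        *[f"{number}. {text}" for number, text in enumerate(items, start=1)],
    ])
    example_module = "\n".join([
        "Example:",
        f"Psychological construct: {example['construct']}",
        "Construct dimensions: " + "; ".join(example["dimensions"]),
    ])
    return requirement + "\n\n" + example_module


prompt = build_induction_prompt(
    scale_name="UCLA Loneliness Scale",
    construct="loneliness",
    purpose="self-report evaluation of loneliness",
    expert_factors=["<expert factor 1>", "<expert factor 2>"],  # placeholders
    items=["<scale item 1>", "<scale item 2>"],                  # placeholders
    example={
        "construct": "depression",
        "dimensions": ["suicidal tendencies", "sadness or emptiness", "loss of interest"],
    },
)
```

Keeping the example module separate from the requirement module makes it possible to swap in a different worked example without touching the scale information.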
[0018] Preferably, the construct dimension assignment prompt in step S12 includes assignment instructions, assignment examples, and an output format.
[0019] More preferably, in step S12, building the construct dimension assignment prompt based on the generated set of construct dimensions, so that the large model automatically labels the construct dimension to which each scale item in the expert-designed psychological scale belongs, is specifically as follows:
[0020] The construct dimension assignment prompt is built from the provided psychological scale items, the corresponding psychological construct, and the set of construct dimensions extracted for that psychological construct. The prompt is input into the large model, which is guided by its assignment instructions, assignment examples, and preset output format to judge the correspondence between each scale item and the construct dimensions, thereby completing the construct dimension assignment of the scale items.
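Because the assignment relies on a preset output format, the model's reply can be checked mechanically before it enters the knowledge base. The sketch below assumes, purely for illustration, that the preset format is a JSON object mapping each item text to one dimension label. The text does not specify the format, and `parse_dimension_assignment` is a hypothetical helper.

```python
import json


def parse_dimension_assignment(raw_output, items, dimensions):
    """Validate a model reply that maps every item to one known construct dimension."""
    data = json.loads(raw_output)
    allowed = set(dimensions)
    assignment = {}
    for item in items:
        label = data.get(item)
        if label not in allowed:
            # Reject missing items and labels outside the extracted dimension set.
            raise ValueError(f"item {item!r} has invalid or missing dimension: {label!r}")
        assignment[item] = label
    return assignment


# Simulated model reply; no real model is called here.
reply = '{"I feel left out.": "social exclusion"}'
assignment = parse_dimension_assignment(
    reply,
    items=["I feel left out."],
    dimensions=["social exclusion", "social support"],
)
```

Rejecting labels outside the extracted set keeps the knowledge base consistent with the construct dimensions produced in step S11.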
[0021] Preferably, the scoring tendency described in step S12 includes three types: positive scoring, negative scoring, and neutral scoring;
[0022] Positive scoring refers to a positive correlation between the quantitative score of the psychological construct corresponding to the scale item and the psychological construct itself; that is, the higher the quantitative score of the subject on the scale item, the higher the level of the corresponding psychological characteristic. Negative scoring refers to a negative correlation between the quantitative score of the psychological construct corresponding to the scale item and the psychological construct itself; that is, the higher the quantitative score of the subject on the scale item, the lower the level of the corresponding psychological characteristic. In this case, reverse scoring is usually required during scale calculation. Neutral scoring refers to a situation where the quantitative score of the psychological construct corresponding to the scale item does not directly reflect the level of the psychological construct, or there is no clear positive or negative correlation between it and the psychological construct. It is mainly used for auxiliary description or situation guidance.
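The effect of the three scoring tendencies on a construct score can be illustrated with a simple summation. This sketch assumes a Likert-type response range (1 to 5 by default), the common reverse-scoring convention of minimum plus maximum minus response, and that neutral items are left out of the total. The function name and these conventions are illustrative assumptions, not taken from the source.

```python
def construct_score(responses, tendencies, scale_min=1, scale_max=5):
    """Sum item responses, reverse-scoring negative items and skipping neutral ones."""
    if len(responses) != len(tendencies):
        raise ValueError("responses and tendencies must have the same length")
    total = 0
    for value, tendency in zip(responses, tendencies):
        if tendency == "positive":
            total += value
        elif tendency == "negative":
            # Reverse scoring: a high response indicates a low level of the construct.
            total += scale_min + scale_max - value
        elif tendency != "neutral":
            raise ValueError(f"unknown scoring tendency: {tendency!r}")
    return total


# 4 (positive) + (1 + 5 - 2) (negative, reversed) + nothing (neutral)
score = construct_score([4, 2, 5], ["positive", "negative", "neutral"])
print(score)  # 8
```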
[0023] Guiding the large model, based on the scoring tendency judgment prompt, to analyze the scoring tendency of each scale item in the expert-designed psychological scale specifically means:
[0024] A scoring tendency judgment prompt is constructed that takes the psychological construct and the corresponding scale items as inputs and defines the judgment rules for the scoring tendency, wherein: when the semantic content of the scale item reflects that the subject has the psychological construct, it is judged as positive scoring; when the semantic content of the scale item reflects that the subject does not have the psychological construct, it is judged as negative scoring; when the semantic content of the scale item cannot clearly reflect the level of the psychological construct, it is judged as neutral scoring.
[0025] The scoring tendency judgment prompt is input into the large model, which is guided to infer the scoring tendency of each scale item from the relationship between the semantic content of the item and the psychological construct, and to output the corresponding scoring tendency result in a preset structured format.
[0026] Preferably, in step S2, the item knowledge base is searched based on the psychological construct of the expected scale items to obtain the corresponding scale items together with their construct dimensions and scoring tendencies. These are combined with the group attributes and item format attributes to design a customized item generation prompt, which is input into the large model to guide it in generating a customized psychological scale while satisfying the psychological construct.
[0027] Further preferably, the group attributes described in step S2 include information such as subject type, age group, occupational background, and related life or work scenarios; the item format attributes include the grammatical person of the scale items, the way the scale items are expressed, and the scoring tendency ratio.
[0028] More preferably, the customized item generation prompt includes dynamic modifier variables and scale generation instructions. The dynamic modifier variables include the description and dynamic selection of scoring tendencies, and the scale generation instructions include group settings, scenario settings, statement format settings, and output constraints. The original input consists of the psychological construct of the expected scale items and the corresponding scale items, construct dimensions, and scoring tendencies retrieved from the item knowledge base. When the customized item generation prompt is input into the large model, the large model, under the constraints of the psychological construct and construct dimensions, combines the characteristics of the target subject group and its corresponding scenario to replace, recombine, and rewrite the retrieved original items with situational events that reflect the group attributes and item format attributes. New scale items are then generated according to the set statement format and scoring tendency requirements, forming a customized psychological scale that matches the target group and meets psychometric requirements.
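A rough sketch of how the generation prompt might combine a retrieved item with group attributes, item format attributes, and a scoring tendency ratio is given below. The function names, the dictionary keys, the prompt wording, and the choice to split tendencies only into positive and negative are assumptions made for illustration.

```python
def tendency_plan(n_items, positive_ratio):
    """Pick a scoring tendency for each item to generate, following a target ratio."""
    if not 0.0 <= positive_ratio <= 1.0:
        raise ValueError("positive_ratio must be between 0 and 1")
    n_positive = round(n_items * positive_ratio)
    return ["positive"] * n_positive + ["negative"] * (n_items - n_positive)


def build_generation_prompt(record, tendency, group, item_format):
    """Combine one retrieved item with group, scenario, format, and output constraints."""
    return "\n".join([
        "Rewrite the original scale item for the target group.",
        f"Psychological construct: {record['construct']}",
        f"Construct dimension: {record['dimension']}",
        f"Original item: {record['item']}",
        f"Required scoring tendency: {tendency}",
        f"Group setting: {group['subject_type']}, {group['age_group']}, {group['occupation']}",
        f"Scenario setting: {group['scenario']}",
        f"Statement format: {item_format['person']} person, {item_format['style']}",
        "Output constraint: return exactly one rewritten item.",
    ])


plan = tendency_plan(10, 0.7)
prompt = build_generation_prompt(
    {"item": "I feel left out.", "construct": "loneliness", "dimension": "social exclusion"},
    plan[0],
    {"subject_type": "nurses", "age_group": "25-40", "occupation": "hospital staff",
     "scenario": "night shifts"},
    {"person": "first", "style": "declarative statement"},
)
```

Passing the construct and construct dimension through unchanged reflects the constraint that rewriting alters the situation, not what the item measures.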
[0029] Preferably, in step S3, the evaluation indicators include: psychological professionalism, subject group adaptability, item diversity, reliability and validity.
[0030] More preferably, the psychological professionalism indicators include psychological construct professionalism, which is used to assess whether the scale items focus on the psychological construct, and construct dimension professionalism, which is used to evaluate whether the construct dimension of an item corresponds to an expert factor, as follows:
[0031] The psychological construct professionalism $P_C$ has a score range of [0,1]. The closer it is to 1, the higher the level of professionalism of the psychological construct of the scale, and vice versa.
[0032] The formula for calculating the psychological construct professionalism indicator is as follows:
[0033] $$P_C = \frac{1}{|S'|} \sum_{q' \in S'} \mathbb{I}\big(\mathrm{LLM}(q', C)\big) \qquad (1)$$
[0034] where $C$ denotes the psychological construct of the expert-designed psychological scale; for each item $q'$ in the generated customized psychological scale $S'$, a large model is used to verify whether $q'$ falls within the scope of $C$, formally represented as $\mathrm{LLM}(q', C)$; and $\mathbb{I}(\cdot)$ is an indicator function that equals 1 if the condition is true and 0 otherwise.
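An indicator-based score with range [0,1] of this kind can be read as the share of generated items that pass a per-item check. The sketch below takes that reading as an assumption. The large-model judgment is replaced by a trivial keyword stand-in (`keyword_judge`) purely so the example runs, and the sample items are made up.

```python
def construct_professionalism(items, belongs_to_construct):
    """Fraction of items for which the judge says the item falls within the construct."""
    if not items:
        raise ValueError("the scale has no items")
    return sum(1 for item in items if belongs_to_construct(item)) / len(items)


def keyword_judge(item):
    # Stand-in for the large-model verification; a real system would query a model.
    return any(word in item.lower() for word in ("lonely", "alone", "left out"))


generated_scale = [
    "During night shifts I feel alone on the ward.",
    "I feel left out when my team makes plans.",
    "I feel lonely even when colleagues are nearby.",
    "I enjoy the food in the staff canteen.",
]
score = construct_professionalism(generated_scale, keyword_judge)
print(score)  # 0.75
```

Accepting the judge as a callable keeps the metric independent of which model, or which prompt, performs the verification.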
[0035] The construct dimension professionalism $P_D$ has a score range of [0,1]. The closer it is to 1, the higher the level of professionalism of the scale's construct dimensions, and vice versa.
[0036] The formula for calculating the construct dimension professionalism indicator is as follows:
[0037] $$P_D = \frac{1}{|S'|} \sum_{q' \in S'} \mathbb{I}\big(f(q'), e(q)\big) \qquad (2)$$
[0038] where $S'$ is the generated customized psychological scale; the function $f(q')$ represents the construct dimension of the newly generated customized scale item $q'$, extracted using a large model; $q$ is the item in the expert-designed psychological scale that corresponds to $q'$; $e(q)$ indicates the expert factor of item $q$; and $\mathbb{I}\big(f(q'), e(q)\big)$ indicates whether the construct dimension of the newly generated item and the expert factor of the corresponding expert-designed scale item co-occur in the scale: if they co-occur, it equals 1; otherwise, it equals 0;
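[0039] The formula for calculating the subject group adaptability indicator is as follows:
[0040] $$A_G = \frac{1}{|S'|\,|A|} \sum_{q' \in S'} \sum_{a \in A} \mathbb{I}\big(\mathrm{LLM}(q', a)\big) \qquad (3)$$
[0041] where $a$ comes from the attribute set $A$, which includes scoring tendency, subject group, scenario setting, and item format; for each item $q'$ of the generated customized psychological scale $S'$, a large model is used to determine whether the generated scale item conforms to the specified attribute, formally represented as $\mathrm{LLM}(q', a)$; and $\mathbb{I}(\cdot)$ is an indicator function that equals 1 if the condition is true and 0 otherwise;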
[0042] The item diversity indicators are divided into the diversity among items within a psychological scale and the diversity between the generated customized psychological scale and the expert-designed psychological scale. The Self-BLEU diversity index is used to measure the internal diversity of the generated customized psychological scale and is calculated using the following formula:
[0043] (4)
[0044] where the BLEU function computes the BLEU score, which is used to measure the similarity between one item of the newly generated customized psychological scale and the other items, and the normalizing count is the number of items in the customized psychological scale S';
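To make the Self-BLEU idea concrete, the sketch below averages a similarity score over all ordered pairs of items. Two things are assumptions rather than the source's definition: the pairwise averaging itself, since formula (4) is not reproduced in this text, and the similarity function `simple_bleu`, which is a deliberately simplified stand-in for BLEU (unigram and bigram precision with add-one smoothing and a brevity penalty) and not a standard library implementation.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified BLEU-style similarity in (0, 1]; not the full BLEU definition."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngram_counts(cand, n), ngram_counts(ref, n)
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        # Add-one smoothing keeps the logarithm defined when nothing overlaps.
        log_precision += math.log((overlap + 1) / (total + 1)) / max_n
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(log_precision)


def self_bleu(items):
    """Mean similarity over all ordered pairs of distinct items; lower means more diverse."""
    n = len(items)
    if n < 2:
        raise ValueError("need at least two items")
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    return sum(simple_bleu(items[i], items[j]) for i, j in pairs) / len(pairs)


near_duplicates = ["i feel alone at work", "i feel alone at home"]
varied = ["i feel alone at work", "my colleagues invite me to lunch"]
```

On these two toy scales the near-duplicate pair scores higher than the varied pair, which is the behavior a diversity check needs.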
[0045] The inter-scale diversity indicator is calculated as follows:
[0046] (5)
[0047] Cronbach's α coefficient is used to evaluate the reliability of the generated customized psychological scale.
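Cronbach's α has a standard closed form: with k items, α = k/(k-1) × (1 - sum of item variances / variance of total scores). The sketch below applies that textbook formula to a respondents-by-items score table. The function name and the sample response data are illustrative, and the scores are assumed to be reverse-scored already where needed.

```python
from statistics import variance


def cronbach_alpha(scores):
    """scores: one row per respondent, one column per item."""
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Made-up responses from five subjects to three items.
responses = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(responses)
```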
[0048] The Kaiser-Meyer-Olkin (KMO) measure is used to assess the construct validity of the generated customized psychological scale. The Pearson correlation coefficient is used to evaluate the consistency between the generated customized psychological scale and the expert-designed psychological scale in terms of construct validity.
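The Pearson correlation part of this check can be sketched directly from its definition. It is an assumption of this example that the two inputs are the total scores of the same subjects on the generated scale and on the expert-designed scale, since the text does not state which quantities are correlated. The KMO measure is not covered here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / (spread_x * spread_y)


# Made-up total scores of five subjects on the two scales.
generated_totals = [13, 7, 14, 5, 9]
expert_totals = [40, 25, 44, 20, 31]
r = pearson_r(generated_totals, expert_totals)
```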
[0049] The present invention also provides an apparatus for implementing the above method, the apparatus comprising:
[0050] The item knowledge base construction module is used to summarize and analyze existing expert-designed psychological scales, extract scale items, psychological constructs, and construct dimensions, and introduce scoring tendencies to build an item knowledge base to guide the scale generation process.
[0051] A module for generating customized psychological scales for a target group of subjects is used to incorporate subject characteristics and context into the generation process, so that scale items can be matched with the actual life scenarios of the target group and automatically generate customized psychological scales.
[0052] The multidimensional psychological scale automatic evaluation module is used to systematically evaluate the generated customized psychological scale from multiple dimensions such as psychological professionalism, subject group adaptability, item diversity, and reliability and validity, providing signals for scale optimization.
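How the three modules might cooperate can be sketched as a generate, evaluate, and regenerate loop. Everything specific here is an assumption for illustration: the single aggregate score, the acceptance threshold, the round limit, and the stub generator and evaluator that stand in for the large model and the evaluation module.

```python
def generate_with_feedback(generate, evaluate, threshold=0.8, max_rounds=3):
    """Regenerate until the evaluation score reaches the threshold or rounds run out."""
    feedback = None
    best_scale, best_score = None, float("-inf")
    for _ in range(max_rounds):
        scale = generate(feedback)
        score, feedback = evaluate(scale)
        if score > best_score:
            best_scale, best_score = scale, score
        if score >= threshold:
            break
    return best_scale, best_score


def stub_generate(feedback):
    # Stand-in for the generation module: reacts to any feedback by adding a scenario.
    if feedback is None:
        return ["I feel alone."]
    return ["I feel alone on the ward during night shifts."]


def stub_evaluate(scale):
    # Stand-in for the evaluation module: checks that items mention the work scenario.
    if all("ward" in item for item in scale):
        return 1.0, None
    return 0.0, "place the items in the target group's work scenario"


scale, score = generate_with_feedback(stub_generate, stub_evaluate)
```

Returning the best scale seen so far, and not only the last one, avoids losing a good result if a later round scores lower.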
[0053] Compared with the prior art, the present invention has the following beneficial effects:
[0054] (1) This invention forms a complete technical process for the generation and evaluation of psychological scales by three steps: constructing an item feature knowledge base, realizing the generation of customized scales for the subject group, and designing a multi-dimensional automatic evaluation system. This realizes the transformation of psychological scales from manual design to automated and controllable generation.
[0055] (2) This invention utilizes a large model to systematically analyze existing psychological scales and, in conjunction with scale design dimensions summarized by experts, extracts and structures the psychological expertise required for scale construction, thereby building an item feature knowledge base. This transforms the implicit expert experience scattered in existing scales into reusable structured knowledge, avoiding repeated reliance on manual experience for scale design. As a result, while ensuring professionalism, it significantly reduces development costs and time investment, and improves the controllability and scalability of the scale generation process.
[0056] (3) The method for automatically generating customized psychological scales for the target group proposed in this invention introduces subject characteristics and situational information as generation constraints, enabling scale items to match the actual life scenarios of the target group. This method can improve the subject's sense of situational engagement, conform to the objective law that psychological problems are closely related to individual situations, reduce the comprehension bias caused by abstract expressions in conventional scales, and improve the accuracy of subject responses and the authenticity of measurement results. At the same time, the introduction of psychometric structural constraints (psychological constructs and construct dimensions of expert-designed scales) in the generation process can ensure the consistency of scale in construct expression and factor structure, thereby taking into account both personalized needs and psychological structural validity.
[0057] (4) This invention discloses an automated evaluation system for multidimensional psychological scales that meets the requirements of psychometrics, and can comprehensively evaluate the quality of the generated scales. The multidimensional psychological scale evaluation system designed in this invention systematically evaluates the generated scales from multiple dimensions such as psychological professionalism, subject group adaptability, item diversity, and reliability and validity. It can transform the evaluation process that originally relied on manual review or actual administration into a calculable automated evaluation, which not only improves the quality control capability of large-scale scale generation, but also serves as a feedback signal to optimize the generation process, forming a closed-loop mechanism of generation, evaluation, and optimization, thereby continuously improving the quality and stability of scale generation.
Attached Figure Description
[0058] Figure 1 is the framework diagram for automatically generating customized psychological scales for the subject population as described in the method of this invention.
Detailed Implementation
[0059] Explanation of technical terms
[0060] Psychological scales are measurement tools based on psychometric theory. They consist of a set of structured items and are used to quantitatively assess an individual's psychological characteristics, mental state, or behavioral tendencies.
[0061] Scale items: These are the basic measurement units that make up a psychological scale. They are usually presented in the form of statements or questions and are used to guide subjects to self-report on their own psychological state, behavioral tendencies, or emotional experiences.
[0062] Psychological constructs refer to the core measurement targets of a psychological scale, namely, the psychological traits, psychological states, or specific psychological problems that need to be quantified and assessed. For example, the psychological construct of the UCLA Loneliness Scale is "loneliness".
[0063] Expert factors: These refer to potential psychological dimensions or influencing factors identified by domain experts during the scale construction process based on theoretical analysis and empirical induction.
[0064] Construct dimensions: These refer to key psychological dimensions obtained by further refining the expert factors. They are used to break down or reconstruct the underlying constructs described by the expert factors, making them easier for the large model to understand during generation. In this invention, they are obtained by the large model through induction and extraction based on existing expert-designed scales. For example, for the UCLA Loneliness Scale, its construct dimensions include "suicidal tendencies," "sadness or emptiness," and "loss of interest," among others.
[0065] The present invention will now be described in detail with reference to the embodiments and accompanying drawings. Obviously, the present invention can be implemented in many forms and is not limited to the embodiments described.
[0066] Example 1
[0067] A method for the automatic generation and evaluation of customized psychological scales for specific subject groups, as shown in Figure 1, includes:
[0068] The method for constructing an item knowledge base, used to improve the construct consistency and factor structure rationality of the generated psychological scale items, is as follows:
[0069] S1. Constructing an item knowledge base:
[0070] S11. For an expert-designed psychological scale, the large model automatically extracts construct dimensions based on the expert factors and the psychological construct of that scale, generating a set of construct dimensions.
[0071] Given that the large model has learned rich psychological knowledge from a vast pre-training corpus, we employ it to simulate the role of a psychology expert and analyze the latent construct dimensions in expert-designed psychological scales. Each expert-designed psychological scale has a psychological construct, which explains the specific purpose and intent of the scale design. For example, the UCLA Loneliness Scale (Russell DW, "UCLA Loneliness Scale (Version 3): Reliability, validity, and factor structure," Journal of Personality Assessment, 1996, Volume 66, Issue 1, pp. 20-40) tests the construct of "loneliness," and its purpose is "this scale is a self-report scale, primarily evaluating loneliness arising from the gap between the desire for social interaction and the actual level of such interaction." Experts apply factor analysis to data from a large number of subjects, using item correlations to reveal the relationships among several aspects of the scale and thereby forming expert factors. Inspired by this process, we refine expert factors into more granular construct dimensions. For example, for the psychological construct "depression," the expert factors are "core depression," "cognitive type," and "somatic symptoms." These expert factors can be further subdivided into multiple construct dimensions, such as "suicidal tendencies," "sadness or emptiness," and "loss of interest," in order to assess depressive symptoms and their severity from different perspectives.
[0072] The large model is a pre-trained generative language model with contextual understanding, semantic reasoning, and structured information extraction capabilities, including the GPT series models, the Qwen series models, and the DeepSeek series models.
[0073] Specifically, this invention designs a psychological construct induction prompt and, based on this prompt and the expert factors, uses a large model to extract the construct dimensions.
[0074] The psychological construct induction prompt consists of an induction requirement module and an induction example module. The induction requirement module includes the scale name, psychological construct, test purpose, expert factors, and scale items of the scale to be summarized, while the induction example module shows already summarized psychological constructs and examples of their corresponding construct dimensions, and clearly defines the output format.
[0075] In this step, the psychological construct induction prompt is input into the large model, guiding it to jointly analyze the scale items of the expert-designed psychological scale, the scale information (including the scale name, test purpose, and psychological construct), and the corresponding expert factors. Based on the semantic content of the scale items, the potential relationships between scale items, and the psychological meaning of the expert factors, the large model then summarizes the construct dimensions that represent different aspects of the psychological construct. An example design of the psychological construct induction prompt is shown below:
[0076] Table 1. Psychological Construct Induction Prompt Design Table
[0077]
[0078] S12. Based on the generated set of construct dimensions, a construct dimension assignment prompt is built so that the large model automatically labels the construct dimension to which each scale item in the expert-designed psychological scale belongs. At the same time, a scoring tendency is introduced to characterize the directional relationship between scale item scores and the target psychological construct: based on a scoring tendency judgment prompt, the large model is guided to analyze the scoring tendency of each scale item in the expert-designed psychological scale. Each scale item, its psychological construct, its construct dimension, and its scoring tendency form a tuple, and the tuples are stored in the item knowledge base, completing its construction;
[0079] To specifically map the extracted construct dimensions to the scale items in existing expert-designed psychological scales, we adopted the inductively derived set of construct dimensions, constructed construct dimension allocation prompts, and used a large model to automatically label the construct dimensions to which the scale items belong.
[0080] The construct dimension assignment hints include three parts: assignment instructions, assignment examples, and output format.
[0081] Based on the generated set of construct dimensions, construct dimension assignment prompts are constructed to enable the large model to automatically label the construct dimension to which each scale item in the expert-designed psychological scale belongs. Specifically:
[0082] Based on the provided psychological scale items, the corresponding psychological constructs, and the set of construct dimensions extracted from the psychological constructs, construct dimension allocation prompts are constructed. The construct dimension allocation prompts are input into the large model, which is guided to judge the correspondence between each scale item and the construct dimension according to the allocation instructions, allocation examples, and preset output format of the construct dimension allocation prompts, thereby completing the construct dimension allocation of the scale items.
[0083] Specifically as follows:
[0084] Table 2. Construct Dimension Allocation Prompts
[0085]
[0086] In expert-designed psychological scales, the subjects' responses to each scale item are mapped to numerical values, and a quantitative score for the corresponding psychological construct is generated through accumulation or weighted calculation. The direction in which this quantitative score changes with the item response is defined as the scoring tendency of the item;
[0087] The scoring tendency includes three types: positive scoring, negative scoring, and neutral scoring;
[0088] Positive scoring refers to a positive correlation between the quantitative score of the psychological construct corresponding to the scale item and the psychological construct itself; that is, the higher the quantitative score of the subject on the scale item, the higher the level of the corresponding psychological characteristic. Negative scoring refers to a negative correlation between the quantitative score of the psychological construct corresponding to the scale item and the psychological construct itself; that is, the higher the quantitative score of the subject on the scale item, the lower the level of the corresponding psychological characteristic. In this case, reverse scoring is usually required during scale calculation. Neutral scoring refers to a situation where the quantitative score of the psychological construct corresponding to the scale item does not directly reflect the level of the psychological construct, or there is no clear positive or negative correlation between it and the psychological construct. It is mainly used for auxiliary description or situation guidance.
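The effect of the three scoring tendencies on a construct score can be illustrated with a minimal Python sketch. It assumes a 1 to 5 response scale, an unweighted sum, and that neutral items are left out of the total; none of these choices is specified in the text, and the names `Tendency` and `construct_score` are illustrative only.

```python
from enum import Enum


class Tendency(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


def construct_score(responses, tendencies, scale_min=1, scale_max=5):
    """Sum item responses into a construct score.

    Positive items count as answered, negative items are reverse scored,
    and neutral items are skipped (an assumption of this sketch).
    """
    if len(responses) != len(tendencies):
        raise ValueError("responses and tendencies must have the same length")
    total = 0
    for response, tendency in zip(responses, tendencies):
        if not scale_min <= response <= scale_max:
            raise ValueError(f"response {response} is outside the scale range")
        if tendency is Tendency.POSITIVE:
            total += response
        elif tendency is Tendency.NEGATIVE:
            # Reverse scoring: the lowest answer maps to the highest value.
            total += scale_min + scale_max - response
    return total


example_total = construct_score(
    [5, 1, 3],
    [Tendency.POSITIVE, Tendency.NEGATIVE, Tendency.NEUTRAL],
)
```

In this example a subject who fully agrees with a positive item (5) and fully disagrees with a negative item (1) receives the maximum contribution from both, which is the intent of reverse scoring.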
[0089] Guiding the large model, based on the scoring tendency judgment prompt, to analyze the scoring tendency of each scale item in the expert-designed psychological scale specifically refers to:
[0090] A scoring bias assessment mechanism is constructed, using constructs and corresponding scale items as inputs. Rules for determining scoring bias are defined as follows: a scale item is considered positively scored when its semantic content reflects the subject's possession of the construct; a scale item is considered negatively scored when its semantic content reflects the subject's lack of the construct; and a scale item is considered neutrally scored when its semantic content does not clearly reflect the level of the construct. For example, when measuring the construct of "confidence level," "I can confidently cope with any difficulty" and "I often lack confidence when coping with difficulties" are two different scale items. If the subject believes they match the former description, it indicates high confidence, leading to a positive score for the construct, and thus a positive scoring bias for that item. Conversely, if the subject believes they match the latter description, it indicates low confidence, leading to a negative score, and thus a negative scoring bias for that item. When analyzing the scale items, we found that some scales had rather special options, which made it impossible to score the psychological constructs based on the content of the items. We classified these as neutral.
[0091] Next, the scoring tendency judgment prompts are input into the large model, guiding it to infer the scoring tendency of each scale item from the relationship between the item's semantic content and the psychological construct, and to output the result in the preset structured format.
[0092] Finally, for each scale item, the tuple consisting of the item, its psychological construct, its construct dimension, and its scoring tendency is added to the item knowledge base.
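One way to picture the resulting knowledge base is as a collection of four-field records indexed by psychological construct. The following Python sketch is illustrative only: the class names, the in-memory dictionary storage, and the dimension label "coping with difficulty" are assumptions, while the two example items are the "confidence level" items quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeEntry:
    item: str        # scale item text
    construct: str   # psychological construct the item measures
    dimension: str   # construct dimension assigned by the large model
    tendency: str    # "positive", "negative", or "neutral"


class ItemKnowledgeBase:
    """In-memory store of item tuples, grouped by psychological construct."""

    def __init__(self):
        self._by_construct = {}

    def add(self, entry):
        self._by_construct.setdefault(entry.construct, []).append(entry)

    def retrieve(self, construct):
        # Return a copy so callers cannot mutate the stored list.
        return list(self._by_construct.get(construct, []))


kb = ItemKnowledgeBase()
kb.add(KnowledgeEntry(
    "I can confidently cope with any difficulty",
    "confidence level", "coping with difficulty", "positive"))
kb.add(KnowledgeEntry(
    "I often lack confidence when coping with difficulties",
    "confidence level", "coping with difficulty", "negative"))

confidence_entries = kb.retrieve("confidence level")
```

Indexing by construct mirrors the later retrieval step, in which items are looked up by the psychological construct of the expected scale items.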
[0093] The aforementioned knowledge base construction method, centered on target psychological constructs, can perform structured modeling of scale items within a psychometric framework, ensuring consistency between construct expression and factor structure. This provides large-scale models with example items that match the psychological constructs, guiding the scale generation process to follow psychometric principles and improving the professionalism and construct validity of the generated scales.
[0094] This invention also includes PySIG, a method for automatically generating scales that incorporates a knowledge base and control information, to improve the controllability, population fit, and construct validity of scale generation, as detailed below:
[0095] S2. Based on the psychological constructs of the expected scale items and the item knowledge base constructed in step S1, customized item generation prompts are designed by combining group attributes and item format attributes, and these prompts guide the large model to generate customized psychological scales;
[0096] The group attributes are information describing the characteristics of the target subject group and its context. They constrain the generated scale items so that they match the actual living or working environment of the target group, and include information such as subject type, age group, occupational background, and related life or work scenarios. The item format attributes are constraint information controlling the expression form and structural characteristics of the generated scale items. They improve the diversity and standardization of the items, and include the grammatical person of the scale items, the expression mode of the scale items, and the scoring tendency ratio.
[0097] To adapt the scale to the characteristics of the test subjects, the items should reflect their living and working environment while preserving the underlying construct validity of the original scale. Group attributes are therefore introduced to describe these settings. For example, a user can specify a scale to assess the level of interpersonal trust among older adults and provide scenario settings, such as "interacting with children, interacting with neighbors, or living alone at home", to help the large model make associations. In this way, scale items can be generated to assess the psychological state of a specific age group in a specific environment, or the professional psychological characteristics of a particular occupational group.
[0098] Furthermore, to enhance the diversity and controllability of the scale items, this invention defines a series of item format attributes that constrain the item generation process, including the grammatical person of the item, its mode of expression, and the ratio of positively to negatively scored items.
[0099] The item knowledge base provides the necessary psychological knowledge as contextual information for generating the scale. Based on the psychological constructs of the expected scale items, the corresponding scale items in the item knowledge base are retrieved together with their construct dimensions and scoring tendencies. Combined with the group attributes and item format attributes, customized item generation prompts are designed and input into the large model, guiding it to generate a customized psychological scale that satisfies the psychological construct.
[0100] A diverse range of scoring items helps prevent malicious responses and the concealment of opinions by test takers. Introducing more negatively scored items can improve the effectiveness of the test. Therefore, before generating a customized psychological scale, it is necessary to set the ratio of positively and negatively scored items in the generated scale.
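As a minimal sketch of how such a ratio could be turned into a per-item plan before generation, the Python function below assigns a target scoring tendency to each item to be generated. The function name, the use of a single `positive_ratio` parameter, and the rounding rule are assumptions of this sketch rather than details given in the text.

```python
def plan_tendencies(n_items, positive_ratio):
    """Return a list of target scoring tendencies, one per item to generate.

    positive_ratio is the desired share of positively scored items; the
    remainder are negatively scored. Rounding uses Python's round().
    """
    if n_items < 0:
        raise ValueError("n_items must be non-negative")
    if not 0.0 <= positive_ratio <= 1.0:
        raise ValueError("positive_ratio must lie in [0, 1]")
    n_positive = round(n_items * positive_ratio)
    n_negative = n_items - n_positive
    return ["positive"] * n_positive + ["negative"] * n_negative


plan = plan_tendencies(10, 0.6)
```

Each entry of `plan` can then be passed as the dynamically selected scoring tendency when the prompt for the corresponding item is built.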
[0101] The customized item generation prompt consists of two parts: dynamic variable modification and scale generation instructions. The dynamic variable modification includes the description and dynamic selection of item scoring tendencies to ensure that the number of positive and negative scale items in the generated customized psychological scale meets the set positive and negative tendency ratio. The scale generation instructions include group settings, scenario settings, statement format settings, and output constraints. By inputting the customized item generation prompt into the large model, the large model, under the constraints of psychological constructs and construct dimensions, combines the characteristics of the target subject group and its corresponding scenario. It retrieves original items from the item knowledge base, which consist of the psychological constructs of the expected scale items and the retrieved corresponding scale items, construct dimensions, and scoring tendencies, and replaces, reorganizes, and rewrites them with situational events containing group attributes and item format attributes. Based on the set statement format and scoring tendency requirements, it generates new scale items, thereby forming a customized psychological scale that matches the target group and meets psychometric requirements.
[0102] Specifically as follows:
[0103] Table 3. Customized Item Generation Prompts
[0104]
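The assembly of a customized item generation prompt from the dynamic variable (the scoring tendency) and the generation instructions (group, scenario, statement format, output constraint) might look like the following Python sketch. The template wording, the function name, the output key `new_item`, and the reference item are hypothetical and are not the prompt text of Table 3; the group and scenario values reuse the older-adult interpersonal trust example given above.

```python
def build_generation_prompt(construct, dimension, source_item, tendency,
                            group, scenarios, person):
    """Assemble one item generation prompt from retrieved knowledge and
    control attributes. The wording is a hypothetical template."""
    lines = [
        "You are a psychologist writing one new scale item.",
        f"Psychological construct: {construct}",
        f"Construct dimension: {dimension}",
        f"Reference item: {source_item}",
        f"Scoring tendency of the new item: {tendency}",
        f"Target group: {group}",
        "Scenarios: " + "; ".join(scenarios),
        f"Grammatical person: {person}",
        'Return JSON with the single key "new_item".',
    ]
    return "\n".join(lines)


prompt = build_generation_prompt(
    construct="interpersonal trust",
    dimension="trust in close relations",      # hypothetical dimension label
    source_item="Most people can be trusted",  # hypothetical reference item
    tendency="negative",
    group="older adults",
    scenarios=["interacting with children",
               "interacting with neighbors",
               "living alone at home"],
    person="first person",
)
```

Keeping the scoring tendency as a separate argument lets the same template be reused for every item in a planned positive and negative mix.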
[0105] This invention also includes a multidimensional evaluation system for psychological scales, which automatically evaluates the quality of the generated scales from multiple dimensions to support iterative updates, as detailed below:
[0106] S3. Based on psychometric theory and scale design principles, evaluate the psychological scale generated in step S2.
[0107] Scale evaluation focuses on two questions: (1) whether the generated items meet the requirements of psychological experts; and (2) whether the generated scale suits the characteristics of the subject group. Based on psychometric theory and scale design principles, this invention proposes a systematic scale evaluation index system that evaluates the scale from four aspects: psychological professionalism, subject group adaptability, item diversity, and reliability and validity.
[0108] Psychological professionalism is used to evaluate the reasonableness of the generated psychological scales under the principles of psychometrics, including whether they conform to psychological constructs and construct dimensions;
[0109] The psychological professionalism indicators comprise psychological construct professionalism, which assesses whether an item focuses on the target psychological construct, and construct dimension professionalism, which assesses whether the construct dimension of an item corresponds to an expert factor.
[0110] The psychological construct professionalism index takes values in [0,1]. The closer the value is to 1, the higher the psychological construct professionalism of the scale, and vice versa.
[0111] The formula for calculating the psychological construct professionalism index is as follows:
[0112] (1)
[0113] In formula (1), the reference construct is the psychological construct of the expert-designed psychological scale. For each item in the generated customized psychological scale, a large model is used to verify whether the item falls within the scope of that psychological construct. The index is the proportion of items in the generated scale that conform to the psychological construct, computed with an indicator function that equals 1 if the condition is true and 0 otherwise.
[0114] The construct dimension professionalism index takes values in [0,1]. The closer the value is to 1, the higher the construct dimension professionalism of the scale, and vice versa.
[0115] The formula for calculating the construct dimension professionalism index is as follows:
[0116] (2)
[0117] In formula (2), a large model is used to extract the construct dimension of each newly generated customized scale item, and each generated item is paired with its corresponding item in the expert-designed psychological scale and with that item's expert factor. An indicator function tests whether the construct dimension of the newly generated item and the expert factor of the corresponding expert-designed item co-occur: it equals 1 if they co-occur and 0 otherwise;
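Both professionalism indices are described as proportions of items that pass an indicator test, so they can be sketched in a few lines of Python. In this sketch the large model is replaced by plain callables (`in_construct`, `extract_dimension`), and co-occurrence is modeled as membership in a set of (dimension, expert factor) pairs; these stand-ins, the function names, and the sample data are assumptions, not the formulas of the original.

```python
def construct_professionalism(items, target_construct, in_construct):
    """Share of generated items judged to fall within the target construct."""
    if not items:
        raise ValueError("items must not be empty")
    hits = sum(1 for item in items if in_construct(item, target_construct))
    return hits / len(items)


def dimension_professionalism(items, expert_factors, extract_dimension,
                              known_pairs):
    """Share of generated items whose extracted dimension co-occurs with
    the expert factor of the paired expert-designed item."""
    if not items or len(items) != len(expert_factors):
        raise ValueError("items and expert_factors must align and be non-empty")
    hits = sum(
        1 for item, factor in zip(items, expert_factors)
        if (extract_dimension(item), factor) in known_pairs
    )
    return hits / len(items)


# Keyword stubs stand in for large model calls so the sketch runs offline.
def keyword_judge(item, construct):
    return "trust" in item.lower()


def keyword_dimension(item):
    return "family trust" if "children" in item.lower() else "community trust"


items = ["I trust my children to help me",
         "I trust my neighbors",
         "I enjoy gardening"]
factors = ["trust in relatives", "trust in others", "trust in others"]
pairs = {("family trust", "trust in relatives"),
         ("community trust", "trust in others")}

pc_score = construct_professionalism(items, "interpersonal trust", keyword_judge)
dim_score = dimension_professionalism(items, factors, keyword_dimension, pairs)
```

With these stubs, two of the three items are judged to belong to the construct, and all three extracted dimensions co-occur with their paired expert factors.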
[0118] Subject group adaptability is used to evaluate the degree of matching between the generated scale and the characteristics and life situations of the target subject group, reflecting the applicability and contextual fit of the scale in different groups;
[0119] The formula for calculating the subject group adaptability index is as follows:
[0120] (3)
[0121] In formula (3), each attribute is drawn from the specified attribute set, which covers scoring tendency, subject group, scenario setting, and statement format. For each item of the generated customized psychological scale, a large model is used to determine whether the item conforms to the specified attribute;
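A possible reading of this index, consistent with the per-attribute results reported later for the comparison experiment, is one conformity proportion per attribute. The Python sketch below follows that reading; the per-attribute breakdown, the function name, the keyword-based stub that replaces the large model, and the sample items are all assumptions.

```python
def group_adaptability(items, attributes, conforms):
    """For each specified attribute, return the share of generated items
    judged to conform to it."""
    if not items:
        raise ValueError("items must not be empty")
    return {
        name: sum(1 for item in items if conforms(item, name, value)) / len(items)
        for name, value in attributes.items()
    }


# Keyword stub standing in for the large model's conformity judgment.
def stub_conforms(item, name, value):
    if name == "scenario":
        return "neighbor" in item.lower()
    if name == "person":
        return item.startswith("I ")
    return True


items = ["I trust my neighbors", "My neighbors help me", "I feel safe at home"]
scores = group_adaptability(
    items,
    {"scenario": "interacting with neighbors", "person": "first person"},
    stub_conforms,
)
```

Reporting one score per attribute makes it easy to see which constraint (for example scenario or statement format) the generator follows least reliably.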
[0122] Item diversity is used to evaluate the differences in the expression and semantic content of scale items, and to avoid item repetition or monotonous expression;
[0123] The diversity indicators are divided into diversity among items within the scale and diversity between the generated scale and the original scale. The Self-BLEU diversity assessment indicator is used to measure the internal diversity of the generated scale. The formula for calculating the internal diversity index is as follows:
[0124] (4)
[0125] In formula (4), the BLEU function measures the similarity between two items of the newly generated customized psychological scale, and the item count is the number of items in the customized psychological scale S'.
[0126] This formula is used to measure the diversity among newly generated scale items. The higher the internal diversity, the stronger the differences in semantic expression and measurement perspective among the items in the newly generated scale, thus reducing content duplication.
[0127] To measure the difference between the generated scale items and the original scale items, the BLEU score is calculated for each pair consisting of a generated item and its corresponding original item. A higher inter-scale diversity index indicates a greater difference between the newly generated scale and the original scale, and therefore better diversity. The formula for calculating the inter-scale diversity index is as follows:
[0128] (5)
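The two diversity indices can be sketched in Python under explicit assumptions. Because BLEU is a similarity score while both indices are described as increasing with difference, this sketch takes each index as one minus an average BLEU score; that form, the pairwise averaging, and the simplified BLEU (clipped unigram and bigram precision with a brevity penalty, whitespace tokenization, no smoothing) are assumptions and may differ from formulas (4) and (5).

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=2):
    """Simplified sentence BLEU against a single reference."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or overlap == 0:
            return 0.0
        log_sum += math.log(overlap / total)
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(log_sum / max_n)


def internal_diversity(items):
    """One minus the mean pairwise BLEU among generated items."""
    n = len(items)
    if n < 2:
        raise ValueError("at least two items are required")
    total = sum(bleu(items[i], items[j])
                for i in range(n) for j in range(n) if i != j)
    return 1.0 - total / (n * (n - 1))


def inter_scale_diversity(generated, originals):
    """One minus the mean BLEU between each generated item and its original."""
    if not generated or len(generated) != len(originals):
        raise ValueError("generated and originals must align and be non-empty")
    return 1.0 - sum(bleu(g, o) for g, o in zip(generated, originals)) / len(generated)
```

Under these definitions a scale made of identical items has internal diversity 0, and a scale whose items share no words has internal diversity 1. A production setup would more likely rely on an established BLEU implementation with smoothing.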
[0129] Reliability and validity are used to evaluate the statistical stability and measurement accuracy of the generated psychological scale, including the consistency of results, the validity of construct measurement, and the performance of construct validity.
[0130] Traditional reliability and validity analysis relies on human subjects completing the scales, which typically requires significant human resources and is ill-suited to large-scale, automated scale generation. To address this issue, this embodiment uses a large model to simulate human subjects completing the scale, thereby enabling reliability and validity analysis. Reliability primarily measures the scale's measurement precision, stability, and internal consistency; this embodiment evaluates it using Cronbach's α coefficient.
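For reference, Cronbach's α for a scale of k items is k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The Python sketch below computes it from a response matrix, whether the responses come from human subjects or from simulated ones; the function name and data layout (one row per respondent) are assumptions, and negatively scored items are assumed to have been reverse scored beforehand.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(responses) < 2:
        raise ValueError("at least two respondents are required")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("each respondent needs the same k >= 2 item scores")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the total score across respondents.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


alpha_consistent = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
alpha_mixed = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
```

The first example, in which both items rank respondents identically, gives the maximum value of 1; the second gives a lower value because the items partly disagree.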
[0131] Construct validity reflects the degree of fit between measurement items and latent variables. In this embodiment, it is measured using the Kaiser-Meyer-Olkin (KMO) method. Furthermore, given the one-to-one correspondence between the automatically generated customized psychological scale items and the expert-designed psychological scale items, this embodiment uses a large model to evaluate the consistency between the generated scale and the expert-designed psychological scale in terms of construct validity.
[0132] This embodiment generates 19 customized test scales for student groups, middle-aged working groups, and elderly groups. These include 6 mental health scales, 2 behavioral test scales, and 11 cognitive attitude test scales. The scale types, titles, and abbreviations are as follows.
[0133] Table 4 Customized Scale
[0134]
[0135] To ensure the quality of the generated scales, this invention uses qwen-max as the large model in testing to complete the induction of construct dimensions, the allocation of construct dimensions, and the generation of customized items within the framework. On all scales, both the psychological construct professionalism index and the construct dimension professionalism index exceeded 0.8, indicating that the generated scale items have strong psychological professionalism.
[0136] Regarding subject group adaptability, compared with the Base method, which simply calls a large model to generate a scale, this invention outperforms the comparison method on all four aspects: scoring tendency, subject group characteristics, subjects' life scenarios, and statement format. This indicates that the scale generated by the present invention fits the subject population more closely.
[0137] The prompt used by the Base method is as follows: "You are a psychologist. Please refer to the following scale items and generate a new item to measure the subject's {main_variable}. Item: {item}. Please output in JSON format with the key 'new item'."
[0138] Table 5. Comparison of subject group adaptability between the customized psychological scales generated by the method of the present invention and those generated by the comparison method
[0139]
[0140] Based on the above application examples, in order to further verify the effectiveness and reliability of the psychological scale generated by the present invention, a data evaluation experiment based on real subjects was carried out.
[0141] Specifically, the items of the expert-designed psychological scale are mixed and arranged with the items of the customized psychological scale generated by this invention, and tests are conducted on different subject groups, including students, general adults, and the elderly. Based on the collected response data, the reliability and validity of the generated scale are statistically analyzed.
[0142] Table 6. Results of Manual Assessment of Customized Psychological Scales
[0143]
[0144] Regarding reliability, the Cronbach's α coefficients for the three groups were 0.841, 0.831, and 0.847, respectively, all above 0.8, indicating good internal consistency of the scale. Regarding construct validity, the KMO indices were 0.794, 0.763, and 0.802, respectively, all above 0.7, and Bartlett's test of sphericity was significant (p<0.05), indicating that the data are suitable for factor analysis. Regarding structural consistency, the Pearson correlation coefficients between the generated scale and the expert-designed psychological scale reached 0.752, 0.786, and 0.769, respectively, indicating that the generated scale maintains a high degree of structural consistency with the expert-designed scale. These results demonstrate that the psychological scale generated by this invention not only meets the requirements of psychological theory at the content level but also satisfies psychometric standards in terms of statistical characteristics, possessing good reliability and validity, and can be used in practical psychological assessment scenarios.
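The structural consistency statistic reported above is a Pearson correlation coefficient. A minimal Python sketch follows, assuming that the two inputs are each subject's total score on the generated scale and on the expert-designed scale; the text does not state which paired quantities were correlated, so that pairing and the function name are assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / (spread_x * spread_y)


# Hypothetical total scores of four subjects on the two scales.
generated_totals = [12, 15, 9, 20]
expert_totals = [14, 16, 10, 19]
r = pearson_r(generated_totals, expert_totals)
```

A value close to 1 indicates that subjects who score high on the expert-designed scale also score high on the generated scale.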
[0145] Example 2
[0146] This invention discloses an apparatus for generating a psychological scale based on the target psychological problem, subject characteristics, and item attributes in the scale.
[0147] This embodiment provides an apparatus for automatically generating and evaluating customized psychological scales for a given group of test subjects. The apparatus includes:
[0148] The item knowledge base construction module is used to summarize and analyze existing expert-designed psychological scales, extract scale items, psychological constructs, construct dimensions, and introduce information such as scoring tendencies, and build an item knowledge base to guide the scale generation process.
[0149] A module for generating customized psychological scales for a target group of subjects is used to incorporate subject characteristics and context into the generation process, so that scale items can be matched with the actual life scenarios of the target group and automatically generate customized psychological scales.
[0150] The multidimensional psychological scale automatic evaluation module is used to systematically evaluate the generated customized psychological scale from multiple dimensions such as psychological professionalism, subject group adaptability, item diversity, reliability and validity, and provide signals for scale optimization.
Claims
1. A method for automatically generating and evaluating psychological scales for a subject population, characterized in that the method comprises the following steps: S1. Constructing an item knowledge base: S11. For an expert-designed psychological scale, a large model automatically extracts construct dimensions based on the expert factors and psychological constructs of the expert-designed psychological scale, and generates a set of construct dimensions; S12. Based on the generated set of construct dimensions, construct dimension assignment prompts are built so that the large model automatically labels the construct dimension to which each scale item in the expert-designed psychological scale belongs; at the same time, a scoring tendency is introduced, and based on scoring tendency judgment prompts the large model is guided to analyze the scoring tendency of each scale item in the expert-designed psychological scale, forming for each scale item a tuple consisting of the item, its psychological construct, its construct dimension, and its scoring tendency; the tuples are then stored in the item knowledge base to complete the construction of the item knowledge base; S2. Based on the psychological constructs of the expected scale items and the item knowledge base constructed in step S1, customized item generation prompts are designed by combining group attributes and item format attributes, to guide the large model in generating customized psychological scales; S3. Based on psychometric theory and scale design principles, the psychological scale generated in step S2 is evaluated; wherein step S11 comprises designing psychological construct induction prompts and, using these prompts and the expert factors, employing the large model to extract the construct dimensions, specifically: by inputting the psychological construct induction prompts into the large model, the model is guided to conduct a joint analysis of the scale items of the expert-designed psychological scale, including the scale name, the test purpose, the scale information of the psychological construct, and the corresponding expert factors; and based on the semantic content of the scale items, the potential correlations between the scale items, and the psychological meaning of the expert factors, construct dimensions representing different aspects of the psychological construct are further summarized.
2. The method for automatically generating and evaluating psychological scales for a subject population according to claim 1, characterized in that, The large model mentioned in step S11 can be any one of the GPT series model, Qwen series model, and DeepSeek series model.
3. The method for automatically generating and evaluating psychological scales for a subject population according to claim 2, characterized in that, The psychological construct induction prompt consists of an induction requirement module and an induction example module. The induction requirement module includes the scale name, psychological construct, test purpose, expert factors, and scale items of the scale to be induced. The induction example module includes the induced psychological constructs and their corresponding construct dimension examples.
4. The method for automatically generating and evaluating psychological scales for a subject population according to claim 1, characterized in that, in step S12, the construct dimension allocation prompt includes allocation instructions, allocation examples, and an output format; the scoring tendency includes three types: positive scoring, negative scoring, and neutral scoring.
5. The method for automatically generating and evaluating psychological scales for a subject population according to claim 4, characterized in that, in step S12, constructing construct dimension assignment prompts based on the generated construct dimension set, so that the large model automatically labels the construct dimension to which each scale item in the expert-designed psychological scale belongs, specifically refers to: based on the provided psychological scale items, the corresponding psychological constructs of the scale items, and the set of construct dimensions extracted from the psychological constructs, construct dimension allocation prompts are constructed; the construct dimension allocation prompts are input into the large model, and the large model is guided to judge the correspondence between each scale item and the construct dimension according to the allocation instructions, allocation examples, and preset output format of the construct dimension allocation prompts, thereby completing the construct dimension allocation of the scale items; guiding the large model, based on the scoring tendency judgment prompt, to analyze the scoring tendency of each scale item in the expert-designed psychological scale specifically refers to: a scoring tendency judgment prompt is constructed, taking psychological constructs and corresponding scale items as inputs, and defining the judgment rules for scoring tendency, wherein: when the semantic content of the scale item reflects that the subject has the psychological construct, it is judged as positive scoring; when the semantic content of the scale item reflects that the subject does not have the psychological construct, it is judged as negative scoring; when the semantic content of the scale item cannot clearly reflect the level of the psychological construct, it is judged as neutral scoring; the scoring tendency judgment prompts are input into the large model, which is then guided to infer the scoring tendency of the scale items based on the relationship between the semantic content of the scale items and the psychological constructs, and to output the corresponding scoring tendency results in a preset structured format.
6. The method for automatically generating and evaluating psychological scales for a subject population according to claim 1, characterized in that, in step S2, based on the psychological constructs of the expected scale items, the item knowledge base is searched to retrieve the corresponding scale items together with their construct dimensions and scoring tendencies; combined with the group attributes and item format attributes, customized item generation prompts are designed and input into the large model, guiding the large model to generate a customized psychological scale that satisfies the psychological construct.
7. The method for automatically generating and evaluating psychological scales for a subject population according to claim 6, characterized in that, the group attributes include information such as subject type, age group, occupational background, and related life or work scenarios; the item format attributes include the grammatical person of the scale items, the expression mode of the scale items, and the scoring tendency ratio; the customized item generation prompts include dynamically modified variables and scale generation instructions, wherein the dynamically modified variables include the description and dynamic selection of scoring tendencies, and the scale generation instructions include group settings, scenario settings, statement format settings, and output constraints; by inputting the customized item generation prompts into the large model, the large model, under the constraints of the psychological constructs and construct dimensions and in combination with the characteristics of the target subject group and its corresponding scenarios, takes the original items retrieved from the item knowledge base, which consist of the psychological constructs of the expected scale items and the retrieved corresponding scale items, construct dimensions, and scoring tendencies, and replaces, reorganizes, and rewrites them with situational events containing the group attributes and item format attributes; based on the set statement format and scoring tendency requirements, it generates new scale items, thereby forming a customized psychological scale that matches the target group and meets the requirements of psychometrics.
8. The method for automatically generating and evaluating psychological scales for a subject population according to claim 1, characterized in that, The evaluation indicators in step S3 include: psychological professionalism, subject group adaptability, item diversity, reliability and validity.
9. The method for automatically generating and evaluating psychological scales for a subject population according to claim 8, characterized in that, the psychological professionalism indicators include psychological construct professionalism and construct dimension professionalism, as follows: the psychological construct professionalism index takes values in [0,1]; the closer the value is to 1, the higher the psychological construct professionalism of the scale, and vice versa; the psychological construct professionalism index is calculated according to formula (1), in which the reference construct is the psychological construct of the expert-designed psychological scale; for each item in the generated customized psychological scale, a large model is used to verify whether the item falls within the scope of that psychological construct; and the indicator function equals 1 if the condition is true and 0 otherwise; the construct dimension professionalism index takes values in [0,1]; the closer the value is to 1, the higher the construct dimension professionalism of the scale, and vice versa; the construct dimension professionalism index is calculated according to formula (2), in which a large model is used to extract the construct dimension of each newly generated customized scale item, each generated item is paired with its corresponding item in the expert-designed psychological scale and with that item's expert factor, and an indicator function tests whether the construct dimension of the newly generated item and the expert factor of the corresponding expert-designed item co-occur, equaling 1 if they co-occur and 0 otherwise; the subject group adaptability index is calculated according to formula (3), in which each attribute is drawn from the specified attribute set covering scoring tendency, subject group, scenario setting, and statement format, and, for each item of the generated customized psychological scale, a large model is used to determine whether the item conforms to the specified attribute; the item diversity index is divided into the diversity among items within the psychological scale and the diversity between the generated customized psychological scale and the expert-designed psychological scale; the Self-BLEU diversity assessment index is used to measure the internal diversity of the generated customized psychological scale, calculated according to formula (4), in which the BLEU function measures the similarity between two items of the newly generated customized psychological scale and the item count is the number of items in the psychological scale S'; the inter-scale diversity index is calculated according to formula (5); Cronbach's α coefficient is used to assess the reliability of the generated customized psychological scale; the Kaiser-Meyer-Olkin measure is used to assess the construct validity of the generated customized psychological scale; and the Pearson correlation coefficient is used to evaluate the consistency between the generated customized psychological scale and the expert-designed psychological scale in terms of construct validity.
10. An apparatus for implementing the automatic generation and evaluation method of psychological scales for a subject population as described in any one of claims 1-9, characterized in that, The device includes: The item knowledge base construction module is used to summarize and analyze existing expert-designed psychological scales, extract scale items, psychological constructs, and construct dimensions, and introduce scoring tendencies to build an item knowledge base to guide the scale generation process. A module for generating customized psychological scales for a target group of subjects is used to incorporate subject characteristics and context into the generation process, so that scale items can be matched with the actual life scenarios of the target group and automatically generate customized psychological scales. The multidimensional psychological scale automatic evaluation module is used to systematically evaluate the generated customized psychological scale from multiple dimensions such as psychological professionalism, subject group adaptability, item diversity, and reliability and validity, providing signals for scale optimization.