Student mental health data processing method based on feature extraction and model analysis
By constructing a mental health database and iteratively supplementing missing items, and utilizing multidimensional statistical features and standardized risk level analysis, the problems of poor comparability of historical data and unstable risk labels in student mental health assessments were solved. This achieved accurate data supplementation and consistency in risk prediction, thereby improving the accuracy and stability of the assessment.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HUNAN ANZHI NETWORK TECH CO LTD
- Filing Date
- 2026-03-09
- Publication Date
- 2026-06-16
AI Technical Summary
Existing methods for assessing the mental health of middle school students suffer from poor comparability of historical data, errors in item completion, and instability of risk labels. They are unable to effectively handle data comparability issues caused by multiple versions of questionnaires and cannot fully consider the impact of missing items on assessment results. In particular, when questionnaire versions, scoring boundaries, and data collection standards change, old data is difficult to recalculate or align according to the new standards, affecting the accuracy and robustness of the model.
By collecting detailed and summarized information on questionnaire items, a mental health database is constructed. The statistics of items included in the database are calculated according to the questionnaire version number. Missing items are iteratively supplemented. Multidimensional statistical features are used to quantify the differences between the supplemented samples and the distribution of the same version. Reliable data are screened and processed in layers. Deviation and stability analysis of data related to risk levels and item distribution with unified standards is performed. Structured features are generated and a mental health risk prediction model is constructed to achieve prediction of mental health risk levels, error tracking, and closed-loop monitoring of version drift.
It enables accurate completion of missing data, ensuring data integrity, and solves the problem of inaccurate analysis caused by differences in data quality. It also enables automatic label correction and consistency of risk prediction results, accurately predicts students' mental health risks and monitors version drift, thus improving the accuracy and stability of the assessment.
Abstract
Description
Technical Field
[0001] This invention relates to the field of data processing technology, specifically to a method for processing student mental health data based on feature extraction and model analysis.
Background Technology
[0002] Student mental health assessment is an important research topic in the field of education. With the increasing severity of mental health problems, how to accurately assess and predict students' mental health status has become an urgent problem to be solved. In current technology, mental health assessment usually relies on questionnaires, based on self-assessment scales or standardized testing tools, to classify risk levels by collecting students' responses.
[0003] For example, invention patent announcement CN120236722B discloses a method and system for identifying student psychological states based on multidimensional data analysis, which relates to the field of data processing technology and aims at precise and efficient student mental health management. First, data cleaning and standardization eliminate invalid information and unify data dimensions, significantly improving the reliability of analysis. Through a collaborative analysis mechanism, data features of different dimensions can be dynamically associated. The adaptive adjustment function of the feature extraction process automatically optimizes the algorithm weights according to data quality, ensuring the accuracy of key psychological indicators. Through data-driven closed-loop management, mental health monitoring is transformed from passive response to proactive prevention, improving the efficiency of school psychological crisis intervention while providing scientific support for educational decision-making and promoting the precise allocation of mental health resources.
[0004] For example, the invention patent publication number CN118585631A discloses a method and system for intelligent recommendation of psychological test questions based on knowledge graphs. This method extensively collects test questions from domestic and international psychological testing websites, accumulating a large data foundation for question recommendation. By extracting features and processing the collected questions, a knowledge graph is constructed based on the processing results and input into an intelligent recommendation model to achieve intelligent question recommendation. This solves the technical problems of existing technologies that rely on manual division and recommendation of each question, which are labor-intensive, inefficient, and prone to errors. Furthermore, this solution can determine the direction, theme, and purpose of the test based on market and customer needs, such as psychological risk factor testing for weeding out underperformers or comprehensive psychological quality testing to understand the psychological characteristics of test takers. Based on the relationship between the test population and the intelligently recommended test questions, it recommends specific examination content and suitable test materials for different test populations.
[0005] However, these assessment methods have certain limitations. They cannot effectively handle the data comparability issues caused by multiple versions of questionnaires, do not fully consider the impact of missing items on the assessment results, and fail to combine historical data effectively for accurate risk prediction during the assessment process. In particular, when the questionnaire version, scoring boundaries, and data collection standards change, it is difficult to recalculate or align the old data according to the new standards, resulting in inaccurate comparison of cross-version data and affecting the accuracy and robustness of the model.
Summary of the Invention
[0006] Technical problems to be solved
[0007] To address the shortcomings of existing technologies, this invention provides a student mental health data processing method based on feature extraction and model analysis, which solves the problems of poor comparability of historical data, errors in item completion, and instability of risk labels.
[0008] Technical solution
[0009] To achieve the above objectives, this invention employs the following technical solution: a student mental health data processing method based on feature extraction and model analysis, comprising: S1, collecting detailed and summary information of items and simultaneously binding the effective caliber of version ledgers and boundary ledgers, followed by preprocessing to construct a mental health database; S2, calculating and storing the statistical quantities of items according to questionnaire version numbers, and iteratively supplementing missing items under the constraint of total score to obtain the score vector of the supplemented items; S3, quantifying the differences between the supplemented samples and the distribution of the same version using multidimensional statistical features, screening reliable data and processing it hierarchically to achieve historical data alignment, anomaly correction, and consistent mapping of risk levels; S4, performing deviation and stability analysis using data related to risk levels and item distribution with unified caliber, and performing label shrinkage correction, anchor point backfilling, and sample diversion operations based on the analysis results; S5, generating structured features based on stable samples and constructing a mental health risk prediction model according to version to achieve mental health risk level prediction, error tracking, and closed-loop monitoring of version drift.
[0010] Furthermore, the specific measures for collecting detailed and summary item information, simultaneously binding the effective caliber of the version ledger and boundary ledger, and then preprocessing to construct the mental health database are as follows: Collect detailed item data: obtain the student anonymity identifier, response time, questionnaire version number, boundary version number, item identifier, item score, response time, missed answer location, submission method, raw total score, full score scale, and raw risk level from the mental health assessment entry point. Simultaneously collect the version effective information and boundary effective information matching the response time from the questionnaire version ledger and scoring boundary ledger. Unified indexing is completed during the collection phase, so that after collection each response record is associated with a unique questionnaire version number and a unique boundary version number. Field standardization is performed on the item detail data package and the summary data package, unifying field naming, data types, missing item expression, time format, and coding rules. Item scores are normalized by the maximum and minimum value method based on the item value range, the raw total score is normalized by dividing it by the full score, and an item completeness mark and a missing item list are generated. Combined with the scoring boundary ledger, the text labels of the risk levels are mapped to discrete values in order of level to generate a risk level value coding table, and the middle level code is selected from the coding table and registered as the risk neutral value. After standardization and normalization are completed, the data is stored and the mental health database is constructed.
[0011] Furthermore, the specific measures for calculating and storing the item statistics according to the questionnaire version number are as follows: Detailed item records are collected from the mental health database according to the questionnaire version number; the complete sample set is selected according to the item completeness mark; the mean of each item is calculated using the maximum likelihood estimation method and assembled into an item score mean vector; the pairwise covariance of items is calculated using the sample covariance estimation method and an item score covariance matrix is constructed; to reduce the impact of extreme responses on the statistics, the quantile truncation method is used to obtain the upper and lower bounds of the actual distribution of each item, which serve as the source of the subsequent pruning boundaries; simultaneously, the square root of the diagonal elements of the covariance matrix is taken to obtain item fluctuation statistics for residual allocation; the item score mean vector, item score covariance matrix, item fluctuation statistics, upper and lower bounds of the actual item distribution, statistical sample count, and calculation batch identifier are written into the within-version item statistics table.
[0012] Furthermore, the specific measures for iteratively filling in missing items under the total score constraint to obtain the completed item score vector are as follows: For records whose item completeness mark is incomplete, the item score mean vector, item fluctuation statistics, and upper and lower bounds of the actual item distribution corresponding to the questionnaire version number are read. The constraint projection iteration method is used to fill in the missing items. First, the missing items are initially filled with the item score means while the observed item scores are kept unchanged. Then, the raw total score and the sum of the observed item scores are used to form the total score constraint, and the completed vector is projected onto the total score constraint hyperplane through Lagrange multiplier updates. In each iteration, the missing items are updated using a residual amortization strategy based on the fluctuation ratio, and the updated item scores are truncated to the upper and lower quantile bounds to suppress the spread of outliers. After truncation, the residual is recalculated and the next iteration begins, until the residual change between two consecutive iterations is less than the residual change threshold. The converged result is recorded as the completed item score vector.
[0013] Furthermore, the specific measures for quantifying the difference between the supplemented sample and the same version distribution using multidimensional statistical features are as follows: obtain the item score vector after the i-th sample is supplemented, the mean item score vector under version v, and the item score covariance matrix under version v; subtract the mean item score vector under version v from the item score vector after the i-th sample is supplemented to obtain the difference vector; perform the inverse operation on the item score covariance matrix under version v to obtain the inverse covariance matrix; transpose the difference vector and perform matrix multiplication operations with the inverse covariance matrix and the difference vector in sequence to obtain the quadratic form result; perform the square root operation on the quadratic form result to finally obtain the item distribution deviation value between the i-th sample and the same version item distribution.
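The quantity described in [0013], the square root of the quadratic form of the difference vector with the inverse covariance matrix, is the Mahalanobis distance. The following is a minimal sketch of that calculation in plain Python, not the patented implementation. The function names `solve_linear` and `item_distribution_deviation` are illustrative assumptions, and the sketch solves a linear system instead of forming the explicit inverse, which yields the same quadratic form.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def item_distribution_deviation(completed, mean, cov):
    """Distance between one completed score vector and its version's distribution.

    completed: completed item score vector of the sample (hypothetical name)
    mean:      item score mean vector under the same questionnaire version
    cov:       item score covariance matrix under the same version
    """
    diff = [x - m for x, m in zip(completed, mean)]
    # y = cov^-1 @ diff, so diff . y is the quadratic form diff^T cov^-1 diff.
    y = solve_linear(cov, diff)
    return math.sqrt(sum(d * yi for d, yi in zip(diff, y)))
```

For example, with an identity covariance matrix the value reduces to the ordinary Euclidean distance between the completed vector and the version mean.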
[0014] Furthermore, the specific measures for screening reliable data and processing it hierarchically to achieve historical data alignment, anomaly correction, and consistent mapping of risk levels are as follows: The item distribution deviation value is compared with the deviation threshold in real time. When the item distribution deviation value is less than the deviation threshold, the record is written to the stable sample table with an alignment pass mark. These records are selected as benchmark samples, and the total score percentile curve and mapping relationship are updated from the stable samples over a rolling time window, generating a unified caliber comparable total score and a unified caliber risk level. When the item distribution deviation value is greater than or equal to the deviation threshold, the record is written to the restricted alignment pool and the restriction reason code is recorded. The k-means clustering method is then applied, with the score sub-vector of the observed items in the current record as the clustering feature, to perform local clustering against the benchmark sample set, and missing items are supplemented a second time based on the item statistics within the assigned cluster. After the mode of the item variance is added as a stabilizing term to the diagonal of the version item distribution covariance matrix, the item distribution deviation value is recalculated. If the recalculated deviation value is still greater than or equal to the deviation threshold, the record stops participating in quantile summary updates, and only the upper and lower bounds of its quantile position are given, based on the cumulative count of the raw total score in the version quantile bucket. The upper and lower bounds of the comparable total score under the unified caliber and the risk level values of the corresponding unified caliber risk level intervals are output from the reference caliber quantile mapping table, and a review mark is written. If the review result is less than the deviation threshold, the revised record is transferred to the stable sample table to participate in subsequent mapping relationship and statistical updates.
[0015] Furthermore, the specific measures for deviation and stability analysis using unified caliber risk level and item distribution data are as follows: obtain the unified caliber risk level value of the i-th sample after alignment, the corresponding item distribution deviation value, and the risk neutral value; subtract the risk neutral value from the risk level value of the i-th sample to obtain the level deviation; divide the item distribution deviation value by the sum of the item distribution deviation value and one to obtain the shrinkage term; multiply the shrinkage term by the level deviation to obtain the level adjustment amount to be pulled back; and subtract the level adjustment amount from the unified caliber risk level value to obtain the label stability value of the i-th sample.
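Written out, the calculation in [0015] is s = r - (d / (d + 1)) * (r - r0), where r is the unified caliber risk level value, d is the item distribution deviation value, and r0 is the risk neutral value. The short sketch below illustrates it; the function and parameter names are assumptions made for the example, not names from the source.

```python
def label_stability_value(risk_level, deviation, neutral):
    """Shrink a risk level toward the neutral value in proportion to the deviation.

    risk_level: unified caliber risk level value of the sample
    deviation:  item distribution deviation value (non-negative)
    neutral:    risk neutral value from the risk level coding table
    """
    if deviation < 0:
        raise ValueError("deviation must be non-negative")
    level_deviation = risk_level - neutral
    # The shrinkage term is 0 when the sample matches its version
    # distribution and approaches 1 as the deviation grows.
    shrinkage = deviation / (deviation + 1.0)
    return risk_level - shrinkage * level_deviation
```

A sample with zero deviation keeps its risk level unchanged, while a sample with a very large deviation is pulled almost all the way to the neutral value.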
[0016] Furthermore, the specific measures for performing label shrinkage correction, anchor point backfilling, and sample diversion based on the analysis results are as follows: The label stability value is compared with the stability threshold in real time, as shown in Figure 3, a flowchart of the mental health assessment label stabilization process in this embodiment. When the label stability value is less than the stability threshold, the label anchor point backfilling strategy is executed and a reproducible repair link is initiated. Under the same student anonymity identifier, forward and backward anchor points are retrieved in chronological order. The forward anchor point is the most recent record that meets the conditions before the current record; the backward anchor point is the nearest record that meets the conditions after the current record. When both anchor points exist and have the same risk level, that consistent risk level is written to the revised label field, the anchor point reference relationship is written to the log, and the original unified caliber risk level is written to the reserved field for auditing. When the forward anchor point is missing, the backward anchor point is missing, or the risk levels of the two anchor points are inconsistent, the pending verification mark is written to the revised label field and a supplementary collection task field is generated. The supplementary collection task field includes retest requests for the same questionnaire version and supplementary collection requests for missing items. At the same time, the record is written to the representation training pool. Sample pairs in the representation training pool are constructed as follows: records of the same student in adjacent time windows constitute positive sample pairs, and records of different students in the same batch constitute control sample pairs. During the training phase, only feature vectors are used and no supervised labels are used, which keeps unstable labels out of supervised training while retaining the value of the samples. When the label stability value is greater than or equal to the stability threshold, the risk level corresponding to the label stability value is written to the supervised label field and the record is written to the stable sample table. The stable sample table maintains representative sample records according to questionnaire version number, boundary version number, and risk level.
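The anchor point backfilling rule can be sketched as follows. The source does not spell out which records "meet the conditions" to serve as anchors, so this example assumes, purely for illustration, that an anchor is any record of the same student whose label stability value reaches the stability threshold. The record layout, the status strings, and the function name `backfill_label` are likewise hypothetical.

```python
def backfill_label(records, index, threshold):
    """Decide the label for records[index] using forward and backward anchors.

    records: one student's records sorted by response time; each is a dict
             with 'risk_level' and 'stability' keys (assumed layout)
    """
    current = records[index]
    if current["stability"] >= threshold:
        # Stable label: goes to the supervised label field.
        return {"label": current["risk_level"], "status": "supervised"}

    def is_anchor(rec):
        # Assumed anchor condition; the source leaves it unspecified.
        return rec["stability"] >= threshold

    # Most recent qualifying record before, nearest qualifying record after.
    forward = next((r for r in reversed(records[:index]) if is_anchor(r)), None)
    backward = next((r for r in records[index + 1:] if is_anchor(r)), None)

    if (forward is not None and backward is not None
            and forward["risk_level"] == backward["risk_level"]):
        return {
            "label": forward["risk_level"],
            "status": "revised",
            "original": current["risk_level"],  # kept for auditing
        }
    # Missing or inconsistent anchors: flag for verification and recollection.
    return {
        "label": None,
        "status": "pending_verification",
        "original": current["risk_level"],
    }
```

In a fuller pipeline, records returned with the pending status would also be routed to the representation training pool and given a supplementary collection task; those steps are omitted here.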
[0017] Furthermore, the specific measures for generating structured features based on stable samples and constructing a mental health risk prediction model according to version are as follows: The item score vectors and answer behavior fields in the stable sample table are used as basic inputs. Dimensional aggregation features and temporal difference fluctuation features are generated in the mental health database based on the feature dictionary. A unified caliber comparable total score, item distribution deviation value, risk level value, and label stability value are combined to form an input set. The risk level written into the supervision label field is used as the training label. The correlation direction between input features and risk levels is determined by the mental health risk stratification mechanism, forming structural constraints. These structural constraints are implemented using training data stratified by questionnaire version number and boundary version number, risk level judgment consistency verification rules, and stable sample table representative sample comparison rules. Gradient boosting decision trees are used for training. Within each questionnaire version and boundary version stratification, the discriminant relationship between item scores and unified caliber comparable total scores on risk levels is learned. A mental health risk prediction model is constructed, and the contribution ranking of key features is output. The risk level in the supervision label field is used as a reference to perform consistency comparison of the prediction results. The number of misjudgments, misjudgment ratio, and the proportion of level deviation direction under each questionnaire version number and boundary version number stratification are statistically analyzed to obtain the risk prediction error.
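The consistency comparison at the end of [0017] can be illustrated with a small sketch that tallies, for each questionnaire version and boundary version stratum, the number of misjudgments, the misjudgment ratio, and the share of errors in each direction. It covers only the error statistics, not the gradient boosting training itself. The tuple layout and the function name `stratified_prediction_error` are assumptions for this example.

```python
from collections import defaultdict


def stratified_prediction_error(rows):
    """Summarize prediction errors per (questionnaire version, boundary version).

    rows: iterable of (questionnaire_version, boundary_version,
                       supervised_level, predicted_level) tuples
    """
    counts = defaultdict(lambda: {"n": 0, "errors": 0, "over": 0, "under": 0})
    for q_version, b_version, true_level, predicted in rows:
        c = counts[(q_version, b_version)]
        c["n"] += 1
        if predicted != true_level:
            c["errors"] += 1
            # Direction of the level deviation: predicted too high or too low.
            c["over" if predicted > true_level else "under"] += 1

    report = {}
    for stratum, c in counts.items():
        errors = c["errors"]
        report[stratum] = {
            "misjudgments": errors,
            "misjudgment_ratio": errors / c["n"],
            "over_share": c["over"] / errors if errors else 0.0,
            "under_share": c["under"] / errors if errors else 0.0,
        }
    return report
```

Keeping the statistics per stratum, instead of pooling all versions, is what lets a later monitoring step attribute a rise in error to a specific questionnaire or boundary version.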
[0018] Furthermore, the specific measures for achieving closed-loop monitoring of mental health risk level prediction, error tracking, and version drift are as follows: Risk prediction results are statistically analyzed on a rolling basis according to questionnaire version number and risk level stratification. When the risk level of any stratum changes abnormally n times or the stratified risk prediction error exceeds the error threshold, a drift alarm is generated and written into the drift monitoring log, recording the affected version number, boundary number, sample stratification, and drift type. This triggers a handling and diversion process, transferring relevant records to a queue requiring review for recalculation and supplementary data collection. Based on representative samples from a stable sample table, the deviation values of item distribution, label stability values, and comparable total scores and risk level distributions under the same time window before and after the switch are compared and evaluated. The changes in sample size, the degree of drift of key indicators, and the changes in risk prediction error are output, and a control set report is generated to track the performance trend of the mental health risk prediction model. Version update and mapping relationship review instructions are given to the mental health risk prediction model, and monitoring records and evaluation results are archived in the mental health database, forming a traceable and iterative dynamic optimization closed loop.
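The alarm condition in [0018] has two triggers: a stratum whose risk level changes abnormally n times, or a stratum whose risk prediction error exceeds the error threshold. A minimal sketch of that check follows. How an "abnormal change" is counted is not defined in the source, so the example simply takes the count as an input; the dictionary layout, drift type strings, and function name `detect_drift` are hypothetical.

```python
def detect_drift(strata, error_threshold, n_abnormal):
    """Return drift alarm entries for strata that trip either trigger.

    strata: dict mapping (questionnaire_version, boundary_version, risk_level)
            to {'error': float, 'abnormal_changes': int} (assumed layout)
    """
    alarms = []
    for (q_version, b_version, level), s in sorted(strata.items()):
        drift_types = []
        if s["abnormal_changes"] >= n_abnormal:
            drift_types.append("abnormal_level_change")
        if s["error"] > error_threshold:
            drift_types.append("prediction_error")
        for drift_type in drift_types:
            # One log entry per trigger, carrying the fields the
            # drift monitoring log is described as recording.
            alarms.append({
                "version": q_version,
                "boundary": b_version,
                "stratum": level,
                "drift_type": drift_type,
            })
    return alarms
```

Each returned entry corresponds to one line of the drift monitoring log; the records of the affected stratum would then be moved to the review queue for recalculation and supplementary collection.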
[0019] Beneficial effects
[0020] The present invention has the following beneficial effects:
[0021] (1) This invention, by iteratively supplementing missing items under the total score constraint, and combining the average vector of item scores with item fluctuation statistics, achieves accurate supplementation of missing data and ensures data integrity, effectively solving the problem of inaccurate processing of missing data in the prior art.
[0022] (2) This invention uses multidimensional statistical features to quantify the differences between the supplemented samples and the distribution of the same version, thereby realizing the screening and hierarchical processing of reliable data and effectively solving the problem of inaccurate analysis caused by differences in data quality in the prior art.
[0023] (3) This invention performs label stabilization operation by analyzing the deviation of risk level and item distribution data with unified caliber, thereby realizing automatic label correction and consistency of risk prediction results, effectively solving the problem of unstable risk labels in the prior art.
[0024] (4) This invention generates structured features based on stable samples and constructs a risk prediction model, thereby achieving accurate prediction of students' mental health risks and monitoring of version drift, effectively solving the problem that existing technologies cannot cope with different versions of data and version changes.
[0025] Of course, any product implementing this invention does not necessarily need to achieve all of the advantages described above at the same time.
Attached Figure Description
[0026] Figure 1 is a flowchart of the student mental health data processing method based on feature extraction and model analysis according to the present invention.
[0027] Figure 2 is a kernel density analysis chart showing the deviation values of the mental health assessment items in this invention.
[0028] Figure 3 is a flowchart of the mental health assessment label stabilization process of the present invention.
Detailed Implementation
[0029] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0030] Referring to Figures 1-3, this invention provides a technical solution: a student mental health data processing method based on feature extraction and model analysis, comprising: S1, collecting detailed and summary information of items and simultaneously binding the effective caliber of version ledgers and boundary ledgers, followed by preprocessing to construct a mental health database; S2, calculating and storing the statistical quantities of items according to questionnaire version numbers, and iteratively supplementing missing items under the constraint of total score to obtain the score vector of supplemented items; S3, quantifying the differences between the supplemented samples and the distribution of the same version using multidimensional statistical features, screening reliable data and processing it hierarchically to achieve historical data alignment, anomaly correction, and consistent mapping of risk levels; S4, performing deviation and stability analysis using data related to risk levels and item distribution with unified caliber, and performing label shrinkage correction, anchor point backfilling, and sample diversion operations based on the analysis results; S5, generating structured features based on stable samples and constructing a mental health risk prediction model according to version to achieve mental health risk level prediction, error tracking, and closed-loop monitoring of version drift.
[0031] Specifically, the measures for collecting detailed and summary item information, simultaneously binding the effective caliber of the version ledger and boundary ledger, and preprocessing to construct the mental health database are as follows: First, when collecting detailed item data, a series of basic information is obtained from the mental health assessment entry point, including student anonymity identifier, response time, questionnaire version number, boundary version number, item identifier, item score, response time, missed answer location, submission method, raw total score, full score scale, and raw risk level. Simultaneously, during the collection phase, the version effective information and boundary effective information matching the response time are extracted from the questionnaire version ledger and scoring boundary ledger, so that every collected record reflects the version and boundary in force when it was answered. A unified indexing mechanism in this phase ensures that each response record is associated with a unique questionnaire version number and a unique boundary version number. Next, the fields of the detailed and summary data packages are standardized, including unified field naming, data types, missing item expression, time format, and coding rules, providing a consistent basis for subsequent data processing. Item scores are normalized using the maximum and minimum value method based on the item value range. The raw total score is normalized according to the full score scale, and an item completeness mark and a missing item list are generated. This step also incorporates the scoring boundary ledger: the text labels of the risk levels are mapped to discrete values in order of level to generate a risk level value coding table, from which the middle level code is selected as the risk neutral value. After all standardization and normalization operations are completed, the data is stored in the mental health database, ready for subsequent analysis and modeling.
[0032] This implementation plan ensures the consistency and accuracy of data collected from the mental health assessment entry point through unified indexing and standardization. Item scores and total scores are normalized to make data comparable across different questionnaire versions and scoring standards. Simultaneously, generating complete item markers and a list of missing items facilitates subsequent data completion and quality control. Risk level mapping and coding convert text labels of different levels into discrete values, facilitating subsequent analysis and modeling. Ultimately, these processes construct a standardized mental health database, laying the foundation for further data analysis, risk assessment, and the development of mental health risk prediction models.
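The normalization and risk level coding steps of this preprocessing stage can be sketched as below. This is an illustrative reading under stated assumptions: the function names and dictionary keys are hypothetical, unanswered items are represented as `None`, and for an even number of risk levels the lower of the two middle codes is taken as the risk neutral value, a choice the source does not specify.

```python
def min_max_normalize(score, lo, hi):
    """Scale an item score into [0, 1] using the item's value range."""
    if hi <= lo:
        raise ValueError("item value range must satisfy hi > lo")
    return (score - lo) / (hi - lo)


def preprocess_record(item_scores, item_ranges, raw_total, full_score):
    """Normalize one response record and flag its missing items.

    item_scores: dict item_id -> raw score, or None when unanswered
    item_ranges: dict item_id -> (min, max) value range of the item
    """
    normalized, missing = {}, []
    for item_id, (lo, hi) in item_ranges.items():
        score = item_scores.get(item_id)
        if score is None:
            normalized[item_id] = None
            missing.append(item_id)
        else:
            normalized[item_id] = min_max_normalize(score, lo, hi)
    return {
        "items": normalized,
        "total": raw_total / full_score,  # total score divided by full score
        "complete": not missing,          # item completeness mark
        "missing_items": missing,         # missing item list
    }


def build_risk_coding(ordered_labels):
    """Map risk labels (lowest to highest) to codes 1..K and pick the middle code."""
    coding = {label: i + 1 for i, label in enumerate(ordered_labels)}
    # Assumption: with an even number of levels, use the lower middle code.
    neutral = (len(ordered_labels) + 1) // 2
    return coding, neutral
```

For instance, three ordered labels produce the codes 1, 2, and 3 with 2 registered as the risk neutral value.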
[0033] Specifically, the measures for calculating and storing item statistics according to questionnaire version number are as follows: Detailed item records are collected in the mental health database according to questionnaire version number. Complete sample sets are selected based on item completeness. The mean of each item is calculated using the maximum likelihood estimation method and assembled into an item score mean vector. The pairwise covariance of each item is calculated using the sample covariance estimation method, and an item score covariance matrix is constructed. To reduce the impact of extreme responses on the statistics, the quantile truncation method is used to obtain the upper and lower bounds of the actual item distribution. The 1% and 99% quantiles are selected as the upper and lower bounds of the item scores to limit the influence of extreme values and serve as the source of subsequent pruning boundaries. Simultaneously, the square root of the diagonal elements of the covariance matrix is used to obtain item fluctuation statistics for residual allocation, reflecting the fluctuation of each item in the sample. The item score mean vector, item score covariance matrix, item fluctuation statistics, upper and lower bounds of the actual item distribution, statistical sample count, and calculation batch identifier are written into the item statistics table within the version for efficient application in subsequent data processing and modeling. This process helps improve the stability of mental health risk prediction models and reduce the interference of extreme responses on analysis results, ensuring that mental health risk prediction models are accurately adapted and optimized for data.
[0034] In this implementation scheme, by performing maximum likelihood estimation, sample covariance estimation, and quantile truncation on the item score data, the interference of extreme values on the statistical results can be effectively eliminated, ensuring a more stable and reliable data distribution. The calculated item score mean vector, covariance matrix, and fluctuation statistics provide accurate statistical basis for subsequent missing data imputation and risk level prediction. Simultaneously, by setting upper and lower bounds on the item distribution and performing pruning operations, the rationality and consistency of the data are further guaranteed, reducing the impact of unstable data on the modeling and analysis process, thereby improving the accuracy and robustness of the mental health data processing method.
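A compact sketch of the within-version item statistics is given below, using only the standard library. It assumes the complete records of one questionnaire version are supplied as equal-length lists of scores. Under a Gaussian model the maximum likelihood estimate of the mean is the plain sample mean, which is what the sketch computes. The quantile interpolation rule and all names are assumptions for illustration.

```python
import math


def _quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def item_statistics(complete_rows, lower_q=0.01, upper_q=0.99):
    """Compute per-version item statistics from complete records.

    complete_rows: list of equal-length lists, one row of item scores per record
    """
    n = len(complete_rows)
    if n < 2:
        raise ValueError("need at least two complete records")
    m = len(complete_rows[0])
    cols = [[row[j] for row in complete_rows] for j in range(m)]
    mean = [sum(col) / n for col in cols]
    # Sample covariance with the n - 1 denominator.
    cov = [
        [
            sum((cols[a][k] - mean[a]) * (cols[b][k] - mean[b]) for k in range(n)) / (n - 1)
            for b in range(m)
        ]
        for a in range(m)
    ]
    return {
        "mean": mean,
        "cov": cov,
        # Fluctuation statistic: square root of each diagonal element.
        "fluctuation": [math.sqrt(cov[j][j]) for j in range(m)],
        # (lower, upper) truncation bounds per item, 1% and 99% by default.
        "bounds": [(_quantile(col, lower_q), _quantile(col, upper_q)) for col in cols],
        "sample_count": n,
    }
```

The returned dictionary mirrors the fields written to the within-version item statistics table, apart from the calculation batch identifier, which would be supplied by the surrounding pipeline.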
[0035] Specifically, the measures taken to iteratively complete missing items under the total score constraint to obtain the completed item score vector are as follows: For records whose item completeness marker indicates that they are incomplete, the item score mean vector, item fluctuation statistics, and upper and lower bounds of the actual item distribution corresponding to the questionnaire version number are first read. Completion then proceeds by a constraint projection iteration method. First, the item score mean vector is used to initially fill in the missing items while the observed item scores are kept unchanged. Next, the total score constraint is formed from the original total score and the sum of the observed item scores, and the Lagrange multiplier update method is applied to project the completed item score vector onto the total score constraint hyperplane. In each iteration, the scores of the missing items are updated using a residual allocation strategy based on the fluctuation ratio given by the item fluctuation statistics, and the updated item scores are truncated to the upper and lower quantile bounds to prevent the spread of outliers. The residual is then recalculated from the truncated data and the next iteration begins; iteration stops when the residual change between two consecutive iterations is less than a preset residual change threshold. Finally, the converged result is taken as the completed item score vector and is used for subsequent training and analysis of the mental health risk prediction model, so that the quality of the completed data meets expectations.
[0036] In this implementation plan, missing items are iteratively filled in. The mean score of each item in the questionnaire version, fluctuation statistics, and upper and lower bounds of the actual distribution are used to ensure that missing data is reasonably filled in within the total score constraint. In each iteration, residual amortization and data pruning effectively control the filling error, preventing outlier propagation until convergence. Ultimately, the filled item score vector maintains data consistency and stability, providing high-quality data support for subsequent risk assessment and mental health risk prediction model training.
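The iteration above can be sketched as follows, under stated assumptions. The function name, the default threshold, and the iteration cap are hypothetical. The sketch reads the total score constraint as "the completed items must sum to the original total score", and it allocates the residual in proportion to each missing item's fluctuation statistic, which coincides with the Lagrange multiplier solution of a weighted least-squares projection onto the constraint hyperplane when the weights are the fluctuation statistics.

```python
def complete_items(observed, total, mean, fluct, lower, upper,
                   tol=1e-6, max_iter=100):
    """Fill missing items (None) so that the item scores sum to `total`.

    observed: item scores with None for missing items
    mean, fluct, lower, upper: per-version item statistics
    tol: residual change threshold (assumed default)
    """
    missing = [j for j, v in enumerate(observed) if v is None]
    # Initial fill with the item mean; observed scores are never changed.
    x = [mean[j] if v is None else v for j, v in enumerate(observed)]
    if not missing:
        return x
    weight_sum = sum(fluct[j] for j in missing)
    prev_residual = None
    for _ in range(max_iter):
        residual = total - sum(x)  # distance from the constraint hyperplane
        if prev_residual is not None and abs(residual - prev_residual) < tol:
            break
        for j in missing:
            # Share of the residual proportional to the item's fluctuation.
            share = fluct[j] / weight_sum if weight_sum > 0 else 1 / len(missing)
            # Truncate to the quantile bounds after each update.
            x[j] = min(max(x[j] + residual * share, lower[j]), upper[j])
        prev_residual = residual
    return x

filled = complete_items([3, None, None], total=10, mean=[3, 2, 2],
                        fluct=[1, 1, 3], lower=[0, 0, 0], upper=[5, 5, 5])
print(filled)
```

When truncation is active the residual shrinks geometrically rather than vanishing in one step, which is why the stopping rule compares consecutive residuals instead of testing for an exact zero.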
[0037] Specifically, the measures taken to quantify the difference between the supplemented sample and the distribution of the same version using multidimensional statistical features are as follows: After obtaining the completed item score vector of the i-th sample, the item score mean vector under version v, and the item score covariance matrix under version v, the difference vector is first calculated by subtracting the item score mean vector under version v from the completed item score vector of the i-th sample. The item score covariance matrix under version v is then inverted to obtain the inverse covariance matrix. The transposed difference vector is multiplied in sequence by the inverse covariance matrix and the difference vector to obtain a quadratic form. The square root of the quadratic form is then taken to obtain the deviation value of the i-th sample from the item distribution of the same version. Because the calculation is bound to the effective information in the version ledger, the stability and accuracy of data processing are ensured, and consistency and comparability between versions are guaranteed rather than relying on the simple substitution used in conventional methods. This provides a stable and sustainable data foundation for subsequent risk level assessment and mental health risk prediction model training, and improves the processing capacity for mental health data. The deviation value plays an important role in screening stable samples with consistent version criteria and in rolling updates of the mapping relationships, so that the impact of version drift is suppressed when cross-version data is processed.
[0038] The specific calculation method for the deviation value of the item distribution is as follows:
[0039] $$D_i = \sqrt{(x_i - \mu_v)^{\mathsf{T}} \, \Sigma_v^{-1} \, (x_i - \mu_v)}$$
[0040] In the formula, $D_i$ represents the deviation of the i-th sample from the item distribution of the same version, reflecting the degree of deviation between the supplemented sample and the overall distribution, and is used for subsequent risk level adjustment and credibility screening; $x_i$ represents the item score vector of the i-th sample after completion, reflecting the score of the completed sample on each item; $\mu_v$ represents the item score mean vector under version v, reflecting the average score of each item within the same questionnaire version; $\Sigma_v$ represents the item score covariance matrix under version v, reflecting the distributional correlation among items within the same questionnaire version; $(x_i - \mu_v)^{\mathsf{T}}$ represents the transpose of the difference vector, which is multiplied with the inverse of the covariance matrix to calculate the degree of deviation of the sample from the population distribution.
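The deviation value described above is the square root of a quadratic form in the difference vector and the inverse covariance matrix, which is commonly known as the Mahalanobis distance. A minimal sketch follows; the function names are hypothetical, and the sketch solves a linear system instead of forming the inverse explicitly, which gives the same quadratic form.

```python
import math

def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z

def item_distribution_deviation(x, mean, cov):
    """Deviation of a completed item score vector from its version's distribution."""
    d = [xi - mi for xi, mi in zip(x, mean)]   # difference vector
    z = solve(cov, d)                          # equals inverse(cov) applied to d
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))

print(item_distribution_deviation([3, 4], [0, 0], [[1, 0], [0, 1]]))
```

With an identity covariance matrix the deviation reduces to the ordinary Euclidean distance from the mean vector, which is a convenient sanity check.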
[0041] In this embodiment, the first group has a confidence level of 0.85, a normalized total score of 0.75, a risk level of 2, 0 missing items, a completion error of 0.00, and an item distribution deviation of 1.12; the second group has a confidence level of 0.45, a normalized total score of 0.87, a risk level of 4, 3 missing items, a completion error of 0.17, and an item distribution deviation of 2.45; the third group has a confidence level of 0.92, a normalized total score of 0.63, a risk level of 1, 0 missing items, a completion error of 0.00, and an item distribution deviation of 0.85; the fourth group has a confidence level of 0.60, a normalized total score of 0.80, a risk level of 3, 2 missing items, a completion error of 0.38, and an item distribution deviation of 1.68; the fifth group has a confidence level of 0.25, a normalized total score of 0.92, a risk level of 5, 6 missing items, a completion error of 1.25, and an item distribution deviation of 3.25.
[0042]
[0043] As shown in Figure 2, which is a kernel density analysis chart of the item distribution deviation values in the mental health assessment provided in this application embodiment, the item distribution deviation values of the three questionnaire versions have distinct distribution characteristics. Version 1 is represented by the blue curve, which has a concentrated distribution; Version 2 is represented by the red curve, which has a wider distribution range and contains more samples with high deviation values; Version 3 is represented by the green curve, which has the most concentrated distribution and a lower overall deviation value. Combined with the data in Table 1, it can be seen that the item distribution deviation value of a sample is clearly correlated with its confidence level, risk level, and other key indicators. Taking the first group, sample S001, as an example, this sample comes from Version 1. Its item distribution deviation value is 1.12, which is lower than the deviation threshold of 1.530. Its confidence level is 0.85, its risk level is low, the number of missing items is 0, and the completion error is 0.00. Together these data indicate that the sample has high response quality and a stable pattern, and can be used directly as a stable sample for subsequent analysis and modeling. The third group, sample S003, from Version 3, has an item distribution deviation value of only 0.85, a confidence level as high as 0.92, and a very low risk level, further confirming the high quality of the data from Version 3. In contrast, the second group (S002) and the fifth group (S005), both from Version 2, have item distribution deviation values of 2.45 and 3.25, respectively, both well above the deviation threshold. Their confidence levels are low (0.45 and 0.25), and their risk levels are high and very high, respectively. 
Furthermore, these samples contain many missing items and have relatively large completion errors, indicating significant deviations from the overall distribution pattern, necessitating a secondary completion or manual review process. The kernel density distribution curve shown in the figure corroborates the specific sample indicator data in the table, intuitively revealing the distribution patterns of data across different questionnaire versions and the effectiveness of the deviation threshold classification. This provides a reliable visual basis and analytical foundation for assessing the quality of mental health assessment data, identifying abnormal response patterns, and developing scientific data cleaning and completion strategies.
[0044] This implementation plan ensures the consistency and comparability of cross-version data by quantifying the difference between the supplemented sample and the item distribution within the same version. By combining the difference vector with the inverse covariance matrix through matrix operations, the deviation value of the sample's item distribution is obtained, which provides a basis for "screening stable samples with consistent version standards" and "rolling updates to mapping relationships." This process suppresses the impact of cross-version drift and is linked to the effective information in the version ledger, ensuring the stability and accuracy of data processing. This lays a reliable foundation for risk level assessment of mental health data and subsequent training of mental health risk prediction models.
[0045] Specifically, the measures for screening reliable data and processing it hierarchically to achieve historical data alignment, anomaly correction, and consistent risk level mapping are as follows: The deviation value of the item distribution is compared with the deviation threshold in real time, so that the system can dynamically identify and process data that does not meet expectations. When the deviation value of the item distribution is less than the deviation threshold, the record is considered a stable sample and is written into the stable sample table. The record is marked as aligned and serves as a benchmark sample for subsequent updates of the total score percentile curve and the mapping relationship, from which a unified-standard comparable total score and a unified-standard risk level are generated. When the deviation value of the item distribution is greater than or equal to the deviation threshold, the record is transferred to the restricted alignment pool and the restriction reason code is recorded. The k-means clustering method is then applied, with the sub-vector of observed items of the current record as the clustering feature, and local clustering is performed together with the benchmark sample set to carry out secondary completion of the missing items. Here, k is the number of clusters into which the item sub-vectors of the mental health data are grouped, with a value ranging from 3 to 5; it can be adjusted according to the size and complexity of the dataset until the clustering effect is satisfactory. The mode of the item distribution variances is superimposed on the diagonal of the version's item distribution covariance matrix as a stabilizing term, and the item distribution deviation value is recalculated; this is repeated until the deviation value meets the standard. 
Records whose deviation remains large are no longer included in the quantile summary update; instead, only the upper and lower bounds of their quantile position are determined, based on the cumulative count of the original total score in the version quantile bucket. The corresponding comparable total score and risk level value are then updated in the reference-standard quantile mapping table. Furthermore, these records are marked for review, and after a successful review the revised record re-enters the stable sample table to participate in subsequent mapping and statistical updates.
[0046] This implementation plan ensures data stability and consistency by monitoring the deviation of the item distribution in real time. When the deviation is less than a threshold, the relevant records are marked as stable samples and used to construct a unified risk level and total score mapping relationship, ensuring continuous data updates and accuracy. For records with large deviations, clustering methods are used to fill in missing items, and the deviation is recalculated to ensure data correction and stability. Finally, records are added to the stable sample table or a restricted alignment pool based on the correction status. Through continuous correction and updates, the risk prediction capability of the data and the accuracy of the mental health risk prediction model are improved, effectively solving the problems of historical data alignment and outlier data handling.
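The threshold-based routing and the stabilizing term can be sketched as follows. This is an illustration under assumptions: the table names, the reason code string, and the rounding used before taking the mode of the variances are hypothetical, and the k-means secondary completion step is omitted. The threshold 1.530 and the sample deviation values are the ones quoted in the embodiment.

```python
from statistics import mode

def route_record(deviation, threshold):
    """Decide where a record goes based on its item distribution deviation."""
    if deviation < threshold:
        return {"table": "stable_sample", "aligned": True, "reason": None}
    # Reason code string is a placeholder, not taken from the source.
    return {"table": "restricted_alignment_pool", "aligned": False,
            "reason": "DEVIATION_GE_THRESHOLD"}

def stabilized_covariance(cov, ndigits=2):
    """Add the mode of the item variances to the covariance diagonal.

    Variances are rounded first so that a mode exists for continuous
    values (an assumption of this sketch).
    """
    p = len(cov)
    ridge = mode(round(cov[j][j], ndigits) for j in range(p))
    return [[cov[a][b] + (ridge if a == b else 0.0) for b in range(p)]
            for a in range(p)]

print(route_record(1.12, 1.530)["table"])   # sample S001
print(route_record(2.45, 1.530)["table"])   # sample S002
print(stabilized_covariance([[1.0, 0.2, 0.0], [0.2, 1.0, 0.1], [0.0, 0.1, 4.0]]))
```

Adding a positive term to the diagonal keeps the covariance matrix invertible when the deviation is recalculated after secondary completion, which is the usual motivation for such a stabilizing term.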
[0047] Specifically, the measures for deviation and stability analysis using standardized risk levels and item distribution data are as follows: Obtain the risk level value of the i-th sample after standardized alignment, the corresponding item distribution deviation value, and the risk-neutral value; calculate the difference between the risk level value of the i-th sample and the risk-neutral value to obtain the level deviation; calculate the contraction term as the ratio of the item distribution deviation value to a sum that contains that deviation value; multiply the contraction term by the level deviation to obtain the required level adjustment; subtract the level adjustment from the standardized risk level value to obtain the label stability value of the i-th sample. This method adjusts the label dynamically according to how far the sample deviates from the item distribution: the larger the item distribution deviation, the larger the contraction term, and the more strongly the risk level is pulled toward the risk-neutral value. By contracting extreme cases, the method preserves monotonicity and stability, prevents data anomalies from distorting labels, and makes label adjustments smoother and more consistent with the actual data distribution, thereby improving the accuracy and robustness of the prediction results.
[0048] The specific calculation method for the label stability value is as follows:
[0049] $$S_i = R_i - \lambda(D_i)\,(R_i - R_0)$$
[0050] In the formula, $S_i$ represents the label stability value of the i-th sample after stabilization, which is used for subsequent feature extraction, statistical analysis, and mental health risk prediction modeling, and which realizes automatic conservative adjustment of the risk label of abnormally supplemented samples; $R_i$ represents the risk level value of the i-th sample after alignment with a unified standard, reflecting the risk level calculated from the current unified boundary line and the comparable total score; $D_i$ represents the deviation of the i-th sample from the item distribution of the same version, used to measure the reliability of the supplemented sample and its degree of deviation from the overall distribution; $R_0$ represents the risk-neutral value, which serves as the baseline point toward which the sample level contracts; $\lambda(D_i)$ represents the contraction term, a ratio formed from $D_i$ that increases as $D_i$ increases.
[0051] In this implementation scheme, the label stability value is dynamically adjusted by calculating the difference between the risk level and the risk neutral value, combined with the item distribution deviation. As the item distribution deviation increases, the value of the contraction term also increases, thus providing a stronger adjustment to the risk level. This method effectively suppresses the influence of extreme data, ensuring the stability and monotonicity of the mental health risk prediction model. By smoothly contracting and adjusting the risk level, the label changes are ensured to better reflect the actual data distribution, improving the accuracy and reliability of the mental health risk prediction model, while avoiding label fluctuations caused by data anomalies.
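The shrinkage toward the risk-neutral value can be sketched as follows. The exact form of the contraction term is not recoverable from the text, so the sketch assumes deviation / (1 + deviation), which is zero at zero deviation, increases with the deviation, and stays below 1. The function name and the neutral value of 3 (the middle code of a five-level scale) are also assumptions.

```python
def label_stability_value(risk_level, deviation, neutral):
    """Contract a risk level toward the neutral value according to the deviation.

    ASSUMPTION: contraction term = deviation / (1 + deviation).
    """
    contraction = deviation / (1.0 + deviation)
    level_deviation = risk_level - neutral
    return risk_level - contraction * level_deviation

# Values quoted in the embodiment: risk level 5 with deviation 3.25,
# risk level 2 with deviation 1.12; neutral value 3 is assumed.
print(label_stability_value(5, 3.25, neutral=3))
print(label_stability_value(2, 1.12, neutral=3))
```

Under this assumed form the adjusted value always lies between the original risk level and the neutral value, so the ordering of samples that share the same deviation is preserved.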
[0052] Specifically, the measures for label shrinkage correction, anchor point backfilling, and sample diversion based on the analysis results are as follows: By comparing the label stability value and stability threshold in real time, when the label stability value is less than the stability threshold, the label anchor point backfilling strategy is executed and a reproducible repair link is initiated. Forward and backward anchor points are retrieved in chronological order under the same student anonymity identifier. The forward anchor point is the most recent record that met the conditions before the current record, and the backward anchor point is the most recent record that met the conditions after the current record. If both forward and backward anchor points exist and have the same risk level, the consistent risk level is written to the revised label field, and the anchor point reference relationship is recorded in the log. The original unified risk level is written to a reserved field for auditing. If the forward anchor point is missing, the backward anchor point is missing, or the risk levels of the two anchor points are inconsistent, a pending verification mark is written to the revised label field, and a supplementary sampling task field is generated. The supplementary sampling task field is managed in the system as a task queue, and is processed by setting trigger conditions and defining field structures. After the task is generated, the relevant records are written to the representation training pool. In this pool, the construction rules for sample pairs are as follows: records from adjacent time windows of the same student constitute positive sample pairs, and records from different students in the same batch constitute control sample pairs. The time window length and sliding step size will be determined in practice to ensure the temporal consistency of the training data. 
During the training phase, only feature vectors are used without supervised labels to avoid unstable labels entering supervised training, while still retaining the value of the samples. When the label stability value is greater than or equal to the stability threshold, the risk level corresponding to the label stability value is written to the supervised label field and then to the stable sample table. The stable sample table maintains representative sample records according to the questionnaire version number, boundary version number, and risk level. This process effectively ensures the accuracy and consistency of the data and promotes the dynamic optimization of data quality.
[0053] In this implementation scheme, data correction is performed when data is unstable through a label anchor backfilling strategy and a reproducible repair path. When both forward and backward anchors exist simultaneously and have the same risk level, the label is updated and recorded in the log; if the conditions are not met, a supplementary data collection task is generated and the relevant records are added to the representation training pool for subsequent data processing and training. This process ensures data stability and dynamically manages data supplementation and correction through task queues and trigger conditions, thereby effectively improving data quality and avoiding interference from unstable labels on the training of the mental health risk prediction model.
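The anchor backfilling decision can be sketched as follows. This is a hedged illustration: the record field names and the pending marker are hypothetical, and because the text does not spell out which records "meet the conditions" to serve as anchors, the sketch assumes an anchor is any record of the same student whose label stability value is at or above the stability threshold.

```python
def backfill_label(records, idx, stability_threshold):
    """Decide the label fields for records[idx].

    records: chronologically ordered records of ONE anonymized student, each a
             dict with 'id', 'stability' and 'risk_level' (hypothetical names).
    """
    current = records[idx]
    if current["stability"] >= stability_threshold:
        # Stable label: goes to the supervised label field.
        return {"supervised_label": current["risk_level"]}

    def is_anchor(r):  # ASSUMED anchor condition
        return r["stability"] >= stability_threshold

    forward = next((r for r in reversed(records[:idx]) if is_anchor(r)), None)
    backward = next((r for r in records[idx + 1:] if is_anchor(r)), None)
    result = {"original_label": current["risk_level"]}  # kept for auditing
    if (forward is not None and backward is not None
            and forward["risk_level"] == backward["risk_level"]):
        result["revised_label"] = forward["risk_level"]
        result["anchors"] = (forward["id"], backward["id"])
    else:
        result["revised_label"] = "PENDING_VERIFICATION"
        result["resample_task"] = True  # supplementary sampling task
    return result

history = [
    {"id": "r1", "stability": 2.1, "risk_level": 2},
    {"id": "r2", "stability": 0.4, "risk_level": 4},
    {"id": "r3", "stability": 2.0, "risk_level": 2},
]
print(backfill_label(history, 1, stability_threshold=1.0))
```

Records that end in the pending branch would then be written to the representation training pool, where only their feature vectors are used.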
[0054] Specifically, the measures for generating structured features based on stable samples and constructing a mental health risk prediction model by version are as follows: The item score vectors and response behavior fields from the stable sample table are used as basic inputs, and dimensional aggregation features and temporal difference fluctuation features are generated in the mental health database based on the feature dictionary. The feature dictionary includes the following dimensional aggregation rules: calculating the sum, mean, and variance of each dimension after grouping by dimension; the temporal difference fluctuation features are represented by calculating the magnitude of score changes within a continuous time window. The input set is composed of a unified, comparable total score, item distribution deviation, risk level value, and label stability value, and the risk level of the supervision label field is used as the training label. Combined with the mental health risk stratification mechanism, the correlation direction from input features to risk levels is clarified, forming structural constraints. These structural constraints include: a stratified index based on questionnaire version number and boundary version number, a consistency verification rule for risk level determination, and a rule for comparing representative samples from the stable sample table. Based on this, a gradient boosting decision tree is used for training. Within each questionnaire version and boundary version stratum, the relationship between item scores and comparable total scores with risk levels is learned, constructing a mental health risk prediction model and outputting a ranking of key feature contributions. Consistency is compared against the risk levels in the supervision label field. 
The number of misjudgments, misjudgment ratio, and percentage of level deviation are statistically analyzed for each questionnaire version number and boundary version number stratum to obtain the risk prediction error and optimize it.
[0055] In this implementation scheme, by using the item score vectors and response behavior fields from the stable sample table as input, and combining dimensional aggregation and temporal difference fluctuation features, the generated feature set is trained using a gradient boosting decision tree model, effectively identifying the relationship between item scores and risk levels. This process, guided by hierarchical training and structural constraints, ensures the stability and consistency of the mental health risk prediction model across various questionnaire versions and boundary versions, thereby optimizing the accuracy of mental health risk prediction and continuously tracking and optimizing the prediction results.
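The dimensional aggregation and temporal difference fluctuation features can be sketched as follows. The feature dictionary layout, the dimension names, and the use of the mean absolute change as the fluctuation measure are assumptions of this sketch; the gradient boosting training step itself is not shown.

```python
from statistics import mean, pvariance

def dimension_features(item_scores, feature_dict):
    """Sum, mean and variance of item scores grouped by dimension.

    feature_dict maps a dimension name to the indices of its items
    (hypothetical layout).
    """
    feats = {}
    for dim, idxs in feature_dict.items():
        vals = [item_scores[j] for j in idxs]
        feats[f"{dim}_sum"] = sum(vals)
        feats[f"{dim}_mean"] = mean(vals)
        feats[f"{dim}_var"] = pvariance(vals)  # population variance
    return feats

def temporal_fluctuation(scores_in_window):
    """Mean absolute change between consecutive scores in one time window."""
    diffs = [abs(b - a) for a, b in zip(scores_in_window, scores_in_window[1:])]
    return mean(diffs) if diffs else 0.0

feature_dict = {"anxiety": [0, 1], "mood": [2, 3]}  # hypothetical dimensions
print(dimension_features([1, 2, 3, 4], feature_dict))
print(temporal_fluctuation([10, 14, 11]))
```

The resulting dictionary would be joined with the comparable total score, item distribution deviation, risk level value, and label stability value to form one input row per stable sample.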
[0056] Specifically, the measures for achieving closed-loop monitoring of mental health risk level prediction, error tracking, and version drift are as follows: In the rolling statistics of risk prediction results, the risk level is first monitored by questionnaire version number and risk level. If the risk level of that stratum changes abnormally n times (the rule for determining the change is that the difference between adjacent predicted levels is greater than or equal to m), a drift alarm will be generated and written into the drift monitoring log, recording the affected version number, boundary number, sample stratification, and drift type. Here, n represents the number of abnormal risk level changes, with a value between 2 and 5; m represents the minimum difference between two adjacent risk level predictions, with a value between 1 and 3. Subsequently, the system triggers a processing and triage process, transferring the relevant records to the queue for review and marking, and performing recalculation and supplementary data collection to ensure data accuracy. Based on representative samples in the stable sample table, the system compares and evaluates the deviation of item distribution, stable label values, comparable total scores, and risk level distributions under the same time window before and after the switch. It outputs changes in sample size, the degree of drift in key indicators, and changes in risk prediction error, and generates a control set report to track the performance trends of the mental health risk prediction model. The system updates the mental health risk prediction model and verifies mapping relationships, and archives monitoring records and evaluation results into the mental health database to ensure a traceable and iterative dynamic optimization loop.
[0057] In this implementation plan, by performing rolling statistics on risk prediction results and monitoring changes in risk levels in real time, the system generates a drift alarm when the number of abnormal jumps in the risk level of a certain stratum reaches n or the prediction error exceeds a predetermined range. The system records relevant information, such as the affected version number, boundary number, sample stratum, and drift type. Simultaneously, the system triggers subsequent review and supplementary data collection to ensure data quality and accuracy. During this process, based on representative samples from the stable sample table, changes in risk prediction error are evaluated, and the deviation values of the evaluation item distribution and the stability values of the labels are compared to generate a control set report, helping to track the performance trends of the mental health risk prediction model. Finally, through version updates and mapping relationship reviews, the system ensures real-time optimization and forms a traceable and iterative dynamic optimization closed loop within the mental health database.
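The drift alarm rule can be sketched as follows. The function name and the log entry fields are hypothetical; the defaults n = 3 and m = 2 are picked from the stated ranges (n between 2 and 5, m between 1 and 3), and the drift type string is a placeholder.

```python
def detect_drift(predicted_levels, version, boundary, n=3, m=2):
    """Count abnormal jumps between adjacent predicted risk levels in one stratum.

    A jump is abnormal when the absolute difference between adjacent
    predictions is >= m; an alarm is raised when there are at least n jumps.
    """
    jumps = sum(1 for a, b in zip(predicted_levels, predicted_levels[1:])
                if abs(b - a) >= m)
    if jumps < n:
        return None
    # Entry destined for the drift monitoring log (field names assumed).
    return {"version": version, "boundary": boundary,
            "abnormal_jumps": jumps, "drift_type": "risk_level_jump"}

print(detect_drift([1, 3, 1, 4, 4], version="v2", boundary="b1"))
print(detect_drift([2, 2, 3, 3], version="v1", boundary="b1"))
```

A returned entry would trigger the triage step, in which the affected records are moved to the review queue and recalculation or supplementary data collection is started.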
[0058] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus.
[0059] The preferred embodiments of the present invention disclosed above are merely illustrative of the invention. These preferred embodiments do not exhaustively describe all details, nor do they limit the invention to the specific implementations described. Clearly, many modifications and variations can be made based on the content of this specification. This specification selects and specifically describes these embodiments to better explain the principles and practical applications of the invention, thereby enabling those skilled in the art to better understand and utilize the invention. The invention is limited only by the claims and their full scope and equivalents.
Claims
1. A method for processing student mental health data based on feature extraction and model analysis, characterized in that it includes the following steps: S1. Collect detailed and summary information of the items, simultaneously bind the effective scope of the version ledger and boundary ledger, and perform preprocessing to build a mental health database; S2. Calculate the item statistics according to the questionnaire version number and store them in the database, and iteratively fill in the missing items under the total score constraint to obtain the completed item score vector; The specific measures for calculating and storing the item statistics according to the questionnaire version number are as follows: Detailed item records are collected from the mental health database according to questionnaire version number. Complete sample sets are selected based on item completeness. The mean of each item is calculated using maximum likelihood estimation, and the results are assembled into an item score mean vector. The pairwise covariance of the items is calculated using sample covariance estimation, and an item score covariance matrix is constructed. To reduce the impact of extreme responses on the statistics, quantile truncation is used to obtain the upper and lower bounds of the actual item distribution, which serve as the source of subsequent pruning boundaries. Simultaneously, the square root of the diagonal elements of the covariance matrix is taken to obtain item fluctuation statistics for residual allocation. The item score mean vector, item score covariance matrix, item fluctuation statistics, upper and lower bounds of the actual item distribution, statistical sample count, and calculation batch identifier are written into the item statistics table within the version. 
The specific measures for iteratively filling in missing items under the total score constraint to obtain the completed item score vector are as follows: For records whose item completeness marker indicates that they are incomplete, the item score mean vector corresponding to the questionnaire version number, the item fluctuation statistics, and the upper and lower bounds of the actual item distribution are read. The constraint projection iteration method is used to fill in the missing items. First, the item score mean is used to initially fill in the missing items while the observed item scores are kept unchanged. Then, the total score constraint is formed from the original total score and the sum of the observed item scores. The projection of the filled vector onto the total score constraint hyperplane is realized through a Lagrange multiplier update. In each iteration, the missing items are updated using the residual allocation strategy based on the fluctuation ratio. The updated item scores are truncated to the upper and lower quantile bounds to suppress the spread of outliers. After truncation, the residual is recalculated and the next iteration begins, until the residual change between two consecutive iterations is less than the residual change threshold. The converged result is recorded as the completed item score vector. S3. Use multidimensional statistical features to quantify the difference between the supplemented sample and the distribution of the same version, screen reliable data and process it in layers, and achieve historical data alignment, anomaly correction, and consistent mapping of risk levels. S4. Perform deviation and stability analysis using unified-standard risk level and item distribution data. Based on the analysis results, perform label shrinkage correction, anchor point backfilling, and sample diversion. 
S5. Generate structured features based on stable samples and build a mental health risk prediction model by version, realizing closed-loop monitoring of mental health risk level prediction, error tracking, and version drift.
2. The student mental health data processing method based on feature extraction and model analysis according to claim 1, characterized in that: The specific measures for constructing a mental health database are as follows: After collecting detailed and summarized question items and simultaneously binding the effective scope of the version ledger and boundary ledger, preprocessing is performed. Collect detailed data on each item: Obtain student anonymity identifier, response time, questionnaire version number, boundary version number, item identifier, item score, response time, missed answer location, submission method, raw total score, total score scale, and raw risk level from the mental health assessment portal. Simultaneously collect version effectiveness information and boundary effectiveness information matching the response time from the questionnaire version ledger and scoring boundary ledger. Complete unified indexing during the collection phase to ensure that each response record can be associated with a unique questionnaire version number and a unique boundary version number after collection. Standardize the fields in the item detail data package and summary data package, unifying field naming, data types, missing item representation, time format and encoding rules; normalize the item scores based on the item value range using the maximum and minimum value method, normalize the original total score by dividing by the full score, and generate item completeness markers and missing item lists; By combining the scoring boundary ledger, the text labels of each risk level are mapped to discrete values in order of level, generating a risk level value coding table. The intermediate level code is selected from the coding table and registered as the risk-neutral value. After standardization and normalization, the data is stored and a mental health database is constructed.
3. The student mental health data processing method based on feature extraction and model analysis according to claim 1, characterized in that the differences between the supplemented samples and the same-version distribution are quantified using multidimensional statistical features as follows: the completed item score vector of the i-th sample, the mean item score vector under version v, and the item score covariance matrix under version v are obtained; the mean item score vector under version v is subtracted from the completed item score vector of the i-th sample to obtain the difference vector; the item score covariance matrix under version v is inverted to obtain the inverse covariance matrix; the transposed difference vector is multiplied by the inverse covariance matrix and then by the difference vector to obtain the quadratic form; and the square root of the quadratic form is taken to obtain the item distribution deviation value of the i-th sample relative to the item distribution of the same version.
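The quantity described in claim 3, the square root of the quadratic form of the difference vector with the inverse covariance matrix, has the form of a Mahalanobis distance. The sketch below is one way to compute it in plain Python. The function names are hypothetical, and instead of forming the inverse matrix explicitly it solves the equivalent linear system, which gives the same value.

```python
import math


def solve_linear(a, b):
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [list(map(float, a[r])) + [float(b[r])] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (m[r][n] - tail) / m[r][r]
    return y


def item_distribution_deviation(x_i, mu_v, sigma_v):
    """Deviation of sample i from the item distribution of version v."""
    diff = [x - mu for x, mu in zip(x_i, mu_v)]       # difference vector
    y = solve_linear(sigma_v, diff)                   # y = sigma_v^{-1} diff
    quad = sum(d * yi for d, yi in zip(diff, y))      # diff^T sigma_v^{-1} diff
    return math.sqrt(quad)
```

With an identity covariance matrix the value reduces to the ordinary Euclidean distance from the version mean; correlated or high-variance items contribute less to the deviation.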
4. The student mental health data processing method based on feature extraction and model analysis according to claim 1, characterized in that reliable data are screened and processed in layers to achieve historical data alignment, anomaly correction, and consistent mapping to risk levels as follows: the item distribution deviation value is compared with the deviation threshold in real time. When the item distribution deviation value is less than the deviation threshold, the record is written to the stable sample table, marked as having passed alignment, and selected as a benchmark sample. Using the stable samples within a rolling time window, the total score percentile curve and the mapping relationship are updated, and a unified-standard comparable total score and a unified-standard risk level are generated. When the item distribution deviation value is greater than or equal to the deviation threshold, the record is written to the restricted alignment pool and a restriction reason code is recorded. K-means clustering is then applied, with the score vector of the observed items in the current record as the clustering feature: local clustering is performed against the benchmark sample set, and the missing items are filled in a second time from the item statistics within the cluster. The mode of the item variances is added to the diagonal of the version's item distribution covariance matrix as a stabilizing term, and the item distribution deviation value is recalculated. If the recalculated item distribution deviation value is still greater than or equal to the deviation threshold, the record no longer participates in quantile summary updates: only the upper and lower bounds of its quantile position are given, based on the cumulative count of the raw total score in the version's quantile buckets; the upper and lower bounds of the unified-standard comparable total score and the risk level values of the corresponding unified-standard risk level interval are output from the reference-standard quantile mapping table; and a review mark is written. If the recalculated deviation value is less than the deviation threshold, the record is transferred to the stable sample table and participates in subsequent mapping relationship and statistical updates.
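The threshold routing and the covariance stabilizing term of claim 4 can be sketched as follows. This is an illustrative sketch only: the status strings and function names are hypothetical, and rounding the variances before taking their mode is an assumption, since the claim does not say how the mode of continuous variances is determined. The clustering and second-pass filling steps are not shown.

```python
from statistics import mode


def stabilize_covariance(sigma_v, item_variances, ndigits=2):
    """Add the mode of the item variances to the covariance diagonal.

    Assumption: variances are rounded to `ndigits` so that a mode exists.
    """
    lam = mode([round(v, ndigits) for v in item_variances])
    return [
        [val + (lam if r == c else 0.0) for c, val in enumerate(row)]
        for r, row in enumerate(sigma_v)
    ]


def route_record(deviation, threshold, recalculated=None):
    """Decide where a record goes; `recalculated` is the deviation after
    second-pass filling and covariance stabilization, if already computed."""
    if deviation < threshold:
        return "stable"                      # stable sample table, alignment passed
    if recalculated is None:
        return "restricted"                  # restricted alignment pool
    if recalculated < threshold:
        return "stable_after_correction"     # moved to the stable sample table
    return "bounds_only_review"              # quantile bounds only, review mark
```

Adding a positive term to the diagonal keeps the covariance matrix invertible when items are nearly collinear, which is the usual motivation for this kind of stabilization.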
5. The student mental health data processing method based on feature extraction and model analysis according to claim 1, characterized in that the deviation and stability analysis of the unified-standard risk level and item distribution data is performed as follows: the unified-standard risk level value of the i-th sample after alignment, the corresponding item distribution deviation value, and the risk-neutral value are obtained; the level deviation is obtained as the difference between the risk level value of the i-th sample and the risk-neutral value; the shrinkage term is obtained as the ratio of the item distribution deviation value to the sum of the item distribution deviation value and one; the level adjustment amount is obtained by multiplying the shrinkage term by the level deviation; and the label stability value of the i-th sample is obtained by subtracting the level adjustment amount from the unified-standard risk level value.
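The arithmetic of claim 5 can be written out as a short Python sketch. The function and parameter names are hypothetical, and the check that the deviation value is non-negative is an added assumption, consistent with the deviation being a square root.

```python
def label_stability_value(risk_value, deviation, neutral_value):
    """Shrink a risk level value toward the risk-neutral value.

    The larger the item distribution deviation, the stronger the shrinkage.
    """
    if deviation < 0:
        raise ValueError("item distribution deviation must be non-negative")
    level_deviation = risk_value - neutral_value     # level deviation
    shrink = deviation / (deviation + 1.0)           # shrinkage term in [0, 1)
    adjustment = shrink * level_deviation            # level adjustment amount
    return risk_value - adjustment                   # label stability value
```

For example, a sample at level 3 with a risk-neutral value of 1 and a deviation of 1.0 has a level deviation of 2 and a shrinkage term of 0.5, so its value moves to 2.0. A sample with zero deviation keeps its level unchanged.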
6. The student mental health data processing method based on feature extraction and model analysis according to claim 1, characterized in that label shrinkage correction, anchor point backfilling, and sample diversion are performed on the basis of the analysis results as follows: the label stability value is compared with the stability threshold in real time. When the label stability value is less than the stability threshold, the label anchor point backfilling strategy is executed and a reproducible repair chain is started. Under the same anonymized student identifier, forward and backward anchor points are retrieved in chronological order: the forward anchor point is the nearest record before the current record that meets the conditions, and the backward anchor point is the nearest record after the current record that meets the conditions. When both anchor points exist and carry the same risk level, that risk level is written to the revised label field, the anchor point reference relationship is written to the log, and the original unified-standard risk level is kept in a reserved field for auditing. When the forward anchor point is missing, the backward anchor point is missing, or the risk levels of the two anchor points are inconsistent, a to-be-verified mark is written to the revised label field and a supplementary collection task field is generated, which includes a retest request for the same questionnaire version and a supplementary collection request for the missing items; at the same time, the record is written to the representation training pool. Sample pairs in the representation training pool are constructed so that records of the same student in adjacent time windows form positive sample pairs and records of different students in the same batch form control sample pairs. During the training phase, only feature vectors are used and no supervised labels are used, which keeps unstable labels out of supervised training while preserving the value of the samples. When the label stability value is greater than or equal to the stability threshold, the risk level corresponding to the label stability value is written to the supervision label field and to the stable sample table. The stable sample table maintains representative sample records by questionnaire version number, boundary version number, and risk level.
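The anchor point backfilling branch of claim 6 can be sketched as below. The record fields (`id`, `stable`, `level`), the output keys, and the `TO_VERIFY` marker are hypothetical. The claim does not spell out which conditions an anchor record must meet, so the sketch assumes an anchor is any record already flagged as stable.

```python
def backfill_label(records, idx):
    """Backfill the label of records[idx] from its nearest stable neighbours.

    `records` holds one student's records sorted by time; each record is a
    dict with 'id', 'stable' (bool) and 'level' (risk level value).
    """
    forward = next((r for r in reversed(records[:idx]) if r["stable"]), None)
    backward = next((r for r in records[idx + 1:] if r["stable"]), None)
    result = {"original_level": records[idx]["level"]}  # reserved for auditing
    if (forward is not None and backward is not None
            and forward["level"] == backward["level"]):
        result["revised_label"] = forward["level"]
        result["anchors"] = (forward["id"], backward["id"])  # logged reference
    else:
        result["revised_label"] = "TO_VERIFY"
        result["supplementary_tasks"] = [
            "retest_same_version",
            "collect_missing_items",
        ]
        result["pool"] = "representation_training"
    return result
```

Keeping the original level next to the revised label means a backfilled record can always be traced back to what the unified-standard mapping originally produced.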
7. The student mental health data processing method based on feature extraction and model analysis according to claim 1, characterized in that structured features are generated from the stable samples and the mental health risk prediction model is constructed per version as follows: the item score vectors and response behavior fields in the stable sample table are used as basic inputs. Based on the feature dictionary, dimension aggregation features and temporal difference fluctuation features are generated in the mental health database and combined with the unified-standard comparable total score, item distribution deviation value, risk level value, and label stability value to form the input set. The risk level written to the supervision label field is used as the training label. A mental health risk stratification mechanism is used to determine the direction of correlation between the input features and the risk levels, forming structural constraints. These constraints are implemented through training data stratified by questionnaire version number and boundary version number, risk level judgment consistency verification rules, and comparison rules against the representative samples in the stable sample table. A gradient boosting decision tree is used for training. Within each stratum of questionnaire version and boundary version, the discriminative relationship of the item scores and the unified-standard comparable total score to the risk levels is learned, the mental health risk prediction model is constructed, and a ranking of key feature contributions is output. The prediction results are compared for consistency against the risk level in the supervision label field. The number of misjudgments, the misjudgment ratio, and the proportion of each level deviation direction are computed continuously for each stratum of questionnaire version number and boundary version number to obtain the risk prediction error.
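The per-stratum error statistics at the end of claim 7 can be sketched as follows. The input row layout and the output keys are hypothetical. The sketch also assumes that risk levels are coded as ordered numbers, so that "over" means predicting a higher level than the supervision label and "under" a lower one. Model training itself is not shown.

```python
from collections import defaultdict


def risk_prediction_error(rows):
    """Summarize prediction errors per (questionnaire, boundary) stratum.

    `rows` is an iterable of tuples:
    (questionnaire_version, boundary_version, label_level, predicted_level).
    """
    stats = defaultdict(lambda: {"n": 0, "wrong": 0, "over": 0, "under": 0})
    for q_ver, b_ver, label, pred in rows:
        s = stats[(q_ver, b_ver)]
        s["n"] += 1
        if pred != label:
            s["wrong"] += 1
            s["over" if pred > label else "under"] += 1
    report = {}
    for key, s in stats.items():
        wrong = s["wrong"]
        report[key] = {
            "misjudgments": wrong,
            "misjudgment_ratio": wrong / s["n"],
            # Share of misjudgments in each deviation direction.
            "over_share": s["over"] / wrong if wrong else 0.0,
            "under_share": s["under"] / wrong if wrong else 0.0,
        }
    return report
```

Reporting the direction shares separately from the misjudgment ratio makes it visible whether a stratum is systematically over- or under-estimating risk, which a single error rate would hide.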
8. The student mental health data processing method based on feature extraction and model analysis according to claim 1, characterized in that mental health risk level prediction, error tracking, and closed-loop monitoring of version drift are achieved as follows: rolling statistics of the risk prediction results are computed for each stratum of questionnaire version number and risk level. When the risk level of any stratum changes abnormally n times, or the risk prediction error of the stratum is greater than the error threshold, a drift alarm is generated and written to the drift monitoring log, recording the affected questionnaire version number, boundary version number, sample stratum, and drift type, and handling diversion is triggered: the relevant records are transferred to the review queue for recalculation and supplementary collection. Using representative samples from the stable sample table, the item distribution deviation, the label stability, and the distributions of comparable total scores and risk levels are compared over equal time windows before and after the version switch. The comparison outputs the change in sample size, the degree of drift of key indicators, and the change in risk prediction error, and a control set report is generated to track the performance trend of the mental health risk prediction model. Version update and mapping relationship verification instructions are issued for the mental health risk prediction model, and the monitoring records and assessment results are archived in the mental health database, forming a traceable, iterative closed loop of dynamic optimization.
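The drift alarm condition of claim 8 can be sketched as a small check run per stratum. The function name, the alarm record layout, and the drift type strings are hypothetical, and reading "changes abnormally n times" as "at least n abnormal changes in the current rolling window" is an assumption.

```python
def check_drift(stratum, abnormal_changes, error, n, error_threshold):
    """Return a drift alarm record for a stratum, or None if no drift.

    `stratum` identifies the affected versions and sample stratum, for
    example (questionnaire_version, boundary_version, risk_level).
    """
    drift_types = []
    if abnormal_changes >= n:
        drift_types.append("abnormal_level_changes")
    if error > error_threshold:
        drift_types.append("prediction_error")
    if not drift_types:
        return None
    return {
        "stratum": stratum,
        "drift_types": drift_types,
        "action": "review_queue",  # recalculation and supplementary collection
    }
```

Either condition alone raises the alarm, and recording both drift types when they coincide helps the later before-and-after comparison separate label instability from model error.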