A handwriting psychological data intelligent analysis method, system and device

By constructing a handwriting psychology knowledge base and fine-tuning a large model, the problem of insufficient mapping between handwriting features and psychological states in existing technologies has been solved, achieving high coverage and interpretability of mental health analysis and improving the effectiveness of intelligent analysis of handwriting psychology data.

CN122201769APending Publication Date: 2026-06-12UNIV OF SCI & TECH BEIJING

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
UNIV OF SCI & TECH BEIJING
Filing Date
2026-03-11
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing technologies lack fine-grained mapping relationships between handwriting features and psychological states in mental health assessments. The analysis results of traditional deep learning models lack interpretability, and the knowledge bases do not cover enough in the field of handwriting psychology, making it difficult to meet the needs of multi-dimensional features.

Method used

A handwriting psychology knowledge base was constructed. The text knowledge base was formed by OCR and M3E vectorization. The Qwen2-14B large model was fine-tuned using the LoRA framework to generate multiple types of retrieval questions for parallel retrieval of the text knowledge base. Semantic consistency merging and contradiction removal were performed, and the results were verified by psychological survey questionnaire data.

🎯Benefits of technology

It achieves a high coverage mapping between handwriting features and psychological states, improves the objectivity and interpretability of analysis results, enhances the completeness and generation quality of the knowledge base, and achieves a generation quality of 70.48% according to the Rouge-1 index.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122201769A_ABST
    Figure CN122201769A_ABST
Patent Text Reader

Abstract

The application provides a handwriting psychology data intelligent analysis method, system and device. The method comprises the following steps: constructing a handwriting psychology vector knowledge base, forming structured domain knowledge through data preprocessing and knowledge extraction; designing a multi-branch parallel retrieval enhancement generation mechanism, generating a dynamic query strategy based on the psychological classification results and the handwriting overall, local, split and comparison features, enhancing the prompt through multi-path retrieval and optimizing the generation quality of the large model, and realizing the automatic analysis of the handwriting psychological characteristics. The application innovatively combines the handwriting structure features with the psychology knowledge base, significantly improves the accuracy and interpretability of the psychological health analysis through the retrieval enhancement generation technology, and provides an intelligent solution for the handwriting psychology data processing.
Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] This invention relates to the fields of handwriting psychology data processing, knowledge base and large model processing technology, and in particular to a method and system for intelligent analysis of handwriting psychology data based on knowledge base and large model. Background Technology

[0002] Traditional methods of mental health assessment rely primarily on subjective approaches such as questionnaires and clinical interviews. These methods are not only limited by the self-reporting biases of those being assessed and differences in expert experience, but also suffer from inherent limitations such as difficulty in large-scale implementation and insufficient timeliness. Especially when it comes to the dynamic changes in underlying psychological states, these methods often fail to capture subtle but crucial early signs, lacking effective collection and processing of objective data, which frequently leads to significant errors in judgment.

[0003] Breakthroughs in Natural Language Processing (NLP) and Computer Vision (CV) technologies have opened up new possibilities for psychological data recognition. Deep learning-based cross-modal models can uncover the correlation between behavioral characteristics and psychological states from diverse data sources such as speech, text, and images. For example, analyzing language expressions can infer emotional tendencies, while visual features in handwriting images, such as character structure and stroke pressure, can reflect an individual's emotional stability and cognitive state. However, existing technologies still face fundamental challenges: general models lack deep integration of knowledge in the field of handwriting psychology, making it difficult to establish fine-grained mappings between handwriting features (such as overall layout, local deformation, and contrast shift) and psychological traits (such as depressive tendencies and anxiety levels); the black-box nature of traditional deep learning models leads to a lack of interpretability in the analysis results; mainstream retrieval enhancement generation techniques use a single retrieval path, failing to collaboratively handle the multi-dimensional feature requirements of holistic, local, and contrastive aspects present in psychological data analysis; more critically, existing knowledge bases have severely insufficient coverage in the field of handwriting psychology, with document experiments showing significant gaps in core indicators such as the coverage of handwriting feature definitions and the correlation between psychological features. Summary of the Invention

[0004] To address the limitations of existing solutions in processing and analyzing psychological data, this invention provides a method, system, and device for intelligent analysis of handwriting psychological data based on a knowledge base and a large model. The specific technical solution is as follows: On the one hand, a method for intelligent analysis of handwriting psychological data is provided, including the following steps: S1. Construct a handwriting psychology knowledge base, collect handwriting psychology literature for recognition and text cleaning, and form a text knowledge base through vectorization processing. The text knowledge base contains a question-answer pair dataset. S2. Fine-tune the large model based on the LoRA framework, convert the question-answer pair dataset into a fine-tuning format, and train to obtain a large handwriting psychology model. S3. Generate multiple types of retrieval questions based on the handwriting features to be detected, and retrieve psychological association knowledge from the text knowledge base in parallel. The types of retrieval questions include psychological classification results, overall handwriting features, local handwriting features, handwriting comparison features, and handwriting splitting features. S4. Perform semantic consistency merging and contradiction removal on the psychological association knowledge obtained by parallel retrieval, and construct a large-scale psychological model of enhanced prompt word input handwriting. S5. Generate a comprehensive analysis report that includes handwriting feature interpretation and mental health assessment using a large handwriting psychology model, and verify the results by combining psychological survey questionnaire data.

[0005] On the other hand, a handwriting psychological data intelligent analysis system is provided, including: The system comprises the following modules: a knowledge base construction module, which collects literature data and constructs a text knowledge base using OCR and M3E vectorization, including a question-answer pair dataset; a large model fine-tuning module, which performs domain-specific fine-tuning of the Qwen model based on the LoRA framework to obtain a large handwriting psychology model; a multi-branch retrieval enhancement module, which generates multiple types of retrieval questions and executes parallel queries of the text knowledge base; a semantic processing module, which performs semantic consistency merging and contradiction removal on the psychologically related knowledge obtained from parallel queries, implements cross-category semantic merging, and constructs enhanced prompt words for inputting into the large handwriting psychology model; and an analysis result generation module, which integrates the retrieval knowledge and the output results of the large handwriting psychology model to generate a comprehensive analysis report, which includes handwriting feature interpretation and mental health assessment.

[0006] Furthermore, the knowledge base construction module includes: The data preprocessing unit performs image grayscale conversion and text regularization cleaning; the vectorization engine uses the M3E model to achieve high-precision conversion from text to vectors; the quality verification unit calculates the knowledge base integrity index: integrity score = 0.3 × feature coverage + 0.3 × psychological relevance + 0.2 × timeliness + 0.2 × terminology consistency.

[0007] Furthermore, the multi-branch retrieval module includes: The parallel retrieval unit is used to categorize and search the knowledge base in parallel for each question and extract handwriting category and psychological emotion information from the top k similar texts; the multi-level semantic merging unit sequentially performs part-of-speech verification and synonym replacement at the unary syntax layer, phrase semantic disambiguation at the bigram syntax layer, and sentence consistency verification at the L syntax layer; the cross-domain association analysis unit is responsible for analyzing the semantic associations of results from different categories and performing cross-category semantic merging; the conflict resolution unit implements conflict elimination and contradiction removal strategies based on the priority of psychological classification; and the dynamic sorting output unit sorts the processed comprehensive results by relevance and splices them to generate the final knowledge text.

[0008] Furthermore, the large model fine-tuning module includes: The model loading and configuration unit is used to load the weights and architecture of the Qwen2-14B pre-trained model, configure the LoRA fine-tuning parameters, and define the task type. The data formatting unit is used to concatenate the handwriting psychology question-and-answer dataset according to the dataset format of the large model fine-tuning, such as id, conversations, from, value, etc., and convert it into JSON format to form the training set for handwriting psychology large model fine-tuning. The model fine-tuning training unit is used to fine-tune the model on the training set using fine-tuning strategies and hyperparameters.

[0009] Furthermore, the system also includes an application interface module for providing a mental health analysis API and a front-end interactive interface. The application interface module includes: The handwriting data receiving unit receives handwritten handwriting images uploaded by users through the handwriting image upload interface, and performs format verification and preprocessing operations; the analysis engine scheduling unit schedules the handwriting structured feature extraction algorithm, mental health classification algorithm and knowledge base through the mental health analysis result acquisition interface, and activates the multi-branch parallel retrieval enhancement generation mechanism; the result return unit encapsulates the analysis results into structured data through the mental health analysis result return interface and pushes it to the front-end display page; the user management unit handles identity authentication and permission verification through the user registration interface and user login interface, and associates with the user data table; the report generation unit integrates the mental health analysis results through the health report generation interface and generates a standardized downloadable report document.

[0010] Furthermore, the present invention also provides an intelligent handwriting psychological data analysis device, comprising: processor; The memory stores computer-executable instructions, which, when executed by a processor, implement the handwriting psychological data intelligent analysis method as described above.

[0011] In addition, the present invention also provides a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the intelligent handwriting psychological data analysis method described above.

[0012] The beneficial effects of the technical solutions provided in the embodiments of the present invention include at least the following: This solution utilizes OCR technology and domain knowledge extraction to construct a handwriting psychology vector knowledge base containing 11,017 text rules and 22,034 question-answer pairs. It comprehensively covers the mapping relationship between 58 features, including character shape, character spacing, and stroke pressure, and psychological states. After weighted evaluation based on seven indicators, including theoretical coverage and terminology consistency, the knowledge base achieved a completeness score of 0.84. Building upon this, a multi-branch parallel retrieval enhancement generation mechanism was designed—dynamically generating query questions based on four structural features of handwriting (overall, local, comparative, and split) linked to the initial psychological screening results. The knowledge retrieval process is optimized through semantic consistency merging and contradiction removal strategies. Simultaneously, LoRA technology was used to fine-tune the Qwen2-14B large model, achieving a generation quality of 70.48% on the handwriting psychology professional dataset, significantly improving the objectivity, professionalism, and interpretability of the analysis results. Attached Figure Description

[0013] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0014] Figure 1 This is a flowchart of a mental health intelligent analysis method based on a knowledge base and a large model provided by an embodiment of the present invention; Figure 2 This is a flowchart illustrating the construction of a knowledge base for handwriting psychology text provided in this embodiment of the invention; Figure 3 This is a flowchart illustrating the construction of a knowledge base for handwriting psychology question-and-answer pairs provided in an embodiment of the present invention; Figure 4 This is a basic process diagram of fine-tuning a large model using LoRA provided in an embodiment of the present invention; Figure 5 This is a basic process diagram of classification parallel retrieval provided in the implementation examples of this invention; Figure 6 This is a schematic diagram of the structure generated by the large model retrieval enhancement provided in the implementation examples of this invention. Detailed Implementation

[0015] In embodiments of the present invention, words such as "exemplarily," "for example," etc., are used to indicate that something is an example, illustration, or description. Any embodiment or design described as "exemplary" in the present invention should not be construed as being more preferred or advantageous than other embodiments or designs. Specifically, the use of the word "exemplary" is intended to present the concept in a concrete manner. Furthermore, in embodiments of the present invention, the meaning expressed by "and / or" can be both, or either one.

[0016] In the embodiments of this invention, the terms "image" and "picture" may sometimes be used interchangeably. It should be noted that, without emphasizing the distinction between them, they convey the same meaning. Similarly, the terms "of," "corresponding (relevant)," and "corresponding" may sometimes be used interchangeably. It should be noted that, without emphasizing the distinction between them, they convey the same meaning.

[0017] To make the technical problems, technical solutions and advantages of the present invention clearer, a detailed description will be given below in conjunction with the accompanying drawings and specific embodiments.

[0018] This invention provides a method, system, and device for intelligent analysis of handwriting psychological data. This solution can be implemented by an intelligent handwriting psychological data analysis system, which can be a terminal or a server. Figure 1 As shown in the figure, the method is given in the form of steps S101 to S105, and its overall process mainly includes the following steps: S1. Construct a handwriting psychology knowledge base, collect handwriting psychology literature for OCR recognition and text cleaning, and form a text knowledge base and a question-answering pair knowledge base through M3E vectorization processing; S2. Based on the LoRA framework, fine-tune the Qwen large model for the domain, convert the handwriting psychology question-answering pair dataset into a fine-tuning format, and train to obtain a large model for the handwriting psychology domain; S3. Generate multiple types of retrieval questions based on user handwriting characteristics, and retrieve psychologically related knowledge from the knowledge base in parallel. The retrieval question types include psychological classification results, overall handwriting features, local handwriting features, handwriting comparison features, and handwriting splitting features; S4. Perform semantic consistency merging and contradiction removal on multi-branch retrieval results, and construct a large model after fine-tuning enhanced prompt word input; S5. Generate a comprehensive analysis report containing handwriting feature interpretation and mental health assessment through the domain large model, and verify the results by combining psychological survey questionnaire data. The implementation methods of each step are described in detail below.

[0019] Step S1 is the construction step of the handwriting psychology knowledge base, which generally includes: collecting handwriting psychology literature and image data, obtaining handwriting psychology text data through OCR recognition and text cleaning; performing semantic vectorization processing on the text data, using the M3E model to convert the text into 768-dimensional vectors and establishing a mapping relationship between the vectors and the original text, storing the vectorized data in a vector database such as Milvus to form a text knowledge base; on this basis, transforming the original knowledge into question-answer key-value pairs through prompting engineering to construct a handwriting psychology question-answer pair dataset; and calculating the knowledge base integrity score based on indicators such as feature coverage, psychological relevance, timeliness, and terminology consistency to verify the construction quality of the handwriting psychology knowledge base.

[0020] In step S1, combined Figure 2 As shown, the knowledge base for handwriting psychology texts is constructed as follows: images are collected from classic and well-known books and journals on handwriting psychology, such as *Graphology—Personality Through Handwriting*, *Handwriting Psychology*, *Chinese Character Handwriting Psychology*, and *Handwriting Analysis: The Key to Exploring the Inner World*. These documents describe, for example, how to infer psychological and personality traits through handwriting analysis. The documents also detail the correlation between different handwriting characteristics, such as character shape, spacing, strokes, and stress levels, and an individual's personality and emotions.

[0021] In this embodiment, the paddle_v3_OCR recognition algorithm is used to extract handwriting psychology knowledge and store it in a database, forming a handwriting psychology text dataset. This dataset contains over 11,017 pieces of handwriting psychology text knowledge, used for classification and retrieval. The handwriting psychology knowledge mainly includes the correlation between character shape, character spacing, strokes, etc., and an individual's personality, emotions, and mental state; the definition of handwriting structured features; and the mapping relationship between handwriting structured features and an individual's psychological emotions and personality state.

[0022] In this embodiment, M3E is used to process a large amount of textual knowledge and construct it into vectors. Potential semantic features are extracted from the text and converted into low-dimensional vectors; for example, each piece of text can be represented as a 768-dimensional vector. This vector representation contains the semantic information and psychological features of the text. Finally, it is stored in the Milvus vector retrieval database.

[0023] In step S1, such as Figure 3 As shown, a handwriting psychology question-answering pair dataset is constructed based on the handwriting psychology text knowledge base. Each piece of text knowledge in the handwriting psychology text knowledge base is treated as a question's answer value. A corresponding question is generated by summarizing prompts from a large model, serving as the key value, and stored in key-value pair format. During the construction of the question-answering pair knowledge base, prompting engineering is used to transform the original knowledge into "question-answer" key-value pairs. The question generation formula is:

[0024] Where Q represents the generated question text; The template for generating prompts for questions is represented by "LLM"; "Large Language Model Function" is represented by "Large Language Model Function"; and "Answer" represents the answer text that serves as the semantic basis for the question. This indicates string concatenation.

[0025] The prompting project includes: constructing prompt templates. This is used to constrain the large language model to infer the corresponding question text from the answer text. For example, This includes: role setting (e.g., 'Imagine you are an expert in handwriting psychology'), task instructions (e.g., 'You need to extract the original question from the following answers so that the answer can be obtained from the original question'), example comparison (providing a set of 'answer-original question' examples to guide the generation style), and output constraints (e.g., 'Please answer the original question directly; no extra text is required'). After being concatenated with Answer, it is input into a large language model to obtain the question text Q, which is then stored together with Answer as a question-answer pair knowledge in key-value pair form.

[0026] The following is a demonstrative example of a large-scale model for building question-answer pairs: "Imagine you are an expert in handwriting psychology, and you need to extract the original question from the following answers so that the answer can be derived from the original question. For example: The answer is that a significant rightward slant in handwriting indicates that the writer is emotionally rich, enthusiastic, and cheerful, full of curiosity and interest in the surrounding world. The original question is: What personality trait does a significant rightward slant in handwriting usually reflect in the writer?" Here is my answer: When the pressure of writing strokes is too heavy, it indicates that the writer is under considerable psychological pressure and experiencing anxiety. They face many difficulties in real life, and their stubbornness makes it difficult to change their minds once they've made up their minds about something.

[0027] Please answer the original question directly; no additional text is required. The generated questions, such as "Excessive pressure during handwriting strokes usually reflects what psychological state and personality traits of the writer?", are obtained through a large model and stored as keys. After generating the original questions, they are vectorized using M3E. The vectorized question-answer pairs are then stored. M3E can handle large amounts of textual knowledge, and vectorization is performed using M3E. M3E can extract semantic features from the text and convert them into low-dimensional vectors. These low-dimensional vectors contain semantic information and psychological features of the text. The low-dimensional vectors are then stored in the Milvus database. The handwriting psychology question-answer pair dataset contains a total of 22,034 handwriting psychology question-answer pairs, used for classification and retrieval of knowledge information. Each question-answer pair consists of a question key and an answer value, covering handwriting psychology content such as overall handwriting features, local handwriting features, handwriting comparison features, psychological and emotional characteristics, and mental health status.

[0028] Step S2 is the large model fine-tuning step, which generally includes: loading the Qwen2-14B pre-trained large model based on the LoRA framework, configuring fine-tuning parameters such as task type, LoRA rank, and Dropout ratio; converting the handwriting psychology question-and-answer dataset into a JSON training set according to the large model fine-tuning format, wherein the JSON includes at least fields such as dialogue ID, dialogue list, dialogue source (from), and dialogue content (value); training Qwen2-14B on the training set using a preset fine-tuning strategy and hyperparameters to obtain a large model in the field of handwriting psychology.

[0029] In step S2, such as Figure 4 As shown, fine-tuning the Qwen2-14B pre-trained large model using LoRA requires steps such as collecting and preprocessing handwriting psychology knowledge data, loading the pre-trained model, defining fine-tuning strategies and hyperparameters, fine-tuning the large model, and evaluating and optimizing the large model.

[0030] The collection and preprocessing of handwriting psychology knowledge data includes data cleaning and annotation. This includes removing handwriting noise and duplicate handwriting images. Text is converted into word sequences, and data segmentation is performed to form a training set for fine-tuning the handwriting psychology large-scale language model. The handwriting psychology question-and-answer pair dataset is mapped according to the fine-tuning requirements of the large-scale model, such as key-value pairs like id, conversations, from, and value, and converted to JSON format. The handwriting psychology large-scale model fine-tuning training set contains over 11,017 question-and-answer pairs, empowering the handwriting psychology large-scale model. The data is concatenated and formatted according to requirements, as follows: a unique identifier id is assigned to each question-and-answer pair, and an id identifier is added. The id is a sequentially generated numerical number, mainly used to distinguish different question-and-answer records, facilitating subsequent data management and retrieval. The questions and answers in the question-and-answer pairs are organized according to a specific structure under the "conversations" field, using a list format. Each element contains a dictionary structure with "from" indicating the source, "user" representing the questioner, "assistant" representing the answerer, and "value" representing the answer text. This constructs the content of the "conversations" field. After the above processing, all question-and-answer pairs with added key-value pairs such as id and conversations are organized into a JSON-compliant format. Finally, the converted JSON data is checked (e.g., using the json module in Python) to verify for syntax errors and missing data. Data corrections are made to form a fine-tuning training set for the handwriting psychology model.

[0031] The large-scale model fine-tuning specifically includes: loading the weights of the pre-trained Qwen2-14B model; configuring relevant information and defining the LoRA configuration, including parameters such as task type, LoRA rank, and Dropout ratio; fine-tuning the Qwen-2-14B weights using the LoRA method and optimizing the hyperparameters, such as learning rate, batch size, and number of training epochs; during fine-tuning, converting the question-answer pair data into a JSON format training set, with data fields including: dialogue ID "id", dialogue content list "conversations", dialogue source "from", and dialogue content "value"; and fine-tuning the model on the training set using the fine-tuning strategy and hyperparameters. During training, updating the model parameters through backpropagation and evaluating the model's performance on the validation set using metrics such as accuracy, F1 score, and ROUGE.

[0032] Step S3 is the multi-type retrieval question generation and parallel retrieval step, which generally includes: extracting structured features from the handwriting image uploaded by the user to obtain the user's handwriting structured features; obtaining the user's mental health classification result through a mental health classification model based on the handwriting structured features; generating mental health classification questions and handwriting structured feature questions through a preset template according to the mental health classification result and handwriting structured features, forming five types of retrieval questions: mental health classification result, overall handwriting features, local handwriting features, handwriting comparison features, and handwriting splitting features; and performing parallel retrieval of the above five types of questions in the handwriting psychology knowledge base (i.e., the text knowledge base) to obtain handwriting category information and psychological and emotional related knowledge from the top k similar texts.

[0033] The calculation method for handwriting structure features is as follows.

[0034] (1) Input data. The input for the handwriting structured features is the handwriting image uploaded by the user; in order to generate comparative features, standard Chinese character images can be further introduced as a comparison benchmark. The standard Chinese character images are, for example, from the GB2312 standard Chinese character dataset (for example, containing 3755 categories of standard Chinese character images, which can be used as a reference for calculating comparative features).

[0035] (2) Preprocessing and Layered Calculation Process. After preprocessing the handwriting image, the handwriting feature extractor is called to calculate the structured features in layers. The handwriting feature extractor includes, for example, seven modules: single character decomposition extraction module, single character feature extraction module, single character feature comparison module, line character feature extraction module, line character feature comparison module, overall character feature extraction module, and overall character feature comparison module, thereby realizing feature extraction and comparison calculation at the single character level, line-level local level, and overall layout level.

[0036] (3) Output format. The output of the handwriting feature extractor is a structured feature vector, for example, the output is a (1×35) handwriting structured feature vector, and the structured feature data is continuous data.

[0037] (4) Specific composition of structured features. The handwriting structured features include at least the features of individual characters in the handwriting, local features of the handwriting (line-level features), overall features of the handwriting, and features comparing the handwriting with standard Chinese characters. Among them: a) Individual character features: such as slant, absolute size, squareness, degree of connection, coordinates of the character's center of gravity, length of the character image outline, outline area, perimeter of the convex hull, curvature, degree of stroke curvature, etc.

[0038] b) Comparison features: For example, under the same character category, handwritten Chinese characters are aligned and compared with corresponding standard Chinese characters, and comparison features such as relative size, relative tilt, degree of centroid coordinate offset, relative contour length, relative contour area, relative perimeter of convex hull, relative curvature, and relative curvature of strokes are calculated.

[0039] c) Line-level local features: such as baseline slope and line centroid coordinates; and can further calculate the degree of offset of line centroid coordinates relative to the standard character.

[0040] d) Overall features: such as top margin, bottom margin, left margin, right margin, and multi-line center of gravity coordinates; and can further calculate the degree of offset between the overall center of gravity coordinates and the standard character's overall center of gravity coordinates.

[0041] (5) Example Algorithm Explanation. To facilitate understanding, several feature calculation examples are given: the line baseline slope can be obtained by extracting the coordinates of the bottom points of each Chinese character in a line of text and using least squares fitting to obtain the slope of the straight line equation; the line centroid coordinates can be obtained by weighted averaging of the centroid coordinates of each character in the line. The "centroid offset degree" in the comparison feature can be obtained by calculating the Euclidean distance between the centroid coordinates of handwritten Chinese characters and the centroid coordinates of standard Chinese characters and normalizing it to obtain the offset score.

[0042] The results of the mental health classification are obtained as follows.

[0043] (1) Input data and label sources. The category labels of the mental health classification results are derived from the mapping results after the subjects were assessed by psychological scales. For example, the Big Five Personality Model Questionnaire and the Eysenck Personality Questionnaire were used to test and assess the subjects' recent mood, personality, and character (e.g., a total of 120 questionnaire items), and the assessment results were divided into four levels: no psychological problems, level one psychological problems, level two psychological problems, and level three psychological problems, which were further mapped to discrete labels 0, 1, 2, and 3.

[0044] (2) Example of label mapping rules. The mapping principle for the "no psychological problem" label can be as follows: when the scores of each dimension in the Big Five Personality Model Questionnaire and the Eysenck Personality Questionnaire are within the normal range, and no persistent negative emotions are shown in the recent mood-related questions, it is mapped to "no psychological problem". Among them, the normal range is, for example, the scores of each dimension of the Big Five Personality Model are between 40 and 60, and the scores of each dimension of the Eysenck Personality Questionnaire are between 30 and 70, etc.; The mapping principle for Level 1 psychological problem labels can be as follows: when a certain dimension of the questionnaire scores slightly higher (for example, the neuroticism dimension in the Big Five personality disorder or the neuroticism dimension in the Eysenck Personality Questionnaire scores higher than 60), or when the recent mood-related questions include the description of "occasional negative emotions but can be adjusted relatively quickly", and other dimensions are not obviously abnormal, it is mapped to "Level 1 psychological problem". The mapping principle for the Level 2 psychological problem label can be as follows: when two or more dimensions show significant abnormalities, and recent mood-related questions show frequent negative emotions, it is mapped to "Level 2 psychological problem". Examples of significant abnormalities include: a neuroticism score of 70–80 and an extraversion score of 30–40 in the Big Five personality disorder questionnaire; a neuroticism score of 70–80 and a psychoticism score of 50–60 in the Eysenck Personality Questionnaire; and a description in recent mood-related questions stating "I experience low moods several times a week and find it difficult to adjust on my own". The mapping principle for Level 3 psychological problems can be as follows: when multiple dimensions show severe abnormalities, and recent mood-related questions indicate severe emotional problems, it is mapped to a "Level 3 psychological problem." Examples of severe abnormalities include: a neuroticism score exceeding 80 in the Big Five personality traits, while agreeableness and conscientiousness scores are below 40; a neuroticism score exceeding 80 and a psychoticism score exceeding 60 in the Eysenck Personality Questionnaire; and recent mood-related questions containing severe descriptions such as "almost constantly in a state of anxiety or depression, affecting normal life." The above mapping rules are used to generate the ground truth labels in the training data.

[0045] (3) Classification model construction during the training phase. Using a dataset of labeled handwritten images as training samples (e.g., containing 4412 handwritten images, which can be divided into training set / validation set / test set, e.g., 7:2:1), the corresponding handwriting structured feature vector (e.g., a 1×35 continuous numerical feature vector) is first calculated through a handwriting feature extractor for each sample. Then, the structured feature vector is input into the mental health classification model for training, so that the classification model learns the correspondence between the structured features and the above four-level mental health labels.

[0046] The mental health classification model can be implemented in various ways, such as: connecting a fully connected layer after the structured feature vector to form a classifier; or using a machine learning classification algorithm; or using a multi-model ensemble voting algorithm to obtain more robust classification results.

[0047] (4) Calculation of classification results in the reasoning stage. For the handwriting image currently uploaded by the user, the handwriting structured feature extraction is performed first to obtain the structured feature vector; then the structured feature vector is input into the trained mental health classification model to output the mental health classification result. The classification result is one of the four categories (e.g., no mental health problem, level one mental health problem, level two mental health problem, level three mental health problem), and the corresponding confidence level can be output simultaneously.

[0048] (5) Confidence calculation. The confidence score can be obtained from the category score output by the classification model, such as from the posterior probability or normalized score of each category output by the classification model; when using an ensemble voting algorithm, the confidence score can also be obtained from the voting consistency ratio.

[0049] Combination Figure 5 As shown, in step S3, for the five categories of questions—psychological classification results, overall handwriting features, local handwriting features, handwriting comparison features, and handwriting splitting features—a parallel search of the handwriting psychology knowledge base is performed for each category. The top k most similar answer texts are retrieved, from which handwriting category information, psychological emotions, and other related knowledge are extracted. This information is then merged into the overall result. During the retrieval process, when merging the search results, it is determined whether the questions are of the same type. If they are of the same type, semantic consistency merging is performed; if they are of different types, contradiction removal is performed. After processing, the results are merged into handwriting psychology knowledge.

[0050] In S3, multi-type retrieval question generation includes: The mental health classification model is used to generate mental health classification results (including, for example, no mental health problems, level 1 mental health problems, level 2 mental health problems, and level 3 mental health problems), and handwriting structure features (including single-character features of handwritten Chinese characters, partial features of a line of characters, overall features, and features compared to standard Chinese characters). The mental health classification results and handwriting structure features are used to generate mental health classification questions and handwriting structure feature questions through a template.

[0051] The template is a pre-defined question generation template library used to convert mental health classification results and handwriting structured features into natural language retrieval questions. The template library includes at least two categories: mental health classification question templates and handwriting structured feature question templates, and generates question text through variable filling.

[0052] (1) Mental Health Classification Question Template. For example, a mental health classification question template can be represented as: In handwriting psychology analysis, what target characteristics does {Class} reflect in the writer? Here, {Class} represents the label text for the mental health classification result, with examples including no mental health problems, Level 1 mental health problems, Level 2 mental health problems, and Level 3 mental health problems; {Target} represents the dimensions of concern, with examples including psychological characteristics, personality traits, and emotions / mood. During generation, the classification results are populated into {Class}, and one or more items are selected from the preset enumeration of {Target} to generate corresponding questions. For example, when {Class} = no mental health problems and {Target} = psychological characteristics, the generated question is "In handwriting psychology analysis, what psychological characteristics does the absence of mental health problems reflect in the writer?"; when {Class} = no mental health problems and {Target} = personality traits, the generated question is "In handwriting psychology analysis, what personality traits does the absence of mental health problems reflect in the writer?" (2) Template for Handwriting Structure Feature Problem. For example, the template for a handwriting structure feature problem can be represented as: In handwriting psychology analysis, what does {FeatDesc}{CompareClause} represent in terms of the writer's {Target}? Here, {FeatDesc} is a feature description phrase generated from structured features; {CompareClause} is an optional comparison clause used to compare standard Chinese character features, for example, "Compare with {StdRef} standard Chinese characters {CompareAspect}", where {StdRef} is GB2312, and {CompareAspect} is a description of the comparison dimension; {Target} is the issue focus dimension, examples include psychological, personality traits, recent emotions or meanings, etc. During generation, the system constructs {FeatDesc} according to the type of the structured feature (global feature, local feature, comparative feature, or split feature), and fills in {CompareClause} if it belongs to a comparative feature, otherwise {CompareClause} is left empty; then {Target} is filled in to obtain the corresponding issue. For example, when {FeatDesc} = large right-side margin, {CompareClause} is empty, and {Target} = meaning, the question "What does a large right-side margin mean in handwriting analysis?" is generated; when {FeatDesc} = large baseline slope of Chinese characters, {CompareClause} is empty, and {Target} = psychology, the question "In handwriting psychology analysis, what psychology does a large baseline slope of Chinese characters represent for the writer?" is generated; when {FeatDesc} = center of gravity is high, {CompareClause} = center of gravity is high compared to GB2312 standard Chinese characters, and {Target} = recent emotions and moods, the question "In handwriting psychology analysis, what recent emotions and moods does a center of gravity being high compared to GB2312 standard Chinese characters represent for the writer?" is generated; when {FeatDesc} = large squareness of Chinese characters, {CompareClause} is empty, and {Target} = psychology, the question "In handwriting psychology, what psychology does large squareness of Chinese characters represent?" is generated. Step S4 is the semantic consistency merging and contradiction removal step, which generally includes: merging similar search results using unary syntax to remove duplicate and redundant expressions; merging similar and dissimilar search results at multiple levels of unary, bigram, and L-gram syntax to ensure semantic consistency; for search results of different categories, defining the mapping relationship between features based on the cross-category semantic association rule base, using an attention mechanism to calculate semantic relevance scores, and retaining only the associated knowledge with scores higher than a preset threshold; in the contradiction removal stage, the relevant descriptions of psychological classification results are given the highest priority, and high-confidence psychological classification results are retained while feature interpretations with lower confidence are discarded, thereby obtaining comprehensive search knowledge with good consistency.

[0053] In step S4, during the knowledge processing for retrieving questions of the same category, semantic conflicts in the knowledge are removed through unary grammar merging, bigram grammar merging, and L-grammar merging. Unary grammar merging involves determining the consistency of parts of speech and semantics, such as synonym substitution, grammatical merging of individual words, and removal of redundant words and repetitive expressions. At the grammatical level, two words or phrases are merged to eliminate ambiguity through bigram grammar merging. For example, the emotional states of "excitement" and "excitement" can be classified into the same category. At the sentence level, syntactic and semantic conflicts are eliminated through L-grammar merging to ensure consistency between sentences.

[0054] In handling different categories of questions, multi-semantic understanding technology is used to establish a cross-category semantic association rule base, define the mapping relationship between different handwriting features, and analyze the potential associations between search results of different categories. For example, there is a relationship between "writing fluency" in the overall handwriting features and "stroke continuity" in the local handwriting features. An attention mechanism can be used to calculate the semantic relevance score between results of different categories.

[0055] Among them, S ij Q represents the semantic relevance score between the i-th "query item" and the j-th "candidate item". i Let K represent the query vector of the i-th search result. j Let d represent the key vector of the j-th candidate search result. k Represents the key vector K j The size of the dimension.

[0056] Only relevant knowledge with scores above a preset threshold (e.g., 0.7) is retained. Semantic merging is performed following the steps of unary grammar merging, bigram grammar merging, and L-grammar merging, integrating results from different categories that are semantically related, allowing cross-category information to complement and improve each other. Inconsistent information is reviewed and removed again to avoid logical conflicts between results from different categories; if contradictions exist between results from different subcategories, they must be resolved. A priority strategy is adopted, setting priorities: results from psychological classification questions are the most important, followed by results from questions on overall handwriting features, local handwriting features, handwriting comparison features, and handwriting splitting features. Finally, the comprehensive search results obtained after the above processing are sorted by relevance, concatenated into a string, providing comprehensive, accurate, and consistent handwriting psychology knowledge for handwriting psychological health analysis.

[0057] In step S5, based on the comprehensive search results obtained in step S4, a comprehensive analysis report including handwriting feature interpretation and mental health assessment is generated through a domain-wide model, and the results are verified by combining psychological survey questionnaire data. Specifically, the user's handwriting features and handwriting psychology knowledge after semantic merging and contradiction removal are fused through prompt word engineering to construct a handwriting psychology domain-wide model with enhanced prompt word input and fine-tuning. The domain-wide model generates a mental health analysis report, and the analysis conclusions can be compared with the user's psychological survey questionnaire results to verify the rationality and consistency of the generated results.

[0058] like Figure 6 As shown, in steps S3 to S5, the overall architecture of the large model multi-branch retrieval enhancement generation is as follows: knowledge is extracted from the handwriting psychology text dataset and question-answer pair dataset through the large model and stored as a text knowledge base and a question-answer pair knowledge base; the psychological health classification results are combined with the structural features of handwriting as a whole, part, split, and comparison to generate query sub-questions, and multi-branch knowledge information retrieval is performed using the handwriting psychology knowledge base to obtain knowledge in the field of handwriting psychology; the large model is fine-tuned on the dataset using the framework; the questions are fused with knowledge information in the field of handwriting psychology, and the prompt words are enhanced and input into the fine-tuned handwriting psychology large model to generate psychological health analysis results.

[0059] The experimental comparison models mainly include the large Qwen2-14b model and the fine-tuned large Qwen2-14b model.

[0060] Large-scale model evaluation primarily uses metrics such as ROUGE-1, ROUGE-2, ROUGE-L, F-score, Recall, and Precision to measure and evaluate the quality of Chinese text generated by the large-scale model. Below are the concepts and specific calculation formulas for ROUGE-1, ROUGE-2, ROUGE-L, F-score, Recall, and Precision.

[0061] ROUGE is used to evaluate the similarity between generated and reference text. ROUGE calculates n-gram overlap and the longest common subsequence between the generated and reference texts. ROUGE-1 measures the quality of the generated text based on the overlap of individual words between the two texts. ROUGE-1-Precision primarily measures how many words in the generated text appear in the reference text, as shown in the formula below.

[0062]

[0063] ROUGE-1-Recall primarily measures how many words from the reference text appear in the generated text, as follows:

[0064] ROUGE-2 measures the overlap of tuples between generated and reference text, specifically whether two consecutive words match. ROUGE-2-Precision primarily measures how many tuples in the generated text appear in the reference text, as shown in the formula below:

[0065] ROUGE-2-Recall primarily measures the number of tuples from the reference text that appear in the generated text. The calculation formula is as follows:

[0066] ROUGE-L measures the longest common subsequence (LCS) match between generated and reference text. The LCS is the longest subsequence in both texts that is not required to be consecutive but must be in the same order. The ratio of the LCS length to the total length of the generated text in ROUGE-L-Precision is as follows:

[0067] The ratio of the LCS length to the total length of the reference text in the ROUGE-L-Recall reference text is given by the following formula:

[0068] Experiments were conducted on a large model fine-tuning dataset, mainly comparing the differences in metrics between models such as Qwen2-14b and Qwen2-14b fine-tuned large models. The comparison results are shown in Table 1.

[0069] Table 1 Comparison results of large model evaluation experiments

[0070] The Qwen2-14b fine-tuned large model achieved a Rouge-1 Precision value of 69.88%, a Rouge-1 Recall value of 70.48%, a Rouge-2 Precision value of 58.64%, a Rouge-2 Recall value of 58.38%, a Rouge-L Precision value of 67.63%, a Rouge-L Recall value of 66.66%, a Precision value of 86.67%, a Recall value of 84.65%, and an F-score of 86.85%. Therefore, the experimental comparison table above shows that the Qwen2-14b fine-tuned large model performs better; the fine-tuned model outperforms the original large model in metrics such as Rouge-1, Rouge-2, Rouge-L, F-score, recall, and precision, resulting in better quality Chinese text generation. The fine-tuned handwriting model also demonstrates superior performance compared to the original large model, reflecting fundamental knowledge of handwriting psychology in the vertical domain.

[0071] The results show that the fine-tuned Qwen2-14b model exhibits strong comprehensive performance in the field of handwriting psychology. In the generation task, among the Rouge series metrics, Rouge-1's precision (69.88%) and recall (70.48%) are close, indicating that the model has high completeness and consistency in word semantic generation. Rouge-L's precision (67.63%) and recall (66.66%) are slightly lower than Rouge-1, indicating that the model still has room for optimization in the semantic coherence of long texts, limited by the complex symbolic expressions and unstructured features in the field of handwriting psychology. In the classification task, the model performs exceptionally well in precision (86.67%), recall (84.65%), and F-score (86.85%). The high recall, in particular, verifies the model's sensitivity and coverage of psychological features such as pen pressure and letter spacing, effectively reducing the risk of missed detections of key psychological labels. Compared to traditional small models, Qwen2-14b, through large-scale parameter fine-tuning, is better at capturing the implicit correlation between handwriting features and psychological states, and enhances its fine-grained feature parsing capabilities through domain knowledge injection. Overall, this demonstrates that the model has reached a practical level in structured classification tasks in handwriting psychology.

[0072] The above embodiments can be implemented, in whole or in part, by software, hardware (such as circuits), firmware, or any other combination thereof. When implemented using software, the above embodiments can be implemented, in whole or in part, as a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, all or part of the processes or functions described in the embodiments of the present invention are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium can be any available medium that a computer can access or a data storage device such as a server or data center that includes one or more sets of available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. A semiconductor medium can be a solid-state drive.

[0073] In addition, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.

[0074] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention.

[0075] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A method for intelligent analysis of handwriting psychological data, characterized in that, The method includes: S1. Construct a handwriting psychology knowledge base, collect handwriting psychology literature for recognition and text cleaning, and form a text knowledge base through vectorization processing. The text knowledge base contains a question-answer pair dataset. S2. Fine-tune the large model based on the LoRA framework, convert the question-answer pair dataset into a fine-tuning format, and train to obtain a large handwriting psychology model. S3. Generate multiple types of retrieval questions based on the handwriting features to be detected, and retrieve psychological association knowledge from the text knowledge base in parallel. The types of retrieval questions include psychological classification results, overall handwriting features, local handwriting features, handwriting comparison features, and handwriting splitting features. S4. Perform semantic consistency merging and contradiction removal on the psychological association knowledge obtained by parallel retrieval, and construct a large-scale psychological model of enhanced prompt word input handwriting. S5. Generate a comprehensive analysis report that includes handwriting feature interpretation and mental health assessment using a large handwriting psychology model, and verify the results by combining psychological survey questionnaire data.

2. The method according to claim 1, characterized in that, The handwriting psychology knowledge base constructed in S1 includes: Scanned images of classic works on handwriting psychology were collected, and the text data was extracted and standardized. The standardized text data underwent semantic vectorization, specifically: the M3E model was used to convert the text into 768-dimensional vectors; a mapping relationship was established between the vectors and the original text; the vectorized data was stored in a vector database to form a text knowledge base; during the construction of the question-answering dataset, the original knowledge was converted into "question-answer" key-value pairs through prompts, and the question generation formula was: Where Q represents the generated question text; The template for generating prompts for questions is represented by "LLM"; "Large Language Model Function" is represented by "Large Language Model Function"; and "Answer" represents the answer text that serves as the semantic basis for the question. This indicates string concatenation.

3. The method according to claim 1, characterized in that, The fine-tuning of the large model in S2 specifically includes: Load the Qwen2-14B pre-trained model and configure the LoRA parameters, which include: task type, LoRA rank, and Dropout ratio. Convert the question-answer pair dataset into a JSON format training set. The data fields of the training set include: dialogue ID "id", dialogue content list "conversations", dialogue source "from", and dialogue content "value".

4. The method according to claim 1, characterized in that, In S3, the methods for generating multi-type retrieval questions include: The system acquires user handwriting images and performs handwriting structured feature extraction to obtain handwriting structured features; based on the handwriting structured features, it obtains mental health classification results through a mental health classification model; the mental health classification results and the handwriting structured features are used to generate mental health classification questions and handwriting structured feature questions through a preset template; the handwriting structured features include: single-character features of handwritten Chinese characters, local features of a line of characters, overall features, and features compared with standard Chinese characters.

5. The method according to claim 1, characterized in that, In S4, semantic consistency merging and contradiction removal include: performing unary syntax merging on similar search results and deleting duplicate expressions; and performing L-syntax merging on different types of search results, including: establishing a cross-category semantic association rule base and defining the mapping relationship between features; and using an attention mechanism to calculate semantic relevance scores. Retain relevant knowledge with scores above the threshold; when removing contradictions, adopt a priority strategy: the results of psychological classification questions have the highest priority, followed by the results of handwriting overall features, handwriting local features, handwriting comparison features, and handwriting splitting features questions.

6. A handwriting psychological data intelligent analysis system, characterized in that, The system can be used to perform the method described in any one of claims 1-5. The system includes: a knowledge base construction module for collecting literature data and constructing a text knowledge base through OCR and M3E vectorization, the text knowledge base containing a question-answer pair dataset; a large model fine-tuning module for performing domain fine-tuning of the Qwen model based on the LoRA framework to obtain a large handwriting psychology model; a multi-branch retrieval enhancement module for generating multiple types of retrieval questions and executing parallel queries of the text knowledge base; a semantic processing module for performing semantic consistency merging and contradiction removal on the psychologically related knowledge obtained from parallel queries, implementing cross-category semantic merging, and constructing enhanced prompt words for inputting into the large handwriting psychology model; and an analysis result generation module for integrating the retrieved knowledge and the output results of the large handwriting psychology model to generate a comprehensive analysis report, the comprehensive analysis report including handwriting feature interpretation and mental health assessment.

7. The system according to claim 6, characterized in that, The knowledge base construction module includes: The data preprocessing unit performs grayscale conversion of scanned images and text regularization cleaning; the vectorization engine unit uses the M3E model to achieve high-precision text-to-vector conversion; the quality verification unit calculates the knowledge base integrity index. Completeness score = 0.3 × Feature coverage + 0.3 × Psychological relevance + 0.2 × Timeliness + 0.2 × Terminology consistency.

8. The system according to claim 6, characterized in that, The multi-branch retrieval enhancement module includes: The parallel retrieval unit is used to classify and retrieve text knowledge bases in parallel for each question, and extract handwriting category and psychological emotion information from the top k similar texts; the multi-level semantic merging unit sequentially performs part-of-speech verification and synonym replacement at the unary syntax layer, phrase semantic disambiguation at the binary syntax layer, and sentence consistency verification at the L syntax layer. The cross-domain association analysis unit is used to analyze the semantic association of search results of different categories and perform cross-category semantic merging; the conflict resolution unit implements conflict elimination and contradiction removal according to the priority of psychological classification; the dynamic sorting output unit sorts the comprehensive results processed by the conflict resolution unit according to their relevance and splices them to generate the final knowledge text.

9. The system according to claim 6, characterized in that, The semantic processing module includes: a multi-semantic association analysis unit, which uses multi-semantic understanding to analyze the potential associations between search results of different categories; and a semantic merging unit, which merges search results of different categories but semantically related according to the steps of unary grammar merging, bigram grammar merging, and L-grammar merging. The contradictory information screening and removal unit screens contradictory information between different subcategories and eliminates logical conflicts; the priority strategy unit sets the priority of search results and determines the importance of different categories; the result synthesis and output unit sorts the synthesized search results after the above processing according to relevance and concatenates them into a string for output.

10. A handwriting psychological data intelligent analysis device, characterized in that, include: processor; A memory storing computer-executable instructions that, when executed by a processor, implement the method of any one of claims 1-5.