Multimodal fusion-based method and system for evaluating the psychological time travel ability of children with autism
Using magnetic storyboards and multimodal data analysis, the invention standardizes the assessment of autistic children's psychological time travel ability and quantifies it across multiple dimensions. This makes assessment results repeatable, gives accurate guidance for intervention, and supports cross-cultural assessment.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- SHANDONG MANAGEMENT UNIV
- Filing Date
- 2026-03-16
- Publication Date
- 2026-06-12
AI Technical Summary
Existing technologies cannot effectively assess the psychological time travel ability of children with autism. The assessment scenarios are not standardized, non-verbal behaviors are ignored, the assessment results cannot guide intervention practices, and there is a lack of multimodal integrated assessment programs.
A standardized assessment scenario is constructed with a magnetic time-series storyboard. Features are extracted from the collected multimodal data using speech recognition, dynamic time warping, and gesture and gaze detection algorithms. A random forest model then performs the multidimensional assessment, and personalized intervention suggestions are generated.
It enables objective and quantitative assessment of psychological time travel capabilities, ensuring the repeatability and consistency of assessment results across assessors, providing precise intervention guidance, and adapting to different cultural backgrounds.

Figure CN122201822A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the integration of special education assessment technology and artificial intelligence. Specifically, it relates to a multimodal quantitative assessment method and system for the mental time travel (MTT) ability of children with autism spectrum disorder (ASD), and particularly to a standardized assessment technology linked with physical time-series training teaching aids.
Background Technology
[0002] Mental time travel ability is a core indicator of higher-order cognitive development in children with autism. It manifests chiefly as the ability to organize autobiographical memories in temporal order and the ability to imagine scenarios set at other points in time.
[0003] In existing technologies, this ability is assessed mainly through manual interviews and subjective scales, which has three major drawbacks. First, the assessment scenarios are not standardized: different assessors use different prompts and stimulus materials, so assessment results are not repeatable. Second, only the language dimension is assessed, ignoring how autistic children's non-verbal behaviors, such as gestures and eye contact, coordinate with their language; this coordination is an important external manifestation of psychological time travel ability. Third, assessment is disconnected from intervention: the results cannot directly guide specific intervention practices.
[0004] Meanwhile, existing autism assessment technologies are mostly single-modal (e.g., speech recognition alone or facial expression analysis alone), and no assessment scheme combines "linguistic temporal logic + non-verbal multimodal behavior + standardized physical scenarios". The applicant's previous academic research has confirmed that autistic children show significant temporal sequencing disorders and a lack of gesture-verbal synchronization in their autobiographical memory narratives. Existing technologies can neither capture these subtle features accurately nor combine physical training scenarios with digital assessment algorithms.
Summary of the Invention
[0005] The purpose of this invention is to provide a method and system for assessing the psychological time travel ability of children with autism based on multimodal fusion. By linking with a magnetic time sequence storyboard, a standardized assessment scenario is constructed to achieve an objective, quantitative, and multidimensional assessment of psychological time travel ability. At the same time, a closed loop of assessment and intervention is established to overcome the shortcomings of existing technologies.
[0006] This invention adopts a technical approach of "standardized scenarios using physical teaching aids + feature quantification using multimodal algorithms + model-based evaluation output". The specific technical solution is as follows:
Standardized assessment scenario construction: A magnetic time-series storyboard (whose physical structure is protected by a utility model patent) provides every participating child with the same autobiographical memory recall scenario. The storyboard is divided into three functional areas: "Past - Present - Future". The child first arranges event cards from personal experiences on the storyboard in chronological order and then narrates the events based on that arrangement. This process turns the abstract ability of psychological time travel into a standardized behavior of "physical sequencing + verbal narration", providing a unified benchmark for multimodal data collection.
Multimodal data acquisition: High-definition cameras capture the child's facial expressions and hand gestures while the child operates the storyboard, and directional microphones capture the child's narration, so that the video streams (face and body) and the audio stream are acquired simultaneously.
Language feature extraction: The audio stream is transcribed into text by automatic speech recognition (ASR). The frequency of time conjunctions (such as "firstly" and "then") is counted against a preset time marker dictionary, and verb-tense accuracy is judged through grammatical analysis. The dynamic time warping (DTW) algorithm compares the event sequence narrated by the child with the standard sequence arranged on the storyboard, and the resulting edit distance is taken as the narrative timeline offset, which quantifies the degree of temporal-logic confusion.
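The language features above can be illustrated with a short sketch. It assumes an English transcript and a small hypothetical time marker dictionary; the real entries are not given, and claim 3 only names the three categories. The text describes the sequence comparison as DTW-based but defines the offset as an edit distance, so the sketch computes a plain Levenshtein edit distance over event labels rather than the patent's exact algorithm. The helper names and the `offset_score` mapping (100 points minus 20 per edit) are hypothetical; the mapping is chosen only because it reproduces Example 1's edit distance of 2 giving 60 points.

```python
import re

# Hypothetical English time-marker dictionary, grouped into the three
# categories named in claim 3; the real entries are not published.
TIME_MARKERS = {
    "time_conjunctions": {"first", "firstly", "then", "next", "finally"},
    "time_adverbs": {"yesterday", "today", "tomorrow", "later", "before", "after"},
    "causal_conjunctions": {"because", "so"},
}
MARKER_WORDS = set().union(*TIME_MARKERS.values())


def time_marker_rate(transcript: str, duration_min: float) -> float:
    """Dictionary hits in an ASR transcript, per minute of recording."""
    words = re.findall(r"[a-z']+", transcript.lower())
    hits = sum(1 for w in words if w in MARKER_WORDS)
    return hits / duration_min


def edit_distance(narrated: list[str], standard: list[str]) -> int:
    """Levenshtein distance between two event sequences (unit cost per edit)."""
    n, m = len(narrated), len(standard)
    # dp[i][j] = edits needed to turn narrated[:i] into standard[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = narrated[i - 1] == standard[j - 1]
            dp[i][j] = min(
                dp[i - 1][j] + 1,                       # drop an extra narrated event
                dp[i][j - 1] + 1,                       # add a missing standard event
                dp[i - 1][j - 1] + (0 if same else 1),  # match or substitute
            )
    return dp[n][m]


def offset_score(distance: int, points_per_edit: int = 20) -> int:
    """Hypothetical mapping: 100 points minus a fixed penalty per edit, floored at 0."""
    return max(0, 100 - points_per_edit * distance)
```

Because every edit costs 1, a narration that swaps two adjacent events has a distance of 2, and each omitted event adds one more edit.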
Non-verbal feature extraction: MediaPipe is used to extract the child's hand skeleton keypoints and locate the moment of each gesture on a storyboard card; this moment is compared with the onset time of time-related words in the speech to calculate the gesture-speech synchronization difference. An eye keypoint detection algorithm generates a gaze heat map, from which the time the child spends looking at the assessor and at the storyboard is computed. An emotion recognition algorithm analyzes the consistency between facial emotion valence and narrative content (for example, whether happy events are accompanied by positive expressions).
Multimodal feature fusion assessment: The language feature vector and the non-language feature vector are input into a pre-trained random forest assessment model, trained on multimodal data from 80 subjects (40 children with ASD and 40 typically developing (TD) children, aged 4-6 years). The model outputs standardized scores for the language dimension (narrative coherence), the time dimension (temporal logic), and the non-language dimension (multimodal coordination).
Evaluation results output: A comprehensive evaluation index is calculated with preset weights (40% time dimension, 35% language dimension, 25% non-language dimension) and classified as low, medium, or high risk according to the norm database. Based on the weaknesses in each dimension, personalized storyboard-based intervention suggestions are generated (e.g., if the time dimension is weak, more storyboard time-sequencing training is recommended).
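A minimal sketch of two of the non-verbal features, assuming that time-word onsets (from ASR timestamps) and gesture moments (from hand keypoints) are already available in milliseconds, and that each video frame has already been labeled with a gaze target. Pairing each time word with its nearest gesture is an assumption; the source does not say how words and gestures are matched. The 500 ms threshold is taken from claim 4. All function names are hypothetical.

```python
from collections import Counter

# Claim 4: a time difference of at most 500 ms counts as synchronized.
SYNC_THRESHOLD_MS = 500


def sync_differences(word_onsets_ms: list[float], gesture_times_ms: list[float]) -> list[float]:
    """|t_word - t_gesture| for each time word, pairing it with the nearest gesture.

    Nearest-gesture pairing is an assumption; with no gestures at all, every
    word is treated as unsynchronized (infinite difference).
    """
    if not gesture_times_ms:
        return [float("inf")] * len(word_onsets_ms)
    return [min(abs(w - g) for g in gesture_times_ms) for w in word_onsets_ms]


def sync_rate(word_onsets_ms, gesture_times_ms, threshold_ms=SYNC_THRESHOLD_MS) -> float:
    """Share of time words whose synchronization difference is within the threshold."""
    diffs = sync_differences(word_onsets_ms, gesture_times_ms)
    if not diffs:
        return 0.0
    return sum(d <= threshold_ms for d in diffs) / len(diffs)


def gaze_shares(frame_targets: list[str]) -> dict[str, float]:
    """Fraction of video frames spent on each gaze target, e.g. "storyboard" or "assessor"."""
    counts = Counter(frame_targets)
    total = len(frame_targets)
    return {target: n / total for target, n in counts.items()}
```

With illustrative onsets of 1000 ms and 5000 ms and gestures at 1200 ms and 7000 ms, the differences are 200 ms and 2000 ms, so one of two words is synchronized and the rate is 50%.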
[0007] The beneficial effects of this invention include:
Scenario standardization: Linking with a physical magnetic storyboard turns subjective manual interview scenarios into standardized physical operation scenarios, ensuring repeatable assessment results and consistency across assessors.
Precise assessment: "Narrative timeline offset" and "gesture-verbal synchronization rate" are used for the first time as core assessment features, achieving multidimensional quantification of psychological time travel ability and overcoming the limitations of language-only assessment.
Intervention integration: Assessment results point directly to storyboard-based intervention targets, closing the "assessment-intervention" loop and providing precise guidance for special education practice.
Technical scalability: The system is compatible with time marker dictionaries from different cultural backgrounds, facilitating cross-cultural assessment (such as for bilingual children in Hawaii) and offering value for international cooperation and promotion.
Detailed Implementation
[0008] The present invention will be further described in detail below with reference to specific embodiments.
[0009] Example 1: A method for assessing the psychological time travel ability of children with autism based on multimodal fusion. This example targets children with autism aged 4-6 years and uses a magnetic time-series storyboard to perform the following assessment steps:
1. Scene preparation: Prepare a magnetic time-series storyboard (including a base, 10 personal-experience event cards, and gesture prompt magnets). Mark the three sections of the storyboard as "Past" (red blocks), "Present" (yellow markers), and "Future" (green blocks). Set the autobiographical memory task to "tell about your weekend last week" and define the standard event sequence as "wake up - eat breakfast - go to the park - go home - go to sleep".
2. Data acquisition: Start the high-definition camera (1080p, 30 frames/second) and the directional microphone, have the child arrange event cards from his or her weekend on the storyboard, and then have the child narrate the events based on the arrangement. Collect 5 minutes of video and audio.
3. Language feature extraction: Speech-to-text transcription: iFlytek ASR converts the audio into text, yielding the narrative "Go to the park... then get up... eat breakfast". Time conjunction frequency: one instance of "then" was found, a frequency of 0.2 instances per minute over the 5-minute recording. Narrative timeline offset: DTW comparison of the child's narrative sequence with the standard sequence gave an edit distance of 2 and an offset score of 60 points (out of 100; the smaller the edit distance, the higher the score).
4. Non-linguistic feature extraction: Gesture-speech synchronization rate: when the child said "then", the gesture on the storyboard card and the spoken word were 200 ms apart, which counts as synchronized; the overall synchronization rate was 50%. Gaze duration: the child spent 70% of the time looking at the storyboard and 10% looking at the assessor. Emotional consistency: when describing "going to the park", the child's facial expression was neutral, inconsistent with the positive narrative content, giving a consistency score of 50 points.
5. Model evaluation: The above features were input into the pre-trained random forest model, which output 70 points for the language dimension, 60 points for the time dimension, and 55 points for the non-language dimension. The comprehensive evaluation index was 62 points, judged as medium risk.
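The weighted composite in step 5 can be checked with a short sketch: 0.40 × 60 + 0.35 × 70 + 0.25 × 55 = 62.25, which rounds to the reported 62 points. The weights are the ones stated in the description; the risk-band cut-offs are hypothetical placeholders, since the norm database that defines them is not disclosed, and the function names are illustrative.

```python
# Weights stated in the description.
WEIGHTS = {"time": 0.40, "language": 0.35, "non_language": 0.25}

# Hypothetical risk cut-offs (upper bound, label), checked in order; the real
# bands come from an undisclosed norm database.
RISK_BANDS = [(50.0, "high"), (75.0, "medium")]


def composite_index(scores: dict[str, float]) -> float:
    """Weighted sum of the three 0-100 dimension scores."""
    return sum(weight * scores[dim] for dim, weight in WEIGHTS.items())


def risk_level(index: float) -> str:
    """Lower index means higher risk; anything at or above the last cut-off is low."""
    for upper, label in RISK_BANDS:
        if index < upper:
            return label
    return "low"


scores = {"language": 70, "time": 60, "non_language": 55}
index = composite_index(scores)
print(round(index), risk_level(index))  # 62 medium
```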
[0010] 6. Results output: Generate a radar chart of the three capability dimensions, identify the time dimension and the non-verbal dimension as the main intervention targets, and give the following suggestion: "Use the magnetic storyboard for 15 minutes daily for time sequencing + gesture-assisted narration training."
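One simple rule that reproduces the targets chosen in step 6 is to take the two lowest-scoring dimensions. The source does not state its selection rule, so `pick_targets`, the default `k = 2`, and the suggestion texts below are hypothetical; only the time-sequencing and gesture-assisted narration wording is adapted from the example.

```python
# Placeholder suggestion texts keyed by dimension; only the "time" and
# "non_language" wording is adapted from Example 1.
SUGGESTIONS = {
    "time": "time sequencing training",
    "non_language": "gesture-assisted narration training",
    "language": "narration practice based on the storyboard arrangement",
}


def pick_targets(scores: dict[str, float], k: int = 2) -> list[str]:
    """Return the k lowest-scoring dimensions, weakest first (hypothetical rule)."""
    return sorted(scores, key=scores.get)[:k]


def recommendation(scores: dict[str, float], minutes_per_day: int = 15) -> str:
    """Join the suggestions for the selected targets into one sentence."""
    plan = " + ".join(SUGGESTIONS[t] for t in pick_targets(scores))
    return f"Use the magnetic storyboard for {minutes_per_day} minutes daily for {plan}."
```

A threshold rule (for example, every dimension below some norm-based cut-off) would be an equally plausible alternative; the top-k rule is used here only because it needs no undisclosed norm values.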
[0011] Example 2: A multimodal fusion-based assessment system for the psychological time travel ability of children with autism. This embodiment provides a system for implementing the above method, comprising hardware and software components.
Hardware components: the data acquisition unit uses a high-definition wide-angle camera (covering the storyboard and the child's face) and a directional lavalier microphone; the processor is an Intel Core i7, which supports real-time data processing; the display unit is a 27-inch touchscreen that displays the evaluation results.
Software components:
Feature extraction unit: a built-in Python-based NLP module (with jieba word segmentation and a time word dictionary) and a CV module (with the MediaPipe and DTW algorithms);
Model storage unit: stores a pre-trained random forest model (89% accuracy) and a Chinese-English bilingual time marker dictionary (for cross-cultural assessment);
Linkage and adaptation unit: automatically obtains the standard event sequence by reading the card identification codes of the magnetic storyboard via Bluetooth;
Results output unit: draws the three-dimension radar chart with Matplotlib and generates an exportable PDF assessment report that includes an intervention recommendation module.
Attached Figure Description
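The linkage and adaptation unit can be sketched as follows, under stated assumptions: the Bluetooth transport is out of scope, the board is assumed to report (slot, card code) pairs, and each card code is assumed to map to an event label and its rank in the standard sequence through a catalogue. The catalogue, the codes, `CardReading`, and both functions are hypothetical. The same readings then yield both the standard sequence and the child's arrangement for the narrative timeline offset comparison.

```python
from dataclasses import dataclass

# Hypothetical catalogue: card code -> (rank in the standard sequence, event label).
CARD_CATALOGUE = {
    "C01": (1, "wake up"),
    "C02": (2, "eat breakfast"),
    "C03": (3, "go to the park"),
    "C04": (4, "go home"),
    "C05": (5, "go to sleep"),
}


@dataclass(frozen=True)
class CardReading:
    slot: int     # position on the storyboard rail, 0 = leftmost
    card_id: str  # identification code reported by the board


def standard_sequence(readings: list[CardReading]) -> list[str]:
    """Event labels of the cards on the board, in catalogue (standard) order."""
    ranked = sorted(readings, key=lambda r: CARD_CATALOGUE[r.card_id][0])
    return [CARD_CATALOGUE[r.card_id][1] for r in ranked]


def arranged_sequence(readings: list[CardReading]) -> list[str]:
    """Event labels in the order the child placed the cards along the rail."""
    placed = sorted(readings, key=lambda r: r.slot)
    return [CARD_CATALOGUE[r.card_id][1] for r in placed]
```

Keeping the catalogue separate from the readings means a different task or a bilingual label set only requires swapping the catalogue, not the reading logic.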
[0012] Figure 1 (Summary Appendix): illustrates the complete closed-loop process from collecting multimodal data through a magnetic storyboard to feature extraction, model evaluation, and finally generating three-dimensional scores and personalized intervention recommendations.
[0013] Figure 2 (Method Flowchart): Rectangles are arranged sequentially, labeled with steps S1 to S5, and arrows connect to indicate the process order; S1 is labeled "Linking the magnetic storyboard to collect multimodal data", S2 is labeled "Extracting language features (time words / offsets)", S3 is labeled "Extracting non-language features (gestures-speech / eye contact)", S4 is labeled "Inputting the evaluation model and outputting dimensional scores", and S5 is labeled "Generating a comprehensive evaluation report + intervention recommendations".
[0014] Figure 3 (System Architecture Diagram): The center is the "Evaluation Core Module", surrounded by data acquisition units, feature extraction units, model storage units, linkage adaptation units, and result output units; arrows indicate the data flow direction, such as "Data Acquisition Unit → Feature Extraction Unit → Evaluation Core Module → Result Output Unit", "Linkage Adaptation Unit → Evaluation Core Module" (input storyboard standard sequence).
[0015] Figure 4 (Feature Extraction Diagram): The left side is "Video Stream Input", which is divided into two branches: "Facial Expressions" and "Hand Gestures (Storyboard Operation)", with the extracted features labeled; the right side is "Audio Stream Input", which is divided into two branches: "Time Connectors" and "Narrative Sequence", with the extracted features labeled; the bottom is summarized as "Multimodal Feature Vectors", pointing to "Evaluation Model".
[0016] Figure 5 (Radar chart example): The three axes are labeled "language dimension", "time dimension", and "non-language dimension", each scaled 0-100. The ASD child's score curve (60, 70, 55) and the TD children's norm curve (85, 90, 88) are plotted together to show the differences.
Claims
1. A method for assessing the psychological time travel ability of autistic children based on multimodal fusion, characterized in that it comprises the following steps:
S1: establishing a standardized assessment scenario, in which a magnetic time-series storyboard serves as the core stimulus material for presenting a preset autobiographical memory task to the child, while an acquisition device simultaneously records video and audio streams of the child's responses;
S2: preprocessing the audio stream, performing speech recognition and transcription, and extracting a language feature vector based on a preset time marker dictionary, the language feature vector including time-connective frequency, tense accuracy, and narrative timeline offset, wherein the narrative timeline offset is the edit distance between the event sequence narrated by the child and the standard time sequence on the storyboard, obtained by comparison with a dynamic time warping algorithm;
S3: performing frame-level keypoint detection on the video stream and extracting a non-verbal feature vector, the non-verbal feature vector including the gaze heat map distribution, the time synchronization difference between hand gestures and spoken keywords, and a consistency score between facial emotion valence and narrative content, wherein the time synchronization difference is the absolute value of the time difference between the moment the child speaks a time-concept word and the moment the child gestures on a storyboard card;
S4: inputting the language feature vector and the non-verbal feature vector into a pre-trained psychological time travel assessment model, trained on comparative data from typically developing children and autistic children, which outputs standardized scores for the language dimension, the time dimension, and the non-language dimension;
S5: calculating a comprehensive evaluation index based on preset weights and, in combination with a norm database, generating an evaluation report including a 3D radar chart, a risk rating, and storyboard-based intervention targets.
2. The method according to claim 1, characterized in that the magnetic time-series storyboard in step S1 includes a base plate divided into "past-present-future" functional areas, a linear magnetic guide rail, event cards with magnetic bumps, and a multimodal prompting component, and the autobiographical memory task has the child arrange and narrate personal experience events using the storyboard.
3. The method according to claim 1, characterized in that the time marker dictionary in step S2 contains three categories of entries: first-level time conjunctions, second-level time adverbs, and causal conjunctions; and the tense accuracy is calculated from the degree of match between the verb tenses in the transcribed text and the standard event time sequence.
4. The method according to claim 1, characterized in that the criterion for gesture-speech synchronization in step S3 is that a time synchronization difference of ≤500 ms is determined to be synchronized, and the resulting synchronization rate is included in the calculation of the non-verbal dimension score.
5. A multimodal fusion-based assessment system for the psychological time travel ability of autistic children, used to implement the method of any one of claims 1-4, characterized in that it comprises:
a data acquisition unit, comprising a high-definition camera and a directional microphone, for collecting multimodal data from the child while the child manipulates the magnetic time-series storyboard;
a feature extraction unit with a built-in natural language processing engine and a computer vision algorithm module, which extract linguistic features and non-linguistic features respectively;
a model storage unit storing a pre-trained random forest evaluation model and a standardized time marker dictionary;
a linkage adaptation unit for reading standard event sequence data from the magnetic time-series storyboard as the benchmark for comparing narrative timeline offsets; and
a results output unit for visualizing the 3D capability radar chart and the evaluation index, and generating a report containing storyboard intervention suggestions.