A deep learning-based intelligent assessment method for children's visual health
By using an improved Depth Anything V2 model and reversible visual stimulus modulation, the method addresses the reliance on a single stimulus and the insufficient stability of existing children's visual health assessment. It enables a refined, dynamic assessment of children's visual development status and improves the reliability and early-identification capability of the assessment.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- QINGDAO JINGMEI VISION EYE HOSPITAL CO LTD
- Filing Date
- 2026-03-09
- Publication Date
- 2026-06-12
Figure CN122201764A_ABST (abstract drawing)
Abstract
Description
Technical Field
[0001] This invention relates to the field of intelligent visual health assessment technology, and in particular to an intelligent assessment method for children's visual health based on deep learning.
Background Technology
[0002] With the increasing time children spend using their eyes at close range and the widespread adoption of digital learning, children's visual health problems are showing a trend of occurring at younger ages and becoming more insidious. Objective and continuous intelligent assessment of children's visual development has become a research hotspot. Current technologies for assessing children's visual health mainly rely on visual acuity charts, refractive errors, or eye behavior analysis based on single visual stimuli. Some methods introduce video images or deep learning models to assist in the analysis of fixation behavior and eye movement characteristics, but these still have many shortcomings in practical applications. On the one hand, existing technologies typically analyze visual stimuli based on single stimulus conditions or static stimulus processes, lacking systematic modeling of changes in visual stimulus parameters. In particular, they fail to consider the reversibility and consistency of visual response behavior under both positive and negative changes in stimulus parameters, making it difficult to distinguish between temporary behavioral fluctuations and potential developmental abnormalities at the level of visual development mechanisms, resulting in insufficient stability of assessment results. On the other hand, existing image-based visual assessment methods largely rely on two-dimensional features, which have limited ability to characterize spatial changes in eye structures. When children experience head displacement or posture changes, key features such as pupillary changes, gaze deviation, and micro-eye movements are easily interfered with, affecting the reliability of the analysis results.
[0003] Furthermore, while some existing technologies incorporate deep learning models for feature extraction, they lack stable constraint mechanisms for the ocular region and continuous processing across time frames. The model output is prone to transient fluctuations when the stimulus sequence changes, making it difficult to support consistency analysis under multiple stimulus conditions. Therefore, existing technologies in children's visual health assessment generally suffer from problems such as simplistic stimulus modeling, insufficient spatial and temporal stability, and a lack of developmental mechanism determination methods based on positive and negative stimulus consistency analysis, making it difficult to meet the needs of refined and long-term visual health assessment.
[0004] Therefore, how to provide a deep learning-based intelligent assessment method for children's visual health is a problem that urgently needs to be solved by those skilled in the art.
Summary of the Invention
[0005] One objective of this invention is to propose a deep learning-based intelligent assessment method for children's visual health. The invention uses a reversible visual stimulus modulation method oriented towards children's visual development, introduces an improved Depth Anything V2 model to stably extract depth information of the eye region, and combines time alignment and consistency calculation of non-obvious visual response features to achieve intelligent assessment of the stability of children's visual development mechanism and the evolution of visual health risks. Its advantages are that it does not require children's explicit cooperation, the assessment process is objective and stable, it is sensitive to early visual developmental abnormalities, it is applicable to young children, and its assessment results are highly traceable.
[0006] A deep learning-based intelligent assessment method for children's visual health according to an embodiment of the present invention includes the following steps:
Step 1: Collect facial and eye image sequences of children under visual stimulus presentation conditions, and record the corresponding time information to form raw visual data.
Step 2: Based on the raw visual data, construct a reversible visual stimulus modulation scheme oriented towards children's visual development, so that the visual stimulus forms positive (forward) and negative (reverse) stimulus sequences in terms of spatial frequency, contrast, rhythm, or binocular input conditions.
Step 3: Input the facial and eye image sequences into the improved Depth Anything V2 model for monocular depth estimation, introduce an eye region perception stabilization mechanism, apply cross-frame temporal continuity constraints to the depth results of the eye region, and perform instantaneous depth mutation suppression processing in the positive and negative stimulus sequences to obtain eye depth information.
Step 4: Perform geometric compensation for the child's head displacement based on the eye depth information, and perform scale unification processing on pupil diameter changes, fixation point displacement, micro-eye movement amplitude, and blinking behavior to construct a set of non-obvious visual response features.
Step 5: For both the positive and negative stimulus sequences, perform time alignment on the formation process of the non-obvious visual response feature set, and calculate the consistency of the temporal sequence, duration, and stability of key visual response processes.
Step 6: Determine the stability of the visual development mechanism based on the consistency calculation results. When the temporal sequence of key visual response processes under the negative stimulus sequence cannot be reconstructed or the stability is reduced, it is determined that there is an abnormality in the development mechanism of the corresponding visual function, and a visual health risk evolution assessment result is generated.
Step 7: Based on the visual health risk evolution assessment result, classify the child's visual development status and generate the corresponding visual health assessment conclusion by combining the consistency changes under different stimulus conditions.
[0007] Optionally, step one specifically includes: The face and eye color images are captured frame by frame according to the preset acquisition frequency, and corresponding timestamp information is assigned to each captured image. The face and eye color images are sorted according to the timestamp information to form a face and eye image sequence. Data integrity processing is performed on the facial and eye image sequence. The data integrity processing includes: calculating the brightness distribution, contrast distribution, and effective pixel area ratio of each image frame; when the effective pixel area ratio of an image frame is lower than a preset threshold, or the brightness distribution and contrast distribution exceed a preset range, the corresponding image frame is removed from the facial and eye image sequence to obtain the original visual data.
[0008] Optionally, step two specifically includes: Within the acquisition time range corresponding to the original visual data, the start and end times of the visual stimulus presentation are set, and the acquisition time range is divided into multiple consecutive stimulus periods. Visual stimulation parameters are set for each stimulation period, including spatial frequency parameters, contrast parameters, rhythm parameters, and binocular input parameters. The positive stimulus sequence is a sequence of visual stimulus parameters that changes gradually between adjacent stimulus periods according to a preset parameter change order, forming a stimulus sequence in which the visual stimulus parameters change monotonically. The reverse stimulation sequence is based on the changing order of visual stimulation parameters in the positive stimulation sequence, so that the visual stimulation parameters change gradually between adjacent stimulation periods in the opposite order to the positive stimulation sequence, forming a stimulation sequence that corresponds to the positive stimulation sequence in terms of the number of stimulation periods and parameter values. The stimulation time periods in the positive and negative stimulation sequences are correlated with the facial and eye image sequences based on timestamp information.
[0009] Optionally, the improved Depth Anything V2 model includes a depth estimation processing module, an eye region constraint processing module, a temporal continuity constraint processing module, and a depth stabilization processing module. The depth estimation processing module performs feature extraction and ReLU nonlinear mapping processing on the face and eye image sequences, and performs depth regression processing to generate pixel-level depth results corresponding to each image frame. The pixel-level depth results constitute the initial depth results. The eye region constraint processing module takes the initial depth result as input and introduces an eye region perception stabilization mechanism. In the depth result processing, the eye region is explicitly constrained. An eye region identifier matrix with the same size as the image frame is generated, and the eye region identifier matrix is fused with the initial depth result by element-wise multiplication to obtain the eye region depth result.
[0010] The temporal continuity constraint processing module applies cross-frame temporal continuity constraints to the eye region depth results. The temporal continuity constraint processing module calculates cross-frame difference values for the eye region depth results corresponding to adjacent timestamps, applies an exponential decay function to the cross-frame difference values to generate temporal continuity weights, and applies the temporal continuity weights to the eye region depth results corresponding to the current timestamp in a weighted manner to obtain the temporally constrained eye region depth results. The depth stabilization processing module takes the temporally constrained eye region depth result as input and performs instantaneous depth mutation suppression processing in the forward and reverse stimulus sequences. The instantaneous depth mutation suppression processing refers to the processing method that suppresses the cross-frame change amplitude in the temporally constrained eye region depth result that exceeds a preset range. The cross-frame change amplitude is mapped to a suppression weight between zero and one by the Sigmoid function, and the suppression weight is applied to the corresponding temporally constrained eye region depth result in a weighted manner to generate stable eye depth information.
[0011] Optionally, step four specifically includes: By comparing the eye depth information corresponding to adjacent timestamps, the relative change of the eye region in spatial position is calculated, and the position of the eye depth information is corrected based on the relative change. The ocular response quantities related to ocular behavior in the ocular depth information are subjected to scale uniform processing. The ocular response quantities include pupil diameter change, fixation point displacement, micro-eye movement amplitude, and blink behavior parameters. After the scale-unified processing is completed, the various eye response quantities are combined in the order of timestamps to construct a set of non-obvious visual response features.
[0012] Optionally, step five specifically includes: The non-obvious visual response feature subsequence corresponding to the positive stimulus sequence and the non-obvious visual response feature subsequence corresponding to the negative stimulus sequence are time-aligned to form time-aligned positive and negative response sequences. Key visual response processes are extracted from the time-aligned positive and negative response sequences. A key visual response process is a continuous time segment in which the eye response features change significantly during the stimulus change process; a change is significant when the feature change amplitude between adjacent timestamps exceeds a preset threshold. For each key visual response process, the temporal sequence parameter, duration parameter, and stability parameter are calculated for both the positive and negative response sequences. Based on these three parameters, consistency calculations are performed on the key visual response processes of the positive and negative response sequences to obtain the consistency calculation result.
[0013] Optionally, step six specifically includes: The consistency calculation result is compared with the preset stability judgment interval. The stability judgment interval is three consecutive intervals divided according to the numerical range of the consistency calculation result. When the consistency calculation result falls into the first stability judgment interval, the visual development mechanism is determined to be in a stable state. When the consistency calculation result falls into the second stability judgment interval, the visual development mechanism is determined to be in a suspicious stable state. When the consistency calculation result falls into the third stability judgment interval, the visual development mechanism is determined to be in an unstable state. When the visual development mechanism is determined to be in a suspicious stable or unstable state, the trend of the consistency calculation result within a continuous time window is analyzed. The trend is the direction and magnitude of the change of the consistency calculation result within multiple adjacent time windows. When the trend shows a continuous decline or the fluctuation exceeds a preset trend threshold, it is determined that the visual development mechanism has abnormal evolutionary characteristics. Based on the stability assessment results of visual development mechanism and abnormal evolution characteristics, visual health risk evolution assessment results are generated.
[0014] Optionally, step seven specifically includes: The visual health risk evolution assessment results are integrated over a time period within a preset assessment period. The time integration process involves summarizing the visual health risk evolution assessment results obtained in multiple consecutive assessment periods over a time series to obtain the risk status result after period integration. Based on the risk status results after periodic integration, the visual health assessment conclusions are graded and labeled. The visual health assessment conclusions after hierarchical labeling are associated with the corresponding assessment timestamps and stored to form a visual health assessment record.
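By way of non-limiting illustration, the period integration and graded labeling of step seven can be sketched as follows. The specification does not define a numeric risk value, an integration rule, or grade labels, so the `risk` score in [0, 1], the use of the mean as the summary, the grade boundaries, and the label strings are all hypothetical assumptions made only for this sketch.

```python
from statistics import mean

# Hypothetical grade labels; the specification does not name them.
GRADES = ("low risk", "moderate risk", "high risk")

def integrate_period(results, period_start, period_end):
    """Summarize (timestamp, risk) results that fall inside one assessment period.

    The mean is an assumed summary rule; returns None if the period is empty.
    """
    in_period = [risk for ts, risk in results if period_start <= ts < period_end]
    return mean(in_period) if in_period else None

def grade(risk, bounds=(0.33, 0.66)):
    """Map an integrated risk value to a grade using assumed boundaries."""
    if risk < bounds[0]:
        return GRADES[0]
    if risk < bounds[1]:
        return GRADES[1]
    return GRADES[2]

def build_record(results, period_start, period_end):
    """Create one visual health assessment record tied to its assessment timestamp."""
    risk = integrate_period(results, period_start, period_end)
    if risk is None:
        return None
    return {"assessed_at": period_end, "risk": risk, "grade": grade(risk)}

results = [(1, 0.2), (2, 0.4), (3, 0.9), (12, 0.1)]
record = build_record(results, 0, 10)
```

Storing the grade together with `assessed_at` mirrors the requirement that each labeled conclusion be associated with its assessment timestamp.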
[0015] The beneficial effects of this invention are: This invention introduces an improved Depth Anything V2 model for monocular depth estimation during the visual information perception stage. Compared to traditional two-dimensional image feature analysis methods, it can directly acquire spatial depth information of the eye region, thereby more accurately depicting the spatial changes of the pupil, eyelids, and gaze behavior. The Depth Anything V2 model has good cross-scene generalization ability, is suitable for complex lighting and posture change conditions, and can achieve stable depth perception without relying on additional hardware, providing more reliable basic data for children's visual behavior analysis.
[0016] This invention introduces an eye region perception stabilization mechanism, cross-frame temporal continuity constraints, and instantaneous depth mutation suppression processing during model inference to specifically improve the model output. On the one hand, by explicitly limiting the eye region, the depth estimation results are focused on the pupil, iris, and eyelid regions closely related to visual function, reducing the impact of noise from non-critical regions on the analysis results. On the other hand, through cross-frame temporal continuity constraints and instantaneous depth mutation suppression processing, cross-frame jumps caused by changes in stimulus parameters, minor head displacements, or model uncertainties are effectively suppressed, improving the stability and continuity of eye depth information in the temporal dimension and providing smoother and more reliable data support for subsequent visual response analysis.
[0017] This invention introduces a reversible visual stimulus modulation mechanism by constructing positive and negative stimulus sequences. It analyzes children's visual response behavior from the perspective of the consistency between positive and negative stimuli in the visual stimulus change process, overcoming the limitations of existing technologies that rely solely on a single stimulus condition. By calculating the consistency of key visual response processes across multiple structural features such as temporal sequence, duration, and stability, it can identify whether visual response behavior possesses reversible consistency characteristics at the mechanistic level. This allows for a more effective distinction between temporary behavioral fluctuations and potential visual developmental abnormalities, improving the sensitivity and reliability of visual health assessment.
[0018] This invention introduces methods for determining the stability of visual development mechanisms and analyzing risk evolution during the assessment phase. It combines consistency calculation results with trends over time to achieve graded assessment and evolutionary evaluation of children's visual health risks. By further combining consistency changes under different stimulus conditions to generate visual health assessment conclusions, it not only reflects the current state of visual development but also depicts trends in visual function changes, providing more valuable assessment results for long-term monitoring and early intervention of children's visual health.
[0019] In summary, this invention, through the organic combination of the improved Depth Anything V2 model and the stimulus reversibility analysis mechanism, outperforms existing technologies in terms of spatial perception accuracy, temporal stability, mechanism-level judgment ability, and long-term evolution assessment. It can achieve a more objective, stable, and refined intelligent assessment of children's visual health status, and has significant technological advancement and application value.
Attached Figure Description
[0020] The accompanying drawings are provided to further illustrate the invention and form part of the specification. They are used in conjunction with embodiments of the invention to explain the invention and do not constitute a limitation thereof. In the drawings: Figure 1 is a flowchart of the deep learning-based intelligent assessment method for children's visual health proposed in this invention; Figure 2 is a schematic diagram of the deep learning-based intelligent assessment method for children's visual health proposed in this invention; Figure 3 is a framework diagram of the improved Depth Anything V2 model in the deep learning-based intelligent assessment method for children's visual health proposed in this invention.
Detailed Implementation
[0021] The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic diagrams, illustrating only the basic structure of the invention, and therefore only show the components relevant to the invention.
[0022] Referring to Figures 1-3, a deep learning-based intelligent assessment method for children's visual health includes the following steps:
Step 1: Collect facial and eye image sequences of children under visual stimulus presentation conditions, and record the corresponding time information to form raw visual data.
Step 2: Based on the raw visual data, construct a reversible visual stimulus modulation scheme oriented towards children's visual development, so that the visual stimulus forms positive (forward) and negative (reverse) stimulus sequences in terms of spatial frequency, contrast, rhythm, or binocular input conditions.
Step 3: Input the facial and eye image sequences into the improved Depth Anything V2 model for monocular depth estimation, introduce an eye region perception stabilization mechanism, apply cross-frame temporal continuity constraints to the depth results of the eye region, and perform instantaneous depth mutation suppression processing in the positive and negative stimulus sequences to obtain eye depth information.
Step 4: Perform geometric compensation for the child's head displacement based on the eye depth information, and perform scale unification processing on pupil diameter changes, fixation point displacement, micro-eye movement amplitude, and blinking behavior to construct a set of non-obvious visual response features.
Step 5: For both the positive and negative stimulus sequences, perform time alignment on the formation process of the non-obvious visual response feature set, and calculate the consistency of the temporal sequence, duration, and stability of key visual response processes.
Step 6: Determine the stability of the visual development mechanism based on the consistency calculation results. When the temporal sequence of key visual response processes under the negative stimulus sequence cannot be reconstructed or the stability is reduced, it is determined that there is an abnormality in the development mechanism of the corresponding visual function, and a visual health risk evolution assessment result is generated.
Step 7: Based on the visual health risk evolution assessment result, classify the child's visual development status and generate the corresponding visual health assessment conclusion by combining the consistency changes under different stimulus conditions.
[0023] In this embodiment, step one specifically includes: The face and eye color images are captured frame by frame according to the preset acquisition frequency, and corresponding timestamp information is assigned to each captured image. The face and eye color images are sorted according to the timestamp information to form a face and eye image sequence. Data integrity processing is performed on the facial and eye image sequence. The data integrity processing includes: calculating the brightness distribution, contrast distribution, and effective pixel area ratio of each image frame; when the effective pixel area ratio of an image frame is lower than a preset threshold, or the brightness distribution and contrast distribution exceed a preset range, the corresponding image frame is removed from the facial and eye image sequence to obtain the original visual data.
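By way of non-limiting illustration, the data integrity processing of step one can be sketched as follows. The sketch assumes grayscale frames stored as nested lists of 0-255 values, uses the mean as the brightness statistic and the population standard deviation as the contrast statistic, and treats a pixel as "effective" when it is neither near-black nor near-saturated. These definitions, the rule that a frame is removed if either statistic leaves its range, and all threshold values are assumptions of the sketch; the specification only refers to preset thresholds and ranges.

```python
from statistics import mean, pstdev

def frame_stats(frame, valid_lo=10, valid_hi=245):
    """Return (mean brightness, contrast as std dev, effective pixel ratio)."""
    pixels = [p for row in frame for p in row]
    effective = sum(1 for p in pixels if valid_lo <= p <= valid_hi)
    return mean(pixels), pstdev(pixels), effective / len(pixels)

def filter_frames(frames, min_ratio=0.8,
                  brightness_range=(40, 220), contrast_range=(5, 100)):
    """Sort (timestamp, frame) pairs by timestamp and drop low-quality frames."""
    kept = []
    for ts, frame in sorted(frames, key=lambda item: item[0]):
        brightness, contrast, ratio = frame_stats(frame)
        if ratio < min_ratio:
            continue  # too few effective pixels
        if not brightness_range[0] <= brightness <= brightness_range[1]:
            continue  # brightness outside the assumed preset range
        if not contrast_range[0] <= contrast <= contrast_range[1]:
            continue  # contrast outside the assumed preset range
        kept.append((ts, frame))
    return kept

good = [[100, 120], [140, 160]]
dark = [[0, 0], [0, 5]]
raw_visual_data = filter_frames([(2, good), (1, dark), (0, good)])
```

In this toy input the dark frame at timestamp 1 is removed and the remaining frames are returned in timestamp order.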
[0024] In this embodiment, step two specifically includes: Within the acquisition time range corresponding to the original visual data, the start and end times of the visual stimulus presentation are set, and the acquisition time range is divided into multiple consecutive stimulus periods. Visual stimulation parameters are set for each stimulation period. These parameters include spatial frequency parameters, contrast parameters, rhythm parameters, and binocular input parameters. The spatial frequency parameters are used to characterize the spatial distribution density of brightness variations in the visual stimulation pattern. The contrast parameters are used to characterize the relative difference between the maximum and minimum brightness values in the visual stimulation. The rhythm parameters are used to characterize the rate of change of the visual stimulation parameters over time. The binocular input parameters are used to characterize the consistency or difference in visual stimulation received by both eyes. The positive stimulus sequence is a sequence of visual stimulus parameters that changes gradually between adjacent stimulus periods according to a preset parameter change order, forming a stimulus sequence in which the visual stimulus parameters change monotonically. The reverse stimulation sequence is based on the changing order of visual stimulation parameters in the positive stimulation sequence, so that the visual stimulation parameters change gradually between adjacent stimulation periods in the opposite order to the positive stimulation sequence, forming a stimulation sequence that corresponds to the positive stimulation sequence in terms of the number of stimulation periods and parameter values. The stimulation time periods in the positive and negative stimulation sequences are correlated with the facial and eye image sequences based on timestamp information.
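By way of non-limiting illustration, the construction of the positive (forward) and negative (reverse) stimulus sequences and their association with frame timestamps can be sketched as follows. The sketch assumes equal-length stimulus periods, models only the spatial frequency and contrast parameters, and presents the reverse sequence immediately after the forward one; the class name `StimulusPeriod` and the helper names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StimulusPeriod:
    start: float              # period start time in seconds
    end: float                # period end time in seconds (exclusive)
    spatial_frequency: float
    contrast: float

def build_sequences(t_start, period_len, spatial_freqs, contrasts):
    """Return (forward, reverse) sequences with mirrored parameter order."""
    if len(spatial_freqs) != len(contrasts):
        raise ValueError("parameter lists must have the same length")
    n = len(spatial_freqs)

    def make(offset, sf, ct):
        return [StimulusPeriod(offset + i * period_len,
                               offset + (i + 1) * period_len,
                               sf[i], ct[i]) for i in range(n)]

    forward = make(t_start, spatial_freqs, contrasts)
    # Same number of periods and same parameter values, in the opposite order.
    reverse = make(t_start + n * period_len, spatial_freqs[::-1], contrasts[::-1])
    return forward, reverse

def period_for(timestamp, periods):
    """Return the index of the period containing a frame timestamp, else None."""
    for index, period in enumerate(periods):
        if period.start <= timestamp < period.end:
            return index
    return None

fwd, rev = build_sequences(0.0, 2.0, [1.0, 2.0, 4.0], [0.2, 0.4, 0.8])
```

`period_for` is the timestamp-based association: each frame of the facial and eye image sequence is mapped to the stimulus period that was on screen when it was captured.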
[0025] In this embodiment, the improved Depth Anything V2 model includes a depth estimation processing module, an eye region constraint processing module, a temporal continuity constraint processing module, and a depth stabilization processing module. The depth estimation processing module performs feature extraction and ReLU nonlinear mapping processing on the face and eye image sequences, and performs depth regression processing to generate pixel-level depth results corresponding to each image frame. The pixel-level depth results are composed of continuous values and are used to represent the depth estimation value of each pixel position. The pixel-level depth results constitute the initial depth results. The eye region constraint processing module takes the initial depth result as input and introduces an eye region perception stabilization mechanism to explicitly limit the eye region during depth result processing. The eye region is the set of pixels corresponding to the pupil region, iris region, and eyelid region in the face and eye image sequence. An eye region identification matrix with the same size as the image frame is generated. The eye region identification matrix is a two-dimensional numerical matrix. The matrix elements indicate whether the corresponding pixel position belongs to the eye region. When the matrix element is a valid identification value, it means that the corresponding pixel belongs to the eye region. When the matrix element is an invalid identification value, it means that the corresponding pixel does not belong to the eye region. The eye region identification matrix is fused with the initial depth result by element-wise multiplication to obtain the eye region depth result processed by the eye region perception stabilization mechanism.
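By way of non-limiting illustration, the element-wise fusion of the eye region identification matrix with the initial depth result can be sketched as follows. The sketch assumes that the valid identification value is 1 and the invalid identification value is 0, and that both inputs are nested lists of the same size; the function name is hypothetical.

```python
def apply_eye_mask(depth, mask):
    """Element-wise multiply a depth map by a 0/1 eye-region mask of equal size."""
    if len(depth) != len(mask) or any(len(d) != len(m) for d, m in zip(depth, mask)):
        raise ValueError("depth map and mask must have the same size")
    return [[d * m for d, m in zip(d_row, m_row)]
            for d_row, m_row in zip(depth, mask)]

initial_depth = [[0.5, 0.7],
                 [0.9, 1.1]]
eye_mask = [[0, 1],
            [1, 0]]
eye_depth = apply_eye_mask(initial_depth, eye_mask)
```

Pixels outside the eye region become zero, so later processing only sees depth values from the pupil, iris, and eyelid regions.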
[0026] The temporal continuity constraint processing module applies cross-frame temporal continuity constraints to the eye region depth results. The cross-frame temporal continuity constraint refers to a constraint method that limits the magnitude of depth change over time by associating depth results corresponding to adjacent timestamps. The temporal continuity constraint processing module calculates cross-frame difference values for the eye region depth results corresponding to adjacent timestamps. The cross-frame difference value is the depth difference at the same pixel position in adjacent frames, which represents the magnitude of change of the eye region depth results in the time dimension. An exponential decay function is applied to the cross-frame difference value to generate temporal continuity weights, and the temporal continuity weights are applied to the eye region depth results corresponding to the current timestamp in a weighted manner to obtain the temporally constrained eye region depth results after cross-frame temporal continuity constraint processing. The depth stabilization processing module takes the temporally constrained eye region depth results as input and performs instantaneous depth mutation suppression processing in the forward and reverse stimulus sequences. The instantaneous depth mutation suppression processing refers to the processing method that suppresses the cross-frame variation amplitude in the temporally constrained eye region depth results that exceeds a preset range. The cross-frame variation amplitude is mapped to a suppression weight between zero and one by the Sigmoid function, and the suppression weight is applied to the corresponding temporally constrained eye region depth results in a weighted manner to reduce the impact of instantaneous abnormal changes on the depth results and generate stable eye depth information.
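By way of non-limiting illustration, the two weighting stages described above can be sketched as follows. The specification states only that an exponential decay function of the cross-frame difference yields the temporal continuity weight and that a Sigmoid function maps the cross-frame change amplitude to a suppression weight between zero and one. How each weight is "applied in a weighted manner" is not specified, so blending the current frame with the previous stabilized frame is an assumption of this sketch, as are the parameters `alpha`, `limit`, and `steepness`.

```python
import math

def temporal_weight(diff, alpha=5.0):
    """Exponential decay of the cross-frame depth difference (1 = no change)."""
    return math.exp(-alpha * abs(diff))

def suppression_weight(amplitude, limit=0.1, steepness=50.0):
    """Sigmoid mapping: close to 1 below the limit, close to 0 above it."""
    x = min(steepness * (amplitude - limit), 700.0)  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(x))

def stabilize(frames, alpha=5.0, limit=0.1, steepness=50.0):
    """frames: list of flattened eye-region depth maps (lists of floats)."""
    if not frames:
        return []
    out = [list(frames[0])]
    for current in frames[1:]:
        previous = out[-1]
        stable = []
        for prev, cur in zip(previous, current):
            # Temporal continuity constraint (assumed blending form).
            w = temporal_weight(cur - prev, alpha)
            constrained = w * cur + (1.0 - w) * prev
            # Instantaneous depth mutation suppression (assumed blending form).
            s = suppression_weight(abs(constrained - prev), limit, steepness)
            stable.append(prev + s * (constrained - prev))
        out.append(stable)
    return out

stable_depth = stabilize([[1.0], [1.01], [3.0]])
```

In this toy sequence the small change from 1.0 to 1.01 passes almost unchanged, while the jump to 3.0 is pulled back towards the previous stable value.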
[0027] In this embodiment, step four specifically includes: By comparing the eye depth information corresponding to adjacent timestamps, the relative change of the eye region in spatial position is calculated, and the position of the eye depth information is corrected based on the relative change to eliminate the depth shift caused by head movements in the front-back, left-right, or up-down directions. The eye response quantities related to eye behavior in the eye depth information are subjected to scale unification processing. The scale unification processing includes mapping eye response quantities with different dimensions and different value ranges to a unified numerical range. The eye response quantities include pupil diameter change, fixation point displacement, micro-eye movement amplitude, and blink behavior parameters. The pupil diameter change is a numerical representation of the change in pupil area size at adjacent time stamps. The fixation point displacement is the displacement distance of the eye fixation center in the image plane. The micro-eye movement amplitude is the displacement amplitude of small eye movements in a short period of time. The blink behavior parameters are time series parameters characterizing changes in eyelid opening and closing state. After the scale-unified processing is completed, the various eye response quantities are combined in the order of timestamps to construct a set of non-obvious visual response features.
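By way of non-limiting illustration, the scale unification processing and the timestamp-ordered combination of eye response quantities can be sketched as follows. The specification requires only a mapping to a unified numerical range; min-max scaling to [0, 1] is one possible choice and is an assumption of this sketch, as are the function names and the dictionary layout of the resulting feature set. Head displacement compensation is not modeled here.

```python
def min_max_scale(values):
    """Map a series to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def build_feature_set(timestamps, responses):
    """responses: dict of quantity name -> raw values aligned with timestamps."""
    for name, series in responses.items():
        if len(series) != len(timestamps):
            raise ValueError(f"series '{name}' does not match the timestamps")
    scaled = {name: min_max_scale(series) for name, series in responses.items()}
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    return [
        {"timestamp": timestamps[i], **{name: scaled[name][i] for name in scaled}}
        for i in order
    ]

features = build_feature_set(
    [0.2, 0.1, 0.3],
    {"pupil_diameter_change": [2.0, 4.0, 3.0], "blink": [0, 0, 0]},
)
```

Scaling each quantity separately keeps a large-valued quantity, such as fixation point displacement in pixels, from dominating a small-valued one in the later consistency calculation.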
[0028] In this embodiment, step five specifically includes: The non-obvious visual response feature subsequence corresponding to the forward stimulus sequence and the non-obvious visual response feature subsequence corresponding to the reverse stimulus sequence are time-aligned to form time-aligned forward and reverse response sequences. Key visual response processes are extracted from the time-aligned forward and reverse response sequences. A key visual response process is a continuous time segment in which the eye response features change significantly during the stimulus change process; a change is significant when the feature change amplitude between adjacent timestamps exceeds a preset threshold. For each key visual response process, a temporal sequence parameter, a duration parameter, and a stability parameter are calculated for the forward response sequence and the reverse response sequence, respectively. The temporal sequence parameter represents the order of the key visual response processes on the time axis, the duration parameter represents how long a key visual response process lasts on the time axis, and the stability parameter represents the degree of fluctuation of the change amplitude of the eye response features within a key visual response process. Based on the temporal sequence, duration, and stability parameters, consistency calculations are performed on the key visual response processes of the forward and reverse response sequences. The consistency calculations include the following. Temporal sequence consistency: the temporal sequence parameters of corresponding key visual response processes in the forward and reverse response sequences are compared, and the proportion of response process pairs with a consistent temporal sequence among all response process pairs is taken as the temporal sequence consistency value. Duration consistency: the difference between the duration parameters of corresponding key visual response processes in the forward and reverse response sequences is calculated; when the difference falls within a preset duration tolerance range, the durations are determined to be consistent, and the proportion of response process pairs with consistent duration among all response process pairs is taken as the duration consistency value. Stability consistency: the difference between the stability parameters of corresponding key visual response processes in the forward and reverse response sequences is calculated; when the difference is less than a preset stability deviation threshold, the stability is determined to be consistent, and the proportion of response process pairs with consistent stability among all response process pairs is taken as the stability consistency value. The temporal sequence, duration, and stability consistency values are weighted and aggregated according to preset weights to obtain the consistency calculation result.
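The consistency calculation described above can be sketched as follows. This is an illustrative reading only: the `ResponseProcess` type, the tolerance values, and the weights are hypothetical, and the sketch assumes the caller has already paired each key visual response process in one response sequence with its counterpart in the other.

```python
from dataclasses import dataclass

@dataclass
class ResponseProcess:
    order: int        # temporal sequence parameter: rank on the time axis
    duration: float   # duration parameter, in seconds
    stability: float  # stability parameter: fluctuation of the change amplitude

def consistency_result(first_seq, second_seq,
                       duration_tolerance=0.2,
                       stability_threshold=0.1,
                       weights=(0.4, 0.3, 0.3)):
    """Weighted aggregate of the three consistency values (all in [0, 1]).

    The tolerance, threshold, and weights are placeholder presets, not values
    taken from the source.
    """
    pairs = list(zip(first_seq, second_seq))
    if not pairs:
        raise ValueError("no paired key visual response processes")
    n = len(pairs)
    # Share of pairs whose temporal order matches.
    order_c = sum(a.order == b.order for a, b in pairs) / n
    # Share of pairs whose duration difference lies within the tolerance range.
    duration_c = sum(abs(a.duration - b.duration) <= duration_tolerance
                     for a, b in pairs) / n
    # Share of pairs whose stability difference is below the deviation threshold.
    stability_c = sum(abs(a.stability - b.stability) < stability_threshold
                      for a, b in pairs) / n
    w_order, w_duration, w_stability = weights
    total = w_order * order_c + w_duration * duration_c + w_stability * stability_c
    return {"order": order_c, "duration": duration_c,
            "stability": stability_c, "total": total}
```

With weights that sum to one, the aggregate stays in [0, 1], which makes it straightforward to compare against fixed judgment intervals.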
[0029] In this embodiment, step six specifically includes: The stability of the visual development mechanism is determined based on the consistency calculation results. The stability of the visual development mechanism is a stability index that characterizes whether the visual response behavior structure of children remains reversibly consistent under positive and negative stimulus sequences. The consistency calculation result is compared with the preset stability judgment interval. The stability judgment interval is three consecutive intervals divided according to the numerical range of the consistency calculation result. When the consistency calculation result falls into the first stability judgment interval, the visual development mechanism is determined to be in a stable state. When the consistency calculation result falls into the second stability judgment interval, the visual development mechanism is determined to be in a suspicious stable state. When the consistency calculation result falls into the third stability judgment interval, the visual development mechanism is determined to be in an unstable state. When the visual development mechanism is determined to be in a suspicious stable or unstable state, the trend of the consistency calculation result within a continuous time window is analyzed. The trend is the direction and magnitude of the change of the consistency calculation result within multiple adjacent time windows. When the trend shows a continuous decline or the fluctuation exceeds a preset trend threshold, it is determined that the visual development mechanism has abnormal evolutionary characteristics. Based on the stability assessment results of the visual development mechanism and the abnormal evolution characteristics, a visual health risk evolution assessment result is generated. 
The visual health risk evolution assessment result includes a risk level indicator and a risk development trend indicator corresponding to the stability of the visual development mechanism.
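A minimal sketch of the step-six decision logic is given below. The interval boundaries (0.80 and 0.60), the trend threshold, the reading of "fluctuation" as the range of values across windows, and the mapping from stability state to risk level are all assumptions chosen for illustration; the source only states that such presets exist.

```python
# Hypothetical mapping from stability state to the risk level indicator.
RISK_LEVEL = {"stable": "low", "suspicious stable": "medium", "unstable": "high"}

def classify_stability(score, stable_min=0.80, suspicious_min=0.60):
    """Place a consistency calculation result into one of three consecutive intervals."""
    if score >= stable_min:
        return "stable"
    if score >= suspicious_min:
        return "suspicious stable"
    return "unstable"

def has_abnormal_evolution(window_scores, trend_threshold=0.15):
    """True if the results decline in every adjacent window or fluctuate too much."""
    steps = [b - a for a, b in zip(window_scores, window_scores[1:])]
    if not steps:
        return False
    continuous_decline = all(step < 0 for step in steps)
    fluctuation = max(window_scores) - min(window_scores)
    return continuous_decline or fluctuation > trend_threshold

def assess_risk_evolution(window_scores):
    """Combine the state of the latest window with the trend over all windows."""
    state = classify_stability(window_scores[-1])
    # The trend is only analysed when the state is not stable, as in the text.
    abnormal = state != "stable" and has_abnormal_evolution(window_scores)
    return {
        "state": state,
        "risk_level": RISK_LEVEL[state],
        "trend": "worsening" if abnormal else "steady",
    }
```

For example, `assess_risk_evolution([0.78, 0.74, 0.70])` falls into the middle interval and declines in every window, so under these placeholder presets it is reported as medium risk with a worsening trend.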
[0030] In this embodiment, step seven specifically includes: The visual health risk evolution assessment results are integrated over a preset assessment period. The time integration process involves summarizing the visual health risk evolution assessment results obtained in multiple consecutive assessment periods in a time series to eliminate the impact of fluctuations in a single assessment on the overall conclusion. By weighting and statistically analyzing the risk level indicators corresponding to different assessment periods and combining them with risk development trend indicators for consistency judgment, the risk status result after period integration is obtained. Based on the risk status results after periodic integration, the visual health assessment conclusions are graded and labeled, including normal level, warning level and intervention recommendation level. The visual health assessment conclusions after hierarchical labeling are associated and stored with the corresponding assessment timestamps to form a visual health assessment record.
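The period integration in step seven could look roughly like the sketch below. The numeric scores for risk levels, the linearly increasing weights that favour recent periods, and the cut-off values are hypothetical; the source specifies only that risk level indicators are weighted, combined with the trend indicators, graded into three levels, and stored with a timestamp.

```python
from datetime import datetime, timezone

# Hypothetical numeric scores for the risk level indicator.
LEVEL_SCORE = {"low": 0, "medium": 1, "high": 2}

def integrate_periods(assessments, assessed_at=None):
    """Integrate per-period results (oldest first) into one graded record.

    Each assessment is a dict with "risk_level" and "trend" keys.
    """
    if not assessments:
        raise ValueError("at least one assessment period is required")
    n = len(assessments)
    weights = range(1, n + 1)  # assumption: later periods weigh more
    weighted_level = sum(
        w * LEVEL_SCORE[a["risk_level"]] for w, a in zip(weights, assessments)
    ) / sum(weights)
    worsening_share = sum(a["trend"] == "worsening" for a in assessments) / n

    if weighted_level >= 1.5 or (weighted_level >= 1.0 and worsening_share >= 0.5):
        conclusion = "intervention recommendation level"
    elif weighted_level >= 0.5 or worsening_share >= 0.5:
        conclusion = "warning level"
    else:
        conclusion = "normal level"

    stamp = assessed_at or datetime.now(timezone.utc)
    # The conclusion is stored together with its assessment timestamp.
    return {"conclusion": conclusion, "assessed_at": stamp.isoformat()}
```

Because the weighted level averages over several periods, a single high-risk reading surrounded by low-risk readings yields a warning rather than an intervention recommendation, which matches the stated aim of damping the effect of one fluctuating assessment.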
[0031] Example 1: To verify the feasibility of this invention in practice, it was applied to the screening, assessment, and follow-up management of children's visual health at a maternal and child health hospital. This hospital has long been responsible for preschool physical examinations, preschool health assessments, and visual health follow-ups for children in its jurisdiction, serving a wide age range with significant differences in children's cooperation levels. Previously, assessments of children's visual health at the hospital relied mainly on standard visual acuity charts, refractive error testing, and doctors' experience. While these methods are effective in detecting significant vision decline, they often fail to identify in a timely manner children, especially younger children, who have not yet shown vision decline but exhibit only abnormalities in visual development mechanisms or trends of risk evolution. This embodiment introduces the deep learning-based intelligent assessment method for children's visual health described in this invention. Without altering children's natural behavioral patterns, it quantifies and analyzes the non-obvious responses of the children's visual system during stimulus changes, thereby overcoming the shortcomings of traditional methods.
[0032] In practical application, the hospital deploys standardized visual stimulus presentation terminals and image acquisition equipment in the visual screening room. During the testing process, children only need to watch the visual stimuli displayed on the screen while sitting naturally. The stimuli consist of graphics or animations that conform to children's visual cognitive characteristics, and their parameters, such as spatial frequency, contrast, rhythm, and binocular input conditions, vary according to a preset scheme. While the stimuli are presented, the system continuously acquires images of the child's face and eyes and records the corresponding time information. The entire acquisition process lasts approximately 3 to 5 minutes and does not cause fatigue or psychological burden to the child. After acquisition, the system automatically performs quality control processing on the image sequence, removing invalid frames caused by the child briefly turning their head, occlusion, excessive blinking, or abnormal lighting, ensuring the stability of subsequent data analysis.
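The quality control pass on the image sequence might be sketched as below. The sketch assumes 8-bit grayscale frames as NumPy arrays, takes mean intensity as the brightness measure and standard deviation as the contrast measure, and defines the "effective pixel" share as the fraction of pixels that are neither near-black nor near-saturated. These definitions and every threshold are assumptions, not values from the source.

```python
import numpy as np

def frame_is_valid(gray,
                   brightness_range=(40.0, 220.0),
                   contrast_range=(10.0, 100.0),
                   min_effective_ratio=0.6):
    """Check one grayscale frame (values 0..255) against placeholder presets."""
    gray = np.asarray(gray, dtype=float)
    brightness = gray.mean()
    contrast = gray.std()
    # Assumed definition: pixels that are neither crushed to black nor blown out.
    effective_ratio = np.mean((gray > 5) & (gray < 250))
    return bool(
        brightness_range[0] <= brightness <= brightness_range[1]
        and contrast_range[0] <= contrast <= contrast_range[1]
        and effective_ratio >= min_effective_ratio
    )

def filter_frames(frames, timestamps):
    """Sort frames by timestamp and drop the invalid ones."""
    ordered = sorted(zip(timestamps, frames), key=lambda pair: pair[0])
    return [(t, f) for t, f in ordered if frame_is_valid(f)]
```

Frames lost to head turns, occlusion, or blinking would in practice need an additional face or eye detector; this sketch covers only the exposure-related checks.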
[0033] Based on the original visual data, the system constructs forward and reverse stimulus sequences, ensuring that the visual stimulus parameters exhibit a completely symmetrical modulation pattern in both directions but with opposite directions of change. This design allows the child's visual system to receive structurally consistent stimulus inputs under two conditions, enabling the assessment of the stability of visual development mechanisms by comparing the consistency of the visual response process. Subsequently, the system uses an improved deep learning monocular depth estimation model to process facial and eye images, focusing on extracting depth information of the eye region closely related to visual behavior. Through temporal continuity constraints and transient mutation suppression mechanisms, a smooth and reliable eye depth sequence is obtained. Furthermore, geometric compensation is applied to the child's head displacement to eliminate the influence of posture changes on visual response analysis.
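Two mechanisms from this paragraph can be illustrated with a short sketch: building a symmetric pair of stimulus parameter sequences, and stabilizing an eye depth sequence. The sketch is one plausible interpretation. It assumes a linear parameter ramp, a continuity weight of the form exp(-decay * |difference|), a sigmoid suppression weight, and a blend with the previous stabilized frame as the meaning of applying weights "in a weighted manner". The parameter names `decay`, `jump_limit`, and `steepness` are hypothetical.

```python
import numpy as np

def build_stimulus_sequences(start, stop, n_periods):
    """Return a monotonic parameter sequence and its mirror image."""
    forward = np.linspace(start, stop, n_periods)
    reverse = forward[::-1].copy()   # same values and period count, opposite order
    return forward, reverse

def stabilize_eye_depth(depth_seq, decay=5.0, jump_limit=0.2, steepness=20.0):
    """Smooth an eye depth sequence of shape (T,) or (T, H, W)."""
    depth_seq = np.asarray(depth_seq, dtype=float)
    out = depth_seq.copy()
    for t in range(1, len(depth_seq)):
        previous = out[t - 1]
        # Temporal continuity weight: decays exponentially with the cross-frame difference.
        diff = np.abs(depth_seq[t] - previous)
        w_continuity = np.exp(-decay * diff)
        constrained = w_continuity * depth_seq[t] + (1.0 - w_continuity) * previous
        # Suppression weight in (0, 1): close to 1 for small changes,
        # close to 0 once the change exceeds jump_limit.
        change = np.abs(constrained - previous)
        w_keep = 1.0 / (1.0 + np.exp(steepness * (change - jump_limit)))
        out[t] = w_keep * constrained + (1.0 - w_keep) * previous
    return out
```

A trade-off of this kind of blending is lag: genuine but fast depth changes are attenuated along with artifacts, so the presets would need tuning against real recordings.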
[0034] The system further extracts multidimensional non-obvious visual response features from stable eye depth information, such as pupil diameter changes, fixation point displacement, micro-eye movement amplitude, and blinking behavior. These features are then scaled and time-aligned to form a continuous set of non-obvious visual response features. By calculating the consistency of the temporal sequence, duration, and stability of key visual response processes under both positive and negative stimulus conditions, the system obtains a comprehensive consistency result reflecting the stability of children's visual development mechanisms.
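To make the notion of a key visual response process concrete, the sketch below segments a single scaled feature series into runs where the change between adjacent timestamps exceeds a threshold, then derives the three parameters for each run. Taking the standard deviation of the change amplitudes as the stability parameter is an assumption, as are the function name and the returned dictionary layout.

```python
import numpy as np

def extract_key_processes(timestamps, feature, threshold):
    """Find continuous segments of significant change in one feature series.

    Returns a list of dicts with the order, start time, duration, and a
    stability value (std of the change amplitudes, an assumed definition).
    """
    timestamps = np.asarray(timestamps, dtype=float)
    change = np.abs(np.diff(np.asarray(feature, dtype=float)))
    active = change > threshold          # change[i] links frame i and frame i + 1

    runs, start = [], None
    for i, flag in enumerate(active):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(active)))

    processes = []
    for order, (begin, end) in enumerate(runs):
        processes.append({
            "order": order,
            "start": float(timestamps[begin]),
            "duration": float(timestamps[end] - timestamps[begin]),
            "stability": float(change[begin:end].std()),
        })
    return processes
```

Running the same extraction on the forward and the reverse response sequences yields the two lists of processes whose order, duration, and stability are then compared.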
[0035] To fully validate the effectiveness of the method of this invention, 180 children who underwent continuous screening at the hospital were selected as the experimental sample, including 60 children aged 3–5 years, 70 children aged 6–8 years, and 50 children aged 9–10 years. All children underwent routine vision examinations and, during the same period, received an intelligent visual health assessment using the method of this invention. Based on the comprehensive judgment of clinicians, combined with past medical history, refractive examination results, and follow-up experience, the children were divided into a normal visual development group, a suspected abnormality group, and an abnormal risk group. The consistency calculation results automatically generated by the system are summarized in Table 1.
[0036] Table 1. Statistical table of consistency indicators for children at different visual developmental stages.
[0037] Table 1 shows significant differences in the consistency indicators among the three visual development groups. The visual response structure of the normal visual development group was highly consistent under forward and reverse stimulus conditions, with all consistency indicators remaining at high levels, indicating good stability and coordination of the visual system under reversible stimulus changes. The consistency indicators of the suspected abnormality group were significantly lower overall than those of the normal group, especially the stability consistency; some children showed increased response fluctuations under reverse stimulus conditions. The consistency indicators of the abnormal risk group were the lowest, with significant decreases in both temporal sequence consistency and duration consistency, reflecting that their visual response process struggled to maintain its original structure after stimulus reversal and indicating significant abnormalities in their visual development mechanism.
[0038] Further analysis revealed that approximately 31% of children in the suspected abnormality group still had normal vision results on routine vision chart tests, but their overall consistency mean was significantly lower than the average level of the normal group. These children were often classified as "temporarily normal" under previous screening methods, but the method of this invention can reveal their potential risks in advance, providing a basis for early intervention.
[0039] To verify the ability of this invention to assess the evolution of visual health risks, a six-month continuous follow-up assessment was conducted on 80 children, with an intelligent assessment performed each month and the consistency results recorded. The system performed time-based integration and trend analysis on the data within the continuous assessment periods, and some representative results are as follows.
[0040] Table 2. Examples of changes in overall consistency within consecutive evaluation periods.
[0041] As shown in Table 2, children with stable visual development exhibited relatively small fluctuations in their consistency index across multiple assessment periods, consistently remaining within a high range. In contrast, children with suspected abnormalities showed a continuous downward trend in their consistency index, with children at risk of abnormalities exhibiting not only lower values but also a more significant decline. Clinical follow-up examinations revealed a higher proportion of children with continuously declining or significantly fluctuating consistency indices who developed signs of visual dysfunction or refractive deterioration, further validating the effectiveness of this invention in risk evolution assessment.
[0042] Comparative analysis revealed that, compared to single vision screening results, the method of this invention provides richer and more stable data support for children's visual health assessment. On one hand, this method is based on a large amount of objective visual response data, reducing the uncertainty caused by subjective judgment; on the other hand, through consistency analysis under positive and negative stimulus conditions, the assessment results directly reflect the stability of visual development mechanisms, rather than focusing solely on outcome indicators. Especially in young children and children with attention deficit disorder, the method of this invention demonstrates higher usability and reliability.
[0043] In summary, the above embodiments demonstrate that this invention, through large-sample application verification in real medical scenarios, proves its ability to conduct refined, systematic, and dynamic assessments of children's visual health using deep learning and reversible visual stimulation mechanisms. This method achieves early identification of visual developmental abnormalities and judgment of risk evolution trends without requiring explicit cooperation from children. It exhibits significant beneficial effects such as strong assessment objectivity, sufficient data support, wide applicability, and high clinical auxiliary value, making it suitable for widespread application in children's visual health screening, assessment, and long-term follow-up management.
[0044] The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.
Claims
1. A deep learning-based intelligent assessment method for children's visual health, characterized in that it includes the following steps:
Step 1: Collect facial and eye image sequences of children under visual stimulus presentation conditions, and record the corresponding time information to form original visual data;
Step 2: Based on the original visual data, construct a reversible visual stimulus modulation scheme oriented to children's visual development, so that the visual stimuli form forward and reverse stimulus sequences in terms of spatial frequency, contrast, rhythm, or binocular input conditions;
Step 3: Input the facial and eye image sequences into the improved Depth Anything V2 model for monocular depth estimation, introduce an eye region perception stabilization mechanism, apply cross-frame temporal continuity constraints to the depth results of the eye region, and perform instantaneous depth mutation suppression processing in the forward and reverse stimulus sequences to obtain eye depth information;
Step 4: Perform geometric compensation on the child's head displacement based on the eye depth information, and perform scale unification processing on pupil diameter changes, fixation point displacement, micro-eye movement amplitude, and blinking behavior to construct a set of non-obvious visual response features;
Step 5: For the forward and reverse stimulus sequences, perform time alignment on the formation process of the non-obvious visual response feature set, and calculate the consistency of the temporal sequence, duration, and stability of key visual response processes;
Step 6: Determine the stability of the visual development mechanism based on the consistency calculation result; when the temporal sequence of key visual response processes under the reverse stimulus sequence cannot be reconstructed or the stability is reduced, determine that the development mechanism of the corresponding visual function is abnormal, and generate a visual health risk evolution assessment result;
Step 7: Based on the visual health risk evolution assessment result, and in combination with the changes in consistency under different stimulus conditions, classify the child's visual development status and generate the corresponding visual health assessment conclusion.
2. The method for intelligent assessment of children's visual health based on deep learning according to claim 1, characterized in that, Step one specifically includes: The face and eye color images are captured frame by frame according to the preset acquisition frequency, and corresponding timestamp information is assigned to each captured image. The face and eye color images are sorted according to the timestamp information to form a face and eye image sequence. Data integrity processing is performed on the facial and eye image sequence. The data integrity processing includes: calculating the brightness distribution, contrast distribution, and effective pixel area ratio of each image frame; when the effective pixel area ratio of an image frame is lower than a preset threshold, or the brightness distribution and contrast distribution exceed a preset range, the corresponding image frame is removed from the facial and eye image sequence to obtain the original visual data.
3. The method for intelligent assessment of children's visual health based on deep learning according to claim 1, characterized in that step two specifically includes: Within the acquisition time range corresponding to the original visual data, the start and end times of the visual stimulus presentation are set, and the acquisition time range is divided into multiple consecutive stimulus periods. Visual stimulus parameters are set for each stimulus period, including spatial frequency parameters, contrast parameters, rhythm parameters, and binocular input parameters. The forward stimulus sequence is a stimulus sequence in which the visual stimulus parameters change gradually between adjacent stimulus periods according to a preset parameter change order, so that the visual stimulus parameters change monotonically. The reverse stimulus sequence is based on the change order of the visual stimulus parameters in the forward stimulus sequence: the visual stimulus parameters change gradually between adjacent stimulus periods in the opposite order, forming a stimulus sequence that corresponds to the forward stimulus sequence in the number of stimulus periods and in parameter values. The stimulus periods in the forward and reverse stimulus sequences are associated with the facial and eye image sequences based on timestamp information.
4. The method for intelligent assessment of children's visual health based on deep learning according to claim 1, characterized in that, The improved Depth Anything V2 model includes a depth estimation processing module, an eye region constraint processing module, a temporal continuity constraint processing module, and a depth stabilization processing module. The depth estimation processing module performs feature extraction and ReLU nonlinear mapping processing on the face and eye image sequences, and performs depth regression processing to generate pixel-level depth results corresponding to each image frame. The pixel-level depth results constitute the initial depth results. The eye region constraint processing module takes the initial depth result as input and introduces an eye region perception stabilization mechanism. In the depth result processing, the eye region is explicitly constrained. An eye region identifier matrix with the same size as the image frame is generated, and the eye region identifier matrix is fused with the initial depth result by element-wise multiplication to obtain the eye region depth result.
5. The temporal continuity constraint processing module applies cross-frame temporal continuity constraints to the eye region depth results. The temporal continuity constraint processing module calculates cross-frame difference values for the eye region depth results corresponding to adjacent timestamps, applies an exponential decay function to the cross-frame difference values to generate temporal continuity weights, and applies the temporal continuity weights to the eye region depth results corresponding to the current timestamp in a weighted manner to obtain the temporally constrained eye region depth results. The depth stabilization processing module takes the temporally constrained eye region depth result as input and performs instantaneous depth mutation suppression processing in the forward and reverse stimulus sequences. The instantaneous depth mutation suppression processing refers to the processing method that suppresses the cross-frame change amplitude in the temporally constrained eye region depth result that exceeds a preset range. The cross-frame change amplitude is mapped to a suppression weight between zero and one by the Sigmoid function, and the suppression weight is applied to the corresponding temporally constrained eye region depth result in a weighted manner to generate stable eye depth information.
6. The method for intelligent assessment of children's visual health based on deep learning according to claim 1, characterized in that step four specifically includes: By comparing the eye depth information corresponding to adjacent timestamps, the relative change of the eye region in spatial position is calculated, and the position of the eye depth information is corrected based on the relative change. The eye response quantities related to eye behavior in the eye depth information are subjected to scale unification processing. The eye response quantities include pupil diameter change, fixation point displacement, micro-eye movement amplitude, and blink behavior parameters. After the scale unification processing is completed, the eye response quantities are combined in timestamp order to construct a set of non-obvious visual response features.
7. The method for intelligent assessment of children's visual health based on deep learning according to claim 1, characterized in that step five specifically includes: The non-obvious visual response feature subsequence corresponding to the forward stimulus sequence and the non-obvious visual response feature subsequence corresponding to the reverse stimulus sequence are time-aligned to form time-aligned forward and reverse response sequences. Key visual response processes are extracted from the time-aligned forward and reverse response sequences. A key visual response process is a continuous time segment in which the eye response features change significantly during the stimulus change process; a change is significant when the feature change amplitude between adjacent timestamps exceeds a preset threshold. For the key visual response processes, the temporal sequence parameters, duration parameters, and stability parameters of the response process are calculated for both the forward and reverse response sequences. Based on the temporal sequence parameters, duration parameters, and stability parameters, consistency calculations are performed on the key visual response processes of the forward and reverse response sequences to obtain the consistency calculation result.
8. The method for intelligent assessment of children's visual health based on deep learning according to claim 1, characterized in that, Step six specifically includes: The consistency calculation result is compared with the preset stability judgment interval. The stability judgment interval is three consecutive intervals divided according to the numerical range of the consistency calculation result. When the consistency calculation result falls into the first stability judgment interval, the visual development mechanism is determined to be in a stable state. When the consistency calculation result falls into the second stability judgment interval, the visual development mechanism is determined to be in a suspicious stable state. When the consistency calculation result falls into the third stability judgment interval, the visual development mechanism is determined to be in an unstable state. When the visual development mechanism is determined to be in a suspicious stable or unstable state, the trend of the consistency calculation result within a continuous time window is analyzed. The trend is the direction and magnitude of the change of the consistency calculation result within multiple adjacent time windows. When the trend shows a continuous decline or the fluctuation exceeds a preset trend threshold, it is determined that the visual development mechanism has abnormal evolutionary characteristics. Based on the stability assessment results of visual development mechanism and abnormal evolution characteristics, visual health risk evolution assessment results are generated.
9. The method for intelligent assessment of children's visual health based on deep learning according to claim 1, characterized in that, Step seven specifically includes: The visual health risk evolution assessment results are integrated over a time period within a preset assessment period. The time integration process involves summarizing the visual health risk evolution assessment results obtained in multiple consecutive assessment periods over a time series to obtain the risk status result after period integration. Based on the risk status results after periodic integration, the visual health assessment conclusions are graded and labeled. The visual health assessment conclusions after hierarchical labeling are associated and stored with the corresponding assessment timestamps to form a visual health assessment record.