Automatic detection system and method for eye lens tremor and iris tremor

By acquiring dynamic video through the anterior segment system of a slit lamp, and using a deep learning model to automatically segment and distinguish lens and iris tremors, the problem of time-consuming, labor-intensive, and inaccurate detection in existing technologies is solved, and efficient automated detection is achieved.

WO2026123626A1 · PCT designated stage · Publication Date: 2026-06-18 · JOINT SHANTOU INT EYE CENT OF SHANTOU UNIV & THE CHINESE UNIV OF HONG KONG

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
JOINT SHANTOU INT EYE CENT OF SHANTOU UNIV & THE CHINESE UNIV OF HONG KONG
Filing Date
2025-06-12
Publication Date
2026-06-18

AI Technical Summary

Technical Problem

Current methods for detecting lens and iris tremors rely on the doctor's experience, resulting in time-consuming and labor-intensive testing with inaccurate results, making it difficult to automate and achieve efficient testing.

Method used

Dynamic video is acquired using a slit-lamp anterior segment imaging system. An automatic segmentation model is constructed using the Mask R-CNN instance segmentation model, based on the PyTorch Detectron2 framework and the ResNeXt-101 residual network. A Siamese convolutional neural network with a ResNet50 backbone, together with the similarity indices of the lens and corneal contours and the centroid motion vector, is then used to automatically determine whether lens tremor or iris tremor is present.

Benefits of technology

It enables automatic and rapid detection of lens tremor and iris tremor in the eye, with the advantages of being non-contact and non-destructive, thus improving the accuracy and efficiency of detection.

✦ Generated by Eureka AI based on patent content.


Abstract

The present invention relates to the technical fields of image processing and pattern recognition. Disclosed are an automatic detection system and method for eye lens tremor and iris tremor. The system comprises: an acquisition module, a segmentation module, a calculation module, a first discrimination module, a second discrimination module, and a third discrimination module. The acquisition module is used for acquiring an eyeball dynamic video of a subject; the segmentation module is used for segmenting lens regions and cornea regions in adjacent video frames; the calculation module is used for calculating similarity indices of lens contours and cornea contours in the adjacent video frames; the first discrimination module is used for constructing a lens displacement discrimination model to discriminate whether the lens has shifted relative to the iris in the adjacent video frames; the second discrimination module is used for calculating the magnitude of a motion vector of the centroid of a lens contour relative to the centroid of a cornea contour in the adjacent video frames, and discriminating, on the basis of the magnitude of the motion vector, whether the lens has shifted relative to the cornea; and the third discrimination module is used for calculating and discriminating whether iris tremor or lens tremor is present in the entire dynamic video.

Description

An automated detection system and method for lens and iris tremors

Technical Field

[0001] This invention relates to the field of medical image processing and pattern recognition technology, specifically to an automatic detection system and method for lens tremor and iris tremor of the eye.

Background Technology

[0002] Lens tremor is an important reference for understanding ocular anatomy and guiding surgery, and is usually related to laxity of the suspensory ligaments of the lens. It can occur with age, eye injuries, and other conditions. Iris tremor manifests as involuntary, minute vibrations of the iris during eye movement or at rest. It can be caused by a variety of factors, including nerve damage, eye trauma, complications after ophthalmic surgery, or side effects of certain medications. In most cases, neither lens tremor nor iris tremor causes obvious symptoms, and both are usually detected only through a specialized ophthalmological examination.

[0003] Currently, the detection of lens tremor and iris tremor relies primarily on the experience and subjective judgment of physicians. This is not only time-consuming and labor-intensive, but also prone to inaccuracy due to physician fatigue or differences in experience. With the development of computer vision and artificial intelligence technologies, automated image analysis methods offer the possibility of improving the accuracy and efficiency of lens tremor and iris tremor detection. Therefore, developing a method and system that can accurately and rapidly detect lens tremor and iris tremor automatically from video data has significant clinical implications and application value.

Summary of the Invention

[0004] To address the technical problems mentioned above, this invention aims to provide an automatic detection system and method for lens and iris tremors. This invention can automatically determine the presence of lens and iris tremors by reading dynamic video of the eye. The lens and iris are connected by suspensory ligaments. If the suspensory ligaments are loose, the lens shifts relative to the iris, resulting in lens tremor, which manifests in images as changes in the visible features of the lens region. If the suspensory ligaments are properly connected, the lens does not shift relative to the iris, and the two move as a single unit. In this case, it is necessary to further determine whether this unit shifts relative to the outermost cornea. If it does, iris tremor is present along with lens tremor, which manifests in images as a shift in the center of mass of the lens relative to the center of mass of the cornea.

[0005] To achieve the above objectives, the present invention provides an automatic detection system for lens tremor and iris tremor, comprising: an acquisition module, a segmentation module, a calculation module, a first discrimination module, a second discrimination module, and a third discrimination module;

[0006] The acquisition module is used to acquire dynamic videos of the subject's eyes moving upward, downward, nasally, and temporally, respectively. Each dynamic video contains multiple video frames.

[0007] The segmentation module is used to construct an automatic segmentation model and to use it to segment the lens and corneal regions in adjacent video frames. The automatic segmentation model is built on the PyTorch Detectron2 framework and uses the Mask R-CNN instance segmentation model, with the feature pyramid network (FPN) as the backbone architecture and the residual network ResNeXt-101 as the convolutional structure.

[0008] The calculation module is used to calculate the similarity index of the lens and corneal contours in adjacent video frames;

[0009] The first discrimination module is used to construct a lens displacement discrimination model to determine whether the lens has shifted relative to the iris in adjacent video frames. The lens displacement discrimination model includes a ResNet50 backbone network, a fully connected layer and a sigmoid function, and uses binary cross-entropy as the loss function.

[0010] The second discrimination module is used to calculate the magnitude of the motion vector of the lens outline centroid relative to the corneal outline centroid in adjacent video frames, and to determine whether the lens has shifted relative to the cornea based on the magnitude of the motion vector.

[0011] The third discrimination module is used to calculate and determine whether there is iris tremor or lens tremor in the entire dynamic video.

[0012] Preferably, the workflow of the segmentation module includes: using the automatic segmentation model to segment the lens region and corneal region on the video frame to obtain the lens contour map in two adjacent frames.

[0013] Preferably, the workflow of the calculation module includes: obtaining the similarity between two lens contour objects in two adjacent frames using the following similarity index calculation formula:

$$\text{similarity index}(A, B) = \sum_{i=1}^{7} \left| \frac{1}{m_i^A} - \frac{1}{m_i^B} \right|, \qquad m_i^A = \operatorname{sign}(h_i^A)\,\log\left|h_i^A\right|, \quad m_i^B = \operatorname{sign}(h_i^B)\,\log\left|h_i^B\right|$$

[0014] Wherein, similarity index represents the similarity index; sign returns the sign of its input value: 1 if the input value is greater than 0, 0 if it is equal to 0, and -1 if it is less than 0; and $h_i^A$ and $h_i^B$ represent the i-th Hu moment of contour object A and contour object B, respectively;

[0015] The same process was used to calculate the similarity of corneal contours in adjacent frames.

[0016] Preferably, when the similarity index of the lens contour between two adjacent frames is less than 0.1, the difference in the global image features of the lens between these two adjacent frames is further determined.

[0017] When the corneal contour similarity index of two adjacent frames is less than 0.1, the magnitude of the motion vector of the lens contour centroid relative to the corneal contour centroid of the two adjacent frames is further calculated.

[0018] Preferably, the workflow of the first discrimination module includes:

[0019] Preprocessing is performed on the lens contour map in two adjacent frames to obtain processed data;

[0020] The processed data is then annotated to obtain a training sample set;

[0021] Construct a lens displacement discrimination model, and train the constructed lens displacement discrimination model using the training sample set;

[0022] The trained lens displacement discrimination model is used to determine whether the lens has shifted relative to the iris.

[0023] Preferably, the process by which the second discrimination module calculates the magnitude of the motion vector includes:

$$\mathrm{dist}(a) = \sqrt{(x_a^{L} - x_a^{C})^2 + (y_a^{L} - y_a^{C})^2}, \qquad \mathrm{dist}(b) = \sqrt{(x_b^{L} - x_b^{C})^2 + (y_b^{L} - y_b^{C})^2}$$

$$\mathrm{dist}(a-b) = \left|\mathrm{dist}(a) - \mathrm{dist}(b)\right|$$

[0024] Where a and b represent two adjacent frames; $(x_a^{L}, y_a^{L})$ represents the coordinates of the lens centroid in frame a; $(x_a^{C}, y_a^{C})$ represents the coordinates of the corneal centroid in frame a; $(x_b^{L}, y_b^{L})$ represents the coordinates of the lens centroid in frame b; $(x_b^{C}, y_b^{C})$ represents the coordinates of the corneal centroid in frame b; dist(a) and dist(b) represent the Euclidean distance between the lens centroid and the corneal centroid in the two frames, respectively; dist(a-b) represents the magnitude of the motion vector of the lens centroid relative to the corneal contour centroid.

[0025] This invention also provides an automatic detection method for lens tremor and iris tremor of the eye, the method being applied to the aforementioned system, the steps of which include:

[0026] Dynamic videos of the subject's eyes moving upward, downward, nasally, and temporally are acquired, and each dynamic video contains multiple video frames.

[0027] An automatic segmentation model is constructed and used to segment the lens and corneal regions in adjacent video frames. The automatic segmentation model is built on the PyTorch Detectron2 framework and uses the Mask R-CNN instance segmentation model, with the feature pyramid network (FPN) as the backbone architecture and the residual network ResNeXt-101 as the convolutional structure.

[0028] Calculate the similarity index of the lens and corneal contours in adjacent video frames;

[0029] A lens displacement discrimination model is constructed to determine whether the lens has shifted relative to the iris in adjacent video frames. The discrimination model includes a ResNet50 backbone network, a fully connected layer, and a sigmoid function, and uses binary cross-entropy as the loss function.

[0030] Calculate the magnitude of the motion vector between the centroid of the lens contour and the centroid of the corneal contour in adjacent video frames, and determine whether the lens has shifted relative to the cornea based on the magnitude of the motion vector;

[0031] Calculate and determine whether iris tremor or lens tremor exists in the entire dynamic video.

[0032] Compared with the prior art, the beneficial effects of the present invention are as follows:

[0033] This invention utilizes dynamic video imaging technology based on slit-lamp anterior segment imaging to automatically and rapidly detect whether lens tremor and iris tremor are occurring in the eye within the video. This has significant clinical implications and application value for assessing the state of the eye's structure. Furthermore, this invention has the advantages of being non-contact and non-destructive.

Attached Figure Description

[0034] To more clearly illustrate the technical solution of the present invention, the drawings used in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0035] Figure 1 is a schematic diagram of the system structure according to an embodiment of the present invention;

[0036] Figure 2 is a system workflow diagram of an embodiment of the present invention;

[0037] Figure 3 is a schematic diagram of the segmentation model according to an embodiment of the present invention;

[0038] Figure 4 is a diagram of the Siamese convolutional neural network structure according to an embodiment of the present invention.

Detailed Implementation

[0039] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0040] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0041] Example 1

[0042] Figure 1 shows a schematic diagram of the system structure of the present invention, including: an acquisition module, a segmentation module, a calculation module, a first discrimination module, a second discrimination module, and a third discrimination module. The following will, with reference to Figure 2, describe in detail the workflow of the system of the present invention and how it solves technical problems in practical work.

[0043] First, a dynamic video of the subject's eyeballs rotating upwards, downwards, nasally, and temporally is acquired using an acquisition module. This dynamic video contains multiple video frames. Specifically, in this embodiment, the acquisition module is a slit-lamp anterior segment imaging device. This embodiment collected a total of 200 dynamic videos, of which 100 videos showed lens and / or iris tremors, and 100 videos did not. Each video frame refers to a single two-dimensional image frame within the dynamic video.

[0044] Next, an automatic segmentation model is built using the segmentation module, and this model is used to segment the lens and corneal regions in adjacent video frames. The process includes data annotation, image preprocessing, algorithm design, training, and prediction. The specific workflow is as follows:

[0045] Data annotation: The dynamic video is divided into several two-dimensional video frames. One frame is randomly selected from each video, for a total of 200 frames. The lens region and corneal region of each frame image are annotated using the labelme software.

[0046] Image preprocessing: To adapt to the input of the segmentation model, the resolution of these 200 frames of 2D images was adjusted to 1024*1024.

[0047] Automatic segmentation model design: Based on the PyTorch Detectron2 framework, a Mask R-CNN instance segmentation model is adopted, using a feature pyramid network (FPN) as the backbone architecture and a residual network (ResNeXt-101) as the convolutional structure. ResNeXt-101 has a 101-layer network structure whose core component, the residual block, is a bottleneck module. Each bottleneck module consists of three convolutional layers: 1x1, 3x3, and 1x1. The 3x3 convolutional layer uses grouped convolution with 32 groups. The FPN constructs a multi-scale feature map pyramid by applying 3x3 convolutional kernels to the feature maps output from different stages of ResNeXt-101. The pyramid feature maps are used to generate candidate regions through a Region Proposal Network (RPN), and the candidate regions generated by the RPN are spatially aligned with the FPN feature maps using the ROIAlign layer. Finally, the model outputs three branches through a fully convolutional network and fully connected layers: a classification branch, a bounding box regression branch, and a mask branch. The loss function for the classification branch is cross-entropy loss, the loss function for the bounding box regression branch is smooth L1 loss, and the loss function for the mask branch is pixel-wise binary cross-entropy loss. A schematic diagram of the model design is shown in Figure 3.

[0048] Training Process: First, a pre-trained COCO model was loaded, and then full-layer fine-tuning training was performed on a dataset with labeled lens and corneal regions. The initial learning rate was 0.02, and it was decayed to 1 / 10 of its original value every 10 epochs. The optimizer was stochastic gradient descent (SGD) with a momentum of 0.9 and a weight decay of 0.0001. Data augmentation methods such as random horizontal flipping (50% probability), scaling (0.8 to 1.2 times), and color jittering were used during training. Two segmentation models were trained in total, one for lens region segmentation and the other for corneal region segmentation.
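The training hyperparameters in [0048] can be illustrated with a small, framework-free sketch. It assumes 0-indexed epochs and the common PyTorch-style SGD convention (weight decay added to the gradient before the momentum update); neither detail is stated in the source, and the function names are hypothetical.

```python
def learning_rate(epoch, base_lr=0.02, decay_factor=0.1, step_epochs=10):
    """Step decay: multiply the rate by decay_factor every step_epochs epochs.

    Assumes epochs are counted from 0 (an assumption, not stated in the text).
    """
    return base_lr * decay_factor ** (epoch // step_epochs)


def sgd_step(weight, grad, velocity, lr, momentum=0.9, weight_decay=0.0001):
    """One scalar SGD update with momentum and weight decay.

    Follows the PyTorch-style convention (assumed here):
      g = grad + weight_decay * weight
      v = momentum * v + g
      w = w - lr * v
    Returns the updated (weight, velocity).
    """
    g = grad + weight_decay * weight
    velocity = momentum * velocity + g
    weight = weight - lr * velocity
    return weight, velocity


# Rates at the start of training and after the first two decay steps.
schedule = [learning_rate(e) for e in (0, 10, 20)]
new_weight, new_velocity = sgd_step(1.0, 0.5, 0.0, learning_rate(0))
```

In a real training run these values would be passed to the framework's optimizer and scheduler rather than implemented by hand.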

[0049] Prediction process: In the prediction stage, if the confidence score of the Mask R-CNN model output is greater than or equal to 0.75, it means that the target region (lens region or corneal region) has been detected. Then, the contour data of the target region is obtained through OpenCV's findContours. The contour data is a set of continuous coordinate points on the boundary of the target region.
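The confidence test in [0049] amounts to a simple filter over the model's detections. The sketch below assumes each detection is a dictionary with a `score` field and that the highest-scoring detection is kept when several pass the threshold; both the data layout and the tie-breaking rule are illustrative assumptions, not part of the source.

```python
def best_detection(detections, min_score=0.75):
    """Return the highest-scoring detection with score >= min_score, or None.

    `detections` is a list of dicts such as {"label": "lens", "score": 0.91}.
    This layout is hypothetical and stands in for the segmentation model output.
    """
    confident = [d for d in detections if d["score"] >= min_score]
    if not confident:
        return None  # target region not detected in this frame
    return max(confident, key=lambda d: d["score"])


frame_detections = [
    {"label": "lens", "score": 0.62},
    {"label": "lens", "score": 0.91},
]
selected = best_detection(frame_detections)
```

Frames for which the function returns `None` would have no contour to pass on to the later similarity and centroid steps.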

[0050] The similarity index of the lens and corneal contours in adjacent video frames is calculated using the calculation module. The process includes: first, obtaining lens contour image data from two adjacent frames using the automatic segmentation model; then, using the similarity index formula to obtain the similarity between the two lens contour objects in the adjacent frames:

$$\text{similarity index}(A, B) = \sum_{i=1}^{7} \left| \frac{1}{m_i^A} - \frac{1}{m_i^B} \right|, \qquad m_i^A = \operatorname{sign}(h_i^A)\,\log\left|h_i^A\right|, \quad m_i^B = \operatorname{sign}(h_i^B)\,\log\left|h_i^B\right|$$

[0051] Wherein, similarity index represents the similarity index; sign returns the sign of its input value: 1 if the input value is greater than 0, 0 if it is equal to 0, and -1 if it is less than 0; and $h_i^A$ and $h_i^B$ represent the i-th Hu moments of contour objects A and B, respectively. The Hu moments are a set of seven invariant moments that are invariant to rotation, scaling, and translation. In practice, the similarity index of the two contours can be calculated directly using OpenCV's `matchShapes` function, with the `method` parameter set to `CONTOURS_MATCH_I1` (`CV_CONTOURS_MATCH_I1` in the legacy C API). This function returns a non-negative similarity index, described in this embodiment as a value between 0 and 1. The smaller the index, the better the two contours match, meaning the outer peripheral edges of the lens are more similar. The similarity of corneal contours in adjacent frames is calculated using the same method.
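To make the similarity index concrete, the following sketch computes it directly from two sets of Hu moments, without OpenCV. The base-10 logarithm, the small `eps` cutoff for near-zero moments, and the function names are assumptions chosen for illustration; in practice the document's approach of calling OpenCV on the contours would be used instead.

```python
import math


def sign(x):
    """Return 1 for positive input, 0 for zero, and -1 for negative input."""
    return (x > 0) - (x < 0)


def similarity_index(hu_a, hu_b, eps=1e-5):
    """Sum of |1/m_i^A - 1/m_i^B| over the Hu moments of two contours.

    m_i = sign(h_i) * log10(|h_i|). Moments with magnitude at or below eps
    are skipped to avoid taking the log of zero (eps is an assumed value).
    """
    total = 0.0
    for h_a, h_b in zip(hu_a, hu_b):
        if abs(h_a) <= eps or abs(h_b) <= eps:
            continue
        m_a = sign(h_a) * math.log10(abs(h_a))
        m_b = sign(h_b) * math.log10(abs(h_b))
        if m_a == 0 or m_b == 0:
            continue  # |h| == 1 gives log 0; skip to avoid division by zero
        total += abs(1.0 / m_a - 1.0 / m_b)
    return total


# Identical moment sets give an index of 0, i.e. a perfect contour match.
same = similarity_index([0.2, 0.01, 0.003], [0.2, 0.01, 0.003])
```

A pair of adjacent frames would then be treated as having "basically the same" contour when this value falls below the 0.1 threshold used in the embodiment.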

[0052] The first discrimination module is used to construct a lens displacement discrimination model to determine whether the lens has shifted relative to the iris in adjacent video frames. The criteria for determining that the lens has shifted relative to the iris in adjacent frames are as follows: 1) The lens outlines of adjacent frames are basically the same; 2) There are significant differences in the global image features of the lens in adjacent frames.

[0053] In this embodiment, only when the similarity index of the lens contour between two adjacent frames is less than 0.1 is the difference in the global image features of the lens in those adjacent frames further determined. This system uses a Siamese convolutional neural network to construct and train a lens displacement discrimination model to determine the difference in the global image features of the lens between adjacent frames. This is a binary classification task, and the process includes image preprocessing, image annotation, model construction, and model prediction. The specific process includes:

[0054] Image preprocessing: Using an automatic segmentation model, the lens region of all video frames is segmented to obtain a lens segmentation map. Then, the resolution of the segmentation map is further adjusted to 224*224 to adapt to the input of the Siamese convolutional neural network.

[0055] Image annotation: From each dynamic video showing lens tremor, 10 pairs of adjacent frames of lens segmentation images were randomly selected. These pairs met the following criteria: contour similarity index less than 0.1 and significant differences in global image features (i.e., lens displacement relative to the iris). Each pair of images was labeled as 1. Then, from each dynamic video showing no lens tremor, 10 pairs of adjacent frames of lens segmentation images were randomly selected. These pairs met the following criteria: contour similarity index less than 0.1 and small differences in global image features (i.e., no lens displacement relative to the iris). Each pair of images was labeled as 0. A total of 2000 pairs of adjacent frame lens segmentation image samples were obtained, of which 1000 pairs were positive samples labeled as 1 and 1000 pairs were negative samples labeled as 0.
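The sample counts in [0055] follow from drawing 10 pairs from each of 100 videos per class. A minimal sketch of that sampling step is shown below; it assumes the candidate pairs for each video have already been screened against the stated criteria, and the helper names, video identifiers, and frame counts are hypothetical.

```python
import random


def adjacent_pairs(num_frames):
    """All (i, i + 1) index pairs for a video with num_frames frames."""
    return [(i, i + 1) for i in range(num_frames - 1)]


def sample_training_pairs(candidates_per_video, label, pairs_per_video=10, seed=0):
    """Randomly pick pairs_per_video candidate pairs from every video.

    candidates_per_video maps a video id to its list of eligible frame pairs.
    Each video must offer at least pairs_per_video candidates, otherwise
    random.sample raises ValueError.
    """
    rng = random.Random(seed)
    samples = []
    for video_id, candidates in candidates_per_video.items():
        for pair in rng.sample(candidates, pairs_per_video):
            samples.append((video_id, pair, label))
    return samples


# 100 videos per class with an assumed 50 frames each.
positive_videos = {f"tremor_{k}": adjacent_pairs(50) for k in range(100)}
negative_videos = {f"normal_{k}": adjacent_pairs(50) for k in range(100)}
dataset = (sample_training_pairs(positive_videos, label=1)
           + sample_training_pairs(negative_videos, label=0))
```

With these inputs the combined list holds 1000 positive and 1000 negative pairs, matching the totals reported in the embodiment.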

[0056] Lens displacement discrimination model construction: Siamese convolutional neural networks, a special type of convolutional neural network, perform well in similarity measurement tasks because they process two input images simultaneously and compare them. This embodiment uses a ResNet50 backbone network to extract image features. Two adjacent images from the training set are input into the backbone network to obtain two feature vectors: Gw(X1) and Gw(X2). These two feature vectors are then subtracted to obtain a new vector, which is input into a fully connected layer (FC) to obtain a scalar. Finally, the sigmoid function maps this scalar to a value between 0 and 1, which is compared with the label value of 0 or 1. Binary cross-entropy is used as the loss function. The structure of the Siamese convolutional network is shown in Figure 4.
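The head of the Siamese model described in [0056] can be sketched without a deep learning framework. Here the two feature vectors stand in for Gw(X1) and Gw(X2) from the ResNet50 backbone; the weights, bias, and the 0.5 decision threshold are illustrative assumptions, since the source does not state how the sigmoid output is converted to a label.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def siamese_head(feat_1, feat_2, weights, bias):
    """Subtract the two feature vectors, apply one FC layer, then sigmoid."""
    diff = [a - b for a, b in zip(feat_1, feat_2)]
    logit = sum(w * d for w, d in zip(weights, diff)) + bias
    return sigmoid(logit)


def binary_cross_entropy(prob, label, eps=1e-12):
    """BCE loss for one sample; prob is clamped to avoid log(0)."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def predict_label(prob, threshold=0.5):
    """Convert a probability to 0 or 1 (threshold is an assumed value)."""
    return 1 if prob >= threshold else 0


# Toy 3-dimensional features; real ResNet50 features are much longer.
prob = siamese_head([0.9, 0.1, 0.4], [0.2, 0.1, 0.0], [2.0, 1.0, 3.0], -0.5)
loss = binary_cross_entropy(prob, label=1)
```

Because both frames pass through the same backbone, identical inputs produce a zero difference vector, so the output then depends only on the bias term.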

[0057] Model prediction: During prediction, two lens segmentation images from adjacent frames are input. If the output result is 1, it indicates that the lens has shifted relative to the iris; otherwise, if the output result is 0, it indicates that the lens has not shifted relative to the iris.

[0058] The second discrimination module calculates the magnitude of the motion vector between the lens outline centroid and the corneal outline centroid in adjacent video frames, and determines whether the lens has shifted relative to the cornea based on the magnitude of the motion vector. The specific process includes: if the lens outlines in adjacent frames are basically consistent, but the global image features of the lens do not differ significantly (the suspensory ligament connection is normal), it is necessary to further determine whether the iris and lens have shifted relative to the cornea in the adjacent video frames. This determines whether iris tremor combined with lens tremor has occurred in the two adjacent frames. Since the lens and iris move as a whole in this case, if the lens shifts relative to the cornea, the iris also shifts relative to the cornea. Therefore, in terms of image features, it is only necessary to determine whether the lens centroid has shifted relative to the corneal centroid. The criteria for determining that the lens centroid has shifted relative to the corneal centroid in adjacent frames are as follows: 1) The corneal outlines in adjacent frames are basically consistent; 2) The magnitude of the motion vector between the lens outline centroid and the corneal outline centroid in adjacent frames exceeds a preset threshold.

[0059] Only when the corneal contour similarity index between two adjacent frames is less than 0.1 is the motion vector magnitude of the lens contour centroid relative to the corneal contour centroid in those adjacent frames further calculated. For two adjacent frames, the lens contour image data and corneal contour image data obtained by the automatic segmentation model are first used to calculate the coordinates of the lens centroid and corneal centroid of one frame, and the Euclidean distance between them. The same method is then used to calculate the Euclidean distance between the lens centroid and corneal centroid of the other frame. The absolute value of the difference between these two Euclidean distances is the motion vector magnitude of the lens contour centroid relative to the corneal contour centroid. If this value is greater than a preset threshold (e.g., 10 pixels), the lens centroid has shifted relative to the corneal centroid between the adjacent frames, and it is determined that iris tremor and lens tremor have occurred in these two frames. The formula for calculating the motion vector magnitude is as follows:

$$\mathrm{dist}(a) = \sqrt{(x_a^{L} - x_a^{C})^2 + (y_a^{L} - y_a^{C})^2}, \qquad \mathrm{dist}(b) = \sqrt{(x_b^{L} - x_b^{C})^2 + (y_b^{L} - y_b^{C})^2}$$

$$\mathrm{dist}(a-b) = \left|\mathrm{dist}(a) - \mathrm{dist}(b)\right|$$

[0060] Where a and b represent two adjacent frames; $(x_a^{L}, y_a^{L})$ represents the coordinates of the lens centroid in frame a; $(x_a^{C}, y_a^{C})$ represents the coordinates of the corneal centroid in frame a; $(x_b^{L}, y_b^{L})$ represents the coordinates of the lens centroid in frame b; $(x_b^{C}, y_b^{C})$ represents the coordinates of the corneal centroid in frame b; dist(a) and dist(b) represent the Euclidean distance between the lens centroid and the corneal centroid in the two frames, respectively; dist(a-b) represents the magnitude of the motion vector of the lens centroid relative to the corneal contour centroid.
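The motion vector magnitude described above reduces to a few lines of arithmetic. The sketch below assumes centroids are given as (x, y) pixel tuples and uses the example threshold of 10 pixels; the function names are hypothetical.

```python
import math


def centroid_distance(lens_centroid, cornea_centroid):
    """Euclidean distance between the lens and corneal centroids of one frame."""
    dx = lens_centroid[0] - cornea_centroid[0]
    dy = lens_centroid[1] - cornea_centroid[1]
    return math.hypot(dx, dy)


def motion_vector_magnitude(lens_a, cornea_a, lens_b, cornea_b):
    """Absolute difference of the per-frame centroid distances: |dist(a) - dist(b)|."""
    return abs(centroid_distance(lens_a, cornea_a)
               - centroid_distance(lens_b, cornea_b))


def lens_shifted_relative_to_cornea(lens_a, cornea_a, lens_b, cornea_b,
                                    threshold_px=10.0):
    """True when the magnitude exceeds the preset threshold (10 px by default)."""
    return motion_vector_magnitude(lens_a, cornea_a, lens_b, cornea_b) > threshold_px


# Frame a: centroids 5 px apart; frame b: centroids 20 px apart.
magnitude = motion_vector_magnitude((3, 4), (0, 0), (12, 16), (0, 0))
shifted = lens_shifted_relative_to_cornea((3, 4), (0, 0), (12, 16), (0, 0))
```

Note that the measure compares scalar distances, so a lens that moves around the corneal centroid at a constant radius would yield a magnitude of zero.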

[0061] The centroid coordinates of the contour can be obtained using OpenCV's `moments` function. Assuming the contour data object is `contour`, first calculate the contour moments with `Mu = cv2.moments(contour)`, and then obtain the centroid coordinates (Px, Py) of the contour using the following Python statements: `Px = int(Mu['m10'] / Mu['m00'])`, `Py = int(Mu['m01'] / Mu['m00'])`.
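The ratios m10/m00 and m01/m00 can be illustrated without OpenCV by summing over the pixels of a small binary mask. This is a sketch of the underlying definition only: it assumes a filled region mask rather than a contour point list, so its raw moment values need not equal those OpenCV would report for the corresponding contour, and the function names are hypothetical.

```python
def raw_moments(mask):
    """Zeroth and first-order raw moments of a binary mask (list of rows)."""
    m00 = m10 = m01 = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                m00 += 1   # area (pixel count)
                m10 += x   # sum of x coordinates
                m01 += y   # sum of y coordinates
    return {"m00": m00, "m10": m10, "m01": m01}


def centroid(mask):
    """Integer centroid (Px, Py), or None when the mask is empty."""
    mu = raw_moments(mask)
    if mu["m00"] == 0:
        return None  # guard against division by zero
    return int(mu["m10"] / mu["m00"]), int(mu["m01"] / mu["m00"])


# A 3x3 block occupying x = 1..3 and y = 2..4 in a 5x5 image.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
]
center = centroid(mask)
```

The empty-mask guard matters in practice because a zero `m00` would otherwise raise a division error for degenerate contours.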

[0062] Finally, the third discrimination module is used to calculate and determine whether iris tremor or lens tremor exists in the entire dynamic video. The specific process includes:

[0063] An automatic segmentation model is used to extract images of the lens region. Then, the similarity index of the lens contour for each pair of adjacent frames is calculated. Lens segmentation images of adjacent frames with a similarity index less than 0.1 are input into the lens displacement discrimination model to determine whether the lens has shifted relative to the iris. If the model output is 1, the lens has shifted relative to the iris in this pair of adjacent frames; if the model output is 0, the lens has not shifted relative to the iris in this pair of adjacent frames. For the entire video, if the number of adjacent frame pairs showing lens shift relative to the iris exceeds a preset threshold (e.g., 10 pairs), it is determined that the corresponding eye in the video has lens tremor.

[0064] In the third discrimination module, if the similarity index of the lens contour in adjacent frames is less than 0.1, but the output of the Siamese convolutional neural network model is 0, indicating that the lens has not shifted relative to the iris, the motion vector magnitude of the lens centroid relative to the corneal centroid must be further calculated. If the corneal contour similarity index in adjacent frames is less than 0.1, and the motion vector magnitude of the lens centroid coordinates relative to the corneal centroid coordinates exceeds a preset threshold (e.g., 10 pixels), there is iris shift combined with lens shift in these adjacent frames. For the entire video, if the number of adjacent frame pairs with iris shift combined with lens shift exceeds a preset threshold (e.g., 10 pairs), it is determined that the eye in the corresponding video has iris tremor combined with lens tremor.
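The video-level decision in [0063] and [0064] can be summarized as a counting rule over adjacent-frame pairs. The sketch below assumes the per-pair quantities (contour similarity indices, discrimination model output, and motion vector magnitude) have already been computed and are supplied as dictionaries; the field names and function name are hypothetical, and the thresholds are the example values from the text.

```python
def classify_video(pairs, contour_threshold=0.1, shift_threshold_px=10.0,
                   min_pairs=10):
    """Count qualifying adjacent-frame pairs and apply the video-level rule.

    Each element of `pairs` is a dict with the keys:
      lens_similarity, cornea_similarity, lens_shifted (0 or 1),
      motion_magnitude (pixels).
    """
    lens_shift_pairs = 0
    combined_shift_pairs = 0
    for pair in pairs:
        if pair["lens_similarity"] >= contour_threshold:
            continue  # lens outlines differ too much; pair is not evaluated
        if pair["lens_shifted"] == 1:
            lens_shift_pairs += 1  # lens moved relative to the iris
        elif (pair["cornea_similarity"] < contour_threshold
              and pair["motion_magnitude"] > shift_threshold_px):
            combined_shift_pairs += 1  # lens and iris moved relative to the cornea
    return {
        "lens_tremor": lens_shift_pairs > min_pairs,
        "iris_and_lens_tremor": combined_shift_pairs > min_pairs,
    }


shifted_pair = {"lens_similarity": 0.02, "cornea_similarity": 0.03,
                "lens_shifted": 1, "motion_magnitude": 2.0}
result = classify_video([shifted_pair] * 11)
```

Using strict inequality for `min_pairs` follows the wording "exceeds a preset threshold", so exactly 10 qualifying pairs would not trigger a positive finding.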

[0065] Example 2

[0066] This invention also provides an automatic detection method for lens tremor and iris tremor of the eye, comprising the following steps:

[0067] S1. Collect dynamic videos of the subject's eyeballs moving upward, downward, nasally, and temporally.

[0068] A slit-lamp anterior segment imaging device was used to acquire dynamic videos of the subject's eyeballs rotating upwards, downwards, nasally, and temporally. Each dynamic video contained multiple video frames; specifically, this embodiment collected a total of 200 dynamic videos. Of these, 100 videos showed tremor of the lens and / or iris, and 100 videos did not. Each video frame refers to a single two-dimensional image frame within the dynamic video.

[0069] S2. Construct an automatic segmentation model and use the automatic segmentation model to segment the lens and corneal regions in adjacent video frames.

[0070] An automatic segmentation model is constructed and used to segment the lens and corneal regions in adjacent video frames. The process includes data annotation, image preprocessing, algorithm design, training, and prediction. The specific workflow is as follows:

[0071] Data annotation: The dynamic video is divided into several two-dimensional video frames. One frame is randomly selected from each video, for a total of 200 frames. The lens region and corneal region of each frame image are annotated using the labelme software.

[0072] Image preprocessing: To adapt to the input of the segmentation model, the resolution of these 200 frames of 2D images was adjusted to 1024*1024.

[0073] Automatic segmentation model design: Based on the PyTorch Detectron2 framework, the Mask R-CNN instance segmentation model is adopted, with the feature pyramid network (FPN) as the backbone architecture and the residual network ResNeXt-101 as the convolutional structure.

[0074] Training process: First, the COCO pre-trained model was loaded, and then full-layer fine-tuning training was performed on the labeled lens and corneal regions dataset. A total of two segmentation models were trained: one for lens region segmentation and the other for corneal region segmentation.

[0075] Prediction process: In the prediction stage, if the confidence score of the Mask R-CNN model output is greater than or equal to 0.75, it means that the target region (lens region or corneal region) has been detected. Then, the contour data of the target region is obtained through OpenCV's findContours. The contour data is a set of continuous coordinate points on the boundary of the target region.

[0076] S3. Calculate the similarity index of the lens and corneal contours in adjacent video frames.

[0077] Calculate the similarity index of the lens and corneal contours in adjacent video frames. The process includes: first, using an automatic segmentation model to obtain lens contour image data in two adjacent frames; then, using the similarity index calculation formula to obtain the similarity between two lens contour objects in the two adjacent frames.

[0078] The similarity index, which corresponds to the I1 method of OpenCV shape matching, is

$$I_1(A, B) = \sum_{i=1}^{7} \left| \frac{1}{m_i^A} - \frac{1}{m_i^B} \right|, \qquad m_i^A = \operatorname{sign}(h_i^A)\,\log\left|h_i^A\right|, \qquad m_i^B = \operatorname{sign}(h_i^B)\,\log\left|h_i^B\right|,$$

where $I_1(A, B)$ is the similarity index; sign returns the sign of its input value (1 if the input value is greater than 0, 0 if it is equal to 0, and -1 if it is less than 0); and $h_i^A$ and $h_i^B$ are the i-th Hu moments of contour objects A and B, respectively. The Hu moments are a set of seven invariant moments that are invariant to rotation, scaling, and translation. In practice, the similarity index of two contours can be calculated directly with OpenCV's `cv2.matchShapes` function, with the `method` parameter set to `cv2.CONTOURS_MATCH_I1` (`CV_CONTOURS_MATCH_I1` in older OpenCV versions). This function returns a non-negative similarity index (described in this embodiment as a value between 0 and 1). The smaller the index, the better the two contours match, meaning the outer peripheral edges of the lens are more similar. The similarity of corneal contours in adjacent frames is calculated using the same method.
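To make the Hu-moment comparison concrete, the following sketch computes an I1-style similarity index from two sets of precomputed Hu moments using only the Python standard library. It is an illustration under stated assumptions, not a replacement for the OpenCV call: the function names are hypothetical, the base-10 logarithm is an assumption (the text does not state the base), and the `eps` guard that skips near-zero moments is chosen for illustration.

```python
import math
from typing import Sequence


def sign(x: float) -> int:
    """Return 1 for positive input, 0 for zero, -1 for negative input."""
    return (x > 0) - (x < 0)


def similarity_index_i1(hu_a: Sequence[float],
                        hu_b: Sequence[float],
                        eps: float = 1e-5) -> float:
    """Sum |1/m_i^A - 1/m_i^B| over the paired Hu moments.

    m_i = sign(h_i) * log10(|h_i|). Terms are skipped when a moment is
    too close to zero or when its log-scaled value is zero, because the
    reciprocal is undefined there (illustrative guard, an assumption).
    """
    total = 0.0
    for h_a, h_b in zip(hu_a, hu_b):
        if abs(h_a) <= eps or abs(h_b) <= eps:
            continue
        m_a = sign(h_a) * math.log10(abs(h_a))
        m_b = sign(h_b) * math.log10(abs(h_b))
        if m_a == 0 or m_b == 0:
            continue
        total += abs(1.0 / m_a - 1.0 / m_b)
    return total
```

Identical moment sets give an index of 0, and the index grows as the shapes diverge, which is why the later steps treat a value below 0.1 as "basically the same contour".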

[0079] S4. Construct a lens displacement discrimination model to determine whether the lens has shifted relative to the iris in adjacent video frames.

[0080] First, it is necessary to determine whether the lens has shifted relative to the iris in adjacent video frames. The criteria for determining whether the lens has shifted relative to the iris in adjacent frames are as follows: 1) The lens outlines in adjacent frames are basically the same; 2) There are significant differences in the global image features of the lens in adjacent frames.

[0081] In this embodiment, the difference in the global image features of the lens between two adjacent frames is examined only when the similarity index of the lens contour between those frames is less than 0.1. The system uses a Siamese convolutional neural network to construct and train a lens displacement discrimination model, which determines the difference in the global image features of the lens between adjacent frames. This is a binary classification task, and the process includes image preprocessing, image annotation, model construction, and model prediction. The specific process is as follows:

[0082] Image preprocessing: Using an automatic segmentation model, the lens region of all video frames is segmented to obtain a lens segmentation map. Then, the resolution of the segmentation map is further adjusted to 224*224 to adapt to the input of the Siamese convolutional neural network.

[0083] Image annotation: From each dynamic video showing lens tremor, 10 pairs of adjacent-frame lens segmentation images were randomly selected. These pairs met the following criteria: contour similarity index less than 0.1 and significant differences in global image features (i.e., lens displacement relative to the iris). Each such pair was labeled 1. Then, from each dynamic video showing no lens tremor, 10 pairs of adjacent-frame lens segmentation images were randomly selected. These pairs met the following criteria: contour similarity index less than 0.1 and small differences in global image features (i.e., no lens displacement relative to the iris). Each such pair was labeled 0. A total of 2000 pairs of adjacent-frame lens segmentation images were obtained, of which 1000 pairs were positive samples labeled 1 and 1000 pairs were negative samples labeled 0.

[0084] Lens displacement discrimination model construction: Siamese convolutional neural networks, a special type of convolutional neural network, perform well in similarity measurement tasks because they process two input images simultaneously and compare them. This embodiment uses a ResNet50 backbone network to extract image features. The two images of an adjacent-frame pair from the training set are input into the backbone network to obtain two feature vectors. These two feature vectors are subtracted to obtain a new vector, which is input into a fully connected (FC) layer to obtain a scalar. Finally, the sigmoid function maps this scalar to a value between 0 and 1, which is compared with the label (0 or 1). Binary cross-entropy is used as the loss function. The structure of the Siamese convolutional network is shown in Figure 3.
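The head of the discrimination model (feature subtraction, fully connected layer, sigmoid, binary cross-entropy) can be sketched in plain Python as follows. This is a simplified illustration, not the trained model: the feature vectors are assumed to come from a ResNet50 backbone that is not shown, the function names are hypothetical, and the 0.5 decision threshold used to turn the sigmoid output into 0 or 1 is an assumption that the text does not state.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def siamese_head(feat_a: Sequence[float], feat_b: Sequence[float],
                 weights: Sequence[float], bias: float) -> float:
    """Subtract the two feature vectors, apply one FC layer, then sigmoid."""
    diff = [a - b for a, b in zip(feat_a, feat_b)]
    logit = sum(w * d for w, d in zip(weights, diff)) + bias
    return sigmoid(logit)


def binary_cross_entropy(p: float, label: int, eps: float = 1e-7) -> float:
    """BCE loss for one sample; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def predict_shift(p: float, threshold: float = 0.5) -> int:
    """Map the sigmoid output to 1 (lens shifted) or 0 (not shifted)."""
    return 1 if p >= threshold else 0
```

With identical feature vectors the difference is zero, so the output depends only on the bias term; a trained model would therefore be expected to learn a bias that places such pairs in class 0.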

[0085] Model prediction: During prediction, two lens segmentation images from adjacent frames are input. If the output result is 1, it indicates that the lens has shifted relative to the iris; otherwise, if the output result is 0, it indicates that the lens has not shifted relative to the iris.

[0086] S5. Calculate the magnitude of the motion vector of the lens outline centroid relative to the corneal outline centroid in adjacent video frames, and determine whether the lens has shifted relative to the cornea based on the magnitude of the motion vector.

[0087] The motion vector magnitude of the lens centroid relative to the corneal centroid in adjacent video frames is calculated and used to determine whether the lens has shifted relative to the cornea. Specifically, if the lens contours in adjacent frames are basically consistent but the global image features of the lens do not differ significantly (the suspensory ligament connection is normal), it is necessary to further determine whether the iris and lens have shifted relative to the cornea in the adjacent video frames, that is, whether iris tremor combined with lens tremor has occurred in the two adjacent frames. Since the lens and iris move as a whole in this case, if the lens shifts relative to the cornea, the iris also shifts relative to the cornea. Therefore, in terms of image features, it is only necessary to determine whether the lens centroid has shifted relative to the corneal centroid. The criteria for this determination are as follows: 1) the corneal contours in adjacent frames are basically consistent; 2) the motion vector magnitude of the lens centroid relative to the corneal centroid in adjacent frames exceeds a preset threshold.

[0088] The motion vector magnitude of the lens contour centroid relative to the corneal contour centroid is calculated only when the corneal contour similarity index between two adjacent frames is less than 0.1. For two adjacent frames, the lens contour data and corneal contour data obtained by the automatic segmentation model are first used to calculate the coordinates of the lens centroid and the corneal centroid in one frame, together with the Euclidean distance between them. The same method is then used to calculate the Euclidean distance between the lens centroid and the corneal centroid in the other frame. The absolute value of the difference between these two Euclidean distances is the motion vector magnitude of the lens contour centroid relative to the corneal contour centroid. If this value is greater than a preset threshold (e.g., 10 pixels), the lens centroid has shifted relative to the corneal centroid between the adjacent frames, and iris tremor combined with lens tremor is determined to have occurred in these two frames. The formula for calculating the motion vector magnitude is as follows:

$$\mathrm{dist}(ab) = \left|\mathrm{dist}(a) - \mathrm{dist}(b)\right|, \qquad \mathrm{dist}(a) = \sqrt{(x_a^{L} - x_a^{C})^2 + (y_a^{L} - y_a^{C})^2}, \qquad \mathrm{dist}(b) = \sqrt{(x_b^{L} - x_b^{C})^2 + (y_b^{L} - y_b^{C})^2}$$

[0089] Where a and b represent two adjacent frames; $(x_a^{L}, y_a^{L})$ represents the coordinates of the lens centroid in frame a; $(x_a^{C}, y_a^{C})$ represents the coordinates of the corneal centroid in frame a; $(x_b^{L}, y_b^{L})$ represents the coordinates of the lens centroid in frame b; $(x_b^{C}, y_b^{C})$ represents the coordinates of the corneal centroid in frame b; dist(a) and dist(b) represent the Euclidean distance between the lens centroid and the corneal centroid in the two frames, respectively; and dist(ab) represents the motion vector magnitude of the lens contour centroid relative to the corneal contour centroid.

[0090] The centroid coordinates of the contour can be obtained using the OpenCV moments function. Assuming the contour data object is contour, first calculate the contour moments Mu = cv2.moments(contour), and then obtain the centroid coordinates (Px, Py) of the contour using the following Python programming statements: Px = int(Mu['m10'] / Mu['m00']), Py = int(Mu['m01'] / Mu['m00']).
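The centroid and motion-magnitude calculation can be sketched without OpenCV as follows. This is an illustrative stand-in under stated assumptions: `polygon_centroid` uses the shoelace formula on a closed polygon, which plays the role of the ratios m10/m00 and m01/m00 for a contour; the function names and the example contours are hypothetical; and the centroids are kept as floats rather than truncated with `int()` as in the statements above.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def polygon_centroid(contour: Sequence[Point]) -> Point:
    """Area centroid of a closed polygon via the shoelace formula."""
    twice_area = cx = cy = 0.0
    n = len(contour)
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate contour with zero area")
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


def centroid_distance(lens: Sequence[Point], cornea: Sequence[Point]) -> float:
    """Euclidean distance between lens and corneal centroids in one frame."""
    (lx, ly), (kx, ky) = polygon_centroid(lens), polygon_centroid(cornea)
    return math.hypot(lx - kx, ly - ky)


def motion_magnitude(lens_a, cornea_a, lens_b, cornea_b) -> float:
    """dist(ab) = |dist(a) - dist(b)| for two adjacent frames a and b."""
    return abs(centroid_distance(lens_a, cornea_a)
               - centroid_distance(lens_b, cornea_b))


# Illustrative contours: a fixed cornea and a lens that moves 12 px in x.
cornea = [(0, 0), (100, 0), (100, 100), (0, 100)]
lens_a = [(40, 40), (60, 40), (60, 60), (40, 60)]
lens_b = [(52, 40), (72, 40), (72, 60), (52, 60)]
magnitude = motion_magnitude(lens_a, cornea, lens_b, cornea)
shifted = magnitude > 10.0  # example threshold of 10 pixels from the text
```

The zero-area check mirrors the need to guard against a zero m00 before dividing when the OpenCV moments are used directly.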

[0091] S6. Calculate and determine whether there is iris tremor or lens tremor in the entire dynamic video.

[0092] An automatic segmentation model is used to extract images of the lens region. Then, the similarity index of the lens contour for each pair of adjacent frames is calculated. Lens segmentation images of adjacent frames with a similarity index less than 0.1 are input into the lens displacement discrimination model to determine whether the lens has shifted relative to the iris. If the model output is 1, the lens has shifted relative to the iris in this pair of adjacent frames; if the model output is 0, it has not. For the entire video, if the number of adjacent-frame pairs showing lens shift relative to the iris exceeds a preset threshold (e.g., 10 pairs), it is determined that the corresponding eye in the video has lens tremor.

[0093] If the similarity index of the lens contour in adjacent frames is less than 0.1 but the output of the Siamese convolutional neural network model is 0, indicating that the lens has not shifted relative to the iris, the motion vector magnitude of the lens centroid relative to the corneal centroid must be further calculated. If the corneal contour similarity index in adjacent frames is less than 0.1 and the motion vector magnitude of the lens centroid relative to the corneal centroid exceeds a preset threshold (e.g., 10 pixels), there is iris shift combined with lens shift in these adjacent frames. For the entire video, if the number of adjacent-frame pairs with iris shift combined with lens shift exceeds a preset threshold (e.g., 10 pairs), it is determined that the eye in the corresponding video has iris tremor combined with lens tremor.
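The video-level decision in step S6 can be summarized by the following sketch. It is a simplified illustration rather than the system's implementation: the `PairResult` record and the function `classify_video` are hypothetical names, and all per-pair quantities are assumed to be precomputed, whereas the described system evaluates the discrimination model and the motion magnitude only when the preceding conditions are met.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class PairResult:
    """Measurements for one pair of adjacent frames (hypothetical record)."""
    lens_similarity: float    # lens contour similarity index
    cornea_similarity: float  # corneal contour similarity index
    siamese_output: int       # 1 = lens shifted relative to iris, else 0
    motion_magnitude: float   # dist(ab) in pixels


def classify_video(pairs: Iterable[PairResult],
                   sim_thresh: float = 0.1,
                   motion_thresh: float = 10.0,
                   count_thresh: int = 10) -> Dict[str, bool]:
    """Count qualifying pairs and apply the example thresholds from the text."""
    lens_shift_pairs = 0
    combined_shift_pairs = 0
    for p in pairs:
        if p.lens_similarity >= sim_thresh:
            continue  # lens contours differ too much; pair is not evaluated
        if p.siamese_output == 1:
            lens_shift_pairs += 1
        elif (p.cornea_similarity < sim_thresh
              and p.motion_magnitude > motion_thresh):
            combined_shift_pairs += 1
    return {
        "lens_tremor": lens_shift_pairs > count_thresh,
        "iris_and_lens_tremor": combined_shift_pairs > count_thresh,
    }
```

Because each pair is counted in at most one branch, a single video can still be flagged for both findings if enough pairs of each kind occur.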

[0094] The embodiments described above are merely preferred embodiments of the present invention and are not intended to limit the scope of the present invention. Various modifications and improvements made to the technical solutions of the present invention by those skilled in the art without departing from the spirit of the present invention should fall within the protection scope defined by the claims of the present invention.

Claims

1. An automatic detection system for lens tremor and iris tremor of the eye, characterized in that the system comprises an acquisition module, a segmentation module, a calculation module, a first discrimination module, a second discrimination module, and a third discrimination module; the acquisition module is used to acquire dynamic videos of the subject's eye moving upward, downward, nasally, and temporally, respectively, each dynamic video containing multiple video frames; the segmentation module is used to construct an automatic segmentation model and use the automatic segmentation model to segment the lens and corneal regions in adjacent video frames, wherein the automatic segmentation model is built on the Detectron2 framework (PyTorch), adopts the Mask R-CNN instance segmentation model, uses the feature pyramid network (FPN) as the backbone architecture, and uses the residual network ResNeXt-101 as the convolutional structure; the calculation module is used to calculate the similarity index of the lens and corneal contours in adjacent video frames; the first discrimination module is used to construct a lens displacement discrimination model to determine whether the lens has shifted relative to the iris in adjacent video frames, wherein the lens displacement discrimination model includes a ResNet50 backbone network, a fully connected layer, and a sigmoid function, and uses binary cross-entropy as the loss function; the second discrimination module is used to calculate the magnitude of the motion vector of the lens contour centroid relative to the corneal contour centroid in adjacent video frames, and to determine whether the lens has shifted relative to the cornea based on the magnitude of the motion vector; and the third discrimination module is used to calculate and determine whether there is iris tremor or lens tremor in the entire dynamic video.

2. The automatic detection system for lens and iris tremors according to claim 1, characterized in that the workflow of the segmentation module includes: using the automatic segmentation model to segment the lens region and corneal region in the video frames, and obtaining the lens contour map in two adjacent frames.

3. The automatic detection system for lens and iris tremors according to claim 2, characterized in that the workflow of the calculation module includes: obtaining the similarity between two lens contour objects in two adjacent frames using the similarity index calculation formula

$$I_1(A, B) = \sum_{i=1}^{7} \left| \frac{1}{m_i^A} - \frac{1}{m_i^B} \right|, \qquad m_i^A = \operatorname{sign}(h_i^A)\,\log\left|h_i^A\right|, \qquad m_i^B = \operatorname{sign}(h_i^B)\,\log\left|h_i^B\right|,$$

wherein $I_1(A, B)$ represents the similarity index; sign returns the sign of its input value (1 if the input value is greater than 0, 0 if the input value is equal to 0, and -1 if the input value is less than 0); and $h_i^A$ and $h_i^B$ represent the i-th Hu moment of contour object A and contour object B, respectively; the same process is used to calculate the similarity of corneal contours in adjacent frames.

4. The automatic detection system for lens and iris tremors according to claim 3, characterized in that, when the similarity index of the lens contour between two adjacent frames is less than 0.1, the differences in the global image features of the lens between these two adjacent frames are further determined; and when the corneal contour similarity index of two adjacent frames is less than 0.1, the magnitude of the motion vector of the lens contour centroid relative to the corneal contour centroid of the two adjacent frames is further calculated.

5. The automatic detection system for lens and iris tremors according to claim 3, characterized in that the workflow of the first discrimination module includes: preprocessing the lens contour maps in two adjacent frames to obtain processed data; annotating the processed data to obtain a training sample set; constructing a lens displacement discrimination model and training it using the training sample set; and using the trained lens displacement discrimination model to determine whether the lens has shifted relative to the iris.

6. The automatic detection system for lens and iris tremors according to claim 4, characterized in that the process by which the second discrimination module calculates the magnitude of the motion vector includes:

$$\mathrm{dist}(ab) = \left|\mathrm{dist}(a) - \mathrm{dist}(b)\right|, \qquad \mathrm{dist}(a) = \sqrt{(x_a^{L} - x_a^{C})^2 + (y_a^{L} - y_a^{C})^2}, \qquad \mathrm{dist}(b) = \sqrt{(x_b^{L} - x_b^{C})^2 + (y_b^{L} - y_b^{C})^2},$$

where a and b represent two adjacent frames; $(x_a^{L}, y_a^{L})$ represents the coordinates of the lens centroid in frame a; $(x_a^{C}, y_a^{C})$ represents the coordinates of the corneal centroid in frame a; $(x_b^{L}, y_b^{L})$ represents the coordinates of the lens centroid in frame b; $(x_b^{C}, y_b^{C})$ represents the coordinates of the corneal centroid in frame b; dist(a) and dist(b) represent the Euclidean distance between the lens centroid and the corneal centroid in the two frames, respectively; and dist(ab) represents the magnitude of the motion vector of the lens contour centroid relative to the corneal contour centroid.

7. An automatic detection method for lens and iris tremors, said method being applied to the system according to any one of claims 1-6, characterized in that the steps include: acquiring dynamic videos of the subject's eye moving upward, downward, nasally, and temporally, each dynamic video containing multiple video frames; constructing an automatic segmentation model and using it to segment the lens and corneal regions in adjacent video frames, wherein the automatic segmentation model is built on the Detectron2 framework (PyTorch), adopts the Mask R-CNN instance segmentation model, uses the feature pyramid network (FPN) as the backbone architecture, and uses the residual network ResNeXt-101 as the convolutional structure; calculating the similarity index of the lens and corneal contours in adjacent video frames; constructing a lens displacement discrimination model to determine whether the lens has shifted relative to the iris in adjacent video frames, wherein the discrimination model includes a ResNet50 backbone network, a fully connected layer, and a sigmoid function, and uses binary cross-entropy as the loss function; calculating the magnitude of the motion vector of the lens contour centroid relative to the corneal contour centroid in adjacent video frames, and determining whether the lens has shifted relative to the cornea based on the magnitude of the motion vector; and calculating and determining whether iris tremor or lens tremor exists in the entire dynamic video.