Eye movement tracking-based man-machine interactive system and working method thereof
An eye-tracking human-computer interaction technology, applied in the field of human-computer interaction, which addresses problems such as loss of immersion and achieves the effects of improved function visibility, comfort, and ease of use
Active Publication Date: 2018-02-02
STATE GRID INTELLIGENCE TECH CO LTD
AI-Extracted Technical Summary
Problems solved by technology
This kind of interaction greatly destroys the immersion of VR, an...
Abstract
The invention discloses an eye movement tracking-based man-machine interactive system and a working method thereof. The system comprises a processor connected with an AR/VR head display apparatus and a video collection apparatus; an eye movement tracking sensor and an angular movement sensor are arranged on the AR/VR head display apparatus and are used for capturing eye movement information in real time, collecting a current movement state of the AR/VR head display apparatus in real time, and transmitting the eye movement information and the current movement state to the processor, respectively; and the video collection apparatus is used for collecting a scene image in the sight line range of the eyes and transmitting the scene image to the processor. The interactive experience of AR/VR in engineering application fields can be improved.
Application Domain
Input/output for user-computer interaction; graph reading (+1 more)
Technology Topic
Sight line; eye movement (+7 more)
Examples
- Experimental program (1)
Example Embodiment
[0077] It should be pointed out that the following detailed descriptions are all illustrative and are intended to provide further explanations for the application. Unless otherwise indicated, all technical and scientific terms used herein have the same meaning as commonly understood by those of ordinary skill in the technical field to which this application belongs.
[0078] It should be noted that the terms used here are only for describing specific implementations and are not intended to limit the exemplary implementations according to the present application. As used herein, unless the context clearly indicates otherwise, the singular form is also intended to include the plural form. In addition, it should also be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
[0079] Eye tracking (e.g., EyeTribe-type technology): when the human eye moves, its movement trajectory can be tracked. Eye tracking is a technology that perceives subtle changes in the eye. Depending on the direction of observation, the eye produces corresponding features; by comparing these features, a set of references for eye changes can be formed, so that control functions can be driven by eye changes. This is so-called eye tracking technology. Eye tracking can covertly measure user behavior and convert the measurements into objective quantitative data.
[0080] An AR/VR head-mounted display device uses computer simulation to generate a virtual world in three-dimensional space, or superimposes virtual information onto the real world with real-time positioning, so as to achieve real-time interaction and provide the user with simulated visual, auditory, tactile and other sensory stimuli, giving the user an immersive, empathetic experience.
[0081] Three-dimensional space: the three directions X, Y and Z constitute a three-dimensional space, which is formed by the infinite extension of these three directions.
[0082] Interactive experience: computer technology is used to simulate a virtual world in three-dimensional space, providing the user with simulated visual, auditory and other sensory stimuli. Through keyboard, mouse, gamepad or steering-wheel control, or through eye tracking technology, combined with an AR/VR helmet, the user can observe objects in the three-dimensional space freely and in real time, as if present in the scene.
[0083] Fig. 1 is a schematic structural diagram of the eye-tracking-based human-computer interaction system of the present invention.
[0084] As shown in Fig. 1, the eye-tracking-based human-computer interaction system of the present invention includes:
[0085] A processor connected to the AR/VR head-mounted display device and to the video capture device, respectively. The AR/VR head-mounted display device is equipped with an eye tracking sensor and an angular motion sensor, which respectively capture eye activity information in real time and collect the current motion state of the AR/VR head-mounted display device in real time, and send them to the processor. The video capture device collects scene images within the eye's line of sight and sends them to the processor.
[0086] The angular motion sensor may be a gyroscope that measures the current motion state and angular velocity state of the device. The motion state includes one or more of forward, backward, up, down, left and right; the angular velocity state includes acceleration or deceleration.
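By way of illustration only, the following minimal Python sketch shows how such a coarse motion/angular-velocity state might be derived from gyroscope samples; the axis convention, thresholds and function names are assumptions, not the patented implementation.

```python
import numpy as np

# A minimal sketch (not the patented implementation): classify the headset's
# motion state from hypothetical gyroscope samples. Axis conventions,
# thresholds and the helper names below are illustrative assumptions.

def classify_motion(angular_velocity, prev_angular_velocity, threshold=0.05):
    """Return a coarse motion state and angular-velocity state from two gyro samples.

    angular_velocity, prev_angular_velocity: (wx, wy, wz) in rad/s.
    """
    wx, wy, wz = angular_velocity
    states = []
    if wy > threshold:
        states.append("turn left")      # yaw-positive (assumed convention)
    elif wy < -threshold:
        states.append("turn right")
    if wx > threshold:
        states.append("look up")        # pitch-positive (assumed convention)
    elif wx < -threshold:
        states.append("look down")

    # Accelerating or decelerating rotation, from the change in rotation speed.
    speed = np.linalg.norm(angular_velocity)
    prev_speed = np.linalg.norm(prev_angular_velocity)
    angular_state = "accelerating" if speed > prev_speed else "decelerating"
    return states or ["still"], angular_state

print(classify_motion((0.0, 0.12, 0.0), (0.0, 0.05, 0.0)))
```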
[0087] The processor is configured to:
[0088] construct the current eye movement model from the eye activity information and the motion state of the AR/VR head-mounted display device, match it against the pre-stored eye movement models, and accordingly drive the AR/VR head-mounted display device to perform the corresponding action and locate the visual direction;
[0089] determine the eye's sight range from the located visual direction, receive the scene image within that range and locate the eyeball, and then determine the gaze region of interest;
[0090] recognize the image within the gaze region of interest, and thereby obtain and locate the scene module detection model;
[0091] compare the eye movement parameters and the position of the scene module detection model with the corresponding preset reference ranges, respectively, to determine whether to interact and to issue the corresponding interactive operation control (a decision sketch follows this list).
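As a hedged illustration of the comparison in paragraph [0091], the sketch below checks a few eye movement parameters and the gaze-to-module distance against preset reference ranges; all field names and threshold values are hypothetical, not taken from the patent.

```python
# A minimal sketch (assumed logic, not the patented decision procedure): compare
# eye movement parameters and the detected module position against preset
# reference ranges to decide whether to trigger an interaction.

from dataclasses import dataclass

@dataclass
class EyeMovementParams:
    saccade_count: int
    fixation_count: int
    fixation_duration_ms: float
    eye_closure_ms: float = 0.0

# Hypothetical preset reference ranges.
PRESET = {
    "fixation_duration_ms": (400.0, 5000.0),   # dwell long enough to signal intent
    "max_module_distance_px": 30.0,            # gaze must land near the module
}

def should_interact(params: EyeMovementParams, gaze_xy, module_xy) -> bool:
    lo, hi = PRESET["fixation_duration_ms"]
    dwell_ok = lo <= params.fixation_duration_ms <= hi
    dx, dy = gaze_xy[0] - module_xy[0], gaze_xy[1] - module_xy[1]
    close_enough = (dx * dx + dy * dy) ** 0.5 <= PRESET["max_module_distance_px"]
    return dwell_ok and close_enough

params = EyeMovementParams(saccade_count=3, fixation_count=2, fixation_duration_ms=800.0)
print(should_interact(params, gaze_xy=(512, 300), module_xy=(520, 310)))  # True
```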
[0092] The eye movement model library pre-stored in the processor is shown in Fig. 2.
[0093] Eye activity information includes basic indicators and composite indicators. The basic indicators are activity information such as eye movements, fixation points, number of fixations and saccades; the composite indicators are information such as the scan path and fixation duration, computed from the basic indicators. An ideal scan path is a straight line towards the target, and the longer the scan path, the lower the search efficiency.
[0094] Interactive operations include general actions such as show/hide, forward, backward, move left or right, and open/close, as well as intelligent operations such as collection, scanning and analysis.
[0095] Specifically, the eye movement parameters include the number of saccades, the number of fixations, and the length of fixation.
[0096] In addition to the number of saccades, the number of fixations and the fixation duration, the eye movement parameters may also include the eye closure duration.
[0097] In a specific implementation, the processor is further configured to use the three-dimensional coordinates of the eyes, eyeballs, and the scene to locate the scene module detection model.
[0098] When the eye position (that is, the position directly in front of the eye), the eyeball position and the position of the scene module detection model lie on a single straight line, positioning of the target module is complete.
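The collinearity condition in paragraph [0098] can be tested numerically; the sketch below is one possible formulation under assumed 3D coordinates and an illustrative tolerance, not the patented positioning routine.

```python
import numpy as np

# A minimal sketch (assumed geometry): test whether the eye position, the
# eyeball position and a candidate module position are approximately collinear
# in 3D, which would complete positioning of the target module.

def is_collinear(eye, eyeball, module, tol=1e-2):
    eye, eyeball, module = map(np.asarray, (eye, eyeball, module))
    gaze_dir = eyeball - eye            # direction defined by eye and eyeball
    to_module = module - eye            # direction from eye to the module
    # The three points are collinear when the cross product is (near) zero
    # relative to the lengths of the two vectors.
    cross = np.cross(gaze_dir, to_module)
    denom = np.linalg.norm(gaze_dir) * np.linalg.norm(to_module)
    return denom > 0 and np.linalg.norm(cross) / denom < tol

print(is_collinear((0, 0, 0), (0, 0, 0.03), (0.0, 0.0, 2.0)))   # True: on the gaze ray
print(is_collinear((0, 0, 0), (0, 0, 0.03), (0.5, 0.0, 2.0)))   # False: off to the side
```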
[0099] In a specific implementation, the processor is further configured to:
[0100] determine the classification of the scene images within the eye's sight range, and sort the scene images in time order;
[0101] extract the characteristic parameters of the sorted scene images, the characteristic parameters including dwell time, motion angle, dispersion speed and eye closure frequency;
[0102] construct an evaluation system for the characteristic parameters of the scene images, and take the resulting optimal gaze point as the eye position point.
[0103] The scene images are classified as follows:
[0104] (1) Based on color characteristics: objects of the same type have similar color characteristics, so objects can be distinguished and images classified by their color features (a color-histogram sketch is given after this list).
[0105] (2) Based on image texture: the image is classified according to the grey-level spatial distribution of pixel neighborhoods and the wavelet transform of the pixels.
[0106] (3) Based on image shape: similar images are classified using a combination of regional features and boundary features.
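For method (1) above, a simple color-histogram comparison is one plausible realisation; the sketch below uses standard OpenCV calls, and the class names and file names are placeholders rather than anything specified in the patent.

```python
import cv2
import numpy as np

# A minimal sketch (assumed approach, not the patented classifier): compare a
# scene image against labelled reference images by color histogram, as one way
# of "classifying by color characteristics".

def color_histogram(img_bgr, bins=(8, 8, 8)):
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, bins, [0, 180, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def classify_by_color(query_bgr, references):
    """references: dict mapping class name -> reference BGR image."""
    query_hist = color_histogram(query_bgr).astype(np.float32)
    scores = {name: cv2.compareHist(query_hist,
                                    color_histogram(ref).astype(np.float32),
                                    cv2.HISTCMP_CORREL)
              for name, ref in references.items()}
    return max(scores, key=scores.get)

# Example usage with placeholder image files:
# scene = cv2.imread("scene.png")
# refs = {"switchgear": cv2.imread("switchgear.png"), "meter": cv2.imread("meter.png")}
# print(classify_by_color(scene, refs))
```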
[0107] Feature parameters are then extracted from the classified images as follows:
[0108] (1) Feature parameter extraction based on image color: when a person gazes at a point, the saturation of the eyeball region of the eye image differs greatly from that of the other regions. The eye image is first converted into saturation space, giving a saturation map with two peaks: the lower-saturation eyeball region and the larger remaining eye region. The maximum between-class variance method (Otsu) is then used to obtain a segmentation threshold, and the image is segmented by this threshold; the region with saturation above the threshold is separated off, so that the lower-saturation eyeball region is extracted (a thresholding sketch is given after this list).
[0109] (2) Feature parameter extraction based on image texture: by comparing images, four key features of the grey-level co-occurrence matrix are obtained: energy, inertia, entropy and correlation. Characteristic parameters such as texture coarseness and directionality are extracted by computing the energy spectrum function of the image.
[0110] (3) Feature parameter extraction based on the eye movement model: model-based methods usually estimate from the geometric relationships of the object or from its feature points. Shape features are generally represented in two ways: contour features and regional features. The contour features of the image mainly describe the outer boundary of the eyeball, while the regional features relate to the whole eye region. The boundary-feature method obtains the image's feature parameters through a model of the boundary features of the eyeball.
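The Otsu-based saturation segmentation described in step (1) can be sketched with standard OpenCV calls as follows; this is an assumed reading of the step rather than the patented extractor, and the file name is a placeholder.

```python
import cv2

# A minimal sketch (assumed pipeline): convert an eye image to the saturation
# channel and segment it with Otsu's (maximum between-class variance) method.

def segment_eyeball_by_saturation(eye_bgr):
    hsv = cv2.cvtColor(eye_bgr, cv2.COLOR_BGR2HSV)
    saturation = hsv[:, :, 1]                       # 8-bit saturation channel
    # Otsu picks the segmentation threshold automatically from the histogram.
    thresh_val, mask = cv2.threshold(saturation, 0, 255,
                                     cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Pixels below the threshold (mask == 0) form the lower-saturation region.
    lower_sat_region = cv2.bitwise_and(eye_bgr, eye_bgr,
                                       mask=cv2.bitwise_not(mask))
    return thresh_val, mask, lower_sat_region

# eye = cv2.imread("eye.png")                       # placeholder input image
# t, mask, region = segment_eyeball_by_saturation(eye)
# print("Otsu threshold:", t)
```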
[0111] The basic indicator calculations (fixation point, number of fixations, saccades) include:
[0112] Fixation point: when the eye is relatively stable for a period of time (usually 100-200 milliseconds), with an eye movement angle of less than 2 degrees and a dispersion speed below 20-80 degrees per second, this is called a fixation. The fixation point is determined by computing and analysing the extracted image feature parameters (dwell time, motion angle, dispersion speed); a classification sketch follows this list.
[0113] Number of fixations: the image algorithm analysis module records the number of times the observer fixates on each area. Ranking areas by fixation count, the more often an area is fixated, the more important it is to the observer.
[0114] Saccades: eye movements occurring between two fixations, usually lasting 20-40 milliseconds. Saccade behaviour is determined by computing and analysing the extracted image feature parameters (eye closure, duration). The image algorithm analysis module records each of the observer's saccades; the more saccades, the longer the search path.
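Using the thresholds quoted above (roughly 100-200 ms of dwell, under 2 degrees of movement, dispersion speed below 20-80 degrees per second), a fixation/saccade classifier might look like the sketch below; the sample format and the exact threshold values chosen are assumptions.

```python
# A minimal sketch (thresholds taken from the text above, everything else
# illustrative): classify gaze samples into fixations and saccades using
# dwell time, motion angle and dispersion speed.

def detect_fixations(samples, min_dwell_ms=100, max_angle_deg=2.0, max_speed_deg_s=30.0):
    """samples: list of (timestamp_ms, angle_deg_moved_since_previous_sample)."""
    fixations, saccades = [], 0
    start = None
    for i in range(1, len(samples)):
        t_prev, _ = samples[i - 1]
        t, angle = samples[i]
        dt_s = max((t - t_prev) / 1000.0, 1e-6)
        speed = angle / dt_s                         # dispersion speed, deg/s
        stable = angle < max_angle_deg and speed < max_speed_deg_s
        if stable and start is None:
            start = t_prev                           # a stable run begins
        elif not stable and start is not None:
            if t_prev - start >= min_dwell_ms:       # long enough to count as a fixation
                fixations.append((start, t_prev))
            start = None
            saccades += 1                            # movement between fixations
    if start is not None and samples[-1][0] - start >= min_dwell_ms:
        fixations.append((start, samples[-1][0]))
    return fixations, saccades

samples = [(0, 0.0), (50, 0.3), (100, 0.2), (150, 0.4), (200, 6.0), (250, 0.3), (350, 0.2)]
print(detect_fixations(samples))   # two fixations separated by one saccade
```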
[0115] The composite indicator calculations (scan path, fixation duration, regression) include:
[0116] Scan path: the path of an eye scan is a "fixation-saccade-fixation-saccade-fixation" sequence. The image algorithm analysis module computes and records each scan path synthesized from the recorded basic indicators. The ideal scan path is a straight line always directed towards the target.
[0117] Fixation duration: the length of time the eye dwells at a fixation point, determined by computing and analysing the extracted image feature parameters (dwell time). The image algorithm analysis module records the duration of each fixation; the longer a person fixates on an area, the more important that area is to the observer.
[0118] Regression (looking back): a deliberate return of the gaze, i.e. a turning point in the scan path. The image algorithm analysis module records the fixation point to which the gaze returns and counts the number of regressions (a sketch of these composite indicators follows).
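The composite indicators can be derived from the fixation sequence produced by the basic-indicator step; the sketch below computes a scan-path length and a simple regression count under an assumed definition of a turning point, purely for illustration.

```python
import math

# A minimal sketch (assumed representation, not the patented calculation):
# derive composite indicators from an ordered list of fixation centres.

def composite_indicators(fixation_points):
    """fixation_points: list of (x, y) fixation centres in scan order."""
    path_length = sum(math.dist(a, b)
                      for a, b in zip(fixation_points, fixation_points[1:]))
    # Count regressions as reversals of horizontal scan direction
    # (an illustrative definition of a "turning point" in the scan path).
    regressions = 0
    for p0, p1, p2 in zip(fixation_points, fixation_points[1:], fixation_points[2:]):
        if (p1[0] - p0[0]) * (p2[0] - p1[0]) < 0:
            regressions += 1
    return {"scan_path_length": path_length, "regressions": regressions}

print(composite_indicators([(0, 0), (100, 0), (60, 5), (160, 5)]))
```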
[0119] In a specific implementation, the processor is further configured to:
[0120] downsample the image in the gaze region of interest, and then use the ORB algorithm for feature extraction;
[0121] perform nearest-neighbour matching with the extracted ORB features, and filter the resulting matching point pairs with the RANSAC algorithm to obtain coarse matching point pairs;
[0122] use the coordinates of the extracted coarse matching points to compute the corresponding coordinates in the image of the gaze region of interest, and extract ORB features again from the image blocks in which the matching point pairs of that image lie, for precise matching;
[0123] fuse adjacent image blocks by fade-in/fade-out blending, and use the target's location features, deep features and feature maps to obtain the scene module detection model.
[0124] Bilinear interpolation may be used to downsample the image in the gaze region of interest; the ORB algorithm is then applied to all of the downsampled images for feature extraction.
[0125] The ORB feature uses the Oriented FAST feature point detection operator and the Rotated BRIEF feature descriptor. The ORB algorithm achieves a detection effect comparable to SIFT and is invariant to rotation, scaling and brightness changes; most importantly, its time complexity is greatly reduced compared with SIFT. A matching sketch follows.
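A coarse matching pass of the kind described in paragraphs [0120]-[0122] and [0124]-[0125] (bilinear downsampling, ORB extraction, nearest-neighbour matching, RANSAC filtering) can be sketched with standard OpenCV calls; the scale factor, parameter values and file names below are illustrative assumptions rather than the patented matcher, and the fine-matching and fade-in/fade-out fusion steps are not shown.

```python
import cv2
import numpy as np

# A minimal sketch (standard OpenCV calls, assumed parameters): coarse ORB
# matching between a downsampled ROI image and a reference image, with the
# matches filtered by RANSAC.

def coarse_match(img1_gray, img2_gray, scale=0.5):
    # Bilinear downsampling to reduce the cost of the first matching pass.
    small1 = cv2.resize(img1_gray, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
    small2 = cv2.resize(img2_gray, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)

    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(small1, None)
    kp2, des2 = orb.detectAndCompute(small2, None)

    # Nearest-neighbour matching on the binary ORB descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # RANSAC keeps only geometrically consistent pairs (the coarse matches).
    _, inlier_mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, 5.0)
    coarse = [m for m, keep in zip(matches, inlier_mask.ravel()) if keep]
    # Map coarse coordinates back to the full-resolution images for refinement.
    return [(kp1[m.queryIdx].pt[0] / scale, kp1[m.queryIdx].pt[1] / scale,
             kp2[m.trainIdx].pt[0] / scale, kp2[m.trainIdx].pt[1] / scale)
            for m in coarse]

# img1 = cv2.imread("roi.png", cv2.IMREAD_GRAYSCALE)       # placeholder inputs
# img2 = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
# print(coarse_match(img1, img2)[:5])
```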
[0126] The present invention improves the user's sense of immersion in the VR/AR interaction mode: the user can use the eyes to locate a particular module in the scene and thereby decide whether to interact with it.
[0127] The invention adopts VR/AR eye tracking technology, which improves the comfort and ease of use of the head-mounted display.
[0128] The present invention improves the function visibility of the eye-tracking-based human-computer interaction system, allowing users to find and use functions easily; this visibility naturally guides users to complete tasks correctly.
[0129] Fig. 3 is a flowchart of the working method of the eye-tracking human-computer interaction system of the present invention.
[0130] As shown in Fig. 3, the working method of the eye-tracking human-computer interaction system of the present invention includes:
[0131] Step 1: the eye tracking sensor and the angular motion sensor respectively capture, in real time, the eye activity information and the current motion state of the AR/VR head-mounted display device, and send them to the processor; the video capture device collects the scene image within the eye's sight range and sends it to the processor;
[0132] Step 2: the processor constructs the current eye movement model from the eye activity information and the motion state of the AR/VR head-mounted display device, matches it against the pre-stored eye movement models, and accordingly drives the AR/VR head-mounted display device to perform the corresponding action and locate the visual direction;
[0133] Step 3: the processor determines the eye's sight range from the located visual direction, receives the scene image within that range and locates the eyeball, and then determines the gaze region of interest;
[0134] Step 4: the processor recognizes the image within the gaze region of interest, and thereby obtains and locates the scene module detection model;
[0135] Step 5: the processor compares the eye movement parameters and the position of the scene module detection model with the corresponding preset reference ranges, respectively, to determine whether to interact and to issue the corresponding interactive operation control.
[0136] Specifically, the eye movement parameters include the number of saccades, the number of fixations, and the length of fixation.
[0137] In addition to the number of saccades, the number of fixations and the fixation duration, the eye movement parameters may also include the eye closure duration.
[0138] Specifically, the three-dimensional coordinates of the eyes, eyeballs, and the scene are used to locate the scene module detection model.
[0139] When the eye position (that is, the position directly in front of the eye), the eyeball position and the position of the scene module detection model lie on a single straight line, positioning of the target module is complete.
[0140] In step 3, as shown in Fig. 4, the specific process of locating the eyeball includes:
[0141] Step 3.1: determine the classification of the scene images within the eye's sight range, and sort the scene images in time order;
[0142] Step 3.2: extract the characteristic parameters of the sorted scene images, the characteristic parameters including dwell time, motion angle, dispersion speed and eye closure frequency;
[0143] Step 3.3: construct an evaluation system for the characteristic parameters of the scene images, and take the resulting optimal gaze point as the eye position point.
[0144] The scene images are classified as follows:
[0145] (1) Based on color characteristics: objects of the same type have similar color characteristics, so objects can be distinguished and images classified by their color features.
[0146] (2) Based on image texture: the image is classified according to the grey-level spatial distribution of pixel neighborhoods and the wavelet transform of the pixels.
[0147] (3) Based on image shape: similar images are classified using a combination of regional features and boundary features.
[0148] Feature parameters are then extracted from the classified images as follows:
[0149] (1) Feature parameter extraction based on image color: when a person gazes at a point, the saturation of the eyeball region of the eye image differs greatly from that of the other regions. The eye image is first converted into saturation space, giving a saturation map with two peaks: the lower-saturation eyeball region and the larger remaining eye region. The maximum between-class variance method (Otsu) is then used to obtain a segmentation threshold, and the image is segmented by this threshold; the region with saturation above the threshold is separated off, so that the lower-saturation eyeball region is extracted.
[0150] (2) Feature parameter extraction based on image texture: by comparing images, four key features of the grey-level co-occurrence matrix are obtained: energy, inertia, entropy and correlation. Characteristic parameters such as texture coarseness and directionality are extracted by computing the energy spectrum function of the image.
[0151] (3) Feature parameter extraction based on the eye movement model: model-based methods usually estimate from the geometric relationships of the object or from its feature points. Shape features are generally represented in two ways: contour features and regional features. The contour features of the image mainly describe the outer boundary of the eyeball, while the regional features relate to the whole eye region. The boundary-feature method obtains the image's feature parameters through a model of the boundary features of the eyeball.
[0152] In step 4, as shown in Fig. 5, the specific process of obtaining the scene module detection model includes:
[0153] Step 4.1: downsample the image in the gaze region of interest, and then use the ORB algorithm for feature extraction;
[0154] Step 4.2: perform nearest-neighbour matching with the extracted ORB features, and filter the resulting matching point pairs with the RANSAC algorithm to obtain coarse matching point pairs;
[0155] Step 4.3: use the coordinates of the extracted coarse matching point pairs to compute the corresponding coordinates in the image of the gaze region of interest, and extract ORB features again from the image blocks in which the matching point pairs of that image lie, for precise matching;
[0156] Step 4.4: fuse adjacent image blocks by fade-in/fade-out blending, and use the target's location features, deep features and feature maps to obtain the scene module detection model.
[0157] Bilinear interpolation may be used to downsample the image in the gaze region of interest; the ORB algorithm is then applied to all of the downsampled images for feature extraction.
[0158] The ORB feature uses the Oriented FAST feature point detection operator and the Rotated BRIEF feature descriptor. The ORB algorithm achieves a detection effect comparable to SIFT and is invariant to rotation, scaling and brightness changes; most importantly, its time complexity is greatly reduced compared with SIFT.
[0159] The present invention improves the user's sense of immersion in the VR/AR interaction mode: the user can use the eyes to locate a particular module in the scene and thereby decide whether to interact with it.
[0160] The invention adopts VR/AR eye tracking technology, which improves the comfort and ease of use of the head-mounted display.
[0161] The present invention improves the function visibility of the eye-tracking-based human-computer interaction system, allowing users to find and use functions easily; this visibility naturally guides users to complete tasks correctly.
[0162] Although the specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the scope of protection of the present invention. Those skilled in the art should understand that, on the basis of the technical solutions of the present invention, various modifications or variations that can be made without creative effort still fall within the protection scope of the present invention.