Real object recognition method based on AI big data and intelligent wearable device
By integrating image sensors and eye-tracking sensors into wearable devices, and combining user eye feature data and AI big data, efficient and accurate identification of objects that users are interested in is achieved, solving the problem of low identification efficiency in existing technologies.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHENZHEN ZHIMEIDE TECH CO LTD
- Filing Date
- 2025-07-30
- Publication Date
- 2026-06-26
AI Technical Summary
Existing object recognition methods struggle to accurately identify the target object of interest to users in wearable devices, resulting in low recognition efficiency and insufficient accuracy, requiring users to manually filter the recognition results.
By combining image sensors and eye-tracking sensors, the system collects user eye feature data and environmental images, uses eye movement trajectory and blink frequency to determine target object candidate areas, and performs feature matching with preset AI big data to achieve object recognition.
It improves the accuracy and efficiency of object recognition, reduces interference from irrelevant objects in complex backgrounds, and eliminates the need for users to manually filter recognition results.
Figure CN120954079B_ABST
Description
Technical Field
[0001] This invention relates to the field of computer technology, and in particular to a physical object recognition method based on AI big data and a smart wearable device.
Background Technology
[0002] In wearable device applications, users' need to identify unknown objects in their surroundings is growing. For example, when wearing smart glasses, users want to quickly obtain specific information about unfamiliar buildings, plants, or objects in front of them. In this case, object recognition methods become the core technology for achieving this function.
[0003] Existing object recognition methods for wearable devices primarily rely on machine vision technology. These methods acquire images of the surrounding environment through image acquisition devices mounted on the wearable device, and then extract and recognize features from all objects within the image. However, when multiple objects exist in the surrounding environment, machine vision technology struggles to accurately identify the object the user is truly interested in. It tends to output the recognition results for all objects in the image simultaneously, forcing the user to manually sift through a large number of results. This not only increases the user's workload but also reduces the efficiency and accuracy of object recognition.
Summary of the Invention
[0004] This invention provides a method for physical object recognition based on AI big data and a smart wearable device to improve the efficiency and accuracy of physical object recognition.
[0005] In a first aspect, the present invention provides a physical object recognition method based on AI big data, applied to smart wearable devices; the smart wearable device includes an image sensor and a gaze tracking sensor; the method includes:
[0006] The image sensor acquires continuous environmental images of the user's surroundings, and the eye-tracking sensor captures the user's eye feature data; the eye feature data includes the user's eye rotation angle, pupil center position, and blink frequency;
[0007] Edge-contour detection is performed on the continuous environmental images to obtain initial candidate regions for physical objects, and spatial-position correlation analysis is performed on the eye rotation angle and the pupil center position to determine the three-dimensional spatial coordinates of the user's gaze point in the environment at each instant.
[0008] Based on the imaging parameters of the image sensor and the pixel coordinates of each initial candidate object region, a spatial transformation is performed to obtain a three-dimensional spatial region. Then, based on the three-dimensional spatial coordinates of the user's gaze point at each instant, a trajectory is fitted in the time dimension to obtain the user's gaze movement trajectory.
[0009] Based on the blink frequency and the line of sight movement trajectory, the three-dimensional spatial region of each initial object candidate region is matched to determine the target object candidate region, and the object feature is extracted from the target object candidate region to obtain the first object feature of the object to be identified.
[0010] Based on the first physical object feature and the second physical object feature of the preset AI big data, feature matching is performed to obtain the physical object recognition result of the object to be identified.
[0011] In a second aspect, the present invention also provides a smart wearable device for use in the object recognition method based on AI big data as described in the first aspect; the smart wearable device integrates a data acquisition unit and an object recognition unit, the data acquisition unit including an image sensor and a gaze tracking sensor;
[0012] The data acquisition unit is used to acquire continuous environmental images of the user's surrounding environment based on the image sensor, and to capture the user's eye feature data based on the gaze tracking sensor; the eye feature data includes the user's eyeball rotation angle, pupil center position, and blink frequency;
[0013] The object recognition unit is used for:
[0014] Edge-contour detection is performed on the continuous environmental images to obtain initial candidate regions for physical objects, and spatial-position correlation analysis is performed on the eye rotation angle and the pupil center position to determine the three-dimensional spatial coordinates of the user's gaze point in the environment at each instant.
[0015] Based on the imaging parameters of the image sensor and the pixel coordinates of each initial candidate object region, a spatial transformation is performed to obtain a three-dimensional spatial region. Then, based on the three-dimensional spatial coordinates of the user's gaze point at each instant, a trajectory is fitted in the time dimension to obtain the user's gaze movement trajectory.
[0016] Based on the blink frequency and the line of sight movement trajectory, the three-dimensional spatial region of each initial object candidate region is matched to determine the target object candidate region, and the object feature is extracted from the target object candidate region to obtain the first object feature of the object to be identified.
[0017] Based on the first physical object feature and the second physical object feature of the preset AI big data, feature matching is performed to obtain the physical object recognition result of the object to be identified.
[0018] In a third aspect, the present invention also provides an electronic device, comprising: a memory for storing computer software programs; and a processor for reading and executing the computer software programs, thereby implementing the object recognition method based on AI big data as described above.
[0019] In a fourth aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer software program which, when executed by a processor, implements the object recognition method based on AI big data as described above.
[0020] In a fifth aspect, the present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the object recognition method based on AI big data as described above.
[0021] The object recognition method based on AI big data provided in this invention determines the user's gaze trajectory from the rotation angle of the user's eyeballs and the center position of the pupil. This gaze trajectory is then used to select the target object candidate region from all initial object candidate regions identified in the environmental images. The object features within the target candidate region are then matched with the object features in the preset AI big data to obtain the object recognition result. Therefore, during object recognition, the gaze trajectory focuses on the area of the object the user is truly interested in, reducing interference from irrelevant objects in complex backgrounds and from irrelevant recognition results, thus improving the accuracy of object recognition. Because the object of interest is identified from the user's blink frequency and gaze trajectory alone, manual filtering of results is unnecessary, which improves the efficiency of object recognition.
Attached Figure Description
[0022] Figure 1 is a flowchart of the object recognition method based on AI big data provided in an embodiment of the present invention;
[0023] Figure 2 is a schematic structural diagram of the smart wearable device provided in an embodiment of the present invention;
[0024] Figure 3 is a schematic diagram of an embodiment of the electronic device provided by the present invention;
[0025] Figure 4 is a schematic diagram of an embodiment of the computer-readable storage medium provided by the present invention.
Detailed Implementation
[0026] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0027] In the description of this invention, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of the stated features. In the description of this invention, "a plurality of" means two or more, unless otherwise explicitly specified.
[0028] In the description of this invention, the term "for example" is used to mean "used as an example, illustration, or description." Any embodiment described as "for example" in this invention is not necessarily to be construed as being more preferred or advantageous than other embodiments. The following description is provided to enable any person skilled in the art to make and use the invention. Details are set forth in the following description for purposes of explanation. It should be understood that those skilled in the art will recognize that the invention can be made without using these specific details. In other instances, well-known structures and processes will not be described in detail to avoid obscuring the description of the invention with unnecessary detail. Therefore, the invention is not intended to be limited to the embodiments shown, but is consistent with the broadest scope of the principles and features disclosed herein.
[0029] Optionally, refer to Figure 1, which is a flowchart of the object recognition method based on AI big data provided by the present invention. In this embodiment, the executing entity of the method is a smart wearable device, and the method includes:
[0030] Step 10: Acquire continuous environmental images of the user's surroundings using an image sensor, and capture the user's eye feature data using a gaze-tracking sensor. The eye feature data includes the user's eye rotation angle, pupil center position, and blink frequency.
[0031] Optionally, the smart wearable device in this embodiment of the invention is a children's watch. The children's watch is equipped with an image sensor and a gaze tracking sensor. It continuously collects continuous environmental images of the user's (child's) surroundings through its own image sensor (camera). When collecting images, the image sensor automatically adjusts the exposure parameters according to the ambient light intensity to ensure that the acquired environmental images are clear and distinguishable.
[0032] Furthermore, the device utilizes its built-in gaze-tracking sensors to capture the user's eye feature data. This eye feature data specifically includes the user's eye rotation angle, pupil center position, and blink frequency.
[0033] The eye-tracking sensor illuminates the user's eyes with an infrared light source and then captures the reflected light from the eyes using an image acquisition element, thereby accurately extracting eye feature data.
[0034] In one embodiment, when a child is playing outdoors wearing a children's smartwatch and sees an unfamiliar bird, they want to identify it using the smartwatch. At this time, the smartwatch's image sensor (camera) starts working, continuously capturing images of the bird and its surrounding environment at a frequency of 30 frames per second, forming a continuous sequence of environmental images containing elements such as the bird, trees, ground, and sky. Simultaneously, the gaze-tracking sensor activates, illuminating the child's eyes with infrared light. The sensor captures eye feature data such as the child's eyeball turning 30 degrees to the left, the pupil center being located in the left third of the eye image, and blinking 5 times within 10 seconds (i.e., a blinking frequency of 0.5 times per second), and transmits this data in real time to the smartwatch's processor.
[0035] Step 20: Detect continuous environmental images based on edge contours to obtain initial candidate regions for physical objects, and perform correlation analysis on rotation angle and pupil center position based on spatial location to determine the three-dimensional spatial coordinates of the user's gaze point in the environment at each instant.
[0036] Furthermore, the children's watch processes the collected continuous environmental images and uses edge detection algorithms (such as the Canny edge detection algorithm) to detect the edge contours of objects in the images. Regions with complete edge contours are marked as initial object candidate regions. For example, the outline regions of birds and trees in the image may become initial object candidate regions.
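The idea of turning edge contours into candidate regions can be illustrated with a minimal sketch. It does not implement the Canny algorithm named above; as a simplification it thresholds a central-difference gradient magnitude and groups connected edge pixels into bounding boxes. The function names `edge_map` and `candidate_boxes`, the threshold value, and the sample image are all hypothetical.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def edge_map(image: List[List[float]], threshold: float) -> List[List[bool]]:
    """Mark interior pixels whose central-difference gradient magnitude reaches the threshold."""
    height, width = len(image), len(image[0])
    edges = [[False] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            edges[y][x] = (gx * gx + gy * gy) ** 0.5 >= threshold
    return edges


def candidate_boxes(edges: List[List[bool]]) -> List[Box]:
    """Group 8-connected edge pixels and return one bounding box per group."""
    height, width = len(edges), len(edges[0])
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y0 in range(height):
        for x0 in range(width):
            if not edges[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            x_min = x_max = x0
            y_min = y_max = y0
            while queue:  # breadth-first flood fill over edge pixels
                x, y = queue.popleft()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and edges[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes


# Hypothetical 8x8 grayscale image containing one bright 3x3 object.
sample = [[0.0] * 8 for _ in range(8)]
for row in range(2, 5):
    for col in range(2, 5):
        sample[row][col] = 255.0
sample_boxes = candidate_boxes(edge_map(sample, threshold=100.0))
```

A production pipeline would more likely use a library Canny implementation with contour extraction; the sketch only shows how closed edge groups map to rectangular candidate regions.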
[0037] Furthermore, the children's watch combines the user's eye rotation angle and pupil center position, and through spatial position correlation analysis, determines the three-dimensional spatial coordinates of the user's gaze point in the environment at each instant. Specifically, the eye rotation angle is used to determine the horizontal and vertical deflection angles of the gaze, and combined with the coordinates of the pupil center position in the image, the three-dimensional spatial coordinates of the gaze point are obtained through geometric calculation, as detailed in steps 201 to 204.
[0038] In one embodiment, the children's smartwatch performs edge detection on the captured continuous environmental images containing the bird, identifying the outlines of the bird, trees, and ground, etc., and these outline regions are marked as initial object candidate regions. Simultaneously, the child's eyeballs rotate 25 degrees to the upper left, with the pupil center located in the upper left corner of the eye image. The smartwatch's processor correlates this data with the spatial information of the environmental image to calculate the three-dimensional spatial coordinates (assumed to be X = 5 meters, Y = 3 meters, Z = 1.5 meters, with the smartwatch's location as the origin) of the child's current gaze point on the bird's location.
[0039] Step 30: Based on the imaging parameters of the image sensor and the pixel coordinates of each initial candidate object region, perform spatial transformation to obtain a three-dimensional spatial region, and perform trajectory fitting in the time dimension based on the three-dimensional spatial coordinates of the user's gaze pointing point at each instant to obtain the user's gaze movement trajectory.
[0040] Furthermore, the children's watch converts the pixel coordinates of each initial candidate object region into a three-dimensional spatial region using a spatial transformation algorithm based on the imaging parameters of the image sensor (such as focal length, image distance, sensor size, etc.). This three-dimensional spatial region represents the spatial extent of the initial candidate object region in the real environment. Optionally, the spatial transformation algorithm in this embodiment of the invention can be a perspective transformation algorithm.
[0041] Furthermore, the children's watch fits the three-dimensional spatial coordinates of the user's gaze at each instant in the time dimension to obtain the user's gaze movement trajectory, which reflects the user's gaze movement path over a period of time, as described in steps 301 to 304.
[0042] According to the above embodiment, the image sensor of the children's watch has a focal length of 5 mm, an image distance of 10 mm, and a sensor size of 1 / 2.3 inch. For the initial candidate area of the bird, its pixel coordinates in the image range from (100, 80) to (200, 150). Through perspective transformation algorithms, the corresponding three-dimensional spatial region is calculated to be X between 4.8 meters and 5.2 meters, Y between 2.8 meters and 3.2 meters, and Z between 1.4 meters and 1.6 meters. Simultaneously, as the child observes the bird, their gaze changes with the bird's slight movements. The children's watch records the three-dimensional spatial coordinates of the gaze points at 10 instants, and through trajectory fitting, obtains a smooth gaze trajectory around the three-dimensional spatial region where the bird is located.
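The trajectory-fitting part of this step can be sketched as follows. The sketch assumes a per-axis least-squares straight-line fit over time, which is only one possible fitting model; the function names `fit_line` and `fit_trajectory` and the sample gaze points are hypothetical and are not taken from the patent.

```python
from typing import Callable, List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def fit_line(times: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of values ~ intercept + slope * t; returns (intercept, slope)."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two samples")
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    spread = sum((t - t_mean) ** 2 for t in times)
    if spread == 0:
        raise ValueError("timestamps must not all be equal")
    slope = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values)) / spread
    return v_mean - slope * t_mean, slope


def fit_trajectory(times: Sequence[float],
                   points: Sequence[Point3]) -> Callable[[float], Point3]:
    """Fit each coordinate axis independently and return a function t -> (X, Y, Z)."""
    coeffs: List[Tuple[float, float]] = [
        fit_line(times, [p[axis] for p in points]) for axis in range(3)
    ]

    def trajectory(t: float) -> Point3:
        x, y, z = (intercept + slope * t for intercept, slope in coeffs)
        return (x, y, z)

    return trajectory


# Hypothetical samples: 10 gaze points drifting slowly near the bird's region.
times = [0.1 * i for i in range(10)]
points = [(5.0 + 0.02 * t, 3.0, 1.5 - 0.01 * t) for t in times]
trajectory = fit_trajectory(times, points)
```

Calling `trajectory(t)` then gives a smoothed gaze position for any time `t` in the observation window; a higher-order or spline model could be substituted if the gaze path is strongly curved.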
[0043] Step 40: Based on blink frequency and gaze movement trajectory, the three-dimensional spatial region of each initial object candidate region is matched to determine the target object candidate region, and the object feature is extracted from the target object candidate region to obtain the first object feature of the object to be identified.
[0044] Furthermore, the children's watch matches blink frequency and gaze movement trajectory with the three-dimensional spatial region of each initial object candidate area. Typically, users will have a relatively stable gaze fixation on objects of interest (gaze movement trajectory within the three-dimensional spatial region of the object), and their blink frequency will be relatively low. This filters out the target object candidate areas that the user is interested in, as detailed in steps 301 to 304.
[0045] Furthermore, the children's watch uses a feature extraction algorithm to extract the physical features of the target physical candidate region, and obtains the first physical feature of the physical object to be identified. The first physical feature includes shape features, color features, texture features, etc. The feature extraction algorithm in this embodiment of the invention is the SIFT feature extraction algorithm.
[0046] In one embodiment, when a child observes a bird, their blinking frequency is low (0.3 times per second), and their gaze movement trajectory remains mostly within the bird's corresponding three-dimensional spatial region (X between 4.8 and 5.2 meters, Y between 2.8 and 3.2 meters, and Z between 1.4 and 1.6 meters). This data is matched with the three-dimensional spatial regions of each initial candidate object region to determine the initial candidate object region containing the bird as the target candidate object region. Feature extraction is performed on the target candidate object region to obtain the bird's first physical features, such as brown and white feathers, a body size of approximately 20 centimeters, and the presence of a sharp beak and claws.
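The matching logic of this step can be illustrated with a minimal sketch. It assumes that "stable fixation" is measured as the fraction of gaze points falling inside a region's three-dimensional box and that "low blink frequency" is a simple upper bound; both thresholds (`max_blink_hz`, `min_dwell`), the function names, and the tree region are hypothetical. Only the bird's box and the 0.3 blinks-per-second value come from the embodiment.

```python
from typing import Dict, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]
Box3 = Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]]


def inside(point: Point3, box: Box3) -> bool:
    """True when every coordinate lies within the box's (low, high) range for that axis."""
    return all(lo <= c <= hi for c, (lo, hi) in zip(point, box))


def select_target(regions: Dict[str, Box3],
                  gaze_points: Sequence[Point3],
                  blink_hz: float,
                  max_blink_hz: float = 0.4,   # assumed threshold
                  min_dwell: float = 0.6) -> Optional[str]:  # assumed threshold
    """Return the region holding the largest share of gaze points, or None."""
    if not gaze_points or blink_hz > max_blink_hz:
        return None
    best_name, best_ratio = None, 0.0
    for name, box in regions.items():
        ratio = sum(inside(p, box) for p in gaze_points) / len(gaze_points)
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    return best_name if best_ratio >= min_dwell else None


regions = {
    "bird": ((4.8, 5.2), (2.8, 3.2), (1.4, 1.6)),  # box from the embodiment
    "tree": ((6.0, 7.0), (2.0, 4.0), (0.0, 3.0)),  # hypothetical second region
}
# Hypothetical gaze samples: 8 of 10 points rest on the bird.
gaze_points = [(5.0, 3.0, 1.5)] * 8 + [(6.5, 3.0, 1.0)] * 2
target = select_target(regions, gaze_points, blink_hz=0.3)
```

With these sample values the bird region collects 80% of the gaze points and is selected; raising the blink frequency above the assumed bound makes the function return no target.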
[0047] Step 50: Based on the first physical object feature and the second physical object feature of the preset AI big data, feature matching is performed to obtain the physical object recognition result of the object to be identified.
[0048] Furthermore, the children's smartwatch performs feature matching between the first physical feature of the object to be identified and the second physical feature stored in a pre-set AI big data database. This pre-set AI big data database contains a large number of second physical features of different objects, which have been categorized and labeled. During feature matching, the similarity between the first and second physical features (such as cosine similarity, Euclidean distance, etc.) is calculated. When the similarity exceeds a preset threshold, the two are considered to be successfully matched, thus obtaining the object identification result, which is then fed back to the user in text or voice format.
[0049] Continuing with the above embodiment, the children's smartwatch matches the extracted first physical characteristics of the bird (brown and white feathers, approximately 20 cm in size, sharp beak and claws, etc.) with the second physical characteristics of various birds stored in a preset AI big data database. The big data database includes the second physical characteristics of sparrows: brown and white feathers, approximately 15-20 cm in size, sharp beak and claws, etc. By calculating the similarity, it was found that the similarity reached 90%, exceeding the preset 80% threshold, therefore the object to be identified was determined to be a sparrow. The children's smartwatch then provides voice feedback to the child: "This item is a sparrow, a common small bird."
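The threshold-based matching described above can be sketched as follows, using cosine similarity (one of the measures the description names). The numeric feature vectors, the database contents, and the function names `cosine_similarity` and `match_feature` are hypothetical; only the 80% threshold comes from the embodiment.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_feature(first_feature: Sequence[float],
                  database: Dict[str, Sequence[float]],
                  threshold: float = 0.8) -> Tuple[Optional[str], float]:
    """Return (label, score) of the best match, or (None, score) if below the threshold."""
    best_label, best_score = None, 0.0
    for label, second_feature in database.items():
        score = cosine_similarity(first_feature, second_feature)
        if score > best_score:
            best_label, best_score = label, score
    if best_score > threshold:
        return best_label, best_score
    return None, best_score


# Hypothetical 3-dimensional features standing in for color, size, and shape descriptors.
database = {
    "sparrow": [0.9, 0.8, 0.2],
    "pigeon": [0.2, 0.3, 0.9],
}
label, score = match_feature([0.85, 0.75, 0.25], database)
```

Real feature vectors (for example, aggregated SIFT descriptors) would have far more dimensions, but the comparison against a preset threshold works the same way.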
[0050] This invention determines the user's gaze trajectory by observing the rotation angle of the user's eyeballs and the center position of the pupil. Based on this gaze trajectory, it filters out the target object candidate region from all initial object candidate regions identified from the environmental image. Then, it performs feature matching between the object's features in the target object candidate region and the object features in the preset AI big data to obtain the object recognition result. Therefore, during object recognition, the gaze trajectory focuses on the area of the object the user is truly interested in, reducing interference from other irrelevant objects in complex backgrounds and avoiding interference from irrelevant object recognition results, thus improving the accuracy of object recognition. Since the object recognition result only requires the user's blink frequency and gaze trajectory to identify the object of interest, manual filtering is unnecessary, improving the efficiency of object recognition.
[0051] In one embodiment, steps 201 to 204 include:
[0052] Step 201: Convert the horizontal and vertical rotation angles of the eyeball into a unit vector of the gaze direction in the head coordinate system. The head coordinate system takes the midpoint of the line connecting the centers of the pupils of both eyes as its origin and the direction pointing to the center of the right pupil as the positive direction of the horizontal axis. The positive direction of the longitudinal axis lies in the plane that passes through the origin and is perpendicular to the horizontal axis, and coincides with the straight-ahead viewing direction. The positive direction of the vertical axis is determined by the cross product of the positive directions of the horizontal and longitudinal axes.
[0053] Optionally, the children's watch constructs the head coordinate system as follows: the midpoint of the line connecting the centers of the pupils of both eyes is the origin; the direction pointing to the center of the right pupil is the positive direction of the horizontal axis; the positive direction of the longitudinal axis lies in the plane that passes through the origin and is perpendicular to the horizontal axis, and coincides with the straight-ahead viewing direction; and the positive direction of the vertical axis is determined by the cross product of the positive directions of the horizontal and longitudinal axes.
[0054] Furthermore, the children's watch converts the horizontal and vertical rotation angles of the eyeball into a unit vector of the gaze direction in the head coordinate system. The horizontal rotation angle is the angle by which the eyeball rotates around the vertical axis of the head coordinate system, and the vertical rotation angle is the angle by which the eyeball rotates around the horizontal axis of the head coordinate system. By combining these two angles with trigonometric function calculations, the components of the gaze direction unit vector on the three axes of the head coordinate system can be obtained.
[0055] Continuing with the scenario of the child observing a bird, the children's watch detects a horizontal eye rotation of 25 degrees to the left (i.e., a rotation of -25 degrees around the vertical axis, with left as the negative direction) and a vertical eye rotation of 10 degrees upward (i.e., a rotation of 10 degrees around the horizontal axis, with upward as the positive direction). In the head coordinate system, the origin is the midpoint of the line connecting the centers of the child's pupils, the horizontal axis points to the center of the right pupil, the longitudinal axis points in the straight-ahead viewing direction, and the vertical axis is determined by the cross product. Based on the horizontal and vertical rotation angles, a unit vector representing the gaze direction is calculated. Let the horizontal rotation angle be α = -25° and the vertical rotation angle be β = 10°. Then the components of the gaze unit vector in the head coordinate system are: the horizontal axis component is sinα×cosβ ≈ -0.4226×0.9848 ≈ -0.416, the longitudinal axis component is cosα×cosβ ≈ 0.9063×0.9848 ≈ 0.892, and the vertical axis component is sinβ ≈ 0.1736. That is, the gaze unit vector is (-0.416, 0.892, 0.1736).
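The trigonometric conversion in this example can be checked with a short sketch. The function name `gaze_unit_vector` is hypothetical; the component order (toward the right pupil, straight ahead, upward) and the formulas follow the worked example above.

```python
import math
from typing import Tuple


def gaze_unit_vector(alpha_deg: float, beta_deg: float) -> Tuple[float, float, float]:
    """Convert horizontal (alpha) and vertical (beta) eye rotation angles, in degrees,
    into a unit gaze vector ordered as (right, straight ahead, up)."""
    alpha = math.radians(alpha_deg)
    beta = math.radians(beta_deg)
    return (
        math.sin(alpha) * math.cos(beta),  # component toward the right pupil
        math.cos(alpha) * math.cos(beta),  # component along the straight-ahead direction
        math.sin(beta),                    # upward component
    )


# Values from the embodiment: 25 degrees to the left, 10 degrees upward.
direction = gaze_unit_vector(-25.0, 10.0)
length = math.sqrt(sum(c * c for c in direction))
```

The result agrees with the components stated in the example to three decimal places, and `length` evaluates to 1 up to floating-point rounding, confirming that the vector is normalized.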
[0056] Step 202: Based on the intrinsic parameter matrix of the image sensor, the two-dimensional image coordinates of the pupil center position are converted into three-dimensional coordinates of the pupil center in the head coordinate system.
[0057] Furthermore, the children's watch acquires the intrinsic parameter matrix of the image sensor, which includes parameters such as focal length and principal point coordinates, and is used to describe the process by which the image sensor projects three-dimensional spatial points onto a two-dimensional image plane.
[0058] Furthermore, the children's watch uses this intrinsic parameter matrix to transform the two-dimensional image coordinates of the pupil center position to obtain the three-dimensional coordinates of the pupil center in the head coordinate system. Specifically, through the inverse operation of the intrinsic parameter matrix, combined with the coordinates of the pupil center on the two-dimensional image, its three axis coordinates in the head coordinate system are calculated.
[0059] In one embodiment, the image sensor intrinsic parameter matrix is:
[0060] $$\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$
[0061] Here, $f_x = 500$ pixels and $f_y = 500$ pixels (the focal length converted to pixels), and $c_x = 320$ pixels and $c_y = 240$ pixels (the principal point coordinates, i.e., the image center coordinates). The detected two-dimensional image coordinates of the center of the child's pupil are (280 pixels, 220 pixels). According to the intrinsic parameter matrix transformation formula, let the three-dimensional coordinates of the pupil center be $(X_p, Y_p, Z_p)$ and the two-dimensional image coordinates be $(u, v)$; the relationship is $u = f_x \cdot (X_p / Z_p) + c_x$ and $v = f_y \cdot (Y_p / Z_p) + c_y$. Assuming the distance from the pupil to the image sensor is $Z_p = 0.1$ meters (i.e., 10 centimeters), then $X_p = (u - c_x) \cdot Z_p / f_x = -0.008$ meters and $Y_p = (v - c_y) \cdot Z_p / f_y = -0.004$ meters, so the three-dimensional coordinates of the pupil center are (-0.008 meters, -0.004 meters, 0.1 meters).
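The back-projection arithmetic in this example can be reproduced with a short sketch. The function name `back_project` is hypothetical; the formula and the numeric values (focal lengths, principal point, pixel coordinates, assumed distance of 0.1 meters) are those of the embodiment.

```python
from typing import Tuple


def back_project(u: float, v: float, z: float,
                 fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Invert the pinhole projection u = fx * X / Z + cx, v = fy * Y / Z + cy
    for a pixel (u, v) whose distance Z along the optical axis is known."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)


# Pupil center at pixel (280, 220), assumed 0.1 m from the sensor.
pupil_center = back_project(280.0, 220.0, 0.1, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Note that the depth must be supplied separately: a single pixel fixes only a viewing ray, so the distance of 0.1 meters is an input to the calculation and not a result of it.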
[0062] Step 203: Determine the initial gaze direction in the head coordinate system based on the unit vector of the gaze direction and the three-dimensional coordinates of the pupil center.
[0063] Furthermore, after obtaining the unit vector of the gaze direction and the three-dimensional coordinates of the pupil center, the children's watch determines the initial gaze direction in the head coordinate system based on these two data points. The initial gaze direction can be represented as a ray starting from the three-dimensional coordinates of the pupil center and following the direction indicated by the unit vector of the gaze direction. That is, the initial gaze direction is a ray with the three-dimensional coordinates of the pupil center as its starting point and the unit vector of the gaze direction as its direction vector.
[0064] Continuing with the scenario of a child observing a bird, we obtain the unit vector of the gaze direction as (-0.416, 0.892, 0.1736), and the three-dimensional coordinates of the pupil center as (-0.008 m, -0.004 m, 0.1 m). Therefore, in the head coordinate system, the initial gaze direction is the ray extending from the point (-0.008 m, -0.004 m, 0.1 m) along the vector (-0.416, 0.892, 0.1736). This ray represents the initial direction of the child's gaze in the head coordinate system.
[0065] Step 204: Analyze the gaze point based on the initial gaze direction to determine the three-dimensional spatial coordinates of the user's gaze point in the environment at each instant.
[0066] Furthermore, the children's watch analyzes the gaze point based on the initial gaze direction to determine the three-dimensional spatial coordinates of the user's gaze point in the environment at each instant, as described in steps 2041 to 2043.
[0067] This invention combines eye movement parameters, image sensor parameters, head posture data, and environmental depth information to achieve a precise mapping from eye features to three-dimensional coordinates in the environment. Therefore, it can accurately obtain the three-dimensional spatial coordinates of the user's gaze point in the environment, providing a precise spatial positioning foundation for subsequent applications such as gaze-based object recognition. This ensures that the objects that children are interested in can be accurately captured, improving the accuracy and efficiency of object recognition.
[0068] In one embodiment, steps 2041 to 2043 include:
[0069] Step 2041: Transform the initial gaze direction based on the first coordinate system transformation relationship between the head coordinate system and the image sensor coordinate system to obtain the target gaze direction in the image sensor coordinate system. The image sensor coordinate system takes the optical center of the image sensor as its origin; the direction pointing horizontally to the right along the imaging plane is the positive direction of the horizontal axis, and the direction pointing vertically upward along the imaging plane is the positive direction of the vertical axis. The positive direction of the third axis is determined by the cross product of the positive directions of the horizontal and vertical axes.
[0070] Optionally, the children's watch constructs the image sensor coordinate system as follows: the optical center of the image sensor is the origin; the direction pointing horizontally to the right along the imaging plane is the positive direction of the horizontal axis; the direction pointing vertically upward along the imaging plane is the positive direction of the vertical axis; and the positive direction of the third axis is determined by the cross product of the positive directions of the horizontal and vertical axes.
[0071] Furthermore, the children's watch defines the head coordinate system and the image sensor coordinate system, as well as the first coordinate system transformation relationship between them. The first coordinate system transformation relationship is a transformation matrix determined based on the position vector between the origins of the two coordinate systems and the rotation relationship between them. Using this transformation matrix, the children's watch converts the initial gaze direction (represented by the starting coordinates and direction vector) in the head coordinate system to the target gaze direction in the image sensor coordinate system, thus obtaining the starting coordinates and direction vector in the image sensor coordinate system. Optionally, in this embodiment of the invention, the transformation matrix $T_1$ from the head coordinate system to the image sensor coordinate system transforms a point $P_h$ and its direction vector in the head coordinate system into the point $P_i$ and its direction vector in the image sensor coordinate system. For the point:
[0072] $P_i = T_1 \times P_h$.
[0073] Continuing with the scenario of a child observing a bird, the initial coordinates of the gaze direction in the head coordinate system are (-0.008 m, -0.004 m, 0.1 m), and the direction vector is (-0.416, 0.892, 0.1736). The first coordinate system transformation matrix between the head coordinate system and the image sensor coordinate system is known to be T1, which is determined based on the position and rotation relationship between the two coordinate systems. Through calculation, the initial coordinates in the head coordinate system are transformed to (0.02 m, 0.01 m, 0.15 m) in the image sensor coordinate system, and the transformed direction vector is (-0.38, 0.91, 0.16), which is the target gaze direction in the image sensor coordinate system.
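The transformation in step 2041 can be sketched in a few lines of Python. This is a minimal illustration, assuming T1 is a 4×4 homogeneous matrix; the matrix values below are hypothetical (the source does not give T1), chosen as a pure translation so that the starting point of the example is reproduced. Because no rotation is assumed here, the direction vector is left unchanged, which differs from the rotated direction reported in the example.

```python
def transform_point(T, p):
    """Apply a 4x4 homogeneous transform T to a 3D point p."""
    h = (p[0], p[1], p[2], 1.0)
    return tuple(sum(T[r][c] * h[c] for c in range(4)) for r in range(3))


def transform_direction(T, d):
    """Apply only the 3x3 rotation block of T to a direction vector d.

    Directions are not affected by the translation column.
    """
    return tuple(sum(T[r][c] * d[c] for c in range(3)) for r in range(3))


# Hypothetical T1: identity rotation plus an offset between the two origins.
T1 = [
    [1.0, 0.0, 0.0, 0.028],
    [0.0, 1.0, 0.0, 0.014],
    [0.0, 0.0, 1.0, 0.05],
    [0.0, 0.0, 0.0, 1.0],
]

p_h = (-0.008, -0.004, 0.1)      # starting coordinates in the head frame
d_h = (-0.416, 0.892, 0.1736)    # gaze direction in the head frame

p_i = transform_point(T1, p_h)       # approximately (0.02, 0.01, 0.15)
d_i = transform_direction(T1, d_h)   # unchanged, since no rotation is assumed
```

Keeping the point and direction transforms separate makes it explicit that translation applies only to the starting coordinates, while the direction vector is affected only by the rotation part of the matrix.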
[0074] Step 2042: Perform spatial intersection analysis based on the target gaze direction and the depth information of each initial object candidate region in the continuous environmental image to obtain the intersection coordinates in the image sensor coordinate system. The depth information is associated with the image sensor coordinate system.
[0075] Furthermore, the children's watch first acquires the depth information of each initial object candidate region in the continuous environmental image. This depth information is the distance from a point within each initial object candidate region to the optical center of the image sensor, and it is associated with the image sensor coordinate system; that is, the point corresponding to the depth information has three-dimensional coordinates in the image sensor coordinate system. Then, treating the target gaze direction as a ray in the image sensor coordinate system, the children's watch performs spatial intersection analysis against the spatial region corresponding to the depth information of each initial object candidate region and finds the point where the gaze ray meets that region. The coordinates of this point in the image sensor coordinate system are the intersection coordinates.
[0076] Continuing with the above embodiment, the depth information of the initial object candidate region where the bird is located in the continuous environmental image acquired by the children's watch is that the distance from the point in this region to the optical center of the image sensor is between 4.9 meters and 5.1 meters. The target line of sight obtained in step 2041 is a ray extending from the point (0.02 meters, 0.01 meters, 0.15 meters) along the direction vector (-0.38, 0.91, 0.16) in the image sensor coordinate system. Through spatial intersection analysis, it is found that this line of sight intersects with the spatial region of the initial object candidate region where the bird is located. The coordinates of the intersection point in the image sensor coordinate system are (-1.88 meters, 4.56 meters, 5.0 meters), which are the intersection point coordinates.
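One possible way to carry out the spatial intersection analysis of step 2042 is a ray versus box test. The sketch below assumes, purely for illustration, that each initial object candidate region is approximated by an axis-aligned box in the image sensor coordinate system; the source does not specify the region representation, and the function name and sample box are hypothetical.

```python
def ray_box_intersection(origin, direction, box_min, box_max):
    """Return the first point where a ray enters an axis-aligned box, or None.

    Uses the slab method: the ray is clipped against each pair of
    parallel box faces in turn.
    """
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            # Ray is parallel to this slab: it must already lie inside it.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return None
    return tuple(o + t_near * d for o, d in zip(origin, direction))


# Hypothetical region: depth between 4.9 m and 5.1 m along the longitudinal axis.
hit = ray_box_intersection(
    origin=(0.0, 0.0, 0.0),
    direction=(0.0, 0.0, 1.0),
    box_min=(-1.0, -1.0, 4.9),
    box_max=(1.0, 1.0, 5.1),
)
miss = ray_box_intersection(
    origin=(0.0, 0.0, 0.0),
    direction=(1.0, 0.0, 0.0),
    box_min=(-1.0, -1.0, 4.9),
    box_max=(1.0, 1.0, 5.1),
)
```

Running the test once per candidate region and keeping the nearest hit would give the intersection coordinates for the gaze ray.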
[0077] Step 2043: Based on the second coordinate system transformation relationship between the absolute environmental coordinate system and the image sensor coordinate system, the intersection coordinates are transformed to obtain the three-dimensional spatial coordinates of the user's gaze point in the environment at each instant.
[0078] Optionally, the children's watch uses three non-collinear rigid feature points in the environment as reference points to construct the environmental absolute coordinate system: the first rigid feature point is the origin; the direction from the second rigid feature point to the third rigid feature point is the positive direction of the horizontal axis; within the plane that passes through the first rigid feature point and is perpendicular to the horizontal axis, the direction from the first rigid feature point toward the third rigid feature point is the positive direction of the vertical axis; and the direction obtained by rotating the positive direction of the horizontal axis 90° around the positive direction of the vertical axis is the positive direction of the longitudinal axis.
[0079] Furthermore, the children's watch defines the absolute coordinate system of the environment and the coordinate system of the image sensor, as well as the second coordinate system transformation relationship between the two. The second coordinate system transformation relationship is a transformation matrix determined based on the position vector between the origins of the two coordinate systems and the rotation relationship between the two coordinate systems.
[0080] Furthermore, the children's watch uses this transformation matrix to convert the intersection coordinates in the image sensor coordinate system to coordinates in the environmental absolute coordinate system. These coordinates are the three-dimensional spatial coordinates of the user's gaze point in the environment at each instant. Optionally, in this embodiment of the invention, the transformation matrix $T_2$ from the image sensor coordinate system to the environmental absolute coordinate system transforms the intersection coordinates $P_i$ in the image sensor coordinate system into the gaze point coordinates $P_e$ in the environmental absolute coordinate system:
[0081] $P_e = T_2 \times P_i$.
[0082] Continuing with the above embodiment, the second coordinate system transformation matrix between the environmental absolute coordinate system and the image sensor coordinate system is T2. The intersection point coordinates obtained in step 2042 are (-1.88 meters, 4.56 meters, 5.0 meters) in the image sensor coordinate system. After transformation using the transformation matrix T2, the coordinates of the intersection point in the environmental absolute coordinate system are obtained as (5 meters, 3 meters, 1.5 meters), which are the three-dimensional spatial coordinates of the child's line of sight in the environment at that instant.
[0083] This invention establishes a transformation relationship between different coordinate systems and performs spatial intersection analysis using depth information. Ultimately, it accurately converts the initial gaze direction in the head coordinate system into the three-dimensional spatial coordinates of the gaze point in the absolute environmental coordinate system. Therefore, it can accurately obtain the three-dimensional spatial coordinates of the user's gaze point in the environment, providing a precise spatial positioning basis for subsequent gaze-based object recognition and ensuring that the children's watch can accurately identify the objects that the child is interested in.
[0084] In one embodiment, steps 301 to 304 include:
[0085] Step 301: Arrange the three-dimensional spatial coordinates of the user's line of sight at consecutive time points in chronological order to obtain a spatiotemporal coordinate sequence matrix.
[0086] Optionally, the children's watch arranges the three-dimensional spatial coordinates of the user's gaze point at consecutive time points in chronological order to form a spatiotemporal coordinate sequence matrix. First, the children's watch records the time information (such as a timestamp) for each time point, together with the three-dimensional spatial coordinates (X, Y, and Z) of the user's gaze point at that time point in the environmental absolute coordinate system. Then, in chronological order from earliest to latest, this time information and its corresponding three-dimensional spatial coordinates are arranged sequentially to form a matrix. Each row of this matrix represents a single time point and contains the time value and the corresponding X, Y, and Z coordinate values; this is the spatiotemporal coordinate sequence matrix.
[0087] Continuing with the scenario of the child observing the bird, the children's watch records the three-dimensional spatial coordinates of the gaze point at five consecutive time points. These time points are the 1st, 2nd, 3rd, 4th, and 5th seconds, corresponding to the following three-dimensional spatial coordinates: (5.0 m, 3.0 m, 1.5 m), (5.1 m, 3.0 m, 1.5 m), (5.1 m, 3.1 m, 1.5 m), (5.0 m, 3.1 m, 1.5 m), and (5.0 m, 3.0 m, 1.5 m). The children's watch arranges this information chronologically, resulting in the spatiotemporal coordinate sequence matrix $M_{t\text{-}coord}$, whose form is as follows:
[0088] $$M_{t\text{-}coord} = \begin{bmatrix} 1 & 5.0 & 3.0 & 1.5 \\ 2 & 5.1 & 3.0 & 1.5 \\ 3 & 5.1 & 3.1 & 1.5 \\ 4 & 5.0 & 3.1 & 1.5 \\ 5 & 5.0 & 3.0 & 1.5 \end{bmatrix}$$
[0089] The first column of the matrix represents time (seconds), and the second to fourth columns represent the X coordinate (meters), Y coordinate (meters), and Z coordinate (meters), respectively.
[0090] Step 302: Based on the spatiotemporal coordinate sequence matrix, determine the change in position of the line-of-sight pointing point in the absolute environmental coordinate system between two adjacent time points, and obtain the spatial displacement vector.
[0091] Furthermore, based on the spatiotemporal coordinate sequence matrix, the children's watch extracts the three-dimensional spatial coordinates of the gaze point at two adjacent time points in the absolute environmental coordinate system. By subtracting the coordinates of the previous time point from the coordinates of the later time point, the change in the position of the gaze point between two adjacent time points is calculated. This change is represented by a spatial displacement vector, and the three components of the spatial displacement vector correspond to the displacement in the horizontal, vertical, and longitudinal directions in the absolute environmental coordinate system, respectively.
[0092] Continuing with the scenario of the child observing the bird, the three rigid feature points of the environmental absolute coordinate system are set as follows: the first rigid feature point is a fixed stone on the ground (the origin), the second rigid feature point is the base of a nearby tree trunk, and the third rigid feature point is a fixed marker on the tree trunk. From the spatiotemporal coordinate sequence matrix $M_{t\text{-}coord}$ of step 301, the coordinates of adjacent time points are extracted and the spatial displacement vectors are calculated. For the 1st second (5.0 m, 3.0 m, 1.5 m) and the 2nd second (5.1 m, 3.0 m, 1.5 m), the spatial displacement vector is (5.1-5.0, 3.0-3.0, 1.5-1.5) = (0.1 m, 0.0 m, 0.0 m). Likewise, the spatial displacement vector between the 2nd and 3rd seconds is (0.0 m, 0.1 m, 0.0 m), between the 3rd and 4th seconds is (-0.1 m, 0.0 m, 0.0 m), and between the 4th and 5th seconds is (0.0 m, -0.1 m, 0.0 m). These spatial displacement vectors reflect the movement of the gaze point in the environmental absolute coordinate system.
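The subtraction in step 302 can be illustrated with a short Python sketch over the five samples of the example. The function name is hypothetical, and the rounding to six decimal places is an assumption added only to suppress floating-point noise.

```python
# Rows of the spatiotemporal coordinate sequence matrix: (t, x, y, z).
M_t_coord = [
    (1, 5.0, 3.0, 1.5),
    (2, 5.1, 3.0, 1.5),
    (3, 5.1, 3.1, 1.5),
    (4, 5.0, 3.1, 1.5),
    (5, 5.0, 3.0, 1.5),
]


def displacement_vectors(rows):
    """Subtract each earlier gaze point from the next one, component-wise."""
    return [
        tuple(round(b - a, 6) for a, b in zip(prev[1:], cur[1:]))
        for prev, cur in zip(rows, rows[1:])
    ]


displacements = displacement_vectors(M_t_coord)
```

For five time points this yields four displacement vectors, one per pair of adjacent time points.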
[0093] Step 303: Based on the spatial displacement vector and the time interval between two adjacent line-of-sight pointing points, determine the velocity vector of the line-of-sight movement, and correlate the spatial displacement vector and the velocity vector in chronological order to obtain the displacement-velocity spatiotemporal correlation matrix.
[0094] Furthermore, based on the spatial displacement vector obtained in step 302 and the time interval between two adjacent gaze points, the children's watch calculates the velocity vector of the gaze movement. The time interval is the time difference between two adjacent time points. The velocity vector is calculated by dividing the spatial displacement vector by the corresponding time interval. The three components of the velocity vector correspond to the velocities along the horizontal, vertical, and longitudinal directions in the environmental absolute coordinate system, respectively. Then, the spatial displacement vector and velocity vector corresponding to each adjacent time interval are associated and integrated in chronological order to form a displacement-velocity spatiotemporal correlation matrix. Each row of this matrix contains the time interval between two adjacent time points, the three components of the spatial displacement vector, and the three components of the velocity vector.
[0095] Continuing with the scenario of the child observing the bird, the time interval between two adjacent time points is 1 second (e.g., the interval between the 1st and 2nd seconds is $\Delta t_{1\text{-}2} = 1$ second). Dividing each spatial displacement vector obtained in step 302 by its time interval gives the corresponding velocity vector; for example, the displacement (0.1 m, 0.0 m, 0.0 m) between the 1st and 2nd seconds gives the velocity vector (0.1 m/s, 0.0 m/s, 0.0 m/s). Associating the displacement and velocity vectors in chronological order forms the displacement-velocity spatiotemporal correlation matrix $M_{disp\text{-}vel}$, specifically:
[0096] $$M_{disp\text{-}vel} = \begin{bmatrix} 1 & 0.1 & 0.0 & 0.0 & 0.1 & 0.0 & 0.0 \\ 1 & 0.0 & 0.1 & 0.0 & 0.0 & 0.1 & 0.0 \\ 1 & -0.1 & 0.0 & 0.0 & -0.1 & 0.0 & 0.0 \\ 1 & 0.0 & -0.1 & 0.0 & 0.0 & -0.1 & 0.0 \end{bmatrix}$$
[0097] The first column represents the time interval (seconds), the second to fourth columns represent the spatial displacement vector components (meters), and the fifth to seventh columns represent the velocity vector components (meters / second).
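A row of the displacement-velocity matrix can be assembled as sketched below. This is an illustrative Python version under the column layout just described (time interval, three displacement components, three velocity components); the function name is hypothetical, and the check for a positive time interval is an added assumption.

```python
def displacement_velocity_matrix(rows):
    """Build rows of (dt, dx, dy, dz, vx, vy, vz) from (t, x, y, z) samples."""
    matrix = []
    for (t0, *p0), (t1, *p1) in zip(rows, rows[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("time points must be strictly increasing")
        disp = [b - a for a, b in zip(p0, p1)]
        vel = [c / dt for c in disp]   # velocity = displacement / time interval
        matrix.append((dt, *disp, *vel))
    return matrix


samples = [
    (1, 5.0, 3.0, 1.5),
    (2, 5.1, 3.0, 1.5),
    (3, 5.1, 3.1, 1.5),
    (4, 5.0, 3.1, 1.5),
    (5, 5.0, 3.0, 1.5),
]
M_disp_vel = displacement_velocity_matrix(samples)
```

Dividing by the actual interval rather than assuming 1 second keeps the sketch valid when the gaze points are not sampled uniformly.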
[0098] Step 304: Based on the displacement-velocity spatiotemporal correlation matrix, perform trajectory fitting to obtain the gaze movement trajectory.
[0099] Furthermore, the children's watch performs trajectory fitting based on the displacement-velocity spatiotemporal correlation matrix to obtain the gaze movement trajectory, as detailed in steps 3041 to 3044.
[0100] By arranging the gaze coordinates at consecutive time points in order, calculating displacement and velocity, associating and integrating this information, and performing trajectory fitting, the embodiments of the present invention transform discrete gaze points into a continuous, smooth gaze movement trajectory that reflects the movement pattern of the gaze.
[0101] In one embodiment, steps 3041 to 3044 include:
[0102] Step 3041: Based on the angle between adjacent spatial displacement vectors and the magnitudes of adjacent velocity vectors in the displacement-velocity spatiotemporal correlation matrix, determine the turning characteristic parameter, which combines the degree of directional change and the velocity of adjacent gaze trajectory segments and characterizes how sharply the gaze trajectory turns.
[0103] Optionally, the children's watch extracts two adjacent spatial displacement vectors and two adjacent velocity vectors from the displacement-velocity spatiotemporal correlation matrix. First, the angle between the adjacent spatial displacement vectors is calculated using the vector dot product formula, reflecting the degree of directional change between the two gaze trajectory segments (details omitted here). Second, the magnitudes of the adjacent velocity vectors are calculated using the vector magnitude formula, reflecting the change in velocity magnitude. Combining these two quantities, a turning characteristic parameter is constructed. This parameter is calculated using a composite function of the angle and the magnitude; a larger value indicates a sharper turn of the gaze trajectory, while a smaller value indicates a gentler turn. Optionally, the formula for calculating the turning characteristic parameter in this embodiment is:
[0104]
[0105] Where $\theta_{k\text{-}(k+1)}$ represents the angle (in radians) between the two adjacent spatial displacement vectors at the k-th time point and the (k+1)-th time point, used to characterize the angle through which the gaze turns in space, and the velocity term represents the magnitude of the velocity vector of the gaze movement at the (k+1)-th time point.
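The two ingredients of the turning characteristic parameter, the angle from the dot product and the velocity magnitude, can be computed as in the following Python sketch. Only these two inputs are shown; how they are combined into the composite parameter is not reproduced here. The function names are hypothetical, and returning 0 for a zero-length vector is an assumption.

```python
import math


def magnitude(v):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(c * c for c in v))


def angle_between(u, v):
    """Angle in radians between two vectors, from the dot product formula."""
    nu, nv = magnitude(u), magnitude(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0  # assumption: a stationary gaze contributes no turn
    cos_theta = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    return math.acos(max(-1.0, min(1.0, cos_theta)))


# Adjacent displacement vectors from the bird example (1s-2s and 2s-3s).
theta = angle_between((0.1, 0.0, 0.0), (0.0, 0.1, 0.0))
speed = magnitude((0.0, 0.1, 0.0))  # velocity magnitude for the later segment
```

For the example's rectangular path, every pair of adjacent displacement vectors is perpendicular, so each angle is π/2 (90°).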
[0106] Step 3042: Based on the turning characteristic parameters, divide the gaze trajectory into continuous gaze sub-trajectories, and determine the spatiotemporal boundaries of each gaze sub-trajectory segment.
[0107] Furthermore, the children's watch sets a threshold for the turning characteristic parameter. Consecutive adjacent trajectory segments whose turning characteristic parameters are less than or equal to the threshold are grouped into the same gaze sub-trajectory, while positions where the parameter exceeds the threshold are designated as segmentation points. The spatiotemporal boundaries of each gaze sub-trajectory segment include temporal boundaries (start and end times) and spatial boundaries (start and end coordinates). The temporal boundary is determined by the time point corresponding to the segmentation point, and the spatial boundary is determined by the coordinates of the gaze point at that time point.
[0108] In one embodiment, the children's watch sets the turning characteristic parameter threshold to 60°. Assuming all values of the turning characteristic parameter K obtained in step 3041 are 90° (greater than 60°), each adjacent trajectory segment is an independent gaze sub-trajectory. The spatiotemporal boundaries of the trajectory segmentation are:
[0109] The first segment of the trajectory: time boundary (1 second, 2 seconds), spatial boundary (5.0, 3.0, 1.5) → (5.1, 3.0, 1.5).
[0110] The second segment of the trajectory: time boundary (2 seconds, 3 seconds), spatial boundary (5.1, 3.0, 1.5) → (5.1, 3.1, 1.5).
[0111] The third segment of the trajectory: time boundary (3 seconds, 4 seconds), spatial boundary (5.1, 3.1, 1.5) → (5.0, 3.1, 1.5).
[0112] The fourth segment of the trajectory: time boundary (4 seconds, 5 seconds), spatial boundary (5.0, 3.1, 1.5) → (5.0, 3.0, 1.5).
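The threshold-based segmentation of step 3042 can be sketched as follows. This is an illustrative Python version, assuming the turning characteristic parameter has already been computed for each pair of adjacent segments; the function name and data layout are hypothetical.

```python
def split_sub_trajectories(samples, turning_values, threshold):
    """Split a gaze trajectory at sharp turns.

    samples: list of (t, x, y, z) gaze points.
    turning_values[k]: turning parameter between segment k (sample k to
    sample k+1) and segment k+1, so the turn occurs at sample k+1.
    Returns a list of (start_sample, end_sample) boundaries.
    """
    boundaries = []
    start = 0
    for k, value in enumerate(turning_values):
        if value > threshold:
            # Sharp turn: close the current sub-trajectory at sample k+1.
            boundaries.append((samples[start], samples[k + 1]))
            start = k + 1
    boundaries.append((samples[start], samples[-1]))
    return boundaries


samples = [
    (1, 5.0, 3.0, 1.5),
    (2, 5.1, 3.0, 1.5),
    (3, 5.1, 3.1, 1.5),
    (4, 5.0, 3.1, 1.5),
    (5, 5.0, 3.0, 1.5),
]
segments = split_sub_trajectories(samples, [90, 90, 90], threshold=60)
```

Each returned pair carries both the temporal boundary (the first field of each sample) and the spatial boundary (the remaining three fields). With all three turning values above the threshold, the example splits into four sub-trajectories.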
[0113] Step 3043: Construct parameterized equations for each line-of-sight sub-trajectory within its time interval based on the spatiotemporal boundaries of each line-of-sight sub-trajectory segment. The parameterized equations characterize the coordinates of the line-of-sight pointing point at any given time within each line-of-sight sub-trajectory segment.
[0114] Furthermore, for each gaze sub-trajectory, the children's watch constructs a parameterized equation using time as the parameter. Let the start time of the sub-trajectory be t0, the end time be t1, the starting point coordinates be P0(x0,y0,z0), and the ending point coordinates be P1(x1,y1,z1). Introduce the parameter s = (t-t0) / (t1-t0), s∈[0,1]. The parameterized equation is obtained through linear interpolation or high-order polynomial fitting, such that it yields P0 when s=0 and P1 when s=1, while satisfying the rate change law within the sub-trajectory. Optionally, in this embodiment of the invention, the sub-trajectory parameterized equation is obtained through linear fitting. The specific linear fitting process is as follows:
[0115] $$x(t) = x_k + \frac{t - t_k}{t_{k+1} - t_k}\,(x_{k+1} - x_k), \quad y(t) = y_k + \frac{t - t_k}{t_{k+1} - t_k}\,(y_{k+1} - y_k), \quad z(t) = z_k + \frac{t - t_k}{t_{k+1} - t_k}\,(z_{k+1} - z_k)$$
[0116] Where $t \in [t_k, t_{k+1}]$, and $x_k$, $y_k$, $z_k$ are the coordinates at time $t_k$.
[0117] In one embodiment, the linear parameterized equations for the first sub-trajectory segment (1 second → 2 seconds) are: x(s) = 5.0 + 0.1s, y(s) = 3.0 + 0s, z(s) = 1.5 + 0s, where s = (t-1) / (2-1) = t-1, t ∈ [1, 2]. The linear parameterized equations for the second sub-trajectory segment (2 seconds → 3 seconds) are: x(s) = 5.1 + 0s, y(s) = 3.0 + 0.1s, z(s) = 1.5 + 0s, s = t-2, t ∈ [2, 3]. The third and fourth sub-trajectory segments are similarly constructed using linear parameterized equations.
[0118] Step 3044: Fit the parameterized equations of each line-of-sight sub-trajectory in chronological order to obtain the line-of-sight movement trajectory.
[0119] Furthermore, the children's watch splices together the parameterized equations of each segment of the gaze trajectory in chronological order, ensuring that adjacent sub-trajectories are continuous at the segmentation points (seamless connection of time and spatial coordinates). The spliced equations cover all time intervals, forming a complete gaze movement trajectory. This trajectory is a continuous function, which can calculate the coordinates of the gaze pointing point at any given time.
[0120] Continuing with the above embodiment, the four sub-trajectory equations from step 3043 are concatenated in chronological order:
[0121] t∈[1,2]: x=5.0+0.1(t-1), y=3.0, z=1.5.
[0122] t∈[2,3]: x=5.1, y=3.0+0.1(t-2), z=1.5.
[0123] t∈[3,4]: x=5.1-0.1(t-3), y=3.1, z=1.5.
[0124] t∈[4,5]: x=5.0, y=3.1-0.1(t-4), z=1.5.
[0125] The spliced equations form a continuous gaze movement trajectory over seconds 1 to 5, reflecting the rectangular path that the gaze follows around the bird.
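The spliced trajectory can be evaluated at an arbitrary time with a short Python sketch. It assumes, as in this example, that every interval between adjacent samples is its own linearly interpolated sub-trajectory; the function name is hypothetical.

```python
def gaze_position(samples, t):
    """Evaluate the piecewise-linear gaze trajectory at time t.

    samples: list of (t, x, y, z) boundary points in chronological order.
    """
    for (t0, *p0), (t1, *p1) in zip(samples, samples[1:]):
        if t0 <= t <= t1:
            s = (t - t0) / (t1 - t0)  # normalized parameter in [0, 1]
            return tuple(a + s * (b - a) for a, b in zip(p0, p1))
    raise ValueError("t lies outside the observed time range")


samples = [
    (1, 5.0, 3.0, 1.5),
    (2, 5.1, 3.0, 1.5),
    (3, 5.1, 3.1, 1.5),
    (4, 5.0, 3.1, 1.5),
    (5, 5.0, 3.0, 1.5),
]
mid_first = gaze_position(samples, 1.5)   # halfway along the first edge
mid_third = gaze_position(samples, 3.5)   # halfway along the third edge
```

Because adjacent segments share their boundary sample, the function returns the same coordinates whichever segment is used at a segmentation point, which is the continuity property required of the splice.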
[0126] This invention, through quantifying trajectory turning features, fitting parametric equations piecewise and splicing them together, ultimately obtains a continuous, piecewise smooth, and accurate three-dimensional line-of-sight movement trajectory that reflects the movement pattern of the line of sight, and can accurately output the coordinates of the line-of-sight pointing point at any time.
[0127] In one embodiment, steps 401 to 404 include:
[0128] Step 401: Associate all points on the line-of-sight movement trajectory with the three-dimensional spatial region of each initial physical candidate region according to the time dimension to obtain a spatiotemporal association set.
[0129] Optionally, the children's watch extracts the 3D spatial coordinates and corresponding time information of each point on the gaze movement trajectory according to the time dimension, and simultaneously obtains the 3D spatial range of each initial object candidate region (a spatial range composed of multiple 3D coordinate points). Each point on the gaze movement trajectory is associated with the 3D spatial range of each initial object candidate region, i.e., it is determined whether the 3D spatial coordinates of the point are within the 3D spatial range of that initial object candidate region. All association results are organized in chronological order to form a spatiotemporal association set, which includes the coordinates of the gaze pointing point corresponding to each time point and the identifier of the initial object candidate region to which it belongs (if it does not belong to any region, it is marked as none).
[0130] Continuing with the scenario of the child observing the bird, the gaze trajectory has 10 three-dimensional spatial coordinates at seconds 1 to 10, namely (5.0, 3.0, 1.5), (5.1, 3.0, 1.5), (5.1, 3.1, 1.5), (5.0, 3.1, 1.5), (5.0, 3.0, 1.5), (5.2, 3.0, 1.5), (5.1, 2.9, 1.5), (4.9, 3.0, 1.5), (6.0, 4.0, 2.0), and (7.0, 5.0, 2.5). There are three initial object candidate regions: the bird region (three-dimensional spatial region X∈[4.9, 5.2], Y∈[2.9, 3.1], Z∈[1.4, 1.6]), the tree region (X∈[6.0, 8.0], Y∈[4.0, 6.0], Z∈[2.0, 3.0]), and the ground region (X∈[0, 10], Y∈[0, 10], Z∈[0, 0.5]). Through association judgment, a spatiotemporal association set is formed, where the gaze points from 1 to 8 seconds belong to the bird region, the point at 9 seconds belongs to the tree region, and the point at 10 seconds does not belong to any region (marked as none).
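The association judgment of step 401 amounts to a point-in-region test. The Python sketch below uses the three regions of the example, treated as axis-aligned boxes; the function name, the choice to return the first matching region, and the third sample point (placed outside every region) are assumptions for illustration.

```python
# Region bounds from the example: (min corner, max corner) in metres.
REGIONS = {
    "bird": ((4.9, 2.9, 1.4), (5.2, 3.1, 1.6)),
    "tree": ((6.0, 4.0, 2.0), (8.0, 6.0, 3.0)),
    "ground": ((0.0, 0.0, 0.0), (10.0, 10.0, 0.5)),
}


def region_of(point, regions=REGIONS):
    """Return the identifier of the first region containing point, else None."""
    for name, (lo, hi) in regions.items():
        if all(a <= c <= b for c, a, b in zip(point, lo, hi)):
            return name
    return None


# Build a small spatiotemporal association set: (time, point, region id).
gaze_points = [
    (1, (5.0, 3.0, 1.5)),
    (9, (6.0, 4.0, 2.0)),
    (11, (9.0, 9.0, 5.0)),   # hypothetical point outside every region
]
association_set = [(t, p, region_of(p)) for t, p in gaze_points]
```

Points lying exactly on a region boundary are counted as inside in this sketch.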
[0131] Step 402: Determine the spatial inclusion degree of each point on the line-of-sight movement trajectory to each initial physical candidate region based on the spatiotemporal correlation set.
[0132] Furthermore, based on the spatiotemporal association set, the children's watch calculates the spatial inclusion degree between the three-dimensional spatial coordinates of each point on the gaze movement trajectory and the three-dimensional spatial region of each initial object candidate region. The spatial inclusion degree is represented by a value between 0 and 1. When the point is completely inside the region, the spatial inclusion degree is 1; when the point is completely outside the region, the spatial inclusion degree is 0; when the point is at the region boundary or partially within the region, a value between 0 and 1 is calculated based on the distance between the point and the region boundary. The closer the point is to the boundary and the larger its proportion within the region, the higher the spatial inclusion degree. Optionally, in this embodiment of the invention, the specific formula for the spatial inclusion degree $C_{i,j}$ (between the j-th gaze point and the i-th initial object candidate region) is as follows:
[0133]
[0134] Where $P_j$ is the three-dimensional coordinates of the j-th gaze point; $R_i$ is the three-dimensional spatial region of the i-th initial object candidate region; $d$ is the shortest distance from the point to the boundary of region $R_i$; and $d_0$ is the set boundary influence distance.
[0135] In one embodiment, the point (5.0, 3.0, 1.5) on the gaze movement trajectory at 1 second is inside the bird region, so its spatial inclusion degree is 1; the point (6.0, 4.0, 2.0) at 9 seconds is inside the tree region, so its spatial inclusion degree is 1; the point (7.0, 5.0, 2.5) at 10 seconds is outside the tree region, so its spatial inclusion degree is 0. For a point (5.25, 3.1, 1.5), whose X coordinate 5.25 exceeds the upper X limit of the bird region (5.2) by 0.05 meters while its other coordinates are within the region, the spatial inclusion degree is calculated to be 0.8 (set according to the distance from the boundary).
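A distance-based inclusion degree of the kind described in step 402 could look like the Python sketch below. The falloff function is an assumption: a linear decrease from 1 at the region surface to 0 at the boundary influence distance, with a hypothetical value of 0.25 m chosen only because it yields 0.8 for a point 0.05 m outside the region. It is not presented as the formula of this embodiment.

```python
import math


def distance_outside_box(point, box_min, box_max):
    """Shortest distance from point to an axis-aligned box (0 if inside)."""
    return math.sqrt(sum(
        max(lo - c, 0.0, c - hi) ** 2
        for c, lo, hi in zip(point, box_min, box_max)
    ))


def inclusion_degree(point, box_min, box_max, d0=0.25):
    """Assumed linear falloff: 1 inside the box, 0 beyond distance d0."""
    d = distance_outside_box(point, box_min, box_max)
    return max(0.0, 1.0 - d / d0)


BIRD_MIN, BIRD_MAX = (4.9, 2.9, 1.4), (5.2, 3.1, 1.6)

inside = inclusion_degree((5.0, 3.0, 1.5), BIRD_MIN, BIRD_MAX)
near = inclusion_degree((5.25, 3.1, 1.5), BIRD_MIN, BIRD_MAX)
far = inclusion_degree((7.0, 5.0, 2.5), BIRD_MIN, BIRD_MAX)
```

Other monotone falloff functions (for example an exponential decay in the distance) would fit the same description; the linear form is used here only because it is the simplest.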
[0136] Step 403: Align, by time, the spatial inclusion degree with the blink frequency temporal distribution features (the blink frequency at each blink occurrence time) to obtain the trajectory blink association matrix.
[0137] Furthermore, the children's watch first acquires the time and corresponding blink frequency of each blink, forming a blink frequency temporal distribution feature, which reflects the change of blink frequency over time.
[0138] Furthermore, the children's watch aligns the spatial inclusion degree of each gaze movement trajectory point obtained in step 402 with the blink frequency time distribution characteristics according to the time axis. That is, the spatial inclusion degree at the same time point corresponds to the blink frequency, and integrates them to form a trajectory blink association matrix. In this embodiment of the invention, each row of the trajectory blink association matrix contains time, coordinates of the gaze pointing point, spatial inclusion degree of each initial physical candidate area, and blink frequency.
[0139] During the child's observation, the blink times and frequencies are as follows: 0.2 blinks / second at 2 seconds, 0.3 blinks / second at 4 seconds, 0.2 blinks / second at 6 seconds, and 0.3 blinks / second at 8 seconds. These blink frequency temporal distribution features are aligned by time with the spatial inclusion degrees to form the trajectory blink association matrix. For example, the row corresponding to 2 seconds is: time 2 seconds, gaze point coordinates (5.1, 3.0, 1.5), spatial inclusion degree 1 for the bird region, 0 for the tree region, and 0 for the ground region, and blink frequency 0.2 blinks / second. The row corresponding to 4 seconds has a blink frequency of 0.3 blinks / second, with the spatial inclusion degree of each region determined by the location of that point, and so on.
[0140] Step 404: Based on the trajectory blink association matrix, match the three-dimensional spatial regions of each initial physical candidate region to determine the target physical candidate region.
[0141] Furthermore, the children's watch matches the target object candidate region by combining the trajectory blink association matrix with the three-dimensional spatial region of each initial object candidate region, as detailed in steps 4041 to 4044.
[0142] This embodiment of the invention associates the gaze trajectory with the initial object candidate regions, calculates the spatial inclusion degree, and combines the blink frequency features, thereby accurately selecting, from multiple initial object candidate regions, the target object candidate region that the user is interested in.
[0143] In one embodiment, steps 4041 to 4044 include:
[0144] Step 4041: Based on the trajectory blink association matrix, determine the dwell characteristic parameters of each initial object candidate region, namely the percentage of time the region is covered by the gaze trajectory and the average blink frequency.
[0145] Optionally, the children's watch uses the trajectory blink association matrix to calculate the total time each initial object candidate region is covered by the gaze trajectory, and divides this total time by the total observation time to obtain the time percentage. It also calculates the average blink frequency while the gaze trajectory covers that region, and combines the time percentage and the average blink frequency into the dwell characteristic parameters. These parameters together reflect the user's level of attention to the initial object candidate region.
[0146] Continuing with the above example, the total observation time for the child is 10 seconds, and there are three initial object candidate regions: the bird region, the tree region, and the ground region. The trajectory blink association matrix shows that the bird region is covered for 7 seconds, or 70% of the time, with corresponding blink frequencies of 0.3, 0.2, 0.3, 0.2, 0.3, 0.2, and 0.3 times / second. The average blink frequency is (0.3+0.2+0.3+0.2+0.3+0.2+0.3) / 7 ≈ 0.257 times / second, giving dwell characteristic parameters of (70%, 0.257 times / second). The tree region is covered for 2 seconds, or 20% of the time, with an average blink frequency of 0.5 times / second, giving dwell characteristic parameters of (20%, 0.5 times / second). The ground region is covered for 1 second, or 10% of the time, with an average blink frequency of 0.6 times / second, giving dwell characteristic parameters of (10%, 0.6 times / second).
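The dwell characteristic parameters of step 4041 reduce to a ratio and a mean, as in the following Python sketch. It assumes one blink-frequency value per covered sample and a sample period of 1 second, as in the example; the function and parameter names are hypothetical.

```python
def dwell_parameters(covered_blink_freqs, total_time, sample_period=1.0):
    """Return (time percentage, average blink frequency) for one region.

    covered_blink_freqs: blink frequency at each sample where the gaze
    trajectory covers the region.
    """
    if not covered_blink_freqs:
        return 0.0, 0.0  # assumption: an uncovered region has no dwell
    covered_time = len(covered_blink_freqs) * sample_period
    time_pct = covered_time / total_time
    avg_freq = sum(covered_blink_freqs) / len(covered_blink_freqs)
    return time_pct, avg_freq


bird_pct, bird_avg = dwell_parameters(
    [0.3, 0.2, 0.3, 0.2, 0.3, 0.2, 0.3], total_time=10.0
)
```

For the bird region this gives a time percentage of 0.7 and an average blink frequency of about 0.257 times / second.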
[0147] Step 4042: Based on the dwell feature parameters of each initial physical candidate region and the movement rate features of the line-of-sight movement trajectory within the three-dimensional spatial region of each initial physical candidate region, determine the matching degree of each initial physical candidate region.
[0148] Furthermore, the children's watch assigns weights to the time percentage, average blink frequency, and movement rate features in the dwell time characteristic parameters. The higher the time percentage, the higher the matching degree; the lower the average blink frequency (the user is more focused), the higher the matching degree; the slower the movement rate of the gaze trajectory within the initial physical object candidate area (the more detailed the attention), the higher the matching degree.
[0149] Therefore, the children's watch calculates the matching degree of each initial object candidate region by combining, according to their weights, the time percentage, the inverse value of the average blink frequency, and the inverse value of the movement rate feature.
[0150] Optionally, in this embodiment of the invention, the specific formula for the matching degree $M_i$ is:
[0151] $$M_i = w_R \cdot r_i + w_f \left(1 - \frac{f_{i,avg}}{f_{max}}\right) + w_v \left(1 - \frac{v_{i,avg}}{v_{max}}\right)$$
[0152] Where $r_i$ is the time percentage of the i-th region; $w_R$, $w_f$, and $w_v$ are the weights of the time percentage, the inverse value of the average blink frequency, and the inverse value of the average movement rate, respectively, with $w_R + w_f + w_v = 1$; $f_{i,avg}$ is the average blink frequency while the gaze is within the i-th region; $f_{max}$ is the maximum average blink frequency across all regions; $v_{i,avg}$ is the average rate at which the gaze moves within the i-th region; and $v_{max}$ is the maximum average movement rate across all regions.
[0153] In one embodiment, the time percentage weight is set to 0.5, the average blink frequency (reverse) weight to 0.3, and the movement speed (reverse) weight to 0.2. The bird area has a time percentage of 70% (equivalent to 0.7), an average blink frequency of 0.257 times / second (reverse value 1 - 0.257 / 1 = 0.743), and a movement speed of 0.1 meters / second (reverse value 1 - 0.1 / 1 = 0.9). Its matching degree is 0.7 × 0.5 + 0.743 × 0.3 + 0.9 × 0.2 = 0.7529. The tree area has a time percentage of 20% (0.2), an average blink frequency of 0.5 times / second (reverse value 0.5), and a movement speed of 0.3 meters / second (reverse value 0.7). Its matching degree is 0.2 × 0.5 + 0.5 × 0.3 + 0.7 × 0.2 = 0.39. The matching degree of the ground area is calculated in the same way to be 0.21. The matching degrees of the bird, tree, and ground areas are therefore 0.7529, 0.39, and 0.21, respectively.
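The arithmetic in this embodiment can be checked with a minimal Python sketch of a weighted sum. The function name `matching_degree` and its parameter names are illustrative assumptions. The normalizers `f_max` and `v_max` default to 1 because the worked example divides by 1; the ground area is omitted because its movement speed is not stated.

```python
def matching_degree(time_share, avg_blink, avg_speed,
                    w_r=0.5, w_f=0.3, w_v=0.2, f_max=1.0, v_max=1.0):
    """Weighted sum of time share, reversed blink frequency, reversed speed.

    The weights are those of the embodiment; f_max and v_max are the
    normalizing maxima (1 in the worked example, an assumption here).
    """
    reversed_blink = 1 - avg_blink / f_max
    reversed_speed = 1 - avg_speed / v_max
    return w_r * time_share + w_f * reversed_blink + w_v * reversed_speed


bird = matching_degree(0.7, 0.257, 0.1)
tree = matching_degree(0.2, 0.5, 0.3)
print(round(bird, 4), round(tree, 4))  # 0.7529 0.39
```

Keeping the normalizers as parameters lets the same function be used with the per-session maxima across regions instead of the fixed value of 1.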
[0154] Step 4043: Based on the matching degree of each initial physical candidate region, select the initial physical candidate regions whose matching degree exceeds the critical threshold to obtain the high-frequency attention region.
[0155] Furthermore, the children's watch presets a critical threshold for the matching degree. It compares the matching degree of each initial physical candidate area with the critical threshold and selects the initial physical candidate areas whose matching degree exceeds it. The selected areas are the high-frequency attention areas, that is, the areas to which the user pays a high level of attention during observation.
[0156] In one embodiment, the children's watch sets a threshold of 0.6, and the matching degrees of each initial candidate object region in step 4042 are as follows: bird region 0.7529 (above 0.6), tree region 0.39 (below 0.6), and ground region 0.21 (below 0.6). Therefore, the high-frequency attention region selected is the bird region.
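The threshold screening in this embodiment amounts to a simple filter. The sketch below uses the matching degrees and the 0.6 threshold from the example; the dictionary layout and the function name `high_attention_regions` are assumptions made for illustration.

```python
def high_attention_regions(scores, threshold=0.6):
    """Return the names of regions whose matching degree exceeds the threshold."""
    return [name for name, degree in scores.items() if degree > threshold]


# Matching degrees from step 4042 of the example.
scores = {"bird": 0.7529, "tree": 0.39, "ground": 0.21}
selected = high_attention_regions(scores)
print(selected)  # ['bird']
```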
[0157] Step 4044: Based on the trajectory dwell time and peak blink frequency of each high-frequency attention area, determine the target object candidate area.
[0158] Furthermore, the children's watch tracks the dwell time (total time covered by the gaze) of each high-frequency attention area and identifies the peak blink frequency (the highest blink frequency occurring in that area during observation). Typically, the high-frequency attention area with the longest dwell time and the lowest peak blink frequency is selected as a candidate target object area, as this indicates the user's most sustained and focused attention on that area.
[0159] In one embodiment, the high-frequency attention region is only the bird region, whose trajectory dwell time is 7 seconds, and the peak blink frequency in this region is 0.3 times / second (the longest dwell time and the lowest peak value among all high-frequency attention regions). Therefore, the bird region is determined as a candidate region for the target object.
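Step 4044 can be sketched as a selection over the high-frequency attention regions. The text does not say how dwell time and peak blink frequency are traded off when they disagree, so this sketch assumes dwell time is compared first and a lower peak blink frequency breaks ties. The function name `pick_target`, the record layout, and the second region ("nest") are hypothetical and serve only to show a selection among more than one region.

```python
def pick_target(regions):
    """Pick the region with the longest dwell time.

    Ties are broken by the lowest peak blink frequency. This ordering is
    an assumption; the source only names the two criteria.
    """
    best = max(regions, key=lambda r: (r["dwell_s"], -r["peak_blink"]))
    return best["name"]


regions = [
    {"name": "bird", "dwell_s": 7.0, "peak_blink": 0.3},  # values from the example
    {"name": "nest", "dwell_s": 5.0, "peak_blink": 0.4},  # hypothetical region
]
target = pick_target(regions)
print(target)  # bird
```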
[0160] This invention, through comprehensive analysis of gaze coverage time, blink frequency, and movement speed, accurately selects the target object candidate region that the user is most interested in from multiple initial object candidate regions.
[0161] Furthermore, the smart wearable device provided by the present invention is described below. The smart wearable device described below and the physical object recognition method based on AI big data described above may be cross-referenced with each other.
[0162] Optionally, refer to Figure 2, which is a schematic diagram of the structure of the smart wearable device provided by the present invention. The smart wearable device integrates a data acquisition unit 210 and a physical object recognition unit 220. The data acquisition unit 210 includes an image sensor 211 and a gaze tracking sensor 212.
[0163] The data acquisition unit 210 is used to acquire continuous environmental images of the user's surrounding environment based on an image sensor, and to capture the user's eye feature data based on a gaze tracking sensor; the eye feature data includes the user's eye rotation angle, pupil center position, and blink frequency;
[0164] The object recognition unit 220 is used for:
[0165] Detection of continuous environmental images based on edge contours yields initial candidate regions for physical objects. Correlation analysis is then performed on rotation angle and pupil center position based on spatial location to determine the three-dimensional spatial coordinates of the user's gaze point in the environment at each instant.
[0166] Spatial transformation is performed based on the imaging parameters of the image sensor and the pixel coordinates of each initial candidate object region to obtain a three-dimensional spatial region. Then, based on the three-dimensional spatial coordinates of the user's gaze point at each instant, trajectory fitting is performed in the time dimension to obtain the user's gaze movement trajectory.
[0167] Based on blink frequency, gaze movement trajectory and the three-dimensional spatial region of each initial object candidate region, the target object candidate region is determined by matching, and the object feature is extracted from the target object candidate region to obtain the first object feature of the object to be identified.
[0168] Based on the first physical object feature and the second physical object feature of the preset AI big data, feature matching is performed to obtain the physical object recognition result.
[0169] This invention determines the user's gaze trajectory from the rotation angle of the user's eyeballs and the center position of the pupil. Based on this gaze trajectory, it selects the target object candidate region from all initial object candidate regions identified in the environmental images. It then performs feature matching between the object features of the target object candidate region and the object features in the preset AI big data to obtain the object recognition result. During object recognition, the gaze trajectory therefore focuses on the region of the object the user is actually interested in, which reduces interference from irrelevant objects in complex backgrounds, avoids recognition results for irrelevant objects, and improves the accuracy of object recognition. Because the object of interest is identified from the user's blink frequency and gaze trajectory alone, the user does not need to filter recognition results manually, which improves the efficiency of object recognition.
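As an illustration of the spatial transformation from pixel coordinates to a three-dimensional spatial region, the following Python sketch back-projects the corners of a candidate region using a pinhole camera model. The pinhole model, the availability of a single depth value for the region, and all names (`back_project`, `region_to_3d`, `fx`, `fy`, `cx`, `cy`) are assumptions for this sketch; the patent only states that the imaging parameters of the image sensor and the pixel coordinates are used.

```python
def back_project(u, v, depth, fx, fy, cx, cy):
    """Map pixel (u, v) at a known depth to camera-frame coordinates (x, y, z).

    fx, fy are focal lengths in pixels; cx, cy is the principal point.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def region_to_3d(box, depth, fx, fy, cx, cy):
    """Back-project the two opposite corners of a pixel bounding box.

    box: (u_min, v_min, u_max, v_max) in pixels; depth in meters (assumed
    constant over the region for simplicity).
    """
    u_min, v_min, u_max, v_max = box
    return (back_project(u_min, v_min, depth, fx, fy, cx, cy),
            back_project(u_max, v_max, depth, fx, fy, cx, cy))


corners = region_to_3d((320, 240, 420, 340), depth=2.0,
                       fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(corners)  # ((0.0, 0.0, 2.0), (0.4, 0.4, 2.0))
```

A real device would more likely use per-pixel depth and the full intrinsic matrix, including distortion terms, rather than one depth value per region.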
[0170] Please see Figure 3, which is a diagram of an embodiment of an electronic device provided in accordance with the present invention. As shown in Figure 3, this embodiment of the invention provides an electronic device 300, including a memory 310, a processor 320, and a computer program 311 stored in the memory 310 and executable on the processor 320. When the processor 320 executes the computer program 311, it performs the following steps:
[0171] The system acquires continuous environmental images of the user's surroundings using an image sensor, and captures the user's eye feature data using a gaze-tracking sensor; the eye feature data includes the user's eye rotation angle, pupil center position, and blink frequency;
[0172] Detection of continuous environmental images based on edge contours yields initial candidate regions for physical objects. Correlation analysis is then performed on rotation angle and pupil center position based on spatial location to determine the three-dimensional spatial coordinates of the user's gaze point in the environment at each instant.
[0173] Spatial transformation is performed based on the imaging parameters of the image sensor and the pixel coordinates of each initial candidate object region to obtain a three-dimensional spatial region. Then, based on the three-dimensional spatial coordinates of the user's gaze point at each instant, trajectory fitting is performed in the time dimension to obtain the user's gaze movement trajectory.
[0174] Based on blink frequency, gaze movement trajectory and the three-dimensional spatial region of each initial object candidate region, the target object candidate region is determined by matching, and the object feature is extracted from the target object candidate region to obtain the first object feature of the object to be identified.
[0175] Based on the first physical object feature and the second physical object feature of the preset AI big data, feature matching is performed to obtain the physical object recognition result.
[0176] Please see Figure 4, which is a diagram of an embodiment of a computer-readable storage medium provided in accordance with the present invention. As shown in Figure 4, this embodiment provides a computer-readable storage medium 400 on which a computer program 311 is stored. When the computer program 311 is executed by a processor, it performs the following steps:
[0177] The system acquires continuous environmental images of the user's surroundings using an image sensor, and captures the user's eye feature data using a gaze-tracking sensor; the eye feature data includes the user's eye rotation angle, pupil center position, and blink frequency;
[0178] Detection of continuous environmental images based on edge contours yields initial candidate regions for physical objects. Correlation analysis is then performed on rotation angle and pupil center position based on spatial location to determine the three-dimensional spatial coordinates of the user's gaze point in the environment at each instant.
[0179] Spatial transformation is performed based on the imaging parameters of the image sensor and the pixel coordinates of each initial candidate object region to obtain a three-dimensional spatial region. Then, based on the three-dimensional spatial coordinates of the user's gaze point at each instant, trajectory fitting is performed in the time dimension to obtain the user's gaze movement trajectory.
[0180] Based on blink frequency, gaze movement trajectory and the three-dimensional spatial region of each initial object candidate region, the target object candidate region is determined by matching, and the object feature is extracted from the target object candidate region to obtain the first object feature of the object to be identified.
[0181] Based on the first physical object feature and the second physical object feature of the preset AI big data, feature matching is performed to obtain the physical object recognition result.
[0182] In another aspect, the present invention also provides a computer program product, which includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer is able to execute the physical object recognition method based on AI big data provided above, the method comprising:
[0183] The system acquires continuous environmental images of the user's surroundings using an image sensor, and captures the user's eye feature data using a gaze-tracking sensor; the eye feature data includes the user's eye rotation angle, pupil center position, and blink frequency;
[0184] Detection of continuous environmental images based on edge contours yields initial candidate regions for physical objects. Correlation analysis is then performed on rotation angle and pupil center position based on spatial location to determine the three-dimensional spatial coordinates of the user's gaze point in the environment at each instant.
[0185] Spatial transformation is performed based on the imaging parameters of the image sensor and the pixel coordinates of each initial candidate object region to obtain a three-dimensional spatial region. Then, based on the three-dimensional spatial coordinates of the user's gaze point at each instant, trajectory fitting is performed in the time dimension to obtain the user's gaze movement trajectory.
[0186] Based on blink frequency, gaze movement trajectory and the three-dimensional spatial region of each initial object candidate region, the target object candidate region is determined by matching, and the object feature is extracted from the target object candidate region to obtain the first object feature of the object to be identified.
[0187] Based on the first physical object feature and the second physical object feature of the preset AI big data, feature matching is performed to obtain the physical object recognition result.
[0188] The system embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0189] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0190] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A method for physical object recognition based on AI big data, characterized in that the method is applied to a smart wearable device, the smart wearable device includes an image sensor and a gaze-tracking sensor, and the method includes:
acquiring continuous environmental images of the user's surroundings with the image sensor, and capturing the user's eye feature data with the gaze-tracking sensor; the eye feature data includes the user's eyeball rotation angle, pupil center position, and blink frequency;
performing detection on the continuous environmental images based on edge contours to obtain initial physical object candidate regions, and performing correlation analysis on the rotation angle and the pupil center position based on spatial position to determine the three-dimensional spatial coordinates of the user's gaze point in the environment at each instant;
performing a spatial transformation based on the imaging parameters of the image sensor and the pixel coordinates of each initial physical object candidate region to obtain a three-dimensional spatial region, and performing trajectory fitting in the time dimension based on the three-dimensional spatial coordinates of the user's gaze point at each instant to obtain the user's line-of-sight movement trajectory;
determining the target object candidate region by matching based on the blink frequency, the line-of-sight movement trajectory, and the three-dimensional spatial region of each initial physical object candidate region, and extracting object features from the target object candidate region to obtain the first object feature of the object to be identified;
performing feature matching based on the first object feature and the second object feature of the preset AI big data to obtain the physical object recognition result of the object to be identified;
wherein determining the target object candidate region includes:
associating all points on the line-of-sight movement trajectory with the three-dimensional spatial region of each initial physical object candidate region according to the time dimension to obtain a spatiotemporal association set;
determining, based on the spatiotemporal association set, the spatial inclusion degree of each point on the line-of-sight movement trajectory with respect to each initial physical object candidate region;
aligning, according to time, the spatial inclusion degree with the blink frequency temporal distribution features corresponding to the blink frequency at each blink occurrence time to obtain the trajectory blink correlation matrix;
determining the target object candidate region by matching based on the trajectory blink correlation matrix and the three-dimensional spatial region of each initial physical object candidate region;
wherein determining the target object candidate region by matching based on the trajectory blink correlation matrix and the three-dimensional spatial region of each initial physical object candidate region includes:
determining, based on the trajectory blink correlation matrix, the dwell characteristic parameters of each initial physical object candidate region, namely the time percentage covered by the line-of-sight movement trajectory and the average blink frequency;
determining the matching degree of each initial physical object candidate region based on its dwell characteristic parameters and the movement rate features of the line-of-sight movement trajectory within its three-dimensional spatial region;
selecting, based on the matching degree of each initial physical object candidate region, the initial physical object candidate regions whose matching degree exceeds the critical threshold to obtain high-frequency attention regions;
determining the target object candidate region based on the trajectory dwell time and peak blink frequency of each high-frequency attention region.
2. The object recognition method based on AI big data according to claim 1, characterized in that, The process of fitting the user's gaze trajectory in the time dimension based on the three-dimensional spatial coordinates of the user's gaze point at each instant includes: Arrange the three-dimensional spatial coordinates of the points in which the user's gaze points at consecutive time points in chronological order to obtain a spatiotemporal coordinate sequence matrix; Based on the spatiotemporal coordinate sequence matrix, the change in position of the line-of-sight pointing point in the environmental absolute coordinate system between two adjacent time points is determined to obtain the spatial displacement vector; the environmental absolute coordinate system takes three non-collinear rigid feature points in the environment as reference points, the first rigid feature point as the origin, the direction from the second rigid feature point to the third rigid feature point as the positive direction of the horizontal axis, the plane perpendicular to the horizontal axis from the first rigid feature point, the direction from the first rigid feature point to the third rigid feature point as the positive direction of the vertical axis, and the direction pointed to by rotating the positive direction of the horizontal axis 90° around the positive direction of the vertical axis as the positive direction of the vertical axis; Based on the spatial displacement vector and the time interval between two adjacent line-of-sight pointing points, the velocity vector of the line-of-sight movement is determined, and the spatial displacement vector and the velocity vector are correlated in chronological order to obtain the displacement-velocity spatiotemporal correlation matrix. The line-of-sight movement trajectory is obtained by fitting the trajectory based on the displacement rate spatiotemporal correlation matrix.
3. The method for physical object recognition based on AI big data according to claim 2, characterized in that, Trajectory fitting is performed based on the displacement rate spatiotemporal correlation matrix to obtain the line-of-sight movement trajectory, including: Based on the angle between adjacent spatial displacement vectors and the magnitude of adjacent velocity vectors in the displacement rate spatiotemporal correlation matrix, the turning characteristic parameters of the degree of directional change and velocity of two adjacent line-of-sight trajectories are determined; the turning characteristic parameters characterize the degree of turning of the line-of-sight trajectory. Based on the steering feature parameters, the line-of-sight trajectory is divided into different continuous segments of line-of-sight sub-trajectories, and the spatiotemporal boundaries of each line-of-sight sub-trajectory segment are determined. Based on the spatiotemporal boundaries of each line-of-sight sub-trajectory segment, a parameterized equation for each line-of-sight sub-trajectory is constructed within its time interval; the parameterized equation represents the coordinates of the line-of-sight pointing point at any time within each line-of-sight sub-trajectory segment. The parametric equations of each line-of-sight sub-trajectory are fitted in chronological order to obtain the line-of-sight movement trajectory.
4. The object recognition method based on AI big data according to any one of claims 1 to 3, characterized in that, The correlation analysis based on spatial location of the rotation angle and the pupil center position to determine the three-dimensional spatial coordinates of the user's gaze point in the environment at each instant includes: The horizontal and vertical rotation angles of the eyeball rotation angles are converted into unit vectors of the line of sight direction in the head coordinate system. The head coordinate system takes the midpoint of the line connecting the centers of the pupils of both eyes as the origin, the direction pointing to the center of the right pupil as the positive direction of the horizontal axis, and the positive direction of the vertical axis in the plane perpendicular to the horizontal axis and passing through the origin, which is consistent with the line of sight. The positive direction of the vertical axis is determined by the cross product of the positive direction of the horizontal axis and the positive direction of the vertical axis. Based on the intrinsic parameter matrix of the image sensor, the two-dimensional image coordinates of the pupil center position are converted into three-dimensional coordinates of the pupil center in the head coordinate system; Based on the unit vector of the gaze direction and the three-dimensional coordinates of the pupil center, the initial gaze direction in the head coordinate system is determined. Based on the initial line of sight, the line of sight is analyzed to determine the three-dimensional spatial coordinates of the user's line of sight in the environment at each instant.
5. The object recognition method based on AI big data according to claim 4, characterized in that, The step of analyzing the gaze point based on the initial gaze direction to determine the three-dimensional spatial coordinates of the user's gaze point in the environment at each instant includes: Based on the first coordinate system transformation relationship between the head coordinate system and the image sensor coordinate system, the initial gaze direction is transformed to obtain the target gaze direction in the image sensor coordinate system; the image sensor coordinate system is with the optical center of the image sensor as the origin, the direction pointing horizontally to the right of the image sensor imaging plane as the positive direction of the horizontal axis, and the direction pointing vertically upwards of the image sensor imaging plane as the positive direction of the vertical axis, and the positive direction of the vertical axis is determined by the cross product of the positive direction of the horizontal axis and the positive direction of the vertical axis. Spatial intersection analysis is performed based on the target line-of-sight direction and the depth information of each candidate object region in the continuous environmental image to obtain the coordinates of the intersection point in the image sensor coordinate system; the depth information is coordinate-related to the image sensor coordinate system. Based on the second coordinate system transformation relationship between the absolute environmental coordinate system and the image sensor coordinate system, the coordinates of the intersection point are transformed to obtain the three-dimensional spatial coordinates of the user's gaze point in the environment at each instant.
6. A smart wearable device, characterized in that it is applied to the object recognition method based on AI big data as described in any one of claims 1 to 5; the smart wearable device integrates a data acquisition unit and an object recognition unit, the data acquisition unit including an image sensor and a gaze tracking sensor;
the data acquisition unit is used to acquire continuous environmental images of the user's surrounding environment based on the image sensor, and to capture the user's eye feature data based on the gaze tracking sensor; the eye feature data includes the user's eyeball rotation angle, pupil center position, and blink frequency;
the object recognition unit is used for:
performing detection on the continuous environmental images based on edge contours to obtain initial physical object candidate regions, and performing correlation analysis on the rotation angle and the pupil center position based on spatial position to determine the three-dimensional spatial coordinates of the user's gaze point in the environment at each instant;
performing a spatial transformation based on the imaging parameters of the image sensor and the pixel coordinates of each initial physical object candidate region to obtain a three-dimensional spatial region, and performing trajectory fitting in the time dimension based on the three-dimensional spatial coordinates of the user's gaze point at each instant to obtain the user's line-of-sight movement trajectory;
determining the target object candidate region by matching based on the blink frequency, the line-of-sight movement trajectory, and the three-dimensional spatial region of each initial physical object candidate region, and extracting object features from the target object candidate region to obtain the first object feature of the object to be identified;
performing feature matching based on the first object feature and the second object feature of the preset AI big data to obtain the physical object recognition result of the object to be identified.
7. An electronic device, comprising: a memory for storing a computer software program; and a processor for reading and executing the computer software program, characterized in that, when the processor executes the computer software program, it implements the object recognition method based on AI big data as described in any one of claims 1 to 5.
8. A non-transitory computer-readable storage medium in which a computer software program is stored, characterized in that, when the computer software program is executed by a processor, it implements the object recognition method based on AI big data as described in any one of claims 1 to 5.