Hand key point recognition method, device, equipment, and storage medium

By aligning and cropping hand images, and combining this with a loss function to guide model training, the problem of high computational cost in hand keypoint recognition was solved, achieving efficient and accurate recognition on mobile devices.

CN115937970B · Active · Publication Date: 2026-06-12 · SHENZHEN ZEGO TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SHENZHEN ZEGO TECH CO LTD
Filing Date: 2022-11-10
Publication Date: 2026-06-12

AI Technical Summary

Technical Problem

Existing hand key point recognition methods involve large computational loads, making them unfriendly for mobile devices and lacking in accuracy and robustness in real-time dynamic scenarios.

Method used

Two coordinates are obtained from the previous frame image, and a rotation matrix constructed from them is used to align the current frame image. Hand coordinates are then obtained from the aligned image, the hand region is cropped, and the cropped image is detected using a preset hand key point model. Model training is guided by a joint length constraint loss function and a joint similarity loss function, thereby reducing computational load and model overhead.

Benefits of technology

It reduces the computational load and recognition time of hand key point recognition, improves the accuracy and robustness of recognition, and is suitable for mobile applications.


Smart Images

  • Figure CN115937970B_ABST

Abstract

The present application relates to the technical field of hand key point recognition, and discloses a hand key point recognition method, device, equipment and storage medium. The hand key point recognition method comprises: acquiring a first coordinate and a second coordinate of a previous frame image; performing point position alignment processing on a current frame image based on the first coordinate and the second coordinate to obtain an aligned image; acquiring a hand coordinate based on the aligned image, and obtaining a to-be-analyzed image by cutting based on the hand coordinate; detecting the to-be-analyzed image using a preset hand key point model to obtain a first key point coordinate and an image score; and judging whether the to-be-analyzed image is a hand image based on the image score. The present application identifies the hand key point based on the previous frame image, improves the hand tracking accuracy, and reduces the calculation amount and identification time consumption.

Description

Technical Field

[0001] This invention relates to the field of hand key point recognition, and more particularly to a hand key point recognition method, apparatus, device, and storage medium.

Background Technology

[0002] With the continuous maturation of artificial intelligence technology, more and more application scenarios are beginning to support human-computer interaction, and gesture interaction is a common human-computer interaction method. Gesture recognition has therefore gradually become a research hotspot, with wide applications in autonomous driving, game control, robot design, and intelligent teaching instruments. The key to gesture recognition lies in recognizing the hand key points. Once the coordinates of the hand key points have been identified by a pre-trained hand key point recognition model, the gesture can be confirmed.

[0003] Because hand key points undergo large movements, motion blur occurs during hand movement in real-time dynamic scenarios, leading to low accuracy and weak robustness when hand key point recognition algorithms are applied in practice. To improve recognition accuracy, the common approach is to run a hand position detection model in real time to obtain the precise hand position, and then use a hand key point recognition model to identify the key points. This increases model overhead, and given the limited computing power of mobile devices, this approach is unsuitable for mobile applications.

Summary of the Invention

[0004] The main objective of this invention is to provide a method, apparatus, device, and storage medium for hand key point recognition, aiming to solve the technical problem of high computational load in existing hand key point recognition methods.

[0005] The first aspect of this invention provides a method for recognizing key points of a hand, comprising:

[0006] S1. Obtain the first and second coordinates of the previous frame image;

[0007] S2. Based on the first coordinate and the second coordinate, perform point alignment processing on the current frame image to obtain an aligned image;

[0008] S3. Based on the aligned image, obtain the hand coordinates, and based on the hand coordinates, extract the image to be analyzed;

[0009] S4. Use a preset hand key point model to detect the image to be analyzed, and obtain the coordinates of the first key point and the image score;

[0010] S5. Based on the image score, determine whether the image to be analyzed is a hand image.

[0011] Optionally, in a first implementation of the first aspect of the present invention, the hand key point recognition method further includes:

[0012] When the current frame image is the first frame image, the current frame image is detected using a preset limb key point model to obtain the first coordinate and the second coordinate of the current frame image.

[0013] Optionally, in a second implementation of the first aspect of the present invention, after determining whether the image to be analyzed is a hand image based on the image score, the method further includes:

[0014] If the image to be analyzed is not a hand image, the preset limb key point model is used to detect the current frame image to obtain the first coordinate and second coordinate of the current frame image, and steps S2-S5 are repeated.

[0015] Optionally, in a third implementation of the first aspect of the present invention, after determining whether the image to be analyzed is a hand image based on the image score, the method further includes:

[0016] If the image to be analyzed is a hand image, then based on the first key point coordinates, the second key point coordinates of the hand on the current frame image are obtained.

[0017] Optionally, in a fourth implementation of the first aspect of the present invention, the step of performing point alignment processing on the current frame image based on the first coordinates and the second coordinates to obtain an aligned image includes:

[0018] Obtain the angle between the line connecting the first and second coordinate points and the vertical direction;

[0019] Calculate the coordinates of the center point between the first coordinate and the second coordinate;

[0020] Construct a rotation matrix based on the included angle and the coordinates of the center point;

[0021] The aligned image is calculated based on the rotation matrix and the current frame image;

[0022] The rotation matrix is as follows:

[0023]

[0024] Where M represents the rotation matrix, θ represents the angle between the line connecting the first and second coordinate points and the vertical direction, $c_x$ represents the abscissa of the center point, and $c_y$ represents the ordinate of the center point.
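The formula of paragraph [0023] is not reproduced in this text. One common way to write a rotation by θ about a center point $(c_x, c_y)$, given here only as an illustrative assumption and not as the claimed formula, is the 2×3 affine matrix below. The sign of θ depends on the image coordinate convention in use.

```latex
M =
\begin{bmatrix}
\cos\theta & \sin\theta & (1-\cos\theta)\,c_x - \sin\theta\,c_y \\
-\sin\theta & \cos\theta & \sin\theta\,c_x + (1-\cos\theta)\,c_y
\end{bmatrix}
```

With this form the center point maps to itself, since $\cos\theta\,c_x + \sin\theta\,c_y + (1-\cos\theta)\,c_x - \sin\theta\,c_y = c_x$, and likewise for $c_y$.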

[0025] Optionally, in the fifth implementation of the first aspect of the present invention, there are a total of 21 key points and 20 joints in the hand, and the joint length constraint loss function of the preset hand key point model is as follows:

[0026]

[0027] Where $E_{len}$ represents the joint length constraint loss function, T represents the joint tree, j[0] represents the first hand key point of the j-th joint, j[1] represents the second hand key point of the j-th joint, $pre_{j[0]}$ represents the coordinates of key point j[0] predicted by the model, $pre_{j[1]}$ represents the coordinates of key point j[1] predicted by the model, $gt_{j[0]}$ represents the actual coordinates of hand key point j[0], and $gt_{j[1]}$ represents the actual coordinates of hand key point j[1].
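The formula of paragraph [0026] is not reproduced in this text. One form that is consistent with the symbol definitions above, offered purely as an assumption (the choice of an absolute difference and of the Euclidean norm is not confirmed by the source), compares the predicted length of each joint with its actual length:

```latex
E_{len} = \sum_{j \in T}
\Bigl|\, \lVert pre_{j[0]} - pre_{j[1]} \rVert_2 \;-\; \lVert gt_{j[0]} - gt_{j[1]} \rVert_2 \,\Bigr|
```

Under this reading the loss is zero whenever every predicted joint has the same length as the labeled joint, regardless of where the hand sits in the image.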

[0028] Optionally, in the sixth implementation of the first aspect of the present invention, the joint pre-similarity loss function of the pre-set hand key point model is as follows:

[0029]

[0030] Where V represents the joint vectors, $v_i$ = M[i]1 − M[i]0, and M represents the vector list corresponding to the joint tree.
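The formula of paragraph [0029] is not reproduced in this text. A plausible reading, stated here as an assumption, is a cosine-similarity penalty between the predicted and labeled vector of each joint. The name $E_{sim}$ and the "one minus similarity" form are hypothetical; $pre_{v_i}$ and $gt_{v_i}$ denote the vector $v_i$ built from predicted and actual key point coordinates respectively.

```latex
\operatorname{sim}(pre_{v_i}, gt_{v_i}) =
\frac{pre_{v_i} \cdot gt_{v_i}}{\lVert pre_{v_i} \rVert_2 \, \lVert gt_{v_i} \rVert_2},
\qquad
E_{sim} = \sum_{i=0}^{19} \bigl( 1 - \operatorname{sim}(pre_{v_i}, gt_{v_i}) \bigr)
```

Each term is zero when the predicted joint points in the same direction as the labeled joint and grows as the two directions diverge.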

[0031] A second aspect of the present invention provides a hand key point recognition device, comprising:

[0032] The acquisition module is used to acquire the first and second coordinates of the previous frame image;

[0033] The alignment module is used to perform point alignment processing on the current frame image based on the first coordinate and the second coordinate to obtain an aligned image;

[0034] The screenshot module is used to obtain the hand coordinates based on the aligned image, and to capture the image to be analyzed based on the hand coordinates;

[0035] The hand detection module is used to detect the image to be analyzed using a preset hand key point model, and obtain the coordinates of the first key point and the image score;

[0036] The judgment module is used to determine whether the image to be analyzed is a hand image based on the image score.

[0037] Optionally, in a first implementation of the second aspect of the present invention, the hand key point recognition device further includes:

[0038] The limb detection module is used to detect the current frame image using a preset limb key point model when the current frame image is the first frame image, and obtain the first coordinate and second coordinate of the current frame image.

[0039] Optionally, in a second implementation of the second aspect of the present invention, the limb detection module is further configured to:

[0040] If the image to be analyzed is not a hand image, the preset limb key point model is used to detect the current frame image to obtain the first coordinate and the second coordinate of the current frame image, and the first coordinate and the second coordinate are input into the acquisition module.

[0041] Optionally, in a third implementation of the second aspect of the present invention, the hand key point recognition device further includes:

[0042] The calculation module is used to obtain the coordinates of a second key point of the hand on the current frame image based on the coordinates of the first key point if the image to be analyzed is a hand image.

[0043] Optionally, in a fourth implementation of the second aspect of the present invention, the alignment module is specifically used for:

[0044] Obtain the angle between the line connecting the first and second coordinate points and the vertical direction;

[0045] Calculate the coordinates of the center point between the first coordinate and the second coordinate;

[0046] Construct a rotation matrix based on the included angle and the coordinates of the center point;

[0047] The aligned image is calculated based on the rotation matrix and the current frame image;

[0048] The rotation matrix is as follows:

[0049]

[0050] Where M represents the rotation matrix, θ represents the angle between the line connecting the first and second coordinate points and the vertical direction, $c_x$ represents the abscissa of the center point, and $c_y$ represents the ordinate of the center point.

[0051] Optionally, in the fifth implementation of the second aspect of the present invention, there are a total of 21 key points and 20 joints in the hand, and the joint length constraint loss function of the preset hand key point model is as follows:

[0052]

[0053] Where $E_{len}$ represents the joint length constraint loss function, T represents the joint tree, j[0] represents the first hand key point of the j-th joint, j[1] represents the second hand key point of the j-th joint, $pre_{j[0]}$ represents the coordinates of key point j[0] predicted by the model, $pre_{j[1]}$ represents the coordinates of key point j[1] predicted by the model, $gt_{j[0]}$ represents the actual coordinates of hand key point j[0], and $gt_{j[1]}$ represents the actual coordinates of hand key point j[1].

[0054] Optionally, in the sixth implementation of the second aspect of the present invention, the joint pre-similarity loss function of the pre-set hand key point model is as follows:

[0055]

[0056] Where V represents the joint vectors, $v_i$ = M[i]1 − M[i]0, where i is 0 or 1, and M represents the vector list corresponding to the joint tree.

[0057] A third aspect of the present invention provides an electronic device, comprising: a memory and at least one processor, wherein the memory stores instructions; the at least one processor invokes the instructions in the memory to cause the electronic device to perform the above-described hand key point recognition method.

[0058] A fourth aspect of the present invention provides a computer-readable storage medium storing instructions that, when executed on a computer, cause the computer to perform the above-described hand key point recognition method.

[0059] The technical solution provided by this invention involves obtaining the first and second coordinates of the previous frame image; performing point alignment processing on the current frame image based on the first and second coordinates to obtain an aligned image; obtaining hand coordinates based on the aligned image, and cropping the image to be analyzed based on the hand coordinates; using a preset hand key point model to detect the image to be analyzed, obtaining the first key point coordinates and image score; and determining whether the image to be analyzed is a hand image based on the image score. This invention recognizes hand key points based on the previous frame image, reducing model overhead and computational load, and decreasing recognition time, making it suitable for mobile applications.

Attached Figure Description

[0060] Figure 1 is a schematic diagram of an embodiment of the hand key point recognition method in this invention;

[0061] Figure 2 is a schematic diagram of another embodiment of the hand key point recognition method in this invention;

[0062] Figure 3 is a schematic diagram of one embodiment of the hand key point recognition device according to the present invention;

[0063] Figure 4 is a schematic diagram of another embodiment of the hand key point recognition device in this invention;

[0064] Figure 5 is a schematic diagram of one embodiment of the electronic device in this invention.

Detailed Implementation

[0065] This invention provides a method, apparatus, device, and storage medium for hand key point recognition. It recognizes hand key points based on the previous frame image, reducing model overhead and computational load, and reducing recognition time. It is suitable for mobile applications.

[0066] The terms “first,” “second,” “third,” “fourth,” etc. (if present) in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms “comprising” or “having,” and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0067] For ease of understanding, the specific process of the embodiments of the present invention is described below. Please refer to Figure 1. One embodiment of the hand key point recognition method in this invention includes:

[0068] 101. Obtain the first and second coordinates of the previous frame image;

[0069] It is understood that the executing entity of this invention can be a hand key point recognition device, a terminal, or a server; no specific limitation is made here. This embodiment of the invention will be described using a server as an example.

[0070] In this embodiment, the first coordinate and the second coordinate are the coordinates of two of the hand key points identified by a preset hand key point recognition model. The approximate position of the hand can be determined from these two coordinates. Typically, the hand key points are the 21 bone nodes of the hand, and the first coordinate and the second coordinate are chosen as the coordinates of the palm center point and the wrist point.

[0071] 102. Based on the first coordinate and the second coordinate, perform point alignment processing on the current frame image to obtain an aligned image;

[0072] Optionally, in one embodiment, step 102 above includes:

[0073] Obtain the angle between the line connecting the first and second coordinate points and the vertical direction;

[0074] Calculate the coordinates of the center point between the first coordinate and the second coordinate;

[0075] Construct a rotation matrix based on the included angle and the coordinates of the center point;

[0076] The aligned image is calculated based on the rotation matrix and the current frame image;

[0077] The rotation matrix is as follows:

[0078]

[0079] Where M represents the rotation matrix, θ represents the angle between the line connecting the first and second coordinate points and the vertical direction, $c_x$ represents the abscissa of the center point, and $c_y$ represents the ordinate of the center point.

[0080] Specifically, the current frame image is rotated using a rotation matrix, and the angle between the line connecting the first and second coordinate points in the resulting aligned image and the vertical direction is 0.
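The alignment step can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the patented formula: the 2×3 matrix uses the widely used "rotate about a center" affine form, the sign of the angle is chosen to match that form, and the function names `rotation_matrix`, `alignment_matrix`, and `apply_affine` are hypothetical.

```python
import numpy as np

def rotation_matrix(theta: float, cx: float, cy: float) -> np.ndarray:
    """2x3 affine matrix that rotates by theta (radians) about (cx, cy).

    Assumed form; the patent's own matrix is not reproduced in the text.
    """
    c, s = np.cos(theta), np.sin(theta)
    return np.array([
        [c, s, (1 - c) * cx - s * cy],
        [-s, c, s * cx + (1 - c) * cy],
    ])

def alignment_matrix(p1, p2) -> np.ndarray:
    """Matrix that makes the line p1 -> p2 vertical, rotating about its midpoint."""
    (x1, y1), (x2, y2) = p1, p2
    # Signed angle between the line and the vertical axis; the sign is
    # picked so that rotation_matrix() cancels the tilt.
    theta = np.arctan2(x1 - x2, y2 - y1)
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    return rotation_matrix(theta, cx, cy)

def apply_affine(M: np.ndarray, points) -> np.ndarray:
    """Apply a 2x3 affine matrix to an (N, 2) array of points."""
    pts = np.asarray(points, dtype=float)
    return pts @ M[:, :2].T + M[:, 2]
```

After the transform, the two anchor points share the same x-coordinate, which corresponds to the included angle of 0 described above. An image-warping routine such as OpenCV's `cv2.warpAffine` could then apply the same matrix to the frame itself.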

[0081] 103. Based on the aligned image, obtain the hand coordinates, and based on the hand coordinates, extract the image to be analyzed;

[0082] In this embodiment, a hand recognition model is used to identify the aligned image and obtain the hand coordinates. The hand recognition model used is not limited.

[0083] Optionally, the hand coordinates can be represented by the coordinates of the upper left and lower right corners of the hand area. For example, if the upper left corner coordinate is (l,t) and the lower right corner coordinate is (r,b), then the hand coordinates can be represented by [l,t,r,b].

[0084] In this embodiment, the aligned image is cropped based on the hand coordinates to obtain the image to be analyzed.
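Cropping by the [l,t,r,b] representation can be sketched as below. The clamping to the image bounds, the returned offset, and the function name `crop_hand` are assumptions added for illustration; the source only states that the aligned image is cropped based on the hand coordinates.

```python
import numpy as np

def crop_hand(image: np.ndarray, box):
    """Crop an (H, W) or (H, W, C) image to box = [l, t, r, b].

    Returns the crop and its top-left offset (l, t), which is needed later
    to map key points from crop coordinates back to the aligned image.
    """
    h, w = image.shape[:2]
    l, t, r, b = (int(round(v)) for v in box)
    # Clamp the box to the image so a partly off-screen hand still yields a crop.
    l, r = max(0, min(l, w)), max(0, min(r, w))
    t, b = max(0, min(t, h)), max(0, min(b, h))
    if r <= l or b <= t:
        raise ValueError("hand box does not overlap the image")
    return image[t:b, l:r], (l, t)
```

Returning the offset alongside the crop keeps the later coordinate restoration a simple addition.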

[0085] 104. Use a preset hand key point model to detect the image to be analyzed, and obtain the coordinates of the first key point and the image score;

[0086] Specifically, the hand key point model is used to identify the key points of the hand; the specific model used is not limited. The first key point coordinates are the coordinates of the 21 hand key points. The hand key point model has a self-tracking branch, which estimates the probability that the image to be analyzed is a hand image and outputs the result as the image score.

[0087] 105. Based on the image score, determine whether the image to be analyzed is a hand image.

[0088] Specifically, if the image score is greater than a preset threshold, the image to be analyzed is considered a hand image; otherwise, it is considered not to be a hand image. If the image to be analyzed is determined to be a hand image, the 21 hand key points identified by the hand key point model are relatively accurate and reliable; otherwise, the identified key point coordinates are unreliable.

[0089] Furthermore, if the image to be analyzed is not a hand image, a preset limb key point model is used to detect the current frame image to obtain the first and second coordinates of the current frame image, and steps 102-105 are repeated. The limb key point model detects limb key points, which include the hand key points corresponding to the first and second coordinates.

[0090] Furthermore, if the image to be analyzed is a hand image, then based on the coordinates of the first keypoint, the coordinates of the second keypoint of the hand on the current frame image are obtained. Specifically, based on the coordinates of the first keypoint of the image to be analyzed, the coordinates of 21 hand keypoints on the aligned image are calculated, and then by inverse rotation, the keypoints are restored to the original image (the current frame image), thus obtaining the coordinates of the hand keypoints in the original image. The first and second coordinates of these hand keypoint coordinates are used as the hand position in the next frame image.
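The restoration of key points to the original frame can be sketched as two steps: add the crop offset to return to aligned-image coordinates, then invert the alignment transform. The sketch assumes the alignment is a 2×3 affine matrix M acting as x' = A·x + t; the function name `to_original_frame` and its parameters are hypothetical.

```python
import numpy as np

def to_original_frame(kps_crop, crop_offset, M: np.ndarray) -> np.ndarray:
    """Map (N, 2) key points from crop coordinates to the current frame.

    kps_crop    -- key points predicted on the cropped image
    crop_offset -- (l, t) of the crop inside the aligned image
    M           -- 2x3 affine matrix that produced the aligned image
    """
    # Step 1: crop coordinates -> aligned-image coordinates.
    kps_aligned = np.asarray(kps_crop, dtype=float) + np.asarray(crop_offset, dtype=float)
    # Step 2: undo the alignment, x = A^{-1} (x' - t), written for row vectors.
    A, t = M[:, :2], M[:, 2]
    return (kps_aligned - t) @ np.linalg.inv(A).T
```

The two restored coordinates that serve as the first and second coordinates can then be stored and reused for the next frame.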

[0091] In this embodiment of the invention, the first and second coordinates of the previous frame image are obtained; based on the first and second coordinates, the current frame image is aligned to obtain an aligned image; based on the aligned image, the hand coordinates are obtained, and the image to be analyzed is cropped based on the hand coordinates; a preset hand keypoint model is used to detect the image to be analyzed, obtaining the first keypoint coordinates and image score; based on the image score, it is determined whether the image to be analyzed is a hand image. This invention recognizes hand keypoints based on the previous frame image, reducing model overhead and computational load, and reducing recognition time, making it suitable for mobile applications.

[0092] Please see Figure 2. Another embodiment of the hand key point recognition method in this invention includes:

[0093] 201. When the current frame image is the first frame image, the current frame image is detected using a preset limb key point model to obtain the first coordinate and the second coordinate of the current frame image;

[0094] Specifically, in real-time hand tracking and detection, video stream data is continuously input. Limb keypoint detection is performed on the first frame of the video stream using the limb keypoint model described in the above embodiment. The first and second coordinates of the current frame (first frame) are obtained. The limb keypoint model is applied only once, and subsequent hand tracking no longer relies on it, effectively reducing computational load and actual usage time, making it suitable for mobile devices.

[0095] 202. When the current frame image is not the first frame image, obtain the first coordinate and the second coordinate of the previous frame image;

[0096] Specifically, when the current frame image is not the first frame image, the first and second coordinates of the hand key points in the previous frame image are obtained.

[0097] 203. Based on the first coordinate and the second coordinate, perform point alignment processing on the current frame image to obtain an aligned image;

[0098] 204. Based on the aligned image, obtain the hand coordinates, and based on the hand coordinates, extract the image to be analyzed;

[0099] 205. Use a preset hand key point model to detect the image to be analyzed, and obtain the coordinates of the first key point and the image score;

[0100] Optionally, in one embodiment, there are 21 key points and 20 joints in the hand, and the joint length constraint loss function of the preset hand key point model is as follows:

[0101]

[0102] Where $E_{len}$ represents the joint length constraint loss function, T represents the joint tree, j[0] represents the first hand key point of the j-th joint, j[1] represents the second hand key point of the j-th joint, $pre_{j[0]}$ represents the coordinates of key point j[0] predicted by the model, $pre_{j[1]}$ represents the coordinates of key point j[1] predicted by the model, $gt_{j[0]}$ represents the actual coordinates of hand key point j[0], and $gt_{j[1]}$ represents the actual coordinates of hand key point j[1].

[0103] Specifically, each joint in the joint tree corresponds to an index. A joint is represented by two hand key points, i.e., the line connecting the two hand key points. The joint tree T built from the 21 hand key points is {"0":[0,1],"1":[1,2],"2":[2,3],"3":[3,4],"4":[0,5],"5":[5,6],"6":[6,7],"7":[7,8],"8":[0,9],"9":[9,10],"10":[10,11],"11":[11,12],"12":[0,13],"13":[13,14],"14":[14,15],"15":[15,16],"16":[0,17],"17":[17,18],"18":[18,19],"19":[19,20]}, where "i" is the index, representing the i-th joint, and [a,b] represents the joint composed of key point a and key point b.
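The joint tree can be written directly as a Python dictionary and sanity-checked. The dictionary reproduces T as listed above; the helper `check_joint_tree` and its checks are illustrative additions, not part of the source.

```python
# Joint tree T from paragraph [0103]: joint index -> [first key point, second key point].
JOINT_TREE = {
    "0": [0, 1], "1": [1, 2], "2": [2, 3], "3": [3, 4],
    "4": [0, 5], "5": [5, 6], "6": [6, 7], "7": [7, 8],
    "8": [0, 9], "9": [9, 10], "10": [10, 11], "11": [11, 12],
    "12": [0, 13], "13": [13, 14], "14": [14, 15], "15": [15, 16],
    "16": [0, 17], "17": [17, 18], "18": [18, 19], "19": [19, 20],
}

def check_joint_tree(tree, num_keypoints: int = 21) -> bool:
    """Return True if the joints form a tree rooted at key point 0."""
    parents = [pair[0] for pair in tree.values()]
    children = [pair[1] for pair in tree.values()]
    return (
        len(tree) == num_keypoints - 1                      # 20 joints for 21 points
        and set(parents) | set(children) == set(range(num_keypoints))
        and len(set(children)) == len(children)             # one parent per key point
        and 0 not in children                               # key point 0 is the root
    )
```

Read this way, key point 0 is shared by five chains of four joints each, which matches the count of 21 key points and 20 joints.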

[0104] Specifically, minimizing the joint length constraint loss function supplies prior information about the lengths between the predicted key points, which helps the model converge to more accurate positions.
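A joint length constraint of this kind can be sketched with NumPy as below. Because the formula itself is not reproduced in the text, the exact form is an assumption: the sketch sums the absolute difference between predicted and actual joint lengths. The names `JOINTS` and `joint_length_loss` are hypothetical; `JOINTS` regenerates the 20 key point pairs of the joint tree T.

```python
import numpy as np

# The 20 (first, second) key point pairs of joint tree T, in index order.
JOINTS = [(0 if k == 0 else 4 * f + k, 4 * f + k + 1)
          for f in range(5) for k in range(4)]

def joint_length_loss(pre, gt, joints=JOINTS) -> float:
    """Assumed form: sum over joints of |predicted length - actual length|.

    pre, gt -- (21, D) arrays of predicted and actual key point coordinates
    """
    pre = np.asarray(pre, dtype=float)
    gt = np.asarray(gt, dtype=float)
    a = np.array([j[0] for j in joints])
    b = np.array([j[1] for j in joints])
    pre_len = np.linalg.norm(pre[a] - pre[b], axis=1)
    gt_len = np.linalg.norm(gt[a] - gt[b], axis=1)
    return float(np.abs(pre_len - gt_len).sum())
```

In this form the loss does not change when the whole hand is translated, so it would complement a coordinate regression loss rather than replace it.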

[0105] Optionally, in one embodiment, the joint pre-similarity loss function of the pre-set hand keypoint model is as follows:

[0106]

[0107] Where V represents the joint vectors, $v_i$ = M[i]1 − M[i]0, where i is 0 or 1, and M represents the vector list corresponding to the joint tree.

[0108] Specifically, M = T. Using the key point indices in the vector list, a vector is constructed for each joint, giving V: {v0, v1, ..., v19}, where $v_i$ = M[i]1 − M[i]0, M[i]0 represents the first key point of joint T["i"], and M[i]1 represents the second key point of joint T["i"]. The direction of each predicted joint vector is constrained by comparing the similarity of the prediction vector ($pre_{v0}$) and the label vector ($gt_{v0}$ = $gt_{j[1]}$ − $gt_{j[0]}$) of the same joint.

[0109] The similarity is computed between the two vectors $pre_v$ and $gt_v$: the higher the similarity, the closer the predicted joint is to the labeled joint. In the extreme case where the predicted joint and the labeled joint coincide, the similarity is 1.
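A direction constraint of this kind can be sketched with cosine similarity, which equals 1 when two vectors point the same way. The use of cosine similarity, the "one minus similarity" penalty, the `eps` stabilizer, and the names `joint_vectors` and `joint_similarity_loss` are all assumptions for illustration; the source formula is not reproduced in the text.

```python
import numpy as np

# The 20 (first, second) key point pairs of joint tree T, in index order.
JOINTS = [(0 if k == 0 else 4 * f + k, 4 * f + k + 1)
          for f in range(5) for k in range(4)]

def joint_vectors(kps, joints=JOINTS) -> np.ndarray:
    """v_i = second key point - first key point, for every joint."""
    kps = np.asarray(kps, dtype=float)
    return np.stack([kps[b] - kps[a] for a, b in joints])

def joint_similarity_loss(pre, gt, joints=JOINTS, eps: float = 1e-8) -> float:
    """Assumed form: sum over joints of (1 - cosine similarity)."""
    v_pre = joint_vectors(pre, joints)
    v_gt = joint_vectors(gt, joints)
    dot = (v_pre * v_gt).sum(axis=1)
    norms = np.linalg.norm(v_pre, axis=1) * np.linalg.norm(v_gt, axis=1)
    cos = dot / (norms + eps)  # eps guards against zero-length joints
    return float((1.0 - cos).sum())
```

This penalty depends only on joint directions, so it is unaffected by uniform scaling or translation of the predicted hand; joint lengths are left to the length constraint.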

[0110] In this embodiment, because the image to be analyzed by the hand key point model is computed from the key points of the previous frame, there is an inherent inter-frame difference, so the cropped hand region is not very precise. This places high demands on the performance of the subsequent hand key point model. Adding the above two loss functions to guide network training supplies prior information on the lengths and angles between the 21 key points, which significantly improves the regression accuracy of the 21 key points.

[0111] 206. Based on the image score, determine whether the image to be analyzed is a hand image.

[0112] In this embodiment of the invention, when the current frame image is the first frame image, a preset limb keypoint model is used to detect the current frame image to obtain the first and second coordinates of the current frame image; when the current frame image is not the first frame image, the first and second coordinates of the previous frame image are obtained; based on the first and second coordinates, the current frame image is processed for point alignment to obtain an aligned image; based on the aligned image, the hand coordinates are obtained, and the image to be analyzed is cropped based on the hand coordinates; a preset hand keypoint model is used to detect the image to be analyzed to obtain the first keypoint coordinates and image score; based on the image score, it is determined whether the image to be analyzed is a hand image. This invention recognizes hand keypoints based on the previous frame image, reducing model overhead and computational load, reducing recognition time, and is suitable for mobile applications. At the same time, by adding two loss functions to guide the training of the hand keypoint model, the regression accuracy is improved, and the performance of the hand keypoint model is enhanced.
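The overall control flow of this embodiment can be sketched as a generator over video frames. The two callables are hypothetical stand-ins: `detect_limb` represents the limb key point model, and `detect_hand` bundles alignment, cropping, the hand key point model, and coordinate restoration. The default threshold of 0.5 and the single retry after a failed frame are assumptions, not values from the source.

```python
from typing import Any, Callable, Iterable, Iterator, Optional, Tuple

Point = Tuple[float, float]
Anchors = Tuple[Point, Point]  # first and second coordinates

def track_hand(
    frames: Iterable[Any],
    detect_limb: Callable[[Any], Anchors],
    detect_hand: Callable[[Any, Anchors], Tuple[Any, float, Anchors]],
    threshold: float = 0.5,
) -> Iterator[Optional[Any]]:
    """Yield hand key points per frame, or None when no hand is confirmed."""
    anchors: Optional[Anchors] = None
    for frame in frames:
        fresh = anchors is None
        if fresh:
            # First frame (or tracking was lost): run the limb model.
            anchors = detect_limb(frame)
        kps, score, next_anchors = detect_hand(frame, anchors)
        if score <= threshold and not fresh:
            # Anchors from the previous frame failed: re-detect and retry once.
            anchors = detect_limb(frame)
            kps, score, next_anchors = detect_hand(frame, anchors)
        if score > threshold:
            anchors = next_anchors  # reused as the hand position for the next frame
            yield kps
        else:
            anchors = None
            yield None
```

While tracking succeeds, only the hand key point model runs on each frame, which is where the reduction in computation comes from.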

[0113] The above describes the hand key point recognition method in the embodiments of the present invention. The following describes the hand key point recognition device in the embodiments of the present invention. Please refer to Figure 3. One embodiment of the hand key point recognition device in this invention includes:

[0114] The acquisition module 301 is used to acquire the first and second coordinates of the previous frame image;

[0115] Alignment module 302 is used to perform point alignment processing on the current frame image based on the first coordinate and the second coordinate to obtain an aligned image;

[0116] The screenshot module 303 is used to obtain the hand coordinates based on the aligned image, and to capture the image to be analyzed based on the hand coordinates;

[0117] The hand detection module 304 is used to detect the image to be analyzed using a preset hand key point model, and obtain the coordinates of the first key point and the image score;

[0118] The judgment module 305 is used to determine whether the image to be analyzed is a hand image based on the image score.

[0119] In this embodiment, the alignment module 302 is specifically used for:

[0120] Obtain the angle between the line connecting the first and second coordinate points and the vertical direction;

[0121] Calculate the coordinates of the center point between the first coordinate and the second coordinate;

[0122] Construct a rotation matrix based on the included angle and the coordinates of the center point;

[0123] The aligned image is calculated based on the rotation matrix and the current frame image;

[0124] The rotation matrix is as follows:

[0125]

[0126] Where M represents the rotation matrix, θ represents the angle between the line connecting the first and second coordinate points and the vertical direction, $c_x$ represents the abscissa of the center point, and $c_y$ represents the ordinate of the center point.

[0127] In this embodiment of the invention, the first and second coordinates of the previous frame image are obtained; based on the first and second coordinates, the current frame image is aligned to obtain an aligned image; based on the aligned image, the hand coordinates are obtained, and the image to be analyzed is cropped based on the hand coordinates; a preset hand keypoint model is used to detect the image to be analyzed, obtaining the first keypoint coordinates and image score; based on the image score, it is determined whether the image to be analyzed is a hand image. This invention recognizes hand keypoints based on the previous frame image, reducing model overhead and computational load, and reducing recognition time, making it suitable for mobile applications.

[0128] Please see Figure 4. Another embodiment of the hand key point recognition device in this invention includes:

[0129] The acquisition module 301 is used to acquire the first and second coordinates of the previous frame image;

[0130] Alignment module 302 is used to perform point alignment processing on the current frame image based on the first coordinate and the second coordinate to obtain an aligned image;

[0131] The screenshot module 303 is used to obtain the hand coordinates based on the aligned image, and to capture the image to be analyzed based on the hand coordinates;

[0132] The hand detection module 304 is used to detect the image to be analyzed using a preset hand key point model, and obtain the coordinates of the first key point and the image score;

[0133] The judgment module 305 is used to determine whether the image to be analyzed is a hand image based on the image score.

[0134] The limb detection module 306 is used to detect the current frame image using a preset limb key point model when the current frame image is the first frame image, and obtain the first coordinate and second coordinate of the current frame image;

[0135] The calculation module 307 is used to obtain the coordinates of the second key point of the hand on the current frame image based on the first key point coordinates if the image to be analyzed is a hand image.

[0136] Optionally, the limb detection module 306 can also be specifically used for:

[0137] If the image to be analyzed is not a hand image, the preset limb key point model is used to detect the current frame image to obtain the first coordinate and the second coordinate of the current frame image, and the first coordinate and the second coordinate are input into the acquisition module.
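The mapping performed by the calculation module 307, from key points detected on the image to be analyzed back to the current frame image, can be sketched as follows. This is a minimal sketch under stated assumptions: the key points are predicted in pixel coordinates of the cropped image, the crop is described by its upper-left corner on the aligned image, and the alignment was a pure rotation by θ about a center point (in the form p' = R(p - c) + c). The function names are hypothetical.

```python
import math

def crop_to_aligned(point, crop_top_left):
    """Shift a point from crop coordinates to aligned-image coordinates."""
    return (point[0] + crop_top_left[0], point[1] + crop_top_left[1])

def aligned_to_frame(point, theta, center):
    """Undo a rotation by theta about `center` (inverse of the alignment step).

    Assumes the forward rotation used the matrix [[cos, sin], [-sin, cos]],
    so the inverse uses its transpose.
    """
    a, b = math.cos(theta), math.sin(theta)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (a * dx - b * dy + center[0], b * dx + a * dy + center[1])

def restore_keypoints(keypoints, crop_top_left, theta, center):
    """Map key points from the cropped image back to the current frame image."""
    return [
        aligned_to_frame(crop_to_aligned(p, crop_top_left), theta, center)
        for p in keypoints
    ]

# Example: one key point at (2.5, 2.0) in a crop whose upper-left corner
# sits at (1.0, 0.5) on the aligned image.
theta = math.atan2(3.0, -4.0)
restored = restore_keypoints([(2.5, 2.0)], (1.0, 0.5), theta, (3.5, 5.0))
```

The restored key points can then supply the first and second coordinates used for the next frame image.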

[0138] Optionally, there are 21 key points and 20 joints in the hand. The joint length constraint loss function of the preset hand key point model is as follows:

[0139]

[0140] Where E_len represents the joint length constraint loss function, T represents the joint tree, j[0] represents the first hand key point in the j-th joint, j[1] represents the second hand key point in the j-th joint, pre_j[0] represents the coordinates of key point j[0] predicted by the model, pre_j[1] represents the coordinates of key point j[1] predicted by the model, gt_j[0] represents the actual coordinates of hand key point j[0], and gt_j[1] represents the actual coordinates of hand key point j[1].
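As a hedged illustration of the symbols defined above, the Python sketch below computes one plausible form of a joint length constraint: for each joint j in the joint tree T, the absolute difference between the predicted joint length and the actual joint length, summed over all joints. The choice of norm, the summation, and the example joint tree `HAND_JOINT_TREE` (a wrist-rooted layout with 21 key points and 20 joints) are assumptions for illustration, not the formula of this embodiment.

```python
import math

# Assumed joint tree: 20 joints over 21 key points, with key point 0 as the
# wrist and four key points per finger. The actual tree may differ.
HAND_JOINT_TREE = [
    (0, 1), (1, 2), (2, 3), (3, 4),         # thumb
    (0, 5), (5, 6), (6, 7), (7, 8),         # index finger
    (0, 9), (9, 10), (10, 11), (11, 12),    # middle finger
    (0, 13), (13, 14), (14, 15), (15, 16),  # ring finger
    (0, 17), (17, 18), (18, 19), (19, 20),  # little finger
]

def joint_length(points, joint):
    """Euclidean distance between the two key points j[0] and j[1] of a joint."""
    return math.dist(points[joint[0]], points[joint[1]])

def joint_length_loss(pred, gt, joint_tree):
    """Sum over joints of |predicted joint length - actual joint length|."""
    return sum(
        abs(joint_length(pred, j) - joint_length(gt, j)) for j in joint_tree
    )

# Tiny example with a single joint: predicted length 5, actual length 10.
example_loss = joint_length_loss([(0, 0), (3, 4)], [(0, 0), (0, 10)], [(0, 1)])
```

A term of this kind penalizes predictions whose joint lengths drift from the labeled ones, independently of where the hand is located in the image.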

[0141] Optionally, the joint similarity loss function of the preset hand key point model is as follows:

[0142]

[0143] Where V represents the joint vectors, v_i = M[i][1] - M[i][0], and M represents the vector list corresponding to the joint tree.
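To make the joint vector definition concrete, the Python sketch below forms each joint vector as the difference between the two key points of a joint and compares predicted and labeled vectors. It assumes that similarity is measured by cosine similarity and that the loss is the sum of (1 - similarity) over all joints; both are assumptions, since the similarity measure is not spelled out in this text. The function names are hypothetical.

```python
import math

def joint_vector(points, joint):
    """Joint vector: second key point minus first key point of the joint."""
    (x0, y0), (x1, y1) = points[joint[0]], points[joint[1]]
    return (x1 - x0, y1 - y0)

def cosine_similarity(u, v, eps=1e-8):
    """Cosine of the angle between 2D vectors u and v (eps avoids division by zero)."""
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / (math.hypot(*u) * math.hypot(*v) + eps)

def joint_similarity_loss(pred, gt, joint_tree):
    """Sum over joints of (1 - cosine similarity) between predicted and label vectors."""
    return sum(
        1.0 - cosine_similarity(joint_vector(pred, j), joint_vector(gt, j))
        for j in joint_tree
    )

# Same direction gives a loss near 0; perpendicular joints give a loss of 1.
parallel = joint_similarity_loss([(0, 0), (1, 0)], [(0, 0), (2, 0)], [(0, 1)])
perpendicular = joint_similarity_loss([(0, 0), (1, 0)], [(0, 0), (0, 3)], [(0, 1)])
```

Under these assumptions the term depends only on joint direction, so it complements a length-based term, which depends only on joint size.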

[0144] In this embodiment of the invention, when the current frame image is the first frame image, a preset limb keypoint model is used to detect the current frame image to obtain the first and second coordinates of the current frame image; when the current frame image is not the first frame image, the first and second coordinates of the previous frame image are obtained; based on the first and second coordinates, the current frame image is processed for point alignment to obtain an aligned image; based on the aligned image, the hand coordinates are obtained, and the image to be analyzed is cropped based on the hand coordinates; a preset hand keypoint model is used to detect the image to be analyzed to obtain the first keypoint coordinates and image score; based on the image score, it is determined whether the image to be analyzed is a hand image. This invention recognizes hand keypoints based on the previous frame image, reducing model overhead and computational load, reducing recognition time, and is suitable for mobile applications. At the same time, by adding two loss functions to guide the training of the hand keypoint model, the regression accuracy is improved, and the performance of the hand keypoint model is enhanced.
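The control flow summarized above can be sketched in Python as follows. This is a minimal sketch, not the implementation of this embodiment: `detect_limb`, `detect_hand` and `anchors_from_keypoints` are hypothetical callables standing in for the preset limb key point model, for the combined alignment, cropping and hand key point model steps (with key points already mapped back to the frame), and for selecting the first and second coordinates from the detected key points. The score threshold of 0.5 is also an assumption.

```python
def track_hand(frames, detect_limb, detect_hand, anchors_from_keypoints,
               threshold=0.5):
    """Return per-frame hand key points, or None where no hand is found."""
    anchors = None  # (first coordinate, second coordinate) from the previous frame
    results = []
    for frame in frames:
        fresh = anchors is None
        if fresh:
            # First frame (or tracking was lost): use the limb key point model.
            anchors = detect_limb(frame)
        keypoints, score = detect_hand(frame, *anchors)
        if score < threshold and not fresh:
            # Not a hand image: re-detect with the limb model and retry once.
            anchors = detect_limb(frame)
            keypoints, score = detect_hand(frame, *anchors)
        if score >= threshold:
            results.append(keypoints)
            # Reuse this frame's key points as the anchors for the next frame.
            anchors = anchors_from_keypoints(keypoints)
        else:
            results.append(None)
            anchors = None
    return results
```

The point of this structure is that the limb key point model runs only on the first frame and when the image score indicates that tracking has failed; on all other frames only the hand key point model runs.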

[0145] Figures 3 and 4 above describe the hand key point recognition device in the embodiments of the present invention in detail from the perspective of modular functional entities. The electronic device in the embodiments of the present invention is described in detail below from the perspective of hardware processing.

[0146] Figure 5 is a schematic diagram of the structure of an electronic device 500 provided in an embodiment of the present invention. The electronic device 500 can vary significantly due to differences in configuration or performance, and may include one or more central processing units (CPUs) 510 (e.g., one or more processors), a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) for storing application programs 533 or data 532. The memory 520 and storage media 530 can be temporary or persistent storage. The program stored in the storage media 530 may include one or more modules (not shown in the figure), each module including a series of instruction operations on the electronic device 500. Furthermore, the processor 510 may be configured to communicate with the storage media 530 and execute the series of instruction operations in the storage media 530 on the electronic device 500.

[0147] The electronic device 500 may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input/output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc. Those skilled in the art will understand that the electronic device structure illustrated in Figure 5 does not constitute a limitation on the electronic device, which may include more or fewer components than illustrated, combine certain components, or have different component arrangements.

[0148] The present invention also provides an electronic device, the electronic device including a memory and a processor, the memory storing computer-readable instructions, which, when executed by the processor, cause the processor to perform the steps of the hand key point recognition method in the above embodiments.

[0149] The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium, wherein the computer-readable storage medium stores instructions that, when the instructions are executed on a computer, cause the computer to perform the steps of the hand key point recognition method.

[0150] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the systems, devices, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.

[0151] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0152] The above-described embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for recognizing key points of a hand, characterized in that the hand key point recognition method includes:
S0. When the current frame image is the first frame image, the current frame image is detected using a preset limb key point model to obtain the first coordinate and the second coordinate of the current frame image;
S1. When the current frame image is not the first frame image, obtain the first coordinate and the second coordinate of the previous frame image;
S2. Based on the first coordinate and the second coordinate, perform point alignment processing on the current frame image to obtain an aligned image;
S3. Based on the aligned image, obtain the hand coordinates, and based on the hand coordinates, extract the image to be analyzed;
S4. Use a preset hand key point model to detect the image to be analyzed, and obtain the coordinates of the first key point and the image score;
S5. Based on the image score, determine whether the image to be analyzed is a hand image;
S6. If the image to be analyzed is not a hand image, the preset limb key point model is used to detect the current frame image to obtain the first coordinate and second coordinate of the current frame image, and steps S2-S5 are repeated;
The hand key points consist of 21 points and 20 joints, and the joint length constraint loss function of the preset hand key point model is as follows: ; where E_len represents the joint length constraint loss function, T represents the joint tree, j[0] represents the first hand key point in the j-th joint, j[1] represents the second hand key point in the j-th joint, pre_j[0] represents the coordinates of key point j[0] predicted by the model, pre_j[1] represents the coordinates of key point j[1] predicted by the model, gt_j[0] represents the actual coordinates of hand key point j[0], and gt_j[1] represents the actual coordinates of hand key point j[1].

2. The hand key point recognition method according to claim 1, characterized in that, after determining whether the image to be analyzed is a hand image based on the image score, the method further includes: if the image to be analyzed is a hand image, obtaining the second key point coordinates of the hand on the current frame image based on the first key point coordinates.

3. The hand key point recognition method according to claim 2, characterized in that, The step of obtaining the coordinates of the second key point of the hand on the current frame image based on the coordinates of the first key point includes: Based on the coordinates of the first key point in the image to be analyzed, the coordinates of the hand key points on the aligned image are calculated by reverse rotation. The hand key points are then restored to the current frame image by inverse rotation to obtain the coordinates of the hand key points in the current frame image. The first and second coordinates of the hand key points in the current frame image are used as the hand position in the next frame image.

4. The hand key point recognition method according to any one of claims 1-3, characterized in that the first coordinate and the second coordinate are the coordinates of the center point of the palm and the coordinates of the wrist, respectively.

5. The hand key point recognition method according to claim 1, characterized in that the step of performing point alignment processing on the current frame image based on the first coordinate and the second coordinate to obtain an aligned image includes: obtaining the angle between the line connecting the first and second coordinate points and the vertical direction; calculating the coordinates of the center point between the first coordinate and the second coordinate; constructing a rotation matrix based on the included angle and the coordinates of the center point; and calculating the aligned image based on the rotation matrix and the current frame image; the rotation matrix is as follows: ; where M represents the rotation matrix, θ represents the angle between the line connecting the first and second coordinate points and the vertical direction, c_x represents the abscissa of the center point, and c_y represents the ordinate of the center point.

6. The hand key point recognition method according to any one of claims 1-3, characterized in that the hand coordinates include the coordinates of the upper left corner and the lower right corner.

7. The hand key point recognition method according to claim 6, characterized in that the joint similarity loss function of the preset hand key point model is as follows: ; where V represents the joint vectors, v_i = M[i][1] - M[i][0], and M represents the vector list corresponding to the joint tree; pre_v0 represents the prediction vector of the first key point in the same joint, pre_v1 represents the prediction vector of the second key point in the same joint, gt_v0 represents the label vector of the first key point in the same joint, and gt_v1 represents the label vector of the second key point in the same joint.

8. A hand key point recognition device, characterized in that the hand key point recognition device includes:
The limb detection module is used to detect the current frame image using a preset limb key point model when the current frame image is the first frame image, and obtain the first coordinate and second coordinate of the current frame image;
The acquisition module is used to acquire the first and second coordinates of the previous frame image when the current frame image is not the first frame image;
The alignment module is used to perform point alignment processing on the current frame image based on the first coordinate and the second coordinate to obtain an aligned image;
The screenshot module is used to obtain the hand coordinates based on the aligned image, and to crop the image to be analyzed based on the hand coordinates;
The detection module is used to detect the image to be analyzed using a preset hand key point model, and obtain the coordinates of the first key point and the image score;
The judgment module is used to determine whether the image to be analyzed is a hand image based on the image score;
The limb detection module is also used to detect the current frame image using the preset limb key point model if the image to be analyzed is not a hand image, and to obtain the first coordinate and second coordinate of the current frame image;
The hand key points consist of 21 points and 20 joints, and the joint length constraint loss function of the preset hand key point model is as follows: ; where E_len represents the joint length constraint loss function, T represents the joint tree, j[0] represents the first hand key point in the j-th joint, j[1] represents the second hand key point in the j-th joint, pre_j[0] represents the coordinates of key point j[0] predicted by the model, pre_j[1] represents the coordinates of key point j[1] predicted by the model, gt_j[0] represents the actual coordinates of hand key point j[0], and gt_j[1] represents the actual coordinates of hand key point j[1].

9. An electronic device, characterized in that the electronic device includes: a memory and at least one processor, wherein the memory stores instructions; the at least one processor invokes the instructions in the memory to cause the electronic device to perform the hand key point recognition method as described in any one of claims 1-7.

10. A computer-readable storage medium storing instructions thereon, characterized in that, when the instructions are executed by a processor, they implement the hand key point recognition method as described in any one of claims 1-7.