Human body action matching method and device, live broadcast system, computer device and medium

By identifying key point locations and predicting pose type probabilities through a pre-trained human key point prediction model, the problem of complex human pose matching algorithms in live streaming is solved, achieving efficient human motion matching and gameplay expansion.

CN116597515B · Active · Publication Date: 2026-06-16 · GUANGZHOU FANGSI INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: GUANGZHOU FANGSI INFORMATION TECH CO LTD
Filing Date: 2023-05-18
Publication Date: 2026-06-16

Application Information

IPC: G06V40/20; G06V10/26; G06V10/74; G06V10/764; G06V10/82

CPC: G06V40/23; G06V10/26; G06V10/74; G06V10/764; G06V10/82; Y02D30/70

AI Tagging

Application Domain

Character and pattern recognition; High level techniques

AI Technical Summary

Technical Problem

In existing technologies, human pose matching algorithms have a complex processing flow during live streaming, which affects the live streaming effect and limits the expansion of gameplay.

Method used

A pre-trained human keypoint prediction model carrying posture analogy information is used to identify keypoint locations and predict the probability of human posture types. The action matching state is determined from the keypoint locations and posture type probabilities, which reduces the number of processing tasks.

Benefits of technology

The system achieves efficient calculation of human posture matching during live streaming, increases the scope of live streaming gameplay and interactive functions, and enhances the live streaming experience.

✦ Generated by Eureka AI based on patent content.

Abstract

The application relates to a human body action matching method and device, a live broadcast system, computer equipment and a computer readable storage medium. A host image of a host end of a live broadcast room is acquired, a human body region is recognized from the host image, and a human body image is cropped out; the human body image is input into a pre-trained human body key point prediction model carrying posture analogy information to recognize key point positions and predict a human body posture type probability; and then, according to the recognized key point positions and the human body posture type probability, the matching state of a human body action in the host image and a target human body action is determined. According to the technical scheme, human body posture information can be matched while the positions of human body key points are predicted, so that the processing task flow during live broadcast is reduced, the human body action matching function is realized with a small computational increment, the application effect of the human body posture matching algorithm in live broadcast is improved, action-based interactive gameplay in the live broadcast room is enabled, and the range of live broadcast gameplay is expanded.

Description

Technical Field

[0001] This application relates to the field of live streaming technology, and in particular to a method, apparatus, live streaming system, computer equipment, and computer-readable storage medium for matching human motion.

Background Technology

[0002] Live streaming, with its intuitive, immediate, and interactive content and format, has played a significant role in promoting flexible employment, boosting economic and social development, and enriching the spiritual and cultural lives of the people. For example, live streaming strengthens the circulation of goods, revitalizes the rural economy, brings people closer together, and spreads culture; the use of virtual gifts in live streaming allows streamers to better showcase their talents and abilities, thus enabling more streamers to realize their self-worth.

[0003] In live video streaming, interactive applications such as body shaping and slimming exercises and dance moves are often used. In these application scenarios, it is necessary to compare the human body movements in the streamer's image with specified postures to determine the matching status between the streamer's human body movements and the specified postures, thereby providing accurate data references for interactive activities. In the above comparison processing, human posture matching algorithms are usually used. These matching algorithms mainly use trained human keypoint prediction models to detect human keypoints, and then use the human keypoint recognition results to perform feature matching or achieve the same result through keypoint comparison post-processing.

[0004] In conventional technical solutions, after predicting the location of key points using human key point prediction models, complex feature matching or post-processing processes are often required. This results in numerous processing tasks during the live streaming process, affecting the application effect in live streaming and limiting the expansion of live streaming gameplay.

Summary of the Invention

[0005] Therefore, it is necessary to provide a human motion matching method, device, live streaming system, computer equipment, and computer-readable storage medium to address the aforementioned technical problems, so as to improve the application effect of human pose algorithms in live streaming.

[0006] A human motion matching method, comprising:

[0007] Get the streamer's image uploaded by the streamer in the live broadcast room;

[0008] Identify human body regions from the broadcaster's image, and crop out human body images from the broadcaster's image based on the human body regions;

[0009] The human image is input into a pre-trained human keypoint prediction model carrying pose analogy information to identify keypoint locations and predict the probability of human pose type.

[0010] The matching status between the human body movements in the anchor image and the target human body movements is determined based on the key point locations and the probability of human body posture types.

[0011] In one embodiment, the human motion matching method further includes:

[0012] Obtain a pre-trained, converged human keypoint model, and add a pose type classification branch to the human keypoint model;

[0013] Set the human motion category of the posture type classification branch to the human motion of the preset motion library, and configure the key point coordinate set corresponding to each human motion.

[0014] By fixing the weights of the human keypoint model, the pose type classification branch is trained using the set of keypoint coordinates to obtain a human keypoint prediction model carrying pose analogy information.

[0015] In one embodiment, the structure of the pose type classification branch includes a convolutional layer, a linear layer, and a sigmoid function-normalized output layer.

[0016] In one embodiment, determining the matching status between human actions in the anchor image and target human actions based on the key point locations and the probability of human pose types includes:

[0017] Obtain the location of the first key point and the probability of the first human posture type corresponding to the human action category of the target human action;

[0018] Calculate the alignment parameters between the position of the first key point and the preset position of the second key point on the target human body movement;

[0019] The matching state between the human body movement in the anchor image and the target human body movement is determined based on the alignment state parameters and the probability of the first human body pose type.

[0020] In one embodiment, before obtaining the streamer image uploaded by the streamer's client, the method further includes:

[0021] Select a target human body action from the preset action library and play it to the broadcaster to guide the broadcaster to perform the corresponding human body action;

[0022] The step of obtaining the streamer image uploaded by the streamer in the live broadcast room includes:

[0023] Receive the streamer image uploaded by the streamer in the live broadcast room; wherein, the streamer image is a live video image containing the streamer's body image captured in real time by the streamer's camera.

[0024] In one embodiment, the human motion matching method further includes:

[0025] The basic score of the human body movement in the anchor's image is calculated based on the correspondence between the first human posture type probability and the preset category probability score value.

[0026] The first key point position is scaled to match the target human body movement size and aligned with the target human body movement; the score value is adjusted based on the number of key points that fall into the key area of the human body.

[0027] The comprehensive score of the human body movements in the anchor's image is determined based on the base score and the adjusted score.

[0028] In one embodiment, the human motion matching method further includes:

[0029] Construct a human keypoint prediction model that includes a two-level cascaded convolutional neural network;

[0030] A one-dimensional heatmap prediction branch and a two-dimensional heatmap prediction branch are added to the human body key point prediction model to supervise the model output results.

[0031] Set the model loss of the human body key point prediction model, and train the human body key point prediction model based on the model loss.

[0032] A human motion matching device, comprising:

[0033] The acquisition module is used to acquire the streamer image uploaded by the streamer in the live broadcast room;

[0034] The cropping module is used to identify human body regions from the broadcaster image and crop out human body images from the broadcaster image based on the human body regions.

[0035] The prediction module is used to input the human image into a pre-trained human keypoint prediction model carrying pose analogy information to identify keypoint locations and predict the probability of human pose type.

[0036] The matching module is used to determine the matching status between the human body action in the anchor image and the target human body action based on the key point position and the probability of human body posture type.

[0037] A live streaming system, characterized in that it includes: a broadcaster terminal, a viewer terminal, and a live streaming server; wherein the broadcaster terminal and the viewer terminal are respectively connected to the live streaming server via a network;

[0038] The live streaming server is used to collect the streamer's image from the live streaming platform, use the human motion matching method to calculate the matching status between the streamer's human motion and the target human motion, and send the live video stream to the audience.

[0039] The broadcaster terminal is used to connect to the live broadcast room, capture the broadcaster's real-time video images through the camera and upload them to the live broadcast server;

[0040] The viewer terminal is used to access the live broadcast room and receive and play the live video stream sent by the live broadcast server.

[0041] A computer device comprising:

[0042] One or more processors;

[0043] Memory;

[0044] One or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the steps of the human motion matching method.

[0045] A computer-readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, wherein the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to perform the steps of the human motion matching method.

[0046] The aforementioned human motion matching method, device, live streaming system, computer equipment, and computer-readable storage medium first acquire the anchor image from the live streaming host's end, identify the human body region from the anchor image, and crop out the human body image; then, the human body image is input into a pre-trained human keypoint prediction model carrying posture analogy information to identify keypoint locations and predict the probability of human posture types; finally, the matching state between the human motion in the anchor image and the target human motion is determined based on the identified keypoint locations and human posture type probabilities. This technical solution, by using a pre-trained human keypoint prediction model carrying posture analogy information to identify keypoint locations and predict human posture type probabilities, can achieve human posture information matching while predicting the human keypoint locations. This reduces the processing workload during live streaming. By using keypoint locations and human posture type probabilities to calculate the matching state, the human motion matching function can be achieved with minimal computational increment, improving the application effect of human posture matching algorithms in live streaming, empowering interactive gameplay in live streaming rooms, and expanding the dimensions of live streaming gameplay.

Attached Figure Description

[0047] Figure 1 is a schematic diagram illustrating an example of a live streaming business application scenario;

[0048] Figure 2 is a flowchart of a human motion matching method according to one embodiment;

[0049] Figure 3 is an example of a cropped human body image;

[0050] Figure 4 is a flowchart illustrating the training method of an example human keypoint prediction model;

[0051] Figure 5 is a schematic diagram illustrating an example of model training;

[0052] Figure 6 is a flowchart of an example method for calculating the matching state;

[0053] Figure 7 is a sample diagram of human body movements from a preset motion library;

[0054] Figure 8 is a schematic diagram illustrating an example of keypoint alignment;

[0055] Figure 9 is an example of a scoring diagram;

[0056] Figure 10 is a schematic diagram of the structure of a human motion matching device according to one embodiment;

[0057] Figure 11 is a schematic diagram of an example network live streaming system structure;

[0058] Figure 12 is a block diagram of an example computer device.

Detailed Implementation

[0059] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0060] The technical solutions provided in the embodiments of this application can be applied to, for example, the application scenario shown in Figure 1, which is a schematic diagram of an example live streaming business application scenario. The live streaming system can include a live streaming server, a broadcaster's end, and a viewer's end. The broadcaster's end and the viewer's end communicate with the live streaming server through the network, so that the broadcaster on the broadcaster's end and the viewer on the viewer's end can conduct real-time online live streaming.

[0061] In live streaming scenarios such as body shaping, weight loss, and dance interaction, it is necessary to recognize the streamer's body movements to determine the matching degree between the streamer's body movements and specified movements, providing accurate data references for live streaming gameplay. To reduce the feature matching or post-processing tasks in conventional human pose matching algorithms, and to empower and expand the dimensions of live streaming interaction gameplay, this application provides a human motion matching method applied to a live streaming server. As shown in Figure 2, which is a flowchart of a human motion matching method according to one embodiment, the method includes the following steps:

[0062] Step S10: Obtain the streamer image uploaded by the streamer in the live broadcast room.

[0063] In this step, during the live broadcast, the host's terminal in the live broadcast room captures images of the host, including human figures, and uploads them to the live broadcast server.

[0064] In one embodiment, during a live broadcast, the live broadcast server can select a specific target human action from a preset action library and send a static image sequence of the target human action to the broadcaster's terminal for display, so as to guide the broadcaster to make the corresponding human action, thereby identifying the matching status between the broadcaster's human action and the target human action in the subsequent process.

[0065] The streamer image uploaded by the broadcaster's terminal can be a live video image containing the broadcaster's body, captured in real time by the broadcaster's camera and uploaded to the live streaming server.

[0066] For example, in a live stream featuring body shaping, weight loss, or dance interaction, the live stream server can send a sequence of static images of dance moves to the streamer's device for playback, allowing the streamer to follow the sequence of dance move images and perform the same human movements.

[0067] Step S20: Identify the human body region from the broadcaster image, and crop out the human body image from the broadcaster image based on the human body region.

[0068] In this step, a human body recognition algorithm can be used to identify the human body region from the broadcaster's image, and the region can be defined by the minimum bounding rectangle. Then, the human body image can be cropped from the broadcaster's image along the minimum bounding rectangle.
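
The cropping in step S20 can be sketched as follows. This is a minimal illustration, assuming the human body recognition algorithm yields a boolean mask of body pixels; the mask, the toy frame, and the function names `min_bounding_rect` and `crop` are hypothetical and do not come from the patent text.

```python
from typing import List, Tuple

Image = List[List[int]]  # toy grayscale image stored as rows of pixel values


def min_bounding_rect(mask: List[List[bool]]) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right), inclusive, enclosing all True cells."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    if not rows:
        raise ValueError("no human body region found in mask")
    return min(rows), min(cols), max(rows), max(cols)


def crop(image: Image, rect: Tuple[int, int, int, int]) -> Image:
    """Cut the image along the inclusive rectangle."""
    top, left, bottom, right = rect
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Hypothetical 4x5 frame; the mask marks pixels a detector labelled as body.
frame = [[r * 10 + c for c in range(5)] for r in range(4)]
body_mask = [
    [False, False, False, False, False],
    [False, False, True, True, False],
    [False, True, True, False, False],
    [False, False, False, False, False],
]
rect = min_bounding_rect(body_mask)
human_image = crop(frame, rect)
```

In a real deployment the detector would more likely return a box directly, in which case only the `crop` step applies.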

[0069] As shown in Figure 3, which is an example of a cropped human body image, after the human body region is identified in the anchor image on the left, the human body image on the right is cropped out along the rectangular box in the image.

[0070] Step S30: Input the human image into a pre-trained human key point prediction model carrying pose analogy information to identify key point locations and predict the probability of human pose type.

[0071] In this step, a pre-trained human keypoint prediction model carrying pose analogy information is used to predict and recognize human images. A pose classification branch can be added to the converged human keypoint prediction model, enabling the prediction of human pose type matching probability while predicting human keypoints. By identifying keypoint locations and predicting human pose type probabilities through a pre-trained human keypoint prediction model carrying pose analogy information, it is possible to match human pose information while predicting keypoint locations, thereby reducing the processing tasks during live streaming.

[0072] In one embodiment, the training method for the above-mentioned human keypoint prediction model is shown in Figure 4, which is a flowchart of the training method of an example human keypoint prediction model, and may include the following:

[0073] S301, Obtain a pre-trained, converged human keypoint model, and add a pose type classification branch to the human keypoint model.

[0074] For example, a convolutional neural network (CNN) can be used for human keypoint models. The following is a training method for a human keypoint model, including the following:

[0075] (a) Construct a human keypoint prediction model containing a two-level cascaded convolutional neural network.

[0076] (b) Add a one-dimensional heatmap prediction branch and a two-dimensional heatmap prediction branch to the human body key point prediction model to supervise the model output results.

[0077] (c) Set the model loss of the human body key point prediction model, and train the human body key point prediction model according to the model loss.

[0078] Specifically, the human keypoint prediction technique based on one-dimensional Gaussian heatmaps takes a human image as input. A human keypoint model based on two-dimensional heatmaps is trained first; a one-dimensional Gaussian heatmap branch is then added to the converged two-dimensional heatmap model and trained, with consistency constraints imposed between the outputs of the two-dimensional heatmap and the one-dimensional Gaussian heatmap. After training, only the one-dimensional Gaussian heatmap is used for inference.

[0079] The training method for the human keypoint model described above reduces the learning difficulty and computational load of the human keypoint model by supervising the one-dimensional heatmap with a two-dimensional heatmap. At the same time, the design of a cascaded FPN model structure improves the accuracy and stability in large pose scenarios such as side-viewing. The length of the one-dimensional Gaussian heatmap can be increased to increase the output dimension according to the computational accuracy requirements, achieving sub-pixel level accuracy.
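
The relationship between heatmap length and sub-pixel accuracy can be illustrated with a small decoding sketch. The text does not state how coordinates are read from the one-dimensional heatmaps, so the argmax rule, the `bins_per_pixel` parameter, and the function name `decode_1d` are assumptions made for illustration only.

```python
from typing import List, Tuple


def decode_1d(heat_x: List[float], heat_y: List[float],
              bins_per_pixel: int) -> Tuple[float, float]:
    """Decode one key point from its x-axis and y-axis 1D heatmaps.

    Each heatmap is assumed to have (image_size * bins_per_pixel) bins, so a
    bin index divided by bins_per_pixel is a coordinate in pixels.
    """
    ix = max(range(len(heat_x)), key=heat_x.__getitem__)  # argmax over x bins
    iy = max(range(len(heat_y)), key=heat_y.__getitem__)  # argmax over y bins
    return ix / bins_per_pixel, iy / bins_per_pixel


# Hypothetical 4x3 pixel image with 2 bins per pixel (8 x-bins, 6 y-bins).
heat_x = [0.0, 0.0, 0.1, 0.2, 0.6, 0.9, 0.4, 0.1]
heat_y = [0.1, 0.5, 0.9, 0.3, 0.0, 0.0]
point = decode_1d(heat_x, heat_y, bins_per_pixel=2)
```

With two bins per pixel the decoded x coordinate can land on a half pixel (here 2.5), which is the sense in which a longer heatmap gives finer resolution.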

[0080] In one embodiment, the structure of the pose type classification branch includes convolutional layers, linear layers, and a sigmoid function-normalized output layer. As shown in Figure 5, which is a schematic diagram of an example of model training, the pose type classification branch is used to identify the probability that the human action in an input human image matches a set human action category.
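
A forward pass through a branch of this shape (convolution, then a linear layer, then sigmoid) might look like the sketch below. Layer sizes, weights, and the input feature map are made up for illustration; the patent does not give them, and a real implementation would use a deep learning framework rather than plain lists.

```python
import math
from typing import List

Matrix = List[List[float]]


def conv2d_valid(x: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel 2D convolution (cross-correlation), no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            sum(x[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def linear(v: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def pose_branch(features: Matrix, kernel: Matrix,
                weights: Matrix, bias: List[float]) -> List[float]:
    """Return one probability per human action category."""
    conv_out = conv2d_valid(features, kernel)
    flat = [v for row in conv_out for v in row]
    return [sigmoid(z) for z in linear(flat, weights, bias)]


# Hypothetical 3x3 backbone feature map and two action categories.
features = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]
weights = [[0.5, 0.0, 0.0, 0.5], [-0.5, 0.0, 0.0, -0.5]]
bias = [0.0, 0.0]
probs = pose_branch(features, kernel, weights, bias)
```

Because each output passes through its own sigmoid, the probabilities are independent per category rather than forced to sum to one.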

[0081] S302, set the human action category of the posture type classification branch to the human action of the preset action library, and configure the key point coordinate set corresponding to each human action.

[0082] Specifically, a preset action library containing multiple human actions is configured first, and each human action in the preset action library is represented by a set of key point coordinates. Then, the human action categories recognized by the pose type classification branch are set to the human actions in the preset action library, so that the branch outputs a human posture type probability for each of those actions.

[0083] S303, fix the weights of the human body key point model, and use the key point coordinate set to train the posture type classification branch to obtain a human body key point prediction model carrying posture analogy information.

[0084] Specifically, during model training, the set of keypoint coordinates of each human action is first scaled to a uniform size (h, w). Then, the weights of the human keypoint model are fixed so that the gradient of the human keypoint part is zero. The pose type classification branch is trained using the keypoint coordinate set of each human action, which includes keypoint annotation information and pose category information. As shown in Figure 5, the left-hand image ① is input into the converged human keypoint prediction model and the pose type classification branch is trained. During model training, only the pose type classification branch is trained, and the output is image ② with keypoints together with the matching probability of the human action categories. Finally, the human keypoint prediction model carrying posture analogy information is obtained.
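
Scaling a key point coordinate set to a uniform size can be sketched as below. The text only says each action is scaled to a uniform size h, w; stretching the bounding box of the key points to width w and height h is one possible reading and is an assumption here, as is the function name `scale_to_size`.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y)


def scale_to_size(keypoints: List[Point], h: float, w: float) -> List[Point]:
    """Map the bounding box of the key points onto a w-by-h canvas."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1.0  # avoid division by zero for flat sets
    span_y = (max(ys) - min_y) or 1.0
    return [((x - min_x) / span_x * w, (y - min_y) / span_y * h)
            for x, y in keypoints]


# Hypothetical three-point action from a preset action library.
action = [(10.0, 20.0), (30.0, 60.0), (20.0, 40.0)]
scaled = scale_to_size(action, h=100.0, w=50.0)
```

A variant that preserves aspect ratio would use one common scale factor for both axes; which of the two the patent intends is not stated.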

[0085] For example, during training, the pose type classification branch can use the cross-entropy loss function to guide parameter updates.

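[0086] [Formula not preserved in this text: cross-entropy loss of the pose type classification branch.]

[0087] In the formula, the loss term represents the model loss function, the label term is the annotation information, and x is the prediction result from the CNN.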
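
For orientation, a standard cross-entropy loss over sigmoid outputs can be written as below. This is the textbook binary form, chosen because the branch output is sigmoid-normalized; it is an assumption and not necessarily the exact formula of the patent. The names `binary_cross_entropy`, `y` (annotation), and `eps` are illustrative, while `x` follows the text as the CNN prediction.

```python
import math
from typing import Sequence


def binary_cross_entropy(y: Sequence[float], x: Sequence[float],
                         eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between labels y and predicted probabilities x."""
    total = 0.0
    for yi, xi in zip(y, x):
        xi = min(max(xi, eps), 1.0 - eps)  # clamp to keep log() finite
        total += -(yi * math.log(xi) + (1.0 - yi) * math.log(1.0 - xi))
    return total / len(y)


# Annotation: the image shows action category 0 and not category 1.
labels = [1.0, 0.0]
uncertain = binary_cross_entropy(labels, [0.5, 0.5])
confident = binary_cross_entropy(labels, [0.9, 0.1])
```

The loss falls as the predicted probabilities move toward the annotated labels, which is what drives the parameter updates of the branch.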

[0088] The technical solution of the above embodiment adds a pose type classification branch to the converged human keypoint model and trains only that branch to obtain a human keypoint prediction model carrying pose analogy information. This adds a pose classification function to the model without reducing the recognition accuracy of the human keypoint model.
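
The idea of training only the added branch while the converged model stays fixed can be shown with a toy example. Everything here is hypothetical: the "backbone" is a two-weight stand-in, the branch is a single logistic unit, and the update rule is plain stochastic gradient descent. It illustrates the freezing principle only, not the patent's network.

```python
import math
from typing import List, Sequence, Tuple

# Stand-in for the converged keypoint model: these weights are never updated.
FROZEN_BACKBONE_WEIGHTS: Tuple[float, float] = (2.0, -1.0)


def backbone(x: Sequence[float]) -> List[float]:
    """Frozen feature extractor (hypothetical element-wise scaling)."""
    return [w * v for w, v in zip(FROZEN_BACKBONE_WEIGHTS, x)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_branch(inputs, labels, lr: float = 0.5, epochs: int = 200):
    """Train only the classification branch; return (weights, bias)."""
    feats = [backbone(x) for x in inputs]  # no gradient reaches the backbone
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            grad = p - y  # d(cross-entropy)/d(logit) for a sigmoid output
            w = [wi - lr * grad * fi for wi, fi in zip(w, f)]
            b -= lr * grad
    return w, b


def predict(x, w, b) -> float:
    return sigmoid(sum(wi * fi for wi, fi in zip(w, backbone(x))) + b)


inputs = [(1.0, 0.0), (0.0, 1.0)]
labels = [1.0, 0.0]
head_w, head_b = train_branch(inputs, labels)
```

Since the backbone weights are untouched, its key point outputs are identical before and after branch training, which matches the claim that recognition accuracy is not lost.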

[0089] Step S40: Determine the matching status between the human body movement in the anchor image and the target human body movement based on the key point positions and the probability of human body posture type.

[0090] Specifically, in this step the matching state is calculated from the key point locations and human posture type probabilities identified in the preceding steps. This realizes the human action matching function with minimal computational increment, makes the function easy to use in various gameplay scenarios in the live streaming room, and adds interactive live streaming functions such as dance movement matching and dance challenges.

[0091] In one embodiment, the process of step S40 is shown in Figure 6, which is a flowchart of an example method for calculating the matching state, and may include the following steps:

[0092] S401, obtain the location of the first key point and the probability of the first human posture type identified by the human action category corresponding to the target human action.

[0093] Specifically, the live streaming server selects a target human action from a preset action library, as shown in Figure 7, which is a sample diagram of human actions from a preset action library. When the anchor follows the target human action and makes the same posture, the anchor image is input into the human key point prediction model, which can identify the location of the first key point. At the same time, through the posture type classification branch, the probability of the first human posture type between the human action in the anchor image and the target human action can be identified.

[0094] S402, calculate the alignment parameters between the first key point position and the preset second key point position on the target human body movement.

[0095] Specifically, by using the location of the first key point of human movement in the identified anchor image, it can be aligned with the location of the known second key point of the target human movement, thereby obtaining alignment state parameters that characterize the alignment between the two.

[0096] S403, determine the matching state between the human body action of the anchor image and the target human body action based on the alignment state parameters and the probability of the first human body pose type.

[0097] Specifically, based on the alignment state parameters and the probability of the first human pose type identified, the matching state between the human action in the anchor's image and the target human action can be calculated, such as the similarity of human pose and action, the matching degree, etc.
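
Steps S401 to S403 can be sketched as follows. The text does not define the alignment state parameter, so this example assumes it is the mean distance between the aligned first key points and the second key points; the thresholds `prob_threshold` and `max_mean_dist` and all function names are likewise hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def bbox(points: List[Point]) -> Tuple[float, float, float, float]:
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def align_to_target(points: List[Point], target: List[Point]) -> List[Point]:
    """Scale and translate points so their bounding box matches the target's."""
    px0, py0, px1, py1 = bbox(points)
    tx0, ty0, tx1, ty1 = bbox(target)
    sx = (tx1 - tx0) / ((px1 - px0) or 1.0)
    sy = (ty1 - ty0) / ((py1 - py0) or 1.0)
    return [(tx0 + (x - px0) * sx, ty0 + (y - py0) * sy) for x, y in points]


def mean_distance(a: List[Point], b: List[Point]) -> float:
    """Assumed alignment parameter: mean Euclidean distance of paired points."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def is_match(points: List[Point], target: List[Point], prob: float,
             prob_threshold: float = 0.5, max_mean_dist: float = 5.0) -> bool:
    aligned = align_to_target(points, target)
    return prob >= prob_threshold and mean_distance(aligned, target) <= max_mean_dist


# Target pose and the same pose seen larger and shifted in the anchor image.
target_pose = [(0.0, 0.0), (10.0, 0.0), (5.0, 20.0)]
anchor_pose = [(100.0, 100.0), (120.0, 100.0), (110.0, 140.0)]
aligned_pose = align_to_target(anchor_pose, target_pose)
```

Combining the two signals this way means the classifier gives a fast coarse decision and the key point distance checks the details; other combinations, such as a weighted similarity, would fit the text equally well.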

[0098] Compared with conventional human pose matching algorithms, the technical solutions of the above embodiments can add human action pose matching recognition function to the human key point prediction model in the live streaming scenario with negligible computational increment, thereby improving the application effect in the live streaming scenario, empowering the interactive gameplay of the live streaming room and increasing the expansion dimension of live streaming gameplay.

[0099] In one embodiment, based on the matching status between the human actions in the anchor image and the target human actions, the human actions in the anchor image can also be scored. Accordingly, the human action matching method of this application may further include the following process:

[0100] (1) Calculate the basic score of the anchor's human body movement based on the correspondence between the first human body posture type probability and the preset category probability score value.

[0101] (2) Scale the position of the first key point to match the size of the target human body movement and align it with the target human body movement. Calculate and adjust the score value based on the number of key points that fall into the key area of the human body.

[0102] (3) Determine the comprehensive score of the anchor's human body movements based on the basic score and the adjusted score.

[0103] In the above embodiment, the consistency between the anchor's human body movements and the target's human body movements can be quickly determined by the probability of the first human body posture type, thereby forming a basic score. Then, by using the alignment comparison of the first key point position, deviations in local details can be identified, thereby obtaining an adjustment score value. Finally, by using the basic score value and the adjustment score value, a comprehensive score value that can accurately reflect the matching state of human body movements can be calculated, thereby making the matching results of body shaping, slimming, dance, etc. more accurate, natural, and reasonable.

[0104] The human motion matching method provided in the above embodiments can be applied to live streaming features such as body slimming, dance motion matching, and dance challenge. The following is an application example in conjunction with the motion scoring function.

[0105] Assuming that in the scoring scheme for human motion matching, a human motion is worth a maximum of 10 points, when applied in live streaming, the live streaming server randomly selects a target human motion A from a preset motion library and gives it to the streamer user. The streamer user makes the same human motion in front of the camera. After the streamer image containing the human motion is identified by the human key point prediction model, the corresponding first human key point position coordinates p and the first human posture type probability k of the target human motion A category are obtained.

[0106] When scoring, if the probability k is less than 0.5, the base score is s = 0 and the scoring process ends; otherwise, the base score is s = 10. The adjustment score l is then calculated. First, the head and waist regions are defined as the key areas of the human body, shown as box M in the figure. The first human key point coordinates p are scaled and aligned to the head and waist key points of the target human action A, and the number of upper-body key point coordinates p that fall outside box M of the target human action A is counted; see Figure 8, which is a schematic diagram of an example of key point alignment. Each key point outside box M deducts l = 1 point. If two key points are outside the box, 2 points are deducted, and with the comprehensive score d = s - l, the output comprehensive score is d = 8 points.
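
The scoring rule of this example can be expressed in a few lines. The thresholds (0.5, 10 points, 1 point per key point) come from the paragraph above; the box representation `(left, top, right, bottom)`, the sample coordinates, and the function names are assumptions for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def in_box(p: Point, box: Box) -> bool:
    left, top, right, bottom = box
    return left <= p[0] <= right and top <= p[1] <= bottom


def comprehensive_score(k: float, aligned_keypoints: List[Point], box_m: Box,
                        max_score: int = 10, penalty: int = 1) -> int:
    """Base score s from probability k, minus adjustment l for points outside M."""
    if k < 0.5:
        return 0  # s = 0, scoring ends
    s = max_score
    l = penalty * sum(1 for p in aligned_keypoints if not in_box(p, box_m))
    return s - l  # d = s - l


# Hypothetical aligned key points; two of the four fall outside box M.
box_m = (0.0, 0.0, 10.0, 10.0)
keypoints = [(5.0, 5.0), (12.0, 3.0), (4.0, 11.0), (1.0, 1.0)]
d = comprehensive_score(0.8, keypoints, box_m)
```

With two key points outside box M the result is 8, matching the worked example; a probability below 0.5 yields 0 regardless of the key points.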

[0107] The above application example is shown in Figure 9, which is an example of a scoring diagram. In a live streaming scenario, the live streaming server first selects a target human image from a preset action library and plays it to the streamer. Then, the streamer's image is captured in real time through a camera. The human body detector identifies the human body region and crops out the human body image. The human body image is then input into the human body keypoint prediction model, which outputs the keypoint locations and the probability of the human body pose type. Finally, a classification box constraint is used to directly score the degree of human body action matching. This scheme adds a human action pose matching function on top of the human body keypoint prediction model with a negligible computational increment, and is particularly suitable for use in live streaming rooms to empower live streaming gameplay.

[0108] The following describes an embodiment of the human motion matching device.

[0109] Referring to Figure 10, which is a schematic diagram of a human motion matching device according to one embodiment, the device mainly includes the following:

[0110] The acquisition module 10 is used to acquire the streamer image uploaded by the streamer in the live broadcast room.

[0111] The cropping module 20 is used to identify the human body region in the streamer image and crop the human body image out of the streamer image based on the human body region.

[0112] Prediction module 30 is used to input the human image into a pre-trained human key point prediction model carrying pose analogy information to identify key point positions and predict the probability of human pose type.

[0113] The matching module 40 is used to determine the matching status between the human motion in the streamer image and the target human motion based on the key point positions and the human pose type probabilities.
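
One way to picture how the four modules fit together is a small Python class that wires them in sequence. This is only a sketch under stated assumptions. The class name `MotionMatchingDevice`, the injected `detect_body`, `predict`, and `match` callables (stand-ins for the human body detector, the key point prediction model, and the matching rule), and the row-major nested-list image are hypothetical and are not specified for the device in this embodiment.

```python
from dataclasses import dataclass
from typing import Callable


def crop(image, box):
    """Cropping module 20: cut the human body region out of the streamer image.

    Assumption: image is a row-major list of pixel rows, and box is
    (x_min, y_min, x_max, y_max) with exclusive maxima.
    """
    x0, y0, x1, y1 = box
    return [list(row[x0:x1]) for row in image[y0:y1]]


@dataclass
class MotionMatchingDevice:
    detect_body: Callable  # image -> box; hypothetical human body detector
    predict: Callable      # body image -> (key points, {action: probability})
    match: Callable        # (key points, probability, target action) -> bool

    def run(self, streamer_image, target_action):
        # Acquisition module 10 is represented by the streamer_image argument.
        body = crop(streamer_image, self.detect_body(streamer_image))  # module 20
        keypoints, probs = self.predict(body)                          # module 30
        prob = probs.get(target_action, 0.0)
        return self.match(keypoints, prob, target_action)              # module 40


# Usage with stub callables standing in for the real detector and model.
image = [[r * 10 + c for c in range(6)] for r in range(4)]
device = MotionMatchingDevice(
    detect_body=lambda img: (1, 1, 4, 3),
    predict=lambda body: ([(0, 0)], {"A": 0.8}),
    match=lambda kps, prob, action: prob >= 0.5,
)
print(device.run(image, "A"))  # True
```

Passing the detector, model, and matching rule in as callables keeps the sketch independent of any particular framework; a real device would replace the stubs with the trained components described in the method embodiments.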

[0114] The human motion matching device of this embodiment can execute a human motion matching method provided in the embodiments of this application. The implementation principle is similar. The actions performed by each module in the human motion matching device in each embodiment of this application correspond to the steps in the human motion matching method in each embodiment of this application. For detailed functional descriptions of each module of the human motion matching device, please refer to the descriptions in the corresponding human motion matching methods shown above. They will not be repeated here.

[0115] The following describes an implementation example of a live streaming system.

[0116] Referring to Figure 11, which is a schematic diagram of an example network live streaming system, the live streaming system of this embodiment includes a streamer client, a viewer client, and a live streaming server, wherein the streamer client and the viewer client are connected to the live streaming server via a network.

[0117] The live streaming server collects the streamer's image from the live streaming room, uses the human motion matching method described in the above embodiment to calculate the matching status between the streamer's human motion and the target human motion, and sends the live video stream to the audience. The streamer's terminal connects to the live streaming room, collects the streamer's real-time video image through the camera and uploads it to the live streaming server. The audience's terminal connects to the live streaming room and receives the live video stream sent by the live streaming server for playback.

[0118] Specifically, the streamer uses a streaming client to broadcast live in the live room, engaging in activities such as body shaping and weight loss, dance matching, and dance challenge. Viewers select and enter the live room to watch the streamer's broadcast through a viewer client. Both the viewer and streamer clients can access the live streaming platform through clients installed on computer devices. For example, the streamer and viewer clients can be computer devices such as PDAs, smartphones, tablets, desktop computers, or laptops, etc., without limitation. They can also be application software modules. The live streaming server includes a backend server that provides backend services for the computer devices, which can be implemented using a standalone server or a server cluster consisting of multiple servers.

[0119] In this embodiment, after the live streaming activity begins, the live streaming server selects a target human action and sends a video of the human action to the broadcaster for playback. The broadcaster follows the human action video, performing the same posture and action. The broadcaster's camera captures the broadcaster's image in real time and uploads it to the live streaming server. The live streaming server identifies and crops the human image from the broadcaster's image, inputs the human image into a human keypoint prediction model to predict the keypoint positions and human posture type probabilities of the broadcaster's human action. Based on the keypoint positions and human posture type probabilities, the matching status between the broadcaster's human action and the target human action can be calculated. Furthermore, the broadcaster's human action can be scored and a comprehensive score can be output. This empowers the interactive gameplay in the live streaming room and adds to the expansion dimensions of live streaming activities.

[0120] The following describes embodiments of computer devices and computer-readable storage media.

[0121] This application provides a technical solution for a computer device to implement functions related to human motion matching methods.

[0122] In one embodiment, this application provides a computer device comprising: one or more processors; a memory; and one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more application programs are configured to perform the human motion matching method of any embodiment.

[0123] Figure 12 is a block diagram of an example computer device. The computer device may be a mobile phone, computer, digital broadcasting terminal, messaging device, game console, tablet device, medical device, fitness equipment, personal digital assistant, etc. The computer device 100 may include one or more of the following components: processing component 102, memory 104, power supply component 106, multimedia component 108, audio component 109, input/output (I/O) interface 112, sensor component 114, and communication component 116.

[0124] Processing component 102 typically controls the overall operation of computer device 100, such as operations associated with display, telephone calls, data communication, camera operation, and recording operation.

[0125] The memory 104 is configured to store various types of data to support the operation of the computer device 100. The memory may be, for example, static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, a magnetic disk, or an optical disk.

[0126] The power supply unit 106 provides power to the various components of the computer device 100.

[0127] Multimedia component 108 includes a screen that provides an output interface between computer device 100 and user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). In some embodiments, multimedia component 108 includes a front-facing camera and / or a rear-facing camera.

[0128] The audio component 109 is configured to output and / or input audio signals.

[0129] I / O interface 112 provides an interface between processing component 102 and peripheral interface modules, such as keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to, home buttons, volume buttons, power buttons, and lock buttons.

[0130] Sensor assembly 114 includes one or more sensors for providing various aspects of state assessment for computer device 100. Sensor assembly 114 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact.

[0131] The communication component 116 is configured to facilitate wired or wireless communication between the computer device 100 and other devices. The computer device 100 can access wireless networks based on communication standards, such as WiFi, carrier networks (such as 2G, 3G, 4G, or 5G), or combinations thereof.

[0132] This application provides a computer-readable storage medium to implement the functions related to the human motion matching method. The computer-readable storage medium stores at least one instruction, at least one program, a code set, or an instruction set. The at least one instruction, at least one program, code set, or instruction set is loaded and executed by a processor to perform the human motion matching method of any embodiment.

[0133] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments described above. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided in this application may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, etc., and are not limited to these.

[0134] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties.

[0135] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0136] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.

Claims

1. A human motion matching method, characterized in that it comprises: obtaining a streamer image uploaded by a streamer in a live broadcast room; identifying a human body region from the streamer image, and cropping a human body image out of the streamer image based on the human body region; inputting the human body image into a pre-trained human key point prediction model carrying pose analogy information to identify key point positions and predict human pose type probabilities, wherein the training method of the human key point prediction model includes: obtaining a pre-trained and converged human key point model, and adding a pose type classification branch to the human key point model; setting the human motion categories of the pose type classification branch to the human motions in a preset motion library, and configuring a key point coordinate set corresponding to each human motion; fixing the weights of the human key point model, and training the pose type classification branch using the key point coordinate sets to obtain the human key point prediction model carrying pose analogy information; and determining the matching status between the human motion in the streamer image and a target human motion based on the key point positions and the human pose type probabilities.

2. The human motion matching method according to claim 1, characterized in that the structure of the pose type classification branch includes a convolutional layer, a linear layer, and an output layer normalized by a sigmoid function.

3. The human motion matching method according to claim 1, characterized in that determining the matching status between the human motion in the streamer image and the target human motion based on the key point positions and the human pose type probabilities comprises: obtaining the first key point positions and the first human pose type probability corresponding to the human motion category of the target human motion; calculating alignment state parameters between the first key point positions and preset second key point positions of the target human motion; and determining the matching status between the human motion in the streamer image and the target human motion based on the alignment state parameters and the first human pose type probability.

4. The human motion matching method according to claim 3, characterized in that, before obtaining the streamer image uploaded by the streamer in the live broadcast room, the method further comprises: selecting a target human motion from the preset motion library and playing it to the streamer to guide the streamer to perform the corresponding human motion; and the step of obtaining the streamer image uploaded by the streamer in the live broadcast room comprises: receiving the streamer image uploaded by the streamer in the live broadcast room, wherein the streamer image is a live video image containing the streamer's body, captured in real time by the streamer's camera.

5. The human motion matching method according to claim 4, characterized in that it further comprises: calculating a base score for the human motion in the streamer image based on the correspondence between the first human pose type probability and preset category probability score values; scaling the first key point positions to the size of the target human motion and aligning them with the target human motion; calculating an adjustment score based on the number of key points that fall within the key region of the human body; and determining a comprehensive score for the human motion in the streamer image based on the base score and the adjustment score.

6. The human motion matching method according to any one of claims 1-5, characterized in that it further comprises: constructing a human key point prediction model that includes a two-level cascaded convolutional neural network; adding a one-dimensional heatmap prediction branch and a two-dimensional heatmap prediction branch to the human key point prediction model to supervise the model output; and setting a model loss for the human key point prediction model, and training the human key point prediction model based on the model loss.

7. A human motion matching device, characterized in that it comprises: an acquisition module, used to acquire a streamer image uploaded by a streamer in a live broadcast room; a cropping module, used to identify a human body region from the streamer image and crop a human body image out of the streamer image based on the human body region; a prediction module, used to input the human body image into a pre-trained human key point prediction model carrying pose analogy information to identify key point positions and predict human pose type probabilities, wherein the training method of the human key point prediction model includes: obtaining a pre-trained and converged human key point model, and adding a pose type classification branch to the human key point model; setting the human motion categories of the pose type classification branch to the human motions in a preset motion library, and configuring a key point coordinate set corresponding to each human motion; fixing the weights of the human key point model, and training the pose type classification branch using the key point coordinate sets to obtain the human key point prediction model carrying pose analogy information; and a matching module, used to determine the matching status between the human motion in the streamer image and a target human motion based on the key point positions and the human pose type probabilities.

8. A live streaming system, characterized in that it comprises: a streamer client, a viewer client, and a live streaming server, wherein the streamer client and the viewer client are connected to the live streaming server via a network; the live streaming server is used to collect the streamer image of the streamer in the live broadcast room, calculate the matching status between the streamer's human motion and the target human motion using the human motion matching method according to any one of claims 1 to 6, and send the live video stream to the viewer client; the streamer client is used to connect to the live broadcast room, capture the streamer's real-time video images through a camera, and upload them to the live streaming server; and the viewer client is used to access the live broadcast room and to receive and play the live video stream sent by the live streaming server.

9. A computer device, characterized in that the computer device comprises: one or more processors; a memory; and one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more application programs are configured to perform the steps of the human motion matching method according to any one of claims 1-6.

10. A computer-readable storage medium, characterized in that the storage medium stores at least one instruction, at least one program, a code set, or an instruction set, wherein the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to perform the steps of the human motion matching method according to any one of claims 1-6.