[0037] The present invention will be described in detail below with reference to the accompanying drawings. The embodiments are implemented on the basis of the technical solution of the present invention; a detailed embodiment and a specific operation process are given, but the scope of the invention is not limited to the following examples.
[0038] As shown in Figure 1:
[0039] A professional dance evaluation method realizing human pose detection based on deep transfer learning proceeds as follows:
[0040] Step S1: Using the principle of deep transfer learning, train a pose detection model for professional dance;
[0041] Step S11: Use a pre-trained convolutional neural network model and the source-domain training set to extract hierarchical image features and realize human joint point recognition;
[0042] The data set used in the present invention is a video data set, which requires key-frame detection, cropping, and alignment pre-processing before model training. Specifically, the present invention extracts the features of each frame using a VGG-16 network pre-trained on ImageNet, determines the key frames of a video from these features, completes the cropping, and then achieves feature point alignment based on feature matching.
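The pre-processing above can be sketched as follows. This is a minimal illustration, not the patent's exact procedure: it assumes the per-frame VGG-16 feature vectors have already been extracted, and keeps a frame as a key frame when its feature vector moves more than a (hypothetical) threshold away from the last kept key frame.

```python
import math

def select_keyframes(features, threshold=0.5):
    """Select key-frame indices where the feature vector changes sharply.

    features: list of per-frame feature vectors (e.g. from a pre-trained
    VGG-16). A frame is kept as a key frame when its Euclidean distance
    to the previously kept key frame exceeds `threshold` (an
    illustrative criterion; the patent does not fix one).
    """
    if not features:
        return []
    keep = [0]  # always keep the first frame
    for i in range(1, len(features)):
        prev = features[keep[-1]]
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(features[i], prev)))
        if dist > threshold:
            keep.append(i)
    return keep
```

For example, four frames whose features barely move except at frame 2 yield key frames `[0, 2]`.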
[0043] The present invention selects an improved ResNet model to extract the hierarchical image features of each video frame and realize human joint point recognition.
[0044] The ResNet-18 model has a total of 4 blocks, each consisting of 4 to 5 convolution layers; every two convolution layers are connected by a shortcut residual structure, and the model ends with a fully connected layer. All convolution kernels are 3×3 except the first convolution layer, which is 7×7; the stride of the first convolution layer of each block is 2 and that of the remaining layers is 1, i.e., the feature map is down-sampled by the first convolution layer of each block. The pooling layer window size is 3×3 with a stride of 2. The improved ResNet-18 model framework is: one 64-channel convolution layer → pooling layer → four 64-channel convolution layers → four 128-channel convolution layers → four 256-channel convolution layers → four 512-channel convolution layers → global pooling layer → n-dimensional fully connected output layer (n is the number of human pose detection points).
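The stride schedule above can be traced without any deep learning framework. The sketch below is only a bookkeeping check of the described architecture (a 224×224 input size and n = 16 pose points are assumed for illustration; neither value appears in the text):

```python
def feature_map_sizes(input_size=224, n_points=16):
    """Trace the spatial size through the improved ResNet-18 described
    above: 7x7/2 conv -> 3x3/2 pool -> four residual stages of 64, 128,
    256, 512 channels, where each stage after the first halves the
    resolution via the stride-2 first convolution -> global pooling ->
    n-dimensional fully connected output (n = number of pose points).
    """
    # (layer name, stride) for the layers that change spatial resolution
    schedule = [("conv1 7x7", 2), ("maxpool 3x3", 2),
                ("stage 64", 1), ("stage 128", 2),
                ("stage 256", 2), ("stage 512", 2)]
    size, trace = input_size, []
    for name, stride in schedule:
        if stride > 1:
            size //= stride          # stride-2 layer halves the map
        trace.append((name, size))
    trace.append(("global pool", 1))         # collapses to 1x1
    trace.append(("fc output dims", n_points))
    return trace
```

With a 224×224 input this gives 112 after conv1, 56 through the 64-channel stage, then 28, 14, and 7 before global pooling.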
[0045] Using the source-domain training set, the improved ResNet-18 convolutional neural network model is trained to extract hierarchical image features and recognize human key points; the pixel-wise Euclidean distance between the pose detection points output by the convolutional neural network model and the standard points is used as the loss function:
[0046] f = ||x - g(x)||
[0047] where f represents the loss function, x is the standard pose point position information of joint i, and g(x) is the pose point position information of joint i obtained by the pre-trained convolutional neural network model. The human pose point information is a series of multi-frame, multi-dimensional coordinate information formed from a plurality of body parts (head, waist, shoulder, elbow, hip, and ankle, etc.); the coordinate information of each key point takes the human body center point as the reference origin, and a relative polar coordinate representation is established according to the connection relationships of the human body. For the head, waist, shoulder, elbow, hip, and ankle joints (taking the elbow joint as an example), the present invention takes another joint connected to it (such as the shoulder joint) as the relative coordinate origin, takes the plane formed by the elbow joint and the shoulder joint as the coordinate plane, and calculates the displacement distance, direction, and angle of the elbow joint relative to the shoulder joint, expressed as follows:
[0048] x_{i+1} = x_i(ρ, α)
[0049] where x_i denotes the coordinate point of joint i, ρ is the relative displacement of joint i+1 with respect to joint i, and α is the relative angle of joint i+1 with respect to joint i.
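The shoulder/elbow example can be made concrete with a small conversion sketch. The joint names and 2-D image-plane coordinates here are illustrative assumptions; the text only specifies the (ρ, α) relative polar representation:

```python
import math

def relative_polar(parent, child):
    """Express joint i+1 (child) relative to joint i (parent) as
    (rho, alpha): rho is the displacement distance and alpha the
    direction angle, matching x_{i+1} = x_i(rho, alpha)."""
    dx, dy = child[0] - parent[0], child[1] - parent[1]
    rho = math.hypot(dx, dy)        # displacement distance
    alpha = math.atan2(dy, dx)      # angle relative to the parent joint
    return rho, alpha

def from_polar(parent, rho, alpha):
    """Inverse: recover the child joint position from (rho, alpha)."""
    return (parent[0] + rho * math.cos(alpha),
            parent[1] + rho * math.sin(alpha))
```

For instance, a shoulder at (0, 0) and an elbow at (3, 4) give ρ = 5, and applying `from_polar` recovers the elbow position exactly.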
[0050] Step S12: According to the requirements of professional dance evaluation, perform deep transfer and training to design a dedicated human pose detection model;
[0051] Deep transfer learning model training combines a large number of labeled source-domain training samples with a small number of labeled, professionally assessed target-domain samples to design the dedicated human pose detection network model.
[0052] The process is as follows:
[0053] First, ResNet-18 is trained as the initial network model on the large labeled source-domain training set to obtain the specific weight and bias parameters of the network;
[0054] Second, the small target-domain data set is input into the pre-trained network, and the network weight and bias parameters are transferred to initialize the newly established convolutional neural network model;
[0055] Finally, model training is performed on the labeled target-domain data set.
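The parameter-transfer step in this three-stage procedure can be sketched framework-agnostically. The dict-based parameter model and the choice to re-initialize only the final fully connected layer are assumptions for illustration; a real implementation would operate on a deep learning framework's state dict:

```python
def transfer_parameters(source_params, target_params, reinit_layers=("fc",)):
    """Initialize a target-domain network from a source-domain
    pre-trained network: copy every weight/bias whose layer name and
    shape match, except layers listed in `reinit_layers` (e.g. the
    final fully connected layer, whose output size equals the number
    of pose points), which keep their fresh target initialization and
    are re-learned on the target-domain data.

    Parameters are modelled here as {layer_name: list_of_values}.
    """
    merged = dict(target_params)          # start from the target's init
    for name, values in source_params.items():
        if name in reinit_layers:
            continue                      # re-learned on the target domain
        if name in merged and len(merged[name]) == len(values):
            merged[name] = list(values)   # transfer pre-trained parameters
    return merged
```

Fine-tuning then proceeds as ordinary training on the labeled target-domain set, starting from the merged parameters.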
[0056] Step S2: The video of the demonstration dance action is collected and input into the aforementioned human pose detection model to obtain, as the reference standard for evaluation, a human key-point data stream over time; a set of two-dimensional human key-point coordinates changing over time is obtained, the coordinates being relative coordinates referenced to the human body center point.
[0057] Step S21: Crop and align the demonstration dance video as pre-processing;
[0058] The present invention performs key-frame detection, cropping, and alignment on the demonstration video. Specifically, the features of each frame are extracted using a VGG-16 network pre-trained on ImageNet, the key frames of the video are determined and the cropping is completed, and feature point alignment is achieved based on feature matching.
[0059] Step S22: Input the demonstration dance images into the human pose detection model to obtain the human key-point data.
[0060] The pre-processed demonstration dance images are input into the human pose detection model for detection, and the human key points are obtained.
[0061] Step S3: The human key-point information of the dance video to be evaluated is obtained by the method of step S2, and its similarity to the reference standard is calculated as the measure of how standard the dance is.
[0062] The present invention uses the pixel-wise Euclidean distance between the pose points output by the deep transfer learning model (i.e., the human pose detection model) and the reference standard points as the similarity function, and calculates the evaluation score of the tested dance action by a unified weighted scoring rule.
[0063] f = ||x - g(x)||
[0064] where f represents the similarity function, x is the standard pose point position information of joint i, and g(x) is the pose point position information of joint i obtained by the deep transfer learning model.
[0065] Step S3 specifically comprises:
[0066] Step S31: Crop and align the dance video to be evaluated as pre-processing;
[0067] The present invention takes the dance soundtrack as the time axis and compares the dance to be evaluated with the demonstration dance through this correspondence. At any given moment, the music and the dance movement have a fixed correspondence, so the videos are aligned through their audio data.
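The patent states only that alignment uses the soundtrack; one common way to realize this is to cross-correlate the two audio energy envelopes. The envelope representation (one energy value per video frame) and the search window are assumptions of this sketch:

```python
def best_audio_offset(ref_env, test_env, max_shift=50):
    """Estimate the frame offset that best aligns two audio energy
    envelopes (one value per video frame) by maximizing their
    cross-correlation over shifts in [-max_shift, max_shift].
    A positive result means the test video lags the reference."""
    best, best_shift = float("-inf"), 0
    for shift in range(-max_shift, max_shift + 1):
        score = 0.0
        for i, r in enumerate(ref_env):
            j = i + shift
            if 0 <= j < len(test_env):
                score += r * test_env[j]   # correlation at this shift
        if score > best:
            best, best_shift = score, shift
    return best_shift
```

Once the offset is known, the two key-point streams can be compared frame against frame.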
[0068] Step S32: Input the dance video to be evaluated into the human pose detection model to obtain the human key-point data;
[0069] Step S33: The human key-point data of the dance to be evaluated is compared, through the time correspondence, with the pose detection point data of the demonstration dance, and the degree of similarity is measured to give the evaluation.
[0070] The present invention converts the human key-point data of the dance to be evaluated and of the demonstration dance to the same standard; the pixel-wise Euclidean distance between each corresponding pose detection point and the reference standard point gives the point similarity, and the total similarity over the entire dance process is taken as the evaluation of the dance. The similarity calculation has a reasonable fault-tolerance rate and can remove data points with obvious errors.
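A minimal scoring sketch, assuming aligned frames of corresponding 2-D key points: average the per-frame Euclidean residuals, drop the worst frames as the fault-tolerance step, and map the result to a 0-100 score. The quantile cutoff and the exponential score mapping are illustrative choices, not fixed by the text:

```python
import math

def dance_score(ref_frames, test_frames, keep_quantile=0.95, scale=100.0):
    """Score a dance against the reference.

    ref_frames / test_frames: aligned lists of frames, each frame a
    list of corresponding (x, y) key points in the same coordinate
    system. Per-frame mean Euclidean distance is computed, the worst
    (1 - keep_quantile) of frames are dropped as obvious errors, and
    the remaining mean residual is mapped to a score where identical
    poses give `scale` (100 by default).
    """
    dists = []
    for ref_pts, test_pts in zip(ref_frames, test_frames):
        d = [math.dist(r, t) for r, t in zip(ref_pts, test_pts)]
        dists.append(sum(d) / len(d))
    dists.sort()
    kept = dists[:max(1, int(len(dists) * keep_quantile))]  # fault tolerance
    mean = sum(kept) / len(kept)
    return scale * math.exp(-mean)
```

Identical key-point streams score 100, and the score decays smoothly as the average key-point deviation grows.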
[0071] In a specific implementation, the dance to be evaluated is recorded by a video device (the performer in the video to be evaluated should appear exactly as in the demonstration dance video), and the video is passed through the trained human pose detection model to output the human key-point information stream. This stream is compared with the key-point information stream of the corresponding demonstration dance: the key-point information of the dance to be evaluated is first normalized so that it lies in the same coordinate system as the standard information; the audio of the dance soundtrack is used as the timeline; and after the two key-point information streams are aligned, the key-point coordinate deviations are converted into degrees of similarity. Finally, the total similarity over the entire dance process is output as the evaluation.
[0072] The present invention uses the deep transfer learning principle to train the human pose detection model, improving the efficiency and accuracy of training; compared with other real-time human pose detection models, the model in the present invention focuses on professional dance evaluation and improves the recognition accuracy for all kinds of dance poses.
[0073] The present invention has good development prospects, and its applications can be extended to other related fields. For example, the dance evaluation system can be developed for personal use, assisting dance examination candidates or learners to practice against published dance demonstrations; it can also be applied to the health field, assisting patients with movement disorders in rehabilitation or assisting sports enthusiasts in adjusting their body posture during exercise; in addition, it is also applicable to the analysis of human motion in video and to software and hardware development in other related fields.