The invention provides a method and device for generating action instructions for a virtual object model, an electronic device, and a storage medium, and relates to the field of image processing. In the method according to an embodiment of the invention, the three-dimensional skeleton coordinates of a target unit are calculated jointly from the two-dimensional coordinates of multiple frames of two-dimensional action images. Because the calculation for a given frame takes into account the two-dimensional coordinates of other frames captured at nearby times, the three-dimensional coordinates obtained for that frame are unlikely to change abruptly relative to those of adjacent frames. Consequently, when a control instruction for the virtual object model is generated from the calculated three-dimensional coordinates and used to drive the model, the model's motion is smoother and more natural. The method thus brings the motion of the virtual object model closer to that of the actual target unit and improves the accuracy of motion reproduction.
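The multi-frame calculation described above can be illustrated with a minimal sketch. The abstract does not specify how the three-dimensional coordinates are derived from the two-dimensional ones, so everything below is an assumption made for illustration: the names `temporal_window`, `mean_lifter`, `lift_sequence`, and `max_jump` are hypothetical, and `mean_lifter` is only a stand-in that averages each joint over the window and assigns a placeholder depth of 0.0. It shows the windowing idea, not the invention's actual lifting step.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]
Frame2D = List[Point2D]  # one (x, y) per skeleton joint
Frame3D = List[Point3D]  # one (x, y, z) per skeleton joint


def temporal_window(frames: Sequence[Frame2D], center: int, radius: int) -> List[Frame2D]:
    """Return the frames within `radius` of `center`, clamped at the sequence ends."""
    lo = max(0, center - radius)
    hi = min(len(frames), center + radius + 1)
    return list(frames[lo:hi])


def mean_lifter(window: Sequence[Frame2D]) -> Frame3D:
    """Hypothetical stand-in for a 2D-to-3D lifter (assumption, not the patented step).

    Averages each joint's 2D position over the window and uses 0.0 as a
    placeholder depth. A real system would estimate depth here.
    """
    n_joints = len(window[0])
    result: Frame3D = []
    for j in range(n_joints):
        x = sum(frame[j][0] for frame in window) / len(window)
        y = sum(frame[j][1] for frame in window) / len(window)
        result.append((x, y, 0.0))
    return result


def lift_sequence(
    frames: Sequence[Frame2D],
    radius: int,
    lifter: Callable[[Sequence[Frame2D]], Frame3D] = mean_lifter,
) -> List[Frame3D]:
    """Compute 3D coordinates for every frame from a window of neighbouring frames."""
    return [lifter(temporal_window(frames, i, radius)) for i in range(len(frames))]


def max_jump(frames3d: Sequence[Frame3D]) -> float:
    """Largest frame-to-frame displacement of any single joint."""
    return max(
        math.dist(p, q)
        for prev, cur in zip(frames3d, frames3d[1:])
        for p, q in zip(prev, cur)
    )


# One joint whose detected x position jitters between 0 and 1 in successive frames.
jittery: List[Frame2D] = [[(float(i % 2), 0.0)] for i in range(6)]

single_frame = lift_sequence(jittery, radius=0)  # each frame computed on its own
multi_frame = lift_sequence(jittery, radius=1)   # neighbouring frames included

print(max_jump(single_frame), max_jump(multi_frame))
```

With `radius=0` each frame is processed in isolation and the jitter passes straight through; with `radius=1` the largest frame-to-frame displacement is smaller, which is the effect the abstract attributes to considering frames captured at nearby times.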