A method and system for rendering skeletal animation
By parallelizing skinning calculations on the GPU and reducing draw calls, the method relieves the CPU pressure of rendering massive numbers of character models in multiplayer online real-time strategy games, improving game performance and stability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUANGZHOU YIYU NETWORK TECH CO LTD
- Filing Date
- 2026-04-23
- Publication Date
- 2026-06-19
AI Technical Summary
In multiplayer online real-time strategy games, rendering massive amounts of character models puts excessive pressure on the CPU's skeletal calculations and data transmission, leading to frame rate fluctuations and an overload of draw calls, thus affecting game performance.
By deeply integrating GPU Instancing and GPU Skinning, the number of DrawCalls is reduced. The CPU only transmits lightweight animation parameters, and the GPU performs skinning calculations in parallel, avoiding the transmission of bone matrix arrays and CPU calculations.
It significantly reduces the CPU rendering load, decreases data transfer volume, improves game frame rate, and preserves the bone attachment function.
Figure CN122244265A_ABST
Description
Technical Field
[0001] This invention relates to real-time rendering technology in the field of computer graphics, and more particularly to an optimized method, apparatus and storage medium for batch skeletal animation rendering of massive character models in multiplayer online real-time strategy (SLG) games.
Background Technology
[0002] In massively multiplayer online real-time strategy (SLG) games and similar game scenarios that require rendering a massive number of units, thousands or even tens of thousands of character models often need to be displayed on the same screen simultaneously. When skeletal animation models are rendered using traditional methods, the bone calculations are typically handled by the CPU. Specifically, the CPU calculates the transformation matrix for each bone in the current animation frame and then uploads the bone matrix array to the GPU, where the vertex shader transforms the vertex positions based on the bone indices and weights bound to the vertices.
[0003] Therefore, the CPU must independently calculate the transformation matrices of dozens to hundreds of bones for each instance. When the number of instances reaches the thousands, the CPU-side skinning calculation overhead increases linearly, causing the main thread to stall. Existing techniques also require transmitting the complete bone matrix data for each instance, and doing so for thousands of instances can easily congest the PCIe bus. Furthermore, each model instance typically corresponds to an independent DrawCall. When the number of instances exceeds 3000, the number of DrawCalls overloads the CPU rendering thread, causing severe frame rate fluctuations and frame drops.
Summary of the Invention
[0004] The present invention aims to solve the technical problems existing in the prior art and provides a skeletal animation rendering method, system and electronic device, which can effectively reduce the pressure on the CPU from the number of DrawCalls when rendering a large number of model instances with independent skeletal animations on the same screen, while eliminating the bottleneck of skeletal calculation and data transmission bandwidth pressure on the CPU.
[0005] The embodiments of this application disclose the following technical solutions. A first aspect of this application provides a method for rendering skeletal animation, including:
Obtaining the animation control parameters of each model instance. The animation control parameters include the animation identifier, the current playback frame, and the world transformation matrix, and do not include the bone transformation matrix.
Writing the animation control parameters of each model instance into the instance data buffer and sending a single draw command to the GPU. The instance data buffer is located in the GPU video memory and is configured to be writable by the CPU and readable by the GPU.
In response to the single draw command, the GPU samples the bone transformation matrix from the pre-stored animation matrix texture according to the animation control parameters in the instance data buffer, and performs skinning calculations in parallel on each model instance based on the sampled bone transformation matrix.
[0006] In one optional implementation, writing the animation control parameters of each model instance into the instance data buffer includes:
Performing view frustum culling on each model instance to determine the visible model instances;
Packing the animation control parameters of each visible model instance into a contiguous memory block;
Transferring the contiguous memory block to the instance data buffer in a single operation.
[0007] In one optional implementation, the data structure of the animation control parameters includes: a world transformation matrix field, an animation identifier field, a current playback frame field, and a playback speed field.
[0008] In one optional implementation, the GPU performing skinning calculations in parallel on each model instance based on the sampled bone transformation matrix includes:
The GPU reads the animation control parameters of the corresponding model instance from the instance data buffer based on the instance identifier;
The sampling coordinates of the animation matrix texture are calculated based on the animation identifier and the current playback frame, and the bone transformation matrix is obtained by sampling;
The final position of the vertex is calculated based on the bone transformation matrix, the initial position of the vertex, and the vertex weight.
[0009] In one optional implementation, the method also includes:
Detecting the performance level of the device;
Determining the precision encoding method of the animation control parameters according to the performance level of the device, wherein high-end devices use full-precision floating-point encoding and low-end devices use half-precision floating-point encoding.
[0010] In one optional implementation, the method also includes:
The CPU groups the model instances according to mesh and animation identifier, so that model instances in the same group share the same mesh and animation identifier;
The write and send operations are performed on each group separately.
[0011] In one optional implementation, the pre-stored animation matrix texture is generated in the following manner:
Calculating the transformation matrix of each bone frame by frame for the skeletal animation;
Encoding the transformation matrix into RGBA color data;
Writing the texture in the order of animation identifier, frame index, and bone index.
[0012] In one optional implementation, the method also includes:
The CPU reads the transformation data of the target bone from the sampled bone transformation matrix;
The world coordinates of the bone attachment point are calculated based on the transformation data, the pre-stored local offset, and the world transformation matrix.
[0013] A second aspect of this application provides a rendering system for skeletal animation, comprising:
A CPU configured to acquire animation control parameters for each model instance and write them to the instance data buffer. The animation control parameters include the animation identifier, the current playback frame, and the world transformation matrix, and do not include the bone transformation matrix.
A GPU communicatively connected to the CPU. The GPU has an instance data buffer in its video memory. The GPU is configured to, in response to a single draw command, sample a bone transformation matrix from a pre-stored animation matrix texture according to the animation control parameters in the instance data buffer, and perform skinning calculations in parallel on each model instance based on the sampled bone transformation matrix.
[0014] A third aspect of this application provides a computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method described in any implementation of the first aspect of this application.
[0015] A fourth aspect of this application provides an electronic device, characterized in that it includes: a CPU, a GPU, and a memory, wherein the memory stores a computer program, and when the computer program is executed by the CPU and the GPU, it implements the steps of the method described in any implementation of the first aspect of this application.
[0016] In summary, compared with the prior art, the beneficial effects of the technical solution provided in this application include at least the following. By deeply integrating GPU Instancing and GPU Skinning, only a small number of batch DrawCalls are needed, whereas traditional solutions require a separate DrawCall for each instance, so rendering command overhead is significantly reduced. The CPU no longer transmits the complete bone matrix array, but only approximately 20-30 bytes of lightweight animation parameters. In scenarios with 30,000 instances, the amount of data transmitted per frame is reduced from hundreds of MB to less than 1 MB. The CPU is completely freed from bone matrix calculations, handling only lightweight business logic and parameter packaging. Through global reuse of GPU memory resources and manual management of CPU-side data structures, per-frame garbage collection (GC) allocation is reduced from 1-2 MB to 0 KB. While implementing GPU-driven skeletal animation, a dynamic update mechanism for attachment points preserves the bone attachment functionality required for game development.
Attached Figure Description
[0017] Figure 1 is an overall flowchart of a skeletal animation rendering method provided in an embodiment of the present invention.
[0018] Figure 2 is a detailed flowchart of CPU-side parameter writing and transmission provided in an embodiment of the present invention.
[0019] Figure 3 is a detailed flowchart of GPU-side parallel skinning computation provided in an embodiment of the present invention.
[0020] Figure 4 is a flowchart of an animation matrix texture generation method provided in an embodiment of the present invention.
[0021] Figure 5 is a structural block diagram of a skeletal animation rendering system provided in an embodiment of the present invention.
Detailed Implementation
[0022] The technical solutions in the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments.
[0023] Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0024] This embodiment provides a method for rendering skeletal animation, suitable for scenarios such as SLG games that require rendering a large number of independent skeletal animation model instances on the same screen. As shown in Figure 1, the method includes:
Step S10: Obtain the animation control parameters for each model instance.
[0025] The CPU obtains the animation control parameters for each model instance to be rendered in the current frame. These animation control parameters include three essential fields: animation identifier, current playback frame, and world transformation matrix. The animation identifier identifies the currently playing animation segment, such as idle, walking, or attacking; the current playback frame indicates the animation playback progress, supporting decimals for inter-frame interpolation; and the world transformation matrix identifies the instance's position, rotation, and scale within the scene. It is important to note that the animation control parameters do not include the bone transformation matrix; the CPU does not perform bone matrix calculations.
[0026] By traversing all model instances in the scene, the current playback frame is calculated based on the current animation state and playback speed of each instance. The world transformation matrix is calculated based on the instance's spatial position, orientation, and size within the scene. The animation identifier, the current playback frame, and the world transformation matrix are then combined into animation control parameters.
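As a rough illustration of how the current playback frame might be advanced once per rendered frame, the following Python sketch assumes a fixed animation sampling rate and looping playback. The function name `advance_frame` and the parameters `dt`, `sample_rate`, and `frame_count` are hypothetical and do not appear in this application.

```python
def advance_frame(current_frame, playback_speed, dt, sample_rate, frame_count):
    """Advance a fractional playback frame and wrap it into [0, frame_count).

    Assumptions (not from the source): the clip loops, `dt` is the elapsed
    time in seconds, and `sample_rate` is the clip's frames per second.
    """
    frame = current_frame + playback_speed * dt * sample_rate
    # Python's % returns a non-negative result for a positive divisor,
    # so reverse playback (negative speed) wraps correctly as well.
    return frame % frame_count


forward = advance_frame(20.0, 1.0, 0.5, 30.0, 30)    # 20 + 15 = 35, wraps to 5
backward = advance_frame(2.0, -1.0, 0.5, 30.0, 30)   # 2 - 15 = -13, wraps to 17
```

The fractional part of the result is what the GPU later uses as the interpolation coefficient between two keyframes.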
[0027] Step S20: Write the animation control parameters of each model instance into the instance data buffer and send a single draw command to the GPU.
[0028] The CPU writes the animation control parameters of each model instance obtained in step S10 into the instance data buffer. The instance data buffer is located in the GPU video memory and is configured to be writable by the CPU and readable by the GPU. After writing is completed, the CPU sends a single draw command to the GPU. The single draw command is an instanced draw call, so multiple model instances can be rendered in one call.
[0029] The write operation specifically includes: determining the visibility of each model instance and writing only the animation control parameters of visible instances to the buffer; sequentially writing the animation control parameters of each visible instance to a pre-allocated memory area; and transferring the memory data to the instance data buffer in the GPU memory in one go through the mapping mechanism of the graphics API. The single draw instruction includes the vertex buffer handle, index buffer handle, instance data buffer handle, and instance count parameter of the basic Mesh.
[0030] Step S30: In response to a single drawing instruction, the GPU samples the bone transformation matrix from the pre-stored animation matrix texture according to the animation control parameters, and performs skinning calculations in parallel on each model instance based on the sampled bone transformation matrix.
[0031] The GPU receives and responds to the single-draw instruction sent in step S20. During the vertex shader stage, it reads the animation control parameters of the corresponding model instance from the instance data buffer based on the instance identifier. Based on the animation identifier in the animation control parameters and the current playback frame, it calculates the sampling coordinates of the pre-stored animation matrix texture and samples the bone transformation matrix from the texture. Based on the sampled bone transformation matrix, it performs parallel skinning calculations on all vertices of each model instance to calculate the final vertex positions.
[0032] The sampling operation queries the starting position offset of the animation in the texture based on the animation identifier. The keyframe index is determined based on the integer part of the current playing frame, and the interpolation coefficients are determined based on the fractional part. The horizontal sampling coordinates are calculated based on the bone index in the vertex attributes. The bone matrix data of the current and next frames are sampled from the texture and blended using the interpolation coefficients to obtain the final bone transformation matrix.
[0033] For each vertex, the skinning calculation obtains the indices and blend weights of its bound influence bones. For each influence bone, the bone transformation matrix is applied to the vertex's initial position and the result is scaled by the corresponding weight. The weighted results of all influence bones are summed to obtain the skinned vertex position. The skinned vertex position is multiplied by the world transformation matrix to obtain the final world space position, which is then transformed by the view matrix and projection matrix to output the clip space coordinates.
[0034] In one example, the specific implementation of step S20, which writes the animation control parameters to the instance data buffer, may include the following steps, as shown in Figure 2:
Step S201: Perform view frustum culling on each model instance to determine the visible model instances.
[0035] The CPU performs a frustum visibility test on all model instances acquired in step S10. The frustum consists of six clipping planes of the current camera: the left plane, right plane, top plane, bottom plane, near plane, and far plane. Each model instance has an axis-aligned bounding box, defined by its minimum and maximum corner points in its world coordinate system.
[0036] The view frustum culling method employs the separating axis theorem to calculate the projection interval of the bounding box onto the normal vector of each clipping plane. If the bounding box lies entirely outside any clipping plane, the instance is determined to be invisible. If the instance passes the test against all six planes, it is determined to be visible. To improve efficiency, SIMD instructions are used to batch process the culling tests of multiple instances. After culling, a set of visible model instances is obtained. Only instances in this set participate in subsequent rendering; invisible instances are not written to the instance data buffer, which avoids transmitting unused data.
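The plane test described above can be sketched in Python as follows. This is a minimal scalar sketch, not the SIMD implementation the text refers to. It assumes each plane is given as `(nx, ny, nz, d)` with the normal pointing into the frustum, so a point is inside when `n·p + d >= 0`. It uses the common "farthest corner" shortcut, which gives the same answer as projecting the box onto the plane normal. The names `aabb_outside_plane` and `is_visible` are hypothetical.

```python
def aabb_outside_plane(box_min, box_max, plane):
    """Return True if the axis-aligned box lies entirely outside the plane."""
    nx, ny, nz, d = plane
    # Pick the box corner farthest along the plane normal. If even that
    # corner is behind the plane, the whole box is behind it.
    px = box_max[0] if nx >= 0 else box_min[0]
    py = box_max[1] if ny >= 0 else box_min[1]
    pz = box_max[2] if nz >= 0 else box_min[2]
    return nx * px + ny * py + nz * pz + d < 0


def is_visible(box_min, box_max, planes):
    """An instance is visible if it is not fully outside any of the planes."""
    return not any(aabb_outside_plane(box_min, box_max, p) for p in planes)


# Illustrative "frustum": a cube covering [-10, 10] on every axis.
CUBE_PLANES = [
    (1, 0, 0, 10), (-1, 0, 0, 10),
    (0, 1, 0, 10), (0, -1, 0, 10),
    (0, 0, 1, 10), (0, 0, -1, 10),
]

inside = is_visible((0, 0, 0), (1, 1, 1), CUBE_PLANES)
straddling = is_visible((9, 0, 0), (11, 1, 1), CUBE_PLANES)
outside = is_visible((20, 0, 0), (21, 1, 1), CUBE_PLANES)
```

The test is conservative: a box that straddles a plane is kept, so no visible instance is dropped, at the cost of occasionally keeping a box that sits just outside a frustum corner.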
[0037] Step S202: Pack the animation control parameters of each visible model instance into a contiguous memory block.
[0038] The CPU pre-allocates a contiguous memory region for visible model instances, with a capacity equal to the product of the number of visible instances and the size of a single animation control parameter. The animation control parameters of each visible instance are written consecutively into this memory block in a fixed order, the fixed order being consistent with the generation order of the instance identifiers.
[0039] Step S203: Transfer the contiguous memory blocks to the instance data buffer all at once.
[0040] The CPU invokes the buffer mapping mechanism of the graphics API to transfer the contiguous memory block packaged in step S202 to the instance data buffer in the GPU's video memory in one go. The instance data buffer is located in the GPU's video memory and is configured to be writable by the CPU and readable by the GPU, supporting fast updates every frame.
[0041] On the DirectX platform, a dynamic buffer mapping mode is used to obtain a writable pointer to the GPU memory, perform a memory copy, and then unmap the buffer. On the OpenGL platform, a stream-draw buffer mapping mode is used: the buffer range is mapped as writable, the data is copied, and the buffer is then unmapped. On the Vulkan platform, a staging buffer and a transfer queue are used. The data is first written to a CPU-accessible staging buffer and then copied to the GPU device-local buffer via an asynchronous transfer command. This transfer is performed only once per frame, and the number of transfer operations remains constant regardless of the number of visible instances, avoiding the overhead of transferring each instance individually.
[0042] In one example, the data structure of the animation control parameters consists of four fields with a defined memory layout.
[0043] The data structure uses the same field order and memory alignment on both the CPU and GPU sides to ensure data consistency. Specifically, it includes a world transformation matrix field, an animation identifier field, a current playback frame field, and a playback speed field.
[0044] The world transformation matrix field is at offset 0, is a 4×4 floating-point matrix, and occupies 64 bytes. It stores the spatial transformation of the model instance in the scene's world coordinate system, including three translation components, three rotation components, and three scaling components. The matrix storage order may be row-major or column-major but must match the convention used on the GPU side. The CPU updates this field every frame according to the instance's game logic state.
The animation identifier field is at offset 64, is a 32-bit signed integer, and occupies 4 bytes. It is an enumeration value that identifies the currently playing animation segment, such as 0 for standby, 1 for walking, 2 for attacking, and 3 for death. It supports instance-level independent switching, allowing different model instances on the same screen to be in different animation states.
The current playback frame field is at offset 68, is a 32-bit floating-point number, and occupies 4 bytes. It supports fractional values: the integer part is the current keyframe index, and the fractional part is the interpolation coefficient between two keyframes. This design supports arbitrary playback speeds and smooth animation transitions without being limited to integer frame steps.
The playback speed field is at offset 72, is a 32-bit floating-point number, and occupies 4 bytes. Its default value is 1.0. A positive value indicates forward playback and a negative value indicates reverse playback; an absolute value greater than 1 indicates accelerated playback, and an absolute value less than 1 indicates decelerated playback. It supports instance-level independent control to achieve slow-motion or fast-forward effects.
The data structure also includes a 4-byte reserved field at offset 76 for memory alignment, which brings the total size to 80 bytes, a multiple of 16, to meet the alignment requirements of the GPU constant buffer. This reserved field can be reused for other control parameters.
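The layout above can be checked with a short Python sketch built on the standard `struct` module. It assumes little-endian byte order and a row-major matrix supplied as 16 floats; the names `INSTANCE_FORMAT`, `pack_instance`, and `pack_instances` are hypothetical and only illustrate the offsets given in the text.

```python
import struct

# 16 floats (world matrix, offset 0), int32 (animation id, offset 64),
# float32 (current frame, offset 68), float32 (speed, offset 72),
# 4 pad bytes (reserved field, offset 76). "<" disables implicit padding.
INSTANCE_FORMAT = "<16f i f f 4x"
INSTANCE_SIZE = struct.calcsize(INSTANCE_FORMAT)  # 80 bytes


def pack_instance(world, anim_id, frame, speed=1.0):
    """Pack one instance's animation control parameters into 80 bytes."""
    return struct.pack(INSTANCE_FORMAT, *world, anim_id, frame, speed)


def pack_instances(instances):
    """Pack visible instances back to back into one contiguous block."""
    return b"".join(pack_instance(*inst) for inst in instances)


IDENTITY = [1.0, 0.0, 0.0, 0.0,
            0.0, 1.0, 0.0, 0.0,
            0.0, 0.0, 1.0, 0.0,
            0.0, 0.0, 0.0, 1.0]

block = pack_instances([(IDENTITY, 2, 12.5, 1.0), (IDENTITY, 0, 3.0, -1.0)])
```

Because every record has the same fixed size, the record for instance `i` starts at byte `i * INSTANCE_SIZE`, which is what lets the GPU index the buffer directly by instance identifier.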
[0045] In one example, as shown in Figure 3, the GPU performing skinning calculations in parallel on each model instance based on the sampled bone transformation matrix includes three sub-steps.
[0046] Step S401: The GPU reads the animation control parameters of the corresponding model instance from the instance data buffer based on the instance identifier.
[0047] The GPU vertex shader obtains the instance identifier of the currently processed vertex through a system-value semantic. This identifier is generated automatically by the input assembler according to the per-instance step rate. Using the instance identifier as an index, the shader reads the animation control parameter fields of the corresponding model instance from the structured buffer, including the world transformation matrix, animation identifier, current playback frame, and playback speed.
[0048] Step S402: Calculate the sampling coordinates of the animation matrix texture based on the animation identifier and the current playback frame, and obtain the bone transformation matrix by sampling.
[0049] The GPU calculates the sampling coordinates of the pre-stored animation matrix texture based on the animation identifier and the current playback frame read in step S401. Specifically, the calculation includes: querying metadata by animation identifier to obtain the animation's starting frame offset and total number of frames in the texture; wrapping the current playback frame for looping to obtain the frame index within the animation and the interpolation coefficient; calculating the vertical starting positions of the current keyframe and the next keyframe in the texture; calculating the horizontal sampling coordinate for each influencing bone from the bone index in the vertex attributes; and sampling the encoded bone matrix data of the current frame and the next frame from the texture, then performing component-wise linear interpolation with the interpolation coefficient to obtain the final bone transformation matrix.
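The frame lookup and blending in step S402 can be sketched on the CPU in Python. This is an illustration of the arithmetic only, not shader code. It assumes looping playback and represents a sampled matrix as a flat list of floats; `frame_lookup` and `lerp_matrix` are hypothetical names.

```python
import math


def frame_lookup(current_frame, start_frame, frame_count):
    """Return (current keyframe row, next keyframe row, blend factor).

    `start_frame` is the animation's starting frame offset in the texture
    and `frame_count` its total number of frames, both from the metadata.
    """
    wrapped = current_frame % frame_count            # loop the animation
    index = int(math.floor(wrapped)) % frame_count   # guard against rounding up
    t = wrapped - math.floor(wrapped)                # fractional part
    next_index = (index + 1) % frame_count           # last frame blends to first
    return start_frame + index, start_frame + next_index, t


def lerp_matrix(a, b, t):
    """Component-wise linear interpolation of two encoded matrices."""
    return [x + (y - x) * t for x, y in zip(a, b)]


rows = frame_lookup(31.25, 100, 30)      # frame 31.25 of a 30-frame clip
blended = lerp_matrix([0.0, 2.0], [4.0, 2.0], 0.25)
```

Component-wise interpolation is cheap but only approximates rotation between keyframes, which is usually acceptable when the animation is baked at a reasonably dense sampling rate.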
[0050] Step S403: Calculate the final position of the vertex based on the bone transformation matrix, the initial position of the vertex, and the vertex weight.
[0051] The GPU performs standard skinning calculations. For each vertex, the indices and blend weights of its bound influence bones are obtained from the vertex attributes. The blend weights are normalized floating-point numbers that sum to one. For each influence bone whose weight is greater than zero, the bone transformation matrix sampled in step S402 is applied to the initial vertex position by matrix-vector multiplication, and the transformed position is multiplied by the blend weight. The weighted positions of all influence bones are summed to obtain the skinned vertex position. The skinned vertex position is multiplied by the world transformation matrix read in step S401 and then by the view matrix and projection matrix to output the final clip space coordinates.
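The weighted sum in step S403 is commonly called linear blend skinning. The Python sketch below shows it for a single vertex, assuming 4×4 row-major matrices stored as nested lists and column-vector multiplication. The world, view, and projection transforms are left out, and the names `transform_point` and `skin_vertex` are hypothetical.

```python
def transform_point(m, p):
    """Apply a 4x4 row-major affine matrix to a 3D point (w assumed 1)."""
    x, y, z = p
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3]
                 for r in range(3))


def skin_vertex(position, bone_indices, weights, bone_matrices):
    """Blend the positions produced by each influencing bone."""
    sx = sy = sz = 0.0
    for index, weight in zip(bone_indices, weights):
        if weight <= 0.0:
            continue  # skip bones that do not influence this vertex
        tx, ty, tz = transform_point(bone_matrices[index], position)
        sx += weight * tx
        sy += weight * ty
        sz += weight * tz
    return (sx, sy, sz)


IDENTITY = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
SHIFT_X2 = [[1.0, 0.0, 0.0, 2.0], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

# A vertex influenced equally by a static bone and a bone shifted by 2 on X
# ends up shifted by 1.
skinned = skin_vertex((1.0, 1.0, 1.0), [0, 1], [0.5, 0.5], [IDENTITY, SHIFT_X2])
```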
[0052] One example includes sub-steps for detecting device performance and determining the precision encoding method.
[0053] Step S501: Detect the performance level of the device.
[0054] During application startup, system information is queried to determine the device's performance level. Key indicators include GPU adapter model and dedicated video memory capacity; GPU computing power, estimated through benchmark testing; number of CPU logical processors; and system physical memory capacity.
[0055] Based on the above indicators, devices are divided into three tiers. High-end devices have a GPU with 4 GB or more of video memory, GPU computing power of 1.5 trillion floating-point operations per second or more, a CPU with 8 or more cores, and system memory of 8 GB or more. Mid-range devices have a GPU with 2-4 GB of video memory, GPU computing power of 0.5 to 1.5 trillion floating-point operations per second, a CPU with 4-8 cores, and system memory of 4-8 GB. Devices that fall below the mid-range thresholds are treated as low-end devices.
[0056] Step S502: Determine the precision encoding method of the animation control parameters according to the equipment performance level.
[0057] Based on the device tier determined in step S501, the precision encoding method for the animation control parameters is selected. For high-end devices, the world transformation matrix uses a full-precision 4×4 floating-point matrix, the animation identifier uses a 32-bit signed integer, and the current playback frame and playback speed use full-precision 32-bit floating-point numbers. For mid-range devices, the world transformation matrix remains full-precision, while the current playback frame and playback speed are reduced to half-precision 16-bit floating-point numbers. For low-end devices, the world transformation matrix is compressed into a float3x4 format, with the omitted last row implied by the homogeneous-coordinate convention, and all floating-point fields use half-precision 16-bit floating-point numbers. The format of the animation matrix texture is determined at the same time: high-end devices use RGBA32F, with 32-bit floating point per channel; mid-range and low-end devices use RGBA16F, with 16-bit floating point per channel.
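To make the size effect of the three encodings concrete, the following Python sketch packs one instance per tier with the standard `struct` module, whose `e` format code is IEEE 754 half precision. The tier names, the dictionary `TIER_FORMATS`, and the function `encode_instance` are hypothetical. The mid-range and low-end layouts are written without alignment padding, which is an assumption because the text does not give their byte sizes.

```python
import struct

TIER_FORMATS = {
    "high": "<16f i f f 4x",  # full-precision matrix, 80 bytes with reserved field
    "mid": "<16f i e e",      # full-precision matrix, half-precision frame and speed
    "low": "<12e i e e",      # float3x4 matrix in half precision
}


def encode_instance(tier, world, anim_id, frame, speed):
    """Encode animation control parameters for the given device tier.

    `world` is a row-major 4x4 matrix as 16 floats. For the low tier the
    last row is dropped, assuming it is (0, 0, 0, 1).
    Note: half precision cannot represent magnitudes above 65504.
    """
    if tier == "low":
        world = world[:12]
    return struct.pack(TIER_FORMATS[tier], *world, anim_id, frame, speed)


IDENTITY = [1.0, 0.0, 0.0, 0.0,
            0.0, 1.0, 0.0, 0.0,
            0.0, 0.0, 1.0, 0.0,
            0.0, 0.0, 0.0, 1.0]

sizes = {tier: len(encode_instance(tier, IDENTITY, 1, 12.5, 1.0))
         for tier in TIER_FORMATS}
```

One trade-off to keep in mind: half precision has an 11-bit significand, so the spacing between representable values grows to 1.0 at 1024. Fractional frame positions therefore lose resolution on long clips, and large world coordinates lose accuracy on the low tier.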
[0058] In one example, grouping model instances based on mesh and animation identifiers and performing write and send operations on each group includes the following sub-steps.
[0059] Step S601: The CPU groups each model instance according to the mesh and animation identifier.
[0060] The CPU calculates a grouping key for each model instance, which is generated by combining the mesh identifier, material identifier, and animation identifier. Model instances in the same group share the same mesh, material, and animation identifier, ensuring that they can share the same rendering resources and draw calls. A hash table is used to store the groups, with the key being the grouping key and the value being a batch object. All model instances are iterated through, their grouping key is calculated, and the hash table is checked to see if the key exists. If it does not exist, a new batch object is created, and the instance is added to the list of the corresponding batch.
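The hash-table grouping in step S601 can be sketched in Python with a dictionary keyed by the grouping key. The instance representation (a dict with `mesh`, `material`, and `anim` entries) and the function name `group_instances` are hypothetical.

```python
from collections import defaultdict


def group_instances(instances):
    """Group instances into batches keyed by (mesh, material, animation)."""
    batches = defaultdict(list)  # a missing key creates a new, empty batch
    for inst in instances:
        key = (inst["mesh"], inst["material"], inst["anim"])
        batches[key].append(inst)
    return dict(batches)


units = [
    {"id": 0, "mesh": "soldier", "material": "red", "anim": 1},
    {"id": 1, "mesh": "soldier", "material": "red", "anim": 1},
    {"id": 2, "mesh": "archer", "material": "red", "anim": 0},
]
batches = group_instances(units)
```

Each resulting batch maps to one instanced draw call, so the number of draw calls equals the number of distinct keys rather than the number of instances.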
[0061] Step S602: Perform write and send operations on each group respectively.
[0062] For each batch, view frustum culling, data packing, and the single transfer operation are performed separately. Specifically, for each visible instance within a batch, its animation control parameters are packed into a contiguous memory block corresponding to that batch, the block is transferred in one operation to the corresponding offset in the GPU instance data buffer, and a single draw command for that batch is sent to the GPU. Each batch corresponds to one draw command call. For example, by dividing tens of thousands of model instances on the same screen into dozens of batches based on mesh and animation type, the number of DrawCalls is reduced from tens of thousands to tens.
[0063] In one example, as shown in Figure 4, the generation method of the pre-stored animation matrix texture includes the following sub-steps.
[0064] Step S701: Calculate the transformation matrix of each bone frame by frame for the skeletal animation.
[0065] For each animation segment, discrete sampling is performed at a fixed frame rate. For each sampled frame, the skeletal hierarchy is traversed, and the local transformation matrix of each bone is recursively calculated from the root bone to the leaf bones. This local transformation matrix is then concatenated with the world transformation matrix of the parent bone to obtain the transformation matrix of that bone relative to the root node of the model.
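The parent-to-child concatenation in step S701 can be sketched in Python for one sampled frame. Instead of recursion, this sketch assumes the bones are stored so that every parent precedes its children, which allows a single forward pass. The names `mat_mul`, `translation`, and `model_space_matrices` are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two 4x4 row-major matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def translation(x, y, z):
    """Build a 4x4 translation matrix (column-vector convention)."""
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def model_space_matrices(parents, local_matrices):
    """Concatenate each bone's local matrix with its parent's result.

    `parents[i]` is the parent index of bone i, or -1 for the root.
    Assumes parents[i] < i for every non-root bone.
    """
    result = []
    for i, local in enumerate(local_matrices):
        parent = parents[i]
        result.append(local if parent < 0 else mat_mul(result[parent], local))
    return result


# Root shifted by 1 on X, child shifted by a further 1 relative to the root.
chain = model_space_matrices([-1, 0],
                             [translation(1.0, 0.0, 0.0),
                              translation(1.0, 0.0, 0.0)])
```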
[0066] Step S702: Encode the transformation matrix into RGBA color data.
[0067] The 4×4 transformation matrix calculated in step S701 is decomposed into a rotation quaternion, a scaling vector, and a translation vector. The four components of the rotation quaternion, the three components of the scaling vector, and the three components of the translation vector, a total of 10 floating-point numbers, are packed into 2 or 3 RGBA pixels. Specifically, the first RGBA pixel stores the rotation quaternion; the second RGBA pixel stores the scaling vector in its RGB channels and the X component of the translation vector in its A channel; and the third RGBA pixel stores the Y and Z components of the translation vector in its R and G channels.
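The three-pixel layout described above can be sketched in Python as a pack and unpack pair. The sketch takes the already decomposed rotation, scale, and translation as input and leaves the matrix decomposition itself out. The unused B and A channels of the third pixel are set to zero, which is an assumption; `encode_trs` and `decode_trs` are hypothetical names.

```python
def encode_trs(rotation, scale, translation):
    """Pack a quaternion, scale, and translation into three RGBA pixels."""
    qx, qy, qz, qw = rotation
    sx, sy, sz = scale
    tx, ty, tz = translation
    return [
        (qx, qy, qz, qw),    # pixel 0: rotation quaternion
        (sx, sy, sz, tx),    # pixel 1: scale in RGB, translation X in A
        (ty, tz, 0.0, 0.0),  # pixel 2: translation Y and Z in R and G
    ]


def decode_trs(pixels):
    """Inverse of encode_trs."""
    p0, p1, p2 = pixels
    return tuple(p0), (p1[0], p1[1], p1[2]), (p1[3], p2[0], p2[1])


pixels = encode_trs((0.0, 0.0, 0.0, 1.0), (1.0, 1.0, 1.0), (3.0, 4.0, 5.0))
rotation, scale, translation = decode_trs(pixels)
```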
[0068] Step S703: Write the texture in the order of animation identifier, frame index, and bone index.
The encoded matrix data is written to a 2D texture according to a 3D logical arrangement. The first dimension is the animation identifier, the second dimension is the frame index, and the third dimension is the bone index. The mapping to 2D texture coordinates is as follows. The horizontal coordinate is the bone index modulo the number of matrices in each row of the texture. The vertical coordinate is (animation identifier × total number of frames per animation + frame index) × number of rows occupied per frame + the integer quotient of the bone index divided by the number of matrices in each row. If the total height exceeds the maximum texture size of the GPU, the data is split into multiple textures.
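The coordinate mapping in step S703 reduces to integer arithmetic, sketched below in Python. Coordinates are expressed in matrix slots rather than pixels; since each matrix spans several RGBA pixels, the pixel column would be the slot column multiplied by the pixels per matrix, which is an assumption not stated in the text. The function name `matrix_slot` and its parameter names are hypothetical.

```python
def matrix_slot(anim_id, frame_index, bone_index,
                frames_per_anim, matrices_per_row, rows_per_frame):
    """Map (animation, frame, bone) to a (column, row) slot in the texture."""
    column = bone_index % matrices_per_row
    row = ((anim_id * frames_per_anim + frame_index) * rows_per_frame
           + bone_index // matrices_per_row)
    return column, row


# Example: 64 matrices per row, so a 100-bone skeleton needs 2 rows per frame.
slot = matrix_slot(anim_id=2, frame_index=5, bone_index=70,
                   frames_per_anim=30, matrices_per_row=64, rows_per_frame=2)
```

Here animation 2, frame 5 starts at row (2 × 30 + 5) × 2 = 130, and bone 70 falls in the second row of that frame at column 6.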
[0069] In one example, dynamic updates to bone attachment points include the following sub-steps.
[0070] Step S801: The CPU samples the transformation data of the target bone from the pre-stored animation matrix texture according to the animation control parameters.
[0071] The CPU maintains a queue of bone attachment points that require real-time world coordinates. Each attachment point has a predefined bound bone index and local offset. For each request in the queue, the CPU obtains the animation identifier and current playback frame of its target instance, executes the same animation matrix sampling logic as the GPU, calculates the texture sampling coordinates based on the animation identifier and current playback frame, reads the matrix-encoded data of the target bone from the animation matrix texture, decodes and interpolates to obtain the bone transformation matrix.
[0072] If the CPU cannot directly read the GPU texture, CPU-readable image data is generated alongside the texture during the offline phase, which avoids reading data back from the GPU at runtime.
[0073] Step S802: Calculate the world coordinates of the bone attachment point based on the transformation data, the pre-stored local offset, and the world transformation matrix.
[0074] The bone transformation matrix sampled in step S801 is applied to the pre-stored local offset vector, i.e., a matrix-vector multiplication is performed to obtain the position of the attachment point in skeletal space. This position is then multiplied by the instance's world transformation matrix to obtain the attachment point's position coordinates in world space. The result is written to the attachment point's world coordinate cache for use by the effects system, weapon system, and other consumers.
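The two-stage transform of step S802 can be sketched as follows. As an assumption, the bone transform is applied directly in its decoded rotation, scale, and translation form (scale, then rotate, then translate), which is equivalent to multiplying by a matrix composed in that order; the world matrix is assumed to be a row-major 4×4 acting on column vectors. All names are hypothetical.

```python
def quat_rotate(q, v):
    """Rotate vector v by unit quaternion q given as (x, y, z, w)."""
    x, y, z, w = q
    vx, vy, vz = v
    # t = 2 * cross(q.xyz, v)
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    # v' = v + w * t + cross(q.xyz, t)
    return (vx + w * tx + (y * tz - z * ty),
            vy + w * ty + (z * tx - x * tz),
            vz + w * tz + (x * ty - y * tx))


def apply_bone_transform(rotation, scale, translation, point):
    """Apply scale, then rotation, then translation to a local point."""
    scaled = tuple(s * p for s, p in zip(scale, point))
    rotated = quat_rotate(rotation, scaled)
    return tuple(r + t for r, t in zip(rotated, translation))


def transform_point(matrix, point):
    """Multiply a row-major 4x4 matrix by the point (x, y, z, 1)."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3]
                 for row in matrix[:3])


def attachment_world_position(bone_trs, local_offset, world_matrix):
    """Local offset -> skeletal space -> world space."""
    skeletal_pos = apply_bone_transform(*bone_trs, local_offset)
    return transform_point(world_matrix, skeletal_pos)
```

The returned tuple is what would be written to the attachment point's world coordinate cache.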
[0075] This invention also provides a rendering system for skeletal animation. For example, as shown in Figure 5, the system includes a CPU module, a GPU module, and a communication bus.
[0076] The CPU module is configured to acquire animation control parameters for each model instance and write them to an instance data buffer. The animation control parameters include an animation identifier, the current playback frame, and a world transformation matrix, but do not include a bone transformation matrix. The CPU module is also configured to send a single-draw instruction to the GPU. The GPU module is communicatively connected to the CPU module. The GPU module's video memory contains the instance data buffer, which is configured to be writable by the CPU and readable by the GPU. In response to the single-draw instruction, the GPU module samples a bone transformation matrix from a pre-stored animation matrix texture based on the animation control parameters in the instance data buffer, and performs parallel skinning calculations on each model instance based on the sampled bone transformation matrix. The communication bus connects the CPU module and the GPU module. At the physical layer, it is a PCI Express bus; at the logical layer, it is a command buffer mechanism for the graphics API, used to transmit instance data, drawing instructions, and synchronization signals.
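To make the "lightweight animation parameters" concrete, the sketch below packs per-instance records into one contiguous block on the CPU side. The field set (world transformation matrix, animation identifier, current playback frame, playback speed) follows the fields named in this document; the byte layout, the field types, and the names `INSTANCE_RECORD`, `pack_instances`, and `unpack_instance` are assumptions for illustration only.

```python
import struct

# Hypothetical record layout, little-endian, no padding:
# 16 floats (row-major world matrix), uint32 animation identifier,
# float current playback frame, float playback speed.
INSTANCE_RECORD = struct.Struct("<16fI2f")


def pack_instances(instances) -> bytes:
    """Pack (world, anim_id, frame, speed) tuples into one contiguous block."""
    block = bytearray(INSTANCE_RECORD.size * len(instances))
    for i, (world, anim_id, frame, speed) in enumerate(instances):
        INSTANCE_RECORD.pack_into(block, i * INSTANCE_RECORD.size,
                                  *world, anim_id, frame, speed)
    return bytes(block)


def unpack_instance(block: bytes, index: int):
    """Read one record back, as a shader would by instance identifier."""
    values = INSTANCE_RECORD.unpack_from(block, index * INSTANCE_RECORD.size)
    return values[:16], values[16], values[17], values[18]
```

Under this assumed layout each instance occupies 76 bytes regardless of bone count, whereas uploading one 4×4 float matrix per bone would cost 64 bytes per bone per instance.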
[0077] This embodiment provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the method described in any one of Embodiments 1 to 8.
[0078] The computer-readable storage media include, but are not limited to: magnetic storage media, such as hard disks, floppy disks, and magnetic tapes; optical storage media, such as compact discs and digital versatile discs (DVDs); semiconductor storage media, such as solid-state drives, flash memory cards, and USB storage devices; and any other physical media having non-volatile or volatile storage characteristics.
[0079] The computer program contains a sequence of instructions that, when executed by the processor of an electronic device, implement a skeletal animation rendering method as described above.
[0080] This embodiment provides an electronic device comprising a processor, a graphics processor, a memory, a display device, and an input device. The processor includes one or more central processing unit cores, as well as the associated cache, memory controller, and input/output controller. The processor is configured to execute the CPU-side steps in Embodiment 1, including animation state updates, view frustum clipping, parameter generation, data packaging, and batch transmission, while also executing general computing tasks such as the operating system, game logic engine, physics engine, and artificial intelligence system. The processor is preferably a multi-core x86-64 architecture processor or an ARM architecture processor. The graphics processor is connected to the processor via a PCI Express bus and includes one or more graphics processing unit cores, dedicated video memory, a rasterization unit, a texture sampling unit, and a render output unit. The graphics processor is configured to execute the GPU-side steps in Embodiment 1, including instance parameter reading, animation matrix sampling, skinning calculation, spatial transformation, and pixel processing. The graphics processor is preferably a discrete or integrated graphics processor supporting DirectX 11 or higher, OpenGL ES 3.0 or higher, or Vulkan 1.0 or higher. The memory includes system memory and non-volatile storage. The system memory is dynamic random access memory used to store runtime data, application code, and the operating system kernel. Non-volatile storage, such as solid-state drives (SSDs) or hard disk drives (HDDs), is used for persistent storage of the operating system, applications, game resource files, and computer program instructions. The display device is connected to the graphics processor via a digital display interface and displays the rendered output images.
The display device can be a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, a curved display, or a high refresh rate gaming monitor; in virtual reality scenarios, it can be a head-mounted display. The input device is connected to the processor via a universal serial bus (USB) or wirelessly and receives user input commands. Input devices include keyboards, mice, touchpads, touchscreens, game controllers, motion controllers, and the like; on mobile devices, they can also be sensors such as accelerometers and gyroscopes.
[0081] Computer programs are stored in non-volatile memory areas of the memory and loaded into system memory when the application starts. They are then executed collaboratively by the processor and graphics processor to implement any of the skeletal animation rendering methods described above.
[0082] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0083] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are relatively specific and detailed, they should not be construed as limiting the scope of the invention patent. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application should be determined by the appended claims.
Claims
1. A method for rendering skeletal animation, characterized in that it comprises: obtaining animation control parameters for each model instance, wherein the animation control parameters include an animation identifier, a current playback frame, and a world transformation matrix, and do not include a bone transformation matrix; writing the animation control parameters of each model instance into an instance data buffer and sending a single-draw instruction to the GPU, wherein the instance data buffer is located in GPU video memory and is configured to be writable by the CPU and readable by the GPU; and, in response to the single-draw instruction, sampling, by the GPU, a bone transformation matrix from a pre-stored animation matrix texture according to the animation control parameters in the instance data buffer, and performing skinning calculations in parallel on each model instance based on the sampled bone transformation matrix.
2. The method according to claim 1, characterized in that the step of writing the animation control parameters of each model instance into the instance data buffer includes: performing view frustum clipping on each model instance to determine the visible model instances; packaging the animation control parameters of each visible model instance and writing them into a contiguous memory block; and transferring the contiguous memory block to the instance data buffer in a single transfer.
3. The method according to claim 1, characterized in that the animation control parameters include a world transformation matrix field, an animation identifier field, a current playback frame field, and a playback speed field.
4. The method according to claim 1, characterized in that the step of the GPU performing skinning calculations in parallel on each model instance based on the sampled bone transformation matrix includes: reading, by the GPU, the animation control parameters of the corresponding model instance from the instance data buffer based on an instance identifier; calculating the sampling coordinates of the animation matrix texture based on the animation identifier and the current playback frame, and sampling to obtain the bone transformation matrix; and calculating the final position of each vertex based on the bone transformation matrix, the initial position of the vertex, and the vertex weight.
5. The method according to claim 1, characterized in that it further comprises: detecting the performance level of the device; and determining the precision encoding method of the animation control parameters according to the performance level of the device, wherein high-end devices use full-precision floating-point encoding and low-end devices use half-precision floating-point encoding.
6. The method according to claim 3, characterized in that it further comprises: grouping, by the CPU, the model instances according to mesh and animation identifier, such that model instances in the same group share the same mesh and animation identifier; and performing the write and send operations on each group separately.
7. The method according to claim 1, characterized in that the pre-stored animation matrix texture is generated by: calculating the transformation matrix of each bone frame by frame for the skeletal animation; encoding the transformation matrix into RGBA color data; and writing the encoded data to the texture in the order of animation identifier, frame index, and bone index.
8. The method according to claim 1, characterized in that it further comprises: reading, by the CPU, the transformation data of the target bone from the sampled bone transformation matrix; and calculating the world coordinates of the bone attachment point based on the transformation data, the pre-stored local offset, and the world transformation matrix.
9. A rendering system for skeletal animation, characterized in that it comprises: a CPU configured to acquire animation control parameters for each model instance and write them to an instance data buffer, wherein the animation control parameters include an animation identifier, a current playback frame, and a world transformation matrix, and do not include a bone transformation matrix; and a GPU communicatively connected to the CPU, wherein the instance data buffer is located in the video memory of the GPU, and the GPU is configured to, in response to a single-draw instruction, sample a bone transformation matrix from a pre-stored animation matrix texture according to the animation control parameters in the instance data buffer, and perform skinning calculations in parallel on each model instance based on the sampled bone transformation matrix.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the method according to any one of claims 1-8.
11. An electronic device, characterized in that it comprises: a CPU, a GPU, and a memory, wherein the memory stores a computer program that, when executed by the CPU and the GPU, implements the method according to any one of claims 1-8.