Automated 3D animation production method and system based on dual-channel permission isolation in a 3D engine

By using a dual-channel permission isolation mechanism in the 3D engine, the invention resolves the mutual exclusion between skeletal animation and shape key-driven animation and enables fully automated production across the entire chain. This addresses short video creators' need for compliant mass production and improves both the reuse rate of animation resources and production efficiency.

CN122199750A · Pending · Publication Date: 2026-06-12 · 覃和平

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
覃和平
Filing Date
2026-03-16
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing technologies cannot achieve low-cost, high-efficiency, compliant automated production of 3D animation. In particular, the underlying mutual exclusion between skeletal animation and shape key-driven animation remains unresolved, so traditional 3D animation production suffers from high entry barriers, fragmented processes, poor resource adaptability, and high infringement risk, and cannot meet the mass production needs of short video creators.

Method used

A dual-channel permission isolation system based on a 3D engine is constructed. An integrated universal skeleton binding system and a unified semantic tag system shared across the entire chain allow skeletal animation and shape keys to be driven independently and run synchronously. Combined with a semantic animation resource library and automated shooting script generation, this establishes a fully unmanned closed loop from content input to final output.

Benefits of Technology

It achieves fully automatic synchronization of body movements, lip movements, and facial expressions, improves the reuse rate of animation resources, ensures high compliance of generated content, raises production efficiency per minute of animation more than 70-fold, lowers the production threshold, and meets the mass production needs of ordinary creators.

Generated by Eureka AI based on patent content.

Abstract

The application discloses an automated 3D animation production method and system based on dual-channel permission isolation in a 3D engine, belonging to the technical fields of computer animation, AI-generated content, and 3D engine applications. An integrated universal skeleton binding system with a dual-channel permission mutual exclusion lock is pre-built, fully decoupling the execution pipelines of the skeletal animation channel and the shape key driving channel. A unified semantic tag ontology system shared across the entire chain is constructed, providing standardized, unified mapping across the script, the action library, and engine execution. A semantic 3D action library based on topology tree matching is constructed, and the original script is translated into an automated shooting script that the engine can execute. Through staged execution on two independent channels and a frame-level synchronization compensation mechanism, skeletal animation and shape keys are driven synchronously without interference, and the whole process is completed through automated shooting and output. The application resolves, at the underlying level, the long-standing industry problem that the mutual exclusion between skeletal animation and shape key driving must be corrected manually, enables fully unmanned, industrialized mass production of 3D animation, and greatly lowers the production threshold.

Description

Technical Field

[0001] This invention belongs to the fields of computer animation technology, artificial intelligence content generation, and 3D engine application technology. Specifically, it relates to an automated 3D animation production method and supporting system based on dual-channel permission isolation for short video mass production scenarios.

Background Technology

[0002] With the rapid development of the short video industry, content creators increasingly need low-cost, high-efficiency, and compliant solutions for producing original content. However, existing technologies have the following core defects, and the industry deadlock in short video content production, in which "compliance and mass production cannot be achieved at the same time," has never been broken.

[0003] First, short video content production faces a core contradiction between compliance and mass production. Live-action content is costly and time-consuming: shooting, editing, and coordinating actors for a single short video requires significant human and material resources, making large-scale production impossible. Meanwhile, plagiarism, re-editing, and remixing carry serious copyright risks and easily trigger platform duplicate-content checks, traffic restrictions, and account penalties; even if short-term traffic is gained, such content cannot pass the platform's secondary review and is unsustainable. Existing solutions modify the original content only superficially, by cropping, adding filters, or altering audio, and fail to filter out copyrighted content at its root. Such content is still detected by platform duplicate-content checks, so ordinary creators without professional teams cannot achieve stable original content production.

[0004] Secondly, traditional 3D animation production has extremely high barriers to entry, making it unsuitable for the core needs of short video creators. The traditional 3D animation production process is fragmented, requiring collaboration among professionals such as modelers, rigging specialists, animators, renderers, and editors. The production cost of a single minute of animation can reach tens of thousands of yuan, with production cycles lasting several days or even weeks. This is completely incompatible with the low-cost, fast-paced, and mass production needs of short video creators, making it impossible for ordinary creators without specialized skills to enter the market.

[0005] Third, existing 3D animation technology suffers from a core driving conflict that makes fully automated production impossible. In existing 3D animation tools and game engines, there is a common underlying control mutual exclusion between playback of a character's skeletal animation and the driving of its lip movements and facial expressions: skeletal animation playback modifies the model's mesh vertices through skinning calculations, locking or overriding the vertex deformation permissions of the shape keys. As a result, the character cannot perform limb movements and lip movements / expressions at the same time, and lip movements / expressions fail while limb movements are playing. Solving this requires professional animators to correct the animation frame by frame, which can take hours for a single minute of animation, making automated production impossible and unsuitable for batch, assembly-line production. Existing technologies achieve only layered playback and temporal synchronization of animation and expressions and never address the core mutual exclusion at the level of the underlying pipeline and permission control. This is a long-standing, unresolved technical pain point in the industry.

[0006] Fourth, existing animation resources adapt poorly and have extremely low reuse rates, preventing a standardized production pipeline. Existing general-purpose animation resource libraries support only one-way adaptation and cannot achieve reverse compatibility between custom characters with facial expression shape keys and general-purpose animation resources. Existing third-party automatic binding tools recognize non-standard skeletons with low accuracy, which easily leads to mapping misalignment, lost skin weights, and damaged shape key structures that require manual correction. Existing animation resources also lack standardized end-to-end semantic annotation, so they cannot be matched to script content automatically; manual parameter tuning and frame-by-frame adjustment are required, the animation resource reuse rate stays below 10%, and no standardized mass production pipeline can form.

[0007] Fifth, existing AI content generation tools suffer from severely fragmented capabilities, failing to achieve an end-to-end closed loop. Current AI tools can only generate content in single stages, such as AI scriptwriting, AI voice-over, AI image generation, and AI motion capture, failing to establish a complete chain from "content input - compliant original reconstruction - automated animation-engine rendering - final output." Furthermore, existing technologies have never built a unified semantic tagging system across the entire chain, making it impossible to directly translate natural language content requirements into machine-executable production instructions. Scriptwriting, shot design, animation execution, and post-production editing are completely disconnected, hindering end-to-end automated production and failing to address the core compliant mass production needs of short video creators.

[0008] In summary, existing technologies have consistently failed to provide a complete 3D animation production solution that is "low-threshold, fully automated, compliant, and mass-producible" for the core pain points of short video creators. In particular, they have failed to resolve the underlying mutual exclusion between skeletal animation and shape key driving, a core pain point of the industry. This is the core problem that this invention aims to solve.

Summary of the Invention

[0009] To address the above shortcomings of existing technologies, the purpose of this invention is to provide a complete, low-barrier, compliant, and mass-producible automated 3D animation production solution. Its core objectives are as follows. First, to resolve the underlying mutual exclusion between skeletal animation and shape key-driven animation at the level of the 3D engine's underlying pipeline and access control, enabling fully automated, high-precision synchronous operation of body movements, lip movements, and facial expressions without manual correction, which lays the core technical foundation for fully automated production. Second, to construct a unified semantic tag ontology system shared across the entire chain, breaking down the barrier between content creation and production execution, so that natural language content requirements can be converted directly into instructions executable by the 3D engine, forming a fully automated closed loop from content input to final output. Third, to filter and isolate copyrighted content at the technical level, eliminating infringement risk in short video content production, so that generated content fully complies with platforms' originality rules and addresses the core survival pain point of short video creators. Fourth, to solve the industry problem of difficult reverse adaptation and low reuse of animation resources, achieving seamless compatibility between custom characters that carry facial expression shape keys and general animation resources, significantly improving the reuse rate of animation resources, and forming a standardized mass production pipeline. Fifth, to significantly lower the threshold for 3D animation production, so that no professional animation skills, directing ability, or multi-person collaboration is required and ordinary creators can achieve mass, industrialized production of 3D animated short videos.

[0010] To achieve the above objectives, the present invention adopts the following technical solution: an automated 3D animation production method based on dual-channel permission isolation in a 3D engine, comprising the following steps:
S1: Pre-build an integrated universal skeleton binding system compatible with both skeletal animation and shape key driving; formulate unified skeleton hierarchy rules and shape key driving specifications; grant the skeletal animation control channel exclusive permission to modify the spatial transforms of skeleton nodes while permanently disabling its permission to modify model mesh vertices; and grant the shape key driving channel exclusive permission to modify model mesh vertex deformation while permanently disabling its permission to modify skeleton nodes, thereby fully decoupling the execution pipelines of the two channels and mutually locking their modification permissions.
S2: Construct a unified semantic tag ontology system shared across the entire chain. The tag system has six dimensions: character tags, action tags, emotion tags, scene tags, shot tags, and timing tags. It provides standardized, unified mapping of script semantics, action library semantics, and engine execution semantics across all stages, and serves as the common semantic reference for action matching, script translation, and synchronous driving throughout the process.
S3: Construct a semantically searchable, standardized 3D motion library: preprocess and retarget 3D animation resources based on the skeletal topology tree structure, and annotate every animation clip in all dimensions according to the unified semantic tag ontology system.
S4: Based on the unified semantic tag ontology system, break the original script down into standardized semantic tags and translate it into an automated shooting script that the 3D engine can execute directly. The original script is either input directly by the user or generated automatically by the system.
S5: Drive skeletal animation and shape keys synchronously and without interference through two independent channels. The skeletal animation control channel executes in the animation update phase of the 3D engine and drives only the spatial transforms of the character's skeleton nodes; the shape key driving channel executes in the mesh vertex update phase of the 3D engine and drives only the mesh vertex deformation of the character's lip-sync and facial expression shape keys. The two channels are synchronized at frame level on a globally unified timeline: timing deviation is checked every frame and corrected by frame-level compensation, and the permission mutual exclusion lock prevents either channel from interfering with the other's exclusive modification permissions.
S6: Standardize cross-platform animation data.
S7: The 3D engine executes the automated shooting script to complete the entire process of automated shooting and final output.

[0011] Furthermore, in step S1, the integrated universal skeleton binding system establishes unified bone node naming rules, spatial axis standards, and a skinning weight system, together with standardized shape key naming specifications, driving logic, and a Chinese phoneme mapping table. It is compatible with the mainstream BVH and FBX 3D animation resource formats, so that a single binding can be reused throughout the entire process.
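The specification names a Chinese phoneme mapping table but does not publish its contents. The Python sketch below shows one way such a table could turn timed pinyin syllables into per-frame shape key weights at the 30 fps used in the embodiments. The viseme names, the six-entry table, and the "first vowel decides the mouth shape" rule are hypothetical simplifications, not the 52 mouth shapes described for the embodiment.

```python
from dataclasses import dataclass
from typing import Dict, List

FPS = 30  # frame rate used in the embodiments

# Hypothetical excerpt of a Chinese phoneme mapping table (pinyin vowel -> viseme shape key).
FINAL_TO_VISEME = {"a": "Mouth_A", "o": "Mouth_O", "e": "Mouth_E",
                   "i": "Mouth_I", "u": "Mouth_U", "ü": "Mouth_U"}
REST_VISEME = "Mouth_Rest"


@dataclass
class TimedSyllable:
    pinyin: str   # toneless pinyin syllable, e.g. "hao"
    start: float  # seconds on the audio timeline
    end: float


def viseme_for(pinyin: str) -> str:
    """Choose a viseme from the first vowel of the syllable (a deliberate simplification)."""
    for ch in pinyin:
        if ch in FINAL_TO_VISEME:
            return FINAL_TO_VISEME[ch]
    return REST_VISEME


def shape_key_track(syllables: List[TimedSyllable], total_frames: int) -> List[Dict[str, float]]:
    """One {shape_key: weight} dict per frame; frames without speech hold the rest shape."""
    track = [{REST_VISEME: 1.0} for _ in range(total_frames)]
    for syl in syllables:
        first = max(0, round(syl.start * FPS))
        last = min(round(syl.end * FPS), total_frames)
        key = viseme_for(syl.pinyin)
        for frame in range(first, last):
            track[frame] = {key: 1.0}
    return track
```

For example, `shape_key_track([TimedSyllable("ni", 0.0, 0.2), TimedSyllable("hao", 0.2, 0.5)], 20)` yields `Mouth_I` for frames 0 to 5, `Mouth_A` for frames 6 to 14, and the rest shape afterwards. A production table would blend neighbouring visemes rather than switch weights hard at syllable boundaries.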

[0012] Furthermore, in step S1, the dual-channel modification permission mutual exclusion lock is implemented as follows. In the 3D engine's animation pipeline, the skeletal animation control channel is restricted by an animation layer mask so that it affects only bone Transform nodes, and its permission to modify the vertices of the skinned mesh renderer is fully disabled. The shape key driving channel runs as an independent script in the engine's LateUpdate (vertex update) stage and writes shape key weight values directly, without going through the skeletal animation skinning pipeline. A custom animation pipeline callback intercepts every request from the skeletal animation control channel to modify model mesh vertices, as well as every request from the shape key driving channel to modify bone nodes.
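The interception rule above is stated in engine terms (animation layer masks, LateUpdate, a pipeline callback) but no code is given. The engine-agnostic Python sketch below models only the permission check: each channel owns exactly one kind of write target, and a hypothetical `PermissionGate` applies owned writes and records everything else as intercepted. It does not use any Unity or Unreal API, and all class and enum names are assumptions.

```python
from enum import Enum, auto
from typing import List, Tuple


class Channel(Enum):
    SKELETAL = auto()   # skeletal animation control channel
    SHAPE_KEY = auto()  # shape key driving channel


class Target(Enum):
    BONE_TRANSFORM = auto()  # spatial transform of a skeleton node
    MESH_VERTEX = auto()     # mesh vertex deformation / shape key weight


# Exclusive permissions from step S1: each channel owns exactly one target type.
EXCLUSIVE_PERMISSION = {
    Channel.SKELETAL: Target.BONE_TRANSFORM,
    Channel.SHAPE_KEY: Target.MESH_VERTEX,
}


class PermissionGate:
    """Hypothetical stand-in for the custom pipeline callback that intercepts write requests."""

    def __init__(self) -> None:
        self.applied: List[Tuple[Channel, Target, str, float]] = []
        self.intercepted: List[Tuple[Channel, Target, str]] = []

    def request(self, channel: Channel, target: Target, name: str, value: float) -> bool:
        """Apply the write only if the channel owns the target; otherwise intercept it."""
        if EXCLUSIVE_PERMISSION[channel] is target:
            self.applied.append((channel, target, name, value))
            return True
        self.intercepted.append((channel, target, name))
        return False
```

Keeping the ownership table as data rather than scattering checks through each channel makes the mutual exclusion auditable in one place; in an engine the same table would drive whatever hook rejects the write.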

[0013] Furthermore, in step S2, the dimensions of the unified semantic tag ontology system are specifically as follows: Character tags include character ID, gender, character design, and suitability type tags; Action tags include basic action type, interaction attributes, and duration range tags; Emotion tags include emotion type and intensity level tags; Scene tags include scene type, spatial environment, and lighting attribute tags; Shot tags include shot size, camera movement logic, and shooting angle tags; and Timing tags include dialogue timing, action timing, and shot transition timing tags. The unified mapping rule across all stages is as follows: the semantic tags generated from script splitting use the same dimensions and encoding rules as the semantic tags in the action library; the action requirements and emotional attributes in the script can be directly and accurately matched with the action library; shot tags and timing tags are directly mapped to engine-executable shot scheduling and timing control parameters, achieving seamless conversion from script semantics to engine execution instructions.
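Because script tags and library tags share dimensions and encoding rules, matching can reduce to exact set intersection. The following sketch assumes hypothetical tag codes such as `ACT_WALK` and scores library clips by the number of shared action and emotion codes; the scoring rule and the `LibraryClip` type are illustrative, not taken from the specification.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# The six dimensions of the unified semantic tag ontology.
DIMENSIONS = ("character", "action", "emotion", "scene", "shot", "timing")

Tags = Dict[str, Set[str]]  # dimension -> set of tag codes


@dataclass
class LibraryClip:
    clip_id: str
    tags: Tags


def validate_dimensions(tags: Tags) -> None:
    """Reject any dimension that is not part of the shared ontology."""
    unknown = set(tags) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown tag dimensions: {sorted(unknown)}")


def match_score(script_tags: Tags, clip: LibraryClip,
                dims: Tuple[str, ...] = ("action", "emotion")) -> int:
    """Number of tag codes shared on `dims`; identical encoding makes exact matching valid."""
    return sum(len(script_tags.get(d, set()) & clip.tags.get(d, set())) for d in dims)


def best_clip(script_tags: Tags, library: List[LibraryClip]) -> Optional[LibraryClip]:
    """Return the highest-scoring clip, or None if no clip shares any tag code."""
    validate_dimensions(script_tags)
    best, best_score = None, 0
    for clip in library:
        score = match_score(script_tags, clip)
        if score > best_score:
            best, best_score = clip, score
    return best
```

Ties go to the first clip in library order; a real matcher would likely also weigh duration range and interaction attributes.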

[0014] Further, step S3 specifically involves: performing standardized preprocessing on various types of 3D animation resources, uniformly mapping them to the integrated universal skeleton binding system, and generating reusable standardized animation clips; performing semantic recognition and annotation on the animation clips through an artificial intelligence model, generating semantic tags of corresponding dimensions based on the unified semantic tag ontology system, and forming a standardized 3D motion library that can be automatically retrieved and called.

[0015] Furthermore, the standardized preprocessing of the 3D animation resources includes automatic, accurate identification and mapping based on skeletal topology and hierarchical relationships. Specifically, with the root node Hips as the anchor, the skeleton binding of the input custom character is identified automatically without manual annotation, and a parent-child hierarchical topology tree corresponding to the general standard skeleton is constructed. Based on the hierarchical structure similarity of the topology trees, a one-to-one mapping to the general standard skeleton is completed. The hierarchical structure similarity is calculated as: similarity = (number of hierarchically matching nodes × 0.7 + number of axially matching nodes × 0.3) / total number of core nodes, where the core nodes are 19 key skeletal nodes of the spine and limbs, and the preset matching threshold is 90%. Nodes whose matching degree exceeds the preset threshold are mapped one-to-one, and terminal nodes with insufficient matching degree are transitioned smoothly by spherical linear interpolation. Skeletal retargeting and axial calibration are completed automatically, and the character's original shape key driving structure and permissions are fully preserved during retargeting.
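The similarity formula and 90% threshold above translate directly into code. In this sketch the counts of hierarchically and axially matching nodes are inputs, since the specification does not say how each node is judged to match; `accept` uses a strict comparison because the text says "exceeds" the threshold. The `slerp` helper is a standard quaternion spherical linear interpolation of the kind the text names for terminal nodes, with quaternions assumed to be unit length in (w, x, y, z) order.

```python
import math
from typing import Tuple

MATCH_THRESHOLD = 0.90  # preset matching threshold
Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed unit length


def hierarchy_similarity(hier_matched: int, axis_matched: int, core_total: int) -> float:
    """similarity = (hier_matched * 0.7 + axis_matched * 0.3) / core_total."""
    if core_total <= 0:
        raise ValueError("core_total must be positive")
    return (hier_matched * 0.7 + axis_matched * 0.3) / core_total


def accept(similarity: float) -> bool:
    """Map one-to-one only when the similarity exceeds the threshold."""
    return similarity > MATCH_THRESHOLD


def slerp(q0: Quat, q1: Quat, t: float) -> Quat:
    """Spherical linear interpolation between two unit quaternions, t in [0, 1]."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:            # flip one end so we interpolate along the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:         # nearly identical rotations: normalized lerp is numerically stable
        out = [a + t * (b - a) for a, b in zip(q0, q1)]
        norm = math.sqrt(sum(c * c for c in out))
        return tuple(c / norm for c in out)
    theta = math.acos(dot)
    s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
```

With 19 core nodes, 18 hierarchical and 16 axial matches give (12.6 + 4.8) / 19 ≈ 0.916 and pass, while 17 and 15 give ≈ 0.863 and fall back to interpolation.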

[0016] Furthermore, prior to step S4, an original script generation step is included, supporting two input modes: reference video and text commands. For the reference video input mode, the reference video undergoes audio-visual separation, voice extraction, and subtitle recognition to generate dialogue text corresponding to the timestamp. The video footage and dialogue text are jointly deconstructed using a multimodal large model, and standardized narrative metadata is extracted based on a unified semantic tag ontology system. Only the narrative logic and temporal structure in the narrative metadata are retained, while the original video's frame data, audio waveforms, specific shot parameters, and art elements are completely discarded, completing the filtering and isolation process. The narrative logic and temporal structure are then fully adapted using a large language model to generate a completely new original script. For the text command input mode, the user inputs a plot outline / text command, and the large language model extracts standardized narrative metadata to directly generate a complete original script that matches the temporal structure, character interactions, and shot arrangement rules.

[0017] Furthermore, the standardized narrative metadata includes the number of characters and their character settings, dialogue sequence, action requirements, emotional attributes, shot size, camera movement logic, single shot duration, character positioning and interaction relationships.
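The isolation step keeps only narrative logic and temporal structure and discards frame data, audio waveforms, and art elements. A whitelist is one simple way to express that rule, since any field not explicitly allowed is dropped by default. The field names below are hypothetical labels for the metadata items listed above.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical field names for the standardized narrative metadata items.
NARRATIVE_FIELDS = frozenset({
    "characters",            # number of characters and their character settings
    "dialogue_sequence",
    "action_requirements",
    "emotional_attributes",
    "shot_size",
    "camera_movement",
    "shot_duration",
    "character_positions",
    "interactions",
})


def isolate_narrative(raw: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Keep whitelisted narrative fields; return them plus the sorted names of dropped fields."""
    kept = {k: v for k, v in raw.items() if k in NARRATIVE_FIELDS}
    dropped = sorted(k for k in raw if k not in NARRATIVE_FIELDS)
    return kept, dropped
```

Returning the dropped field names makes it easy to log what was discarded for each reference video.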

[0018] Furthermore, in step S5, the frame-level compensation works as follows: when the execution timing deviation between the two channels exceeds the duration of a single frame, the per-frame playback rate of the skeletal animation sequence is fine-tuned linearly against the audio timestamps, and the timestamps of the shape key driving parameters are aligned at frame level, so that skeletal transform data and shape key weight data match exactly within each frame with no misalignment.
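A minimal sketch of this compensation rule, assuming the audio clock is the master timeline advancing in real time at 30 fps. When the skeletal clock drifts by more than one frame, the sketch picks a playback rate that would close the gap linearly over a hypothetical `horizon_frames` window, clamped to an assumed range, and snaps the shape key timestamp to the frame grid.

```python
from typing import Tuple

FPS = 30                   # frame rate used in the embodiments
FRAME_DURATION = 1.0 / FPS


def snap_to_frame(t: float) -> float:
    """Align a timestamp in seconds to the nearest frame on the global timeline."""
    return round(t * FPS) / FPS


def compensate(skeletal_time: float, audio_time: float, horizon_frames: int = 10,
               min_rate: float = 0.5, max_rate: float = 1.5) -> Tuple[float, float]:
    """Return (skeletal playback rate, frame-aligned shape key timestamp).

    The audio clock is treated as the master. If the skeletal clock is off by more
    than one frame, choose the rate that would close the gap linearly over
    `horizon_frames`, clamped to [min_rate, max_rate]; otherwise play at 1.0.
    """
    deviation = audio_time - skeletal_time
    shape_key_time = snap_to_frame(audio_time)
    if abs(deviation) <= FRAME_DURATION:
        return 1.0, shape_key_time
    rate = 1.0 + deviation / (horizon_frames * FRAME_DURATION)
    return max(min_rate, min(max_rate, rate)), shape_key_time
```

A two-frame lag with the default window gives a rate of 1.2. Spreading the correction over several frames rather than one avoids a visible jump in body motion; the window length and clamp limits are tuning assumptions.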

[0019] Further, step S6 specifically involves: standardizing and cleaning the matched skeletal animation data and shape key driving parameters, performing format conversion and retargeting verification, removing invalid and redundant data, correcting skeletal axial deviations, and generating standardized resource files that the 3D engine can load dynamically.
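The cleaning step is described only at a high level. One common form of "removing invalid and redundant data" on an animation curve is dropping non-finite samples and interior keyframes that linear interpolation already reproduces; the sketch below does that for a single (frame, value) curve. The tolerance and the linear-interpolation assumption are illustrative choices, not requirements from the specification.

```python
import math
from typing import Dict, List, Tuple

Key = Tuple[int, float]  # (frame index, channel value)


def clean_curve(keys: List[Key], tol: float = 1e-6) -> List[Key]:
    """Drop non-finite values, keep the last key per frame, and remove interior keys
    that linear interpolation between their neighbours already reproduces within `tol`."""
    by_frame: Dict[int, float] = {}
    for frame, value in keys:
        if math.isfinite(value):
            by_frame[frame] = value
    valid = sorted(by_frame.items())
    if len(valid) <= 2:
        return valid
    out = [valid[0]]
    for i in range(1, len(valid) - 1):
        f0, v0 = out[-1]        # last key kept so far
        f1, v1 = valid[i]       # candidate key
        f2, v2 = valid[i + 1]   # next original key
        predicted = v0 + (v2 - v0) * (f1 - f0) / (f2 - f0)
        if abs(predicted - v1) > tol:
            out.append(valid[i])
    out.append(valid[-1])
    return out
```

Comparing against the last kept key, rather than the immediate predecessor, means a run of collinear keys collapses to its two endpoints.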

[0020] Further, step S7 specifically involves: pre-building a modular animation studio in the 3D engine, including reusable scene, lighting, and shot templates; having the 3D engine read the automated shooting script and automatically complete scene initialization, character loading, animation and shape key synchronization, shot switching, and audio-visual synchronization, with automatic real-time recording; outputting the recorded video footage together with a JSON metadata file that corresponds one-to-one with the video sequence and contains the sequence, character, dialogue, and action parameters of each shot; and completing automated post-processing based on the metadata file to output an original 3D animated film ready for direct publication.
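The metadata file is specified only by its contents (sequence, character, dialogue, and action parameters per shot). A minimal sketch using Python's standard `json` module, with a hypothetical `ShotRecord` schema and frame ranges added for illustration, could validate the one-to-one sequence numbering before writing:

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class ShotRecord:
    sequence: int       # 1-based shot index in the output video
    character: str
    dialogue: str
    actions: List[str]  # hypothetical action tag codes
    start_frame: int
    end_frame: int      # exclusive


def metadata_json(shots: List[ShotRecord], fps: int = 30) -> str:
    """Serialize shot metadata, checking that sequence numbers are exactly 1..N."""
    ordered = sorted(shots, key=lambda s: s.sequence)
    for expected, shot in enumerate(ordered, start=1):
        if shot.sequence != expected:
            raise ValueError(f"expected shot {expected}, found {shot.sequence}")
        if shot.end_frame <= shot.start_frame:
            raise ValueError(f"shot {shot.sequence} has an empty frame range")
    payload = {"fps": fps, "shots": [asdict(s) for s in ordered]}
    # ensure_ascii=False keeps Chinese dialogue readable in the file
    return json.dumps(payload, ensure_ascii=False, indent=2)
```

Rejecting gaps in the numbering is what makes the file safe to use as an index for automated post-processing.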

[0021] Corresponding to the above method, the invention also provides an automated 3D animation production system based on dual-channel permission isolation in a 3D engine, comprising: a binding system management module for managing the integrated universal skeleton binding system compatible with skeletal animation and shape key driving, and for configuring the exclusive modification permissions and mutual exclusion lock rules of the two channels; a tag system management module for managing the unified semantic tag ontology system shared across the entire chain and configuring standardized mapping rules for the tags in each dimension; a basic asset management module for managing the integrated universal skeleton binding model, the semantic 3D animation library, scene assets, and character models; a content deconstruction and original script generation module for multimodal deconstruction of input content, filtering and isolation of copyrighted content, and generation of original scripts; a script generation module for breaking original scripts down into standardized semantic tags and translating them into automated shooting scripts that the 3D engine can execute directly; an animation and drive matching module for matching clips from the animation library to generate skeletal animation sequences, and matching phonemes to generate shape key drive parameters, based on semantic tags; a dual-channel synchronization control module for driving the character's skeletal animation and shape key deformation through two fully isolated, independent control channels, with frame-level timing synchronization and deviation compensation on a globally unified timeline; a cross-platform data processing module for animation data cleaning, bone retargeting, and format standardization; and an engine automated rendering module for automated script execution, fully unmanned shooting, asset and metadata output, and automated post-production compositing in the 3D engine.

[0022] Furthermore, the system also includes a user-defined material management module for importing and managing user-defined character models, scene templates, and voice-over materials.

[0023] Meanwhile, the present invention also provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the above-mentioned automated 3D animation production method based on dual-channel permission isolation of a 3D engine.

[0024] Compared with the prior art, the present invention has the following outstanding substantive features and significant progress. First, it breaks through a core technical bottleneck in the industry and fundamentally solves the mutual exclusion between skeletal animation and facial expression driving. The invention overcomes the entrenched assumption in the field that "skeletal animation inevitably affects mesh vertices through skinning calculations." Through the integrated universal skeleton binding system with an exclusive-permission mutual exclusion lock, combined with staged execution on two independent channels and complete pipeline isolation, it resolves the underlying problem in existing technologies whereby skeletal animation playback locks mesh vertex permissions and causes shape key driving to fail. Limb movements, lip movements, and facial expressions run synchronously and fully automatically, with single-frame synchronization accuracy, a driving conflict rate reduced to 0, and no manual frame-by-frame correction, laying the core technical foundation for fully automated production. Second, it constructs a unified semantic tag system covering the entire chain and achieves an end-to-end automated closed loop. The invention is the first to construct a unified semantic tag system spanning scripts, action libraries, and engine execution, breaking down the barrier between content creation and production execution. Natural language content requirements are converted directly into production instructions executable by the 3D engine, opening up a fully automated closed loop of "content input - compliant deconstruction - original reconstruction - animation driving - engine rendering - final output".
Production efficiency per minute of animation is more than 70 times that of the traditional mode, which resolves the fragmentation and high entry barriers of the traditional animation production process. Third, it isolates copyrighted content at the technical level, avoiding the risk of infringement. Using explicit technical rules for filtering and isolating copyrighted content, the invention extracts only the narrative logic and temporal structure of the reference content, which are not protected by copyright, and generates all original expression from scratch, addressing the problems of plagiarism and re-editing in the short video industry. The generated content achieves a 100% pass rate in platform originality reviews, eliminating the risks of traffic throttling and account penalties and resolving the core survival pain point of short video creators. Fourth, it solves the reverse compatibility problem of animation resources and enables efficient resource reuse. Through automatic, accurate recognition and mapping based on skeletal topology, together with a standardized similarity formula, the invention achieves seamless reverse compatibility between custom characters with facial expression shape keys and general animation resources. The adaptation success rate for custom characters exceeds 95%, and shape key driving permissions are 100% preserved during retargeting. Combined with the semantic tag system that is consistent across the entire chain, the animation resource reuse rate rises from less than 10% in the traditional mode to over 90%, with no manual parameter tuning or frame-by-frame adjustment, fully suiting mass production workflows. Fifth, it significantly lowers the production threshold and enables industrialized mass production.
The invention requires no professional animation skills, directing ability, or staff in multiple roles. Ordinary creators only need to input a reference video or text instructions to generate an original, publishable 3D animated film automatically, removing the professional barriers to 3D animation production. Its modular architecture supports user-defined characters, actions, voice-overs, and scene templates, and adapts to commercial scenarios such as short videos, animated shorts, commercial content, and virtual human live streams, giving it strong commercial viability and scalability.

Detailed Implementation

[0025] The technical solution of the present invention will be further clearly and completely described below with reference to specific embodiments. The described embodiments are only some embodiments of the present invention, and not all embodiments. Any changes or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in the present invention should be covered within the protection scope of the present invention.

[0026] All embodiments of this invention are based on the Unity / Unreal 3D engine, with a uniform rendering frame rate of 30 frames per second. The supporting system adopts a browser/server (B/S) architecture: the front end consists of a web client and a mobile app operated by creators, and the back end consists of cloud computing services and data processing modules, supporting the entire process from uploading materials and entering commands to automated production and final output.

[0027] Example 1: This embodiment takes as an example a user uploading a 1-minute live-action emotional short video to generate an original 3D animated film, and explains the implementation of the present invention in detail. The first step is to pre-build the integrated universal skeleton binding system and the unified semantic tag ontology system shared across the entire chain, and to complete the basic rule configuration.

[0028] In this embodiment, the integrated universal skeleton binding system adopts a standard 63-node human skeleton structure and establishes unified skeleton naming rules, a Y-axis spatial axis standard, and a continuous, smooth skin weight system. It also establishes standardized shape key naming specifications, driving logic, and a Chinese phoneme mapping table covering 52 basic pronunciation mouth shapes and 24 basic facial expressions. A dual-channel permission mutual exclusion lock is configured: the skeletal animation control channel has exclusive permission to modify the spatial transforms of the 63 skeleton nodes, with its permission to modify model mesh vertices permanently disabled; the shape key driving channel has exclusive permission to modify the weights of mouth shape and facial expression shape keys, with its permission to modify skeleton nodes permanently disabled. This fully decouples the execution pipelines of the two channels while remaining compatible with the mainstream BVH and FBX 3D animation resource formats.

[0029] The entire chain uses a unified semantic tag ontology system that defines standardized tags in six dimensions: role, action, emotion, scene, shot, and time sequence. All production processes are executed on the basis of this tag system. The semantic tags produced by script decomposition and the semantic tags in the action library use exactly the same dimensions and encoding rules, so tags map across processes without any loss or translation.
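The practical benefit of identical dimensions and encoding rules is that script tags and library tags can be compared directly, with no translation table. The sketch below assumes a hypothetical `dimension:value` string encoding and scores library clips by simple set intersection; the encoding, clip ids, and scoring rule are illustrative assumptions, not the patented retrieval method.

```python
DIMENSIONS = ("role", "action", "emotion", "scene", "shot", "sequence")


def encode_tag(dimension: str, value: str) -> str:
    """Encode one tag as 'dimension:value' (hypothetical encoding rule)."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown tag dimension: {dimension}")
    # One shared normalisation rule for every producer of tags.
    return f"{dimension}:{value.strip().lower()}"


def best_clip(query: set, library: dict) -> str:
    """Return the clip id whose tag set shares the most tags with the query."""
    return max(library, key=lambda clip_id: len(query & library[clip_id]))


# Usage: a script line tagged "Sit" / "Low" finds the matching library clip.
query = {encode_tag("action", " Sit "), encode_tag("emotion", "Low")}
library = {
    "clip_sit_relaxed": {encode_tag("action", "sit"), encode_tag("emotion", "relaxed")},
    "clip_sit_low": {encode_tag("action", "sit"), encode_tag("emotion", "low")},
}
```

Since both the script decomposer and the library annotator call the same `encode_tag`, differences in capitalization or whitespace cannot cause a mismatch.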

[0030] The second step is to build a semantically standardized 3D action library. Animation clips from a general 3D animation resource library undergo standardized preprocessing. An automatic identification and mapping algorithm based on skeletal topology and hierarchical relationships takes the root node (Hips) as its anchor and identifies the skeletal system of each animation resource without manual annotation. It constructs a parent-child skeletal topology tree corresponding to the general standard skeletal system and, using the hierarchical structure similarity formula, establishes a one-to-one mapping to the universal skeleton binding system of this embodiment. The preset matching threshold is 90%; the core skeletal nodes of the spine and limbs are matched first, followed by the distal skeletal nodes. Skeletal retargeting (redirection), smooth weight transitions, and axial calibration are then performed automatically, and the character's shape key driving structure and permissions are preserved in full during retargeting. A pre-trained artificial intelligence model then performs semantic recognition and annotation on the adapted animation clips, generating semantic tags in the corresponding dimensions of the unified semantic tag ontology system. The result is a standardized 3D motion library that can be retrieved and invoked automatically.

[0031] The 19 core skeletal nodes mentioned in this embodiment are: Hips, Spine, Spine1, Spine2, Neck, Head, LeftShoulder, LeftArm, LeftForeArm, LeftHand, RightShoulder, RightArm, RightForeArm, RightHand, LeftUpLeg, LeftLeg, LeftFoot, RightUpLeg, RightLeg, and RightFoot.
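The topology-tree step of [0030] can be sketched as follows. The source lists the core node names but not their parent-child relationships, so the parent table below is an assumption (it follows the common Mixamo-style hierarchy), and the dictionary keeps every name listed in [0031] as written (twenty entries). The namespace-stripping rule in `normalize` is likewise hypothetical. The sketch builds the tree reachable from the Hips anchor and counts the core nodes whose parent agrees with the standard skeleton.

```python
# Assumed parent of each listed core node (Mixamo-style convention).
STANDARD_PARENTS = {
    "Hips": None,
    "Spine": "Hips", "Spine1": "Spine", "Spine2": "Spine1",
    "Neck": "Spine2", "Head": "Neck",
    "LeftShoulder": "Spine2", "LeftArm": "LeftShoulder",
    "LeftForeArm": "LeftArm", "LeftHand": "LeftForeArm",
    "RightShoulder": "Spine2", "RightArm": "RightShoulder",
    "RightForeArm": "RightArm", "RightHand": "RightForeArm",
    "LeftUpLeg": "Hips", "LeftLeg": "LeftUpLeg", "LeftFoot": "LeftLeg",
    "RightUpLeg": "Hips", "RightLeg": "RightUpLeg", "RightFoot": "RightLeg",
}


def normalize(name):
    """Strip a namespace prefix such as 'mixamorig:' (hypothetical rule)."""
    return name.split(":")[-1] if name else None


def subtree_from_hips(parents):
    """Collect every bone reachable from the Hips anchor in a child->parent map."""
    if "Hips" not in parents:
        raise ValueError("anchor bone 'Hips' not found")
    children = {}
    for child, parent in parents.items():
        children.setdefault(parent, []).append(child)
    seen, stack = set(), ["Hips"]
    while stack:  # iterative depth-first traversal of the topology tree
        bone = stack.pop()
        if bone not in seen:
            seen.add(bone)
            stack.extend(children.get(bone, []))
    return seen


def hierarchical_matches(source_parents):
    """Return the standard core nodes whose parent also matches in the source rig."""
    src = {normalize(c): normalize(p) for c, p in source_parents.items()}
    tree = subtree_from_hips(src)
    # The anchor itself counts as matched even if the rig adds a parent above it.
    return [bone for bone, parent in STANDARD_PARENTS.items()
            if bone in tree and (bone == "Hips" or src[bone] == parent)]
```

The resulting count is the "number of hierarchical matching nodes" that feeds the similarity formula given in claim 6.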

[0032] The third step is to deconstruct the input content, filter out and isolate copyrighted content, and generate an original script. The user uploads a 1-minute live-action emotional short video. The system first separates the audio and video of this reference video: it extracts the audio track, separates the human voice from the background sound, and obtains a clean vocal track. Speech recognition on the vocal track produces dialogue text with millisecond-level timestamps. A multimodal large model then jointly deconstructs the video frames and the dialogue text and, based on the unified semantic tag ontology system, extracts standardized narrative metadata. In this example the metadata comprises the settings of 2 characters, the timing of 4 rounds of dialogue, the action requirements and emotional attributes of each round, shot size, single-shot duration, and character positions and interaction relationships.

[0033] Under the filtering and isolation rules, the system retains only the narrative logic and temporal structure of the narrative metadata: two characters hold four rounds of dialogue in an indoor scene, their mood shifts from low to relaxed, and the dialogue starts slowly and speeds up later. Copyright-protected content, such as the original video frames, audio waveforms, specific camera parameters, and the specific indoor scene design, is discarded entirely, which completes the filtering and isolation process. A large language model then performs a fully original adaptation of the extracted narrative logic and temporal structure: the two characters are redefined as workplace colleagues, the scene is set in an office break room, and the dialogue and plot are written from scratch, yielding a complete 1-minute original script.
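The filtering and isolation rule is naturally expressed as a whitelist: only fields that describe narrative logic and temporal structure survive, and everything else is dropped by default. The sketch below assumes hypothetical field names for the metadata; a whitelist is chosen over a blacklist so that any new or unexpected field (for example a new kind of visual parameter) is discarded rather than leaked into the adaptation step.

```python
from typing import Any

# Hypothetical field names for the retained narrative skeleton.
NARRATIVE_FIELDS = frozenset({
    "character_count",   # e.g. 2 characters
    "dialogue_rounds",   # e.g. 4 rounds of dialogue
    "emotion_arc",       # e.g. ["low", "relaxed"]
    "pacing",            # e.g. ["slow", "fast"]
    "scene_category",    # e.g. "indoor", not a concrete set design
})


def isolate_narrative(metadata: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Keep whitelisted narrative fields; report everything that was dropped."""
    kept = {k: v for k, v in metadata.items() if k in NARRATIVE_FIELDS}
    discarded = sorted(k for k in metadata if k not in NARRATIVE_FIELDS)
    return kept, discarded
```

Only the `kept` dictionary would be passed to the language model; the `discarded` list is useful as an audit record of what was removed.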

[0034] The fourth step is to translate the script into an automated shooting script, match driving parameters, and drive the two independent channels synchronously without interference. The system decomposes the generated original script sentence by sentence into standardized semantic tags in the dimensions defined by the unified semantic tag ontology system, and then translates them into an automated shooting script that the 3D engine can execute directly. The script contains every execution parameter for the full process: office scene initialization parameters, loading and positioning rules for the two characters, the action playback sequence of each shot, lip-sync driving parameters, shot switching instructions, and audio-visual synchronization rules. No manual secondary editing is required.
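One way to picture the automated shooting script is as a flat, ordered list of engine commands generated from the tagged script lines. The schema below (`ScriptLine`, the `op` names, and the field names) is a hypothetical illustration; the source does not specify the script format. It shows the order described above: scene initialization, character loading, then per-shot camera cut, action playback, and lip-sync, all keyed to millisecond timestamps.

```python
import json
from dataclasses import dataclass


@dataclass
class ScriptLine:
    """One decomposed script sentence with its semantic tags (hypothetical schema)."""
    speaker: str
    text: str
    action: str
    emotion: str
    shot: str
    start_ms: int
    end_ms: int


def to_shooting_script(scene, characters, lines):
    """Translate tagged script lines into a flat list of engine commands."""
    commands = [{"op": "init_scene", "scene": scene}]
    for slot, name in enumerate(characters):
        commands.append({"op": "load_character", "id": name, "slot": slot})
    for index, line in enumerate(lines, start=1):
        commands.append({"op": "cut_to", "shot": line.shot,
                         "shot_index": index, "at_ms": line.start_ms})
        commands.append({"op": "play_action", "character": line.speaker,
                         "action": line.action, "emotion": line.emotion,
                         "from_ms": line.start_ms, "to_ms": line.end_ms})
        commands.append({"op": "lip_sync", "character": line.speaker,
                         "text": line.text,
                         "from_ms": line.start_ms, "to_ms": line.end_ms})
    return commands
```

The list serializes directly with `json.dumps`, which keeps the script human-readable while remaining executable by an engine-side interpreter.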

[0035] Based on the action requirements and emotion tags in the shooting script, the system automatically matches the corresponding animation clips from the standardized 3D motion library and generates skeletal animation sequences for the two characters. From the dialogue in the script, it generates dubbing audio with an AI speech synthesis model. A phoneme matching algorithm decomposes the dubbing audio into pronunciation phonemes, each mapped one-to-one to a timestamp, and the phoneme mapping table is then used to generate the corresponding shape key driving parameters for mouth shapes and facial expressions.
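Turning timestamped phonemes into per-frame shape key targets amounts to sampling the phoneme intervals on the 30 fps render grid. The sketch below is a minimal version under stated assumptions: the mapping table is a tiny hypothetical subset (the real table covers 52 mouth shapes), the shape key names are invented, and each frame simply takes the mouth shape of the phoneme active at that frame's timestamp, without the blending a production system would add.

```python
import math

FPS = 30

# Toy subset of a phoneme -> mouth-shape table; keys and shape key names
# here are hypothetical.
PHONEME_TO_SHAPE_KEY = {
    "b": "mouth_closed", "p": "mouth_closed", "m": "mouth_closed",
    "a": "mouth_open_wide", "o": "mouth_round", "i": "mouth_spread",
    "sil": "mouth_rest",
}


def phonemes_to_frames(phonemes, fps=FPS):
    """Sample timestamped phonemes onto the render frame grid.

    phonemes: list of (start_ms, end_ms, phoneme), sorted and contiguous.
    Returns one (frame_index, shape_key_name) pair per rendered frame.
    """
    if not phonemes:
        return []
    n_frames = math.ceil(phonemes[-1][1] * fps / 1000)
    frames, idx = [], 0
    for frame in range(n_frames):
        t_ms = frame * 1000 / fps
        # Advance to the phoneme whose interval contains t_ms.
        while idx < len(phonemes) - 1 and t_ms >= phonemes[idx][1]:
            idx += 1
        key = PHONEME_TO_SHAPE_KEY.get(phonemes[idx][2], "mouth_rest")
        frames.append((frame, key))
    return frames
```

For example, `phonemes_to_frames([(0, 100, "m"), (100, 300, "a"), (300, 400, "sil")])` yields 12 frames: closed lips for the first three, an open mouth for the next six, and a rest pose for the last three.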

[0036] The system drives both channels synchronously and without interference through a dual independent channel control mechanism. In this embodiment the 3D engine renders at 30 frames per second. The skeletal animation control channel executes during the engine's animation update phase and drives the spatial transformation of the character's skeleton nodes through the animation state machine. It modifies only bone-level position and rotation data and never touches the model's mesh vertex data, so it produces the character's body movements. The shape key driving channel executes during the engine's mesh vertex update phase and uses the generated driving parameters to control the vertex deformation of the character's mesh directly, without interfering with the spatial transformation of the skeleton nodes, so it produces lip-sync and facial expression changes synchronized with the audio. Both channels execute once per rendered frame, so their execution cycle matches the engine's rendering frame rate. With the timestamps of the dubbing audio as the reference anchor, a globally unified timeline provides frame-level timing synchronization: the timing deviation is checked every frame, and when it exceeds 33 milliseconds (the duration of one frame at 30 frames per second, about 33.3 ms), frame-level compensation is applied automatically. Because the control permissions of the two channels are completely isolated, the character's body movements, lip-sync, and facial expressions run synchronously and in coordination, with no control conflicts or motion deformation and no need for manual correction.
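The per-frame synchronization loop can be simulated without an engine. In the sketch below the audio timestamp defines the global timeline, the skeletal channel keeps its own clock that may drift (modeled by a hypothetical `skeletal_rate` parameter), and the deviation is checked once per frame against the one-frame threshold. Realigning the skeletal clock to the anchor is a simplification chosen for this sketch; claim 9 describes a linear playback-rate adjustment instead.

```python
FPS = 30
FRAME_MS = 1000.0 / FPS  # one frame at 30 fps, about 33.3 ms


def simulate_timeline(n_frames, skeletal_rate=1.0):
    """Simulate per-frame channel updates against an audio-anchored timeline.

    skeletal_rate < 1.0 models a skeletal channel that drifts behind the audio.
    Returns the frame indices at which compensation was applied.
    """
    skeletal_ms = 0.0
    corrected = []
    for frame in range(n_frames):
        audio_ms = frame * FRAME_MS  # global timeline anchored to dubbing audio
        # Per-frame deviation check against the audio anchor.
        if abs(skeletal_ms - audio_ms) > FRAME_MS:
            skeletal_ms = audio_ms   # simplest compensation: realign to anchor
            corrected.append(frame)
        # Animation update phase: the skeletal channel would sample its clip at
        # skeletal_ms and write bone transforms only.
        # Mesh vertex update phase: the shape key channel would sample its
        # weights at audio_ms and write shape key weights only.
        skeletal_ms += FRAME_MS * skeletal_rate
    return corrected
```

With `skeletal_rate=0.85` the skeletal clock loses 0.15 frame per frame, so it first exceeds the one-frame threshold at frame 7 and again seven frames after each correction.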

[0037] Step 5: Standardize the animation data across platforms. The system performs standardized cleaning, format conversion, and retargeting verification on the matched skeletal animation data and shape key driving parameters: it removes invalid and redundant data and corrects skeletal axial deviations, so that the data can be imported into the 3D engine seamlessly as standardized resource files that the engine can load dynamically.
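"Removing invalid and redundant data" can be illustrated on a single animation curve. The sketch below is one plausible interpretation, not the patented procedure: invalid keys are those with non-finite time or value, duplicate timestamps collapse to the last value, and a key is redundant when it lies on the straight line between its neighbors, because linear interpolation would reproduce it anyway. The tolerance `tol` is an assumed parameter.

```python
import math


def clean_curve(keys, tol=1e-4):
    """Clean one animation curve given as (time, value) keyframes.

    Drops non-finite keys, collapses duplicate timestamps (last one wins),
    and removes interior keys that lie on the straight line between their
    original neighbours.
    """
    latest = {}
    for t, v in keys:
        if math.isfinite(t) and math.isfinite(v):
            latest[t] = v
    pts = sorted(latest.items())
    if len(pts) <= 2:
        return pts
    kept = [pts[0]]
    for i in range(1, len(pts) - 1):
        (t0, v0), (t1, v1), (t2, v2) = pts[i - 1], pts[i], pts[i + 1]
        # Value that linear interpolation between the neighbours would give.
        predicted = v0 + (v2 - v0) * (t1 - t0) / (t2 - t0)
        if abs(predicted - v1) > tol:
            kept.append(pts[i])
    kept.append(pts[-1])
    return kept
```

This is a single pass that judges each key against its original neighbors; a stricter implementation would re-check the error of each dropped key against the final kept keys.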

[0038] Step 6: Automated shooting in the 3D engine and final output. The 3D engine pre-builds a modular office-scene animation studio that includes a reusable break room (tea room) scene, a three-point lighting template, and four types of shot templates: full (panoramic) shot, medium shot, close-up, and extreme close-up. After reading the automated shooting script, the engine automatically performs scene initialization, loading and initial positioning of the two characters, synchronous driving of skeletal animation and shape keys, camera switching as specified by the script, and audio-visual synchronization, while recording in real time.

[0039] When recording is complete, the system outputs the video footage in MP4 format together with JSON metadata files that correspond one-to-one with the video sequence. The metadata files contain the sequence, characters, dialogue, and action parameters of each shot. Using these metadata files, the system automatically performs post-processing such as adding subtitles, matching background music, and color grading, and finally outputs an original 3D animated film that can be published directly to short video platforms.
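Subtitle generation is the most mechanical of these post-processing steps, since the per-shot metadata already carries speaker, dialogue, and timing. The sketch below assumes a hypothetical metadata schema (a top-level `shots` array with `character`, `dialogue`, `start_ms`, and `end_ms` fields) and writes standard SubRip (SRT) cues: an index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, the text, and a blank separator line.

```python
import json


def ms_to_srt(ms):
    """Format integer milliseconds as an SRT timestamp HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def metadata_to_srt(metadata_json):
    """Build SRT subtitles from per-shot metadata (hypothetical schema)."""
    shots = json.loads(metadata_json)["shots"]
    cues = []
    for shot in shots:
        if not shot.get("dialogue"):
            continue  # shots without dialogue get no subtitle cue
        cues.append(
            f"{len(cues) + 1}\n"
            f"{ms_to_srt(shot['start_ms'])} --> {ms_to_srt(shot['end_ms'])}\n"
            f"{shot['character']}: {shot['dialogue']}\n"
        )
    return "\n".join(cues)
```

Because the metadata and the video share one sequence, the cue times come straight from the shot timestamps and no audio re-analysis is needed.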

[0040] The technical effects of this embodiment were compared and verified on 100 sets of dialogue animations with body movements. With the existing dual-channel layered driving solution, the conflict rate between body movements and lip movements was 87%, and each minute of animation required an average of 42 minutes of manual frame-by-frame correction. With this solution, the synchronization rate between body movements and lip movements reached 99.2% and the driving conflict rate was 0%, with no manual correction required. In originality review tests on 5 mainstream Chinese short video platforms, all 100 videos generated by this solution passed (a 100% pass rate), with no infringement or violation warnings. The production efficiency for one minute of animation was 72 times that of the traditional manual production mode.

[0041] Example 2. This embodiment takes as an example the generation of an original 3D animated film from a text plot outline entered by the user, and explains the specific implementation of the present invention in detail.
The first step is to pre-build the integrated universal skeleton binding system, the unified semantic tag ontology system shared across the entire chain, and the semantic 3D action library, and to complete the basic configuration, in the same way as in Example 1.
The second step is for the user to enter a text plot outline: two workplace characters talk in the office; Character A complains about work stress and Character B offers comfort; the overall mood shifts from low to relaxed; the duration is 1 minute. Using a large language model and the unified semantic tag ontology system, the system extracts standardized narrative metadata and directly generates a complete original script that conforms to the temporal structure, character interactions, and camera movement rules. The script includes character settings, the complete dialogue, action requirements, emotional changes, and camera movement requirements.
The third step is for the system to decompose the original script semantically and translate it, based on the unified semantic tag ontology system, into an automated shooting script that the 3D engine can execute directly, in the same way as in Example 1.
The fourth step is for the user to upload a custom voice-over audio file. From this file the system generates the corresponding lip-sync and facial expression shape key driving parameters, and at the same time performs motion matching and synchronous, interference-free driving through the two independent channels, in the same way as in Example 1.
The fifth step is to standardize the animation data across platforms, in the same way as in Example 1.
The sixth step is for the 3D engine to read the automated shooting script, automatically load the user-defined office scene and custom character models, complete the automated shooting and post-processing, and finally output an original 3D animated film that can be published directly.

[0042] The above two embodiments fully demonstrate that the technical solution of the present invention can be stably implemented, achieves all the inventive objectives, and attains the expected technical effects. Any non-substantial modifications or substitutions made by those skilled in the art to the above embodiments based on the core ideas of the present invention are within the protection scope of the present invention.

Claims

1. A method for automated 3D animation production based on dual-channel permission isolation of a 3D engine, characterized in that it comprises the following steps:
S1: pre-building an integrated universal skeleton binding system compatible with both skeletal animation and shape key driving; formulating unified skeleton hierarchy rules and shape key driving specifications; configuring, for the skeletal animation control channel, exclusive permission to modify the spatial transformation of skeleton nodes while permanently disabling its permission to modify model mesh vertices; and configuring, for the shape key driving channel, exclusive permission to modify model mesh vertex deformation while permanently disabling its permission to modify skeleton nodes, so as to fully decouple the execution pipelines of the two channels and mutually lock their modification permissions;
S2: constructing a unified semantic tag ontology system shared across the entire chain and derived from a single source, the tag system comprising six dimensions (character tags, action tags, emotion tags, scene tags, shot tags, and time sequence tags), which maps script semantics, action library semantics, and engine execution semantics uniformly across links and provides a common semantic reference for action matching, script translation, and synchronous driving throughout the process;
S3: constructing a semantically searchable standardized 3D motion library by performing standardized preprocessing and retargeting of 3D animation resources according to the skeletal topology tree structure, and annotating the animation clips in all dimensions of the unified semantic tag ontology system;
S4: decomposing an original script into standardized semantic tags based on the unified semantic tag ontology system and translating it into an automated shooting script that the 3D engine can execute directly, the original script being either entered directly by the user or generated automatically by the system;
S5: driving skeletal animation and shape keys synchronously without interference through two independent channels, wherein the skeletal animation control channel executes during the animation update phase of the 3D engine and drives only the spatial transformation of the character's bone nodes; the shape key driving channel executes during the mesh vertex update phase of the 3D engine and drives only the mesh vertex deformation produced by the character's lip-sync shape keys; the two channels achieve frame-level timing synchronization on a globally unified timeline, checking the timing deviation and applying frame-level compensation every frame; and a permission mutual exclusion lock prevents either channel from interfering with the other's exclusive modification permissions;
S6: standardizing the animation data across platforms;
S7: completing, by the 3D engine and according to the automated shooting script, the entire process of automated shooting and final output.

2. The 3D animation automated production method based on dual-channel permission isolation of a 3D engine according to claim 1, characterized in that, in step S1, the integrated universal skeleton binding system establishes unified bone node naming rules, spatial axis standards, and a skinning weight system, together with standardized shape key naming specifications, driving logic, and a Chinese phoneme mapping table; it is compatible with the mainstream BVH and FBX 3D animation resource formats and, once bound, is used universally throughout the entire process.

3. The 3D animation automated production method based on dual-channel permission isolation of a 3D engine according to claim 1, characterized in that, in step S1, the dual-channel modification permission mutual exclusion lock is implemented as follows: in the animation pipeline of the 3D engine, the skeletal animation control channel is restricted by animation layer masking so that it affects only bone Transform nodes, with its permission to modify the vertices of the skinned mesh renderer completely disabled; the shape key driving channel runs independently as an executed script in the engine's LateUpdate vertex update stage and modifies shape key weight values directly, bypassing the skeletal animation skinning calculation pipeline; and a custom animation pipeline callback function intercepts every modification request from the skeletal animation control channel to the model mesh vertices and every modification request from the shape key driving channel to the bone nodes.

4. The 3D animation automated production method based on dual-channel permission isolation of a 3D engine according to claim 1, characterized in that, in step S2, the dimensions of the unified semantic tag ontology system are as follows: the role tag comprises role ID, gender, character design, and adaptation type tags; the action tag comprises basic action type, interaction attribute, and duration range tags; the emotion tag comprises emotion type and intensity level tags; the scene tag comprises scene type, spatial environment, and lighting attribute tags; the shot tag comprises shot size, camera movement logic, and shooting angle tags; and the time sequence tag comprises dialogue sequence, action sequence, and shot switching sequence tags.
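The six tag dimensions and their sub-tags map naturally onto a nested, immutable record. The sketch below is illustrative only: field names and types (for example a duration range in seconds and an integer intensity level) are assumptions, not a schema defined by the source. `flatten` turns the record into `dimension.field` keys, which is convenient for indexing clips in the motion library.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RoleTag:
    role_id: str
    gender: str
    character_design: str
    adaptation_type: str


@dataclass(frozen=True)
class ActionTag:
    action_type: str
    interaction: str
    duration_range_s: tuple  # (min_seconds, max_seconds), assumed unit


@dataclass(frozen=True)
class EmotionTag:
    emotion_type: str
    intensity_level: int


@dataclass(frozen=True)
class SceneTag:
    scene_type: str
    environment: str
    lighting: str


@dataclass(frozen=True)
class ShotTag:
    shot_size: str
    camera_movement: str
    angle: str


@dataclass(frozen=True)
class SequenceTag:
    dialogue_index: int
    action_index: int
    shot_index: int


@dataclass(frozen=True)
class SemanticTags:
    role: RoleTag
    action: ActionTag
    emotion: EmotionTag
    scene: SceneTag
    shot: ShotTag
    sequence: SequenceTag

    def flatten(self):
        """Flatten into 'dimension.field' keys for indexing or matching."""
        flat = {}
        for dim in fields(self):
            sub = getattr(self, dim.name)
            for f in fields(sub):
                flat[f"{dim.name}.{f.name}"] = getattr(sub, f.name)
        return flat
```

Freezing the records reflects the single-source principle: once a tag set is produced, no later stage can alter it in place.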

5. The 3D animation automated production method based on dual-channel permission isolation of a 3D engine according to claim 1, characterized in that step S3 specifically comprises: performing standardized preprocessing on various 3D animation resources and mapping them uniformly to the integrated universal skeleton binding system to generate reusable standardized animation clips; and performing semantic recognition and annotation on the animation clips with an artificial intelligence model, generating semantic tags in the corresponding dimensions of the unified semantic tag ontology system, to form a standardized 3D motion library that can be retrieved and invoked automatically.

6. The 3D animation automated production method based on dual-channel permission isolation of a 3D engine according to claim 5, characterized in that the standardized preprocessing of the 3D animation resources comprises automatic identification and mapping based on bone topology and hierarchical relationships. Specifically, with the root node Hips as the anchor, the bone binding system of the input custom character is identified automatically without manual annotation, and a parent-child bone topology tree corresponding to the general standard bone system is constructed; a one-to-one mapping to the general standard skeletal system is then established according to the hierarchical structure similarity of the topology trees. The hierarchical structure similarity is calculated as: similarity = (number of hierarchically matching nodes × 0.7 + number of axially matching nodes × 0.3) / total number of core nodes, where the core nodes are the 19 key skeletal nodes of the spine and limbs and the preset matching threshold is 90%. Nodes whose matching degree exceeds the preset threshold are mapped one-to-one, and terminal nodes whose matching degree is insufficient are transitioned smoothly by spherical linear interpolation. The skeleton is then retargeted and axially calibrated automatically, and the character's original shape key driving structure and permissions are preserved completely during retargeting.
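The similarity formula and the spherical linear interpolation in this claim can be written out directly. The sketch below implements the weighted formula exactly as stated (weights 0.7 and 0.3, 19 core nodes, 90% threshold) and the standard slerp between unit quaternions. The (x, y, z, w) quaternion order, and the idea of blending a terminal bone's source rotation toward a reference rotation, are assumptions of this sketch rather than details given in the claim.

```python
import math

MATCH_THRESHOLD = 0.90  # preset matching threshold from the claim


def hierarchy_similarity(n_hier_match, n_axial_match, n_core=19):
    """similarity = (0.7 * hierarchical matches + 0.3 * axial matches) / core nodes."""
    return (0.7 * n_hier_match + 0.3 * n_axial_match) / n_core


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (x, y, z, w)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:               # take the shorter arc
        q1, dot = tuple(-c for c in q1), -dot
    if dot > 0.9995:            # nearly parallel: normalised lerp is stable
        mixed = [a + t * (b - a) for a, b in zip(q0, q1)]
        norm = math.sqrt(sum(c * c for c in mixed))
        return tuple(c / norm for c in mixed)
    theta = math.acos(dot)      # angle between the two quaternions
    w0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    w1 = math.sin(t * theta) / math.sin(theta)
    return tuple(w0 * a + w1 * b for a, b in zip(q0, q1))
```

For example, a rig with 17 hierarchical and 19 axial matches scores 17.6 / 19 ≈ 0.926 and passes the threshold, while 15 hierarchical matches give about 0.853 and fail it.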

7. The 3D animation automated production method based on dual-channel permission isolation of a 3D engine according to claim 1, characterized in that, before step S4, the method further comprises an original script generation step supporting two input modes: reference video and text command. In the reference video mode, the reference video undergoes audio-visual separation, voice extraction, and subtitle recognition to generate dialogue text with corresponding timestamps; a multimodal large model jointly deconstructs the video frames and the dialogue text and extracts standardized narrative metadata based on the unified semantic tag ontology system; only the narrative logic and temporal structure in the narrative metadata are retained, while the original video frames, audio waveforms, specific shot parameters, and art elements are discarded entirely to complete the filtering and isolation; and a large language model performs a fully original adaptation of the narrative logic and temporal structure to generate a brand-new original script. In the text command mode, the user enters a plot outline or text command, and the large language model extracts standardized narrative metadata and directly generates a complete original script that conforms to the temporal structure, character interactions, and shot scheduling rules.

8. The 3D animation automated production method based on dual-channel permission isolation of a 3D engine according to claim 7, characterized in that the standardized narrative metadata comprises the number of characters and their character settings, the dialogue sequence, action requirements, emotional attributes, shot size, camera movement logic, single-shot duration, and character positions and interaction relationships.

9. The 3D animation automated production method based on dual-channel permission isolation of a 3D engine according to claim 1, characterized in that, in step S5, the frame-level compensation works as follows: when the execution timing deviation between the two channels exceeds the duration of a single frame, the single-frame playback rate of the skeletal animation sequence is fine-tuned linearly with reference to the audio timestamp, and the timestamps of the shape key driving parameters are aligned at frame level, so that the skeletal transformation data and the shape key weight data match exactly within each frame with no misalignment.
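The two parts of this compensation, a linear playback-rate adjustment for the skeletal channel and frame-level alignment of shape key timestamps, can be sketched as two small functions. The exact rate formula is not given in the claim, so the one below is an assumption: the rate is linear in the deviation and chosen so that, if the audio advances at normal speed, the skeletal clock lands on the audio clock after `spread_frames` frames (a hypothetical parameter). Snapping timestamps to the nearest frame boundary is likewise one plausible reading of "aligned at the frame level".

```python
def compensation_rate(skeletal_ms, audio_ms, frame_ms=1000 / 30, spread_frames=1):
    """Playback-rate multiplier that closes the skeletal/audio gap.

    Linear in the deviation: after `spread_frames` frames at this rate the
    skeletal clock lands on the audio clock (assuming audio advances at 1x).
    """
    deviation = audio_ms - skeletal_ms
    return 1.0 + deviation / (frame_ms * spread_frames)


def align_to_frame(timestamp_ms, fps=30):
    """Snap a shape key timestamp to the nearest frame boundary."""
    frame = round(timestamp_ms * fps / 1000)
    return frame, frame * 1000 / fps
```

Spreading the correction over several frames (`spread_frames > 1`) produces a gentler speed change, which is less likely to be visible as a jerk in the body motion.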

10. The 3D animation automated production method based on dual-channel permission isolation of a 3D engine according to claim 1, characterized in that step S6 specifically comprises: cleaning and standardizing the matched skeletal animation data and shape key driving parameters, performing format conversion and retargeting verification, removing invalid and redundant data, correcting skeletal axial deviations, and generating standardized resource files that the 3D engine can load dynamically.

11. The 3D animation automated production method based on dual-channel permission isolation of a 3D engine according to claim 1, characterized in that step S7 specifically comprises: pre-building in the 3D engine a modular animation studio including reusable scene, lighting, and camera templates; reading, by the 3D engine, the automated shooting script and automatically completing the entire process of scene initialization, character loading, synchronized driving of animation and shape keys, camera switching, and audio-visual synchronization, with real-time recording; outputting the recorded video footage together with a JSON metadata file that corresponds one-to-one with the video sequence and contains the sequence, characters, dialogue, and action parameters of each shot; and performing automated post-processing based on the metadata file to obtain an original 3D animated film that can be published directly.

12. A 3D animation automated production system based on dual-channel permission isolation of a 3D engine, characterized in that it is used to implement the method of any one of claims 1-11 and comprises: a binding system management module for managing the integrated universal skeleton binding system compatible with skeletal animation and shape key driving, and for configuring the exclusive modification permissions and mutual exclusion lock rules of the two channels; a tag system management module for managing the unified semantic tag ontology system shared across the entire chain and derived from a single source, and for configuring standardized mapping rules for the tags in each dimension; a basic asset management module for managing the integrated universal skeleton binding model, the semantic 3D animation library, scene assets, and character models; a content deconstruction and original script generation module for multimodal deconstruction of input content, filtering and isolation of copyrighted content, and generation of original scripts; a script generation module for decomposing original scripts into standardized semantic tags and translating them into automated shooting scripts that the 3D engine can execute directly; an animation and drive matching module for matching clips from the animation library, generating skeletal animation sequences, matching phonemes, and generating shape key driving parameters based on semantic tags; a dual-channel synchronization control module for driving the skeletal animation and shape key deformation of characters through two completely isolated independent control channels and achieving frame-level timing synchronization and deviation compensation on a globally unified timeline; a cross-platform data processing module for animation data cleaning, bone retargeting, and standardized format conversion; and an engine automated rendering module for automated script execution, fully unmanned shooting, output of assets and metadata, and automated post-production compositing based on the 3D engine.

13. The 3D animation automated production system based on dual-channel permission isolation of a 3D engine according to claim 12, characterized in that it further comprises a user-defined material management module for importing and managing user-defined character models, scene templates, and voice-over materials.

14. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the 3D animation automated production method based on dual-channel permission isolation of a 3D engine according to any one of claims 1-11.