An AR tag adaptive rendering method and system
By executing the dynamic rendering logic of AR tags locally on the command terminal and transmitting only low-frequency differential updates, the problems of insufficient real-time response speed and high bandwidth requirements in AR tag rendering are solved, achieving fast-response, highly dynamic rendering in low-bandwidth environments.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- 广西信安锐达科技有限公司
- Filing Date
- 2026-01-31
- Publication Date
- 2026-06-12
AI Technical Summary
Existing technologies for AR tag rendering have insufficient real-time response speed and high bandwidth requirements, making it difficult to achieve instant interaction and real-time visual feedback in dynamic command and dispatch scenarios.
The dynamic performance logic of AR tags is offloaded to the command terminal for local execution. The server generates and transmits the behavior rules and low-frequency differential state updates. The command terminal performs local image analysis to determine whether the preset conditions are triggered and executes the corresponding actions.
It achieves fast-response, highly dynamic AR tag rendering in low-bandwidth environments, reducing the demand for network bandwidth and server computing power, and improving the response speed and interactivity of command and decision-making.
Figure CN122199764A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of augmented reality technology, and in particular to an AR tag adaptive rendering method and system.
Background Technology
[0002] In command and dispatch, emergency rescue, and other scenarios, augmented reality (AR) tagging technology is widely used to enhance video display in order to achieve precise control over target objects and their surrounding environment. By overlaying tags containing coordinates, content, and other information onto the video, commanders can intuitively obtain key data. Currently, the mainstream approach is to embed AR tag information into the supplemental enhancement information (SEI) of the video, and then display the tags synchronously with the transmission of the video stream.
[0003] This solution is essentially the loading and display of static or semi-static tags, while command and dispatch scenarios are highly dynamic. The situation of the target objects that the command is concerned about is changing in real time, and the surrounding environment will change rapidly over time, which places extremely high demands on the dynamic updating capability of AR tags.
[0004] To adapt to the above dynamic requirements, the general solution is to rely on the backend server to continuously recalculate and update the tag data, and repeatedly inject the updated data into the video stream, thereby refreshing the tag information through the transmission of the stream.
[0005] However, since dynamic information relies on the server to pre-calculate and inject the bitstream, it is difficult to achieve local instant interaction and real-time feedback of the screen. In addition, frequent tag data updates require repeated re-encoding of the video bitstream, which can easily lead to drastic fluctuations in the bitstream bit rate, increase network load, and affect the smoothness of video transmission.
[0006] Therefore, an adaptive rendering method and system for AR tags is needed.
Summary of the Invention
[0007] To address the insufficient real-time response speed and high bandwidth requirements of existing AR tag rendering technologies, this invention provides an AR tag adaptive rendering method and system that enables fast-response, highly dynamic AR tag rendering in low-bandwidth environments. The specific technical solution is as follows:
In a first aspect, embodiments of this application provide an AR tag adaptive rendering method, applied to an AR tag adaptive rendering system that includes a server and a command terminal. The method includes:
The server generates an AR tag for a target object based on the received video stream. The AR tag includes state variables of the target object and a set of behavior rules. The target object is an object in the video stream that needs to be managed in the corresponding command and control scenario. The state variables are attribute variables of the target object that change over time or with events in the command and control scenario. The behavior rules in the set indicate the actions the command terminal performs when preset conditions are met. These actions include changing the rendering style of the AR tag or updating the state variables.
If the target object appears for the first time, the server encodes the AR tag into a supplemental enhancement information (SEI) unit of the video stream.
If the target object is not appearing for the first time, the server encodes the AR tag differential data between the current frame and the previous frame of the video stream into the SEI unit.
The command terminal extracts the AR tag from the video stream in which the SEI unit has been encoded and, through a local behavior rule engine, determines whether the preset conditions are triggered based on the state variables and the image analysis results of the video frames in the video stream.
For each triggered preset condition, the command terminal executes the corresponding action.
[0008] Preferably, the server generates an AR tag for the target object based on the received video stream, including: the server performing target detection on the video stream to obtain the target object; the server creating and initializing a state variable for the target object based on the command and control scenario and the type of the target object; and the server compiling a set of behavioral rules for the target object based on the command and control scenario. The preset conditions include a conditional expression that takes at least one of the state variable, a time trigger, a spatial region trigger, and image features of video frames in the video stream as input.
[0009] Preferably, the command terminal extracting the AR tag from the video stream in which the SEI unit has been encoded, and determining through a local behavior rule engine whether the preset condition is triggered based on the state variable and the image analysis results of the video frames in the video stream, includes: the command terminal reconstructing the AR tag from the payload of the SEI unit; the command terminal loading the behavior rule set into the local behavior rule engine based on the AR tag, and updating the key variables in the state context maintained by the local behavior rule engine, the key variables being used to determine whether the preset condition is triggered; and the command terminal traversing the behavior rule set, identifying the behavior rules whose preset conditions are triggered based on the latest values of the key variables, and executing the corresponding actions.
[0010] Preferably, after the command terminal extracts the AR tag from the video stream in which the SEI unit has been encoded, the method further includes: the command terminal rendering and presenting the AR tag on the AR interface; the command terminal acquiring a user input event and the AR tag corresponding to the user input event; the command terminal encapsulating the event type of the user input event and the unique identifier of the operated AR tag into an interaction trigger object, and injecting the interaction trigger object into the local behavior rule engine; and when the interaction trigger object matches a preset condition corresponding to the operated AR tag, the command terminal executing the corresponding action.
[0011] In a second aspect, embodiments of this application provide an AR tag adaptive rendering method applied to a server. The method includes:
Generating an AR tag for a target object based on the received video stream. The AR tag includes state variables of the target object and a set of behavior rules. The target object is an object in the video stream that needs to be managed in the corresponding command and control scenario. The state variables are attribute variables of the target object that change over time or with events in the command and control scenario. The behavior rules in the set indicate the actions performed by the command terminal when preset conditions are met. These actions include changing the rendering style of the AR tag or updating the state variables.
If the target object appears for the first time, encoding the AR tag into a supplemental enhancement information (SEI) unit of the video stream.
If the target object is not appearing for the first time, encoding the AR tag differential data between the current frame and the previous frame of the video stream into the SEI unit.
Transmitting the video stream encoded with the SEI unit to the command terminal.
[0012] In a third aspect, embodiments of this application provide an AR tag adaptive rendering method applied to a command terminal; the method includes: receiving a video stream sent by a server, wherein a supplemental enhancement information (SEI) unit of the video stream includes an encoded AR tag, the AR tag including a state variable of a target object and a set of behavior rules; wherein the target object is an object in the video stream that needs to be managed in the corresponding command and control scenario, the state variable is an attribute variable of the target object that changes over time or with events in the command and control scenario, and the behavior rules in the set of behavior rules indicate the actions performed by the command terminal when preset conditions are met, the actions including changing the rendering style of the AR tag or updating the state variable; extracting the AR tag from the SEI unit, and determining, through a local behavior rule engine, whether the preset condition is triggered based on the state variable and the image analysis results of the video frames in the video stream; and, for each triggered preset condition, executing the corresponding action.
[0013] In a fourth aspect, embodiments of this application provide an AR tag adaptive rendering system, which includes a server and a command terminal.
The server is used to generate an AR tag for a target object based on the received video stream. The AR tag includes state variables of the target object and a set of behavior rules, wherein the target object is an object in the video stream that needs to be managed in the corresponding command and control scenario, the state variables are attribute variables of the target object that change over time or with events in the command and control scenario, and the behavior rules in the set indicate the actions performed by the command terminal when preset conditions are met, the actions including changing the rendering style of the AR tag or updating the state variables.
If the target object appears for the first time, the server is also used to encode the AR tag into a supplemental enhancement information (SEI) unit of the video stream.
If the target object is not appearing for the first time, the server is also used to encode the AR tag differential data between the current frame and the previous frame of the video stream into the SEI unit.
The command terminal is used to extract the AR tag from the video stream in which the SEI unit has been encoded, and to determine, through a local behavior rule engine, whether the preset condition is triggered based on the state variables and the image analysis results of the video frames in the video stream. The command terminal is also used to execute the action corresponding to each triggered preset condition.
[0014] In a fifth aspect, embodiments of this application provide a server, including: a memory for storing a program; and a processor for loading the program to execute the method as described in the second aspect.
[0015] In a sixth aspect, embodiments of this application provide a command terminal, comprising: a memory for storing a program; and a processor for loading the program to execute the method described in the third aspect.
[0016] In a seventh aspect, embodiments of this application provide a computer-readable storage medium including a stored program, wherein, when the program is executed, it controls the device where the computer-readable storage medium is located to perform the method as described in any of the first to third aspects.
[0017] Compared with existing technologies, the beneficial effects of this invention are as follows: by offloading the complex dynamic performance logic of AR tags, i.e., the set of behavior rules, to the command terminal for local execution, the rules need to be transmitted in the bitstream only once, followed by low-frequency differential state updates, which effectively reduces the network bandwidth and server computing power required to keep the tags dynamic; the command terminal can achieve edge intelligence through local image analysis of video frames, which shortens the command decision feedback loop and speeds up the response. With the method provided in this application, fast-response, highly dynamic AR tag rendering can be achieved in low-bandwidth environments.
Description of the Drawings
[0018] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the accompanying drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. In all the drawings, similar elements or parts are generally identified by similar reference numerals. In the drawings, the elements or parts are not necessarily drawn to scale.
[0019] Figure 1 is a schematic diagram of the structure of an AR tag adaptive rendering system provided in an embodiment of this application;
Figure 2 is a flowchart of an AR tag adaptive rendering method provided in an embodiment of this application;
Figure 3 is a flowchart of another AR tag adaptive rendering method provided in an embodiment of this application;
Figure 4 is a flowchart of another AR tag adaptive rendering method provided in an embodiment of this application;
Figure 5 is a schematic diagram of the structure of a server provided in an embodiment of this application;
Figure 6 is a schematic diagram of the structure of a command terminal provided in an embodiment of this application.
Detailed Implementation
[0020] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0021] It should be understood that, when used in this specification and the appended claims, the terms "comprising" and "including" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.
[0022] It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms unless the context clearly indicates otherwise.
[0023] It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
[0024] To address the issues of insufficient real-time response speed and high bandwidth requirements in traditional Augmented Reality (AR) tag rendering methods, this invention provides an AR tag adaptive rendering method and system that can achieve high dynamic performance AR tag rendering with fast response in low-bandwidth environments.
[0025] To better understand the embodiments of this application, the system architecture used in the embodiments of this application will be described below.
[0026] Referring to Figure 1, Figure 1 is a system architecture diagram of an AR tag adaptive rendering system provided in an embodiment of this application. As shown in Figure 1, the system includes a server 10 and a command terminal 20, which can communicate with each other via a wired or wireless network.
[0027] Server 10 is the core processing node in the system backend, undertaking the core functions of global data processing, resource scheduling, information integration, and distribution. Specifically, Server 10 is responsible for generating, calculating, and encoding AR tags, encoding AR tag information into the Supplemental Enhancement Information (SEI) unit of the video stream; aggregating multi-source data such as video, sensor, and positioning collected by various front-end devices in real time, and centrally analyzing and updating information such as the status and environmental changes of the target object; uniformly scheduling the transmission and allocation of resources such as video streams and AR tag data to ensure information synchronization across multiple terminals; and undertaking basic support work such as global logical operations, access control, and data storage for the system.
[0028] Specifically, server 10 can be a blade server, high-density server, rack server, cabinet server, general-purpose server, graphics processing unit (GPU) server, data processing unit (DPU) server, or artificial intelligence (AI) server, etc. This application embodiment does not specifically limit the specific form of server 10.
[0029] The command terminal 20, serving as the local operation, information visualization, and command issuance platform for commanders, is the interactive entry point connecting the backend server and on-site command. It is typically deployed in command vehicles, command center workstations, and other locations. Specifically, the command terminal 20 is responsible for receiving and decoding video streams with AR tags sent from the server, enabling synchronized overlay display of video footage and AR tags to provide commanders with intuitive visual information. Simultaneously, it supports local human-computer interaction operations for commanders, such as target selection, tag clicking, and parameter adjustment. It also transmits the commander's operation commands and locally collected real-time data back to the server, achieving two-way communication with the backend. Furthermore, it can act as a command issuance node, transmitting dispatch instructions to the front-end execution unit, completing the closed loop of on-site command from data reception to decision-making to command issuance.
[0030] Specifically, the command terminal 20 is an AR device, which can be a head-mounted AR device, a handheld AR device, a desktop AR device, or a vehicle-mounted AR device, such as an AR smart helmet, AR tactical glasses, AR tablet, AR workstation, touch-screen AR command console, vehicle-mounted AR display screen, or UAV ground station AR terminal. This application embodiment does not specifically limit the specific form of the command terminal 20.
[0031] It should be noted that, in a specific implementation, the system architecture may be any architecture similar to that shown in Figure 1. The embodiments of this application do not limit the specific composition of this system architecture. Furthermore, the components shown in Figure 1 do not constitute a limitation on the system architecture; in addition to the devices shown in Figure 1, the system architecture may include more or fewer devices than illustrated.
[0032] Based on the above system architecture, refer to Figure 2, which is a flowchart of an AR tag adaptive rendering method provided in an embodiment of this application. The method is applied to an AR tag adaptive rendering system that includes a server and a command terminal. As shown in Figure 2, the method includes the following steps:
Step 201: The server generates AR tags for the target object based on the received video stream.
[0033] The server continuously receives video streams from video acquisition terminals at the command and dispatch site. For each frame in the video stream, the server can invoke a pre-trained object detection model to identify target objects in the image and output corresponding bounding boxes for each detected target object.
[0034] For example, the object detection model includes the YOLO model or the Faster R-CNN model based on convolutional neural networks.
[0035] For example, the bounding box can be defined by the top-left and bottom-right pixel coordinates of the target object.
[0036] The target object is the object in the video stream that needs to be managed in the corresponding command and control scenario.
[0037] Then, the server can use a multi-target tracking algorithm such as DeepSORT to associate the bounding boxes of these target objects across frames. The algorithm combines the appearance features, motion features, and Intersection over Union (IoU) of the target objects for matching, and assigns and maintains a temporary tracking ID for each continuously tracked target object to ensure the continuity of the target within the field of view of a single camera.
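As a minimal sketch of the IoU term used in the cross-frame matching above (and only that term: appearance and motion matching are not covered, and this is not DeepSORT itself), the overlap between two bounding boxes given as top-left and bottom-right pixel coordinates can be computed as follows. The names `Box` and `iou` are illustrative, not taken from the application.

```python
from typing import Tuple

# (x1, y1, x2, y2): top-left and bottom-right pixel coordinates of a bounding box
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned bounding boxes."""
    # Corners of the overlapping rectangle (empty if the boxes do not overlap)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A tracker would typically treat a high IoU between a box in the previous frame and a box in the current frame as evidence that both belong to the same temporary tracking ID.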
[0038] Then, to facilitate command and control of large-scale operations, the server can associate target locations in the image with the real world through coordinate transformation. Each video acquisition device is calibrated during deployment and has known intrinsic and extrinsic parameter matrices. When a target object is detected, the pixel coordinates $(u, v)$ of the center point of its bounding box are extracted. Using the camera's inverse perspective transformation model together with a known ground-plane assumption, or depth information acquired through a depth sensor, the server transforms the pixel coordinates into three-dimensional coordinates $(X_c, Y_c, Z_c)$ in the camera coordinate system.
[0039] Then, the server applies a rigid-body transformation to convert the coordinates $(X_c, Y_c, Z_c)$ in the camera coordinate system into coordinates $(X_w, Y_w, Z_w)$ in a unified world coordinate system. Finally, through Geographic Information System (GIS) mapping, these world coordinates can be mapped to a specific latitude and longitude or map grid location.
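The two coordinate steps above can be sketched as follows, under stated assumptions: an ideal pinhole camera without lens distortion, depth taken from a depth sensor (the ground-plane variant is not shown), and extrinsics expressed as a camera-to-world rotation and translation. The parameter names `fx`, `fy`, `cx`, `cy`, `rotation`, and `translation` are illustrative and do not come from the application.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pixel_to_camera(u: float, v: float, depth: float,
                    fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Back-project pixel (u, v) at a known depth into the camera frame.

    fx, fy are focal lengths in pixels and (cx, cy) is the principal point,
    i.e. the usual entries of the intrinsic matrix.
    """
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def camera_to_world(p_cam: Vec3,
                    rotation: Sequence[Sequence[float]],
                    translation: Vec3) -> Vec3:
    """Rigid-body transform p_world = R * p_cam + t (camera-to-world extrinsics)."""
    x, y, z = (
        sum(rotation[i][j] * p_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    return (x, y, z)
```

For example, with `fx = fy = 800`, a principal point of `(320, 240)`, and a depth of 4 m, the pixel `(720, 240)` back-projects to `(2.0, 0.0, 4.0)` in the camera frame.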
[0040] The server maintains a command and dispatch database. Based on the category information and spatiotemporal location of the target object, the server can run correlation queries against information such as task plans, equipment ledgers, and personnel rosters in the database. A unique business identifier can then be assigned to the target object, and static attributes related to the identifier can be extracted from the corresponding data in the database as the initial display content of the AR tag. Finally, the temporary tracking ID, unique business identifier, and corresponding static attributes of the target object are associated with one another.
[0041] For example, the server can compare the location of the target object with the "police officer's on-duty GPS trajectory" or the appearance of the vehicle with the "registered vehicle information" to assign a unique business identifier to the target object, such as "police officer 1024" or "fire truck A-07". Then, it can extract static attributes such as name, affiliated unit, and vehicle model and associate them with the unique business identifier.
[0042] The AR tag includes the target object's static attributes, state variables, and set of behavioral rules.
[0043] Specifically, the state variable is the attribute variable of the target object that changes over time or events in the command and control scenario. The behavior rules in the behavior rule set are used to indicate the actions that the command terminal performs when preset conditions are met. These actions include changing the rendering style of the AR tag or updating the state variable.
[0044] Preferably, the server can perform target detection on the video stream to obtain the target object; the server creates and initializes the state variable for the target object based on the command and control scenario and the type of the target object; the server compiles the behavior rule set for the target object based on the command and control scenario.
[0045] The server can create corresponding state variables based on a preset business template for the target object type. This business template is used to define the state variables for that type of target object. The initial value of the state variable can come from the real-time state variable value collected on-site or the default initial value in the database; subsequent values can be updated by the server or by local rules.
[0046] For example, the state variable includes a person's health value, which is a floating-point number in the range [0, 100] with an initial value of 100. The health value can be inferred and updated based on data collected by vital-signs monitoring equipment or on factors such as the time the person stays in a high-temperature area.
[0047] For example, this state variable includes a threat level, which is an enumeration type with an enumeration set of {low, medium, high, urgent}, and an initial value of "low". It can be updated based on suspicious behavior analysis or intelligence input, such as abnormal movement speed of a target person or their entry into a restricted area.
[0048] For example, this state variable includes task progress, which is a floating-point number in the range [0, 100] with an initial value of 0. It can be updated manually or automatically based on task stage nodes.
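The three example state variables above could be modeled roughly as follows. This is a sketch only: the class and field names (`PersonState`, `health`, `threat_level`, `task_progress`) are assumptions, and clamping out-of-range values is a design choice of this example rather than something the application specifies.

```python
from dataclasses import dataclass
from enum import Enum


class ThreatLevel(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    URGENT = "urgent"


def _clamp(value: float, low: float = 0.0, high: float = 100.0) -> float:
    return min(high, max(low, value))


@dataclass
class PersonState:
    """State variables for a target object of type 'person' (hypothetical template)."""
    health: float = 100.0                         # range [0, 100], initial value 100
    threat_level: ThreatLevel = ThreatLevel.LOW   # initial value "low"
    task_progress: float = 0.0                    # range [0, 100], initial value 0

    def set_health(self, value: float) -> None:
        self.health = _clamp(value)

    def set_task_progress(self, value: float) -> None:
        self.task_progress = _clamp(value)
```

A business template for another target type, such as a vehicle, would define its own set of variables in the same way.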
[0049] The server can select behavioral rules based on the type of the target object according to a preset general rule library; or it can select specific rules pre-input by the user for the current command and control business scenario; and then instantiate and compile these rules in combination with the target object and the on-site environmental parameters.
[0050] The preset conditions include a conditional expression that takes at least one of the state variable, time trigger, spatial region trigger, and image features of video frames in the video stream as input.
[0051] For example, the preset conditions include numerical relations on state variables, such as the threat level being high, the health value being below 30, the mission time exceeding 30 minutes without completion, or the disconnection time exceeding 5 minutes, as well as specific time triggers.
[0052] For example, the preset conditions also include Boolean expressions representing the analysis results of the video frame images by the command terminal. For instance, the command terminal can construct a spatial region trigger by preloading the pixel range of a key region; when personnel enter the restricted area, the corresponding Boolean expression is true, and the spatial region trigger is activated. Alternatively, the command terminal can analyze the average color intensity of the area surrounding the target object to detect whether the ambient temperature is greater than a certain threshold, or calculate whether the motion vector amplitude of the target object in the image is greater than a threshold to detect abnormal disturbances.
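A minimal sketch of how a local rule engine might evaluate such conditions is shown below. It assumes a rectangular key region in pixel coordinates and represents conditions and actions as Python callables for brevity; the application instead stores them in a serializable format. All names (`Rule`, `in_region`, `evaluate`, `RESTRICTED`, the context keys) are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]


@dataclass
class Rule:
    name: str
    condition: Callable[[Context], bool]   # preset condition
    action: Callable[[Context], None]      # change rendering style or update state


def in_region(point: Tuple[float, float],
              region: Tuple[float, float, float, float]) -> bool:
    """Spatial region trigger: is the point inside the rectangle (x1, y1, x2, y2)?"""
    x, y = point
    x1, y1, x2, y2 = region
    return x1 <= x <= x2 and y1 <= y <= y2


def evaluate(rules: List[Rule], ctx: Context) -> List[str]:
    """Traverse the rule set, run the action of each triggered rule, return their names."""
    fired = []
    for rule in rules:
        if rule.condition(ctx):
            rule.action(ctx)
            fired.append(rule.name)
    return fired


RESTRICTED = (100, 100, 300, 300)  # preloaded pixel range of a key region

rules = [
    Rule("low_health",
         lambda c: c["health"] < 30,
         lambda c: c.update(style="flash_red")),
    Rule("restricted_area",
         lambda c: in_region(c["center"], RESTRICTED),
         lambda c: c.update(threat_level="high")),
]

ctx = {"health": 25.0, "center": (150, 200), "threat_level": "low", "style": "normal"}
fired = evaluate(rules, ctx)
```

Because both conditions are evaluated on the terminal against locally available values, no round trip to the server is needed to change the tag's appearance.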
[0053] Then, for each target object, the server can encapsulate the aforementioned information into a structured AR tag, which includes basic attributes, state variables, and a set of behavioral rules. The basic attributes include a business-unique identifier, target category, pixel coordinates, and world coordinates; the state variables are a key-value dictionary containing a current snapshot of all state variables for the corresponding target object; the set of behavioral rules is in list form, with each behavioral rule storing its preset condition logical expression and action instruction in a serializable format.
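The structured AR tag described above might look like the following sketch, with the three top-level fields kept separate. Field names, the string form of the condition, and the shape of the action dictionary are assumptions for illustration; JSON is used here only to show that the structure is serializable, not as the wire format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class BasicAttributes:
    business_id: str                        # business-unique identifier
    category: str                           # target category
    pixel_xy: Tuple[int, int]               # pixel coordinates
    world_xyz: Tuple[float, float, float]   # world coordinates


@dataclass
class BehaviorRule:
    condition: str            # preset condition as a logical expression
    action: Dict[str, Any]    # action instruction


@dataclass
class ARTag:
    basic: BasicAttributes
    state: Dict[str, Any] = field(default_factory=dict)       # snapshot of state variables
    rules: List[BehaviorRule] = field(default_factory=list)   # behavior rule set

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)


tag = ARTag(
    basic=BasicAttributes("police officer 1024", "person", (640, 360), (12.5, 3.0, 0.0)),
    state={"health": 100.0, "threat_level": "low", "task_progress": 0.0},
    rules=[BehaviorRule("health < 30", {"type": "set_style", "style": "flash_red"})],
)
encoded = json.dumps(tag.to_dict(), sort_keys=True)
```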
[0054] Step 202: If the target object is appearing for the first time, the server encodes the AR tag into the SEI unit of the video stream.
[0055] In one scenario, when a target object first appears in the monitoring field of a video acquisition device, the server can encode the complete AR tag into the SEI unit of the video stream. This allows the command terminal to synchronously render the AR tag of the target object after receiving the video stream, providing users with a basis for command and decision-making.
[0056] A complete, structured AR tag is a complex data structure containing nested fields and cannot be directly used as the SEI payload. Therefore, the server can serialize it first.
[0057] Specifically, the server can employ an efficient binary serialization protocol to convert the three fields of the AR tag (basic attributes, state variables, and behavior rule set) and their substructures into a compact, self-describing byte sequence according to a predefined schema. Compared with general text formats, binary serialization significantly reduces data size.
[0058] For example, the binary serialization protocol could be Protocol Buffers or MessagePack.
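To illustrate the size difference without a third-party dependency, the sketch below packs a few tag fields with Python's standard `struct` module and compares the result with a JSON encoding of the same values. This is a stand-in, not Protocol Buffers or MessagePack: the fixed layout is hypothetical and, unlike those formats, is not self-describing.

```python
import json
import struct

# Hypothetical fixed layout, little-endian, no padding:
# tag id (uint16), pixel x (uint16), pixel y (uint16), health (float32), threat index (uint8)
RECORD = struct.Struct("<HHHfB")


def pack_tag(tag_id: int, px: int, py: int, health: float, threat: int) -> bytes:
    return RECORD.pack(tag_id, px, py, health, threat)


def unpack_tag(blob: bytes):
    return RECORD.unpack(blob)


binary = pack_tag(1024, 640, 360, 87.5, 2)
text = json.dumps(
    {"tag_id": 1024, "pixel_x": 640, "pixel_y": 360, "health": 87.5, "threat": 2}
).encode("utf-8")

print(len(binary), len(text))  # the binary record is 11 bytes; the JSON text is several times larger
```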
[0059] The server's encoder maintains synchronization logic. When encoding of a frame (denoted frame F) begins, the AR tag corresponding to frame F is sent to the encoder. Before generating the encoded data for frame F, the encoder first generates the SEI NAL (Network Abstraction Layer) units containing the AR tag data. This ensures that, in bitstream order, the AR tag describing frame F always appears before the image data of frame F. When decoding, the terminal first reads the tag data and then decodes the corresponding frame, which lays the foundation for zero-latency synchronous rendering.
[0060] Then, the video NAL units output by the encoder and the SEI NAL units containing AR tags are sent to a stream multiplexer. The stream multiplexer packages them into the final transmission container format according to the encoding standard specifications and transmits them to the command terminal.
[0061] Preferably, considering that video data can tolerate a small amount of packet loss, whereas the loss of tag data may cause errors in the rule engine state, the server can mark the SEI NAL unit as a high-reliability, low-latency data packet and mark the video data NAL unit as a standard streaming media data packet; based on this marking, the network stack can adopt different forward error correction or retransmission strategies, giving priority to the integrity and timeliness of AR tag data on unreliable networks.
[0062] Step 203: If the target object is not appearing for the first time, the server encodes the AR tag differential data of the current frame and the previous frame in the video stream into the SEI unit.
[0063] In cases where a target object is not appearing in the monitoring field of a video acquisition device for the first time, the server can calculate and transmit differential data to optimize bandwidth while ensuring that the command terminal can update the AR tag information of the video stream in a timely manner.
[0064] Specifically, across consecutive video frames, most of the basic attributes and state variables of an AR tag change slowly. For a given AR tag, let $T_{t-1}$ be the complete serialized byte array of the tag in the previous frame and $T_t$ the complete serialized byte array of the tag in the current frame. The encoder then computes the difference $\Delta = T_t \ominus T_{t-1}$, where $\ominus$ can denote a byte-by-byte XOR operation or a more complex binary difference algorithm.
[0065] Then, a lightweight lossless compression algorithm is applied to the difference data $\Delta$ to obtain the compressed difference data $\Delta_c$.
[0066] Then, the server can transmit the differential data $\Delta_c$ to the command terminal.
[0067] Preferably, when the size of the differential data $\Delta_c$ is smaller than the preset ratio of the size of the current AR tag, $\mathrm{Size}(T_t)$, the server transmits the differential data $\Delta_c$ to the command terminal; when the size of $\Delta_c$ is greater than or equal to the preset ratio of $\mathrm{Size}(T_t)$, the server transmits the complete AR tag $T_t$ to the command terminal.
[0068] The preset ratio can be obtained by comparing the completeness and timeliness of AR tag data under different selected ratios in historical data. For example, the preset ratio is 80%.
[0069] This mechanism ensures that data integrity and timeliness are maintained even when the target is moving at high speed or undergoing drastic changes in state.
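The server-side choice between differential data and the complete tag can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: byte-by-byte XOR is used as the difference operation, `zlib` stands in for the unspecified lightweight lossless compressor, and the 4-byte length prefix, the function names, and the "diff"/"full" labels are hypothetical.

```python
import struct
import zlib
from typing import Tuple

PRESET_RATIO = 0.8  # the 80% example ratio mentioned in the text


def xor_diff(prev: bytes, curr: bytes) -> bytes:
    """Byte-by-byte XOR of two serialized tags; the shorter one is zero-padded."""
    n = max(len(prev), len(curr))
    a, b = prev.ljust(n, b"\x00"), curr.ljust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


def encode_update(prev: bytes, curr: bytes) -> Tuple[str, bytes]:
    """Return ("diff", compressed delta) or ("full", complete tag)."""
    # Hypothetical layout: 4-byte big-endian length of the current tag,
    # followed by the compressed XOR delta.
    delta_c = struct.pack(">I", len(curr)) + zlib.compress(xor_diff(prev, curr))
    if len(delta_c) < PRESET_RATIO * len(curr):
        return "diff", delta_c
    # The delta is not small enough, so fall back to the complete tag.
    return "full", curr
```

When most bytes are unchanged, the XOR output is dominated by zeros and compresses well, so the differential branch is taken; when the tag changes drastically, the fallback sends the complete tag.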
[0070] Step 204: The command terminal extracts the AR tag from the video stream in which the SEI unit has been encoded and, through the local behavior rule engine, determines whether a preset condition is triggered based on the state variable and the image analysis results of the video frames in the video stream.
[0071] The command terminal continuously receives video streams from the server. Its demultiplexer separates the stream into video encoded data packets and supplementary data packets based on the container format header information. Then, the video encoded data packets are sent to the video decoder for decoding, and the original YUV or RGB format video frames are reconstructed, which are then sent to the rendering pipeline or frame buffer.
[0072] Simultaneously, the demultiplexer or video decoder extracts the SEI unit payload from the supplementary data packet, and extracts the frame sequence number and tag data block contained therein according to the defined format. The command terminal maintains a buffer corresponding to the frame, associates and stores the extracted tag data block with the corresponding frame sequence number, and waits for the video frame with that sequence number to be decoded before entering the rendering process together.
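The frame-indexed buffer described above can be sketched as a small map from frame sequence number to tag data block. The class and method names are hypothetical, and real demultiplexing and decoding are outside the scope of this sketch.

```python
from typing import Dict, Optional, Tuple


class TagFrameBuffer:
    """Holds tag data blocks until the frame with the same sequence number is decoded."""

    def __init__(self) -> None:
        self._pending: Dict[int, bytes] = {}

    def on_sei(self, frame_no: int, tag_block: bytes) -> None:
        # Called when an SEI payload has been parsed; store it by frame number.
        self._pending[frame_no] = tag_block

    def on_frame_decoded(self, frame_no: int, frame: bytes) -> Tuple[bytes, Optional[bytes]]:
        # Pair the decoded frame with its tag block (None if no tag arrived),
        # removing the entry so the buffer does not grow without bound.
        return frame, self._pending.pop(frame_no, None)
```

Because the SEI unit precedes the image data of the same frame in bitstream order, the tag block is normally already buffered when the frame finishes decoding.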
[0073] Then, the rendering engine of the command terminal can obtain, from the tag data, the pixel coordinates (u,v) and the text / icon content in the basic attributes of the AR tag, and draw a 2D graphics layer at screen coordinates (u,v) on top of the corresponding video frame texture.
[0074] Preferably, the command terminal reconstructs the AR tag based on the payload of the SEI unit; based on the AR tag, the command terminal loads the behavior rule set into the local behavior rule engine, and updates the key variables in the state context maintained by the local behavior rule engine based on the state variables and image analysis results. The key variables are used to determine whether the preset condition is triggered; the command terminal traverses the behavior rule set, determines the behavior rules whose preset conditions are triggered based on the latest values of the key variables, and executes the corresponding actions.
[0075] Specifically, the command terminal can first check the data block type of the tag data block associated with the current frame. If it is a complete AR tag, the command terminal can directly deserialize the byte stream using the same schema as the encoding end to reconstruct the complete AR tag. If it is differential data of the AR tag, the command terminal reads the complete object of the tag from the previous frame, $S_{t-1}$, from the local cache, decompresses the received compressed differential data to obtain $\Delta$, performs the reconstruction operation $S_t = S_{t-1} \oplus \Delta$ to obtain the complete AR tag $S_t$ for the current frame, and updates the local cache. Here $\oplus$ denotes the inverse of the difference operation used by the encoder; for a byte-by-byte XOR difference, it is again a byte-by-byte XOR. This step ensures that the terminal holds a copy of the state logic that is fully consistent with that of the server.
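The terminal-side reconstruction can be sketched as below, assuming byte-by-byte XOR as the difference operation and `zlib` as the compressor. The payload layout (a 4-byte big-endian length of the current tag followed by the compressed delta), the "full"/"diff" block types, and the function name are assumptions made for illustration only.

```python
import struct
import zlib
from typing import Dict


def apply_update(kind: str, payload: bytes, cache: Dict[int, bytes], tag_id: int) -> bytes:
    """Rebuild the serialized tag for the current frame and refresh the local cache."""
    if kind == "full":
        curr = payload
    else:
        # Hypothetical layout: length of the current tag, then the compressed delta.
        (curr_len,) = struct.unpack(">I", payload[:4])
        delta = zlib.decompress(payload[4:])
        prev = cache[tag_id].ljust(len(delta), b"\x00")
        # XOR is its own inverse, so applying the delta to the previous tag
        # yields the current tag; trim any zero padding afterwards.
        curr = bytes(x ^ y for x, y in zip(prev, delta))[:curr_len]
    cache[tag_id] = curr
    return curr
```

A differential block can only be applied if the cached previous tag matches the one the server used; a terminal that has lost its cache would need a complete tag before differential updates can resume.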
[0076] The command terminal runs a lightweight local behavior rule engine. For each reconstructed AR tag, the engine loads each behavior rule from its behavior rule set into an executable memory structure; each rule is parsed into a conditional decision tree and an action execution list. Simultaneously, the engine maintains a state context containing the current values of the AR tag's state variables, as well as other environment variables required to support rule condition calculations.
[0077] Before rendering the AR tag for each frame, the engine can update the state variables of the AR tag in the state context with the latest data of the corresponding timestamp.
[0078] Specifically, the command terminal can first write or overwrite the latest state variable value in the AR tag or its differential data into the state context; it can also receive update instructions for specific state variables sent by the server through an independent, low-frequency control channel, and update the state context based on the state variable values in the update instructions.
[0079] Preferably, the engine also integrates a lightweight video analysis module; the command terminal can extract regional and global features of frame images based on this video analysis module as image analysis results.
[0080] The region features can include the color intensity and motion energy of the frame image. The color intensity is the average value of the red channel or luminance channel in the specified region, used to indicate the temperature or combustion status of the specified region. The motion energy is the average amplitude of the pixel difference values in the same region between the current frame and the previous frame, used to detect abnormal motion of the target object.
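The two region features can be computed as in the following sketch, which assumes a frame is available as rows of single-channel values (red or luminance) and that a region is an axis-aligned rectangle. The type aliases and function names are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]              # rows of red-channel or luminance values
Region = Tuple[int, int, int, int]   # x0, y0, x1, y1 (upper bounds exclusive)


def region_pixels(frame: Frame, region: Region) -> List[int]:
    x0, y0, x1, y1 = region
    return [v for row in frame[y0:y1] for v in row[x0:x1]]


def color_intensity(frame: Frame, region: Region) -> float:
    """Average channel value inside the region."""
    px = region_pixels(frame, region)
    return sum(px) / len(px)


def motion_energy(curr: Frame, prev: Frame, region: Region) -> float:
    """Average absolute pixel difference between two frames inside the region."""
    a, b = region_pixels(curr, region), region_pixels(prev, region)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)
```

Both functions assume a non-empty region; a production module would more likely operate on decoder output buffers directly rather than Python lists.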
[0081] Among them, global features can be global semantic features of the frame image, such as flames, thick smoke, etc.
[0082] The video analysis module can be a micro neural network or a traditional image processing operator.
[0083] The engine can map features from image analysis results to parameters of corresponding key variables to update the values of state variables in the state context.
[0084] After updating the state context, the engine can traverse all the behavior rules of the loaded AR tag, evaluate whether the corresponding condition expression is satisfied based on the latest state variable value in the state context, and execute the corresponding action if the condition expression is satisfied, that is, if the preset condition is satisfied.
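The traversal step can be illustrated with a minimal rule evaluator. This is a sketch under simplifying assumptions: each condition is a plain Python callable rather than a parsed decision tree, actions are applied in a single pass, and all names (`Rule`, `evaluate`, the variable keys) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]


@dataclass
class Rule:
    name: str
    condition: Callable[[Context], bool]                           # the preset condition
    render_updates: Dict[str, Any] = field(default_factory=dict)   # rendering instruction
    state_updates: Dict[str, Any] = field(default_factory=dict)    # state update instruction


def evaluate(rules: List[Rule], ctx: Context, style: Dict[str, Any]) -> List[str]:
    """Traverse the rules; apply the actions of every rule whose condition holds."""
    fired = []
    for rule in rules:
        if rule.condition(ctx):
            style.update(rule.render_updates)
            ctx.update(rule.state_updates)
            fired.append(rule.name)
    return fired


rules = [
    Rule(
        name="overheat",
        condition=lambda c: c["temperature"] > 80 or c["color_intensity"] > 200,
        render_updates={"color": "red", "blink_hz": 2},
        state_updates={"alarm": True},
    )
]
ctx = {"temperature": 85, "color_intensity": 120}
style = {"color": "green"}
fired = evaluate(rules, ctx, style)
```

In this example the condition mixes a server-supplied state variable with a locally computed image feature, which is the combination the rule engine is meant to support.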
[0085] Step 205: For each triggered preset condition, the command terminal executes the corresponding action.
[0086] The actions include rendering instructions and state update instructions. For a rendering instruction, the engine modifies the rendering style attribute set of the AR tag, which is a data structure used to control the visual presentation of the AR tag; for a state update instruction, the engine directly modifies the variable values in the state context.
[0087] It can be understood that, after traversing the rules, the engine can first execute the corresponding state update instructions and then traverse the rule set again with the updated state context.
[0088] After the engine completes the evaluation and action processing of all tags in the current frame, the graphics renderer starts working. It reads the dynamic set of rendering style attributes output by the rules engine and renders the AR tags based on this.
[0089] The renderer can dynamically generate or select corresponding graphic elements based on parameters in the attribute set. These parameters can include color, blink frequency, transparency, and additional text.
[0090] The renderer uses pixel coordinates provided in the basic properties, which are bound to the video content of the current frame, for drawing. Because the coordinates are integrated with the video content, zero drift is guaranteed. All dynamic style changes are synchronized with the display output refresh rate, ensuring visual smoothness.
[0091] Preferably, after the command terminal extracts the AR tag, the command terminal renders and presents the AR tag on the AR interface; the command terminal obtains the user input event and the AR tag corresponding to the user input event; the command terminal encapsulates the event type of the user input event and the unique identifier of the AR tag being operated into an interaction trigger object, and injects the interaction trigger object into the local behavior rule engine; when the preset conditions corresponding to the interaction trigger are matched, the command terminal executes the corresponding action.
[0092] This interaction trigger object acts as a special external condition trigger, participating in the rule evaluation of the next or current frame, thereby triggering the corresponding interactive feedback action. The state change caused by the interaction can be transmitted back to the server via the control channel.
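The interaction trigger object and its injection into the rule engine can be sketched as follows. The event names, the mapping from event type to action, and the class names are illustrative assumptions; the text does not specify these structures.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Tuple


@dataclass(frozen=True)
class InteractionTrigger:
    event_type: str  # e.g. "click" (hypothetical event name)
    tag_id: str      # unique identifier of the AR tag being operated


class LocalRuleEngine:
    def __init__(self, interaction_rules: Dict[str, str]) -> None:
        # Maps an event type to an action name; both are illustrative.
        self._rules = interaction_rules
        self._pending: Deque[InteractionTrigger] = deque()

    def inject(self, trigger: InteractionTrigger) -> None:
        # Queue the trigger so it takes part in the next rule evaluation.
        self._pending.append(trigger)

    def evaluate_frame(self) -> List[Tuple[str, str]]:
        # Consume queued triggers and return (tag_id, action) for matched rules.
        fired = []
        while self._pending:
            t = self._pending.popleft()
            action = self._rules.get(t.event_type)
            if action is not None:
                fired.append((t.tag_id, action))
        return fired
```

Queuing triggers rather than acting on them immediately keeps user input on the same evaluation path as state variables and image features.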
[0093] By introducing a lightweight behavior rule engine that runs locally on the command terminal and is tightly coupled with the video decoding and rendering pipeline, a complete closed-loop process of perception, decision-making, and rendering is constructed. Intelligent judgment is devolved from the central server to the edge terminal, utilizing local video analysis capabilities to interact with the environment and drive the dynamic performance of AR tags in real time and autonomously. This not only greatly reduces the continuous dependence on network bandwidth and server computing power, but also achieves a low-latency, highly personalized, and highly interactive augmented reality command experience that is difficult to achieve with a centralized architecture.
[0094] To achieve a consistent view of the overall status of the command center, the command terminal needs to securely and efficiently synchronize local status changes back to the server.
[0095] The command terminal can maintain a long-term connection with the server through a reliable, low-bandwidth control channel independent of the video stream. Periodically, or when the change log reaches a certain number of entries, the terminal packages the unsynchronized records in the change log into an incremental status update message and returns it to the server.
[0096] After receiving the message, the server performs an optimistic locking conflict check on each update: comparing the version of the state variable in the request with the version of the state variable currently stored on the server. If they match, the incremental state update is applied, the version number is incremented, and the update source is recorded. Subsequently, the server generates an acknowledgment response containing a list of accepted updates and their new version numbers.
[0097] After receiving confirmation, the command terminal clears the accepted records from the synchronization log and updates the version number stored locally. If an update is rejected by the server, it means that the state variable has been modified by another terminal or the backend. In this case, the terminal can overwrite the local value with the current value returned by the server and re-evaluate the relevant rules through the rule engine; alternatively, it can display a message to the user that "the data has been modified by someone else" and show the old and new values for selection.
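The server-side optimistic locking check can be sketched as below. The record layouts and names are hypothetical, and the sketch assumes every updated key already exists in the server store.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class Versioned:
    value: Any
    version: int
    source: str = "server"  # who last updated this variable


@dataclass
class Update:
    key: str
    value: Any
    base_version: int  # the version the terminal last saw


def apply_updates(
    store: Dict[str, Versioned], updates: List[Update], source: str
) -> Tuple[List[Tuple[str, int]], List[Tuple[str, Any, int]]]:
    """Apply each update whose base version matches; report the rest as conflicts."""
    accepted, rejected = [], []
    for u in updates:
        cur = store[u.key]
        if u.base_version == cur.version:
            store[u.key] = Versioned(u.value, cur.version + 1, source)
            accepted.append((u.key, cur.version + 1))
        else:
            # Conflict: return the server's current value and version so the
            # terminal can overwrite its local copy or prompt the user.
            rejected.append((u.key, cur.value, cur.version))
    return accepted, rejected
```

The accepted list corresponds to the acknowledgment response with new version numbers, and the rejected list carries the current server values that the terminal uses for either of the two conflict-handling options described above.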
[0098] For certain critical status updates, after successful application, the server will proactively broadcast them via the control channel to all other command terminals that have subscribed to the relevant data stream or scenario.
[0099] In this embodiment, by offloading the complex dynamic performance logic of AR tags, i.e., the set of behavior rules, to the command terminal for local execution, only a one-time transmission of the rules and low-frequency differential state updates are needed in the bitstream, effectively reducing the network bandwidth and server computing power required to maintain tag dynamism. The command terminal can achieve edge intelligence through local image analysis of video frames, making the command decision feedback loop shorter and the response speed faster. Using the method provided in this application, fast-response, highly dynamic AR tag rendering can be achieved in low-bandwidth environments.
[0100] Please refer to Figure 3. Figure 3 is a flowchart of another AR tag adaptive rendering method provided by this application, which is applied to the server in the above system architecture. As shown in Figure 3, the method includes the following steps: Step 301: The server generates AR tags for the target object based on the received video stream.
[0101] The AR tag includes a state variable and a set of behavioral rules for the target object. The target object is the object that needs to be managed in the corresponding command and control scenario in the video stream. The state variable is the attribute variable of the target object that changes with time or events in the command and control scenario. The behavioral rules in the set of behavioral rules are used to indicate the actions that the command terminal performs when preset conditions are met. These actions include changing the rendering style of the AR tag or updating the state variable.
[0102] Step 302: If the target object is appearing for the first time, the server encodes the AR tag into the SEI unit of the video stream.
[0103] Step 303: If the target object is not appearing for the first time, the server encodes the differential data of the AR tag between the current frame and the previous frame in the video stream into the SEI unit.
[0104] Step 304: The server transmits the video stream after encoding the SEI unit to the command terminal.
[0105] The specific implementation of steps 301 to 304 is similar to that of steps 201 to 203 in the embodiment shown in Figure 2; for details, please refer to the relevant description of the embodiment shown in Figure 2, which will not be repeated here.
[0106] In this embodiment of the application, by delegating the complex dynamic performance logic of AR tags, i.e., the set of behavior rules, to the command terminal for local execution, only a one-time transmission of the rules and low-frequency differential state updates are needed in the bitstream, which effectively reduces the network bandwidth and server computing power required to maintain the dynamics of the tags.
[0107] Please refer to Figure 4. Figure 4 is a flowchart of another AR tag adaptive rendering method provided by this application, which is applied to the command terminal in the aforementioned system architecture. As shown in Figure 4, the method includes the following steps: Step 401: The command terminal receives the video stream sent by the server.
[0108] The SEI unit of the video stream includes an encoded AR tag, which includes a state variable and a set of behavioral rules for the target object. The target object is the object in the video stream that needs to be managed in the corresponding command and control scenario. The state variable is the attribute variable of the target object that changes over time or events in the command and control scenario. The behavioral rules in the set of behavioral rules are used to indicate the actions performed by the command terminal when preset conditions are met. These actions include changing the rendering style of the AR tag or updating the state variable.
[0109] Step 402: The command terminal extracts the AR tag from the SEI unit and, through the local behavior rule engine, determines whether the preset condition has been triggered based on the state variable and the image analysis results of the video frames in the video stream.
[0110] Step 403: For each triggered preset condition, the command terminal executes the corresponding action.
[0111] The specific implementation of steps 401 to 403 is similar to that of steps 204 to 205 in the embodiment shown in Figure 2; for details, please refer to the relevant description of the embodiment shown in Figure 2, which will not be repeated here.
[0112] In this embodiment, the command terminal achieves edge intelligence by performing local image analysis on video frames, which can shorten the command decision feedback loop and increase the response speed.
[0113] Figure 5 is a schematic diagram of a possible logical structure of a server provided in an embodiment of this application. As shown in Figure 5, the server 500 includes a processor 501, a communication interface 502, a memory 503, and a bus 504. The processor 501, communication interface 502, and memory 503 are interconnected via the bus 504. In an embodiment of this application, the processor 501 is used to control and manage the operations of the server 500; for example, the processor 501 is used to execute the steps in the embodiment shown in Figure 3 and / or other processes of the techniques described herein. The communication interface 502 is used to support communication by the server 500. The memory 503 is used to store program code and data of the server 500.
[0114] The processor 501 can be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof. It can implement or execute the various exemplary logic blocks, modules, and circuits described in conjunction with the disclosure of this application. The processor can also be a combination that implements computing functions, such as a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor. The bus 504 can be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, the bus is represented by a single thick line in Figure 5, but this does not mean that there is only one bus or one type of bus.
[0115] As shown in Figure 6, in another embodiment of this application, a command terminal 600 is also provided, including a processor 601, a communication interface 602, a memory 603, and a bus 604. The processor 601, the communication interface 602, and the memory 603 are interconnected via the bus 604. In the embodiments of this application, the processor 601 is used to control and manage the actions of the command terminal 600; for example, the processor 601 is used to execute the steps in the embodiment shown in Figure 4 and / or other processes of the techniques described herein. The communication interface 602 is used to support communication by the command terminal 600. The memory 603 is used to store the program code and data of the command terminal 600.
[0116] In another embodiment of this application, a computer-readable storage medium is also provided. The computer-readable storage medium includes instructions that, when executed on a computer, cause the computer to perform the method described in any of the embodiments shown in Figures 2 to 4.
[0117] Those skilled in the art will recognize that the units of the various examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the invention.
[0118] Those skilled in the art will understand that, for the sake of convenience and brevity, the specific working processes of the systems, devices, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.
[0119] In the embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. For instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the displayed or discussed mutual couplings, direct couplings, or communication connections may be through some interfaces; indirect couplings or communication connections between devices or units may be electrical, mechanical, or other forms.
[0120] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0121] Furthermore, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0122] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, read-only memory (ROM), random access memory (RAM), portable hard drives, magnetic disks, or optical disks.
[0123] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features therein. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present invention, and they should all be covered within the scope of the claims and specification of the present invention.
Claims
1. An AR tag adaptive rendering method, characterized in that the method is applied to an AR tag adaptive rendering system, the system including a server and a command terminal; the method includes: the server generates AR tags for target objects based on the received video stream; the AR tags include state variables and a set of behavior rules for the target objects; wherein the target objects are objects in the video stream that need to be managed in the corresponding command and control scenario, the state variables are attribute variables of the target objects that change over time or with events in the command and control scenario, and the behavior rules in the set of behavior rules are used to indicate the actions performed by the command terminal when preset conditions are met, the actions including changing the rendering style of the AR tags or updating the state variables; when the target object is appearing for the first time, the server encodes the AR tag into the supplemental enhancement information (SEI) unit of the video stream; when the target object is not appearing for the first time, the server encodes the differential data of the AR tag between the current frame and the previous frame in the video stream into the SEI unit; the command terminal extracts the AR tag from the video stream in which the SEI unit has been encoded and, through the local behavior rule engine, determines whether the preset condition is triggered based on the state variables and the image analysis results of the video frames in the video stream; for each preset condition that is triggered, the command terminal executes the corresponding action.
2. The method according to claim 1, characterized in that the server generating AR tags for the target object based on the received video stream includes: the server performs target detection on the video stream to obtain the target object; the server creates and initializes the state variables for the target object based on the command and control scenario and the type of the target object; the server compiles the set of behavior rules for the target object based on the command and control scenario; wherein the preset conditions include conditional expressions that take at least one of the state variables, time triggers, spatial region triggers, and image features of video frames in the video stream as input.
3. The method according to claim 1 or 2, characterized in that the command terminal extracting the AR tag from the video stream in which the SEI unit has been encoded and determining, through the local behavior rule engine, whether the preset condition is triggered based on the state variables and the image analysis results of the video frames in the video stream includes: the command terminal reconstructs the AR tag based on the payload of the SEI unit; the command terminal loads the set of behavior rules into the local behavior rule engine based on the AR tag, and updates the key variables in the state context maintained by the local behavior rule engine based on the state variables and the image analysis results, the key variables being used to determine whether the preset conditions are triggered; the command terminal traverses the set of behavior rules, determines the behavior rules whose preset conditions are triggered based on the latest values of the key variables, and executes the corresponding actions.
4. The method according to claim 1 or 2, characterized in that, after the command terminal extracts the AR tag from the video stream in which the SEI unit has been encoded, the method further includes: the command terminal renders and presents the AR tag on the AR interface; the command terminal acquires user input events and the AR tags corresponding to the user input events; the command terminal encapsulates the event type of the user input event and the unique identifier of the AR tag being operated into an interaction trigger object, and injects the interaction trigger object into the local behavior rule engine; when the preset conditions corresponding to the interaction trigger object are met, the command terminal executes the corresponding action.
5. An AR tag adaptive rendering method, characterized in that the method is applied to a server; the method includes: generating an AR tag for a target object based on the received video stream; the AR tag includes a state variable and a set of behavior rules for the target object; wherein the target object is an object in the video stream that needs to be managed in the corresponding command and control scenario, the state variable is an attribute variable of the target object that changes over time or with events in the command and control scenario, and the behavior rules in the set of behavior rules are used to indicate the actions performed by the command terminal when preset conditions are met, the actions including changing the rendering style of the AR tag or updating the state variable; when the target object is appearing for the first time, encoding the AR tag into the supplemental enhancement information (SEI) unit of the video stream; when the target object is not appearing for the first time, encoding the differential data of the AR tag between the current frame and the previous frame in the video stream into the SEI unit; transmitting the video stream in which the SEI unit has been encoded to the command terminal.
6. An AR tag adaptive rendering method, characterized in that the method is applied to a command terminal; the method includes: receiving a video stream sent by a server, wherein the supplemental enhancement information (SEI) unit of the video stream includes an encoded AR tag; the AR tag includes a state variable and a set of behavior rules for the target object; the target object is an object in the video stream that needs to be managed in the corresponding command and control scenario, the state variable is an attribute variable of the target object that changes over time or with events in the command and control scenario, and the behavior rules in the set of behavior rules are used to indicate the actions performed by the command terminal when preset conditions are met, the actions including changing the rendering style of the AR tag or updating the state variable; extracting the AR tag from the SEI unit and determining, through the local behavior rule engine, whether the preset condition is triggered based on the state variable and the image analysis results of the video frames in the video stream; for each preset condition that is triggered, executing the corresponding action.
7. An AR tag adaptive rendering system, characterized in that the system includes a server and a command terminal; the server is used to generate AR tags for target objects based on the received video stream; the AR tags include state variables and a set of behavior rules for the target objects; wherein the target objects are objects in the video stream that need to be managed in the corresponding command and control scenario, the state variables are attribute variables of the target objects that change over time or with events in the command and control scenario, and the behavior rules in the set of behavior rules are used to indicate the actions performed by the command terminal when preset conditions are met, the actions including changing the rendering style of the AR tags or updating the state variables; when the target object is appearing for the first time, the server is also used to encode the AR tag into the supplemental enhancement information (SEI) unit of the video stream; when the target object is not appearing for the first time, the server is also used to encode the differential data of the AR tag between the current frame and the previous frame in the video stream into the SEI unit; the command terminal is used to extract the AR tag from the video stream in which the SEI unit has been encoded and, through the local behavior rule engine, to determine whether the preset condition is triggered based on the state variables and the image analysis results of the video frames in the video stream; the command terminal is also used to execute the action corresponding to each triggered preset condition.
8. A server, characterized in that it includes: a memory for storing a program; and a processor for loading the program to execute the method according to claim 5.
9. A command terminal, characterized in that it includes: a memory for storing a program; and a processor for loading the program to execute the method according to claim 6.
10. A computer-readable storage medium, characterized in that, The computer-readable storage medium includes a stored program, wherein, when the program is executed, it controls the device on which the computer-readable storage medium is located to perform the method of any one of claims 1-6.