Method and apparatus for rendering streaming content, computer device, storage medium
Streaming content is rendered incrementally: a signature digest is generated for each incremental update unit, an incremental update patch is derived from the digests, and the patch is converted into a document object model. This addresses stuttering and inaccurate rendering of streaming content and yields efficient, stable rendering results.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- TENCENT TECHNOLOGY (SHENZHEN) CO LTD
- Filing Date
- 2026-04-27
- Publication Date
- 2026-06-12
AI Technical Summary
Existing streaming content rendering methods suffer from problems such as stuttering, unresponsive scrolling, and slow input response in high-frequency update scenarios. In particular, the high parsing and rendering costs when dealing with tables, long code blocks, and image links lead to a decrease in rendering accuracy and performance.
The method acquires the text information of the streaming content, splits it into an incremental update unit sequence, generates an incremental update patch based on the signature digests, and converts the patch into a document object model for incremental rendering. Full re-rendering is avoided and only the changed paragraphs are updated.
It improves the accuracy and performance of streaming content rendering, ensuring efficient rendering even in high-frequency update scenarios, reducing jitter and flicker, and enhancing rendering efficiency and quality.
Figure CN122196288A_ABST
Description
Technical Field
[0001] This application relates to the field of computer technology, and in particular to a method, apparatus, computer device, storage medium, and computer program product for rendering streaming content.
Background Technology
[0002] With the development of computer and internet technologies, more and more AI dialogue messages are being output using streaming communication, resulting in continuous updates of the Markdown text for the same message. As large-scale model applications become more widespread, front-ends need to support longer and more complex rich text content in dialogue products. Markdown has become the de facto standard: it can simultaneously express headings, lists, tables, code blocks, and citation links, and can be extended with custom blocks. On the other hand, streaming output is a key interaction method for reducing waiting time and enhancing the "generating experience": users expect to see the first screen of content within 1-2 seconds, and then read it as it is generated.
[0003] However, current streaming content rendering methods typically parse and re-render the entire content with each update. As the content becomes longer and the update frequency increases, noticeable stuttering, unresponsive scrolling, slow input response, and delays in copying or clicking occur. This is especially true when AI output includes tables, long code blocks, image links, and source citations, where the parsing and rendering costs are even higher, leading to more pronounced stuttering. Therefore, how to effectively improve the accuracy of streaming content rendering while maintaining high performance has become a pressing issue.
Summary of the Invention
[0004] Therefore, it is necessary to provide a method, apparatus, computer device, computer-readable storage medium, and computer program product for rendering streaming content, which can ensure high performance of streaming content rendering while effectively improving the accuracy of streaming content rendering. Even in high-frequency update scenarios, it can simultaneously take into account the correctness and performance of streaming content rendering, thereby comprehensively improving the efficiency and quality of streaming content rendering.
[0005] Firstly, this application provides a method for rendering streaming content. The method includes: acquiring text information for updating streaming content; splitting the text information to obtain an incremental update unit sequence; generating a signature digest for each incremental update unit in the incremental update unit sequence that is used to independently update the streaming content; generating an incremental update patch based on the signature digests of each incremental update unit; converting the incremental update patch into a document object model for rendering the streaming content; and incrementally rendering the streaming content based on the document object model.
[0006] Secondly, this application also provides a rendering apparatus for streaming content. The apparatus includes: an acquisition module for acquiring text information used to update streaming content; a processing module for splitting the text information to obtain an incremental update unit sequence; a generation module for generating a signature digest of each incremental update unit in the incremental update unit sequence that is used to independently update the streaming content; and generating an incremental update patch based on the signature digests of each incremental update unit; a conversion module for converting the incremental update patch into a document object model for rendering the streaming content; and a rendering module for incrementally rendering the streaming content based on the document object model.
[0007] Thirdly, this application also provides a computer device. The computer device includes a memory and a processor. The memory stores a computer program, and the processor, when executing the computer program, performs the following steps: acquiring text information for updating streaming content; splitting the text information to obtain an incremental update unit sequence; generating a signature digest for each incremental update unit in the incremental update unit sequence used to independently update the streaming content; generating an incremental update patch based on the signature digests of each incremental update unit; converting the incremental update patch into a document object model for rendering the streaming content; and incrementally rendering the streaming content based on the document object model.
[0008] Fourthly, this application also provides a computer-readable storage medium. The computer-readable storage medium stores a computer program thereon, which, when executed by a processor, performs the following steps: acquiring text information for updating streaming content; splitting the text information to obtain an incremental update unit sequence; generating a signature digest for each incremental update unit in the incremental update unit sequence used to independently update the streaming content; generating an incremental update patch based on the signature digests of each incremental update unit; converting the incremental update patch into a document object model for rendering the streaming content; and incrementally rendering the streaming content based on the document object model.
[0009] Fifthly, this application also provides a computer program product. The computer program product includes a computer program that, when executed by a processor, performs the following steps: acquiring text information for updating streaming content; splitting the text information to obtain an incremental update unit sequence; generating a signature digest for each incremental update unit in the incremental update unit sequence used to independently update the streaming content; generating an incremental update patch based on the signature digests of each incremental update unit; converting the incremental update patch into a document object model for rendering the streaming content; and incrementally rendering the streaming content based on the document object model.
[0010] The aforementioned method, apparatus, computer device, storage medium, and computer program product for rendering streaming content obtain an incremental update unit sequence by acquiring text information used to update the streaming content and splitting the text information; further, based on each incremental update unit in the incremental update unit sequence used to independently update the streaming content, a signature digest of each incremental update unit is generated, and an incremental update patch is generated based on the signature digest of each incremental update unit; further, the incremental update patch is converted into a document object model for rendering the streaming content, and incremental rendering of the streaming content is performed based on the document object model. Since the text information in this application's technical solution is the full amount of text information obtained for updating streaming content, the semantics contained in the split incremental update unit sequence obtained based on the full amount of text information are more accurate. Furthermore, based on the signature digest of each incremental update unit used to independently update streaming content in the incremental update unit sequence, the change range can be quickly located. That is, the incremental update patch calculated through the signature digest is also more accurate. Only the changed paragraphs need to be incrementally updated, and the generated paragraphs are not destroyed and rebuilt, effectively avoiding the jitter and flicker caused by full re-rendering. While ensuring the high performance of streaming content rendering, it can also effectively improve the accuracy of streaming content rendering. Even in high-frequency update scenarios, it can simultaneously take into account the correctness and performance of streaming content rendering, thereby comprehensively improving the efficiency and quality of streaming content rendering. 
Attached Figure Description
[0011] Figure 1 is a diagram illustrating the application environment of a streaming content rendering method in one embodiment;
[0012] Figure 2 is a flowchart illustrating a method for rendering streaming content in one embodiment;
[0013] Figure 3 is a schematic diagram illustrating an application scenario in which the method provided in one embodiment is applied to an AI conversational product;
[0014] Figure 4 is an architecture diagram of a multi-threaded collaborative processing method for rendering streaming content provided in one embodiment;
[0015] Figure 5 is a flowchart illustrating the main thread processing in one embodiment;
[0016] Figure 6 is a schematic diagram of the Web Worker processing flow in one embodiment;
[0017] Figure 7 is a schematic diagram illustrating a linear scan of a token sequence in one embodiment;
[0018] Figure 8 is a schematic diagram illustrating an application scenario of the method provided in one embodiment in an AI dialogue product within the gaming industry;
[0019] Figure 9 is a schematic diagram illustrating an application scenario of the method provided in one embodiment in an AI dialogue product within the writing field;
[0020] Figure 10 is a timing diagram of a streaming content rendering method provided in one embodiment;
[0021] Figure 11 is a structural block diagram of a rendering apparatus for streaming content in one embodiment;
[0022] Figure 12 is an internal structural diagram of a computer device in one embodiment.
Detailed Implementation
[0023] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0024] It should be noted that in the following description, the terms "first, second, and third" are used only to distinguish similar objects and do not represent a specific ordering of objects. It is understood that "first, second, and third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of this application described herein can be implemented in an order other than that illustrated or described herein.
[0025] The streaming content rendering method provided in this application embodiment can be applied to, for example, the application environment shown in Figure 1, in which terminal 102 communicates with server 104 via a network. A data storage system can store the data that server 104 needs to process. The data storage system can be integrated onto server 104 or placed on a cloud or other network server. Terminal 102 obtains text information for updating streaming content and sends it to server 104, allowing server 104 to split the text information to obtain an incremental update unit sequence. Based on each incremental update unit in the sequence used for independently updating streaming content, server 104 generates a signature digest for each incremental update unit. Further, server 104 can generate incremental update patches based on the signature digests of each incremental update unit, convert the incremental update patches into a document object model (DOM) for rendering streaming content, and return the DOM to terminal 102. Upon receiving the DOM returned by server 104, terminal 102 performs incremental rendering of the streaming content based on the DOM.
[0026] Terminal 102 can be a smartphone, tablet, laptop, desktop computer, smart speaker, smart TV, smartwatch, IoT device, or portable wearable device. IoT devices can include smart in-vehicle devices, etc. Portable wearable devices can include smartwatches, smart bracelets, head-mounted devices, etc.
[0027] Server 104 can be an independent physical server or a service node in a blockchain system. The service nodes in the blockchain system form a peer-to-peer (Peer To Peer) network. The Peer To Peer protocol is an application layer protocol that runs on top of the Transmission Control Protocol (TCP).
[0028] In addition, server 104 can also be a server cluster consisting of multiple physical servers, which can be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery network (CDN), and big data and artificial intelligence platforms.
[0029] Terminal 102 and server 104 can be connected via Bluetooth, USB (Universal Serial Bus) or network, etc., and this application does not impose any restrictions.
[0030] In one embodiment, as shown in Figure 2, a method for rendering streaming content is provided. This method can be executed by the server or the terminal alone, or by both the server and the terminal. The following description takes the method as applied to the terminal in Figure 1 as an example, and includes the following steps:
[0031] Step 202: Obtain text information for updating streaming content.
[0032] In this application, "text information" refers to textual descriptions used to update streaming content. For example, the text information can be streaming fragmented text, where streaming fragmentation refers to the transmission format where the output of an AI model is split into multiple small segments (i.e., streaming fragmented text) and delivered to the client sequentially. Each fragment (streaming fragmented text) may contain only newly added text, or it may contain control tags, special blocks (such as the start and end of code blocks), or the full text concatenated by the backend. Another example is that the streaming fragmented text in this application can be Markdown text in code form. Markdown text refers to rich text descriptions that conform to Markdown syntax (which may include extended syntax such as tables, task lists, code blocks, and optional native HTML). In AI scenarios, Markdown text is often used to simultaneously carry natural language descriptions, tabular data, code, links, and source citations.
[0033] Streaming content refers to data content that is generated or transmitted segment by segment in chronological order, rather than being provided all at once. It is commonly found in scenarios such as text, audio / video, and data streams. For example, the streaming content in this application can be rendered word-by-word or block-by-block: the content is presented gradually like a "flowing stream," such as the word-by-word display of AI Q&A or the segmented transmission of live video. It is particularly suitable for scenarios requiring real-time interaction, such as large-scale model dialogues, online customer service, and real-time detection. It also helps with memory and performance optimization: loading large amounts of data at once is avoided, which reduces page jitter and memory usage. Figure 3 illustrates an application scenario where the method provided in this application is used in AI dialogue products. As shown in Figure 3, the streaming content rendering method provided in this application can keep historical content stable while "generating and displaying simultaneously" and update new content locally; it also supports interactive operations (copying, referencing, and jumping) combined with streaming rendering without having to bind events repeatedly to each node; in addition, it supports consistent functional performance across multiple platforms (PC / mobile / mini-program) and adapts to restricted environments.
[0034] AI streaming communication / streaming output: This refers to the front end continuously receiving fragmented text or structured fragments output by AI models, typically via SSE (Server-Sent Events), WebSocket, or HTTP streaming responses (such as Fetch-Stream). On the front end, this manifests as the same message content being continuously appended and updated, with its status constantly changing (generating → completed / interrupted / failed).
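As a minimal sketch of how the front end might fold such fragments into the full text that the later steps operate on, the following assumes a hypothetical `Fragment` shape with a `kind` field distinguishing newly added text from backend-concatenated full text. Neither the type nor the field names come from this application.

```typescript
// Hypothetical fragment shape (an assumption for illustration):
// "delta" carries only newly added text, "full" carries the whole message so far.
type Fragment =
  | { kind: "delta"; text: string }
  | { kind: "full"; text: string };

// Fold one received fragment into the accumulated full text of a message.
function accumulate(current: string, fragment: Fragment): string {
  return fragment.kind === "full" ? fragment.text : current + fragment.text;
}
```

Whatever the transport, the result is a single full-text string per message, which is what the splitting step receives.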
[0035] Step 204: The text information is split to obtain an incremental update unit sequence.
[0036] Among them, the incremental update unit sequence refers to the sequence composed of incremental update units.
[0037] An incremental update unit refers to a block-level semantic unit used to independently update streaming content. In some cases, the incremental update unit in this application can also be called a segment. For example, the incremental update unit in this application can be the smallest incremental update unit obtained by dividing the text information (such as Markdown content) output by the AI model according to the "top-level block structure". That is, one incremental update unit (segment) in this application usually corresponds to one top-level block: paragraph, list block, quotation block, table block, code block, etc. The goal of segment design is to enable subsequent generated incremental update patches (DOM patches) to be inserted or deleted directly according to child node indices, thereby converging the rendering impact to a local area without the need for global re-rendering.
[0038] Specifically, the streaming content rendering method provided in this application can be widely applied to various scenarios that support real-time interaction and updating streaming content, such as AI dialogue products and data assistant products. For example, in scenario one (PC / large screen): a user initiates a "generate data analysis report" request to the AI, and the AI model streams an output containing multi-paragraph explanations, multiple tables, and SQL / code blocks. Requirements: simultaneous generation and display, horizontally scrollable tables, one-click copying of code blocks, and smooth overall scrolling.
[0039] Scenario 2 (Mobile / Mini Program): When a user views the same streaming output content on a mobile device, the content includes a source reference. When the user needs to click on the reference, the mini program should provide an interaction of "copying the link and prompting the user to open it in a browser". Due to the weaker performance of mobile devices, it is even more necessary to use the incremental rendering method of streaming content provided in this application to avoid lag.
[0040] Scenario 3 (Weak Network / Reconnection): During the generation of streaming content, network jitter causes a disconnection. The front end needs to continue displaying the generated content and continue to append without flickering after reconnection; in case of failure, display an error message and keep the generated paragraphs intact.
[0041] In the application scenario shown in Figure 3, the devices used by different users (operating objects) can all interact with the multimedia information platform (or application). When a user (operating object) wants to use an AI dialogue product application, they can trigger a target operation on the application (such as a launch operation) to open the AI dialogue product application (APP) on their terminal. The terminal can then display prompts or icons for launching the dialogue function on the application page of the opened application, allowing the user to trigger data acquisition requests for specific tasks within the application page. Figure 3 shows a web-based example of an application page that includes a top navigation area, a message list area, and an input area. The user can click the input box icon shown in Figure 3 and enter the text "Generate data analysis report". The terminal responds to the information input operation triggered in the application page (such as the conversation page) shown in Figure 3, obtains the corresponding input text "Generate data analysis report", and processes this input text through the AI model to obtain the text information (such as Markdown text) used to update the streaming content.
[0042] In one embodiment, streaming content is displayed on a session page; obtaining text information for updating the streaming content includes: the terminal responding to an information input operation triggered on the session page to obtain text information for updating the streaming content; the method further includes: the terminal displaying the streaming content in the form of a session message on the session page; wherein the streaming content includes at least text data and image data.
[0043] In one embodiment, the text information is rich text description content; the step of obtaining text information for updating streaming content in response to an information input operation triggered in the session page includes: the terminal responding to an information input operation triggered in the session page, obtaining the input text corresponding to the information input operation, and processing the input text through an AI model to obtain rich text description content for updating streaming content.
[0044] Furthermore, after the terminal obtains the text information used to update the streaming content, it can break down the obtained text information to obtain an incremental update unit sequence. For example, after the terminal receives the text information continuously output by the AI model for updating the streaming content through the main thread, it can send an update request carrying the text information to a dedicated computation thread for logical calculation. After receiving the update request carrying the text information from the main thread, the computation thread can break down the text information to obtain the incremental update unit sequence.
[0045] In one embodiment, the text information is rich text description content. After obtaining the text information for updating the streaming content, the method further includes: when the terminal detects, through the first thread, that the second thread is not idle, ending the current processing flow; and when the terminal detects, through the first thread, that the second thread is idle, sending an update request for updating the streaming content to the second thread. The update request carries the full rich text description content and an update identifier that marks the streaming content. The step of splitting the text information to obtain an incremental update unit sequence includes: when the second thread receives the update request, the terminal splits the rich text description content based on the update identifier through the second thread to obtain the incremental update unit sequence.
[0046] In one embodiment, splitting the rich text description content based on the update identifier through the second thread to obtain an incremental update unit sequence includes: the terminal parses the rich text description content based on the update identifier through the second thread to obtain a structured marker (token) sequence, hierarchical information, and nesting information, where the hierarchical information and nesting information reflect the hierarchical structural relationship between the markers contained in the marker sequence; the terminal then splits the marker sequence based on the hierarchical information and nesting information through the second thread to obtain the incremental update unit sequence.
[0047] In one embodiment, the step of splitting the marker sequence based on the hierarchical information and the nesting information to obtain an incremental update unit sequence includes: the terminal splitting the marker sequence based on the hierarchical information and the nesting information through a second thread to obtain a block set, and generating an incremental update unit sequence based on the block set; each incremental update unit in the incremental update unit sequence is a block-level semantic unit.
[0048] In one embodiment, splitting the marker sequence based on the hierarchical information and the nesting information to obtain a block set includes: the terminal performs a linear scan of the marker sequence through the second thread; when a first marker is scanned whose identifier is a start identifier, whose hierarchical information is a preset value, and whose nesting information is a first value, the first marker is taken as the starting point of a block; when a second marker is scanned whose identifier is an end identifier, the second marker is taken as the ending point of the block, thus obtaining a block; or, when a third marker is scanned whose nesting information is a second value, the third marker is taken as a block by itself; or, when a fourth marker is scanned that belongs to an abnormal structure, the fourth marker is taken as a block by itself. The blocks obtained by the scan are combined into the block set.
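The linear scan described above can be sketched as follows. The `Token` shape, its field names, and the concrete values (level `0` as the preset value, nesting `1` as the first value, nesting `0` as the second value, nesting `-1` for an end identifier) are assumptions chosen for illustration; this application does not fix them.

```typescript
// Hypothetical marker (token) shape; field names and values are assumptions.
interface Token {
  type: string;        // e.g. "paragraph_open", "fence", "paragraph_close"
  nesting: 1 | 0 | -1; // 1 = start identifier, 0 = self-contained, -1 = end identifier
  level: number;       // depth in the block tree; 0 = top level
}

type Block = Token[];

// Single pass over the marker sequence, grouping markers into top-level blocks.
function splitTopLevelBlocks(tokens: Token[]): Block[] {
  const blocks: Block[] = [];
  let current: Block | null = null;

  for (const token of tokens) {
    if (current !== null) {
      // Inside an open top-level block: collect markers until its top-level closer.
      current.push(token);
      if (token.nesting === -1 && token.level === 0) {
        blocks.push(current);
        current = null;
      }
    } else if (token.nesting === 1 && token.level === 0) {
      // Start marker at the top level: starting point of a new block.
      current = [token];
    } else {
      // Self-contained marker, or an abnormal structure such as a stray closer:
      // emitted as a block by itself.
      blocks.push([token]);
    }
  }
  // A block still open at the end of input (common mid-stream) is emitted as-is.
  if (current !== null) blocks.push(current);
  return blocks;
}
```

Each resulting block would then map to one incremental update unit, so that the unit index lines up with the index of a top-level child node.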
[0049] For example, take the real-time interaction scenario of the AI dialogue product shown in Figure 3. Figure 4 illustrates the multi-threaded collaborative processing architecture of the streaming content rendering method provided in this application. Assume the user clicks the input box icon shown in Figure 3 and enters the text "Basic statistical indicators for analyzing Agricultural Bank of China stock data". The terminal responds to the information input operation triggered in the application page (such as a session page) shown in Figure 3 and obtains the corresponding input text "basic statistical indicators for analyzing Agricultural Bank of China stock data". After this input text is processed by an AI model, the text information used to update the streaming content is obtained as Markdown text. The terminal can then use the main thread (i.e., the UI thread) shown in Figure 4 to detect the state of the Web Worker (computation thread) shown in Figure 4. When the Web Worker (computation thread) is idle, the terminal sends, through the main thread, an update request to the Web Worker (computation thread) to update the streaming content. The update request carries the full Markdown text and an update identifier that marks the streaming content to be updated. When the Web Worker (computation thread) receives the update request (e.g., carrying id, fullContent), it splits the Markdown text based on the update identifier id to obtain an incremental update unit sequence.
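A minimal sketch of the idle check on the main thread is given below. The `WorkerGate` class and its method names are hypothetical; the `post` callback stands in for a call such as `worker.postMessage` and is injected so that the sketch runs outside a browser. Only the `(id, fullContent)` request shape follows the example in the text.

```typescript
// Request shape following the (id, fullContent) example above.
interface UpdateRequest {
  id: string;
  fullContent: string;
}

// Hypothetical helper: tracks whether the computation thread is idle.
class WorkerGate {
  private busy = false;
  private readonly post: (req: UpdateRequest) => void;

  constructor(post: (req: UpdateRequest) => void) {
    this.post = post;
  }

  // Called on the main thread for each streamed update.
  // Returns true if the request was dispatched, false if this round was skipped.
  tryDispatch(id: string, fullContent: string): boolean {
    if (this.busy) return false; // computation thread not idle: end this round
    this.busy = true;
    this.post({ id, fullContent });
    return true;
  }

  // Called when the computation thread reports that its result is ready.
  markIdle(): void {
    this.busy = false;
  }
}
```

Because every request carries the full text, a skipped round loses nothing: the next dispatched request already contains the skipped content. A real implementation would presumably also need to make sure the final state of a message is dispatched once the worker becomes idle, which this sketch leaves out.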
[0050] Step 206: Generate a signature digest for each incremental update unit based on each incremental update unit in the incremental update unit sequence used to independently update the streaming content.
[0051] The signature digest is a digest generated from the key fields within the incremental update unit, used to quickly determine whether the incremental update unit has changed. In other words, the signature digest in this application can be used to quickly locate the change range of streaming content. The signature digest does not need to be cryptographically secure, since it is only used for efficient comparison. This application may preferably use a dual 32-bit rolling hash (or an equivalent stable digest) as the signature digest to reduce the risk of collisions.
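One way a dual 32-bit rolling hash could be realized is sketched below. The seeds, the multipliers 31 and 131, the field separator, and the choice of key fields are all assumptions made for illustration; the application only states that key fields are mixed into the hash.

```typescript
// Two independent 32-bit polynomial rolling hashes over the key fields,
// joined into one 16-hex-digit string. Not cryptographic; for comparison only.
function signatureDigest(fields: string[]): string {
  let h1 = 0x811c9dc5; // arbitrary non-zero seeds (assumption)
  let h2 = 0x01000193;
  for (const field of fields) {
    for (let i = 0; i < field.length; i++) {
      const code = field.charCodeAt(i);
      h1 = (Math.imul(h1, 31) + code) | 0;
      h2 = (Math.imul(h2, 131) + code) | 0;
    }
    // Field separator: 0x10000 lies outside the UTF-16 code unit range,
    // so it cannot be confused with a character of the field itself.
    h1 = (Math.imul(h1, 31) + 0x10000) | 0;
    h2 = (Math.imul(h2, 131) + 0x10000) | 0;
  }
  const hex = (h: number): string => ("0000000" + (h >>> 0).toString(16)).slice(-8);
  return hex(h1) + hex(h2);
}
```

For example, `signatureDigest(["paragraph", "hello"])` could serve as the digest of a paragraph unit whose raw text is `hello`. Two units are treated as unchanged only if both 32-bit halves match, which is what lowers the collision risk compared with a single 32-bit hash.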
[0052] Step 208: Generate incremental update patches based on the signature digests of each incremental update unit.
[0053] Incremental update patch (Patch) refers to the minimum set of operations required to update from the historical incremental update unit sequence to the current incremental update unit sequence. For example, in some cases, the incremental update patch in this application refers to the minimum set of operations required to update from the old segment list to the new segment list. The incremental update patch (Patch) in the technical solution of this application can take the form of "deleting deleteCount segments starting from start, and then inserting insertHtml[]", which corresponds one-to-one with the DOM structure of the top-level segment, making it easy for the main thread to quickly execute the fewest DOM operations.
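The patch form described above can be sketched as a small data structure plus an apply step. Here the rendered top-level segments are modeled as an array of HTML strings rather than real DOM nodes so that the example runs anywhere. The field names `start`, `deleteCount`, and `insertHtml` follow the text; `applyPatch` is a hypothetical helper.

```typescript
// Patch shape as described: delete `deleteCount` segments starting at `start`,
// then insert the HTML of the new segments at that position.
interface Patch {
  start: number;
  deleteCount: number;
  insertHtml: string[];
}

// Apply a patch to the list of rendered top-level segments (modeled as strings).
// Returns a new list; the input list is left untouched.
function applyPatch(children: string[], patch: Patch): string[] {
  const next = children.slice();
  next.splice(patch.start, patch.deleteCount, ...patch.insertHtml);
  return next;
}
```

In a browser, the same indices would address the child nodes of the message container, so segments outside the range from `start` to `start + deleteCount` are never touched.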
[0054] Step 210: Convert the incremental update patch into a document object model for rendering streaming content.
[0055] The Document Object Model (DOM) is a cross-platform programming interface standard developed by the W3C for handling access to and manipulation of HTML and XML documents. It describes the hierarchical relationship between document nodes through a tree structure.
[0056] The DOM parses a document into a node tree containing elements such as tags and text. Each node corresponds to a specific object and provides operation methods and properties, supporting dynamic modification of the document structure, content, and style. Its core interfaces include Node and Document, and are divided into two standards: HTML DOM and XML DOM, which provide programmatic control capabilities for different document types.
[0057] Step 212: Incremental rendering of streaming content based on the document object model.
[0058] Incremental rendering refers to performing DOM insertion / deletion / replacement operations only on the changed incremental update units (segments) when content is updated, without reconstructing the rendering result of the entire Markdown text. It can be understood that incremental rendering in this application is not equivalent to "only transmitting incremental text," but rather "transmitting the full text but calculating incremental update patches at the structural layer" to ensure the semantic correctness of incremental rendering.
[0059] Specifically, the terminal splits the text information used to update the streaming content into an incremental update unit sequence. Based on each incremental update unit in the sequence that independently updates the streaming content, the terminal generates a signature digest for each incremental update unit and verifies the correctness of the signature digests to obtain a verification result. Further, if the verification result indicates that the verification passed, the terminal generates an incremental update patch based on the signature digests of each incremental update unit and converts the incremental update patch into structured data for rendering the streaming content, namely a Document Object Model (DOM patch). The terminal can then directly apply the DOM patch to incrementally render the streaming content shown in Figure 3 and obtain the updated streaming content, so that the dialogue scenario shown in Figure 3 keeps historical content stable while updating new content locally during "generating and displaying simultaneously".
[0060] In one embodiment, based on each incremental update unit in the incremental update unit sequence used to independently update the streaming content, a signature digest of each incremental update unit is generated, including: the terminal extracts key fields within each incremental update unit and mixes the key fields into a rolling hash of a preset number of bits in a preset order to obtain a text digest; the text digest is a signature digest; wherein, the signature digest is used to quickly locate the change range of the streaming content.
[0061] In one embodiment, the method further includes: verifying the correctness of the signature digest to obtain a verification result; generating an incremental update patch based on the signature digest of each incremental update unit, including: generating an incremental update patch based on the hierarchical relationship between the signature digest and each incremental update unit when the verification result indicates that the verification has passed.
[0062] In one embodiment, an incremental update patch is generated based on the signature digest and the hierarchical relationship between each incremental update unit, including: for each incremental update unit, the terminal obtains the signature digest of the current incremental update unit and the signature digest of the previous incremental update unit through a second thread; and through the second thread, compares the signature digest of the current incremental update unit and the signature digest of the previous incremental update unit to obtain a comparison result, and generates an incremental update patch based on the comparison result.
[0063] In one embodiment, the signature digest of the current incremental update unit is compared with the signature digest of the previous incremental update unit to obtain a comparison result. This includes: the terminal performs prefix and suffix scanning on the signature digest of the current incremental update unit and the signature digest of the previous incremental update unit to obtain the head difference position of the forward scan and the tail difference position of the reverse scan, and obtains a description result for indicating the deleted or inserted content based on the head difference position and the tail difference position; the description result is the comparison result.
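The prefix and suffix scanning described above can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes the signature digests of the previous and current rounds are held as two string arrays, and the names `diffSegSigs` and `SigDiffSketch` are hypothetical. The forward scan yields the head difference position, the reverse scan yields the tail difference position, and the two together describe how many old units are deleted and how many new units are inserted.

```typescript
// Hypothetical description result: replace `deleteCount` old units starting at
// `start` with `insertCount` new units taken from the same position.
interface SigDiffSketch {
  start: number;       // head difference position (forward scan)
  deleteCount: number; // old units between the common prefix and suffix
  insertCount: number; // new units between the common prefix and suffix
}

function diffSegSigs(prevSegSigs: string[], nextSegSigs: string[]): SigDiffSketch {
  const prevLen = prevSegSigs.length;
  const nextLen = nextSegSigs.length;
  const minLen = Math.min(prevLen, nextLen);

  // Forward scan: count the signatures shared at the head.
  let head = 0;
  while (head < minLen && prevSegSigs[head] === nextSegSigs[head]) {
    head++;
  }

  // Reverse scan: count the signatures shared at the tail,
  // never overlapping the prefix that was already matched.
  let tail = 0;
  while (
    tail < minLen - head &&
    prevSegSigs[prevLen - 1 - tail] === nextSegSigs[nextLen - 1 - tail]
  ) {
    tail++;
  }

  return {
    start: head,
    deleteCount: prevLen - head - tail,
    insertCount: nextLen - head - tail,
  };
}

// Typical streaming append: the last unit grew and one new unit was added.
const appendedDiff = diffSegSigs(["a", "b"], ["a", "b2", "c"]);
// appendedDiff is { start: 1, deleteCount: 1, insertCount: 2 }
```

Both scans are linear in the number of units, so the cost stays proportional to the sequence length even when nothing has changed.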
[0064] In one embodiment, converting an incremental update patch into a document object model for rendering streaming content includes: the terminal converting the incremental update patch into a document object model for rendering the streaming content via a second thread; and performing incremental rendering of the streaming content based on the document object model includes: the terminal sending the document object model to a first thread via the second thread, so that the first thread performs incremental rendering of the streaming content based on the document object model.
[0065] In one embodiment, incremental rendering of streaming content based on a document object model includes: the terminal adding the document object model to the callback function of the target interface through a first thread, and when a call request to the target interface is detected, performing an operation to write to the document object model through the target interface to obtain the updated content of the streaming content.
[0066] In one embodiment, after incrementally rendering the streaming content based on the document object model, the method further includes: if the document object model is the end content of the streaming content, the terminal terminates the process; if the document object model is not the end content of the streaming content, the terminal continues to send an update request for updating the streaming content to the second thread through the first thread.
[0067] In one embodiment, incremental rendering of streaming content based on a document object model includes: the terminal performing an editing operation based on the document object model through a first thread to obtain updated content of the streaming content; wherein the editing operation includes at least one of an insertion operation, a deletion operation, and a replacement operation, and the updated content is displayed after the generated streaming content.
[0068] In one embodiment, incremental rendering of streaming content based on document object models includes: when the number of document object models is not a preset value and at least two document object models belong to the same frame content, the terminal caches at least two document object models, obtains the target document object model from the at least two document object models, deletes a preset number of nodes at the starting position based on the description in the target document object model, and inserts new nodes to obtain the updated content of the streaming content.
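One possible reading of this embodiment is sketched below. It rests on several assumptions that the text does not state: the target document object model is taken to be the most recently cached one, all patches cached within a frame are assumed to be expressed against the same rendered baseline, and an array of HTML strings stands in for the real child nodes so the sketch runs without a browser. The names `DomPatchSketch` and `FramePatchQueue` are hypothetical.

```typescript
// Hypothetical patch shape: delete `deleteCount` nodes at `start`, then insert `html`.
interface DomPatchSketch {
  start: number;
  deleteCount: number;
  html: string[]; // rendered HTML of the nodes to insert
}

class FramePatchQueue {
  private cached: DomPatchSketch[] = [];
  private rendered: string[]; // stand-in for the container's child nodes

  constructor(initial: string[]) {
    this.rendered = initial.slice();
  }

  // Called whenever a patch arrives from the computation thread.
  enqueue(patch: DomPatchSketch): void {
    this.cached.push(patch);
  }

  // Called once per frame, for example from a requestAnimationFrame callback.
  flush(): string[] {
    if (this.cached.length === 0) {
      return this.rendered;
    }
    // Assumption: the latest cached patch is the target and supersedes the others.
    const target = this.cached[this.cached.length - 1];
    this.cached = [];
    this.rendered.splice(target.start, target.deleteCount, ...target.html);
    return this.rendered;
  }
}

const frameQueue = new FramePatchQueue(["<h1>Title</h1>", "<p>Hel</p>"]);
frameQueue.enqueue({ start: 1, deleteCount: 1, html: ["<p>Hello</p>"] });
frameQueue.enqueue({ start: 1, deleteCount: 1, html: ["<p>Hello world</p>", "<p>Next</p>"] });
const afterFrame = frameQueue.flush();
// afterFrame is ["<h1>Title</h1>", "<p>Hello world</p>", "<p>Next</p>"]
```

If patches were instead computed against one another in sequence, they would have to be applied in order rather than collapsed to the latest one.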
[0069] In one embodiment, incremental rendering of streaming content based on the document object model includes: the terminal calling a preset function to limit the redraw spread, and incrementally rendering the streaming content through the preset function so that the redraw range of the streaming content is concentrated in the incremental area.
[0070] For example, consider the real-time interaction scenario in an AI dialogue product shown in Figure 3. Assume that the terminal, in response to an information input operation triggered by the user in the application page (such as a session page) shown in Figure 3, obtains the input text corresponding to the information input operation as "basic statistical indicators for analyzing Agricultural Bank of China stock data". After the input text is processed by an AI model, the text information used to update the streaming content is obtained as Markdown text. The terminal can then, through the main thread (i.e., the UI thread) shown in Figure 4, send an update request for updating the streaming content to the Web Worker (computation thread). The update request carries the full Markdown text and an update identifier that marks the streaming content to be updated. When the Web Worker (computation thread) receives the update request (e.g., carrying id and fullContent), it splits the Markdown text based on the update identifier id to obtain an incremental update unit sequence. Through the Web Worker (computation thread), the terminal can then extract the key fields of each incremental update unit (segment) in the incremental update unit sequence and mix the key fields, in a preset order, into a rolling hash of a preset number of bits to obtain the text digest of each incremental update unit (segment). This text digest is the signature digest.
[0071] Furthermore, the terminal can, through the Web Worker (computation thread), verify the correctness of the signature digest of each incremental update unit (segment) to obtain a verification result. If the verification result indicates that the verification passed, the Web Worker obtains the signature digest nextSegSigs of the current incremental update unit (segment) and the signature digest prevSegSigs of the previous incremental update unit (segment), compares nextSegSigs with prevSegSigs to obtain the comparison result diff patch, and generates an incremental update patch based on the comparison result diff patch.
[0072] Furthermore, the terminal can, through the Web Worker (computation thread), convert the incremental update patch into a document object model (DOM patch) for rendering the streaming content, and then send the DOM patch and the update identifier (id) to the main thread (i.e., the UI thread) shown in Figure 4, so that the main thread incrementally renders the streaming content based on the DOM patch.
[0073] In this embodiment, text information used to update streaming content is obtained and the text information is split to obtain an incremental update unit sequence. Further, based on each incremental update unit in the incremental update unit sequence used to independently update streaming content, a signature digest of each incremental update unit is generated, and an incremental update patch is generated based on the signature digest of each incremental update unit. Further, the incremental update patch is converted into a document object model for rendering streaming content, and incremental rendering of streaming content is performed based on the document object model. Since the text information in this application's technical solution is the full amount of text information obtained for updating streaming content, the semantics contained in the split incremental update unit sequence obtained based on the full amount of text information are more accurate. Furthermore, based on the signature digest of each incremental update unit used to independently update streaming content in the incremental update unit sequence, the change range can be quickly located. That is, the incremental update patch calculated through the signature digest is also more accurate. Only the changed paragraphs need to be incrementally updated, and the generated paragraphs are not destroyed and rebuilt, effectively avoiding the jitter and flicker caused by full re-rendering. While ensuring the high performance of streaming content rendering, it can also effectively improve the accuracy of streaming content rendering. Even in high-frequency update scenarios, it can simultaneously take into account the correctness and performance of streaming content rendering, thereby comprehensively improving the efficiency and quality of streaming content rendering.
[0074] In one embodiment, the text information is rich text description content; after obtaining the text information used to update the streaming content, the method further includes:
[0075] The process terminates if the first thread detects that the second thread is not idle.
[0076] If the first thread detects that the second thread is idle, an update request for updating the streaming content is sent to the second thread; the update request carries the full amount of rich text description content and an update identifier for marking the streaming content.
[0077] The step of splitting the text information to obtain an incremental update unit sequence includes:
[0078] When the second thread receives an update request, it splits the rich text description content based on the update identifier to obtain an incremental update unit sequence.
[0079] In this application, the first thread and the second thread are only used to distinguish different threads. For example, the first thread can be the main thread (i.e., the UI thread) shown in Figure 4, and the second thread can be the Web Worker (computation thread) shown in Figure 4, which is a separate thread used for computation.
[0080] Rich text description content refers to content that conforms to Markdown syntax (which may include extended syntax such as tables, task lists, code blocks, and optional native HTML), and can also be called Markdown content or Markdown text.
[0081] An update identifier is an identifier used to mark the streaming content to be updated. For example, the update identifier in this application can be automatically generated in an auto-incrementing manner for each streaming segment text of the streaming output. For example, the update identifier for the first streaming segment text of streaming content A can be denoted as id001, the update identifier for the second streaming segment text can be denoted as id002, and so on, until all the content of streaming content A is updated.
[0082] Single-flight means that at most one Web Worker computation task is executing at any given time. New content updates are not queued and accumulated; instead, each one overwrites the "latest content", which is processed only when the Web Worker is idle, thus avoiding a snowball effect of delays. In this application, this strategy is preferably used in "high-frequency streaming update" scenarios.
[0083] Specifically, consider the real-time interaction scenario in an AI dialogue product shown in Figure 3. Assume that the terminal, in response to an information input operation triggered by the user in the application page (such as a session page) shown in Figure 3, obtains the input text corresponding to the information input operation as "basic statistical indicators for analyzing Agricultural Bank of China stock data". An AI model then processes the input text and outputs the rich text description content used to update the streaming content. Figure 5 is a flowchart of the main thread's processing flow. As shown in Figure 5, the terminal can, through the main thread (i.e., the UI thread) shown in Figure 4, receive the rich text description content output by the AI model and detect the state of the second thread, i.e., the Web Worker (computation thread); that is, it checks whether the Web Worker is in the "inFlight" state (i.e., the "Single-flight" state). If the second thread (Web Worker) is detected to be not idle (i.e., in the inFlight or Single-flight state), the terminal can, through the main thread shown in Figure 4, terminate the process directly and wait for the Web Worker (computation thread) currently executing a computation task to complete it. If the second thread (Web Worker) is detected to be idle (i.e., not in the inFlight or Single-flight state), the terminal can, through the main thread shown in Figure 4, send an update request update(id, fullContent) for updating the streaming content to the second thread, i.e., the Web Worker (computation thread). The update request carries the full rich text description content (fullContent) and an update identifier (id) used to mark the streaming content.
When the Web Worker (computation thread) receives the update request update(id, fullContent), it splits fullContent based on the update identifier (id) to obtain an incremental update unit sequence.
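The single-flight behaviour on the main thread can be sketched as follows. This is an illustrative sketch under stated assumptions: the class name `SingleFlightSketch`, the `send` callback, and the `onWorkerDone` notification are hypothetical, and the callback stands in for posting the request to a real Web Worker so that the example runs without a browser.

```typescript
interface UpdateRequestSketch {
  id: string;
  fullContent: string;
}

class SingleFlightSketch {
  private inFlight = false;
  private latest: UpdateRequestSketch | null = null;
  private send: (req: UpdateRequestSketch) => void;

  // `send` is a stand-in for handing the request to the computation thread.
  constructor(send: (req: UpdateRequestSketch) => void) {
    this.send = send;
  }

  // Called on the main thread whenever new full text arrives from the model.
  update(id: string, fullContent: string): void {
    this.latest = { id, fullContent }; // overwrite the latest content, never queue
    this.dispatchIfIdle();
  }

  // Called on the main thread when the computation thread reports completion.
  onWorkerDone(): void {
    this.inFlight = false;
    this.dispatchIfIdle();
  }

  private dispatchIfIdle(): void {
    if (this.inFlight || this.latest === null) {
      return; // not idle, or nothing new to process
    }
    const req = this.latest;
    this.latest = null;
    this.inFlight = true;
    this.send(req);
  }
}

const sentContents: string[] = [];
const flight = new SingleFlightSketch((req) => {
  sentContents.push(req.fullContent);
});
flight.update("id001", "# T");     // worker idle: sent immediately
flight.update("id002", "# Ti");    // worker busy: kept as latest content
flight.update("id003", "# Title"); // worker busy: overwrites "# Ti"
flight.onWorkerDone();             // worker idle again: latest content is sent
// sentContents is ["# T", "# Title"]
```

Because every request carries the full text, skipping the intermediate "# Ti" update loses no information; the worker simply diffs against the last content it actually processed.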
[0084] In this embodiment, the "full Markdown text" is still used as the parsing input to ensure semantic correctness. However, differential processing is performed at the structure layer (tokens / segments) to avoid full redrawing every time. The CPU-intensive steps (parsing, tokenization, signing, and diffing) are placed in the Web Worker for execution. The main thread only performs the minimum DOM operations corresponding to the diff results. Redrawing propagation is reduced by framing and rendering isolation. Main thread blocking is significantly reduced, and input and scrolling are smoother. Even in high-frequency update scenarios, the correctness and performance of streaming content rendering can be taken into account at the same time, thereby comprehensively improving the efficiency and quality of streaming content rendering.
[0085] In one embodiment, the step of splitting the rich text description content based on the update identifier using a second thread to obtain an incremental update unit sequence includes:
[0086] The second thread parses the rich text description content based on the update identifier to obtain a structured marker sequence, hierarchical information, and nesting information. The hierarchical information and nesting information reflect the hierarchical structural relationship between the markers contained in the marker sequence.
[0087] The marker sequence is split based on hierarchical and nested information to obtain the incremental update unit sequence.
[0088] In this application, the structured marker sequence can be a token sequence. For example, the terminal can parse the full rich text description content through the second thread to obtain the structured token sequence. Each token describes a syntactic unit (such as a paragraph, list, heading, code block, or link) and may contain information such as level / nesting and attributes. The token sequence can be used for subsequent structured comparison, differencing, and rendering.
[0089] Specifically, as shown in Figure 5, the terminal sends, through the main thread (i.e., the UI thread) shown in Figure 4, an update request `update(id, fullContent)` for updating the streaming content to the second thread (i.e., the Web Worker, the computation thread). Figure 6 illustrates the Web Worker processing flow. As shown in Figure 6, upon receiving the update request `update(id, fullContent)`, the terminal, through the Web Worker (computation thread), parses the rich text description content `fullContent` with a Markdown parser based on the update identifier `id`, i.e., `tokens = markdown-it.parse(fullContent)`. This yields the structured marker sequence (the `token` sequence), the hierarchical information `level`, and the nesting information `nesting`. The `level` and `nesting` information reflect the hierarchical structural relationship between the tokens in the token sequence. Further, as shown in the processing flow of Figure 6, the terminal, through the Web Worker (computation thread), splits the token sequence based on the hierarchical information and the nesting information. For example, `splitTopLevelRanges(tokens) → ranges` yields the incremental update unit sequence, which is composed of segments, where `segments = tokens[r.start..r.end]` (iterating through each range). Migrating CPU-intensive tasks such as Markdown parsing, tokenization, signature calculation, and diffing to the Web Worker decouples them from the main thread. This significantly reduces main thread blocking, resulting in smoother input and scrolling and a substantial performance improvement.
[0090] In one embodiment, the step of splitting the marker sequence based on hierarchical and nesting information to obtain an incremental update unit sequence includes:
[0091] The marker sequence is split based on hierarchical and nesting information to obtain a block set;
[0092] An incremental update unit sequence is generated based on the block set; each incremental update unit in the incremental update unit sequence is a block-level semantic unit.
[0093] Here, the block set refers to the set composed of blocks. For example, the block in this application can also be called the incremental update unit, represented as segments, segments = tokens [r.start..r.end] (traversing each range).
[0094] Specifically, as shown in Figure 6, the terminal, through the Web Worker (computation thread), splits the token sequence based on the hierarchical information (level) and the nesting information (nesting). For example, `splitTopLevelRanges(tokens) → ranges` yields segments = tokens[r.start..r.end] (traversing each range); that is, the token sequence is scanned linearly based on the hierarchical information and the nesting information to obtain the split segments. The terminal then, through the Web Worker, combines these segments into a set K and generates an incremental update unit sequence based on the set K. Each incremental update unit (segment) in this sequence is a block-level semantic unit. Because Markdown parsing, tokenization, and differential processing are migrated to the Worker, the main thread performs only lightweight DOM patching, which effectively reduces blocking. Meanwhile, structural differencing at the segment level raises the update granularity to the top-level block-level semantic unit and avoids rebuilding unchanged segments, which significantly reduces the amount of DOM changes. Even in high-frequency update scenarios, this effectively avoids competing with user interaction for the event loop, and avoids the input lag, dropped scrolling frames, and unresponsive clicks that readily occur in traditional technologies. It not only ensures the high performance of streaming content rendering but also effectively improves its accuracy, achieving smoother scrolling and interaction and maintaining a usable experience on mobile devices or mini-programs.
[0095] In one embodiment, the step of splitting the marker sequence based on hierarchical and nesting information to obtain a block set includes:
[0096] Perform a linear scan of the marker sequence;
[0097] When the first marker is scanned, and its identifier is the start identifier, its hierarchy information is a preset value, and its nesting information is a first value, the first marker is taken as the starting point of the block; when the second marker is scanned, and its identifier is the end identifier, the second marker is taken as the ending point of the block, resulting in a block; or, when the third marker is scanned, and its nesting information is a second value, the third marker is taken as a block; or, when the fourth marker is scanned, and it belongs to an abnormal structure, the fourth marker is taken as a block.
[0098] The scanned blocks are combined into a block set.
[0099] In this application, the first, second, third, and fourth markers are used only to distinguish different markers, i.e., tokens, in the marker sequence.
[0100] Identifiers are labels used to identify the beginning and the end. For example, the identifiers in this application include at least a start identifier (e.g., *_open) and an end identifier (e.g., *_close). In some cases, the start identifier and the end identifier in this application may also be called open and close labels: such as heading_open and heading_close, which represent the beginning and end of the heading, respectively.
[0101] The preset value in this application can be 0, the first value can be 1, and the second value can be 0. No specific restrictions are imposed here.
[0102] Specifically, as shown in Figure 6, the terminal, through the Web Worker (computation thread), performs a linear scan of the token sequence based on the hierarchical information (level) and the nesting information (nesting). Figure 7 illustrates this linear scan of the token sequence. As shown in Figure 7, when the first token, token1, is scanned and its identifier is the start identifier "*_open", its hierarchical information is the preset value (level=0), and its nesting information is the first value (nesting=1), token1 is recorded as the starting point of a segment. When the second token, token6, is scanned and its identifier is the end identifier "*_close", token6 is recorded as the end point of the segment, resulting in segment 1. In other words, when a top-level start identifier `*_open` (nesting=1, level=0) is encountered during scanning, the depth is counted, and when the corresponding end identifier `*_close` is encountered and the depth returns to 0, the segment ends. Alternatively, when the third token, token7, is scanned and its nesting information is the second value (i.e., nesting=0, such as fence / html_block / hr), token7 is treated as segment 2 by itself. Alternatively, when the fourth token, token8, is scanned and it belongs to an abnormal structure, token8 is treated as segment 3 by itself; that is, an abnormal structure is downgraded to a single-token segment to ensure robustness. The scanned segments (such as segment1, segment2, segment3) are combined into a segment set segments={ segment1, segment2, segment3…}. This allows a stable segment sequence to be obtained quickly and accurately. When content is appended to the end of the stream, typically only 1 to 2 segments are added, which provides a foundation for subsequent differential patching.
This segment-based differential processing raises the update granularity to the top-level block-level semantic unit, eliminating the need to rebuild unchanged segments and significantly reducing DOM changes. Even in high-frequency update scenarios, it effectively avoids competing with user interaction for the event loop, avoiding issues such as input lag, dropped scrolling frames, and unresponsive clicks common in traditional technologies. It ensures high performance in streaming content rendering while also improving accuracy, enabling smoother scrolling and interaction and maintaining usability on mobile devices or mini-programs.
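The linear scan described above can be sketched as follows. This is a minimal sketch rather than the application's actual `splitTopLevelRanges`: it assumes tokens carry the `nesting` convention used by markdown-it (1 for an opening token, 0 for a self-contained token, -1 for a closing token), treats `end` as inclusive, and uses the hypothetical type names `MdTokenSketch` and `SegRangeSketch`.

```typescript
interface MdTokenSketch {
  type: string;
  nesting: number; // 1 = *_open, 0 = self-contained, -1 = *_close
  level: number;   // 0 = top level
}

interface SegRangeSketch {
  start: number;
  end: number; // inclusive, matching tokens[r.start..r.end]
}

function splitTopLevelRanges(tokens: MdTokenSketch[]): SegRangeSketch[] {
  const ranges: SegRangeSketch[] = [];
  let i = 0;
  while (i < tokens.length) {
    const tok = tokens[i];
    let end = i; // default: the token forms a segment by itself
    if (tok.nesting === 1 && tok.level === 0) {
      // Top-level open token: count depth until it returns to 0.
      let depth = 0;
      for (let j = i; j < tokens.length; j++) {
        depth += tokens[j].nesting;
        if (depth === 0) {
          end = j;
          break;
        }
      }
      // If depth never returns to 0, the structure is abnormal and
      // `end` stays at i, downgrading it to a single-token segment.
    }
    ranges.push({ start: i, end });
    i = end + 1;
  }
  return ranges;
}

const sampleTokens: MdTokenSketch[] = [
  { type: "heading_open", nesting: 1, level: 0 },
  { type: "inline", nesting: 0, level: 1 },
  { type: "heading_close", nesting: -1, level: 0 },
  { type: "fence", nesting: 0, level: 0 },
  { type: "paragraph_close", nesting: -1, level: 0 }, // stray close: abnormal
];
const sampleRanges = splitTopLevelRanges(sampleTokens);
// sampleRanges is [{start: 0, end: 2}, {start: 3, end: 3}, {start: 4, end: 4}]
```

Each range can then be materialized as one segment by slicing the token array from `start` through `end`.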
[0103] In one embodiment, streaming content is displayed on a session page; the step of obtaining text information for updating the streaming content includes:
[0104] In response to an information input operation triggered in the session page, obtain the text information used to update the streaming content;
[0105] The method further includes: displaying the streaming content in the form of session messages on the session page; wherein the streaming content includes at least text data and image data.
[0106] Specifically, consider the real-time interaction scenario in an AI dialogue product shown in Figure 3. Assume that the user clicks the "input box" icon shown in Figure 3 and enters the text "basic statistical indicators for analyzing Agricultural Bank of China stock data". The terminal, in response to this information input operation triggered in the application page (such as a session page) shown in Figure 3, obtains the input text corresponding to the information input operation as "basic statistical indicators for analyzing Agricultural Bank of China stock data". After the input text is processed by an AI model and the text information used to update the streaming content is obtained as Markdown text, the terminal can display the generated streaming content in the form of session messages on the session page shown in Figure 3; the streaming content includes at least text data and image data. It is understood that the streaming content displayed on the session page in this application includes, but is not limited to, multimodal content containing different data types such as tables, long code blocks, image links, and cited sources.
[0107] For example, Figure 8 illustrates an application scenario of the method provided in this application in an AI dialogue product in the gaming field. Suppose the user clicks the "input box" icon corresponding to virtual character 100 shown in Figure 8 and enters the text "How do I use the invisibility skill? I can't play anymore". The terminal, in response to this information input operation triggered in the application page (such as a game session page) shown in Figure 8, obtains the input text corresponding to the information input operation as "How do I use the invisibility skill? I can't play anymore". After the input text is processed by an AI model and the text information used to update the streaming dialogue content is obtained as Markdown text, the terminal can display the generated streaming dialogue content in the form of session messages on the session page shown in Figure 8. For example, the displayed streaming dialogue content includes virtual character 200 replying to virtual character 100 with the message: "The Shadow Assassin's stealth skill has indeed been nerfed by 2 seconds in this version, making it impossible to enter the fray. He was already fragile, and now he'll just evaporate in team fights. Forcibly nerfing old characters just to sell new skins is too greedy!"
[0108] Figure 9 illustrates an application scenario of the method provided in this application in an AI dialogue product in the writing field. Suppose the user clicks the "input box" icon shown in Figure 9 and enters the text "Generate a rewritten ancient costume illustrated novel based on the uploaded image". The terminal, in response to this information input operation triggered in the application page (such as an interactive page) shown in Figure 9, obtains the input text corresponding to the information input operation as "Generate a rewritten ancient costume illustrated novel based on the uploaded image". The input text and the uploaded image are then processed by an AI model to obtain the text information used to update the streaming dialogue content as Markdown text. The terminal can then display the generated streaming content on the interactive page shown in Figure 9. For example, the displayed streaming content includes text content and images, where the text content includes "Novel Synopsis" and "Author Introduction".
[0109] In this embodiment, the "generating and displaying simultaneously" session page on the product side maintains the stability of historical content while enabling partial updates of new content. It also supports interactive features (copying, referencing, and jumping) combined with streaming rendering without repeatedly binding events to each node. Furthermore, it supports consistent functionality across multiple platforms (PC / mobile / mini-program) and adapts to restricted environments, effectively expanding and diversifying the scenarios to which the streaming content rendering method applies.
[0110] In one embodiment, the text information is rich text description content; the step of obtaining text information for updating streaming content in response to an information input operation triggered in the session page includes:
[0111] In response to an information input operation triggered in the session page, obtain the input text corresponding to the information input operation;
[0112] The input text is processed by an AI model to obtain rich text descriptions for updating streaming content.
[0113] Specifically, consider the real-time interaction scenario in an AI dialogue product shown in Figure 3. Assume that the terminal, in response to an information input operation triggered by the user in the application page (such as a session page) shown in Figure 3, obtains the input text corresponding to the information input operation as "basic statistical indicators for analyzing Agricultural Bank of China stock data". After the input text is processed by an AI model, the output text information used to update the streaming content is rich text description content. The terminal can then, through the main thread (i.e., the UI thread) shown in Figure 4, receive the rich text description content output by the AI model. When the second thread, i.e., the Web Worker (computation thread), is detected to be idle (i.e., not in the inFlight or Single-flight state), the terminal can, through the main thread shown in Figure 4, send an update request update(id, fullContent) for updating the streaming content to the second thread. The update request carries the full rich text description content (fullContent) and an update identifier (id) used to mark the streaming content. Upon receiving the update request update(id, fullContent), the Web Worker (computation thread) splits fullContent based on the update identifier (id) to obtain an incremental update unit sequence. This ensures both the high performance and the accuracy of streaming content rendering even in high-frequency update scenarios, thus comprehensively improving the efficiency and quality of streaming content rendering.
[0114] In one embodiment, the step of generating a signature digest for each incremental update unit based on each incremental update unit in the incremental update unit sequence used to independently update streaming content includes:
[0115] Extract key fields from each incremental update unit;
[0116] The key fields are mixed into a rolling hash of a preset length according to a preset order to obtain a text digest; the text digest is a signature digest.
[0117] The signature digest is used to quickly locate the range of change in streaming content.
[0118] Key fields refer to the key fields of the token within each incremental update unit. For example, the key fields in this application include at least the token's type / tag / nesting / info / markup / content, as well as fields such as attrs and children.
[0119] Specifically, as shown in Figure 6, the terminal uses a Web Worker (computation thread) to split the token sequence based on hierarchical information (level) and nesting information (nesting), for example `splitTopLevelRanges(tokens) → ranges`. Traversing each range r yields an incremental update unit sequence composed of segments, where segment = tokens[r.start..r.end]. The terminal then uses the Web Worker to extract the key fields from each incremental update unit segment, namely type / tag / nesting / info / markup / content as well as attrs and children, and mixes them in a preset order into a rolling hash of a preset number of bits to obtain a text digest. For example, the key fields are mixed in a fixed order into a dual 32-bit rolling hash (based on Math.imul), with separators inserted at field boundaries. The final output is a signature digest `nextSegSigs` covering each incremental update unit segment. The signature digest `nextSegSigs` is used to quickly locate the change range of the streaming content; in this application, it is the input to the subsequent diff. Dual hashing significantly reduces the probability of collisions. For extremely demanding scenarios, the digest can be extended to 64 bits (BigInt) or supplemented with an additional text digest verification. This yields a fast differential method based on segment signatures: stable signatures are computed quickly and accurately from the key fields of each token, and patches are generated with O(n) algorithms such as prefix and suffix scanning, which suits the high-frequency patterns of streaming appending or local modification at the tail.
By performing the diff at the structural layer (tokens / segments), full re-rendering on every update is avoided and the rendering range converges (structural-level increment). In other words, with segment-level patching, only the changed segments are updated in the DOM and already generated segments are not destroyed and rebuilt, which avoids the jitter and flickering caused by full re-rendering and improves the streaming experience on the user side.
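The signature step described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the `Token` shape, the `DualHash` class, the seeds, the two mixing formulas (an FNV-1a style mix and a multiply-by-33 mix) and the separator value are all assumptions chosen for the example; the text specifies only a dual 32-bit rolling hash based on Math.imul, a fixed field order, and separators at field boundaries.

```typescript
// Hypothetical minimal token shape; field names follow the key fields listed in the text.
interface Token {
  type: string;
  tag: string;
  nesting: number;
  info: string;
  markup: string;
  content: string;
  attrs: [string, string][] | null;
  children: Token[] | null;
}

// Separator mixed in at each field boundary. The value lies outside the
// UTF-16 code unit range, so it cannot collide with a character (assumption).
const FIELD_SEP = 0x110000;

class DualHash {
  private h1 = 0x811c9dc5; // FNV-1a offset basis
  private h2 = 5381; // second seed, chosen for illustration

  mixCode(code: number): void {
    this.h1 = Math.imul(this.h1 ^ code, 0x01000193); // 32-bit multiply by the FNV prime
    this.h2 = (Math.imul(this.h2, 33) + code) | 0; // independent second 32-bit hash
  }

  mixField(text: string): void {
    for (let i = 0; i < text.length; i++) this.mixCode(text.charCodeAt(i));
    this.mixCode(FIELD_SEP);
  }

  // Two unsigned 32-bit values rendered as 8 hex digits each.
  digest(): string {
    const hex = (n: number): string => ("00000000" + (n >>> 0).toString(16)).slice(-8);
    return hex(this.h1) + hex(this.h2);
  }
}

// Mix one token's key fields in a fixed order, then recurse into its children.
function mixToken(hash: DualHash, token: Token): void {
  hash.mixField(token.type);
  hash.mixField(token.tag);
  hash.mixField(String(token.nesting));
  hash.mixField(token.info);
  hash.mixField(token.markup);
  hash.mixField(token.content);
  for (const [name, value] of token.attrs ?? []) {
    hash.mixField(name);
    hash.mixField(value);
  }
  hash.mixCode(FIELD_SEP); // closes the attrs list
  for (const child of token.children ?? []) mixToken(hash, child);
  hash.mixCode(FIELD_SEP); // closes the children list
}

// Signature of one segment (one incremental update unit).
function segmentSignature(segment: readonly Token[]): string {
  const hash = new DualHash();
  for (const token of segment) mixToken(hash, token);
  return hash.digest();
}
```

Under these assumptions, `nextSegSigs` would be the array obtained by calling `segmentSignature` on every segment. The separators keep adjacent fields from running together, so a token with info "ab" and markup "c" does not hash like one with info "a" and markup "bc".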
[0120] In one embodiment, the method further includes:
[0121] The signature digest is validated for correctness, and the validation result is obtained.
[0122] The step of generating an incremental update patch based on the signature digest of each incremental update unit includes:
[0123] If the verification result indicates that the verification passed, an incremental update patch is generated based on the signature digest and the hierarchical relationship between each incremental update unit.
[0124] Specifically, as shown in Figure 6, the terminal extracts key fields from each incremental update unit segment in the incremental update unit sequence through the Web Worker (computation thread), and mixes the key fields in a preset order into a rolling hash of a preset number of bits to obtain the signature digest nextSegSigs of each incremental update unit segment. The terminal can then verify the correctness of the signature digest through the Web Worker to obtain the verification result. If the verification result indicates that the verification passed, the terminal generates, through the Web Worker, an incremental update patch based on the signature digest and the hierarchical relationship between the incremental update units.
[0125] Furthermore, if the verification result indicates that the verification failed, the terminal uses the Web Worker (computation thread) to determine whether the incremental update unit segment corresponding to the failed signature digest is too large. Specifically, it compares the size of the segment with a preset block size; if the difference exceeds a threshold, the segment is considered larger than the preset block size. The terminal can then divide that segment into sub-blocks through the Web Worker. For example, if a segment is too large (e.g., an excessively long table or code block), sub-diffing or line-by-line segmentation (such as table row / code line segmentation) can be performed within the segment to obtain individual sub-blocks. A signature digest is then generated for each sub-block, and the Web Worker verifies the correctness of these signature digests again to obtain the verification result. Sub-diffing or row-by-row segmentation within a segment makes the diff finer-grained and thus improves the accuracy of incremental rendering. It can be used as an enhancement in high-end scenarios.
[0126] In one embodiment, the step of generating an incremental update patch based on the signature digest and the hierarchical relationship between the incremental update units includes:
[0127] For each incremental update unit, the signature digest of the current incremental update unit and the signature digest of the previous incremental update unit are obtained through the second thread;
[0128] The second thread compares the signature digest of the current incremental update unit with the signature digest of the previous incremental update unit to obtain the comparison result.
[0129] Incremental update patches are generated based on the comparison results.
[0130] In this application, the terms "current incremental update unit" and "previous incremental update unit" are used only to distinguish between different incremental update units. For example, the previous incremental update unit is the incremental update unit generated 5 seconds ago, and the current incremental update unit is the latest incremental update unit generated at the current moment.
[0131] Specifically, as shown in Figure 6, after the terminal calculates the signature digest `nextSegSigs` of the current (i.e., at the current moment) incremental update unit (e.g., segment 6) through the Web Worker (computation thread), it can verify the correctness of the signature digest through the Web Worker to obtain the verification result. If the verification result indicates that the verification passed, the Web Worker retrieves the signature digest `prevSegSigs` of the previous incremental update unit (e.g., segment 5), which it maintains as the previous state (the state from the last update). It then compares `nextSegSigs` with `prevSegSigs`, that is, calculates the differential increment between them, to obtain the comparison result. For example, the comparison yields a patch description: a compact descriptor of "where to delete and what to insert" derived from `prevSegSigs` and `nextSegSigs`. The Web Worker then generates an incremental update patch (diff patch) based on the comparison result. In this way, the Worker is designed around "full content parse" (without relying on the delta to infer context), which ensures that Markdown semantics are parsed correctly, while the structural diff achieves incremental updates. This balances correctness and performance, enabling high-performance streaming content rendering with full semantic parsing and incremental rendering.
Even in high-frequency update scenarios, the method balances the correctness and performance of streaming content rendering, thereby improving the efficiency and quality of streaming content rendering.
[0132] In one embodiment, the step of comparing the signature digest of the current incremental update unit with the signature digest of the previous incremental update unit to obtain the comparison result includes:
[0133] Perform prefix and suffix scanning on the signature digest of the current incremental update unit and the signature digest of the previous incremental update unit to obtain the head difference position of the forward scan and the tail difference position of the reverse scan.
[0134] Based on the differences at the head and tail, a description is obtained to represent the deleted or inserted content; the description is a comparison result.
[0135] In this application, the terms "head difference position" and "tail difference position" are used only to distinguish difference indices found at different positions during scanning. For example, the head difference position can refer to the first difference position found during the forward scan (searching from the beginning), i.e., the first difference index (start), and the tail difference position can refer to the last difference position found during the reverse scan (searching from the end), i.e., the last difference index.
[0136] Specifically, as shown in Figure 6, after the terminal calculates the signature digest nextSegSigs of the current incremental update unit (e.g., segment 6) through the Web Worker (computation thread), the Web Worker obtains the signature digest prevSegSigs of the previous incremental update unit (e.g., segment 5), which it maintains as the previous state (the state from the last update). The terminal then compares nextSegSigs with prevSegSigs, that is, calculates the differential increment between them. For example, the Web Worker performs an O(n) prefix-suffix scan over the two signature digests: it finds the index of the first difference position from the beginning and the index of the last difference position from the end, thus obtaining the head difference position of the forward scan and the tail difference position of the reverse scan. Based on the head and tail difference positions, it obtains a description of the deleted or inserted content, namely deleteCount and insertRange, which serves as the comparison result. Since the streaming scenario in this application primarily involves tail appending, the more complex and less profitable LCS (longest common subsequence) approach is not used.
[0137] Furthermore, after the Web Worker (computation thread) compares the signature digest of the current incremental update unit with that of the previous incremental update unit and obtains the comparison result, it can update the signature digest prevSegSigs of the previous state that it maintains. In the processing flow shown in Figure 6, the Web Worker sets prevSegSigs = nextSegSigs, updating the previous signature for use in the next comparison.
[0138] The method provided in this embodiment can stably generate small patches in scenarios such as "tail appending, tail partial modification, and occasional minor intermediate modification". Although it does not guarantee optimality for large-scale intermediate insertion / movement, the result is still correct. That is, through the segment-level patch processing method, subsequent DOM updates are only performed on the changed paragraphs, and the generated paragraphs are not destroyed and rebuilt, which effectively avoids the jitter and flickering caused by full re-rendering and achieves convergence of the rendering range. Even in scenarios with long text (multiple paragraphs, multiple tables, long code blocks) and high-frequency streaming updates, the main thread blocking is significantly reduced. Rendering only updates the tail or locally changed segments, making scrolling and interaction smoother, and maintaining a usable experience on mobile devices or mini-programs.
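The prefix-suffix scan of this embodiment can be sketched as follows. It is a minimal illustration under stated assumptions: the `SegPatch` shape and the function name `diffSignatures` are hypothetical, and insertRange is represented here as a half-open index range into the new signature list.

```typescript
// Hypothetical comparison result: "where to delete and what to insert".
interface SegPatch {
  start: number; // head difference position (first differing index)
  deleteCount: number; // old segments to remove, beginning at start
  insertStart: number; // insertRange as [insertStart, insertEnd) in the new list
  insertEnd: number;
}

function diffSignatures(prev: readonly string[], next: readonly string[]): SegPatch {
  // Forward scan: skip the common prefix.
  const minLen = Math.min(prev.length, next.length);
  let start = 0;
  while (start < minLen && prev[start] === next[start]) start++;

  // Reverse scan: skip the common suffix without crossing the prefix.
  let prevEnd = prev.length;
  let nextEnd = next.length;
  while (prevEnd > start && nextEnd > start && prev[prevEnd - 1] === next[nextEnd - 1]) {
    prevEnd--;
    nextEnd--;
  }

  return { start, deleteCount: prevEnd - start, insertStart: start, insertEnd: nextEnd };
}
```

For a pure tail append, the forward scan consumes all of the old list, so deleteCount is 0 and only the new trailing segments fall in the insert range. Both scans visit each signature at most once, which gives the O(n) bound. After computing the patch, the Worker would store the new list as its previous state for the next comparison.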
[0139] In one embodiment, the step of converting an incremental update patch into a document object model for rendering the streaming content includes:
[0140] The incremental update patch is converted into a document object model for rendering streaming content via a second thread;
[0141] The incremental rendering of the streaming content based on the document object model includes:
[0142] The document object model is sent from the second thread to the first thread, so that the first thread can perform incremental rendering of the streaming content based on the document object model.
[0143] Specifically, as shown in Figure 6, after the terminal generates an incremental update patch based on the signature digest of each incremental update unit segment through the Web Worker (computation thread), it can convert the incremental update patch into a document object model (a DOM patch) for rendering the streaming content, that is, into a valid data structure. Since the incremental rendering in this application is implemented through DOM manipulation, the incremental update patch needs to be converted into a DOM-conforming data structure such as a DOM patch. The DOM patch is then sent to the main thread via `postMessage`, so that the main thread can perform incremental rendering of the streaming content based on it. This enables rendering updates to be completed with minimal DOM changes, reducing frame jitter and improving the stability and scrolling responsiveness of incremental rendering.
[0144] In one embodiment, the step of incrementally rendering streaming content based on the document object model includes:
[0145] The first thread adds the document object model to the callback function of the target interface;
[0146] When a call request to the target interface is detected, the operation of writing to the document object model is performed through the target interface to obtain the updated content of the streaming content.
[0147] The target interface can be requestAnimationFrame.
[0148] Specifically, as shown in Figure 5, the terminal can add the document object model (DOM patch) to the callback function of the target interface through the first thread, i.e., the main thread. For example, the main thread adds the DOM patch to the callback of `requestAnimationFrame`. When a call to the target interface is detected, i.e., when the `requestAnimationFrame` callback is invoked, the main thread performs the operation of writing the document object model through the target interface, i.e., it applies the DOM patch. After the write completes, the updated content of the streaming content is obtained. With this frame-merging rendering, in which the application of multiple DOM patches is merged into a single `requestAnimationFrame` callback, the DOM write is performed only once per frame, which reduces style / layout / paint jitter and repaint amplification. In other words, the main thread performs only the minimum number of DOM operations corresponding to the diff result, and merging plus rendering isolation limits repaint diffusion, avoiding queue backlog and multiple DOM writes in the same frame and improving the stability of frame rendering.
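Frame merging of this kind can be sketched as follows. The `FrameBatcher` class and its parameters are hypothetical names introduced for illustration. The frame scheduler is injected so the sketch does not depend on a browser; in a page it would presumably be `(cb) => requestAnimationFrame(() => cb())`. What the caller does with the batch (apply every patch in order, or only the most recent one) is left to the supplied function.

```typescript
// Registers a callback for the next frame; injected so the sketch is environment-neutral.
type FrameRequest = (callback: () => void) => void;

class FrameBatcher<P> {
  private pending: P[] = [];
  private scheduled = false;
  private readonly requestFrame: FrameRequest;
  private readonly applyBatch: (patches: P[]) => void;

  constructor(requestFrame: FrameRequest, applyBatch: (patches: P[]) => void) {
    this.requestFrame = requestFrame;
    this.applyBatch = applyBatch;
  }

  // Called whenever a patch arrives from the Worker.
  enqueue(patch: P): void {
    this.pending.push(patch);
    if (this.scheduled) return; // a callback for the coming frame already exists
    this.scheduled = true;
    this.requestFrame(() => {
      const batch = this.pending;
      this.pending = [];
      this.scheduled = false;
      this.applyBatch(batch); // the only DOM write pass in this frame
    });
  }
}
```

However many patches arrive between two frames, only one callback is registered and the write function runs once, which is the property the embodiment relies on to avoid several DOM writes in the same frame.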
[0149] In one embodiment, the step of incrementally rendering streaming content based on the document object model includes:
[0150] The first thread performs editing operations based on the document object model to obtain the updated content of the streaming content; the editing operations include at least one of the following: insertion, deletion, and replacement operations.
[0151] The updated content is displayed after the generated streaming content.
[0152] Editing operations refer to DOM operations performed. DOM operations include at least one of the following: insertion, deletion, and replacement operations.
[0153] Specifically, the terminal can perform DOM editing operations based on the Document Object Model (DOM) through the first thread, i.e., the main thread, to obtain updated content for the streaming content. The DOM editing operations include at least one of DOM insertion, deletion, and replacement operations, and the updated content is displayed after the generated streaming content. This allows the streaming content rendering method provided in this application to maintain the stability of historical content while "generating and displaying simultaneously," and to update new content locally. It also supports interactions combined with streaming rendering (copying, referencing, and other interactive operations) without needing to repeatedly bind events to each node. Furthermore, it supports consistent functionality across multiple platforms (PC / mobile / mini-programs) and adapts to restricted environments.
[0154] In one embodiment, after incrementally rendering the streaming content based on the document object model, the method further includes:
[0155] The process ends when the incremental content corresponding to the document object model is the end of the streaming content.
[0156] If the incremental content corresponding to the document object model is not the end content of the streaming content, the update request for updating the streaming content is sent from the first thread to the second thread.
[0157] As shown in Figure 5, the end content in this application may refer to the last sentence of the streaming content, i.e., lastSent.
[0158] Specifically, the terminal incrementally renders the streaming content based on the document object model (such as a DOM patch) through the first thread, i.e., the main thread. As shown in Figure 5, when the incremental content corresponding to the document object model is the end content of the streaming content (i.e., when the pendingContent of the previously rendered incremental content equals lastSent), the main thread can end the process directly. When it is not the end content (i.e., when pendingContent does not equal lastSent), the first thread (main thread) continues to send update requests (update) to the second thread (Web Worker) until the entire streaming content has been updated. In this way, computation is migrated to the Worker: Markdown parsing / tokenization and diffing run in the Worker, while the main thread performs only lightweight DOM patching, which reduces main thread blocking. At the same time, the single-flight policy with overwriting by the latest content avoids Worker queue accumulation, so latency stays bounded even at high update frequencies. This addresses the "slower with each generation" problem of traditional technologies and significantly improves the controllability of end-to-end latency.
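The single-flight loop described here can be sketched as follows. The `SingleFlight` class and its method names are hypothetical. The sketch also assumes one particular reading of the two variables: pendingContent holds the newest full text received from the stream, and lastSent holds the full text most recently sent to the Worker. The `send` callback stands in for posting update(id, fullContent) to the Worker.

```typescript
class SingleFlight {
  private pendingContent = ""; // newest full text; overwritten, never queued
  private lastSent: string | null = null; // full text carried by the latest request
  private inFlight = false; // true while the Worker owes a patch
  private nextId = 0;
  private readonly send: (id: number, fullContent: string) => void;

  constructor(send: (id: number, fullContent: string) => void) {
    this.send = send;
  }

  // Called for every streaming update with the full text so far.
  push(fullContent: string): void {
    this.pendingContent = fullContent;
    this.flush();
  }

  // Called after the patch answering the latest request has been rendered.
  onPatchRendered(): void {
    this.inFlight = false;
    this.flush();
  }

  private flush(): void {
    if (this.inFlight) return; // Worker busy: keep only the latest content
    if (this.pendingContent === this.lastSent) return; // nothing newer: the loop ends
    this.inFlight = true;
    this.lastSent = this.pendingContent;
    this.nextId += 1;
    this.send(this.nextId, this.lastSent);
  }
}
```

With this policy, updates that arrive while a request is in flight replace one another instead of queuing, so at most one request is outstanding and the next request always carries the newest text. That is what keeps the delay bounded when the stream updates faster than the Worker can parse.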
[0159] In one embodiment, the step of incrementally rendering streaming content based on the document object model includes:
[0160] If the number of document object models is not the preset value, and at least two document object models belong to the same frame content, at least two document object models will be cached.
[0161] Retrieve the target document object model from at least two document object models;
[0162] Based on the description in the target document object model, a preset number of nodes are deleted at the starting position, and new nodes are inserted to obtain the updated content of the streaming content.
[0163] The preset value can be 1.
[0164] The target document object model (DOM) refers to one of the multiple cached DOM models. For example, in this application, the target DOM model refers to the DOM model of the latest generated incremental update patch, or it can be the DOM model of the incremental update patch corresponding to the update identifier with the largest identifier value among the update identifiers. For instance, if the cached DOM models of the incremental update patches to be updated include dom002, dom003, and dom004, and dom004 is the DOM model of the latest generated incremental update patch to be updated, then the target DOM model refers to dom004.
[0165] Specifically, as shown in Figure 6, the terminal uses the Web Worker (computation thread) to convert the incremental update patch into a document object model (DOM patch) for rendering the streaming content, that is, into a valid data structure, and then sends the DOM patch to the main thread via `postMessage`. As shown in the process of Figure 5, when the main thread receives at least two document object models from the Web Worker, it can cache them (pendingPatchRef). During incremental rendering of the streaming content, the main thread retrieves the target document object model (the latest updated patch) from the cached document object models. Based on the description in the target document object model, e.g., dom004, it deletes a preset number of nodes at the starting position and inserts new nodes to obtain the updated content of the streaming content, so that the rendering update is completed with minimal DOM changes and frame jitter is reduced. During incremental rendering, the child nodes uniformly carry the class name ".markdown-render-new__seg", and each child node corresponds to the HTML of one segment. The latest patch deletes `deleteCount` nodes at the `start` position and inserts the new nodes given by `insertHtml[]`. In essence, each patch to be rendered is cached when it arrives at the main thread, and only the last patch (the latest updated patch) is applied in the RAF (requestAnimationFrame) callback. This avoids full "innerHTML" overwriting and React tree re-rendering, reducing the scope and number of layout / paint operations, and improves the stability and scrolling responsiveness of incremental rendering.
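Applying one such patch to the container's child list can be sketched as follows. The `DomPatch` and `ContainerLike` shapes and the `applyPatch` function are hypothetical; `ContainerLike` is written as a small structural subset of the DOM element API so that the sketch can also run against a stand-in container outside a browser. Node creation is delegated to a caller-supplied function, which in a page would presumably create an element with the segment class and set its HTML.

```typescript
// Hypothetical patch shape, following the start / deleteCount / insertHtml[] description.
interface DomPatch {
  start: number;
  deleteCount: number;
  insertHtml: string[];
}

// Structural subset of a DOM container: an indexable child list plus two mutators.
interface ContainerLike<N> {
  readonly children: ArrayLike<N>;
  removeChild(child: N): unknown;
  insertBefore(node: N, reference: N | null): unknown;
}

function applyPatch<N>(
  container: ContainerLike<N>,
  patch: DomPatch,
  createSegment: (html: string) => N,
): void {
  // Delete deleteCount nodes at start. The child list is live, so the next
  // node to delete always slides into index start.
  for (let i = 0; i < patch.deleteCount; i++) {
    const victim = container.children[patch.start];
    if (victim === undefined) break; // patch exceeds the current child list
    container.removeChild(victim);
  }
  // Insert the new segments before whatever now sits at start; a null
  // reference appends at the end.
  const reference = container.children[patch.start] ?? null;
  for (const html of patch.insertHtml) {
    container.insertBefore(createSegment(html), reference);
  }
}
```

Segments before `start` and after the deleted run are never touched, so their nodes, scroll position and attached state survive the update.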
[0166] In one embodiment, the step of incrementally rendering streaming content based on the document object model includes:
[0167] Call the preset function used to limit redraw spread;
[0168] Incremental rendering of streaming content is performed using a preset function to concentrate the redrawing of streaming content in the incremental area.
[0169] Specifically, when the terminal performs incremental rendering of streaming content based on the Document Object Model via the main thread, it can call preset functions to limit repaint diffusion through the main thread, and perform incremental rendering of streaming content through these preset functions. For example, to reduce the large clipping area formed by container-level overflow clipping, each segment in this application can use "contain: layout paint" to limit repaint diffusion, so that the repainting range of streaming content is concentrated in the incremental area. At the same time, the main thread can also create a BFC using "display: flow-root" to avoid dependence on overflow, making the repainting range more concentrated in the incremental area. If there are whole-layer composition conditions such as "filter / backdrop-filter / transform" in the outer layer, the impact range can be further avoided or converged at the style level.
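The two style settings named above can be sketched as follows. The function names and the `StyleLike` interface are hypothetical; `StyleLike` is a structural subset of the browser's style object so the sketch runs outside a page. Placing "display: flow-root" on the container rather than on each segment is an assumption, since the text does not say which element receives it.

```typescript
// Structural subset of a style declaration: only setProperty is needed.
interface StyleLike {
  setProperty(name: string, value: string): void;
}

// Per-segment containment, so layout and paint work stays inside the segment.
function isolateSegment(style: StyleLike): void {
  style.setProperty("contain", "layout paint");
}

// Container-level BFC without relying on overflow clipping (placement assumed).
function isolateContainer(style: StyleLike): void {
  style.setProperty("display", "flow-root");
}
```

In a page, these would presumably be called with `element.style` for each newly inserted segment and once for the container.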
[0170] Furthermore, in some embodiments, such as in the case of lazy loading of code highlighting, the streaming content rendering method provided in this application can maintain the basic code DOM output by the Worker and not highlight it during the streaming process; when the code block enters the viewport or is expanded by the user, the terminal can asynchronously load the highlighting library on the main thread and highlight the block, thus avoiding the impact of highlighting on streaming performance.
[0171] In some embodiments, for specific fragmented code blocks such as "mermaid", "OlaChatChartJSON", and "OlaChatInterruptJson", the Worker outputs placeholder nodes (containing data-type and the original payload). After patching, the main thread performs local hydrate on the newly added segment, that is, replaces the placeholder node with the corresponding React component (MermaidRenderer / Chart / Interrupt). This approach can reuse existing complex rendering capabilities while maintaining incremental updates.
[0172] In some embodiments, even while segments continuously generate patches, the technical solution provided in this application keeps interactions such as copying and reference navigation stable and usable without repeated binding. For example, the terminal can use the main thread to attach a single click listener to the container. The listener identifies ".md-code-block__copy", reads the "pre > code" text and copies it, and gives feedback through a Toast and the button state. The original reference "[citation:x]" is replaced with a "customType=citation" link capsule; when it is clicked, the corresponding navigation behavior is executed according to the mini-program or browser environment. This decouples the interaction logic from incremental rendering, and the listener is always a single instance, reducing the memory and GC pressure caused by repeated binding. In other words, the interaction loop is completed through event delegation, which avoids excess browser memory usage and its impact on browser performance.
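The copy interaction handled through event delegation can be sketched as follows. The function name, the `ElementLike` interface and the ".md-code-block" wrapper class are hypothetical (the text names only ".md-code-block__copy" and "pre > code"). `ElementLike` is a structural subset of the DOM element API, and the clipboard write is passed in as a callback, so the sketch does not depend on a browser.

```typescript
// Structural subset of a DOM element: ancestor lookup and descendant lookup.
interface ElementLike {
  closest(selector: string): ElementLike | null;
  querySelector(selector: string): { textContent: string | null } | null;
}

// Handles one click that bubbled up to the container.
// Returns true when the click was a copy request and the text was handed over.
function handleContainerClick(
  target: ElementLike,
  copyText: (text: string) => void,
): boolean {
  const button = target.closest(".md-code-block__copy");
  if (!button) return false; // the click landed somewhere else

  // Hypothetical wrapper class that encloses both the button and the code.
  const block = button.closest(".md-code-block");
  const code = block?.querySelector("pre > code");
  if (!code) return false;

  copyText(code.textContent ?? "");
  return true;
}
```

Because the lookup starts from the event target at click time, segments inserted by later patches are covered without any per-node binding; the container would carry a single listener that forwards the event target to this function.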
[0173] In this embodiment, by isolating the rendering impact and suppressing "full repaint", it is possible to effectively avoid large-scale repaints of historical areas triggered by new additions at the tail, thus maintaining the stability of streaming rendering. That is, by isolating the rendering impact range through mechanisms such as "contain: layout paint" and BFC, the probability of historical areas being repainted due to new additions at the tail is reduced.
[0174] In one embodiment, this application also provides an application scenario in which the above-described streaming content rendering method is applied. Specifically, the streaming content rendering method is applied in this application scenario as follows:
[0175] Scenario 1 (PC / Large Screen): The user initiates a "Generate Data Analysis Report" request to the AI. The AI model streams output, including multi-paragraph explanations, multiple tables, and SQL / code blocks. Requirements: The report should be generated and displayed simultaneously; tables should be horizontally scrollable; code blocks should be copyable with one click; and overall scrolling should be smooth.
[0176] Scenario 2 (Mobile / Mini Program): When a user views the same streaming output content on a mobile device, the content includes a source reference. When the user needs to click on the reference, the mini program should provide an interaction of "copying the link and prompting the user to open it in a browser". Due to the weaker performance of mobile devices, it is even more necessary to use the incremental rendering method of streaming content provided in this application to avoid lag.
[0177] Scenario 3 (Weak Network / Reconnection): During the generation of streaming content, network jitter causes a disconnection. The front end needs to continue displaying the generated content and continue to append without flickering after reconnection; in case of failure, display an error message and keep the generated paragraphs intact.
[0178] The method provided in this application significantly reduces main thread blocking when dealing with long text (multiple paragraphs, multiple tables, long code blocks) and high-frequency streaming updates. Rendering only updates the tail or locally changed segments, making scrolling and interaction smoother, and maintaining a usable experience on mobile devices or mini-programs.
[0179] Key terms and definitions used in this application:
[0180] AI streaming communication / streaming output: refers to the front end continuously receiving fragmented text or structured fragments output by AI models through SSE (Server-Sent Events) / WebSocket / Fetch-Stream, etc. On the front end, the same message content is continuously appended and updated, and the status is constantly changing (generating → completed / interrupted / failed).
[0181] Streaming chunks: This refers to a transmission format where AI output is split into multiple small segments and delivered to the client sequentially. Chunks may contain only newly added text, or they may contain control tags, special blocks (such as the start / end of a code block), or the full text concatenated by the backend.
[0182] Markdown content refers to rich text descriptions that conform to Markdown syntax (and may include extended syntax such as tables, task lists, code blocks, and optional native HTML). In AI scenarios, Markdown is often used to simultaneously carry natural language descriptions, tabular data, code, links, and source citations.
[0183] Rendering consistency: During streaming output, the front-end display results should meet semantic consistency: for example, the opening and closing and nesting of lists / tables / code blocks must be correct; re-rendering after breakpoint recovery or reconnection should also be consistent with the final content.
[0184] Token: Tokens are the sequence of structured tags that the Markdown parser produces after parsing the input text. Each token describes a syntactic unit (such as a paragraph, list, heading, code block, link, etc.) and may contain information such as level / nesting and attributes. The token sequence can be used for subsequent structured comparison, differencing, and rendering.
[0185] AST / Syntax Tree: Refers to the abstract syntax tree structure built during the Markdown parsing process. In some solutions, the AST is converted into a React component tree; this invention preferably uses tokens / segments for incremental updates to reduce the cost of tree structure reconstruction.
[0186] A segment (or chunk) is the smallest incremental update unit obtained by dividing a token sequence according to a "top-level block structure." A segment typically corresponds to a top-level block: paragraph, list, quote, table, code, etc. The goal of segment design is to allow DOM patches to be inserted / deleted directly by child node indexes, thereby converging the rendering impact to a localized area.
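Dividing a token sequence into top-level segments can be sketched as follows. The body of `splitTopLevelRanges` is not given in the text, so this is one possible reading under stated assumptions: tokens follow a markdown-it style convention in which nesting is 1 for an opening tag, 0 for a self-contained token and -1 for a closing tag, a top-level block's opening and closing tokens both carry level 0, and each range is inclusive at both ends.

```typescript
// Only the two fields the split depends on.
interface LeveledToken {
  level: number;
  nesting: number;
}

// Inclusive index range into the token sequence, i.e. tokens[r.start..r.end].
interface TokenRange {
  start: number;
  end: number;
}

function splitTopLevelRanges(tokens: readonly LeveledToken[]): TokenRange[] {
  const ranges: TokenRange[] = [];
  let open = -1; // index of the currently open top-level block, or -1

  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i];
    if (token === undefined || token.level !== 0) continue; // nested tokens never end a segment
    if (token.nesting === 1) {
      open = i; // top-level block opens
    } else if (token.nesting === -1 && open !== -1) {
      ranges.push({ start: open, end: i }); // top-level block closes
      open = -1;
    } else if (token.nesting === 0 && open === -1) {
      ranges.push({ start: i, end: i }); // single-token block, e.g. a fenced code block
    }
  }

  // Defensive: an unclosed block at the end of the stream becomes the final segment.
  if (open !== -1) ranges.push({ start: open, end: tokens.length - 1 });
  return ranges;
}
```

Each resulting range maps to exactly one child node of the container, which is what lets a patch address segments by child index.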
[0187] Signature / Hash: A stable digest calculated from the key fields of a token within a segment, used to quickly determine whether the segment has changed. The signature is not encrypted; it is only used for efficient comparison. This application preferably uses a dual 32-bit rolling hash (or an equivalent stable digest) to reduce the risk of collisions.
[0188] A patch is a minimal set of operations required to update a list of segments from an old list to a new list of segments. In this approach, the patch is implemented by deleting `deleteCount` segments starting from `start` and then inserting `insertHtml[]`. This format corresponds one-to-one with the DOM structure of the top-level segment, allowing the main thread to quickly execute the fewest possible DOM operations.
[0189] Incremental rendering / incremental update: This refers to performing DOM insertion / deletion / replacement only on the changed segments when content is updated, without rebuilding the entire Markdown rendering result. Incremental rendering is not the same as "only passing delta text", but rather "passing the full text but performing incremental patches at the structural level" to ensure semantic correctness.
[0190] Web Worker: Refers to the multi-threading capability provided by browsers, used to migrate CPU-intensive computations (Markdown parsing, tokenization, diffing) to independent threads, reducing main thread blocking. Workers exchange structured data with the main thread via postMessage.
[0191] Single-flight: This refers to a situation where only one worker task is executed at a time. New content updates are not queued and accumulated; instead, they are overwritten as the "latest content" and processed only when the worker is idle, thus avoiding snowballing delays. This strategy is preferred for "high-frequency streaming update" scenarios.
[0192] Backpressure: When the content update frequency exceeds the parsing / rendering capacity, scheduling strategies (single-flight, frame merging, throttling) are needed to suppress queuing and keep the upper bound of end-to-end latency under control.
[0193] Render frame batching (RAF batching): This refers to merging the application of multiple DOM patches into a single requestAnimationFrame callback, so that the DOM is written only once per frame, reducing style / layout / paint jitter and repaint amplification.
[0194] CSS Containment (Render Isolation): This refers to using CSS properties such as contain: layout paint to limit the scope of layout / painting, so that adding a new ending paragraph does not cause the repainting of previously rendered content to spread.
[0195] BFC (Block Formatting Context): Refers to a layout context created by means such as display: flow-root. It avoids side effects such as margin collapse without relying on overflow: hidden, whose enlarged clipping area can force additional repaints.
[0196] Event delegation: Refers to listening for events in one place, on the container node, and locating the target element through methods such as closest() to handle the interaction. This avoids binding listeners to the large number of nodes generated by innerHTML, thereby reducing memory and event handling overhead.
[0197] Security boundary: When enabling native Markdown HTML (e.g., `html: true`) and using innerHTML injection, if the input content is untrusted, it must be sanitized / whitelisted to avoid XSS risks. This security boundary should be clearly defined at the system level: trusted content can be rendered directly, while untrusted content must be sanitized.
[0198] Business Scenarios and Pain Points: In React projects, AI dialogue messages are output via streaming communication, resulting in continuous updates of the Markdown text for the same message. Traditionally, libraries like react-markdown / remark / rehype are used to fully parse and re-render the entire content with each update. As the content grows longer and the update frequency increases, this causes noticeable stuttering, unresponsive scrolling, slower input, and delays in copying / clicking. This is especially true when AI output includes tables, long code blocks, image links, and source citations, where the parsing and rendering costs are even higher, and the stuttering is more pronounced.
[0199] With the widespread adoption of large-scale model applications, front-ends need to support longer and more complex rich text content in conversational products. Markdown has become the de facto standard: it can simultaneously express headings, lists, tables, code blocks, and citation links, and can be extended with custom blocks. On the other hand, streaming output is a key interaction method for reducing waiting time and enhancing the "generating experience": users expect to see the first screen of content within 1-2 seconds, and then read it as it is generated.
[0200] Traditional technical solutions include: Typical Solution A: ReactMarkdown full rendering
[0201] In the React ecosystem, react-markdown is often used in conjunction with plugins such as remark-gfm and rehype-raw to parse Markdown text into an Abstract Syntax Tree (AST) and then map it to a React component tree. The advantages of this approach are strong componentization and high controllability; the disadvantages are that when the content changes frequently, the parsing and component tree reconstruction costs are high, and the reconciliation process consumes CPU and can cause lag.
[0202] Typical Solution B: Full innerHTML coverage after string conversion
[0203] Another approach uses marked / markdown-it to convert Markdown to HTML, and then executes `container.innerHTML = html` on each update. This approach is simple to implement but causes the entire DOM to be destroyed on every update, triggering a large-scale layout / paint, resulting in more noticeable lag, and node-level interactions (such as the copy button) need to be repeatedly bound or re-queried.
[0204] Typical Solution C: Local Throttling / Frequency Reduction
[0205] In engineering practice, rate-limiting measures are also employed: for example, rendering only once every 100ms or every 200ms to reduce the number of updates. This approach only reduces the frequency and does not change the fact that each update is still a full parse plus a full render; when the content is very long, the overhead of a single parse is still large, and the throttling introduces visual delay and text that appears in jumps.
[0206] Typical Solution D: Incremental text concatenation without structural difference
[0207] Some solutions only "append" at the text level, such as directly appending the new chunk to the end of the string and then rendering the entire string again. Because Markdown structure depends on context (such as the opening and closing of lists / tables / code blocks), appending at the text level is not the same as incrementally appending at the structure level; it will still trigger a full parsing and redraw.
[0208] Problems with traditional technologies:
[0209] (1) Lack of structured incremental rendering capability for "streaming high frequency + long content": existing solutions either re-render the entire React tree or overwrite the entire DOM. Even if only the tail section is appended, the result is still a large-scale update.
[0210] (2) Severe main thread blocking: Parsing, rendering and browser layout / drawing all occur on the main thread. High-frequency updates and user interactions compete for the event loop, causing input delays, scrolling frame drops, and unresponsive clicks.
[0211] (3) Performance deteriorates linearly with content: As the content length increases, the overhead of a single parse / render increases continuously; the cumulative overhead during streaming is superimposed, eventually leading to obvious lag or even page freeze.
[0212] (4) Rendering pipeline amplification: In actual products, there are often CSS features such as bubble containers, shadows, blur layers, and clipping areas; when innerHTML is fully overwritten, paint invalidation may spread to a whole-block repaint.
[0213] (5) It is difficult to balance correctness and performance: If only delta text is transmitted, it is easy to make mistakes at the boundaries of Markdown structure; if the full text is transmitted, the performance is not good.
[0214] Therefore, to address the aforementioned issues, this application proposes a Markdown incremental rendering method and apparatus for AI streaming scenarios, the core idea of which is:
[0215] (1) The "full Markdown text" is still used as the parsing input to ensure semantic correctness;
[0216] (2) But differential processing is performed at the structure layer (tokens / segments) to avoid full redrawing every time;
[0217] (3) Put CPU-intensive steps (parsing, tokenization, signing and diffing) into Web Worker;
[0218] (4) The main thread performs only the minimum DOM operations corresponding to the diff result, and reduces repaint spread by merging frames and isolating rendering;
[0219] (5) End-to-end delay is controlled through single-flight and backpressure scheduling, so that queued streaming updates do not cause the "the more it generates, the more it stutters" effect.
[0220] The key innovations of the technical solution in this application include, but are not limited to:
[0221] (1) Top-level segment generation method based on tokens: Construct a stable top-level block list by utilizing the hierarchical / nesting information of tokens, so that the smallest unit of rendering is improved from "character / line" to "block-level semantic unit", and can be directly mapped to DOM child nodes.
[0222] (2) Fast differential method based on segment signature: Calculate stable signature for key fields of each token, and generate patch using O(n) algorithm such as prefix and suffix scanning, which is particularly suitable for high-frequency mode of streaming tail appending / tail local modification.
[0223] (3) Incremental rendering mechanism with cross-thread collaboration: The Worker maintains the signature state of the previous frame, and the main thread ensures the consistency of the state by resetting / updating the id; in case of an error, it recovers by "clearing + resetting + re-rendering the latest content" to ensure the stability of the link.
[0224] (4) Streaming back pressure and scheduling strategy: single-flight + covering the latest content to avoid worker queue accumulation; apply patch by combining frames through RAF batching to reduce layout / paint jitter caused by multiple DOM writes in the same frame.
[0225] (5) Rendering isolation strategy: By using CSS mechanisms such as contain and BFC, newly added tail content is redrawn locally as much as possible, reducing the spread of historical content redrawing, and improving the streaming experience from the "browser rendering pipeline" level.
[0226] The problems that the technical solution of this application can solve include:
[0227] (1) By migrating heavy computation to the Worker: Markdown parsing / tokenization and differential processing are performed in the Worker, while the main thread only performs lightweight DOM patching to reduce blocking.
[0228] (2) By segment structure difference: the update granularity is raised to the top-level block-level semantic unit, and the unchanged segments are not reconstructed, which significantly reduces the amount of DOM changes.
[0229] (3) Through signature and O(n) diff: quickly locate the change range, especially suitable for appending at the end; maintain controllable differential overhead when the content is very long.
[0230] (4) By using single-flight and RAF batching: avoid queue accumulation and multiple DOM writes in the same frame, and improve frame stability.
[0231] (5) Use CSS containment to limit redraw diffusion, so that newly added end paragraphs do not affect the drawing of historical paragraphs as much as possible.
[0232] (6) Stability is ensured through reset / recovery strategy: It can be quickly recovered when switching messages or parsing abnormalities, without affecting the user's continuous use.
[0233] On the product side: 1.1 Application Product Description and Overall Interface Structure
[0234] As shown in Figure 3, this application can be applied to AI dialogue products / data assistant products. A typical interface includes a top navigation area, a message list area, and an input area. System messages in the message list are Markdown rich text that is continuously appended as the AI produces output.
[0235] 1.2 Product-side summary:
[0236] (1) Keep historical content stable while content is generated and displayed at the same time, and update only the new content locally;
[0237] (2) Supports interactive features (copy, reference jump) combined with streaming rendering without needing to bind events repeatedly for each node;
[0238] (3) Support consistent functionality across multiple platforms (PC / mobile / mini-program) and adapt to restricted environments.
[0239] On the technical side, 2.1 Application Environment and Engineering Constraints
[0240] (1) Front-end framework: React 18; Builder: Vite; Module: ESM.
[0241] (2) Multithreading: Use the browser's native Web Worker, created with the {type: module} option to support ESM dependencies.
[0242] (3) Markdown parsing: Use markdown-it; enable html and linkify to meet the product's rich text requirements.
[0243] (4) Interaction: Use event delegation to handle copying and reference jumps to avoid a large number of listeners.
[0244] (5) Style: Use Less; confine repaints via contain / BFC.
[0245] 2.2 Technical Architecture Diagram
[0246] Figure 4 shows the technical architecture diagram (thread and module division) of the technical solution provided in this application.
[0247] 2.3 Main Flowchart (Detailed Technical Flow)
[0248] Figure 5 shows a flowchart of the main thread's processing (single-flight + frame merging).
[0249] Figure 6 shows a flowchart of the Worker's processing (parse → block division → signature → diff → local render).
[0250] 2.4 Sequence Diagram (with accompanying diagram to explain key interactions)
[0251] Figure 10 shows a timing diagram of the streaming content rendering method provided in this application (one streaming update cycle).
[0252] 2.5 Technical Process Steps (in order of execution)
[0253] Step 1: Construct the top-level segment
[0254] (1) Step objectives: First, break down the complete Markdown into top-level segments that can be updated independently, so that subsequent patches can be directly mapped to DOM child node indices.
[0255] (2) Execution action: A linear scan is performed over the markdown-it tokens using their level and nesting fields. When a top-level *_open token (nesting=1, level=0) is encountered, depth counting starts, and the segment ends at the matching *_close token (when the depth returns to 0). A top-level self-contained token (nesting=0, such as fence / html_block / hr) forms a segment on its own. Abnormal structures are downgraded to single-token segments to ensure robustness.
[0256] (3) Step output: A stable segment sequence is obtained. During streaming tail appends, usually only 1 to 2 segments are added, which provides the basis for the small patches generated in later steps.
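The scan described in Step 1 can be sketched as follows. This is a minimal illustration, not the application's actual implementation: `TokenLike` and `splitTopLevelSegments` are hypothetical names, and only the `nesting` and `level` fields (as found on markdown-it tokens) are assumed.

```typescript
// Minimal shape of a token for this sketch (hypothetical interface name).
interface TokenLike {
  type: string;
  nesting: number; // 1 = open, 0 = self-contained, -1 = close
  level: number;   // 0 = top level
}

// Split a flat token sequence into top-level segments with one linear scan.
function splitTopLevelSegments<T extends TokenLike>(tokens: T[]): T[][] {
  const segments: T[][] = [];
  let current: T[] = [];
  let depth = 0;
  for (const tok of tokens) {
    if (depth === 0) {
      if (tok.nesting === 1 && tok.level === 0) {
        current = [tok]; // a top-level *_open starts a new segment
        depth = 1;
      } else {
        segments.push([tok]); // self-contained or unexpected token: own segment
      }
      continue;
    }
    current.push(tok);
    depth += tok.nesting;
    if (depth === 0) {
      segments.push(current); // matching *_close reached
      current = [];
    }
  }
  // An unclosed block at the end is downgraded to single-token segments.
  for (const tok of current) segments.push([tok]);
  return segments;
}

// Usage: a heading, a fenced code block, and a one-item list.
const t = (type: string, nesting: number, level: number): TokenLike => ({ type, nesting, level });
const demoTokens: TokenLike[] = [
  t("heading_open", 1, 0), t("inline", 0, 1), t("heading_close", -1, 0),
  t("fence", 0, 0),
  t("bullet_list_open", 1, 0), t("list_item_open", 1, 1), t("paragraph_open", 1, 2),
  t("inline", 0, 3), t("paragraph_close", -1, 2), t("list_item_close", -1, 1),
  t("bullet_list_close", -1, 0),
];
const demoSegments = splitTopLevelSegments(demoTokens);
```

Each resulting segment maps to one direct child of the render container, which is what allows a patch to address segments by index.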
[0257] Step 2: Generate segment signatures and control collisions
[0258] (1) Step objective: Quickly determine whether each segment has changed, avoiding the extra overhead caused by character-by-character comparison.
[0259] (2) Execution action: Extract the type / tag / nesting / info / markup / content fields of each token, together with attrs, children and other fields, mix them into a dual 32-bit rolling hash (based on Math.imul) in a fixed order, insert separators at the field boundaries, and finally output a signature of the form a:b.
[0260] (3) Output of the steps: The nextSegSigs is generated for diffing; double hashing significantly reduces the probability of collision. For extremely demanding scenarios, it can be extended to 64-bit (BigInt) or text digest verification can be added.
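A signature of this kind could look like the sketch below. The text only specifies "dual 32-bit rolling hash based on Math.imul"; the FNV-1a-style mixing, the second seed and multiplier, the separator value, and the names `SigToken` and `segmentSignature` are all assumptions made for illustration.

```typescript
// Token fields that feed the signature (hypothetical interface name).
interface SigToken {
  type: string;
  tag: string;
  nesting: number;
  info: string;
  markup: string;
  content: string;
  attrs: [string, string][] | null;
  children: SigToken[] | null;
}

// Separator outside the UTF-16 code unit range, so it cannot occur in text.
const FIELD_SEPARATOR = 0x10000;

function segmentSignature(tokens: SigToken[]): string {
  let a = 0x811c9dc5 | 0; // FNV-1a 32-bit offset basis
  let b = 0x9e3779b9 | 0; // arbitrary second seed (assumption)
  const mix = (code: number): void => {
    a = Math.imul(a ^ code, 0x01000193); // FNV 32-bit prime
    b = Math.imul(b ^ code, 0x85ebca6b); // arbitrary odd multiplier (assumption)
  };
  const addField = (s: string): void => {
    for (let i = 0; i < s.length; i++) mix(s.charCodeAt(i));
    mix(FIELD_SEPARATOR); // mark the field boundary
  };
  const addToken = (tok: SigToken): void => {
    addField(tok.type);
    addField(tok.tag);
    addField(String(tok.nesting));
    addField(tok.info);
    addField(tok.markup);
    addField(tok.content);
    for (const [name, value] of tok.attrs ?? []) {
      addField(name);
      addField(value);
    }
    mix(FIELD_SEPARATOR); // boundary between attrs and children
    for (const child of tok.children ?? []) addToken(child);
    mix(FIELD_SEPARATOR); // end of token
  };
  for (const tok of tokens) addToken(tok);
  const hex = (n: number): string => (n >>> 0).toString(16).padStart(8, "0");
  return `${hex(a)}:${hex(b)}`;
}

// Usage: the same segment always yields the same a:b string.
const fenceToken: SigToken = {
  type: "fence", tag: "code", nesting: 0, info: "ab", markup: "c",
  content: "x", attrs: null, children: null,
};
const fenceSig = segmentSignature([fenceToken]);
```

The separators matter: without them, `info = "ab", markup = "c"` and `info = "a", markup = "bc"` would feed identical character streams into the hash.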
[0261] Step 3: Calculate the incremental diff patch
[0262] (1) Step objective: Quickly obtain the patch description of "where to delete and what to insert" from prevSegSigs and nextSegSigs.
[0263] (2) Execution action: An O(n) prefix and suffix scan is used: find the first differing index, start, by scanning from the beginning, and the last differing position by scanning from the end, to obtain deleteCount and insertRange. Streaming scenarios mainly involve appending to the end, so LCS, which has higher complexity and limited benefits, is not used.
[0264] (3) Step output: Small patches can be generated stably in scenarios of "tail appending, tail local modification, and occasional intermediate minor modification"; although the optimality is not guaranteed for large-scale intermediate insertion / movement, the result is still correct.
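The prefix and suffix scan of Step 3 can be sketched as follows. The names `SegmentPatch`, `diffSignatures`, `insertFrom`, and `insertTo` are hypothetical; the text only names `start`, `deleteCount`, and an insert range.

```typescript
// Patch description: delete `deleteCount` segments at `start`, then insert
// the new segments with indices [insertFrom, insertTo) from the new list.
interface SegmentPatch {
  start: number;
  deleteCount: number;
  insertFrom: number;
  insertTo: number;
}

function diffSignatures(prev: string[], next: string[]): SegmentPatch {
  const minLen = Math.min(prev.length, next.length);

  // Forward scan: length of the common prefix.
  let start = 0;
  while (start < minLen && prev[start] === next[start]) start++;

  // Backward scan: length of the common suffix, not overlapping the prefix.
  let suffix = 0;
  while (
    suffix < minLen - start &&
    prev[prev.length - 1 - suffix] === next[next.length - 1 - suffix]
  ) {
    suffix++;
  }

  return {
    start,
    deleteCount: prev.length - start - suffix,
    insertFrom: start,
    insertTo: next.length - suffix,
  };
}

// Usage: the last segment changed and one more segment was appended.
const tailPatch = diffSignatures(["a", "b", "c"], ["a", "b", "c2", "d"]);
```

For a pure tail append the forward scan consumes the whole old list, so `deleteCount` is 0 and only the new trailing segments are inserted.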
[0265] Step 4: Synchronize across threads and maintain state consistency
[0266] (1) Step objective: Ensure that the main thread only applies the currently valid results to avoid errors caused by concurrency and expired responses.
[0267] (2) Execution actions: The Worker maintains stateful prevSegSigs; the main thread triggers reset when the messageKey changes or a Worker error occurs (parsing exception, etc.); the response uses an incrementing ID for verification and is combined with single-flight to control concurrency.
[0268] (3) Step output: A controllable and recoverable cross-thread update chain is formed. In case of an exception, the DOM can be cleared, the Worker can be reset, and the correct display can be restored immediately by recalculating with the latest content.
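One way the main-thread side of Step 4 might be organized is sketched below: a single in-flight request, a pending slot that is overwritten rather than queued, an incrementing ID for rejecting expired responses, and a reset flag when the message changes or after a failure. `UpdateChannel`, `WorkerRequest`, and the method names are hypothetical, and the `post` callback stands in for `worker.postMessage`.

```typescript
interface WorkerRequest {
  id: number;
  reset: boolean; // ask the Worker to drop its previous signatures
  markdown: string;
}

class UpdateChannel {
  private nextId = 0;
  private inFlightId: number | null = null;
  private activeKey: string | null = null;
  private pending: { messageKey: string; markdown: string } | null = null;
  private post: (req: WorkerRequest) => void;

  constructor(post: (req: WorkerRequest) => void) {
    this.post = post;
  }

  // Called on every streaming update with the full Markdown text.
  update(messageKey: string, markdown: string): void {
    if (this.inFlightId !== null) {
      this.pending = { messageKey, markdown }; // overwrite, never queue
      return;
    }
    const reset = messageKey !== this.activeKey;
    this.activeKey = messageKey;
    this.inFlightId = ++this.nextId;
    this.post({ id: this.inFlightId, reset, markdown });
  }

  // Called for each Worker response; returns true if it should be applied.
  receive(responseId: number): boolean {
    if (responseId !== this.inFlightId) return false; // expired response
    const switched = this.pending !== null && this.pending.messageKey !== this.activeKey;
    this.inFlightId = null;
    this.flushPending();
    return !switched; // a result for a message that was switched away is dropped
  }

  // Called on a Worker error: forget state so the next request resets.
  fail(): void {
    this.inFlightId = null;
    this.activeKey = null;
    this.flushPending();
  }

  private flushPending(): void {
    const next = this.pending;
    this.pending = null;
    if (next !== null) this.update(next.messageKey, next.markdown);
  }
}

// Usage: three rapid updates collapse into two Worker requests.
const posted: WorkerRequest[] = [];
const channel = new UpdateChannel((req) => posted.push(req));
channel.update("msg-1", "# T");
channel.update("msg-1", "# Ti");
channel.update("msg-1", "# Title");
const applied = channel.receive(1);
const staleApplied = channel.receive(1);
```

In this sketch the caller would also clear the container whenever a request with `reset: true` is sent, matching the "clear + reset + re-render the latest content" recovery path.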
[0269] Step 5: Apply DOM patch and merge frames on the main thread.
[0270] (1) Step objectives: Complete the rendering update with minimal DOM changes and reduce frame jitter.
[0271] (2) Execution action: The direct child nodes of the container are uniformly .markdown-render-new__seg elements, and each child node corresponds to the HTML of one segment; according to the patch, deleteCount nodes are deleted at the start position and the new nodes from insertHtml[] are inserted; after a patch arrives, it is cached first, and only the last patch is applied in the RAF callback.
[0272] (3) Step output: Avoid full innerHTML coverage and React tree re-rendering, and reduce the scope and number of layout / paint.
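The patch application and frame merging of Step 5 can be modeled without a DOM as in the sketch below. A `string[]` stands in for the container's direct children, and the scheduler is injected so the example runs outside a browser; in a browser it would be something like `(cb) => requestAnimationFrame(cb)`. `HtmlPatch`, `applyPatch`, and `FrameBatcher` are hypothetical names.

```typescript
interface HtmlPatch {
  start: number;
  deleteCount: number;
  insertHtml: string[];
}

// Model of the container: one entry per direct child (one per segment).
function applyPatch(children: string[], patch: HtmlPatch): void {
  children.splice(patch.start, patch.deleteCount, ...patch.insertHtml);
}

type Schedule = (callback: () => void) => void;

// Caches the most recent patch and applies it once per scheduled frame,
// following the "only the last patch is applied" rule stated above.
class FrameBatcher<P> {
  private latest: P | null = null;
  private scheduled = false;
  private schedule: Schedule;
  private apply: (patch: P) => void;

  constructor(schedule: Schedule, apply: (patch: P) => void) {
    this.schedule = schedule;
    this.apply = apply;
  }

  push(patch: P): void {
    this.latest = patch;
    if (this.scheduled) return; // a frame callback is already queued
    this.scheduled = true;
    this.schedule(() => {
      this.scheduled = false;
      const patchToApply = this.latest;
      this.latest = null;
      if (patchToApply !== null) this.apply(patchToApply);
    });
  }
}

// Usage: two patches arrive before the frame fires; one write happens.
const frames: Array<() => void> = [];
const children: string[] = ["<p>a</p>"];
const batcher = new FrameBatcher<HtmlPatch>(
  (cb) => frames.push(cb),
  (patch) => applyPatch(children, patch),
);
batcher.push({ start: 1, deleteCount: 0, insertHtml: ["<p>b</p>"] });
batcher.push({ start: 1, deleteCount: 0, insertHtml: ["<p>b!</p>"] });
const queuedFrames = frames.length;
for (const run of frames) run();
```

Keeping only the last patch assumes that each patch is computed against what is currently displayed, which the single-flight scheduling is intended to provide.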
[0273] Step 6: Isolate rendering effects and suppress "full redraw".
[0274] (1) Step objectives: Avoid large-scale repaints of the historical area when content is appended at the tail, and maintain stable streaming rendering.
[0275] (2) Execution action: Reduce the large clipping area formed by container-level overflow clipping; use contain: layout paint on each seg to limit repaint spread; create a BFC through display: flow-root to avoid dependency on overflow.
[0276] (3) Step output: The redraw range is more concentrated in the incremental area. If there are whole-layer compositing conditions such as filter / backdrop-filter / transform in the outer layer, the influence range needs to be further avoided or reduced at the style level.
[0277] Step 7: Complete the interaction loop through event delegation
[0278] (1) Step objectives: To keep the copy, reference jump and other interactions stable and available without repeated binding when the segment is continuously patched.
[0279] (2) Execution action: The main thread attaches only one click listener, on the container; after recognizing md-code-block__copy, it reads the pre > code text and copies it, and provides feedback through a Toast and the button state; [citation:x] / [^x] is replaced with a link capsule with customType=citation, and when it is clicked, the corresponding jump behavior is executed according to the mini-program / browser environment.
[0280] (3) Step output: The interaction logic and incremental rendering are decoupled, and the listener is always a single instance, reducing the memory and GC pressure caused by repeated binding.
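The copy-button branch of the delegated click handler might look like the sketch below. To keep it runnable outside a browser, the element type is reduced to a small structural interface (`ElementLike`, hypothetical) that DOM elements satisfy in practice; the `.md-code-block` wrapper class and the injected `copy` callback (which could wrap `navigator.clipboard.writeText` in a browser) are assumptions. Only the `md-code-block__copy` class and the `pre > code` selector come from the text.

```typescript
// The subset of the DOM Element API this sketch relies on.
interface ElementLike {
  closest(selector: string): ElementLike | null;
  querySelector(selector: string): ElementLike | null;
  textContent: string | null;
}

// Intended to be called from the single click listener on the container.
// Returns true when the click was handled as a copy action.
function handleContainerClick(target: ElementLike, copy: (text: string) => void): boolean {
  const button = target.closest(".md-code-block__copy");
  if (button === null) return false; // not a copy button: ignore
  const block = button.closest(".md-code-block"); // wrapper class is an assumption
  const code = block === null ? null : block.querySelector("pre > code");
  if (code === null) return false;
  copy(code.textContent ?? "");
  return true;
}

// Usage with hand-built stand-ins for a code block and its copy button.
const codeEl: ElementLike = {
  textContent: "x = 1",
  closest: () => null,
  querySelector: () => null,
};
const blockEl: ElementLike = {
  textContent: null,
  closest: () => null,
  querySelector: (selector) => (selector === "pre > code" ? codeEl : null),
};
const buttonEl: ElementLike = {
  textContent: "Copy",
  querySelector: () => null,
  closest: (selector) =>
    selector === ".md-code-block__copy" ? buttonEl
    : selector === ".md-code-block" ? blockEl
    : null,
};
const copied: string[] = [];
const handled = handleContainerClick(buttonEl, (text) => copied.push(text));
```

Because the handler resolves everything from the click target at event time, segments can be inserted and removed by patches without any listener being re-bound.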
[0281] 2.6 Safety and Controllability
[0282] (1) This solution supports raw HTML (html: true) to meet rich text requirements, similar to rehype-raw in the traditional solution, but this introduces XSS risk.
[0283] (2) Security boundary recommendations: Untrusted input must be sanitized (tag / attribute whitelist) before entering the renderer, or HTML should be turned off and only Markdown syntax should be allowed; trusted input can be rendered directly to improve performance.
[0284] (3) Code content escaping: The code block content and language field are escaped when output on the Worker side, to prevent the code from breaking the DOM structure.
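The escaping in item (3) can be illustrated with the sketch below. `escapeHtml` and `renderFence` are hypothetical helpers, and the `language-` class prefix and the `<pre><code>` output shape are assumptions, not the markup the application necessarily produces.

```typescript
// Escape the characters that are significant in HTML text and attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Render a fenced code block; both the language field and the content are
// escaped so that neither can close tags or inject attributes.
function renderFence(info: string, content: string): string {
  const lang = info.trim().split(/\s+/)[0] ?? "";
  const classAttr = lang ? ` class="language-${escapeHtml(lang)}"` : "";
  return `<pre><code${classAttr}>${escapeHtml(content)}</code></pre>`;
}

// Usage: code that contains markup stays inert text inside the block.
const fenceHtml = renderFence("js", "</code><script>");
```

The ampersand is replaced first so that the entities produced by the later replacements are not escaped a second time.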
[0285] The beneficial effects of the technical solution provided in this application include:
[0286] (1) Performance improvement (main thread decoupling): CPU-intensive tasks such as Markdown parsing, tokenization, signature calculation, and diffing are migrated to Worker, which significantly reduces main thread blocking and makes input and scrolling smoother.
[0287] (2) Rendering range convergence (structural increment): Through segment-level patch, only the changed segments are updated in the DOM, and the generated segments are not destroyed and rebuilt, avoiding the jitter and flicker caused by full re-rendering.
[0288] (3) Controllable end-to-end latency (backpressure scheduling): single-flight combined with overwriting by the latest content avoids worker queue accumulation and keeps latency bounded even when the update frequency is high, solving the experience problem of "the more it generates, the slower it becomes".
[0289] (4) Enhanced frame stability (combined frame writing to DOM): RAF frame combining reduces layout / paint jitter caused by multiple DOM writes in the same frame, improving rendering stability and scrolling responsiveness.
[0290] (5) Repaint diffusion suppression (CSS containment): By using mechanisms such as contain: layout paint and BFC, the rendering impact range is isolated, reducing the probability of historical areas being repainted due to new additions at the tail.
[0291] (6) Correctness guarantee (full semantic parsing + incremental rendering): Worker ensures the semantic correctness of Markdown based on "full content parse" (context inference without relying on delta), while implementing incremental updates with structure diff, taking into account both correctness and performance.
[0292] (7) Stability and recoverability: In the event of message switching, abnormal parsing or disconnection, the system can reset and re-render the latest content to ensure system availability; in weak network / reconnection scenarios, the system can continue to add content without destroying historical content.
[0293] (8) Engineering feasibility and extensibility: The solution is based on standard Web Workers and markdown-it, which makes it easy to integrate into existing React projects; it can also be extended to support complex blocks (Mermaid / charts / interruption questions) and enhanced capabilities such as lazy highlighting.
[0294] It should be understood that although the steps in the flowcharts of the embodiments described above are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the embodiments described above may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages of other steps.
[0295] Based on the same inventive concept, this application also provides a streaming content rendering apparatus for implementing the streaming content rendering method described above. The solution provided by this apparatus is similar to the implementation described in the above method; therefore, the specific limitations in one or more streaming content rendering apparatus embodiments provided below can be found in the limitations of the streaming content rendering method described above, and will not be repeated here.
[0296] In one embodiment, as shown in Figure 11, a streaming content rendering apparatus is provided, comprising: an acquisition module 1102, a processing module 1104, a generation module 1106, a conversion module 1108, and a rendering module 1110, wherein:
[0297] The acquisition module 1102 is used to acquire text information for updating streaming content.
[0298] The processing module 1104 is used to split the text information to obtain an incremental update unit sequence.
[0299] The generation module 1106 is used to generate a signature digest of each incremental update unit based on each incremental update unit in the incremental update unit sequence that is used to independently update the streaming content; and to generate an incremental update patch based on the signature digest of each incremental update unit.
[0300] The conversion module 1108 is used to convert the incremental update patch into a document object model for rendering the streaming content.
[0301] The rendering module 1110 is used to perform incremental rendering of the streaming content based on the document object model.
[0302] In one embodiment, the text information is rich text description content; the apparatus further includes: an end module, configured to end the process when the second thread is detected to be in a non-idle state by the first thread; a sending module, configured to send an update request for updating the streaming content to the second thread when the second thread is detected to be in an idle state by the first thread; the update request carries the full amount of the rich text description content and an update identifier for marking the streaming content; the processing module is further configured to, when the second thread receives the update request, perform splitting processing on the rich text description content based on the update identifier by the second thread to obtain an incremental update unit sequence.
[0303] In one embodiment, the processing module is further configured to parse the rich text description content based on the update identifier through the second thread to obtain a structured tag sequence, hierarchical information, and nesting information; the hierarchical information and the nesting information are used to reflect the hierarchical structural relationship between the tags contained in the tag sequence; and the tag sequence is split based on the hierarchical information and the nesting information to obtain an incremental update unit sequence.
[0304] In one embodiment, the processing module is further configured to split the marker sequence based on the hierarchical information and the nesting information to obtain a block set; the generation module is further configured to generate an incremental update unit sequence based on the block set; each incremental update unit in the incremental update unit sequence is a block-level semantic unit.
[0305] In one embodiment, the processing module is further configured to perform a linear scan of the marker sequence; when a first marker is scanned, and the identifier of the first marker is a start identifier, the hierarchical information of the first marker is a preset value, and the nesting information of the first marker is a first value, the first marker is used as the starting point of a block; when a second marker is scanned, and the identifier of the second marker is an end identifier, the second marker is used as the ending point of a block, thus obtaining a block; or, when a third marker is scanned, and the nesting information of the third marker is a second value, the third marker is used as a block; or, when a fourth marker is scanned, and the fourth marker belongs to an abnormal structure, the fourth marker is used as a block; and the blocks obtained by the scan are combined into the block set.
[0306] In one embodiment, the streaming content is displayed on a session page; the acquisition module is further configured to acquire text information for updating the streaming content in response to an information input operation triggered on the session page; the device further includes: a display module, configured to display the streaming content in the session page in the form of a session message; wherein the streaming content includes at least text data and image data.
[0307] In one embodiment, the text information is rich text description content; the acquisition module is further configured to acquire the input text corresponding to the information input operation triggered in the session page in response to the information input operation; the processing module is further configured to process the input text through an AI model to obtain the rich text description content used to update the streaming content.
[0308] In one embodiment, the apparatus further includes: an extraction module for extracting key fields within each incremental update unit; and a mixing module for mixing the key fields into a rolling hash of a preset number of bits in a preset order to obtain a text digest; wherein the text digest is the signature digest; and the signature digest is used to quickly locate the change range of the streaming content.
[0309] In one embodiment, the apparatus further includes: a verification module for verifying the correctness of the signature digest and obtaining a verification result; and a generation module for generating an incremental update patch based on the hierarchical relationship between the signature digest and each of the incremental update units, provided that the verification result indicates that the verification has passed.
[0310] In one embodiment, the acquisition module is further configured to acquire, for each incremental update unit, the signature digest of the current incremental update unit and the signature digest of the previous incremental update unit through a second thread; the apparatus further includes: a comparison module, configured to compare the signature digest of the current incremental update unit and the signature digest of the previous incremental update unit through the second thread to obtain a comparison result; the generation module is further configured to generate an incremental update patch based on the comparison result.
[0311] In one embodiment, the apparatus further includes: a scanning module, configured to perform prefix and suffix scanning on the signature digest of the current incremental update unit and the signature digest of the previous incremental update unit to obtain the head difference position of the forward scan and the tail difference position of the reverse scan; based on the head difference position and the tail difference position, to obtain a description result for representing deleted or inserted content; the description result is the comparison result.
[0312] In one embodiment, the conversion module is further configured to convert the incremental update patch into a document object model for rendering the streaming content via a second thread; the apparatus further includes: a sending module configured to send the document object model to a first thread via the second thread, so that the first thread performs incremental rendering of the streaming content based on the document object model.
[0313] In one embodiment, the apparatus further includes: an adding module, configured to add the document object model to the callback function of the target interface via the first thread; and an execution module, configured to, upon detecting a call request from the target interface, perform an operation to write the document object model through the target interface to obtain the updated content of the streaming content.
[0314] In one embodiment, the apparatus further includes: a caching module, configured to cache at least two document object models when the number of document object models is not a preset value and at least two document object models belong to the same frame content; an acquisition module is further configured to acquire a target document object model from the at least two document object models; and a processing module is further configured to delete a preset number of nodes at the starting position and insert new nodes based on the description in the target document object model to obtain the updated content of the streaming content.
[0315] The modules in the aforementioned streaming content rendering apparatus can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in hardware within or independently of the processor in a computer device, or stored in software within the memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
[0316] In one embodiment, a computer device is provided, which may be a terminal or a server. In this embodiment, the computer device is described as a terminal, and its internal structure may be as shown in Figure 12. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory, and the input/output interface are connected via a system bus, and the communication interface, the display unit, and the input device are also connected to the system bus via the input/output interface. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The input/output interface is used for exchanging information between the processor and external devices. The communication interface is used for wired or wireless communication with external terminals; wireless communication can be achieved through Wi-Fi, mobile cellular networks, NFC (Near Field Communication), or other technologies. When the computer program is executed by the processor, it implements a method for rendering streaming content. The display unit of the computer device is used to form a visible image and can be a display screen, a projection device, or a virtual reality imaging device. The display screen can be an LCD screen or an e-ink screen. The input device of the computer device can be a touch layer covering the display screen; a button, trackball, or touchpad provided on the casing of the computer device; or an external keyboard, touchpad, or mouse.
[0317] Those skilled in the art will understand that the structure shown in Figure 12 is merely a block diagram of the part of the structure related to the present application and does not limit the computer device to which the present application is applied. A specific computer device may include more or fewer components than those shown in the figure, combine certain components, or have a different arrangement of components.
[0318] In one embodiment, a computer device is provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps in the above-described method embodiments.
[0319] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon that, when executed by a processor, implements the steps in the above method embodiments.
[0320] In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps in the above method embodiments.
[0321] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.
[0322] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments described above. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided in this application may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, etc., and are not limited to these.
[0323] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0324] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.
Claims
1. A method for rendering streaming content, characterized in that the method includes: obtaining text information used to update the streaming content; splitting the text information to obtain an incremental update unit sequence; generating a signature digest of each incremental update unit based on each incremental update unit in the incremental update unit sequence that is used to independently update the streaming content; generating an incremental update patch based on the signature digest of each incremental update unit; converting the incremental update patch into a document object model for rendering the streaming content; and performing incremental rendering of the streaming content based on the document object model.
2. The method according to claim 1, characterized in that the text information is rich text description content; after obtaining the text information used to update the streaming content, the method further includes: terminating the process if a first thread detects that a second thread is not in an idle state; and, if the first thread detects that the second thread is in an idle state, sending to the second thread an update request for updating the streaming content, the update request carrying the full rich text description content and an update identifier for marking the streaming content; and the step of splitting the text information to obtain an incremental update unit sequence includes: when the second thread receives the update request, splitting the rich text description content based on the update identifier to obtain the incremental update unit sequence.
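For illustration only, the idle check recited above can be sketched as a small dispatcher running on the first thread. The class name `UpdateDispatcher`, the `send` callback, and the request fields are assumptions; how the second thread reports completion is likewise assumed.

```python
from typing import Callable


class UpdateDispatcher:
    """Runs on the first thread and forwards updates only to an idle worker."""

    def __init__(self, send: Callable[[dict], None]):
        self._send = send          # hypothetical channel to the second thread
        self._worker_idle = True

    def submit(self, update_id: str, full_text: str) -> bool:
        if not self._worker_idle:
            return False           # second thread busy: drop this update
        self._worker_idle = False
        # The request carries the full text, not just the newest fragment.
        self._send({"update_id": update_id, "text": full_text})
        return True

    def on_worker_finished(self) -> None:
        # Assumed to be called when the second thread reports completion.
        self._worker_idle = True
```

Because each request carries the full rich text description content, a dropped update loses nothing: the next request that is accepted already contains everything the dropped one did.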
3. The method according to claim 2, characterized in that the step of splitting the rich text description content based on the update identifier through the second thread to obtain an incremental update unit sequence includes: parsing, by the second thread, the rich text description content based on the update identifier to obtain a structured marker sequence, hierarchical information, and nesting information, the hierarchical information and the nesting information reflecting the hierarchical structural relationship between the markers contained in the marker sequence; and splitting the marker sequence based on the hierarchical information and the nesting information to obtain the incremental update unit sequence.
4. The method according to claim 3, characterized in that the step of splitting the marker sequence based on the hierarchical information and the nesting information to obtain the incremental update unit sequence includes: splitting the marker sequence based on the hierarchical information and the nesting information to obtain a block set; and generating the incremental update unit sequence based on the block set, each incremental update unit in the incremental update unit sequence being a block-level semantic unit.
5. The method according to claim 4, characterized in that the step of splitting the marker sequence based on the hierarchical information and the nesting information to obtain a block set includes: performing a linear scan on the marker sequence; when a first marker is scanned, the identifier of the first marker is a start identifier, the hierarchical information of the first marker is a preset value, and the nesting information of the first marker is a first value, taking the first marker as the starting point of a block, and, when a second marker is scanned and the identifier of the second marker is an end identifier, taking the second marker as the ending point of the block, thereby obtaining a block; or, when a third marker is scanned and the nesting information of the third marker is a second value, taking the third marker as a block; or, when a fourth marker is scanned and the fourth marker belongs to an abnormal structure, taking the fourth marker as a block; and combining the obtained blocks into the block set.
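One possible reading of this linear scan is sketched below, for illustration only. The marker fields are loosely modeled on the `level` and `nesting` token fields used by Markdown parsers such as markdown-it. The concrete values are assumptions: the preset value is taken as 0, the first value as 1, the second value as 0, and an end identifier as nesting -1 at the top level. `Marker` and `split_blocks` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    kind: str     # e.g. "paragraph_open", "inline", "fence"
    level: int    # hierarchical information
    nesting: int  # 1 opens a structure, 0 is self-contained, -1 closes one


TOP_LEVEL = 0  # assumed preset value for the hierarchical information


def split_blocks(markers):
    blocks = []
    current = None  # the block being collected, if any
    for m in markers:
        if current is not None:
            current.append(m)
            # A closing marker back at the top level ends the block.
            if m.nesting == -1 and m.level == TOP_LEVEL:
                blocks.append(current)
                current = None
        elif m.nesting == 1 and m.level == TOP_LEVEL:
            current = [m]  # a top-level opening marker starts a block
        else:
            # Self-contained markers and stray (abnormal) markers each
            # become a block of their own.
            blocks.append([m])
    if current is not None:
        blocks.append(current)  # block still open at the end of the stream
    return blocks
```

Keeping an unterminated block at the end of the scan is a design choice made here for streaming input, where the last block is often incomplete.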
6. The method according to claim 1, characterized in that the streaming content is displayed on a session page; the step of obtaining text information used to update the streaming content includes: in response to an information input operation triggered on the session page, obtaining the text information used to update the streaming content; and the method further includes: displaying the streaming content on the session page in the form of a session message, wherein the streaming content includes at least text data and image data.
7. The method according to claim 6, characterized in that the text information is rich text description content; and the step of obtaining, in response to an information input operation triggered on the session page, the text information used to update the streaming content includes: in response to the information input operation triggered on the session page, obtaining the input text corresponding to the information input operation; and processing the input text through an AI model to obtain the rich text description content used to update the streaming content.
8. The method according to claim 1, characterized in that the step of generating a signature digest of each incremental update unit based on each incremental update unit in the incremental update unit sequence that is used to independently update the streaming content includes: extracting key fields from each incremental update unit; and mixing the key fields, in a preset order, into a rolling hash of a preset number of bits to obtain a text digest, the text digest being the signature digest, wherein the signature digest is used to quickly locate the range of change in the streaming content.
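A minimal sketch of such a digest is given below, for illustration only. The choice of key fields (`type`, `level`, `content`), the 32-bit width, the multiplier 131, and the field separator are all assumptions; this application only requires a preset order and a preset number of bits.

```python
MASK = (1 << 32) - 1  # assumed preset number of bits: 32
BASE = 131            # assumed multiplier for the polynomial hash

KEY_FIELDS = ("type", "level", "content")  # assumed preset order


def mix(h: int, data: bytes) -> int:
    # Fold each byte into the running hash, keeping only the low 32 bits.
    for b in data:
        h = (h * BASE + b) & MASK
    return h


def signature_digest(unit: dict) -> int:
    h = 0
    for key in KEY_FIELDS:
        h = mix(h, str(unit.get(key, "")).encode("utf-8"))
        h = mix(h, b"\x1f")  # separator so adjacent fields do not run together
    return h
```

Two units with identical key fields always produce the same digest, so comparing integers is enough to skip unchanged blocks. Unequal digests prove a change, whereas equal digests indicate equality only up to hash collisions.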
9. The method according to claim 1, characterized in that the method further includes: verifying the correctness of the signature digest to obtain a verification result; and the step of generating an incremental update patch based on the signature digest of each incremental update unit includes: if the verification result indicates that the verification is successful, generating the incremental update patch based on the signature digest and the hierarchical relationship between the incremental update units.
10. The method according to claim 9, characterized in that the step of generating an incremental update patch based on the signature digest and the hierarchical relationship between the incremental update units includes: for each incremental update unit, obtaining, through the second thread, the signature digest of the current incremental update unit and the signature digest of the previous incremental update unit; comparing, through the second thread, the signature digest of the current incremental update unit with the signature digest of the previous incremental update unit to obtain a comparison result; and generating the incremental update patch based on the comparison result.
11. The method according to claim 10, characterized in that the step of comparing the signature digest of the current incremental update unit with the signature digest of the previous incremental update unit to obtain a comparison result includes: performing prefix and suffix scanning on the signature digest of the current incremental update unit and the signature digest of the previous incremental update unit to obtain the head difference position of the forward scan and the tail difference position of the reverse scan; and obtaining, based on the head difference position and the tail difference position, a description result representing deleted or inserted content, the description result being the comparison result.
12. The method according to claim 1, characterized in that the step of converting the incremental update patch into a document object model for rendering the streaming content includes: converting, via a second thread, the incremental update patch into the document object model for rendering the streaming content; and the step of performing incremental rendering of the streaming content based on the document object model includes: sending the document object model to a first thread via the second thread, so that the first thread performs incremental rendering of the streaming content based on the document object model.
13. The method according to claim 12, characterized in that the step of performing incremental rendering of the streaming content based on the document object model includes: adding, by the first thread, the document object model to the callback function of a target interface; and, when a call request for the target interface is detected, performing an operation to write the document object model through the target interface to obtain the updated content of the streaming content.
14. The method according to claim 1, characterized in that the step of performing incremental rendering of the streaming content based on the document object model includes: if the number of document object models is not a preset value and at least two document object models belong to the same frame content, caching the at least two document object models; obtaining a target document object model from the at least two document object models; and, based on the description in the target document object model, deleting a preset number of nodes at the starting position and inserting new nodes to obtain the updated content of the streaming content.
15. A rendering apparatus for streaming content, characterized in that the apparatus includes: an acquisition module, used to acquire text information for updating streaming content; a processing module, used to split the text information to obtain an incremental update unit sequence; a generation module, used to generate a signature digest of each incremental update unit based on each incremental update unit in the incremental update unit sequence that is used to independently update the streaming content, and to generate an incremental update patch based on the signature digest of each incremental update unit; a conversion module, used to convert the incremental update patch into a document object model for rendering the streaming content; and a rendering module, used to perform incremental rendering of the streaming content based on the document object model.
16. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, when the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 14.
17. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 14.
18. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 14.