Feature profile-based interpretation and interaction method, apparatus, device and medium

By constructing user profiles and combining them with natural language processing and visual generation technologies, the problem of users understanding complex text in health insurance underwriting systems has been solved, enabling personalized information presentation and interactive feedback, and improving the efficiency and accuracy of the underwriting process.

CN122196140A · Pending · Publication Date: 2026-06-12 · PING AN HEALTH INSURANCE CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: PING AN HEALTH INSURANCE CO LTD
Filing Date: 2026-04-14
Publication Date: 2026-06-12

Application Information

Patent Timeline

14 Apr 2026: Application
12 Jun 2026: Publication (CN122196140A)

IPC: G06F16/3329; G06F16/334; G06F16/335; G06F40/30; G06F40/205; G06F40/56; G06Q40/08

AI Tagging

Application Domain

Natural language translation; Digital data information retrieval

[Figure: CN122196140A abstract drawing]

Abstract

The application relates to the technical field of semantic parsing and discloses a feature-profile-based interpretation and interaction method, apparatus, device, and medium. The method comprises the following steps: acquiring source professional text data and historical interaction behavior data, constructing a target object feature profile, and screening associated business entities; extracting key business entities through a natural language processing model and converting the source professional text data into natural language explanation content; synthesizing corresponding dynamic visual demonstration content using a visual generation model; outputting the natural language explanation content and the dynamic visual demonstration content on an interaction terminal; and generating interaction feedback information in response to a real-time query instruction, based on the target object feature profile and the natural language explanation content. The application can be applied to business scenarios such as financial technology. Through semantic reconstruction, entity association, and visual synthesis, it achieves the collaborative presentation of text, visual, and interaction information, reduces the difficulty of understanding professional information, and improves information acquisition efficiency and the interaction experience.

Description

Technical Field

[0001] This invention relates to the field of semantic parsing technology, and in particular to a method, apparatus, device, and medium for interpretation and interaction based on feature profiling.

Background Technology

[0002] In the current fintech business, especially in online health insurance applications, users are presented with a large amount of textual content characterized by legal rigor and dense technical terminology, including explanations of insurance liabilities, health disclosure terms, and claims service instructions. This content is typically complex in structure, logically rigorous, and involves numerous interdisciplinary medical and legal concepts. Ordinary policyholders, lacking the relevant professional knowledge, often struggle to accurately understand the internal logic of the terms, the boundaries of exclusions, and the interrelationships between different clauses within a limited timeframe.

[0003] Most existing health insurance application systems use a uniform, static text display method, presenting complex terms and conditions with a fixed layout. This information presentation model lacks the ability to adapt to individual differences among policyholders, failing to personalize content organization and highlight key points based on users' different reading comprehension abilities, risk awareness levels, past health conditions, and focus areas. Policyholders are prone to comprehension difficulties due to information overload, potentially overlooking crucial disclosures closely related to their health conditions, or misinterpreting the specific meaning of important coverage areas and exclusions.

[0004] Furthermore, existing systems generally lack the ability to translate rigorous legal clauses into easily understandable everyday interpretive language, and also lack the technical means to assist users in understanding complex business processes (such as claims application steps and health declaration completion logic) through visualization. The entire insurance application process remains a one-way, passive information transmission model. When users encounter difficulties in understanding, they cannot obtain timely and targeted answers and guidance, resulting in insufficient accuracy in health declaration completion, low efficiency in insurance decision-making, and potential cognitive biases and disputes in subsequent claims services.

Summary of the Invention

[0005] The main objective of this invention is to provide a method, apparatus, device, and storage medium for interpretation and interaction based on feature profiling, aiming to solve the technical problem that existing technologies cannot perform semantic reconstruction, visualization, and interactive interpretation of complex professional text content based on user characteristics, resulting in significant obstacles for users in information understanding, key point identification, and real-time feedback acquisition.

[0006] To achieve the above objectives, the present invention provides an interpretation and interaction method based on feature profiling, comprising:
acquiring source professional text data for the business scenario to be processed, and retrieving historical interaction behavior data of the target object from the database;
performing feature analysis on the historical interaction behavior data to construct a target object feature profile, and filtering related business entities from the source professional text data based on the target object feature profile;
performing semantic analysis on the source professional text data using a natural language processing model to extract key business entities, and converting the source professional text data into natural language explanation content based on the key business entities and the related business entities;
synthesizing, based on the source professional text data and the key business entities, dynamic visual demonstration content corresponding to the natural language explanation content using a visual generation model;
outputting the natural language explanation content and the dynamic visual demonstration content on the interactive terminal; and
generating, based on the target object feature profile and the natural language explanation content, interactive feedback information for the real-time query command input at the interactive terminal.

[0007] Furthermore, to achieve the above objectives, the present invention provides an interpretation and interaction device based on feature profiling, comprising:
a data acquisition and parsing module, used to acquire source professional text data in the business scenario to be processed, and to retrieve historical interaction behavior data of the target object from the database;
a user profile building module, used to perform feature analysis on the historical interaction behavior data to build a target object feature profile, and to filter related business entities from the source professional text data based on the target object feature profile;
a semantic explanation generation module, used to perform semantic analysis on the source professional text data using a natural language processing model to extract key business entities, and to convert the source professional text data into natural language explanation content based on the key business entities and the related business entities;
a visual content generation module, used to synthesize, using a visual generation model, dynamic visual demonstration content corresponding to the natural language explanation content based on the source professional text data and the key business entities;
a multimodal display control module, used to output the natural language explanation content and the dynamic visual demonstration content on the interactive terminal; and
an intelligent interactive feedback module, used to generate interactive feedback information for real-time query commands input at the interactive terminal, based on the target object feature profile and the natural language explanation content.

[0008] Furthermore, to achieve the above objectives, the present invention also provides a computer device, the computer device including a memory, a processor, and a feature-based profile interpretation and interaction program stored in the memory and executable on the processor, wherein when the feature-based profile interpretation and interaction program is executed by the processor, it implements the steps of the feature-based profile interpretation and interaction method as described above.

[0009] Furthermore, to achieve the above objectives, the present invention also provides a computer-readable storage medium storing a feature-based profile interpretation and interaction program, wherein the feature-based profile interpretation and interaction program, when executed by a processor, implements the steps of the feature-based profile interpretation and interaction method as described above.

[0010] Beneficial Effects: This invention relates to the field of semantic parsing technology, and discloses a method, apparatus, device, and medium for explanation and interaction based on feature profiling. The method includes: acquiring source professional text data and historical interaction behavior data; constructing a feature profile of the target object and filtering related business entities; extracting key business entities through a natural language processing model; converting the source professional text data into natural language explanation content; synthesizing corresponding dynamic visual demonstration content using a visual generation model; outputting the natural language explanation content and dynamic visual demonstration content on an interactive terminal; and generating interactive feedback information for real-time query commands based on the target object feature profile and the natural language explanation content. This invention can be applied to business scenarios such as fintech, achieving the coordinated presentation of text, visual, and interactive information through semantic reconstruction, entity association, and visual synthesis, reducing the difficulty of understanding professional information, and improving information acquisition efficiency and interactive experience.

Attached Figure Description

[0011] The present invention will be further described below with reference to the accompanying drawings and embodiments. In the accompanying drawings:
Figure 1 is a schematic diagram of an application environment for the feature-profiling-based interpretation and interaction method in one embodiment of the present invention;
Figure 2 is a flowchart illustrating an embodiment of the feature-profiling-based interpretation and interaction method of the present invention;
Figure 3 is a schematic diagram of the functional modules of a preferred embodiment of the feature-profiling-based interpretation and interaction device of the present invention;
Figure 4 is a schematic diagram of the structure of a computer device according to an embodiment of the present invention;
Figure 5 is another structural schematic diagram of a computer device according to one embodiment of the present invention.

Detailed Implementation

[0012] It should be understood that the specific embodiments described herein are for illustrative purposes only and are not intended to limit the scope of the invention.

[0013] The feature-profiling-based interpretation and interaction method provided in this invention can be applied to an application environment such as the one shown in Figure 1, in which the client communicates with the server via a network. The server can obtain source professional text data and historical interaction behavior data from the client, construct a target object feature profile, and filter related business entities; extract key business entities through a natural language processing model and convert the source professional text data into natural language explanation content; synthesize corresponding dynamic visual demonstration content using a visual generation model; output the natural language explanation content and dynamic visual demonstration content on the interactive terminal; and generate interactive feedback information for real-time query commands based on the target object feature profile and the natural language explanation content. This invention can be applied to business scenarios such as fintech, achieving the collaborative presentation of text, visual, and interactive information through semantic reconstruction, entity association, and visual synthesis, reducing the difficulty of understanding professional information, and improving information acquisition efficiency and interactive experience. The client can be, but is not limited to, a personal computer, laptop, smartphone, tablet, or portable wearable device. The server can be implemented as a standalone server or as a server cluster consisting of multiple servers. The invention is described in detail below through specific embodiments.

[0014] Referring to Figure 2, Figure 2 is a flowchart illustrating an embodiment of the feature-profiling-based interpretation and interaction method provided by the present invention. It should be noted that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order.

[0015] As shown in Figure 2, the feature-profiling-based interpretation and interaction method proposed in this invention includes the following steps:

S10: Obtain source professional text data in the business scenario to be processed, and retrieve historical interaction behavior data of the target object from the database.

In this embodiment, obtaining source professional text data in the business scenario to be processed involves extracting text content that can be used for semantic processing from the interactive interface or document carrier carrying business information. The business scenario to be processed refers to the page environment used to display terms, explanations, and prompts in the insurance application process. The page environment includes the page structure, text node distribution, text node hierarchy, and dynamically loaded areas. Source professional text data refers to the text set extracted and organized from the page environment. The text set preserves the integrity of the business semantics and has reusable paragraph boundaries.

During implementation, the page's document object model (DOM) structure is parsed to locate text nodes containing business semantics. Text nodes are located based on tag position, hierarchical path, or node attributes. The located text nodes are traversed and extracted, and the extraction results include node text, node order, and node hierarchy. To prevent styles and scripts from polluting the text semantics, cascading style sheet code and script code are removed from the extraction results through tag filtering, attribute filtering, or text pattern matching. The result is a plain text character sequence that keeps the semantic expression of the original business content but contains no presentation-layer control characters.

The plain text character sequence is denoised and cleaned to remove redundant whitespace, duplicate segments, abnormal characters, and invalid placeholders. Denoising and cleaning may include character normalization, merging duplicate segments, and deleting invalid paragraphs. The cleaned plain text character sequence is then restructured into paragraphs to restore the business narrative structure. Paragraph restructuring is based on punctuation boundaries, line break boundaries, heading style tags, or node hierarchy relationships, and the output is source professional text data with paragraph boundaries.
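The extraction and cleaning steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the implementation described in the application. The class name `TextNodeExtractor`, the function `extract_source_text`, and the set of block-level tags are assumptions introduced here, and paragraph boundaries are approximated by block-level tags alone.

```python
from html.parser import HTMLParser
import re
import unicodedata


class TextNodeExtractor(HTMLParser):
    """Collects visible text nodes and skips <script>/<style> contents (hypothetical helper)."""

    SKIP = {"script", "style"}
    # Assumed set of tags that mark a paragraph boundary.
    BLOCK = {"p", "div", "li", "h1", "h2", "h3", "h4", "br", "section"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.chunks.append("\n")

    def handle_data(self, data):
        # Text inside <script> or <style> is dropped.
        if not self._skip_depth:
            self.chunks.append(data)


def extract_source_text(html):
    """Return cleaned, de-duplicated paragraphs in document order."""
    parser = TextNodeExtractor()
    parser.feed(html)
    parser.close()
    # Character normalization (e.g. full-width forms) before cleaning.
    raw = unicodedata.normalize("NFKC", "".join(parser.chunks))
    paragraphs = []
    for line in raw.split("\n"):
        line = re.sub(r"\s+", " ", line).strip()  # collapse redundant whitespace
        if line and line not in paragraphs:       # drop empty and duplicate segments
            paragraphs.append(line)
    return paragraphs
```

A production system would likely also handle dynamically loaded areas and use the node hierarchy when regrouping paragraphs, which this sketch does not attempt.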

[0016] Retrieving historical interaction data of a target object from the database involves restoring the target object's interaction records in business pages and business processes into an analyzable data set. The target object refers to the main entity initiating the business operation, which possesses a unique identifier on the system side. Historical interaction data refers to the set of records of browsing, clicking, and business input behaviors generated by the target object in past business sessions. This set includes event type, event occurrence time, event target location, and event-related parameters. During implementation, the target object's login session credentials are parsed to obtain a unique identifier. The login session credentials contain a session identifier, a token field, or a signature field. The unique identifier is obtained through field parsing, signature verification, or session mapping. A related data retrieval request is initiated to the database using the unique identifier, specifying the time range, event type range, and business scenario range. The database retrieval returns past browsing logs, function click logs, and business application records bound to the unique identifier. To ensure consistent usage across different log sources, formatting is performed on the past browsing logs, function click logs, and business application records. Formatting includes field normalization, event type standardization, and parameter field extraction. The formatted results undergo time-series alignment, which is based on timestamp sorting, session segmentation, and event deduplication to ensure that the event sequence within the same session maintains its true occurrence order. The output of formatting and time-series alignment constitutes historical interaction behavior data, which has a unified structure and can express the interaction trajectory of the target object in the business scenario.
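As a rough sketch of the formatting and time-series alignment described above, the following maps records from three log sources onto one event structure, then sorts, de-duplicates, and segments them into sessions. The field names (`time`, `page`, `button`, `form`), the `SOURCE_SCHEMA` mapping, and the 30-minute session gap are assumptions made for illustration; the application does not specify them.

```python
from datetime import datetime, timedelta

# Hypothetical mapping: log source -> (standard event type, field holding the target).
SOURCE_SCHEMA = {
    "browse": ("view", "page"),
    "click": ("click", "button"),
    "apply": ("submit", "form"),
}


def normalize(record, source):
    """Field normalization: map a source-specific record to a unified event dict."""
    event_type, target_field = SOURCE_SCHEMA[source]
    return {
        "ts": datetime.fromisoformat(record["time"]),
        "type": event_type,
        "target": record[target_field],
    }


def align(events, session_gap=timedelta(minutes=30)):
    """Sort by timestamp, drop exact duplicates, and split into sessions."""
    sessions, seen = [], set()
    for event in sorted(events, key=lambda e: e["ts"]):
        key = (event["ts"], event["type"], event["target"])
        if key in seen:  # event deduplication
            continue
        seen.add(key)
        # Start a new session when the gap to the previous event is too large.
        if not sessions or event["ts"] - sessions[-1][-1]["ts"] > session_gap:
            sessions.append([])
        sessions[-1].append(event)
    return sessions
```

Each inner list returned by `align` preserves the order in which events occurred within one session, which is the property the paragraph above requires of the historical interaction behavior data.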

[0017] This embodiment locates text nodes by parsing the page document object model structure and performs style and script removal, noise removal and cleaning, and paragraph reorganization, so that the source professional text data has semantic integrity and paragraph boundary consistency. By parsing login session credentials to obtain unique identifiers and performing database retrieval, formatting and time sequence alignment, the historical interaction behavior data has a unified structure and the actual occurrence order, reducing the sensitivity of subsequent processing to data noise and structural differences.

[0018] S20: Perform feature analysis on the historical interaction behavior data to construct a target object feature profile, and filter related business entities from the source professional text data based on the target object feature profile.

In this embodiment, feature analysis of historical interaction behavior data is used to transform scattered and heterogeneous interaction records into a structured expression that can characterize the stable preferences and attention orientations of the target object. Historical interaction behavior data comes from browsing logs, function click logs, and business application records. This data includes time dimensions, behavior types, behavior objects, and contextual parameters, reflecting the target object's actual operational trajectory in the business scenario. Feature analysis focuses on behavior frequency, behavior sequence, dwell time, repeated access patterns, and explicit selection results. By statistically aggregating different behavioral dimensions, data expressions reflecting the target object's attention intensity and decision-making tendency are obtained. To avoid bias caused by single behaviors, a preference weighting mechanism is introduced, assigning different weights to different behavior types, so that long-term stable behaviors occupy a higher proportion in the overall analysis. By combining the weighted multi-dimensional behavioral features, a target object feature profile is formed. The feature profile expresses the target object's differentiated characteristics in terms of business understanding ability, risk tolerance, and information focus in a structured form.
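One way to picture the preference weighting mechanism is the sketch below. It is an illustrative assumption, not the method specified by the application: the weights in `BEHAVIOR_WEIGHTS`, the event fields `type`, `topic`, and `day`, and the use of the number of distinct active days as a stand-in for "long-term stable behavior" are all hypothetical.

```python
from collections import defaultdict

# Hypothetical weights per behavior type; explicit submissions count most.
BEHAVIOR_WEIGHTS = {"view": 1.0, "click": 2.0, "submit": 4.0}


def build_feature_profile(events):
    """Aggregate weighted behaviors per topic into a normalized attention profile."""
    weighted = defaultdict(float)
    active_days = defaultdict(set)
    for event in events:
        topic = event["topic"]
        weighted[topic] += BEHAVIOR_WEIGHTS.get(event["type"], 0.0)
        active_days[topic].add(event["day"])
    # Topics revisited on more distinct days are boosted, so a single burst
    # of activity does not dominate the profile.
    raw = {t: weighted[t] * len(active_days[t]) for t in weighted}
    total = sum(raw.values())
    return {t: score / total for t, score in raw.items()} if total else {}
```

The result is a mapping from topic to a share of attention that sums to 1, which can then serve as the constraint when filtering text entities.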

[0019] After constructing the target object's feature profile, this profile is used to selectively filter the source professional text data. The source professional text data contains multiple business semantic units, each differing in business importance and comprehension difficulty. The process of filtering related business entities uses the target object's feature profile as a constraint, mapping the focus reflected in the profile to the text semantic space. Through semantic matching, weight mapping, or similarity judgment, text entities highly relevant to the target object's concerns are identified. Related business entities refer to text entities that have a direct impact on the target object's decision-making behavior in the current business scenario. These entities retain the original text's business semantics while also reflecting their relevance to the target object's individual characteristics. The filtering process ensures that the output related business entities maintain semantic consistency with the target object's feature profile, thus providing a targeted content foundation for subsequent explanation and interaction.
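The filtering step can be illustrated with a simple keyword-based relevance score. The paragraph above mentions semantic matching, weight mapping, or similarity judgment; the sketch below substitutes plain substring matching for those techniques, and the `TOPIC_KEYWORDS` lexicon and the threshold value are assumptions made for this example.

```python
# Hypothetical lexicon linking profile topics to surface keywords in the text.
TOPIC_KEYWORDS = {
    "pre_existing": {"pre-existing", "medical history"},
    "claims": {"claim", "reimbursement"},
    "waiting_period": {"waiting period"},
}


def relevance(paragraph, profile):
    """Sum the profile weights of every topic whose keywords occur in the paragraph."""
    text = paragraph.lower()
    return sum(
        weight
        for topic, weight in profile.items()
        if any(keyword in text for keyword in TOPIC_KEYWORDS.get(topic, ()))
    )


def select_related(paragraphs, profile, threshold=0.25):
    """Keep paragraphs whose relevance meets the threshold, most relevant first."""
    scored = [(relevance(p, profile), p) for p in paragraphs]
    kept = [(score, p) for score, p in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept
```

In a fuller system the substring test would probably be replaced by embedding similarity, but the structure (score each semantic unit against the profile, then threshold) would stay the same.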

[0020] This embodiment performs multi-dimensional feature analysis on historical interaction data and constructs a feature profile of the target object, so that individual preferences and focus are stably expressed. Then, based on the feature profile, business entities in the source professional text data are filtered, which can reduce the interference of irrelevant information and enable subsequent processing to focus on the business content most relevant to the target object's decision-making, thereby improving the targeting and effectiveness of information matching.

[0021] S30: Use a natural language processing model to perform semantic analysis on the source professional text data to extract key business entities, and convert the source professional text data into natural language explanation content based on the key business entities and the related business entities.

In this embodiment, a natural language processing (NLP) model is used to perform semantic analysis on the source professional text data. The aim is to transform the originally tightly structured, terminologically dense, and logically nested text content into a semantic expression that can be understood by the computing system. To handle the complex, terminology-heavy text content in insurance clauses, the system employs an NLP model based on the Transformer architecture. The core task of this model is twofold: first, it needs to deeply understand the semantics of the clause text, automatically identifying and extracting key business entities such as "insurance amount," "exclusions," "list of specific hospitals," and "waiting period"; second, it needs to be able to use these extracted entities, together with the related entities selected through user-specific feature analysis, as guidance to transform the originally obscure legal provisions into a coherent and easily understood conversational explanation.

[0022] The model uses its encoder to parse the entire clause text using a self-attention mechanism, grasping the logical relationship between clauses such as "exclusions" and "insurance liabilities," as well as the medical implications of various inquiries in the "health declaration." A dedicated entity recognition module acts like a highlighter, annotating all important business concepts and their types in the text. When an explanation is needed, the model's decoder begins to work. It not only relies on the full-text context understood by the encoder but also pays special attention to "relevant entities" highly relevant to the current user (e.g., a clause on "pre-existing conditions" that a user with a specific medical history would be interested in), thus selectively adjusting the focus and level of detail in the generated explanatory text.

[0023] To enable the model to accurately handle the specificities of the insurance field, its training process includes a large amount of professional corpus such as insurance contracts, claims cases, and health declarations, familiarizing it with the precise meanings of professional terms such as "cash value," "premium waiver," and "specific drug list." Furthermore, the system integrates a knowledge graph of the insurance field to assist in disambiguation during entity recognition and ensure that the stated coverage, exclusions, and other factual relationships are strictly consistent with the original intent of the terms when generating interpretations, avoiding any misleading information.

[0024] In practice, the model's input consists of cleaned and standardized insurance policy text, and a list of related entities derived from user profiling. Its output comprises two parts: a structured list of key business entities, indicating the location and category of each entity in the original text; and a generated, complete natural language explanation. For example, for a complex policy containing coverage for "proton and heavy ion therapy," the model can not only extract the key entity "proton and heavy ion therapy," but also, considering the user's potential focus on "advanced therapy coverage," generate the following explanation: "This insurance covers advanced cancer radiotherapy technology—proton and heavy ion therapy. This means that if you unfortunately suffer from an applicable disease as stipulated in the contract, reasonable expenses related to receiving this specific treatment can be reimbursed within the scope of insurance liability." In this way, the model transforms professional insurance language into intuitive information that is easily understood and used by users, effectively improving the efficiency and experience of the insurance application process.

[0025] Source professional text data typically consists of clauses, limiting statements, conditional descriptions, and result explanations. The text contains numerous technical terms and complex semantic relationships, which can create comprehension barriers if used directly for interactive presentation. Semantic analysis revolves around text segmentation, entity boundary recognition, syntactic relation parsing, and contextual semantic modeling. By identifying the logical dependencies between terms in the text, candidate term sequences with semantic labels are obtained.

[0026] The extraction of key business entities relies on matching candidate term sequences with a predefined domain knowledge structure. The candidate term sequences contain terms describing different categories such as business objects, conditional constraints, scope of responsibility, and operational results. These terms are aligned with standard terms in the domain knowledge structure through semantic mapping to identify key business entities that hold a core semantic position in the current business scenario. Key business entities are the units in the text that carry the most business meaning; they preserve the original semantics while having structured features that make subsequent reorganization and expression easier.
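The alignment of candidate terms with a domain knowledge structure might look like the sketch below. It uses an alias table plus fuzzy string matching from the standard library's `difflib` as a simple stand-in for the semantic mapping described above. The contents of `DOMAIN_TERMS` and `ALIASES`, and the cutoff value, are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical domain knowledge structure: standard term -> category.
DOMAIN_TERMS = {
    "insurance amount": "scope_of_responsibility",
    "exclusions": "conditional_constraint",
    "waiting period": "conditional_constraint",
}
# Hypothetical synonyms mapped onto standard terms.
ALIASES = {"sum insured": "insurance amount", "exclusion clause": "exclusions"}


def align_term(candidate, cutoff=0.85):
    """Map a candidate term to (standard term, category), or None if nothing matches."""
    term = " ".join(candidate.lower().split())  # normalize case and whitespace
    term = ALIASES.get(term, term)
    match = get_close_matches(term, list(DOMAIN_TERMS), n=1, cutoff=cutoff)
    if not match:
        return None
    return match[0], DOMAIN_TERMS[match[0]]


def extract_key_entities(candidates):
    """Align each candidate and return the distinct key entities in input order."""
    seen, entities = set(), []
    for candidate in candidates:
        aligned = align_term(candidate)
        if aligned and aligned[0] not in seen:
            seen.add(aligned[0])
            entities.append(aligned)
    return entities
```

Character-level fuzzy matching only tolerates spelling variation; matching on meaning, as the application describes, would need embeddings or the knowledge graph mentioned in paragraph [0023].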

[0027] After obtaining the key business entities, related business entities obtained from the previous screening process are introduced. These related business entities embody the focus of the target object's feature profile in the textual semantic space, while the key business entities embody the core semantics of the text itself; the two are fused at the semantic level. This fusion process is achieved by assigning attention weights to different entities, reflecting the importance of each entity in the interpretation and expression. Based on these attention weights, the source professional text data is weighted and encoded to form an encoded input sequence. This encoded input sequence preserves the original semantic logic of the text while highlighting semantic content highly relevant to the target object.
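A minimal sketch of the fusion step follows, assuming each entity carries a text-side score (how central it is to the clause) and a profile-side score (how relevant it is to the user). The blending factor `alpha`, the softmax normalization, and the substring-based sentence weighting are illustrative choices, not details given in the application.

```python
import math


def fuse_attention(key_scores, related_scores, alpha=0.5):
    """Blend text-side and profile-side scores, then normalize with a softmax."""
    entities = sorted(set(key_scores) | set(related_scores))
    if not entities:
        return {}
    logits = {
        e: alpha * key_scores.get(e, 0.0) + (1 - alpha) * related_scores.get(e, 0.0)
        for e in entities
    }
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {e: math.exp(v - peak) for e, v in logits.items()}
    total = sum(exps.values())
    return {e: v / total for e, v in exps.items()}


def weight_sentences(sentences, weights):
    """Attach to each sentence the summed weight of the entities it mentions."""
    weighted = []
    for sentence in sentences:
        lowered = sentence.lower()
        score = sum(w for entity, w in weights.items() if entity in lowered)
        weighted.append((sentence, score))
    return weighted
```

An entity that is both a key business entity and a related business entity ends up with the largest weight, so sentences mentioning it are emphasized in the encoded input.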

[0028] After the encoded input sequence is fed into the natural language processing model, the model's semantic decoding capability transforms the structured semantic expression into natural language explanations that conform to everyday language habits. While maintaining the accuracy of the business meaning, the natural language explanations de-emphasize technical jargon and enhance semantic coherence and readability, transforming complex text into easily understandable explanations.

[0029] This embodiment extracts key business entities through semantic analysis, performs attention-weighted encoding in conjunction with related business entities, and then uses a natural language processing model for semantic decoding. This transforms professional text into easily understandable explanatory expressions while maintaining business accuracy, thereby reducing the understanding threshold caused by technical terms and improving the clarity and readability of information transmission.

[0030] S40: Based on the source professional text data and the key business entities, use a visual generation model to synthesize dynamic visual demonstration content corresponding to the natural language explanation content.

In this embodiment, visual content is generated based on source professional text data and key business entities. The aim is to transform abstract semantic interpretations into visual expressions with spatial morphology and temporal variation characteristics. The source professional text data contains semantic elements such as business object relationships, conditional constraints, and behavioral results. Key business entities represent the core objects and logical nodes among these semantic elements.

[0031] To transform complex policy terms into intuitive, dynamic visual presentations, the system employs a multi-module collaborative visual generation model. The core task of this model is to automatically create narrative-driven animation or video content that strictly corresponds to the original insurance policy text, the extracted key business entities, and the previously generated natural language interpretations. Its implementation relies on a composite architecture integrating generative adversarial networks, temporal prediction models, and a rendering engine. It primarily includes core modules for script parsing, scene and character modeling, sequence generation, and multimodal synchronization. These modules work together in sequence through clearly defined data interfaces.

[0032] The model's operation begins with the script parsing and structuring module. This module receives natural language interpretations and, through a Transformer-based visual language understanding layer, deconstructs the text description into a structured representation containing entities, actions, spatial relationships, and temporal order. For example, for the description "submit claim materials," the module identifies the action (submission), the subject (user), the object (materials), the implicit scene (insurance counter), and the action's temporal position within the overall process. The output is a machine-executable dynamic storyboard, which defines in detail the visual elements, character behaviors, shot transitions, and duration of each shot.

[0033] Next, the scene and character modeling module, based on the aforementioned script and combined with richer contextual details from the source professional text data (such as the architectural features of "Level II and above public hospitals") and the specific objects pointed to by key business entities (such as the style of "insurance claim application form"), creates or invokes visual elements. This module contains a conditional generative network (such as StyleGAN or a latent diffusion model), trained on a large number of insurance business scene images, UI animations, and healthcare-related visual materials. This network can generate credible visual assets that meet the stringent requirements of the insurance industry based on text descriptions. For example, when a "hospital" scene is needed, it can generate or match a generic yet detailed 3D model or 2D background of a hospital lobby from the material library; when a "claims adjuster" role is needed, it can generate a cartoon or realistic character model in professional attire. The module outputs a series of visual material objects with key attribute tags that match the script requirements.

[0034] Then, the animation sequence generation and rendering module is responsible for compositing the storyboard and visual footage into a continuous dynamic sequence. At the core of this module is a timing generation model (such as a Transformer-based video prediction model or a controllable animation compositing engine). Following the timeline of the storyboard, it drives the character models to perform the actions specified in the script (such as walking or delivering documents), controls smooth scene transitions, and generates intermediate frames to ensure fluidity. For example, in a demonstration of "from diagnosis to filing a claim," this module generates a continuous frame sequence showing the patient moving from the hospital scene to the insurance company scene. Simultaneously, a rendering submodule adds lighting, textures, and basic visual effects to these sequences, outputting a high-quality continuous visual image sequence.

[0035] Finally, the multimodal temporal alignment and synthesis module ensures precise synchronization between visual content and audio. This module receives a digital speech stream generated from the natural language interpretation content, along with its detailed phoneme timestamp information. Using a timeline mapping algorithm, this module adjusts the display time of each frame in the continuous visual image sequence, ensuring that visual events (such as displaying specific clause text or highlighting key entities) are perfectly aligned with the corresponding words and phrases in the speech narration. The adjusted image sequence and audio stream are then fed into a media container, ultimately synthesized and output as a dynamic visual presentation content file (such as MP4 format) that can be directly played on web pages or mobile devices.

[0036] To achieve deep integration with the insurance sector, the training set for the entire visual generation model includes a large amount of video-text pair data consisting of insurance business processes, medical and health science animations, and compliant financial product promotional videos. This ensures that the generated visual expressions are accurate in content (such as correctly displaying policy document styles) and professional and appropriate in tone. During the inference phase, the model can also be guided by a lightweight insurance domain knowledge constraint, which, based on rules or a small discriminative network, performs compliance checks on the generated keyframes to prevent misleading visual expressions.

[0037] Before visual generation, the natural language explanation content needs to undergo semantic segmentation and action semantic recognition. Semantic segmentation breaks down continuous explanation statements into multiple content units with independent expressive meaning, while action semantic recognition identifies implicit behavioral relationships, state changes, and causal relationships within the explanation statements. This process generates a storyboard, which includes scene descriptions, object relationships, temporal sequence, and action instructions.

[0038] Key business entities are used to identify the core representation objects in the visual scene. Based on the specific business objects represented by these key business entities, corresponding virtual character models and environment models are constructed. The virtual character model embodies the concrete form of the business object, while the environment model embodies the semantic context in which the business occurs; together, they form the spatial basis of visual expression. Conditional relationships, limiting descriptions, and logical sequences in the source professional text data are used to determine the switching methods and display order between different scenes.

[0039] After the storyboard, virtual character model, and environment model are input into the visual generation model, the model generates a continuous sequence of visual images frame by frame according to the semantic instructions in the storyboard. The continuous sequence of visual images reflects the changes in logical relationships, state evolution, and condition-triggered results between key business entities.

[0040] The natural language interpretation is converted into a digital speech stream by a speech synthesis module. This digital speech stream contains time-dimensional phoneme markers. These phoneme markers represent the precise position of each speech segment on the timeline. The continuous visual image sequence is then time-aligned based on these phoneme markers, ensuring that visual changes are synchronized with the speech interpretation. The synchronized visual image sequence is then fused with the digital speech stream to create dynamic visual presentation content that possesses temporal continuity, semantic consistency, and auditory responsiveness.
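
The time alignment described above can be illustrated with a minimal sketch. It assumes the speech synthesis module exposes per-token start and end times in seconds and that each visual event names the narrated token it should coincide with. The names PhonemeMark, VisualEvent, align_events, and the 25 fps frame rate are hypothetical and are not specified by this embodiment.

```python
from dataclasses import dataclass


@dataclass
class PhonemeMark:
    token: str    # narrated word or phrase (assumed granularity)
    start: float  # start time in seconds
    end: float    # end time in seconds


@dataclass
class VisualEvent:
    name: str     # e.g. a highlight or a scene switch
    token: str    # narrated token this event should coincide with


def align_events(events, marks, fps=25):
    """Return (event name, start_frame, end_frame) for every event whose token is narrated."""
    first_mark = {}
    for mark in marks:
        # Keep the first occurrence of each token on the timeline.
        first_mark.setdefault(mark.token, mark)
    schedule = []
    for event in events:
        mark = first_mark.get(event.token)
        if mark is None:
            continue  # no narration for this event; leave it unscheduled
        start_frame = round(mark.start * fps)
        end_frame = max(start_frame + 1, round(mark.end * fps))
        schedule.append((event.name, start_frame, end_frame))
    return schedule
```

The resulting frame ranges could then be used to stretch or hold frames of the continuous visual image sequence before the sequence is fused with the digital speech stream.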

[0041] This embodiment drives the construction of visual scenes through key business entities, combines semantic segmentation to form a storyboard, and then uses a visual generation model to generate continuous images and synchronize them with the voice explanation. This transforms abstract text explanations into dynamic visual expressions with spatial form and temporal changes, thereby making complex information easier to understand, more intuitive, and more memorable.

[0042] S50, output the natural language explanation content and the dynamic visual demonstration content to the interactive terminal. In this embodiment, the interactive terminal outputs the natural language explanation content and the dynamic visual demonstration content. The aim is to present the generated text and visual expressions in a unified human-computer interaction format, so that the two types of content are coordinated in both timing and spatial layout. The interactive terminal provides a display interface and supports interactive controls. The display interface shows both the text display area and the visual demonstration area, while the interactive controls implement unified control actions.

[0043] The text display area presents natural language explanations, while the visual presentation area presents dynamic visual presentations. Media playback components are deployed within the visual presentation area to drive playback, pause, and progress adjustments of the dynamic visual presentations. The natural language explanations are presented in a scrollable format within the text display area, giving the text time-based controllability.

[0044] A synchronized mapping is established between the text scrolling timeline and the media playback timeline. The text scrolling timeline represents the scrolling progress of the natural language explanation content during the presentation, while the media playback timeline represents the playback progress of the dynamic visual presentation content. This synchronized mapping ensures that the text scrolling progress and the visual playback progress remain consistent.
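
One possible form of this synchronized mapping is a piecewise-linear function from media playback time to text scroll offset. The sketch below is only an illustration: it assumes each explanation segment has a known playback interval and a known pixel range in the text display area, and the function name scroll_offset_for_time and the tuple layout are hypothetical.

```python
def scroll_offset_for_time(t, segments):
    """Map playback time t (seconds) to a scroll offset (pixels).

    segments: list of (start_s, end_s, top_px, bottom_px), sorted by start_s
    and assumed contiguous, one entry per explanation segment.
    """
    if not segments:
        return 0.0
    for start, end, top, bottom in segments:
        if start <= t < end:
            # Interpolate linearly inside the segment.
            fraction = (t - start) / (end - start)
            return top + fraction * (bottom - top)
    if t < segments[0][0]:
        return float(segments[0][2])   # before the first segment: stay at the top
    return float(segments[-1][3])      # after the last segment: stay at the bottom
```

With such a mapping, a unified playback control only needs to publish the current playback time; the text display area derives its scroll position from it, so the two timelines cannot drift apart.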

[0045] A unified playback control is deployed in the display interface to simultaneously control both text scrolling and visual playback progress. After receiving user commands, the unified playback control transmits control signals to both the text display area and the media playback component, ensuring that both change synchronously under simultaneous control.

[0046] Text segments corresponding to key business entities in the natural language interpretation content are identified, and interactive markers are added to these text segments. Interactive markers are used to respond to user actions. When a user triggers an interactive marker, the dynamic visual presentation content in the visual presentation area performs visual focusing, making the key business entities stand out in the visual display.

[0047] This implementation achieves consistency between text explanation and visual presentation at the interaction level through the collaborative presentation of text display area and visual demonstration area, and through timeline synchronization and unified control, thereby enhancing the linkage and comprehensibility of multimodal information.

[0048] S60, based on the target object feature profile and the natural language interpretation content, generate interactive feedback information for the real-time query command input by the interactive terminal.

[0049] In this embodiment, based on the target object feature profile and natural language explanation, interactive feedback information is generated for the real-time query command input by the interactive terminal. The aim is to achieve dynamic understanding and personalized response based on the existing explanation content. After receiving the real-time query command, the interactive terminal performs semantic encoding processing on the real-time query command, converting the query content in natural language form into a computable query intent vector. The query intent vector is used to express the position of the query content in the semantic space, enabling the query content to be matched with existing semantic information.

[0050] The natural language explanation content is segmented into semantic segments during the generation process, with each segment corresponding to a relatively complete semantic unit of explanation. The query intent vector is matched with each semantic segment for similarity; semantic segments whose matching results meet a threshold condition are identified as the response knowledge context. This response knowledge context serves as the basis for the response, avoiding the need to re-retrieve from external knowledge sources and directly locating the most relevant semantic fragments within the already generated explanation content.
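
The matching step can be sketched as follows, assuming the query intent vector and the semantic segment vectors come from the same encoder and that cosine similarity is the matching measure (this paragraph does not name one). The function names and the threshold value 0.6 are illustrative assumptions.

```python
import math


def cosine(a, b, eps=1e-12):
    """Cosine similarity with a small floor on the denominator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, eps)


def select_response_context(query_vec, segments, threshold=0.6):
    """Return segment texts whose similarity to the query meets the threshold.

    segments: list of (text, vector) pairs produced when the explanation
    content was generated. Results are ordered from most to least similar.
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in segments]
    hits = [pair for pair in scored if pair[0] >= threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in hits]
```

Because the candidates are limited to segments of the already generated explanation, the lookup is a linear scan over a small list and requires no external retrieval.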

[0051] The target object feature profile includes user preference tags, which reflect the user's preferences for expression style, information density, and focus. Personalized response style constraint parameters are constructed based on these user preference tags. These parameters constrain the expression of subsequently generated content, ensuring that the generated content conforms to the user's habits of understanding.

[0052] The query intent vector, the response knowledge context, and personalized response style constraint parameters are jointly input into the dialogue generation model. Based on semantic understanding, the dialogue generation model combines style constraints to generate text and output interactive feedback information. This feedback information maintains consistency with the original explanation while also reflecting the personalized expression embodied in the user's profile.

[0053] This embodiment uses semantic matching and personalized style constraints so that the interactive feedback information is grounded in the existing explanation content and adapted to the user's characteristics, thereby improving the accuracy and personalization of responses.

[0054] In one embodiment, step S10 includes: S101, parse the document object model structure of the front-end page of the business scenario to be processed in order to locate the text node containing business terms information; S102, traverse the text nodes and remove cascading style sheet code and script code to extract plain text character sequences, and perform noise removal and paragraph reorganization on the plain text character sequences to form source professional text data; S103, parse the login session credentials of the target object to obtain a unique identifier, and use the unique identifier to initiate a related data retrieval request to the database; S104, retrieve from the database the past browsing logs, function click logs and business application records bound to the unique identifier; S105, the past browsing logs, function click logs and business application records are formatted and time-series aligned to generate historical interaction behavior data.

[0055] In this embodiment, the document object model (DOM) structure of the front-end page of the business scenario to be processed is parsed to locate text nodes containing business terms information. The DOM structure here expresses the hierarchical relationship of page elements, node types, and node attributes. The parsing process includes obtaining the node tree after page rendering, reading node tag types, extracting node attribute fields, and identifying the relative position of nodes in the parent-child hierarchy. When locating text nodes containing business terms information, this information refers to a collection of content carried in text form on the page, such as coverage conditions, liability descriptions, exclusions, and service descriptions. A text node is a node unit that carries text content. The location process includes filtering text-carrying nodes based on node type, filtering nodes containing terms identifiers based on node attributes, filtering nodes located in the terms area based on node path, and filtering nodes that can be displayed to the target object based on node visibility. Through the above parsing and location, subsequent processing focuses on the set of text nodes corresponding to the business terms information, reducing interference from irrelevant page elements in text extraction.

[0056] The process involves traversing text nodes and removing Cascading Style Sheet (CSS) and script code to extract plain text character sequences. These sequences are then denoised, cleaned, and restructured to form source professional text data. The traversal of text nodes involves reading the text content and associated attributes of each node individually. Removing CSS and script code eliminates the pollution of text content caused by display style definitions and interactive scripts. Removal actions include filtering embedded style fragments in nodes, filtering script fragments in node attributes, removing markers and control characters from the text content, and removing invisible characters introduced by rendering. The plain text character sequence is a continuous set of characters extracted and concatenated from multiple text nodes. The concatenation rules can be determined based on the order of nodes in the Document Object Model (DOM) structure, the layout order of nodes on the page, and the clause segmentation identifiers of the nodes. Denoising and cleaning improve the parsability and consistency of the plain text character sequence, including standardizing whitespace, merging duplicate punctuation, standardizing line breaks and indentation, replacing abnormally encoded characters, and cleaning up advertising prompts or non-clause prompts. Paragraph reorganization is used to restore the semantic and structural boundaries of clauses, transforming plain text character sequences into a set of paragraphs usable for subsequent semantic analysis. Paragraph reorganization actions include segmenting paragraphs based on heading patterns, numbering patterns, semantic pauses, and merging or splitting adjacent sentence groups based on keyword anchors. 
The source professional text data is a structured text expression after paragraph reorganization, which can simultaneously preserve the semantic order of clauses and paragraph boundaries, providing stable input for subsequent entity extraction and interpretation generation.
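
A minimal sketch of steps S101 and S102 follows, using only the Python standard library. It assumes the rendered page is available as an HTML string and that clause paragraphs start with a numbering pattern such as "1." or "2.3"; the class name ClauseTextExtractor and the function extract_paragraphs are hypothetical, and a real page would also need the attribute, path, and visibility filters described above.

```python
import re
from html.parser import HTMLParser


class ClauseTextExtractor(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> content."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        text = re.sub(r"\s+", " ", data).strip()  # standardize whitespace
        if text:
            self.chunks.append(text)


def extract_paragraphs(html):
    """Return clause paragraphs, starting a new one at each numbering pattern."""
    parser = ClauseTextExtractor()
    parser.feed(html)
    parser.close()
    paragraphs = []
    for chunk in parser.chunks:
        starts_clause = re.match(r"^\d+(\.\d+)*[.)]?\s", chunk) is not None
        if starts_clause or not paragraphs:
            paragraphs.append(chunk)
        else:
            paragraphs[-1] += " " + chunk  # merge continuation text
    return paragraphs
```

Only the numbering-pattern rule is shown here; the heading patterns, semantic pauses, and keyword anchors mentioned above would be additional segmentation rules.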

[0057] The process involves parsing the login session credentials of the target object to obtain a unique identifier, and then using this unique identifier to initiate a related data retrieval request to the database. The login session credentials are credential data generated during the session between the interactive terminal and the server. The parsing process includes verifying the credential format, decoding the credential payload fields, extracting the identifier fields bound to the account or session, and identifying the validity period and signature information of the identifier fields. The unique identifier is a data identifier that can uniquely locate the target object in the database. It can be in the form of an internal account identifier, a session-bound identifier, or a device-bound identifier. The acquisition process must ensure a consistent mapping relationship between the identifier across different sessions and terminals. The related data retrieval request submits the unique identifier as a search condition to the database. The retrieval request must include at least an identifier field, log type filtering conditions, and time range filtering conditions. The time range is used to constrain the statistical window of historical interaction behavior data, and the log type is used to distinguish different data sources such as past browsing logs, function click logs, and business application records, ensuring the traceability and aggregability of the retrieval results.
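
For illustration only, the sketch below assumes the login session credential is a three-part, HMAC-SHA256-signed token in the style of a JSON Web Token, with the identifier in a "sub" field and the expiry in an "exp" field. The embodiment does not specify the credential format, so the token layout, the field names, and the functions parse_credential and build_retrieval_request are all assumptions.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(text):
    # Restore the padding that URL-safe tokens usually strip.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def parse_credential(token, secret, now=None):
    """Verify format, signature, and validity period; return the unique identifier."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed credential")
    header_b64, payload_b64, signature_b64 = parts
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(payload_b64))
    now = time.time() if now is None else now
    if payload.get("exp", 0) <= now:
        raise ValueError("credential expired")
    return payload["sub"]


def build_retrieval_request(unique_id, log_types, start, end):
    """Assemble the minimum fields named above: identifier, log types, time range."""
    return {"unique_id": unique_id, "log_types": list(log_types), "time_range": (start, end)}
```

The request dictionary is a neutral placeholder; in practice it would be translated into whatever query language the database uses.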

[0058] The database retrieves past browsing logs, function click logs, and business application records bound to a unique identifier. The database serves as a persistent storage environment for this interactive behavior data. The retrieval process includes index lookup based on the unique identifier, filtering matching records by log type, cropping the record set by time range, and filtering records consistent with the business scenario to be processed by business scenario identifier. Past browsing logs characterize the target object's reading trajectory of page content, with fields including page identifier, dwell time, scroll position, and access time. Function click logs characterize the target object's operational intent on the interactive terminal, with fields including control identifier, number of clicks, click time, and control's region. Business application records characterize the target object's filling and submission behavior of business elements, with fields including application field identifier, fill value type, submission status, and submission time. By retrieving these three types of records, historical interactive behavior data covers three dimensions: reading behavior, operational behavior, and submission behavior, facilitating subsequent feature analysis to form a characteristic profile of the target object.

[0059] Past browsing logs, function click logs, and business application records are formatted and time-series aligned to generate historical interaction behavior data. Formatting converts the field structures and data types of records from different sources into a unified representation. This conversion includes field naming standardization, field type standardization, enumeration value mapping, missing field filling, and outlier removal. Time-series alignment maps time information from different log sources to a unified timeline, avoiding sequence errors caused by differences in timestamp precision, time zone differences, or collection delays. Time-series alignment includes unifying time formats, correcting time zone offsets, time clipping based on session boundaries, and aggregation based on time windows. The historical interaction behavior data is a collection of formatted and time-series aligned behavior data. Its data structure can simultaneously express behavior type, behavior object, behavior time, and behavior intensity, providing a clear data foundation for subsequent multi-dimensional statistics and preference-weighted processing.
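
The formatting and time-series alignment can be sketched as below. The sketch assumes timestamps arrive either as epoch seconds or as ISO 8601 strings, and that timestamps without a zone are already in UTC. The field alias table and the function names are hypothetical examples, not fields defined by this embodiment.

```python
from datetime import datetime, timezone

# Hypothetical source-specific field names mapped to a unified schema.
FIELD_ALIASES = {
    "ts": "time", "timestamp": "time",
    "page_id": "object", "control_id": "object", "field_id": "object",
}


def to_utc(value):
    """Convert epoch seconds or an ISO 8601 string to an aware UTC datetime."""
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed.astimezone(timezone.utc)


def normalize(record, behavior_type):
    """Rename fields to the unified schema and correct the time zone offset."""
    event = {"type": behavior_type}
    for key, value in record.items():
        event[FIELD_ALIASES.get(key, key)] = value
    event["time"] = to_utc(event["time"])
    return event


def build_history(browse_logs, click_logs, application_records):
    """Merge the three sources onto one timeline, ordered by behavior time."""
    events = [normalize(r, "browse") for r in browse_logs]
    events += [normalize(r, "click") for r in click_logs]
    events += [normalize(r, "application") for r in application_records]
    return sorted(events, key=lambda e: e["time"])
```

Missing-field filling, outlier removal, session clipping, and window aggregation are omitted to keep the sketch short.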

[0060] This embodiment locates text nodes by parsing the document object model structure of the front-end page and performs plain text character sequence extraction, noise removal and cleaning, and paragraph reorganization to form source professional text data with clear semantic boundaries. It obtains a unique identifier by parsing the login session credentials and initiates a related data retrieval request to the database. It retrieves past browsing logs, function click logs, and business application records and performs formatting and time sequence alignment to generate historical interaction behavior data with a unified structure and consistent time. This makes the source professional text data and historical interaction behavior data stable and consistent in terms of content source, data structure, and time dimension, which can be used for subsequent processing.

[0061] In one embodiment, step S20 above includes: S201, perform multi-dimensional statistical analysis and preference weighting on the historical interaction behavior data to generate an anomaly tolerance feature vector and an attention distribution feature vector; S202, map the anomaly tolerance feature vector and the attention distribution feature vector to a preset user tag library, and aggregate the results to construct the target object feature profile; S203, perform topic semantic modeling and clause attribute decomposition on the source professional text data to obtain multiple independent candidate business entities containing attribute tags; S204, convert the target object feature profile into a profile vector representation, and convert each candidate business entity into an entity vector representation based on the attribute tags; S205, determine the cosine similarity between the profile vector representation and each entity vector representation; S206, take the candidate business entities whose cosine similarity exceeds the preset matching threshold as the associated business entities.

[0062] In this embodiment, historical interaction behavior data undergoes multi-dimensional statistical analysis and preference weighting to generate an anomaly tolerance feature vector and an attention distribution feature vector. This historical interaction behavior data comes from the formatted and time-series aligned results of past browsing logs, function click logs, and business application records. The multi-dimensional statistics cover four dimensions: behavior intensity, time, content, and result. The behavior intensity dimension characterizes metrics such as click frequency, scrolling amplitude, dwell time, and number of repeated views. The time dimension characterizes day-night distribution, relative time sequence within a session, and dwell position on key pages. The content dimension characterizes the identifiers of viewed clause paragraphs, clicked controls, and filled fields. The result dimension characterizes the application submission status, number of modifications, and withdrawal and re-entry records. Preference weighting transforms the multi-dimensional statistical results into comparable numerical contributions. Weights can be derived from the percentage of dwell time on different clause paragraphs, click density on different controls, and intensity of repeated modifications on different application fields. Alternatively, additional weight can be assigned to statistical items that fall within a pre-defined list of risk-sensitive paragraphs in the business scenario. The anomaly tolerance feature vector expresses the target object's tolerance for information formats such as anomaly prompts, rejection prompts, and supplementary inquiry prompts. Its dimensions can be composed of changes in dwell time after an anomaly prompt, return rate after an anomaly prompt, percentage of continued input after an anomaly prompt, and frequency of customer service inquiries after an anomaly prompt. Normalization is used to compress indicators with different units and scales into a unified numerical range. The attention distribution feature vector expresses the distribution structure of the content of interest to the target object. Its dimensions can be composed of the distribution over clause paragraph categories, health disclosure question categories, claims service description paragraphs, and fee description paragraphs. Categories can be sourced from page paragraph identifiers, a paragraph title thesaurus, and field identifier mapping tables. The distribution vector is obtained through count normalization or normalization based on dwell time.
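
As a small illustration of the two normalization steps, the sketch below builds a dwell-time-normalized attention distribution and applies min-max scaling to a single indicator. Min-max scaling is one common choice and is an assumption here, as are the function names and category labels.

```python
def attention_distribution(dwell_by_category, categories):
    """Normalize dwell time per category so that the vector sums to 1."""
    total = sum(dwell_by_category.get(c, 0.0) for c in categories)
    if total == 0:
        return [0.0] * len(categories)  # no observed dwell time
    return [dwell_by_category.get(c, 0.0) / total for c in categories]


def min_max(value, low, high):
    """Compress an indicator into [0, 1]; values outside the range are clipped."""
    if high == low:
        return 0.0
    return min(1.0, max(0.0, (value - low) / (high - low)))
```

For example, 30 seconds on exclusion paragraphs and 10 seconds on fee paragraphs, with no time on claims paragraphs, yields the distribution [0.75, 0.0, 0.25] over the categories exclusions, claims, and fees.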

[0063] The anomaly tolerance feature vector and the attention distribution feature vector are mapped to a pre-defined user tag library to aggregate and construct a feature profile of the target object. The user tag library provides a unified carrier for tag sets, tag definitions, and mapping rules. The tag set can include information preference tags, risk sensitivity tags, explanation granularity preference tags, interaction intensity tags, etc. The tag definition specifies the judgment conditions and value space of the tag, and the mapping rule specifies how the dimension value of the feature vector triggers the tag. The mapping action can use threshold mapping, interval mapping, or relative ranking mapping. Threshold mapping compares the vector dimension with a pre-defined threshold to output a Boolean tag or a level tag. Interval mapping maps the numerical interval into which the vector dimension falls to a discrete tag value. Relative ranking mapping maps the quantile of the target object in the group to a tag. The aggregated feature profile of the target object combines multiple tags and their confidence scores into a unified profile structure. The profile structure can include tag identifiers, tag values, confidence scores, time windows, and a set of evidence fields. The set of evidence fields records the statistical sources that trigger the tags, such as dwell time statistics, click density statistics, and field modification statistics, making the profile traceable and reusable in subsequent stages.
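
The three mapping actions can be sketched as simple functions. The tag names, boundary values, and function names below are hypothetical; the user tag library would supply the real definitions and thresholds.

```python
from bisect import bisect_right


def threshold_tag(value, threshold):
    """Threshold mapping: output a Boolean tag."""
    return value >= threshold


def interval_tag(value, edges, labels):
    """Interval mapping: edges are ascending and len(labels) == len(edges) + 1."""
    return labels[bisect_right(edges, value)]


def quantile_tag(value, population, labels):
    """Relative ranking mapping: label by the value's quantile within the group."""
    rank = sum(1 for v in population if v <= value) / len(population)
    index = min(int(rank * len(labels)), len(labels) - 1)
    return labels[index]


def profile_entry(tag_id, tag_value, confidence, evidence):
    """One entry of the aggregated profile, keeping evidence for traceability."""
    return {"tag": tag_id, "value": tag_value, "confidence": confidence, "evidence": evidence}
```

A profile is then a list of such entries, for example an explanation granularity tag valued "medium" with the dwell time statistic that triggered it recorded as evidence.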

[0064] The source professional text data undergoes topic semantic modeling and clause attribute decomposition to obtain multiple independent candidate business entities containing attribute labels. The source professional text data is obtained through denoising, cleaning, and paragraph reorganization. Topic semantic modeling extracts topic structure and topic boundaries from the paragraph set. Topic structure can be represented as topic vectors, topic keyword sets, or topic paragraph clusters. Topic boundaries define the scope of paragraphs covered by the same topic. Topic semantic modeling can be achieved through paragraph vector clustering, topic probability distribution estimation, or keyword co-occurrence mapping. Paragraph vectors can come from word embedding aggregation, sentence vector encoding, or paragraph-level encoding. Clustering strategies can be based on cosine distance or density clustering. Clause attribute decomposition breaks down the clause content into entityable attribute units. Attribute units can correspond to attributes such as scope of liability, exclusions, waiting period, deductible, payment conditions, material requirements, and timeliness. Decomposition can be based on title patterns, numbering patterns, key phrase patterns, and punctuation structure patterns. Candidate business entities are used to represent associative objects extracted from text. The independence of candidate business entities is manifested in their clear boundaries and unique identifiers. A candidate business entity can include an entity name, entity category, entity location, entity context fragments, and attribute tags. Attribute tags are used to express the set of attributes carried by the candidate business entity. Attribute tags are derived from the fusion of clause attribute decomposition results and topic semantic modeling results. 
For example, a candidate business entity may be located under the "Exclusions" topic and contain both "Exclusions" attribute tags and "Trigger Conditions" attribute tags.
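
A heavily simplified sketch of clause attribute decomposition based on key phrase patterns follows. It covers only the pattern-based route; topic semantic modeling is not shown. The pattern table, the entity fields, and the function candidate_entities are illustrative assumptions.

```python
import re

# Hypothetical key phrase patterns; a real system would load these from configuration.
ATTRIBUTE_PATTERNS = {
    "waiting_period": re.compile(r"\bwaiting period\b", re.IGNORECASE),
    "deductible": re.compile(r"\bdeductible\b", re.IGNORECASE),
    "exclusion": re.compile(r"\b(exclusions?|not covered)\b", re.IGNORECASE),
}


def candidate_entities(paragraphs):
    """Emit one candidate entity per paragraph that matches at least one pattern."""
    entities = []
    for position, text in enumerate(paragraphs):
        tags = sorted(name for name, pattern in ATTRIBUTE_PATTERNS.items() if pattern.search(text))
        if tags:
            entities.append({
                "id": f"entity-{position}",  # unique identifier
                "position": position,        # entity location in the paragraph set
                "context": text,             # entity context fragment
                "tags": tags,                # attribute tags
            })
    return entities
```

A paragraph that matches several patterns carries several attribute tags, which mirrors the "Exclusions" plus "Trigger Conditions" example above.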

[0065] The target object feature profile is converted into a profile vector representation, and each candidate business entity is converted into an entity vector representation based on attribute labels. The profile vector representation encodes the label values and confidence scores in the target object feature profile into numerical representations in a vector space. The encoding method can be label embedding lookup table combined with weighted summation of confidence scores, or mapping label values to dense vectors and then concatenating them to form a fixed-dimensional vector. Time window information can be used to assign different decay coefficients to historical and recent labels. The entity vector representation encodes candidate business entities into vector representations. The entity vector can consist of an entity name semantic vector, an entity context fragment semantic vector, and an attribute label vector. The attribute label vector is obtained by multi-hot encoding or embedding aggregation of the attribute label set. Attribute-based transformation actions are used to explicitly inject attribute constraints into entity vectors. The injection methods can be manifested as applying attribute gating weights to the entity semantic vector, embedding attribute tags and concatenating them with the entity semantic vector, or using different projection matrices for different attribute tags. This ensures that candidate business entities with the same name but different attributes remain distinguishable in the vector space. For example, if two candidate business entities related to payment have different attribute tags, such as one carrying a waiting period attribute tag and the other carrying a deductible attribute tag, the attribute components represented by the entity vectors will differ, thus affecting the subsequent similarity judgment.
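
Two of the encoding options named above can be sketched briefly: a confidence-weighted sum of tag embeddings for the profile vector, and concatenation of a semantic vector with a multi-hot attribute vector for the entity vector. The attribute vocabulary, the embedding values, and the function names are assumptions for illustration.

```python
# Hypothetical attribute vocabulary; the order fixes the multi-hot layout.
ATTRIBUTES = ["waiting_period", "deductible", "exclusion", "trigger_condition"]


def multi_hot(tags, vocabulary=ATTRIBUTES):
    """Encode an attribute tag set as a fixed-length 0/1 vector."""
    return [1.0 if name in tags else 0.0 for name in vocabulary]


def entity_vector(semantic_vec, tags):
    """Concatenate the entity's semantic vector with its attribute vector."""
    return list(semantic_vec) + multi_hot(tags)


def profile_vector(tag_embeddings, profile):
    """Weighted sum of tag embeddings.

    profile: list of (tag, confidence, decay), where decay is the time-window
    coefficient (1.0 for recent tags, smaller for older ones).
    """
    dim = len(next(iter(tag_embeddings.values())))
    out = [0.0] * dim
    for tag, confidence, decay in profile:
        weight = confidence * decay
        for i, x in enumerate(tag_embeddings[tag]):
            out[i] += weight * x
    return out
```

Two payment-related entities with the same semantic vector but different tags, one carrying waiting_period and the other deductible, produce different entity vectors, which is the distinguishability property described above. Comparing the two vectors by cosine similarity additionally requires that they share a dimension, for example through a learned projection, which this sketch does not cover.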

[0066] The cosine similarity between the profile vector representation and each entity vector representation is determined. Cosine similarity measures the directional consistency between the profile vector representation and the entity vector representation, reflecting the degree of consistency between profile preferences and entity semantics and attribute expressions. The determination process includes calculating the vector norm of the profile and entity vector representations, performing a dot product, and dividing the dot product by the norm product to obtain the similarity value. Numerical stabilization processing can be performed on the similarity value to avoid outliers caused by extremely small norms. To improve the interpretability of the similarity, the profile vector representation can be decomposed into a focus distribution component and an anomaly tolerance component, and the entity vector representation can be decomposed into a semantic component and an attribute component. The similarity of each component is calculated separately and then weighted and summed to obtain the final cosine similarity. The weighting coefficients can be derived from the tag importance configuration in the user tag library, ensuring that the similarity reflects not only semantic matching but also attribute matching and risk sensitivity matching.
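
The calculation can be sketched as below. The floor eps on the norm product is one simple form of numerical stabilization, and the pairing of profile components with entity components (for example, attention distribution with semantics, anomaly tolerance with attributes) is an assumption, since the paragraph does not fix which components are compared.

```python
import math


def cosine(a, b, eps=1e-8):
    """Dot product divided by the norm product, floored at eps for stability."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_product = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm_product, eps)


def weighted_similarity(pairs):
    """Combine per-component similarities.

    pairs: list of (profile_component, entity_component, weight). Weights would
    come from the tag importance configuration and are normalized here.
    """
    total_weight = sum(weight for _, _, weight in pairs)
    return sum(weight * cosine(p, e) for p, e, weight in pairs) / total_weight
```

For example, a perfectly matching component with weight 3 and an orthogonal component with weight 1 give a combined similarity of 0.75.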

[0067] Candidate business entities with a cosine similarity exceeding a preset matching threshold are designated as associated business entities. The preset matching threshold converts continuous similarity values into a filtering criterion. The threshold setting can depend on the business scenario; for example, a higher threshold can be used for health-related topics to reduce false associations, while a relatively lower threshold can be used for service description topics to improve coverage. The threshold can also be adjusted dynamically as the confidence level in the target object feature profile changes; for example, a stricter threshold can be used when the confidence level is high. The filtering process includes traversing the candidate business entity set, reading the corresponding cosine similarity, performing the threshold comparison, and writing the candidate business entities that meet the criterion into the associated business entity set. The set structure can retain entity identifiers, similarity values, and the set of triggered tag evidence fields, facilitating the reuse and traceability of associated business entities in subsequent processes. To prevent reliance on a single threshold from yielding too many or too few entities, parallel constraints can be introduced without changing the filtering logic. For example, an upper limit can be set on the number of associated business entities under each topic, and the list can be sorted by cosine similarity and truncated at that limit, keeping topic coverage and quantity control consistent.
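A minimal sketch of the filtering and per-topic truncation follows. It assumes each candidate is a dictionary with hypothetical `id`, `topic`, and `similarity` fields, and that topic-specific thresholds are supplied as a dictionary; none of these names come from this embodiment.

```python
def select_associated(candidates, topic_thresholds, default_threshold=0.5, max_per_topic=3):
    """Keep candidates above their topic threshold, capped per topic by similarity."""
    by_topic = {}
    for candidate in candidates:
        threshold = topic_thresholds.get(candidate["topic"], default_threshold)
        # "Exceeding" the threshold is read here as a strict comparison.
        if candidate["similarity"] > threshold:
            by_topic.setdefault(candidate["topic"], []).append(candidate)

    selected = []
    for items in by_topic.values():
        # Parallel constraint: keep only the top-ranked entities of each topic.
        items.sort(key=lambda c: c["similarity"], reverse=True)
        selected.extend(items[:max_per_topic])
    return selected
```

For example, a health-related topic could be given a threshold of 0.8 and a service description topic 0.4, so that the same similarity value passes under one topic and is rejected under the other.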

[0068] This embodiment performs multi-dimensional statistical analysis and preference weighting on historical interaction data to generate anomaly tolerance feature vectors and attention distribution feature vectors. Combined with user tag library mapping and aggregation, it constructs a target object feature profile, ensuring that the target object's preferences and sensitivities in reading, operation, and reporting behaviors are stably encoded and have traceable evidence. By performing topic semantic modeling and clause attribute decomposition on the source professional text data, candidate business entities containing attribute tags are obtained. The target object feature profile is then converted into a profile vector representation, and an entity vector representation is formed based on the attribute tags. Combined with cosine similarity and a preset matching threshold, related business entities are obtained through screening. This ensures that the screening results are simultaneously influenced by semantic information and clause attribute constraints, reducing mismatches caused by homonyms or inconsistent attributes, and improving the consistency and stability of related business entities and target object preferences.

[0069] In one embodiment, step S30 above includes: S301, performing word segmentation and entity boundary annotation on the source professional text data to obtain a candidate term sequence containing semantic tags; S302, mapping the candidate term sequence to a preset domain knowledge graph, and identifying the successfully matched terms as key business entities; S303, merging the associated business entities and the key business entities to form an explanatory entity set, and assigning a corresponding attention weight to each entity in the explanatory entity set; S304, performing weighted encoding on the source professional text data based on the attention weights to generate an encoded input sequence; S305, inputting the encoded input sequence into the natural language processing model; S306, decoding the encoded input sequence using the natural language processing model to generate natural language explanation content.

[0070] In this embodiment, the source professional text data undergoes word segmentation and entity boundary annotation to obtain a sequence of candidate terms containing semantic tags. Word segmentation divides continuous characters in the source professional text data into a computable sequence of lexical units. Segmentation is based on dictionary matching, sub-word unit decomposition, statistical language model scores, or sequence annotation results based on the encoder. The output format supports lexical text, lexical start and end positions, and lexical normalized form. Entity boundary annotation identifies the start and end points of business entities on the lexical sequence. The annotation granularity covers words, phrases, and cross-lexical combinations. The annotation can be implemented using the BIO or BILOU annotation system, with the sequence annotation model outputting a tag sequence, and then the entity span is obtained through tag merging. Semantic tags are used to express the semantic category and attribute indication of each candidate term in the candidate term sequence. The semantic category can cover clause liability, exceptions, payment conditions, time limit requirements, material requirements, cost-related items, etc., and the attribute indication can cover negation relations, condition trigger words, scope limiting words, comparison words, etc. The source of semantic tags can be the multi-task output of the entity boundary annotation model, or it can be the supplementary annotation after the candidate term sequence is matched with the category vocabulary and attribute trigger vocabulary, so that the candidate term sequence retains both structured boundary information and semantic constraint information that can be used for subsequent alignment and filtering.
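The tag merging step can be sketched as follows for the BIO scheme. This is an illustrative sketch that assumes tags of the form `B-<category>`, `I-<category>`, and `O`, with whitespace-joined lexical units; the category name in the usage note is hypothetical.

```python
def bio_to_spans(tokens, tags):
    """Merge a BIO tag sequence into (start, end, category, text) entity spans.

    `end` is exclusive. An I- tag that does not continue an open span of the
    same category is ignored rather than starting a new span.
    """
    spans = []
    start, category = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and category == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, category, " ".join(tokens[start:i])))
            start, category = None, None
        if tag.startswith("B-"):
            start, category = i, tag[2:]
    return spans
```

Applied to the lexical units `the / waiting / period / is / 90 / days` with tags `O, B-TIME_LIMIT, I-TIME_LIMIT, O, O, O`, the function returns a single span covering "waiting period".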

[0071] Candidate term sequences are mapped to a predefined domain knowledge graph. Successfully matched terms are identified as key business entities. The mapping process converts candidate terms in the sequence into searchable query representations within the domain knowledge graph. These representations can include standardized strings, extended sets of synonyms, sets of spelling variations, and semantic tag constraints. The domain knowledge graph provides structured knowledge carriers for entity nodes, attribute nodes, and relationship edges. Nodes can include canonical names, alias sets, definition text, scope of application, and constraint attributes. Relationship edges can include inclusion, subordination, mutual exclusion, conditional triggering, and time constraint relationships. Successful matching is determined by string similarity, alias matching, semantic tag consistency verification, and contextual consistency verification. String similarity can be achieved using edit distance or vector similarity. Semantic tag consistency verification filters candidate terms based on their semantic tags and the types of nodes in the domain knowledge graph. Contextual consistency verification reorders candidate nodes based on their proximity to other terms, syntactic dependencies, or paragraph topic information in the source professional text data. The formation of key business entities is based on successful matching results. In addition to entity identifiers, key business entities can also carry a set of evidence fields, which include the matched aliases, similarity scores, triggered semantic tags, and contextual fragment positions. This enables the subsequent generation process to trace the source of key business entities and use them as generation constraints.
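The matching logic can be illustrated with the sketch below, which assumes a toy in-memory graph and uses the standard-library `difflib.SequenceMatcher` ratio as the string similarity in place of edit distance or vector similarity. The graph contents, node identifiers, field names, and the `min_ratio` value are all hypothetical. Contextual consistency verification is omitted.

```python
from difflib import SequenceMatcher

# Hypothetical miniature domain knowledge graph: node id -> canonical name, aliases, type.
GRAPH = {
    "N001": {"name": "waiting period", "aliases": {"observation period"}, "type": "TIME_LIMIT"},
    "N002": {"name": "deductible", "aliases": {"excess"}, "type": "COST"},
}


def match_term(term, semantic_tag, graph, min_ratio=0.85):
    """Return the best-matching node with its evidence fields, or None."""
    best = None
    for node_id, node in graph.items():
        # Semantic tag consistency verification: node type must agree with the tag.
        if node["type"] != semantic_tag:
            continue
        for alias in {node["name"], *node["aliases"]}:
            ratio = SequenceMatcher(None, term.lower(), alias.lower()).ratio()
            if ratio >= min_ratio and (best is None or ratio > best["similarity"]):
                best = {
                    "node_id": node_id,
                    "matched_alias": alias,
                    "similarity": ratio,
                    "semantic_tag": semantic_tag,
                }
    return best
```

The returned dictionary plays the role of the evidence field set: it records which alias matched, the similarity score, and the semantic tag that triggered the match.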

[0072] Associated business entities and key business entities are merged to form an explanatory entity set, and each entity in the explanatory entity set is assigned a corresponding attention weight. This fusion unifies the two types of entities into the same entity space and under the same constraint perspective. The entity elements in the explanatory entity set are derived from the union of associated and key business entities, and duplicate references are eliminated through deduplication rules. These rules can be based on the same node identifier, the same alias equivalence class, or a high similarity threshold of entity vectors in the domain knowledge graph. The explanatory entity set can also incorporate inter-entity relationship information as a structured supplement. This relationship information comes from the relationship edges between key business entities and from the connection paths between associated and key business entities in the domain knowledge graph, and it is used to reflect logical dependencies and explanation order constraints in the subsequent attention allocation stage. Attention weights quantify the intensity of attention given to each entity when generating the natural language explanation content. The allocation process can employ a learnable attention scoring function whose inputs include the entity semantic representation, the entity type representation, the entity relationship representation, and the personalized preference signals corresponding to associated business entities. These personalized preference signals are derived from the screening results and the matching similarity distribution of the associated business entities. Attention weights can be normalized along the entity dimension to form a weight distribution, or they can be grouped and normalized by entity type to prevent a single type of entity from consuming an excessive share of the generation budget. Threshold pruning and smoothing strategies can also be applied to the attention weights to maintain the numerical stability of the weight distribution and to avoid extreme weights that lead to insufficient interpretation coverage.
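Grouped normalization with a simple smoothing step can be sketched as follows. The sketch assumes the attention scores have already been produced by some scoring function, applies a softmax within each entity type, and then imposes a lower bound `floor` before renormalizing. The floor is only one possible smoothing strategy, and all names are hypothetical.

```python
import math


def normalize_by_type(scores, entity_types, floor=0.05):
    """Softmax-normalize attention scores within each entity type.

    scores: dict entity -> raw attention score.
    entity_types: dict entity -> type name.
    Weights below `floor` are raised to `floor`, then each group is renormalized.
    """
    groups = {}
    for entity, type_name in entity_types.items():
        groups.setdefault(type_name, []).append(entity)

    weights = {}
    for members in groups.values():
        peak = max(scores[e] for e in members)  # subtract the max for stability
        exps = {e: math.exp(scores[e] - peak) for e in members}
        total = sum(exps.values())
        floored = {e: max(exps[e] / total, floor) for e in members}
        floored_total = sum(floored.values())
        for e in members:
            weights[e] = floored[e] / floored_total
    return weights
```

Because each type is normalized separately, an entity type with many high-scoring members cannot reduce the weights available to the other types.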

[0073] The source professional text data is weighted and encoded based on attention weights to generate an encoded input sequence. This weighted encoding injects the attention intensity of the explanatory entity set into the encoded representation of the source professional text data, enabling the encoded input sequence to simultaneously carry the original text semantics and entity guidance signals. In implementation, attention weights can be mapped to a set of lexical units aligned with the entity span at the lexical level. The alignment method is based on the span position obtained from entity boundary annotations, assigning attention weights corresponding to entities to lexical units within the span and applying background weights to lexical units outside the span. When a lexical unit is covered by multiple entity spans, a strategy prioritizing maximum weight, weighted summation, or relational constraints can be used for synthesis. Weighting methods can include applying weight gating to the lexical embedding vector, applying a bias term to the positional encoding, applying additional gain to entity-related key-value pairs in self-attention calculation, or encoding entity identifiers and attention weights as additional markers and inserting them into the encoded input sequence to form explicit control signals. The encoded input sequence consists of at least a word sequence and aligned weight information, and may also include a semantic tag sequence and entity relation markers. The semantic tag sequence is used to constrain the explanatory tone and key point extraction, while the entity relation markers are used to constrain the logical order of generation and dependency expression, so that the encoded input sequence has controllability and interpretability.
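The span-based alignment can be sketched as follows, using the maximum-weight strategy for lexical units covered by more than one entity span. The span format `(start, end, weight)` with an exclusive `end` and the background value are assumptions made for illustration.

```python
def token_weights(num_tokens, entity_spans, background=0.1):
    """Assign each lexical unit the attention weight of the entity span covering it.

    entity_spans: iterable of (start, end, weight), with `end` exclusive.
    Units outside every span keep the background weight; overlapping spans
    are resolved by taking the maximum weight.
    """
    weights = [background] * num_tokens
    for start, end, weight in entity_spans:
        for i in range(max(start, 0), min(end, num_tokens)):
            weights[i] = max(weights[i], weight)
    return weights
```

The resulting list is aligned one-to-one with the word sequence and can serve as the weight information carried by the encoded input sequence, for instance as a gating factor on lexical embedding vectors.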

[0074] The encoded input sequence is fed into the Natural Language Processing (NLP) model. The NLP model performs representation learning and sequence-to-sequence generation control on the encoded input sequence. The NLP model can be an encoder-decoder structure, a decoder-only structure, or a generative structure with retrieval enhancements. The input interface converts the encoded input sequence into a tensor representation that the model can accept. This tensor representation includes a token identifier tensor, a position tensor, a weight tensor, and an optional label tensor. During the input phase, length alignment and truncation strategies can be implemented. Length alignment controls the sequence length while ensuring coverage of key and related business entities. Truncation strategies can be selected based on paragraph priority or cumulative coverage based on attention weights to avoid truncating segments related to high-weight entities, which could lead to missing interpretations. Numerical validation can also be performed during the input phase. Validation includes checking the normalized range of the weight tensor, the valid range of the label tensor, and the closure of entity labels to ensure that the NLP model obtains a stable and consistent input representation.

[0075] The natural language processing (NLP) model decodes the encoded input sequence to generate natural language explanation content. Decoding is the process inside the model that generates the target sequence based on the semantic representation and weight constraints of the encoded input sequence. The generation of natural language explanation content can employ autoregressive generation combined with a constraint-based decoding strategy. This strategy can include key business entity coverage constraints, entity relationship expression constraints, and semantic label consistency constraints. Key business entity coverage constraints ensure that the output text contains the explanation points corresponding to key business entities. This can be achieved through forced inclusion of phrases, forced inclusion of synonym rewrite sets, or decoding bias guided by a coverage loss. Entity relationship expression constraints ensure that the output text reflects logical structures such as dependency, conditional, and mutual exclusion relationships. This can be achieved through soft constraints on relationship templates or attention alignment guided by relationship tags. Semantic label consistency constraints ensure that the output terminology matches the clause attributes, avoiding the rewriting of negative conditions as positive conditions or the loss of scope qualifiers. In addition to the main text, the natural language explanation content can also be accompanied by structured alignment information, such as the alignment mapping between explanation paragraphs and key business entities, and the contribution distribution between explanation sentences and attention weights. This facilitates reuse in the subsequent interactive terminal display and interactive feedback generation stages.

[0076] This embodiment obtains a candidate term sequence containing semantic tags through word segmentation and entity boundary annotation, enabling the entity span and semantic category in the source professional text data to be structurally expressed and directly participate in knowledge alignment. The candidate term sequence is mapped to a domain knowledge graph, and successfully matched terms are identified as key business entities, providing standardized identifiers and traceable evidence, thereby reducing the impact of homonyms and contextual ambiguities on interpretation generation. By fusing related business entities and key business entities to form an interpretation entity set and assigning attention weights, the generation process is simultaneously constrained by personalized preferences and professional knowledge points. Weighted encoding of the source professional text data based on attention weights generates an encoded input sequence, which is then input into a natural language processing model. The natural language processing model then decodes and generates natural language interpretation content, ensuring that the output text covers key business entities while maintaining semantic and attribute consistency with the source professional text data, reducing omissions of key points and deviations in logical relationship expression.

[0077] In one embodiment, step S40 above includes: S401, performing scene segmentation and action intent recognition on the natural language interpretation content to generate a storyboard script containing action instructions and scene descriptions; S402, constructing a matching virtual character model and background environment model based on the scene context provided by the source professional text data and the specific objects referred to by the key business entities; S403, inputting the storyboard script, the virtual character model, and the background environment model into the visual generation model, and generating, through the visual generation model, a continuous visual image sequence that presents the logical relationships of the key business entities; S404, converting the natural language interpretation content into a digital speech stream using the speech synthesis module, and extracting the phoneme timestamp information synchronized with the digital speech stream; S405, time-adjusting the continuous visual image sequence according to the phoneme timestamp information, and merging the time-adjusted continuous visual image sequence with the digital speech stream to generate dynamic visual demonstration content.

[0078] In this embodiment, source professional text data and key business entities are used to constrain the semantic and object boundaries of the dynamic visual presentation content. The source professional text data provides contextual information such as scene descriptions, terms and conditions, object attributes, and event sequences, while the key business entities provide the set of objects and object relationships to be presented. The visual generation model is used to convert textual semantics and object constraints into visual expressions. The model input includes storyboards, virtual character models, and background environment models, and may also include entity identifiers, entity attributes, entity relationship triples, and visualization mapping parameters of key business entities. The formation of dynamic visual presentation content uses natural language interpretation as the semantic target, key business entities as the presentation objects, and source professional text data as the scene basis, thereby maintaining semantic and object consistency in the generated results.

[0079] The natural language interpretation content is segmented into scenes and its action intent is recognized to generate storyboards. Scene segmentation divides the natural language interpretation content into a set of fragments with temporal continuity and semantic integrity. Fragment boundaries are derived from paragraph structure, time-triggered words, conditional trigger words, role-switching words, and logical transition words. The output includes scene fragment identifiers, fragment start and end positions, and fragment summary statements. Action intent recognition extracts visual action targets and action types from each scene fragment. Action targets come from key business entities and their related attribute phrases. Action types cover display, emphasis, contrast, focus, movement, appearance, disappearance, connection, separation, etc. Recognition can be achieved using sequence labeling models or intent classification models combined with semantic tags for constraints. The storyboard carries a set of executable visual instructions. The storyboard includes action instructions and scene descriptions. Action instructions include role identifiers, action types, action parameters, and duration. Scene descriptions include background elements, camera angles, camera movements, subtitle prompts, and key business entity association information. The storyboard may also include transition methods between scenes, such as fade-in / fade-out, wipe, switch, push-pull, and masking, to ensure the temporal continuity of the continuous visual image sequence.

[0080] The scene context provided by the source professional text data and the key business entities are used to construct the virtual character model and the background environment model. The scene context comes from the text range in the source professional text data corresponding to the scene fragments in the storyboard. The text range is jointly limited by the scene segmentation result and the span positions of the key business entities. The scene context includes location elements, environmental elements, interactive object elements, and constraint elements. The key business entities are used to determine the role type, appearance attributes, key features, and interactive parts of the virtual character model. The role type can cover the materialized expression of human roles, device roles, document roles, form roles, process node roles, and abstract concept roles. Appearance attributes can be derived from descriptive words in the source professional text data, entity attribute fields, or definition text of the domain knowledge graph. The construction of the virtual character model includes geometric shape generation, material texture generation, skeleton binding, and motion controller configuration. Geometric shapes can be retrieved from a template library or directly synthesized by a generative model. Material textures can be produced by texture generation networks or parametric material systems. Skeleton binding and motion controller configuration support the execution of action instructions in the storyboard. The background environment model is used to express the space and elements in the scene description. Its construction covers the scene layout, lighting parameters, camera parameters, and background element set. The layout can be generated by grid layout or constraint layout. For the lighting parameters, different lighting templates can be selected, with brightness and color temperature adjusted based on scene semantics. The camera parameters include viewpoint, focal length, depth of field, and path parameters, so as to match the camera movements.

[0081] The storyboard, virtual character model, and background environment model are input into the visual generation model to generate a continuous visual image sequence. The input structure can adopt multimodal conditional input, where the storyboard provides time-series conditions, and the virtual character model and background environment model provide 3D rendering conditions or structured prior conditions. The visual generation model can adopt a diffusion generation structure, a neural rendering structure, or a keyframe interpolation-based generation structure. The diffusion generation structure can generate keyframes in each storyboard segment and generate intermediate frames under temporal consistency constraints. The neural rendering structure can first generate basic rendering frames based on the virtual character model and background environment model and then perform stylization and detail completion. The keyframe interpolation structure can interpolate key poses under action instruction constraints and achieve appearance consistency through a generative network. The continuous visual image sequence needs to present the logical relationships of key business entities. The expression of logical relationships can be achieved through spatial relationship encoding and connection annotation encoding. Spatial relationship encoding maps the relationships between entities to relative positions, relative sizes, hierarchical occlusion, and color grouping. Connection annotation encoding maps conditional relationships, inclusion relationships, and mutual exclusion relationships to connection edge styles, arrow directions, annotation text, and highlighted areas. When the visual generation model outputs the continuous visual image sequence, consistency constraints can be introduced. These constraints cover consistency in character appearance, background, temporal smoothing, and entity identification. Entity identification consistency is achieved by maintaining the appearance features and identifier anchors of key business entities in each frame, thus preventing the same key business entity from changing form across frames.

[0082] The speech synthesis module converts the natural language interpretation content into a digital speech stream and extracts phoneme timestamp information. In doing so, it converts the text sequence into an audio sample sequence. The digital speech stream is characterized by its sampling rate, encoding format, and audio frame sequence. The synthesis process can employ an end-to-end speech synthesis network and supports parameters such as speech rate, pauses, stress, and intonation. Phoneme timestamp information expresses the start and end times at the phoneme or word level in the digital speech stream. Timestamps can originate from the alignment matrix within the speech synthesis module or be obtained through a forced alignment model that performs secondary alignment between the digital speech stream and the natural language interpretation content. The output includes phoneme identifiers, phoneme start and end times, and corresponding text positions. Phoneme timestamp information can be further aggregated into word timestamps, phrase timestamps, and sentence timestamps. Aggregation is based on word segmentation results and punctuation boundaries, supporting multi-granularity alignment for subsequent temporal adjustments.
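The aggregation from phoneme timestamps to word timestamps can be sketched as follows. The sketch assumes each phoneme record carries a hypothetical `word_index` field as its text position, together with `start` and `end` times in seconds; phrase and sentence aggregation would follow the same pattern with a different grouping key.

```python
def word_timestamps(phonemes):
    """Aggregate phoneme-level timestamps into one (start, end) interval per word.

    phonemes: iterable of dicts with 'word_index', 'start', and 'end' keys.
    Returns a list of dicts ordered by word_index.
    """
    words = {}
    for phoneme in phonemes:
        index = phoneme["word_index"]
        if index not in words:
            words[index] = {"word_index": index, "start": phoneme["start"], "end": phoneme["end"]}
        else:
            # A word spans from its earliest phoneme start to its latest phoneme end.
            words[index]["start"] = min(words[index]["start"], phoneme["start"])
            words[index]["end"] = max(words[index]["end"], phoneme["end"])
    return [words[i] for i in sorted(words)]
```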

[0083] Phoneme timestamp information is used to time-adjust continuous visual image sequences and merge them with digital speech streams to generate dynamic visual presentation content. Time-adjustment aligns the frame display time points of the continuous visual image sequence with the pronunciation progress of the digital speech stream. The adjustment objects include the frame timestamp sequence and the scene segmentation boundary time points. The implementation of time-adjustment includes timeline construction, alignment mapping generation, and frame resampling. Timeline construction uses phoneme timestamp information as a basis to form a speech timeline and the duration of action instructions from the storyboard script to form a visual action timeline. Alignment mapping generation establishes the mapping relationship between visual action segments and speech segments. The mapping is based on the correspondence between scene segments and sentence boundaries, the correspondence between action intent and semantic keywords, and the correspondence between the appearance position of key business entities and the mention position in the text. Frame resampling adjusts the frame display duration or inserts transition frames while maintaining the content of the continuous visual image sequence without distortion. Resampling can employ time stretching and time compression strategies while keeping keyframe anchor points from drifting. The merging process encapsulates the time-adjusted continuous visual image sequence and digital audio stream into a unified media container. The media container contains video and audio tracks and writes synchronization information. The output dynamic visual presentation content contains playable media files or a collection of media segments that can be streamed, while retaining an optional subtitle track for linkage with the text display area in the interactive terminal.
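Frame resampling by time stretching and compression can be sketched as a piecewise linear remapping of frame timestamps. The sketch assumes the alignment mapping has already paired each visual action segment with a speech segment, both given as `(start, end)` intervals in seconds; the function name and data layout are hypothetical, and the insertion of transition frames is not shown.

```python
def retime_frames(frame_times, visual_segments, speech_segments):
    """Map frame timestamps from the visual timeline onto the speech timeline.

    visual_segments[i] is stretched or compressed linearly to fit
    speech_segments[i]. Segment boundaries therefore land exactly on the
    corresponding speech boundaries, which keeps keyframe anchors from drifting.
    Frames outside every visual segment are left unchanged.
    """
    adjusted = []
    for t in frame_times:
        for (v_start, v_end), (s_start, s_end) in zip(visual_segments, speech_segments):
            if v_end > v_start and v_start <= t <= v_end:
                scale = (s_end - s_start) / (v_end - v_start)
                adjusted.append(s_start + (t - v_start) * scale)
                break
        else:
            adjusted.append(t)
    return adjusted
```

For instance, if a two-second visual segment must accompany three seconds of narration, a frame originally shown at 1.0 s is moved to 1.5 s.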

[0084] This embodiment generates a storyboard by segmenting the natural language explanation content into scenes and recognizing action intentions. This enables the visual generation model to obtain executable time-series instructions and scene descriptions, thereby transforming the semantic structure of the explanation text into a visual expression structure. By constructing virtual character models and background environment models based on the scene context provided by the source professional text data and the specific objects referred to by key business entities, the generation process is constrained by object boundaries and scene boundaries, reducing character drift and scene inconsistencies. By inputting the storyboard, virtual character model, and background environment model into the visual generation model, a continuous visual image sequence presenting the logical relationships of key business entities is generated. This allows the conditional relationships, inclusion relationships, and mutual exclusion relationships between key business entities to be visualized and maintain temporal consistency. The speech synthesis module generates a digital speech stream and extracts phoneme timestamp information. Then, based on the phoneme timestamp information, the timing is adjusted and the audio and video are merged. The dynamic visual demonstration content forms an aligned synchronous relationship between the speech progress and the image changes, reducing comprehension deviations caused by the disconnect between narration and demonstration.

[0085] In one embodiment, step S50 above includes: S501, creating a text display area and a visual demonstration area in the display interface of the interactive terminal, and configuring a media playback component in the visual demonstration area; S502, rendering the natural language interpretation content to the text display area, and loading the dynamic visual demonstration content into the media playback component; S503, establishing a synchronous mapping relationship between the text scrolling timeline of the text display area and the media playback timeline of the media playback component; S504, rendering a unified playback control in the display interface, and using the unified playback control to control the scrolling progress of the natural language interpretation content and the playback progress of the dynamic visual demonstration content; S505, identifying the text paragraph corresponding to the key business entity in the natural language interpretation content, and adding an interactive marker to the text paragraph, where the interactive marker is used to respond to user operations to trigger visual focus on the key business entity in the visual demonstration area.

[0086] In this embodiment, the interactive terminal serves as the runtime environment for presentation and interaction. The display interface of the interactive terminal provides a visual output carrier and can be a browser page, a native mobile interface, or a desktop window. The text display area presents natural language interpretation content. The boundaries of the text display area are defined by a layout container, which can use a linear layout, constrained layout, or grid layout to stably control text layout and scrolling behavior. The visual presentation area presents dynamic visual presentation content. The visual presentation area and the text display area are arranged side-by-side or vertically within the same display interface, with the distribution determined by the terminal screen size, resolution, and interaction preferences. A media playback component is configured in the visual presentation area. The media playback component is used to load, decode, and play dynamic visual presentation content. The media playback component can correspond to a video rendering view, an animation rendering canvas, or a multitrack media player instance. The media playback component includes a playback buffer, a decoder interface, a rendering clock, and a progress callback interface. The progress callback interface outputs the current time point and playback status of the media playback timeline.

[0087] Rendering natural language interpretation content to the text display area is a text-to-interface element mapping process. Natural language interpretation content can be segmented into paragraphs, sentences, or semantic units to form structured text unit sets. These structured text unit sets support segment-by-segment rendering, on-demand loading, and targeted highlighting. The rendering process includes font style selection, line spacing settings, paragraph spacing settings, line wrapping strategy configuration, and binding to a scrollable container. The scrollable container provides scroll offsets and scroll event callbacks to support the construction of the text scrolling timeline. Loading dynamic visual presentation content into the media playback component is a media resource binding process. The loading process includes media resource location, resource verification, pre-buffering, and decoding preparation. Media resource location can be accomplished through local file paths, content distribution addresses, or memory media stream handles. Resource verification includes media format verification and duration information reading. Pre-buffering reduces stuttering during initial playback. Decoding preparation establishes the rendering pipeline for the media playback component and initializes the media playback timeline.

[0088] The text scrolling timeline depicts the temporal progress and spatial relationship of the natural language interpretation content within the text display area. It can be defined by the layout height of the text paragraphs, the scroll offset of the scrollable container, and the scrolling speed parameter. The unit of the text scrolling timeline can be either a time unit or a normalized progress value. The media playback timeline depicts the playback progress of the dynamic visual presentation content within the media playback component. It is driven by the rendering clock of the media playback component and outputs the current playback time point. The synchronization mapping relationship establishes the correspondence between the text scrolling timeline and the media playback timeline. This relationship can include a set of key anchor points and an interpolation strategy. The key anchor point set binds the semantic segmentation boundaries in the natural language interpretation content to the scene segmentation boundaries in the dynamic visual presentation content. The interpolation strategy calculates the mapping value at any time point between the key anchor points. The generation of synchronous mapping relationships can be based on the semantic segmentation position of natural language interpretation content, the layout position of text paragraphs, the aggregation result of phoneme timestamp information of dynamic visual presentation content, or the chapter mark information of media playback components. The mapping output can be a bidirectional mapping table from text scrolling position to media playback time point, thereby supporting text-driven media positioning or media-driven text positioning.
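The anchor-point and interpolation scheme described above can be illustrated with a minimal sketch. It assumes piecewise-linear interpolation (the text does not fix a particular interpolation strategy), and the anchor values and the function names `interpolate`, `scroll_to_media`, and `media_to_scroll` are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right

def interpolate(anchors, x):
    """Map x through sorted (source, target) anchor pairs, linearly between anchors."""
    xs = [src for src, _ in anchors]
    if x <= xs[0]:
        return anchors[0][1]          # clamp before the first anchor
    if x >= xs[-1]:
        return anchors[-1][1]         # clamp after the last anchor
    i = bisect_right(xs, x) - 1       # index of the anchor at or before x
    (x0, y0), (x1, y1) = anchors[i], anchors[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Hypothetical key anchor points: (text scroll offset in px, media time in s).
# Each pair binds a semantic segment boundary to a scene boundary.
ANCHORS = [(0.0, 0.0), (400.0, 12.0), (1000.0, 30.0), (1600.0, 60.0)]

def scroll_to_media(offset):
    """Text-driven media positioning."""
    return interpolate(ANCHORS, offset)

def media_to_scroll(time_s):
    """Media-driven text positioning, using the same anchors in reverse."""
    return interpolate([(t, s) for s, t in ANCHORS], time_s)
```

Because the same anchor set is read in both directions, the mapping is bidirectional only when both coordinates increase strictly from one anchor to the next.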

[0089] The unified playback control provides a unified interactive entry point across regions. It can include control elements such as play, pause, drag, fast forward, rewind, and progress bars. These control elements share the same control state machine with the text display area and the visual presentation area. When using the unified playback control to control the scrolling progress of natural language explanation content and the playback progress of dynamic visual presentation content, the controlled objects include the text scrolling timeline and the media playback timeline. Control methods include direct control and mapped control. Direct control converts the input events of the unified playback control into text scroll offset updates or positioning calls to the media playback component. Mapped control, after updating any timeline, uses a synchronous mapping relationship to deduce the target position of the other timeline and execute a linked update. To avoid control jitter, a dejitter parameter and a threshold window can be introduced. Within the threshold window, the current scrolling or playback position remains unchanged. The dejitter parameter limits the number of linked updates per unit time. The state of the unified playback control needs to be consistent with the playback state of the media playback component. Playback states include playing, paused, buffering, and finished. State synchronization can be achieved through the progress callback interface and state callback interface of the media playback component.
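One possible reading of the threshold window and dejitter parameter is sketched below. The class name `LinkedUpdater`, the parameter names, and the choice to express the dejitter parameter as a maximum number of linked updates per second are assumptions made for illustration, not details given in the text.

```python
class LinkedUpdater:
    """Suppresses small or too-frequent linked updates between two timelines."""

    def __init__(self, threshold, max_updates_per_sec):
        self.threshold = threshold                     # threshold window, in target units
        self.min_interval = 1.0 / max_updates_per_sec  # dejitter: minimum spacing of updates
        self.position = 0.0                            # current position of the linked timeline
        self.last_update = float("-inf")

    def request(self, target, now):
        """Try to move the linked timeline to `target`; return True if it moved."""
        if abs(target - self.position) < self.threshold:
            return False                               # inside the window: hold position
        if now - self.last_update < self.min_interval:
            return False                               # rate limit reached: skip update
        self.position = target
        self.last_update = now
        return True
```

In mapped control, `target` would be the position deduced from the synchronization mapping relationship after the other timeline has been updated.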

[0090] Identifying the text paragraphs that correspond to key business entities in the natural language interpretation content establishes the text-to-object relationship. Text paragraphs can be paragraph units or semantic segmentation units within the natural language interpretation content. The identification can be completed through entity name matching, synonym matching, entity alias matching, and context disambiguation of key business entities. Context disambiguation distinguishes between entities with the same name or entities with multiple meanings. Disambiguation criteria can come from entity attributes in the domain knowledge graph or from neighboring word groups in the natural language interpretation content. Adding interactive tags to text paragraphs is an interactive element injection process. Interactive tags can take the form of highlight styles, underline styles, icon styles, or clickable hotspots. An interactive tag contains a key business entity identifier, a text paragraph position index, and interactive event binding information. The interactive event binding information includes event types such as click, long press, hover, or focus switching. Interactive tags respond to user operations and trigger visual focus on key business entities in the visual presentation area. Visual focus is a target object highlighting process in the visual presentation area. Highlighting can be achieved through lens center shifting, scaling, spotlight masking, color emphasis, or object outlining. The visual focus target is determined by the key business entity identifier carried by the interactive tag. The execution object of the visual focus is the current frame or the current scene within the media playback component. The execution process may include locating the time segment containing the key business entity, overlaying a highlight layer, and starting the focus animation. The location basis may come from the reverse mapping of the synchronization mapping relationship or from the entity time segment index built into the dynamic visual presentation content, thereby realizing object-level linkage between the text paragraph and the visual presentation area after the user triggers the interactive tag.
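The tag injection step can be sketched as follows. This is a simplified illustration that covers only name and alias matching by case-insensitive substring search; context disambiguation is not implemented. The `InteractiveTag` type, the `tag_paragraphs` function, and the alias table layout are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class InteractiveTag:
    entity_id: str        # key business entity identifier
    paragraph_index: int  # position index of the text paragraph
    events: tuple         # bound interaction event types

def tag_paragraphs(paragraphs, alias_table, events=("click", "hover")):
    """Attach one tag per (paragraph, entity) pair whose name or alias occurs in the text."""
    tags = []
    for idx, text in enumerate(paragraphs):
        lowered = text.lower()
        for entity_id, aliases in alias_table.items():
            if any(alias.lower() in lowered for alias in aliases):
                tags.append(InteractiveTag(entity_id, idx, events))
    return tags
```

When a tag is triggered, its `entity_id` is the value that would be passed to the visual presentation area to select the focus target.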

[0091] This embodiment creates a text display area and a visual presentation area in the display interface of the interactive terminal and configures a media playback component, so that the natural language explanation content and the dynamic visual presentation content are loaded in parallel and rendered independently. Rendering the natural language explanation content to the text display area and loading the dynamic visual presentation content into the media playback component gives the text and media presentation objects clear interface loading boundaries and playback control interfaces. Establishing a synchronization mapping relationship between the text scrolling timeline and the media playback timeline, combined with the unified playback control, realizes linked control and bidirectional positioning of the text scrolling progress and the media playback progress, reducing the time deviation caused when the user switches between the two types of content. Identifying the text paragraphs corresponding to key business entities and adding interactive tags allows the interactive tags to trigger visual focus on those entities in the visual presentation area, so that users can quickly direct their attention to the corresponding objects and segments in the dynamic visual presentation content while reading the natural language explanation content, improving information acquisition efficiency and interaction consistency.

[0092] In one embodiment, step S60 above includes: S601, receiving a real-time query command in response to an input request from the interactive terminal, and semantically encoding the real-time query command to generate a query intent vector; S602, determining the matching similarity between the query intent vector and each semantic segment in the natural language explanation content, and taking the semantic segments whose matching similarity meets the preset threshold as the response knowledge context; S603, parsing the user preference tags in the target object feature profile, and constructing personalized response style constraint parameters based on the user preference tags; S604, inputting the query intent vector, the response knowledge context, and the personalized response style constraint parameters into the pre-trained dialogue generation model to generate interactive feedback information.

[0093] In this embodiment, the real-time query command input by the interactive terminal is instantaneous text input data generated on the interactive terminal side. The real-time query command can originate from input box submission, text submission after speech-to-text transcription, or query phrase submission triggered by selecting an interactive marker. Receiving a real-time query command in response to an input request from the interactive terminal is part of the input event processing process. The input request can consist of a keyboard submission event, a button trigger event, or an interface call event. The receiving action can be completed through a front-end event listener, a terminal-side input service, or a session message queue. To maintain session continuity, when receiving a real-time query command, the session identifier, the interactive terminal device identifier, and the target object identifier can be simultaneously acquired. The target object identifier is used to associate the target object's feature profile, the interactive terminal device identifier is used to select the output format and presentation granularity, and the session identifier is used to retain the context scope and backtracking capability.

[0094] Semantic encoding of real-time query commands converts them into measurable and matchable vector representations. The input to semantic encoding is the character sequence or word segmentation sequence of the real-time query command, and the output is a query intent vector. Semantic encoding can be accomplished through a combination of text normalization, word segmentation, sub-word segmentation, and vectorized representation. Text normalization can include case unification, number format normalization, and symbol cleansing. Word segmentation is used to segment word boundaries, and sub-word segmentation is used to reduce representational bias caused by out-of-vocabulary words. Vectorized representation can output a fixed-dimensional vector through a semantic embedding model. The dimension and normalization method of the query intent vector need to be consistent with the subsequent calculation method for matching similarity. Normalization can use vector norm normalization to support cosine similarity or distribution normalization to support inner product similarity. The query intent vector can also carry timestamps and session segment identifiers for deduplication and overwriting strategies in multi-turn inputs.
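The normalization steps around the embedding model can be sketched as follows. The embedding model itself is out of scope here; the sketch only shows text normalization before encoding and vector norm normalization after it. The function names are hypothetical, and the use of Unicode NFKC folding for number format normalization is an assumption.

```python
import math
import re
import unicodedata

def normalize_text(text):
    """Case unification, width/number-form folding (NFKC), and symbol cleansing."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)       # replace punctuation and symbols
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace

def l2_normalize(vec):
    """Scale a vector to unit length so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)
```

Applying `l2_normalize` to both the query intent vector and the semantic segment vectors keeps the two sides consistent with a cosine-based matching similarity.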

[0095] The semantic segments within the natural language interpretation content represent structured partitioning of the content. These segments can be formed based on paragraph boundaries, sentence boundaries, punctuation boundaries, or topic boundaries. The granularity of the semantic segments affects the coverage and accuracy of the response knowledge context. To support stable calculation of matching similarity, semantic segments need to be converted into vector representations within the same domain as the query intent vector. This conversion can be achieved using the same encoder as the semantic encoding or by using a shared word vector table to maintain vector space consistency. Determining the matching similarity between the query intent vector and each semantic segment within the natural language interpretation content is part of the vector retrieval and similarity evaluation process. Matching similarity can be achieved using cosine similarity, inner product similarity, or relevance scoring based on a cross-encoder. The cross-encoder can directly input the real-time query command and the semantic segment text and output a relevance score. Semantic segments whose matching similarity meets a preset threshold are used as the response knowledge context, which is part of the candidate selection and context aggregation process. The preset threshold controls the balance between recall precision and recall coverage. This threshold can be a fixed threshold or dynamically adjusted based on the real-time query command length, the anomaly tolerance feature vector in the target object's feature profile, or the attention distribution feature vector. The response knowledge context can be composed of a single semantic segment or multiple semantic segments ordered by matching similarity and then concatenated. The concatenation needs to retain the original order index of the semantic segments to reduce semantic jumps. 
Separators can be inserted between multiple semantic segments to distinguish the source paragraphs. At the same time, the matching similarity of each semantic segment is recorded for subsequent generation control.
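A minimal sketch of the candidate selection and context aggregation described above, assuming cosine similarity and a fixed threshold. The `top_k` cap, the separator string, and the function names are hypothetical additions for illustration.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def build_context(query_vec, segments, threshold, top_k=3, sep="\n---\n"):
    """segments: list of (text, vector). Returns (context_text, [(index, score), ...])."""
    scored = [(idx, cosine(query_vec, vec)) for idx, (_, vec) in enumerate(segments)]
    hits = [(idx, s) for idx, s in scored if s >= threshold]       # threshold filter
    hits = sorted(hits, key=lambda h: h[1], reverse=True)[:top_k]  # rank by similarity
    hits.sort(key=lambda h: h[0])                                  # restore original order
    context = sep.join(segments[idx][0] for idx, _ in hits)
    return context, hits
```

The returned `hits` list keeps the original order index and the matching similarity of each selected segment, so both remain available for later generation control.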

[0096] The target object feature profile is a set of object-level features extracted and aggregated from historical interaction behavior data. User preference tags within the target object feature profile are discretized expressions of the target object's preference direction. These tags can include tags for topics of interest, expressions of preference, depth of explanation, risk sensitivity, and interaction rhythm. Parsing user preference tags in the target object feature profile is a tag extraction and structuring process. Parsing actions can include reading tag key-value pairs, reading tag weights, and resolving tag conflicts. Tag conflict resolution handles situations where the same target object exhibits mutually exclusive tags such as concise preferences and detailed preferences. Conflict resolution can be completed based on tag update time, tag confidence, or session stage information. Constructing personalized response style constraint parameters based on user preference tags is a process of converting discrete tags into generation control parameters. Personalized response style constraint parameters can include parameters such as response length limit, terminology explanation density, example ratio, tone intensity, structuring level, and disclaimer insertion strategy. The response length limit restricts the length of interactive feedback information; terminology explanation density controls the frequency of explanations of technical terms; example ratio controls the proportion of example sentences; tone intensity controls the degree of affirmation; structuring level controls whether to output bullet points or paragraphs; and disclaimer insertion strategy controls the insertion position and frequency of compliance prompts. The value range of personalized response style constraint parameters needs to be consistent with the control interface of the pre-trained dialogue generation model. 
The control interface can be a prompt word template, a control tag sequence, or a parameterized decoding configuration. The parameterized decoding configuration can include parameters such as temperature, top-k, top-p, and repetition penalty to influence generation diversity and stability.
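The conversion from preference tags to generation control parameters might look like the sketch below. It resolves mutually exclusive tags by most recent update time, with confidence as a tie-breaker, which is one of the options the text lists. The tag names, the `STYLE_RULES` table, and all default values are hypothetical.

```python
def resolve_conflicts(tags, exclusive_groups):
    """tags: list of dicts with 'name', 'confidence', 'updated_at'. Returns surviving names."""
    chosen = {t["name"]: t for t in tags}
    for group in exclusive_groups:
        present = [chosen[n] for n in group if n in chosen]
        if len(present) > 1:
            # Keep the most recently updated tag; break ties by confidence.
            winner = max(present, key=lambda t: (t["updated_at"], t["confidence"]))
            for t in present:
                if t is not winner:
                    del chosen[t["name"]]
    return set(chosen)

STYLE_RULES = {  # hypothetical mapping from tag to parameter overrides
    "concise": {"max_length": 80, "structured": True},
    "detailed": {"max_length": 300, "term_explanation_density": 0.8},
    "risk_sensitive": {"disclaimer": "always"},
}

def build_style_params(tag_names):
    """Start from default parameters and apply the overrides of each surviving tag."""
    params = {"max_length": 150, "term_explanation_density": 0.3,
              "structured": False, "disclaimer": "on_demand"}
    for name in sorted(tag_names):
        params.update(STYLE_RULES.get(name, {}))
    return params
```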

[0097] In intelligent interaction scenarios within health insurance, pre-trained dialogue generation models bear the core task of generating accurate and personalized feedback. These models are typically built upon an encoder-decoder framework based on the Transformer architecture or large-scale autoregressive language models (such as the GPT series), and are specifically designed to collaboratively process three key inputs: a query intent vector representing the core of the user's question, relevant response knowledge context retrieved from the generated personalized explanations, and personalized response style constraint parameters parsed from the user's feature profile. The model implementation comprises several key modules and data flow layers: The input representation and fusion module first aligns and encodes the heterogeneous inputs, for example, by concatenating the knowledge context as a prefix sequence with historical dialogues, and injecting the intent vector and style parameters as learnable control tokens or through a cross-attention mechanism into the intermediate layer of the encoder; The multi-layer Transformer encoder then performs deep semantic understanding on the fused input sequence, capturing the complex relationships between the query, knowledge background, and style requirements; The autoregressive decoder, based on the rich contextual representation output by the encoder, generates grammatically correct, informationally accurate, and style-appropriate response text word by word. During the generation process, it ensures priority is given to key facts in the knowledge context through specific attention masks and is subject to soft constraints from style parameters to adjust the formality, level of detail, rhythm, or emotional tendency of the wording.

[0098] The training of this model is a multi-stage process. First, basic pre-training is performed on a large-scale general corpus containing multi-turn dialogues, aiming to master language modeling and basic dialogue logic. Then, domain-adaptive pre-training is conducted using massive amounts of insurance customer service dialogue records, insurance policy question-and-answer pairs, and health consultation texts, enabling the model to deeply understand the professional terminology and typical interaction patterns in areas such as "claims application," "exclusions," and "health disclosure." Finally, supervised fine-tuning is performed. Training data is either manually constructed or extracted from real interaction logs. Each sample includes a user query, a corresponding knowledge context fragment, style tags (such as "concise and professional" and "detailed and reassuring"), and a high-quality standard response written by a human. In this stage, the model learns how to generate responses based on given intent, knowledge, and style by maximizing conditional probability. The loss function typically uses cross-entropy loss, the optimizer uses AdamW, and learning rate warm-up and decay strategies are employed. Key training parameters include model size (e.g., 12-layer encoder, 12-layer decoder, 768-dimensional hidden layers), batch size, learning rate (e.g., 2e-5), and a temperature parameter to control the diversity of generation.
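The learning rate warm-up and decay strategy can be illustrated with a simple schedule. The sketch assumes linear warm-up followed by linear decay to zero; the text does not specify the schedule shape, and the `warmup_steps` and `total_steps` values are hypothetical. Only the base learning rate of 2e-5 comes from the example above.

```python
def lr_at(step, base_lr=2e-5, warmup_steps=1000, total_steps=10000):
    """Learning rate at a given optimizer step: linear warm-up, then linear decay."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / (total_steps - warmup_steps)
```

With these values the rate rises from zero to 2e-5 over the first 1000 steps and then falls back to zero at step 10000.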

[0099] During online inference, the model receives real-time queries, quickly matches them to the knowledge base, and generates feedback in real time based on user profiles. For example, when a user who is sensitive to the details of the terms and conditions asks whether a disease diagnosed after the waiting period is guaranteed to be covered, the model locates the knowledge context related to "waiting period" and "insurance liability" in the explanatory terms and combines it with the user's "rational and cautious" style tag to generate feedback such as: "After the waiting period stipulated in the contract, if the disease is diagnosed by a hospital designated in the contract and meets the disease definition in the contract, the insurance company will assume the liability for compensation according to the contract. However, it should be noted that if the disease falls under the 'exclusion' clause (such as a pre-existing condition before the insurance was purchased), it is still not covered." This output preserves the rigor of the information while reflecting, through sentence structure and vocabulary selection, a communication style that matches the user's cognitive preferences, thus closing the loop from understanding to personalized communication.

[0100] Inputting query intent vectors, response knowledge context, and personalized response style constraint parameters into a pre-trained dialogue generation model is a multi-source conditional input fusion process. The input format can be a concatenated model input sequence or a multi-channel input structure. The concatenated model input sequence can include a discretized representation of the query intent vector or a keyword summary retrieved from the query intent vector. The response knowledge context is written in the form of original text fragments with semantic segmentation boundary markers. The personalized response style constraint parameters are written in the form of control markers or template constraints and are separated from the response knowledge context. The multi-channel input structure can use the response knowledge context as a retrieval enhancement context channel, the personalized response style constraint parameters as a control channel, and the real-time query command as the main query channel. During the decoding phase, the pre-trained dialogue generation model assigns different attention weights to each channel to achieve synergy between content constraints and style constraints. Generating interactive feedback information is part of the decoding output process of the dialogue generation model. The interactive feedback information can be natural language text or contain structured fields for rendering on the interactive terminal. Structured fields can include a semantic segmentation index, a list of key business entity references, and recommended operation prompt phrases. To improve consistency, consistency checks can be performed after generating interactive feedback information. Consistency checks can include response knowledge context coverage checks and style parameter conformity checks. 
Response knowledge context coverage checks are used to prevent generated content from deviating from the response knowledge context, while style parameter conformity checks are used to prevent interactive feedback information from exceeding the response length limit or deviating from the terminology explanation density.
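The concatenated input sequence and the post-generation consistency checks can be sketched as follows. The marker strings such as `[STYLE]` and `[SEG i]` are hypothetical, and the coverage check uses plain word overlap as a crude stand-in for whatever coverage measure an actual system would use.

```python
def build_model_input(query, context_segments, style_params):
    """context_segments: list of (original_index, text). Returns one concatenated string."""
    control = " ".join(f"<{k}={v}>" for k, v in sorted(style_params.items()))
    context = "\n".join(f"[SEG {i}] {text}" for i, text in context_segments)
    return f"[STYLE] {control}\n[CONTEXT]\n{context}\n[QUERY] {query}"

def check_feedback(feedback, context_segments, style_params, min_overlap=0.5):
    """Coverage check (share of feedback words found in the context) and length check."""
    words = set(feedback.lower().split())
    context_words = set(" ".join(t for _, t in context_segments).lower().split())
    overlap = len(words & context_words) / len(words) if words else 0.0
    return {
        "coverage_ok": overlap >= min_overlap,
        "length_ok": len(feedback.split()) <= style_params["max_length"],
    }
```

Keeping the style markers in a separate block from the context segments mirrors the requirement that the style constraint parameters be separated from the response knowledge context.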

[0101] For example, in a practical health insurance scenario, the system starts operating when a user visits the online application page of a health insurance product. The page contains a large amount of complex, terminology-heavy insurance clause text, such as the sections on "Insurance Liability," "Exclusions," "Health Disclosure Requirements," and "Insurance Claim and Payment." The system first extracts this raw text from the page's underlying code, then cleans, parses, and restructures it into standardized, machine-processable text data. At the same time, based on the user's identity, the system retrieves the user's past records from the historical behavior database, including browsing history for different types of insurance products, the frequency of clicks on premium calculation tools, and whether the user has previously submitted an insurance application.

[0102] Next, the system conducts in-depth analysis of this historical behavioral data. By statistically analyzing user behavior such as the duration of time spent on specific clauses (e.g., "outpatient medical reimbursement" and "definition of critical illness") and the number of times they repeatedly view them, the system quantifies their risk preferences and focus, constructing a characteristic profile of the user. For example, a user who frequently views "malignant tumor protection" and "targeted drug catalog" might have a profile showing "focus on cancer protection and sensitivity to new treatments." Based on this profile, the system automatically scans all insurance clauses and, through semantic matching calculations, identifies clauses such as the disease definition of "malignant tumor - severe," restrictions on medical treatment within "specific hospital areas," and "waiting period" as the core business entities most relevant to the user.

[0103] Subsequently, the system performs semantic parsing on the original terms and conditions text. It identifies key concepts such as "insurance period," "insured amount," "proton and heavy ion therapy," and "genetic testing," and uses a knowledge graph in the insurance field to confirm the precise meaning of these entities. The system then merges the entities highly relevant to the user (such as "proton and heavy ion therapy") selected in the previous step with these key entities, assigning them higher explanatory weights. Then, using a natural language processing model, guided by these weighted entities, the system rewrites the originally obscure legal provisions into a coherent and easily understandable narrative explanation. For example, the clause regarding "malignant tumor coverage" is transformed into: "The 'malignant tumor - severe' coverage you are interested in covers all serious cancers listed in the contract. If diagnosed, the insurance company will pay the agreed amount in a lump sum. The 'proton and heavy ion therapy' mentioned is an advanced radiotherapy technology and is also within the coverage scope. However, please note that there is a 'waiting period' after the contract takes effect, during which time you may not be eligible for compensation if you suffer an illness." Simultaneously, the system initiates multimodal content generation. Based on the aforementioned natural language interpretation, it automatically plans dynamic storyboards containing key scenarios such as "hospital diagnosis," "submission of materials," "insurance company review," and "claims payment." Combining the details described in the original terms, the system constructs corresponding virtual characters (such as patients, doctors, and claims adjusters) and scene models (such as hospitals and insurance companies). 
The visual generation model, based on the script and models, renders a continuous animation clearly demonstrating the entire process from illness to claims settlement. The system also synthesizes synchronized voice narration for this animation, ensuring that each narration line is precisely aligned with the corresponding animated scene on the timeline.

[0104] On the interactive terminal, the system integrates and presents the generated content. The insurance application page is divided into a text explanation area and an animation demonstration area. The plain-language explanations are displayed in the text area with a clear paragraph structure, while the dynamic demonstration animation is embedded in the video player. The system establishes a time mapping relationship between text scrolling and video playback at the underlying level and provides a unified control bar. When the user drags the video progress bar, the text automatically jumps to the explanation of the corresponding clause; conversely, when the user scrolls the text, the video seeks to the corresponding scene. Furthermore, key terms in the text explanations, such as "proton and heavy ion therapy" and "waiting period," are marked with interactive tags. When the user clicks one of these tags, the animation on the right immediately focuses on and highlights the visual elements related to the concept.

[0105] Finally, when users have questions during the understanding process and ask them directly, the system can provide accurate intelligent feedback. For example, if a user asks, "Will genetic testing at a hospital not designated in the contract affect claims?" the system first understands the intent of the question and quickly locates the sections on "hospital scope" and "testing institutions" in the generated personalized explanation as the basis for the answer. Simultaneously, the system refers to the user's profile (such as their sensitivity to clause details) and generates a reply in a rigorous, clause-referenced tone: "According to the terms of the contract, genetic testing generally needs to be conducted at an institution recognized by the insurance company. Reports from tests conducted at non-designated institutions may not be accepted as grounds for claims. Please refer to the contract appendix for a specific list of recognized institutions." In this way, the system completes a full interactive loop from personalized interpretation and multimodal demonstration to intelligent question answering, significantly lowering the barrier for users to understand complex insurance clauses and improving the efficiency and confidence of insurance decisions.

[0106] This embodiment receives real-time query commands input from an interactive terminal and generates a query intent vector, enabling the real-time query command to obtain a measurable semantic representation and directly participate in subsequent retrieval and generation. By calculating the matching similarity between the query intent vector and each semantic segment in the natural language explanation content and forming a response knowledge context, the content of the interactive feedback information is stably converged within the range of the natural language explanation content, reducing irrelevant expansion. By parsing user preference tags in the target object feature profile and constructing personalized response style constraint parameters, the interactive feedback information is kept consistent with the target object feature profile in terms of length, explanation density, and expression style. By inputting the query intent vector, response knowledge context, and personalized response style constraint parameters into a pre-trained dialogue generation model to generate interactive feedback information, context-traceable and style-controllable interactive feedback output for real-time query commands is achieved.

[0107] In one embodiment, a feature profile-based interpretation and interaction device is provided, which corresponds one-to-one with the feature profile-based interpretation and interaction method described in the above embodiments. Referring to Figure 3, Figure 3 is a schematic diagram of the functional modules of a preferred embodiment of the feature profile-based interpretation and interaction device of the present invention. The device includes a data acquisition and parsing module 10, a user profile construction module 20, a semantic interpretation generation module 30, a visual content generation module 40, a multimodal display control module 50, and an intelligent interactive feedback module 60. The functional modules are described in detail as follows: the data acquisition and parsing module 10 is used to acquire source professional text data in the business scenario to be processed and to retrieve historical interaction behavior data of the target object from the database; the user profile construction module 20 is used to perform feature analysis on the historical interaction behavior data to build a target object feature profile, and to filter related business entities from the source professional text data based on the target object feature profile; the semantic interpretation generation module 30 is used to perform semantic analysis on the source professional text data using a natural language processing model to extract key business entities, and to convert the source professional text data into natural language interpretation content based on the key business entities and the related business entities; the visual content generation module 40 is used to synthesize, using a visual generation model, dynamic visual presentation content corresponding to the natural language interpretation content based on the source professional text data and the key business entities; the multimodal display control module 50 is used to output the natural language interpretation content and the dynamic visual presentation content on the interactive terminal; and the intelligent interactive feedback module 60 is used to generate interactive feedback information for the real-time query command input by the interactive terminal based on the target object feature profile and the natural language interpretation content.

[0108] In one embodiment, the data acquisition and parsing module 10 is specifically used for: parsing the document object model structure of the front-end page of the business scenario to be processed to locate the text nodes containing business terms information; traversing the text nodes and removing cascading style sheet code and script code to extract plain text character sequences; denoising, cleaning, and reorganizing the plain text character sequences into paragraphs to form the source professional text data; parsing the login session credentials of the target object to obtain a unique identifier, and using the unique identifier to initiate a related data retrieval request to the database; retrieving from the database the historical browsing logs, function click logs, and business application records bound to the unique identifier; and formatting and time-aligning the historical browsing logs, function click logs, and business application records to generate the historical interaction behavior data.
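The text extraction part of this module can be sketched with Python's standard `html.parser`. The class and function names are hypothetical, and the sketch covers only the removal of style sheet and script code and whitespace cleanup, not the full denoising or paragraph reorganization.

```python
from html.parser import HTMLParser

class ClauseTextExtractor(HTMLParser):
    """Collects visible text nodes and skips <script> and <style> content."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())          # collapse internal whitespace
        if text and not self._skip_depth:
            self.chunks.append(text)

def extract_plain_text(html):
    """Return the visible text of a page, one text node per line."""
    parser = ClauseTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```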

[0109] In one embodiment, the user profile construction module 20 is specifically used for: performing multi-dimensional statistical analysis and preference weighting on the historical interaction behavior data to generate an anomaly tolerance feature vector and a focus distribution feature vector; mapping the anomaly tolerance feature vector and the focus distribution feature vector to a preset user tag library, and constructing the target object feature profile by aggregation; performing topic semantic modeling and clause attribute decomposition on the source professional text data to obtain multiple independent candidate business entities that carry attribute tags; converting the target object feature profile into a profile vector representation, and converting each candidate business entity into an entity vector representation based on its attribute tags; computing the cosine similarity between the profile vector representation and each entity vector representation; and designating the candidate business entities whose cosine similarity exceeds a preset matching threshold as the related business entities.
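The final matching step can be sketched as follows. This is an illustrative example only: the function names, the two-dimensional vectors, the entity names, and the threshold value of 0.7 are assumptions made for the sketch and do not come from the application, which does not specify how the vectors are produced.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def select_related_entities(profile_vec, entity_vecs, threshold):
    """Keep candidate entities whose similarity to the profile exceeds the threshold."""
    return [
        name
        for name, vec in entity_vecs.items()
        if cosine_similarity(profile_vec, vec) > threshold
    ]


# Hypothetical vectors for illustration.
profile = [1.0, 0.0]
candidates = {
    "waiting_period": [1.0, 0.0],
    "exclusions": [0.0, 1.0],
    "premium": [1.0, 1.0],
}
related = select_related_entities(profile, candidates, threshold=0.7)
print(related)
```

The strict `>` comparison mirrors the wording "exceeds a preset matching threshold"; an implementation could equally use `>=` if the threshold is meant to be inclusive.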

[0110] In one embodiment, the semantic interpretation generation module 30 is specifically used for: performing word segmentation and entity boundary annotation on the source professional text data to obtain a candidate term sequence containing semantic tags; mapping the candidate term sequence to a preset domain knowledge graph, and identifying the successfully matched terms as the key business entities; merging the related business entities and the key business entities to form an explanatory entity set, and assigning a corresponding attention weight to each entity in the explanatory entity set; weighting and encoding the source professional text data based on the attention weights to generate an encoded input sequence; and inputting the encoded input sequence into the natural language processing model, which decodes it to generate the natural language explanation content.
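The matching and merging steps can be sketched roughly as below. The application does not state how the attention weights are computed, so the weighting rule here (one point for being a key business entity, one point for being a related business entity, then normalization to sum to 1) is purely an assumption for illustration. The domain knowledge graph is reduced to a plain set of term strings, and the function name is hypothetical.

```python
def build_explanatory_entity_set(candidate_terms, domain_terms, related_entities):
    """Return {entity: attention_weight} for the merged explanatory entity set.

    Assumed weighting rule (not from the source): an entity scores 1 for
    matching the domain term set and 1 for being profile-related; scores
    are then normalized so that the weights sum to 1.
    """
    # Terms that match the (simplified) knowledge graph become key entities.
    key_entities = [t for t in candidate_terms if t in domain_terms]
    # Merge while preserving order and removing duplicates.
    merged = list(dict.fromkeys(key_entities + list(related_entities)))
    raw = {
        e: (1.0 if e in key_entities else 0.0)
        + (1.0 if e in related_entities else 0.0)
        for e in merged
    }
    total = sum(raw.values())
    if total == 0:
        return {}
    return {e: w / total for e, w in raw.items()}


weights = build_explanatory_entity_set(
    candidate_terms=["deductible", "the", "waiting period"],
    domain_terms={"deductible", "waiting period"},
    related_entities=["waiting period", "copayment"],
)
print(weights)
```

Under this assumed rule, an entity that is both a key business entity and a related business entity receives the largest weight, which is one plausible way to let the user profile steer what the generated explanation emphasizes.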

[0111] In one embodiment, the visual content generation module 40 is specifically used for: performing scene segmentation and action intent recognition on the natural language explanation content to generate a storyboard script containing action instructions and scene descriptions; constructing a matching virtual character model and background environment model based on the scene context provided by the source professional text data and the specific objects referred to by the key business entities; inputting the storyboard script, the virtual character model, and the background environment model into the visual generation model, and generating, through the visual generation model, a continuous visual image sequence that presents the logical relationships between the key business entities; converting the natural language explanation content into a digital speech stream using a speech synthesis module, and extracting phoneme timestamp information synchronized with the digital speech stream; and adjusting the timing of the continuous visual image sequence according to the phoneme timestamp information, and merging the time-adjusted continuous visual image sequence with the digital speech stream to generate the dynamic visual demonstration content.
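One simple way to picture the timing adjustment is to stretch or compress each scene's frames so that they evenly fill the interval in which that scene's narration is spoken. The sketch below assumes that the phoneme timestamps have already been reduced to one `(start, end)` speech span per scene, in seconds. That reduction, the function name, and the even spacing are assumptions made for illustration rather than details from the application.

```python
def retime_frames(scene_frame_counts, scene_speech_spans):
    """Assign a presentation time to every frame.

    scene_frame_counts: number of generated frames per scene.
    scene_speech_spans: (start, end) in seconds for each scene's narration,
        assumed to be derived from the phoneme timestamps.
    Returns a list of (scene_index, frame_index, presentation_time).
    """
    schedule = []
    for scene, (count, (start, end)) in enumerate(
        zip(scene_frame_counts, scene_speech_spans)
    ):
        if count <= 0:
            continue  # nothing to show for this scene
        step = (end - start) / count  # seconds each frame stays on screen
        for frame in range(count):
            schedule.append((scene, frame, start + frame * step))
    return schedule


schedule = retime_frames(
    scene_frame_counts=[2, 4],
    scene_speech_spans=[(0.0, 1.0), (1.0, 3.0)],
)
times = [t for _, _, t in schedule]
print(times)
```

The resulting schedule could then be handed to a muxing tool together with the speech stream; that merging step is outside the scope of this sketch.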

[0112] In one embodiment, the multimodal display control module 50 is specifically used for: creating a text display area and a visual demonstration area in the display interface of the interactive terminal, and configuring a media playback component in the visual demonstration area; rendering the natural language explanation content in the text display area, and loading the dynamic visual demonstration content into the media playback component; establishing a synchronous mapping relationship between the text scrolling timeline of the text display area and the media playback timeline of the media playback component; rendering a unified playback control in the display interface, the unified playback control being used to control both the scrolling progress of the natural language explanation content and the playback progress of the dynamic visual demonstration content; and identifying the text paragraphs in the natural language explanation content that correspond to the key business entities, and adding interactive markers to those text paragraphs, the interactive markers being used to respond to user operations by triggering visual focus on the corresponding key business entity in the visual demonstration area.
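The synchronous mapping between the two timelines can be illustrated with a small lookup structure. The sketch assumes that each text paragraph has a known start time on the media timeline; the class name `TimelineSync` and the example times are hypothetical. It uses the standard-library `bisect` module to find the paragraph that should be in view at a given playback position, and the reverse lookup to seek the media when the user jumps to a paragraph.

```python
from bisect import bisect_right


class TimelineSync:
    """Maps between media playback time and text paragraph index."""

    def __init__(self, paragraph_start_times):
        # Start time (seconds) of each paragraph on the media timeline.
        self.starts = sorted(paragraph_start_times)

    def paragraph_at(self, media_time):
        """Index of the paragraph to scroll into view at media_time."""
        index = bisect_right(self.starts, media_time) - 1
        return max(index, 0)

    def media_time_for(self, paragraph_index):
        """Playback position to seek to when a paragraph is selected."""
        return self.starts[paragraph_index]


sync = TimelineSync([0.0, 4.5, 10.0])
print(sync.paragraph_at(5.0))    # paragraph shown while the media is at 5.0 s
print(sync.media_time_for(2))    # seek target when the third paragraph is clicked
```

A unified playback control would call `paragraph_at` on each media time update and `media_time_for` when the text is scrolled or a marked paragraph is clicked, so that both views are driven from a single position.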

[0113] In one embodiment, the intelligent interactive feedback module 60 is specifically used for: receiving a real-time query instruction in response to an input request from the interactive terminal, and semantically encoding the real-time query instruction to generate a query intent vector; computing the matching similarity between the query intent vector and each semantic segment in the natural language explanation content, and using the semantic segments whose matching similarity meets a preset threshold as the response knowledge context; parsing the user preference tags in the target object feature profile, and constructing personalized response style constraint parameters based on the user preference tags; and inputting the query intent vector, the response knowledge context, and the personalized response style constraint parameters into a pre-trained dialogue generation model to generate the interactive feedback information.
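The retrieval and constraint-building steps can be sketched as follows. A bag-of-words count vector stands in for the semantic encoder, which the application does not specify. The tag names (`prefers_brief`, `prefers_examples`), the constraint keys, and the threshold of 0.3 are likewise hypothetical. The call to the pre-trained dialogue generation model is deliberately left out.

```python
from collections import Counter
import math


def encode(text):
    """Stand-in semantic encoder: lowercase bag-of-words counts (an assumption)."""
    return Counter(text.lower().split())


def similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_context(query, segments, threshold):
    """Segments whose similarity to the query meets the threshold."""
    query_vec = encode(query)
    return [s for s in segments if similarity(query_vec, encode(s)) >= threshold]


def style_constraints(preference_tags):
    """Turn hypothetical preference tags into response style parameters."""
    return {
        "max_sentences": 2 if "prefers_brief" in preference_tags else 5,
        "use_examples": "prefers_examples" in preference_tags,
    }


segments = ["the waiting period is 90 days", "premiums are paid monthly"]
context = select_context("how long is the waiting period", segments, threshold=0.3)
constraints = style_constraints({"prefers_brief"})
print(context, constraints)
```

In a real system the selected context and the constraint parameters would be passed, together with the query, to the dialogue generation model; a learned sentence embedding would normally replace the word-count encoder used here.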

[0114] In one embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in Figure 4. The computer device includes a processor, memory, network interface, and database connected via a system bus. The processor provides computing and control capabilities. The memory includes non-volatile and/or volatile storage media and internal memory. The non-volatile storage media stores the operating system, computer programs, and database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface is used to communicate with external clients via a network connection. When executed by the processor, the computer program implements the server-side functions or steps of a feature-profile-based interpretation and interaction method.

[0115] In one embodiment, a computer device is provided, which may be a client, and its internal structure may be as shown in Figure 5. The computer device includes a processor, memory, network interface, display screen, and input devices connected via a system bus. The processor provides computing and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface is used to communicate with an external server via a network connection. When executed by the processor, the computer program implements the client-side functions or steps of a feature-profile-based interpretation and interaction method.

[0116] In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to perform the following steps: Acquire source professional text data for the business scenario to be processed, and retrieve historical interaction behavior data of the target object from the database; The historical interaction behavior data is subjected to feature analysis to construct a feature profile of the target object, and related business entities are filtered from the source professional text data based on the feature profile of the target object; The source professional text data is semantically analyzed using a natural language processing model to extract key business entities, and the source professional text data is converted into natural language explanation content based on the key business entities and the related business entities. Based on the source professional text data and the key business entities, dynamic visual demonstration content corresponding to the natural language explanation content is synthesized using a visual generation model. The natural language explanation and the dynamic visual demonstration are output on the interactive terminal. Based on the target object feature profile and the natural language interpretation content, interactive feedback information is generated for the real-time query command input by the interactive terminal.

[0117] In one embodiment, a computer-readable storage medium is provided, which may be non-volatile or volatile, and a computer program is stored thereon, which, when executed by a processor, performs the following steps: Acquire source professional text data for the business scenario to be processed, and retrieve historical interaction behavior data of the target object from the database; The historical interaction behavior data is subjected to feature analysis to construct a feature profile of the target object, and related business entities are filtered from the source professional text data based on the feature profile of the target object; The source professional text data is semantically analyzed using a natural language processing model to extract key business entities, and the source professional text data is converted into natural language explanation content based on the key business entities and the related business entities. Based on the source professional text data and the key business entities, dynamic visual demonstration content corresponding to the natural language explanation content is synthesized using a visual generation model. The natural language explanation and the dynamic visual demonstration are output on the interactive terminal. Based on the target object feature profile and the natural language interpretation content, interactive feedback information is generated for the real-time query command input by the interactive terminal.

[0118] It should be noted that the functions or steps that can be implemented by the computer-readable storage medium or computer device described above can be referred to the relevant descriptions on the server side and client side in the foregoing method embodiments. To avoid repetition, they will not be described one by one here.

[0119] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. Any references to memory, storage, databases, or other media used in the embodiments provided in this application can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

[0120] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is used as an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above.

[0121] It should be noted that any AI models, software tools, or components not belonging to this company that appear in the embodiments of this application are merely illustrative examples and do not represent actual use. All user personal information involved in the embodiments of this application has been authorized by the relevant parties with their knowledge and consent, or has been fully authorized by all parties, and the executing entity may obtain it through various lawful and compliant means. The collection, storage, use, processing, transmission, provision, and disclosure of the information, data, and signals involved all comply with relevant laws and regulations and do not violate public order and good morals.

[0122] The above-described embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.

Claims

1. A method for interpretation and interaction based on feature profiling, characterized in that it includes the following steps: acquiring source professional text data for the business scenario to be processed, and retrieving historical interaction behavior data of the target object from the database; performing feature analysis on the historical interaction behavior data to construct a target object feature profile, and filtering related business entities from the source professional text data based on the target object feature profile; performing semantic analysis on the source professional text data using a natural language processing model to extract key business entities, and converting the source professional text data into natural language explanation content based on the key business entities and the related business entities; synthesizing, using a visual generation model, dynamic visual demonstration content corresponding to the natural language explanation content based on the source professional text data and the key business entities; outputting the natural language explanation content and the dynamic visual demonstration content on the interactive terminal; and generating, based on the target object feature profile and the natural language explanation content, interactive feedback information for the real-time query instruction input at the interactive terminal.

2. The interpretation and interaction method based on feature profiling as described in claim 1, characterized in that, Obtain the source professional text data for the business scenario to be processed, and retrieve the historical interaction behavior data of the target object from the database, including: Parse the document object model structure of the front-end page of the business scenario to be processed in order to locate the text node containing business terms information; The text nodes are traversed and cascading style sheet code and script code are removed to extract plain text character sequences. The plain text character sequences are then denoised, cleaned, and reorganized into paragraphs to form source professional text data. Parse the login session credentials of the target object to obtain a unique identifier, and use the unique identifier to initiate a related data retrieval request to the database; Retrieve from the database past browsing logs, function click logs, and business application records that are bound to the unique identifier; The historical browsing logs, function click logs, and business application records are formatted and time-series aligned to generate historical interaction behavior data.

3. The interpretation and interaction method based on feature profiling as described in claim 1, characterized in that performing feature analysis on the historical interaction behavior data to construct a target object feature profile, and filtering related business entities from the source professional text data based on the target object feature profile, includes: performing multi-dimensional statistical analysis and preference weighting on the historical interaction behavior data to generate an anomaly tolerance feature vector and a focus distribution feature vector; mapping the anomaly tolerance feature vector and the focus distribution feature vector to a preset user tag library, and constructing the target object feature profile by aggregation; performing topic semantic modeling and clause attribute decomposition on the source professional text data to obtain multiple independent candidate business entities that carry attribute tags; converting the target object feature profile into a profile vector representation, and converting each candidate business entity into an entity vector representation based on its attribute tags; computing the cosine similarity between the profile vector representation and each entity vector representation; and designating the candidate business entities whose cosine similarity exceeds a preset matching threshold as the related business entities.

4. The interpretation and interaction method based on feature profiling as described in claim 1, characterized in that, Semantic analysis of the source professional text data is performed using a natural language processing model to extract key business entities. Based on the key business entities and related business entities, the source professional text data is converted into natural language explanations, including: The source professional text data is processed by word segmentation and entity boundary annotation to obtain a candidate term sequence containing semantic tags; The candidate term sequence is mapped to a preset domain knowledge graph, and the successfully matched terms are identified as key business entities; The related business entities and the key business entities are merged to form an explanatory entity set, and a corresponding attention weight is assigned to each entity in the explanatory entity set. The source professional text data is weighted and encoded based on the attention weights to generate an encoded input sequence; The encoded input sequence is then input into the natural language processing model. The encoded input sequence is decoded using the natural language processing model to generate a natural language explanation.

5. The interpretation and interaction method based on feature profiling as described in claim 1, characterized in that, Based on the source professional text data and the key business entities, a visual generation model is used to synthesize dynamic visual presentation content corresponding to the natural language explanation content, including: The natural language interpretation content is segmented into scenes and the action intent is recognized to generate a storyboard script containing action instructions and scene descriptions. Based on the scene context provided by the source professional text data and the specific objects referred to by the key business entities, a matching virtual character model and background environment model are constructed. The storyboard, the virtual character model, and the background environment model are input into the visual generation model, and a continuous visual image sequence that presents the logical relationship between key business entities is generated through the visual generation model. The natural language interpretation content is converted into a digital speech stream using a speech synthesis module, and phoneme timestamp information synchronized with the digital speech stream is extracted; The continuous visual image sequence is time-adjusted according to the phoneme timestamp information, and the time-adjusted continuous visual image sequence is merged with the digital speech stream to generate dynamic visual presentation content.

6. The interpretation and interaction method based on feature profiling as described in claim 1, characterized in that outputting the natural language explanation content and the dynamic visual demonstration content on the interactive terminal includes: creating a text display area and a visual demonstration area in the display interface of the interactive terminal, and configuring a media playback component in the visual demonstration area; rendering the natural language explanation content in the text display area, and loading the dynamic visual demonstration content into the media playback component; establishing a synchronous mapping relationship between the text scrolling timeline of the text display area and the media playback timeline of the media playback component; rendering a unified playback control in the display interface, the unified playback control being used to control both the scrolling progress of the natural language explanation content and the playback progress of the dynamic visual demonstration content; and identifying the text paragraphs in the natural language explanation content that correspond to the key business entities, and adding interactive markers to those text paragraphs, the interactive markers being used to respond to user operations by triggering visual focus on the corresponding key business entity in the visual demonstration area.

7. The interpretation and interaction method based on feature profiling as described in claim 1, characterized in that, Based on the target object feature profile and the natural language interpretation content, interactive feedback information is generated for the real-time query command input by the interactive terminal, including: In response to the input request from the interactive terminal, a real-time query instruction is received, and the real-time query instruction is semantically encoded to generate a query intent vector; Determine the matching similarity between the query intent vector and each semantic segment in the natural language explanation content, and use the semantic segments whose matching similarity meets a preset threshold as the response knowledge context; Parse the user preference tags in the feature profile of the target object, and construct personalized response style constraint parameters based on the user preference tags; The query intent vector, the response knowledge context, and the personalized response style constraint parameters are input into a pre-trained dialogue generation model to generate interactive feedback information.

8. A feature-profile-based interpretation and interaction device, characterized in that the device includes: a data acquisition and parsing module, used to acquire source professional text data in the business scenario to be processed, and to retrieve historical interaction behavior data of the target object from the database; a user profile construction module, used to perform feature analysis on the historical interaction behavior data to build a target object feature profile, and to filter related business entities from the source professional text data based on the target object feature profile; a semantic interpretation generation module, used to perform semantic analysis on the source professional text data using a natural language processing model to extract key business entities, and to convert the source professional text data into natural language explanation content based on the key business entities and the related business entities; a visual content generation module, used to synthesize, using a visual generation model, dynamic visual demonstration content corresponding to the natural language explanation content based on the source professional text data and the key business entities; a multimodal display control module, used to output the natural language explanation content and the dynamic visual demonstration content on the interactive terminal; and an intelligent interactive feedback module, used to generate interactive feedback information for the real-time query instruction input at the interactive terminal, based on the target object feature profile and the natural language explanation content.

9. A computer device, characterized in that the computer device includes a memory, a processor, and a feature-profile-based interpretation and interaction program stored in the memory and executable on the processor; when executed by the processor, the program implements the steps of the interpretation and interaction method based on feature profiling as described in any one of claims 1-7.

10. A computer-readable storage medium, characterized in that the storage medium stores a feature-profile-based interpretation and interaction program which, when executed by a processor, implements the steps of the interpretation and interaction method based on feature profiling as described in any one of claims 1-7.