A multi-agent collaborative guided tour method and system driven by a script engine

A script engine-driven multi-agent collaborative tour guide method uses large language models and virtual avatars to generate personalized tour scripts in real time. It addresses the outdated content and insufficient interactivity of museum tour guide systems, thereby improving the visitor experience and the efficiency of cultural acquisition.

CN122197939A · Pending · Publication Date: 2026-06-12 · SHENZHEN KUAIYU TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: SHENZHEN KUAIYU TECHNOLOGY CO LTD
Filing Date: 2026-02-14
Publication Date: 2026-06-12

AI Technical Summary

Technical Problem

Existing museum tour guide systems have outdated content, lack interactivity, and offer no personalization, which degrades the visitor experience and the efficiency of cultural acquisition.

Method used

A script engine-driven multi-agent collaborative tour guide method generates personalized tour scripts in real time through a large language model, uses virtual intelligent agents to interact with the audience, and provides immersive tours by combining indoor positioning and multimodal databases.

Benefits of technology

The method improves the interactivity, informativeness, and fun of museum explanations, enables personalized guided tours, and enhances visitor participation and cultural enrichment.


Abstract

The present application relates to artificial intelligence application technology and provides a multi-agent collaborative guided tour explanation method and system driven by a script engine. The method includes: setting up multiple agents with virtual avatars, and obtaining each visitor's basic personal information and selected agent parameters as that visitor's profile features; based on the layout of the exhibition space and exhibit-related data, using a large language model to generate a guided tour script in real time according to the visitor's profile features; and, according to the script content and the visitor's changing position along the visiting route, using the agent running on a mobile smart terminal to provide exhibit explanation and guided tour services through voice interaction and image display. The present application uses artificial intelligence and Internet of Things technology, together with hierarchical content design, a dynamic update mechanism, cross-media narrative fusion, and highly interactive technologies, to reconstruct the content system of museum guided tours around audience needs. It can dynamically generate cultural relic explanation scripts and optimize them in a closed loop, improving the service quality and content quality of museum explanation.

Description

Technical Field

[0001] This invention belongs to the field of artificial intelligence application technology, and specifically relates to a method and system for multi-agent collaborative guided tour explanation driven by a script engine through dynamic generation and closed-loop optimization.

Background Technology

[0002] Guided museum tours serve as a crucial bridge connecting exhibits and visitors, unlocking the historical, cultural, and artistic secrets behind artifacts, bringing static exhibits to life, and helping visitors understand their value and significance, avoiding a superficial visit. They also guide the pace of the tour, answer questions, enhance the visitor experience and knowledge acquisition, and truly realize the cultural dissemination function of museums.

[0003] However, existing museum guided tour systems generally suffer from a lack of both content and emotional engagement, manifesting as insufficient information depth, monotonous formats, outdated updates, and a lack of interactivity and personalized customization. This lack of content directly impacts the visitor experience and the efficiency of cultural acquisition. Specifically, it manifests in the following ways:

[0004] (1) Outdated and rigid content: Museum tour guide scripts need to be written by personnel with professional knowledge backgrounds. Most museum tour guide scripts have not been updated for many years and cannot incorporate new research results in a timely manner, thus conveying outdated information.

(2) Superficial information: The tour guide content includes only basic information such as the name, age, and material of the exhibits, and lacks in-depth interpretation of the historical background, cultural connotation, craftsmanship, and social significance of the cultural relics.

(3) Lack of narrative storytelling: The language is mainly "explanatory" and straightforward. The exhibits are described only in general terms, without exploring the historical events or the stories of the people behind them, which weakens the emotional resonance of the audience.

(4) Lack of personalized and layered design: Group differences, such as those of children and international audiences, are ignored. The tour guide content is not written for children's cognitive characteristics, making it difficult for young audiences to understand; it also does not incorporate gamified exploration or situational performance, resulting in low participation by children.

(5) Not adapted to interests and preferences: The depth of the content is not dynamically adjusted according to data such as the audience's dwell time and interactive behavior. For example, no academic extension materials (such as the latest archaeological reports) are provided to professional visitors, while popular analogies are lacking for general visitors.

(6) One-way indoctrination: The guided tour system is mainly based on one-way output of "guide → audience" or "audio → listener". The audience cannot ask questions, provide feedback, or participate in content co-creation, and passively receiving information can easily lead to "museum fatigue".

(7) The content defects of the museum guided tour system not only weaken its educational function but also limit the breadth and depth of cultural dissemination. Cultural relics become "silent exhibits", and historical memory is difficult to pass on effectively; visitors stay for less time and return less often.

[0005] While some museums have begun experimenting with integrating indoor positioning and computer technology into their guided tour systems in an attempt to achieve intelligent explanations, this still remains at the level of a simple knowledge base retrieval strategy: "location → data retrieval → broadcast." The so-called intelligence merely acts as a selector switch between the visitor and the knowledge base. Meanwhile, artificial intelligence technology, which is increasingly being applied across various industries, has not yet been incorporated into museum guided tours.

[0006] This invention aims to propose an innovative solution to address the aforementioned problems.

Summary of the Invention

[0007] The technical problem to be solved by the present invention is to overcome the shortcomings of the prior art and provide a multi-agent collaborative guided tour explanation method and system driven by a script engine.

[0008] To solve the technical problem, the solution of the present invention is:

[0009] A script engine-driven multi-agent collaborative guided tour explanation method is provided, including:

[0010] Based on the role positioning determined by the exhibition theme, multiple intelligent agents with virtual avatars are set up to provide visitors with human-like guidance and explanations;

[0011] Mobile smart terminals are used to obtain each visitor's basic personal information and the parameters of the intelligent agent that the visitor selects, which together serve as that visitor's profile features;

[0012] Based on the exhibition space layout and exhibit-related data, a large language model is used to generate a guided tour script in real time according to the characteristics of the audience profile; the intelligent agent selected by the audience is used as the protagonist in the script, and multiple other intelligent agents are matched to interact with the protagonist according to the script content;

[0013] Based on the script's content and the audience's changing locations along the tour route, an intelligent agent running on a mobile smart terminal provides exhibit explanations and guided tours through voice interaction and video displays.

[0014] As a preferred embodiment of the present invention, the exhibit-related data is stored in a multimodal database and dynamically associated with the coordinate position of the exhibit in the exhibition space; the multimodal database includes: a knowledge graph database storing exhibit and related information, a vector database embedding exhibit feature vectors, and a database related to the services provided by the exhibition hall.

[0015] As a preferred embodiment of the present invention, a full-scene spatial labeling is established based on the exhibition hall infrastructure and exhibition space layout, and multiple location coordinates are mapped in real time to form the spatial decision-making basis for guided tours; the location coordinates include at least: the exhibit booths, the real-time flow of visitors, the exhibition hall service facilities, and the staff positions.

[0016] As a preferred embodiment of the present invention, a settings option is provided in the mobile smart terminal, and the personal basic information and intelligent agent parameters are obtained only after the audience member agrees and confirms; the personal basic information includes at least gender, age range, and interest preferences; the intelligent agent parameters include at least identity, personality, image, knowledge domain, emotional preferences, and voice features; the audience's profile features are used as influencing factors to adjust the intelligent agent features and to generate script content in real time according to preset rules.

[0017] As a preferred embodiment of the present invention, the method further includes using a mobile smart terminal to obtain questions temporarily raised by the audience, and after searching a multimodal database, providing answers related to the exhibits themselves or the exhibition hall services in the form of voice, text, or video.

[0018] As a preferred embodiment of the present invention, the steps of generating a guided tour script using a large language model include:

[0019] S1. Multimodal context awareness and data initialization

[0020] Construct the context vector of the current interaction as a set of four elements:

[0021] {audience profile, real-time spatial coordinates, triggered exhibit ID, visit history}

[0022] where: the audience profile describes the visitor; the real-time spatial coordinates are obtained by indoor positioning technology; the triggered exhibit ID identifies the currently triggered exhibit; and the visit history is the collection of routes the visitor has already traveled and knowledge points the visitor has already acquired;
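
As an illustrative sketch only, and not part of the claimed method, the four elements of the S1 context vector could be held in a small Python structure. The class names, field names, and the choice of two-dimensional coordinates are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class VisitHistory:
    """Routes already traveled and knowledge points already acquired."""
    visited_exhibits: list = field(default_factory=list)
    acquired_knowledge: set = field(default_factory=set)


@dataclass
class InteractionContext:
    """The four elements of the context vector described in S1."""
    profile: dict          # audience profile, e.g. age range and interests
    position: tuple        # real-time (x, y) from indoor positioning, in meters
    exhibit_id: str        # ID of the currently triggered exhibit
    history: VisitHistory  # what the visitor has seen and learned so far


def build_context(profile, position, exhibit_id, history):
    """Assemble the context for one interaction and record the visit once."""
    if exhibit_id not in history.visited_exhibits:
        history.visited_exhibits.append(exhibit_id)
    return InteractionContext(profile, position, exhibit_id, history)


history = VisitHistory()
ctx = build_context(
    {"age_range": "18-30", "interests": "military history"},
    (12.4, 3.8),
    "G023",
    history,
)
print(ctx.exhibit_id, ctx.history.visited_exhibits)
```

Keeping the history as a separate object lets the same record follow the visitor from exhibit to exhibit while a fresh context is built at each trigger.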

[0023] S2. Instantiation and Personalization Modeling of Intelligent Agent Roles

[0024] Retrieve the associated set of agents from a pre-created agent IP library. For each intelligent agent, perform vectorized modeling:

[0025]

[0026]

[0027]

[0028]

[0029]

[0030]

[0031] where each agent is modeled by five components. The role identity definition vector comprises the identity type, the social relationship graph index (defining the relationships between the role and the other agents), and the role's function in the script. The personality parameter matrix comprises a character-specific coefficient reflecting the character's identity or personality traits, an affinity / gentleness coefficient, a humor coefficient, and a professionalism coefficient. The dedicated knowledge boundary vector comprises the time perception boundary, the domain knowledge weights, and the long / short-term memory indicator. The digital image feature vector comprises the base model index, clothing and prop features, facial feature parameters, and a library of habitual actions. The interaction and expression pattern vector comprises the language style, the tone setting, the output length preference, and the action trigger threshold;
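
The five-part agent model of S2 can be sketched as nested Python dataclasses. This is a hedged illustration: all class and field names, the value ranges, and the reading of the "time perception boundary" as a latest known year are assumptions, since the source gives only the component names.

```python
from dataclasses import dataclass, field


@dataclass
class Identity:
    identity_type: str                             # e.g. "historical figure"
    relations: dict = field(default_factory=dict)  # other agent -> relationship
    script_function: str = "narrator"              # function within the script


@dataclass
class Personality:
    character_coeff: float  # character-specific coefficient
    affinity: float         # affinity / gentleness coefficient
    humor: float            # humor coefficient
    professionalism: float  # professionalism coefficient


@dataclass
class KnowledgeBoundary:
    latest_known_year: int  # assumed reading of the time perception boundary
    domain_weights: dict = field(default_factory=dict)
    long_term_memory: bool = True  # long / short-term memory indicator


@dataclass
class Appearance:
    base_model: str
    costume_props: list = field(default_factory=list)
    face_params: dict = field(default_factory=dict)
    habitual_actions: list = field(default_factory=list)


@dataclass
class InteractionStyle:
    language_style: str
    tone: str
    max_sentences: int       # output length preference
    action_threshold: float  # threshold for triggering an action


@dataclass
class Agent:
    name: str
    identity: Identity
    personality: Personality
    knowledge: KnowledgeBoundary
    appearance: Appearance
    style: InteractionStyle


def persona_prompt(agent):
    """Render the role settings as one line of prompt text."""
    return (
        f"You are {agent.name}, a {agent.identity.identity_type}. "
        f"Speak in a {agent.style.tone} tone using {agent.style.language_style} "
        f"language, at most {agent.style.max_sentences} sentences per turn."
    )


fu_hao = Agent(
    name="Fu Hao",
    identity=Identity("historical figure", {"Scholar": "guide"}, "protagonist"),
    personality=Personality(1.2, 0.6, 0.3, 0.8),
    knowledge=KnowledgeBoundary(-1200, {"bronze": 0.9, "oracle bone": 0.8}),
    appearance=Appearance("base_female_01", ["bronze axe"]),
    style=InteractionStyle("classical", "majestic", 3, 0.7),
)
print(persona_prompt(fu_hao))
```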

[0032] S3. Knowledge Graph-Based Retrieval-Augmented Generation and Collaborative Script Arrangement

[0033] (1) Using the currently triggered exhibit as the anchor point, retrieve the related subgraph of the knowledge graph, which contains exhibit descriptions, historical facts, explanations, and relationships between related figures, and extract the key knowledge point vector;

[0034] (2) The context vector constructed in step S1, the role settings instantiated in step S2, the knowledge constraints in the related subgraph, and the key knowledge point vector extracted in item (1) are injected as constraints into the system prompt of the large language model;

[0035] (3) Based on multi-agent collaborative reasoning, the director-actor model is used for script generation;

[0036] The script generation logic follows these constraints:

[0037]

[0038] where the terms denote, respectively: the generated script; the large language model; the director layer; and the actor layer;

[0039] During script generation, the director layer plans the script structure and assigns dialogue rounds to the agents; in the actor layer, each agent generates dialogue that matches its persona based on its personality parameter matrix. The large language model then verifies historical consistency against the knowledge graph, ensuring that the generated content does not contradict the core facts in the key knowledge point vector;
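
The director-actor flow of S3 can be sketched as follows. This is a minimal illustration under stated assumptions: `LLM` is a hypothetical callable (prompt in, text out), `stub_llm` stands in for a real model so the sketch runs offline, the round-robin turn plan is only one possible director policy, and the consistency check is reduced to a yes/no query to the model.

```python
from typing import Callable, Dict, List

LLM = Callable[[str], str]  # hypothetical interface: prompt in, completion out


def build_system_prompt(context: str, personas: Dict[str, str],
                        facts: List[str]) -> str:
    """Inject context, role settings, and key facts as prompt constraints."""
    roles = "\n".join(f"- {name}: {desc}" for name, desc in personas.items())
    fact_lines = "\n".join(f"- {fact}" for fact in facts)
    return (f"Context: {context}\nRoles:\n{roles}\n"
            f"Do not contradict these facts:\n{fact_lines}")


def director_plan(speakers: List[str], rounds: int) -> List[str]:
    """Director layer: assign dialogue turns, protagonist first in each round."""
    return [name for _ in range(rounds) for name in speakers]


def generate_script(llm: LLM, context: str, personas: Dict[str, str],
                    facts: List[str], rounds: int = 2) -> List[dict]:
    system_prompt = build_system_prompt(context, personas, facts)
    script = []
    for speaker in director_plan(list(personas), rounds):
        # Actor layer: the agent speaks in character.
        line = llm(f"{system_prompt}\nSpeak the next line as {speaker}.")
        # Consistency check: the model compares the line with the key facts.
        verdict = llm(f"{system_prompt}\nLine: {line}\n"
                      "Reply CONSISTENT or CONTRADICTS.")
        if verdict.strip().upper().startswith("CONSISTENT"):
            script.append({"speaker": speaker, "text": line})
    return script


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model; always approves and returns a fixed line."""
    if "Reply CONSISTENT" in prompt:
        return "CONSISTENT"
    return "(line generated in character)"


script = generate_script(
    stub_llm,
    "visitor at exhibit G023, interested in military history",
    {"Fu Hao": "protagonist, majestic", "Scholar": "rigorous"},
    ["Oracle bones were used for divination in the Shang Dynasty."],
)
print([turn["speaker"] for turn in script])
```

Lines that fail the check are simply dropped here; a fuller implementation would more plausibly regenerate them.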

[0040] S4. Emotional Speech Synthesis and Multimodal Rendering

[0041] The system analyzes emotional tags in the script, calculates real-time acoustic parameters, and converts the generated text script into an audio stream; it also introduces an emotion-acoustic mapping model into the dynamic acoustic parameter adjustment algorithm to achieve anthropomorphic performance.

[0042] Based on the tension of the script's plot, a pre-trained language model based on the BERT architecture performs real-time inference to obtain the current character's anxiety / emotion index;

[0043] Calculate speech synthesis parameters and adjust acoustic parameters:

[0044] where the terms denote, respectively: the fundamental frequency of the final synthesized speech; the fundamental frequency of the character's voice in a normal state; the emotional intensity regulation coefficient; and a character-specific coefficient reflecting the character's identity or personality traits;

[0045] Combined with a text-to-speech (TTS) engine, the final audio is synthesized and output with breathing, pauses, and intonation variations.
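
The exact pitch formula is not legible in the source text, so the sketch below assumes one plausible form, in which the base fundamental frequency is scaled by one plus the product of the emotion index, the intensity coefficient, and the character-specific coefficient. The function name, parameter names, and the multiplicative form are all assumptions for illustration only.

```python
def synthesized_f0(base_f0_hz, emotion_index, intensity_coeff, character_coeff):
    """Assumed form: F0 = F0_base * (1 + alpha * E * k).

    base_f0_hz      -- fundamental frequency of the character's normal voice
    emotion_index   -- anxiety / emotion index inferred from the script
    intensity_coeff -- emotional intensity regulation coefficient (alpha)
    character_coeff -- character-specific coefficient (k)
    """
    if base_f0_hz <= 0:
        raise ValueError("base_f0_hz must be positive")
    return base_f0_hz * (1.0 + intensity_coeff * emotion_index * character_coeff)


# A calm line leaves the pitch at its base value; a tense line raises it.
calm = synthesized_f0(200.0, 0.0, 0.2, 1.0)
tense = synthesized_f0(200.0, 0.5, 0.2, 1.0)
print(calm, tense)
```

Under this assumed form, a character with a larger character-specific coefficient reacts more strongly to the same emotion index, which matches the stated purpose of that coefficient.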

[0046] As a preferred embodiment of the present invention, the time spent by visitors at each guided tour point, the completion rate, the interactive content during the tour, and the evaluation content at the end of the tour are collected, and optimization is achieved in the subsequent process of generating guided tour scripts according to preset rules.

[0047] As a preferred embodiment of the present invention, the method further includes closed-loop optimization of the guided tour script based on reinforcement learning:

[0048] (1) Record the duration of the audience's stay at each explanation point, the audio completion rate, and the intent of interactive questions raised during the visit;

[0049] (2) Establish a script quality scoring function based on feedback:

[0050]

[0051] where the terms denote, respectively: information density; emotional resonance; interaction potential; and the weighting coefficients;

[0052] (3) Fine-tuning optimization strategy: using the score as a reward signal, the proximal policy optimization (PPO) algorithm is used to fine-tune the prompt template and the agent parameters, so that the next generated script better suits the preferences of this type of audience;

[0053] (4) After multiple iterations, the process converges automatically for each type of audience profile until the optimal script structure is obtained.
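
The scoring formula itself is not legible in the source text. Because it is described as three terms with weighting coefficients, the sketch below assumes a plain weighted sum and returns it as the reward signal. The function name, the default weights, and the assumption that each term is normalized to the range 0 to 1 are hypothetical; the policy-optimization step is not shown.

```python
def script_score(info_density, emotional_resonance, interaction_potential,
                 weights=(0.4, 0.3, 0.3)):
    """Assumed weighted-sum form of the script quality score.

    Each input is taken to be normalized to [0, 1]; the weights are
    illustrative and would be tuned per audience type in practice.
    """
    terms = (info_density, emotional_resonance, interaction_potential)
    for value in terms:
        if not 0.0 <= value <= 1.0:
            raise ValueError("score terms must lie in [0, 1]")
    return sum(w * v for w, v in zip(weights, terms))


# The score serves as the reward for the next round of fine-tuning.
reward = script_score(0.5, 0.5, 0.5, weights=(0.2, 0.3, 0.5))
print(reward)
```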

[0054] The present invention further provides a multi-agent collaborative guided tour system driven by a script engine. The system includes a three-layer architecture: an infrastructure layer, an AI processing layer, and an application layer. The infrastructure layer includes an intelligent space management system and an exhibit knowledge management system. The AI processing layer includes an intelligent agent management and setting system and a script engine system. The application layer includes a mobile guided tour program module.

[0055] The intelligent space management system is used to achieve sub-meter level precision indoor positioning inside the exhibition hall building and to conduct physical and spatial synchronous interaction based on the real-time location of visitors in the exhibition space.

[0056] The exhibit knowledge management system is used to integrate exhibit data, graphic descriptions, high-definition images, 3D models and audio-visual materials, and to build a multimodal database that is dynamically associated with the coordinates of the exhibit location. This allows the script engine system to call up reliable information when using a large language model for reasoning, and to provide explanations or Q&A based on audience questions.

[0057] The intelligent agent management and setting system creates an intelligent agent library based on the historical background of the exhibition theme, constructs intelligent agent roles that conform to the script's deductive logic, and sets the intelligent agent's identity, personality, knowledge domain, image, emotion, voice characteristics, long-term memory, short-term memory, and interaction style.

[0058] The script engine system uses a structured multimodal database as its data foundation and combines parameters generated from audience profile features to generate guided tour scripts in real time using a large language model.

[0059] The mobile tour guide module runs on the mobile smart terminal provided to each audience member. It obtains real-time location information and script performance progress through wireless positioning and communication. Through two-way human-computer interaction, the virtual image of the intelligent agent provides personalized, anthropomorphic explanation services to the audience.

[0060] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the aforementioned script engine-driven multi-agent collaborative guided tour explanation method.

[0061] Compared with the prior art, the beneficial effects of the present invention are:

[0062] 1. This invention uses large language models, speech recognition, speech synthesis, sentiment analysis, intelligent agents, and indoor navigation to reconstruct the content system of museum guided tours around audience needs, through layered content design, dynamic update mechanisms, cross-media narrative integration, and highly interactive technologies.

[0063] 2. This invention utilizes advanced artificial intelligence and Internet of Things technologies to automate information collection and content search, and to intelligently generate and deliver content. It reduces the reliance on manually written content by industry experts for museum explanations, elevating information delivery to a level of cultural empathy, and enhancing the interactivity, educational value, engagement, and timeliness and accuracy of explanations in cultural and museum venues.

[0064] 3. This invention acquires basic visitor information through smart terminals and combines it with data collected by an indoor positioning system, such as visitor routes and dwell times within the museum, to create personalized profiles for each visitor. Based on this, an artificial intelligence-powered script engine is used to dynamically generate and continuously optimize artifact interpretation scripts, significantly improving the efficiency, quality, and engagement of the interpretation content. Furthermore, IoT technologies, such as smart spaces, are used to collect user feedback for timely optimization of the interpretation content. Finally, multiple intelligent agents provide tailored explanations, presenting each visitor with a unique script through performance, ensuring that every visitor receives an interpretation tailored to their individual needs. Therefore, this invention significantly enhances the service quality and content of museum interpretation services, filling a gap in existing technologies.

[0065] 4. This invention deeply integrates AI technology to significantly improve the efficiency, depth, interest, and timeliness of the content generated; it realizes the transformation from one-way indoctrination to immersive empathy, from static display to dynamic narrative, and from "object"-centered to "human"-centered, significantly enhancing audience participation and cultural gain, and has significant technological progress and practical value.

Description of the Drawings

[0066] Figure 1 is an architecture diagram of the multi-agent collaborative guided tour system.

[0067] Figure 2 is a sample interface diagram of the mobile tour guide program module.

[0068] Figure 3 is a structural diagram of the exhibit knowledge management system.

[0069] Figure 4 is a structural block diagram of the intelligent space management system.

[0070] Figure 5 is a structural diagram of the intelligent agent management and setting system.

[0071] Figure 6 is a structural diagram of the script engine system.

[0072] Figure 7 is a flowchart illustrating the implementation of the multi-agent collaborative guided tour explanation method described in this invention.

[0073] Figure 8 is a flowchart of the multi-agent collaborative algorithm based on the script engine.

Detailed Implementation

[0074] First, it should be noted that this invention relates to artificial intelligence technology, specifically an application of computer technology in the niche field of exhibition hall tour guiding. The implementation of this invention involves the application of multiple software functional modules. The applicant believes that, after carefully reading the application documents and accurately understanding the implementation principles and objectives of this invention, and in conjunction with existing publicly known technologies, those skilled in the art can fully utilize their software programming skills to implement this invention. All references in this application fall within this scope, and the applicant will not list them all further.

[0075] Those skilled in the art will understand that, besides implementing a portion of the system and its various devices, modules, and units provided by this invention in the form of purely computer-readable program code, the same functions can be achieved entirely through logical programming of the method steps, enabling the system and its various devices, modules, and units to function in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, and embedded microcontrollers. Therefore, the system and its various devices, modules, and units provided by this invention can be considered a hardware component, and the devices, modules, and units included therein for implementing various functions can also be considered structures within the hardware component; alternatively, the devices, modules, and units for implementing various functions can be considered both software modules implementing the method and structures within the hardware component.

[0076] It should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0077] Part 1: Overview of the Implementation Schemes of the Invention

[0078] I. Multi-Agent Collaborative Guided Tour System

[0079] This invention belongs to the field of artificial intelligence application technology and aims to solve the common problems of existing guide systems in museums, exhibition halls, and cultural relics sites, such as lack of content, infrequent updates, monotonous formats, and lack of interactivity and personalized adaptation. The core of this invention lies in its script engine and multi-agent collaborative architecture, which is shown in Figure 1. The architecture is divided into three layers: the infrastructure layer (intelligent space management system + exhibit knowledge management system), the AI processing layer (intelligent agent management and setting system + script engine system), and the application layer (mobile tour guide program module).

[0080] (a) Infrastructure layer

[0081] 1. Intelligent Space Management System

[0082] Based on high-precision indoor positioning technology (such as Bluetooth AOA indoor positioning), the real-time accurate positioning of visitors and exhibits can be achieved, triggering explanations and interactions based on spatiotemporal location.

[0083] Specifically, Bluetooth AOA positioning provides sub-meter indoor positioning accuracy (≤0.5 meters). By upgrading the museum's information infrastructure, it supplies the underlying technical support for real-time positioning, precise navigation, intelligent management, and intelligent analysis of operational data. The system can dynamically trigger the explanation of the nearest exhibit, breaking away from the traditional exhibit-centric approach and centering the visit on the "person" within the intelligent space. Based on the visitor's real-time location, it synchronizes interaction with the physical space in real time, providing precise guided tours and interactive games tied to time and place. It also achieves full-scene spatial labeling: four types of location coordinates (artifact exhibits, visitor flow, facilities such as restrooms / elevators, and staff locations) are mapped in real time, forming the basis for spatial decision-making.
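
Triggering the explanation of the nearest exhibit can be sketched as a simple distance test on the positioned coordinates. The function name, the exhibit coordinates, and the 2-meter trigger radius are assumptions for illustration; the source specifies only the positioning accuracy.

```python
import math


def nearest_exhibit(position, exhibits, trigger_radius_m=2.0):
    """Return the ID of the closest exhibit within the trigger radius, or None."""
    best_id, best_distance = None, trigger_radius_m
    for exhibit_id, coordinate in exhibits.items():
        distance = math.dist(position, coordinate)
        if distance <= best_distance:
            best_id, best_distance = exhibit_id, distance
    return best_id


# Hypothetical exhibit coordinates in meters within one hall.
EXHIBITS = {"G023": (12.0, 4.0), "G024": (15.5, 4.0)}
triggered = nearest_exhibit((12.4, 3.8), EXHIBITS)
print(triggered)
```

A trigger radius several times larger than the positioning error keeps a visitor standing in front of a display case from flickering between neighboring exhibits.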

[0084] 2. Exhibit Knowledge Management System

[0085] This system is used to construct a structured, multimodal (text / image / 3D model / audio) knowledge graph (using a graph database + vector database), providing a reliable data source for script generation and intelligent agent interaction, and dynamically binding it to the exhibition location.

[0086] Specifically, this system integrates relevant data (age, material, historical background), textual and graphic descriptions, high-definition images, 3D models, and audio-visual materials of exhibits / artifacts to construct a multimodal knowledge graph. This graph is used by the tour guide program to call upon reliable information during large language model inference, providing visitors with explanations and Q&A sessions of varying breadth and depth. The system provides a multimodal database, including: a knowledge graph database storing exhibit and related information, a vector database embedding exhibit feature vectors, and a database related to the services provided by the exhibition hall. It dynamically associates artifact data with exhibition location coordinates (such as exhibition hall number and display case ID), supporting location-triggered explanations. A knowledge graph database (such as Neo4j) is used to store artifact relationships, combined with the vector database to achieve semantic retrieval.

[0087] For example, the knowledge graph database (Neo4j) stores the relationships between cultural relics (such as "Oracle Bone Inscriptions → Excavation History of Yin Ruins in Anyang → Shang Dynasty Archaeology"). A vector database is used to embed the feature vectors of cultural relics (material / age / craftsmanship), supporting semantic retrieval. The two are then data-bound: dynamically linking 3D models of cultural relics, high-definition images, academic papers, etc., with the exhibition coordinates.
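
The two retrieval paths described above can be illustrated with in-memory stand-ins: a dictionary in place of the graph database and plain lists in place of the vector database. This sketch does not use the Neo4j API; the relationship chain comes from the example above, while the three-dimensional feature vectors and all function names are invented for illustration.

```python
import math
from collections import deque

# Stand-in for the knowledge graph: topic -> directly related topics.
GRAPH = {
    "Oracle Bone Inscriptions": ["Excavation History of Yin Ruins in Anyang"],
    "Excavation History of Yin Ruins in Anyang": ["Shang Dynasty Archaeology"],
    "Shang Dynasty Archaeology": [],
}

# Stand-in for the vector database: toy (material, age, craftsmanship) features.
VECTORS = {
    "oracle bone": [1.0, 0.9, 0.1],
    "bronze ding": [0.1, 0.8, 0.9],
}


def related_topics(graph, start, max_hops=2):
    """Breadth-first walk that collects topics within max_hops of start."""
    seen, found = {start}, []
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, hops + 1))
    return found


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, vectors):
    """Semantic retrieval: the stored item whose vector is closest to the query."""
    return max(vectors, key=lambda name: cosine(query, vectors[name]))


print(related_topics(GRAPH, "Oracle Bone Inscriptions"))
print(most_similar([0.9, 1.0, 0.0], VECTORS))
```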

[0088] (b) AI processing layer

[0089] 1. Intelligent Agent Management and Setting System

[0090] This system is used to create and manage a library of museum-specific intelligent agent IPs, setting their digital image, voice, personality, knowledge base, memory (long / short term) and interaction style, and is responsible for the anthropomorphic interpretation of the script.

[0091] Specifically, a museum-specific intelligent agent database is created based on historical background, defining each agent's identity, personality (rigorous / humorous), knowledge domain (ceramics / calligraphy and painting), appearance, emotions, voice characteristics (dialect / speech speed), long-term memory, short-term memory, and interaction style. Short-term memory stores the current dialogue context, ensuring the continuity of multi-turn interactions; long-term memory records audience preferences (such as personal tastes) to optimize subsequent scripts.

[0092] An example of building an intelligent agent IP:

[0093] Identity settings: historical figures (such as the Shang king or Fu Hao) and fictional characters (such as "Fu Hao who traveled to the modern world").

[0094] Multidimensional parameters: knowledge domain (oracle bone / bronze / ceramics / calligraphy and painting), dialect speech database, and emotion response rules (surprise / sadness trigger conditions).

[0095] Short-term memory is set to cache the current dialogue context (e.g., when a viewer asks, "Are oracle bone inscriptions accurate?", it is linked to the previous text, "Each oracle bone inscription has a corresponding verification word").

[0096] Long-term memory is set to record user preferences (e.g., frequently asking war-related questions → subsequently recommending the "evolution of ancient weapons" route and related cultural and creative products).
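
The two memory settings above can be sketched with a bounded queue for the dialogue context and a counter for accumulated preferences. The class name, the six-turn window, and the rule that a topic becomes a preference after two questions are assumptions made for this example.

```python
from collections import Counter, deque


class AgentMemory:
    """Short-term dialogue cache plus long-term preference counts."""

    def __init__(self, short_term_turns=6):
        # Oldest turns fall out automatically once the window is full.
        self.short_term = deque(maxlen=short_term_turns)
        self.long_term = Counter()

    def record_turn(self, speaker, text, topic=None):
        self.short_term.append((speaker, text))
        # Only the visitor's own questions count toward preferences.
        if speaker == "visitor" and topic:
            self.long_term[topic] += 1

    def dialogue_context(self):
        """Recent turns as text, ready to prepend to the next prompt."""
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.short_term)

    def preferred_topic(self, min_count=2):
        """Most frequent topic, once it has been raised at least min_count times."""
        if not self.long_term:
            return None
        topic, count = self.long_term.most_common(1)[0]
        return topic if count >= min_count else None


memory = AgentMemory(short_term_turns=2)
memory.record_turn("visitor", "How were bronze axes used in battle?", topic="war")
memory.record_turn("agent", "They served as both weapons and symbols of command.")
memory.record_turn("visitor", "Who led the Shang armies?", topic="war")
print(memory.preferred_topic())
print(memory.dialogue_context())
```

With a preference of "war" detected, the guide could then recommend a route such as the "evolution of ancient weapons" mentioned above.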

[0097] 2. Script Engine System

[0098] This system uses a large language model (LLM) to generate engaging, plot-driven, multi-branch, and hierarchically adapted scripts. It does so by searching historical knowledge and background stories and combining them with the agent role settings (identity, personality, knowledge domain, emotions, etc.), audience profiles (age, interests, etc.), and real-time location. It also supports dynamic closed-loop optimization based on audience feedback (adjusting script parameters through reinforcement learning).

[0099] Specifically, with well-constructed prompts, using a large language model (LLM) for script creation can improve efficiency, enhance the comprehensiveness, diversity, and interest of the content, and reduce the workload of manual script creation. The script engine system described in this invention is an intelligent creation platform based on artificial intelligence technology. Many similar text-writing technologies already exist, such as the Kimi platform, the Doubao platform, Tongyi Qianwen, and the Deepseek large model. Unlike these, this invention combines the selected intelligent agent IP and the audience's basic personal information during script creation. This personal profile imposes corresponding constraints on the script creation process, and the script is then generated and optimized automatically by integrating large language models, knowledge graphs, and other technologies. The system uses a structured database (the cultural relic knowledge base) as its data foundation, combines input parameters such as theme, character settings, and plot development, and uses the large language model to generate a first draft of the script. It also supports adaptation to multiple audience groups (such as children, women, and the elderly) and standardized script format output. Existing large language model writing modules / functions on various platforms cannot achieve these technical effects.

[0100] The Large Language Model (LLM) described in this invention can be any existing model commonly used in current technology, such as DeepSeek or Qwen. Reasoning ability may differ slightly between models, but this has little impact on the final presentation.

[0101] An example of a screenwriting process:

[0102] Audience Input: User Profile {Age: 25, Interests: Military History}, Location: G023 Exhibition Area, Matching with Intelligent Agent Role Database

[0103] Writing steps:

[0104] 1. Knowledge Retrieval: Utilizes knowledge graphs to link cultural relics to relevant background events (e.g., input "oracle bone script" → output "Shang Dynasty divination process + excavation history of Yin Ruins in Anyang").

[0105] 2. Query the intelligent agent database and match script characters;

[0106] 3. Construct LLM prompts: Incorporate character settings (rigorous scholar / majestic Shang king), language style (suspenseful / childlike), and trivia (e.g., "the priesthood system of the Shang Dynasty").

[0107] 4. Combine plot characters and clues to generate a multi-branch script → output the script in JSON format, including dialogue text, emotion markers, and branching options (e.g., adding interactive tasks).
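The writing steps above can be illustrated with a minimal Python sketch. It is not the implementation of this invention: the function names `build_prompt` and `validate_script`, the dictionary keys, and the JSON field names (`scenes`, `speaker`, `text`, `emotion`, `branches`) are assumptions made for illustration, and the model call is replaced by a hand-written sample output.

```python
import json

def build_prompt(profile, agent, style, facts):
    """Assemble an LLM prompt from character settings, style, and retrieved facts."""
    lines = [
        f"Character: {agent['name']} ({agent['persona']}).",
        f"Language style: {style}.",
        f"Audience: age {profile['age']}, interests: {', '.join(profile['interests'])}.",
        "Background facts (do not contradict them):",
    ]
    lines += [f"- {fact}" for fact in facts]
    lines.append('Answer with a JSON object whose "scenes" list holds items '
                 "with the keys speaker, text, emotion, branches.")
    return "\n".join(lines)

def validate_script(raw_json):
    """Parse the model output and check that every scene has the required fields."""
    script = json.loads(raw_json)
    required = {"speaker", "text", "emotion", "branches"}
    for scene in script["scenes"]:
        missing = required - set(scene)
        if missing:
            raise ValueError(f"scene is missing fields: {sorted(missing)}")
    return script

# Inputs taken from the example above; the persona wording is illustrative.
profile = {"age": 25, "interests": ["Military History"]}
agent = {"name": "King Wu Ding", "persona": "majestic Shang king"}
facts = ["Shang Dynasty divination process",
         "Excavation history of the Yin Ruins in Anyang"]
prompt = build_prompt(profile, agent, "suspenseful", facts)

# Hand-written stand-in for what a model might return (not real model output).
sample_output = json.dumps({"scenes": [{
    "speaker": "King Wu Ding",
    "text": "Diviner, consult the oracle bones.",
    "emotion": "anxious",
    "branches": ["Ask about the divination process", "Continue the tour"],
}]})
script = validate_script(sample_output)
```

Validating the JSON before it reaches the presentation layer is one way to keep malformed model output from being shown to visitors; a real system would likely also retry or fall back to a stored script.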

[0108] Script dynamic closed-loop optimization mechanism:

[0109] Audience behavior feedback → Reinforcement learning strategy update → Script generation parameter adjustment → Multi-agent presentation optimization.

[0110] For example, feedback collection: recording user behavior (stay time > 120 seconds → indicating high interest), voice questions (analyzing user preferences), and evaluations.
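A feedback record of this kind could be represented as follows. This is a sketch under stated assumptions: the class name `FeedbackRecord` and its field names are hypothetical, and only the 120-second threshold comes from the example above.

```python
from dataclasses import dataclass, field
from typing import Optional

HIGH_INTEREST_SECONDS = 120  # threshold from the example above

@dataclass
class FeedbackRecord:
    exhibit_id: str
    stay_seconds: float
    questions: list = field(default_factory=list)  # transcribed voice questions
    rating: Optional[int] = None                   # evaluation, if the visitor gave one

    @property
    def high_interest(self) -> bool:
        """A stay longer than the threshold is read as a sign of high interest."""
        return self.stay_seconds > HIGH_INTEREST_SECONDS

record = FeedbackRecord("G023", stay_seconds=150.0,
                        questions=["How were the bones prepared?"])
```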

[0111] The script engine system can optionally include a human polishing interface. The Large Language Model (LLM) can automatically create batches of scripts without human intervention. However, after integrating the human polishing interface, keywords or constraints related to script content review or verification can be added as needed. Based on the introduction of these keywords or constraints, the LLM can further improve the plot logic and text quality of the original script, or prevent the appearance of inappropriate content that violates social ethics.

[0112] III. Application Layer: Mobile Navigation Apps (Mobile Applications for Museums / Scenic Spots)

[0113] This program module is installed on the mobile smart terminal that follows the audience and integrates multimodal interaction (LLM dialogue / image recognition / touch control). Through the above systems, it provides the audience with a personalized guided tour experience that is accompanied, location-triggered, immersive, and interactive.

[0114] Specifically, the guided tour program utilizes various advanced interactive methods such as image recognition and touchscreens to achieve automatic location-based explanations, and provides accurate explanation services to the audience based on personalized scripts generated in real time using LLM. Secondly, a digital human with a virtual avatar (AI intelligent agent) accompanies the audience, enabling two-way interaction throughout the tour and accompanying the audience to experience a personalized and immersive "human-like" experience.

[0115] An example runtime interface framework of the program module is shown in Figure 2. This program module can be used to achieve the following:

[0116] 1. Multimodal interaction:

[0117] Image recognition: Visitors take photos of exhibits → the computer vision model is used to identify the cultural relics → explanations of cultural relics are delivered.

[0118] Voice dialogue: Integrates text-to-speech (TTS) and sentiment analysis, and combines a professional knowledge base to answer user questions;

[0119] 2. Multi-agent companionship:

[0120] Step 1: Location triggering and multi-agent collaborative deduction;

[0121] Visitors enter the exhibition area (Shang and Zhou Dynasty Oracle Bones) → The positioning system sends coordinates to the script engine → The engine combines user profiles to retrieve scripts → Multiple intelligent agents collaboratively perform according to the script;

[0122] Step 2: Closed-loop optimization mechanism

[0123] Collect feedback → Subsequent script updates.
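The location-triggering part of Step 1 can be sketched as a small state machine that fires once when a visitor enters a new exhibition area. The rectangular zones, the class names `Zone` and `LocationTrigger`, and the coordinates are assumptions for illustration; the source does not specify how areas are delimited.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Zone:
    zone_id: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x, y):
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max

class LocationTrigger:
    """Reports a zone id only at the moment the visitor enters that zone."""

    def __init__(self, zones):
        self.zones = zones
        self.current = None

    def update(self, x, y):
        zone = next((z.zone_id for z in self.zones if z.contains(x, y)), None)
        entered = zone if zone is not None and zone != self.current else None
        self.current = zone
        return entered

# Hypothetical floor-plan rectangle (metres) for the area named in the text.
trigger = LocationTrigger([Zone("G023", 0.0, 0.0, 5.0, 5.0)])
events = [trigger.update(1.0, 1.0),    # enters G023
          trigger.update(1.5, 1.0),    # still inside, no new event
          trigger.update(20.0, 20.0),  # leaves every zone
          trigger.update(2.0, 2.0)]    # re-enters G023
```

Firing only on entry avoids restarting the performance each time a new coordinate arrives while the visitor stands in the same area.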

[0124] II. Multi-Agent Collaborative Guided Tour Explanation Method

[0125] Based on the system structure described above, this invention proposes a multi-agent collaborative guided tour explanation method driven by a script engine. This method includes the following steps:

[0126] 1. Based on the role positioning determined by the exhibition theme, set up multiple intelligent agents with virtual images to provide visitors with human-like guidance and explanation;

[0127] 2. Use mobile smart terminals to obtain the audience's basic personal information and the parameters of the selected intelligent agent, as the profile features of each audience member;

[0128] The system provides set options in mobile smart terminals, and after audience confirmation, obtains basic personal information and intelligent agent parameters. The basic personal information includes at least gender, age range, and interest preferences. The intelligent agent parameters include at least identity, personality, image, knowledge domain, emotional preferences, and voice characteristics. Subsequently, the audience's profile characteristics are used as influencing factors to adjust the intelligent agent characteristics and generate script content in real time according to preset rules.

[0129] 3. Based on the exhibition space layout and exhibit-related data, a large language model is used to generate a guided tour script in real time according to the characteristics of the audience profile; the intelligent agent selected by the audience is used as the protagonist in the script, and multiple other intelligent agents are matched to interact with the protagonist according to the script content;

[0130] A full-scene spatial labeling system is established based on the exhibition hall infrastructure and exhibition space layout, and multiple location coordinates are mapped in real time to form the spatial decision-making basis for guided tours; the location coordinates include at least: the exhibit booths, the real-time visitor flow, the exhibition hall service facilities, and the staff locations.

[0131] The exhibit-related data is stored in a multimodal database and dynamically associated with the coordinates of the exhibits in the exhibition space. The multimodal database includes: a knowledge graph database storing exhibits and related information, a vector database embedding exhibit feature vectors, and a database related to the services provided by the exhibition hall.

[0132] 4. Based on the script's content and the audience's changing locations along the tour route, utilize intelligent agents running on mobile smart terminals to provide exhibit explanations and guided tours through voice interaction and video displays.

[0133] The method further includes: collecting the time visitors spend at each guided tour point, the completion rate, the interactive content during the tour, and the evaluation content at the end of the tour, and optimizing the subsequent generation of guided tour scripts according to preset rules.

[0134] The method also includes: using mobile smart terminals to obtain questions raised by visitors on an ad-hoc basis, and after searching a multimodal database, providing answers related to the exhibits themselves or the exhibition hall services in the form of voice, text, or video.
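Answering an ad-hoc question by searching the database can be sketched as a nearest-neighbor lookup over embedding vectors. The three-dimensional vectors, entry ids, and answer texts below are toy values invented for illustration, and the step that turns a spoken question into a vector is assumed to happen elsewhere.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def answer(query_vec, index):
    """Return the id and answer of the entry most similar to the query embedding."""
    best_id = max(index, key=lambda doc_id: cosine(query_vec, index[doc_id]["vector"]))
    return best_id, index[best_id]["answer"]

# Toy index mixing exhibit knowledge and hall-service information.
index = {
    "exhibit:oracle-bone": {
        "vector": [0.9, 0.1, 0.0],
        "answer": "This oracle bone records a divination about Fu Hao's childbirth.",
    },
    "service:restroom": {
        "vector": [0.0, 0.2, 0.9],
        "answer": "The nearest restroom is next to the elevator.",
    },
}
service_hit = answer([0.1, 0.1, 0.8], index)
exhibit_hit = answer([0.8, 0.2, 0.0], index)
```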

[0135] Based on the understanding of those skilled in the art, the multi-agent collaborative guided tour method described above is entirely based on computer technology, and the entire implementation process includes data acquisition, organization, calculation, and result display.

[0136] The technology can be implemented as a computing device comprising: at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions that are executed by the at least one processor to cause the at least one processor to perform the aforementioned script engine-driven multi-agent collaborative guided tour explanation method.

[0137] Therefore, it can also be understood that the technical implementation of the present invention can also be embodied in a computer-readable storage medium, which stores computer instructions for causing the computer to execute the aforementioned multi-agent collaborative guided tour explanation method driven by a script engine.

[0138] Part Two: A Specific Application Example

[0139] The following section provides a detailed explanation of the implementation of this invention using a real-world example implemented in a museum.

[0140] Intelligent Space Management System: For the intelligent space upgrade and renovation of museums, Bluetooth AOA base stations or Bluetooth positioning beacons are deployed in museums / scenic areas to achieve indoor positioning accuracy of ≤0.5 meters. Through technologies such as Bluetooth and ultrasound, decimeter-level positioning is achieved in indoor spaces, used for real-time location marking of four types of coordinates: the location of artifacts on display, the real-time location of visitors, the location of public facilities (restrooms / elevators), and the location of on-site staff.

[0141] Exhibits Knowledge Management System: This system organizes and stores museum artifact data, descriptions, images, videos, and other content in a structured manner, and links them to the exhibition locations to support user visits.

[0142] Agent Management and Setting System: Create an intelligent agent's IP digital image, generate the agent's actions / poses, set the agent's personality, set the agent's background knowledge base, set the agent's voice, configure the invocation of the large language model (LLM) for the agent, manage the agent's long / short-term memory, manage the agent's interaction conversation records, etc., based on the historical background of the venue.

[0143] Script Engine System: Formulate the dynamic generation process of the script, including: input parameters, knowledge retrieval, construction of large language model prompts, and multi-branch script output. A closed-loop optimization mechanism is further introduced to update the script: the script content is optimized according to the stay time at each explanation point, the completion rate, and the questions asked after listening to the explanation during the audience's visit, in order to achieve a better interactive visiting experience and better knowledge dissemination.

[0144] Mobile Tour Guide Program Module: Before the audience visits, they can choose to input basic information and their favorite intelligent agent image; during the visit, it automatically locates and triggers the explanation at the explanation point, presents diverse and multi-modal (graphs, texts, videos) knowledge on the display interface, and plays it through the intelligent agent's voice with multi-role settings.

[0145] Taking the introduction of oracle bone collections in a certain museum as an example, the content creation process of the script engine system is explained:

[0146] 1. Configure multiple intelligent agents with different personalities

[0147] Digital Narrator (digital explanation officer): Provides the narration, is responsible for scene transitions and the explanation of basic historical facts, and introduces the historical background from a macroscopic perspective. Personality: Kind, enthusiastic, and generous.

[0148] King Wu Ding of the Shang Dynasty (the protagonist corresponding to the intelligent agent selected by the audience): A wise monarch, concerned about national affairs and also very concerned about his wife, Fu Hao. He wears a feather crown, a luxurious leather armor, a jade axe on his waist, and has sharp eyes but a worried look. Personality: Majestic, decisive, but also has a gentle side, especially towards Fu Hao. He seems helpless when facing Fu Hao's childbirth, showing a cute contrast.

[0149] Diviner Que (Zhenren Que): An official responsible for divination, meticulous about the divination ceremony. He wears the plain hemp robe of a diviner, holds a tortoise shell or animal bone, and conducts divination. Personality: Devout, rigorous, and speaks slowly.

[0150] Fu Hao: The queen of Wu Ding and also a female general. She is about to give birth at this time, still wears short armor over her pregnant belly, has a bronze dagger hanging at her waist, and wears her hair in a high coiled bun, looking heroic. Personality: Strong and brave. Although she is worried about childbirth, she still shows calmness and courage.

[0151] Each intelligent agent is modeled using a five-dimensional vector:

[0152] Agent = (Role, Personality, Knowledge, Appearance, Interaction)

[0153] Personality = (κ_authority, β_warmth, γ_humor, δ_expertise)

[0154] In this context, Agent refers to an intelligent agent; Role refers to the definition of role identity; Personality refers to personality trait parameters; Knowledge refers to the character's exclusive knowledge boundaries; Appearance refers to the character's physical appearance (virtual avatar); Interaction refers to the interaction and expression patterns; κ_authority refers to the authority coefficient; β_warmth refers to the warmth coefficient; γ_humor refers to the humor coefficient; and δ_expertise refers to the expertise coefficient.

[0155] (Example: King Wu Ding of the Shang Dynasty had an authority coefficient κ_authority=0.85 and a warmth coefficient β_warmth=0.7)
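The agent model above maps naturally onto a small data structure. The following sketch uses the names from the formulas (Role, Personality, Knowledge, Appearance, Interaction) as field names; the class names, the [0, 1] range check, and the humor and expertise values for King Wu Ding are assumptions, since the source gives only his authority and warmth coefficients.

```python
from dataclasses import dataclass

@dataclass
class Personality:
    authority: float   # κ_authority
    warmth: float      # β_warmth
    humor: float       # γ_humor
    expertise: float   # δ_expertise

    def __post_init__(self):
        # Assumption: every coefficient is normalised to the interval [0, 1].
        for name, value in vars(self).items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")

@dataclass
class Agent:
    role: str
    personality: Personality
    knowledge: list     # exclusive knowledge boundaries
    appearance: str     # virtual avatar description
    interaction: str    # interaction and expression pattern

wu_ding = Agent(
    role="King Wu Ding of the Shang Dynasty",
    # authority and warmth are from the example; humor and expertise are placeholders
    personality=Personality(authority=0.85, warmth=0.7, humor=0.2, expertise=0.5),
    knowledge=["Shang Dynasty court affairs"],
    appearance="feather crown, leather armor, jade axe",
    interaction="majestic, decisive, sparing of words",
)
```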

[0156] 2. Retrieve overall information and background knowledge about cultural relics from multimodal databases (such as pictures of oracle bones, labels of their locations, classification of labels, basic introductions to cultural relics, etc.).

[0157] For example, the original text:

[0158] Oracle bone inscription: "Divination on the day of Wuchen, Que (㱿) divining: Will Fu Hao give birth, auspicious?"

[0159] Verification of the birth date: "On the evening of the Bingzi day, the baby was born on the Dingchou day. It was auspicious. One."

[0160] The meaning is: On the day of Wuchen, the diviner asked: Will Fu Hao's delivery have a good outcome? The answer is: The delivery started on the night of Bingzi and ended in the early morning of Dingchou, and the outcome was very good (a boy was born).

[0161] 3. The Large Language Model (LLM) combines the multi-agent roles, the artifact knowledge, and audience characteristics such as age and gender to generate a guided tour script in real time. First, the digital narrator introduces the basic information about the artifact and the dangers of childbirth in the Shang Dynasty. Then, Fu Hao appears and speaks with the Shang King (the protagonist), revealing the King's concerns; the King orders the diviner to perform the divination, inscribe and verify the oracle bone inscriptions, and interpret them. Through engaging multi-agent role-playing and explanation, the audience learns to read oracle bone script and its format, and becomes familiar with the social, military, political, and cultural background of the Shang Dynasty.

[0162] The core innovation of this invention lies in the multi-agent collaborative computing method based on a script engine. This algorithm dynamically schedules multiple agent roles by perceiving user profiles and the spatiotemporal environment, utilizes Large Language Models (LLM) and Knowledge Graphs (KG) to generate a guided tour script that combines historical accuracy with dramatic tension, and achieves closed-loop optimization through reinforcement learning.

[0163] The algorithm mainly consists of the following five core steps:

[0164] Step 1: Multimodal context awareness and data initialization

[0165] First, relevant data or information is obtained through the intelligent space management system and the mobile application. Then, the context vector C of the current interaction is constructed:

[0166] C = {U, L, E, H}

[0167] where: U is the audience profile characteristics (age group, interest tags, historical preferences); L is the real-time spatial coordinates based on indoor positioning technology; E is the ID of the artifact / exhibit that is currently triggered (e.g., Oracle Bone Script-001); and H is the viewer's current browsing path and the set of knowledge points they have acquired.
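The four elements of the context vector (audience profile, real-time position, triggered exhibit ID, and history of path and acquired knowledge points) can be held in one record that is updated as the visitor moves. This is a sketch only: the class name `TourContext`, its field names, the coordinates, and the second exhibit ID are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class TourContext:
    profile: dict        # audience profile characteristics
    position: tuple      # real-time indoor coordinates
    exhibit_id: str      # currently triggered exhibit
    path: list = field(default_factory=list)         # exhibits already visited
    known_points: set = field(default_factory=set)   # knowledge points acquired

    def advance(self, position, exhibit_id, learned=()):
        """Move to a newly triggered exhibit and archive the previous one."""
        self.path.append(self.exhibit_id)
        self.known_points.update(learned)
        self.position = position
        self.exhibit_id = exhibit_id

ctx = TourContext(
    profile={"age_group": "18-30", "interests": ["Military History"]},
    position=(12.4, 3.1),
    exhibit_id="Oracle Bone Script-001",
)
# "Oracle Bone Script-002" is a made-up ID used only to show the update.
ctx.advance((15.0, 3.1), "Oracle Bone Script-002",
            learned={"Shang divination format"})
```

Keeping the acquired knowledge points in the context lets a later prompt avoid repeating what the visitor has already heard.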

[0168] Step 2: Instantiation and Personalization Modeling of Intelligent Agent Roles

[0169] Based on the current context, retrieve the associated set of agents from a pre-created agent IP library.

[0170] Examples: digital narrator, King Wu Ding of Shang, Fu Hao, diviner.

[0171] Perform vectorized modeling for each intelligent agent Agent_i:

[0172] Agent_i = (Role_i, Personality_i, Knowledge_i, Appearance_i, Interaction_i), where:

[0173] Role_i: Role identity definition vector

[0174] This parameter is used to define the social relationships and functional positioning of an agent in the script, and determines its "voice" and "perspective" in multi-agent collaboration.

[0175] Role_i = (r_type, r_relation, r_function), where r_type is the identity type (e.g., historical figure, fictional character, omniscient narrator); r_relation is the social relationship graph index, which defines the relationship between this character and the other agents (Agent_j, j ≠ i), such as husband-wife, monarch-minister, or adversarial relationships, and is used as a constraint in the prompt to keep the character within the bounds of etiquette and logic; and r_function is the script function positioning (e.g., narrative driver, conflict maker, emotional mediator, knowledge interpreter).

[0176] Example: For "King Wu Ding of the Shang Dynasty", Role_i is defined as: {real historical figure, husband of Fu Hao / monarch of the Shang Dynasty, core conflict initiator (initiates the divination out of concern)}.

[0177] Personality_i: Personality parameter matrix (Personality Matrix)

[0178] Personality_i = (κ_authority, β_warmth, γ_humor, δ_expertise), where κ_authority is the authority coefficient; β_warmth is the affinity / gentleness coefficient; γ_humor is the humor coefficient; and δ_expertise is the expertise coefficient.

[0179] Example: For the character "Diviner Que", Personality_i is set to indicate that he has high authority and high affinity toward specific persons (such as Fu Hao).

[0180] Knowledge_i: Exclusive knowledge boundary vector (Knowledge Domain)

[0181] Knowledge_i = (k_time, k_domain, k_memory)

[0182] where k_time is the time cognition boundary: historical figures cannot know events after their own era (for example, a king of the Shang Dynasty cannot know about the Tang Dynasty), while the digital narrator has a view of the full timeline; k_domain is the domain knowledge weight, defining the professional weight of this character in military affairs, etiquette, agriculture, or modern archaeology; and k_memory is the long / short-term memory pointer, pointing to the chronicle of major events in this character's life (long-term memory) and to the conversations that have occurred on the current tour path (short-term memory).

[0183] Example: For "Diviner Que", Knowledge_i is restricted to {the reign of Wu Ding in the Shang Dynasty and earlier; divination / sacrifice / turtle shell processing; knows only the current divination result and not the final verification result}.
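The time cognition boundary and domain knowledge weights can be sketched as a filter applied to retrieved facts before they reach a character's prompt. The class names `Fact` and `KnowledgeBoundary`, the domain labels, and the weight values are assumptions; the BCE years are rough illustrative values, not dates taken from the source.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Fact:
    text: str
    year: int      # negative values are BCE
    domain: str

@dataclass
class KnowledgeBoundary:
    latest_year: Optional[int]   # None means a full-timeline perspective
    domain_weights: dict         # domain name -> weight; a missing domain means 0

    def allows(self, fact: Fact) -> bool:
        """A fact is usable if it is inside the time boundary and in a known domain."""
        in_time = self.latest_year is None or fact.year <= self.latest_year
        return in_time and self.domain_weights.get(fact.domain, 0.0) > 0.0

facts = [
    Fact("Divination about Fu Hao's childbirth", -1200, "divination"),   # illustrative year
    Fact("Scientific excavation of the Yin Ruins begins", 1928, "archaeology"),
]
diviner = KnowledgeBoundary(latest_year=-1190, domain_weights={"divination": 1.0})
narrator = KnowledgeBoundary(latest_year=None,
                             domain_weights={"divination": 0.5, "archaeology": 1.0})

diviner_facts = [f.text for f in facts if diviner.allows(f)]
narrator_facts = [f.text for f in facts if narrator.allows(f)]
```

Filtering before prompt construction, rather than asking the model to ignore facts it has been given, is one simple way to keep a historical character from referring to later events.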

[0184] Appearance_i: Appearance features (Digital Image Feature Vector)

[0185] This parameter is used to control the visual rendering of the agent by the front-end display device, achieving the unity of text, graphics, sound, and images.

[0186] Appearance_i = (a_model, a_costume, a_face, a_action)

[0187] where a_model is the index of the basic model (bound to specific 3D models or 2D Live2D image files); a_costume is the features of clothing and props (such as: feather crown, leather armor, jade battle axe, turtle shell); a_face is the facial feature parameters (sense of age, sense of solemnity, sense of exhaustion); and a_action is the library of habitual actions (such as: stroking the stomach, trembling while holding a divination bone, standing with hands behind the back).

[0188] Example: For "Fu Hao", Appearance_i is described as: {female general model; pregnant with a bulging abdomen / wearing short armor / bronze dagger hanging at the waist; heroic, with a slight maternal glow; abdomen-protecting action / sword-holding action}.

[0189] Interaction_i: Interaction Pattern Vector

[0190] Interaction_i = (i_style, i_tone, i_length, i_action)

[0191] where i_style is the language style, defining the syntactic features of text generation, for example "semi-classical and semi-colloquial" (used for ancient characters to enhance immersion while remaining comprehensible) or "modern spoken language" (used for the digital narrator to ensure clarity); i_tone is the tone setting, the default intonation and emotional baseline of the character (such as: anxious, composed, brisk); i_length is the output length preference, controlling how much the character says in a single conversational turn (for example, a majestic monarch may be "as sparing of words as gold", while an enthusiastic narrator may be "detailed and comprehensive"); and i_action is the action trigger threshold, defining what degree of emotional fluctuation triggers the corresponding body movements (such as: taking a step back when surprised, pacing when thinking).

[0192] Example: For "Diviner Que", Interaction_i is set as {archaic and solemn; slow and deliberate; mainly short sentences; accompanied by the action of shaking the turtle shell during divination}; for the "Digital Narrator", Interaction_i is set as {Modern Standard Mandarin; warm and friendly; explanation in medium to long sentences; pointing gesture}.

[0193] Step 3: Knowledge Graph-Based RAG (Retrieval Enhanced Generation) and Script Collaborative Arrangement

[0194] (1) Knowledge retrieval:

[0195] Using the currently triggered exhibit as the anchor point, retrieve the related subgraph within the knowledge graph (including historical facts, original oracle bone inscriptions, interpretations, and relationships between related figures), and extract the key knowledge point vector.

[0196] (2) Constructing prompts:

[0197] The context vector constructed in Step 1, the role settings instantiated in Step 2, and the key knowledge point vector extracted in the previous sub-step are injected, as constraints, into the system prompt of the large language model;

[0198] (3) Multi-agent collaborative reasoning (Chain-of-Agents) script generation:

[0199] The script engine uses a "director-actor" model for script generation: the director layer plans the script structure (opening, conflict, climax, ending) and assigns dialogue turns to the actors; in the actor layer, each agent generates dialogue that matches its persona based on its own agent model. The core idea of this model is to introduce a "coordinator" (i.e., the "director") to break down tasks, plan processes, or provide guidance, while one or more specialized models (i.e., the "actors") carry out the tasks.

[0200] The script generation logic follows these constraints:

[0201]

[0202] The terms of this constraint are: the generated script; the large language model; the director layer; and the actor layer.

[0203] During the script generation process, a Large Language Model (LLM) is used to perform historical consistency verification based on the knowledge graph, ensuring that the dramatic content does not violate the core facts contained in the key knowledge point vector (such as the auspicious or inauspicious outcome of a divination).
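The "director-actor" arrangement and the consistency check can be sketched as follows. This is an illustration, not the method of this invention: all function names are hypothetical, the model call is replaced by a deterministic stub (`stub_generate`), and the consistency check is reduced to a plain phrase search standing in for the LLM-based verification described above.

```python
BEATS = ("opening", "conflict", "climax", "ending")

def director_plan(cast):
    """Director layer: assign one speaker to each beat, cycling through the cast."""
    return [(beat, cast[i % len(cast)]) for i, beat in enumerate(BEATS)]

def actor_line(speaker, beat, facts, generate):
    """Actor layer: ask the model for one in-character line for this beat."""
    prompt = f"As {speaker}, speak one line for the {beat}. Facts: {'; '.join(facts)}"
    return generate(prompt)

def generate_script(cast, facts, generate):
    return [{"beat": beat, "speaker": speaker,
             "text": actor_line(speaker, beat, facts, generate)}
            for beat, speaker in director_plan(cast)]

def find_violations(script, forbidden_phrases):
    """Return the lines that contain a phrase contradicting a core fact."""
    return [line for line in script
            if any(p in line["text"].lower() for p in forbidden_phrases)]

def stub_generate(prompt):
    # Stand-in for a real LLM call: echoes the instruction part of the prompt.
    return "(generated) " + prompt.split(". Facts:")[0]

cast = ["Digital Narrator", "King Wu Ding", "Fu Hao", "Diviner"]
facts = ["The divination about Fu Hao's childbirth was verified as auspicious"]
script = generate_script(cast, facts, stub_generate)
# The verified outcome was auspicious, so a line claiming the opposite is a violation.
violations = find_violations(script, ["inauspicious"])
```

Separating planning from line generation means each actor prompt only needs that character's persona and permitted facts, which keeps individual prompts short.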

[0204] Step 4: Emotional Speech Synthesis and Multimodal Rendering

[0205] To address the issue of monotonous and mechanical traditional tour guide audio, this invention introduces a dynamic acoustic parameter adjustment algorithm. By analyzing emotion tags in the script, real-time acoustic parameters are calculated. The system then converts the generated text script into an audio stream. To achieve a more human-like performance, the algorithm incorporates an "emotion-acoustic mapping model."

[0206] (1) Calculation of emotional state:

[0207] Based on the tension of the script's plot, a pre-trained language model (based on the BERT architecture) performs real-time inference to obtain the current character's anxiety / emotion index. For details, refer to the paper "BERT-ERC: Fine-tuning BERT is Enough for Emotion Recognition in Conversation".

[0208] (2) Acoustic parameter adjustment: The speech synthesis parameters are calculated by formula (taking the fundamental frequency, i.e., pitch, as an example). The terms of the formula are: the fundamental frequency of the final synthesized speech; the fundamental frequency of the character's voice in a normal state; the emotional intensity regulation coefficient; and a character-specific coefficient that reflects the character's identity or personality traits;

[0209] For example, when King Wu Ding of the Shang Dynasty anxiously inquires about the results of the divination: the base fundamental frequency is 120 Hz and the authority (majesty) coefficient is 0.85, ensuring that the voice both conveys authority and reflects anxiety.
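As a worked illustration of the pitch adjustment, the sketch below assumes one plausible form, in which the base fundamental frequency is scaled by the product of the emotion index, the emotional intensity regulation coefficient, and the character-specific coefficient. This multiplicative form is an assumption and may differ from the formula of this invention. The 120 Hz base and the 0.85 coefficient come from the example above; the regulation coefficient 0.2 and the emotion index 0.8 are made-up values.

```python
def adjusted_pitch(base_hz, emotion_index, intensity_coeff, character_coeff):
    """Assumed form: F0 = F0_base * (1 + intensity_coeff * emotion_index * character_coeff)."""
    if not 0.0 <= emotion_index <= 1.0:
        raise ValueError("emotion_index is expected to lie in [0, 1]")
    return base_hz * (1.0 + intensity_coeff * emotion_index * character_coeff)

# Calm delivery: an emotion index of zero leaves the base pitch unchanged.
calm = adjusted_pitch(120.0, emotion_index=0.0, intensity_coeff=0.2, character_coeff=0.85)
# Anxious delivery: the pitch rises in proportion to the emotion index.
anxious = adjusted_pitch(120.0, emotion_index=0.8, intensity_coeff=0.2, character_coeff=0.85)
```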

[0210] (3) Synthesized output:

[0211] Combined with a TTS (text-to-speech) engine, the final audio is generated with breathing, pauses, and intonation variations.

[0212] Step 5: Reinforcement Learning-Based Closed-Loop Script Optimization (RLHF)

[0213] To continuously improve the guided tour experience, this invention establishes a closed-loop optimization mechanism for guided tour scripts based on reinforcement learning.

[0214] (1) Data collection:

[0215] Record the visitor's dwell time at the explanation point, the audio completion rate, and the intent of interactive questions.

[0216] The performance evaluation matrix is shown in the table below as an example:

[0217]

[0218] (2) Scoring calculation:

[0219] Establish a feedback-based script quality scoring function Score:

[0220] Score = w_1 × D_info + w_2 × R_emotion + w_3 × P_interaction

[0221] where: D_info is the information density, inferred from the depth of the audience's questions; R_emotion is the emotional resonance, positively correlated with the completion rate and with voice sentiment analysis; P_interaction is the interactive potential, calculated from the frequency of audience interaction and positively correlated with subsequent exploratory behaviors; and the weighting coefficients (w_1, w_2, w_3) are initially set to [0.4, 0.4, 0.2] and can be adjusted manually in real time based on the results.
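Assuming the three terms are combined as a weighted sum, the scoring function can be sketched as follows. The function name, the parameter names, and the requirement that each term be normalized to [0, 1] are assumptions; only the initial weights [0.4, 0.4, 0.2] come from the text.

```python
DEFAULT_WEIGHTS = (0.4, 0.4, 0.2)  # initial weights given in the text

def script_score(info_density, emotional_resonance, interaction_potential,
                 weights=DEFAULT_WEIGHTS):
    """Weighted sum of the three feedback terms (each assumed to lie in [0, 1])."""
    terms = (info_density, emotional_resonance, interaction_potential)
    if any(not 0.0 <= t <= 1.0 for t in terms):
        raise ValueError("each term is expected to be normalised to [0, 1]")
    return sum(w * t for w, t in zip(weights, terms))

# Made-up feedback values for one explanation point.
score = script_score(0.5, 0.8, 0.3)
# Manual intervention: an operator shifts weight toward interaction.
reweighted = script_score(0.5, 0.8, 0.3, weights=(0.3, 0.3, 0.4))
```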

[0222] (3) Fine-tuning / optimization strategy:

[0223] Using the Score as the reward signal, the PPO (Proximal Policy Optimization) algorithm is used to optimize the prompt templates of the script engine, and the agent parameters are fine-tuned so that the next generated script better suits the preferences of this type of audience.

[0224] (4) After multiple iterations, the system can converge automatically for different audience profiles (such as children, scholars, and general audiences) until the optimal script structure is obtained.

[0225] Explanation of some terms related to the implementation of this invention:

[0226] 1. Vectorized Modeling

[0227] In this invention, vectorized modeling refers to the process of mapping the abstract attributes of an agent (such as personality, identity, knowledge domain, interaction style, etc.) into a multi-dimensional numerical vector that can be computed and quantified by a computer.

[0228] Technical implications: By constructing a high-dimensional vector space, unstructured character descriptions are transformed into mathematical expressions. For example, the personality trait "majesty" is quantified into a numerical value within the interval [0, 1], and "historical knowledge" is mapped into a specific semantic embedding vector.

[0229] Its role in this invention is to enable parameterized control of personality differences among multiple agents, allowing the Large Language Model (LLM) to precisely control the tone, behavioral logic, and interaction patterns of characters by calculating vector similarity or weight matrices when generating scripts, thus ensuring a personalized, anthropomorphic effect.

[0230] 2. RAG (Retrieval-Augmented Generation)

[0231] RAG is a technical architecture that combines a pre-trained large language model (LLM) with an external knowledge base retrieval system.

[0232] Technical implications: A traditional LLM relies solely on the parametric knowledge acquired during training, which can easily lead to "hallucinations" (i.e., generating erroneous information that does not conform to historical facts). Before generating an answer, the RAG architecture first retrieves relevant document fragments from an external private database (the "back-end cultural relic knowledge management system" in this invention) based on the user's input (Query), and then feeds the retrieved facts to the model as "context", forcing the model to generate its output based on these retrieved facts.

[0233] Its role in this invention is to solve the problem of general large-scale models lacking or being inaccurate in their knowledge of specific museum artifacts (such as obscure oracle bone inscriptions). Through RAG, this system ensures that the generated guided tour scripts are not only "interesting" but also strictly adhere to historical research (such as the interpretation of auspicious and inauspicious divination inscriptions), achieving a combination of "historical accuracy" and "narrative generation capability".

[0234] 3. Reinforcement Learning (RL)

[0235] Reinforcement learning is an important branch of machine learning. It describes the process by which an agent learns the optimal policy through a trial-and-error mechanism in order to maximize a certain cumulative reward in an environment.

[0236] Technical implications: Unlike supervised learning, which requires pre-labeled correct answers, reinforcement learning allows the system to automatically adjust parameters in its interaction with the environment by defining the "environment" (audience / guided scene), "actions" (generated script / voice), "states" (dwell time / completion rate), and "reward function" (Score).

[0237] Its role in this invention: Used for closed-loop optimization of the script engine (Step 5). The system uses the audience's actual viewing behavior (did they finish listening? did they interact?) as feedback signals, and automatically adjusts the prompt templates or agent personality parameters used for script generation through reinforcement learning algorithms, enabling the system to evolve and generate explanations that increasingly suit the audience.

[0238] 4. PPO Algorithm (Proximal Policy Optimization)

[0239] PPO is an efficient reinforcement learning algorithm proposed by OpenAI that is widely used in current fine-tuning of large language models.

[0240] Technical implications: PPO belongs to the class of Policy Gradient (PG) algorithms. Its core innovation lies in the introduction of a "clipping mechanism," which limits the magnitude of each policy update. This ensures that the new policy does not deviate too far from the old policy, thereby preventing a sudden performance collapse during optimization and improving training stability.
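The clipping mechanism can be shown for a single sample. The function below evaluates the standard PPO clipped surrogate term min(r × A, clip(r, 1 − ε, 1 + ε) × A), where r is the probability ratio between the new and old policy and A is the advantage estimate. The function name is hypothetical, ε = 0.2 is a commonly used default rather than a value given in this document, and a real training loop would average this term over a batch and maximize it.

```python
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """PPO clipped objective for one sample.

    ratio     -- pi_new(a|s) / pi_old(a|s)
    advantage -- advantage estimate for the sampled action
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)

# Positive advantage: the gain from raising the ratio beyond 1 + epsilon is capped.
capped_gain = clipped_surrogate(ratio=1.5, advantage=1.0)
# Negative advantage with a raised ratio: the penalty is not clipped away.
full_penalty = clipped_surrogate(ratio=1.5, advantage=-1.0)
# Negative advantage with a lowered ratio: the benefit is capped at 1 - epsilon.
capped_benefit = clipped_surrogate(ratio=0.5, advantage=-1.0)
```

Because the objective stops improving once the ratio leaves the interval [1 − ε, 1 + ε] in the favorable direction, a single extreme feedback sample cannot pull the policy arbitrarily far in one update.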

[0241] Its role in this invention: In the script optimization stage of this invention, the PPO algorithm is used to fine-tune the model parameters of the script engine based on the calculated "script fun score" (Reward). Compared with other algorithms, PPO can handle complex multimodal (text + voice + action) feedback more stably, preventing drastic and uncontrollable fluctuations in script style due to extreme feedback from individual viewers, and ensuring a smooth improvement in the guided tour experience.

[0242] The specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the specific embodiments described above, and those skilled in the art can make various modifications or variations within the scope of the claims, which do not affect the essence of the present invention.

Claims

1. A multi-agent collaborative guided tour explanation method driven by a script engine, characterized in that it comprises: based on the role positioning determined by the exhibition theme, setting up multiple intelligent agents with virtual images to provide visitors with human-like guidance and explanations; using mobile smart terminals to obtain the audience's basic personal information and the parameters of the selected intelligent agent, which serve as the profile features of each audience member; based on the exhibition space layout and exhibit-related data, using a large language model to generate a guided tour script in real time according to the audience profile features, wherein the intelligent agent selected by the audience is used as the protagonist in the script, and multiple other intelligent agents are matched to interact with the protagonist according to the script content; and, based on the script's content and the audience's changing locations along the tour route, using an intelligent agent running on a mobile smart terminal to provide exhibit explanations and guided tours through voice interaction and video displays.

2. The method according to claim 1, characterized in that the exhibit-related data is stored in a multimodal database and dynamically associated with the coordinates of the exhibits in the exhibition space. The multimodal database includes: a knowledge graph database storing exhibits and related information, a vector database storing exhibit feature embeddings, and a database of the services provided by the exhibition hall.
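As a minimal sketch of how exhibit data might be associated with coordinates, the following example keeps one record per exhibit and looks up the exhibit nearest to a visitor. The record layout, the field names, the 2 m trigger radius, and the function `triggered_exhibit` are all hypothetical; the claim does not specify a schema.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ExhibitRecord:
    """One exhibit as seen across the three stores named in the claim (hypothetical layout)."""
    exhibit_id: str
    position: tuple[float, float]  # coordinates in the exhibition space, in metres
    facts: list[tuple[str, str, str]] = field(default_factory=list)  # knowledge-graph triples
    embedding: list[float] = field(default_factory=list)             # exhibit feature vector
    services: list[str] = field(default_factory=list)                # nearby hall services


def triggered_exhibit(visitor_xy, records, radius_m=2.0):
    """Return the nearest exhibit within radius_m of the visitor, or None."""
    best, best_dist = None, radius_m
    for rec in records:
        dist = math.dist(visitor_xy, rec.position)
        if dist <= best_dist:
            best, best_dist = rec, dist
    return best


if __name__ == "__main__":
    records = [
        ExhibitRecord("exhibit-A", (1.0, 1.5), facts=[("exhibit-A", "made_of", "bronze")]),
        ExhibitRecord("exhibit-B", (8.0, 3.0)),
    ]
    hit = triggered_exhibit((1.2, 1.0), records)
    print(hit.exhibit_id if hit else "no exhibit in range")
```

Keeping the coordinates on the record itself is only one option; a separate spatial index keyed by exhibit ID would serve the same purpose for larger halls.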

3. The method according to claim 1, characterized in that, based on the exhibition hall infrastructure and the exhibition space layout, a full-scene spatial labeling system is established that maps multiple location coordinates in real time and forms the basis for spatial decision-making in guided tours. The location coordinates cover at least: exhibit booths, real-time visitor flow, exhibition hall service facilities, and staff locations.

4. The method according to claim 1, characterized in that setting options are provided on the mobile smart terminal, and the basic personal information and intelligent agent parameters are obtained after the audience member agrees and confirms. The basic personal information includes at least gender, age range, and interest preferences. The intelligent agent parameters include at least identity, personality, image, knowledge domain, emotional preferences, and voice characteristics. The audience member's profile features are used as influencing factors to adjust the intelligent agent's features and to generate script content in real time according to preset rules.
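The adjustment of agent features by preset rules can be sketched as follows. This is an illustration under stated assumptions: the two rules, the age-range labels, and the numeric fields `humor` and `max_sentences` are hypothetical, and only a subset of the agent parameters listed in the claim is modeled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AudienceProfile:
    gender: str
    age_range: str               # e.g. "child", "adult", "senior" (labels are assumptions)
    interests: tuple[str, ...]


@dataclass(frozen=True)
class AgentParams:
    identity: str
    personality: str
    knowledge_domain: str
    humor: float                 # 0..1, hypothetical numeric knob
    max_sentences: int           # hypothetical output-length preference


def apply_preset_rules(agent: AgentParams, profile: AudienceProfile) -> AgentParams:
    """Return a copy of the agent parameters adjusted by the audience profile."""
    adjusted = agent
    if profile.age_range == "child":
        # Shorter, more playful delivery for young visitors.
        adjusted = replace(adjusted, humor=min(1.0, adjusted.humor + 0.3), max_sentences=3)
    if agent.knowledge_domain in profile.interests:
        # Allow more depth when the visitor cares about the agent's domain.
        adjusted = replace(adjusted, max_sentences=adjusted.max_sentences + 2)
    return adjusted


if __name__ == "__main__":
    guide = AgentParams("court scribe", "gentle", "bronzes", humor=0.5, max_sentences=6)
    visitor = AudienceProfile("unspecified", "child", ("bronzes",))
    print(apply_preset_rules(guide, visitor))
```

The original agent object is left unchanged, so the same library entry can be reused for other audience members.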

5. The method according to claim 1, characterized in that questions raised by visitors are collected on the spot using mobile smart terminals and, after a search of a multimodal database, answers relating to the exhibits or the exhibition hall services are provided in the form of voice, text, images, or videos.
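The retrieval step of such a question-answering flow can be sketched with cosine similarity over stored vectors. The entry schema (`vector`, `answer`, `media`), the similarity threshold of 0.5, and the toy two-dimensional vectors are assumptions for illustration; the claim does not name a retrieval algorithm.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_answer(query_vec, entries, min_score=0.5):
    """Return the stored entry most similar to the query, or None if none is close enough.

    entries: list of dicts with keys "vector", "answer", "media" (hypothetical schema).
    """
    best = max(entries, key=lambda e: cosine(query_vec, e["vector"]), default=None)
    if best is None or cosine(query_vec, best["vector"]) < min_score:
        return None
    return best


if __name__ == "__main__":
    entries = [
        {"vector": [1.0, 0.0], "answer": "About the exhibit's casting technique.", "media": "voice"},
        {"vector": [0.0, 1.0], "answer": "The cloakroom is next to the entrance.", "media": "text"},
    ]
    hit = retrieve_answer([0.9, 0.1], entries)
    print(hit["answer"] if hit else "No matching answer found.")
```

Returning None below the threshold lets the caller fall back to a generic reply instead of presenting an unrelated answer.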

6. The method according to claim 1, characterized in that the steps of generating a guided tour script using a large language model include:
S1. Multimodal context awareness and data initialization
Construct the context vector of the current interaction from four components: the audience profile; the real-time spatial coordinates obtained by indoor positioning technology; the ID of the currently triggered exhibit; and the set of routes the audience has already traveled and knowledge points it has already acquired.
S2. Instantiation and personalized modeling of intelligent agent roles
Retrieve the associated set of agents from a pre-created agent IP library and perform vectorized modeling of each intelligent agent. Each agent is modeled by:
a role identity definition vector, comprising the identity type, the social relationship graph index (defining the relationship between the role and the other agents), and the role's function in the script;
a personality parameter matrix, comprising a character-specific coefficient that reflects the character's identity or personality traits, an affinity/gentleness coefficient, a humor coefficient, and a professionalism coefficient;
a dedicated knowledge boundary vector, comprising the time perception boundary, the domain knowledge weights, and a long/short-term memory pointer;
a digital image feature vector, comprising the base model index, clothing and prop features, facial feature parameters, and a library of habitual actions;
an interaction and expression mode vector, comprising the language style, the tone setting, the output length preference, and the action trigger threshold.
S3. Knowledge-graph-based retrieval-augmented generation and collaborative script arrangement
(1) Using the currently triggered exhibit as the anchor point, retrieve the related subgraph of the knowledge graph, which contains exhibit descriptions, historical facts, explanations, and relationships between related figures, and extract the key knowledge point vector;
(2) Inject the following into the system prompt of the large language model as constraints: the context vector constructed in step S1, the role settings instantiated in step S2, the knowledge constraints in the related subgraph, and the key knowledge point vector extracted in (1);
(3) Generate the script with a director-actor model based on multi-agent collaborative reasoning. The script generation logic follows a constraint that relates the generated script, the large language model, the director layer, and the actor layer. During script generation, the director layer plans the script structure and assigns dialogue rounds to each agent; in the actor layer, each agent generates dialogue that matches its persona based on its personality parameter matrix. A large language model grounded in the knowledge graph verifies historical consistency, ensuring that the performed content does not contradict the core facts in the key knowledge point vector.
S4. Emotional speech synthesis and multimodal rendering
The emotional tags in the script are analyzed, acoustic parameters are calculated in real time, and the generated text script is converted into an audio stream; an emotion-acoustic mapping model is introduced into the dynamic acoustic parameter adjustment algorithm to achieve anthropomorphic performance. Based on the tension of the script's plot, a pre-trained language model based on the BERT architecture performs real-time inference to obtain the current character's anxiety/emotion index. The speech synthesis parameters are then calculated and the acoustic parameters adjusted, in terms of: the fundamental frequency of the final synthesized speech; the fundamental frequency of the character's voice in its normal state; the emotional intensity adjustment coefficient; and a character-specific coefficient that reflects the character's identity or personality traits. Combined with a text-to-speech (TTS) engine, the final audio is synthesized and output with breathing, pauses, and intonation variations.
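Two of the steps above lend themselves to a short sketch: injecting context, role settings, and key facts into a system prompt (step S3), and deriving the final fundamental frequency from the emotion index (step S4). The formulas of the claim are not reproduced in this text, so the linear form used in `final_f0` is an assumption, as are the class `Persona`, the prompt wording, and the default intensity coefficient of 0.3.

```python
from dataclasses import dataclass


@dataclass
class Persona:
    name: str
    identity: str
    style: str
    base_f0_hz: float   # fundamental frequency of the voice in its normal state
    role_coeff: float   # character-specific coefficient


def build_system_prompt(profile: str, exhibit_id: str, visited: list[str],
                        cast: list[Persona], key_facts: list[str]) -> str:
    """Inject context, role settings and key knowledge points into one system prompt."""
    lines = [
        f"Audience profile: {profile}",
        f"Current exhibit: {exhibit_id}",
        f"Already covered: {', '.join(visited) or 'nothing yet'}",
        "Cast:",
    ]
    lines += [f"- {p.name} ({p.identity}), speaks in a {p.style} style" for p in cast]
    lines.append("Facts that the script must not contradict:")
    lines += [f"- {fact}" for fact in key_facts]
    lines.append("First plan the scene as director, then write each character's lines.")
    return "\n".join(lines)


def final_f0(persona: Persona, emotion_index: float, intensity_coeff: float = 0.3) -> float:
    """ASSUMED linear mapping, not the formula of the claim:
    base F0 scaled by (1 + intensity_coeff * role_coeff * emotion_index)."""
    return persona.base_f0_hz * (1.0 + intensity_coeff * persona.role_coeff * emotion_index)


if __name__ == "__main__":
    scribe = Persona("Scribe", "court record keeper", "formal", base_f0_hz=120.0, role_coeff=1.0)
    print(build_system_prompt("adult, interested in bronzes", "exhibit-A", [], [scribe],
                              ["exhibit-A is made of bronze"]))
    print(final_f0(scribe, emotion_index=0.5))
```

With an emotion index of zero the function returns the base frequency unchanged, which matches the claim's notion of a normal-state voice; any monotonic mapping with that property could be substituted.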

7. The method according to claim 1, characterized in that data are collected on visitor dwell time at each point of interest, completion rate, interactive content during the tour, and evaluations at the end of the tour, and these data are used to optimize the subsequent generation of guided tour scripts according to preset rules.

8. The method according to claim 7, characterized in that the method further includes closed-loop optimization of the guided tour script based on reinforcement learning:
(1) Record the audience's dwell time at each explanation point, the audio completion rate, and the intent of the interactive questions asked during the visit;
(2) Establish a feedback-based script quality scoring function that combines information density, emotional resonance, and interaction potential, each with its own weighting coefficient;
(3) Fine-tuning optimization strategy: using the script quality score as the reward signal, apply the proximal policy optimization (PPO) algorithm to fine-tune the prompt template and the agent parameters, so that the next generated script better suits the preferences of this type of audience;
(4) Iterate repeatedly so that the script automatically converges for each type of audience profile, until the optimal script structure is obtained.
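A minimal sketch of steps (1) and (2) follows. The scoring formula is not reproduced in this text, so the weighted sum, the default weights, and in particular the mapping from logged feedback to the three components are assumptions chosen only to make the example concrete.

```python
def script_quality_score(info_density: float, emotional_resonance: float,
                         interaction_potential: float,
                         weights: tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """ASSUMED weighted sum of the three components, each expected in [0, 1]."""
    w_info, w_emotion, w_interact = weights
    return (w_info * info_density
            + w_emotion * emotional_resonance
            + w_interact * interaction_potential)


def feedback_to_components(dwell_s: float, planned_s: float,
                           completion_rate: float, question_count: int):
    """Hypothetical mapping from logged feedback to the three score components."""
    # Dwell time relative to the planned duration, capped at 1.
    info_density = min(1.0, dwell_s / planned_s) if planned_s > 0 else 0.0
    # Audio completion rate clamped to [0, 1].
    emotional_resonance = max(0.0, min(1.0, completion_rate))
    # Three or more questions count as full interaction.
    interaction_potential = min(1.0, question_count / 3)
    return info_density, emotional_resonance, interaction_potential


if __name__ == "__main__":
    components = feedback_to_components(dwell_s=60, planned_s=120,
                                        completion_rate=0.8, question_count=3)
    reward = script_quality_score(*components)
    print(components, reward)
```

The resulting score would serve as the reward signal in step (3); how it is turned into an advantage estimate for PPO is outside the scope of this sketch.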

9. A multi-agent collaborative guided tour system driven by a script engine, characterized in that the system comprises a three-layer architecture: an infrastructure layer, an AI processing layer, and an application layer. The infrastructure layer includes an intelligent space management system and an exhibit knowledge management system; the AI processing layer includes an intelligent agent management and setting system and a script engine system; and the application layer includes a mobile tour guide module.
The intelligent space management system is used to achieve indoor positioning with sub-meter precision inside the exhibition hall building and to carry out synchronized physical-spatial interaction based on the real-time location of visitors in the exhibition space.
The exhibit knowledge management system is used to integrate exhibit data, graphic descriptions, high-definition images, 3D models, and audio-visual materials, and to build a multimodal database that is dynamically associated with the coordinates of the exhibit locations, so that the script engine system can call up reliable information when reasoning with a large language model and can provide explanations or answers in response to audience questions.
The intelligent agent management and setting system creates an intelligent agent library based on the historical background of the exhibition theme, constructs intelligent agent roles that conform to the script's performance logic, and sets each intelligent agent's identity, personality, knowledge domain, image, emotion, voice characteristics, long-term memory, short-term memory, and interaction style.
The script engine system uses the structured multimodal database as its data foundation and combines it with parameters generated from the audience profile features to generate guided tour scripts in real time using a large language model.
The mobile tour guide module runs on the mobile smart terminal provided to each audience member. It obtains real-time location information and script performance progress through wireless positioning and communication. Through two-way human-computer interaction, the virtual image of the intelligent agent provides personalized, anthropomorphic explanation services to the audience.

10. An electronic device, characterized in that it includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the multi-agent collaborative guided tour explanation method driven by a script engine as described in any one of claims 1 to 8.