Information processing method and apparatus, and device, medium and program product
By constructing a vector database and RAG architecture, and combining it with user-personalized features to generate extended explanation content, the problems of insufficient timeliness and matching degree of explanation content in existing technologies are solved, thereby improving the user's browsing experience.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- CHINA MOBILE HANGZHOU INFORMATION TECH CO LTD
- Filing Date
- 2025-12-19
- Publication Date
- 2026-07-02
AI Technical Summary
Existing technologies in tourist attractions, museums, or exhibitions suffer from insufficient timeliness and user relevance in interpretive content, leading to decreased user interest and a diminished visitor experience.
We construct a vector database with knowledge freshness characteristics, combine user personalized features and behavioral characteristics, dynamically generate extended explanation content through the RAG architecture, and use the Large Language Model (LLM) for personalized explanation.
This improves the timeliness and user relevance of the explanations, stimulates users' interest in viewing, and enhances the tour experience.
Abstract
Description
Information processing methods, devices, equipment, media and program products
[0001] Cross-references to related applications
[0002] This disclosure claims priority to Chinese Patent Application No. 202411955718.6, filed in China on December 27, 2024, the entire contents of which are incorporated herein by reference.
Technical Field
[0003] This disclosure pertains to the field of intelligent interaction, and particularly relates to an information processing method, apparatus, device, computer storage medium, and computer program product.
Background Technology
[0004] With the booming development of the tourism industry and the increasing cultural needs of people, various tourist attractions, museums, and exhibitions have become important places for people to relax, enjoy themselves, and broaden their horizons. In these attractions, museums, and exhibitions, the quality of the interpretive content, as a key element in conveying information and enhancing the user experience, is crucial to improving the overall visitor experience.
[0005] Currently, at tourist attractions, museums, and exhibitions, guided tours are typically delivered by playing fixed audio recordings at fixed locations. While this method can partially replace human guides, the vividness and timeliness of the content cannot compare to that of live guides. Furthermore, with the rapid development of information technology, users' demands for timely and personalized information are increasing, and their interest in unchanging audio narration is declining. This directly degrades the overall visitor experience.
Summary of the Invention
[0006] This disclosure provides an information processing method, apparatus, device, computer storage medium, and computer program product that can generate extended explanatory content with high timeliness based on users' personalized needs, thereby enhancing users' browsing experience.
[0007] In a first aspect, embodiments of this disclosure provide an information processing method, including:
[0008] Based on the personalized tags of the target user and the tags of the first location, at least one candidate vector data is determined in the preset vector database. The candidate vector data includes media asset information related to the first location and the time information of the media asset information.
[0009] Based on the time information of media asset information in each candidate vector data, determine at least one target vector data from at least one candidate vector data;
[0010] Obtain the estimated dwell time of target users at the first location;
[0011] If the estimated dwell time is greater than or equal to the preset dwell time threshold, extended explanation content corresponding to the target user is generated based on the target vector data.
[0012] In an optional implementation, before determining at least one candidate vector data in a preset vector database based on the target user's personalized tags and the tags of a first location, the method further includes:
[0013] Obtain multiple media asset information related to the first location, as well as the time information of each media asset information;
[0014] Each media asset information is transformed into a vector to obtain the initial vector data corresponding to each media asset information;
[0015] For each piece of media asset information, the time information of the media asset information is added to the extended field of the initial vector data to obtain the vector data corresponding to each piece of media asset information.
[0016] The vector data corresponding to each media asset information is stored in the vector database.
[0017] In one optional implementation, based on the target vector data, extended explanatory content corresponding to the target user is generated, including:
[0018] Calculate the duration of the extended explanation based on the estimated length of stay;
[0019] The number of words for the extended explanation is determined based on the explanation duration and the preset explanation speed.
[0020] Based on the number of words in the explanation and at least one target vector data, generate input text parameters;
[0021] By processing the input text parameters using a large language model, extended explanation content is obtained.
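The word-count step above can be illustrated with a minimal sketch. The function name `explanation_word_count`, the units (seconds and words per second), and rounding down are assumptions made for illustration; the disclosure only states that the word count is determined from the explanation duration and the preset explanation speed.

```python
import math


def explanation_word_count(duration_s: float, speed_words_per_s: float) -> int:
    """Return how many words fit into the explanation duration.

    Assumptions (not from the disclosure): duration is in seconds,
    speed is in words per second, and the result is rounded down so
    that playback does not run past the available time.
    """
    if duration_s < 0 or speed_words_per_s <= 0:
        raise ValueError("duration must be >= 0 and speed must be > 0")
    return math.floor(duration_s * speed_words_per_s)
```

For instance, a 30-second slot at an assumed speed of 4 words per second leaves room for 120 words.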
[0022] In one optional implementation, the extended explanation duration is calculated based on the estimated stay duration, including:
[0023] Calculate a first product of the estimated dwell time and the extended explanation proportion, where the extended explanation proportion is the percentage of the explanation content accounted for by the extended explanation content.
[0024] Calculate a second product of the first product and the congestion coefficient;
[0025] The second product will be used as the duration of the extended explanation.
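The two-product calculation above can be sketched as follows. The function and parameter names are hypothetical, and the units (seconds) and the value ranges of the proportion and the congestion coefficient are assumptions; this section does not specify how the congestion coefficient is obtained.

```python
def extended_explanation_duration(
    estimated_dwell_s: float,
    extension_ratio: float,
    congestion_coefficient: float,
) -> float:
    """Duration available for the extended explanation.

    first product  = estimated dwell time * extended explanation proportion
    second product = first product * congestion coefficient
    The second product is returned as the extended explanation duration.
    """
    first_product = estimated_dwell_s * extension_ratio
    second_product = first_product * congestion_coefficient
    return second_product
```

With an estimated dwell time of 120 seconds, an assumed proportion of 0.25, and an assumed congestion coefficient of 0.5, the extended explanation would get 15 seconds.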
[0026] In one optional implementation, the first location belongs to a preset location sequence, which includes multiple locations arranged in a preset order;
[0027] Obtain the estimated dwell time of the target user at the first location, including:
[0028] If the first location is the first location in the preset location sequence, or if the actual stay duration of the target user at the second location is greater than or equal to the average stay duration at the second location, the average stay duration at the first location shall be used as the estimated stay duration of the target user at the first location, wherein the second location is the location preceding the first location in the preset location sequence.
[0029] In one optional implementation, obtaining the estimated duration of the target user's stay at the first location further includes:
[0030] If the first location is not the first location in the preset location sequence, and the actual stay time of the target user at the second location is less than the average stay time at the second location, calculate the ratio of the actual stay time of the target user at the second location to the average stay time at the second location.
[0031] Based on the ratio and the average stay time at the first location, calculate the estimated stay time of the target user at the first location.
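The dwell-time estimate described in the two implementations above can be sketched in one function. The names are hypothetical. The disclosure says the estimate is calculated from the ratio and the average stay time at the first location without giving the formula here, so multiplying the two is an assumption.

```python
from typing import Optional


def estimate_dwell_time(
    avg_first_s: float,
    actual_second_s: Optional[float] = None,
    avg_second_s: Optional[float] = None,
) -> float:
    """Estimate the target user's dwell time at the first (upcoming) location.

    The "second location" is the one preceding the first location in the
    preset sequence. Pass no second-location data when the first location
    opens the sequence.
    """
    # Case 1: first location of the sequence, so there is no history to use.
    if actual_second_s is None or avg_second_s is None:
        return avg_first_s
    if avg_second_s <= 0:
        raise ValueError("average stay time at the second location must be > 0")
    # Case 2: the user stayed at least the average time at the previous stop.
    if actual_second_s >= avg_second_s:
        return avg_first_s
    # Case 3: the user is moving faster than average.
    ratio = actual_second_s / avg_second_s
    # Assumption: scale the average stay time at the first location by the ratio.
    return ratio * avg_first_s
```

Under this assumption, a visitor who spent 30 seconds at a stop where the average is 60 seconds would be expected to spend half the average time at the next stop.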
[0032] In a second aspect, embodiments of this disclosure provide an information processing apparatus, including:
[0033] The determination module is used to determine at least one candidate vector data in a preset vector database based on the personalized tags of the target user and the tags of the first location. The candidate vector data includes media asset information related to the first location and the time information of the media asset information.
[0034] The determination module is also used to determine at least one target vector data from at least one candidate vector data based on the time information of the media asset information in each candidate vector data;
[0035] The acquisition module is used to obtain the estimated dwell time of the target user at the first location;
[0036] The generation module is used to generate extended explanation content for the target user based on the target vector data, when the estimated stay time is greater than or equal to the preset stay time threshold.
[0037] In a third aspect, embodiments of this disclosure provide an electronic device, including a processor and a memory storing computer program instructions;
[0038] When the processor executes the computer program instructions, the information processing method described in any optional embodiment of the first aspect of this disclosure is implemented.
[0039] In a fourth aspect, embodiments of this disclosure provide a computer-readable storage medium storing computer program instructions, which, when executed by a processor, implement an information processing method as described in any optional embodiment of the first aspect of this disclosure.
[0040] In a fifth aspect, embodiments of this disclosure provide a computer program product in which instructions, when executed by a processor of an electronic device, cause the electronic device to perform an information processing method as described in any optional embodiment of the first aspect of this disclosure.
[0041] The information processing method, apparatus, device, computer storage medium, and computer program product of this disclosure can determine at least one candidate vector data in a preset vector database based on the personalized tags of a target user and the tags of a first location. The candidate vector data includes media asset information related to the first location and the time information of the media asset information. This allows for the filtering of relevant media information about attractions or exhibition booths based on the user's personalized tags, thereby facilitating the addition of explanations tailored to the user's personalized needs on top of the basic explanation content. Subsequently, at least one target vector data can be determined from the at least one candidate vector data based on the time information of the media asset information in each candidate vector data. This allows for the selection of timely media asset information that matches the user's personalized needs, thereby improving the timeliness of the extended explanation content. Next, the estimated dwell time of the target user at the first location can be obtained. If the estimated dwell time is greater than or equal to a preset dwell time threshold, extended explanation content corresponding to the target user is generated based on the target vector data. This allows for the generation of extended explanation content only when the estimated visit time is sufficient. In this way, it is possible to combine user behavior characteristics and generate extended explanation content intelligently and efficiently based on users' personalized needs and the timeliness of information, thereby enhancing the user's tour experience.
Attached Figure Description
[0042] To more clearly illustrate the technical solutions of the embodiments of this disclosure, the accompanying drawings used in the embodiments of this disclosure will be briefly introduced below. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0043] Figure 1 is a schematic flowchart of a retrieval-augmented generation (RAG) technique provided in an embodiment of this disclosure;
[0044] Figure 2 is a schematic diagram of a vector conversion process provided in another embodiment of this disclosure;
[0045] Figure 3 is a flowchart illustrating an information processing method provided in another embodiment of this disclosure;
[0046] Figure 4 is a schematic diagram of the structure of an information processing apparatus provided in another embodiment of the present disclosure;
[0047] Figure 5 is a schematic diagram of the structure of an information processing device provided in another embodiment of the present disclosure.
Detailed Implementation
[0048] The features and exemplary embodiments of various aspects of this disclosure will now be described in detail. To make the objectives, technical solutions, and advantages of this disclosure clearer, the following detailed description, in conjunction with the accompanying drawings and specific embodiments, will provide a further detailed description. It should be understood that the specific embodiments described herein are intended only to explain this disclosure and not to limit it. For those skilled in the art, this disclosure can be implemented without some of these specific details. The following description of the embodiments is merely to provide a better understanding of this disclosure by illustrating examples.
[0049] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes said element.
[0050] With the booming development of the tourism industry and the increasing cultural needs of people, various tourist attractions, museums, and exhibitions have become important places for people to relax, enjoy themselves, and broaden their horizons. In these attractions, museums, and exhibitions, the quality of the interpretive content, as a key element in conveying information and enhancing the user experience, is crucial to improving the overall visitor experience.
[0051] Currently, at tourist attractions, museums, and exhibitions, guided tours are typically delivered by playing fixed audio at fixed locations. While this method can partially replace human guides, the vividness and timeliness of the content cannot compare to that of a real guide. Consequently, users lack interest in the fixed audio content and sometimes leave before much of the tour is finished.
[0052] With the rise of Artificial Intelligence Generated Content (AIGC), Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) are increasingly being applied to guided tours and question-and-answer systems. An LLM is a deep learning model trained on large amounts of text data that can generate natural language text or understand its meaning. LLMs can handle various natural language tasks, such as text classification, question answering, and dialogue, and are an important pathway to artificial intelligence. Currently, LLMs employ a Transformer architecture and pre-training objectives (such as language modeling) similar to those of smaller models; they differ from smaller models in their larger model size, training data, and computational resources. RAG is one of the most popular cutting-edge technologies in the large-model field. Specifically, RAG works through three key parts: retrieval, utilization, and generation. For example, as shown in Figure 1, in the retrieval phase, the system retrieves relevant information from a collection of documents containing private data. In the utilization phase, the system uses the LLM to apply the retrieved information to populate text or answer questions. In the generation phase, the system uses the LLM to generate the final text content based on the retrieved knowledge.
[0053] Currently, in guided tours and Q&A sessions, AIGC's technical approach mainly includes two methods: fine-tuning of LLM and RAG.
[0054] The relevant technology involves a virtual human-guided tour system based on a combination of large-scale models and vertical domain models. Users obtain explanations of relevant cultural relics knowledge and related knowledge extensions through dialogue with the virtual human. However, due to the high complexity and time consumption of training large-scale models and vertical domain models, this solution cannot solve the problem of the timeliness of guided tour knowledge in scenic spots and exhibition booths, and the cost of model training is also very high.
[0055] Related technologies also involve a dynamic adaptive question-answering system based on hierarchical structure and retrieval augmentation. This system combines RAG technology with a fine-tuned LLM to achieve dynamic learning and adaptation, as well as cross-domain knowledge integration, thereby enabling document retrieval and answering within an organization. This approach merges retrieval results from multiple domain knowledge bases and synthesizes question-answer content, using historical question-answer records for model retraining. However, this approach still cannot solve the problem of low knowledge timeliness. Furthermore, the generated content is not synchronized with the user's viewing progress.
[0056] Related technologies also involve a natural language large-screen application interaction and automatic explanation method based on a large model, used for human-computer interaction with large display screens through natural language. This method employs an LLM-based retrieval approach, converting the natural language spoken by the large-screen presenter into corresponding text information, and performing inference on this text information to associate it with fixed command statements for operating the display screen. These fixed command statements are then associated with page tags in the content currently shown on the display screen, triggering mouse events that control the display screen, thus enabling natural language control over the backend application system of the display screen. This solution primarily implements the integrated application of retrieval-augmented generation (RAG) and LLM in the water conservancy field, focusing on LLM fine-tuning to optimize the accuracy of questions and answers related to water conservancy knowledge. While it enhances the query information, it still fails to improve the timeliness of the question-and-answer content and does not consider the individual characteristics of the questioner or user.
[0057] In summary, the LLM fine-tuning approach suffers from long knowledge update cycles and high training costs, and it cannot resolve the LLM hallucination problem in question answering. While combining RAG technology with an LLM to build a private knowledge base question-answering system can mitigate hallucination to some extent, the original RAG only addresses the accuracy of question answering and reduces model training time and costs; it still cannot solve the problems of knowledge base timeliness and user matching.
[0058] In order to enhance the user's browsing experience, the inventor, after in-depth thinking, ingeniously proposed an information processing method, device, equipment, computer storage medium, and computer program product.
[0059] The information processing method of this disclosure can be implemented through a dynamic tour guide system based on RAG. Specifically, to address the issues of low timeliness and low user matching in automated voice narration during tours, a vector database with knowledge freshness characteristics is constructed to expand the query for timely narration materials. These materials are then combined with the user's personalized characteristics and behavioral features as prompt parameters and sent to the LLM to dynamically generate narration content for attractions and exhibitions. This not only extends the functionality based on the RAG architecture, reducing the time and hardware costs of fine-tuning the LLM model, but also improves the freshness and richness of the automated narration content for scenic spots, museums, and exhibitions, better stimulating user interest and enhancing the user experience of automated narration.
[0060] For example, a dynamic knowledge base can be built based on the RAG architecture. Besides storing basic vector information, the dynamic knowledge base will also record the time information of knowledge generation. The knowledge acquisition module will periodically pull media asset information related to attractions or exhibition booths from mainstream media, vectorize it using a vectorization model (e.g., embedding model), and then store it in a vector database. For example, media asset information can include information of any format, such as web pages, documents, etc. As shown in Figure 2, the vectorization model can vectorize the media asset information and extend the vectorized data with a date field to record the generation time of the media asset information.
[0061] For example, after the dynamic knowledge base is built, the normal viewing time for each attraction or booth can be measured. When a user is about to visit an attraction or booth, the knowledge base can be queried for the top N pieces of knowledge related to that attraction or booth based on the user's personalized tags, such as occupation, interests, and place of residence. These N pieces of knowledge are then sorted by creation time from newest to oldest, and the first M pieces after sorting are used as the original material for the extended explanation. Here, N and M are both positive integers, and N is greater than M.
[0062] For example, after obtaining the original materials for the extended explanation, the user's viewing speed trend can be used to estimate the time the user will spend at the corresponding attraction or booth. The number of words in the explanation content can be calculated based on the obtained time spent. Combining the original materials for the extended explanation and the original materials for the basic explanation, a prompt parameter can be constructed to be sent to the LLM. The text content returned by the LLM can then be converted into speech and played.
[0063] In one embodiment, the dynamic tour guide system may include a knowledge management module, a content generation module, and a voice playback terminal.
[0064] The knowledge management module can periodically retrieve or manually input media asset information related to attractions or booths. After data cleaning and content auditing, the content is vectorized using a vector model, and the vectorized data is stored in a vector database.
[0065] The content generation module can record the baseline viewing time for each attraction or booth; calculate the word count of the explanation content based on the estimated ratio of the user's possible viewing time at a certain attraction or booth to the baseline viewing time; retrieve the top N relevant knowledge data from the vector database based on the user's personality and behavioral characteristics, reorder the top N knowledge data according to the generation time, and take the latest M data as the input parameter for the prompt corresponding to the extended explanation content; pass the prompt parameters of the fixed explanation content and the extended explanation content to the LLM to produce text content, and convert the text content into speech.
[0066] Voice playback terminals may include mobile loudspeakers, mobile phones, and augmented reality (AR) / virtual reality (VR) smart terminals, etc., for playing voice generated by the content generation module or driving virtual digital humans to give virtual explanations.
[0067] Compared with the related technical solutions, the RAG-based dynamic tour guide method and system described above avoid extensive LLM fine-tuning, resulting in lower time and hardware costs, and effectively avoid the LLM hallucination problem. Furthermore, by retrieving knowledge base vectors based on user personality and behavioral features, adding date and time fields to the original vector database, and reordering the knowledge by the time parameter after the initial vector retrieval, the tour guide content can closely match the user and remain timely. This can stimulate user interest and enhance the viewing experience.
[0068] The information processing method provided by the present disclosure will be described below with reference to the accompanying drawings and specific embodiments and application scenarios. The information processing method provided by the present disclosure can be executed by an information processing device or a portion of the information processing device used to execute the information processing method. This disclosure uses the execution of the information processing method by an information processing device as an example to describe the information processing method provided by the present disclosure in detail.
[0069] The information processing method provided in the embodiments of this disclosure will be described in detail below with reference to Figure 3.
[0070] Figure 3 shows a schematic flowchart of an information processing method provided in an embodiment of this disclosure. As shown in the figure, the information processing method may specifically include the following steps S110 to S140.
[0071] S110, based on the target user's personalized tags and the first location's tags, determine at least one candidate vector data in a preset vector database. The candidate vector data includes media asset information related to the first location and the time information of the media asset information.
[0072] In step S110, the target user can be any user who needs to view the content. Personalized tags can include the user's occupation, place of origin, interests, etc. For example, personalized tags can be generated based on information entered during user registration, or obtained from the operator's system based on the user's mobile phone number. The first location can be a location, such as an attraction or exhibition booth, that the user is about to visit, determined based on the user's tour route. The vector database can be any database capable of storing vector data, such as Chroma, Elasticsearch, or LanceDB. Media asset information can include any media content related to the first location. In step S110, the personalized tags and the fixed tags of the first location can be combined, and the top N candidate vector data with the highest relevance can be retrieved from the vector database. For example, N can be preset as a fixed value, and the N vector data with the highest relevance can be selected as candidate vector data. Alternatively, a relevance threshold can be set, and the N vector data whose relevance exceeds the threshold can be selected as candidate vector data.
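The candidate retrieval in step S110 can be illustrated with a small in-memory sketch. It stands in for a real vector database query and is not the API of Chroma, Elasticsearch, or LanceDB. The function names are hypothetical, the query vector is assumed to be an embedding of the combined personalized tags and location tags, and using cosine similarity as the relevance measure is an assumption.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_candidates(
    query_vec: Sequence[float],
    records: List[Tuple[str, Sequence[float]]],
    top_n: int = 5,
    min_relevance: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Return up to top_n (record_id, relevance) pairs, most relevant first.

    If min_relevance is given, records at or below it are dropped first,
    which corresponds to the threshold variant described in the text.
    """
    scored = [(rid, cosine_similarity(query_vec, vec)) for rid, vec in records]
    if min_relevance is not None:
        scored = [item for item in scored if item[1] > min_relevance]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]
```

In a deployment, the same selection would normally be delegated to the vector database's own nearest-neighbor search rather than computed in application code.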
[0073] S120, Based on the time information of the media asset information in each candidate vector data, determine at least one target vector data from at least one candidate vector data.
[0074] For example, in step S120, the top M most up-to-date vector data can be selected as the target vector data based on the time information of the media asset information in each candidate vector data. Alternatively, the relevance and time information of the candidate vector data with the personalized tags can be combined to comprehensively calculate the score of each candidate vector data, and the target vector data can be determined based on the score.
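Both selection strategies in step S120 can be sketched as follows. `Candidate`, `select_latest`, and `select_by_score` are hypothetical names. The disclosure does not give a scoring formula for the combined variant, so the weighted sum and the exponential freshness decay with a 30-day half-life are assumptions chosen only to make the idea concrete.

```python
from datetime import date
from typing import List, NamedTuple


class Candidate(NamedTuple):
    record_id: str
    relevance: float  # relevance to the tags, from the retrieval step
    published: date   # time information of the media asset information


def select_latest(candidates: List[Candidate], top_m: int) -> List[Candidate]:
    """Keep the M most recently published candidates, newest first."""
    return sorted(candidates, key=lambda c: c.published, reverse=True)[:top_m]


def select_by_score(
    candidates: List[Candidate],
    top_m: int,
    today: date,
    weight: float = 0.5,
    half_life_days: float = 30.0,
) -> List[Candidate]:
    """Alternative: rank by a blend of relevance and freshness.

    Assumption: freshness halves every half_life_days, and the final
    score is weight * relevance + (1 - weight) * freshness.
    """
    def score(c: Candidate) -> float:
        age_days = max((today - c.published).days, 0)
        freshness = 0.5 ** (age_days / half_life_days)
        return weight * c.relevance + (1 - weight) * freshness

    return sorted(candidates, key=score, reverse=True)[:top_m]
```

Setting `weight` to 1.0 reduces the second function to a pure relevance ranking, and setting it to 0.0 reduces it to a pure recency ranking, which makes the trade-off easy to tune.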
[0075] S130, obtain the estimated dwell time of the target user at the first location.
[0076] S140: If the estimated dwell time is greater than or equal to the preset dwell time threshold, generate extended explanation content corresponding to the target user based on the target vector data.
[0077] Step S140 above can be executed via an LLM. Specifically, the target vector data can be sent to the LLM as a parameter to generate extended explanation content. For example, the prompt parameters for the extended explanation may include the target vector data, questions, and answer requirements.
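The threshold check in step S140 can be sketched as a simple gate in front of the model call. The function name, the prompt layout, and the `llm` parameter (any callable that maps a prompt string to generated text) are hypothetical; no particular LLM client library is implied.

```python
from typing import Callable, List, Optional


def maybe_generate_extension(
    estimated_dwell_s: float,
    dwell_threshold_s: float,
    target_texts: List[str],
    llm: Callable[[str], str],
) -> Optional[str]:
    """Generate extended content only if the estimated dwell time suffices.

    Returns None when the estimate is below the threshold, in which case
    no extended explanation is generated.
    """
    if estimated_dwell_s < dwell_threshold_s:
        return None
    # Assumption: the media asset text of each target vector data is
    # passed to the model as bulleted reference material.
    prompt = "Reference material:\n" + "\n".join(f"- {t}" for t in target_texts)
    return llm(prompt)
```

Passing the model as a callable keeps the gate testable with a stub and leaves the choice of LLM service open.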
[0078] The information processing method of this disclosure can determine at least one candidate vector data in a preset vector database based on the personalized tags of the target user and the tags of the first location. The candidate vector data includes media asset information related to the first location and the time information of the media asset information. This allows for the filtering of relevant media information about attractions or booths based on the user's personalized tags, thereby facilitating the addition of explanations tailored to the user's personalized needs on top of the basic explanation content. Subsequently, at least one target vector data can be determined from the at least one candidate vector data based on the time information of the media asset information in each candidate vector data. This allows for the selection of timely media asset information that matches the user's personalized needs, thereby improving the timeliness of the extended explanation content. Next, the estimated dwell time of the target user at the first location can be obtained. If the estimated dwell time is greater than or equal to a preset dwell time threshold, extended explanation content corresponding to the target user is generated based on the target vector data. This allows for the generation of extended explanation content only when the estimated tour time is sufficient. Thus, by combining user behavioral characteristics and considering the user's personalized needs and the timeliness of information, extended explanation content can be intelligently generated, thereby improving the user's tour experience.
[0079] In one embodiment, before determining at least one candidate vector data in a pre-defined vector database based on the target user's personalized tags and the tags of a first location, the method may further include:
[0080] Obtain multiple media asset information related to the first location, as well as the time information of each media asset.
[0081] Each media asset information is vectorized to obtain the initial vector data corresponding to each media asset information.
[0082] For each piece of media asset information, the time information of the media asset information is added to the extended field of the initial vector data to obtain the vector data corresponding to each piece of media asset information.
[0083] The vector data corresponding to each media asset information is stored in the vector database.
[0084] In the above implementation, the vector conversion process can be implemented as shown in Figure 2. The vectorization model can include an embedding model, such as m3e-large, BGE-large, or text2vec. The vectorization model can vectorize media asset information, and further extend the vectorized data with a date field to record the time information of the media asset information.
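The construction of vector data with a time extension field can be sketched as follows. `VectorRecord`, `toy_embed`, and `ingest` are hypothetical names, the list-based store stands in for a vector database, and `toy_embed` is a deterministic placeholder that does not capture meaning; a real system would call an embedding model such as those named above.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class VectorRecord:
    embedding: List[float]  # initial vector data
    text: str               # media asset information
    published: date         # extended field holding the time information


def toy_embed(text: str, dim: int = 8) -> List[float]:
    """Placeholder for an embedding model: deterministic, not semantic."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


def ingest(assets: List[Dict], store: List[VectorRecord]) -> None:
    """Vectorize each asset, attach its date, and append it to the store.

    Each asset is assumed to be a dict with "text" and "published" keys.
    """
    for asset in assets:
        store.append(
            VectorRecord(
                embedding=toy_embed(asset["text"]),
                text=asset["text"],
                published=asset["published"],
            )
        )
```

Keeping the date next to the embedding in the same record is what later allows retrieved candidates to be filtered or reordered by time without a second lookup.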
[0085] According to the above implementation method, media asset information related to scenic spots or exhibition booths can be acquired periodically, thereby continuously updating the vector data in the vector database and ensuring high timeliness of the vector data. According to the above implementation method, extended fields, such as date and time fields, are added to the initial vector data. This allows for filtering of vector data based on its time information, building upon the initial vector retrieval. This improves the timeliness of the extended explanation content, thereby enhancing the user experience.
[0086] In one embodiment, the content to be explained may include basic explanations and extended explanations.
[0087] After generating extended explanation content for the target user based on the target vector data, the method may also include:
[0088] In response to target prompt information, play the basic explanation content and the extended explanation content via voice, wherein the target prompt information indicates that the target user has arrived at the first location.
[0089] In the above implementation, the target prompt information can be generated based on the location results of the target user. Specifically, the target user can be located using a fusion positioning system in the scenic area, exhibition, or museum. The positioning methods of the fusion positioning system may include, but are not limited to, Global Positioning System (GPS), base station information, wireless communication technology (WiFi) information, Bluetooth beacon, accelerometer sensor information, etc. In this way, the user's location and viewing progress can be monitored, thereby providing better guided tour services to users.
[0090] In one embodiment, based on target vector data, extended explanation content corresponding to the target user is generated, which may specifically include:
[0091] Calculate the first explanation duration of the extended explanation based on the estimated stay duration.
[0092] Determine the first explanation word count of the extended explanation based on the first explanation duration and the preset explanation speed.
[0093] Generate the first input text parameters based on the first explanation word count and at least one target vector data.
[0094] Process the first input text parameters with a large language model to obtain the extended explanation content.
[0095] In the above implementation, the first input text parameters can be the prompt parameters for the extended explanation. For example, these prompt parameters may include the target vector data, a question, and answer requirements. The question may be, for example, "Please summarize the relevant information for the first location in a number of words equal to the first explanation word count." The answer requirements may include, but are not limited to, "If you are unsure of the answer, say so," "Avoid mentioning that you obtained the knowledge from the target vector data," "Keep the answer consistent with the description in the target vector data," "Optimize the answer format using the lightweight markup language Markdown," and "Answer in the same language as the question."
[0096] According to the above implementation method, the first explanation duration of the extended explanation can be calculated based on the estimated dwell time. Based on the first explanation duration and a preset explanation speed, the number of words in the extended explanation is determined, and combined with the target vector, the extended explanation content is generated through a large language model. This ensures that the number of words in the extended explanation content is adapted to the user's dwell time, thereby reducing the probability of the user leaving the attraction or booth before the explanation is completed. This helps to better convey complete explanation content to the user, thus improving the user's visitor experience.
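The path from explanation duration to word count to prompt can be sketched as below. This is an illustrative sketch under stated assumptions: the function names, the prompt layout, and the sample values (1.5 minutes, 200 words per minute) are hypothetical, and the call to a large language model is deliberately left out.

```python
def word_count(duration_min: float, speed_wpm: float) -> int:
    """Word count = explanation duration x preset explanation speed."""
    return round(duration_min * speed_wpm)


def build_extended_prompt(location: str, n_words: int, snippets: list[str]) -> str:
    """Assemble context, question, and answer requirements into one prompt string."""
    context = "\n".join(f"- {s}" for s in snippets)
    requirements = [
        "If you are unsure of the answer, say so.",
        "Do not mention that the knowledge came from the provided context.",
        "Keep the answer consistent with the provided context.",
        "Answer in the same language as the question.",
    ]
    return (
        f"Context:\n{context}\n\n"
        f"Question: Please summarize the relevant information for {location} "
        f"in about {n_words} words.\n\n"
        "Requirements:\n" + "\n".join(f"- {r}" for r in requirements)
    )


# Hypothetical values: 1.5 minutes of extended explanation at 200 words per minute.
n = word_count(duration_min=1.5, speed_wpm=200)
prompt = build_extended_prompt("the east hall", n, ["New exhibit opened in November 2024"])
```

The resulting string would then be passed to whichever large language model the deployment uses.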
[0097] In one embodiment, after obtaining the estimated duration of the target user's stay at the first location, the method may further include:
[0098] Calculate the second explanation duration of the basic explanation based on the estimated stay duration.
[0099] Determine the second explanation word count of the basic explanation based on the second explanation duration and the preset explanation speed.
[0100] Generate the second input text parameters for the basic explanation based on the second explanation word count and the preset basic explanation materials.
[0101] Process the second input text parameters with a large language model to obtain the basic explanation content.
[0102] According to the above implementation method, the second explanation duration of the basic explanation can be calculated based on the estimated dwell time. Based on the second explanation duration and the preset explanation speed, the word count of the basic explanation is determined, and combined with preset basic explanation materials, the basic explanation content is generated through a large language model. This allows for dynamic adjustment of the word count of the basic explanation content, ensuring it adapts to the user's dwell time, thereby reducing the probability of users leaving the attraction or booth before the explanation is completed. This helps to convey complete explanation content to users, thus enhancing their visitor experience.
[0103] In one embodiment, calculating the first explanation duration of the extended explanation based on the estimated stay duration may specifically include:
[0104] Calculate a first product of the estimated stay duration and the extended explanation proportion, where the extended explanation proportion is the share of the extended explanation content in the total explanation content.
[0105] Calculate a second product of the first product and the congestion coefficient.
[0106] Use the second product as the first explanation duration of the extended explanation.
[0107] In the above implementation, the congestion coefficient can be obtained through a heat map system for scenic spots, museums, or exhibitions. The heat map system utilizes big data technology to monitor the number and distribution of visitors in real time, obtaining the congestion coefficient, which reflects the population density at various attractions or booths during the same time period.
[0108] According to the above implementation method, the length of the extended explanation can be dynamically adjusted based on the real-time congestion level of the attraction or booth. This allows for more accurate estimation of the extended explanation's duration, ensuring the length of the explanation matches the user's dwell time, thus reducing the probability of users leaving the attraction or booth before the explanation is complete. This improves the delivery of complete explanations to users, thereby enhancing their visitor experience.
[0109] In one embodiment, calculating the second explanation duration of the basic explanation based on the estimated stay duration may specifically include:
[0110] Calculate a third product of the estimated stay duration and the basic explanation proportion, where the basic explanation proportion is the share of the basic explanation content in the total explanation content.
[0111] Calculate a fourth product of the third product and the congestion coefficient.
[0112] Use the fourth product as the second explanation duration of the basic explanation.
[0113] According to the above implementation method, the length of the basic explanation can be dynamically adjusted based on the real-time congestion level of the attraction or booth. This allows for more accurate estimation of the basic explanation time, ensuring that the number of words in the explanation matches the user's dwell time, thereby reducing the probability of users leaving the attraction or booth before the explanation is completed. This helps to convey complete explanation content to users, thus enhancing their visitor experience.
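The two duration calculations (second product for the extended explanation, fourth product for the basic explanation) can be sketched together. The sketch assumes that the basic and extended proportions sum to 1 and that the congestion coefficient is a plain multiplier; the text fixes neither point, and the function name and sample values are hypothetical.

```python
def explanation_durations(estimated_stay: float, extended_share: float,
                          congestion: float) -> tuple[float, float]:
    """Return (extended_duration, basic_duration) in the unit of estimated_stay."""
    if not 0.0 <= extended_share <= 1.0:
        raise ValueError("extended_share must be between 0 and 1")
    basic_share = 1.0 - extended_share  # assumption: the two shares sum to 1
    extended = estimated_stay * extended_share * congestion  # second product
    basic = estimated_stay * basic_share * congestion        # fourth product
    return extended, basic


# Hypothetical values: 5-minute estimated stay, 40% extended share, coefficient 0.8.
extended, basic = explanation_durations(estimated_stay=5.0, extended_share=0.4,
                                        congestion=0.8)
```

Under these assumptions the two durations always add up to the estimated stay duration scaled by the congestion coefficient.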
[0114] In one embodiment, after obtaining the estimated duration of the target user's stay at the first location, the method may further include:
[0115] If the estimated stay duration is less than the preset stay duration threshold, the estimated stay duration will be used as the second explanation duration.
[0116] Determine the second explanation word count of the basic explanation based on the second explanation duration and the preset explanation speed.
[0117] Generate the second input text parameters for the basic explanation based on the second explanation word count and the preset basic explanation materials.
[0118] Process the second input text parameters with a large language model to obtain the basic explanation content.
[0119] In response to the target prompt, the system plays a basic explanation via voice, with the target prompt indicating to the user that they have arrived at the first location.
[0120] According to the above implementation, when the estimated stay duration is short, only the basic explanation content is generated and played. This shortens the explanation and prioritizes delivering the basic explanation content in full.
[0121] In one embodiment, the method may further include:
[0122] If the estimated stay duration is less than the preset stay duration threshold, start timing the user's stay duration in response to the target prompt information.
[0123] If the user's stay duration reaches a preset threshold, calculate the third explanation word count of the extended explanation according to a preset third explanation duration and the preset explanation speed.
[0124] Generate third input text parameters based on the third explanation word count and at least one target vector data.
[0125] Process the third input text parameters with a large language model to obtain additional extended explanation content.
[0126] After the basic explanation content has finished playing, and while the user is still at the first location, play the additional extended explanation content via voice.
[0127] In the above implementation, the preset threshold is less than the stay duration threshold. The value of the preset threshold can be selected according to actual needs and is not limited here. As an example, the preset threshold can be half of the stay duration threshold. The third explanation duration can be selected according to actual needs and is not limited here. As an example, the third explanation duration can be equal to the stay duration threshold.
[0128] According to the above implementation, the user's actual stay duration can be measured. If it reaches the preset threshold, the user is likely to spend a relatively long time at the current attraction or booth. In this case, generating additional extended explanation content according to the preset third explanation duration reduces the probability that the explanation content runs out while the user is still at the current attraction or booth. In this way, richer explanation content can be provided to users, thereby improving their tour experience.
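The deferred decision described in this embodiment can be sketched as a small predicate. It is a sketch only: the function and parameter names are hypothetical, the default ratio of one half and the choice of a third explanation duration equal to the stay duration threshold are the examples given in [0127], and timing and playback are assumed to be handled elsewhere.

```python
def should_add_extended(estimated_stay: float, stay_threshold: float,
                        elapsed: float, still_present: bool,
                        preset_ratio: float = 0.5) -> bool:
    """Decide whether to generate additional extended content for a short estimate."""
    if estimated_stay >= stay_threshold:
        return False  # the regular extended explanation path applies instead
    preset_threshold = stay_threshold * preset_ratio  # e.g. half the stay threshold
    return still_present and elapsed >= preset_threshold


def third_word_count(stay_threshold: float, speed_wpm: float) -> int:
    """Third explanation duration taken equal to the stay threshold (example value)."""
    return round(stay_threshold * speed_wpm)
```

For instance, with a 3-minute threshold, a user estimated at 2 minutes who is still present after 1.6 minutes would trigger additional content sized for 3 minutes of speech.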
[0129] In one embodiment, the average viewing time and the minimum normal viewing time for each attraction or booth can be measured in advance through on-site trials by multiple people. The average viewing time for each attraction or booth is used as the average stay duration for that attraction or booth, and the minimum normal viewing time is used as its stay duration threshold.
[0130] In one embodiment, the first location belongs to a preset location sequence, which includes multiple locations arranged in a preset order.
[0131] Obtain the estimated duration of the target user's stay at the first location, which may specifically include:
[0132] If the first location is the first location in the preset location sequence, or if the actual stay duration of the target user at the second location is greater than or equal to the average stay duration at the second location, the average stay duration at the first location shall be used as the estimated stay duration of the target user at the first location, wherein the second location is the location preceding the first location in the preset location sequence.
[0133] In the above embodiments, the preset location sequence may include multiple attractions or booths arranged according to the user's viewing order.
[0134] According to the above implementation, when the first location is first in the preset location sequence, or when the user's stay duration at the preceding location is relatively long, the average stay duration at the first location is used as the estimated stay duration of the target user at the first location. In this way, the user's stay duration at the first location is estimated from the average behavior of most users, which helps generate explanation content of appropriate length for that user.
[0135] In one embodiment, obtaining the estimated duration of the target user's stay at the first location may further include:
[0136] If the first location is not the first location in the preset location sequence, and the actual stay time of the target user at the second location is less than the average stay time at the second location, calculate the ratio of the actual stay time of the target user at the second location to the average stay time at the second location.
[0137] Based on the ratio and the average stay time at the first location, calculate the estimated stay time of the target user at the first location.
[0138] According to the above implementation, when the first location is not first in the preset location sequence and the user's stay duration at the previous location is less than the average stay duration there, the estimated stay duration is derived from the user's actual stay duration at the previous location. Estimating the stay duration at the attraction or booth from the user's actual behavior in this way improves the accuracy of the estimate, thereby facilitating the generation of explanation content of appropriate length.
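Both estimation branches can be sketched in one function. The text says only that the estimate is calculated "based on the ratio and the average stay time"; multiplying the two is an assumption made here, and the function and parameter names are hypothetical.

```python
from typing import Optional


def estimate_stay(avg_current: float,
                  actual_prev: Optional[float] = None,
                  avg_prev: Optional[float] = None) -> float:
    """Estimate the stay duration at the current location.

    actual_prev and avg_prev are None when the current location is the
    first one in the preset location sequence.
    """
    if actual_prev is None or avg_prev is None:
        return avg_current              # first location in the sequence
    if actual_prev >= avg_prev:
        return avg_current              # user stayed at least the average last time
    ratio = actual_prev / avg_prev      # below 1: user moved faster than average
    return ratio * avg_current          # assumption: scale by simple multiplication
```

A visitor who spent 2 minutes where the average is 4 would, under this assumption, be expected to spend half the average time at the next location.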
[0139] For ease of explanation, a specific example is provided based on the above embodiments. It should be noted that the following example is only for explaining the embodiments of this disclosure and does not constitute a limitation thereof.
[0140] In one example, the information processing method may include the following steps S210 to S240.
[0141] S210: Build a private knowledge base, periodically retrieve or input media asset information related to various attractions and booths, and vectorize the media asset information as shown in Figure 2. Further extend each vectorized data with a "date" field to record the generation time of the media asset information, resulting in vector data. Store the basic data and extended information in the vector database.
[0142] S220: Through on-site measurements and calculations by multiple people, obtain the average viewing time and the minimum normal viewing time for each attraction or booth.
[0143] S230: The content generation module combines the user's personalized tags with the tags of the scenic spot or booth and queries the vector database to retrieve the top N vector data whose relevance is greater than the minimum threshold k. These vector data are sorted in descending order by the date field, and the top 3 are used as target vector data in the prompt parameters. If fewer than 3 vector data have a relevance greater than k, all of them are used as target vector data in the prompt parameters. If no vector data meets the condition, no extended explanation is provided.
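Step S230 can be sketched as a threshold filter followed by a sort on the date field. The sketch makes several assumptions not stated in the text: relevance is measured by cosine similarity, the query vector has already been built from the combined tags, and the record layout, default values, and sample data are hypothetical.

```python
import math
from datetime import date


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, used here as an assumed relevance measure."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_targets(query, records, k=0.5, top_n=10, keep=3):
    """records: list of (embedding, date, text). Returns up to `keep` newest relevant items."""
    scored = [(cosine(query, emb), d, text) for emb, d, text in records]
    relevant = [r for r in scored if r[0] > k]           # relevance above threshold k
    relevant.sort(key=lambda r: r[0], reverse=True)
    candidates = relevant[:top_n]                        # top N by relevance
    candidates.sort(key=lambda r: r[1], reverse=True)    # newest first by date field
    return candidates[:keep]                             # empty: no extended explanation


# Sample data is invented for illustration.
records = [
    ([1.0, 0.0], date(2023, 1, 1), "old but relevant"),
    ([0.9, 0.1], date(2024, 6, 1), "recent and relevant"),
    ([0.0, 1.0], date(2024, 9, 1), "recent but irrelevant"),
]
targets = select_targets([1.0, 0.0], records)
```

Filtering on relevance before sorting by date keeps a very recent but unrelated item from displacing an older, relevant one.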
[0144] S240: Confirm the user's location through the fusion positioning system of the scenic spot, exhibition, or museum. If the user is at the first attraction or booth, or if the user's viewing time at the previous attraction or booth has exceeded the average viewing time there, the average viewing time $T_n$ of the current attraction or booth is used directly as the time limit for generating the explanation content. If the user is not at the first attraction or booth, the ratio of the user's actual viewing time $t_{n-1}$ at the previous attraction or booth to the average viewing time $T_{n-1}$ there is used to obtain the estimated stay duration $A_n$ at the current attraction or booth. Based on the estimated stay duration $A_n$, the extended explanation duration $B_n$ and the basic explanation duration $C_n$ are calculated, and then the extended explanation word count $P_n$ and the basic explanation word count $Q_n$ are calculated. See Equations 1 to 5 for details; Equations 4 and 5 read:
$P_n = B_n \times S$ (Equation 4)
$Q_n = C_n \times S$ (Equation 5)
[0145] In Equations 1 to 5, $R$ represents the preset proportion of the extended explanation in the entire explanation content; $D_n$ represents the congestion coefficient of the current attraction or booth; $S$ represents the audio playback speed, in words per minute; $E$ represents the number of target vector data found in step S230; and $M_n$ represents the minimum stay duration at the current attraction or booth.
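As a worked illustration of Equations 4 and 5 only, take hypothetical values of $B_n = 1.6$ minutes, $C_n = 2.4$ minutes, and $S = 200$ words per minute. None of these numbers comes from the disclosure.

```latex
\begin{aligned}
P_n &= B_n \times S = 1.6 \times 200 = 320 \text{ words} \\
Q_n &= C_n \times S = 2.4 \times 200 = 480 \text{ words}
\end{aligned}
```

With these values, the extended explanation would be requested at about 320 words and the basic explanation at about 480 words.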
[0146] S250: Based on the extended explanation word count $P_n$ and the basic explanation word count $Q_n$ obtained in the previous step, the content generation module constructs prompt parameters for the extended explanation and for the basic explanation respectively, and sends them to the LLM to generate the basic explanation content and the extended explanation content.
[0147] S260: The content generation module merges the basic explanation text and the extended explanation text generated by the LLM and converts them into speech. The speech and text content are then returned to the speech playback terminal, which either plays the generated explanation speech through a loudspeaker or drives a digital human to deliver the explanation based on the generated text content.
[0148] If, according to the prediction result of step S240, the estimated stay duration at the current attraction or booth is less than the minimum stay duration $M_n$, no extended explanation speech is played initially, and the user's actual stay duration needs to be determined. If the user's stay duration at the current attraction or booth exceeds $M_n / 2$ and the user has not yet left, extended explanation text content is generated according to the minimum stay duration $M_n$, converted into speech, and then played on the speech playback terminal.
[0149] Based on the same inventive concept, this disclosure also provides an information processing apparatus.
[0150] As shown in Figure 4, the information processing apparatus 200 may include a determining module 201, a first acquisition module 202, and a generation module 203.
[0151] The determining module 201 is used to determine at least one candidate vector data in a preset vector database based on the personalized tags of the target user and the tags of the first location. The candidate vector data includes media asset information related to the first location and time information of the media asset information.
[0152] The determining module 201 is further configured to determine at least one target vector data from at least one candidate vector data based on the time information of the media asset information in each candidate vector data.
[0153] The first acquisition module 202 is used to acquire the estimated duration of the target user's stay at the first location.
[0154] The generation module 203 is used to generate extended explanation content corresponding to the target user based on the target vector data when the estimated stay time is greater than or equal to the preset stay time threshold.
[0155] The information processing apparatus of this embodiment can determine at least one candidate vector data in a preset vector database based on the personalized tags of the target user and the tags of the first location. The candidate vector data includes media asset information related to the first location and the time information of the media asset information. This allows for the filtering of relevant media information about attractions or exhibition booths based on the user's personalized tags, thereby facilitating the addition of explanations tailored to the user's personalized needs on top of the basic explanation content. Subsequently, at least one target vector data can be determined from the at least one candidate vector data based on the time information of the media asset information in each candidate vector data. This allows for the selection of timely media asset information that matches the user's personalized needs, thereby improving the timeliness of the extended explanation content. Next, the estimated dwell time of the target user at the first location can be obtained. If the estimated dwell time is greater than or equal to a preset dwell time threshold, extended explanation content corresponding to the target user is generated based on the target vector data. This allows for the generation of extended explanation content only when the estimated tour time is sufficient. Thus, by combining user behavioral characteristics and considering the user's personalized needs and the timeliness of information, extended explanation content can be intelligently generated, thereby improving the user's tour experience.
[0156] In one embodiment, the apparatus may further include:
[0157] The second acquisition module is used to acquire multiple media asset information related to the first location and the time information of each media asset information before determining at least one candidate vector data in a preset vector database based on the personalized tags of the target user and the tags of the first location.
[0158] The conversion module is used to perform vector conversion on each piece of media asset information to obtain the initial vector data corresponding to each piece of media asset information.
[0159] The adding module is used to add, for each piece of media asset information, the time information of the media asset information to the extended field of the corresponding initial vector data, thereby obtaining the vector data corresponding to each piece of media asset information.
[0160] The storage module is used to store the vector data corresponding to each media asset information into the vector database.
[0161] In one embodiment, the generation module is used to generate extended explanatory content corresponding to the target user based on the target vector data, which may specifically include:
[0162] The calculation submodule is used to calculate the duration of the extended explanation based on the estimated stay time.
[0163] The determining submodule is used to determine the number of words in the extended explanation based on the explanation duration and the preset explanation speed.
[0164] The generation submodule is used to generate input text parameters based on the number of words in the explanation and at least one target vector data.
[0165] The processing submodule is used to process the input text parameters through a large language model to obtain extended explanation content.
[0166] In one embodiment, the calculation submodule is used to calculate the duration of the extended explanation based on the estimated stay duration, which may specifically include:
[0167] The calculation unit is used to calculate the first product of the estimated stay duration and the extended explanation proportion, where the extended explanation proportion is the share of the extended explanation content in the total explanation content.
[0168] The calculation unit is also used to calculate the second product of the first product and the congestion coefficient.
[0169] The determining unit is used to take the second product as the explanation duration of the extended explanation.
[0170] In one embodiment, the first location belongs to a preset location sequence, which includes multiple locations arranged in a preset order.
[0171] The first acquisition module can be specifically used for:
[0172] If the first location is the first location in the preset location sequence, or if the actual stay duration of the target user at the second location is greater than or equal to the average stay duration at the second location, the average stay duration at the first location shall be used as the estimated stay duration of the target user at the first location, wherein the second location is the location preceding the first location in the preset location sequence.
[0173] In one embodiment, the first acquisition module can be specifically used for:
[0174] If the first location is not the first location in the preset location sequence, and the actual stay time of the target user at the second location is less than the average stay time at the second location, calculate the ratio of the actual stay time of the target user at the second location to the average stay time at the second location.
[0175] Based on the ratio and the average stay time at the first location, calculate the estimated stay time of the target user at the first location.
[0176] The information processing apparatus provided in this disclosure can implement the various processes implemented in the method embodiment of Figure 3. To avoid repetition, they are not described again here.
[0177] Figure 5 shows a schematic diagram of the hardware structure of the information processing device provided in an embodiment of this disclosure.
[0178] The information processing device may include a processor 301 and a memory 302 storing computer program instructions.
[0179] Specifically, the processor 301 may include a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits that can be configured to implement the embodiments of this disclosure.
[0180] Memory 302 may include mass storage for data or instructions. By way of example and not limitation, memory 302 may include a hard disk drive (HDD), floppy disk drive, flash memory, optical disk, magneto-optical disk, magnetic tape, or Universal Serial Bus (USB) drive, or a combination of two or more of these. Where appropriate, memory 302 may include removable or non-removable (or fixed) media. Where appropriate, memory 302 may be internal or external to the information processing device. In a particular embodiment, memory 302 is non-volatile solid-state memory.
[0181] Memory may include read-only memory (ROM), random access memory (RAM), disk storage media devices, optical storage media devices, flash memory devices, and electrical, optical, or other physical / tangible memory storage devices. Therefore, typically, memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software including computer-executable instructions, and when the software is executed (e.g., by one or more processors), it is operable to perform the operations described with reference to the method according to one aspect of this disclosure.
[0182] The processor 301 implements any of the information processing methods described in the above embodiments by reading and executing computer program instructions stored in the memory 302.
[0183] As an example, the information processing device may also include a communication interface 303 and a bus 310. As shown in Figure 5, the processor 301, memory 302, and communication interface 303 are connected via the bus 310 and communicate with each other.
[0184] The communication interface 303 is mainly used to realize communication between various modules, devices, units and / or equipment in the embodiments of this disclosure.
[0185] Bus 310 includes hardware, software, or both, that couples the components of the information processing device together. By way of example and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Extended Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial ATA (SATA) bus, a Video Electronics Standards Association local (VLB) bus, or other suitable buses, or a combination of two or more of these. Where appropriate, bus 310 may include one or more buses. Although specific buses are described and illustrated in the embodiments of this disclosure, this disclosure contemplates any suitable bus or interconnect.
[0186] The information processing device can execute the information processing method in the embodiments of this disclosure, thereby realizing the information processing method and apparatus described in conjunction with Figure 3 and Figure 4.
[0187] Furthermore, in conjunction with the information processing methods in the above embodiments, this disclosure can provide a computer storage medium for implementation. The computer storage medium stores computer program instructions; when these computer program instructions are executed by a processor, they implement any of the information processing methods in the above embodiments.
[0188] In addition, in conjunction with the information processing methods in the above embodiments, this disclosure provides a computer program product. When the instructions in the computer program product are executed by the processor of an electronic device, the electronic device performs any of the information processing methods in the above embodiments.
[0189] It should be clarified that this disclosure is not limited to the specific configurations and processes described above and shown in the figures. For the sake of brevity, detailed descriptions of known methods are omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method process of this disclosure is not limited to the specific steps described and shown, and those skilled in the art can make various changes, modifications, and additions, or change the order of steps, after understanding the spirit of this disclosure.
[0190] The functional blocks shown in the above-described block diagram can be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, they can be, for example, electronic circuits, application-specific integrated circuits (ASICs), appropriate firmware, plug-ins, function cards, etc. When implemented in software, the elements of this disclosure are programs or code segments used to perform desired tasks. Programs or code segments can be stored on a machine-readable medium or transmitted over a transmission medium or communication link via data signals carried on a carrier wave. "Machine-readable medium" can include any medium capable of storing or transmitting information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable read-only memory (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, etc. Code segments can be downloaded via computer networks such as the Internet, intranets, etc.
[0191] It should also be noted that the exemplary embodiments mentioned in this disclosure describe methods or systems based on a series of steps or apparatus. However, this disclosure is not limited to the order of the above steps; that is, the steps can be performed in the order mentioned in the embodiments, or in a different order, or several steps can be performed simultaneously.
[0192] The aspects of this disclosure have been described above with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this disclosure. It should be understood that each block in the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that these instructions, executable via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions / actions specified in one or more blocks of the flowchart illustrations and / or block diagrams. Such a processor can be, but is not limited to, a general-purpose processor, a special-purpose processor, a special application processor, or a field-programmable logic circuit. It is also understood that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can also be implemented by special-purpose hardware performing the specified functions or actions, or can be implemented by a combination of special-purpose hardware and computer instructions.
[0193] The above description is merely a specific embodiment of this disclosure. Those skilled in the art will clearly understand that, for convenience and brevity, the specific working processes of the systems, modules, and units described above correspond to the processes in the foregoing method embodiments and are not repeated here. It should be understood that the protection scope of this disclosure is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope of this disclosure, and all such modifications or substitutions shall fall within the protection scope of this disclosure.
Claims
1. An information processing method, comprising: determining at least one piece of candidate vector data in a preset vector database based on personalized tags of a target user and tags of a first location, wherein the candidate vector data includes media asset information related to the first location and time information of the media asset information; determining at least one piece of target vector data from the at least one piece of candidate vector data based on the time information of the media asset information in each piece of candidate vector data; obtaining an estimated stay duration of the target user at the first location; and if the estimated stay duration is greater than or equal to a preset stay duration threshold, generating extended explanation content corresponding to the target user based on the target vector data.
2. The method according to claim 1, wherein before the determining of at least one piece of candidate vector data in the preset vector database based on the personalized tags of the target user and the tags of the first location, the method further comprises: obtaining multiple pieces of media asset information related to the first location and time information of each piece of media asset information; converting each piece of media asset information into a vector to obtain initial vector data corresponding to that piece of media asset information; for each piece of media asset information, adding the time information of the media asset information to an extended field of the initial vector data to obtain vector data corresponding to that piece of media asset information; and storing the vector data corresponding to each piece of media asset information in the vector database.
3. The method according to claim 1, wherein the generating of extended explanation content corresponding to the target user based on the target vector data comprises: calculating an extended explanation duration based on the estimated stay duration; determining an explanation word count for the extended explanation based on the extended explanation duration and a preset explanation speed; generating input text parameters based on the explanation word count and the at least one piece of target vector data; and processing the input text parameters using a large language model to obtain the extended explanation content.
4. The method according to claim 3, wherein the calculating of the extended explanation duration based on the estimated stay duration comprises: calculating a first product of the estimated stay duration and an extended explanation proportion, wherein the extended explanation proportion is the proportion of the explanation content that is extended explanation content; calculating a second product of the first product and a congestion coefficient; and using the second product as the extended explanation duration.
5. The method according to any one of claims 1-4, wherein the first location belongs to a preset location sequence that includes multiple locations arranged in a preset order; and the obtaining of the estimated stay duration of the target user at the first location comprises: if the first location is the initial location in the preset location sequence, or if an actual stay duration of the target user at a second location is greater than or equal to an average stay duration at the second location, using an average stay duration at the first location as the estimated stay duration of the target user at the first location, wherein the second location is the location immediately preceding the first location in the preset location sequence.
6. The method according to claim 5, wherein the obtaining of the estimated stay duration of the target user at the first location further comprises: if the first location is not the initial location in the preset location sequence, and the actual stay duration of the target user at the second location is less than the average stay duration at the second location, calculating a ratio of the actual stay duration of the target user at the second location to the average stay duration at the second location; and calculating the estimated stay duration of the target user at the first location based on the ratio and the average stay duration at the first location.
7. An information processing apparatus, comprising: a determination module configured to determine at least one piece of candidate vector data in a preset vector database based on personalized tags of a target user and tags of a first location, wherein the candidate vector data includes media asset information related to the first location and time information of the media asset information; the determination module being further configured to determine at least one piece of target vector data from the at least one piece of candidate vector data based on the time information of the media asset information in each piece of candidate vector data; an acquisition module configured to obtain an estimated stay duration of the target user at the first location; and a generation module configured to generate extended explanation content corresponding to the target user based on the target vector data when the estimated stay duration is greater than or equal to a preset stay duration threshold.
8. An electronic device, comprising: a processor and a memory storing computer program instructions; wherein the processor, when executing the computer program instructions, implements the information processing method according to any one of claims 1-6.
9. A computer-readable storage medium storing computer program instructions that, when executed by a processor, implement the information processing method as described in any one of claims 1-6.
10. A computer program product, wherein instructions in the computer program product, when executed by a processor of an electronic device, cause the electronic device to perform the information processing method as described in any one of claims 1-6.
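As an illustration only, the storage and selection steps recited in claims 1 and 2 might be sketched as follows. The claims do not say how tags are matched or how the time information is used to pick target vector data, so everything here is an assumption: `VectorRecord`, `toy_embed`, `build_record`, `select_candidates`, and `select_targets` are hypothetical names, the embedding is a placeholder rather than a real model, tag matching is plain set overlap, and "freshness" is modeled as a maximum age followed by newest-first ordering.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Set


@dataclass
class VectorRecord:
    embedding: List[float]   # initial vector data for the media asset
    text: str                # media asset information
    tags: Set[str]           # hypothetical tag set used for matching
    published: datetime      # time information kept in an extended field


def toy_embed(text: str) -> List[float]:
    """Placeholder embedding (assumption); a real system would call a model."""
    return [float(len(text.split())), float(len(text))]


def build_record(text: str, tags: Set[str], published: datetime) -> VectorRecord:
    """Vectorize the asset, then attach its time information (claim 2)."""
    return VectorRecord(toy_embed(text), text, set(tags), published)


def select_candidates(db: List[VectorRecord],
                      user_tags: Set[str],
                      location_tags: Set[str]) -> List[VectorRecord]:
    """Keep records sharing a tag with both the location and the user.

    ASSUMPTION: set overlap stands in for whatever tag-based retrieval
    the vector database actually performs.
    """
    return [r for r in db if r.tags & location_tags and r.tags & user_tags]


def select_targets(candidates: List[VectorRecord],
                   now: datetime,
                   max_age: timedelta,
                   top_k: int = 3) -> List[VectorRecord]:
    """Pick targets by time information: drop stale records, newest first.

    ASSUMPTION: the max-age cutoff and top_k limit are illustrative choices.
    """
    fresh = [r for r in candidates if now - r.published <= max_age]
    return sorted(fresh, key=lambda r: r.published, reverse=True)[:top_k]
```

Keeping the timestamp alongside the embedding, rather than in a separate store, lets the freshness filter run directly on the retrieved candidates without a second lookup.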
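The arithmetic recited in claims 3 to 6 can be illustrated with a short sketch. The function names, units (seconds, words per second), and sample values are assumptions made for illustration. In particular, claim 6 says only that the estimate is calculated "based on" the ratio and the average stay duration at the first location; multiplying them is one plausible reading and is not confirmed by the claim text.

```python
from typing import Optional


def estimate_stay(avg_here: float,
                  prev_actual: Optional[float] = None,
                  prev_avg: Optional[float] = None) -> float:
    """Estimated stay duration at the current (first) location.

    prev_actual / prev_avg describe the preceding (second) location;
    pass None for both when the current location starts the sequence.
    """
    # Claim 5: initial location, or the visitor stayed at least the
    # average time at the previous location -> use the local average.
    if prev_actual is None or prev_avg is None or prev_actual >= prev_avg:
        return avg_here
    # Claim 6: the visitor left the previous location early, so scale
    # the local average by actual/average.
    # ASSUMPTION: the combination is a plain product.
    return avg_here * (prev_actual / prev_avg)


def extended_duration(stay: float, proportion: float, congestion: float) -> float:
    """Claim 4: (stay * proportion) * congestion coefficient."""
    first_product = stay * proportion
    second_product = first_product * congestion
    return second_product


def word_count(duration: float, words_per_second: float) -> int:
    """Claim 3: explanation length from duration and preset speed."""
    return int(duration * words_per_second)
```

For example, with a hypothetical 120-second average at the current location and a visitor who spent 30 seconds at a previous location whose average is 60 seconds, `estimate_stay(120.0, 30.0, 60.0)` gives 60.0 seconds. With an extended explanation proportion of 0.5 and a congestion coefficient of 0.5, `extended_duration` gives 15.0 seconds, and at 4 words per second `word_count` gives 60 words to request from the large language model.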