Virtual object explanation method, electronic device, storage medium, and program product

By establishing communication connections, displaying video lists, playing interactive videos, and providing question buttons during the virtual object explanation process, the problem of user interest and personalized explanation is solved, achieving an efficient and flexible learning experience and enhancing user participation and satisfaction.

CN116954786B · Active · Publication Date: 2026-06-16 · MOFA (SHANGHAI) INFORMATION TECH CO LTD +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: MOFA (SHANGHAI) INFORMATION TECH CO LTD
Filing Date: 2023-05-19
Publication Date: 2026-06-16

Application Information

Patent Timeline
19 May 2023: Application
16 Jun 2026: Publication, CN116954786B

IPC: G06F9/451; G06F3/0482; G06F16/78; G06N3/08; G06F18/214; G06F18/22

CPC: G06F9/451; G06F3/0482; G06F16/78; G06N3/08; G06F18/214; G06F18/22

AI Tagging

Application Domain

Metadata video data retrieval; Execution for user interfaces

AI Technical Summary

⚠Technical Problem

In the existing process of explaining virtual objects, issues such as how to maintain user interest and how to provide personalized explanations have not been effectively resolved.

⚗Method used

By receiving access requests from terminal devices, a communication connection is established, a list of explanatory videos for virtual objects is displayed, the selected explanatory video is played, and a question button is displayed after the video ends to guide user interaction. The question and answer database is used to match answer videos, provide floating or pop-up reminders, and collect user feedback to optimize the explanation.

🎯Benefits of technology

It enhances users' learning interest and engagement, provides a personalized interactive experience, improves learning effectiveness and user satisfaction, and increases learning efficiency and flexibility.

✦ Generated by Eureka AI based on patent content.

Figure CN116954786B_ABST

Abstract

The application provides a virtual object explanation method, an electronic device, a computer-readable storage medium, and a computer program product. The method comprises: receiving an access request from a first terminal device and establishing a communication connection between the first terminal device and a target server; using the first terminal device to display a list of explanatory videos of a virtual object; when a selection operation for one of the explanatory videos is received, using the first terminal device to play the selected explanatory video; and when the last explanatory video in the list ends, using the first terminal device to play a first interactive video and display a question button, wherein the virtual object guides the user to click the question button and ask a question. The application provides interactive videos related to the explained content, so that users can more clearly understand the information they want to obtain, thereby improving user satisfaction.

Description

Technical Field

[0001] This application relates to the technical fields of virtual objects and artificial intelligence, and in particular to methods for explaining virtual objects, electronic devices, computer-readable storage media, and computer program products.

Background Technology

[0002] Virtual objects include virtual humans, virtual animals, and virtual cartoon characters. Virtual humans, in particular, are anthropomorphic figures constructed using computer graphics (CG) technology and run as code, with interactive capabilities such as spoken communication, facial expressions, and demonstrative actions. Virtual object technology has developed rapidly within the field of artificial intelligence and has been applied in many areas, including film, media, games, finance, cultural tourism, education, and healthcare.

[0003] Existing methods for explaining virtual objects still present some problems and challenges, such as how to maintain user interest and how to provide personalized explanations. Therefore, this application provides a method for explaining virtual objects, an electronic device, a computer-readable storage medium, and a computer program product to improve upon existing technologies.

Summary of the Invention

[0004] The purpose of this application is to provide a method for explaining virtual objects, electronic devices, computer-readable storage media, and computer program products that match explanation content with information received during the interaction process and provide relevant interactive videos so that users can more clearly understand the information they want to obtain, thereby improving user satisfaction.

[0005] The objective of this application is achieved through the following technical solution:

[0006] In a first aspect, this application provides a method for explaining virtual objects, the method comprising:

[0007] receiving an access request from a first terminal device and establishing a communication connection between the first terminal device and a target server, the target server being used to provide a virtual object explanation function;

[0008] displaying, using the first terminal device, a list of explanatory videos for a virtual object, wherein the list corresponds to multiple explanatory videos and each explanatory video corresponds to different explanatory content;

[0009] when a selection operation for one of the explanatory videos is received, playing the selected explanatory video using the first terminal device;

[0010] when the last explanatory video in the list finishes playing, playing a first interactive video and displaying a question button using the first terminal device, wherein, in the first interactive video, the virtual object guides the user to click the question button and ask a question.

[0011] The beneficial effects of this technical solution are as follows: by establishing a communication connection between the first terminal device and the target server, users can directly watch the list of explanatory videos for virtual objects on their own devices and conveniently select and play the desired videos; by displaying the list of explanatory videos for virtual objects, which corresponds to multiple explanatory videos, users can select the videos they are interested in to watch; when the explanatory video ends, an interactive segment will be entered, and users can interact with the virtual object through a question button to further understand the information.

[0012] In summary, this technical solution provides a more personalized and efficient online interactive experience. By offering convenient and efficient interactive methods, it effectively improves users' understanding and mastery of knowledge. Furthermore, the online format of this virtual object explanation method allows users to watch videos anytime, anywhere, increasing the flexibility and convenience of learning and demonstrating promising application prospects.
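The playback flow in paragraphs [0007] to [0010] can be read as a small state machine. The following Python sketch is illustrative only: the class, field, and event names (`ExplanationSession`, `Stage`, `events`) are assumptions and do not appear in the patent, and real playback and network handling are omitted.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class Stage(Enum):
    LIST_SHOWN = auto()    # the list of explanatory videos is displayed
    PLAYING = auto()       # a selected explanatory video is playing
    INTERACTION = auto()   # first interactive video plus question button


@dataclass
class ExplanationSession:
    """Hypothetical per-terminal session state kept by the target server."""
    video_ids: list
    stage: Stage = Stage.LIST_SHOWN
    current: Optional[str] = None
    question_button_visible: bool = False
    events: list = field(default_factory=list)

    def select(self, video_id: str) -> None:
        # Selection operation for one of the explanatory videos.
        if video_id not in self.video_ids:
            raise ValueError(f"unknown video: {video_id}")
        self.current = video_id
        self.stage = Stage.PLAYING
        self.events.append(f"play:{video_id}")

    def on_video_finished(self) -> None:
        if self.stage is not Stage.PLAYING:
            return
        if self.current == self.video_ids[-1]:
            # The last video in the list ended: enter the interaction stage.
            self.stage = Stage.INTERACTION
            self.question_button_visible = True
            self.events.append("play:first_interactive_video")
        else:
            # Any other video ended: return to the list.
            self.stage = Stage.LIST_SHOWN
            self.current = None
```

In this reading, only the end of the last video in the list triggers the first interactive video; finishing any earlier video simply returns the user to the list. That behavior for non-final videos is an assumption, since the text does not describe it.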

[0013] In some possible implementations, before playing the first interactive video using the first terminal device, the method further includes:

[0014] Using the first terminal device to remind the user, in the form of a floating layer or a pop-up window, that the explanation is complete.

[0015] The beneficial effects of this technical solution are as follows: using a floating layer or pop-up window to remind users that the explanation is complete prompts them to watch the first interactive video, which increases user participation and willingness to learn; it helps users better grasp the knowledge points and improves learning efficiency; it lets users know that the explanation has finished, making it convenient for them to move on to the next learning plan, study other courses, or review the knowledge points and do targeted exercises, thereby improving the learning experience; it reduces the learning cost, as users do not need to spend extra time and energy checking whether the explanation has finished; and it prevents users from forgetting to study the earlier content and then failing to understand the subsequent interactive videos, thus making learning smoother.

[0016] In summary, by using a floating layer or pop-up window to prompt users to watch the first interactive video after the last explanatory video has finished playing, this technical solution can improve user participation and willingness to learn. The floating layer or pop-up prompt directly attracts users' attention, effectively reminding them to proceed to the next learning interaction, increasing their understanding of and participation in the learning content, and improving learning effectiveness.

[0017] In some possible implementations, during the first interactive video, the virtual object prompts the user that the explanation has ended, guiding the user to click the question button and ask a question.

[0018] The beneficial effects of this technical solution are as follows: By using virtual objects in the first interactive video to inform users that the explanation has ended and guiding them to click the question button to ask questions, it promotes interactive communication between users and the learning content; it effectively enhances user participation and learning interest, making the learning process more interactive and vivid; providing a question button helps users obtain answers and solve problems more easily, improving both learning effectiveness and user satisfaction with the learning content; by guiding users to participate in interaction, it establishes a good user experience, enhancing users' trust and satisfaction with the course or product. In summary, the learning experience provided by this technical solution can help users better grasp the learning content, improve user participation and learning interest. Through virtual object prompts, guiding users to participate in interaction and ask questions enhances user participation and interactivity, thereby improving teaching effectiveness and increasing user learning outcomes.

[0019] In some possible implementations, the method further includes:

[0020] When a click operation is received on the question button, the first terminal device displays a question interface, which includes one or more of the following: a question input box, a voice capture button, and an image upload control.

[0021] The question interface is used to receive the user's question information, which may be text information, voice information, or image information.

[0022] The system detects whether there is an answer video in the question-and-answer database corresponding to the virtual object that matches the question information. The question-and-answer database stores multiple answer videos.

[0023] When an answer video that matches the question information exists in the question-and-answer database, the matching answer video is recorded as the first answer video.

[0024] The first terminal device is used to play the first answer video, in which the virtual object explains the content corresponding to the question information.

[0025] The beneficial effects of this technical solution are as follows: It provides a convenient way to ask questions via a virtual object using one or more of the following methods: a question input box, a voice capture button, and an image upload control. Users can ask questions through text, voice, or images, and the solution automatically matches answer videos for quick access to the required information. Furthermore, it offers an interactive learning experience, allowing users to gain a deeper understanding of the content by watching answer videos, significantly improving user satisfaction and experience. It also enhances the intelligence of the virtual object, providing better service. The question-and-answer library corresponding to the virtual object stores multiple answer videos, offering more comprehensive and accurate information to meet the needs of different users. When playing the first answer video, the virtual object explains the corresponding content based on the question, providing users with more targeted and practical knowledge and helping them improve their learning effectiveness. In conclusion, this solution provides users with a better learning experience and results, helps them solve problems more quickly and accurately, and improves their learning enthusiasm and initiative, demonstrating significant application value and promotional significance.
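The question flow in paragraphs [0020] to [0024] can be sketched as follows. This is a minimal illustration under stated assumptions: the control identifiers, the `QuestionInfo` type, and the returned action dictionaries are hypothetical, and the matcher is passed in as a callable because the matching step is described separately.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class QuestionKind(Enum):
    TEXT = "text"
    VOICE = "voice"
    IMAGE = "image"


# Hypothetical mapping from question-interface controls to the kind of
# question information each one produces.
CONTROL_TO_KIND = {
    "question_input_box": QuestionKind.TEXT,
    "voice_capture_button": QuestionKind.VOICE,
    "image_upload_control": QuestionKind.IMAGE,
}


@dataclass(frozen=True)
class QuestionInfo:
    kind: QuestionKind
    payload: bytes  # raw text, audio, or image data


def build_question(control: str, payload: bytes) -> QuestionInfo:
    """Wrap what the user submitted through one of the controls."""
    return QuestionInfo(CONTROL_TO_KIND[control], payload)


def handle_question(
    question: QuestionInfo,
    find_match: Callable[[QuestionInfo], Optional[str]],
) -> dict:
    """Return the next action for the first terminal device.

    `find_match` stands in for the lookup in the question-and-answer
    database; it returns an answer video id, or None when nothing matches.
    """
    video_id = find_match(question)
    if video_id is None:
        return {"action": "no_match"}
    # The matched video is treated as the first answer video.
    return {"action": "play_first_answer_video", "video_id": video_id}
```

Passing the matcher in as a parameter keeps the interface handling independent of how relevance is computed.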

[0026] In some possible implementations, the process of detecting whether an answer video matching the question information exists in the question-and-answer database corresponding to the virtual object includes:

[0027] Using the semantic extraction model corresponding to the question information, semantic information is extracted from the question information;

[0028] The relevance of the semantic information to each answer video in the question-and-answer database is obtained respectively.

[0029] When the maximum relevance value is greater than the preset relevance value, it is confirmed that there is an answer video in the question-and-answer database that matches the question information;

[0030] When the maximum relevance value is not greater than the preset relevance value, it is confirmed that there is no answer video in the question-and-answer database that matches the question information.

[0031] The beneficial effects of this technical solution are as follows: by using a semantic extraction model to extract semantic information from the input information, it is possible to more accurately understand the user's needs and effectively improve the accuracy of virtual object explanation videos; by using a semantic extraction model to extract semantic information from the question information, it is possible to more accurately match the corresponding answer video; by setting a preset relevance, it is possible to effectively avoid misjudgment and incorrect matching, ensuring that the answer video with the highest relevance is provided to the user as the first answer video; this technical solution can also perform statistical analysis on missing questions in the question-and-answer database, which helps to optimize and improve the question-and-answer database, and improve the stability and usability of the question-and-answer database.

[0032] In summary, by employing a semantic extraction model, calculating relevance, and setting preset relevance, this technical solution can more accurately determine users' question-and-answer needs, providing users with more precise and personalized answer videos. This approach is highly practical and efficient in explaining virtual objects, helping to further improve user experience and satisfaction. At the same time, it can automatically process large amounts of question information and answer videos without manual intervention, thus improving work efficiency.
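The decision rule in paragraphs [0029] and [0030] is a strict threshold on the maximum relevance. A minimal Python sketch, assuming the relevance values have already been computed and are keyed by a hypothetical answer video id:

```python
from typing import Optional


def select_answer_video(relevances: dict, preset_relevance: float) -> Optional[str]:
    """Return the id of the best-matching answer video, or None.

    A match exists only when the maximum relevance is strictly greater
    than the preset relevance; equality counts as no match.
    """
    if not relevances:
        return None  # an empty question-and-answer database cannot match
    best_id = max(relevances, key=relevances.get)
    if relevances[best_id] > preset_relevance:
        return best_id
    return None
```

For example, with relevances `{"a": 0.4, "b": 0.9}` a preset relevance of 0.8 selects `"b"`, while a preset relevance of 0.9 yields no match because 0.9 is not greater than 0.9.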

[0033] In some possible implementations, obtaining the relevance of the semantic information to each answer video in the question-answering database includes:

[0034] For each of the aforementioned answer videos, the following processing is performed:

[0035] Obtain the tag information corresponding to the answer video;

[0036] The semantic information and the tag information are input into a relevance model to obtain the relevance between the semantic information and the tag information, which is used as the relevance between the semantic information and the answer video.

[0037] The beneficial effects of this technical solution are as follows: It utilizes a relevance model to calculate the relevance between semantic information and answer videos, enabling more accurate matching of user questions and answers, thus improving the accuracy of answer matching; by employing a relevance model to calculate the relevance between semantic information and tag information, it can automatically match the most suitable answer video based on the content of the user's question, effectively improving the level of intelligence and user-friendliness; the relevance calculation method enables rapid response to user questions, improving the user experience and allowing users to more conveniently obtain the information they need; by integrating semantic information and tag information, this technical solution achieves semantic-based search functionality, better conforming to user habits and improving search hit rate. In summary, adopting this technical solution can improve the accuracy of answer matching, enhance the level of intelligence and user-friendliness, while also enabling rapid response to user questions, realizing semantic search functionality, and providing users with higher-quality services.
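The per-video processing in paragraphs [0034] to [0036] amounts to scoring the semantic information against each answer video's tag information. In the sketch below, the relevance model is a pluggable callable; the default `jaccard_relevance` is a simple token-overlap stand-in chosen for illustration and is not the trained relevance model the patent describes. All names are assumptions.

```python
from typing import Callable


def jaccard_relevance(semantic_info: str, tag_info: str) -> float:
    """Stand-in relevance model: token overlap between the two strings."""
    a = set(semantic_info.lower().split())
    b = set(tag_info.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_answer_videos(
    semantic_info: str,
    video_tags: dict,
    relevance_model: Callable[[str, str], float] = jaccard_relevance,
) -> dict:
    """Use the relevance to each video's tags as the relevance to the video."""
    return {
        video_id: relevance_model(semantic_info, tags)
        for video_id, tags in video_tags.items()
    }
```

The resulting dictionary has the shape a maximum-relevance threshold check would consume; a trained model could be supplied through the `relevance_model` parameter without changing the loop.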

[0038] In some possible implementations, the training process of the relevance model includes:

[0039] Obtain a training set, which includes multiple items of training data, each of which includes sample semantic information, sample label information, and annotation data on the relevance between the sample semantic information and the sample label information;

[0040] For each of the training data, perform the following processing:

[0041] The sample semantic information and sample label information in the training data are input into a preset deep learning model to obtain the predicted data of the relevance between the sample semantic information and the sample label information;

[0042] The model parameters of the deep learning model are updated based on the predicted data and the annotation data on the relevance between the sample semantic information and the sample label information.

[0043] Check whether the preset training termination condition is met; if yes, use the trained deep learning model as the relevance model; if no, continue to train the deep learning model using the next training data.

[0044] The beneficial effects of this technical solution are as follows: Training with a deep learning model allows for continuous optimization of the relevance model's parameters based on a large amount of sample data, thereby improving the model's accuracy and precision. This enables more accurate matching of user questions and answer videos in practical applications. The solution automatically updates model parameters through labeled data, reducing manual intervention costs and ensuring the robustness of the deep learning model. Simultaneously, it continuously refines the training set, further enhancing the accuracy of the relevance model. Utilizing deep learning models to process large amounts of data improves data processing capabilities and efficiency, and can handle rapid growth and changes in data scale, thus adapting to data processing needs in different scenarios. Finally, this technical solution improves the accuracy and timeliness of matching user questions and answers, effectively enhancing the interactive experience and increasing user satisfaction.

[0045] In summary, by training the relevance model using a deep learning model, this technical solution achieves more accurate calculation of the relevance between semantic information and the tag information of answer videos. This improves the accuracy and classification ability of the relevance model, reduces the cost of manual intervention, and enhances data processing capabilities and interactive experience, thus providing users with higher-quality services.
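The training procedure in paragraphs [0039] to [0043] (predict, update parameters, check a termination condition, otherwise continue with the next item) can be sketched with a deliberately tiny model. The logistic regression over a bias and a token-overlap feature below is a stand-in, not the preset deep learning model; the learning rate, update cap, and loss target are assumed termination settings.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    semantic: str      # sample semantic information
    label_info: str    # sample label information
    relevance: float   # annotation data in [0, 1]


def features(semantic: str, label_info: str) -> list:
    a, b = set(semantic.lower().split()), set(label_info.lower().split())
    overlap = len(a & b) / len(a | b) if (a | b) else 0.0
    return [1.0, overlap]  # bias term plus token overlap


def predict(weights: list, x: list) -> float:
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(weights: list, samples: list) -> float:
    total = 0.0
    for s in samples:
        p = predict(weights, features(s.semantic, s.label_info))
        p = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0)
        total -= s.relevance * math.log(p) + (1 - s.relevance) * math.log(1 - p)
    return total / len(samples)


def train(samples: list, lr: float = 0.5, max_updates: int = 2000,
          target_loss: float = 0.05) -> list:
    if not samples:
        raise ValueError("training set is empty")
    weights, updates = [0.0, 0.0], 0
    while True:
        for s in samples:
            x = features(s.semantic, s.label_info)
            p = predict(weights, x)  # predicted relevance
            # Cross-entropy gradient step: (prediction - annotation) * x.
            weights = [w - lr * (p - s.relevance) * xi
                       for w, xi in zip(weights, x)]
            updates += 1
            # Assumed termination condition: update budget or loss target.
            if updates >= max_updates or mean_loss(weights, samples) <= target_loss:
                return weights
```

The termination check runs after every parameter update, mirroring the "if no, continue with the next training data" step; a production setup would more likely evaluate on held-out data at intervals.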

[0046] In some possible implementations, the method further includes:

[0047] When there is no answer video matching the question information in the question-and-answer database, the first terminal device is used to play a second interactive video and display a form button. In the second interactive video, the virtual object guides the user to click the form button and fill in the form.

[0048] When a click operation is received on the form button, the form is displayed using the first terminal device. The form is provided with one or more of the following: radio buttons, check boxes, text input boxes, drop-down list boxes, file upload controls, and text fields.

[0049] Based on the form completion results, the user's feedback information is obtained. The feedback information indicates one or more of the following: the user's experience of the virtual object explanation function, the user's suggestions for that function, and suggestions for adjusting the content and order of the explanatory videos.

[0050] The beneficial effects of this technical solution are as follows: When no matching video answer exists in the question-and-answer database, guiding users to fill out a form enhances user participation, increases interactivity, and improves user satisfaction. Collecting user feedback through the form allows for understanding user experiences and suggestions regarding the virtual object explanation function, as well as adjustments to the content and order of the explanation videos. This feedback will help further optimize the explanation function and improve user satisfaction. Collecting user feedback also helps the system perform data analysis and learning, optimizing the algorithm model and further enhancing the system's intelligence level. Guiding users to fill out the form through a second interactive video provides convenient access to user feedback, enabling rapid service feedback and continuous improvement, and providing services that better meet user needs. In summary, this technical solution, when the highest relevance does not meet the preset conditions, can improve user participation and further enhance service quality and user satisfaction by guiding users to fill out a form and obtain feedback.
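The no-match branch in paragraphs [0047] to [0049] can be sketched as a fixed response plus a parser that turns the submitted form into feedback information. The response keys and the form field names (`experience`, `suggestions`, `content_adjustment`, `preferred_order`) are hypothetical; the patent lists control types but not field names.

```python
from dataclasses import dataclass, field

# Hypothetical instruction sent to the first terminal device when no
# answer video matches the question information.
NO_MATCH_RESPONSE = {"play": "second_interactive_video", "show": "form_button"}


@dataclass
class Feedback:
    experience: str = ""          # e.g. from radio buttons or a text field
    suggestions: str = ""         # free-text suggestions
    content_adjustment: str = ""  # requested changes to explanatory content
    preferred_order: list = field(default_factory=list)  # desired video order


def parse_form_result(result: dict) -> Feedback:
    """Build feedback information from a submitted form.

    Missing fields are left empty, since the feedback may cover only
    some of the items.
    """
    return Feedback(
        experience=str(result.get("experience", "")).strip(),
        suggestions=str(result.get("suggestions", "")).strip(),
        content_adjustment=str(result.get("content_adjustment", "")).strip(),
        preferred_order=list(result.get("preferred_order", [])),
    )
```

Treating every field as optional reflects the "one or more of" wording in the text.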

[0051] In some possible implementations, the method further includes:

[0052] Based on the feedback information, it is determined whether the user is satisfied with the explanation function of the virtual object;

[0053] When the user is not satisfied with the explanation function of the virtual object, a prompt message is generated and sent to the second terminal device of the configuration personnel to prompt the configuration personnel to adjust the explanation video and / or the explanation order of the virtual object.

[0054] The beneficial effects of this technical solution are as follows: By detecting user feedback, it can promptly determine whether users are satisfied with the virtual object explanation function. If the user is dissatisfied, a prompt message will be generated and sent to the configuration personnel's device to guide them in adjusting the explanation video and / or the explanation order of the virtual object. This effectively optimizes the explanation function, improves user satisfaction, and also strengthens the self-improvement capability of the question-and-answer database. In summary, detecting user satisfaction with the virtual object explanation function can improve real-time service, enhance user satisfaction, and improve service quality. Furthermore, by combining a second interactive video and form feedback mechanism, this technical solution effectively improves the intelligence level and service quality of the explanation function, while also providing users with a more personalized and superior experience.
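Paragraphs [0052] and [0053] do not say how satisfaction is determined from the feedback. The sketch below assumes a hypothetical numeric `rating` field (1 to 5, for instance from radio buttons) and a minimum acceptable rating; both are illustrative choices, as are the message fields and the injected `send` callable that stands in for delivery to the second terminal device.

```python
from typing import Callable, Optional


def is_satisfied(feedback: dict, min_rating: int = 3) -> bool:
    """Assumed rule: satisfied when the rating reaches the minimum.

    A missing rating is treated as 0, i.e. as dissatisfaction.
    """
    return int(feedback.get("rating", 0)) >= min_rating


def notify_if_unsatisfied(
    feedback: dict,
    send: Callable[[dict], None],
    min_rating: int = 3,
) -> Optional[dict]:
    """Send a prompt message to the configuration personnel if needed."""
    if is_satisfied(feedback, min_rating):
        return None
    message = {
        "to": "second_terminal_device",
        "text": "Please review the explanatory videos and/or their order.",
        "feedback": feedback,
    }
    send(message)
    return message
```

Injecting `send` keeps the decision logic testable without a real messaging channel.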

[0055] In a second aspect, this application provides an electronic device including a memory and at least one processor, the memory storing a computer program, and the at least one processor being configured to execute the computer program to perform the following steps:

[0056] receiving an access request from a first terminal device and establishing a communication connection between the first terminal device and a target server, the target server being used to provide a virtual object explanation function;

[0057] displaying, using the first terminal device, a list of explanatory videos for a virtual object, wherein the list corresponds to multiple explanatory videos and each explanatory video corresponds to different explanatory content;

[0058] when a selection operation for one of the explanatory videos is received, playing the selected explanatory video using the first terminal device;

[0059] when the last explanatory video in the list finishes playing, playing a first interactive video and displaying a question button using the first terminal device, wherein, in the first interactive video, the virtual object guides the user to click the question button and ask a question.

[0060] In a third aspect, this application provides a computer-readable storage medium storing a computer program that, when executed by at least one processor, implements the steps of any of the above-described methods or the functions of any of the above-described electronic devices.

[0061] In a fourth aspect, this application provides a computer program product comprising a computer program that, when executed by at least one processor, implements the steps of any of the above methods or the functions of any of the above electronic devices.

Attached Figure Description

[0062] This application will be further described below with reference to the accompanying drawings and specific embodiments.

[0063] Figure 1 is a flowchart illustrating a virtual object explanation method provided in an embodiment of this application.

[0064] Figure 2 is a schematic diagram of a user interacting with a virtual object, provided in an embodiment of this application.

[0065] Figure 3 is a structural block diagram of an electronic device provided in an embodiment of this application.

[0066] Figure 4 is a schematic diagram of the structure of a computer program product provided in an embodiment of this application.

Detailed Implementation

[0067] The technical solutions in this application will be described below with reference to the accompanying drawings and specific embodiments. It should be noted that, without conflict, the various embodiments or technical features described below can be arbitrarily combined to form new embodiments.

[0068] In this application, the terms "exemplary" or "for example" are used to indicate that something is an example, illustration, or description. Any implementation or design described as "exemplary" or "for example" in this application should not be construed as being better or more advantageous than other implementations or designs. Specifically, the use of terms such as "exemplary" or "for example" is intended to present the relevant concepts in a specific manner.

[0069] The terms "first," "second," etc., appearing in the embodiments of this application are used for illustration and to distinguish the objects being described. They imply no order, do not indicate any special limitation on quantity, and do not constitute any limitation on the embodiments of this application.

[0070] The technical field and related terms of the embodiments of this application are briefly described below.

[0071] Virtual objects include virtual humans, virtual animals, and virtual cartoon characters. Virtual humans are anthropomorphic figures constructed using computer graphics (CG) technology and run as code, with interactive capabilities such as spoken communication, facial expressions, and demonstrative actions. Virtual object technology has developed rapidly within the field of artificial intelligence and has been applied in many areas, such as film, media, games, finance, cultural tourism, education, and healthcare. It can be used not only to customize virtual hosts, virtual anchors, virtual idols, virtual customer service representatives, virtual lawyers, virtual teachers, virtual financial advisors, virtual tour guides, virtual doctors, and virtual assistants, but also to generate videos with a single click from text or audio. Among virtual objects, service-oriented virtual objects primarily replace real-person services and provide daily companionship; they are virtualizations of real-world service roles. Their industrial value lies mainly in reducing costs and improving efficiency in existing service industries.

[0072] Artificial Intelligence (AI) is the theory, methods, technology, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to achieve optimal results. In other words, AI is a comprehensive technology within computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, enabling them to possess perception, reasoning, and decision-making capabilities. AI technology is a comprehensive discipline involving a wide range of fields, encompassing both hardware and software technologies. Fundamental AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating / interactive systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, as well as machine learning / deep learning, autonomous driving, and intelligent transportation.

[0073] Machine Learning (ML) is a multidisciplinary field involving probability theory, statistics, approximation theory, convex analysis, and algorithm complexity theory. A computer program is said to learn from experience E with respect to a class of tasks T and a performance measure P if its performance on tasks in T, as measured by P, improves with experience E. Machine learning specifically studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence; its applications span all areas of artificial intelligence.

[0074] Deep learning is a special type of machine learning that represents the world as a nested hierarchy of concepts, achieving great power and flexibility. Each concept is defined in relation to simpler concepts, and more abstract representations are computed in terms of less abstract ones. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

[0075] Virtual object interaction applications provide interactive functionality using virtual objects. These virtual objects can simulate human communication and behavior and interact with the user. Such software (referring to virtual object interaction applications) is typically driven by artificial intelligence and natural language processing technologies and can interact with users through text, voice, images, forms, and other methods.

[0076] With the development of science and technology, virtual object explanation methods have emerged. This method utilizes computer technology to transform real-world objects, scenes, and situations into virtual forms, and then conveys relevant knowledge and concepts to users through visualization, interaction, and simulation. Virtual object explanation technology can be applied to various fields, such as medicine, engineering, architecture, and the military. In the medical field, virtual objects can explain surgical procedures, disease diagnosis, and treatment scenarios, helping medical students and doctors to better learn and practice. In engineering and architecture, virtual objects can explain mechanical equipment, building structures, and construction processes, helping engineers and architects to better understand and apply relevant knowledge. In the military field, virtual objects can explain battlefield environments, weapon systems, and combat strategies, helping soldiers to better train and respond to actual combat.

[0077] The application of virtual object explanation technology can effectively improve the effectiveness of education and training, reduce the waste of human resources, materials, and time, and lower the risks and costs of learning and practice. However, existing virtual object explanation technologies still face some problems and challenges, for example, how to maintain user interest and engagement, how to provide personalized explanation methods, and how to adjust and optimize based on user feedback and performance. Based on this, this application provides a virtual object explanation method, electronic device, computer-readable storage medium, and computer program product to improve the existing technology.

[0078] The solutions provided in this application involve technologies such as virtual objects, interaction design, artificial intelligence, 3D modeling, and cloud computing, which are specifically illustrated in the following embodiments. It should be noted that the order of description of the following embodiments is not intended to limit the preferred order of the embodiments.

[0079] (Explanation of virtual objects)

[0080] See Figure 1. Figure 1 is a flowchart illustrating a virtual object explanation method provided in an embodiment of this application.

[0081] This application provides a method for explaining virtual objects, the method including:

[0082] Step S101: Receive an access request from the first terminal device and establish a communication connection between the first terminal device and the target server, wherein the target server is used to provide a virtual object explanation function;

[0083] Step S102: Use the first terminal device to display a list of explanation videos for virtual objects. The list of explanation videos corresponds to multiple explanation videos, and each explanation video corresponds to different explanation content.

[0084] Step S103: When a selection operation for one of the explanation videos is received, play the selected explanation video using the first terminal device;

[0085] Step S104: When the last explanation video in the explanation video list finishes playing, play the first interactive video using the first terminal device and display a question button. In the first interactive video, the virtual object guides the user to click the question button and ask questions.
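
Steps S101 to S104 can be read as a small state machine on the terminal side. The following is a minimal sketch under stated assumptions, not the implementation of this application: the class name `ExplanationSession`, its methods, and the event strings are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ExplanationSession:
    """Hypothetical terminal-side state for steps S101 to S104."""
    videos: list[str]                               # explanation video list (S102)
    connected: bool = False
    events: list[str] = field(default_factory=list)
    question_button_visible: bool = False

    def connect(self) -> None:
        # S101: the access request was accepted and the connection is established.
        self.connected = True
        self.events.append("connected")

    def select(self, index: int) -> None:
        # S103: play the explanation video the user selected.
        if not self.connected:
            raise RuntimeError("connect() must be called first")
        self.events.append(f"play:{self.videos[index]}")

    def on_finished(self, index: int) -> None:
        # S104: only the last video in the list triggers the first interactive
        # video and makes the question button visible.
        if index == len(self.videos) - 1:
            self.events.append("play:first_interactive_video")
            self.question_button_visible = True
```

In this sketch the question button stays hidden until `on_finished` is called for the last index in the list, which mirrors the condition stated in step S104.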

[0086] The virtual object explanation method can run on an electronic device. The electronic device and the first terminal device (used by the user) can be independent of each other, or the electronic device can be integrated with the first terminal device. When the electronic device and the first terminal device are independent of each other, the electronic device can be a device with computing capabilities such as a computer, a server (including a cloud server), etc.

[0087] The embodiments of this application do not limit the terminal device (including the first terminal device, the second terminal device, etc.). For example, it can be a smart terminal device with a display screen, a microphone, and a speaker, such as a mobile phone, a tablet computer, a laptop computer, a desktop computer, a smart wearable device, etc. Alternatively, it can also be a workstation or a console with a display screen, a microphone, and a speaker. The display screen can be a touch display screen or a non-touch display screen.

[0088] In the embodiments of this application, the virtual object explanation video includes one or more of virtual objects, virtual animals, and virtual cartoon characters. As an example, the virtual object explanation video is the virtual object "JING" (Chinese name: Mirror).

[0089] The target server can run one or more application programs to provide the interactive function of the virtual object explanation video. These application programs can be written in one or more programming languages, such as Java, Python, Node.js, etc., and can utilize various frameworks and libraries to implement various functions, such as natural language processing, speech-to-text, image recognition, etc. To improve the availability and performance of the target server, a load balancer can be used to distribute requests to multiple servers, and a failover mechanism can be used to automatically switch to a standby server when a server fails. The electronic device and the target server can be independent of each other, or the electronic device can be integrated with the target server.
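
The load balancing and failover behavior mentioned above can be illustrated with a round-robin selector that skips servers marked as failed. This is only a sketch under assumptions: the class `RoundRobinBalancer` and its method names are hypothetical, and a real deployment would normally rely on an existing load balancer with health checks rather than hand-written code.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical round-robin balancer that fails over past unhealthy servers."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_failed(self, server):
        # Called when a health check or a request to this server fails.
        self.healthy.discard(server)

    def mark_recovered(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once per call, skipping unhealthy ones.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy server available")
```

Requests are spread evenly while all servers are healthy; once a server is marked as failed, subsequent calls to `pick` pass over it until it is marked as recovered.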

[0090] The question-and-answer database corresponding to the virtual object explanation videos needs to be stored on one or more servers. These servers can use cloud storage or proprietary storage to store the videos. The question-and-answer database stores first-person interactive videos corresponding to multiple fields, such as medicine, engineering, architecture, and the military. In the medical field, virtual objects can explain surgical procedures, disease diagnosis, and treatment scenarios, helping medical students and doctors to better learn and practice. In the engineering and architecture fields, virtual objects can explain mechanical equipment, building structures, and construction processes, helping engineers and architects to better understand and apply relevant knowledge. In the military field, virtual objects can explain battlefield environments, weapon systems, and combat strategies, helping soldiers to better train and respond to actual combat.

[0091] Therefore, by establishing a communication connection between the first terminal device and the target server, users can directly watch a list of explanatory videos for virtual objects on their own devices and conveniently select and play the desired videos. By displaying a list of explanatory videos for virtual objects, which includes multiple explanatory videos, users can select videos that interest them to watch. When the explanatory video ends, an interactive segment will begin, allowing users to interact with the virtual object through a question button to learn more about the information.

[0092] In summary, this technical solution provides a more personalized and efficient online interactive experience. By offering convenient and efficient interactive methods, it effectively improves users' understanding and mastery of knowledge. Furthermore, the online format of this virtual object explanation method allows users to watch videos anytime, anywhere, increasing the flexibility and convenience of learning and demonstrating promising application prospects.

[0093] The following example illustrates the interaction process between the user and the virtual narrator "Little A".

[0094] A user, using a smartphone or computer (i.e., a terminal device), logs into an online learning platform and accesses the menu interface of the virtual tutor "Xiao A". At this time, the terminal device sends an access request to the target server, requesting to establish a communication connection in order to obtain the relevant configuration information for "Xiao A's" tutoring. Upon receiving the request, the target server authenticates the user, confirms the user's right to access "Xiao A's" tutoring function, and returns the corresponding configuration information.

[0095] The target server generates and returns a list of instructional videos for "Xiao A" based on the user's learning needs and interests. For example, "Xiao A" can explain multiple knowledge points, such as physics, chemistry, and mathematics. The video list contains multiple different instructional videos for these knowledge points, each with different content. For instance, when the user selects the knowledge point "mechanical kinematics," "Xiao A" will return multiple instructional videos related to mechanical kinematics.

[0096] The user selects one of the instructional videos, such as the one on "Fundamentals of Mechanics." At this point, the terminal device sends a selection request to the target server, requesting that the video be played. The target server confirms the request and returns the corresponding video stream to the terminal device for playback.

[0097] When the "Fundamentals of Mechanics" video finishes playing, the target server sends a command to the terminal device to automatically play the first interactive video. In this interactive video, a question button is displayed and "Little A" guides the user to click it to ask a question. For example, the user might ask a mechanics-related question such as "What is Newton's First Law?". "Little A" will then retrieve the corresponding video explanation based on the user's question and play it on the first terminal device. In this video, "Little A" will explain Newton's First Law, covering its definition, importance, principle, examples, precautions, application scenarios, and learning tips, as well as practice questions with their solutions and answers.

[0098] Through the steps described above, the virtual object explanation video achieves effective interaction with the user. Online learning platforms using this method can provide personalized learning experiences based on different user needs and interests, helping users achieve more efficient, interesting, and in-depth learning results.

[0099] In some embodiments, before playing the first interactive video using the first terminal device, the method further includes:

[0100] The first terminal device is used to remind the user to complete the explanation in the form of a floating layer or a pop-up window.

[0101] Therefore, using an overlay or pop-up to remind users to complete the explanation prompts them to proceed to the first interactive video, which increases user participation and willingness to learn, helps users better grasp the knowledge points, and improves learning efficiency. The reminder also prevents users from forgetting to complete the explanation, which improves the learning experience, and it reduces learning costs because users do not need to spend extra time and energy keeping track of whether the explanation is needed. Finally, it prevents users from forgetting to learn the prerequisite content and therefore not understanding the subsequent interactive videos, which improves the smoothness of learning.

[0102] In summary, this technical solution, by using an overlay or pop-up to remind users to learn the first interactive video after the last explanatory video has finished playing, can improve user participation and willingness to learn. The overlay or pop-up prompts can directly attract users' attention, effectively reminding them to proceed to the next learning interaction, increasing users' understanding and participation in the learning content, and improving learning effectiveness.

[0103] For example, suppose a user uses a mobile phone (the first terminal device) to access a website that provides virtual physics teacher explanations. The user's mobile phone will send an access request to the target server of the website, establishing a communication connection.

[0104] On the phone, users will see a list of video lectures by a virtual physics teacher. This list contains multiple videos, each covering a different physics concept, such as Newton's First Law and Newton's Second Law.

[0105] When a user selects an explanatory video, the phone will play the selected video. For example, if the user selects the video on Newton's First Law, the user will see a virtual physics teacher explaining Newton's First Law.

[0106] When the last instructional video in the list finishes playing, the phone will notify the user via a floating overlay or pop-up window that all instructional content has been completed.

[0107] Next, the phone will play an interactive video and display a question button. In this video, a virtual physics teacher will guide the user to click the question button and ask a question. For example, if a user has questions about Newton's First Law, they can click the question button and ask a question.

[0108] In some embodiments, during the first interactive video, the virtual object prompts the user that the explanation has ended and guides the user to click the question button and ask a question.

[0109] Therefore, by using virtual objects in the first interactive video to indicate the end of the explanation and guiding users to click the question button, interactive communication between users and the learning content is promoted. This technology also effectively enhances user participation and learning interest, making the learning process more interactive and vivid. Providing a question button helps users obtain answers and solve problems more easily, improving both learning effectiveness and user satisfaction with the learning content. By guiding users to participate in interaction, a positive user experience is established, enhancing user trust and satisfaction with the course or product. In summary, this technical solution helps users better grasp the learning content, improves user participation and learning interest. By using virtual object prompts to guide users to participate in interaction and ask questions, user participation and interactivity are enhanced, thereby improving teaching effectiveness and increasing user learning outcomes.

[0110] The following example illustrates the interaction process between the user and the virtual narrator "Little A".

[0111] Let's say a user is using an online English learning app that offers a series of interactive video lessons, each taught by "Little A". The user has opened the app and is ready to start their first lesson.

[0112] When a user opens the video for the first lesson, a pop-up or overlay will appear, reminding the user to complete the explanation. This prompt will appear in the first few seconds of the video and will disappear automatically.

[0113] Next, users will watch the first interactive video and listen to "Little A's" explanation. When the explanation ends, "Little A" will display a prompt guiding users to click the "Ask a Question" button. This button is usually located at the bottom of the video and has a question mark icon.

[0114] After the user clicks the "Ask a Question" button, the phone displays a text input box and a voice capture button for entering the question. The user can input any question via text or voice and then click the "Send" button. "Xiao A" will then search the question-and-answer database for a matching answer video to help the user understand the content in the video.

[0115] For example, if a video explains a concept in English grammar and a user asks, "How often is this concept used in English?", "Little A" can answer, "This concept is very common in English, and you need to memorize its usage."

[0116] Through the steps described above, "Little A" guides users through the explanation using overlays and pop-ups, and then encourages users to ask questions after the explanation. This approach enhances the user's learning experience and makes learning easier and more enjoyable.

[0117] In some embodiments, the method further includes:

[0118] When a click operation is received on the question button, the first terminal device displays a question interface, which includes one or more of the following: a question input box, a voice capture button, and an image upload control.

[0119] The question interface is used to receive the user's question information, which may be text information, voice information, or image information.

[0120] The system detects whether there is an answer video in the question-and-answer database corresponding to the virtual object that matches the question information. The question-and-answer database stores multiple answer videos.

[0121] When a video answer that matches the question information exists in the question-and-answer database, the matching video answer is recorded as the first video answer.

[0122] The first terminal device is used to play the first answer video, in which the virtual object explains the content corresponding to the question information.

[0123] In some embodiments, when the input information is text information, the semantic extraction model corresponding to the input information is a pre-trained language model based on deep learning; when the input information is speech information, the semantic extraction model corresponding to the input information includes a speech-to-text model based on deep learning and a pre-trained language model based on deep learning; when the input information is image information, the semantic extraction model corresponding to the input information is a semantic segmentation model based on deep learning.

[0124] Therefore, deep learning-based models are used to extract semantic features from the input information. Different deep learning models are employed for semantic extraction depending on the type of input information, including pre-trained language models, speech-to-text models, and semantic segmentation models.
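
The selection of a semantic extraction model by input type can be sketched as a dispatch table. In the sketch below, every function is a placeholder introduced only for illustration: `language_model`, `speech_to_text`, and `segmentation_model` stand in for real deep learning models and return tagged strings so that the routing is visible.

```python
from typing import Callable


# Placeholder stand-ins for the real models (assumptions, not real model calls).
def language_model(text: str) -> str:
    return f"semantics({text})"


def speech_to_text(audio: bytes) -> str:
    # Pretend the audio bytes already contain a UTF-8 transcript.
    return audio.decode("utf-8")


def segmentation_model(image: bytes) -> str:
    return f"segments({len(image)} bytes)"


# Text goes to the language model, voice goes through speech-to-text and then
# the language model, and images go to the segmentation model.
EXTRACTORS: dict[str, Callable] = {
    "text": language_model,
    "voice": lambda audio: language_model(speech_to_text(audio)),
    "image": segmentation_model,
}


def extract_semantics(kind: str, payload):
    """Route the question information to the extractor for its type."""
    try:
        extractor = EXTRACTORS[kind]
    except KeyError:
        raise ValueError(f"unsupported question type: {kind}") from None
    return extractor(payload)
```

Keeping the routing in a table means that a new input type only requires a new entry rather than a change to the calling code.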

[0125] For textual information, pre-trained language models, such as BERT and GPT, are used to extract semantic features from the input text. These models, through pre-training on a large amount of text, can learn the structure and semantic information of the language, thus effectively extracting the semantic information of the input text.

[0126] For speech information, deep learning-based speech-to-text models, such as CTC and Transformer, are used to convert speech information into text information, and pre-trained language models are used to extract semantic features from the text information. In this way, semantic information related to the input information can be extracted from the speech information.

[0127] For image information, deep learning-based semantic segmentation models, such as UNet and DeepLab, are used to segment the image and extract the semantic information corresponding to each pixel. This allows for the accurate extraction of semantic information related to the input information from the image.

[0128] Therefore, this technical solution provides a convenient way to ask a virtual object questions through the question input box, voice capture button, or image upload control on the question interface. Answer videos are matched automatically, so users can quickly obtain the required information. The solution also offers an interactive learning experience, allowing users to gain a deeper understanding of the content by watching answer videos, significantly improving user satisfaction and experience. It also enhances the intelligence of the virtual object, providing better service. The question-and-answer database corresponding to the virtual object stores multiple answer videos, providing more comprehensive and accurate information to meet the needs of different users. When playing the first answer video, the virtual object explains the corresponding content based on the question, providing users with more targeted and practical knowledge and improving learning effectiveness. In conclusion, this implementation provides users with a better learning experience and results, helping them solve problems more quickly and accurately, and increasing their learning enthusiasm and initiative. It has significant application value and promotional significance.

[0129] This example illustrates the interaction process between a user and the doctor's assistant "Xiao A" (i.e., the virtual object).

[0130] Imagine a user has a doctor's assistant named "Xiao A" who can be accessed through a dedicated mobile application called "Dr.AI". This application is installed on the user's primary device (such as a mobile phone or tablet), and the user can use this device to ask "Xiao A" questions and get answers.

[0131] When a user clicks the "Ask a Question" button in Dr.AI, the first terminal device will display the question interface. This interface includes a question input box, a voice capture button, and an image upload control. Users can ask "Xiao A" questions here in the form of text, voice, or images.

[0132] Users can click the voice capture button and say, "What should I do if I have a headache?" The target server will receive the question, convert it into text, and check if a matching video answer exists in its question-and-answer database.

[0133] If a matching video response exists, the target server will select the first matching video and play it on the user's device. In the video, "Little A" will explain to the user how to relieve headaches and provide relevant advice.

[0134] For example, in the video, "Little A" might suggest that users apply an ice pack to their head or massage their neck to relieve headaches. "Little A" can also provide other information about headaches, such as their causes, prevention methods, and when to consult a doctor.

[0135] By using the doctor's assistant "Xiao A", users can quickly obtain medical advice and information without having to visit a doctor in person.

[0136] See Figure 2. Figure 2 is a schematic diagram of a user interacting with a virtual object, provided in an embodiment of this application.

[0137] In some embodiments, the process of detecting whether there is an answer video matching the question information in the question-and-answer database corresponding to the virtual object includes:

[0138] Using the semantic extraction model corresponding to the question information, semantic information is extracted from the question information;

[0139] The relevance of the semantic information to each answer video in the question-and-answer database is obtained respectively.

[0140] When the maximum relevance value is greater than the preset relevance value, it is confirmed that there is an answer video in the question-and-answer database that matches the question information;

[0141] When the maximum relevance value is not greater than the preset relevance value, it is confirmed that there is no answer video in the question-and-answer database that matches the question information.
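
The matching decision described above reduces to taking the maximum relevance and comparing it with the preset relevance. A minimal sketch, assuming the relevance values have already been computed and are passed in as a dictionary keyed by a hypothetical video identifier; the function name `find_first_answer_video` is also an assumption.

```python
from typing import Optional


def find_first_answer_video(relevance: dict[str, float], preset: float) -> Optional[str]:
    """Return the id of the best-scoring answer video, or None when nothing matches."""
    if not relevance:
        return None
    best_id = max(relevance, key=relevance.get)
    # The comparison is strict, as in the text: a maximum that merely equals
    # the preset relevance counts as "no matching answer video".
    return best_id if relevance[best_id] > preset else None
```

A return value of `None` corresponds to the case in which no answer video in the question-and-answer database matches the question information.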

[0142] Therefore, by using a semantic extraction model to extract semantic information from the input information, it is possible to more accurately understand user needs and effectively improve the accuracy of virtual object explanation videos. This technical solution utilizes a semantic extraction model to extract semantic information from the question information, enabling more accurate matching of corresponding answer videos. By setting a preset relevance, misjudgment and incorrect matching can be effectively avoided, ensuring that the answer video with the highest relevance is provided to the user as the first answer video. This technical solution can also perform statistical analysis on missing questions in the question-and-answer database, which helps to optimize and improve the question-and-answer database, enhancing its stability and usability.

[0143] In summary, by employing a semantic extraction model, calculating relevance, and setting preset relevance, this technical solution can more accurately determine users' question-and-answer needs, providing users with more precise and personalized answer videos. This approach is highly practical and efficient in explaining virtual objects, helping to further improve user experience and satisfaction. At the same time, it can automatically process large amounts of question information and answer videos without manual intervention, thus improving work efficiency.

[0144] The following example illustrates the interaction process between the user and the virtual tour guide "Little A" (virtual object).

[0145] Imagine a user is touring a city and using a virtual tour guide app on their phone. One feature allows the user to ask questions to the virtual guide, "Xiao A." For example, if the user wants to know the historical background of a particular attraction, they could ask, "What is the history of this attraction?" The "Xiao A" app then uses a semantic extraction model to extract semantic information from the user's question and compares this information with each answer video in a question-and-answer database. For instance, the semantic extraction model might extract keywords like "attraction" and "history" from the user's question and match them with information from the answer videos.

[0146] Then, "Xiao A" calculates the relevance of each answer video to the question. For example, if there's an answer video in the question-and-answer database that answers the user's question: "This attraction was built in the 16th century as part of the city walls constructed by the rulers to defend against enemies," then "Xiao A" will identify it as the best answer video because its relevance to the question is the highest and greater than the preset relevance. If the highest relevance is not greater than the preset relevance, then "Xiao A" will tell the user that there is no answer video in the question-and-answer database that matches the user's question.

[0147] Through the steps described above, using semantic extraction models and relevance calculations, it's possible to automatically detect whether a matching video answer exists in the question-and-answer database corresponding to the virtual object. This allows users to more easily obtain information about their travel destinations.

[0148] In some embodiments, obtaining the relevance of the semantic information to each answer video in the question-answering database includes:

[0149] For each of the aforementioned answer videos, the following processing is performed:

[0150] Obtain the tag information corresponding to the answer video;

[0151] The semantic information and the tag information are input into a relevance model to obtain the relevance between the semantic information and the tag information, which is used as the relevance between the semantic information and the answer video.
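
The per-video processing above can be sketched with a simple stand-in for the relevance model. The relevance model in this application is a trained model; the Jaccard overlap of keyword sets used below is a deliberately simple substitute chosen only to make the data flow concrete, and the names `jaccard_relevance` and `score_videos` are hypothetical.

```python
def jaccard_relevance(semantic_keywords: set[str], tags: set[str]) -> float:
    """Stand-in relevance: size of the intersection over size of the union, in [0, 1]."""
    union = semantic_keywords | tags
    if not union:
        return 0.0
    return len(semantic_keywords & tags) / len(union)


def score_videos(semantic_keywords: set[str], tag_index: dict[str, set[str]]) -> dict[str, float]:
    # One relevance value per answer video, computed from that video's tags only;
    # the video content itself is never inspected.
    return {
        video_id: jaccard_relevance(semantic_keywords, tags)
        for video_id, tags in tag_index.items()
    }
```

Because relevance is computed against tag information rather than against the video itself, the cost of answering a question grows with the number of tags, not with the length of the videos.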

[0152] For example, the virtual object is a virtual teacher who teaches a course on virtual human technology. This virtual teacher's question-and-answer database stores multiple answer videos, each with corresponding tag information describing the video's theme, keywords, type, etc.

[0153] For example:

[0154] Answer Video 1: A virtual human is a digital character with human appearance and behavioral characteristics created using technologies such as computer graphics, graphics rendering, motion capture, deep learning, and speech synthesis. Tags for Answer Video 1: Virtual human technology; Definition; Concept; Technological components.

[0155] Answer Video 2: Virtual humans can be classified in various ways, such as by technology, visual dimensions, structural composition, and business model. Tags for Answer Video 2: Virtual human technology; Classification; Methods; Types.

[0156] Answer Video 3: Creating a virtual human involves the following steps: 1. Determine the virtual human's appearance and style; 2. Choose a suitable modeling method, such as 3D modeling or AI facial sculpting; 3. Bind key points and motion capture equipment to drive the virtual human's body shape, eyes, and movements; 4. Choose a suitable voice generation method, such as speech synthesis or human voice-over; 5. Choose a suitable animation generation method, such as preset animation or AI-generated animation; 6. Choose a suitable audio and video display method, such as screen display or holographic projection. Tags for Answer Video 3: Virtual human technology; Production; Steps; Methods.

[0157] When a user asks a question, it needs to be converted into semantic information, that is, the intent and keywords of the question. For example:

[0158] The user's question is: How is virtual human technology implemented? The extracted semantic information is as follows: Intent: To understand the implementation principle of virtual human technology; Keywords: virtual human technology; implementation.

[0159] Then, the relevance of the semantic information to each answer video in the question-and-answer database is obtained in order to determine which answer video best meets the user's needs.

[0160] For each answer video, the corresponding tag information is obtained. The semantic information and tag information are then input into a relevance model to obtain the relevance between the semantic information and the tag information. This relevance is used as the relevance between the semantic information and the answer video. The relevance model is a machine learning model used to calculate the similarity between two sets of texts, and it can score based on the semantic, grammatical, and lexical features of the text. For example:

[0161] The relevance between the semantic information and the tag information of answer video 1: 0.6.

[0162] The relevance between the semantic information and the tag information of answer video 2: 0.4.

[0163] The relevance between the semantic information and the tag information of answer video 3: 0.8.

[0164] The most relevant answer video, i.e., answer video 3, will be selected as the output of the virtual object's explanation process.

[0165] In some embodiments, the training process of the relevance model includes:

[0166] Obtain a training set, which includes multiple items of training data, each of which includes sample semantic information, sample label information, and annotation data on the relevance between the sample semantic information and the sample label information;

[0167] For each of the training data, perform the following processing:

[0168] The sample semantic information and sample label information in the training data are input into a preset deep learning model to obtain the predicted data of the relevance between the sample semantic information and the sample label information;

[0169] The model parameters of the deep learning model are updated based on the predicted data and labeled data of the relevance between the sample semantic information and the sample label information.

[0170] Check whether the preset training termination condition is met; if yes, use the trained deep learning model as the relevance model; if no, continue to train the deep learning model using the next training data.
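
The training procedure above (predict, update, check the termination condition, move to the next training data) can be sketched with a one-feature logistic model in place of the preset deep learning model. Everything here is an assumption made for illustration: each training item is reduced to a single numeric feature `x` with an annotated relevance `y`, the loss is squared error, and the termination condition is either a step limit or a mean squared error below `tol`.

```python
import math


def predict(w: float, b: float, x: float) -> float:
    """Predicted relevance in (0, 1) for one numeric feature x."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def mean_squared_error(w, b, training_set):
    return sum((predict(w, b, x) - y) ** 2 for x, y in training_set) / len(training_set)


def train_relevance_model(training_set, lr=0.5, max_steps=5000, tol=1e-4):
    """training_set: list of (feature, annotated_relevance) pairs."""
    w, b = 0.0, 0.0
    for step in range(max_steps):
        # Take the next training item, cycling through the set.
        x, y = training_set[step % len(training_set)]
        pred = predict(w, b, x)
        # Gradient of the squared error with respect to the logit.
        grad = 2.0 * (pred - y) * pred * (1.0 - pred)
        w -= lr * grad * x
        b -= lr * grad
        # Assumed termination condition: the fit on the whole set is good enough.
        if mean_squared_error(w, b, training_set) < tol:
            break
    return w, b
```

The returned pair `(w, b)` plays the role of the trained relevance model; calling `predict(w, b, x)` on a new feature value yields its predicted relevance.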

[0171] Training a deep learning model using the training set corresponding to the relevance model allows for rapid modeling by learning from only a small number of samples. During continuous training, the training error of the deep learning model gradually decreases; the optimal weights are saved for later retrieval, and the accuracy on the training and validation sets is recorded for parameter tuning (adjusting model parameters). Updating the model parameters of the deep learning model enables the model to better fit the data, gives it effective generalization ability, and improves robustness and fitting accuracy.

[0172] In some alternative implementations, data mining can be performed on historical data to obtain sample data in the training set. That is, this sample data can be collected during real interactions between users and human service personnel (i.e., real service personnel). Alternatively, the sample data can be automatically generated using the generative network of a GAN model.

[0173] A GAN (Generative Adversarial Network) model consists of a generator network and a discriminator network. The generator network takes random samples from the latent space as input, and its output should mimic the real samples in the training set as closely as possible. The discriminator network takes either real samples or the generator network's output as input and aims to distinguish the generator network's output from real samples as reliably as possible. The generator network, in turn, tries to deceive the discriminator network. The two networks compete against each other and constantly adjust their parameters, with the ultimate goal that the discriminator network can no longer determine whether the generator network's output is genuine. GAN models can generate a large amount of sample data for training the aforementioned relevance model, effectively reducing the amount of raw data that must be collected and significantly lowering the cost of data acquisition and annotation.
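The competition between the two networks is commonly written as the minimax objective of the standard GAN formulation. The symbols below do not appear in this application and are introduced only for illustration: G denotes the generator network, D the discriminator network, p_data the distribution of real samples, and p_z the latent-space distribution.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator increases V by assigning a high D(x) to real samples and a low D(G(z)) to generated samples, while the generator decreases V by producing samples for which D(G(z)) is close to 1.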

[0174] The embodiments of this application do not limit the training process of the relevance model. For example, it can adopt a supervised learning training method, a semi-supervised learning training method, or an unsupervised learning training method.

[0175] When using supervised or semi-supervised learning training methods, this application embodiment does not limit the method of obtaining labeled data. For example, manual labeling, automatic labeling, or semi-automatic labeling can be used. When sample data is collected during real-world interactions, real data can be obtained from historical data as labeled data through keyword extraction.

[0176] The embodiments of this application do not limit the training termination condition used when training the relevance model. For example, the condition may be that the number of training iterations reaches a preset number (for example, 1, 3, 10, 100, 1000, or 10000), that every piece of training data in the training set has been used once or multiple times, or that the total loss value obtained in the current training iteration is not greater than a preset loss value.
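The three example termination conditions in [0176] could be checked by a small helper such as the one below. This is a sketch only; the class name `TerminationConfig`, its field names, and the choice to treat any single satisfied condition as sufficient are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TerminationConfig:
    max_updates: Optional[int] = None        # preset number of training iterations
    max_epochs: Optional[int] = None         # number of full passes over the training set
    loss_threshold: Optional[float] = None   # preset loss value


def should_stop(cfg, updates, epochs, total_loss):
    """Return True as soon as any configured termination condition is met."""
    if cfg.max_updates is not None and updates >= cfg.max_updates:
        return True
    if cfg.max_epochs is not None and epochs >= cfg.max_epochs:
        return True
    # "Not greater than the preset loss value" is a less-than-or-equal comparison.
    if cfg.loss_threshold is not None and total_loss <= cfg.loss_threshold:
        return True
    return False
```

Conditions left as `None` are simply not checked, so a caller can enable one condition or several at once.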

[0177] Therefore, training with a deep learning model allows for continuous optimization of the relevance model's parameters based on a large amount of sample data, thereby improving the model's accuracy and precision. This enables more accurate matching of user questions and answer videos in practical applications. This technical solution automatically updates model parameters through labeled data, reducing manual intervention costs and ensuring the robustness of the deep learning model. Simultaneously, it continuously refines the training set, further enhancing the accuracy of the relevance model. By utilizing deep learning models to process large amounts of data, it improves data processing capabilities and efficiency, and can handle rapid growth and changes in data scale, thus adapting to data processing needs in different scenarios. This technical solution improves the accuracy and timeliness of matching user questions and answers, effectively enhancing the interactive experience and increasing user satisfaction.

[0178] In summary, by training the relevance model as a deep learning model, this technical solution calculates the relevance between semantic information and tag information more accurately. This improves the accuracy and classification ability of the relevance model, reduces the cost of manual intervention, and enhances data processing capabilities and the interactive experience, thus providing users with higher-quality services.

[0179] See also Figure 2. In some embodiments, the method further includes:

[0180] When there is no answer video matching the question information in the question-and-answer database, the first terminal device is used to play a second interactive video and display a form button. In the second interactive video, the virtual object guides the user to click the form button and fill in the form.

[0181] When a click operation is received on the form button, the form is displayed using the first terminal device. The form is provided with one or more of the following: radio buttons, check boxes, text input boxes, drop-down list boxes, file upload controls, and text fields.

[0182] Based on the form completion results, the user's feedback information is obtained. The feedback information is used to indicate one or more of the user's experience and suggestions for using the virtual object explanation function, and suggestions for adjusting the explanation content and order of the explanation video.

[0183] Therefore, when no matching video answer exists in the question-and-answer database, guiding users to fill out a form enhances user engagement, increases interactivity, and improves user satisfaction. Collecting user feedback through the form allows for understanding user experiences and suggestions regarding the virtual object explanation function, as well as adjustments to the content and order of the explanation videos. This feedback helps further optimize the explanation function and improve user satisfaction. Collecting user feedback also aids in data analysis and learning, optimizing the algorithm model and further enhancing the system's intelligence. Guiding users to fill out the form through a second interactive video provides a convenient way to obtain user feedback, enabling rapid service feedback and continuous improvement, and providing services that better meet user needs. In summary, this technical solution, when the highest relevance does not meet the preset conditions, improves user engagement and further enhances service quality and user satisfaction by guiding users to fill out a form and obtain feedback.

[0184] The following is an example of the interaction process between a user and the virtual museum guide "Xiao A".

[0185] Users use their smartphones or computers (i.e., the first terminal device) to log in to a certain online museum browsing website and access the virtual museum guide "Xiao A".

[0186] Users can learn about the history, culture, and stories behind museum exhibits by watching videos or engaging in interactive experiences. For example, if a user asks a technical question about an exhibit but there is no corresponding explanatory video to meet their needs, "Xiao A" will guide the user to click the form button.

[0187] When a user clicks the form button, the first terminal device displays a form with radio buttons, checkboxes, text input boxes, drop-down list boxes, file upload controls, and text fields. For example, in the scenario above, the form contains questions such as "How satisfied are you with the explanation of the current exhibit?", "How satisfied are you with the overall service experience of the museum?", and "What questions or suggestions do you have about this exhibit?".

[0188] Users fill out a form and submit it. The target server then obtains user feedback based on the form's completion and uses it to improve the presentation experience and content. For example, if a user indicates that the explanation of an exhibit is not clear enough, the virtual object can adjust the content and order of the presentation video accordingly to enhance the presentation and provide a better service.
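As an illustration of how the form in this example might be represented and turned into feedback information, consider the sketch below. The field names, the control-type identifiers, and the answer options are hypothetical; only the three question texts and the list of control kinds come from the description above.

```python
from dataclasses import dataclass, field
from typing import List

# Control kinds named in the description; the identifiers themselves are assumptions.
CONTROL_TYPES = {"radio", "checkbox", "text_input", "dropdown", "file_upload", "text_field"}


@dataclass
class FormField:
    name: str
    control: str
    prompt: str
    options: List[str] = field(default_factory=list)

    def __post_init__(self):
        if self.control not in CONTROL_TYPES:
            raise ValueError(f"unsupported control: {self.control}")


LEVELS = ["satisfied", "neutral", "dissatisfied"]  # assumed answer options

MUSEUM_FORM = [
    FormField("exhibit_satisfaction", "radio",
              "How satisfied are you with the explanation of the current exhibit?", LEVELS),
    FormField("overall_satisfaction", "radio",
              "How satisfied are you with the overall service experience of the museum?", LEVELS),
    FormField("suggestions", "text_field",
              "What questions or suggestions do you have about this exhibit?"),
]


def collect_feedback(fields, submission):
    """Build feedback information from a submitted form, skipping unanswered fields."""
    feedback = {}
    for f in fields:
        value = submission.get(f.name)
        if value is None or value == "" or value == []:
            continue
        # Single-choice controls must carry one of the offered options.
        if f.control in ("radio", "dropdown") and value not in f.options:
            raise ValueError(f"invalid option for {f.name}: {value!r}")
        feedback[f.name] = value
    return feedback
```

The resulting dictionary is what the target server would treat as the user's feedback information in this sketch.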

[0189] By following the steps above, user feedback can be collected to improve the virtual museum guide "Xiao A's" explanation experience and content, so as to provide users with richer, more interesting and personalized museum exhibit display and explanation services.

[0190] In some embodiments, the method further includes:

[0191] Based on the feedback information, it is determined whether the user is satisfied with the explanation function of the virtual object;

[0192] When the user is not satisfied with the explanation function of the virtual object, a prompt message is generated and sent to the second terminal device of the configuration personnel to prompt the configuration personnel to adjust the explanation video and / or the explanation order of the virtual object.

[0193] Therefore, this technical solution, by detecting user feedback, can promptly determine whether users are satisfied with the virtual object explanation function. If the user is dissatisfied, a prompt message will be generated and sent to the configuration personnel's device to guide them in adjusting the virtual object explanation video and / or the explanation order. This effectively optimizes the explanation function, improves user satisfaction, and also strengthens the self-improvement capability of the question-and-answer database. In summary, detecting user satisfaction with the virtual object explanation function can improve real-time service, enhance user satisfaction, and improve service quality. Furthermore, by combining a second interactive video and form feedback mechanism, this technical solution effectively improves the intelligence level and service quality of the explanation function, while also providing users with a more personalized and superior experience.

[0194] The following is an example of the interaction process between a user and the virtual library guide "Xiao A".

[0195] A user uses a smartphone or computer (i.e., the first terminal device) to log in to an online library website and access the virtual library guide "Xiao A". The user's feedback is a text message: "This explanation is too abstract and not easy to understand."

[0196] Based on the feedback, it was detected that the user was dissatisfied with the explanation function. The target server generated a prompt message and sent it to the configuration personnel's second terminal device. The prompt message might read: "The user is not satisfied with the level of abstraction in the explanation video. Please adjust the video content and / or the order of explanations accordingly."

[0197] After receiving the prompt, the configuration personnel adjust the explanation videos and their order to meet the user's needs. For example, they might re-record a more detailed and easier-to-understand explanation video and place it at the top of the explanation video list to improve user satisfaction.

[0198] By following the steps above, the virtual library guide "Xiao A's" explanation videos and explanation order can be continuously improved by detecting user satisfaction and sending prompts to the configuration personnel, so as to provide richer, more interesting and personalized book display and explanation services.

[0199] In a specific application scenario, this application embodiment also provides a method for explaining virtual objects, the method including:

[0200] Receive an access request from a first terminal device, establish a communication connection between the first terminal device and a target server, the target server being used to provide virtual object explanation functions;

[0201] The first terminal device is used to display a list of explanatory videos for virtual objects. The list of explanatory videos corresponds to multiple explanatory videos, and each explanatory video corresponds to different explanatory content.

[0202] When a selection operation for one of the explanatory videos is received, the selected explanatory video is played using the first terminal device;

[0203] When the last explanation video in the explanation video list finishes playing, the first terminal device plays the first interactive video and displays a question button. In the first interactive video, the virtual object prompts the user that the explanation has ended and guides the user to click the question button and ask a question.

[0204] When a click operation is received on the question button, the first terminal device displays a question interface, which includes one or more of the following: a question input box, a voice capture button, and an image upload control.

[0205] The question interface is used to receive the user's question information, which may be text information, voice information, or image information.

[0206] Using the semantic extraction model corresponding to the question information, semantic information is extracted from the question information;

[0207] For each of the answer videos, the following processing is performed: obtain the tag information corresponding to the answer video; input the semantic information and the tag information into a relevance model to obtain the relevance between the semantic information and the tag information, which is used as the relevance between the semantic information and the answer video;

[0208] When the maximum relevance is greater than the preset relevance, it is confirmed that there is an answer video in the question-and-answer database that matches the question information. The question-and-answer database stores multiple answer videos. When there is an answer video in the question-and-answer database that matches the question information, the matching answer video is recorded as the first answer video. The first answer video is played using the first terminal device. In the first answer video, the virtual object explains the explanation content corresponding to the question information.

[0209] When the maximum relevance value is not greater than the preset relevance value, it is confirmed that there is no answer video matching the question information in the question-and-answer database; when there is no answer video matching the question information in the question-and-answer database, the first terminal device is used to play the second interactive video and display a form button. In the second interactive video, the virtual object guides the user to click the form button and fill in the form.

[0210] When a click operation is received on the form button, the form is displayed using the first terminal device. The form is provided with one or more of the following: radio buttons, check boxes, text input boxes, drop-down list boxes, file upload controls, and text fields.

[0211] Based on the form completion results, obtain the user's feedback information. The feedback information is used to indicate one or more of the user's experience and suggestions for using the virtual object explanation function, and suggestions for adjusting the explanation content and explanation order of the explanation video.

[0212] Based on the feedback information, it is determined whether the user is satisfied with the explanation function of the virtual object;

[0213] When the user is not satisfied with the explanation function of the virtual object, a prompt message is generated and sent to the second terminal device of the configuration personnel to prompt the configuration personnel to adjust the explanation video and / or the explanation order of the virtual object;

[0214] When the user is satisfied with the explanation function of the virtual object, no operation is performed.

[0215] The training process of the relevance model includes:

[0216] Obtain a training set that includes multiple pieces of training data, each of which includes sample semantic information, sample label information, and annotation data on the relevance between the sample semantic information and the sample label information;

[0217] For each of the training data, perform the following processing:

[0218] The sample semantic information and sample label information in the training data are input into a preset deep learning model to obtain the predicted data of the relevance between the sample semantic information and the sample label information;

[0219] The model parameters of the deep learning model are updated based on the predicted data and labeled data of the relevance between the sample semantic information and the sample label information.

[0220] Check whether the preset training termination condition is met; if yes, use the trained deep learning model as the relevance model; if no, continue to train the deep learning model using the next training data.
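The control flow of steps [0200] to [0214] can be summarized as a mapping from events to actions. The sketch below is an assumption-laden illustration: the event strings, the `Action` enum, and the `next_action` function are hypothetical names, and the behavior when a video other than the last one finishes is not specified in the description, so it is modeled here as no operation.

```python
from enum import Enum, auto


class Action(Enum):
    PLAY_SELECTED_EXPLANATION_VIDEO = auto()
    PLAY_FIRST_INTERACTIVE_VIDEO_AND_SHOW_QUESTION_BUTTON = auto()
    SHOW_QUESTION_INTERFACE = auto()
    PLAY_FIRST_ANSWER_VIDEO = auto()
    PLAY_SECOND_INTERACTIVE_VIDEO_AND_SHOW_FORM_BUTTON = auto()
    SHOW_FORM = auto()
    SEND_PROMPT_TO_CONFIGURATION_PERSONNEL = auto()
    NO_OPERATION = auto()


def next_action(event, *, is_last_video=False, max_relevance=None,
                preset_relevance=None, satisfied=None):
    """Choose what the target server does in response to an event from the first terminal device."""
    if event == "video_selected":
        return Action.PLAY_SELECTED_EXPLANATION_VIDEO
    if event == "video_finished":
        # Assumption: nothing extra happens until the last video in the list ends.
        if is_last_video:
            return Action.PLAY_FIRST_INTERACTIVE_VIDEO_AND_SHOW_QUESTION_BUTTON
        return Action.NO_OPERATION
    if event == "question_button_clicked":
        return Action.SHOW_QUESTION_INTERFACE
    if event == "question_submitted":
        if max_relevance is None or preset_relevance is None:
            raise ValueError("relevance values are required")
        # A match exists only when the maximum relevance exceeds the preset relevance.
        if max_relevance > preset_relevance:
            return Action.PLAY_FIRST_ANSWER_VIDEO
        return Action.PLAY_SECOND_INTERACTIVE_VIDEO_AND_SHOW_FORM_BUTTON
    if event == "form_button_clicked":
        return Action.SHOW_FORM
    if event == "form_submitted":
        if satisfied is None:
            raise ValueError("satisfaction result is required")
        if satisfied:
            return Action.NO_OPERATION
        return Action.SEND_PROMPT_TO_CONFIGURATION_PERSONNEL
    raise ValueError(f"unknown event: {event}")
```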

[0221] For example, an online education platform uses a virtual character named Xiaoming to provide a virtual explanation function. Xiaoming can introduce users to some basic computer knowledge, such as what a CPU, memory, and hard drive are. Xiaoming's explanation video list includes the following videos:

[0222] Explanation video 1: What is a CPU?

[0223] Explanation video 2: What is memory?

[0224] Explanation video 3: What is a hard drive?

[0225] Explanation video 4: What is an operating system?

[0226] When a user accesses the online education platform using a mobile phone, the target server receives the access request from the mobile phone and establishes a communication connection between the mobile phone and the target server. The target server is used to provide Xiaoming's explanation function.

[0227] The mobile phone displays Xiaoming's explanation video list, so the user can see the titles and thumbnails of the four explanation videos on the phone screen;

[0228] When the user clicks explanation video 1 on the phone screen, explanation video 1 is played on the phone. In explanation video 1, Xiaoming uses voice and animation to introduce what a CPU is, as well as the functions and classification of CPUs.

[0229] After the user watches the explanation videos 1, 2, 3, and 4 in sequence, the first interactive video is played on the user's mobile phone, and a question button is displayed. In the first interactive video, Xiaoming uses voice and animation to prompt the user that the explanation has ended and guides the user to click the question button and ask a question.

[0230] When a user clicks the "Ask a Question" button on their phone screen, the phone displays the question interface, which includes a question input box, a voice capture button, and an image upload control.

[0231] Users can enter text information in the question input box, record voice information using the voice capture button, or upload image information using the image upload control as their question information. For example, a user can enter the text information "What is a GPU?";

[0232] Using a semantic extraction model corresponding to the question information, semantic information is extracted from the question information. For example, from "What is a GPU?", the keywords "GPU" and "definition" are extracted as semantic information;

[0233] For each answer video, the following processing is performed: Obtain the tag information corresponding to the answer video; input the semantic information and tag information into a relevance model to obtain the relevance between the semantic information and the tag information, which is then used as the relevance between the semantic information and the answer video. For example, suppose the question-and-answer database contains the following three answer videos:

[0234] Answer video 1: What is a GPU? (Tags: GPU, definition)

[0235] Answer video 2: What are the differences between a CPU and a GPU? (Tags: CPU, GPU, comparison)

[0236] Answer video 3: How to choose the right GPU? (Tags: GPU, selection)

[0237] By inputting "GPU" and "definition" along with the label information for each answer video into the relevance model, we can obtain the following results:

[0238] The relevance of the semantic information to answer video 1: 0.9;

[0239] The relevance of the semantic information to answer video 2: 0.6;

[0240] The relevance of the semantic information to answer video 3: 0.4.

[0241] When the maximum relevance score is greater than the preset relevance score, it is confirmed that there is an answer video in the question-and-answer database that matches the question information. Assuming the preset relevance score is 0.8, then it can be seen that answer video 1 matches the question information.

[0242] When an answer video that matches the question information exists in the question-and-answer database, the matching answer video is recorded as the first answer video. The first answer video is played on the user's mobile phone. In the first answer video, Xiaoming uses voice and animation to introduce what a GPU is, as well as the functions and classification of GPUs.

[0243] When the maximum relevance score is not greater than the preset relevance score, it is confirmed that there is no answer video in the question-and-answer database that matches the question information. For example, if a user asks "What is AI?" as text information, after semantic extraction and relevance calculation, it can be found that there is no answer video in the question-and-answer database that matches this question.
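The matching rule in this example can be sketched as follows, using the relevance scores (0.9, 0.6, 0.4) and the preset relevance (0.8) given above. The function name `match_answer_video` and the dictionary layout are assumptions made for illustration.

```python
def match_answer_video(relevance_by_video, preset_relevance):
    """Return the most relevant answer video if its relevance exceeds the preset value, else None."""
    if not relevance_by_video:
        return None
    best_video = max(relevance_by_video, key=relevance_by_video.get)
    # "Not greater than" the preset relevance means no match, so equality is rejected too.
    if relevance_by_video[best_video] > preset_relevance:
        return best_video
    return None


scores = {"answer video 1": 0.9, "answer video 2": 0.6, "answer video 3": 0.4}
first_answer_video = match_answer_video(scores, preset_relevance=0.8)
```

A return value of `None` corresponds to the case in which no answer video matches and the second interactive video is played instead.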

[0244] When there is no matching answer video in the question-and-answer database, a second interactive video is played on the user's phone, and a form button is displayed. In the second interactive video, Xiaoming uses voice and animation to guide the user to click the form button and fill out the form.

[0245] When a user clicks the form button on their phone screen, the form is displayed on the user's phone. The form is provided with one or more of the following: radio buttons, check boxes, text input boxes, drop-down list boxes, file upload controls, and text fields.

[0246] Based on the form submission, the user's feedback information is obtained. The feedback information indicates one or more of the following: the user's experience of and suggestions for Xiaoming's explanation function, and suggestions for adjusting the content and order of the explanation videos. For example, a user could enter "I hope you can add some explanations about AI" in the text field as feedback.

[0247] Based on the feedback, determine whether the user is satisfied with Xiaoming's explanation function. For example, if the feedback contains words such as "dissatisfied," "doesn't understand," or "not clear," then it can be determined that the user is dissatisfied with Xiaoming's explanation function.

[0248] When a user is dissatisfied with Xiaoming's explanation function, a prompt message is generated and sent to the configuration personnel's second terminal device to prompt them to adjust Xiaoming's explanation videos and / or the order of explanations. For example, the prompt message could be "A user has reported a desire to add explanation content about AI. Please check and update the explanation video list."
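The keyword-based satisfaction check and the prompt message described above could look like the sketch below. It uses the three example keywords from the description; the substring matching, the message layout, and the names `is_satisfied` and `build_prompt_message` are assumptions, and a real system might use a more robust classifier.

```python
# Example keywords taken from the description; the matching strategy is an assumption.
DISSATISFACTION_KEYWORDS = ("dissatisfied", "doesn't understand", "not clear")


def is_satisfied(feedback_text):
    """Treat the user as dissatisfied if any keyword occurs in the feedback text."""
    text = feedback_text.lower()
    return not any(keyword in text for keyword in DISSATISFACTION_KEYWORDS)


def build_prompt_message(feedback_text):
    """Return a prompt message for the configuration personnel, or None if no action is needed."""
    if is_satisfied(feedback_text):
        return None
    return {
        "recipient": "second_terminal_device",
        "body": (
            f'User feedback indicates dissatisfaction: "{feedback_text}". '
            "Please adjust the explanation videos and/or the explanation order."
        ),
    }
```

Returning `None` for satisfied users mirrors the step in which no operation is performed.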

[0249] In another example, suppose a user is a legal professional who needs to understand certain regulations but is unfamiliar with them. They can resolve this by visiting a legal advice website or app and communicating with a legal advisor, "Xiao A."

[0250] Using a smartphone or computer (i.e., the first terminal device), the user sends an access request to "Xiao A". "Xiao A" then establishes a communication connection and transmits the user's request to the target server.

[0251] The user sees a list of explanation videos on the first terminal device. The list contains multiple videos, each covering different regulations. The user can select a video they wish to learn about and begin playback. After the video finishes playing, "Xiao A" automatically plays the first interactive video and displays a question button. In the first interactive video, "Xiao A" notifies the user that the explanation is complete and then guides the user to click the question button to ask a question.

[0252] When a user clicks the "Ask a Question" button, the first terminal device will display a question interface, which includes a question input box, a voice capture button, and an image upload control. The user can enter their question and submit it, and the target server will receive the user's question information.

[0253] When a user enters "I want to know how to go through divorce proceedings," the target server will predict the corresponding tag "divorce proceedings" based on this semantic information, and also predict the relevance of this semantic information to the labeled data. Based on the prediction results, the target server will provide relevant legal knowledge and advice to the user to help them better understand related legal issues.

[0254] When a user asks "Xiao A" a question, such as "Do I need to sign a contract to protect my rights?", the target server answers the question using data from its existing legal knowledge base and a trained deep learning model. If it cannot answer the question, a second interactive video is played to guide the user in filling out a form, and the user's feedback is obtained based on the form submission to further improve "Xiao A's" explanation capabilities.

[0255] If the user is not satisfied with "Xiao A's" explanation function, a prompt message will be generated and sent to the configuration personnel's device so that they can adjust the explanation videos and / or the explanation order of the virtual legal assistant.

[0256] In this embodiment, users can access the virtual legal assistant anytime, anywhere via smartphone or computer to obtain the necessary legal knowledge and advice without needing to schedule face-to-face consultations, saving time and effort. A deep learning model provides personalized legal knowledge and advice based on user questions, helping users better understand relevant legal issues. Real-time interaction guides users to ask questions and provides answers, improving user satisfaction. The virtual legal assistant can answer user questions using data from an existing legal knowledge base and a trained deep learning model, covering knowledge and advice from multiple legal fields. User feedback is obtained through forms to further improve the virtual legal assistant's explanation functions and increase user satisfaction. If users are not satisfied with the virtual legal assistant's explanation functions, the configuration personnel can adjust the explanation videos and / or the explanation order based on user feedback to ensure the continuous updating of the virtual legal assistant's knowledge base and service quality.

[0257] (Electronic devices)

[0258] This application also provides an electronic device, the specific embodiments of which are consistent with the embodiments and technical effects achieved in the above method embodiments, and some contents will not be repeated.

[0259] The electronic device includes a memory and at least one processor, the memory storing a computer program, and the at least one processor being configured to execute the computer program to perform the following steps:

[0260] Receive an access request from a first terminal device, establish a communication connection between the first terminal device and a target server, the target server being used to provide virtual object explanation functions;

[0261] The first terminal device is used to display a list of explanatory videos for virtual objects. The list of explanatory videos corresponds to multiple explanatory videos, and each explanatory video corresponds to different explanatory content.

[0262] When a selection operation for one of the explanatory videos is received, the selected explanatory video is played using the first terminal device;

[0263] When the last explanation video in the explanation video list finishes playing, the first terminal device plays the first interactive video and displays a question button. In the first interactive video, the virtual object guides the user to click the question button and ask a question.

[0264] In some embodiments, before playing the first interactive video using the first terminal device, the at least one processor, when configured to execute the computer program, further performs the following steps:

[0265] The first terminal device is used to remind the user, in the form of a floating layer or a pop-up window, that the explanation is complete.

[0266] In some embodiments, during the first interactive video, the virtual object prompts the user that the explanation has ended and guides the user to click the question button and ask a question.

[0267] In some embodiments, when the at least one processor is configured to execute the computer program, it further performs the following steps:

[0268] When a click operation is received on the question button, the first terminal device displays a question interface, which includes one or more of the following: a question input box, a voice capture button, and an image upload control.

[0269] The question interface is used to receive the user's question information, which may be text information, voice information, or image information.

[0270] The system detects whether there is an answer video in the question-and-answer database corresponding to the virtual object that matches the question information. The question-and-answer database stores multiple answer videos.

[0271] When an answer video that matches the question information exists in the question-and-answer database, the matching answer video is recorded as the first answer video.

[0272] The first terminal device is used to play the first answer video, in which the virtual object explains the content corresponding to the question information.

[0273] In some embodiments, the at least one processor is configured to, when executing the computer program, detect whether an answer video matching the question information exists in the question-and-answer database corresponding to the virtual object in the following manner:

[0274] Using the semantic extraction model corresponding to the question information, semantic information is extracted from the question information;

[0275] The relevance of the semantic information to each answer video in the question-and-answer database is obtained respectively.

[0276] When the maximum relevance value is greater than the preset relevance value, it is confirmed that there is an answer video in the question-and-answer database that matches the question information;

[0277] When the maximum relevance value is not greater than the preset relevance value, it is confirmed that there is no answer video in the question-and-answer database that matches the question information.

[0278] In some embodiments, the at least one processor is configured to, when executing the computer program, acquire the relevance of the semantic information to each answer video in the question-answering database in the following manner:

[0279] For each of the aforementioned answer videos, the following processing is performed:

[0280] Obtain the tag information corresponding to the answer video;

[0281] The semantic information and the tag information are input into a relevance model to obtain the relevance between the semantic information and the tag information, which is used as the relevance between the semantic information and the answer video.

[0282] In some embodiments, the training process of the relevance model includes:

[0283] Obtain a training set that includes multiple pieces of training data, each of which includes sample semantic information, sample label information, and annotation data on the relevance between the sample semantic information and the sample label information;

[0284] For each of the training data, perform the following processing:

[0285] The sample semantic information and sample label information in the training data are input into a preset deep learning model to obtain the predicted data of the relevance between the sample semantic information and the sample label information;

[0286] The model parameters of the deep learning model are updated based on the predicted data and labeled data of the relevance between the sample semantic information and the sample label information.

[0287] Check whether the preset training termination condition is met; if yes, use the trained deep learning model as the relevance model; if no, continue to train the deep learning model using the next training data.
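The training procedure above (predict, update, check the termination condition, move to the next training data) can be illustrated with a deliberately small stand-in. The sketch assumes a two-parameter logistic model over a token-overlap feature instead of a real deep learning model, and assumes a step budget (`max_steps`) as the preset termination condition; neither choice comes from this application.

```python
import itertools
import math
from typing import List, Tuple

# (sample semantic information, sample label information, annotated relevance in [0, 1])
Sample = Tuple[str, str, float]


def overlap(a: str, b: str) -> float:
    """Hypothetical input feature: token overlap between the two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa and sb else 0.0


def predict_relevance(params: Tuple[float, float], semantic: str, label: str) -> float:
    """Predicted relevance from the stand-in model: sigmoid(w * overlap + b)."""
    w, b = params
    return 1.0 / (1.0 + math.exp(-(w * overlap(semantic, label) + b)))


def train_relevance_model(training_set: List[Sample],
                          lr: float = 0.5,
                          max_steps: int = 2000) -> Tuple[float, float]:
    """Update the parameters one training item at a time until the step budget is used."""
    w, b = 0.0, 0.0
    for step, (semantic, label, target) in enumerate(itertools.cycle(training_set), start=1):
        x = overlap(semantic, label)
        predicted = predict_relevance((w, b), semantic, label)
        # Gradient of the cross-entropy loss with respect to the pre-sigmoid value.
        grad = predicted - target
        w -= lr * grad * x
        b -= lr * grad
        # Assumed termination condition: stop after max_steps parameter updates.
        if step >= max_steps:
            break
    return w, b
```

After training on one matching pair annotated 1.0 and one non-matching pair annotated 0.0, the stand-in model scores the matching pair above 0.5 and the non-matching pair below it.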

[0288] In some embodiments, when the at least one processor is configured to execute the computer program, it further performs the following steps:

[0289] When there is no answer video matching the question information in the question-and-answer database, the first terminal device is used to play a second interactive video and display a form button. In the second interactive video, the virtual object guides the user to click the form button and fill in the form.

[0290] When a click operation is received on the form button, the form is displayed using the first terminal device. The form is provided with one or more of the following: radio buttons, check boxes, text input boxes, drop-down list boxes, file upload controls, and text fields.

[0291] Based on the form completion results, the user's feedback information is obtained. The feedback information is used to indicate one or more of the user's experience and suggestions for using the virtual object explanation function, and suggestions for adjusting the explanation content and order of the explanation video.

[0292] In some embodiments, when the at least one processor is configured to execute the computer program, it further performs the following steps:

[0293] Based on the feedback information, it is determined whether the user is satisfied with the explanation function of the virtual object;

[0294] When the user is not satisfied with the explanation function of the virtual object, a prompt message is generated and sent to the second terminal device of the configuration personnel to prompt the configuration personnel to adjust the explanation video and / or the explanation order of the virtual object.
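The satisfaction check and the prompt to the configuration personnel could be sketched as below. The application does not state how satisfaction is derived from the feedback information, so the `Feedback` fields, the 1-to-5 `rating`, the `min_rating` threshold, and the `send_to_configurator` callback are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Feedback:
    rating: int           # hypothetical 1-5 score, e.g. from a radio-button group
    suggestion: str = ""  # hypothetical free text, e.g. from a text field


def is_satisfied(feedback: Feedback, min_rating: int = 4) -> bool:
    """Assumed rule: the user is satisfied when the rating reaches min_rating."""
    return feedback.rating >= min_rating


def handle_feedback(feedback: Feedback,
                    send_to_configurator: Callable[[str], None],
                    min_rating: int = 4) -> Optional[str]:
    """Send a prompt message when the user is not satisfied; return the message or None."""
    if is_satisfied(feedback, min_rating):
        return None
    message = (f"User rated the explanation function {feedback.rating}/5. "
               "Consider adjusting the explanation videos and/or their order.")
    if feedback.suggestion:
        message += f" Suggestion: {feedback.suggestion}"
    # The callback stands in for delivery to the second terminal device.
    send_to_configurator(message)
    return message
```

In a test, `send_to_configurator` can simply be the `append` method of a list, which makes it easy to check that a message is produced only for dissatisfied users.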

[0295] See Figure 3, which is a structural block diagram of an electronic device 10 provided in an embodiment of this application.

[0296] Electronic device 10 may include, for example, at least one memory 11, at least one processor 12, and a bus 13 connecting different platform systems.

[0297] The memory 11 may include a (computer) readable medium in the form of volatile memory, such as random access memory (RAM) 111 and / or cache memory 112, and may further include read-only memory (ROM) 113.

[0298] The memory 11 also stores a computer program, which can be executed by the processor 12 to enable the processor 12 to implement the steps of any of the above methods.

[0299] The memory 11 may also include a utility 114 having at least one program module 115. Such program modules 115 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.

[0300] Accordingly, processor 12 can execute the aforementioned computer program, and can also execute utility 114.

[0301] The processor 12 may employ one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), or other electronic components.

[0302] Bus 13 can represent one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor bus, or a local bus using any of a variety of bus structures.

[0303] Electronic device 10 can also communicate with one or more external devices, such as keyboards, pointing devices, Bluetooth devices, etc., and with one or more devices capable of interacting with it, and / or with any device that enables it to communicate with one or more other computing devices (e.g., routers, modems, etc.). This communication can be performed through input / output interface 14. Furthermore, electronic device 10 can communicate with one or more networks (e.g., local area networks (LANs), wide area networks (WANs), and / or public networks, such as the Internet) via network adapter 15. Network adapter 15 can communicate with other modules of electronic device 10 via bus 13. It should be understood that, although not shown in the figures, in practical applications, other hardware and / or software modules can be used in conjunction with electronic device 10, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage platforms.

[0304] (Computer-readable storage medium)

[0305] This application also provides a computer-readable storage medium, the specific embodiments of which are consistent with the embodiments and technical effects achieved in the above method embodiments, and some contents will not be repeated.

[0306] The computer-readable storage medium stores a computer program that, when executed by at least one processor, implements the steps of any of the above methods or the functions of any of the above electronic devices.

[0307] A computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium. In embodiments of this application, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. Computer-readable storage media can be, for example, but not limited to, electrical, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any combination thereof. More specific examples of computer-readable storage media (a non-exhaustive list) include: electrical connections having one or more wires, portable disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.

[0308] Computer-readable signal media may include data signals propagated in baseband or as part of a carrier wave, carrying readable program code. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium capable of sending, propagating, or transmitting a program for use by or in conjunction with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted using any suitable medium, including but not limited to wireless, wired, optical fiber, RF, or any suitable combination thereof. Program code for performing operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as C or similar programming languages. The program code may be executed entirely on a user computing device, partially on a user device, as a standalone software package, partially on a user computing device and partially on a remote computing device, or entirely on a remote computing device or server. In cases involving remote computing devices, the remote computing devices can be connected to user computing devices via any type of network, including local area networks (LANs) or wide area networks (WANs), or they can be connected to external computing devices (e.g., via the Internet using an Internet service provider).

[0309] (Computer program products)

[0310] This application also provides a computer program product, the specific embodiments of which are consistent with the embodiments and technical effects achieved in the above method embodiments, and some contents will not be repeated.

[0311] The computer program product includes a computer program that, when executed by at least one processor, implements the steps of any of the above methods or the functions of any of the above electronic devices.

[0312] See Figure 4, which is a schematic diagram of the structure of a computer program product provided in an embodiment of this application.

[0313] The computer program product is used to implement the steps of any of the above methods or to implement the functions of any of the above electronic devices. The computer program product may employ a portable compact disc read-only memory (CD-ROM) and include program code, and may run on a terminal device, such as a personal computer. However, the computer program product of the present invention is not limited thereto, and the computer program product may employ any combination of one or more computer-readable media.

[0314] This application describes the invention from the perspectives of purpose, performance, progress, and novelty, and it meets the functional enhancement and use requirements emphasized by the Patent Law. The above description and drawings are merely preferred embodiments of this application and are not intended to limit this application. Therefore, all structures, devices, features, etc., that are similar to or identical to those of this application, i.e., all equivalent substitutions or modifications made in accordance with the scope of this patent application, shall fall within the scope of protection of this patent application.

Claims

1. A method for explaining virtual objects, characterized in that, The method includes: Receive an access request from a first terminal device, establish a communication connection between the first terminal device and a target server, the target server being used to provide virtual object explanation functions; The first terminal device is used to display a list of explanatory videos for virtual objects. The list of explanatory videos corresponds to multiple explanatory videos, and each explanatory video corresponds to different explanatory content. When a selection operation for one of the explanatory videos is received, the selected explanatory video is played using the first terminal device; When the last explanatory video in the list of explanatory videos finishes playing, the first terminal device plays the first interactive video and displays a question button. In the first interactive video, the virtual object guides the user to click the question button and ask a question. When a click operation is received on the question button, the first terminal device displays the question interface and receives the user's question information through the question interface. Detect whether there is an answer video in the question-and-answer database corresponding to the virtual object that matches the question information; When there is no answer video matching the question information in the question-and-answer database, the first terminal device is used to play a second interactive video and display a form button. In the second interactive video, the virtual object guides the user to click the form button and fill in the form. When a click operation is received on the form button, the form is displayed using the first terminal device. The form is provided with one or more of the following: radio buttons, check boxes, text input boxes, drop-down list boxes, file upload controls, and text fields. Based on the form completion results, the user's feedback information is obtained. The feedback information is used to indicate one or more of the user's experience and suggestions for using the virtual object explanation function, and suggestions for adjusting the explanatory content and order of the explanatory videos.

2. The virtual object explanation method according to claim 1, characterized in that, Before playing the first interactive video using the first terminal device, the method further includes: The first terminal device is used to notify the user, in the form of a floating layer or a pop-up window, that the explanation is complete.

3. The virtual object explanation method according to claim 1, characterized in that, In the first interactive video, the virtual object prompts the user that the explanation has ended and guides the user to click the question button and ask a question.

4. The virtual object explanation method according to claim 1, characterized in that, The method further includes: The question interface is provided with one or more of the following: a question input box, a voice capture button, and an image upload control; The question information can be text information, voice information, or image information; The question-and-answer database stores multiple answer videos; When an answer video that matches the question information exists in the question-and-answer database, the matching answer video is recorded as the first answer video. The first terminal device is used to play the first answer video, in which the virtual object explains the content corresponding to the question information.

5. The virtual object explanation method according to claim 4, characterized in that, The process of detecting whether there is an answer video matching the question information in the question-and-answer database corresponding to the virtual object includes: Using the semantic extraction model corresponding to the question information, semantic information is extracted from the question information; The relevance of the semantic information to each answer video in the question-and-answer database is obtained respectively. When the maximum relevance value is greater than the preset relevance value, it is confirmed that there is an answer video in the question-and-answer database that matches the question information; When the maximum relevance value is not greater than the preset relevance value, it is confirmed that there is no answer video in the question-and-answer database that matches the question information.

6. The virtual object explanation method according to claim 5, characterized in that, The step of obtaining the relevance between the semantic information and each answer video in the question-and-answer database includes: For each of the aforementioned answer videos, the following processing is performed: Obtain the tag information corresponding to the answer video; The semantic information and the tag information are input into a relevance model to obtain the relevance between the semantic information and the tag information, which is used as the relevance between the semantic information and the answer video.

7. The virtual object explanation method according to claim 6, characterized in that, The training process of the relevance model includes: Obtain a training set, which includes multiple training data sets, each of which includes a sample semantic information, a sample label information, and annotation data on the relevance between the sample semantic information and the sample label information; For each of the training data, perform the following processing: The sample semantic information and sample label information in the training data are input into a preset deep learning model to obtain the predicted data of the relevance between the sample semantic information and the sample label information; The model parameters of the deep learning model are updated based on the predicted data and labeled data of the relevance between the sample semantic information and the sample label information. Check whether the preset training termination condition is met; if yes, use the trained deep learning model as the relevance model; if no, continue to train the deep learning model using the next training data.

8. The virtual object explanation method according to claim 1, characterized in that, The method further includes: Based on the feedback information, it is determined whether the user is satisfied with the explanation function of the virtual object; When the user is not satisfied with the explanation function of the virtual object, a prompt message is generated and sent to the second terminal device of the configuration personnel to prompt the configuration personnel to adjust the explanation video and / or the explanation order of the virtual object.

9. An electronic device, characterized in that, The electronic device includes a memory and at least one processor, the memory storing a computer program, and the at least one processor being configured to execute the computer program to perform the following steps: Receive an access request from a first terminal device, establish a communication connection between the first terminal device and a target server, the target server being used to provide virtual object explanation functions; The first terminal device is used to display a list of explanatory videos for virtual objects. The list of explanatory videos corresponds to multiple explanatory videos, and each explanatory video corresponds to different explanatory content. When a selection operation for one of the explanatory videos is received, the selected explanatory video is played using the first terminal device; When the last explanatory video in the list of explanatory videos finishes playing, the first terminal device plays the first interactive video and displays a question button. In the first interactive video, the virtual object guides the user to click the question button and ask a question. When a click operation is received on the question button, the first terminal device displays the question interface and receives the user's question information through the question interface. Detect whether there is an answer video in the question-and-answer database corresponding to the virtual object that matches the question information; When there is no answer video matching the question information in the question-and-answer database, the first terminal device is used to play a second interactive video and display a form button. In the second interactive video, the virtual object guides the user to click the form button and fill in the form. When a click operation is received on the form button, the form is displayed using the first terminal device. The form is provided with one or more of the following: radio buttons, check boxes, text input boxes, drop-down list boxes, file upload controls, and text fields. Based on the form completion results, the user's feedback information is obtained. The feedback information is used to indicate one or more of the user's experience and suggestions for using the virtual object explanation function, and suggestions for adjusting the explanatory content and order of the explanatory videos.

10. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a computer program that, when executed by at least one processor, implements the steps of the method of any one of claims 1-8 or the function of the electronic device of claim 9.

11. A computer program product, characterized in that, The computer program product includes a computer program that, when executed by at least one processor, implements the steps of the method of any one of claims 1-8 or the function of the electronic device of claim 9.