Virtual human generation device to which large language model based on retrieval-augmented generation using open library is applied, and method applied thereto
The virtual human generation device addresses the cost and customization limitations of commercial systems by applying a retrieval-augmented generation (RAG)-based large-scale language model through an open library, which allows easy setup and efficient query processing for cost-effective and accurate virtual human services.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- KOREA UNIV OF TECH & EDUCATION IND UNIV COOPERATION FOUND
- Filing Date
- 2025-12-02
- Publication Date
- 2026-06-11
AI Technical Summary
Small businesses and organizations face challenges in using commercially provided virtual human services due to high continuous service fees and limited customization, making it difficult to utilize them effectively and efficiently.
A virtual human generation device utilizing a retrieval-augmented generation-based large-scale language model with an open library, enabling easy setup and customization, and reducing response times by optimizing query processing through vector storage and distributed computing.
Enables small businesses to establish and continuously use virtual human services at low cost, with customizable and accurate responses, and reduced response times compared to commercial systems.
Abstract
Description
Virtual human generation device to which a retrieval-augmented generation-based large-scale language model using an open library is applied, and method applied thereto
[0001] The present invention relates to a virtual human generation device to which a retrieval-augmented generation-based large-scale language model using an open library is applied, and a method applied thereto. More specifically, it relates to such a device and method that broaden the general applicability of virtual human services and reduce response time.
[0002] This invention results from research funded by the government (Ministry of Science and ICT) and supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) under the Regional Intelligence Innovation Talent Development Project (IITP-2024-2710008868).
[0003] Typically, a chatbot refers to a robot that converses with users, communicating with them through text and providing the information they want.
[0004] Chatbots are being used in various fields requiring communication because they can provide information to users 24 hours a day through conversation without requiring a human to be present at all times. In the past, these chatbots used a rule-based approach that responded according to fixed rules, which had limitations in providing the exact answers users wanted. However, as they have been improved based on artificial intelligence using Large Language Models (LLM), they are evolving to the point where more natural conversation is possible and the accuracy of the information conveyed is also improved.
[0005] Combining such large-scale language model-based chatbots with images yields a virtual human-style chatbot system that resembles a real person. Because this lets users communicate more naturally, as if talking with a real person, its use is increasing in various fields that require communication with users.
[0006] As the use of such virtual human systems increases, AI-related IT companies are competitively developing and producing virtual humans to provide to companies, institutions, or organizations that wish to utilize them for communication with users.
[0007] However, commercially provided virtual human services require continuous service fees. Furthermore, customization tailored to the specific needs of the adopting organization is offered only to a very limited extent or not at all, so small businesses and organizations have significant difficulty using these services continuously or in a state optimized for their intended purposes.
[0008] Therefore, it is necessary to devise a plan that enables small businesses or organizations to easily build virtual human systems at a low cost, and to achieve response speeds exceeding those of commercial virtual human systems provided by specialized IT companies.
[0009] Accordingly, the present invention was devised to solve the above-mentioned problems, and its objective is to provide a virtual human generation device to which a retrieval-augmented generation-based large-scale language model using an open library is applied, which broadens the general applicability of virtual human services and reduces response time, and a method applied thereto.
[0010] The purpose of the invention is not limited to the purposes mentioned above, and other unmentioned purposes will be clearly understood by those skilled in the art from the description below.
[0011] To achieve the above objective, a virtual human generation device according to the first aspect of the present invention, to which a retrieval-augmented generation-based large-scale language model using an open library is applied, comprises a memory and at least one processor. In processing a virtual human-based automatic response to a user's query, the processor extracts an open library query corresponding to the user's query from a vector storage that stores open library queries optimized for the open library of the large-scale language model, obtains a text answer to the user's query by applying the extracted open library query to the open library of the large-scale language model, converts the obtained text answer into a voice answer to be output by the virtual human, generates a virtual human image in which the virtual human delivers the converted voice answer, and processes the generated virtual human image as the output response to the query.
[0012] The open library queries may be classified through prior artificial intelligence training based on a retrieval-augmented generation model and then stored.
[0013] Converting the text answer into the voice answer, generating one or more virtual human images of the virtual human corresponding to the text answer, and generating the virtual human video by combining them may be processed through linkage with one or more virtual human-related open libraries.
[0014] The text answer may be divided according to predetermined criteria, and the generation of one or more virtual human images of the virtual human for each divided text answer may be processed in divided fashion by a predetermined division processing model.
[0015] The text answer may be divided based on a combination of minimum semantic units and character counts, or based on the number of keywords or the estimated response time.
[0016] The division processing model generates an estimated processing time based on traffic transmitted and received in real time while linked with the virtual human-related open library and, based on that estimate, classifies processing into an integrated processing process, a flexible processing process, or an individual processing process for each divided text answer.
[0017] The division processing for generating virtual human images for each divided text answer may be distributed by predetermined distributed computing.
[0018] The distributed computing may be executed across two or more client computing resources separated physically or in software, or across two or more server computing resources separated physically or in software.
[0019] According to the second aspect of the present invention for achieving the above objective, a method for generating a virtual human, to which a retrieval-augmented generation-based large-scale language model using an open library is applied, comprises: a step in which a virtual human generation device, in processing a virtual human-based automatic response to a user's query, extracts an open library query corresponding to the user's query from a vector storage storing open library queries optimized for the open library of the large-scale language model; a step of obtaining a text answer to the user's query as a result of applying the extracted open library query to the open library of the large-scale language model; a step of converting the obtained text answer into a voice answer to be output by the virtual human; a step of generating a virtual human image in which the virtual human delivers the converted voice answer; and a step of processing the generated virtual human image as an output response to the query.
[0020] The method may further include a step of classifying the open library queries through prior artificial intelligence training based on a retrieval-augmented generation model and storing them.
[0021] The method may further include a step of dividing the text answer according to predetermined criteria and, for each divided text answer, processing the generation of one or more virtual human images of the virtual human in divided fashion according to a predetermined division processing model.
[0022] The method may further include a step of distributing, by predetermined distributed computing, the division processing for generating virtual human images for each divided text answer.
[0023] Therefore, the present invention has the advantage of enabling small businesses or organizations to easily establish virtual human services or use them continuously at low cost.
[0024] In addition, the present invention can be customized to the needs of the organization operating the virtual human service, and by setting up the initial data learning process according to that organization's characteristics, it is possible to build a system that provides accurate answers even in fields where training data is insufficient.
[0025] Furthermore, the present invention makes it possible to provide services with reduced response times even compared to existing commercial virtual human services, and moreover, has the advantage of being able to provide virtual human services with significantly reduced response times even without mobilizing high-performance computing resources.
[0026] The effects of the present invention are not limited to those mentioned above, and other unmentioned effects will be clearly understood by those skilled in the art from the description in the claims.
[0027] FIG. 1 is a drawing including a virtual human generation device according to one embodiment of the present invention.
[0028] Figure 2 is a diagram illustrating a more specific example of the process of linking the virtual human generation device of Figure 1 with a library.
[0029] Figure 3 is a diagram illustrating, as an example, the process of learning a query for an open library through the virtual human generation device of Figure 1.
[0030] Figure 4 is a diagram illustrating, as an example, the process of division processing and distributed processing through the virtual human generation device of Figure 1.
[0031] Figure 5 is a diagram showing an example of a screen for selecting a virtual human in the virtual human generation device of Figure 1.
[0032] Figure 6 is a graph showing the response time during division processing through the virtual human generation device of Figure 1.
[0033] Figure 7 is a graph showing the execution time during distributed computing processing through the virtual human generation device of Figure 1.
[0034] Figure 8 is a flowchart illustrating a method for generating a virtual human according to an embodiment of the present invention.
[0035] Hereinafter, embodiments will be described in detail with reference to the attached drawings. However, the scope of the patent application is not limited or restricted by these embodiments. Identical reference numerals in each drawing indicate identical components.
[0036] Various modifications may be made to the embodiments described below. The embodiments described below are not intended to limit the forms of practice and should be understood to include all modifications, equivalents, and substitutions thereof.
[0037] Terms such as "first" or "second" may be used to describe various components, but these terms should be understood solely for the purpose of distinguishing one component from another. For example, a first component may be named a second component, and similarly, a second component may be named a first component.
[0038] The terms used in the embodiments are used merely to describe specific embodiments and are not intended to limit the embodiments. A singular expression includes a plural expression unless the context clearly indicates otherwise. In this specification, phrases such as “A or B,” “at least one of A and B,” “at least one of A or B,” “A, B or C,” “at least one of A, B and C,” and “at least one of A, B, or C” may each include any one of the items listed together with the corresponding phrase, or any possible combination thereof. In this specification, terms such as “comprising” or “having” are intended to indicate the presence of the features, numbers, steps, actions, components, parts, or combinations thereof described in the specification, and should be understood as not precluding the existence or addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof.
[0039] Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as generally understood by those skilled in the art to which the embodiments pertain. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant technology, and should not be interpreted in an ideal or overly formal sense unless explicitly defined in this application.
[0040] In addition, when describing with reference to the attached drawings, identical components are assigned the same reference numeral regardless of drawing symbols, and redundant descriptions thereof are omitted. When describing the embodiments, if it is determined that a detailed description of related prior art could unnecessarily obscure the essence of the embodiments, such detailed description is omitted.
[0041] The virtual human generation device of the present invention, to which a retrieval-augmented generation-based large-scale language model using an open library is applied, and the method applied thereto are configured to broaden the general applicability of virtual human services and to reduce response time.
[0042] Hereinafter, embodiments of the present invention will be described in detail with reference to the attached drawings.
[0043] FIG. 1 is a drawing including a virtual human generation device according to one embodiment of the present invention.
[0044] As illustrated in FIG. 1, a virtual human generation device (100), to which a retrieval-augmented generation-based large-scale language model using an open library is applied, includes a memory (110) and at least one processor (120). In processing a virtual human-based automatic response to a user's query, the processor (120) extracts an open library query corresponding to the user's query from a vector storage that stores open library queries optimized for the open library of the large-scale language model, obtains a text answer to the user's query by applying the extracted open library query to the open library of the large-scale language model, converts the obtained text answer into a voice answer to be output by the virtual human, generates a virtual human image in which the virtual human delivers the converted voice answer, and processes the generated virtual human image as the output response to the query.
[0045] Here, large-scale language models refer to models such as GPT-2 with 1.5 billion parameters, GPT-3 with 175 billion parameters, or GPT-4 or higher models with more parameters.
[0046] When a web service providing such a large-scale language model is accessed and queried directly, answers to user queries can be provided by linking with the model's open library. However, this structure cannot give accurate answers about information on which the model was not trained, and used as is, it may reduce users' willingness to use the virtual human service. An additional training process is therefore required to answer queries about specific or untrained fields accurately. Existing training methods for large-scale language models demand more resources and time than small businesses or organizations can readily afford, and because such delays incur costs, this can also reduce demand among service providers who wish to use the present invention.
[0047] Accordingly, the present invention stores open library queries optimized for the large-scale language model in a vector storage after a prior training process. When a user inputs a query, the optimized open library query corresponding to it is extracted from the vector storage and applied to the open library of the large-scale language model, so that an accurate answer can be obtained in as short a time as possible with the support of that open library.
[0048] Additionally, the virtual human generation device (100) can, through linkage with one or more virtual human-related open libraries, convert the text answer into a voice answer, generate one or more virtual human images of the virtual human corresponding to the text answer, and combine them to generate the virtual human video.
[0049] The virtual human generation device (100) may link with the aforementioned open library through independent processing at a client end such as a PC, or through a structure that connects a client end such as a PC with an external server end. The way the virtual human generation device (100) is constructed is not limited to these and may be extended within the understanding of a person skilled in the art.
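The lookup described in paragraph [0047] can be illustrated with a minimal sketch. The specification does not name an embedding model or a vector database, so everything here is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, and `VectorStore`, `add`, and `lookup` are hypothetical names.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Toy in-memory vector storage of pre-optimized open library queries."""

    def __init__(self) -> None:
        self._items: list[tuple[Counter, str]] = []

    def add(self, example_user_query: str, optimized_query: str) -> None:
        # Index the optimized query under the embedding of a typical user query.
        self._items.append((embed(example_user_query), optimized_query))

    def lookup(self, user_query: str) -> str:
        # Return the stored optimized query whose key is most similar.
        if not self._items:
            raise LookupError("vector store is empty")
        query_vec = embed(user_query)
        best = max(self._items, key=lambda item: cosine(query_vec, item[0]))
        return best[1]


store = VectorStore()
store.add("what are the library opening hours",
          "Using the facility schedule, state the library opening hours.")
store.add("how do I apply for a scholarship",
          "Using the scholarship guide, list the application steps.")
optimized = store.lookup("library hours today")
```

In a real deployment the bag-of-words vectors would presumably be replaced by dense embeddings and an approximate nearest-neighbor index, but the control flow (embed the user query, find the closest stored entry, forward its optimized query to the model) would stay the same.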
[0050] Figure 2 is a diagram illustrating a more specific example of the process of linking the virtual human generation device of Figure 1 with a library.
[0051] As illustrated in FIG. 2, the virtual human generation device (100) may include a vector storage, from which it extracts the open library query corresponding to the query requested by the user and applies the extracted open library query to the open library of the large-scale language model. If the user's query is speech rather than text, a speech-to-text conversion step may additionally be performed.
[0052] In response, the open library of the large-scale language model outputs a text answer to the received open library query; the text answer is then converted into a voice answer through additional linkage with a virtual human-related open library, and a virtual human video is generated from it.
[0053] Figure 3 is a diagram illustrating, as an example, the process of learning a query for an open library through the virtual human generation device of Figure 1.
[0054] As illustrated in FIG. 3, the open library queries described above can be classified through prior artificial intelligence training based on a retrieval-augmented generation model and then stored in the vector storage.
[0055] If data imbalance exists among the queries a user may enter when they are converted into open library queries, oversampling is performed to resolve the imbalance so that the queries are converted into open library queries at a level that allows optimized answers to be output.
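As an illustration of the oversampling step, the following sketch balances query categories by randomly duplicating examples from under-represented categories. The specification only says that oversampling is performed; random duplication is one common variant chosen here as an assumption, and the function name `oversample` and the (text, category) sample layout are hypothetical.

```python
import random
from collections import Counter


def oversample(samples: list[tuple[str, str]], seed: int = 0) -> list[tuple[str, str]]:
    """Balance (query_text, category) pairs by random duplication.

    Every category is padded with randomly re-drawn examples until it is
    as large as the largest category.
    """
    if not samples:
        return []
    rng = random.Random(seed)
    by_category: dict[str, list[tuple[str, str]]] = {}
    for text, category in samples:
        by_category.setdefault(category, []).append((text, category))
    target = max(len(items) for items in by_category.values())
    balanced: list[tuple[str, str]] = []
    for items in by_category.values():
        balanced.extend(items)
        # Draw extra copies only for categories below the target size.
        balanced.extend(rng.choice(items) for _ in range(target - len(items)))
    return balanced


training_queries = [
    ("when does the office open", "hours"),
    ("opening time on weekends", "hours"),
    ("is the office open on holidays", "hours"),
    ("how much is the entry fee", "fees"),
]
balanced_queries = oversample(training_queries)
category_counts = Counter(category for _, category in balanced_queries)
```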
[0056] Figure 4 is a diagram illustrating, as an example, the process of divided processing and distributed processing through the virtual human generation device of Figure 1.
[0057] As illustrated in FIG. 4, the virtual human generation device (100) divides the text answer according to predetermined criteria and processes, in divided fashion, the generation of one or more virtual human images of the virtual human for each divided text answer according to a predetermined division processing model.
[0058] In this case, the text answer can be divided based on a combination of minimum semantic units and character counts, or based on the number of keywords or the estimated response time.
[0059] In the example of FIG. 4, the text answer is divided based on the combination of minimum semantic units and character counts.
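One possible reading of division by a combination of minimum semantic units and character counts is sketched below. The specification does not define the minimum semantic unit, so this sketch assumes it is a clause or sentence ending in punctuation, and packs consecutive units into a chunk until a character budget would be exceeded. The name `split_answer` and the `max_chars` parameter are hypothetical.

```python
import re


def split_answer(text: str, max_chars: int = 35) -> list[str]:
    """Divide a text answer into chunks of whole semantic units.

    A unit (assumed: text up to sentence or clause punctuation) is never
    broken; units are packed together while the chunk stays within
    max_chars. A single unit longer than max_chars becomes its own chunk.
    """
    units = [u.strip() for u in re.split(r"(?<=[.!?,;])\s+", text) if u.strip()]
    chunks: list[str] = []
    current = ""
    for unit in units:
        candidate = f"{current} {unit}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = unit
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


answer = "We open at nine. We close at six. Lunch is at noon."
segments = split_answer(answer, max_chars=35)
```

Each resulting segment can then be handed to voice conversion and image generation separately, which is what makes the later division and distributed processing possible.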
[0060] In addition, the division processing model generates an estimated processing time based on traffic transmitted and received in real time while linked with the virtual human-related open library, and classifies processing into one of the following based on that estimate: an integrated processing process, in which the answer is processed as a whole without division when the estimated processing time is short (e.g., 10 seconds); a flexible processing process, in which the answer is processed through 2 to 3 divided paths when the estimated processing time is medium (e.g., 15 seconds), and the divided paths are expanded to 5 to 7 if the estimate grows longer during processing; or an individual processing process, in which a corresponding divided path is set for each divided text answer when the estimated processing time is long (e.g., 20 seconds or more).
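The three-way classification in paragraph [0060] can be sketched as a small decision function. The specification gives only example times (about 10, 15, and 20 seconds or more), so the exact boundaries used here (`short_limit`, `long_limit`) are assumptions, as are the names `Plan` and `plan_processing`. How the estimate itself is derived from real-time traffic is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    mode: str   # "integrated", "flexible", or "individual"
    paths: int  # number of parallel divided paths to use


def plan_processing(estimated_seconds: float, num_segments: int,
                    short_limit: float = 10.0, long_limit: float = 20.0) -> Plan:
    """Choose a processing mode from the estimated processing time.

    Boundaries are assumed: at or below short_limit is "short",
    at or above long_limit is "long", anything between is "medium".
    """
    if estimated_seconds <= short_limit:
        # Short estimate: process the whole answer without division.
        return Plan("integrated", 1)
    if estimated_seconds >= long_limit:
        # Long estimate: one divided path per text segment.
        return Plan("individual", max(1, num_segments))
    # Medium estimate: start with 2 to 3 divided paths.
    return Plan("flexible", 2 if num_segments <= 2 else 3)


def expand_flexible(plan: Plan, num_segments: int) -> Plan:
    """Widen a flexible plan to 5 to 7 paths when the estimate grows mid-run."""
    if plan.mode != "flexible":
        return plan
    return Plan("flexible", min(7, max(5, num_segments)))


short_plan = plan_processing(8.0, num_segments=6)
medium_plan = plan_processing(15.0, num_segments=6)
long_plan = plan_processing(25.0, num_segments=6)
widened_plan = expand_flexible(medium_plan, num_segments=6)
```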
[0061] Additionally, the division processing for generating virtual human images for each divided text answer can be distributed by predetermined distributed computing.
[0062] Here, the distributed computing may be executed across two or more client computing resources separated physically or in software, or across two or more server computing resources separated physically or in software.
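A minimal sketch of distributing per-segment generation follows. The specification describes two or more client or server computing resources; a local thread pool is used here purely as a stand-in for those resources, and `render_segment` is a hypothetical placeholder for the call into a virtual human-related open library.

```python
from concurrent.futures import ThreadPoolExecutor


def render_segment(segment: str) -> str:
    # Hypothetical placeholder: a real system would synthesize speech and
    # generate a virtual human clip for this text segment.
    return f"clip<{segment}>"


def render_distributed(segments: list[str], workers: int = 2) -> list[str]:
    """Render each text segment on a separate worker, preserving order.

    Executor.map yields results in input order, so the clips can be
    concatenated in the order of the original answer.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_segment, segments))


clips = render_distributed(["We open at nine.", "We close at six."], workers=2)
```

Swapping the thread pool for separate processes or remote servers would change only how `render_segment` is dispatched; the ordering requirement on the results stays the same.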
[0063] Figure 5 is a diagram showing an example of a screen for selecting a virtual human in the virtual human generation device of Figure 1.
[0064] As illustrated in FIG. 5, in the virtual human generation device (100), a user or administrator can set a basic image of the virtual human or create a new image depending on the time of the query or the field of the query.
[0065] The basic image of the virtual human can be provided by taking a photo on the spot with a camera and uploading it, or by uploading a photo file taken in advance.
[0066] Figure 6 is a graph showing the response time during division processing through the virtual human generation device of Figure 1, and Figure 7 is a graph showing the execution time during distributed computing processing through the virtual human generation device of Figure 1.
[0067] Referring to FIG. 6, an experiment on response time during division processing through the virtual human generation device (100) shows that the response time decreases with the number of characters per division and does not exceed 40 seconds, whereas without division processing the response time increases linearly with the length of the generated video.
[0068] In addition, referring to FIG. 7, execution time was tested with and without distributed computing applied through the virtual human generation device (100), measured while increasing the length of the generated video up to a maximum of 25 seconds. The experiment confirmed that videos were generated about 1.5 times faster with distributed computing for lengths of 8 seconds or longer, which is the typical length of video generated when virtual human videos are produced with division in units of 10 characters.
[0069] FIG. 8 is a flowchart illustrating a method for generating a virtual human according to an embodiment of the present invention.
[0070] As illustrated in FIG. 8, the method for generating a virtual human, to which a retrieval-augmented generation-based large-scale language model using an open library is applied, begins when a user inputs a query into the virtual human generation device (100) (S100).
[0071] For the query input in step S100, a corresponding open library query is extracted from the vector storage that stores open library queries optimized for the open library of the large-scale language model (S102).
[0072] The open library query extracted in step S102 is applied to the open library of the large-scale language model (S104), and as a result of the application, a text answer to the query entered in step S100 is obtained (S106).
[0073] The text answer obtained in step S106 is converted into a voice answer to be output by the virtual human (S108).
[0074] A virtual human image in which the virtual human delivers the voice answer converted in step S108 is generated, and the generated virtual human image is processed to be output as the response to the query of step S100 (S110).
[0075] Afterwards, if further queries follow, steps S100 to S110 described above are repeated (S112).
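Steps S100 to S110 can be summarized as a simple pipeline. The sketch below is not the implementation of the specification: each stage is passed in as a callable so that no particular language model, speech synthesis, or video library is implied, and all names (`answer_query`, `lookup`, `llm`, `tts`, `render`) are hypothetical.

```python
from typing import Callable


def answer_query(user_query: str,
                 lookup: Callable[[str], str],
                 llm: Callable[[str], str],
                 tts: Callable[[str], bytes],
                 render: Callable[[bytes], str]) -> str:
    """Run one pass of the flow: query in, virtual human output reference out."""
    optimized_query = lookup(user_query)   # S102: extract from vector storage
    text_answer = llm(optimized_query)     # S104 to S106: obtain text answer
    voice_answer = tts(text_answer)        # S108: convert text to voice
    return render(voice_answer)            # S110: generate virtual human output


# Toy stand-ins for the four stages, for illustration only.
result = answer_query(
    "library hours today",
    lookup=lambda q: f"optimized({q})",
    llm=lambda q: f"answer to {q}",
    tts=lambda text: text.encode("utf-8"),
    render=lambda audio: f"video[{len(audio)} bytes of audio]",
)
```

Repeating the call for each new query corresponds to the loop of step S112.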
[0076] Detailed descriptions of each step described above and descriptions of additional steps shall be in accordance with FIGS. 1 to 7 and the detailed descriptions of these figures.
[0077] Although embodiments of the present invention have been described above with reference to the attached drawings, those skilled in the art will understand that the present invention may be implemented in other specific forms without changing its technical concept or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive.
[0078] Furthermore, since the present invention provides a virtual human generation device, to which a retrieval-augmented generation-based large-scale language model using an open library is applied, and a method applied thereto, which broaden the general applicability of virtual human services and reduce response time, it is industrially applicable: it has sufficient potential for commercialization and can clearly be implemented in practice.
Claims
1. A virtual human generation device to which a retrieval-augmented generation-based large-scale language model using an open library is applied, the device comprising: a memory; and at least one processor, wherein the processor, in processing a virtual human-based automatic response to a user's query, extracts an open library query corresponding to the query from a vector storage that stores open library queries optimized for an open library of a large-scale language model, obtains a text answer to the query as a result of applying the extracted open library query to the open library of the large-scale language model, converts the obtained text answer into a voice answer to be output by the virtual human, generates a virtual human image in which the virtual human delivers the converted voice answer, and processes the generated virtual human image as an output response to the query.
2. The virtual human generation device of claim 1, wherein the open library queries are classified through prior artificial intelligence training based on a retrieval-augmented generation model and then stored.
3. The virtual human generation device of claim 1, wherein converting the text answer into the voice answer, generating one or more virtual human images of the virtual human corresponding to the text answer, and generating the virtual human image by combining them are processed through linkage with one or more virtual human-related open libraries.
4. The virtual human generation device of claim 3, wherein the text answer is divided according to predetermined criteria, and the generation of one or more virtual human images of the virtual human for each divided text answer is processed in divided fashion according to a predetermined division processing model.
5. The virtual human generation device of claim 4, wherein the text answer is divided based on a combination of minimum semantic units and character counts, or based on the number of keywords or the estimated response time.
6. The virtual human generation device of claim 4, wherein the division processing model generates an estimated processing time based on traffic transmitted and received in real time while linked with the virtual human-related open library and, based on the estimated processing time, classifies processing into an integrated processing process, a flexible processing process, or an individual processing process for each divided text answer.
7. The virtual human generation device of claim 4, wherein the division processing for generating virtual human images for each divided text answer is distributed by predetermined distributed computing.
8. The virtual human generation device of claim 7, wherein the distributed computing is executed across two or more client computing resources separated physically or in software, or across two or more server computing resources separated physically or in software.
9. A method for generating a virtual human, to which a retrieval-augmented generation-based large-scale language model using an open library is applied, the method comprising: a step in which a virtual human generation device, in processing a virtual human-based automatic response to a user's query, extracts an open library query corresponding to the query from a vector storage that stores open library queries optimized for an open library of a large-scale language model; a step of obtaining a text answer to the query as a result of applying the extracted open library query to the open library of the large-scale language model; a step of converting the obtained text answer into a voice answer to be output by the virtual human; a step of generating a virtual human image in which the virtual human delivers the converted voice answer; and a step of processing the generated virtual human image as an output response to the query.
10. The method of claim 9, further comprising a step of classifying the open library queries through prior artificial intelligence training based on a retrieval-augmented generation model and storing them.
11. The method of claim 9 or 10, further comprising a step of dividing the text answer according to predetermined criteria and, for each divided text answer, processing the generation of one or more virtual human images of the virtual human in divided fashion according to a predetermined division processing model.
12. The method of claim 11, further comprising a step of distributing, by predetermined distributed computing, the division processing for generating virtual human images for each divided text answer.