Information processing device
The information processing device uses a learning model to estimate the preprocessing that new data needs before it is registered in a knowledge base, and then proposes or applies that preprocessing, reducing the workload of the knowledge base administrator.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- TOYOTA JIDOSHA KK
- Filing Date
- 2024-12-03
- Publication Date
- 2026-06-15
AI Technical Summary
The burden on administrators of knowledge bases is significant due to the need to manually determine preprocessing for raw data before registration, which is often complex and time-consuming.
An information processing device uses a learning model to estimate and apply preprocessing to new data items based on the relationship between data types and preprocessing, reducing the administrative burden by automating this process.
Automated preprocessing reduces the workload on knowledge base administrators by estimating and applying necessary preprocessing to new data, thereby enhancing efficiency and reducing manual intervention.
Abstract
Description
【Technical Field】 【0001】 The present invention relates to the technical field of information processing apparatuses. 【Background Art】 【0002】 As this type of apparatus, for example, an apparatus has been proposed that generates query data based on a document for a language model and uses a pair of the document and the query data for training a search model for a chatbot (see Patent Document 1). 【Prior Art Documents】 【Patent Documents】 【0003】 【Patent Document 1】 Japanese Unexamined Patent Application Publication No. 2023-076413 【Summary of the Invention】 【Problems to be Solved by the Invention】 【0004】 A chatbot has been proposed that uses retrieval-augmented generation (RAG), a mechanism that gives a large language model (LLM) an independent information source by combining the LLM with a search of a specific information source (hereinafter referred to as a "knowledge base" where appropriate). Data that has undergone predetermined processing is registered in the knowledge base. The predetermined processing may include preprocessing of raw data. The content of this preprocessing is often determined by the administrator of the knowledge base. Note that a large language model is a language model constructed using a very large dataset and deep learning technology. 【0005】 The present invention has been made in view of the above circumstances, and an object thereof is to provide an information processing apparatus capable of reducing the burden on the administrator of the knowledge base. 
【Means for Solving the Problems】 【0006】 An information processing device according to one aspect of the present invention includes an estimation means for estimating preprocessing to be applied to new data to be registered in a database, using a learning model that has learned the relationship between the type of data registered in the database and the type of preprocessing applied to the data, and a proposal means for proposing the estimated preprocessing. 【0007】 An information processing device according to another aspect of the present invention includes an estimation means for estimating preprocessing to be applied to new data to be registered in a database, using a learning model that has learned the relationship between the type of data registered in the database and the type of preprocessing applied to the data, and a processing means for applying the estimated preprocessing to the data. 【Brief Description of the Drawings】 【0008】 [Figure 1] A diagram showing the configuration of an information processing system according to an embodiment. [Figure 2] A diagram showing an example of the processing applied to data registered in the knowledge base. [Figure 3] A block diagram showing an example of the configuration of the arithmetic unit according to the embodiment. [Figure 4] A block diagram showing another example of the configuration of the arithmetic unit according to the embodiment. 【Modes for Carrying Out the Invention】 【0009】 <First Embodiment> A first embodiment of the information processing device will be described with reference to Figures 1 to 3. In Figure 1, the information processing system 1 comprises an information processing device 10, a server 20, and a knowledge base 30. The information processing device 10, server 20, and knowledge base 30 are configured to communicate with each other via a network NW. Server 20 is a server for operating a large-scale language model (LLM). 
For this reason, server 20 may be referred to as an LLM server. Server 20 may be a cloud server. 【0010】 (Chatbot) Server 20 and Knowledge Base 30 may provide a chatbot service using RAG. For example, user U may use the chatbot service via terminal device 50. In this case, user U may operate terminal device 50 to launch an application for using the chatbot service. User U may operate terminal device 50 to enter a question into the input field of the chat application. Here, "question" is not limited to interrogative sentences. For example, "question" may be a sentence that includes expressions such as requests, instructions, or commands, such as "Tell me about ****" or "Answer me about ****". Therefore, "question" is a concept that includes not only sentences in the form of interrogative sentences, but also sentences that include expressions such as requests, instructions, or commands. In other words, "question" may mean a sentence that seeks an answer from the other party. 【0011】 Terminal device 50 may search the knowledge base 30 based on the input question. Terminal device 50 may send first information, which includes the input question and text data as search results from the knowledge base 30, to server 20. Server 20 may input the question and text data included in the first information as a prompt to a large-scale language model. Server 20 may obtain the answer to the question output from the large-scale language model. Server 20 may send second information indicating the answer to terminal device 50. Upon receiving the second information, terminal device 50 may display the answer indicated by the second information on the screen related to the chat application. Terminal device 50 may be a personal computer, a tablet terminal, or a smartphone. 【0012】 (Knowledge Base 30) Knowledge base 30 may contain multiple text data entries. Each of these text data entries may be vectorized text data. In other words, knowledge base 30 may be a vector database / vector store. 
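The question-and-answer exchange of paragraph 【0011】 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the specification: the `FirstInformation` class, the word-overlap ranking that stands in for vector search, the prompt layout, and the stub function that stands in for the large-scale language model on server 20.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FirstInformation:
    """The question plus the text retrieved from the knowledge base."""
    question: str
    retrieved_texts: List[str]


def _words(text: str) -> set:
    # Lowercased word set used by the toy ranking below.
    return set(re.findall(r"\w+", text.lower()))


def search_knowledge_base(question: str, entries: List[str], top_k: int = 2) -> List[str]:
    """Toy word-overlap ranking; a real vector store would compare embeddings."""
    q = _words(question)
    ranked = sorted(entries, key=lambda e: len(q & _words(e)), reverse=True)
    return ranked[:top_k]


def build_prompt(info: FirstInformation) -> str:
    """Combine the retrieved text and the question into one prompt (layout is hypothetical)."""
    context = "\n".join(f"- {t}" for t in info.retrieved_texts)
    return f"Context:\n{context}\n\nQuestion: {info.question}"


def answer_on_server(info: FirstInformation, llm: Callable[[str], str]) -> str:
    """Server side: feed the prompt to the language model and return its answer."""
    return llm(build_prompt(info))


# Terminal side: search, then send the first information to the server.
entries = [
    "The warranty period is three years.",
    "Tire pressure should be checked monthly.",
    "The warranty covers the battery.",
]
question = "Tell me about the warranty period"
info = FirstInformation(question, search_knowledge_base(question, entries))


def stub_llm(prompt: str) -> str:
    # Stand-in for the language model; it ignores the prompt content.
    return "stub answer"


answer = answer_on_server(info, stub_llm)  # plays the role of the second information
```

The split between `search_knowledge_base` (terminal side) and `answer_on_server` (server side) mirrors the division of roles in the text; in a deployment the two would run on different machines and exchange the first and second information over the network NW.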
An example of how to construct knowledge base 30 will be explained with reference to Figure 2. 【0013】 In Figure 2, the data source may include raw data (e.g., documents, images, etc.). When a piece of raw data included in the data source is registered in the knowledge base 30, the administrator M of the knowledge base 30 may specify the preprocessing to be performed on that piece of raw data. For example, if the raw data is a document (in other words, text data), the preprocessing to be performed on the raw data may be synonym conversion, summarization, etc. For example, if the raw data is image data, the preprocessing to be performed on the raw data may be conversion of the image into a document (e.g., optical character recognition: OCR). 【0014】 In this embodiment, text data is generated as a result of preprocessing the raw data. The text data generated by the preprocessing may be subjected to chunking. Chunking divides the text data into multiple fragments (i.e., chunks). Specific examples of chunking include dividing the data at fixed lengths, dividing it into sentences based on sentence delimiters, and dividing it based on structure such as Markdown. 【0015】 Subsequently, each of the multiple chunks is converted into a numerical vector. In other words, an embedding is generated from each of the multiple chunks. Then, the chunks converted into numerical vectors (in other words, the embeddings) are registered in the knowledge base 30. 【0016】 (Information processing device 10) In Figure 1, the information processing device 10 comprises an arithmetic unit 11, a storage device 12, a communication device 13, an input device 14, and an output device 15. The arithmetic unit 11, storage device 12, communication device 13, input device 14, and output device 15 are connected via a data bus 16. The information processing device 10 may be a personal computer, a tablet terminal, or a smartphone. 【0017】 The arithmetic unit 11 may have a processor. 
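Returning briefly to the construction of knowledge base 30 in paragraphs 【0014】 and 【0015】, the three chunking strategies and the vectorization step might be sketched in Python as follows. The function names are hypothetical, and `toy_embedding` is only a hashed word-count vector used as a stand-in; the specification does not say which embedding method is used.

```python
import re
import zlib
from typing import List, Tuple


def chunk_fixed(text: str, size: int) -> List[str]:
    """Divide the text at fixed lengths (in characters)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_sentences(text: str) -> List[str]:
    """Divide the text after sentence delimiters followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_markdown(text: str) -> List[str]:
    """Divide the text at Markdown headings (lines starting with '#')."""
    chunks, current = [], []
    for line in text.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


def toy_embedding(chunk: str, dim: int = 8) -> List[float]:
    """Stand-in embedding: hashed word counts, not a trained embedding model."""
    vec = [0.0] * dim
    for word in re.findall(r"\w+", chunk.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    return vec


def register(text: str) -> List[Tuple[str, List[float]]]:
    """Chunk the preprocessed text and pair each chunk with its vector."""
    return [(chunk, toy_embedding(chunk)) for chunk in chunk_sentences(text)]


knowledge_base = register("Check the oil monthly. Replace the filter yearly.")
```

Which strategy fits best depends on the material: fixed-length division is simple but can cut sentences in half, whereas heading-based division keeps each section of a structured document together.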
Note that the arithmetic unit 11 may have a single processor or a plurality of processors. That is, the arithmetic unit 11 may have one or more processors. Note that the processor may be a multi-core processor. When the arithmetic unit 11 has a single processor that is a multi-core processor, it can be said that the arithmetic unit 11 logically has a plurality of processors. 【0018】 The processor may be at least one of, for example, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), an FPGA (Field Programmable Gate Array), and a TPU (Tensor Processing Unit). 【0019】 The storage device 12 may be at least one of, for example, a RAM (Random Access Memory), a ROM (Read Only Memory), a hard disk device, a magneto-optical disk device, an SSD (Solid State Drive), and an optical disk array. That is, the storage device 12 may be realized by a single device or by a plurality of devices. 【0020】 The communication device 13 may be able to communicate with a device external to the information processing device 10. Note that the communication device 13 may perform wired communication or wireless communication. 【0021】 The input device 14 is a device capable of receiving input of information to the information processing device 10 from the outside. The input device 14 may include an operating device (e.g., keyboard, mouse, touch panel, etc.) that can be operated by the user of the information processing device 10. The input device 14 may include a recording medium reader capable of reading information recorded on a detachable recording medium such as a USB (Universal Serial Bus) memory for the information processing device 10. When information is input to the information processing device 10 via the communication device 13 (in other words, when the information processing device 10 acquires information via the communication device 13), the communication device 13 may function as an input device. 
【0022】 The output device 15 is a device capable of outputting information to the outside of the information processing device 10. The output device 15 may have a display device capable of outputting visual information such as characters and images as the above information. The output device 15 may also have a speaker capable of outputting auditory information such as sound as the above information. The output device 15 may also have a vibration motor capable of outputting tactile information such as vibration as the above information. The output device 15 may have a printer. The output device 15 may be capable of outputting information to a recording medium, such as a USB memory, that is attachable to and detachable from the information processing device 10. When the information processing device 10 outputs information via the communication device 13, the communication device 13 may function as an output device. 【0023】 The storage device 12 can store desired data. The storage device 12 may store a computer program CP executed by the arithmetic unit 11. The storage device 12 may temporarily store data that the arithmetic unit 11 uses while executing the computer program CP. 【0024】 The computer program CP may be recorded on a non-transitory computer-readable recording medium. In this case, the computer program CP may be stored in the storage device 12 by reading the recording medium using a recording medium reading device (not shown) provided in the information processing device 10. At least one of the following may be used as the recording medium: an optical disc, a magnetic medium, a magneto-optical disc, a semiconductor memory, and any other medium capable of storing a program. The computer program CP may also be obtained from a device (not shown) external to the information processing device 10 via the communication device 13. 
In other words, the computer program CP may be downloaded from an external device to the storage device 12 of the information processing device 10. 【0025】 The arithmetic unit 11 (for example, a processor) may execute the processing that the information processing device 10 should perform together with the storage device 12 in which the computer program CP is stored (in other words, together with the storage device 12 and the computer program CP stored in the storage device 12). For example, by the arithmetic unit 11 executing the computer program CP, a logical functional block for executing the processing that the information processing device 10 should perform may be realized within the arithmetic unit 11 (for example, within the processor). 【0026】 As described above, when administrator M specifies the preprocessing to be applied to raw data, the burden on administrator M is considerable. Therefore, the information processing device 10 according to this embodiment estimates the preprocessing to be applied to a new piece of raw data to be registered in the knowledge base 30. For this estimation, it uses a learning model that has learned the relationship between the type of raw data registered in the knowledge base 30 and the type of preprocessing applied to that raw data. 【0027】 The learning model may be a rule-based model based on the type of raw data and the type of preprocessing applied to that raw data. Alternatively, the learning model may be a model constructed by machine learning using training data that shows combinations of the type of raw data and the type of preprocessing applied to that raw data. 【0028】 As shown in Figure 3, the arithmetic unit 11 of the information processing device 10 has an estimation unit 111 and a proposal unit 112 in order to perform the above estimation. The estimation unit 111 and the proposal unit 112 may be implemented as the logical functional blocks described above. 
At least one of the estimation unit 111 and the proposal unit 112 may be implemented as a physical processing circuit. Alternatively, at least one of the estimation unit 111 and the proposal unit 112 may be implemented in a form in which a logical functional block and a physical processing circuit are mixed. 【0029】 For example, the estimation unit 111 may input a single raw data point included in the data source into the learning model. The learning model, having received the single raw data point, may output information indicating the preprocessing to be applied to the single raw data point. Based on the information output from the learning model, the estimation unit 111 may estimate the preprocessing to be applied to the single raw data point. The proposal unit 112 may control the output device 15 to propose the preprocessing estimated by the estimation unit 111 to the administrator of the knowledge base 30 (for example, administrator M). 【0030】 (Technical effects) The information processing device 10 according to this embodiment estimates the preprocessing that should be applied to the raw data to be registered in the knowledge base 30 and proposes the estimated preprocessing. With this configuration, for example, the administrator of the knowledge base 30 (for example, administrator M) is relieved of much of the work of considering which preprocessing should be applied to the raw data. Therefore, the information processing device 10 can reduce the burden on the administrator of the knowledge base 30. 【0031】 <Second Embodiment> A second embodiment of the information processing device will be described with reference to Figures 1 and 4. The second embodiment is the same as the first embodiment described above, except that the configuration of the information processing device is slightly different. For this reason, explanations of the second embodiment that overlap with those of the first embodiment described above will be omitted as appropriate. 
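The estimate-and-propose flow of the first embodiment (paragraphs 【0027】 to 【0029】) can be pictured with the following Python sketch. It is a minimal illustration under stated assumptions, not the claimed implementation: the "learning model" here is just a lookup table built by counting which preprocessing was most often paired with each data type in past registrations, the data type is guessed from a file extension, and all function names and labels such as `"ocr"` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def train_model(history: List[Tuple[str, str]]) -> Dict[str, str]:
    """Learn, per data type, the preprocessing most often applied in the past."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for data_type, preprocessing in history:
        counts[data_type][preprocessing] += 1
    return {t: c.most_common(1)[0][0] for t, c in counts.items()}


def detect_data_type(filename: str) -> str:
    """Hypothetical type detection based on the file extension."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext in {"txt", "md"}:
        return "document"
    if ext in {"png", "jpg", "jpeg"}:
        return "image"
    return "unknown"


def estimate_preprocessing(model: Dict[str, str], filename: str) -> Optional[str]:
    """Role of the estimation unit 111: ask the model for a preprocessing type."""
    return model.get(detect_data_type(filename))


def propose(filename: str, preprocessing: Optional[str]) -> str:
    """Role of the proposal unit 112: build the message shown to the administrator."""
    if preprocessing is None:
        return f"No suggestion for {filename}; please choose the preprocessing manually."
    return f"Suggested preprocessing for {filename}: {preprocessing}"


# Past registrations: (data type, preprocessing chosen by the administrator).
history = [
    ("document", "summarization"),
    ("document", "synonym_conversion"),
    ("document", "summarization"),
    ("image", "ocr"),
]
model = train_model(history)
message = propose("scan.PNG", estimate_preprocessing(model, "scan.PNG"))
```

In this sketch the administrator still makes the final decision; the device only narrows the choice, which matches the proposal-only role of the first embodiment.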
【0032】 As shown in Figure 4, the arithmetic unit 11 of the information processing device 10 according to the second embodiment has an estimation unit 111 and a processing unit 113 in order to perform the estimation described above. The estimation unit 111 and the processing unit 113 may be implemented as the logical functional blocks described above. At least one of the estimation unit 111 and the processing unit 113 may be implemented as a physical processing circuit. Alternatively, at least one of the estimation unit 111 and the processing unit 113 may be implemented in a form in which a logical functional block and a physical processing circuit are mixed. 【0033】 For example, the estimation unit 111 may input a single raw data point included in the data source into the learning model. The learning model, having received the single raw data point, may output information indicating the preprocessing to be applied to the single raw data point. Based on the information output from the learning model, the estimation unit 111 may estimate the preprocessing to be applied to the single raw data point. The processing unit 113 may apply the preprocessing estimated by the estimation unit 111 to the single raw data point. 【0034】 (Technical effects) The information processing device 10 according to this embodiment estimates the preprocessing that should be applied to the raw data registered in the knowledge base 30, and applies the estimated preprocessing to the raw data. With this configuration, preprocessing is automatically applied to the raw data. Therefore, the information processing device 10 can reduce the burden on the administrator of the knowledge base 30. 【0035】 Various aspects of the invention derived from the embodiments described above are described below. 
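The second embodiment's estimate-and-apply flow (paragraphs 【0032】 to 【0034】) might look like the Python sketch below. The assumptions are labeled in the code: the data types `"manual"` and `"report"`, the synonym table, and the mapping in `MODEL` are invented for illustration, and `summarization` is a placeholder that keeps only the first sentence rather than a real summarizer.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical synonym table used by the synonym conversion step.
SYNONYMS = {"automobile": "car", "vehicle": "car"}


def synonym_conversion(text: str) -> str:
    """Replace each known synonym with its canonical word."""
    return re.sub(
        r"\b\w+\b",
        lambda m: SYNONYMS.get(m.group(0).lower(), m.group(0)),
        text,
    )


def summarization(text: str) -> str:
    """Placeholder summarizer: keep only the first sentence."""
    return re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]


# Dispatch table: preprocessing type -> function that performs it.
PREPROCESSORS: Dict[str, Callable[[str], str]] = {
    "synonym_conversion": synonym_conversion,
    "summarization": summarization,
}

# Stand-in for the learning model: data type -> preprocessing type (assumed values).
MODEL = {"manual": "synonym_conversion", "report": "summarization"}


def estimate(data_type: str) -> Optional[str]:
    """Role of the estimation unit 111."""
    return MODEL.get(data_type)


def apply_estimated_preprocessing(data_type: str, raw_text: str) -> str:
    """Role of the processing unit 113: apply whatever the model estimated."""
    name = estimate(data_type)
    if name is None:
        return raw_text  # no estimate available: leave the raw data unchanged
    return PREPROCESSORS[name](raw_text)


converted = apply_estimated_preprocessing("manual", "The automobile is red.")
```

Keeping the preprocessing functions in a dispatch table means a new preprocessing type can be added without changing the estimation or application logic.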
【0036】 An information processing device according to one aspect of the invention includes an estimation means for estimating preprocessing to be applied to new data to be registered in a database, using a learning model that has learned the relationship between the type of data registered in the database and the type of preprocessing applied to the data, and a proposal means for proposing the estimated preprocessing. In the above embodiment, "knowledge base 30" corresponds to an example of a "database", "estimation unit 111" corresponds to an example of an "estimation means", and "proposal unit 112" corresponds to an example of a "proposal means". 【0037】 An information processing device according to another aspect of the invention includes an estimation means for estimating preprocessing to be applied to new data to be registered in a database, using a learning model that has learned the relationship between the type of data registered in the database and the type of preprocessing applied to the data, and a processing means for applying the estimated preprocessing to the data. In the above embodiment, the "processing unit 113" corresponds to an example of the "processing means". 【0038】 In the information processing device according to the above embodiment, the database may be a database applied to retrieval-augmented generation. 【0039】 The present invention is not limited to the embodiments described above, and can be modified as appropriate without contradicting the gist or idea of the invention as can be read from the claims and specification as a whole. Information processing devices that involve such modifications are also included within the technical scope of the present invention. 【Explanation of Symbols】 【0040】 1... Information processing system, 10... Information processing device, 20... Server, 30... Knowledge base, 111... Estimation unit, 112... Proposal unit, 113... Processing unit
Claims
[Claim 1] An information processing device comprising: an estimation means for estimating preprocessing to be applied to new data to be registered in a database, using a learning model that has learned the relationship between the type of data registered in the database and the type of preprocessing applied to the data; and a proposal means for proposing the estimated preprocessing. [Claim 2] An information processing device comprising: an estimation means for estimating preprocessing to be applied to new data to be registered in a database, using a learning model that has learned the relationship between the type of data registered in the database and the type of preprocessing applied to the data; and a processing means for applying the estimated preprocessing to the new data. [Claim 3] The information processing device according to claim 1 or 2, wherein the database is a database applied to retrieval-augmented generation.