Information processing device
The information processing device addresses the problem of unnecessarily large graph structures generated from image data used with large language models. It performs saliency analysis to exclude irrelevant image regions, improving learning and inference accuracy.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- TOYOTA JIDOSHA KK
- Filing Date
- 2024-12-16
- Publication Date
- 2026-06-26
AI Technical Summary
Large language models using deep learning and image data can generate unnecessarily large graph structures due to reflection of unnecessary information, affecting learning and inference accuracy.
An information processing device performs a saliency analysis on image data and generates a graph structure only for regions whose saliency is higher than a predetermined threshold, excluding regions with lower saliency, thereby suppressing the graph structure from becoming unnecessarily large.
The device effectively suppresses the inclusion of unnecessary information in the graph structure, enhancing learning and inference accuracy by focusing on relevant regions.
Description
Technical Field
[0001] The present invention relates to the technical field of information processing devices.
Background Art
[0002] As this type of device, for example, a system has been proposed in which a large language model (LLM) generates query data based on a document, and a pair of the document and the query data is used for learning a search model for a chatbot (see Patent Document 1).
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] A large language model is a language model constructed using a very large dataset and deep learning technology. The dataset may include, for example, image data. As a method for extracting features of image data, a method has been proposed in which a graph structure is generated based on the image data, and the features of the image data are extracted based on the generated graph structure. However, when a graph structure is simply generated based on image data, unnecessary information may be reflected in the graph structure. In this case, the graph structure may become unnecessarily large. Furthermore, when such a graph structure is used for model learning (in other words, AI (Artificial Intelligence) learning), it may adversely affect learning and inference accuracy.
[0005] The present invention has been made in view of the above circumstances, and an object thereof is to provide an information processing device capable of suppressing the graph structure from becoming unnecessarily large.
Means for Solving the Problem
[0006] An information processing device according to one aspect of the present invention comprises an analysis means for performing a saliency analysis process on image data, and a generation means for generating a graph structure relating to a region of the image data whose saliency is higher than a predetermined threshold, based on the results of the saliency analysis process.
Brief Description of the Drawings
[0007]
[Figure 1] A block diagram showing an example of the configuration of an information processing device according to the embodiment.
[Figure 2] A block diagram showing an example of the configuration of an arithmetic unit according to the embodiment.
[Figure 3] A conceptual diagram showing an example of the operation of the information processing device according to the embodiment.
[Figure 4] A diagram showing an example of a display screen.
Modes for Carrying Out the Invention
[0008] Embodiments relating to the information processing device will be described with reference to Figures 1 to 4. In Figure 1, the information processing device 10 includes an arithmetic unit 11, a storage device 12, a communication device 13, an input device 14, and an output device 15. The arithmetic unit 11, the storage device 12, the communication device 13, the input device 14, and the output device 15 are connected via a data bus 16.
[0009] The arithmetic unit 11 may have a processor. The arithmetic unit 11 may have a single processor or multiple processors. In other words, the arithmetic unit 11 may have one or more processors. Furthermore, the processor may be a multi-core processor. If the arithmetic unit 11 has a single processor that is a multi-core processor, then logically, the arithmetic unit 11 can be said to have multiple processors.
[0010] The processor may be at least one of the following: CPU (Central Processing Unit), GPU (Graphics Processing Unit), FPGA (Field Programmable Gate Array), and TPU (Tensor Processing Unit).
[0011] The storage device 12 may be at least one of the following: RAM (Random Access Memory), ROM (Read Only Memory), hard disk drive, magneto-optical disk drive, SSD (Solid State Drive), and optical disk array. In other words, the storage device 12 may be implemented by a single device or by multiple devices.
[0012] The communication device 13 may be capable of communicating with devices outside the information processing device 10. The communication device 13 may use either wired or wireless communication.
[0013] The input device 14 is a device capable of receiving information input to the information processing device 10 from an external source. The input device 14 may include an operating device (e.g., a keyboard, mouse, touch panel, etc.) that can be operated by the user of the information processing device 10. The input device 14 may include a recording medium reader capable of reading information recorded on a recording medium that can be attached to and detached from the information processing device 10, such as a USB (Universal Serial Bus) memory. When information is input to the information processing device 10 via the communication device 13 (in other words, when the information processing device 10 acquires information via the communication device 13), the communication device 13 may function as an input device.
[0014] The output device 15 is a device capable of outputting information to the outside of the information processing device 10. The output device 15 has a display device 151 capable of outputting visual information such as characters and images as the above information. The output device 15 may also have a speaker capable of outputting auditory information such as sound as the above information. The output device 15 may also have a vibration motor capable of outputting tactile information such as vibration as the above information. The output device 15 may also have a printer. The output device 15 may be capable of outputting information to a recording medium that can be attached to and detached from the information processing device 10, such as a USB memory stick. When the information processing device 10 outputs information via the communication device 13, the communication device 13 may function as an output device.
[0015] The storage device 12 is capable of storing desired data. The storage device 12 may store the computer program CP that the arithmetic unit 11 will execute. The storage device 12 may temporarily store data that the arithmetic unit 11 will use temporarily when the arithmetic unit 11 is executing the computer program CP.
[0016] The computer program CP may be recorded on a non-transitory recording medium that is readable by a computer. In this case, the computer program CP may be stored in the storage device 12 by reading the recording medium using a recording medium reading device (not shown) provided in the information processing device 10. At least one of the following may be used as the recording medium: an optical disc, a magnetic medium, a magneto-optical disc, a semiconductor memory, and any other medium capable of storing a program. Alternatively, the computer program CP may be obtained from a device (not shown) external to the information processing device 10 via the communication device 13. In other words, the computer program CP may be downloaded from an external device to the storage device 12 of the information processing device 10.
[0017] The arithmetic unit 11 (for example, a processor) may execute the processing that the information processing apparatus 10 should perform, together with the storage device 12 in which the computer program CP is stored (in other words, together with the storage device 12 and the computer program CP stored in the storage device 12). For example, by executing the computer program CP, a logical functional block for executing the processing that the information processing apparatus 10 should perform may be realized in the arithmetic unit 11 (for example, within the processor).
[0018] As shown in FIG. 2, the arithmetic unit 11 includes an analysis unit 111, a generation unit 112, and a changing unit 113. The analysis unit 111, the generation unit 112, and the changing unit 113 may be realized as the above-described logical functional blocks. Note that at least one of the analysis unit 111, the generation unit 112, and the changing unit 113 may be realized as a physical processing circuit. At least one of the analysis unit 111, the generation unit 112, and the changing unit 113 may be realized in a form in which logical functional blocks and physical processing circuits are mixed.
[0019] The operation of the information processing apparatus 10 will be described with reference to FIG. 3. For example, the analysis unit 111 of the information processing apparatus 10 performs a saliency analysis process on the image data related to the image Img shown in FIG. 3. Various existing techniques can be applied to the saliency analysis process, so a detailed description of the process is omitted. Note that the image data may be image data included in an image data set used for training a model.
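For illustration only, the following is a minimal sketch of what a saliency analysis process might return. The embodiment does not specify a technique; the contrast-based measure used here (each pixel's absolute deviation from the mean intensity) is a stand-in chosen for brevity, and the name `contrast_saliency` and the list-of-lists image representation are assumptions that do not appear in the specification.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities (assumed format)


def contrast_saliency(gray: Image) -> Image:
    """Toy stand-in for saliency analysis (not the method of the embodiment).

    Scores each pixel by its absolute deviation from the global mean
    intensity and scales the scores to the range [0, 1].
    """
    pixels = [v for row in gray for v in row]
    mean = sum(pixels) / len(pixels)
    deviation = [[abs(v - mean) for v in row] for row in gray]
    peak = max(max(row) for row in deviation)
    if peak == 0:
        # A perfectly uniform image has no salient region.
        return [[0.0 for _ in row] for row in gray]
    return [[d / peak for d in row] for row in deviation]


if __name__ == "__main__":
    img = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
    for row in contrast_saliency(img):
        print(row)
```

A practical system would more likely substitute an established saliency detector here; the only property the later steps rely on is a per-pixel score that can be compared against a threshold.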
[0020] The generation unit 112 of the information processing apparatus 10 may generate a mask image MI in which regions with saliency lower than a predetermined threshold are masked based on the result of the saliency analysis process. That is, in the mask image MI, regions with saliency higher than the predetermined threshold are not masked. Note that when the saliency is "equal" to the predetermined threshold, it may be treated as either case. Note that the mask image MI may also be referred to as a saliency map.
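The masking step described above can be sketched as follows. This is a hedged illustration, not the implementation of the generation unit 112: the function names `make_mask` and `apply_mask`, the fill value, and the choice to mask pixels whose saliency equals the threshold are all assumptions (the specification allows the equal case to be treated either way).

```python
from typing import List


def make_mask(saliency: List[List[float]], threshold: float) -> List[List[bool]]:
    """Return a keep-mask: True where saliency is higher than the threshold.

    Assumption: saliency exactly equal to the threshold is masked (False).
    """
    return [[s > threshold for s in row] for row in saliency]


def apply_mask(image: List[List[float]], mask: List[List[bool]],
               fill: float = 0.0) -> List[List[float]]:
    """Replace masked pixels with a fill value (hypothetical choice) to form the mask image."""
    return [[v if keep else fill for v, keep in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


if __name__ == "__main__":
    saliency = [[0.1, 0.5], [0.9, 0.5]]
    mask = make_mask(saliency, threshold=0.5)
    print(mask)
    print(apply_mask([[1, 2], [3, 4]], mask))
```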
[0021] The generation unit 112 generates a graph structure (e.g., graph structure GS) based on the mask image MI. That is, the generation unit 112 generates a graph structure for the area not masked in the mask image MI (in other words, the area with saliency higher than the predetermined threshold). Note that the graph structure may mean data composed of a group of nodes representing the relationship between parts of an object in an image related to one image data and a group of edges indicating the relationship between the nodes. Various existing techniques can be applied to the method of generating the graph structure, so a detailed description of the method is omitted.
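As a rough sketch of how restricting graph generation to the unmasked area limits the size of the graph, the example below builds a simple grid graph. This is not the graph generation method of the embodiment, which is left to existing techniques: treating each unmasked pixel as a node and connecting 4-neighbouring unmasked pixels with edges is an assumption made purely for illustration, as is the name `build_graph`.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]  # (row, column) of an unmasked pixel (assumed node definition)


def build_graph(mask: List[List[bool]]) -> Dict[Node, Set[Node]]:
    """Build an adjacency map over unmasked pixels only.

    Masked pixels (False) contribute neither nodes nor edges, so the
    graph grows only with the area that passed the saliency threshold.
    """
    nodes = {(r, c)
             for r, row in enumerate(mask)
             for c, keep in enumerate(row) if keep}
    graph: Dict[Node, Set[Node]] = {n: set() for n in nodes}
    for (r, c) in nodes:
        # Checking the lower and right neighbours visits each edge once.
        for neighbour in ((r + 1, c), (r, c + 1)):
            if neighbour in nodes:
                graph[(r, c)].add(neighbour)
                graph[neighbour].add((r, c))
    return graph


if __name__ == "__main__":
    mask = [[True, True, False],
            [False, True, False],
            [False, False, False]]
    g = build_graph(mask)
    print(len(g), "nodes,", sum(len(v) for v in g.values()) // 2, "edges")
```

With every pixel unmasked, the same 3 x 3 image would yield 9 nodes, which is the growth the threshold is meant to suppress.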
[0022] For example, after the saliency analysis process is performed on the image data and before the mask image MI is generated, the arithmetic unit 11 may control the display device 151 to display the image 200 shown in FIG. 4. The image 200 includes an area 201 for displaying a preview image and a slider 202. When the user of the information processing apparatus 10 moves the knob 202a of the slider 202 via the input device 14, the predetermined threshold may be changed. Specifically, the changing unit 113 of the information processing apparatus 10 may change the predetermined threshold according to the position of the knob 202a on the slider 202. When the threshold is changed, the area masked in the mask image (e.g., mask image MI) changes. The arithmetic unit 11 may control the display device 151 to display, in the area 201, a preview of the mask image that would be generated with the changed threshold. When the user selects the button 203 included in the image 200 via the input device 14, the generation unit 112 may generate a mask image in which an area with saliency lower than the threshold changed by the user is masked.
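The relationship between the knob position, the threshold, and the previewed mask can be sketched as below. The specification does not state how the position of the knob 202a maps to a threshold value; the linear mapping, the normalized position range of 0 to 1, and the names `threshold_from_knob`, `preview_mask`, and `masked_fraction` are assumptions for illustration.

```python
from typing import List


def threshold_from_knob(position: float, lo: float = 0.0, hi: float = 1.0) -> float:
    """Map a knob position in [0, 1] linearly onto [lo, hi] (assumed mapping).

    Positions outside [0, 1] are clamped to the ends of the slider.
    """
    position = min(1.0, max(0.0, position))
    return lo + position * (hi - lo)


def preview_mask(saliency: List[List[float]], position: float) -> List[List[bool]]:
    """Recompute the keep-mask for the current knob position, as a preview would."""
    threshold = threshold_from_knob(position)
    return [[s > threshold for s in row] for row in saliency]


def masked_fraction(mask: List[List[bool]]) -> float:
    """Fraction of pixels that are masked (False) in the keep-mask."""
    cells = [keep for row in mask for keep in row]
    return cells.count(False) / len(cells)


if __name__ == "__main__":
    saliency = [[0.2, 0.4], [0.6, 0.8]]
    for pos in (0.1, 0.5, 0.7):
        print(pos, masked_fraction(preview_mask(saliency, pos)))
```

Raising the knob position raises the threshold, so a larger share of the image is masked and a smaller area remains for graph generation.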
[0023] (Technical effect) In this embodiment, the analysis unit 111 performs a saliency analysis process on the image data. Then, the generation unit 112 generates a graph structure relating to regions of the image data where the saliency is higher than a predetermined threshold, based on the results of the saliency analysis process. In other words, information about regions where the saliency is lower than the predetermined threshold is not included in the generated graph structure. Here, if the image data is used for model training, regions where the saliency is higher than the predetermined threshold can be said to be regions with relatively high relevance to training. Conversely, regions where the saliency is lower than the predetermined threshold can be said to be regions with relatively low relevance to training. Therefore, information about regions where the saliency is lower than the predetermined threshold can be said to be unnecessary for model training. Because the generation unit 112 generates a graph structure only for regions where the saliency is higher than the predetermined threshold, the information processing device 10 according to this embodiment can suppress the reflection of unnecessary information in the graph structure. As a result, the information processing device 10 can suppress the graph structure from becoming unnecessarily large.
[0024] Aspects of the invention derived from the above-described embodiment are described below.
[0025] An information processing device according to one aspect of the invention comprises an analysis means for performing a saliency analysis process on image data, and a generation means for generating a graph structure relating to regions in the image data whose saliency is higher than a predetermined threshold, based on the results of the saliency analysis process. In the above-described embodiment, the "analysis unit 111" corresponds to an example of the "analysis means," and the "generation unit 112" corresponds to an example of the "generation means."
[0026] In the information processing apparatus according to the above aspect, the generation means may generate a mask image in which regions of the image whose saliency is lower than the predetermined threshold are masked, and may generate the graph structure based on the mask image. With this configuration, a graph structure relating to regions in which the saliency is higher than the predetermined threshold can be generated relatively easily.
[0027] The information processing device according to the above embodiment may include a receiving means for receiving user input, and a changing means for changing the predetermined threshold in accordance with the user input received by the receiving means. With this configuration, the user can adjust the threshold relatively easily, which is advantageous in practice. In the above embodiment, the "input device 14" corresponds to an example of the "receiving means," and the "changing unit 113" corresponds to an example of the "changing means."
[0028] The present invention is not limited to the embodiments described above, and can be modified as appropriate without contradicting the gist or idea of the invention as can be read from the claims and specification as a whole. Information processing devices and other devices that undergo such modifications are also included within the technical scope of the present invention.
Explanation of Symbols
[0029] 10...Information processing device, 11...Arithmetic unit, 12...Storage device, 13...Communication device, 14...Input device, 15...Output device, 111...Analysis unit, 112...Generation unit, 113...Changing unit
Claims
1. An information processing device comprising: an analysis means for performing a saliency analysis process on image data; and a generation means for generating a graph structure relating to a region of the image data whose saliency is higher than a predetermined threshold, based on a result of the saliency analysis process.
2. The information processing device according to claim 1, wherein the generation means generates a mask image in which a region of the image whose saliency is lower than the predetermined threshold is masked, and generates the graph structure based on the mask image.
3. The information processing device according to claim 1 or 2, further comprising: a receiving means for receiving user input; and a changing means for changing the predetermined threshold in accordance with the user input received by the receiving means.