Model generation method, model generation device, inference device, and storage medium

By introducing a combined model of compression and inference modules into image processing, the inference accuracy of image sub-regions is improved by utilizing extended region information. This solves the problem of insufficient utilization of external feature information of sub-regions in existing technologies and achieves more efficient image processing results.

CN116612301B · Active · Publication Date: 2026-06-23 · TOYOTA JIDOSHA KK

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
TOYOTA JIDOSHA KK
Filing Date
2022-12-19
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing technologies struggle to effectively utilize external feature information of sub-regions to improve image inference accuracy, resulting in insufficient accuracy in sub-region inference processing.

Method used

The inference model, consisting of a compression module and an inference module, obtains extended region information from the input image, compresses it to generate compressed information, and combines the compressed information with sub-region information to perform inference. The model is trained so that its inference result matches the correct answer.

Benefits of technology

It improves the inference accuracy of image sub-regions, enabling more accurate inference of features within sub-regions and enhancing the efficiency and accuracy of image processing.

✦ Generated by Eureka AI based on patent content.

Abstract

The model generation method according to the present disclosure is an information processing method executed by a computer, and includes machine learning that implements an inference model using a plurality of training images. The inference model includes a compression module and an inference module configured to infer a solution to a task for a sub-region in an input image. The compression module is configured to generate compressed information by compressing information about an extended region that includes the sub-region and is wider than the sub-region. The inference module is configured to derive the solution to the task from information about the sub-region and the compressed information obtained by the compression module.

Description

Technical Field

[0001] This disclosure relates to a model generation method, a model generation apparatus, an inference apparatus, and a storage medium.

Background Technology

[0002] Japanese Unexamined Patent Application Publication No. 2019-8383 (JP2019-8383A) discloses an image processing apparatus that performs image processing using information from multi-resolution representation. Specifically, the image processing apparatus uses a first convolutional neural network to convert an input image into a first feature value. The image processing apparatus uses a second convolutional neural network to convert the input image into a second feature value. Furthermore, the image processing apparatus uses a third convolutional neural network to convert a third feature value, generated by adding the first and second feature values, into an output image.

Summary of the Invention

[0003] This disclosure provides a technique for improving the inference accuracy of sub-regions in an input image.

[0004] A first aspect of this disclosure relates to a model generation method, which is an information processing method performed by a computer. The information processing method includes: acquiring a plurality of training images; and using the acquired plurality of training images to perform machine learning on an inference model. The inference model includes a compression module and an inference module, the inference module being configured to infer a solution to a task for a sub-region in an input image. The compression module is configured to generate compressed information by acquiring information from the input image about an extended region that includes the sub-region and is wider than the sub-region, and compressing the acquired information about the extended region. The inference module is configured to deduce a solution to the task from the information about the sub-region obtained from the input image and the compressed information obtained by the compression module. Performing machine learning includes training the inference model such that the inference result obtained by the inference model by inputting each of the plurality of training images as an input image matches the correct answer to the task for a sub-region in each of the plurality of training images.

[0005] A second aspect of this disclosure relates to a model generation apparatus. The model generation apparatus includes a controller. The controller is configured to perform: acquiring a plurality of training images, and implementing machine learning of an inference model using the acquired plurality of training images. The inference model includes a compression module and an inference module, the inference module being configured to infer a solution to a task for a sub-region in an input image. The compression module is configured to generate compressed information by acquiring information from the input image about an extended region that includes the sub-region and is wider than the sub-region, and compressing the acquired information about the extended region. The inference module is configured to deduce a solution to the task from the information about the sub-region obtained from the input image and the compressed information obtained by the compression module. Performing machine learning includes training the inference model such that the inference result obtained by the inference model by inputting each of the plurality of training images as an input image matches the correct answer to the task for a sub-region in each of the plurality of training images.

[0006] A third aspect of this disclosure relates to an inference apparatus. The inference apparatus includes a controller. The controller is configured to perform: acquiring a target image, and inferring a solution to a task for the acquired target image using an inference model trained via machine learning. The inference model includes a compression module and an inference module, the inference module being configured to infer a solution to a task for a sub-region in an input image. The compression module is configured to generate compressed information by acquiring information from the input image about an extended region that includes the sub-region and is wider than the sub-region, and compressing the acquired information about the extended region. The inference module is configured to deduce a solution to the task from the information about the sub-region obtained from the input image and the compressed information obtained by the compression module. Inferring a solution to a task for the target image includes: acquiring a result obtained by inputting the target image as an input image into the trained inference model, and inferring a solution to the task from the trained inference model.

[0007] A fourth aspect of this disclosure relates to a storage medium storing an inference program. This inference program enables a computer to perform an information processing method. The information processing method includes: acquiring a target image; and inferring a solution to a task for the acquired target image using an inference model trained via machine learning. The inference model includes a compression module and an inference module, the inference module being configured to infer a solution to a task for a sub-region in an input image. The compression module is configured to generate compressed information by acquiring information about an extended region that includes the sub-region and is wider than the sub-region from the input image and compressing the acquired information about the extended region. The inference module is configured to derive a solution to the task from the information about the sub-region obtained from the input image and the compressed information obtained by the compression module. Inferring a solution to a task for the target image includes: acquiring a result obtained by inputting the target image as an input image into the trained inference model, and inferring a solution to the task from the trained inference model.

[0008] According to this disclosure, the inference accuracy of sub-regions in an input image can be improved.

Attached Figure Description

[0009] The features, advantages, and technical and industrial significance of exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which like reference numerals denote like elements, and wherein:

[0010] Figure 1 schematically illustrates an example of a scenario to which this disclosure is applied;

[0011] Figure 2 schematically illustrates an example of the configuration of the inference model according to an embodiment;

[0012] Figure 3 schematically illustrates an example of the computational processing of the compression module according to an embodiment;

[0013] Figure 4 schematically illustrates an example of the hardware configuration of the model generation apparatus according to an embodiment;

[0014] Figure 5 schematically illustrates an example of the hardware configuration of the inference device according to an embodiment;

[0015] Figure 6 schematically illustrates an example of the software configuration of the model generation apparatus according to an embodiment;

[0016] Figure 7 schematically illustrates an example of the software configuration of the inference device according to an embodiment;

[0017] Figure 8 is a flowchart illustrating an example of the processing procedure of the model generation apparatus according to an embodiment;

[0018] Figure 9 is a flowchart illustrating an example of the processing procedure of the inference device according to an embodiment;

[0019] Figure 10 shows the inference results of comparative examples for the evaluation images; and

[0020] Figure 11 shows the inference results of an example for an evaluation image.

Detailed Implementation

[0021] Using methods in existing technologies, information from multi-resolution representations can be expected to improve the accuracy of inference processing on input images. On the other hand, because of constraints such as the size of the input image and the computing power of the computer, the input image may be divided into sub-regions and inference processing may be performed on each sub-region. This narrows the range handled by the inference processing, thereby reducing the computational load and improving efficiency.

[0022] However, features appearing in a sub-region may be related to features existing outside that sub-region. For example, consider a scenario where the curvature of each part of an object is estimated. In this scenario, if the target part in the sub-region has a gentle curvature, the shape of the object extending from the outside of the sub-region to the target part may be useful for estimating the curvature of the target part.

[0023] In other words, the information used to infer features appearing in a sub-region may exist outside that sub-region. While existing methods can reference information within a sub-region in a multi-resolution representation, it is difficult to reference information existing outside that sub-region. Consequently, when performing inference processing on a sub-region, there is a possibility of degraded inference accuracy.

[0024] On the other hand, the model generation method according to one aspect of this disclosure is an information processing method executed by a computer, the information processing method comprising: acquiring a plurality of training images; and using the acquired plurality of training images to perform machine learning on an inference model. The inference model includes a compression module and an inference module, the inference module being configured to infer a solution to a task for a sub-region in an input image. The compression module is configured to generate compressed information by acquiring information from the input image about an extended region that includes the sub-region and is wider than the sub-region, and compressing the acquired information about the extended region. The inference module is configured to deduce a solution to the task from the information about the sub-region obtained from the input image and the compressed information obtained by the compression module. Performing machine learning includes training the inference model such that the inference result obtained by the inference model by inputting each of the plurality of training images as an input image matches the correct answer to the task for a sub-region in each of the plurality of training images.

[0025] Furthermore, according to another aspect of this disclosure, the inference apparatus includes a controller configured to perform: acquiring a target image, and inferring a solution to a task for the acquired target image using an inference model trained via machine learning. The inference model includes a compression module and an inference module, the inference module being configured to infer a solution to a task for a sub-region in the input image. The compression module is configured to generate compressed information by acquiring information about an extended region that includes the sub-region and is wider than the sub-region from the input image and compressing the acquired information about the extended region. The inference module is configured to deduce a solution to the task from the information about the sub-region obtained from the input image and the compressed information obtained by the compression module. Inferring a solution to the task for the target image includes: acquiring a result obtained by inputting the target image as an input image into the trained inference model, and inferring a solution to the task from the trained inference model.

[0026] According to each aspect of this disclosure, in the inference process, the inference model is configured to reference, in addition to information about sub-regions in the input image, compressed information obtained by the compression module as information about an extended region including regions outside the sub-regions. As a result, features appearing in the sub-regions can be inferred based on features existing outside the sub-regions, thereby potentially improving the accuracy of the inference process. Therefore, using the model generation method according to one aspect of this disclosure, a trained inference model capable of performing inference processing with high accuracy can be generated. Using the inference apparatus according to another aspect of this disclosure, the accuracy of inference processing for sub-regions in the input image can be improved by using such a trained inference model.

[0027] In the following description, embodiments relating to one aspect of this disclosure (hereinafter also referred to as "this embodiment") will be described with reference to the accompanying drawings. It should be noted that the embodiments described below are merely examples of this disclosure in all respects. Various modifications or variations can be made without departing from the scope of this disclosure. Specific configurations according to the embodiments may be appropriately adopted when implementing this disclosure. It should be noted that the data appearing in this embodiment is described in natural language, but more specifically, the data is specified in pseudo-language, commands, parameters, machine language, etc., that can be recognized by a computer.

[0028] 1 Application Example

[0029] Figure 1 schematically illustrates an example of a scenario to which this disclosure is applied. The inference system according to this embodiment includes a model generation device 1 and an inference device 2.

[0030] According to this embodiment, the model generation apparatus 1 is one or more computers configured to generate an inference model 5 that has been trained by implementing machine learning. Specifically, the model generation apparatus 1 acquires a plurality of training images 30. The model generation apparatus 1 uses the acquired plurality of training images 30 to implement machine learning on the inference model 5. As a result, the model generation apparatus 1 generates a trained inference model 5.

[0031] Figure 2 schematically shows an example configuration of the inference model 5 according to this embodiment. The inference model 5 according to this embodiment includes a compression module 50 and an inference module 55 for inferring a solution to a task for a sub-region in an input image 6. The compression module 50 is configured to obtain, from the input image 6, extended region information 61 about an extended region that includes the sub-region and is wider than the sub-region. Further, the compression module 50 is configured to generate compressed information 65 by compressing the obtained extended region information 61. The inference module 55 is configured to derive a solution to the task from the sub-region information 60 obtained from the input image 6 and the compressed information 65 obtained by the compression module 50.
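The data flow of the inference model 5 described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the `Box` tuple layout, the helper names `crop`, `mean`, and `run_inference_model`, and the stand-in `compress` and `infer` callables are all hypothetical and were introduced only for this example.

```python
from typing import Callable, List, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, height, width); hypothetical convention


def crop(image: Grid, box: Box) -> Grid:
    """Return the rows and columns of `image` covered by `box`."""
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]


def mean(grid: Grid) -> float:
    """Average of all values in a grid."""
    values = [v for row in grid for v in row]
    return sum(values) / len(values)


def run_inference_model(
    image: Grid,
    sub_box: Box,
    ext_box: Box,
    compress: Callable[[Grid], Grid],
    infer: Callable[[Grid, Grid], float],
) -> float:
    """Feed sub-region information and compressed extended-region information to `infer`."""
    sub_info = crop(image, sub_box)     # sub-region information 60
    ext_info = crop(image, ext_box)     # extended region information 61
    compressed = compress(ext_info)     # compressed information 65
    return infer(sub_info, compressed)  # solution to the task


# Toy usage: the stand-in task compares the sub-region mean with the context mean.
image = [[float(6 * r + c) for c in range(6)] for r in range(6)]
result = run_inference_model(
    image,
    sub_box=(2, 2, 2, 2),   # 2x2 sub-region
    ext_box=(0, 0, 4, 4),   # 4x4 extended region that contains the sub-region
    compress=lambda ext: [[mean(ext)]],
    infer=lambda sub, comp: mean(sub) - comp[0][0],
)
```

In a real system both callables would presumably be trained models; the point of the sketch is only that the inference step receives two inputs, one of which summarizes a wider area than the sub-region itself.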

[0032] As shown in Figure 1, implementing machine learning includes training the inference model 5 such that the inference result obtained by the inference model 5 by inputting each training image 30 as an input image 6 matches the correct answer to the task for a sub-region of each training image 30. Through this machine learning, a trained inference model 5 that has acquired the ability to perform the inference task on sub-regions in images can be generated.

[0033] On the other hand, as shown in Figure 1, the inference device 2 according to this embodiment is one or more computers configured to infer solutions to a task for a sub-region in an image using a trained inference model 5. Specifically, the inference device 2 acquires a target image 221. The inference device 2 infers solutions to a task for the acquired target image 221 using an inference model 5 trained via machine learning.

[0034] As shown in Figure 1 and Figure 2, inferring the solution to the task for the target image 221 includes inputting the target image 221 as the input image 6 into the trained inference model 5 and acquiring the inference result from the trained inference model 5. Through this operation, the inference device 2 obtains the result of inferring the solution to the task for the target image 221. Furthermore, the inference device 2 outputs information about the result obtained by inferring the solution to the task.

[0035] As described above, in this embodiment, the inference model 5 includes a compression module 50. Consequently, in the inference processing of the inference module 55, in addition to referring to the sub-region information 60 of the input image 6, the inference model 5 can also refer to information (compressed information 65) about an extended region including the region outside the sub-region. As a result, features appearing in the sub-region can be inferred based on features existing outside the sub-region, thereby potentially improving the accuracy of the inference processing. Therefore, according to the model generation apparatus 1 of this embodiment, a trained inference model 5 capable of performing inference processing with high accuracy can be generated. Using the inference apparatus 2 of this embodiment, by using this trained inference model 5, the accuracy of inference processing for sub-regions in the input image 6 (target image 221) can be improved.

[0036] It should be noted that the data format of the images (training image 30 and target image 221) is not particularly limited, but can be appropriately selected according to the embodiments. The images can consist of general image data including multiple pixels, or they can consist of data that can be output in an image format (such as point cloud data, computer-aided design (CAD) data, map data, and simulation data).

[0037] Images can be acquired by sensors such as cameras, light detection and ranging (LiDAR; also expanded as laser imaging detection and ranging) sensors, millimeter-wave radar, infrared sensors, or ultrasonic sensors. Images can also be generated through computational processing on a computer, such as simulations (e.g., of fluid, temperature, heat flow rate, and strain) or computer-aided engineering (CAE). Images can be obtained by simulating sensor operations. Training images 30 can be obtained by applying any computational processing (e.g., processing related to data augmentation) to data obtained by any method. Processing related to data augmentation can be, for example, rotation or translation. Images can be configured to represent two-dimensional or three-dimensional space.
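As a small illustration of the rotation and translation processing mentioned above, the following sketch derives additional training images from a single source image. The function names, the choice of a 90-degree clockwise rotation, and the zero fill value for pixels shifted in from outside the image are assumptions made for this example; the source does not specify them.

```python
from typing import List

Grid = List[List[float]]


def rotate_90(image: Grid) -> Grid:
    """Rotate the image clockwise by 90 degrees."""
    return [list(row) for row in zip(*image[::-1])]


def translate(image: Grid, dy: int, dx: int, fill: float = 0.0) -> Grid:
    """Shift the image by (dy, dx); uncovered cells take the `fill` value (assumption)."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            nr, nc = r + dy, c + dx
            if 0 <= nr < h and 0 <= nc < w:
                out[nr][nc] = image[r][c]
    return out


source = [[1.0, 2.0], [3.0, 4.0]]
augmented = [source, rotate_90(source), translate(source, 0, 1)]
```

If the correct answer for a training image is tied to pixel positions, the same transformation would also have to be applied to that answer.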

[0038] The compression module 50 is not particularly limited, as long as it can perform the computational processing of generating compressed information 65 from the extended region information 61, and can be appropriately configured according to embodiments. In one example, the compression module 50 may be constituted by a machine learning model including one or more parameters for performing the computational processing of generating compressed information 65 from the extended region information 61, wherein the one or more parameters have values adjusted by machine learning. The type of machine learning model constituting the compression module 50 is not particularly limited, and can be appropriately selected according to embodiments. For example, a neural network can be used for the machine learning model constituting the compression module 50. The structure of the neural network (e.g., the number of layers, the type of each layer, the number of nodes included in each layer, or the connection relationships between nodes in the layers) can be appropriately determined according to embodiments. When a neural network is used for the compression module 50, the weights of the connections between each node, the threshold of each node, etc., are examples of parameters of the compression module 50.

[0039] Furthermore, the inference module 55 is not particularly limited, as long as it can perform computational processing to derive the solution from the sub-region information 60 and the compressed information 65, and can be appropriately configured according to the embodiment. In this embodiment, the inference module 55 may be constituted by a machine learning model including one or more parameters for performing computational processing to derive the solution from the sub-region information 60 and the compressed information 65, wherein the one or more parameters have values adjusted by machine learning. The type of machine learning model constituting the inference module 55 is not particularly limited, and can be appropriately selected according to the embodiment. For example, a neural network can be used for the machine learning model constituting the inference module 55. The structure of the neural network constituting the inference module 55 can be appropriately determined according to the embodiment. When a neural network is used for the inference module 55, the weights of the connections between each node, the threshold of each node, etc., are examples of the parameters of the inference module 55. The format of the output of the inference module 55 is not particularly limited, as long as the output shows the inference result, and can be appropriately determined according to the embodiment.

[0040] It should be noted that when both the compression module 50 and the inference module 55 are composed of machine learning models, the compression module 50 and the inference module 55 can be configured as a single unit. Furthermore, training the inference model 5 by the model generation device 1 may include adjusting the values of one or more parameters of the compression module 50 and the inference module 55 such that the inference result obtained by the inference model 5 by inputting each training image 30 as an input image 6 matches the correct answer to the task for each training image 30.
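The parameter adjustment described above can be illustrated with a deliberately small training loop. Everything specific here is an assumption for the sake of a runnable sketch: the source names neither a loss function nor an optimizer, so mean squared error and plain gradient descent are swapped in, each region is reduced to a single mean value, and only a two-weight linear "inference module" is trained (the compression step has no learnable parameters in this toy).

```python
from typing import List, Tuple

# Each sample: (mean of sub-region info, mean of extended-region info, correct answer).
# Reducing each region to its mean stands in for the compression step (assumption).
Sample = Tuple[float, float, float]


def predict(weights: List[float], sub_feat: float, ctx_feat: float) -> float:
    """Toy inference module: a linear combination of the two inputs."""
    return weights[0] * sub_feat + weights[1] * ctx_feat


def mse(weights: List[float], samples: List[Sample]) -> float:
    """Mean squared error between the inference results and the correct answers."""
    return sum((predict(weights, s, c) - y) ** 2 for s, c, y in samples) / len(samples)


def train(samples: List[Sample], lr: float = 0.1, steps: int = 200) -> List[float]:
    """Adjust the weights by gradient descent so the predictions match the answers."""
    weights = [0.0, 0.0]
    n = len(samples)
    for _ in range(steps):
        grad = [0.0, 0.0]
        for s, c, y in samples:
            err = predict(weights, s, c) - y
            grad[0] += 2.0 * err * s / n
            grad[1] += 2.0 * err * c / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Hypothetical training data whose correct answers depend on both inputs.
samples = [(1.0, 0.0, 1.0), (0.0, 1.0, 2.0), (1.0, 1.0, 3.0)]
trained = train(samples)
```

The second weight ends up non-zero because the correct answers depend on the context feature, which mirrors the motivation for giving the inference module information from outside the sub-region.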

[0041] The content of the compression processing performed by the compression module 50 is not particularly limited, as long as the compression processing reduces the data size of the extended region information 61, and can be appropriately determined according to the embodiment. In one example, the compression module 50 can be configured to obtain compressed information 65 through simple compression processing. In another example, the compression module 50 can be configured to obtain compressed information 65 through data reduction and restoration.

[0042] Figure 3 schematically illustrates an example of the computational processing of the compression module 50 according to this embodiment. It should be noted that in Figure 3, for ease of description, sub-region information 60 is represented by 4×4 data, and extended region information 61 is represented by 8×8 data. The sub-region (4×4) is located at the center of the extended region (8×8). These representations are merely examples and do not limit this disclosure.

[0043] In the example of Figure 3, the compression module 50 first reduces the size of the extended region information 61 (8×8) by performing a pooling operation of size 2×2, obtaining information 62 with the same size (4×4) as the sub-region information 60. In information 62, the 2×2 region at the center corresponds to the sub-region after the pooling operation. Next, the compression module 50 obtains information 63 by averaging the corresponding values included in information 62 through a convolution operation. Subsequently, the compression module 50 obtains information 64 of size 2×2 by cutting out the central 2×2 region from information 63. Then, the compression module 50 obtains compressed information 65 of size 4×4 by performing a scaling (upsampling) operation of size 2×2 on information 64. Although the compression process of Figure 3 includes cutting out a 2×2 region, the pooling and convolution operations are performed before the cutting operation, so the resulting compressed information 65 can represent the features of the entire extended region.
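The four steps of Figure 3 (pooling, averaging convolution, central crop, upsampling) can be traced with the sketch below. Several details are assumptions, since the source does not state them: average pooling is used as the pooling operation, the convolution is a 3×3 box filter that averages only in-bounds neighbours at the border, and upsampling is nearest-neighbour. The function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def avg_pool_2x2(x: Grid) -> Grid:
    """Halve each spatial dimension by averaging non-overlapping 2x2 blocks."""
    return [
        [(x[r][c] + x[r][c + 1] + x[r + 1][c] + x[r + 1][c + 1]) / 4.0
         for c in range(0, len(x[0]), 2)]
        for r in range(0, len(x), 2)
    ]


def box_filter_3x3(x: Grid) -> Grid:
    """Same-size averaging convolution; border cells average in-bounds neighbours only."""
    h, w = len(x), len(x[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            vals = [x[i][j]
                    for i in range(max(0, r - 1), min(h, r + 2))
                    for j in range(max(0, c - 1), min(w, c + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def center_crop(x: Grid, size: int) -> Grid:
    """Cut out the central size x size block."""
    top = (len(x) - size) // 2
    left = (len(x[0]) - size) // 2
    return [row[left:left + size] for row in x[top:top + size]]


def upsample_nearest_2x(x: Grid) -> Grid:
    """Double each spatial dimension by repeating every value in a 2x2 block."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def compress(ext_info: Grid) -> Grid:
    """Extended region information 61 (8x8) -> compressed information 65 (4x4)."""
    info_62 = avg_pool_2x2(ext_info)     # 8x8 -> 4x4
    info_63 = box_filter_3x3(info_62)    # 4x4 -> 4x4, neighbourhood averages
    info_64 = center_crop(info_63, 2)    # 4x4 -> 2x2
    return upsample_nearest_2x(info_64)  # 2x2 -> 4x4


ext_info = [[float(8 * r + c) for c in range(8)] for r in range(8)]
compressed = compress(ext_info)
```

With a 3×3 kernel on the 4×4 grid, the four central cells together draw on every cell of information 62, which is one way the cropped result can still reflect the whole extended region.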

[0044] Pooling operations can be performed by pooling layers. Convolution operations can be performed by convolutional layers. Scaling operations can be performed by scaling layers. Therefore, as an example, the compression module 50 can be composed of a neural network with three layers: a pooling layer, a convolutional layer, and a scaling layer. The compression module 50 can thereby be configured to perform the computational processing shown in Figure 3. It should be noted that the content of the compression process is not limited to this example; the processing content of Figure 3 can be appropriately changed according to the embodiments.

[0045] It should be noted that the number of dimensions (information content, compression ratio, and data size) of the compressed information 65 is not particularly limited and can be appropriately determined according to the embodiments. In one example, the compressed information 65 can be configured to have the same dimensions as the sub-region. That is, the compression module 50 can be configured to generate compressed information 65 with the same dimensions as the sub-region. Specifically, as shown in Figure 3, the compressed information 65 can be configured to represent a spatial region with the same size (same number of features and same spatial extent) as the sub-region.

[0046] In this case, the compressed information 65 can be integrated with the sub-region information 60 before inference processing is performed. That is, as shown in Figure 3, the inference module 55 can be configured to generate integrated information 67 by integrating sub-region information 60 and compressed information 65, and to derive the solution to the task from the generated integrated information 67. With a configuration that processes the integrated information 67, the number of parameters of the inference module 55 can be reduced compared to the case where sub-region information 60 and compressed information 65 are processed separately. As a result, the efficiency of the inference process can be improved. It should be noted that the integration process can consist of any processing, such as addition, averaging, weighted averaging, or appending the information along another dimension (e.g., adding channels to feature maps).
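The integration options listed above are simple element-wise operations once the two pieces of information share the same size. The sketch below shows three of them; the function names and the default weight are assumptions for this example.

```python
from typing import List

Grid = List[List[float]]


def integrate_add(sub_info: Grid, compressed: Grid) -> Grid:
    """Element-wise addition."""
    return [[s + c for s, c in zip(s_row, c_row)]
            for s_row, c_row in zip(sub_info, compressed)]


def integrate_weighted(sub_info: Grid, compressed: Grid, weight: float = 0.5) -> Grid:
    """Element-wise weighted average; weight=0.5 gives the plain average."""
    return [[weight * s + (1.0 - weight) * c for s, c in zip(s_row, c_row)]
            for s_row, c_row in zip(sub_info, compressed)]


def integrate_as_channels(sub_info: Grid, compressed: Grid) -> List[Grid]:
    """Keep both grids, stacked along a new leading channel dimension."""
    return [sub_info, compressed]


sub_info = [[1.0, 2.0], [3.0, 4.0]]
compressed = [[3.0, 2.0], [1.0, 0.0]]
summed = integrate_add(sub_info, compressed)
averaged = integrate_weighted(sub_info, compressed)
stacked = integrate_as_channels(sub_info, compressed)
```

Addition and averaging keep the input to the inference module at the size of the sub-region, whereas channel stacking doubles it but preserves both sources separately.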

[0047] A sub-region is a portion of the input image 6. Sub-regions can be appropriately specified in the input image 6. In one example, inference model 5 can repeatedly perform inference processing on the input image 6 while changing the extent designated as the sub-region (e.g., shifting the extent by a predetermined amount). That is, the input image 6 can be divided into multiple extents, and inference model 5 can designate each extent of the input image 6 as a sub-region and perform inference processing on each sub-region (extent). As a result, inference model 5 can be configured to perform inference processing on multiple extents (e.g., the entire extent) of the input image 6. It should be noted that the processing scope of inference model 5 is not limited to such an example. Inference model 5 need not perform inference processing on some portion of the extent of the input image 6. When multiple sub-regions are specified in the input image 6, at least some of the sub-regions can be specified so as to overlap adjacent sub-regions. Alternatively, sub-regions can be specified so as not to overlap each other.
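Shifting the sub-region by a predetermined amount can be sketched as a generator of boxes. The `Box` layout, the function name, and the restriction to square sub-regions are assumptions for this example; a stride equal to the sub-region size gives non-overlapping sub-regions, and a smaller stride gives overlapping ones.

```python
from typing import Iterator, Tuple

Box = Tuple[int, int, int, int]  # (top, left, height, width); hypothetical convention


def iter_sub_regions(image_h: int, image_w: int, size: int, stride: int) -> Iterator[Box]:
    """Yield square sub-regions of side `size`, shifting by `stride` each step."""
    for top in range(0, image_h - size + 1, stride):
        for left in range(0, image_w - size + 1, stride):
            yield (top, left, size, size)


non_overlapping = list(iter_sub_regions(8, 8, size=4, stride=4))
overlapping = list(iter_sub_regions(8, 8, size=4, stride=2))
```

Each yielded box would then be passed, together with its extended region, to one inference pass.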

[0048] An extended region is a region that includes a sub-region and is wider than that sub-region. An extended region can be specified by extending the sub-region in any direction. The extension direction is not particularly limited and can be appropriately selected according to the embodiment. As an example, candidates for extension directions are four directions (up, down, right, and left) in two-dimensional space and six directions (up, down, right, left, front, and back) in three-dimensional space. An extended region can be specified by selecting a direction for extending the sub-region from multiple candidate extension directions and extending the sub-region in the selected direction. In one example, an extended region can be specified by extending the sub-region in only some of the candidate extension directions.

[0049] In another example, as shown in Figure 2, the extended region can be configured to include a peripheral region S that surrounds the entire perimeter of the sub-region. That is, the extended region can be specified by extending the sub-region in all extendable directions. In this case, as shown in Figure 3, by uniformly extending the sub-region in all directions, the extended region can be specified so that the sub-region lies at its center. It should be noted that the method of specifying the extended region is not limited to this example. In another example, the extension amount of the sub-region can be biased toward some directions. As described above, with the configuration in which the extended region includes the peripheral region S, information can be obtained from all directions surrounding the sub-region. As a result, in the inference processing of inference model 5, the probability of missing information about features related to features appearing in the sub-region can be reduced, thereby potentially improving the accuracy of the inference processing. It should be noted that the order of specifying the sub-region and the extended region is not particularly restricted. In one example, the extended region can be specified after the sub-region. In another example, the sub-region can be specified after the extended region.

[0050] The size of the extended region need not be particularly limited and can be appropriately determined according to the embodiments. In one example, the size of the extended region can be up to eight times the size of the sub-region. By setting an upper limit on the size of the extended region in this way, the increase in computational load and cost of the compression processing performed by the compression module 50 can be suppressed, thereby improving the processing efficiency of the inference model 5.
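
The upper limit can be checked numerically. The sketch below assumes that "size" means area (the disclosure does not say whether area or side length is meant) and that the extension is uniform; the function names `area_ratio` and `max_uniform_margin` are hypothetical.

```python
def area_ratio(sub_w: int, sub_h: int, ext_w: int, ext_h: int) -> float:
    """Area of the extended region divided by the area of the sub-region."""
    return (ext_w * ext_h) / (sub_w * sub_h)


def max_uniform_margin(sub_w: int, sub_h: int, max_ratio: float = 8.0) -> int:
    """Largest uniform margin (in pixels) that keeps the area ratio within max_ratio."""
    margin = 0
    # Try the next larger margin and stop as soon as it would exceed the limit.
    while area_ratio(sub_w, sub_h,
                     sub_w + 2 * (margin + 1),
                     sub_h + 2 * (margin + 1)) <= max_ratio:
        margin += 1
    return margin


margin = max_uniform_margin(20, 20)
ratio = area_ratio(20, 20, 20 + 2 * margin, 20 + 2 * margin)
```

Under these assumptions, a 20 x 20 sub-region can be extended by at most 18 pixels on each side, giving a 56 x 56 extended region.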

[0051] Sub-region information 60 and extended region information 61 can be appropriately obtained from the input image 6. For example, as shown in Figure 3, each piece of information (60, 61) can indicate the feature quantity of each element (e.g., a pixel or point) included in the corresponding region. The feature quantity can be, for example, a pixel value, a measurement, a value indicating the probability of presence, or a value calculated through any computational processing. In one example, each piece of information (60, 61) can be obtained directly from the input image 6. In another example, each piece of information (60, 61) can be obtained by performing any computational processing on the input image 6.
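
A minimal sketch of both ways of obtaining the information, assuming a grayscale image stored as a nested list of pixel values. The `crop` helper, the toy image, and the scaling by 255 are illustrative assumptions only.

```python
def crop(image, left, top, right, bottom):
    """Return rows top..bottom-1 and columns left..right-1 of a 2D list."""
    return [row[left:right] for row in image[top:bottom]]


# Toy 6 x 6 image in which each pixel value is 10 * row + column.
image = [[10 * r + c for c in range(6)] for r in range(6)]

# Information obtained directly from the input image (raw pixel values).
sub_info = crop(image, 2, 2, 4, 4)   # 2 x 2 sub-region
ext_info = crop(image, 1, 1, 5, 5)   # 4 x 4 extended region containing it

# Information obtained through computational processing (here, scaling to [0, 1]).
sub_info_scaled = [[value / 255 for value in row] for row in sub_info]
```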

[0052] The type of task is not particularly limited, as long as the task involves inferring features appearing in a sub-region of an image (e.g., attributes of objects appearing within that region), and can be appropriately selected according to the embodiment. The inference can be recognition or regression, and can include prediction. The type of image can be appropriately selected according to the task.

[0053] In one example, each training image 30 and target image 221 can depict an object. The task can be to infer the attributes of objects appearing in sub-regions. As a result, the accuracy of the inference processing can be improved in scenarios where object attributes are inferred. It should be noted that, as an example of an application scenario, the task of inferring object attributes can be employed in a scenario where items are inspected. In this case, the attributes of the item being inferred can be, for example, the R-value, the state of other parts, or the presence or absence of defects. As another example, the task of inferring object attributes can be employed in a scenario where objects are measured by onboard sensors. In this case, the onboard sensors can be arranged facing the interior of the vehicle, and the object can be present in the vehicle, such as an occupant. Alternatively, the onboard sensors can be arranged facing the exterior of the vehicle, and the object can be present outside the vehicle, such as obstacles (people or objects) and traffic-related objects (e.g., road surfaces and traffic lights). The attributes of the object can be, for example, the state of the occupant, the state of the obstacle, or the state of the road surface.

[0054] In another example, each training image 30 and target image 221 can be acquired by an onboard sensor. The onboard sensor can be, for example, a camera, LiDAR, millimeter-wave radar, infrared sensor, or ultrasonic sensor. The task can be to infer features appearing within the observation range of the onboard sensor. As a result, the accuracy of inference processing can be improved in scenarios where features of targets observed by the onboard sensor are inferred. It should be noted that inferring features appearing within the observation range can be, for example, inferring the occurrence of an event or inferring the attributes of objects existing within the observation range. The onboard sensor can be positioned facing the interior or exterior of the vehicle. The event that serves as the inference target can be, for example, an event involving an occupant or the appearance of an obstacle. Furthermore, the object whose attributes are inferred can exist inside or outside the vehicle and can be, for example, an occupant, an obstacle, or a traffic-related object. The inferred object attributes can be, for example, the state of the occupant, the state of the obstacle, or the state of the road surface.

[0055] In machine learning, the correct answer (truth value) for a task can be appropriately provided. In one example, the correct answer to the task can be provided via a correct answer label (teacher signal). The correct answer (truth value) indicated by the correct answer label can be provided manually or obtained through any inference processing by the computer. In this case, each training image 30 can be obtained in the format of a dataset along with the correct answer label. In another example, the correct answer to the task can be provided as a training metric through any rule, etc.

[0056] Furthermore, in one example, as shown in Figure 1, the model generation device 1 and the inference device 2 can be connected to each other via a network. The type of network can be appropriately selected from, for example, the Internet, a wireless communication network, a mobile communication network, a telephone network, or a dedicated network. It should be noted that the method of exchanging data between the model generation device 1 and the inference device 2 is not limited to this example and can be appropriately selected according to the embodiment. In another example, data can be exchanged between the model generation device 1 and the inference device 2 using a storage medium.

[0057] 2 Configuration Examples

[0058] Hardware configuration example

[0059] Model generation device

[0060] Figure 4 schematically shows an example of the hardware configuration of the model generation device 1 according to this embodiment. As shown in Figure 4, the model generation device 1 according to this embodiment is a computer in which the controller 11, storage unit 12, communication interface 13, external interface 14, input device 15, output device 16, and drive 17 are electrically connected to each other.

[0061] The controller 11 includes a central processing unit (CPU), random access memory (RAM), read-only memory (ROM), etc., and is configured to perform information processing based on programs and various data. The controller 11 (CPU) is an example of a processor resource. The storage unit 12 is, for example, composed of a hard disk drive and a solid-state drive. The storage unit 12 is an example of a memory resource. In this embodiment, the storage unit 12 stores various information such as the model generation program 81, multiple training images 30, learning result data 125, etc.

[0062] The model generation program 81 is a program that causes the model generation device 1 to perform the machine learning information processing (described below with reference to Figure 8) for generating a trained inference model 5. The model generation program 81 includes a series of commands for this information processing. The multiple training images 30 are used for machine learning of the inference model 5. The learning result data 125 can be appropriately configured to indicate the trained inference model 5 generated through machine learning.

[0063] Communication interface 13 is, for example, a wired local area network (LAN) module or a wireless LAN module, and is an interface for performing wired or wireless communication via a network. External interface 14 is, for example, a Universal Serial Bus (USB) port or a dedicated port, and is an interface for connecting to external devices. The type and number of external interfaces 14 can be arbitrarily determined. The model generation device 1 can perform data communication with another information processing device via a network using communication interface 13. Furthermore, training images 30 can be acquired by a sensor. In this case, model generation device 1 can be connected to the sensor via communication interface 13 or external interface 14.

[0064] The input device 15 is, for example, a device for performing input, such as a mouse or keyboard. The output device 16 is, for example, a device for performing output, such as a display or speaker. An operator can operate the model generation device 1 by using the input device 15 and the output device 16. The training images 30 can be acquired through operator input via the input device 15. The input device 15 and the output device 16 can be integrated into a single unit, for example, a touch-panel display.

[0065] The drive 17 is a device for reading various information, such as a program, stored in the storage medium 91. At least one of the model generation program 81 and the plurality of training images 30 can be stored in the storage medium 91, and the model generation device 1 can obtain at least one of them from the storage medium 91. The storage medium 91 is a medium that accumulates information such as a program through electrical, magnetic, optical, mechanical, or chemical action, so that computers, other devices, machines, etc., can read the stored information.

[0066] Here, Figure 4 shows a disc-type storage medium, such as a CD or DVD, as an example of the storage medium 91. However, the type of the storage medium 91 is not limited to the disc type. Examples of storage media other than the disc type include semiconductor memory such as flash memory. The type of the drive 17 can be appropriately selected depending on the type of the storage medium 91.

[0067] It should be noted that, regarding the specific hardware configuration of the model generation device 1, components may be appropriately omitted, replaced, or added according to the embodiment. For example, the controller 11 may include multiple hardware processors. The hardware processors may consist of a microprocessor, an electronic control unit (ECU), a field-programmable gate array (FPGA), or a graphics processing unit (GPU). At least one of the communication interface 13, external interface 14, input device 15, output device 16, and drive 17 may be omitted. The model generation device 1 may consist of multiple computers. In this case, the hardware configurations of these computers may be identical or different. Furthermore, the model generation device 1 may be an information processing device specifically designed for the services provided, or it may be a general-purpose server device, a personal computer (PC), etc.

[0068] Inference device

[0069] Figure 5 schematically shows an example of the hardware configuration of the inference device 2 according to this embodiment. As shown in Figure 5, the inference device 2 according to this embodiment is a computer in which the controller 21, storage unit 22, communication interface 23, external interface 24, input device 25, output device 26, and drive 27 are electrically connected to each other.

[0070] The controller 21 through the drive 27 and the storage medium 92 of the inference device 2 can be configured in the same manner as the controller 11 through the drive 17 and the storage medium 91 of the model generation device 1, respectively. The controller 21 includes a CPU, RAM, ROM, etc., as a hardware processor, and is configured to perform various information processing based on programs and data. The storage unit 22 is composed of, for example, a hard disk drive and a solid-state drive. In this embodiment, the storage unit 22 stores various information such as the inference program 82 and the learning result data 125.

[0071] The inference program 82 is a program that causes the inference device 2 to perform the information processing (described below with reference to Figure 9) of carrying out an inference task on an image using the trained inference model 5. The inference program 82 includes a series of commands for this information processing. At least one of the inference program 82 and the learning result data 125 can be stored in the storage medium 92. Therefore, the inference device 2 can retrieve at least one of the inference program 82 and the learning result data 125 from the storage medium 92.

[0072] It should be noted that, regarding the specific hardware configuration of the inference device 2, components may be appropriately omitted, replaced, or added according to the embodiment. For example, the controller 21 may include multiple hardware processors. The hardware processors may be composed of microprocessors, ECUs, FPGAs, GPUs, etc. At least one of the communication interface 23, external interface 24, input device 25, output device 26, and drive 27 may be omitted. The inference device 2 may be composed of multiple computers. In this case, the hardware configurations of these computers may be identical or different. Furthermore, in addition to being an information processing device specifically designed for the services provided, the inference device 2 may also be a general-purpose server device, a general-purpose PC, a tablet PC, a portable terminal (e.g., a smartphone), an in-vehicle device, etc.

[0073] Software configuration example

[0074] Model generation device

[0075] Figure 6 schematically shows an example of the software configuration of the model generation device 1 according to this embodiment. The controller 11 of the model generation device 1 expands the model generation program 81 stored in the storage unit 12 into RAM, and the CPU executes the commands included in the model generation program 81 expanded in RAM. As a result, the model generation device 1 according to this embodiment operates as a computer including a data acquisition unit 111, a learning processing unit 112, and a storage processing unit 113 as software modules. That is, in this embodiment, each software module of the model generation device 1 is implemented by the controller 11 (CPU).

[0076] The data acquisition unit 111 is configured to acquire a plurality of training images 30. The learning processing unit 112 is configured to use the acquired plurality of training images 30 to implement machine learning of the inference model 5. In this embodiment, the inference model 5 includes a compression module 50 and an inference module 55. Implementing machine learning includes training the inference model 5 such that the inference result obtained by the inference module 55 by inputting each training image 30 as an input image 6 into the inference model 5 matches the correct answer to the task for a sub-region in each training image 30.

[0077] The inference model 5 consists of a machine learning model with parameters adjusted through machine learning. Training the inference model 5 includes adjusting (optimizing) the values of the parameters included in the inference model 5 so that an output (inference result) matching the correct answer can be derived from each training image 30. In this embodiment, both the compression module 50 and the inference module 55 may include one or more parameters, and adjusting the values of the parameters of the inference model 5 may include adjusting the values of the one or more parameters of each of the compression module 50 and the inference module 55. The machine learning method can be appropriately selected according to the type of machine learning model to be used. As a machine learning method, for example, backpropagation or a method of solving an optimization problem can be used.

[0078] In this embodiment, the inference model 5 (compression module 50 and inference module 55) can be constructed from a neural network. In this case, by inputting each training image 30 as the input image 6 into the inference model 5 and performing forward computation on the compression module 50 and the inference module 55, the inference result for a sub-region in each training image 30 can be obtained as the output of the inference module 55. The learning processing unit 112 is configured to adjust, during machine learning processing, the values of the parameters of the inference model 5 so that the error between the inference result obtained for a sub-region in each training image 30 and the correct answer becomes small.

[0079] The storage processing unit 113 is configured to generate information about the inference model 5, which has been trained and generated through machine learning, as learning result data 125, and stores the generated learning result data 125 in a predetermined storage area. The configuration of the learning result data 125 is not particularly limited, as long as the trained inference model 5 can be reproduced, and can be appropriately determined according to the embodiments. As an example, the learning result data 125 may include information indicating the value of each parameter obtained through adjustments via machine learning. In some cases, the learning result data 125 may include information indicating the structure of the inference model 5. For example, the structure may be specified by the number of layers, the type of each layer, the number of nodes included in each layer, and the connection relationships between nodes in adjacent layers.
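
One possible layout for such data is sketched below using JSON. The disclosure does not prescribe a file format; the key names, layer descriptions, and parameter values here are hypothetical placeholders chosen only to show that structure and parameter values together suffice to reproduce a model.

```python
import json


def save_learning_result(structure: dict, parameters: dict) -> str:
    """Serialize the model structure and parameter values to a JSON string."""
    return json.dumps({"structure": structure, "parameters": parameters}, indent=2)


def load_learning_result(text: str) -> tuple:
    """Recover the structure and parameter values needed to rebuild the model."""
    data = json.loads(text)
    return data["structure"], data["parameters"]


# Hypothetical two-module model: layer types, node counts, and values are made up.
structure = {
    "compression_module": [{"type": "dense", "nodes": 4}],
    "inference_module": [{"type": "dense", "nodes": 8}, {"type": "dense", "nodes": 1}],
}
parameters = {
    "compression_module": [[0.1, -0.2, 0.3, 0.0]],
    "inference_module": [[0.5, 0.25], [1.0]],
}

text = save_learning_result(structure, parameters)
restored_structure, restored_parameters = load_learning_result(text)
```

When the structure is shared between devices in advance, the `structure` entry could be omitted and only the parameter values stored.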

[0080] Inference device

[0081] Figure 7 schematically shows an example of the software configuration of the inference device 2 according to this embodiment. The controller 21 of the inference device 2 expands the inference program 82 stored in the storage unit 22 into RAM, and the CPU executes the commands included in the inference program 82 expanded in RAM. As a result, the inference device 2 according to this embodiment operates as a computer including an acquisition unit 211, an inference unit 212, and an output unit 213 as software modules. That is, in this embodiment, each software module of the inference device 2 is implemented by the controller 21 (CPU) in the same manner as in the model generation device 1.

[0082] The acquisition unit 211 is configured to acquire the target image 221. By holding the learning result data 125, the inference unit 212 includes the inference model 5 trained by machine learning. The inference unit 212 is configured to infer a solution to a task for the acquired target image 221 using the trained inference model 5. Inferring a solution to a task for the target image 221 includes inputting the target image 221 as the input image 6 into the trained inference model 5 and acquiring, from the inference module 55 of the trained inference model 5, the result of inferring the solution to the task. In this embodiment, the compression module 50 and the inference module 55 may each be configured to have one or more parameters. Furthermore, the one or more parameters of each of the compression module 50 and the inference module 55 may be adjusted by machine learning such that the inference result obtained by inputting each training image 30 as the input image 6 into the inference model 5 matches the correct answer to the task for a sub-region in each training image 30. The output unit 213 is configured to output information about the result obtained by inferring the solution to the task.

[0083] Other

[0084] It should be noted that in this embodiment, an example is described where each software module of the model generation device 1 and the inference device 2 is implemented by a general-purpose CPU. However, some or all of these software modules may be implemented by one or more dedicated processors. The aforementioned modules can also be implemented as hardware modules. Furthermore, for the software configuration of the model generation device 1 and the inference device 2, modules may be appropriately omitted, replaced, or added according to the embodiments.

[0085] 3 Operations

[0086] Model generation device

[0087] Figure 8 is a flowchart illustrating an example of the processing procedure of the model generation device 1 according to this embodiment. The processing procedure of the model generation device 1 described below is an example of a model generation method. However, the processing procedure described below is merely an example, and each step can be modified to the extent possible. In addition, steps of the processing procedure below can be appropriately omitted, replaced, and added according to the embodiment.

[0088] Step S101

[0089] In step S101, the controller 11 operates as a data acquisition unit 111 to acquire multiple training images 30.

[0090] Each training image 30 can be generated appropriately. Training images 30 can be acquired by a sensor. Training images 30 can be generated by computer operation. Training images 30 can be obtained through simulation. One or more new training images 30 can be generated by applying data augmentation processing (e.g., rotation or translation) to existing training images 30. Furthermore, training images 30 can be generated through any computational processing of a computer.
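
As an illustration of generating new training images by rotation or translation, the following sketch operates on a grayscale image stored as a nested list. The function names, the 90-degree rotation step, and the zero fill value are assumptions made for the example.

```python
def rotate90(image):
    """Rotate a 2D list 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def translate(image, dx, dy, fill=0):
    """Shift by dx columns (right is positive) and dy rows (down is positive).

    Pixels shifted outside the image are dropped; uncovered pixels take `fill`.
    """
    height, width = len(image), len(image[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_y, new_x = y + dy, x + dx
            if 0 <= new_y < height and 0 <= new_x < width:
                out[new_y][new_x] = image[y][x]
    return out


original = [[1, 2],
            [3, 4]]
rotated = rotate90(original)
shifted = translate(original, dx=1, dy=0)
```

If the correct answer is tied to a position in the image, the same transformation would also need to be applied to the correct answer label.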

[0091] The correct answer (truth value) for the inference task can be appropriately given for each training image 30. In one example, a correct answer label (teacher signal) indicating the correct answer can be appropriately generated, and the generated correct answer label can be appropriately associated with each training image 30. As a result, the training data can be obtained in the format of a dataset (a combination of training images 30 and correct answer labels). In another example, the correct answer to the inference task can be given by an indicator such as any rule.

[0092] Each training image 30 can be automatically generated by computer operation or manually generated by operation that includes at least part of an operator's input. Furthermore, each training image 30 can also be generated by the model generation device 1 or by a computer other than the model generation device 1. That is, the controller 11 can generate each training image 30 automatically or manually. Alternatively, the controller 11 can acquire each training image 30 generated by another computer via, for example, a network, storage medium 91, and external storage devices. Some of the multiple training images 30 can be generated by the model generation device 1, while others can be generated by one or more other computers.

[0093] The number of training images 30 to be acquired is not particularly limited and can be appropriately determined according to the embodiments. When multiple training images 30 have been acquired, the controller 11 advances the processing to the next step S102.

[0094] Step S102

[0095] In step S102, the controller 11 operates as a learning processing unit 112 and uses the acquired multiple training images 30 to implement machine learning of the inference model 5.

[0096] As an example of machine learning processing, the controller 11 first performs the initial setup of the inference model 5, which is the processing target of machine learning. The structure of the inference model 5 and the initial values of its parameters can be given by a template or determined by operator input. In the case of performing additional learning or relearning, the controller 11 can perform the initial setup of the inference model 5 based on the learning result data obtained through past machine learning.

[0097] Next, the controller 11 trains the inference model 5 by implementing machine learning, so that the inference result obtained for each training image 30 matches the correct answer (true value). Training the inference model 5 includes adjusting (optimizing) the values of the parameters of the inference model 5.

[0098] As an example, controller 11 inputs each training image 30 into inference model 5 and performs forward computation processing. In this embodiment, controller 11 obtains extended region information from each training image 30, inputs the obtained extended region information into compression module 50, and performs forward computation processing in compression module 50. As a result of this computation processing, controller 11 obtains compressed information.

[0099] Subsequently, controller 11 acquires sub-region information from each training image 30 and inputs the acquired sub-region information and compressed information obtained by compression module 50 into inference module 55. In one example, controller 11 can generate integrated information by integrating sub-region information and compressed information, and input the generated integrated information into inference module 55. Furthermore, controller 11 performs forward computation processing by inference module 55. As a result of this computation processing, controller 11 obtains an output from inference module 55 corresponding to the result obtained by inferring the solution to the task for sub-regions in each training image 30. It should be noted that sub-regions can be appropriately designated in each training image 30. Controller 11 can repeat the above series of computation processing for each training image 30 while changing the range designated as a sub-region. As a result, controller 11 can obtain inference results for multiple ranges in each training image 30.
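
The integration step can be pictured with a small sketch. Here the compression module is replaced by fixed average pooling and the integration by simple concatenation; both are stand-ins assumed for illustration, whereas the compression module 50 of the embodiment has trainable parameters. The function names are hypothetical.

```python
def average_pool(region, block):
    """Compress a 2D list by averaging non-overlapping block x block tiles."""
    height, width = len(region), len(region[0])
    return [
        [
            sum(region[y + dy][x + dx]
                for dy in range(block) for dx in range(block)) / (block * block)
            for x in range(0, width - block + 1, block)
        ]
        for y in range(0, height - block + 1, block)
    ]


def flatten(region):
    """Turn a 2D list into a flat list of values."""
    return [value for row in region for value in row]


def integrate(sub_info, compressed_info):
    """Integrate by concatenation: sub-region values followed by compressed values."""
    return flatten(sub_info) + flatten(compressed_info)


ext_info = [[1, 1, 2, 2],
            [1, 1, 2, 2],
            [3, 3, 4, 4],
            [3, 3, 4, 4]]
sub_info = [row[1:3] for row in ext_info[1:3]]   # central 2 x 2 sub-region

compressed_info = average_pool(ext_info, block=2)
integrated_info = integrate(sub_info, compressed_info)
```

The inference module would then receive `integrated_info` as a single input vector; repeating this for different sub-region ranges yields one inference result per range.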

[0100] Next, the controller 11 calculates the error between the obtained inference result and the corresponding correct answer. This error can be calculated using any loss function. The controller 11 then calculates the gradient of the calculated error and, by backpropagation, uses that gradient to calculate the error in the value of each parameter of the inference model 5 in order from the output side. The controller 11 updates the value of each parameter of the inference model 5 based on each calculated error. In one example, the controller 11 may update the values of the parameters of both the inference module 55 and the compression module 50.

[0101] Through this series of update processes, the controller 11 adjusts the value of each parameter of the inference model 5 so that the sum of the errors calculated for the training images 30 becomes small. For example, the controller 11 can repeat the adjustment of the value of each parameter until a predetermined condition is met, such as the adjustment having been executed a predetermined number of times or the sum of the calculated errors being less than or equal to a threshold. As a result of this machine learning processing, the controller 11 can generate a trained inference model 5 that has acquired the ability to perform the inference task on sub-regions in an image. Upon completion of the machine learning processing, the controller 11 advances the processing to the next step S103.
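
The update procedure of step S102 can be sketched with a deliberately tiny model. In this sketch the compression module is a single linear map from the extended region information to one value, the inference module is a linear function of the sub-region information and that value, and the loss is the squared error; these choices, the toy data, and all names are assumptions for illustration, not the configuration of the embodiment.

```python
def forward(params, sub, ext):
    """Compression module (linear map to one value) followed by the inference module."""
    z = sum(c * e for c, e in zip(params["c"], ext))   # compressed information
    pred = (sum(w * s for w, s in zip(params["w_sub"], sub))
            + params["w_z"] * z + params["b"])
    return z, pred


def train_step(params, sub, ext, target, lr):
    """One gradient-descent update on the squared error; returns that error."""
    z, pred = forward(params, sub, ext)
    err = pred - target
    g_pred = 2.0 * err                 # gradient of the loss at the output
    g_z = g_pred * params["w_z"]       # propagated back into the compression module
    params["w_sub"] = [w - lr * g_pred * s for w, s in zip(params["w_sub"], sub)]
    params["c"] = [c - lr * g_z * e for c, e in zip(params["c"], ext)]
    params["w_z"] -= lr * g_pred * z
    params["b"] -= lr * g_pred
    return err * err


def train(params, samples, lr=0.05, max_epochs=200, threshold=1e-6):
    """Repeat updates until the epoch limit or the error threshold is reached."""
    losses = []
    for _ in range(max_epochs):
        total = sum(train_step(params, s, e, t, lr) for s, e, t in samples)
        losses.append(total)
        if total <= threshold:
            break
    return losses


# Toy data: (sub-region information, extended region information, correct answer).
samples = [
    ([0.0, 1.0], [0.0, 1.0, 0.0, 1.0], 1.0),
    ([1.0, 0.0], [1.0, 0.0, 1.0, 1.0], 0.0),
    ([1.0, 1.0], [0.0, 0.0, 1.0, 0.0], 1.0),
    ([0.0, 0.0], [1.0, 1.0, 1.0, 0.0], 0.0),
]
params = {"w_sub": [0.0, 0.0], "c": [0.1, 0.1, 0.1, 0.1], "w_z": 0.1, "b": 0.0}
losses = train(params, samples)
```

Because the gradient `g_z` flows from the inference module back into the compression weights `c`, both modules are adjusted by the same error signal, which mirrors the joint training of the compression module 50 and the inference module 55.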

[0102] Step S103

[0103] In step S103, the controller 11 operates as the storage processing unit 113 to generate, as learning result data 125, information about the trained inference model 5 generated through machine learning. Furthermore, the controller 11 stores the generated learning result data 125 in a predetermined storage area.

[0104] The predetermined storage area can be, for example, RAM in controller 11, storage unit 12, external storage device, storage medium, or a combination thereof. For example, the storage medium can be a CD or DVD, and controller 11 can store the learning result data 125 in the storage medium via drive 17. The external storage device can be, for example, a data server, such as Network Attached Storage (NAS). In this case, controller 11 can store the learning result data 125 in the data server via a network using communication interface 13. Furthermore, the external storage device can be, for example, an external storage device connected to model generation device 1 via external interface 14.

[0105] Having completed the storage of the learning result data 125, the controller 11 terminates the processing of the model generation device 1 according to this operation example.

[0106] It should be noted that the generated learning result data 125 can be provided to the inference device 2 by any method and at any time. In one example, the learning result data 125 can be provided to the inference device 2 via a network from, for example, the model generation device 1, another computer, or a data server. In another example, the learning result data 125 can be provided to the inference device 2 via storage medium 92 or an external storage device. In yet another example, the learning result data 125 can be pre-integrated into the inference device 2.

[0107] Furthermore, the controller 11 can update or generate new learning result data 125 by re-executing steps S101 to S103 at any time. In this re-execution, at least some of the training images 30 used for machine learning can be appropriately changed, modified, added, deleted, etc. The controller 11 can provide the updated or newly created learning result data 125 to the inference device 2 by any method and at any time. Therefore, the controller 11 can update the learning result data 125 held by the inference device 2.

[0108] Inference device

[0109] Figure 9 is a flowchart illustrating an example of the processing procedure of the inference device 2 according to this embodiment. The processing procedure of the inference device 2 described below is an example of an inference method. It should be noted that the processing procedure described below is only an example, and each step can be modified to the extent possible. Furthermore, steps of the processing procedure below can be appropriately omitted, replaced, and added according to the embodiment.

[0110] Step S201

[0111] In step S201, the controller 21 operates as an acquisition unit 211 to acquire the target image 221.

[0112] Target image 221 can be generated appropriately. Target image 221 can be acquired by a sensor. Target image 221 can be generated by computer operation. Target image 221 can be obtained through simulation. Target image 221 can be generated through any computational processing of a computer. In one example, controller 21 can directly acquire target image 221 by receiving computer operations and performing generation processing such as simulation. In another example, controller 21 can acquire target image 221 from another computer, sensor, storage medium 92, external storage device, etc. Upon acquiring target image 221, controller 21 advances the processing to the next step S202.

[0113] Step S202

[0114] In step S202, the controller 21 operates as the inference unit 212 and sets up the trained inference model 5 by referring to the learning result data 125. The controller 21 then infers a solution to the task for the acquired target image 221 using the trained inference model 5. This inference computation can be the same as the forward computation performed during machine learning training. The controller 21 inputs the target image 221 into the trained inference model 5 and executes the forward computation of the trained inference model 5. As a result, the controller 21 obtains, from the inference module 55 of the trained inference model 5, the result of inferring the solution to the task for a sub-region in the target image 221. It should be noted that the sub-region can be appropriately designated in the target image 221. The controller 21 can repeatedly execute the above series of computations on the target image 221 while changing the range designated as the sub-region. As a result, the controller 21 can obtain inference results for multiple ranges of the target image 221. Having obtained the inference results, the controller 21 advances the processing to the next step S203.
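
The repetition over sub-region ranges can be sketched as a tiled sweep over the target image. The `toy_model` below is a hypothetical stand-in for the trained inference model 5, and the tiling scheme, margin, and boundary clipping are assumptions made for the example.

```python
def crop(image, left, top, right, bottom):
    """Crop a 2D list, clipping the requested bounds to the image."""
    height, width = len(image), len(image[0])
    return [row[max(0, left):min(width, right)]
            for row in image[max(0, top):min(height, bottom)]]


def mean(region):
    """Mean of all values in a 2D list."""
    return sum(map(sum, region)) / sum(len(row) for row in region)


def toy_model(sub_info, ext_info):
    """Stand-in model: flags a sub-region brighter than its extended region."""
    return mean(sub_info) > mean(ext_info)


def infer_all(image, model, size, margin):
    """Run the model once per non-overlapping size x size sub-region."""
    height, width = len(image), len(image[0])
    results = {}
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            sub = crop(image, left, top, left + size, top + size)
            ext = crop(image, left - margin, top - margin,
                       left + size + margin, top + size + margin)
            results[(left, top)] = model(sub, ext)
    return results


target_image = [[9, 9, 0, 0],
                [9, 9, 0, 0],
                [0, 0, 0, 0],
                [0, 0, 0, 0]]
results = infer_all(target_image, toy_model, size=2, margin=1)
```

The returned dictionary maps the top-left corner of each sub-region to its inference result, so results for multiple ranges of one target image are collected in a single pass.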

[0115] Step S203

[0116] In step S203, the controller 21 operates as the output unit 213 to output information about the inference result.

[0117] The output destination and the content of the information to be output can be appropriately determined according to the embodiment. For example, the controller 21 can output the inference result obtained in step S202 as is to the output device 26 or to the output device of another computer. Furthermore, the controller 21 can perform any information processing based on the obtained inference result and output the result of that information processing as information about the inference result. Outputting the result of the information processing can include controlling the operation of a target device based on the inference result. The output destination can be, for example, the output device 26, the output device of another computer, or the target device. As an example, the inference task can be inferring events occurring within the observation range of a vehicle's onboard sensor. In this case, the target device can be the vehicle, and the controller 21 can determine an instruction for the vehicle based on the inference result and control the vehicle's operation in accordance with the determined instruction. For example, if an obstacle is detected near the vehicle's door through inference, the controller 21 can output an instruction to lock the vehicle's door to the vehicle's control device.
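
The door-lock example can be written as a small decision function. The result keys and instruction strings below are hypothetical; an actual vehicle control interface is outside the scope of this sketch.

```python
def decide_instruction(inference_result: dict) -> str:
    """Map an inference result to an instruction for the vehicle's control device."""
    if inference_result.get("obstacle_near_door", False):
        return "LOCK_DOOR"
    return "NO_ACTION"


instruction = decide_instruction({"obstacle_near_door": True})
```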

[0118] Having completed the output of the information about the inference result, the controller 21 terminates the processing of the inference device 2 according to this operation example. The controller 21 may repeat the series of information processing in steps S201 to S203. The timing of the repetition may be determined as appropriate for the embodiment. The inference device 2 can thereby be configured to perform the inference task on images repeatedly.

[0119] Features

[0120] In this embodiment, since the compression module 50 is provided, in the inference processing of the inference module 55, in addition to referring to the sub-region information 60 of the input image 6, the inference model 5 can also refer to information (compressed information 65) about the extended region including the region outside the sub-region. As a result, features appearing in the sub-region can be inferred based on features existing outside the sub-region, thereby improving the accuracy of the inference processing. Therefore, through the processing of steps S101 and S102, a trained inference model 5 capable of performing inference processing with high accuracy can be generated. Furthermore, by using this trained inference model 5 in the processing of steps S201 and S202, the accuracy of inference processing for sub-regions in the target image 221 can be improved.

[0121] Furthermore, in this embodiment, during the machine learning processing in step S102, the compression module 50 can be trained together with the inference module 55. Therefore, the compression module 50 can be optimized to acquire the ability to generate compressed information 65 suitable for the inference task. That is, the compression processing performed by the compression module 50 can be optimized for the inference task. Therefore, it is expected that the accuracy of the inference can be improved.
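The effect of training both modules together can be illustrated with a deliberately tiny sketch. It is not the neural network of the embodiment: the "compression module" is assumed to be a single trainable scale applied to the mean of the extended-region values, the "inference module" is assumed to be linear, and the training data are made up. The point is only that the inference error is propagated back into the compression parameter `w_c`.

```python
def forward(params, sub, ext):
    """Toy model: one-parameter 'compression' followed by a linear 'inference' step."""
    w_c, w_s, w_i = params
    compressed = w_c * (sum(ext) / len(ext))  # stand-in for the compression module
    return w_s * sub + w_i * compressed, compressed


def mean_squared_error(params, samples):
    """Average squared error of the toy model over (sub, ext, target) samples."""
    return sum((forward(params, s, e)[0] - t) ** 2 for s, e, t in samples) / len(samples)


def train_jointly(samples, lr=0.01, epochs=500):
    """Stochastic gradient descent that updates all three parameters together."""
    params = [0.5, 0.5, 0.5]
    for _ in range(epochs):
        for sub, ext, target in samples:
            w_c, w_s, w_i = params
            ext_mean = sum(ext) / len(ext)
            pred, compressed = forward(params, sub, ext)
            err = pred - target
            # Gradients of 0.5 * err**2; the error reaches w_c through w_i (chain rule).
            grads = [err * w_i * ext_mean, err * sub, err * compressed]
            params = [p - lr * g for p, g in zip(params, grads)]
    return params
```

If `w_c` were held fixed instead, the sketch would correspond to the variation in which only the inference module is the target of machine learning.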

[0122] 4 Modifications

[0123] While embodiments of the present disclosure have been described in detail above, the foregoing description is merely illustrative in all respects. Needless to say, various modifications or variations can be made without departing from the scope of the present disclosure. For example, the following changes can be made. The following examples of variations can be appropriately combined.

[0124] In the above embodiment, the compression module 50 and the inference module 55 are trained together using machine learning. However, the scope of machine learning is not limited to this example. In another example, only the inference module 55 may be the target of machine learning, and the compression module 50 need not be the target. In this case, the compression module 50 may be configured to generate compressed information 65 through rule-based computation.
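As one possible reading of "rule-based computation", the sketch below compresses an extended region by average pooling over non-overlapping blocks. Average pooling is an assumption chosen for illustration; this disclosure does not name a specific rule, and the function name is hypothetical.

```python
import numpy as np


def average_pool_compress(extended, factor):
    """Compress a 3D volume by averaging non-overlapping factor x factor x factor blocks."""
    d, h, w = extended.shape
    if d % factor or h % factor or w % factor:
        raise ValueError("each axis must be divisible by factor")
    # Split every axis into (blocks, factor) and average over the three factor axes.
    blocks = extended.reshape(d // factor, factor, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3, 5))
```

With a factor of 4, a 128×128×128 extended region would be reduced to 32×32×32, the same size as the sub-region used in the examples.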

[0125] Furthermore, in the above embodiments, sub-region information 60 and compressed information 65 can be integrated, and inference module 55 can be configured to receive integrated information 67. However, the form in which information is input to inference module 55 is not limited to such an example. In another example, inference module 55 can be configured to receive sub-region information 60 and compressed information 65 separately. In this case, compressed information 65 does not necessarily have the same dimensions as sub-region information 60.

[0126] Furthermore, in the above embodiments, the inference model 5 may include multiple compression modules 50. In this case, the size of the extended region to be processed by at least some of the compression modules 50 may be different. In one example, each compression module 50 may be configured to generate compressed information of the same dimension as the sub-region information 60 from extended region information of different sizes. The inference module 55 may be configured to obtain integrated information by integrating the compressed information and the sub-region information 60 obtained from each compression module 50, and to infer the solution to the task from the obtained integrated information.
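A minimal sketch of the multi-module variation follows. It assumes that each compression module can be stood in for by average pooling, that the sub-region sits at the center of the image, and that integration means stacking the maps along a new leading channel axis. None of these choices is stated in this disclosure, and all function names are hypothetical.

```python
import numpy as np


def pool(volume, factor):
    """Average non-overlapping factor^3 blocks (stand-in for a compression module)."""
    d, h, w = volume.shape
    return volume.reshape(
        d // factor, factor, h // factor, factor, w // factor, factor
    ).mean(axis=(1, 3, 5))


def center_crop(volume, size):
    """Cut a size^3 cube from the center of the volume."""
    starts = [(dim - size) // 2 for dim in volume.shape]
    return volume[tuple(slice(s, s + size) for s in starts)]


def integrate(image, sub_size, ext_sizes):
    """Stack the sub-region with one compressed map per extended-region size."""
    channels = [center_crop(image, sub_size)]
    for ext in ext_sizes:
        # Each extended size is assumed to be a multiple of the sub-region size.
        channels.append(pool(center_crop(image, ext), ext // sub_size))
    return np.stack(channels, axis=0)
```

Because every compressed map has the same spatial size as the sub-region, the stacked result can be passed to a single inference module regardless of how many extended-region sizes are used.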

[0127] 5 Examples

[0128] To verify the effectiveness of the above embodiment, trained inference models according to the following example and comparative example were generated. It should be noted that this disclosure is not limited to the following example.

[0129] First, an inference model according to the example was prepared. It includes two compression modules and an inference module and has the same configuration as in the embodiment described above. Each compression module is a neural network including pooling layers, convolutional layers, and dilation layers, and is configured to execute the computational processing shown in Figure 3. The inference module is configured, using a convolutional neural network, to obtain integrated information by integrating the compressed information obtained from each compression module with the sub-region information, and to infer the solution to the task from the obtained integrated information. An inference model according to the comparative example was prepared by omitting the compression modules from the configuration of the example.

[0130] The inference task adopted for the inference model was to infer the R value of each part of a component. To this end, point cloud data representing 3D CAD models of components was collected and used as the training images. Each point in the point cloud data has three channels: the existence probability of the shape (point), the R value of a concave shape, and the R value of a convex shape. For use in training, 128×128×128 point cloud data was extracted from point cloud data of size 256×256×256 or larger. The trained inference models were generated by training the inference models according to the example and the comparative example on a personal computer under the following machine learning conditions.
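The extraction of a 128×128×128 block from a larger volume can be sketched as follows. The text does not say how the position of the block is chosen, so a uniformly random position is assumed here; the function name is hypothetical.

```python
import random


def random_crop_slices(shape, crop, rng):
    """Pick a random crop-sized window that lies fully inside a volume of the given shape."""
    if any(dim < crop for dim in shape):
        raise ValueError("volume is smaller than the crop")
    # randint is inclusive at both ends, so the window never leaves the volume.
    starts = [rng.randint(0, dim - crop) for dim in shape]
    return tuple(slice(s, s + crop) for s in starts)
```

The returned tuple of slices can be used to index an array that holds the spatial axes of the point cloud data, with the three per-point channels kept on a separate axis.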

[0131] Conditions for machine learning

[0132] • Number of training images: Approximately 90,000

[0133] • Sub-region: 32×32×32

[0134] • Extended region processed by the first compression module: 128×128×128

[0135] • Extended region processed by the second compression module: 64×64×64

[0136] • Learning rate: Variable via a warm-up method (maximum: 0.01)

[0137] • Optimization Algorithm: Adam
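The conditions above can be made concrete with a small sketch. Only the maximum learning rate of 0.01 and the region sizes come from the list; the shape of the warm-up (linear ramp, then constant) and the number of warm-up steps are assumptions, and the function names are hypothetical.

```python
def warmup_learning_rate(step, warmup_steps, max_lr=0.01):
    """Ramp the learning rate linearly up to max_lr, then hold it (assumed schedule)."""
    if warmup_steps <= 0:
        return max_lr
    return max_lr * min(1.0, (step + 1) / warmup_steps)


def per_axis_compression_factors(sub_size, ext_sizes):
    """Per-axis reduction needed to bring each extended region down to the sub-region size."""
    return [ext // sub_size for ext in ext_sizes]
```

For the listed sizes, the first compression module reduces each axis by a factor of 4 (128 to 32) and the second by a factor of 2 (64 to 32).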

[0138] Next, point cloud data representing a 3D CAD model of a component (with ribs) was prepared as the evaluation image. The inference task was then performed on each sub-region in the evaluation image using the trained inference models according to the example and the comparative example, and the inference accuracy of each trained inference model was evaluated.

[0139] Figure 10 shows the inference result of the comparative example for the evaluation image. Figure 11 shows the inference result of the example for the evaluation image. In Figures 10 and 11, the light-colored point cloud regions at the tips of the callouts indicate the ranges where inference errors occurred. As shown in Figures 10 and 11, the example inferred the R value of each part more accurately than the comparative example. In particular, the comparative example had difficulty inferring correctly for parts with R values of 7 or greater, whereas the example inferred correctly even for such parts. This result shows that, according to this disclosure, the inference accuracy for sub-regions in an input image can be improved.

[0140] 6 Supplements

[0141] Provided there is no technical inconsistency, the processing and methods described in this disclosure can be freely combined and implemented.

[0142] Furthermore, a process described in the specification as being performed by one device can be distributed and executed by multiple devices. Alternatively, processes described in the specification as being performed by different devices can be performed by one device. In a computer system, the hardware configuration for implementing each function can be flexibly changed.

[0143] This disclosure can also be implemented by supplying a computer with a computer program that implements the functions described in the above embodiment and having one or more processors of the computer read and execute the program. Such a computer program may be provided to the computer on a non-transitory computer-readable storage medium connectable to the computer's system bus, or via a network. Non-transitory computer-readable storage media include, for example, any type of disk such as a magnetic disk (floppy disk, hard disk drive (HDD), etc.) or an optical disk (CD-ROM, DVD, Blu-ray disc, etc.), read-only memory (ROM), random access memory (RAM), EPROM, EEPROM, magnetic cards, flash memory, optical cards, and any other type of medium suitable for storing electronic instructions.

Claims

1. A model generation method executed by a computer, the method comprising: acquiring a plurality of training images; and performing machine learning of an inference model using the acquired plurality of training images, wherein: the inference model includes a plurality of compression modules and an inference module; the inference model is configured to divide an input image into a plurality of sub-regions having different ranges, and the inference module is configured to infer a solution to a task for each of the plurality of sub-regions in the input image; the plurality of compression modules and the inference module perform the following i) and ii) for each of the plurality of sub-regions in the input image: i) each of the plurality of compression modules is configured to generate compressed information of the same dimension as the sub-region by acquiring, from the input image, information about an extended region that includes the sub-region and is wider than the sub-region, and compressing the acquired information about the extended region, wherein the size of the extended region to be processed differs among at least some of the plurality of compression modules; and ii) the inference module is configured to generate integrated information of the same dimension as the sub-region by integrating information about the sub-region obtained from the input image with the compressed information obtained by each of the plurality of compression modules, and to derive the solution to the task for the sub-region from the generated integrated information; and performing the machine learning includes training the inference model such that, when each of the plurality of training images is input as the input image, the inference result obtained by the inference model for each of the plurality of sub-regions in the input image matches the correct answer to the task for the corresponding sub-region in that training image.

2. The model generation method according to claim 1, wherein: each of the plurality of compression modules includes one or more parameters; the inference module includes one or more parameters; and training the inference model includes adjusting the values of the one or more parameters of each of the plurality of compression modules and the values of the one or more parameters of the inference module.

3. The model generation method according to claim 1, wherein the extended region includes a region outside the sub-region that surrounds the entire perimeter of the sub-region.

4. The model generation method according to claim 1, wherein the size of the extended region is within eight times the size of the sub-region.

5. The model generation method according to any one of claims 1 to 4, wherein: each of the plurality of training images depicts an object; and the task for the sub-region is to infer a property of the object.

6. The model generation method according to any one of claims 1 to 4, wherein: each of the plurality of training images is obtained by an onboard sensor; and the task for the sub-region is to infer features that appear within the observation range of the onboard sensor.

7. A model generation apparatus, comprising a controller configured to perform: acquiring a plurality of training images; and performing machine learning of an inference model using the acquired plurality of training images, wherein: the inference model includes a plurality of compression modules and an inference module; the inference model is configured to divide an input image into a plurality of sub-regions having different ranges, and the inference module is configured to infer a solution to a task for each of the plurality of sub-regions in the input image; the plurality of compression modules and the inference module perform the following i) and ii) for each of the plurality of sub-regions in the input image: i) each of the plurality of compression modules is configured to generate compressed information of the same dimension as the sub-region by acquiring, from the input image, information about an extended region that includes the sub-region and is wider than the sub-region, and compressing the acquired information about the extended region, wherein the size of the extended region to be processed differs among at least some of the plurality of compression modules; and ii) the inference module is configured to generate integrated information of the same dimension as the sub-region by integrating information about the sub-region obtained from the input image with the compressed information obtained by each of the plurality of compression modules, and to derive the solution to the task for the sub-region from the generated integrated information; and performing the machine learning includes training the inference model such that, when each of the plurality of training images is input as the input image, the inference result obtained by the inference model for each of the plurality of sub-regions in the input image matches the correct answer to the task for the corresponding sub-region in that training image.

8. The model generation apparatus according to claim 7, wherein: each of the plurality of compression modules includes one or more parameters; the inference module includes one or more parameters; and training the inference model includes adjusting the values of the one or more parameters of each of the plurality of compression modules and the values of the one or more parameters of the inference module.

9. The model generation apparatus according to claim 7, wherein the extended region includes a region outside the sub-region that surrounds the entire perimeter of the sub-region.

10. The model generation apparatus according to claim 7, wherein the size of the extended region is within eight times the size of the sub-region.

11. The model generation apparatus according to any one of claims 7 to 10, wherein: each of the plurality of training images depicts an object; and the task for the sub-region is to infer a property of the object.

12. The model generation apparatus according to any one of claims 7 to 10, wherein: each of the plurality of training images is obtained by an onboard sensor; and the task for the sub-region is to infer features that appear within the observation range of the onboard sensor.

13. An inference apparatus, comprising a controller configured to perform: acquiring a target image; and inferring, using an inference model trained by machine learning, a solution to a task for each of a plurality of sub-regions in the acquired target image, wherein: the inference model includes a plurality of compression modules and an inference module; the inference model is configured to divide an input image into a plurality of sub-regions having different ranges, and the inference module is configured to infer a solution to the task for each of the plurality of sub-regions in the input image; the plurality of compression modules and the inference module perform the following i) and ii) for each of the plurality of sub-regions in the input image: i) each of the plurality of compression modules is configured to generate compressed information of the same dimension as the sub-region by acquiring, from the input image, information about an extended region that includes the sub-region and is wider than the sub-region, and compressing the acquired information about the extended region, wherein the size of the extended region to be processed differs among at least some of the plurality of compression modules; and ii) the inference module is configured to generate integrated information of the same dimension as the sub-region by integrating information about the sub-region obtained from the input image with the compressed information obtained by each of the plurality of compression modules, and to derive the solution to the task for the sub-region from the generated integrated information; and inferring the solution to the task for each of the plurality of sub-regions in the target image includes inputting the target image as the input image into the trained inference model and obtaining, from the trained inference model, the result of inferring the solution to the task for each of the plurality of sub-regions in the target image.

14. The inference apparatus according to claim 13, wherein: each of the plurality of compression modules includes one or more parameters; the inference module includes one or more parameters; and the values of the one or more parameters of each of the plurality of compression modules and the values of the one or more parameters of the inference module have been adjusted by the machine learning.

15. The inference apparatus according to claim 13, wherein the extended region includes a region outside the sub-region that surrounds the entire perimeter of the sub-region.

16. A storage medium storing an inference program for causing a computer to execute an information processing method, the information processing method comprising: acquiring a target image; and inferring, using an inference model trained by machine learning, a solution to a task for each of a plurality of sub-regions in the acquired target image, wherein: the inference model includes a plurality of compression modules and an inference module; the inference model is configured to divide an input image into a plurality of sub-regions having different ranges, and the inference module is configured to infer a solution to the task for each of the plurality of sub-regions in the input image; the plurality of compression modules and the inference module perform the following i) and ii) for each of the plurality of sub-regions in the input image: i) each of the plurality of compression modules is configured to generate compressed information of the same dimension as the sub-region by acquiring, from the input image, information about an extended region that includes the sub-region and is wider than the sub-region, and compressing the acquired information about the extended region, wherein the size of the extended region to be processed differs among at least some of the plurality of compression modules; and ii) the inference module is configured to generate integrated information of the same dimension as the sub-region by integrating information about the sub-region obtained from the input image with the compressed information obtained by each of the plurality of compression modules, and to derive the solution to the task for the sub-region from the generated integrated information; and inferring the solution to the task for each of the plurality of sub-regions in the target image includes inputting the target image as the input image into the trained inference model and obtaining, from the trained inference model, the result of inferring the solution to the task for each of the plurality of sub-regions in the target image.

17. The storage medium according to claim 16, wherein: each of the plurality of compression modules includes one or more parameters; the inference module includes one or more parameters; and the values of the one or more parameters of each of the plurality of compression modules and the values of the one or more parameters of the inference module have been adjusted by the machine learning.