Inference method, inference device, and program
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- NIPPON TELEGRAPH & TELEPHONE CORP
- Filing Date
- 2022-11-18
- Publication Date
- 2026-06-25
Abstract
Description
[Technical Field]
[0001] The present invention relates to an inference method, an inference apparatus, and a program.
[Background technology]
[0002] Deep convolutional neural networks (DCNNs), formed by stacking convolutional neural networks (CNNs), have become a mainstream technique in computer vision and image processing in recent years. DCNNs have made significant contributions to improving the performance of computer vision tasks such as image recognition, object detection, and semantic segmentation. In particular, semantic segmentation is a crucial element in many vision applications, including video surveillance, medical image processing, and autonomous driving.
[0003] Existing semantic segmentation algorithms exhibit high recognition accuracy when the target image is a clean image, i.e., an image without image degradation. In contrast, images actually obtained from applications such as video surveillance and autonomous driving often contain common image degradations such as noise, blur, and compression distortion. Although these degradations are common, semantic segmentation algorithms do not anticipate them. When such degradation occurs in the image to be recognized, a domain shift arises relative to the distribution of the training data, as shown in Non-Patent Document 1, for example. Therefore, if existing semantic segmentation algorithms are applied directly to images containing image degradation, recognition accuracy decreases significantly.
[Prior art documents]
[Non-patent literature]
[0004]
Non-Patent Document 1
Non-Patent Document 2
Non-Patent Document 3
Non-Patent Document 4
Non-Patent Document 5
[0005] A common approach to address image degradation is, for example, a combination of image restoration and semantic segmentation. However, many existing image restoration algorithms are built to correspond to specific degradation models such as Gaussian noise, compression distortion, and blur (see, for example, Non-Patent Documents 2 and 3). In other words, existing image restoration algorithms are built with pre-defined degradation models in mind, and are effective for images containing image degradation of the assumed degradation model, but cannot restore images containing image degradation of other degradation models.
[0006] For example, as shown in Figure 9, suppose we have image data 300-1 containing expected image degradation and image data 300-2 containing unexpected image degradation. If we apply an image restoration algorithm corresponding to the expected image degradation occurring in image data 300-1 to these image data 300-1 and 300-2, we obtain restored image data 310-1 and 310-2, respectively. Restored image data 310-1 is restored appropriately. In contrast, restored image data 310-2 is not restored appropriately. Therefore, even if we perform semantic segmentation on each of the restored image data 310-1 and 310-2, we get the following result: In the recognition result image data 320-1 obtained from the restored image data 310-1, the class of each pixel is correctly recognized. In contrast, in the recognition result image data 320-2 obtained from the restored image data 310-2, the class of each pixel is not correctly recognized.
[0007] In contrast to the general approach described above, Non-Patent Document 4 proposes a novel neural network for performing semantic segmentation on images that include image degradation. However, the neural network proposed in Non-Patent Document 4 needs to be trained for each degradation model. Therefore, for images where degradation of an unknown degradation model has occurred, the technology disclosed in Non-Patent Document 4, like the general approach described above, cannot correctly perform semantic segmentation.
[0008] Furthermore, in real-world scenarios, the degradation model is often unknown. Non-Patent Document 5 proposes a learning scheme that is robust to various degradations. However, to make the technology disclosed in Non-Patent Document 5 effective against typical degradations such as blur, the model must be retrained at least once.
[0009] Thus, while conventional semantic segmentation, which divides an image into several objects, can perform image segmentation specialized for specific degradations anticipated in advance, it cannot perform correct image segmentation for unknown degradations.
[0010] This invention has been made in view of the above circumstances, and aims to provide an inference method, inference apparatus, and program that can perform image segmentation that is robust against unknown degradation in images.
[Means for solving the problem]
[0011] One aspect of the present invention is an inference method comprising: a conversion parameter specification step of specifying one or more conversion parameters; a conversion step of generating converted data by converting input data based on each of the specified conversion parameters; a confidence map generation step of generating, for each item of converted data, a confidence map, which is data showing the characteristics of that converted data; an inverse conversion step of generating inverse converted data by applying to each confidence map, based on the specified conversion parameters, the inverse of the conversion performed when the corresponding converted data was generated; a difference calculation step of selecting a reference inverse converted confidence map from the inverse converted data and generating a difference confidence map by calculating the difference between the reference inverse converted confidence map and each of the inverse converted confidence maps; a difference correction step of generating a corrected difference confidence map by correcting the value of each region of the difference confidence map with reference to the values of surrounding regions; a confidence map correction step of generating, for each region of the corrected difference confidence map, a corrected confidence map based on the corrected difference confidence map, the reference inverse converted confidence map, and each of the inverse converted confidence maps; an integration step of generating integrated data whose number of dimensions matches that of the confidence map of the input data by performing an integration process that integrates the inverse converted data; and an analysis step of performing an analysis process on the integrated data.
[0012] Furthermore, one aspect of the present invention is an inference device comprising: a conversion parameter specification unit that specifies one or more conversion parameters; a conversion unit that generates converted data by converting input data based on each of the specified conversion parameters; a confidence map generation unit that generates, for each item of converted data, a confidence map, which is data showing the characteristics of that converted data; an inverse conversion unit that generates inverse converted data by applying to each confidence map, based on the specified conversion parameters, the inverse of the conversion performed when the corresponding converted data was generated; a difference calculation unit that selects a reference inverse converted confidence map from the inverse converted data and generates a difference confidence map by calculating the difference between the reference inverse converted confidence map and each of the inverse converted confidence maps; a difference correction unit that generates a corrected difference confidence map by correcting the value of each region of the difference confidence map with reference to the values of surrounding regions; a confidence map correction unit that generates, for each region of the corrected difference confidence map, a corrected confidence map based on the corrected difference confidence map, the reference inverse converted confidence map, and each of the inverse converted confidence maps; an integration unit that generates integrated data whose number of dimensions matches that of the confidence map of the input data by performing an integration process that integrates the inverse converted data; and an analysis unit that performs an analysis process on the integrated data.
[0013] Furthermore, one aspect of the present invention is a program for executing: a conversion parameter specification step of specifying one or more conversion parameters; a conversion step of generating converted data by converting input data based on each of the specified conversion parameters; a confidence map generation step of generating, for each item of converted data, a confidence map, which is data showing the characteristics of that converted data; an inverse conversion step of generating inverse converted data by applying to each confidence map, based on the specified conversion parameters, the inverse of the conversion performed when the corresponding converted data was generated; a difference calculation step of selecting a reference inverse converted confidence map from the inverse converted data and generating a difference confidence map by calculating the difference between the reference inverse converted confidence map and each of the inverse converted confidence maps; a difference correction step of generating a corrected difference confidence map by correcting the value of each region of the difference confidence map with reference to the values of surrounding regions; a confidence map correction step of generating, for each region of the corrected difference confidence map, a corrected confidence map based on the corrected difference confidence map, the reference inverse converted confidence map, and each of the inverse converted confidence maps; an integration step of generating integrated data whose number of dimensions matches that of the confidence map of the input data by performing an integration process that integrates the inverse converted data; and an analysis step of performing an analysis process on the integrated data.
[Effects of the Invention]
[0014] According to the present invention, it is possible to perform image segmentation that is robust against unknown degradation in images.
[Brief explanation of the drawing]
[0015]
[Figure 1] This is a block diagram showing the configuration of an inference device in one embodiment of the present invention.
[Figure 2] This figure shows an overview of the method employed in the inference device in one embodiment of the present invention.
[Figure 3] This figure shows the advantages and disadvantages of the low-resolution and high-resolution segmentation results.
[Figure 4] This figure shows an overview of the difference calculation unit and the difference correction unit in one embodiment of the present invention.
[Figure 5] This figure shows an overview of the confidence map correction unit in one embodiment of the present invention.
[Figure 6] This is a flowchart showing the processing flow performed by the inference device in one embodiment of the present invention.
[Figure 7] This is a block diagram showing the configuration of a condition selection device in one embodiment of the present invention.
[Figure 8] This is a flowchart showing the processing flow performed by the condition selection device in one embodiment of the present invention.
[Figure 9] This figure outlines a common approach to applying semantic segmentation to image data that exhibits image degradation.
[Modes for carrying out the invention]
[0016] The inference method, inference apparatus, and program in one embodiment of the present invention will be described below with reference to the drawings.
[0017] Figure 1 is a block diagram showing the configuration of the inference device 1 in one embodiment of the present invention. Figure 2 is a diagram showing an overview of the method employed in the inference device 1 according to the embodiment shown in Figure 1. The image data 100-1 shown in Figure 2 is image data to be recognized for semantic segmentation obtained from applications such as video surveillance and autonomous driving. Here, it is assumed that the image data 100-1 has undergone unknown image degradation. The method employed in the inference device 1 is based on the following two technical grounds: [1] and [2].
[0018] [1] When low-resolution image data 100-2, ..., 100-N are obtained by, for example, performing a reduction image conversion on image data 100-1, fine textures and other patterns are lost, but image degradation such as JPEG (Joint Photographic Experts Group) compression and blur is also reduced. Therefore, low-resolution image data 100-2, ..., 100-N are image data with reduced image degradation.
[0019] [2] Semantic segmentation algorithms are also effective for low-resolution image data 100-2, ..., 100-N. In other words, there is not much difference in the accuracy of semantic segmentation between low-resolution image data without image degradation and high-resolution image data without image degradation.
[0020] Based on the two technical grounds described in [1] and [2] above, the method employed in the inference device 1 is, for example, to perform semantic segmentation on image data 100-1 including image degradation as follows: An arbitrarily defined image transformation is performed on image data 100-1 to obtain image data 100-2, ..., 100-N. For example, an encoder-decoder type semantic segmentation neural network such as SegNet or U-Net is used, and the encoder part and decoder part of a trained neural network are separated. Each of the image data 100-1, 100-2, ..., 100-N is fed to the encoder part of the neural network to perform downsampling and obtain the corresponding confidence maps 110-1, 110-2, ..., 110-N.
[0021] Here, the confidence map is data also known as a feature map, which shows the features contained in the image data 100-1, 100-2, ..., 100-N. The value of each element in the confidence map, that is, the value that indicates the feature quantity, is called a logit, and based on the logit value, the class with the highest confidence level is assigned to each pixel of the original image data. Here, the class with the highest confidence level for each pixel is, for example, the class with the highest logit value corresponding to each pixel, or the class with the highest normalized logit value. A class is, for example, the type of object such as "person" or "car" contained in each of the image data 100-1, 100-2, ..., 100-N.
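The class assignment described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the confidence map is indexed as confidence_map[c][h][w] and already has the spatial size of the image; the names softmax, assign_classes, CLASS_NAMES, and toy_map are hypothetical and do not appear in the embodiment.

```python
import math


def softmax(logits):
    """Normalize a list of logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def assign_classes(confidence_map):
    """Return, for each position (h, w), the index of the class whose logit is highest.

    confidence_map[c][h][w] is assumed to hold the logit of class c at (h, w).
    """
    num_classes = len(confidence_map)
    height = len(confidence_map[0])
    width = len(confidence_map[0][0])
    labels = []
    for h in range(height):
        row = []
        for w in range(width):
            logits = [confidence_map[c][h][w] for c in range(num_classes)]
            row.append(max(range(num_classes), key=lambda c: logits[c]))
        labels.append(row)
    return labels


# Hypothetical two-class example on a 2x2 map.
CLASS_NAMES = ["person", "car"]
toy_map = [
    [[2.0, 0.1], [0.3, -1.0]],  # logits for class 0 ("person")
    [[0.5, 1.5], [0.2, 0.0]],   # logits for class 1 ("car")
]
labels = assign_classes(toy_map)
```

Because softmax normalization preserves the ordering of the logits, choosing the highest raw logit and choosing the highest normalized logit select the same class in this sketch.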
[0022] The inference device 1 performs an inverse transform on each of the confidence maps 110-2, ..., 110-N, corresponding to the image transformation performed when the corresponding image data 100-2, ..., 100-N was obtained. The confidence map 110-1 and the inversely transformed confidence maps 110-2, ..., 110-N are integrated to generate integrated data. This method of integrating confidence maps 110-1, ..., 110-N corresponds to a technique called ensemble in machine learning. The generated integrated data is fed to the neural network of the decoder part to perform upsampling and obtain image data 150, which is the result of semantic segmentation. The neural networks of the separated encoder part and the decoder part are used with the coefficients applied to the neurons, i.e., the weights and bias values, fixed in a learned state. Hereafter, the state in which the coefficients are fixed in a learned state will also be called "coefficient freezing".
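The flow outlined above (transform, encode, inverse transform, integrate, decode) can be sketched on a toy single-channel image. Everything in this sketch is an assumption made for illustration: toy_encoder and toy_decoder are trivial stand-ins for the frozen encoder and decoder networks, reduction is done by block averaging, the inverse transformation by nearest-neighbour enlargement, and integration by an element-wise mean. The embodiment does not prescribe any of these choices.

```python
def downscale(img, f):
    """Reduce a 2-D image by averaging non-overlapping f x f blocks."""
    h, w = len(img) // f, len(img[0]) // f
    return [[sum(img[i * f + di][j * f + dj]
                 for di in range(f) for dj in range(f)) / (f * f)
             for j in range(w)] for i in range(h)]


def upscale(grid, f):
    """Undo the size change of downscale by nearest-neighbour enlargement."""
    return [[grid[i // f][j // f] for j in range(len(grid[0]) * f)]
            for i in range(len(grid) * f)]


def toy_encoder(img):
    """Hypothetical stand-in for S(.): two logit channels per position."""
    return [[[v for v in row] for row in img],         # channel 0: "bright" class
            [[1.0 - v for v in row] for row in img]]   # channel 1: "dark" class


def integrate(maps):
    """Assumed integration rule: element-wise mean over the N aligned maps."""
    n = len(maps)
    C, H, W = len(maps[0]), len(maps[0][0]), len(maps[0][0][0])
    return [[[sum(m[c][h][w] for m in maps) / n for w in range(W)]
             for h in range(H)] for c in range(C)]


def toy_decoder(conf):
    """Hypothetical stand-in for g(.): per-position argmax over channels."""
    C, H, W = len(conf), len(conf[0]), len(conf[0][0])
    return [[max(range(C), key=lambda c: conf[c][h][w]) for w in range(W)]
            for h in range(H)]


def infer(img, factors=(1, 2)):
    """Run the toy pipeline; factor 1 plays the role of 'no transformation'."""
    maps = []
    for f in factors:
        conf = toy_encoder(downscale(img, f))          # transform, then S(.)
        maps.append([upscale(ch, f) for ch in conf])   # inverse transform
    return toy_decoder(integrate(maps))


# Bright left half, dark right half, with one degraded pixel at (0, 0).
image = [[0.4, 0.9, 0.1, 0.1],
         [0.9, 0.9, 0.1, 0.1],
         [0.9, 0.9, 0.1, 0.1],
         [0.9, 0.9, 0.1, 0.1]]
single_scale = infer(image, factors=(1,))
ensemble = infer(image, factors=(1, 2))
```

In this toy case the degraded pixel is misclassified when only the original resolution is used, while the ensemble with the reduced image assigns it to the same class as its neighbours.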
[0023] Before describing each functional component of the inference device 1, the meaning of the variables and functions used in the description of each functional component will be explained below. Image data obtained from applications such as video surveillance and autonomous driving, which serves as input to the inference device 1 (hereinafter referred to as "input data"), is represented by the symbol in equation (1) below. Hereafter, when the symbol in equation (1) below is shown, it will be written as vector x.
[0024]
$$\mathbf{x} \tag{1}$$
[0025] If the input data is, for example, RGB color image data, then there will be three channel directions, each representing the red, green, and blue pixel values for each pixel. In this case, vector x will be three-dimensional data with vertical, horizontal, and channel directions. The calculation of the confidence map from vector x by the backbone neural network is represented by the function S(·) shown in equation (2) below.
[0026]
$$S(\cdot) \tag{2}$$
[0027] The neural network that performs the operation of function S(·) is a pre-trained neural network and is used in a coefficient-frozen state. Specific examples of neural networks that perform the operation of function S(·) include, for example, the encoder portion of encoder-decoder type semantic segmentation neural networks such as SegNet and U-Net mentioned above, and FCN (Fully Convolutional Network).
[0028] The confidence map obtained by applying the function S(·) to the vector x is represented by the vector p, the first symbol from the left in equation (3), or by p_chw, the second symbol from the left. The subscripts c, h, and w of p_chw represent the indices in the channel, vertical, and horizontal directions, respectively. In other words, the vector p representing the confidence map is three-dimensional data. Therefore, specifying a set of values c, h, and w identifies the single feature at position (c, h, w) in vector p.
[0029]
$$\mathbf{p}, \quad p_{chw} \tag{3}$$
[0030] The operation of assigning a class to each pixel of vector x from a confidence map vector p is represented by the function g(·) shown in equation (4) below.
[0031]
$$g(\cdot) \tag{4}$$
[0032] The neural network that performs the operation of function g(·) is, like the neural network that performs the operation of function S(·), a pre-trained neural network used in a coefficient-frozen state. Specific examples include the decoder portion of encoder-decoder type semantic segmentation neural networks such as SegNet and U-Net mentioned above, and neural networks that perform upsampling to return the size of vector p to the size of the original input data vector x.
[0033] The result of semantic segmentation, that is, the recognition result data obtained by applying the function S(·) to the vector x, and then applying the function g(·) to the result of function S(·), is represented by the symbol on the left side of equation (5) below. Hereafter in this text, the circumflexed vector y on the left side of equation (5) below will be written as "vector^y".
[0034]
$$\hat{\mathbf{y}} = g(S(\mathbf{x})) \tag{5}$$
(Configuration of the inference device in the embodiment)
[0035] As shown in Figure 1, the inference device 1 comprises a transformation integration condition storage unit 10, a transformation parameter specification unit 11, a data acquisition unit 12, a transformation unit 13, a confidence map generation unit 14, an inverse transformation unit 15, a difference calculation unit 16, a difference correction unit 17, a confidence map correction unit 18, an integration unit 19, an analysis unit 20, and an output unit 21. The transformation integration condition storage unit 10 stores in advance N transformation parameters and one integration arithmetic expression that is selected in advance. Here, N is an integer of 2 or more. The set of transformation parameters stored in the transformation integration condition storage unit 10 is represented by the following equation (6).
[0036]
$$\{\xi_1, \ldots, \xi_n, \ldots, \xi_N\} \tag{6}$$
[0037] In equation (6), each of ξ1, ..., ξn, ..., ξN represents an individual transformation parameter, and "{·}" is a symbol representing a set. Here, n is any integer from 1 to N. Below, the notation "transformation parameter ξn" refers to any one of the transformation parameters. Transformation parameter ξ1 is a transformation parameter that performs no image transformation. As shown in Figure 2, the function S(·) is applied to the original input data, image data 100-1, without image transformation, and the resulting confidence map 110-1 becomes a target of integration. The transformation parameters ξ1, ..., ξn, ..., ξN stored in the transformation integration condition storage unit 10 must therefore include one transformation parameter that performs no transformation. For this reason, transformation parameter ξ1 is defined here as the transformation parameter that performs no image transformation.
[0038] Transformation parameters ξ2 to ξN, where N is 2 or greater, are the transformation parameters that perform image transformation. Transformation parameters ξ2 to ξN may indicate transformations other than the reduced image transformation described above, for example, geometric image transformations, including linear transformations such as enlargement, rotation, affine transformation, and projective transformation, as well as non-linear transformations using B-spline interpolation. The transformation parameters ξ2 to ξN may also indicate optical image transformations such as color tone and color temperature conversions. Further, the transformation parameters ξ2 to ξN may indicate image transformations related to improving or degrading the quality of an image (hereinafter referred to as "image quality"), such as noise removal or addition and blur removal or addition.
[0039] Here, the noise is not limited to general white noise, and may be compression noise, adversarial noise that causes incorrect recognition, and so on. Each of the transformation parameters ξ2 to ξN contains data indicating the type of image transformation, and also contains variables indicating the degree of the transformation, for example, a variable indicating the reduction ratio in the case of reduced image transformation. For example, when the transformation parameter ξn indicates a two-dimensional rotation, it contains one variable. When the transformation parameter ξn indicates an affine transformation, it contains six variables.
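One possible way to hold such transformation parameters in memory is sketched below. The class TransformParam, the kind strings, and the validate helper are hypothetical names introduced only for this illustration; the sketch merely encodes the constraints stated in the text (N is 2 or more, exactly one parameter performs no transformation, rotation carries one variable, affine transformation carries six).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TransformParam:
    """Hypothetical container: transformation type plus its variables."""
    kind: str
    values: Tuple[float, ...] = ()


# Number of variables per transformation type, as far as the text states them.
EXPECTED_VARIABLES = {"identity": 0, "reduce": 1, "rotate": 1, "affine": 6}

IDENTITY = TransformParam("identity")  # plays the role of xi_1

params = [
    IDENTITY,
    TransformParam("reduce", (0.5,)),                          # reduction ratio
    TransformParam("rotate", (30.0,)),                         # rotation angle
    TransformParam("affine", (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)),  # six coefficients
]


def validate(param_set):
    """Check the constraints described in the text; raise ValueError otherwise."""
    if len(param_set) < 2:
        raise ValueError("N must be an integer of 2 or more")
    if sum(p.kind == "identity" for p in param_set) != 1:
        raise ValueError("exactly one parameter must perform no transformation")
    for p in param_set:
        expected = EXPECTED_VARIABLES.get(p.kind)
        if expected is not None and len(p.values) != expected:
            raise ValueError(f"{p.kind} expects {expected} variable(s)")
    return True


is_valid = validate(params)
```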
[0040] Transformation parameters indicating image transformations that add image degradation may be included, in addition to those indicating image transformations that reduce the influence of image degradation such as reduced image transformation, for the following reason. Suppose that an existing semantic segmentation neural network, or one proposed in the future, is robust against image transformations that increase image degradation. In that case, adding an image transformation that increases image degradation is expected to yield a more accurate semantic segmentation result.
[0041] The transformation parameter specification unit 11 specifies the transformation parameters to be used by the transformation unit 13 and the inverse transformation unit 15. More specifically, the transformation parameter specification unit 11 reads the N transformation parameters ξ1 to ξN stored in the transformation integration condition storage unit 10, and specifies the transformation parameters by outputting the read transformation parameters ξ1 to ξN one by one to the transformation unit 13 and the inverse transformation unit 15. The data acquisition unit 12 acquires input data, which is image data to be recognized and is provided from an external source, and outputs the acquired input data to the transformation unit 13.
[0042] Based on the input data output by the data acquisition unit 12 and each of the N transformation parameters ξ1 to ξN output by the transformation parameter specification unit 11, the transformation unit 13 performs the image transformation represented by the following equation (7).
[0043]
$$\tilde{\mathbf{x}}_n = D(\mathbf{x}; \xi_n) \tag{7}$$
[0044] In equation (7), D(·;ξn) on the right-hand side is a transformation function that performs an image transformation operation, based on the transformation parameter ξn, on the image data given as an argument. The tilde-marked vector xn on the left side of equation (7) (hereafter written in the text as vector ~xn) is the result of the image transformation based on the transformation parameter ξn. Below, vector ~xn is called transformed data. The set of N transformed data generated by the transformation unit 13 through image transformation is represented by the following equation (8).
[0045]
$$\{\tilde{\mathbf{x}}_1, \ldots, \tilde{\mathbf{x}}_n, \ldots, \tilde{\mathbf{x}}_N\} \tag{8}$$
[0046] As described above, the transformation parameter ξ1 is a transformation parameter that does not perform image transformation, so in equation (8), vector ~x1 and vector x1 are the same data.
[0047] The confidence map generation unit 14 applies the function S(·) shown in equation (2) to each of the N transformed data in the set of equation (8) to generate N confidence map vectors ~p1, ..., ~pn, ..., ~pN, as shown in equation (9).
[0048]
$$\tilde{\mathbf{p}}_n = S(\tilde{\mathbf{x}}_n), \quad n = 1, \ldots, N \tag{9}$$
[0049] Based on each of the N transformation parameters ξ1 to ξN output by the transformation parameter specification unit 11, the inverse transformation unit 15 performs, on each of the N confidence maps generated by the confidence map generation unit 14, the inverse transformation corresponding to the image transformation performed when the corresponding transformed data was generated. For example, if the image transformation performed with the nth transformation parameter ξn is a rotation by θ°, the inverse transformation unit 15 performs the inverse transformation by rotating the nth confidence map by -θ°. The inverse transformation performed by the inverse transformation unit 15 is expressed by the following equation (10).
[0050]
$$\mathbf{p}_n = U(\tilde{\mathbf{p}}_n; \xi_n) \tag{10}$$
[0051] In equation (10), U(·;ξn) on the right-hand side is a transformation function that performs, on the confidence map given as an argument, the inverse transformation operation corresponding to the image transformation based on the transformation parameter ξn. The vector pn on the left side of equation (10) is the data obtained when the inverse transformation unit 15 inversely transforms the nth confidence map, vector ~pn, based on the transformation parameter ξn. Below, vector pn is called inverse transformed data.
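The relationship between a transformation and its inverse can be checked on a small example. The sketch below restricts rotation to multiples of 90° so that no interpolation is needed; this restriction, the function names rotate90, transform, and inverse_transform, and the use of a quarter-turn count as the transformation parameter are assumptions made only for illustration.

```python
def rotate90(grid, k):
    """Rotate a 2-D grid counter-clockwise by 90*k degrees (k may be negative)."""
    result = [list(row) for row in grid]
    for _ in range(k % 4):
        # Transpose, then reverse the row order: one counter-clockwise quarter turn.
        result = [list(row) for row in zip(*result)][::-1]
    return result


def transform(image, quarter_turns):
    """Toy image transformation: rotate the image by the given number of quarter turns."""
    return rotate90(image, quarter_turns)


def inverse_transform(confidence_map, quarter_turns):
    """Toy inverse transformation: rotate every channel back by the same amount."""
    return [rotate90(channel, -quarter_turns) for channel in confidence_map]


x = [[1, 2, 3],
     [4, 5, 6]]
rotated = transform(x, 1)
# A one-channel "confidence map" that simply copies the rotated image.
restored = inverse_transform([rotated], 1)[0]
```

Here restored has the same layout as x, so values at the same (h, w) index refer to the same place in the original input.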
[0052] Furthermore, when the integration unit 19, described below, integrates the N inverse transform data, it must be able to compare features at spatially identical locations in each of the N inverse transform data. Here, "spatially identical location" has the following meaning. The N inverse transform data are all generated from vector x1. Therefore, for example, vector x1 contains a location that corresponds to the location (c,h,w) = (1,1,1) in each of the N inverse transform data. In other words, every location in each of the N inverse transform data corresponds to some location in vector x1, and locations in the N inverse transform data that correspond to the same location in vector x1 are considered spatially identical.
[0053] Therefore, the inverse transform unit 15 generates N inverse transform data by performing alignment processing so that the positions indicated by the same c, h, and w in the N inverse transform data are spatially identical, in order for the integration unit 19 to compare feature quantities at the same spatial position in each of the N inverse transform data. The set of N inverse transform data generated by the inverse transform unit 15 is represented by the following equation (11).
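As one illustration of such alignment, confidence maps of different spatial sizes can be resampled to the size of a chosen reference map so that identical (c, h, w) indices refer to the same place. The use of nearest-neighbour resampling and the names resize_nearest and align_to_reference are assumptions of this sketch; the text does not specify how the alignment is carried out.

```python
def resize_nearest(channel, out_h, out_w):
    """Resample a 2-D channel to out_h x out_w by nearest-neighbour lookup."""
    in_h, in_w = len(channel), len(channel[0])
    return [[channel[(i * in_h) // out_h][(j * in_w) // out_w]
             for j in range(out_w)] for i in range(out_h)]


def align_to_reference(maps, ref_index=0):
    """Give every map (indexed [c][h][w]) the height and width of the reference map."""
    ref = maps[ref_index]
    out_h, out_w = len(ref[0]), len(ref[0][0])
    return [[resize_nearest(channel, out_h, out_w) for channel in m] for m in maps]


full_res = [[[0, 1, 2, 3],
             [4, 5, 6, 7],
             [8, 9, 10, 11],
             [12, 13, 14, 15]]]   # one channel, 4x4
low_res = [[[1, 2],
            [3, 4]]]              # one channel, 2x2
aligned = align_to_reference([full_res, low_res])
```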
[0054]
$$\{\mathbf{p}_1, \ldots, \mathbf{p}_n, \ldots, \mathbf{p}_N\} \tag{11}$$
[0055] The difference calculation unit 16 selects a reference inversely transformed confidence map from the inverse transformed data, calculates the difference between that reference inversely transformed confidence map and each of the inversely transformed confidence maps, and generates a difference confidence map. Alternatively, the difference calculation unit 16 may select a reference inversely transformed confidence map, calculate the difference between it and each of the inversely transformed confidence maps, take the maximum value for each pixel (each position coordinate) across the resulting differences, and use the result as the difference confidence map.
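Both variants of the difference calculation can be sketched as follows. For brevity each confidence map is treated as a single 2-D channel, and the difference is taken as an absolute difference; both are assumptions of this illustration, as are the names abs_diff, difference_maps, and pixelwise_max.

```python
def abs_diff(a, b):
    """Element-wise absolute difference of two 2-D maps of equal size."""
    return [[abs(x - y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def difference_maps(maps, ref_index):
    """Difference between the reference map and each map (first variant)."""
    ref = maps[ref_index]
    return [abs_diff(ref, m) for m in maps]


def pixelwise_max(diffs):
    """Per-position maximum over all difference maps (second variant)."""
    height, width = len(diffs[0]), len(diffs[0][0])
    return [[max(d[h][w] for d in diffs) for w in range(width)]
            for h in range(height)]


reference = [[0.25, 0.25], [0.75, 0.75]]   # e.g. from a strongly reduced image
high_a = [[0.25, 1.0], [0.75, 0.75]]
high_b = [[0.0, 0.25], [0.75, 0.25]]

diffs = difference_maps([reference, high_a, high_b], ref_index=0)
combined = pixelwise_max(diffs)
```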
[0056] Before explaining the effect of the difference confidence map obtained by the difference calculation unit 16, Figure 3 is used to explain how image segmentation results differ between the case where the image is reduced by a large ratio and segmentation is performed on a low-resolution image, and the case where the image is reduced by a small ratio and segmentation is performed on a high-resolution image.
[0057] When performing image segmentation using low-resolution images, the advantage is that the segmentation results tend to be the same regardless of the type of degradation, and the segmentation results for large objects tend to be correct (Figure 3, 501). However, a disadvantage of using low-resolution images is that the segmentation results for small objects tend to be distorted (in Figure 3, 502, the person (yellow) is not detected). On the other hand, when performing image segmentation using high-resolution images, the advantage is that the segmentation results for small objects are more likely to be correctly inferred (in Figure 3, 503, the person is correctly inferred). However, a disadvantage of using high-resolution images is that the segmentation results for large objects are more likely to be significantly incorrect (in Figure 3, 504, the sidewalk is not correctly detected).
[0058] Therefore, as shown in Figure 4, the present invention aims to integrate the advantages of the image segmentation results of high-resolution images and the advantages of the image segmentation results of low-resolution images. To this end, the difference calculation unit 16 selects, for example, a confidence map corresponding to the image segmentation results in a low-resolution image as a reference inversely transformed confidence map, calculates the difference between that reference inversely transformed confidence map and each inversely transformed confidence map (for example, a confidence map corresponding to the segmentation results of a high-resolution image), and generates a difference confidence map.
[0059] This difference confidence map shows large differences in two kinds of areas: areas where the high-resolution segmentation result is significantly incorrect (the disadvantage of the high-resolution result), as shown at 600 in Figure 4, and small areas where the high-resolution segmentation result is correct (the advantage of the high-resolution result), as shown at 602 in Figure 4. Therefore, the difference correction unit 17 and the confidence map correction unit 18, described later, correct the confidence map corresponding to the high-resolution image only in difference areas of the kind shown at 600 in Figure 4, thereby improving the accuracy of that confidence map.
[0060] The difference correction unit 17 corrects the values of each region of the difference confidence map by referring to the values of the surrounding regions and generates a corrected difference confidence map. More specifically, the difference correction unit 17 refers to the values of surrounding pixels and performs region reduction and region expansion processing to erase the difference confidence map in small difference regions (e.g., 602) while saving the difference confidence map in large difference regions (e.g., 600).
[0061] For example, the region reduction process described here can be performed by taking a local minimum value for each local region. Taking a local minimum value for each local region means, for example, finding the minimum of the values held by the pixels in a predetermined region and assigning that minimum value to the pixel in that region. Likewise, the region expansion process described here can be performed by taking a local maximum value for each local region. Taking a local maximum value for each local region means finding the maximum of the values held by the pixels in a predetermined region and assigning that maximum value to the pixel in that region.
[0062] By performing region reduction and region expansion in this way, it is possible to save a large difference region, as shown in 601 in Figure 4, while deleting a small difference region, as shown in 603 in Figure 4.
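The region reduction and region expansion described above can be illustrated with the following minimal Python sketch. It assumes a square window that is clipped at the image border and a two-dimensional difference confidence map stored as nested lists. The names local_filter and reduce_then_expand and the window radius are hypothetical and are not taken from the embodiment.

```python
def local_filter(grid, radius, reducer):
    """Replace each pixel with reducer() of the values in its clipped window."""
    height, width = len(grid), len(grid[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                grid[yy][xx]
                for yy in range(max(0, y - radius), min(height, y + radius + 1))
                for xx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            out[y][x] = reducer(window)
    return out


def reduce_then_expand(diff_map, radius=1):
    """Region reduction (local minimum) followed by region expansion (local maximum)."""
    shrunk = local_filter(diff_map, radius, min)
    return local_filter(shrunk, radius, max)


# Toy difference map: one 3x3 region and one isolated pixel.
diff_map = [[0.0] * 9 for _ in range(6)]
for row in range(1, 4):
    for col in range(1, 4):
        diff_map[row][col] = 1.0
diff_map[2][7] = 1.0

opened = reduce_then_expand(diff_map, radius=1)
```

With a radius of 1, the 3×3 difference region survives both passes, whereas the isolated single-pixel difference is removed.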
[0063] The confidence map correction unit 18 generates a corrected confidence map for each region of the corrected difference confidence map from the corrected difference confidence map, a certain reference inversely transformed confidence map, and each of the inversely transformed confidence maps. More specifically, as shown in Figure 5, regions where the value of the corrected difference confidence map is large are set to a value closer to the value of the confidence map of the low-resolution image (large reduction ratio), i.e., the reference confidence map. On the other hand, regions where the value of the corrected difference confidence map is small are set to a value closer to the value of the confidence map of the high-resolution image (small reduction ratio), i.e., the confidence map being corrected.
[0064] As an example, let the reference confidence map be p_i, the confidence map of the high-resolution image be p_j, and the difference confidence map corresponding to this high-resolution image be S_i. The corrected confidence map can then be given as follows.
[0065]
[Equation (12)]
[0066] Here, ← means that a new value from the right-hand side is assigned to the left-hand side. Also, * indicates element-wise multiplication of a matrix with respect to spatial coordinates. Furthermore, σ(·) is a monotonically increasing function that takes values from zero to one, and activation functions such as the sigmoid function, Heaviside function, or hyperbolic tangent function can be used.
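One update that is consistent with this description can be sketched in Python as follows. The blend form (gate weight times the reference map plus one minus the gate weight times the map being corrected) is an assumption inferred from the surrounding text, not a reproduction of the equation itself. The function names and the Heaviside threshold are also hypothetical.

```python
import math


def sigmoid(value):
    """Sigmoid gate: monotonically increasing, values between zero and one."""
    return 1.0 / (1.0 + math.exp(-value))


def heaviside(value, threshold=0.5):
    """Step gate; the threshold is an assumption made for this sketch."""
    return 1.0 if value > threshold else 0.0


def correct_confidence(reference, target, corrected_diff, gate=heaviside):
    """Blend two c x h x w maps using one gate weight per spatial position."""
    corrected = []
    for ref_channel, tgt_channel in zip(reference, target):
        channel = []
        for y, (ref_row, tgt_row) in enumerate(zip(ref_channel, tgt_channel)):
            row = []
            for x, (ref_value, tgt_value) in enumerate(zip(ref_row, tgt_row)):
                weight = gate(corrected_diff[y][x])
                # Large difference -> follow the reference; small -> keep the target.
                row.append(weight * ref_value + (1.0 - weight) * tgt_value)
            channel.append(row)
        corrected.append(channel)
    return corrected


reference = [[[0.9, 0.9]]]      # low-resolution (reference) map, 1 x 1 x 2
target = [[[0.2, 0.2]]]         # high-resolution map being corrected
corrected_diff = [[1.0, 0.0]]   # corrected difference map, 1 x 2
result = correct_confidence(reference, target, corrected_diff)
```

In this toy case the first position takes the reference value and the second keeps the high-resolution value. Passing gate=sigmoid gives a soft blend instead of a hard switch.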
[0067] The integration unit 19 integrates N inverse transform data included in the set of inverse transform data of expression (11) based on one pre-selected integration arithmetic expression stored in the transformation integration condition storage unit 10. The integration process performed by the integration unit 19 is expressed by the following expression (13) using the concatenation operator ([·]).
[0068]
$$\hat{\boldsymbol{p}} = [\boldsymbol{p}_1, \boldsymbol{p}_2, \ldots, \boldsymbol{p}_N] \tag{13}$$
[0069] The left-hand side of equation (13) is a symbol representing the integrated data generated by the integration unit 19 by integrating N inverse transform data, and in the text below, the integrated data will be referred to as vector^p. The integration unit 19 performs the integration process so that the number of dimensions of the integrated data (i.e., vector^p) generated by the integration process is the same as the number of dimensions of the confidence map (i.e., vector p in equation (3)) obtained by applying the function S(·) shown in equation (2) to the input data vector x. The same number of dimensions here means that the size of the data, expressed as c×h×w, is the same, and more specifically, if the vector p in equation (3) is, for example, 8×256×256 in size, then the size of the integrated data vector^p is also 8×256×256.
[0070] The concatenation operator in equation (13) may simply be an integration formula that combines the N features at positions where c, h, and w are the same in each of the N inverse transformed data, that is, one in which the sum of the N features becomes the feature of the integrated data at that position. The concatenation operator in equation (13) may also be any of the integration formulas shown below.
[0071] For example, the concatenation operator in equation (13) may be any of the five integration formulas represented by equations (14) to (18). The integration formula represented by equation (14) performs integration such that, among the N features at positions where c, h, and w are the same in each of the N inverse transformed data, the feature with the maximum value becomes the feature of the integrated data at that position.
[0072]
$$\hat{p}_{chw} = \max_{n \in \{1, \ldots, N\}} p_{nchw} \tag{14}$$
[0073] The integration formula represented by equation (15) below performs integration such that the average value of the N features at positions where c, h, and w are the same in each of the N inverse transformed data is used as the feature of the integrated data at that position.
[0074]
$$\hat{p}_{chw} = \frac{1}{N} \sum_{n=1}^{N} p_{nchw} \tag{15}$$
[0075] The integration formula represented by equation (16) performs integration such that, among the products obtained by multiplying each of the N features at positions where c, h, and w are the same by the weight wn predetermined for each of the N inverse transformed data, the largest product is used as the feature of the integrated data at that position. This largest product is also called the weighted maximum value.
[0076]
$$\hat{p}_{chw} = \max_{n \in \{1, \ldots, N\}} \left( w_n \, p_{nchw} \right) \tag{16}$$
[0077] The integration formula represented by equation (17) performs integration such that the sum of the products obtained by multiplying each of the N features at positions where c, h, and w are the same by the weight wn predetermined for each of the N inverse transformed data, divided by the sum of the N weights wn, is used as the feature of the integrated data at that position. This quotient is also called the weighted mean or weighted average.
[0078]
$$\hat{p}_{chw} = \frac{\sum_{n=1}^{N} w_n \, p_{nchw}}{\sum_{n=1}^{N} w_n} \tag{17}$$
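The four integration formulas of equations (14) to (17) can be illustrated with the following minimal Python sketch. It assumes each inverse transformed data item is a nested list indexed as [c][h][w]. The helper names (integrate, maximum, mean, weighted_maximum, weighted_mean) and the example weights are hypothetical.

```python
def integrate(maps, rule):
    """Apply rule to the N features that share the same (c, h, w) position."""
    channels, height, width = len(maps[0]), len(maps[0][0]), len(maps[0][0][0])
    return [
        [
            [rule([m[c][y][x] for m in maps]) for x in range(width)]
            for y in range(height)
        ]
        for c in range(channels)
    ]


def maximum(values):
    """Maximum of the N features, as described for equation (14)."""
    return max(values)


def mean(values):
    """Average of the N features, as described for equation (15)."""
    return sum(values) / len(values)


def weighted_maximum(weights):
    """Largest weighted product, as described for equation (16)."""
    return lambda values: max(w * v for w, v in zip(weights, values))


def weighted_mean(weights):
    """Weighted products summed and divided by the weight sum, as for equation (17)."""
    return lambda values: sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Two inverse transformed maps (N = 2), each of size 1 x 1 x 2.
maps = [[[[0.25, 0.75]]], [[[0.5, 0.25]]]]
weights = [3.0, 1.0]

by_max = integrate(maps, maximum)
by_mean = integrate(maps, mean)
by_weighted_max = integrate(maps, weighted_maximum(weights))
by_weighted_mean = integrate(maps, weighted_mean(weights))
```

Each rule reduces N values to one value, so the integrated data keeps the same c×h×w size as a single confidence map.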
[0079] The integration formula represented by equation (18) performs integration such that, when the N features at positions where c, h, and w are the same in each of the N inverse transformed data are arranged in descending order, the average of the top k features is used as the feature of the integrated data at that position.
[0080]
[Equation (18)]
[0081] The integration formula of equation (18) can be expressed in more detail by equation (19).
[0082]
$$\hat{p}_{chw} = \frac{1}{k} \sum_{n=1}^{N} \phi^{(k)}_{nchw} \, p_{nchw} \tag{19}$$
[0083] φ^(k)_nchw in equation (19) is defined by the following equation (20).
[0084]
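$$\phi^{(k)}_{nchw} = \begin{cases} 1 & \text{if } R(p_{nchw}) \le k \\ 0 & \text{otherwise} \end{cases} \tag{20}$$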
[0085] In equation (20), the function R(·) outputs, as its return value, the rank of the value given as its argument. For example, suppose that, among the inverse transformed data vectors p1 to pN, the feature p_nchw at one position of the nth vector pn is given as the argument to the function R(·). In this case, the function R(·) outputs, as its return value, the rank of the argument p_nchw when the N features p_1chw to p_Nchw, for which c, h, and w are identical, are sorted in descending order. The value of k is a predetermined integer between 1 and N. When k=1, equation (18) becomes the same as equation (14), that is, the integration formula that selects the maximum value, and when k=N, equation (18) becomes the same as equation (15), that is, the integration formula that selects the average value.
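The top-k averaging and the rank function R(·) described above can be sketched in Python as follows, for the N features at one position. The function names are hypothetical. Breaking ties by the order of the inverse transformed data is an assumption, since the text does not say how equal values are ranked.

```python
def rank_descending(values, index):
    """R(.): 1-based rank of values[index] in descending order.

    Ties are broken by position in the list (an assumption of this sketch).
    """
    target = values[index]
    higher = sum(
        1
        for i, v in enumerate(values)
        if v > target or (v == target and i < index)
    )
    return 1 + higher


def top_k_mean(values, k):
    """Average of the k largest features, using a 0/1 indicator per feature."""
    indicator = [
        1 if rank_descending(values, n) <= k else 0 for n in range(len(values))
    ]
    return sum(flag * v for flag, v in zip(indicator, values)) / k


features = [0.25, 1.0, 0.5, 0.75]   # N = 4 features at one (c, h, w) position
```

For these features, k=1 returns the maximum and k=N returns the plain average, matching the two special cases noted in the text.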
[0086] The concatenation operator in equation (13) may also be one of the following integration formulas. For example, it may be an integration formula that performs integration such that the median of the N features at positions where c, h, and w are the same in each of the N inverse transformed data becomes the feature of the integrated data at that position. Alternatively, it may be an integration formula that performs integration such that the median of the products obtained by multiplying the N features at positions where c, h, and w are the same by a weight wn, which has a predetermined value for each of the N inverse transformed data, becomes the feature of the integrated data at that position. This median of the products is also called the weighted median.
[0087] The concatenation operator in equation (13) may also be the following integration formula. Let one position in the integrated data be c1, h1, w1. When obtaining the feature at the position c1, h1, w1, instead of considering only the features at the position c1, h1, w1 in each of the N inverse transformed data, the integration may be performed by applying the integration formulas shown in equations (14) to (17) above to a set of features that also includes the features at positions neighboring c1, h1, w1.
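The median and weighted median described here can be sketched in Python for the N features at one position. The sketch follows the definition given in the text, in which the weighted median is the median of the weighted products. The function names are hypothetical. For an even N, statistics.median averages the two middle values, which is an assumption because the text does not specify that case.

```python
from statistics import median


def median_rule(values):
    """Median of the N features at one position."""
    return median(values)


def weighted_median_rule(weights):
    """Median of the N weighted products, as the text defines the weighted median."""
    return lambda values: median(w * v for w, v in zip(weights, values))


features = [0.25, 1.0, 0.5]
plain = median_rule(features)
weighted = weighted_median_rule([4.0, 1.0, 1.0])(features)
```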
[0088] In this case, the types of transformation parameters that consider neighbors may be predetermined, and for inverse transformation data corresponding to the predetermined types of transformation parameters, neighbor features may be included, while for inverse transformation data corresponding to types other than the predetermined types of transformation parameters, neighbor features may not be included. When applying the unified calculation formula shown in equation (16) or equation (17), for inverse transformation data corresponding to the predetermined types of transformation parameters, neighbor features may be included and the value of the weight wn may be increased, while for inverse transformation data corresponding to types other than the predetermined types of transformation parameters, neighbor features may be included and the value of the weight wn may be decreased.
[0089] Here, "neighboring positions" may refer to, for example, the 26 positions adjacent to c1, h1, w1 in the up, down, left, right, front, back, and diagonal directions; the 6 positions adjacent to c1, h1, w1 in the up, down, left, right, front, and back directions; or any predetermined range that includes c1, h1, w1.
[0090] An integration formula may also be an expression that combines several of the integration formulas described above. For example, the integration unit 19 may take, as the final integrated data, the result of further integrating, based on formula (14), the integrated data generated based on formula (14) and the integrated data generated based on formula (15).
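The 26-position and 6-position neighborhoods can be enumerated with the following minimal Python sketch. The function names are hypothetical. Dropping neighbors that fall outside the c×h×w data is an assumption about border handling.

```python
from itertools import product


def neighbor_offsets(connectivity):
    """Offsets (dc, dh, dw) for the 26- or 6-position neighborhood."""
    offsets = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]
    if connectivity == 26:
        return offsets
    if connectivity == 6:
        # Keep only offsets that move along exactly one axis.
        return [o for o in offsets if sum(abs(d) for d in o) == 1]
    raise ValueError("connectivity must be 6 or 26")


def neighbors(position, shape, connectivity=6):
    """In-bounds neighboring positions of (c1, h1, w1) in data of the given shape."""
    result = []
    for offset in neighbor_offsets(connectivity):
        candidate = tuple(p + d for p, d in zip(position, offset))
        if all(0 <= v < limit for v, limit in zip(candidate, shape)):
            result.append(candidate)
    return result


corner_neighbors = neighbors((0, 0, 0), (8, 256, 256), connectivity=6)
```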
[0091] The analysis unit 20 applies the function g(·) of equation (4) to the vector ^p, which is the integrated data generated by the integration unit 19 by integrating N inverse transform data. The process of applying the function g(·) of equation (4) is a recognition process, and specifically, it is an image segmentation process that divides the image region into classes, as performed in the semantic segmentation algorithm. The recognition process by the analysis unit 20 is expressed by the following equation (21).
[0092]
$$\hat{\boldsymbol{y}} = g(\hat{\boldsymbol{p}}) \tag{21}$$
[0093] As shown in equation (21), the recognition processing by the analysis unit 20 yields a vector ^y, which is the recognition result data (i.e., data showing the result of semantic segmentation).
[0094] The output unit 21 may be a display device equipped with a screen, such as a liquid crystal display, or a storage device, such as a semiconductor memory or an HDD (Hard Disk Drive). If the output unit 21 is a display device, it displays the vector ^y, which is the recognition result data generated by the analysis unit 20 through the recognition process, on the screen. If the output unit 21 is a storage device, the analysis unit 20 writes the vector ^y, which is the recognition result data generated by the recognition process, to the output unit 21 for storage.
[0095] (Processing by the inference device of the embodiment) Figure 6 is a flowchart showing the processing flow by the inference device 1 in one embodiment of the present invention. When the inference device 1 is started, the conversion parameter specification unit 11 reads the N conversion parameters ξ1 to ξN pre-stored in the conversion integration condition storage unit 10 and writes the read conversion parameters ξ1 to ξN to an internal memory area for storage (step Sa1). The integration unit 19 reads the one integration arithmetic expression pre-stored in the conversion integration condition storage unit 10 and writes the read integration arithmetic expression to an internal memory area for storage (step Sa2). The data acquisition unit 12 acquires input data, which is image data of the object to be recognized, provided from the outside, and outputs the acquired input data to the conversion unit 13 (step Sa3).
[0096] The conversion parameter specification unit 11 selects and reads one conversion parameter ξn from among the conversion parameters ξ1 to ξN stored in the internal memory area. The conversion parameter specification unit 11 outputs the read conversion parameter ξn to the conversion unit 13 and the inverse conversion unit 15. This initiates the first iteration of the loop processing from La1s to La1e shown in Figure 6.
[0097] The conversion unit 13 takes in the input data output by the data acquisition unit 12 and the conversion parameter ξn output by the conversion parameter specification unit 11. Based on the acquired conversion parameter ξn, the conversion unit 13 performs the image transformation represented by equation (7) on the acquired input data vector x to generate transformed data. The conversion unit 13 then outputs the generated transformed data (i.e., vector ~xn) to the confidence map generation unit 14 (step Sa4).
[0098] The confidence map generation unit 14 takes in the vector ~xn, which is the transformed data output by the conversion unit 13. The confidence map generation unit 14 generates a confidence map by applying the function S(·) shown in equation (2) to the acquired vector ~xn. The confidence map generation unit 14 outputs the vector ~pn, which is the generated confidence map, to the inverse conversion unit 15 (step Sa5).
[0099] The inverse conversion unit 15 takes in the confidence map output from the confidence map generation unit 14 and the conversion parameter ξn output from the conversion parameter specification unit 11. Based on the acquired conversion parameter ξn, the inverse conversion unit 15 performs, on the vector ~pn, which is the acquired confidence map, the inverse transformation represented by equation (10) (i.e., the inverse transformation corresponding to the image transformation that the conversion unit 13 performed based on the conversion parameter ξn) to generate inverse transform data. The inverse conversion unit 15 outputs the generated inverse transform data (i.e., vector pn) to the integration unit 19 (step Sa6).
[0100] The difference calculation unit 16 calculates the difference confidence map from the inverse transformed data (step Sa7). More specifically, the difference calculation unit 16 selects a certain reference inversely transformed confidence map from the inverse transformed data, calculates the difference between that reference inversely transformed confidence map and the inverse transformed data generated in step Sa6, and generates a difference confidence map.
[0101] The difference correction unit 17 corrects the difference confidence map (step Sa8). More specifically, the difference correction unit 17 corrects the values of each region of the difference confidence map by referring to the values of the surrounding regions, and generates a corrected difference confidence map.
[0102] The confidence map correction unit 18 corrects the confidence map (step Sa9). More specifically, the confidence map correction unit 18 generates a corrected confidence map for each region of the corrected difference confidence map from the corrected difference confidence map generated in step Sa8, a certain reference inversely transformed confidence map, and the inverse transformed data generated in step Sa6.
[0103] The conversion parameter specification unit 11 repeatedly selects, from among the conversion parameters ξ1 to ξN stored in the internal memory area, one conversion parameter ξn that has not yet been selected (that is, one conversion parameter ξn that has not yet been output to the conversion unit 13 and the inverse conversion unit 15). As a result, the processes from step Sa4 to Sa6 are repeated for each of the conversion parameters ξ1 to ξN (loop processing from La1s to La1e).
[0104] Here, suppose that the conversion parameter ξn that the conversion parameter specification unit 11 outputs to the conversion unit 13 is a conversion parameter ξ1 indicating that no image conversion is performed. In this case, the conversion unit 13 and the inverse conversion unit 15 perform the following processing. In the processing of step Sa4, the conversion unit 13 outputs the input data vector x, which is acquired from the data acquisition unit 12, to the confidence map generation unit 14 as vector ~x1 without performing image conversion. Also, in the processing of step Sa6, the inverse conversion unit 15 outputs the confidence map vector ~p1, which is output by the confidence map generation unit 14, to the integration unit 19 as vector p1 without performing inverse conversion.
[0105] The integration unit 19 refers to the conversion integration condition storage unit 10 and detects the number N of the conversion parameters ξ1 to ξN. During the loop processing from La1s to La1e, the integration unit 19 repeatedly takes in the vector pn, which is the inverse transform data output by the inverse conversion unit 15. As long as the number of vectors pn taken in does not match N, the integration unit 19 continues to take in vectors pn. When the number of vectors pn taken in matches N, the integration unit 19 integrates the N vectors pn based on the integration arithmetic expression stored in the internal memory area to generate integrated data. The integration unit 19 outputs the generated integrated data (i.e., vector ^p) to the analysis unit 20 (step Sa10).
[0106] The analysis unit 20 takes in vector ^p, which is integrated data output by the integration unit 19. The analysis unit 20 applies the function g(·) shown in equation (4) to the taken-in vector ^p and performs the recognition process shown in equation (21) to generate recognition result data. The analysis unit 20 outputs vector ^y, which is the recognition result data, to the output unit 21 (step Sa11).
[0107] The output unit 21 receives the recognition result data vector ^y output from the analysis unit 20. As described above, if the output unit 21 is a display device, the output unit 21 displays the received recognition result data vector ^y on the screen. If the output unit 21 is a storage device, the output unit 21 stores the received recognition result data vector ^y (step Sa12). This completes the inference processing performed by the inference device 1 for one item of input data. When the next input data is provided to the data acquisition unit 12, the processing from steps Sa3 to Sa12 is performed again.
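The main loop of Figure 6 can be outlined with the following minimal Python sketch. It covers only steps Sa4 to Sa6, Sa10, and Sa11 and omits the difference-based correction of steps Sa7 to Sa9. Every function in it is a hypothetical stand-in: the toy image is one-dimensional, the image transformation is a flip, the confidence function is a toy rule rather than a trained network, and treating g(·) as a per-position argmax over classes is an assumption.

```python
def infer(x, transforms, confidence_fn, integrate_fn, analyze_fn):
    """Transform, score, inverse transform for each parameter, then integrate."""
    inverse_maps = []
    for forward, inverse in transforms:             # loop La1s to La1e
        transformed = forward(x)                    # step Sa4
        confidence = confidence_fn(transformed)     # step Sa5
        inverse_maps.append(inverse(confidence))    # step Sa6
    integrated = integrate_fn(inverse_maps)         # step Sa10
    return analyze_fn(integrated)                   # step Sa11


# Toy stand-ins (all hypothetical): a 1-D "image" and two classes.
def identity(data):
    return data


def flip_image(image):
    return image[::-1]


def flip_confidence(confidence):
    return [channel[::-1] for channel in confidence]


def toy_confidence(image):
    return [[1.0 - v for v in image], [v for v in image]]


def mean_integration(maps):
    count = len(maps)
    return [
        [sum(m[c][i] for m in maps) / count for i in range(len(maps[0][c]))]
        for c in range(len(maps[0]))
    ]


def argmax_classes(confidence):
    positions = range(len(confidence[0]))
    return [max(range(len(confidence)), key=lambda c: confidence[c][i]) for i in positions]


labels = infer(
    [0.0, 1.0, 0.25],
    [(identity, identity), (flip_image, flip_confidence)],
    toy_confidence,
    mean_integration,
    argmax_classes,
)
```

The identity pair plays the role of a conversion parameter that indicates no image conversion, so both the transformation and the inverse transformation pass the data through unchanged.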
[0108] (Configuration of the condition selection device in the embodiment) In the inference device 1 described above, in order to obtain highly accurate recognition results, the combination of multiple conversion parameters pre-stored in the conversion integration condition storage unit 10 and one integration calculation formula must be selected so as to be the optimal combination for the image degradation occurring in the input data. The condition selection device 2 shown in Figure 7, which will be described below, is a device that selects this optimal combination.
[0109] The input data provided to the data acquisition unit 12 of the inference device 1 described above is assumed to be multiple image data arranged in a time series obtained from a specific application, such as video surveillance or autonomous driving. This specific application, for example, acquires multiple image data captured at regular time intervals by a specific camera. Since the acquired multiple image data are captured by the same camera and acquired by the same application, each of the acquired multiple image data will experience a common image degradation. Here, the image degradation occurring in each of the multiple image data is an unknown image degradation, but it is a common image degradation across the multiple image data, and is assumed to be of a degree that allows a person to visually determine the class of each pixel of the image data.
[0110] Figure 7 is a block diagram showing the configuration of a condition selection device 2 in one embodiment of the present invention. In the condition selection device 2 shown in Figure 7, components similar to those in the inference device 1 shown in Figure 1 are denoted by the same reference numerals and their descriptions may be omitted. As shown in Figure 7, the condition selection device 2 comprises a conversion integrated condition storage unit 10, a conversion parameter specification unit 11a, a conversion unit 13, a confidence map generation unit 14, an inverse conversion unit 15, a difference calculation unit 16, a difference correction unit 17, a confidence map correction unit 18, an integration unit 19a, an analysis unit 20, a training data storage unit 22, a data reading unit 23, an integrated calculation formula selection unit 24, and a processing result storage unit 25.
[0111] The training data storage unit 22 pre-stores multiple training data. As described above, each of the multiple image data obtained from a specific application, arranged in time series, contains unknown image degradation, but this degradation is such that a person can visually determine the class of each pixel in the image data. Therefore, not all of the multiple image data obtained from a specific application (i.e., multiple input data) is used as data to be given to the inference device 1, but rather some of the input data is selected as training data to be applied to the condition selection device 2. Then, ground truth data indicating the class of each pixel corresponding to each of the selected training input data is generated. Each of the generated ground truth data is associated with its corresponding training input data, and thus becomes multiple training data. That is, one training data contains one training input data and one ground truth data corresponding to that training input data.
[0112] Multiple training data sets generated in this manner are pre-stored in the training data storage unit 22. Here, the correct answer data corresponding to a single training input data vector x is represented by the symbol shown in equation (22), and hereafter, in this text, the symbol in equation (22) will be written as vector y.
[0113]
[Equation (22)]
[0114] Each time the data reading unit 23 receives a start instruction signal or a continue instruction signal, it reads one training data from the training data storage unit 22. If the data reading unit 23 is unable to read training data from the training data storage unit 22, it outputs a termination instruction signal to the integrated calculation formula selection unit 24. If the data reading unit 23 is able to read training data from the training data storage unit 22, it outputs a start instruction signal to the conversion parameter specification unit 11a. The data reading unit 23 outputs the training input data contained in the read training data to the conversion unit 13, and outputs the correct answer data contained in the training data to the integrated calculation formula selection unit 24.
[0115] The conversion parameter specification unit 11a takes in N conversion parameters ξ1 to ξN, provided from an external source, that indicate image conversions. Upon receiving the start instruction signal, the conversion parameter specification unit 11a selects one conversion parameter ξn from among the N conversion parameters ξ1 to ξN it has taken in. The conversion parameter specification unit 11a outputs the selected conversion parameter ξn to the conversion unit 13 and the inverse conversion unit 15, thereby specifying the conversion parameter ξn to be used by the conversion unit 13 and the inverse conversion unit 15.
[0116] The difference calculation unit 16 selects a reference inversely transformed confidence map from the inverse transformed data, calculates the difference between the reference inversely transformed confidence map and each of the inversely transformed confidence maps, and generates a difference confidence map. Alternatively, the difference calculation unit 16 may select a reference inversely transformed confidence map, calculate the difference between it and each of the inversely transformed confidence maps, further calculate the maximum value for each pixel (for each position coordinate) across those differences, and use the result as the difference confidence map.
[0117] Before explaining the effect of the difference confidence map obtained by the difference calculation unit 16, Figure 3 is used to explain how the image segmentation results differ between the case where the image is reduced by a large ratio and segmentation is performed on the resulting low-resolution image, and the case where the image is reduced by a small ratio and segmentation is performed on the resulting high-resolution image.
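The per-pixel maximum variant can be sketched in Python as follows. The text does not state how a difference between two c×h×w confidence maps is reduced to one value per pixel, so this sketch assumes the absolute difference maximized over the class channels. The function names are hypothetical.

```python
def pixel_difference(reference, other):
    """Per-pixel difference between two c x h x w maps.

    Assumption: absolute difference, maximized over the class channels.
    """
    channels = range(len(reference))
    height, width = len(reference[0]), len(reference[0][0])
    return [
        [
            max(abs(reference[c][y][x] - other[c][y][x]) for c in channels)
            for x in range(width)
        ]
        for y in range(height)
    ]


def max_difference_map(reference, others):
    """Per-pixel maximum over the difference maps of all other confidence maps."""
    diffs = [pixel_difference(reference, other) for other in others]
    height, width = len(diffs[0]), len(diffs[0][0])
    return [
        [max(d[y][x] for d in diffs) for x in range(width)]
        for y in range(height)
    ]


reference = [[[0.5, 0.5]]]                         # reference map, 1 x 1 x 2
others = [[[[0.5, 0.25]]], [[[1.0, 0.5]]]]         # two other maps
difference = max_difference_map(reference, others)
```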
[0118] When performing image segmentation using low-resolution images, the advantage is that the segmentation results tend to be the same regardless of the type of degradation, and the segmentation results for large objects tend to be correct (Figure 3, 501). However, a disadvantage of using low-resolution images is that the segmentation results for small objects tend to be distorted (in Figure 3, 502, the person (yellow) is not detected). On the other hand, when performing image segmentation using high-resolution images, the advantage is that the segmentation results for small objects are more likely to be correctly inferred (in Figure 3, 503, the person is correctly inferred). However, a disadvantage of using high-resolution images is that the segmentation results for large objects are more likely to be significantly incorrect (in Figure 3, 504, the sidewalk is not correctly detected).
[0119] Therefore, as shown in Figure 4, the present invention aims to integrate the advantages of the image segmentation results of high-resolution images and the advantages of the image segmentation results of low-resolution images. To this end, the difference calculation unit 16 selects, for example, a confidence map corresponding to the image segmentation results in a low-resolution image as a reference inversely transformed confidence map, calculates the difference between that reference inversely transformed confidence map and each inversely transformed confidence map (for example, a confidence map corresponding to the segmentation results of a high-resolution image), and generates a difference confidence map.
[0120] This difference confidence map shows large differences in two kinds of areas: areas where the high-resolution image segmentation result is significantly incorrect (i.e., disadvantages in the high-resolution image segmentation result), as shown at 600 in Figure 4, and areas where the high-resolution image segmentation result is correct, albeit only over small areas (i.e., advantages in the high-resolution image segmentation result), as shown at 602 in Figure 4. Therefore, the difference correction unit 17 and the confidence map correction unit 18, described later, correct only the difference areas shown at 600 in Figure 4 in the confidence map corresponding to the high-resolution image, thereby improving the accuracy of that confidence map.
[0121] The difference correction unit 17 corrects the values of each region of the difference confidence map by referring to the values of the surrounding regions and generates a corrected difference confidence map. More specifically, the difference correction unit 17 refers to the values of surrounding pixels and performs region reduction and region expansion processing to erase the difference confidence map in small difference regions (e.g., 602) while saving the difference confidence map in large difference regions (e.g., 600).
[0122] For example, the region reduction process described here can be performed by taking a local minimum value for each local region. Taking a local minimum value for each local region means, for example, finding the minimum of the values held by the pixels in a predetermined region and assigning that minimum value to the pixel in that region. Likewise, the region expansion process described here can be performed by taking a local maximum value for each local region. Taking a local maximum value for each local region means finding the maximum of the values held by the pixels in a predetermined region and assigning that maximum value to the pixel in that region.
[0123] By performing region reduction and region expansion in this way, it is possible to save a large difference region, as shown in 601 in Figure 4, while deleting a small difference region, as shown in 603 in Figure 4.
[0124] The confidence map correction unit 18 generates a corrected confidence map for each region of the corrected difference confidence map from the corrected difference confidence map, a certain reference inversely transformed confidence map, and each of the inversely transformed confidence maps. More specifically, as shown in Figure 5, regions where the value of the corrected difference confidence map is large are set to a value closer to the value of the confidence map of the low-resolution image (large reduction ratio), i.e., the reference confidence map. On the other hand, regions where the value of the corrected difference confidence map is small are set to a value closer to the value of the confidence map of the high-resolution image (small reduction ratio), i.e., the confidence map being corrected.
[0125] As an example, let the reference confidence map be p_i, the confidence map of the high-resolution image be p_j, and the difference confidence map corresponding to this high-resolution image be S_i. The corrected confidence map can then be given as follows.
[0126]
[Equation]
[0127] Here, ← means that a new value from the right-hand side is assigned to the left-hand side. Also, * indicates element-wise multiplication of a matrix with respect to spatial coordinates. Furthermore, σ(·) is a monotonically increasing function that takes values from zero to one, and any activation function such as the sigmoid function, Heaviside function, or hyperbolic tangent function can be used.
[0128] The integration unit 19a takes in M integration formulas provided from the outside and generates M indices for "integration formula 1", ..., "integration formula M" corresponding to each of the M integration formulas taken in. Here, M is an integer greater than or equal to 2. Based on each of the M integration formulas taken in, the integration unit 19a performs integration processing to integrate N inverse transform data contained in the set of inverse transform data of formula (11) and generate integrated data.
[0129] The integration unit 19a outputs the index corresponding to the integration formula used in the integration process to the integration formula selection unit 24, one by one, in the order in which they were used. The integration unit 19a also outputs the integrated data to the analysis unit 20, one by one, in the order in which it was generated. In other words, when the integration unit 19a performs integration processing based on a certain integration formula and generates integrated data, it outputs the index corresponding to that certain integration formula to the integration formula selection unit 24, and also outputs the integrated data generated based on that certain integration formula to the analysis unit 20.
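As a rough illustration of how indexed integration formulas could be held and applied one by one, the sketch below uses two hypothetical formulas (position-wise mean and position-wise maximum). The text here does not list the M formulas, so both formulas, the dictionary `INTEGRATION_FORMULAS`, and the function names are assumptions.

```python
from statistics import mean

# Hypothetical integration formulas keyed by their generated indices (M = 2).
INTEGRATION_FORMULAS = {
    "integration formula 1": lambda values: mean(values),
    "integration formula 2": lambda values: max(values),
}

def integrate(maps, formula):
    """Combine N same-sized 2-D maps position by position with one formula."""
    height, width = len(maps[0]), len(maps[0][0])
    return [[formula([m[r][c] for m in maps]) for c in range(width)]
            for r in range(height)]

def integrate_all(maps):
    """Yield (index, integrated data) pairs in the order the formulas are used."""
    for index, formula in INTEGRATION_FORMULAS.items():
        yield index, integrate(maps, formula)

maps = [[[0.2, 0.8]], [[0.4, 0.6]]]   # N = 2 inverse-converted maps of size 1 x 2
results = list(integrate_all(maps))
```

Each yielded pair corresponds to what the integration unit 19a sends out: the index goes to the integration formula selection unit 24 and the integrated data goes to the analysis unit 20.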
[0130] The integration formula selection unit 24 outputs a continuation instruction signal to the data reading unit 23 when it has obtained M pieces of recognition result data for one piece of training data. When the integration formula selection unit 24 receives a termination instruction signal, it selects the integration formula that is optimal for the image degradation occurring in the input data included in the training data, based on all the recognition result data obtained up to the time the termination instruction signal was received and the correct answer data corresponding to each piece of recognition result data.
[0131] The processing result storage unit 25 stores, in association with each piece of correct answer data that the data reading unit 23 outputs to the integration formula selection unit 24, the N conversion parameters that the conversion parameter specification unit 11a outputs to the integration formula selection unit 24 and the M pieces of recognition result data that the analysis unit 20 outputs to the integration formula selection unit 24. The conversion integration condition storage unit 10 of the condition selection device 2 stores no data in its initial state; when the selection process by the integration formula selection unit 24 is completed, it stores multiple conversion parameters and one integration formula. The conversion integration condition storage unit 10 that stores these conversion parameters and this integration formula is then used as the conversion integration condition storage unit 10 of the inference device 1 shown in Figure 1.
[0132] (Processing by the condition selection device of the embodiment) Figure 8 is a flowchart showing the processing flow by the condition selection device 2 in one embodiment of the present invention. As a prerequisite for the start of the flowchart shown in Figure 8, multiple training data sets are pre-written to the training data storage unit 22 of the condition selection device 2. Furthermore, the conversion integration condition storage unit 10 and the processing result storage unit 25 are assumed to be initialized and not storing any data.
[0133] The conversion parameter specification unit 11a of the condition selection device 2 takes in N conversion parameters ξ_1 to ξ_N provided from an external source, and writes the N conversion parameters ξ_1 to ξ_N it has taken in to its internal memory area for storage (step Sb1). The integration unit 19a takes in M integration formulas provided from the outside and generates M indices, "integration formula 1", ..., "integration formula M", one for each of the M integration formulas taken in. The integration unit 19a associates each of the M integration formulas with its index and writes them to its internal memory area for storage. The integration unit 19a outputs the M generated indices to the integration formula selection unit 24.
[0134] The integration formula selection unit 24 takes in the M indices "integration formula 1", ..., "integration formula M". The integration formula selection unit 24 generates, in the processing result storage unit 25, a table that has the items "correct answer data" and "conversion parameters" and M items "integration formula 1", ..., "integration formula M", one for each of the M indices taken in (step Sb2).
[0135] When the data reading unit 23 receives a start instruction signal from an external source (step Sb3), the first iteration of the loop processing from Lb1s to Lb1e begins. The data reading unit 23 reads one piece of training data from the training data storage unit 22. After reading the training data, the data reading unit 23 outputs a start instruction signal to the conversion parameter specification unit 11a. The data reading unit 23 outputs the input data contained in the read training data to the conversion unit 13, and the conversion unit 13 takes in that input data. The data reading unit 23 outputs the correct answer data contained in the read training data to the integration formula selection unit 24 (step Sb4).
[0136] The integration formula selection unit 24 takes in the correct answer data output by the data reading unit 23. Hereinafter, the vector y that is the correct answer data taken in by the integration formula selection unit 24 is denoted as vector y_1, vector y_2, …, with the subscript indicating the order in which it was taken in. When the integration formula selection unit 24 takes in the first correct answer data, i.e., vector y_1, it generates one record in the table of the processing result storage unit 25 and writes the vector y_1 into the "correct answer data" item of the generated record.
[0137] When the conversion parameter specification unit 11a acquires a start instruction signal from the data reading unit 23, it reads out the N conversion parameters ξ_1 to ξ_N from its internal storage area and outputs them to the integration formula selection unit 24. The integration formula selection unit 24 takes in the N conversion parameters ξ_1 to ξ_N output by the conversion parameter specification unit 11a and writes them into the "conversion parameters" item of the most recently generated record in the table of the processing result storage unit 25 (i.e., the record in which vector y_1 is written in the "correct answer data" item). The conversion parameter specification unit 11a outputs the value "N", which is the number of conversion parameters ξ_1 to ξ_N stored in its internal storage area, to the integration unit 19a. The integration unit 19a takes in the value "N" output by the conversion parameter specification unit 11a (step Sb5).
[0138] When the conversion parameter specification unit 11a acquires a start instruction signal from the data reading unit 23, it reads out any one of the N conversion parameters ξ_1 to ξ_N stored in its internal storage area and outputs the read conversion parameter ξ_n to the conversion unit 13 and the inverse conversion unit 15. This starts the first iteration of the loop processing from Lb2s to Lb2e. The loop processing from Lb2s to Lb2e is the same as the loop processing from La1s to La1e shown in Figure 6. Note that the processing in step Sb5 and the loop processing from Lb2s to Lb2e both start when the data reading unit 23 outputs the input data and the correct answer data in step Sb4, and are therefore performed in parallel.
[0139] The integration unit 19a refers to the value "N" acquired in step Sb5. During the loop processing from Lb2s to Lb2e, the integration unit 19a takes in the inverse-converted data, i.e., the vectors p_1 to p_N output by the inverse conversion unit 15. When the number of vectors p_n taken in reaches N, the integration unit 19a selects one integration formula from the M integration formulas stored in its internal memory area, each associated with an index "integration formula 1", ..., "integration formula M", together with the index "integration formula m" associated with that formula. Here, m is any integer from 1 to M.
[0140] The integration unit 19a outputs the index of the selected "integration formula m" to the integration formula selection unit 24. Based on the selected integration formula, the integration unit 19a integrates the acquired vectors p_1 to p_N to generate integrated data. The integration unit 19a outputs the generated integrated data to the analysis unit 20 (step Sb9).
[0141] The analysis unit 20 performs recognition processing similar to that in step Sa8 shown in Figure 6 to generate recognition result data. The analysis unit 20 outputs the generated recognition result data to the integration formula selection unit 24 (step Sb10).
[0142] The integration formula selection unit 24 takes in the index "integration formula m" output from the integration unit 19a in step Sb9 and the recognition result data output from the analysis unit 20 in step Sb10. The integration formula selection unit 24 detects the most recently generated record (i.e., the record in which the vector y_1 is written in the "correct answer data" item) in the table of the processing result storage unit 25. The integration formula selection unit 24 writes the recognition result data into the item of the detected record that corresponds to the index "integration formula m". For example, when m = 2, the vector ^y_{1,2}, which is the recognition result data corresponding to the correct answer data vector y_1 and the index "integration formula 2", is written into the "integration formula 2" item (step Sb11).
[0143] The integration unit 19a then selects any one of the integration formulas not yet selected, so that the processes from step Sb9 to Sb11 (i.e., the loop processing from Lb3s to Lb3e) are performed again. When the processes from step Sb9 to Sb11 have been completed for each of the M integration formulas, the loop processing from Lb3s to Lb3e ends. When the integration formula selection unit 24 has written recognition result data into all the items "integration formula 1", …, "integration formula M" of the record corresponding to the vector y_1 in the table of the processing result storage unit 25, it outputs a continuation instruction signal to the data reading unit 23.
[0144] When the data reading unit 23 acquires the continuation instruction signal from the integration formula selection unit 24, it reads out from the training data storage unit 22 any one piece of training data that has not yet been read out as a processing target. When the data reading unit 23 reads out the training data, it outputs a start instruction signal to the conversion parameter specification unit 11a. As a result, steps Sb4 and Sb5, the loop processing from Lb2s to Lb2e, and the loop processing from Lb3s to Lb3e (i.e., the loop processing from Lb1s to Lb1e) are performed for the training data read out by the data reading unit 23.
[0145] When the data reading unit 23 has read out all the training data stored in the training data storage unit 22 as processing targets, it can no longer read out any training data. In that case, it does not output a start instruction signal to the conversion parameter specification unit 11a, but outputs a termination instruction signal to the integration formula selection unit 24.
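The nesting of the three loops (Lb1 over training data, Lb2 over conversion parameters, Lb3 over integration formulas) can be sketched as follows. This is a sequential simplification under stated assumptions: the embodiment runs step Sb5 and loop Lb2 in parallel and exchanges signals between units, whereas here the units are plain callables. The toy stand-ins (scaling as the conversion, identity as the confidence map, thresholding as recognition) and all function names are hypothetical.

```python
def run_condition_selection(training_data, conversion_params, formulas,
                            convert, confidence_map, inverse_convert, recognize):
    """Return one record per training sample, mirroring the table of step Sb2."""
    table = []
    for input_data, correct in training_data:                   # loop Lb1
        record = {"correct answer data": correct,
                  "conversion parameters": list(conversion_params)}
        inverse_data = []
        for xi in conversion_params:                             # loop Lb2
            converted = convert(input_data, xi)
            inverse_data.append(inverse_convert(confidence_map(converted), xi))
        for index, formula in formulas.items():                  # loop Lb3
            record[index] = recognize(formula(inverse_data))
        table.append(record)
    return table

# Toy stand-ins (all hypothetical): data are lists of floats, conversion scales by xi.
convert = lambda x, xi: [v * xi for v in x]
confidence_map = lambda x: x
inverse_convert = lambda p, xi: [v / xi for v in p]
recognize = lambda p: [1 if v >= 0.5 else 0 for v in p]
formulas = {
    "integration formula 1": lambda maps: [sum(col) / len(col) for col in zip(*maps)],
    "integration formula 2": lambda maps: [max(col) for col in zip(*maps)],
}
training_data = [([0.2, 0.7], [0, 1]), ([0.9, 0.1], [1, 0])]
table = run_condition_selection(training_data, [1.0, 0.5], formulas,
                                convert, confidence_map, inverse_convert, recognize)
```

Each dictionary in `table` plays the role of one record: the correct answer data, the conversion parameters used, and one recognition result per integration formula index.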
[0146] When the integration formula selection unit 24 receives a termination instruction signal from the data reading unit 23, it refers to the processing result storage unit 25. Based on all the recognition result data stored in the processing result storage unit 25 and the correct answer data corresponding to each piece of recognition result data, the integration formula selection unit 24 calculates, for example, the degree to which each piece of recognition result data matches its corresponding correct answer data. The integration formula selection unit 24 detects the combination of correct answer data and recognition result data with the largest calculated degree of match. Based on the detected recognition result data, the integration formula selection unit 24 detects the conversion parameters and the integration formula to be written to the conversion integration condition storage unit 10, and writes them to the conversion integration condition storage unit 10 for storage (step Sb12).
[0147] For example, suppose that the integration formula selection unit 24 detects, as the combination of correct answer data and recognition result data with the greatest degree of match, the correct answer data vector y_2 and the recognition result data vector ^y_{2,2}. In this case, the integration formula selection unit 24 writes to the conversion integration condition storage unit 10 the conversion parameters ξ_1 to ξ_N written in the "conversion parameters" item of the record containing the vector y_2, together with the integration formula corresponding to the index "integration formula 2" of the vector ^y_{2,2}. Once the integration formula selection unit 24 has finished writing to the conversion integration condition storage unit 10, the process shown in the flowchart of Figure 8 ends.
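The selection in step Sb12 can be sketched as a search over every (record, formula index) pair for the largest degree of match. The text does not fix the match metric, so the fraction of agreeing labels used in `degree_of_match` is an assumption, as are the function names and the toy table.

```python
def degree_of_match(recognized, correct):
    """Fraction of positions where the labels agree (an assumed metric)."""
    hits = sum(1 for r, c in zip(recognized, correct) if r == c)
    return hits / len(correct)

def select_condition(table, formula_indices):
    """Return (conversion parameters, formula index) of the best-matching entry."""
    best = None
    for record in table:
        for index in formula_indices:
            score = degree_of_match(record[index], record["correct answer data"])
            if best is None or score > best[0]:
                best = (score, record["conversion parameters"], index)
    return best[1], best[2]

table = [
    {"correct answer data": [0, 1, 1], "conversion parameters": [1.0, 0.5],
     "integration formula 1": [0, 0, 0], "integration formula 2": [0, 1, 0]},
    {"correct answer data": [1, 1, 0], "conversion parameters": [1.0, 0.5],
     "integration formula 1": [1, 0, 0], "integration formula 2": [1, 1, 0]},
]
params, chosen = select_condition(
    table, ["integration formula 1", "integration formula 2"])
```

With this toy table the best match is the second record under "integration formula 2", which mirrors the y_2 and ^y_{2,2} example in the text.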
[0148] The combination of the conversion parameters ξ_1 to ξ_N and the integration formula corresponding to the selected index, as written to the conversion integration condition storage unit 10, is the optimal combination for dealing with the image degradation occurring in the input data provided to the inference device 1. Therefore, by using this conversion integration condition storage unit 10 as the conversion integration condition storage unit 10 of the inference device 1, highly accurate recognition results can be obtained in the inference device 1.
[0149] Furthermore, the combination of multiple conversion parameters and one integration formula selected by the condition selection device 2 is optimized for multiple input data containing common image degradation. Therefore, when input data containing image degradation different from that of the input data stored in the training data storage unit 22 of the condition selection device 2 is provided to the inference device 1, the condition selection device 2 must be used again to select a new combination of multiple conversion parameters and one integration formula.
[0150] (Other configuration examples of the condition selection device of the embodiment) In the condition selection device 2 of the above embodiment, the conversion parameter specification unit 11a takes in N conversion parameters ξ_1 to ξ_N from an external source and performs the loop processing from Lb2s to Lb2e using all of them. Alternatively, the conversion parameter specification unit 11a may change the combination of conversion parameters specified to the conversion unit 13 and the inverse conversion unit 15 for each piece of training data. For example, the conversion parameter specification unit 11a takes in more than N conversion parameters given from an external source. Each time the conversion parameter specification unit 11a receives a start instruction signal from the data reading unit 23, it randomly selects N conversion parameters from those it has taken in, always including the conversion parameter ξ_1, which performs no image conversion. The loop processing from Lb2s to Lb2e may then be performed based on each of the N conversion parameters randomly selected by the conversion parameter specification unit 11a.
[0151] In this case, if the number of conversion parameters provided externally to the conversion parameter specification unit 11a is L, the value of N may be set, for example, to about 10% of L. The conversion parameter specification unit 11a need not fix the number of randomly selected parameters to N, and may arbitrarily change the number of conversion parameters selected each time a selection is made.
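A minimal sketch of this random selection, assuming the conversion parameters are reduction ratios with 1.0 standing for the parameter ξ_1 that performs no image conversion. The function name `sample_conversion_params`, the `ratio` argument, and the candidate values are hypothetical.

```python
import random

def sample_conversion_params(all_params, identity_param, ratio=0.1, rng=None):
    """Pick about ratio * L parameters at random, always including the identity one."""
    rng = rng or random.Random()
    n = max(1, round(len(all_params) * ratio))       # N is about 10% of L by default
    others = [p for p in all_params if p != identity_param]
    return [identity_param] + rng.sample(others, n - 1)

# L = 20 hypothetical candidates: the identity ratio 1.0 plus 19 reduction ratios.
all_params = [1.0] + [round(0.05 * k, 2) for k in range(1, 20)]
chosen = sample_conversion_params(all_params, identity_param=1.0,
                                  rng=random.Random(0))
```

Calling the function once per start instruction signal gives a different combination for each piece of training data while keeping ξ_1 in every combination.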
[0152] As described above, in step Sb5 the conversion parameter specification unit 11a outputs to the integration unit 19a the value "N", which is the number of conversion parameters ξ_1 to ξ_N stored in its internal memory area. In contrast, when the conversion parameter specification unit 11a changes the combination of conversion parameters to be specified, it outputs the number of selected conversion parameters to the integration unit 19a each time it receives a start instruction signal.
[0153] As described above, if the combination of conversion parameters changes for each piece of training data, the conversion parameters stored in the "conversion parameters" item of the table in the processing result storage unit 25 differ from record to record. This makes it possible to select a combination of conversion parameters and an integration formula that is optimal for multiple input data containing common image degradation and that uses fewer conversion parameters than the number provided externally. The processing load on the inference device 1 can therefore be reduced.
[0154] Although M, the number of integration formulas provided externally to the integration unit 19a of the condition selection device 2, is assumed to be an integer of 2 or more, if the combination of transformation parameters changes for each training data set, M may be set to 1 (i.e., the number of integration formulas in the integration unit 19a may be set to 1). In the case of M=1, the selection of integration formulas will not be performed, but the condition selection device 2 will select the optimal transformation parameters for multiple input data sets that contain common image degradation.
[0155] In the inference device 1 of the above embodiment, the conversion parameter specification unit 11 specifies the conversion parameters. The conversion unit 13 performs a conversion on the input data based on each of the specified conversion parameters to generate converted data. The confidence map generation unit 14 generates a confidence map for each of the converted data, which is data that shows the characteristics of each of the converted data. The inverse conversion unit 15 performs an inverse conversion on each of the confidence maps, based on each of the specified conversion parameters, to generate inverse converted data. The integration unit 19 performs an integration process to integrate each of the inverse converted data, generating integrated data whose number of dimensions matches the confidence map of the input data. The analysis unit 20 performs recognition processing on the integrated data as an example of analysis processing.
[0156] In this way, by generating a confidence map of the transformed data generated from the input data through transformation, integrating the generated confidence map by inverse transformation, and performing recognition processing on the integrated data, it becomes possible to perform inference processing that is robust to unknown degradation occurring in the input data.
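The flow summarized above (conversion, confidence map generation, inverse conversion, integration, recognition) can be sketched on a 1-D signal. Everything concrete here is an assumption for illustration: block averaging as the conversion, value repetition as the inverse conversion, the mean as the integration formula, and a threshold as the recognition process. The sketch also assumes the signal length is divisible by every factor.

```python
def convert(x, factor):
    """Downsample a 1-D signal by averaging blocks of `factor` samples."""
    return [sum(x[i:i + factor]) / factor for i in range(0, len(x), factor)]

def inverse_convert(p, factor):
    """Upsample back to the original length by repeating each value."""
    return [v for v in p for _ in range(factor)]

def infer(x, factors, confidence_map, recognize):
    """Convert, map, inverse-convert for each factor, then integrate and recognize."""
    inverse_data = [inverse_convert(confidence_map(convert(x, f)), f)
                    for f in factors]
    integrated = [sum(col) / len(col) for col in zip(*inverse_data)]
    return integrated, recognize(integrated)

x = [0.9, 0.8, 0.1, 0.2]
integrated, labels = infer(
    x, factors=[1, 2],
    confidence_map=lambda v: v,                              # stand-in for a trained network
    recognize=lambda p: [1 if v >= 0.5 else 0 for v in p])   # stand-in for g(.)
```

The integrated data has the same length as the input, which corresponds to the requirement that its number of dimensions match the confidence map of the input data.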
[0157] In other words, the inference device 1 according to the embodiment has a configuration that reduces image degradation by performing multiple image transformations on the input data before performing recognition processing, then extracts features, and integrates the confidence map containing the extracted features by inverse transformation. To put it another way, the method employed in the inference device 1 is a combination of existing methods, including an image transformation by the transformation unit 13, an inverse transformation of the image transformation by the inverse transformation unit 15, an ensemble by the integration unit 19, and a semantic segmentation algorithm that includes downsampling and upsampling by the confidence map generation unit 14 and the analysis unit 20.
[0158] Although this method combines existing techniques, it is an effective method that can mitigate the effects of unknown image degradation. This method also allows the use of existing semantic segmentation algorithms without retraining them. Therefore, inference device 1 enables robust inference processing against unknown degradation occurring in the input data without requiring retraining such as fine-tuning of the trained semantic segmentation neural network.
[0159] According to the embodiment described above, the inference device comprises a conversion parameter specification unit, a conversion unit, a confidence map generation unit, an inverse conversion unit, a difference calculation unit, a difference correction unit, a confidence map correction unit, an integration unit, and an analysis unit. For example, the inference device is the inference device 1 in the embodiment, the conversion parameter specification unit is the conversion parameter specification unit 11 in the embodiment, the conversion unit is the conversion unit 13 in the embodiment, the confidence map generation unit is the confidence map generation unit 14 in the embodiment, the inverse conversion unit is the inverse conversion unit 15 in the embodiment, the difference calculation unit is the difference calculation unit 16 in the embodiment, the difference correction unit is the difference correction unit 17 in the embodiment, the confidence map correction unit is the confidence map correction unit 18 in the embodiment, the integration unit is the integration unit 19 in the embodiment, and the analysis unit is the analysis unit 20 in the embodiment.
[0160] The conversion parameter specification unit specifies one or more conversion parameters. The conversion unit generates converted data by performing a conversion on the input data based on each of the specified conversion parameters. For example, the conversion parameters are each of ξ_1, …, ξ_n, …, ξ_N in the embodiment, the input data is the vector x in the embodiment, the conversion is the operation of the function D(·; ξ_n) in the embodiment, and the converted data are each of the vectors ~x_1, …, ~x_n, …, ~x_N in the embodiment. The confidence map generation unit generates, for each piece of converted data, a confidence map, which is data showing the characteristics of that converted data. For example, the confidence maps are the vectors ~p_1, …, ~p_n, …, ~p_N in the embodiment. The inverse conversion unit generates inverse-converted data by performing, on each of the confidence maps and based on the specified conversion parameters, an inverse conversion of the conversion performed when the converted data was generated. For example, the inverse conversion is the operation of the function U(·; ξ_n) in the embodiment, and the inverse-converted data is the vector p_n in the embodiment. The difference calculation unit selects a reference inverse-converted confidence map from the inverse-converted data and generates a difference confidence map by calculating the difference between the reference inverse-converted confidence map and each of the inverse-converted confidence maps. For example, the difference confidence map is S_i in the embodiment. The difference correction unit generates a corrected difference confidence map by correcting the value of each region of the difference confidence map with reference to the values of the surrounding regions. For example, the corrected difference confidence map is S_i in the embodiment. The confidence map correction unit generates a corrected confidence map for each region of the corrected difference confidence map based on the corrected difference confidence map, the reference inverse-converted confidence map, and each of the inverse-converted confidence maps. For example, the corrected confidence map is p_j in the embodiment. The integration unit generates integrated data whose number of dimensions matches that of the confidence map of the input data by performing an integration process that integrates each piece of the inverse-converted data. For example, the integration process is the operation ^p = [p_1, …, p_n, …, p_N], and the integrated data is the vector ^p in the embodiment. The analysis unit performs an analysis process on the integrated data. For example, the analysis process is the recognition process that applies the function g(·) to the vector ^p in the embodiment.
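The difference calculation and difference correction described above can be sketched on 1-D maps. The absolute difference and the mean filter over neighboring positions are assumptions chosen for illustration; the text only states that a difference is calculated and that each region is corrected with reference to the values of surrounding regions. The function names are hypothetical.

```python
def difference_map(p_ref, p_n):
    """Per-position absolute difference between the reference map and another map."""
    return [abs(a - b) for a, b in zip(p_ref, p_n)]

def correct_difference(s, radius=1):
    """Replace each value by the mean of itself and its neighbours within `radius`."""
    corrected = []
    for k in range(len(s)):
        window = s[max(0, k - radius):k + radius + 1]
        corrected.append(sum(window) / len(window))
    return corrected

p_ref = [0.9, 0.9, 0.9, 0.9]        # reference inverse-converted confidence map
p_n = [0.9, 0.1, 0.9, 0.9]          # disagrees with the reference at one position
s = difference_map(p_ref, p_n)
s_corrected = correct_difference(s)
```

In this sketch the isolated peak in `s` is spread over its neighbors, so a single disagreeing position influences the correction of the surrounding region as well.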
[0161] Furthermore, the analysis unit may perform recognition processing as part of its analysis process, and the integration unit may perform integration processing using one optimal integration formula selected from a plurality of integration formulas based on the recognition result obtained by the analysis unit's recognition processing on the integrated data generated from input data to which the correct answer data is attached, and the correct answer data corresponding to the integrated data.
[0162] The above integration formula may also be a formula that calculates the feature value of the integrated data at the corresponding position based on the feature value of the corresponding position in each of the inverse transformed data to be integrated.
[0163] The above integration formula may also be a formula that calculates the feature value of the integrated data at the corresponding position in each of the inverse transformed data to be integrated, based on the feature values of the corresponding position and the neighboring positions of that position.
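The two kinds of integration formula in [0162] and [0163] differ only in which positions feed each output value. The sketch below contrasts them on 1-D maps, using the mean in both cases; the choice of the mean, the `radius` parameter, and the function names are assumptions.

```python
def integrate_pointwise(maps):
    """Feature at position k depends only on position k of each map (here: the mean)."""
    return [sum(col) / len(col) for col in zip(*maps)]

def integrate_neighborhood(maps, radius=1):
    """Feature at position k also uses positions k - radius .. k + radius of each map."""
    length = len(maps[0])
    out = []
    for k in range(length):
        lo, hi = max(0, k - radius), min(length, k + radius + 1)
        window = [m[j] for m in maps for j in range(lo, hi)]
        out.append(sum(window) / len(window))
    return out

maps = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]   # N = 2 inverse-converted maps
pointwise = integrate_pointwise(maps)
smoothed = integrate_neighborhood(maps)
```

The position-wise formula leaves the central peak intact, while the neighborhood formula spreads it to adjacent positions.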
[0164] Furthermore, the above-mentioned inference device may further include a quality evaluation unit for evaluating the quality of the input data, and if the integration unit includes an operation that uses weights applied to each of the inverse-transformed data in the integration process, the integration unit may determine the weight values of the input data corresponding to the inverse-transformed data based on quality data obtained by the quality evaluation unit.
[0165] Furthermore, the above-mentioned analysis unit may perform recognition processing as part of its analysis process, and the conversion parameter specification unit may specify the selected conversion parameters based on the recognition results obtained by the analysis unit's recognition processing performed on each of the inversely transformed data obtained from the input data to which the correct answer data is attached, and the correct answer data corresponding to the inversely transformed data.
[0166] Furthermore, the above-described processing may be performed by recording a program for realizing the functions of the inference device 1 and the condition selection device 2 in the embodiment on a computer-readable recording medium, loading the program recorded on this recording medium into a computer system, and executing it. Here, "computer system" includes an OS and hardware such as peripheral devices. "Computer system" also includes a WWW system equipped with a homepage providing environment (or display environment). "Computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. "Computer-readable recording medium" also includes media that hold the program for a certain period of time, such as volatile memory (RAM) inside a computer system that acts as a server or client when a program is transmitted via a network such as the Internet or a communication line such as a telephone line.
[0167] Furthermore, the above program may be transmitted from a computer system that stores the program in a storage device or the like to another computer system via a transmission medium or by transmission waves in the transmission medium. Here, the "transmission medium" for transmitting the program refers to a medium that has the function of transmitting information, such as a network (communication network) such as the Internet or a communication line such as a telephone line. The above program may also be one that realizes only a part of the functions described above. Furthermore, it may be a so-called difference file (difference program) that realizes the above functions in combination with a program already recorded in the computer system.
[0168] Although one embodiment of this invention has been described in detail above with reference to the drawings, the specific configuration is not limited to this embodiment and includes designs and the like that do not depart from the spirit of this invention.
[Explanation of Symbols]
[0169] 1...Inference device, 2...Condition selection device, 10...Conversion integration condition storage unit, 11,11a...Conversion parameter specification unit, 12...Data acquisition unit, 13...Conversion unit, 14...Confidence map generation unit, 15...Inverse conversion unit, 16...Difference calculation unit, 17...Difference correction unit, 18...Confidence map correction unit, 19,19a...Integration unit, 20...Analysis unit, 21...Output unit, 22...Training data storage unit, 23...Data reading unit, 24...Integration formula selection unit, 25...Processing result storage unit
Claims
1. An inference method comprising:
a conversion parameter specification step of specifying one or more conversion parameters;
a conversion step of generating converted data by performing a conversion on input data based on each of the specified conversion parameters;
a confidence map generation step of generating, for each piece of the converted data, a confidence map that is data showing the characteristics of that converted data;
an inverse conversion step of generating inverse-converted data by performing, on each of the confidence maps and based on the specified conversion parameters, an inverse conversion of the conversion performed when the converted data was generated;
a difference calculation step of selecting a reference inverse-converted confidence map from the inverse-converted data and generating a difference confidence map by calculating the difference between the reference inverse-converted confidence map and each of the inverse-converted confidence maps;
a difference correction step of generating a corrected difference confidence map by correcting the value of each region of the difference confidence map with reference to the values of surrounding regions;
a confidence map correction step of generating a corrected confidence map for each region of the corrected difference confidence map based on the corrected difference confidence map, the reference inverse-converted confidence map, and each of the inverse-converted confidence maps;
an integration step of generating integrated data whose number of dimensions matches that of the confidence map of the input data by performing an integration process that integrates each piece of the inverse-converted data; and
an analysis step of performing an analysis process on the integrated data.
2. The inference method according to claim 1, wherein, in the analysis step, recognition processing is performed as the analysis processing, and, in the integration step, the integration processing is performed using one optimal integration formula selected from a plurality of integration formulas on the basis of (i) a recognition result obtained by performing the recognition processing of the analysis step on integrated data generated from input data to which correct answer data is attached, and (ii) the correct answer data corresponding to that integrated data.
3. The inference method according to claim 2, wherein the integration formula is a formula that calculates the feature value of the integrated data at a given position on the basis of the feature value at the corresponding position in each piece of the inverse-converted data to be integrated.
4. The inference method according to claim 2, wherein the integration formula is a formula that calculates the feature value of the integrated data at a given position on the basis of the feature value at the corresponding position in each piece of the inverse-converted data to be integrated and the feature values at positions near that position.
5. The inference method according to any one of claims 1 to 4, wherein, in the analysis step, recognition processing is performed as the analysis processing, and, in the conversion parameter specification step, the conversion parameters that are specified are those selected on the basis of (i) recognition results obtained by performing the recognition processing of the analysis step on each piece of inverse-converted data obtained from input data to which correct answer data is attached, and (ii) the correct answer data corresponding to that inverse-converted data.
6. An inference device comprising:
a conversion parameter specification unit that specifies one or more conversion parameters;
a conversion unit that generates converted data by performing a conversion on input data based on each of the specified conversion parameters;
a confidence map generation unit that generates, for each piece of the converted data, a confidence map, which is data indicating the characteristics of that converted data;
an inverse conversion unit that generates inverse-converted data by performing, on each of the confidence maps and based on the specified conversion parameters, the inverse of the conversion performed when the corresponding converted data was generated;
a difference calculation unit that selects a reference inverse-converted confidence map from among the inverse-converted data and generates a difference confidence map by calculating the difference between the reference inverse-converted confidence map and each of the inverse-converted confidence maps;
a difference correction unit that generates a corrected difference confidence map by correcting the value of each region in the difference confidence map with reference to the values of surrounding regions;
a confidence map correction unit that generates, for each region of the corrected difference confidence map, a corrected confidence map based on the corrected difference confidence map, the reference inverse-converted confidence map, and each of the inverse-converted confidence maps;
an integration unit that generates integrated data, whose dimensionality matches that of the confidence map of the input data, by performing integration processing that integrates the inverse-converted data; and
an analysis unit that performs analysis processing on the integrated data.
7. A program for causing a computer to execute:
a conversion parameter specification step of specifying one or more conversion parameters;
a conversion step of generating converted data by performing a conversion on input data based on each of the specified conversion parameters;
a confidence map generation step of generating, for each piece of the converted data, a confidence map, which is data indicating the characteristics of that converted data;
an inverse conversion step of generating inverse-converted data by performing, on each of the confidence maps and based on the specified conversion parameters, the inverse of the conversion performed when the corresponding converted data was generated;
a difference calculation step of selecting a reference inverse-converted confidence map from among the inverse-converted data and generating a difference confidence map by calculating the difference between the reference inverse-converted confidence map and each of the inverse-converted confidence maps;
a difference correction step of generating a corrected difference confidence map by correcting the value of each region in the difference confidence map with reference to the values of surrounding regions;
a confidence map correction step of generating, for each region of the corrected difference confidence map, a corrected confidence map based on the corrected difference confidence map, the reference inverse-converted confidence map, and each of the inverse-converted confidence maps;
an integration step of generating integrated data, whose dimensionality matches that of the confidence map of the input data, by performing integration processing that integrates the inverse-converted data; and
an analysis step of performing analysis processing on the integrated data.
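
As a non-limiting illustration, the sequence of steps recited in claims 1, 6, and 7 might be sketched in Python as follows. Every concrete choice here is an assumption made only for this sketch and is not taken from the claims: the horizontal flip used as the conversion, the hypothetical `toy_confidence_map` standing in for a trained recognition model, the 3x3 mean used as the difference correction, the threshold `tau` used in the confidence map correction, the averaging used as the integration, and the 0.5 threshold used as the analysis.

```python
# Hedged sketch of the claimed flow; all concrete operations are illustrative assumptions.

def hflip(grid):
    """Horizontal flip of a 2D list; applying it twice restores the input."""
    return [row[::-1] for row in grid]

def convert(image, flip):
    """Conversion step: the conversion parameter is a single boolean (assumption)."""
    return hflip(image) if flip else [row[:] for row in image]

def inverse_convert(conf_map, flip):
    """Inverse conversion step: undo the flip that produced the converted data."""
    return hflip(conf_map) if flip else [row[:] for row in conf_map]

def toy_confidence_map(image):
    """Hypothetical stand-in for a model: blend each pixel with its left neighbor."""
    return [[0.5 * v + 0.5 * row[max(x - 1, 0)] for x, v in enumerate(row)]
            for row in image]

def difference(ref, other):
    """Difference calculation step: per-position difference from the reference map."""
    return [[o - r for r, o in zip(rr, ro)] for rr, ro in zip(ref, other)]

def smooth3x3(grid):
    """Difference correction step: mean over the 3x3 neighborhood, clipped at borders."""
    h, w = len(grid), len(grid[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [grid[j][i]
                    for j in range(max(y - 1, 0), min(y + 2, h))
                    for i in range(max(x - 1, 0), min(x + 2, w))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out

def correct_map(ref, other, corrected_diff, tau):
    """Confidence map correction step: keep `other` where the corrected
    difference is small, otherwise fall back to the reference value."""
    return [[o if abs(d) <= tau else r for r, o, d in zip(rr, ro, rd)]
            for rr, ro, rd in zip(ref, other, corrected_diff)]

def integrate_mean(maps):
    """Integration step: position-wise mean, same size as one confidence map."""
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[y][x] for m in maps) / len(maps) for x in range(w)]
            for y in range(h)]

def infer(image, flips=(False, True), tau=0.2):
    inverted = [inverse_convert(toy_confidence_map(convert(image, f)), f)
                for f in flips]
    ref = inverted[0]  # reference inverse-converted confidence map
    corrected = [correct_map(ref, m, smooth3x3(difference(ref, m)), tau)
                 for m in inverted]
    integrated = integrate_mean(corrected)
    labels = [[1 if v >= 0.5 else 0 for v in row] for row in integrated]  # analysis
    return integrated, labels
```

In this sketch a larger `tau` lets more of each non-reference map pass through to the integration, while a smaller `tau` pulls the result toward the reference map wherever the maps disagree over a neighborhood.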
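
The selection of an integration formula described in claims 2 to 4 might likewise be sketched as follows. This is only a sketch under stated assumptions: the three candidate formulas, the 0.5 recognition threshold, and the use of pixel accuracy against the correct answer data as the selection criterion are all hypothetical choices, and the names `FORMULAS` and `select_formula` are introduced here purely for illustration.

```python
# Hedged sketch for claims 2 to 4; candidate formulas and the criterion are assumptions.

def pointwise_mean(maps):
    """Claim 3 style: uses only the corresponding position in each map."""
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[y][x] for m in maps) / len(maps) for x in range(w)]
            for y in range(h)]

def pointwise_max(maps):
    """Claim 3 style: position-wise maximum over the maps."""
    h, w = len(maps[0]), len(maps[0][0])
    return [[max(m[y][x] for m in maps) for x in range(w)] for y in range(h)]

def neighborhood_mean(maps):
    """Claim 4 style: also uses positions near the target position (3x3 window)."""
    h, w = len(maps[0]), len(maps[0][0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [m[j][i] for m in maps
                    for j in range(max(y - 1, 0), min(y + 2, h))
                    for i in range(max(x - 1, 0), min(x + 2, w))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out

FORMULAS = {
    "pointwise_mean": pointwise_mean,
    "pointwise_max": pointwise_max,
    "neighborhood_mean": neighborhood_mean,
}

def recognize(conf, threshold=0.5):
    """Recognition processing: threshold the integrated confidence map."""
    return [[1 if v >= threshold else 0 for v in row] for row in conf]

def pixel_accuracy(pred, truth):
    """Fraction of positions where the recognition result matches the correct answer."""
    pairs = [(p, t) for rp, rt in zip(pred, truth) for p, t in zip(rp, rt)]
    return sum(p == t for p, t in pairs) / len(pairs)

def select_formula(labeled_samples, formulas=FORMULAS):
    """Return the name of the formula with the best mean accuracy.

    `labeled_samples` is a list of (maps, truth) pairs, where `maps` are the
    inverse-converted confidence maps of one labeled input. Ties go to the
    formula listed first.
    """
    def mean_accuracy(name):
        scores = [pixel_accuracy(recognize(formulas[name](maps)), truth)
                  for maps, truth in labeled_samples]
        return sum(scores) / len(scores)
    return max(formulas, key=mean_accuracy)
```

For example, with a single labeled sample `([[[0.9, 0.6]], [[0.3, 0.2]]], [[1, 0]])`, only the position-wise mean recovers both labels, so `select_formula` returns `"pointwise_mean"`.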