Feature map coding device, feature map coding method, feature map decoding device, feature map decoding method

The feature map encoding and decoding system efficiently converts and decodes multi-scale feature maps into single-scale maps for effective transmission and storage, addressing the challenge of large data sizes in neural networks.

JP2026104065A · Pending · Publication Date: 2026-06-25 · JVC KENWOOD CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
JVC KENWOOD CORP
Filing Date
2024-12-13
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

The enormous amount of information in feature maps makes them unsuitable for efficient transmission and storage in existing neural network systems.

Method used

A feature map encoding and decoding system that includes a feature map reduction model and restoration model storage, selection, and conversion units to convert multi-scale feature maps into single-scale maps, followed by packing and encoding, and inverse processes for decoding.

Benefits of technology

Enables efficient encoding and decoding of feature maps with minimal processing load, facilitating effective transmission and storage.



Abstract

[Problem] To provide a technology for encoding and decoding feature maps. [Solution] The present invention provides a feature map encoding device comprising: a feature map reduction model registration unit that registers a feature map reduction model and a corresponding feature map restoration model at the position indicated by a first model index in a feature map reduction model storage unit; a feature map reduction model selection unit that determines a second model index; a feature map reduction unit that generates a single-scale feature map using the feature map reduction model identified by the second model index; and a feature map restoration parameter encoding unit that transmits information indicating the first model index, the second model index, and the feature map restoration model identified by the second model index.

Description

Technical Field

[0001] The present invention relates to the encoding and decoding of feature maps in a neural network.

Background Art

[0002] As a neural network technology used for image recognition such as detection of objects of various scales in an image, segmentation of regions for each object, or tracking of objects, FPN (Feature Pyramid Network) of Non-Patent Document 1 is known. In FPN, a plurality of feature maps of various scales are generated from the image to be processed, and various image recognitions are performed using the feature maps.

[0003] The FPN used for image recognition generates a plurality of feature maps from an image, and its structure is based on a CNN (Convolutional Neural Network). A CNN can be divided into a feature amount extraction unit (backbone), which reads an image and generates feature maps through convolution and pooling, and an identification unit (head), which is composed of hierarchical fully connected layers and generates an output suited to a task such as object detection, instance segmentation, or object tracking. The FPN uses the CNN backbone.

[0004] The feature amount extraction unit of the FPN is typically configured with the convolution processing unit 301, the activation processing unit 302, and the pooling processing unit 303 shown in Figure 3 as one basic unit, and has a hierarchical structure that repeats this basic unit.

[0005] Figure 4 shows the structure of the FPN. The FPN consists of a bottom-up processing unit 322, which generates a multi-scale feature map composed of multiple hierarchical layers using a CNN backbone, and a top-down processing unit 324, which aggregates features from deeper-layer feature maps into shallower-layer feature maps using the inverse configuration of the CNN backbone. The bottom-up processing unit 322 repeatedly applies the basic unit of Figure 3 (the convolution processing unit 301, the activation processing unit 302, and the pooling processing unit 303), halving the resolution of the feature map each time to generate a pyramid of multi-layer feature maps. The top-down processing unit 324, in turn, adds the feature maps of the corresponding resolutions from the bottom-up processing unit 322 while expanding the resolution of the feature map up to that of the input image, generating a pyramid of feature maps. In other words, the FPN generates multiple feature maps for each layer from the image 326 targeted for feature extraction processing.

[0006] The convolution processing unit 301 performs convolution on the data to be processed (image or feature map) using multiple predetermined filters (kernels). In the convolution processing in the convolution processing unit 301, predetermined filtering is performed on the entire data to be processed while sliding at predetermined intervals. At this time, the sliding interval is called the stride. The convolution processing unit 301 may determine the stride based on the number of data to be processed. For example, the convolution processing unit 301 may determine the stride to be 1 if the number of data to be processed is less than a predetermined value, and determine the stride to be 2 if it is greater than or equal to the predetermined value. Multiple feature maps are generated by preparing multiple predetermined filters at each layer and generating one feature map for each filter. The unit of a feature map is called a channel. If the number (types) of predetermined filters is N (N types), then N (N channels) feature maps are generated.
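The relationship between filter size, stride, number of filters, and output size described above can be sketched as follows. This is a minimal illustration rather than the implementation of the convolution processing unit 301: the function names, the `padding` parameter, and the `threshold` argument of `choose_stride` are assumptions introduced here, and the output-size formula is the standard one for a strided convolution.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Output length along one axis for a standard strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def choose_stride(num_elements, threshold):
    """Stride rule from the text: 1 below the threshold, 2 at or above it.

    The threshold value is a hypothetical parameter; the text only says
    that it is predetermined.
    """
    return 1 if num_elements < threshold else 2


def conv_output_shape(width, height, num_filters, kernel, stride, padding=0):
    """N filters yield N channels; returns (channels, height, width)."""
    return (
        num_filters,
        conv_output_size(height, kernel, stride, padding),
        conv_output_size(width, kernel, stride, padding),
    )


shape = conv_output_shape(width=64, height=32, num_filters=256,
                          kernel=3, stride=2, padding=1)
print(shape)  # (256, 16, 32)
```

With a stride of 2 and a padding of 1, a 3x3 filter halves the width and height, which is the halving of resolution per layer mentioned for the bottom-up path.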

[0007] The activation processing unit 302 performs an activation process that non-linearly transforms the feature map output from the convolution processing unit 301. Here, the function used for the activation process is called the activation function. The activation processing unit 302 uses the ReLU (Rectified Linear Unit) function or the sigmoid function, etc., as the activation function.
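As a small illustration of the two activation functions named above, the sketch below applies each one element-wise to a single-channel feature map held as a list of rows. The helper name `activate` and the list-of-rows representation are assumptions made for this example.

```python
import math


def relu(x):
    """Rectified Linear Unit: negative values become 0."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic sigmoid: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def activate(feature_map, fn):
    """Apply fn element-wise to a 2-D feature map (list of rows)."""
    return [[fn(v) for v in row] for row in feature_map]


print(activate([[-1.0, 2.0], [0.5, -3.0]], relu))  # [[0.0, 2.0], [0.5, 0.0]]
```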

[0008] The pooling processing unit 303 performs pooling, which downsamples the feature map output from the activation processing unit 302 by replacing local values of the feature map with representative values.

[0009] When classification is performed using a neural network, the task can be executed using the multi-channel feature maps of each layer.
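The following sketch shows one common form of pooling. It assumes that the representative value is the maximum of each non-overlapping 2x2 block (max pooling); the text does not say which representative value the pooling processing unit 303 uses, and the function name is hypothetical.

```python
def max_pool_2x2(feature_map):
    """Downsample a 2-D feature map by taking the maximum of each 2x2 block.

    Odd trailing rows or columns are dropped (an assumption for brevity).
    """
    height = len(feature_map) // 2 * 2
    width = len(feature_map[0]) // 2 * 2
    pooled = []
    for r in range(0, height, 2):
        row = []
        for c in range(0, width, 2):
            row.append(max(feature_map[r][c], feature_map[r][c + 1],
                           feature_map[r + 1][c], feature_map[r + 1][c + 1]))
        pooled.append(row)
    return pooled


fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(max_pool_2x2(fmap))  # [[6, 8], [14, 16]]
```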

[0010] In image recognition, the multi-channel feature maps of each hierarchical level are subjected to a convolution process at predetermined size intervals based on the scale of the feature maps, and the probability of the object's class is calculated for each pixel.

Prior Art Documents

Non-Patent Literature

[0011] [Non-Patent Document 1] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature Pyramid Networks for Object Detection. In CVPR, 2017.

Summary of the Invention

Problems to Be Solved by the Invention

[0012] The amount of information in feature maps is enormous, making them unsuitable for transmission and storage. In view of the above problems, the present invention aims to provide a technology for encoding and decoding feature maps.

Means for Solving the Problem

[0013] To solve the above problems, the feature map encoding device of the present invention includes: a feature map reduction model storage unit that stores a plurality of feature map reduction models and feature map restoration models; a feature map reduction model registration unit that determines a first model index indicating a position to register a feature map reduction model in the feature map reduction model storage unit, and registers the feature map reduction model and a feature map restoration model corresponding to the feature map reduction model at the position indicated by the first model index; a feature map reduction model selection unit that determines a second model index from the feature map reduction model storage unit to identify a feature map reduction model to be used for reducing feature maps; a feature map reduction unit that uses the feature map reduction model identified by the second model index to convert a multi-scale feature map and generate a single-scale feature map; a packing unit that packs the single-scale feature map into a frame and generates a decimal-type packed feature frame; a feature map restoration parameter encoding unit that transmits information indicating the first model index, the second model index, and the feature map restoration model identified by the second model index; and a feature map internal encoding unit that encodes the integer-type packed feature frame.

[0014] The feature map decoding device of the present invention includes: a feature map restoration model storage unit that stores a plurality of feature map restoration models; a feature map restoration parameter decoding unit that decodes a feature map restoration model to be stored in the feature map restoration model storage unit, a first model index that specifies the location where the feature map restoration model is to be stored, and a second model index that identifies the feature map restoration model to be used to restore the feature map; a feature map restoration model registration unit that registers the feature map restoration model at the location indicated by the first model index in the feature map restoration model storage unit; a feature map restoration selection unit that selects, from the feature map restoration model storage unit and based on the second model index, the feature map restoration model used to restore the feature map; a feature map internal decoding unit that decodes the encoded single-scale feature map packed into a frame and generates an integer-type packed feature frame; an inverse quantization unit that converts the elements of the integer-type packed feature frame into decimal values to generate a decimal-type packed feature frame; an unpacking unit that unpacks the decimal-type packed feature frame to generate the single-scale feature map; and a feature map restoration unit that transforms the single-scale feature map, using the feature map restoration model selected by the feature map restoration selection unit, to generate the multi-scale feature map.

Effects of the Invention

[0015] According to the present invention, feature maps can be encoded and decoded efficiently with minimal processing load.

Brief Description of the Drawings

[0016]
[Figure 1] A block diagram illustrating the configuration of the feature map encoding device 100.
[Figure 2] A block diagram illustrating the configuration of the feature map decoding device 200.
[Figure 3] A block diagram illustrating the basic unit of processing in each layer of the FPN.
[Figure 4] A diagram illustrating the structure of the FPN.
[Figure 5] A block diagram illustrating the detailed configuration of a feature map reduction model.
[Figure 6] A block diagram illustrating the detailed configuration of a feature map restoration model.
[Figure 7] A block diagram illustrating the detailed configuration of the feature map conversion unit 103.
[Figure 8] A block diagram illustrating the detailed configuration of the feature map inverse conversion unit 202.
[Figure 9] A block diagram illustrating the detailed configuration of the feature map internal encoding unit 104.
[Figure 10] A block diagram illustrating the detailed configuration of the feature map internal decoding unit 201.
[Figure 11] A diagram illustrating the number of channels, the width, and the height of the feature maps x1, x2, and x3.
[Figure 12] A diagram illustrating how multi-channel feature maps are packed into one frame.
[Figure 13] A diagram illustrating the flipping applied when multi-channel feature maps are packed into one frame.
[Figure 14] A diagram illustrating the layers and units handled by the feature map encoding device and the feature map decoding device of the present embodiment.
[Figure 15] A flowchart illustrating the details of the operations of the feature map reduction model registration unit 105, the feature map reduction model storage unit 106, the feature map reduction model selection unit 107, and the feature map restoration parameter encoding unit 108 of the first embodiment.
[Figure 16] A flowchart illustrating the details of the operations of the feature map restoration parameter decoding unit 205, the feature map restoration model registration unit 206, the feature map restoration selection unit 207, and the feature map restoration model storage unit 208 of the first embodiment.
[Figure 17] A diagram illustrating the configurations of the feature map reduction model storage unit 106 and the feature map restoration model storage unit 208 in the first embodiment.
[Figure 18] A diagram illustrating the syntax structure of the feature map restoration parameters in the first embodiment.
[Figure 19] A block diagram illustrating the detailed configuration of another feature map reduction model.
[Figure 20] A block diagram illustrating the detailed configuration of another feature map restoration model.

Modes for Carrying Out the Invention

[0017] This section defines the technologies and technical terms used in this embodiment.

[0018] <Features and Feature Maps> In a convolutional neural network (CNN), a filter is used to extract features from the input layer data. The data obtained by applying the filter coefficients to the target image (input layer data) while shifting the filter position is called a feature or a feature map.

[0019] <Packing> The process of combining two or more frames (pictures) into a single frame (picture) by arranging them in a tile-like manner is called frame packing. In this application, packing refers to the process of combining feature maps of multiple channels into a single frame. Figure 12 shows an example of frame packing.
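A minimal sketch of packing, and of the matching unpacking, is shown below. It assumes that all channels have the same size and are tiled in row-major order into a grid with a fixed number of columns; the actual layout of Figure 12 and the flipping of Figure 13 are not reproduced, and the function names are hypothetical.

```python
def pack_channels(channels, cols):
    """Tile equally sized 2-D channels into one frame, row-major."""
    ch_h, ch_w = len(channels[0]), len(channels[0][0])
    rows = -(-len(channels) // cols)  # ceiling division
    frame = [[0.0] * (cols * ch_w) for _ in range(rows * ch_h)]
    for idx, channel in enumerate(channels):
        top, left = (idx // cols) * ch_h, (idx % cols) * ch_w
        for r in range(ch_h):
            frame[top + r][left:left + ch_w] = channel[r]
    return frame


def unpack_channels(frame, num_channels, ch_h, ch_w, cols):
    """Inverse of pack_channels: cut the tiles back out of the frame."""
    channels = []
    for idx in range(num_channels):
        top, left = (idx // cols) * ch_h, (idx % cols) * ch_w
        channels.append([frame[top + r][left:left + ch_w] for r in range(ch_h)])
    return channels


# Four 2x2 channels, each filled with its own channel number.
channels = [[[i, i], [i, i]] for i in range(4)]
frame = pack_channels(channels, cols=2)
for row in frame:
    print(row)
```

Unused tiles in the last grid row, if any, are left at 0.0 in this sketch.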

[0020] <Data type> A data type that represents integer values is called an integer type, and a data type that represents decimal values is called a decimal type.

[0021] <Layer, Unit> Figure 14 illustrates the layers and units handled by the feature map encoding and decoding devices of this embodiment. A sequence of temporally consecutive feature frames, or of feature maps for all channels, is referred to as a sequence layer or sequence unit. A sequence of temporally consecutive feature maps for one channel is referred to as a sequence layer for each channel or a sequence unit for each channel. A feature frame, or the feature maps for all channels, at a single time is referred to as a frame layer or frame unit. A feature map for one channel at a single time is referred to as a feature map layer for each channel or a feature map unit for each channel.

[0022] (First Embodiment) A feature map encoding device 100 and a feature map decoding device 200 according to a first embodiment of the present invention will be described.

[0023] Figure 1 is a block diagram of the feature map encoding device 100 according to the first embodiment. The feature map encoding device 100 of this embodiment includes a feature map reduction unit 102, a feature map conversion unit 103, a feature map internal encoding unit 104, a feature map reduction model registration unit 105, a feature map reduction model storage unit 106, a feature map reduction model selection unit 107, and a feature map restoration parameter encoding unit 108. The feature map encoding device 100 is a device that encodes the feature map generated by the neural network feature extraction unit 101 to generate a bitstream and output it.

[0024] The neural network feature extraction unit 101 reads the image targeted for feature extraction, generates feature maps through convolution, activation, and pooling using an FPN, and supplies them to the feature map reduction unit 102. In this embodiment, three-layer multi-scale feature maps x1, x2, and x3 are generated.

[0025] The feature map reduction unit 102 uses a feature map reduction model selected by the feature map reduction model selection unit 107 from among multiple feature map reduction models stored in the feature map reduction model storage unit 106 to convert the three-layer multi-scale feature maps x1, x2, and x3 obtained from the neural network feature extraction unit 101 into a single-layer single-scale feature map xf, and supplies it to the feature map conversion unit 103. The feature map reduction unit 102 is described in detail with reference to Figure 5.

[0026] The feature map conversion unit 103 takes the decimal-type single-scale feature map xf supplied from the feature map reduction unit 102, performs packing and quantization processing to convert it into an integer-type packed feature frame, and supplies it to the feature map internal encoding unit 104.

[0027] The feature map conversion unit 103 is described in detail with reference to Figure 7.

[0028] The feature map internal encoding unit 104 encodes the integer-type packed feature frame supplied from the feature map conversion unit 103 using an image encoding standard such as VVC, HEVC, or AV1 to generate and output a bitstream. The output bitstream is supplied to the feature map decoding device 200, etc., via a network or the like.

[0029] The feature map internal encoding unit 104 is described in detail with reference to Figure 9.

[0030] The feature map reduction model registration unit 105 generates a feature map reduction model and a corresponding feature map restoration model using machine learning or the like, and determines a model index that indicates the location in the feature map reduction model storage unit 106 where the models are to be registered. It then registers the generated feature map reduction model and feature map restoration model in the feature map reduction model storage unit 106 at the location indicated by the model index.

[0031] The feature map reduction model storage unit 106 has the function of storing multiple feature map reduction models and multiple feature map restoration models. The configuration of the feature map reduction model storage unit 106 is shown in Figure 17. The feature map reduction model storage unit 106 consists of a feature map reduction model array, ReductionArray, which stores feature map reduction models, and a feature map restoration model array, RestorationArray, which stores feature map restoration models. Each element of ReductionArray and RestorationArray is accessed using a model index. Since a feature map restoration model corresponding to a single feature map reduction model is uniquely determined, the same model index is used for both ReductionArray and RestorationArray. In this embodiment, the size of ReductionArray and RestorationArray is set to 8, but the size of the arrays can be freely set by the user, and the size of the arrays may be variable.
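The storage layout described above can be sketched as two fixed-size arrays that share one model index. Only the names ReductionArray and RestorationArray and the size of 8 come from the text; the class name, the method names, and the use of strings as stand-ins for models are assumptions made for this example.

```python
class ReductionModelStore:
    """Encoder-side store: paired reduction and restoration model arrays."""

    def __init__(self, size=8):
        self.reduction_array = [None] * size    # ReductionArray
        self.restoration_array = [None] * size  # RestorationArray

    def register(self, model_index, reduction_model, restoration_model):
        """Register a model pair at the same index in both arrays."""
        if not 0 <= model_index < len(self.reduction_array):
            raise IndexError(f"model index {model_index} out of range")
        self.reduction_array[model_index] = reduction_model
        self.restoration_array[model_index] = restoration_model

    def select(self, model_index):
        """Return the (reduction, restoration) pair stored at model_index."""
        reduction = self.reduction_array[model_index]
        if reduction is None:
            raise LookupError(f"no model registered at index {model_index}")
        return reduction, self.restoration_array[model_index]


store = ReductionModelStore()
store.register(0, "reduction_fig5", "restoration_fig6")
print(store.select(0))  # ('reduction_fig5', 'restoration_fig6')
```

Because one index addresses both arrays, transmitting a single model index is enough for the decoder to find the restoration model that matches the reduction model used by the encoder.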

[0032] The feature map reduction model selection unit 107 selects, from the feature map reduction model storage unit 106, the optimal feature map reduction model to be used by the feature map reduction unit 102. It transmits the model index that identifies the selected feature map reduction model to the feature map restoration parameter encoding unit 108.

[0033] The feature map restoration parameter encoding unit 108 encodes the feature map restoration model generated by the feature map reduction model registration unit 105, the model index specifying the generated feature map restoration model, and the model index used in the frame selected by the feature map reduction model selection unit 107 to generate and output a feature map restoration parameter bitstream. The output bitstream is supplied to the feature map decoding device 200, etc., via a network or the like. The feature map restoration parameter bitstream may be transmitted for every packed feature frame, or it may be transmitted only for packed feature frames in which the feature map reduction model and feature map restoration model have been registered, or for packed feature frames in which the feature map reduction model and feature map restoration model to be used have been changed. The feature map restoration parameter encoding unit 108 may supply the feature map restoration parameter bitstream to the feature map internal encoding unit 104 of the feature map encoding device 100, and the feature map internal encoding unit 104 may integrate the bitstream for feature map internal encoding and the feature map restoration parameter bitstream and supply them to the feature map decoding device 200, etc.
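The second transmission policy above (send parameters only when a model is registered or when the model in use changes) can be sketched as follows. The actual syntax is given in Figure 18 and is not reproduced here; the `RestorationParams` fields, the function name, and the use of raw bytes for a serialized model are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RestorationParams:
    """Hypothetical container for one feature map restoration parameter set."""
    use_index: int                             # model index used in this frame
    register_index: Optional[int] = None       # where to register a new model
    restoration_model: Optional[bytes] = None  # serialized model, if any


def params_for_frame(use_index: int,
                     prev_use_index: Optional[int],
                     new_model: Optional[Tuple[int, bytes]] = None
                     ) -> Optional[RestorationParams]:
    """Return the parameters to send with this frame, or None if none are needed."""
    if new_model is not None:
        register_index, model_bytes = new_model
        return RestorationParams(use_index, register_index, model_bytes)
    if use_index != prev_use_index:
        return RestorationParams(use_index)
    return None


print(params_for_frame(0, None, new_model=(0, b"model-0")))  # registration
print(params_for_frame(0, 0))                                # None: nothing changed
print(params_for_frame(1, 0))                                # model switch only
```

Transmitting the parameters with every packed feature frame, the first policy in the text, would correspond to always returning a `RestorationParams` instance.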

[0034] The detailed operation of the feature map reduction model registration unit 105, the feature map reduction model storage unit 106, the feature map reduction model selection unit 107, and the feature map restoration parameter encoding unit 108 will be described later with reference to Figure 15.

[0035] Figure 2 is a block diagram showing the configuration of a feature map decoding device 200 according to an embodiment of the present invention, corresponding to the feature map encoding device 100 in Figure 1. The feature map decoding device 200 of this embodiment includes a feature map internal decoding unit 201, a feature map inverse conversion unit 202, a feature map restoration unit 203, a feature map restoration parameter decoding unit 205, a feature map restoration model registration unit 206, a feature map restoration selection unit 207, and a feature map restoration model storage unit 208. The feature map decoding device 200 receives a bitstream encoded by the feature map encoding device 100, decodes the bitstream to generate three-layer multi-scale feature maps x1up, x2up, and x3up, and supplies them to the neural network identification unit 204.

[0036] The feature map internal decoding unit 201 decodes the bitstream encoded by the feature map internal encoding unit 104 of the feature map encoding device 100 using an image encoding standard such as VVC, HEVC, or AV1, generates an integer-type packed feature frame, and supplies it to the feature map inverse conversion unit 202.

[0037] The feature map internal decoding unit 201 is described in detail with reference to Figure 10.

[0038] The feature map inverse conversion unit 202 performs inverse quantization and unpacking on the integer-type packed feature frame supplied from the feature map internal decoding unit 201, converts it into a decimal-type single-scale feature map xr, and supplies it to the feature map restoration unit 203.

[0039] The feature map inverse conversion unit 202 is described in detail with reference to Figure 8.

[0040] The feature map restoration unit 203 uses the feature map restoration model selected by the feature map restoration selection unit 207 from among multiple feature map restoration models stored in the feature map restoration model storage unit 208 to convert the single-scale feature map xr supplied from the feature map inverse conversion unit 202 into three-layer multi-scale feature maps x1up, x2up, and x3up, and supplies them to the neural network identification unit 204 as the output of the feature map decoding device 200.

[0041] The feature map restoration unit 203 is described in detail with reference to Figure 6.

[0042] The neural network identification unit 204 performs identification processing such as identifying objects in the target image, identifying locations and landscapes, and identifying people and living things, based on the three-layer multi-scale feature maps x1up, x2up, and x3up supplied by the feature map restoration unit 203.
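The quantization performed on the encoder side and the inverse quantization performed here can be sketched as a round trip. The text does not specify the quantization scheme, so this example assumes uniform min-max quantization to a 10-bit range and assumes that the offset and scale are available to the decoder; all names and the bit depth are hypothetical.

```python
def quantize(frame, bit_depth=10):
    """Map a decimal-type frame to integers in [0, 2**bit_depth - 1]."""
    flat = [v for row in frame for v in row]
    lo, hi = min(flat), max(flat)
    levels = (1 << bit_depth) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    quantized = [[int(round((v - lo) / scale)) for v in row] for row in frame]
    return quantized, lo, scale


def dequantize(quantized, lo, scale):
    """Map an integer-type frame back to decimal values."""
    return [[lo + q * scale for q in row] for row in quantized]


frame = [[-1.0, 0.0], [0.5, 1.0]]
q, lo, scale = quantize(frame)
restored = dequantize(q, lo, scale)
max_error = max(abs(a - b)
                for row_a, row_b in zip(frame, restored)
                for a, b in zip(row_a, row_b))
print(q[0][0], q[1][1])         # 0 1023
print(max_error <= scale / 2 + 1e-9)  # True
```

The reconstruction error of each element is bounded by half a quantization step, which is why a higher bit depth gives a closer restored feature map at the cost of a larger integer range to encode.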

[0043] The feature map restoration parameter decoding unit 205 decodes the feature map restoration parameter bitstream encoded by the feature map restoration parameter encoding unit 108. It supplies the decoded feature map restoration model to be registered and the model index specifying the feature map restoration model to the feature map restoration model registration unit 206. Furthermore, the feature map restoration parameter decoding unit 205 supplies the decoded model index to be used in the frame to the feature map restoration selection unit 207.

[0044] The feature map restoration model registration unit 206 registers the feature map restoration model to be registered, obtained from the feature map restoration parameter decoding unit 205, in the feature map restoration model storage unit 208 at the location indicated by the model index that specifies the feature map restoration model.

[0045] The feature map restoration model storage unit 208 has the function of storing multiple feature map restoration models. The feature map restoration model storage unit 208 is composed of the feature map restoration model array RestorationArray, shown in Figure 17, which stores the feature map restoration models. Since a feature map reduction model is not required in the feature map decoding device 200, the feature map restoration model storage unit 208 does not include a feature map reduction model array ReductionArray that stores a feature map reduction model.

[0046] The feature map restoration selection unit 207 selects, from the feature map restoration model storage unit 208, the feature map restoration model specified by the model index to be used in the frame, which was obtained from the feature map restoration parameter decoding unit 205.
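The decoder-side handling of the decoded parameters (register a received model, then select the model to use for the frame) can be sketched as follows. This is an illustration under assumptions: the class and function names are hypothetical, strings stand in for models, and performing registration before selection within one parameter set is assumed here rather than taken from Figure 16.

```python
class RestorationModelStore:
    """Decoder-side store: RestorationArray only, no ReductionArray."""

    def __init__(self, size=8):
        self.restoration_array = [None] * size

    def register(self, model_index, restoration_model):
        self.restoration_array[model_index] = restoration_model

    def select(self, model_index):
        model = self.restoration_array[model_index]
        if model is None:
            raise LookupError(f"no restoration model at index {model_index}")
        return model


def apply_restoration_params(store, use_index,
                             register_index=None, restoration_model=None):
    """Register a model if one was received, then select the model to use."""
    if restoration_model is not None:
        store.register(register_index, restoration_model)
    return store.select(use_index)


store = RestorationModelStore()
# First frame: a model arrives and is used immediately.
print(apply_restoration_params(store, use_index=0,
                               register_index=0, restoration_model="restore_a"))
# Later frame: only the index is needed, the model is already stored.
print(apply_restoration_params(store, use_index=0))
```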

[0047] Details of the operation of the feature map restoration parameter decoding unit 205, the feature map restoration model registration unit 206, the feature map restoration model storage unit 208, and the feature map restoration selection unit 207 will be described later with reference to Figure 16.

[0048] <About Feature Map Reduction and Feature Map Restoration> In this embodiment, feature map reduction and restoration are performed by a feature map reduction model and a feature map restoration model, respectively. The feature map reduction model storage unit 106 stores feature map reduction models. The feature map restoration model storage unit 208 stores feature map restoration models. The feature map reduction model storage unit 106 and the feature map restoration model storage unit 208 each have areas capable of storing multiple feature map reduction models and feature map restoration models. The feature map reduction unit 102 and the feature map restoration unit 203 use the selected feature map reduction model and feature map restoration model, respectively, to perform feature map reduction and restoration. Switching between feature map reduction models and feature map restoration models is done on a packed feature frame basis. The feature map reduction unit 102 has the function of converting a multi-layer multi-scale feature map obtained from the neural network feature extraction unit 101 into a single-layer single-scale feature map using one feature map reduction model selected by the feature map reduction model selection unit 107 from among multiple feature map reduction models stored in the feature map reduction model storage unit 106.

[0049] Figure 5 shows an example of a feature map reduction model selected by the feature map reduction model selection unit 107. The feature map reduction model in Figure 5 is stored, for example, in ReductionArray[0]. ReductionArray[0] consists of a first feature map reduction unit 501, a first channel joining unit 502, a second feature map reduction unit 503, a second channel joining unit 504, a third feature map reduction unit 505, a first padding unit 506, a second padding unit 507, and a third padding unit 508. ReductionArray[0] is an example of a configuration that converts a three-layer multi-scale feature map into a single-scale feature map.

[0050] ReductionArray[0] takes a three-layer multi-scale feature map, consisting of a first feature map x1, a second feature map x2, and a third feature map x3, as input, converts it into a single-layer single-scale feature map xf, and supplies it to the feature map conversion unit 103. Here, n is the index indicating the layer, Cn is the number of channels in the nth layer, Wn is the width of the feature map, and Hn is the height of the feature map. In this embodiment, the values of Cn, Wn, and Hn for each layer are as shown in Figure 11, where W and H are the width and height, respectively, of the image from which feature extraction is performed.

[0051] The first padding unit 506 has the function of performing padding on the first feature map x1 and generating the first padded feature map x1pad. The first padding unit 506 determines the padding size such that the width and height of x1pad are multiples of 64. The number of channels in x1pad is the same as x1, which is 256.

[0052] The second padding unit 507 has the function of performing wrapping padding on the second feature map x2 to generate the second padded feature map x2pad. The second padding unit 507 determines the padding size such that the width and height of x2pad are multiples of 32. The number of channels in x2pad is the same as x2, which is 256.

[0053] The third padding unit 508 has the function of performing wrapping padding on the third feature map x3 to generate the third padded feature map x3pad. The third padding unit 508 determines the padding size such that the width and height of x3pad are multiples of 16. The number of channels in x3pad is the same as x3, which is 256.

[0054] In the first padding unit 506, the second padding unit 507, and the third padding unit 508, the padding size on the left and the padding size on the right are the same, and the padding size on the top and the padding size on the bottom are the same. That is, feature maps x1, x2, and x3 are positioned at the center of x1pad, x2pad, and x3pad, respectively.
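The padding sizes used by the three padding units can be computed as in the sketch below, which rounds each dimension up to the required multiple (64, 32, or 16) and splits the padding evenly between the two sides. The function names and the example sizes are hypothetical. When the total padding is odd, an exactly equal split is impossible, so this sketch places the extra sample on the right or bottom side, which is an assumption not covered by the text.

```python
def symmetric_padding(size, multiple):
    """Return (before, after) padding so that size + padding is a multiple."""
    total = (-size) % multiple
    before = total // 2
    after = total - before  # equals `before` whenever `total` is even
    return before, after


def padded_size(size, multiple):
    before, after = symmetric_padding(size, multiple)
    return before + size + after


# Hypothetical feature map widths for x1, x2, and x3.
for width, multiple in [(200, 64), (100, 32), (50, 16)]:
    print(width, multiple, symmetric_padding(width, multiple),
          padded_size(width, multiple))
```

For the example widths, the padded sizes are 256, 128, and 64, so each padded layer is exactly half the size of the previous one, which is what the later channel-direction joining requires.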

[0055] The first feature map reduction unit 501 performs convolution in the spatial and channel directions on the first padded feature map x1pad obtained from the first padding unit 506 to generate the first intermediate feature map y1. The number of channels in y1 is 192, the width is Wx1pad / 2, and the height is Hx1pad / 2. Here, Wx1pad and Hx1pad are the width and height of the first padded feature map x1pad, respectively.

[0056] The first channel joining unit 502 has the function of combining the first intermediate feature map y1 obtained from the first feature map reduction unit 501 and the second padded feature map x2pad obtained from the second padding unit 507 in the channel direction to generate an intermediate feature map y1Cx2pad. Since y1 has 192 channels and x2pad has 256 channels, the intermediate feature map y1Cx2pad has 448 channels (192 + 256).

[0057] The second feature map reduction unit 503 performs convolution in the spatial and channel directions on the intermediate feature map y1Cx2pad obtained from the first channel joining unit 502 to generate the second intermediate feature map y2. The number of channels in y2 is 192, the width is Wy1Cx2pad / 2, and the height is Hy1Cx2pad / 2. Here, Wy1Cx2pad and Hy1Cx2pad are the width and height of the intermediate feature map y1Cx2pad, respectively.

[0058] The second channel joining unit 504 has the function of combining the second intermediate feature map y2 obtained from the second feature map reduction unit 503 and the third padded feature map x3pad obtained from the third padding unit 508 in the channel direction to generate an intermediate feature map y2Cx3pad. Since the intermediate feature map y2 has 192 channels and x3pad has 256 channels, the number of channels in y2Cx3pad is 448 (192 + 256).

[0059] The third feature map reduction unit 505 performs convolution in the spatial and channel directions on the intermediate feature map y2Cx3pad obtained from the second channel merging unit 504 to generate a third intermediate feature map y3. The number of channels in y3 is 192, the width is Wy2Cx3pad / 2, and the height is Hy2Cx3pad / 2. Here, Wy2Cx3pad and Hy2Cx3pad are the width and height of the intermediate feature map y2Cx3pad, respectively.
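As an illustration only, the shapes produced by the cascade in paragraphs [0055] to [0059] can be traced with a short Python sketch. The helper names (`reduce_half`, `concat_channels`, `trace_reduction_array0`) and the example sizes are hypothetical. The sketch assumes that the padded heights and widths are even at each stage and that each padded level already matches the spatial size of the map it is merged with; the 256-channel count for x1pad in the example is also an assumption.

```python
from typing import Dict, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def reduce_half(shape: Shape, out_channels: int = 192) -> Shape:
    """Shape after one reduction unit: height and width halved, channels set."""
    _, h, w = shape
    if h % 2 or w % 2:
        raise ValueError("padded height and width must be even")
    return (out_channels, h // 2, w // 2)


def concat_channels(a: Shape, b: Shape) -> Shape:
    """Shape after merging two maps in the channel direction."""
    if a[1:] != b[1:]:
        raise ValueError("spatial sizes must match before channel merging")
    return (a[0] + b[0], a[1], a[2])


def trace_reduction_array0(x1pad: Shape, x2pad: Shape, x3pad: Shape) -> Dict[str, Shape]:
    """Trace units 501 to 505 and return every intermediate shape."""
    y1 = reduce_half(x1pad)                # unit 501
    y1cx2pad = concat_channels(y1, x2pad)  # unit 502: 192 + 256 = 448
    y2 = reduce_half(y1cx2pad)             # unit 503
    y2cx3pad = concat_channels(y2, x3pad)  # unit 504: 192 + 256 = 448
    y3 = reduce_half(y2cx3pad)             # unit 505
    return {"y1": y1, "y1Cx2pad": y1cx2pad, "y2": y2, "y2Cx3pad": y2cx3pad, "y3": y3}


# Example sizes are assumptions chosen so that every level divides evenly.
shapes = trace_reduction_array0((256, 64, 96), (256, 32, 48), (256, 16, 24))
```

With these example sizes the final map y3 has 192 channels and one eighth of the height and width of x1pad.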

[0060] The feature map reduction unit 102 outputs the third intermediate feature map y3 as a single-scale feature map xf and supplies it to the feature map conversion unit 103. Figure 19 shows another example of a feature map reduction model selected by the feature map reduction model selection unit 107. The feature map reduction model in Figure 19 is stored, for example, in ReductionArray[1].

[0061] The ReductionArray[1] consists of a first padding section 506, a second padding section 507, a third padding section 508, a first feature map reduction section 511, a second feature map reduction section 512, a third feature map reduction section 513, and a channel merging section 514.

[0062] The first padding section 506, the second padding section 507, and the third padding section 508 have the same configuration as those shown in Figure 5, so their description is omitted.

[0063] The first feature map reduction unit 511 performs convolution in the spatial and channel directions on the first padded feature map x1pad obtained from the first padding unit 506 to generate the first intermediate feature map y1. The number of channels in y1 is 160, the width is Wx1pad / 2, and the height is Hx1pad / 2. Here, Wx1pad and Hx1pad are the width and height of the first padded feature map x1pad, respectively.

[0064] The second feature map reduction unit 512 performs convolution in the spatial and channel directions on the second padded feature map x2pad obtained from the second padding unit 507 to generate the second intermediate feature map y2. The number of channels in y2 is 160, the width is Wx2pad / 4, and the height is Hx2pad / 4. Here, Wx2pad and Hx2pad are the width and height of the second padded feature map x2pad, respectively.

[0065] The third feature map reduction unit 513 performs convolution in the spatial and channel directions on the third padded feature map x3pad obtained from the third padding unit 508 to generate the third intermediate feature map y3. The number of channels in y3 is 160, the width is Wx3pad / 8, and the height is Hx3pad / 8. Here, Wx3pad and Hx3pad are the width and height of the third padded feature map x3pad, respectively. The channel merging unit 514 combines the intermediate feature maps y1, y2, and y3 obtained from the first feature map reduction unit 511, the second feature map reduction unit 512, and the third feature map reduction unit 513 in the channel direction to generate a fourth intermediate feature map y4. Since y1, y2, and y3 each have 160 channels, y4 has 480 channels (160 × 3). ReductionArray[1] outputs the fourth intermediate feature map y4 as a single-scale feature map xf and supplies it to the feature map conversion unit 103. The feature map reconstruction unit 203 has the feature map reconstruction selection unit 207 select one feature map reconstruction model from among the multiple feature map reconstruction models stored in the feature map reconstruction model storage unit 208, and uses the selected model to convert the single-scale feature map xr obtained from the feature map inverse conversion unit 202 into the three-layer multi-scale feature maps x1up, x2up, and x3up.

[0066] Figure 6 shows an example of a feature map restoration model selected by the feature map restoration selection unit 207. The feature map restoration model in Figure 6 is stored, for example, in RestorationArray[0]. RestorationArray[0] consists of an 8x magnification unit 601, a 4x magnification unit 602, a 2x magnification unit 603, a first feature map mixing unit 604, a second feature map mixing unit 605, a first padding removal unit 606, a second padding removal unit 607, and a third padding removal unit 608.

[0067] The 8x magnification unit 601 enlarges the single-scale feature map xr obtained from the feature map inverse transformation unit 202 and reduces its number of channels by performing transposed convolution in the spatial direction and convolution in the channel direction, thereby generating an intermediate feature map z1. The number of channels in z1 is 196. If the width and height of the single-scale feature map xr are xrwidth and xrheight, respectively, then the width and height of z1 are xrwidth × 8 and xrheight × 8, respectively. Here, xrwidth × 8 and xrheight × 8 are the same as the width and height of the first padded feature map x1pad, which is the output of the first padding unit 506 of the feature map reduction unit 102.

[0068] The 4x magnification unit 602 enlarges the single-scale feature map xr obtained from the feature map inverse transformation unit 202 and reduces its number of channels by performing transposed convolution in the spatial direction and convolution in the channel direction, thereby generating an intermediate feature map z2. The number of channels in z2 is 196. The width and height of z2 are xrwidth × 4 and xrheight × 4, respectively. Here, xrwidth × 4 and xrheight × 4 are the same as the width and height of the second padded feature map x2pad, which is the output of the second padding unit 507 of the feature map reduction unit 102.

[0069] The 2x magnification unit 603 enlarges the single-scale feature map xr obtained from the feature map inverse transformation unit 202 and reduces its number of channels by performing transposed convolution in the spatial direction and convolution in the channel direction, thereby generating an intermediate feature map z3. The number of channels in z3 is 196. The width and height of z3 are xrwidth × 2 and xrheight × 2, respectively. Here, xrwidth × 2 and xrheight × 2 are the same as the width and height of the third padded feature map x3pad, which is the output of the third padding unit 508 of the feature map reduction unit 102.

[0070] The first feature map mixing unit 604 has the function of generating an intermediate feature map z2up with improved quality from the intermediate feature map z2 obtained from the 4x magnification unit 602, using the intermediate feature map z1 obtained from the 8x magnification unit 601.

[0071] The second feature map mixing unit 605 has the function of generating an intermediate feature map z3up with improved quality from the intermediate feature map z3 obtained from the 2x magnification unit 603, using the intermediate feature map z2up obtained from the first feature map mixing unit 604.

[0072] The first padding removal unit 606 removes padding from the intermediate feature map z1 obtained from the 8x magnification unit 601 and generates a first output feature map x1up. The width and height of x1up are the same as the width and height of the first feature map x1 input to the feature map reduction unit 102.

[0073] The second padding removal unit 607 removes padding from the intermediate feature map z2up obtained from the first feature map mixing unit 604 and generates a second output feature map x2up. The width and height of x2up are the same as the width and height of the second feature map x2 input to the feature map reduction unit 102.

[0074] The third padding removal unit 608 removes padding from the intermediate feature map z3up obtained from the second feature map mixing unit 605 and generates a third output feature map x3up. The width and height of x3up are the same as the width and height of the third feature map x3 input to the feature map reduction unit 102.
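Padding removal of this kind can be sketched as a centered crop. The function name `remove_padding` and the (channels, height, width) array layout are assumptions for illustration, and the sketch assumes the padding is symmetric, as stated for the padding units in paragraph [0054].

```python
import numpy as np


def remove_padding(z: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Crop a (C, H, W) array to (C, out_h, out_w) around its center."""
    _, h, w = z.shape
    pad_h, pad_w = h - out_h, w - out_w
    # Symmetric padding implies an even, non-negative total in each direction.
    if pad_h < 0 or pad_w < 0 or pad_h % 2 or pad_w % 2:
        raise ValueError("padding must be non-negative and symmetric")
    top, left = pad_h // 2, pad_w // 2
    return z[:, top:top + out_h, left:left + out_w]


# Round trip: pad a small map symmetrically, then crop it back.
x = np.arange(2 * 3 * 5, dtype=np.float32).reshape(2, 3, 5)
padded = np.pad(x, ((0, 0), (1, 1), (2, 2)))  # 1 row top/bottom, 2 columns left/right
restored = remove_padding(padded, 3, 5)
```

Only the target height and width are needed, because the crop offsets follow from the symmetry of the padding.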

[0075] In the first padding removal unit 606, the second padding removal unit 607, and the third padding removal unit 608, as in the first padding unit 506, the second padding unit 507, and the third padding unit 508 of the feature map reduction unit 102, the padding sizes on the left and right are the same, and the padding sizes on the top and bottom are the same. That is, each output feature map x1up, x2up, and x3up is assumed to be positioned at the center of the corresponding intermediate feature map z1, z2up, and z3up, and padding is removed from the top, bottom, left, and right. Figure 20 shows another example of a feature map reconstruction model selected by the feature map reconstruction selection unit 207. The feature map reconstruction model in Figure 20 is stored, for example, in RestorationArray[1].

[0076] RestorationArray[1] is composed of a first padding removal unit 606, a second padding removal unit 607, a third padding removal unit 608, an 8x magnification unit 611, a 4x magnification unit 612, and a 2x magnification unit 613. RestorationArray[1] is the configuration obtained by removing the first feature map mixing unit 604 and the second feature map mixing unit 605 from Figure 6.

[0077] <Regarding updates to feature map reduction parameters and feature map restoration parameters> Figure 18 shows an example of the syntax structure of the feature map reconstruction parameters in this embodiment. This syntax structure is transmitted in units of packing feature frames. The syntax element updates_restoration indicates whether or not to switch the feature map reconstruction model used for the packing feature frame. A syntax element updates_restoration=1 indicates that the feature map reconstruction model will be switched for that packing feature frame. A syntax element updates_restoration=0 indicates that the feature map reconstruction model will not be switched for that packing feature frame. If the syntax element updates_restoration=1, the syntax element used_restoration_model_idc is also transmitted. The syntax element used_restoration_model_idc identifies the feature map reconstruction model used for the packing feature frame from among the stored feature map reconstruction models. The syntax element `registers_restoration_model` indicates whether or not to register a new feature map restoration model. `registers_restoration_model=1` indicates that a new feature map restoration model will be registered. If `registers_restoration_model=0`, a new feature map restoration model will not be registered, and therefore, subsequent syntax elements will not be transmitted. The syntax element registered_restoration_model_idc is an index that identifies the feature map restoration model to be registered. The syntax element `register_mode` indicates a means of representing the feature map reconstruction model. By providing multiple means of representing feature map reconstruction models, flexible representation becomes possible without being limited to a specific representation method. 
The syntax element register_mode=0 indicates that a feature map reconstruction model will be transmitted, according to the ISO / IEC 15938-17 standard, “Compression of neural networks for multimedia content description and analysis”. ISO / IEC 15938-17 is a standard for compressing neural network models. When the syntax element register_mode=0, the data length payload_length is also transmitted, and the actual data payload equal to the data length payload_length is transmitted. The interpretation of the actual data payload shall follow ISO / IEC 15938-17. The syntax element register_mode=1 indicates that the feature map reconstruction model will be transmitted via restoration_uri, which is a URI (Uniform Resource Identifier). In this case, it refers to the feature map reconstruction model stored on an external resource. The syntax element register_mode=2 indicates that the feature map reconstruction model will be transmitted using a general-purpose format for representing the neural network model. When the syntax element register_mode=2 is used, the format-specific index format_idc is then transmitted. For example, format_idc=0 indicates ONNX (Open Neural Network Exchange), format_idc=1 indicates NNEF (Neural Network Exchange Format), and format_idc=2 indicates PyTorch. In all cases, the data length payload_length is then transmitted, followed by the actual data payload of that length. The interpretation of the actual data payload shall follow the respective format.
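The registration part of this syntax can be sketched as a small parser. This is a minimal sketch, not the bitstream format itself: the binarization of each syntax element is not described here, so the function reads from a hypothetical iterator of already-decoded values, and the names `Registration` and `parse_registration` are assumptions. The sketch covers only the elements from registers_restoration_model onward.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional

FORMAT_NAMES = {0: "ONNX", 1: "NNEF", 2: "PyTorch"}  # format_idc values named in the text


@dataclass
class Registration:
    model_idc: int                     # registered_restoration_model_idc
    register_mode: int                 # 0, 1, or 2
    format_name: Optional[str] = None  # how the payload is to be interpreted
    uri: Optional[str] = None          # restoration_uri (register_mode=1)
    payload: Optional[bytes] = None    # model data (register_mode=0 or 2)


def parse_registration(values: Iterator[Any]) -> Optional[Registration]:
    """Read registration syntax elements from already-decoded values."""
    if next(values) == 0:  # registers_restoration_model
        return None        # nothing further is transmitted
    model_idc = next(values)
    mode = next(values)
    reg = Registration(model_idc, mode)
    if mode == 1:
        reg.uri = next(values)  # model lives on an external resource
        return reg
    if mode == 0:
        reg.format_name = "ISO/IEC 15938-17"
    elif mode == 2:
        reg.format_name = FORMAT_NAMES[next(values)]  # format_idc
    else:
        raise ValueError(f"unknown register_mode {mode}")
    payload_length = next(values)
    reg.payload = bytes(next(values))
    if len(reg.payload) != payload_length:
        raise ValueError("payload does not match payload_length")
    return reg


onnx_reg = parse_registration(iter([1, 3, 2, 0, 4, b"abcd"]))
uri_reg = parse_registration(iter([1, 1, 1, "https://example.com/model"]))
no_reg = parse_registration(iter([0]))
```

The example URI and payload bytes are placeholders; interpreting the payload itself is left to the respective format.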

[0078] The flowchart in Figure 15 will be used to explain in detail the operation of the feature map reduction model registration unit 105, the feature map reduction model storage unit 106, the feature map reduction model selection unit 107, and the feature map restoration parameter coding unit 108.

[0079] The feature map reduction model registration unit 105 determines whether or not to register new feature map reduction models and feature map restoration models, and generates the feature map reduction models and feature map restoration models to be registered. If new feature map reduction models and feature map restoration models are to be registered, the syntax element registers_restoration_model=1 is set; otherwise, the syntax element registers_restoration_model=0 is set (step S201).

[0080] The feature map restoration parameter encoding unit 108 encodes the syntax element registers_restoration_model, which indicates whether or not to register a new feature map restoration model (step S202). If the syntax element registers_restoration_model=0, that is, if no new feature map reduction model and feature map restoration model are to be registered (No in step S203), the feature map restoration parameter encoding unit 108 does not register a new feature map reduction model and feature map restoration model and proceeds to step S206. If the syntax element registers_restoration_model=1, that is, if a new feature map reduction model and feature map restoration model are to be registered (Yes in step S203), the feature map restoration parameter encoding unit 108 encodes the syntax element registered_restoration_model_idc, which identifies the feature map reduction model and feature map restoration model to be registered (step S204).

[0081] The feature map restoration parameter encoding unit 108 determines the value of the syntax element register_mode and encodes it. It then encodes the feature map restoration model according to the format indicated by register_mode. The feature map reduction model registration unit 105 registers the feature map reduction model in ReductionArray[registered_restoration_model_idc] and the feature map restoration model in RestorationArray[registered_restoration_model_idc] of the feature map reduction model storage unit 106 (step S205). A predetermined feature map reduction model is registered in ReductionArray[0]. If a feature map reduction model is already registered in ReductionArray[registered_restoration_model_idc], the existing model is overwritten with the new one. ReductionArray[0] may always store the predetermined feature map reduction model, and overwriting of ReductionArray[0] may be prohibited.

[0082] The feature map reduction model selection unit 107 determines the syntax element updates_restoration, which indicates whether or not to switch the feature map restoration model to be used in the packing feature frame, and the model index used_restoration_model_idc, which identifies the feature map restoration model to be used in the packing feature frame. The feature map reduction model selection unit 107 selects the feature map reduction model ReductionArray[used_restoration_model_idc] indicated by the syntax element used_restoration_model_idc as the feature map reduction model to be used in the packing feature frame (step S206).

[0083] The feature map restoration parameter encoding unit 108 encodes the syntax element updates_restoration (step S207).

[0084] If the syntax element updates_restoration=0, meaning that the feature map reconstruction model used for the packing feature frame is not to be switched (No in step S208), the feature map reconstruction model is not switched and the process terminates.

[0085] If the syntax element updates_restoration=1, i.e., switching the feature map restoration model used for the packing feature frame (Yes in step S208), the syntax element used_restoration_model_idc, which indicates the model index, is encoded and the process ends (step S209).
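The order in which steps S201 to S209 emit syntax elements can be sketched as follows. The function name `write_restoration_parameters`, the (name, value) output list, and the rule that updates_restoration is set whenever the selected index differs from the previous one are assumptions for illustration; the sketch also handles only register_mode=0 and leaves entropy coding out.

```python
from typing import Any, List, Optional, Tuple

Syntax = List[Tuple[str, Any]]


def write_restoration_parameters(
    registration: Optional[Tuple[int, bytes]],  # (model index, model payload) or None
    used_idc: int,                              # model chosen for this packing feature frame
    previous_idc: int,                          # model used for the previous frame
) -> Syntax:
    """Return syntax elements in the order of steps S201 to S209."""
    out: Syntax = []
    # S201-S202: signal whether a new model pair is registered.
    out.append(("registers_restoration_model", int(registration is not None)))
    if registration is not None:  # S203
        idc, payload = registration
        out.append(("registered_restoration_model_idc", idc))  # S204
        out.append(("register_mode", 0))                       # S205 (mode 0 only in this sketch)
        out.append(("payload_length", len(payload)))
        out.append(("payload", payload))
    # S206: assumed rule, switch only when the selected model changes.
    updates = int(used_idc != previous_idc)
    out.append(("updates_restoration", updates))               # S207
    if updates:                                                # S208
        out.append(("used_restoration_model_idc", used_idc))   # S209
    return out


unchanged = write_restoration_parameters(None, used_idc=0, previous_idc=0)
switched = write_restoration_parameters((1, b"ab"), used_idc=1, previous_idc=0)
```

A frame that neither registers nor switches a model therefore carries only two flags.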

[0086] The flowchart in Figure 16 will be used to explain in detail the operation of the feature map reconstruction model registration unit 206, the feature map reconstruction model storage unit 208, the feature map reconstruction selection unit 207, and the feature map reconstruction parameter decoding unit 205.

[0087] The feature map restoration parameter decoding unit 205 decodes the syntax element registers_restoration_model, which indicates whether or not to register a new feature map restoration model (step S101). If the syntax element registers_restoration_model=0 (No in step S102), the feature map restoration parameter decoding unit 205 proceeds to step S105 without registering a new feature map restoration model. If the syntax element registers_restoration_model=1 (Yes in step S102), the feature map restoration parameter decoding unit 205 decodes the syntax element registered_restoration_model_idc, which identifies the feature map restoration model to be registered (step S103).

[0088] The feature map restoration parameter decoding unit 205 decodes the syntax element register_mode. It then decodes the feature map restoration model according to the decoded syntax element register_mode. The feature map restoration model registration unit 206 registers the decoded feature map restoration model in RestorationArray[registered_restoration_model_idc] of the feature map restoration model storage unit 208 (step S104). A predetermined feature map restoration model is registered in RestorationArray[0]. If a feature map restoration model is already registered in RestorationArray[registered_restoration_model_idc], the existing model is overwritten with the new one. RestorationArray[0] may always store the predetermined feature map restoration model, and overwriting of RestorationArray[0] may be prohibited.

[0089] The feature map restoration parameter decoding unit 205 decodes the syntax element updates_restoration, which indicates whether or not to switch the feature map restoration model used in the packing feature frame (step S105).

[0090] If the syntax element updates_restoration=0 (No in step S106), the process terminates without switching the feature map restoration model.

[0091] If the syntax element updates_restoration=1 (Yes in step S106), the feature map restoration parameter decoding unit 205 decodes the syntax element used_restoration_model_idc, which indicates a model index that identifies the feature map restoration model to be used in the packing feature frame (step S107). The feature map restoration selection unit 207 selects the feature map restoration model RestorationArray[used_restoration_model_idc] indicated by the syntax element used_restoration_model_idc as the feature map restoration model to be used in the packing feature frame (step S108). If no feature map restoration model is registered in RestorationArray[used_restoration_model_idc], exception handling is performed and RestorationArray[0] is selected as the feature map restoration model.
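The decoder-side registration and selection behaviour of steps S104 and S108 can be sketched as a small model store. The class name `RestorationModelStore`, its method names, and the use of plain strings as stand-ins for models are assumptions for illustration; the sketch takes the optional variant in which index 0 is protected from overwriting.

```python
from typing import Any, Dict


class RestorationModelStore:
    """Indexed store of restoration models with a protected default at index 0."""

    def __init__(self, default_model: Any, protect_default: bool = True) -> None:
        self._models: Dict[int, Any] = {0: default_model}  # predetermined model
        self._protect_default = protect_default

    def register(self, idc: int, model: Any) -> None:
        """Step S104: store a model, overwriting any existing entry at idc."""
        if idc == 0 and self._protect_default:
            raise ValueError("index 0 holds the predetermined model")
        self._models[idc] = model

    def select(self, idc: int) -> Any:
        """Step S108: return the model at idc, falling back to index 0."""
        return self._models.get(idc, self._models[0])


store = RestorationModelStore("default")
store.register(2, "model-a")
first = store.select(2)
store.register(2, "model-b")  # overwrites the earlier entry
second = store.select(2)
fallback = store.select(5)    # nothing registered at index 5
```

Falling back to index 0 keeps decoding possible even when the bitstream refers to an index that was never registered.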

[0092] <About Feature Map Transformation and Inverse Feature Map Transformation> The feature map conversion unit 103 has the function of taking the multi-channel fractional single-scale feature map xf supplied from the feature map reduction unit 102, performing packing and quantization processing, and converting it into an integer-type packed feature frame for supply to the feature map internal encoding unit 104.

[0093] Figure 7 will be used to explain the details of the feature map conversion unit 103 on the encoding side. The feature map conversion unit 103 consists of a packing unit 701 and a feature map quantization unit 702.

[0094] The packing unit 701 has the function of generating a packed feature frame by combining the input feature maps of multiple channels into a single frame. Figure 12 is a diagram illustrating the state in which multiple channel feature maps are packed into one frame. The feature maps of each channel are sequentially placed in one frame from left to right, from top to bottom, in the order of the raster scan.

[0095] Furthermore, the packing unit 701 performs flipping based on the in-frame position where the channel feature map is placed. The ability to select whether or not to perform flipping may be provided and transmitted from the encoding side to the decoding side via the bitstream.

[0096] Figure 13 illustrates the flipping process when packing multiple channel feature maps into a single frame. Flipping involves reversing the position of the elements (pixels) of each channel's feature map horizontally (left / right), vertically (up / down), or both horizontally and vertically (up / down / left / right) when packing the feature maps for each channel. In Figure 13, the four channel feature maps A (top left), B (top right), C (bottom left), and D (bottom right) are treated as a single set. No flipping is performed at position A in Figure 13. At position B, the feature map is reversed horizontally (left / right). At position C, the feature map is reversed vertically (up / down). At position D, the feature map is reversed horizontally and vertically (up / down / left / right). When the distribution of elements in each channel's feature map is similar, performing flipping based on the in-frame position where the channels are placed reduces the boundaries between each channel's feature map, improving encoding efficiency.
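Packing with position-dependent flipping can be sketched in NumPy as follows. The function names, the `cols` parameter (number of channel tiles per frame row), and the zero fill for unused tiles are assumptions for illustration. The sketch also assumes that the four-position set A, B, C, D repeats across the frame, so that the flip depends on the parity of the tile row and tile column.

```python
import numpy as np


def _flip_for_position(fmap: np.ndarray, row: int, col: int) -> np.ndarray:
    """Flip one (H, W) tile according to its tile position in the frame."""
    if col % 2 == 1:   # positions B and D: reverse left/right
        fmap = fmap[:, ::-1]
    if row % 2 == 1:   # positions C and D: reverse up/down
        fmap = fmap[::-1, :]
    return fmap


def pack(channels: np.ndarray, cols: int) -> np.ndarray:
    """Place (C, H, W) channel maps into one frame in raster-scan order."""
    c, h, w = channels.shape
    rows = -(-c // cols)  # ceiling division
    frame = np.zeros((rows * h, cols * w), dtype=channels.dtype)
    for i in range(c):
        r, k = divmod(i, cols)
        frame[r * h:(r + 1) * h, k * w:(k + 1) * w] = _flip_for_position(channels[i], r, k)
    return frame


def unpack(frame: np.ndarray, c: int, h: int, w: int) -> np.ndarray:
    """Inverse of pack: each flip is its own inverse, so it is applied again."""
    cols = frame.shape[1] // w
    out = np.empty((c, h, w), dtype=frame.dtype)
    for i in range(c):
        r, k = divmod(i, cols)
        out[i] = _flip_for_position(frame[r * h:(r + 1) * h, k * w:(k + 1) * w], r, k)
    return out


channels = np.arange(4 * 2 * 3, dtype=np.float32).reshape(4, 2, 3)
frame = pack(channels, cols=2)
recovered = unpack(frame, 4, 2, 3)
```

Because horizontal and vertical reversal are each self-inverse, the same helper serves both packing and unpacking.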

[0097] The feature map quantization unit 702 converts the elements of a decimal-type packing feature frame (the feature maps of all channels) into an N-bit integer type within a predetermined range (N is an integer from 8 to approximately 16) and outputs an integer-type packing feature frame. In this embodiment, the elements are assumed to be converted into a 10-bit integer type with values from 0 to 1023. The feature map quantization unit 702 detects the minimum and maximum values of the elements of the decimal-type packing feature frame and transmits them to the decoding side as metadata. To convert a decimal-type packing feature frame into an integer-type packing feature frame, a linear transformation is performed in which the minimum value of the decimal type corresponds to the minimum value of the integer type, and the maximum value of the decimal type corresponds to the maximum value of the integer type. For example, when the range of the integer type is represented by 10 bits, the minimum value of the elements of the integer-type packing feature frame is 0 and the maximum value is 1023 (2^10 - 1). Values between the minimum and maximum are linearly quantized.
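The min-max linear quantization can be sketched in NumPy. The function name `quantize`, rounding to the nearest integer, and the handling of a constant frame are assumptions for illustration; only the linear mapping of the minimum and maximum to 0 and 2^N - 1 comes from the description.

```python
from typing import Tuple

import numpy as np


def quantize(frame: np.ndarray, bits: int = 10) -> Tuple[np.ndarray, float, float]:
    """Map a decimal-type frame linearly onto integers 0 .. 2**bits - 1.

    Returns the integer frame plus the detected minimum and maximum,
    which are the metadata sent to the decoding side.
    """
    lo, hi = float(frame.min()), float(frame.max())
    qmax = (1 << bits) - 1
    if hi == lo:
        # Assumed handling of a constant frame: every element maps to 0.
        return np.zeros(frame.shape, dtype=np.uint16), lo, hi
    scaled = (frame - lo) / (hi - lo) * qmax
    return np.rint(scaled).astype(np.uint16), lo, hi


frame = np.array([[-1.0, 0.25], [0.5, 1.0]])
q, lo, hi = quantize(frame)
```

With 10 bits, the smallest element becomes 0 and the largest becomes 1023; `uint16` is wide enough for any N up to 16.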

[0098] Next, the feature map inverse transform unit 202 has the function of performing inverse quantization and unpacking on integer-type packing feature frames decoded by VVC, HEVC, AV1, etc., supplied from the feature map internal decoding unit 201, and transforming them into a decimal-type single-scale feature map xr for supply to the feature map reconstruction unit 203.

[0099] Figure 8 will be used to explain the details of the feature map inverse transform unit 202 on the decoding side. The feature map inverse transform unit 202 is the inverse process of the feature map transform unit 103 and is composed of a feature map inverse quantization unit 801 and an unpacking unit 802.

[0100] The feature map inverse quantization unit 801 performs the inverse processing of the encoding-side feature map quantization unit 702 and converts the elements of the integer-type packing feature frame from integer type to decimal type. The feature map inverse quantization unit 801 converts the integer-type packing feature frame decoded by the feature map internal decoding unit 201 into a decimal-type packing feature frame using the minimum and maximum decimal values transmitted as metadata. A linear transformation is performed in which the minimum integer value corresponds to the minimum decimal value and the maximum integer value corresponds to the maximum decimal value. Values between the minimum and maximum are linearly inverse-quantized.
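The inverse mapping can be sketched in the same way. The function name `dequantize` and the 64-bit floating-point output type are assumptions for illustration.

```python
import numpy as np


def dequantize(q: np.ndarray, lo: float, hi: float, bits: int = 10) -> np.ndarray:
    """Map integers 0 .. 2**bits - 1 linearly back onto the range [lo, hi]."""
    qmax = (1 << bits) - 1
    return lo + q.astype(np.float64) / qmax * (hi - lo)


# lo and hi are the minimum and maximum received as metadata.
restored = dequantize(np.array([0, 512, 1023]), lo=-1.0, hi=1.0)
```

The endpoints are recovered exactly, while intermediate values carry a quantization error of at most half a step, that is (hi - lo) / (2 × 1023) for 10 bits, provided the encoder rounded to the nearest integer.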

[0101] The unpacking unit 802 extracts the feature maps for each channel from the packing feature frames arranged in a single frame in the order of the raster scan, and supplies them to the feature map reconstruction unit 203 as a single-scale feature map xr.

[0102] <About internal encoding and decoding of feature maps> Figure 9 will be used to explain the details of the feature map internal encoding unit 104. The feature map internal encoding unit 104 consists of a switch 901, a VVC encoding unit 902, an HEVC encoding unit 903, and an AV1 encoding unit 904. The switch 901 selects the encoding standard for internally encoding the feature map converted by the feature map conversion unit 103. The VVC encoding unit 902 encodes the feature map using the VVC standard and outputs a bitstream compliant with the VVC standard. The HEVC encoding unit 903 encodes the feature map using the HEVC standard and outputs a bitstream compliant with the HEVC standard. The AV1 encoding unit 904 encodes the feature map using the AV1 standard and outputs a bitstream compliant with the AV1 standard.

[0103] In the VVC, HEVC, and AV1 standards, images are divided into predetermined block sizes and then encoded.

[0104] It is also possible to implement only one of the following: VVC, HEVC, or AV1. Furthermore, it is possible to use image encoding schemes other than VVC, HEVC, and AV1.

[0105] Next, the details of the feature map internal decoding unit 201 will be explained using Figure 10. The feature map internal decoding unit 201 consists of a switch 1001, a VVC decoding unit 1002, an HEVC decoding unit 1003, and an AV1 decoding unit 1004. The switch 1001 selects the standard to be used for internal decoding based on the information in the input bitstream that identifies the internal coding standard. The VVC decoding unit 1002 decodes the feature map using the VVC standard. The HEVC decoding unit 1003 decodes the feature map using the HEVC standard. The AV1 decoding unit 1004 decodes the feature map using the AV1 standard.

[0106] In the VVC, HEVC, and AV1 standards, decoding is performed for each predetermined block size.

[0107] It is also possible to implement only one of the following: VVC, HEVC, or AV1. Furthermore, it is possible to use image encoding schemes other than VVC, HEVC, and AV1.

[0108] This embodiment provides a configuration that can adaptively switch between a feature map reduction model and a feature map restoration model for a packing feature frame. Because the appropriate feature map reduction model and feature map restoration model can be selected, the prediction accuracy of the feature map can be improved.

[0109] Furthermore, this embodiment provides a configuration that allows users to select a registered feature map reduction model or feature map restoration model at any time by registering a new feature map reduction model or feature map restoration model and specifying it using a model index. Even when the characteristics of the multi-scale feature map to be encoded change significantly, the optimal feature map reduction model or feature map restoration model can be registered, thereby improving the compression accuracy of the feature map.

[0110] In all the embodiments described above, the bitstream output by the feature map encoding device has a specific data format so that it can be decoded according to the encoding method used in the embodiment. Furthermore, the feature map decoding device corresponding to this feature map encoding device can decode the bitstream of this specific data format.

[0111] When a wired or wireless network is used to exchange bitstreams between a feature map encoding device and a feature map decoding device, the bitstream may be converted to a data format suitable for the transmission mode of the communication channel before transmission. In this case, a transmitting device is provided that converts the bitstream output by the feature map encoding device into encoded data in a data format suitable for the transmission mode of the communication channel and transmits it to the network, and a receiving device is provided that receives the encoded data from the network, restores it to a bitstream, and supplies it to the feature map decoding device. The transmitting device includes a memory for buffering the bitstream output by the feature map encoding device, a packet processing unit for packetizing the bitstream, and a transmitting unit for transmitting the packetized encoded data over the network. The receiving device includes a receiving unit for receiving the packetized encoded data over the network, a memory for buffering the received encoded data, and a packet processing unit for depacketizing the encoded data to generate a bitstream and supplying it to the feature map decoding device.
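The packet processing on both sides can be sketched as a round trip. The packet layout used here, a sequence number paired with a fixed-size payload, is purely hypothetical, as are the function names; the description does not specify a packet format.

```python
from typing import List, Tuple

Packet = Tuple[int, bytes]  # (sequence number, payload); hypothetical layout


def packetize(bitstream: bytes, payload_size: int) -> List[Packet]:
    """Transmitting side: split a buffered bitstream into numbered packets."""
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    offsets = range(0, len(bitstream), payload_size)
    return [(seq, bitstream[off:off + payload_size]) for seq, off in enumerate(offsets)]


def depacketize(packets: List[Packet]) -> bytes:
    """Receiving side: reorder buffered packets and rebuild the bitstream."""
    ordered = sorted(packets, key=lambda p: p[0])
    return b"".join(payload for _, payload in ordered)


bitstream = bytes(range(10))
packets = packetize(bitstream, payload_size=4)
rebuilt = depacketize(list(reversed(packets)))  # arrival order may differ
```

Sorting by sequence number lets the receiving side rebuild the bitstream even when packets arrive out of order.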

[0112] The above encoding and decoding processes may be implemented not only as hardware-based transmission, storage, and receiving devices, but also by firmware stored in ROM (read-only memory) or flash memory, or by software on a computer. The firmware program or software program may be recorded on a recording medium readable by a computer and provided, provided from a server via a wired or wireless network, or provided as data broadcasting on terrestrial or satellite digital broadcasting.

[0113] The present invention has been described above based on embodiments. The embodiments are illustrative, and it will be understood by those skilled in the art that various modifications are possible in the combinations of their components and processes, and that such modifications also fall within the scope of the present invention. [Explanation of Symbols]

[0114] 100 Feature map coding unit, 101 Neural network feature extraction unit, 102 Feature map reduction unit, 103 Feature map transformation unit, 104 Feature map internal coding unit, 105 Feature map reduction model registration unit, 106 Feature map reduction model storage unit, 107 Feature map reduction model selection unit, 108 Feature map restoration parameter coding unit, 200 Feature map decoding unit, 201 Feature map internal decoding unit, 202 Feature map inverse transformation unit, 203 Feature map restoration unit, 204 Neural network identification unit, 205 Feature map restoration parameter decoding unit, 206 Feature map restoration model registration unit, 207 Feature map restoration model selection unit, 208 Feature map restoration model storage unit, 301 Convolution processing unit, 302 Activation processing unit, 303 Pooling processing unit, 322 Bottom-up processing unit, 324 Top-down processing unit, 326 Image to be processed for feature extraction, 501 First feature map reduction unit, 502 First channel merging unit, 503 Second feature map reduction unit, 504 Second channel merging unit, 505 Third feature map reduction unit, 506 First padding unit, 507 Second padding unit, 508 Third padding unit, 601 8x enlargement unit, 602 4x enlargement unit, 603 2x enlargement unit, 604 First feature map mixing unit, 605 Second feature map mixing unit, 606 First padding removal unit, 607 Second padding removal unit, 608 Third padding removal unit, 701 Packing unit, 702 Feature map quantization unit, 801 Feature map inverse quantization unit, 802 Unpacking unit, 901 Switch, 902 VVC encoding unit, 903 HEVC encoding unit, 904 AV1 encoding unit, 1001 Switch, 1002 VVC decoding unit, 1003 HEVC decoding unit, 1004 AV1 decoding unit.

Claims

1. A feature map coding device comprising:
a feature map reduction model storage unit that stores a plurality of feature map reduction models and feature map restoration models;
a feature map reduction model registration unit that determines a first model index indicating the position in the feature map reduction model storage unit at which a feature map reduction model is to be registered, and registers the feature map reduction model and the feature map restoration model corresponding to that feature map reduction model at the position indicated by the first model index;
a feature map reduction model selection unit that determines a second model index identifying, from the feature map reduction model storage unit, the feature map reduction model to be used for reducing feature maps;
a feature map reduction unit that uses the feature map reduction model identified by the second model index to transform a multi-scale feature map and generate a single-scale feature map;
a packing unit that packs the single-scale feature map into a frame to generate a fractional-type packed feature frame;
a feature map restoration parameter encoding unit that transmits information indicating the first model index, the second model index, and the feature map restoration model identified by the second model index; and
a feature map internal encoding unit that encodes the integer-type packed feature frame.

2. A feature map decoding device comprising:
a feature map restoration model storage unit that stores a plurality of feature map restoration models;
a feature map restoration parameter decoding unit that decodes a feature map restoration model, a first model index specifying the position at which the feature map restoration model is to be stored, and a second model index identifying the feature map restoration model to be used to restore a feature map;
a feature map restoration model registration unit that registers the feature map restoration model at the position indicated by the first model index in the feature map restoration model storage unit;
a feature map restoration model selection unit that selects, from the feature map restoration model storage unit and based on the second model index, the feature map restoration model to be used to restore the feature map;
a feature map internal decoding unit that decodes the single-scale feature map that was packed into a frame and encoded, using the feature map restoration model selected by the feature map restoration model selection unit, and generates an integer-type packed feature frame;
an inverse quantization unit that converts the elements of the integer-type packed feature frame to fractional values to generate a fractional-type packed feature frame;
an unpacking unit that unpacks the fractional-type packed feature frame to generate the single-scale feature map; and
a feature map restoration unit that converts the single-scale feature map to generate a multi-scale feature map.
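
The index-based model storage and the packing and quantization steps recited in the device claims above can be illustrated with a minimal sketch. It is not the claimed implementation. `ModelStore`, `pack`, `unpack`, `quantize`, and `dequantize` are hypothetical names, and the side-by-side tiling layout and the uniform 8-bit min-max quantizer are assumptions chosen only to make the round trip concrete. The sketch omits the video codec (VVC, HEVC, or AV1) and the reduction and restoration models themselves.

```python
from typing import Any, Dict, List

Grid = List[List[float]]  # one 2-D channel (or frame) of fractional values


class ModelStore:
    """Hypothetical stand-in for a model storage unit addressed by index."""

    def __init__(self) -> None:
        self._slots: Dict[int, Any] = {}

    def register(self, first_index: int, model: Any) -> None:
        # Registration: place the model at the slot named by the first index.
        self._slots[first_index] = model

    def select(self, second_index: int) -> Any:
        # Selection: fetch the model named by the second index.
        return self._slots[second_index]


def pack(channels: List[Grid], cols: int) -> Grid:
    """Tile equally sized channels into one frame, `cols` tiles per row.

    Unused tiles are zero-filled (an assumption of this sketch).
    """
    h, w = len(channels[0]), len(channels[0][0])
    rows = -(-len(channels) // cols)  # ceiling division
    frame = [[0.0] * (cols * w) for _ in range(rows * h)]
    for i, ch in enumerate(channels):
        r0, c0 = (i // cols) * h, (i % cols) * w
        for y in range(h):
            frame[r0 + y][c0:c0 + w] = ch[y]
    return frame


def unpack(frame: Grid, n: int, h: int, w: int, cols: int) -> List[Grid]:
    """Inverse of `pack`: cut the first `n` tiles of size h x w out of the frame."""
    out = []
    for i in range(n):
        r0, c0 = (i // cols) * h, (i % cols) * w
        out.append([frame[r0 + y][c0:c0 + w] for y in range(h)])
    return out


def quantize(frame: Grid, lo: float, hi: float, bits: int = 8) -> List[List[int]]:
    """Map fractional values in [lo, hi] to integers in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    scale = levels / (hi - lo)
    return [[min(levels, max(0, round((v - lo) * scale))) for v in row]
            for row in frame]


def dequantize(frame: List[List[int]], lo: float, hi: float, bits: int = 8) -> Grid:
    """Map integers back to fractional values (lossy inverse of `quantize`)."""
    step = (hi - lo) / ((1 << bits) - 1)
    return [[lo + q * step for q in row] for row in frame]
```

On the encoder side of this sketch, the integer frame returned by `quantize` is what would be handed to the video encoder. The decoder side applies `dequantize` and then `unpack`, and it needs the same `lo`, `hi`, tile size, and `cols` values, which are assumed here to be shared out of band.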