Image information processing apparatus and method

By using multi-color channel masks and optical sensing chips to perform optical domain preprocessing on color images, the problems of high hardware cost and low computational efficiency are solved, achieving the effect of reducing hardware cost and improving processing efficiency.

CN116543058B · Active · Publication Date: 2026-06-19 · TSINGHUA UNIVERSITY

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: TSINGHUA UNIVERSITY
Filing Date: 2023-04-27
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing technologies suffer from high hardware costs and low computational efficiency when processing color images, mainly because convolutional neural networks rely on computers and processors, resulting in large data processing volumes and computational complexity.

Method used

A multi-color channel mask is used to filter the light signal of the color image. A filter film is set on a transparent substrate. The filter film is designed in reverse optimization based on the target color image and the single-color channel convolution kernel of the convolutional neural network. The optical sensing chip receives and converts the signal into an electrical signal, and the processor processes it using a fully connected layer.

Benefits of technology

This reduces the amount of data processing required by hardware devices, lowers hardware costs, and improves convolution processing efficiency.



Abstract

This application discloses an image information processing apparatus and method. The image information processing apparatus of this application includes: a multi-color channel mask, which includes a transparent substrate and a filter film disposed on the transparent substrate. The filter film is provided with a preset pattern. The multi-color channel mask is used to filter the light signal of a received target color image through the filter film to obtain filtered single-color channel information; an optical sensing chip, used to receive the filtered single-color channel information and convert it into electrical signals; and a processor, used to receive the electrical signals transmitted from the optical sensing chip and process the electrical signals using a fully connected layer of a target convolutional neural network to obtain the processing result of the target visual task of the target color image. According to the embodiments of this application, the convolution processing efficiency is improved and the hardware cost of processing color images is reduced.

Description

Technical Field

[0001] This application belongs to the field of image recognition and image processing technology, and in particular relates to an image information processing device and method.

Background Technology

[0002] Currently, machine vision is widely used in industrial inspection, smart homes, and various smart terminals.

[0003] With the advancements in processing power and parallel computing capabilities of modern graphics processing units (GPUs), deep learning based on convolutional neural networks (CNNs) has rapidly developed, demonstrating excellent performance in various artificial intelligence applications, such as image classification and detection of color images. However, current methods for processing color images typically involve inputting the color image into a CNN, which then performs the processing. CNNs, in turn, rely on computers and processors. This places high demands on the hardware resources of the computers and processors, thus increasing hardware costs. Furthermore, the large volume of data processed and the complexity of the computation reduce computational efficiency.

[0004] Therefore, how to reduce the hardware cost of processing color images and improve the processing efficiency of color images has become a problem to be solved.

Summary of the Invention

[0005] This application provides an image information processing apparatus and method that can reduce the amount of data processing by hardware devices, lower the hardware cost of processing color images, and effectively improve the efficiency of convolution processing.

[0006] In a first aspect, embodiments of this application provide an image information processing apparatus, comprising:

[0007] A multi-color channel mask includes a transparent substrate and a filter film disposed on the transparent substrate. The filter film has a preset pattern, which is obtained by reverse optimization design based on the color information of the target color image and multiple different single-color channel convolutional kernels on the convolutional network layer of the target convolutional neural network. The single-color channel convolutional kernels are convolutional kernels obtained by optimizing the single-color channel convolutional kernels in the convolutional layer of the target convolutional neural network structure according to the target visual task. The multi-color channel mask is used to filter the light signal of the received target color image through the filter film to obtain filtered information of multiple single-color channels.

[0008] An optical sensing chip is used to receive the filtered multiple single-color channel information and convert the filtered multiple single-color channel information into electrical signals;

[0009] The processor is used to receive electrical signals transmitted from the optical sensing chip and process the electrical signals using the fully connected layer of the target convolutional neural network to obtain the processing result of the target visual task of the target color image.

[0010] The image information processing apparatus according to the embodiments of this application further includes:

[0011] A light generating component is used to generate a light signal carrying the target color image, wherein the light signal is an incoherent light signal.

[0012] According to embodiments of this application, the light generating component is a plurality of point light sources, and the light signal includes signals generated by the plurality of point light sources; or

[0013] The light-generating component is a display, and the light signal includes a multi-pixel image signal generated by the display.

[0014] According to embodiments of this application, the distance between two adjacent point light sources or two adjacent pixels is calculated using the following formula:

[0015]

[0016] Where $d_L$ represents the distance between two adjacent point light sources or two adjacent pixels, $d_{LM}$ is the distance between the light-generating component and the multi-color channel mask, $d_{MS}$ is the distance between the multi-color channel mask and the optical sensor chip, and Δ is the size of a single pixel on the optical sensor chip; a single pixel on the optical sensor chip is equivalent to a single pixel generated after the convolution calculation of the convolutional network layer.

[0017] According to an embodiment of this application, the spatial size of a single pixel on the optical sensing chip is determined based on the relationship between the geometric blurring during the transmission of the optical signal and the diffraction blurring generated by the optical signal through the multi-color channel mask.

[0018] According to an embodiment of this application, the geometric blurring is $d_1 = \Delta$, and the diffraction blurring is $d_2 = 2.44\lambda d_{MS}/\Delta$; spatial size of a single pixel on the optical sensing chip
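
As a rough numerical illustration of the two blur terms above, the following sketch evaluates them and finds the pixel size at which they are equal. The balance condition d1 = d2, which gives Δ = sqrt(2.44 λ d_MS), is an assumption made here for illustration only; the text states only that the pixel size is determined from the relationship between the two terms. λ is taken to be the wavelength of the light, and the wavelength and mask-to-sensor distance used below are hypothetical values.

```python
import math

def geometric_blur(pixel_size_m):
    """Geometric blur d1 = Δ, as stated in the text."""
    return pixel_size_m

def diffraction_blur(wavelength_m, mask_sensor_dist_m, pixel_size_m):
    """Diffraction blur d2 = 2.44 * λ * d_MS / Δ, as stated in the text."""
    return 2.44 * wavelength_m * mask_sensor_dist_m / pixel_size_m

def balanced_pixel_size(wavelength_m, mask_sensor_dist_m):
    """Pixel size at which d1 == d2.

    ASSUMPTION: the balance condition is not taken from the text.
    """
    return math.sqrt(2.44 * wavelength_m * mask_sensor_dist_m)

# Hypothetical values: green light, 1 mm between mask and sensor.
wavelength = 550e-9   # metres
d_ms = 1e-3           # metres

delta = balanced_pixel_size(wavelength, d_ms)
d1 = geometric_blur(delta)
d2 = diffraction_blur(wavelength, d_ms, delta)
print(f"pixel size: {delta * 1e6:.2f} um, d1: {d1 * 1e6:.2f} um, d2: {d2 * 1e6:.2f} um")
```

Under this assumed condition, a smaller pixel would make the diffraction term dominate and a larger pixel would make the geometric term dominate.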

[0019] In a second aspect, embodiments of this application provide an image information processing method, applied to an image information processing apparatus as described in any one of the first aspects, the method comprising:

[0020] The light signal of the received target color image is filtered by the filter film of the multi-color channel mask to obtain multiple single-color channel information. The multi-color channel mask includes a transparent substrate and the filter film disposed on the transparent substrate. The filter film is provided with a preset pattern. The preset pattern is obtained by reverse optimization design based on the color information of the target color image and multiple different single-color channel convolutional kernels on the convolutional network layer of the target convolutional neural network. The single-color channel convolutional kernels are convolutional kernels obtained by optimizing the single-color channel convolutional kernels in the convolutional layer of the target convolutional neural network structure according to the target visual task.

[0021] The optical sensing chip receives the filtered single-color channel information and converts the filtered single-color channel information into electrical signals.

[0022] The processor receives the electrical signal transmitted from the optical sensing chip and processes the electrical signal using the fully connected layer of the target convolutional neural network to obtain the processing result of the target visual task of the target color image.

[0023] According to an embodiment of this application, the image information processing method further includes: generating an optical signal carrying the target color image through the light generating component, wherein the optical signal is an incoherent optical signal.

[0024] This application provides an image information processing method: wherein the light generating component is a plurality of point light sources, and the light signal is a signal generated by the plurality of point light sources; or

[0025] The light-generating component is a display, and the light signal includes a multi-pixel image signal generated by the display.

[0026] This application provides an image information processing method, wherein the distance between two adjacent point light sources or two adjacent pixels is calculated using the following formula:

[0027]

[0028] Where $d_L$ represents the distance between two adjacent point light sources or two adjacent pixels, $d_{LM}$ is the distance between the light-generating component and the multi-color channel mask, $d_{MS}$ is the distance between the multi-color channel mask and the optical sensor chip, and Δ is the size of a single pixel on the optical sensor chip; a single pixel on the optical sensor chip is equivalent to a single pixel generated after the convolution calculation of the convolutional network layer.

[0029] The image information processing apparatus and method of this application embodiment can filter the light signal of a color image by setting a filter film on a transparent substrate of a multi-color channel mask and setting a preset pattern on the filter film. Since the preset pattern is obtained by reverse optimization design based on the color information of the target color image and multiple different single-color channel convolution kernels on the convolutional network layer of the target convolutional neural network, the filtered multiple single-color channel information is equivalent to the multiple single-color channel information obtained when the convolutional network layer performs convolution on the target color image. This realizes the preprocessing of the color image in the optical domain, reduces the data processing volume of the back-end data processing equipment, and thus effectively improves the convolutional processing efficiency and reduces the hardware cost of processing color images.

Attached Figure Description

[0030] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments of this application will be briefly introduced below. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0031] Figure 1 is a schematic diagram of an image information processing apparatus provided in one embodiment of this application;

[0032] Figure 2 is a schematic diagram illustrating the principle of image information filtering using a mask as proposed in this application;

[0033] Figure 3 is a schematic diagram of the image information presented in this application;

[0034] Figure 4 is a schematic diagram of the result obtained after processing the image information proposed in this application using a mask;

[0035] Figure 5 is a schematic diagram of the principle of optical multi-color channel convolution proposed in this application;

[0036] Figure 6 is a schematic diagram of the color film transmission spectrum of the multi-color channel mask proposed in this application;

[0037] Figure 7 is a schematic diagram showing the relationship between the target object, the multi-color channel mask, and the optical sensor chip proposed in this application;

[0038] Figure 8 is a schematic flowchart of an image information processing method provided in one embodiment of this application;

[0039] Figure 9 is a schematic diagram of the architecture corresponding to an application scenario embodiment of the image information processing device provided in this application.

[0040] Figure Labels

[0041] 1-Multi-color channel mask; 2-Target color image; 3-Optical sensor chip; 4-Processor; 5-Light generating component.

Detailed Implementation

[0042] The features and exemplary embodiments of various aspects of this application will be described in detail below. To make the objectives, technical solutions, and advantages of this application clearer, the application will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are only intended to explain this application and not to limit it. For those skilled in the art, this application can be implemented without some of these specific details. The following description of the embodiments is merely to provide a better understanding of this application by illustrating examples.

[0043] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element.

[0044] Image processing technology is the technique of processing image information using computers. It mainly includes image digitization, image enhancement and restoration, image data encoding, image segmentation, and image recognition. Currently, image processing technology is primarily based on computers. Artificial Neural Networks (ANNs) are mathematical models that simulate the structure and behavior of biological nervous systems to perform distributed parallel information processing. ANNs achieve information processing by adjusting the weight relationships between their internal neurons. When processing images, the computer uses the original image or an appropriately preprocessed image as the input signal to the neural network, and obtains the processed image signal or classification result at the output of the neural network.

[0045] Convolutional Neural Networks (CNNs) are a type of feedforward neural network that excels in image processing. The basic structure of a CNN consists of an input layer, convolutional layers, pooling layers (also called sampling layers), fully connected layers, and an output layer. Several convolutional and pooling layers are typically used in alternation: a convolutional layer is followed by a pooling layer, which is followed by another convolutional layer, and so on. In a convolutional layer, each neuron in the output feature map is locally connected to its input. The input value of that neuron is obtained by weighting the local input with the corresponding connection weights and adding a bias value; this process is similar to the calculation of convolution.

[0046] Currently, machine vision is widely used in industrial inspection, smart homes, and various smart terminals.

[0047] With the advancements in processing power and parallel computing capabilities of modern graphics processing units (GPUs), deep learning based on convolutional neural networks (CNNs) has rapidly developed, demonstrating excellent performance in various artificial intelligence applications, such as image classification and detection of color images. However, current methods for processing color images typically involve inputting the color image into a CNN, which then performs the processing. CNNs, in turn, rely on computers and processors. This places high demands on the hardware resources of the computers and processors, thus increasing hardware costs. Furthermore, the large volume of data processed, the complexity of the computation, and the computational latency of the devices all contribute to reduced computational efficiency.

[0048] To improve computational efficiency, optical computing is considered a potential breakthrough in overcoming the limitations of electronic computing. The parallelism, high speed, and low loss of light can significantly increase computational speed while reducing energy consumption and latency. However, most current optical neural network architectures require monochromatic coherent lasers as their light source, making them unsuitable for natural light environments. Furthermore, they primarily target grayscale information and cannot process color image information, limiting their practicality. To further enhance practicality, hybrid optoelectronic neural network structures have been proposed, but these are mainly based on hardware systems using lens groups and filters (such as 4f systems), resulting in bulky form factors that are difficult to deploy in applications such as autonomous driving, robotics, or other IoT peripherals.

[0049] To address the aforementioned issues, this application proposes an image information processing method, apparatus, device, and computer storage medium. This method enables the filtering of light signals from a color image by setting a filter film on a transparent substrate of a multi-color channel mask and a preset pattern on the filter film. Since the preset pattern is designed through reverse optimization based on the color information of the target color image and multiple different single-color channel convolutional kernels on the convolutional network layer of the target convolutional neural network, the filtered single-color channel information is equivalent to the single-color channel information obtained by the convolutional network layer convolving the target color image. This achieves pre-processing of the color image in the optical domain, reducing the data processing load of the backend data processing equipment, thereby effectively improving convolutional processing efficiency and reducing the hardware cost of processing color images.

[0050] To address the problems of the prior art, this application provides an image information processing apparatus and method. The image information processing apparatus and method provided in this application will be described below.

[0051] Figure 1 shows a schematic diagram of an image information processing apparatus according to an embodiment of this application. As shown in Figure 1, the image information processing device includes:

[0052] Multi-color channel mask 1 includes a transparent substrate and a filter film disposed on the transparent substrate. The filter film is provided with a preset pattern. The preset pattern is obtained by reverse optimization design based on the color information of the target color image 2 and multiple different single-color channel convolutional kernels on the convolutional network layer of the target convolutional neural network. The single-color channel convolutional kernel is a convolutional kernel obtained by optimizing the single-color channel convolutional kernels in the convolutional layer of the target convolutional neural network structure according to the target visual task. Multi-color channel mask 1 is used to filter the light signal of the received target color image 2 through the filter film to obtain filtered information of multiple single-color channels.

[0053] The optical sensor chip 3 is used to receive filtered information from multiple single color channels and convert the filtered information from multiple single color channels into electrical signals.

[0054] Processor 4 is used to receive electrical signals transmitted from optical sensor chip 3 and process the electrical signals using the fully connected layer of the target convolutional neural network to obtain the processing result of the target visual task of target color image 2.

[0055] The image information processing apparatus of this application uses a multi-color channel mask to filter the light signal of a received target color image, obtaining multiple filtered single-color channel information. Since the preset pattern is designed through reverse optimization based on the color information of the target color image and multiple different single-color channel convolution kernels on the convolutional network layer of the target convolutional neural network, the filtered single-color channel information is equivalent to the single-color channel information obtained by convolutional processing of the target color image by the convolutional network layer. In other words, it achieves the same effect as convolutional processing of the target color image. Therefore, the optical sensing chip does not need to detect the color information of the color image, reducing chip manufacturing costs. Furthermore, since the color image has been pre-filtered in the optical domain using a multi-color channel mask, the data processing load of the back-end data processing equipment is reduced, thereby lowering the hardware cost of processing color images.

[0056] It is understood that the carrier of the target color image 2 described in this application can directly come from the target object, such as a person or object. The target object reflects light, causing the light signal carrying the target color image 2 to be incident as incident light into the multi-color channel mask 1. The target color image 2 described in this application can come from the light generating component 5. For example, the light generating component 5 can be a display for displaying the target color image 2, or it can be a point light source or a combination of multiple point light sources. In any application scenario that can emit light and / or reflect light to the multi-color channel mask 1, the image information processing device of this application can be used to perform convolution processing operations. It should be understood that the light described in this application can refer to various forms of visible or invisible light, such as diffuse light, monochromatic light, and polychromatic light.

[0057] As one example of a possible implementation, the multi-color channel mask 1 includes a transparent substrate and a filter film disposed on the transparent substrate, the filter film having a preset pattern. The light signal of the target color image 2 passes through the multi-color channel mask 1 as incident light, and the multi-color channel mask 1 filters the light signal to obtain filtered information for multiple single-color channels.

[0058] As an example, the aforementioned transparent substrate can be a transparent glass substrate or a transparent resin substrate. The transparent glass substrate can be quartz glass, soda glass, or low-expansion glass. The opaque areas of the aforementioned filter film can be rigid light-blocking films or latex light-blocking films. The areas of the filter film that transmit some color light signals can be substrate materials with optical films coated on their surfaces, such as PET (polyester) films, PC (polycarbonate) films, PMMA (polymethyl methacrylate) films, etc.

[0059] The preset pattern described in this application is obtained by reverse optimization design based on the color information of the target color image 2 and multiple different single-color channel convolutional kernels on the convolutional network layer of the target convolutional neural network. The single-color channel convolutional kernel is obtained by optimizing the single-color channel convolutional kernels in the convolutional layer of the target convolutional neural network structure according to the target visual task.

[0060] As an example, the filter film with the preset pattern described above can be determined through the following steps:

[0061] Step 1: Determine the target convolutional neural network structure model based on the target vision task.

[0062] It should be noted that the aforementioned target visual task can be a classification, localization, detection, or segmentation task for color images, for example a classification task on a dataset such as CIFAR or Fruit360; the aforementioned target convolutional neural network can be a commonly used classification network structure such as VGG16 or ResNet18.

[0063] Step 2: Adjust the floating-point convolutional kernel in the first convolutional layer of the target convolutional neural network to a binary convolutional kernel to obtain the adjusted first convolutional layer; or, add a binary convolutional layer corresponding to the first convolutional layer on the first convolutional layer, and determine the binary convolutional layer as the adjusted first convolutional layer.

[0064] It should be noted that whether light of the corresponding color channel passes through the mask is determined by the value of each pixel in each convolutional kernel of the first convolutional layer. A transmission state derived directly from the floating-point convolutional kernel in the first convolutional layer is therefore not accurate. The kernel can be adjusted to a binary convolutional kernel, and the color-channel transmission state of each of the multiple different single-color-channel convolutional kernels can then be determined more accurately from the binary convolutional kernel.
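
The adjustment described in step 2 can be illustrated with a minimal sketch. It assumes a simple thresholding rule (weights above a threshold become 1, all others become 0); the text does not specify how the binary kernel is derived, so the function name `binarize_kernel`, the threshold, and the sample weights are all hypothetical.

```python
def binarize_kernel(kernel, threshold=0.0):
    """Map a floating-point kernel to {0, 1}: 1 = transmit, 0 = block.

    ASSUMPTION: thresholding is used here for illustration only; the
    source does not state the binarization rule.
    """
    return [[1 if weight > threshold else 0 for weight in row] for row in kernel]

# Hypothetical 3x3 floating-point kernel for one color channel.
float_kernel = [
    [0.42, -0.10, 0.05],
    [-0.31, 0.88, -0.02],
    [0.00, 0.17, -0.56],
]

binary_kernel = binarize_kernel(float_kernel)
for row in binary_kernel:
    print(row)
```

In practice the binary kernel would still need to be trained or fine-tuned for the target visual task, as the later steps describe; this sketch only shows the value mapping.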

[0065] Step 3: Perform reverse optimization on multiple different single-color channel convolutional kernels on the convolutional network layer of the target convolutional neural network based on the color information of the target color image 2.

[0066] As an example of a possible implementation, the loss value between the prediction result obtained by the target convolutional neural network based on the target color image 2 and the actual result can be calculated according to a preset loss function. Based on the loss value, multiple different single-color channel convolutional kernels on the convolutional network layer of the target convolutional neural network can be back-optimized.

[0067] As an example, in step 3, network parameters trained on another dataset can also be loaded as pre-training weights (e.g., ResNet18 parameters pre-trained on the ImageNet dataset), and these parameters can be used as initial values to optimize the single-color-channel convolutional kernels in the convolutional layers of the target convolutional neural network structure.

[0068] Step 4: Obtain multiple different single-color channel convolution kernels after reverse optimization; wherein, the multiple different single-color channel convolution kernels can be the convolution kernels in the first convolutional layer of the convolutional network.

[0069] Step 5: Based on the values at different pixel positions of the multiple different single-color channel convolution kernels, determine the color channel transmission at each pixel position of those convolution kernels.

[0070] It is understandable that each color channel has its own convolution kernel. For each color channel, if one of the pixels in the convolution kernel corresponding to that color channel has a value of 0, then the filter area corresponding to that pixel does not transmit light of that color; if one of the pixels in the convolution kernel has a value of 1, then the filter area corresponding to that pixel transmits light of that color.

[0071] Step 6: Determine the above preset pattern based on the patterns of multiple different single-color channel convolution kernels in the first convolutional layer.

[0072] As one possible implementation example, the colors of different regions in the preset pattern can be obtained from the color channels of multiple different single-color-channel convolution kernels in the first convolutional layer. For example, the color channels of the same pixel location region belonging to the same convolution kernel can be superimposed to obtain a superimposed multi-color channel; the wavelength of light corresponding to the multi-color channel of the pixel location region is determined, and the wavelength of light allowed to pass through different regions on the multi-color channel mask 1 is determined based on the wavelength of light, thereby determining the type of the filter film.
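
Steps 5 and 6 can be sketched as follows, assuming binary kernels for three color channels named R, G, and B. The function `build_mask_pattern`, the channel names, and the 2x2 kernels are hypothetical and serve only to illustrate how per-channel transmission values might be superimposed into one pattern.

```python
def build_mask_pattern(channel_kernels):
    """Superimpose binary single-color-channel kernels into one pattern.

    channel_kernels maps a channel name to a binary 2D kernel
    (1 = transmit that color, 0 = block it). Each cell of the result
    names the channels transmitted at that position; an empty string
    marks an opaque cell.
    """
    names = list(channel_kernels)
    first = channel_kernels[names[0]]
    rows, cols = len(first), len(first[0])
    pattern = []
    for i in range(rows):
        row = []
        for j in range(cols):
            passed = "".join(n for n in names if channel_kernels[n][i][j] == 1)
            row.append(passed)
        pattern.append(row)
    return pattern

# Hypothetical binary kernels, one per color channel.
kernels = {
    "R": [[1, 0], [0, 1]],
    "G": [[1, 1], [0, 0]],
    "B": [[0, 0], [0, 1]],
}

pattern = build_mask_pattern(kernels)
for row in pattern:
    print(row)
```

A cell labeled `RG` would correspond to a filter region that passes both red and green wavelengths, which in turn would determine the type of filter film needed at that position.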

[0073] It should be noted that the point spread function (PSF) describes the response of an optical system to reflected light (point light source) from an object, or the impulse response of a system. It is the spatial domain form of the optical transfer function of an imaging system. An optical imaging system with incoherent light can be understood as a linear system. The image of a complex object formed by an optical system can be regarded as the convolution of the real object and the system's PSF. The image of multiple objects is equal to the sum of the independent images of each object.

[0074] According to the above definition, the PSF of an optical system is generally measured by treating a point light source as the object; in this case, the image formed by the system is the PSF. Take a grayscale mask as an example. As can be seen from Figure 2, if the target object is a single point light source as shown in Figure 2(a), and the pattern etched on the mask is as shown in Figure 2(b), then the PSF of the system is the pattern etched on the mask; that is, the effect of the system's PSF is as shown in Figure 2(c). If the target object changes from a single point light source to an image composed of multiple point light sources as shown in Figure 2(c), then after its light signals pass through the mask shown in Figure 2(e), the image detected by the optical sensor chip 3 is the information obtained by filtering the signal light of the object image through the mask, and its effect is as shown in Figure 2(f). That is, when the pattern on the mask is the pattern of the set convolution kernel, the information detected by the optical sensor chip 3 is equivalent to the information obtained by performing a convolution operation on the above object image.

[0075] For example, as shown in Figure 3, there are two light sources: a first light source located at (1,1) in the planar coordinate system and a second light source located at (4,4) in the planar coordinate system. The horizontal and vertical distances between the centers of the first and second light sources are both 3 units.

[0076] The image information in Figure 3, after being processed by a mask, is as shown in Figure 4.

[0077] Assuming the mask apertures on the photomask have a binary distribution of 0 and 1, where 0 represents opaque and 1 represents fully transparent, then the convolution kernel corresponding to the mask apertures on the photomask is:

[0078]

[0079] Since the horizontal and vertical spacings between the center of the first light source and the center of the second light source are both 3 units, in an ideal situation the horizontal and vertical spacings between the center of the first convolution result (obtained by convolving the first light source) and the center of the second convolution result (obtained by convolving the second light source) are both 3Δ, where Δ is the size of a single pixel on the optical sensing chip 3, also known as the feature size.
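
The spacing argument above can be checked numerically. The following is a minimal sketch in idealized pixel units (Δ = 1), not the apparatus itself. The 3×3 binary kernel is a hypothetical stand-in, because the kernel of paragraph [0078] is not reproduced in this text, and the helper names `conv2d_full` and `centroid` are illustrative only.

```python
import numpy as np

def conv2d_full(image, kernel):
    """Full 2-D convolution computed by accumulating shifted copies of the image."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih + kh - 1, iw + kw - 1))
    for r in range(kh):
        for c in range(kw):
            out[r:r + ih, c:c + iw] += kernel[r, c] * image
    return out

def centroid(intensity):
    """Intensity-weighted center (row, col) of a non-negative image."""
    rows, cols = np.indices(intensity.shape)
    total = intensity.sum()
    return np.array([(rows * intensity).sum() / total,
                     (cols * intensity).sum() / total])

# Hypothetical binary mask pattern: 1 = fully transparent, 0 = opaque.
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]], dtype=float)

# Two point light sources at (1,1) and (4,4), as in the Figure 3 example.
first = np.zeros((6, 6))
first[1, 1] = 1.0
second = np.zeros((6, 6))
second[4, 4] = 1.0

response_first = conv2d_full(first, kernel)
response_second = conv2d_full(second, kernel)

# Spacing between the centers of the two convolution results, in pixels (units of delta).
spacing = centroid(response_second) - centroid(response_first)

# Linearity: the image of both sources equals the sum of the independent images.
response_both = conv2d_full(first + second, kernel)
```

In this idealized model each point source simply stamps a copy of the kernel onto the sensor, so the 3-unit source spacing reappears as a 3-pixel spacing between the two responses.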

[0080] In related technologies, the optical sensing chip 3 uses a color camera chip, which has a color filter array (CFA) in front of its photosensitive surface. A Bayer array is arranged on the CFA to filter multiple color channels (e.g., R, G, and B channels). The image captured by the color camera chip is interpolated to obtain the RGB three-channel light intensity value of each pixel. However, since the light intensity value of each pixel is obtained through interpolation, there is an error between the above light intensity value and the actual light intensity value. Furthermore, the above interpolation method requires a computer, processor 4, and other equipment to perform, reducing computational efficiency.

[0081] In this embodiment, by setting a pattern corresponding to the convolution kernel on the filter film of the multi-color channel mask 1, different areas of the multi-color channel mask 1 can transmit light of the color corresponding to the convolution kernel, and the same area of the multi-color channel mask 1 can transmit light of multiple colors simultaneously. Thus, after the light signal of the target color image 2 is filtered by the multi-color channel mask 1, the first information obtained is equivalent to the second information obtained by convolving and summing the components of each channel of the target color image 2. Therefore, the optical sensing chip 3 can directly detect the light intensity information of the corresponding pixel, eliminating the need for a color filter array in the optical sensing chip 3 and for interpolation of the captured image. This reduces the cost and manufacturing difficulty of the chip, and also saves computational load and energy consumption in the chip's image signal processor (ISP) stage.

[0082] To facilitate understanding, the principle of the equivalence between the first and second information mentioned above is explained using a convolution kernel with R, G, and B channels as an example. As shown in Figure 5, the transmission spectra of the convolution kernels of the R, G, and B channels are superimposed at their corresponding spatial positions. According to formula (1), the components I_R, I_G, and I_B of the RGB channels of the target color image 2 are each convolved with the PSF of the respective channel, and the results are then summed.

[0083]

[0084] From the above formula, it can be concluded that the processing result obtained after filtering a color image containing the three RGB channels through the multi-color channel mask 1 is equivalent to the result obtained by convolving the components I_R, I_G, and I_B of each RGB channel of the color image with the PSF of the corresponding channel and then summing the results.
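
The equivalence described above can be illustrated with a small simulation. This is a sketch under stated assumptions: an idealized shift-invariant model with no diffraction, noise, or image inversion, and a sensor with no color filter array that simply sums the light of all channels. The names `mask_then_monochrome_sensor` and `per_channel_conv_sum`, and the random test data, are illustrative and do not come from the source.

```python
import numpy as np

def conv2d_full(image, kernel):
    """Full 2-D convolution computed by accumulating shifted copies of the image."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih + kh - 1, iw + kw - 1))
    for r in range(kh):
        for c in range(kw):
            out[r:r + ih, c:c + iw] += kernel[r, c] * image
    return out

def mask_then_monochrome_sensor(scene_rgb, mask_rgb):
    """Ray-by-ray model: every scene pixel sends R, G, B light through every
    mask cell; the cell passes only the channels it transmits, and the
    sensor pixel that is hit adds up whatever arrives, regardless of color."""
    h, w, _ = scene_rgb.shape
    kh, kw, _ = mask_rgb.shape
    sensor = np.zeros((h + kh - 1, w + kw - 1))
    for i in range(h):
        for j in range(w):
            for r in range(kh):
                for c in range(kw):
                    sensor[i + r, j + c] += np.dot(scene_rgb[i, j], mask_rgb[r, c])
    return sensor

def per_channel_conv_sum(scene_rgb, mask_rgb):
    """Digital reference: convolve each channel with its own kernel, then sum."""
    return sum(conv2d_full(scene_rgb[..., ch], mask_rgb[..., ch]) for ch in range(3))

rng = np.random.default_rng(0)
scene = rng.random((5, 5, 3))                            # stand-in for I_R, I_G, I_B
mask = rng.integers(0, 2, size=(3, 3, 3)).astype(float)  # binary transmission per channel

optical = mask_then_monochrome_sensor(scene, mask)
digital = per_channel_conv_sum(scene, mask)
```

Both routes give the same array, which is why, in this model, a sensor without a color filter array suffices once the mask has applied the per-channel kernels.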

[0085] It is understood that the filter film of the multi-color channel mask 1 proposed in this application can be designed for the three RGB channels, or for the four CMYK channels.

[0086] Taking a multi-color channel mask designed for the three RGB channels as an example, as shown in Figure 6, a multi-color channel mask can include any of the following types of filter: a filter that blocks visible light from all three RGB channels, a filter that only transmits visible light from the R channel, a filter that only transmits visible light from the G channel, a filter that only transmits visible light from the B channel, a filter that transmits visible light from the R and G channels, a filter that transmits visible light from the G and B channels, and a filter that transmits visible light from the R and B channels. Alternatively, filters can be omitted at certain locations on the transparent substrate of the multi-color channel mask, allowing visible light from all three RGB channels to pass through the multi-color channel mask.
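
As an illustration of how binary single-color channel kernels could map onto the filter types listed above, the following sketch looks up a filter type for each mask position from the (R, G, B) transmission bits at that position. The 2×2 kernels and the names `FILTER_TYPES` and `mask_layout` are hypothetical; the source does not prescribe a data representation.

```python
import numpy as np

# (R, G, B) transmission bits at one mask position -> filter type from the list above.
FILTER_TYPES = {
    (0, 0, 0): "blocks R, G and B",
    (1, 0, 0): "R only",
    (0, 1, 0): "G only",
    (0, 0, 1): "B only",
    (1, 1, 0): "R and G",
    (0, 1, 1): "G and B",
    (1, 0, 1): "R and B",
    (1, 1, 1): "no filter (passes R, G and B)",
}

def mask_layout(kernel_r, kernel_g, kernel_b):
    """Combine three binary single-channel kernels into a per-position filter plan."""
    stacked = np.stack([kernel_r, kernel_g, kernel_b], axis=-1)
    return [[FILTER_TYPES[tuple(int(v) for v in cell)] for cell in row]
            for row in stacked]

# Hypothetical 2x2 binary kernels for the R, G and B channels.
kernel_r = np.array([[1, 0], [1, 1]])
kernel_g = np.array([[0, 0], [1, 1]])
kernel_b = np.array([[0, 1], [0, 1]])

layout = mask_layout(kernel_r, kernel_g, kernel_b)
for row in layout:
    print(row)
```

With binary kernels, the eight possible bit combinations correspond one-to-one to the seven filter types plus the unfiltered case.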

[0087] As an example of a possible implementation, the filter film on the transparent substrate may be obtained by depositing a color channel mask on the transparent substrate of the photomask; it may also be obtained by etching the color channel mask disposed on the transparent substrate; or it may be obtained by attaching a color channel mask to the transparent substrate of the photomask.

[0088] In some embodiments of this application, the optical sensing chip 3 is used to receive filtered information from multiple single-color channels and convert this information into electrical signals. The optical sensing chip 3 can be a complementary metal-oxide-semiconductor (CMOS) device. Each photosensitive element in a CMOS device directly integrates an amplifier and analog-to-digital conversion logic. When a photodiode receives light and generates an analog electrical signal, the signal is first amplified by the amplifier in the photosensitive element and then directly converted into a corresponding digital signal. The main purpose of a CMOS-type optical sensing chip is to convert the acquired light signal into an electrical signal that can be processed by subsequent circuits or a computer. Any device that can convert light signals into signals usable by processing equipment can be considered an optical sensing chip 3 as described in this application.

[0089] It should be noted that since the light signal of the target color image 2 is filtered and processed by the multi-color channel mask 1 to achieve the same effect as convolution processing of the target color image 2, the optical sensing chip 3 described in this application does not need to detect the light signal of the color image through a color camera chip, thus reducing the chip manufacturing cost. Furthermore, this application employs a lensless image information processing device, effectively reducing the size, weight, and cost of the image information processing device.

[0090] Processor 4 is used to receive electrical signals transmitted from optical sensor chip 3 and process the electrical signals using the fully connected layer of the target convolutional neural network to obtain the processing result of the target visual task of target color image 2.

[0091] As a possible example, processor 4 can input the electrical signals transmitted from optical sensor chip 3 into the pooling layer and fully connected layer of the target convolutional neural network to process the electrical signals and obtain the processing results of the target color image 2 for the target visual task, so as to complete various tasks such as object classification, recognition, and detection. For example, it can classify images, recognize faces or other objects, and detect whether a certain type or several types of specific objects exist in an image for specific tasks.

[0092] As another possible example, before processor 4 processes the electrical signal using the fully connected layers of the target convolutional neural network, a pooling layer can also be used to pool the signal.
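
The back-end processing described above can be sketched as follows. This is a minimal illustration, assuming the sensor output is available as a 2-D feature map, and using max pooling and a softmax output as example choices that the source does not mandate. The weights are random placeholders rather than trained parameters, and the function names are illustrative.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/cols that do not fill a window are dropped."""
    h, w = feature_map.shape
    h_trim, w_trim = h - h % size, w - w % size
    blocks = feature_map[:h_trim, :w_trim].reshape(
        h_trim // size, size, w_trim // size, size)
    return blocks.max(axis=(1, 3))

def fully_connected(features, weights, bias):
    """Dense layer followed by a numerically stable softmax over the classes."""
    logits = weights @ features + bias
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

rng = np.random.default_rng(0)
feature_map = rng.random((8, 8))          # stand-in for the digitized sensor output
pooled = max_pool2d(feature_map, size=2)  # shape (4, 4)

num_classes = 3                           # hypothetical number of target classes
weights = rng.normal(size=(num_classes, pooled.size))
bias = np.zeros(num_classes)

probabilities = fully_connected(pooled.ravel(), weights, bias)
predicted_class = int(np.argmax(probabilities))
```

Because the convolution has already been performed by the mask, only these comparatively light layers remain for the processor in this sketch.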

[0093] The image information processing apparatus of this application uses a multi-color channel mask to filter the light signal of a received target color image, obtaining multiple filtered single-color channel information. Since the preset pattern is designed through reverse optimization based on the color information of the target color image and multiple different single-color channel convolution kernels on the convolutional network layer of the target convolutional neural network, the filtered single-color channel information is equivalent to the single-color channel information obtained by convolutional processing of the target color image by the convolutional network layer. In other words, it achieves the same effect as convolutional processing of the target color image. Therefore, the optical sensing chip does not need to detect the color information of the color image, reducing chip manufacturing costs. Furthermore, since the color image has been pre-filtered in the optical domain using a multi-color channel mask, the data processing load of the back-end data processing equipment is reduced, thereby lowering the hardware cost of processing color images.

[0094] In order to improve the accuracy of the installation positions of the various components in the image information processing device and to reduce errors in the image information processing results caused by inaccurate installation positions, as shown in Figure 1, in some embodiments of this application, the image information processing apparatus may further include:

[0095] The light generating component 5 is used to generate a light signal carrying a target color image, wherein the light signal is an incoherent light signal.

[0096] When using the light generating component 5 to generate the light signal of the target color image, the light signal of the target color image is an incoherent light signal. The light generating component 5 can be multiple point light sources, and the light signal of the target color image is generated by these multiple point light sources; alternatively, the light generating component 5 can be a display, and the light signal of the target color image is a multi-pixel image signal generated by the display. A multi-color channel mask receives the light signal of the target color image and filters the received light signal to obtain filtered information for multiple single-color channels.

[0097] In some embodiments of this application, the distance between two adjacent point light sources or two adjacent pixels is calculated using formula (2):

[0098]

[0099] Where d_L represents the distance between two adjacent point light sources or two adjacent pixels, d_LM is the distance between the light-generating component and the multi-color channel mask, d_MS represents the distance between the multi-color channel mask and the optical sensor chip, and Δ represents the size of a single pixel on the optical sensor chip; a single pixel on the optical sensor chip is equivalent to a single pixel generated after the convolution calculation of the convolutional network layer.

[0100] In some embodiments of this application, the spatial size of a single pixel on the optical sensing chip is determined based on the relationship between geometric blurring during optical signal transmission and diffraction blurring generated by the optical signal passing through a multi-color channel mask.

[0101] In some embodiments of this application, the geometric blur d1 = Δ (3), and the diffraction blur d2 is calculated using formula (4):

[0102] d2 = 2.44λd_MS / Δ (4)

[0103] In some embodiments of this application, the spatial size Δ of a single pixel on the optical sensing chip is determined by formula (5):

[0104] Δ ≥ √(2.44λd_MS) (5)

[0105] The image information processing device of this application uses the distance between two adjacent point light sources or two adjacent pixels and the size of a single pixel on the optical sensing chip to determine the relationship between the distance from the light generating component to the multi-color channel mask and the distance from the multi-color channel mask to the optical sensing chip. This enables the installation position of each component in the image information processing device to be determined, thereby improving the accuracy of the installation positions and reducing errors in the image information processing results caused by inaccurate installation positions. In addition, since the larger the target color image size, the larger the volume of the entire optical domain processing system, Δ can be set to the smallest feasible value permitted by formula (5) for the spatial size of a single pixel on the optical sensing chip, so as to make the entire system as compact as possible.

[0106] Figure 7 is a schematic diagram illustrating the relationship between the distances between the target object (or light generating component), the multi-color channel mask, and the optical sensing chip according to an embodiment of this application.

[0107] Assume that the distance between two adjacent point light sources or two adjacent pixels is d_L, the distance between the light-generating component and the multi-color channel mask (i.e., the object distance) is d_LM, the distance between the multi-color channel mask and the optical sensor chip is d_MS, and the spatial size of a single pixel equivalent to the convolution kernel is Δ (i.e., the feature size). Since light travels in a straight line, the corresponding light source spacing d_L can be calculated according to triangle similarity (i.e., triangle ABC ~ triangle FEC):

[0108]
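
The similar-triangle relationship can be sketched numerically. The formula itself is not reproduced in this text, so the proportion used below, d_L / d_LM = Δ / d_MS, is an assumption based on straight-line propagation through a point on the mask, not a quotation of formula (2). The function names and the 0.30 m source distance are hypothetical.

```python
def source_spacing(delta, d_lm, d_ms):
    """Source spacing d_L, ASSUMING the proportion d_L / d_LM = delta / d_MS."""
    return delta * d_lm / d_ms

def source_distance(d_l, delta, d_ms):
    """Inverse use of the same assumed proportion: source-to-mask distance d_LM."""
    return d_l * d_ms / delta

delta = 40e-6    # feature size from the worked example in the text (40 um)
d_ms = 1.02e-3   # mask-to-sensor distance from the worked example (1.02 mm)
d_lm = 0.30      # hypothetical source-to-mask distance, in metres

d_l = source_spacing(delta, d_lm, d_ms)
print(f"d_L = {d_l * 1e3:.2f} mm")

# Round trip: recovering d_LM from the computed spacing.
d_lm_back = source_distance(d_l, delta, d_ms)
```

Under this assumed proportion, fixing Δ and d_MS leaves a single degree of freedom: choosing the source spacing fixes the source distance, and vice versa.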

[0109] For incoherent optical systems, the optical system is affected by diffraction blur and geometric blur.

[0110] The geometric blur d1 during light transmission and the diffraction blur d2 caused by light passing through the mask can be calculated using the following formulas:

[0111] d1=Δ (3)

[0112] d2 = 2.44λd_MS / Δ (4)

[0113] Where λ is the wavelength of light and d_MS is the distance between the photomask and the optical sensor chip.

[0114] For incoherent optical systems, diffraction is not significant: the geometric blur during transmission of the optical image signal is greater than or equal to the diffraction blur caused by the optical image signal passing through the mask, i.e., d1 ≥ d2. Substituting formulas (3) and (4) gives Δ ≥ 2.44λd_MS / Δ, i.e., Δ ≥ √(2.44λd_MS), which yields the feature size. That is, the spatial size Δ of a single pixel on the optical sensor chip can be determined based on the relationship between the geometric blur d1 during the transmission of the optical image signal and the diffraction blur d2 generated by the optical image signal passing through the mask. Meanwhile, according to formula (2), the larger the feature size, the larger the volume of the entire optical domain processing system. Considering that the entire system should be as compact as possible, the value of Δ should be as small as feasible.

[0115] For example, for visible light with wavelength λ = 550 nm and d_MS = 1.02 mm, Δ ≥ 37 μm can be obtained by formula (5). In practical applications, to simplify calculation and processing, the feature size Δ can also be rounded up, i.e., the feature size is taken as 40 μm.
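
The worked example can be reproduced with a few lines of arithmetic. The sketch below applies the condition d1 ≥ d2 with d1 = Δ and d2 = 2.44λd_MS / Δ from formulas (3) and (4); rounding up to a multiple of 10 μm is an illustrative choice, and the function names are not from the source.

```python
import math

def diffraction_blur(wavelength, d_ms, delta):
    """Formula (4): d2 = 2.44 * lambda * d_MS / delta."""
    return 2.44 * wavelength * d_ms / delta

def min_feature_size(wavelength, d_ms):
    """Smallest delta satisfying d1 >= d2 with d1 = delta (formula (3))."""
    return math.sqrt(2.44 * wavelength * d_ms)

wavelength = 550e-9   # 550 nm, visible light
d_ms = 1.02e-3        # 1.02 mm mask-to-sensor distance

delta_min = min_feature_size(wavelength, d_ms)
delta_min_um = delta_min * 1e6

# Round up to the next multiple of 10 um (illustrative rounding step).
delta_used_um = math.ceil(delta_min_um / 10) * 10

print(f"minimum feature size: {delta_min_um:.1f} um")
print(f"feature size used:    {delta_used_um} um")

# At the rounded-up size, geometric blur still dominates diffraction blur.
d2_um = diffraction_blur(wavelength, d_ms, delta_used_um * 1e-6) * 1e6
```

At exactly the minimum size the two blurs are equal; any larger Δ makes the geometric blur the dominant term.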

[0116] Considering that the entire system should be as compact as possible, the distance d_MS between the photomask and the optical sensor chip should be kept as small as possible within structural limits. After the photomask and the optical sensor chip have been arranged as close together as possible, the distance between them can be obtained by measurement, using any of various high-precision ranging methods such as high-precision rulers, electrical ranging, and optical ranging.

[0117] Therefore, for a fixed optical domain convolution system (with Δ and d_MS determined by the above formula or by measurement), d_LM and the specific position distribution of the target object or light-generating components can be determined based on formula (2), according to the size of the actual target object or light-generating component and the specific application.

[0118] In some embodiments of this application, installation errors may occur during the installation of the image information processing device, and the calculation and measurement of Δ and d_MS may also introduce errors. In order to ensure the accuracy of the image information processing device, the device can be fine-tuned according to the convolution result.

[0119] For example, in the example of Figures 3 and 4, for a first light source located at (1,1) in the planar coordinate system and a second light source located at (4,4) in the planar coordinate system, theoretically, the horizontal and vertical spacings between the center of the first convolution result (obtained by convolving the first light source) and the center of the second convolution result (obtained by convolving the second light source) should both be 3Δ. If, due to system errors, the horizontal and vertical spacings between the centers of the two convolution results are found not to equal 3Δ after convolution processing, the system is fine-tuned so that the final actual result matches the theoretical result, thereby verifying that the processing accuracy of the image information processing device meets the usage requirements.
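
The verification step described above amounts to comparing a measured spacing with the theoretical 3Δ. The following is a minimal sketch of such a check; the measured center coordinates, the tolerance, and the function names are hypothetical values chosen for illustration, and the source does not specify how the fine-tuning itself is carried out.

```python
import numpy as np

def spacing_error(center_first, center_second, source_offset, delta):
    """Measured minus expected spacing, in the same length unit as delta.

    source_offset is the source spacing in grid units, e.g. (3, 3) for
    sources at (1,1) and (4,4); the expected response spacing is offset * delta.
    """
    measured = np.asarray(center_second, float) - np.asarray(center_first, float)
    expected = np.asarray(source_offset, float) * delta
    return measured - expected

def needs_fine_tuning(error, tolerance):
    """True if any component of the spacing error exceeds the tolerance."""
    return bool(np.any(np.abs(error) > tolerance))

delta_um = 40.0  # feature size from the worked example

# Hypothetical measured centers of the two convolution results, in micrometres.
error_um = spacing_error((80.0, 80.0), (203.0, 199.0), (3, 3), delta_um)
retune = needs_fine_tuning(error_um, tolerance=2.0)  # hypothetical 2 um tolerance

print("spacing error (um):", error_um, "fine-tuning needed:", retune)
```

A check of this kind would only be needed during initial installation; once the spacing matches 3Δ within tolerance, the geometry can be left as it is.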

[0120] It should be understood that the system fine-tuning process here is to verify whether the processing results during the initial system installation deviate from the theoretical values. If errors exist, fine-tuning is used to reduce them. For systems that have already been debugged and verified, this fine-tuning process is not necessary.

[0121] The image information processing apparatus according to the embodiments of this application can filter the light signal of a color image by setting a filter film on a transparent substrate of a multi-color channel mask and setting a preset pattern on the filter film. Since the preset pattern is obtained by reverse optimization design based on the color information of the target color image and multiple different single-color channel convolution kernels on the convolutional network layer of the target convolutional neural network, the filtered multiple single-color channel information is equivalent to the multiple single-color channel information obtained by the convolutional network layer convolutionally processing the target color image. This realizes the preprocessing of the color image in the optical domain, reduces the data processing volume of the back-end data processing equipment, effectively improves the convolutional processing efficiency, and reduces the hardware cost of processing color images.

[0122] Based on the image information processing apparatus provided in the above embodiments, this application also provides an image information processing method. Figure 8 shows a schematic diagram of an image information processing method provided in one embodiment of this application.

[0123] As shown in Figure 8, an image information processing method applied to the image information processing apparatus proposed in any embodiment of this application includes:

[0124] Step 801: The light signal of the received target color image is filtered by the filter film of the multi-color channel mask to obtain multiple single-color channel information after filtering.

[0125] In this embodiment, the multi-color channel mask includes a transparent substrate and a filter film disposed on the transparent substrate. The filter film is provided with a preset pattern. The preset pattern is obtained by reverse optimization design based on the color information of the target color image and multiple different single-color channel convolutional kernels on the convolutional network layer of the target convolutional neural network. The single-color channel convolutional kernel is a convolutional kernel obtained by optimizing the single-color channel convolutional kernels in the convolutional layer of the target convolutional neural network structure according to the target visual task.

[0126] Step 802: Receive the filtered single-color channel information through the optical sensor chip and convert the filtered single-color channel information into electrical signals.

[0127] Step 803: The processor receives the electrical signal transmitted from the optical sensor chip and processes the electrical signal using the pooling layer and fully connected layer of the target convolutional neural network to obtain the processing result of the target visual task of the target color image.

[0128] In some embodiments of this application, the image information processing method may further include: generating an optical signal carrying a target color image through an optical generating component, wherein the optical signal is an incoherent optical signal.

[0129] In some embodiments of this application, the light generating component is a plurality of point light sources, and the light signal is a signal generated by the plurality of point light sources; or the light generating component is a display, and the light signal includes a multi-pixel image signal generated by the display.

[0130] In some embodiments of this application, the distance between two adjacent point light sources or two adjacent pixels is calculated using the following formula:

[0131]

[0132] Where d_L represents the distance between two adjacent point light sources or two adjacent pixels, d_LM is the distance between the light-generating component and the multi-color channel mask, d_MS represents the distance between the multi-color channel mask and the optical sensor chip, and Δ represents the size of a single pixel on the optical sensor chip; a single pixel on the optical sensor chip is equivalent to a single pixel generated after the convolution calculation of the convolutional network layer.

[0133] The image information processing method according to the embodiments of this application can filter the light signal of a color image by setting a filter film on a transparent substrate of a multi-color channel mask and setting a preset pattern on the filter film. Since the preset pattern is obtained by reverse optimization design based on the color information of the target color image and multiple different single-color channel convolution kernels on the convolutional network layer of the target convolutional neural network, the filtered multiple single-color channel information is equivalent to the multiple single-color channel information obtained by the convolutional network layer convolutionally processing the target color image. This realizes the preprocessing of the color image in the optical domain, reduces the data processing volume of the back-end data processing equipment, effectively improves the convolutional processing efficiency, and reduces the hardware cost of processing color images.

[0134] Figure 9 is a schematic diagram of the architecture corresponding to an application scenario embodiment of the image information processing device provided in this application.

[0135] As shown in Figure 9, before a multi-color channel mask is created, a dataset for training the target convolutional neural network can be determined based on the target visual task. The target convolutional neural network is trained on this dataset. The loss value between the predicted category output by the target convolutional neural network and the true category is determined using a preset loss function. The target convolutional neural network is then reverse-optimized based on the loss value, thereby obtaining multiple different single-color channel convolutional kernels in the convolutional network layers of the target convolutional neural network. These kernels determine the pattern of the multi-color channel mask, and the multi-color channel mask is then created based on that pattern.
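
The claims describe adjusting floating-point kernels in the first convolutional layer to binary kernels before the pattern is derived. The sketch below shows one simple way this could look, assuming a plain threshold at zero; the source does not state the binarization rule, so the threshold, the example weights, and the name `binarize_kernel` are all hypothetical, and the training loop itself is omitted.

```python
import numpy as np

def binarize_kernel(float_kernel, threshold=0.0):
    """Map a floating-point kernel to {0, 1}: 1 = transmit, 0 = block.

    Thresholding at zero is an assumed rule for illustration only.
    """
    return (np.asarray(float_kernel) > threshold).astype(np.uint8)

# Hypothetical trained floating-point kernels, one per color channel.
float_kernels = {
    "R": np.array([[0.7, -0.2], [0.1, -0.9]]),
    "G": np.array([[-0.3, 0.4], [0.5, -0.1]]),
    "B": np.array([[-0.6, -0.8], [0.2, 0.3]]),
}

binary_kernels = {channel: binarize_kernel(k) for channel, k in float_kernels.items()}

for channel, kernel in binary_kernels.items():
    print(channel, kernel.tolist())
```

Each resulting bit says whether the mask should transmit that color at that position, which is the information needed to lay out the filter film.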

[0136] After the multi-color channel mask is fabricated, it filters the light signal of the target color image. The optical sensor chip receives the filtered light signal of the target color image and converts it into an electrical signal to obtain a feature map. The processor acquires the feature map and inputs it into the pooling layer and fully connected layer of the convolutional neural network to process the electrical signal, thereby obtaining the classification result of the target color image.

[0137] This application uses specific terms to describe embodiments of the application. Terms such as "first / second embodiment," "an embodiment," and / or "some embodiments" refer to a particular feature, structure, or characteristic associated with at least one embodiment of the application. Therefore, it should be emphasized and noted that references to "an embodiment," "one embodiment," or "an alternative embodiment" in different locations throughout this specification do not necessarily refer to the same embodiment. Furthermore, certain features, structures, or characteristics in one or more embodiments of the application can be appropriately combined.

[0138] Unless otherwise defined, all terms used herein (including technical and scientific terms) shall have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. It should also be understood that terms such as those defined in a common dictionary shall be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and not as having an idealized or highly formalized meaning, unless expressly defined herein.

[0139] The foregoing description is illustrative of the invention and should not be construed as limiting it. Although several exemplary embodiments of the invention have been described, those skilled in the art will readily understand that many modifications can be made to the exemplary embodiments without departing from the novel teachings and advantages of the invention. Therefore, all such modifications are intended to be included within the scope of the invention as defined in the claims. It should be understood that the foregoing description is illustrative of the invention and should not be construed as limiting it to the specific embodiments disclosed, and modifications to the disclosed embodiments and other embodiments are intended to be included within the scope of the appended claims. The invention is defined by the claims and their equivalents.

Claims

1. An image information processing apparatus, characterized by comprising: a multi-color channel mask, which includes a transparent substrate and a filter film disposed on the transparent substrate, wherein the filter film has a preset pattern obtained by reverse optimization design based on the color information of the target color image and multiple different single-color channel convolutional kernels on the convolutional network layer of the target convolutional neural network; the single-color channel convolutional kernels are convolutional kernels obtained by optimizing the single-color channel convolutional kernels in the convolutional layer of the target convolutional neural network structure according to the target visual task; and the multi-color channel mask is used to filter the light signal of the received target color image through the filter film to obtain filtered information of multiple single-color channels; an optical sensing chip, used to receive the filtered multiple single-color channel information and convert the filtered multiple single-color channel information into electrical signals; and a processor, used to receive the electrical signals transmitted from the optical sensing chip and process the electrical signals using the fully connected layer of the target convolutional neural network to obtain the processing result of the target visual task of the target color image;
wherein obtaining the preset pattern by reverse optimization design based on the color information of the target color image and multiple different single-color channel convolutional kernels on the convolutional network layer of the target convolutional neural network includes: determining the target convolutional neural network structure model based on the target visual task; adjusting the floating-point convolutional kernel in the first convolutional layer of the target convolutional neural network to a binary convolutional kernel to obtain the adjusted first convolutional layer, or adding a binary convolutional layer corresponding to the first convolutional layer on the first convolutional layer and determining the binary convolutional layer as the adjusted first convolutional layer; calculating, according to the preset loss function, the loss value between the prediction result obtained by the target convolutional neural network based on the target color image and the actual result, and reverse-optimizing the multiple different single-color channel convolutional kernels on the convolutional network layer of the target convolutional neural network based on the loss value; obtaining the multiple different single-color channel convolutional kernels after reverse optimization, wherein the multiple different single-color channel convolutional kernels are the convolutional kernels in the first convolutional layer of the convolutional network layer; determining, based on the values of the multiple different single-color channel convolution kernels at different pixel positions, the color channel transmission of the multiple different single-color channel convolution kernels at different pixel positions; and determining the preset pattern based on the patterns of the multiple different single-color channel convolution kernels in the first convolutional layer.

2. The image information processing apparatus according to claim 1, characterized by further comprising: a light generating component, used to generate a light signal carrying the target color image, wherein the light signal is an incoherent light signal.

3. The image information processing apparatus according to claim 2, characterized in that the light generating component comprises multiple point light sources, and the light signal includes signals generated by the multiple point light sources; or the light generating component is a display, and the light signal includes a multi-pixel image signal generated by the display.

4. The image information processing apparatus according to claim 3, characterized in that the distance between two adjacent point light sources or two adjacent pixels is calculated using the following formula: wherein d_L is the distance between two adjacent point light sources or two adjacent pixels; d_LM is the distance between the light-generating component and the multi-color channel mask; d_MS is the distance between the multi-color channel mask and the optical sensor chip; and Δ is the spatial size of a single pixel on the optical sensing chip; a single pixel on the optical sensing chip is equivalent to a single pixel generated after the convolution calculation of the convolutional network layer.

5. The image information processing apparatus according to claim 4, characterized in that the spatial size of a single pixel on the optical sensing chip is determined based on the relationship between the geometric blur during the transmission of the optical signal and the diffraction blur generated by the optical signal passing through the multi-color channel mask.

6. The image information processing apparatus according to claim 5, characterized in that the geometric blur d1 = Δ; the diffraction blur d2 = 2.44λd_MS / Δ; and the spatial size of a single pixel on the optical sensing chip satisfies Δ ≥ √(2.44λd_MS); where λ is the wavelength of light.

7. An image information processing method applied to the image information processing apparatus according to any one of claims 1 to 6, characterized by, The method includes: The light signal of the received target color image is filtered by the filter film of the multi-color channel mask to obtain multiple filtered single-color channel information. The multi-color channel mask includes a transparent substrate and the filter film disposed on the transparent substrate. The filter film is provided with a preset pattern. The preset pattern is obtained by reverse optimization design based on the color information of the target color image and multiple different single-color channel convolutional kernels on the convolutional network layer of the target convolutional neural network. The single-color channel convolutional kernels are convolutional kernels obtained by optimizing the single-color channel convolutional kernels in the convolutional layer of the target convolutional neural network structure according to the target visual task. The optical sensing chip receives the filtered single-color channel information and converts the filtered single-color channel information into electrical signals. The processor receives the electrical signal transmitted from the optical sensing chip and processes the electrical signal using the fully connected layer of the target convolutional neural network to obtain the processing result of the target visual task of the target color image. 
Obtaining the preset pattern by reverse optimization design based on the color information of the target color image and the multiple different single-color channel convolution kernels in the convolutional network layer of the target convolutional neural network includes:
determining the structural model of the target convolutional neural network based on the target visual task;
adjusting the floating-point convolution kernels in the first convolutional layer of the target convolutional neural network to binary convolution kernels to obtain an adjusted first convolutional layer, or adding a binary convolutional layer corresponding to the first convolutional layer on top of the first convolutional layer and taking that binary convolutional layer as the adjusted first convolutional layer;
calculating, according to a preset loss function, the loss value between the prediction result that the target convolutional neural network produces from the target color image and the actual result, and reverse optimizing the multiple different single-color channel convolution kernels in the convolutional network layer based on the loss value to obtain the reverse-optimized kernels, wherein the multiple different single-color channel convolution kernels are the convolution kernels in the first convolutional layer of the convolutional network layer;
determining, based on the values of the multiple different single-color channel convolution kernels at different pixel positions, the color channel transmission of those kernels at the different pixel positions; and
determining the preset pattern based on the patterns of the multiple different single-color channel convolution kernels in the first convolutional layer.
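
The last two steps of claim 7 (from kernel values to per-pixel color channel transmission to a mask pattern) can be illustrated with a minimal sketch. It is not the patented procedure: the zero threshold, the R, G, B channel order, the function names `binarize_kernels` and `mask_pattern`, and the filter names in `FILTER_NAMES` are all assumptions made for illustration, and the loss-driven training step that produces the kernels is not shown.

```python
import numpy as np

# Hypothetical labels for the eight possible (R, G, B) transmission combinations.
FILTER_NAMES = {
    (0, 0, 0): "opaque", (1, 0, 0): "red", (0, 1, 0): "green", (0, 0, 1): "blue",
    (1, 1, 0): "yellow", (1, 0, 1): "magenta", (0, 1, 1): "cyan", (1, 1, 1): "clear",
}


def binarize_kernels(float_kernels, threshold=0.0):
    """Map floating-point kernels to {0, 1}.

    Assumption: a weight above `threshold` means the channel is transmitted (1),
    anything else means it is blocked (0).
    """
    return (np.asarray(float_kernels, dtype=float) > threshold).astype(np.uint8)


def mask_pattern(binary_kernels):
    """Turn binary kernels of shape (3, k, k), ordered R, G, B, into a k x k
    grid of filter names, one per mask pixel position."""
    channels, rows, cols = binary_kernels.shape
    if channels != 3:
        raise ValueError("expected three single-color channel kernels (R, G, B)")
    return [
        [FILTER_NAMES[tuple(int(v) for v in binary_kernels[:, r, c])]
         for c in range(cols)]
        for r in range(rows)
    ]


# Illustrative (made-up) trained weights for one 2 x 2 kernel per color channel.
example_kernels = np.array([
    [[0.5, -0.2], [0.1, -0.3]],   # red channel kernel
    [[0.4, 0.3], [-0.1, -0.7]],   # green channel kernel
    [[-0.2, 0.9], [0.6, -0.1]],   # blue channel kernel
])
example_pattern = mask_pattern(binarize_kernels(example_kernels))
```

Under these assumptions each mask pixel passes exactly the color channels whose binary kernel value is 1 at that position, so the three single-color kernels are encoded in a single filter film.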

8. The image information processing method according to claim 7, characterized in that the method further includes: generating, by the light generating component of the image information processing apparatus, an optical signal carrying the target color image, wherein the optical signal is an incoherent optical signal.

9. The image information processing method according to claim 8, characterized in that the light generating component comprises multiple point light sources and the light signal is generated by the multiple point light sources; or the light generating component is a display and the light signal comprises a multi-pixel image signal generated by the display.
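
Why incoherent point sources suit an optical convolution can be shown with a simplified geometric-shadow sketch. With incoherent light, the intensities contributed by different sources add linearly, so the sensor records the sum of shifted mask shadows, which is a 2D convolution of the source intensities with the mask transmission. The model below is an assumption for illustration only: it ignores magnification, image inversion, and diffraction, and the function names are hypothetical.

```python
import numpy as np


def sensor_intensity(source, mask):
    """Sum the mask shadow cast by every point source.

    Assumption: each source at grid position (i, j) casts an unscaled copy of
    the mask transmission pattern, shifted by (i, j) and weighted by the
    source intensity; incoherent intensities simply add.
    """
    source = np.asarray(source, dtype=float)
    mask = np.asarray(mask, dtype=float)
    n_r, n_c = source.shape
    k_r, k_c = mask.shape
    sensor = np.zeros((n_r + k_r - 1, n_c + k_c - 1))
    for i in range(n_r):
        for j in range(n_c):
            sensor[i:i + k_r, j:j + k_c] += source[i, j] * mask
    return sensor


def full_convolution_fft(source, mask):
    """Reference full 2D linear convolution computed with zero-padded FFTs."""
    source = np.asarray(source, dtype=float)
    mask = np.asarray(mask, dtype=float)
    shape = (source.shape[0] + mask.shape[0] - 1,
             source.shape[1] + mask.shape[1] - 1)
    spectrum = np.fft.fft2(source, s=shape) * np.fft.fft2(mask, s=shape)
    return np.fft.ifft2(spectrum).real


# Made-up source intensities and a binary mask transmission pattern.
example_source = np.array([[1.0, 2.0], [3.0, 4.0]])
example_mask = np.array([[1.0, 0.0], [0.0, 1.0]])
example_sensor = sensor_intensity(example_source, example_mask)
```

In this toy model the shadow sum and the FFT-based convolution agree, which is the sense in which the mask carries out the convolution in the optical domain before the signal reaches the sensing chip.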

10. The image information processing method according to claim 9, characterized in that the distance between two adjacent point light sources or two adjacent pixels is calculated by a formula [formula not reproduced in the source text] whose terms are: the distance between two adjacent point light sources or two adjacent pixels; the distance between the light generating component and the multi-color channel mask; the distance between the multi-color channel mask and the optical sensing chip; and the spatial size of a single pixel on the optical sensing chip, wherein a single pixel on the optical sensing chip is equivalent to a single pixel generated by the convolution calculation of the convolutional network layer.