Image processing system
By using voice and image recognition technology, 2D movies are converted into 3D images, solving the problems of insufficient visual impact and difficulty in seeing details when watching 2D movies, thus achieving a richer viewing experience and better acquisition of image details.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHENZHEN TCL HIGH TECH DEVELOPMENT CO LTD
- Filing Date
- 2024-12-10
- Publication Date
- 2026-06-12
Figure CN122199569A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of computer technology, and more specifically to an image processing system.
Background Technology
[0002] Currently, users struggle to obtain a rich and diverse viewing experience when watching regular 2D films. For example, users can only observe specific details in 2D films through rewinding and pausing, which also results in insufficient visual impact.
Summary of the Invention
[0003] In a first aspect, this application provides an image processing method, the method comprising:
[0004] Obtain the image information to be processed;
[0005] The image information to be processed is subjected to image segmentation processing to obtain the first image information;
[0006] The first image information is processed to generate the target image information.
[0007] In a second aspect, this application further provides an image processing system, the system comprising:
[0008] The acquisition module is used to acquire information about the image to be processed.
[0009] The processing module is used to perform image segmentation processing on the image information to be processed to obtain first image information;
[0010] The processing module is also used to perform image generation processing on the first image information to obtain target image information.
[0011] In a third aspect, this application further provides a terminal device, which includes a processor, a memory, and a computer program stored in the memory and executable on the processor. The processor executes the computer program to implement the steps in any of the image processing methods.
[0012] In a fourth aspect, this application further provides a computer-readable storage medium storing a computer program, which is executed by a processor to implement the steps in any of the image processing methods.
Attached Figure Description
[0013] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0014] Figure 1 is a scene diagram of the image processing system provided in the embodiments of this application;
[0015] Figure 2 is a schematic flowchart of one embodiment of the image processing method in this application;
[0016] Figure 3 is a schematic diagram of a model architecture of an embodiment of the image processing method in this application;
[0017] Figure 4 is a schematic diagram of a model architecture of an embodiment of the image processing method in this application;
[0018] Figure 5 is a schematic diagram of a model architecture of an embodiment of the image processing method in this application;
[0019] Figure 6 is a schematic diagram of a functional module of the image processing system in an embodiment of this application;
[0020] Figure 7 is a schematic diagram of the structure of the terminal device in the embodiments of this application.
Detailed Implementation
[0021] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0022] In the description of this application, it should be understood that the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this application, "multiple" means two or more, unless otherwise explicitly specified.
[0023] In this application, the term "exemplary" is used to mean "used as an example, illustration, or description." Any embodiment described as "exemplary" in this application is not necessarily to be construed as being more preferred or advantageous than other embodiments. Furthermore, it is understood that in the specific embodiments of this application, user information, user data, and other related data are involved. When the above embodiments of this application are applied to specific products or technologies, user permission or consent is required, and the collection, use, and processing of related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
[0024] To enable any person skilled in the art to implement and use this application, the following description is provided. In this description, details are set forth for purposes of explanation. It should be understood that those skilled in the art will recognize that this application can be implemented without using these specific details. In other instances, well-known structures and processes will not be described in detail to avoid obscuring the description of this application with unnecessary detail. Therefore, this application is not intended to be limited to the embodiments shown, but is consistent with the broadest scope of the principles and features disclosed in this application.
[0025] This application provides an image processing method, system, device, and storage medium, which are described in detail below.
[0026] Referring to Figure 1, Figure 1 is a schematic diagram of a scene for the image processing system provided in an embodiment of this application. The image processing system may include a terminal device 100 and a storage device 200, and the storage device 200 may transmit data to the terminal device 100. As shown in Figure 1, the terminal device 100 can obtain the image information stored in the storage device 200 to execute the image processing method in this application.
[0027] In this embodiment of the application, the terminal device 100 may include, but is not limited to, desktop computers, portable computers, network servers, PDAs (personal digital assistants), tablet computers, wireless terminal devices, embedded devices, etc.
[0028] In the embodiments of this application, the terminal device 100 and the storage device 200 can communicate through any communication method, including but not limited to mobile communication based on the 3rd Generation Partnership Project (3GPP), Long Term Evolution (LTE), or Worldwide Interoperability for Microwave Access (WiMAX), or computer network communication based on the TCP/IP protocol suite or the User Datagram Protocol (UDP). It should be noted that the storage device 200 can also be built into the terminal device 100; specific embodiments of this application are not limited in this regard.
[0029] It should be noted that the schematic diagram of the image processing system shown in Figure 1 is merely an example. The image processing system and scenario described in this application are for the purpose of more clearly illustrating the technical solutions of this application and do not constitute a limitation on the technical solutions provided in this application. As those skilled in the art will know, with the evolution of image processing systems and the emergence of new business scenarios, the technical solutions provided in this application are also applicable to similar technical problems.
[0030] As shown in Figure 2, Figure 2 is a schematic flowchart of an embodiment of the image processing method in this application. The image processing method may include the following steps 201 to 203:
[0031] 201. Obtain the image information to be processed.
[0032] In this embodiment, the image information to be processed can be any type of image data or a video frame from a video. Specific embodiments of this application are not limited.
[0033] 202. Perform image segmentation on the image information to be processed to obtain the first image information.
[0034] In existing technologies, users typically only view 2D images when watching videos or browsing images. Therefore, the amount of detailed information available to users is limited. Based on this, embodiments of this application provide a target processing scheme for converting 2D images or video frames into 3D images. For example, a user instructs that a specific object in an image needs to be converted into a 3D image. In this case, the user's specific conversion target can be identified based on their control commands, such as voice commands.
[0035] Specifically, suppose a user issues a voice command to convert a cup in an image or video frame into a 3D image. In this case, voice recognition can identify the cup in the user's command. Simultaneously, image recognition can identify the cup within the image to be processed and convert it into a 3D image. After identifying the cup in the image, the corresponding image region can be segmented to obtain a segmented image of the cup.
[0036] It should be noted that, in this embodiment of the application, the 3D model conversion of the cup can utilize any 3D model conversion technology, and this embodiment of the application is not limited thereto. Furthermore, during 3D conversion, reconstruction can be performed solely based on the feature information of the target object in the current image information to be processed.
[0037] Therefore, the image processing solution provided in this application can transform 2D images into 3D images with richer details, so that users can obtain richer image details.
[0038] As described above, the embodiments of this application can obtain target image information through corresponding speech recognition, image recognition, and 3D model conversion. Based on this, the three methods can be integrated into a single model, which is then installed on a device capable of playing video and images. Therefore, to better implement the embodiments of this application, in one embodiment, the method is implemented through a first processing model; the first processing model includes an image segmentation model and an image processing model. The image segmentation model may include a speech recognition module and an image recognition module. The speech recognition module is used to recognize the 3D target object specified by the user, and the image recognition module is used to perform image segmentation on the image information to be processed based on the determined specific 3D target object, thereby recognizing the target object. At this point, the segmentation process is complete.
[0039] 203. Perform image generation processing on the first image information to obtain the target image information.
[0040] As described above, after segmenting the object to be transformed, a 3D transformation is required. Therefore, the generation process in this embodiment involves transforming the segmented target, specifically:
[0041] The image recognition module also connects to the internet and identifies images matching the target object from the internet. Finally, the image processing model determines all the features of the target object from each image containing information about the target object, and then performs a 3D conversion.
[0042] Based on this, the above embodiments provide a scheme for 3D transformation through a model. In order to further improve the effect of 3D transformation, this application embodiment also provides an image segmentation model for processing image segmentation. The input end of the image segmentation model can be understood as receiving image information to be processed; the output end of the image segmentation model is connected to the input end of the image processing model; the output end of the image processing model is used to output target image information; the image segmentation model is configured to perform image segmentation processing on the image information to be processed to determine the target processing object of the image information to be processed; the image processing model is configured to perform image processing on the target processing object to obtain target image information.
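The wiring described above, in which the output of the image segmentation model feeds the input of the image processing model, can be sketched as a simple composition. This is a minimal sketch under stated assumptions: `FirstProcessingModel`, `toy_segment`, and `toy_generate` are hypothetical names, and the two stage functions are toy stand-ins rather than the models of this application.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FirstProcessingModel:
    """Hypothetical container chaining a segmentation stage and a generation stage."""
    segment: Callable[[Any, str], Any]   # (image, instruction) -> target processing object
    generate: Callable[[Any], Any]       # target processing object -> target image information

    def run(self, image: Any, instruction: str) -> Any:
        target = self.segment(image, instruction)  # image segmentation model
        return self.generate(target)               # image processing model


def toy_segment(image: dict, instruction: str) -> Any:
    # Toy stand-in: the "image" maps object labels to regions; return the
    # region whose label is mentioned in the instruction.
    for label, region in image.items():
        if label in instruction:
            return region
    raise ValueError("no region matches the instruction")


def toy_generate(target: Any) -> dict:
    # Toy stand-in for 3D conversion: wrap the segmented region.
    return {"kind": "3d", "source": target}


model = FirstProcessingModel(segment=toy_segment, generate=toy_generate)
result = model.run({"cup": "cup-pixels", "table": "table-pixels"}, "show the cup in 3D")
```

Keeping the two stages as separate callables mirrors the text's separation between segmentation and image generation, so either stage could be swapped without touching the other.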
[0043] To better implement the embodiments of this application, in one embodiment, the image segmentation model includes a first encoding module, a second encoding module, and a first decoding module; performing image segmentation processing on the image information to be processed, and determining the target processing object of the image information to be processed, includes:
[0044] The system acquires image segmentation instruction information; a first encoding module is configured to encode the image segmentation instruction information to obtain instruction encoding feature information; a second encoding module is configured to encode the image information to be processed to obtain first image feature encoding information; and a first decoding module is configured to determine the target processing object of the image information to be processed based on the instruction encoding feature information and the first image feature encoding information.
[0045] The above embodiments provide an image segmentation model that includes image recognition functionality. However, the specific method of image recognition is not provided in these embodiments. In other words, any recognition method can be used for image recognition in these embodiments. Nevertheless, to ensure the accuracy of image recognition, this application provides an image recognition scheme. Specifically, it includes the structure and method of an image recognition module.
[0046] For example, in this embodiment of the application, the image recognition function in the image segmentation model may include a first encoding module, which may include a CLIP text encoder. This encoding module can encode the acquired speech data, in the form of the user's text prompt, to obtain instruction encoding feature information. Furthermore, the second encoding module may be a pre-trained Vision Transformer. When image information to be processed is acquired, the second encoding module can encode the image to obtain first image feature encoding information. In a specific application scenario, assuming the user is playing a video, when the user pauses the video, the user can specify the currently paused video frame as the image information to be processed. Finally, the first decoding module can decode based on the instruction encoding feature information and the first image feature encoding information to obtain the specific target object referred to by the user's speech. The module architecture of the image segmentation model can be as shown in Figure 3.
[0047] For example, after receiving a user's command such as "I want to see a 3D model of Sun Wukong's golden cudgel" using a voice assistant, the object of the golden cudgel can be identified in the current video frame.
[0048] To better implement the embodiments of this application, in one embodiment, the target processing object of the image information to be processed is determined based on the instruction encoding feature information and the first image feature encoding information, including:
[0049] Based on the instruction encoding feature information and the first image feature encoding information, the confidence information of each image region in the image information to be processed is determined; based on the confidence information corresponding to each image region, the target object in the image region corresponding to the target confidence information in each confidence information is determined as the target processing object.
[0050] Specifically, when the first decoding module decodes the image based on the instruction encoding feature information and the first image feature encoding information, it can output the confidence score of the image region corresponding to each object in the image information to be processed. After obtaining the confidence score of the image region corresponding to each object, the object in the image region with the highest confidence score is then determined as the target processing object.
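The selection step described above amounts to taking the region with the highest confidence score. The following is a minimal sketch; the function name `select_target_region` and the region identifiers are hypothetical, and the scores are made-up values standing in for the first decoding module's output.

```python
from typing import Dict


def select_target_region(confidences: Dict[str, float]) -> str:
    """Return the identifier of the image region with the highest confidence."""
    if not confidences:
        raise ValueError("no image regions to choose from")
    return max(confidences, key=confidences.get)


# Illustrative confidence scores, one per candidate image region.
scores = {"region_cup": 0.91, "region_table": 0.42, "region_lamp": 0.17}
target = select_target_region(scores)
```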
[0051] To better implement the embodiments of this application, in one embodiment, image generation processing is performed on the first image information to obtain the target image information, which is based on an image processing model. The image processing model includes a target feature extraction module, an image reconstruction module, and an image rendering module, as shown in Figure 4.
[0052] The input of the target feature extraction module is used to acquire the first image information, and its output is connected to the input of the image reconstruction module; the input of the image rendering module is connected to the output of the image reconstruction module, and the output of the image rendering module is used to output the target image information. The target feature extraction module is configured to extract multiple target features from the target object to obtain multiple target feature information. The image reconstruction module is configured to perform target processing on each target feature information to obtain target reconstruction feature information. The image rendering module is configured to perform rendering processing on the target reconstruction feature information to obtain target image information.
[0053] The above embodiments provide a scheme for converting a 2D target object into a 3D image by processing only the features of the target object in the image information to be processed. In addition, taking the cup as an example, the segmented image of the cup can be used to search the publicly available internet for images of similar cups, thereby obtaining more image information about the cup from various perspectives. Finally, based on the cup information from each perspective, the cup in the image information to be processed is converted to 3D to obtain a 3D target image including the cup. In this way, target feature extraction can be performed multiple times on the target object to obtain multiple target feature information.
[0054] In this process, determining whether an object similar to the target object exists in an image from the internet can be done using any method for determining similarity; this application embodiment does not impose any limitations. Subsequently, the 3D reconstruction can be performed in the same manner as in the above embodiment, employing any 3D reconstruction scheme. Similarly, rendering the reconstructed 3D image can also employ any rendering scheme; this application embodiment does not impose any limitations.
[0055] To better implement the embodiments of this application, in one embodiment of this application, target feature extraction is performed multiple times on the target processing object to obtain multiple target feature information, including:
[0056] Acquire candidate image information to be processed; based on the target processing object feature information corresponding to the target processing object in the image information to be processed, detect the candidate image information to be processed to determine whether the candidate image information to be processed includes the target processing object; if the candidate image information to be processed includes the target processing object, extract the target features for each candidate image information to be processed that includes the target processing object to obtain multiple target feature information.
[0057] The above embodiments provide a scheme for acquiring images similar to the target processing object via the Internet, thereby enabling multiple feature extractions from different images of the target processing object. However, the network may be unavailable, or the images retrieved from the network may not actually match the target object, in which case the accuracy of 3D modeling cannot be improved. Therefore, to further improve the quality of 3D modeling, in this embodiment, if the image information to be processed is a video frame played by a user, the target processing object can be searched for in other video frames played by the user. The search can also be based on similarity, determining whether objects similar to the target processing object appear in other video frames. In this embodiment, any similarity determination method can be used to determine whether other video frames include the target processing object, or image recognition methods can be used to determine whether the target processing object exists; this embodiment is not limited in this regard. For example, cosine similarity can be used to check whether other video frames contain images of the target object; alternatively, the Euclidean distance between the feature vectors of another video frame and of the target object image can be calculated and compared with a distance threshold. If the distance is less than the distance threshold, it can be determined that the other video frame contains the target object.
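The two similarity checks mentioned above can be sketched on plain feature vectors as follows. This is an illustrative sketch only: the function names, the example vectors, and the threshold value 0.5 are assumptions, and how the feature vectors are obtained from frames is left out.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def frame_contains_target(frame_feat: Sequence[float],
                          target_feat: Sequence[float],
                          distance_threshold: float) -> bool:
    """A frame is taken to contain the target if its distance is below the threshold."""
    return euclidean_distance(frame_feat, target_feat) < distance_threshold


# Hypothetical feature vectors for the target object and two other video frames.
target_feat = [1.0, 0.0]
frame_a = [0.9, 0.1]   # close to the target
frame_b = [0.0, 1.0]   # far from the target
```

With these example values and a threshold of 0.5, `frame_a` would be accepted and `frame_b` rejected; in practice the threshold would need to be tuned for the feature extractor in use.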
[0058] If the target processing object exists in other video frames, then the feature extraction of the target processing object can be performed on the video frames containing the target processing object.
[0059] To better implement the embodiments of this application, in one embodiment of this application, whether the candidate image information to be processed includes the target processing object is obtained based on the target processing model, which includes an image feature extraction module, a third encoding module, and a second decoding module;
[0060] The image feature extraction module is configured to extract features from candidate image information to obtain target image feature information; the third encoding module is configured to encode the target image feature information to obtain second image feature encoding information; the second decoding module is configured to perform decoding processing based on the second image feature encoding information and the target processing object feature information to determine whether the candidate image information to be processed includes the target processing object.
[0061] The above embodiments provide a scheme for determining whether other video frames contain a target processing object using similarity detection or image recognition. However, to improve the accuracy of target processing object recognition, this application also provides a scheme. Specifically, this application provides a processing model, wherein the image feature extraction module can be a DINOv2 ViT model. After feature extraction using DINOv2 ViT, the extracted image features can be input to a third encoding module for encoding enhancement, which can be a Transformer encoder. Finally, since the image region containing the target processing object has already been determined during image recognition of the image information to be processed, the target processing object feature information corresponding to the image information to be processed containing the target processing object can be synchronously input to a second decoding module for decoding, thereby determining whether the candidate image information to be processed contains the target processing object.
[0062] It should be noted that if the candidate image information to be processed includes the target object, the image information to be processed can be input into the image segmentation model in the first processing model for image segmentation. Then, feature extraction is performed on the segmented image to obtain the target feature information of the target object.
[0063] The connection between the target processing model and the first processing model can be as shown in Figure 5. It should be noted that, in Figure 5, arrow 1 represents the process of the first processing model segmenting the image information to be processed. Arrow 4 represents the case where the target processing object is not present in the candidate image information, or no candidate image information exists, so that the model directly performs 3D reconstruction based on the image information to be processed. Arrow 3 represents the case where candidate image information exists: the first decoding module can input the features of the target processing object in the image information to be processed to the second decoding module, so that the second decoding module can determine whether the candidate image information includes the target processing object. If it does, the second decoding module can input the candidate image information including the target processing object back into the first processing model, and the process is repeated according to arrow 2. The first processing model can then perform 3D reconstruction based on the features of the target processing object in both the image information to be processed and the candidate image information.
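The control flow around Figure 5 can be sketched as a small function that decides which images feed the 3D reconstruction. This is a sketch under assumptions: `collect_reconstruction_inputs` is a hypothetical name, and `contains_target` is a stand-in predicate for the decision made by the second decoding module.

```python
from typing import Any, Callable, List, Sequence


def collect_reconstruction_inputs(image: Any,
                                  candidates: Sequence[Any],
                                  contains_target: Callable[[Any], bool]) -> List[Any]:
    """Return the images to reconstruct from.

    The image to be processed is always included. Candidates judged to
    contain the target object are appended; if there are no candidates, or
    none contains the target, only the original image is returned (the
    fallback path described for arrow 4).
    """
    matched = [c for c in candidates if contains_target(c)]
    return [image] + matched


# Toy example: frames are strings and the predicate checks for the word "cup".
frames = ["frame_with_cup_1", "frame_no_target", "frame_with_cup_2"]
inputs = collect_reconstruction_inputs("paused_frame", frames, lambda f: "cup" in f)
fallback = collect_reconstruction_inputs("paused_frame", [], lambda f: "cup" in f)
```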
[0064] To better implement the embodiments of this application, in one embodiment, target feature extraction is performed on each candidate image information to be processed that includes the target processing object, resulting in multiple target feature information, including:
[0065] For each candidate image information containing the target object, target features are extracted to obtain multiple first candidate feature information; second candidate feature information is determined from each first candidate feature information to obtain multiple target feature information.
[0066] The above embodiments provide a scheme for 3D reconstruction based on multiple target feature information. In this embodiment, to further improve the accuracy of the 3D reconstruction, the multiple target feature information can be filtered.
[0067] Specifically, DINOv2 ViT is used as the feature extractor to extract 2D image features from all viewpoints. Next, a Multi-view Encoder receives the features from all views; these features can be regarded as the first candidate feature information. The features are further refined by capturing cross-view correlations. There are N Multi-view Encoder blocks in total, each consisting of a Non-canonical View Update (NVU) layer and a Global Consensus Reasoning (GCR) layer. The NVU layer updates the features of the other views by aggregating the canonical reference view features. The GCR layer uses the correlation between all views to obtain the feature with the highest similarity. This feature with the highest similarity constitutes the multiple target feature information obtained after filtering; that is, it is the second candidate feature information in this embodiment. The specific steps are shown in the following formulas (1) and (2):
[0068]
[0069] In formula (2), the output is the multiple target feature information, while k in formulas (1) and (2) can represent different video frames.
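As a rough illustration of filtering per-view features by cross-view similarity, the sketch below picks the view whose feature agrees most with the other views, using mean cosine similarity. This is a deliberately simplified stand-in and not the NVU or GCR layer: the function names and the scoring rule are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0.0 and nb > 0.0 else 0.0


def most_consistent_view(features: List[Sequence[float]]) -> int:
    """Index of the view whose feature has the highest mean cosine similarity
    to the features of all other views (assumed scoring rule)."""
    if len(features) < 2:
        raise ValueError("need at least two views")
    best_idx, best_score = 0, -math.inf
    for i, fi in enumerate(features):
        others = [_cosine(fi, fj) for j, fj in enumerate(features) if j != i]
        score = sum(others) / len(others)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx


# Three hypothetical per-view feature vectors (first candidate features).
view_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
selected = most_consistent_view(view_features)
```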
[0070] To better implement the embodiments of this application, in one embodiment, target processing is performed on each target feature information to obtain target reconstruction feature information, including:
[0071] Encode the feature information of each target to obtain multiple first reconstructed feature information;
[0072] Weighting is applied to each first reconstructed feature and each target feature to obtain the target reconstructed feature.
[0073] The above embodiments provide a scheme for reconstruction using arbitrary 3D reconstruction methods. However, in order to further improve the accuracy of the reconstruction, the embodiments of this application provide additional reconstruction schemes.
[0074] Specifically, a learnable neural volume is introduced to encode geometric and texture priors and to serve as the initial 3D representation, thereby obtaining the first reconstructed feature information $V \in \mathbb{R}^{H \times W \times D \times c}$. For each view, the model maps 2D information to 3D by querying the multi-view features and updates the neural volume. Specifically, $n_m$ Transformer Decoder blocks are used to update and integrate the features, and each block contains a cross-attention layer and a self-attention layer. In the cross-attention layer, the 3D latent neural volume is used as the query, and the output $F$ of formula (2) is used as the key/value pair. The updated 3D latent neural volume is the target reconstructed feature information.
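The cross-attention step, with volume entries as queries and multi-view features as keys and values, can be illustrated with plain scaled dot-product attention. This is a minimal sketch only: it has a single head, no learned projections, no self-attention layer, and no residual update, and the function names and toy inputs are assumptions.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Sequence[float]],
                    keys: List[Sequence[float]],
                    values: List[Sequence[float]]) -> List[List[float]]:
    """Scaled dot-product attention: each query attends over all keys and
    returns the correspondingly weighted sum of the values."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


# Toy data: two flattened volume entries (queries) and two view features (keys/values).
volume = [[1.0, 0.0], [0.0, 1.0]]
view_feats = [[10.0, 0.0], [0.0, 10.0]]
updated = cross_attention(volume, view_feats, view_feats)
```

Each volume entry ends up dominated by the view feature it is most aligned with, which is the sense in which the volume "queries" the multi-view features.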
[0075] To better implement the embodiments of this application, in one embodiment, the target reconstruction feature information is rendered to obtain target image information, including:
[0076] Based on the target reconstruction feature information, the density feature information is determined; the target pose information to be rendered is determined; and the target image information is obtained based on the target pose information and density feature information.
[0077] This application also provides a rendering scheme. Specifically, the updated 3D latent neural volume is used to predict a voxel-based radiance field $R = (R_\sigma, R_f)$, where $R_\sigma$ and $R_f$ are the density and the features of the radiance field, respectively, each predicted from the updated volume using 3D convolutional layers. $R$ is therefore the density feature information in the embodiments of this application.
[0078] Then, volumetric rendering techniques can be used to generate the rendered image and the object mask. Specifically, a feature map is first rendered, and a two-dimensional convolution is then applied to it to predict the rendered image. Here, $\Pi$ denotes the volume rendering process, which is evaluated at the target pose information to obtain the rendered target image information. It should be noted that, in this embodiment, the target pose information refers to the pose of the 3D reconstructed object under the corresponding instruction. For example, if a user instruction requires viewing the 3D image information of the target object at a specific position and angle, then that specific position and angle are the target pose information of the reconstructed object.
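For intuition about the volume rendering step, the sketch below composites density and feature samples along a single ray using the standard emission-absorption quadrature. It is a generic illustration, not this application's renderer: the function name `render_ray` is hypothetical, ray generation from the target pose is omitted, and the subsequent two-dimensional convolution is not shown.

```python
import math
from typing import List, Sequence, Tuple


def render_ray(densities: Sequence[float],
               features: Sequence[Sequence[float]],
               deltas: Sequence[float]) -> Tuple[List[float], float]:
    """Composite samples along one ray, ordered from near to far.

    densities: density at each sample
    features:  feature vector at each sample
    deltas:    distance covered by each sample
    Returns (rendered feature, accumulated opacity); the opacity plays the
    role of one pixel of the object mask.
    """
    transmittance = 1.0                      # fraction of light not yet absorbed
    rendered = [0.0] * len(features[0])
    opacity = 0.0
    for sigma, feat, delta in zip(densities, features, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # absorption within this sample
        weight = transmittance * alpha
        rendered = [r + weight * f for r, f in zip(rendered, feat)]
        opacity += weight
        transmittance *= 1.0 - alpha
    return rendered, opacity


# A nearly opaque first sample hides everything behind it.
feature, mask = render_ray([1e9, 1.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
```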
[0079] To better implement the embodiments of this application, in one embodiment, after performing image generation processing on the first image information to obtain the target image information, the method further includes:
[0080] Obtain target instruction information; adjust the angle of the target image information according to the target instruction information to obtain the adjusted target image information.
[0081] In addition, this application also provides a scene control scheme for user interaction with 3D models.
[0082] Specifically, users can use AR glasses to overlay the generated 3D model onto their field of vision. The displayed 3D model can be adjusted in response to user interaction commands such as voice, touch, and gestures, enabling operations such as rotation and scaling and thereby adjusting the angle. Therefore, the angle adjustment in this embodiment can refer to the angle from which the user views the target image information.
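The interaction described above can be sketched as a small view state that is updated by recognized commands. Everything here is hypothetical: the `ViewState` fields, the command names, and the clamping of pitch to ±90 degrees are illustrative assumptions, and recognition of voice, touch, or gesture input is not shown.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ViewState:
    """Hypothetical viewing state of the displayed 3D model."""
    yaw_deg: float = 0.0
    pitch_deg: float = 0.0
    scale: float = 1.0


def apply_command(view: ViewState, command: str, amount: float) -> ViewState:
    """Return a new view state after applying one already-recognized command."""
    if command == "rotate_yaw":
        return replace(view, yaw_deg=(view.yaw_deg + amount) % 360.0)
    if command == "rotate_pitch":
        clamped = max(-90.0, min(90.0, view.pitch_deg + amount))
        return replace(view, pitch_deg=clamped)
    if command == "scale":
        if amount <= 0:
            raise ValueError("scale factor must be positive")
        return replace(view, scale=view.scale * amount)
    raise ValueError(f"unknown command: {command}")


view = ViewState()
view = apply_command(view, "rotate_yaw", 370.0)   # wraps around past 360 degrees
view = apply_command(view, "scale", 2.0)
```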
[0083] It should also be noted that, in this embodiment of the application, a training scheme for the model is also provided.
[0084] Specifically, the above model is trained with a photometric loss function that compares the rendered result with the input, without any 3D supervision. First, a loss function $L_A$ is defined on the RGB images; see formula (3):
[0085]
[0086] Here, $L_{mse}$ is the mean squared error loss function, $A_i$ and its rendered counterpart are the original and rendered images, respectively, $\lambda_p$ is a hyperparameter, and $L_p$ is the perceptual loss function. Next, a loss function $L_M$ is defined on the density mask; see formula (4):
[0087]
[0088] Here, $\sigma_i$ is the rendered mask, which is compared with the corresponding original mask. The final overall loss function is therefore given by formula (5):
[0089] $L = L_A + \lambda_m L_M$ ……(5)
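For reference, one form of the three loss functions that is consistent with the symbol definitions above can be sketched as follows. The summation over input views $i$, the hatted symbol $\hat{A}_i$ for the rendered image, the symbol $M_i$ for the original mask, and the use of the mean squared error in the mask term are assumptions made for illustration, not the original formulas (3) and (4).

```latex
\begin{aligned}
L_A &= \sum_i \Big( L_{mse}(\hat{A}_i, A_i) + \lambda_p \, L_p(\hat{A}_i, A_i) \Big), \\
L_M &= \sum_i L_{mse}(\sigma_i, M_i), \\
L   &= L_A + \lambda_m L_M .
\end{aligned}
```

Under this reading, $\lambda_p$ balances pixel-wise and perceptual agreement on the images, while $\lambda_m$ balances the image term against the mask term.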
[0090] Using the trained 3D reconstruction model, sparse multi-view images without camera poses can be input to generate the corresponding 3D model of the target object.
[0091] To better implement the image processing method in the embodiments of this application, an image processing system is also provided in the embodiments of this application. As shown in Figure 6, the system 300 includes:
[0092] The acquisition module 301 is used to acquire image information to be processed;
[0093] Processing module 302 is used to perform image segmentation processing on the image information to be processed to obtain first image information;
[0094] The processing module 302 is also used to perform image generation processing on the first image information to obtain the target image information.
[0095] In this embodiment, the image information to be processed can be acquired by the acquisition module 301. Then, the target object in the image information to be processed is reconstructed in 3D by the processing module 302 to obtain target image information with richer details.
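The division of work between the two modules can be sketched as a small Python class. The class name `ImageProcessingSystem` and the callables passed to it are hypothetical stand-ins; the original describes the modules functionally and does not prescribe an interface.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ImageProcessingSystem:
    """Sketch of system 300: an acquisition module plus a processing module."""
    acquire: Callable[[], Any]      # acquisition module 301
    segment: Callable[[Any], Any]   # processing module 302, segmentation step
    generate: Callable[[Any], Any]  # processing module 302, generation step

    def run(self):
        image_info = self.acquire()                  # image information to be processed
        first_image_info = self.segment(image_info)  # first image information
        return self.generate(first_image_info)       # target image information
```

Passing the steps in as callables keeps the sketch independent of any particular segmentation or reconstruction model.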
[0096] Optionally, the image segmentation processing performed on the image information to be processed to obtain the first image information is implemented by an image segmentation model; the image segmentation model includes a first encoding module, a second encoding module, and a first decoding module;
[0097] The input terminal of the first encoding module is used to obtain image segmentation instruction information;
[0098] The input of the second encoding module is used to acquire the image information to be processed;
[0099] The output terminals of the first encoding module and the second encoding module are respectively connected to the input terminal of the first decoding module;
[0100] The first decoding module outputs the first image information at its output terminal.
[0101] The first encoding module is configured to encode the image segmentation instruction information to obtain instruction encoding feature information;
[0102] The second encoding module is configured to encode the image information to be processed to obtain the first image feature encoding information;
[0103] The first decoding module is configured to determine the target processing object of the image information to be processed based on the instruction encoding feature information and the first image feature encoding information.
[0104] Optionally, the processing module 302 determines the target processing object of the image information to be processed based on the instruction encoding feature information and the first image feature encoding information, including:
[0105] Based on the instruction encoding feature information and the first image feature encoding information, the confidence information of each image region in the image information to be processed is determined;
[0106] Based on the confidence information corresponding to each image region, the object in the image region that corresponds to the target confidence information among the confidence information is determined as the target processing object.
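The two-encoder, one-decoder arrangement and the confidence-based selection can be sketched with NumPy stand-ins. Everything here is an assumption for illustration: the hashed bag-of-words instruction encoder, the linear image encoder, the dot-product decoder with a sigmoid, and the rule that the highest-confidence region above a threshold is the target. The original does not specify the network architectures or how the target confidence information is chosen.

```python
import numpy as np

def encode_instruction(text, dim=16):
    """First encoding module (stand-in): hashed bag-of-words embedding."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[sum(map(ord, word)) % dim] += 1.0  # deterministic toy hash
    return vec

def encode_image(regions, weight):
    """Second encoding module (stand-in): linear projection of region descriptors."""
    return np.asarray(regions, dtype=float) @ weight

def decode_target(instr_feat, region_feats, threshold=0.5):
    """First decoding module (stand-in): per-region confidence and target choice."""
    logits = np.asarray(region_feats, dtype=float) @ np.asarray(instr_feat, dtype=float)
    confidence = 1.0 / (1.0 + np.exp(-logits))  # sigmoid, values in (0, 1)
    best = int(np.argmax(confidence))
    # No target is returned when even the best region is below the threshold.
    target = best if confidence[best] >= threshold else None
    return target, confidence
```

In use, the instruction and the image regions would be encoded separately and only combined in the decoder, mirroring the connection of both encoder outputs to the decoder input.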
[0107] Optionally, the image generation processing performed on the first image information to obtain the target image information is based on an image processing model, which includes a target feature extraction module, an image reconstruction module, and an image rendering module;
[0108] The input of the target feature extraction module is used to obtain the first image information;
[0109] The output of the target feature extraction module is connected to the input of the image reconstruction module;
[0110] The input of the image rendering module is connected to the output of the image reconstruction module;
[0111] The output of the image rendering module is used to output target image information;
[0112] The target feature extraction module is configured to perform multiple target feature extractions on the target processing object to obtain multiple pieces of target feature information;
[0113] The image reconstruction module is configured to perform target processing on each piece of target feature information to obtain the target reconstruction feature information;
[0114] The image rendering module is configured to render the target reconstructed feature information to obtain the target image information.
[0115] Optionally, the processing module 302 performs multiple target feature extractions on the target processing object to obtain multiple pieces of target feature information, including:
[0116] Obtain candidate image information to be processed;
[0117] Based on the target processing object feature information corresponding to the target processing object of the image information to be processed, detection is performed on the candidate image information to be processed to determine which candidate image information to be processed includes the target processing object;
[0118] Target feature extraction is performed on each piece of candidate image information to be processed that includes the target processing object, to obtain multiple pieces of target feature information.
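The step of keeping only those candidate images that contain the target processing object can be illustrated with a simple similarity filter. Cosine similarity between feature vectors and the `min_similarity` threshold are swapped-in assumptions chosen for brevity; they are not the detection procedure of the original.

```python
import numpy as np

def select_candidates(target_feat, candidate_feats, min_similarity=0.8):
    """Return indices of candidate frames whose descriptor matches the target.

    Assumes every feature vector is non-zero so that it can be normalised.
    """
    target = np.asarray(target_feat, dtype=float)
    cands = np.asarray(candidate_feats, dtype=float)
    target = target / np.linalg.norm(target)
    cands = cands / np.linalg.norm(cands, axis=1, keepdims=True)
    similarity = cands @ target  # cosine similarity per candidate frame
    return [i for i, s in enumerate(similarity) if s >= min_similarity]
```

The selected indices identify the frames from which target features would then be extracted, giving several views of the same object.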
[0119] Optionally, whether the candidate image information to be processed includes the target processing object is determined by a target processing model, which includes an image feature extraction module, a third encoding module, and a second decoding module;
[0120] The input terminal of the image feature extraction module is used to receive candidate image information to be processed;
[0121] The output of the image feature extraction module is connected to the input of the third encoding module;
[0122] The output of the third encoding module is connected to the input of the second decoding module;
[0123] The output of the second decoding module is used to output the target result information;
[0124] The image feature extraction module is configured to extract features from candidate image information to obtain target image feature information;
[0125] The third encoding module is configured to encode the target image feature information to obtain the second image feature encoding information;
[0126] The second decoding module is configured to perform decoding processing based on the second image feature encoding information and the target processing object feature information to obtain the target result information.
[0127] Optionally, the processing module 302 performs target feature extraction on each piece of candidate image information to be processed that includes the target processing object to obtain multiple pieces of target feature information, including:
[0128] Target feature extraction is performed on each piece of candidate image information to be processed that includes the target processing object, to obtain multiple pieces of first candidate feature information;
[0129] Feature extraction processing is performed on the second candidate feature information in each piece of first candidate feature information, to obtain multiple pieces of target feature information.
[0130] Optionally, the processing module 302 performs target processing on each target feature information to obtain target reconstruction feature information, including:
[0131] Encode each piece of target feature information to obtain multiple pieces of first reconstructed feature information;
[0132] Weighting is applied to each piece of first reconstructed feature information and each piece of target feature information to obtain the target reconstructed feature information.
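One possible reading of the encode-then-weight step is a softmax-weighted fusion of per-view features, sketched below. The `tanh` projection used as the encoder, the dot-product score between each encoded feature and its input feature, and the softmax over views are all assumptions; the original does not state how the weights are computed.

```python
import numpy as np

def fuse_views(target_feats, proj):
    """Encode each view's feature, then combine the encodings with softmax weights."""
    feats = np.asarray(target_feats, dtype=float)  # (V, D), one row per view
    encoded = np.tanh(feats @ proj)                # (V, D) stand-in "first reconstructed" features
    scores = np.sum(encoded * feats, axis=1)       # agreement of each encoding with its input
    scores = scores - scores.max()                 # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over views, sums to 1
    fused = weights @ encoded                      # (D,) weighted combination
    return fused, weights
```

With identical views the weights are uniform and the fused feature equals the shared encoding; views whose encoding agrees more strongly with their input receive more weight.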
[0133] Optionally, the processing module 302 performs rendering processing on the target reconstruction feature information to obtain target image information, including:
[0134] Based on the target reconstructed feature information, the density feature information is determined;
[0135] Determine the target pose information to be rendered;
[0136] The target pose information and density feature information are rendered to obtain the target image information.
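How a target pose and a density volume together determine a rendered view can be sketched with a toy orthographic renderer. The function name `render_mask`, the restriction of the pose to a single yaw angle, nearest-neighbour sampling, and the cubic grid are simplifying assumptions; the original does not specify the camera model or the sampling scheme.

```python
import numpy as np

def render_mask(density, yaw_deg, delta=1.0):
    """Orthographic opacity image of a cubic density grid viewed at a yaw angle."""
    density = np.asarray(density, dtype=float)
    n = density.shape[0]
    c = (n - 1) / 2.0
    t = np.deg2rad(yaw_deg)
    # Image-plane coordinates (u, v) and depth d, all centred on the grid.
    axis = np.arange(n) - c
    u, v, d = np.meshgrid(axis, axis, axis, indexing="ij")
    # Rotate the viewing rays about the vertical (v) axis by the yaw angle.
    x = np.cos(t) * u + np.sin(t) * d
    z = -np.sin(t) * u + np.cos(t) * d
    xi = np.rint(x + c).astype(int)
    yi = np.rint(v + c).astype(int)
    zi = np.rint(z + c).astype(int)
    inside = (xi >= 0) & (xi < n) & (zi >= 0) & (zi < n)
    # Samples that fall outside the grid contribute zero density.
    sigma = np.where(inside, density[xi.clip(0, n - 1), yi, zi.clip(0, n - 1)], 0.0)
    # Accumulated opacity along depth: 1 - exp(-sum(sigma * delta)).
    return 1.0 - np.exp(-sigma.sum(axis=2) * delta)
```

Changing `yaw_deg` changes where a given voxel lands in the image, which is the sense in which the pose information and the density information jointly produce the rendered result.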
[0137] This application also provides a terminal device, which includes a processor, a memory, and a computer program stored in the memory and executable on the processor. The processor executes the computer program to implement the steps of any of the image processing methods in this application. The terminal device integrates any of the image processing methods provided in this application. Figure 7 shows a structural schematic diagram of the terminal device involved in the embodiments of this application. Specifically:
[0138] The terminal device may include components such as a processor 401 with one or more processing cores, a memory 402 with one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will understand that the terminal device structure shown in Figure 7 does not constitute a limitation on the terminal device, which may include more or fewer components than shown, combine certain components, or arrange the components differently. Wherein:
[0139] The processor 401 is the control center of the terminal device. It connects various parts of the terminal device via various interfaces and lines, and performs various functions and processes data by running or executing software programs and / or modules stored in the memory 402, and by calling data stored in the memory 402, thereby providing overall monitoring of the terminal device. Optionally, the processor 401 may include one or more processing cores; the processor 401 may be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor. Preferably, the processor 401 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, and application programs, and the modem processor mainly handles wireless communication. It is understood that the aforementioned modem processor may not be integrated into the processor 401.
[0140] The memory 402 can be used to store software programs and modules. The processor 401 executes various functional applications and image processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area. The program storage area may store the operating system, at least one application program required for a function (such as sound playback function, image playback function, etc.), etc.; the data storage area may store data created according to the use of the terminal device, etc. In addition, the memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
[0141] The terminal device also includes a power supply 403 that supplies power to the various components. Preferably, the power supply 403 can be logically connected to the processor 401 through a power management system, thereby enabling functions such as charging, discharging, and power consumption management through the power management system. The power supply 403 may also include one or more DC or AC power supplies, recharging systems, power fault detection circuits, power converters or inverters, power status indicators, and other arbitrary components.
[0142] The terminal device may also include an input unit 404, which can be used to receive input digital or character information, and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
[0143] Although not shown, the terminal device may also include a display unit, etc., which will not be described in detail here. Specifically, in this embodiment, the processor 401 in the terminal device loads the executable files corresponding to the processes of one or more applications into the memory 402 according to the following instructions, and the processor 401 runs the applications stored in the memory 402 to realize various functions, such as:
[0144] Obtain the image information to be processed;
[0145] The image information to be processed is segmented to obtain the first image information;
[0146] The first image information is processed to generate the target image information.
[0147] Those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be performed by instructions, or by instructions controlling related hardware. These instructions can be stored in a computer-readable storage medium and loaded and executed by a processor.
[0148] Therefore, embodiments of this application provide a computer-readable storage medium, which may include: read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disk, etc. A computer program is stored thereon, and the computer program is loaded by a processor to execute the steps in any of the image processing methods provided in embodiments of this application. For example, the computer program loaded by the processor can execute the following steps:
[0149] Obtain the image information to be processed;
[0150] The image information to be processed is segmented to obtain the first image information;
[0151] The first image information is processed to generate the target image information.
[0152] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the detailed descriptions of other embodiments above, which will not be repeated here.
[0153] In practice, each of the above units or structures can be implemented as an independent entity or can be arbitrarily combined to be implemented as the same or several entities. For the specific implementation of each of the above units or structures, please refer to the previous method embodiments, which will not be repeated here.
[0154] For details on the implementation of each of the above operations, please refer to the previous examples, which will not be repeated here.
[0155] The above provides a detailed description of an image processing method and system provided by the embodiments of this application. Specific examples have been used to illustrate the principles and implementation methods of this application. The description of the above embodiments is only for the purpose of helping to understand the method and core ideas of this application. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of this application. Therefore, the content of this specification should not be construed as a limitation of this application.
Claims
1. A method, characterized in that, The method includes: Obtain the image information to be processed; The image information to be processed is subjected to image segmentation processing to obtain the first image information; The first image information is processed to generate the target image information.
2. The method according to claim 1, characterized in that, The image segmentation process performed on the image information to be processed to obtain the first image information is achieved through an image segmentation model; the image segmentation model includes a first encoding module, a second encoding module, and a first decoding module. The input terminal of the first encoding module is used to obtain image segmentation instruction information; The input terminal of the second encoding module is used to acquire the image information to be processed; The output terminals of the first encoding module and the second encoding module are respectively connected to the input terminal of the first decoding module; The first image information is output from the output terminal of the first decoding module; The first encoding module is configured to encode the image segmentation instruction information to obtain instruction encoding feature information; The second encoding module is configured to encode the image information to be processed to obtain first image feature encoding information; The first decoding module is configured to determine the target processing object of the image information to be processed based on the instruction encoding feature information and the first image feature encoding information.
3. The method according to claim 2, characterized in that, The step of determining the target processing object of the image information to be processed based on the instruction encoding feature information and the first image feature encoding information includes: Based on the instruction encoding feature information and the first image feature encoding information, the confidence information of each image region in the image information to be processed is determined; Based on the confidence information corresponding to each of the image regions, the target object in the image region corresponding to the target confidence information in each of the confidence information is determined as the target processing object.
4. The method according to claim 1, characterized in that, The image generation process performed on the first image information to obtain the target image information is based on an image processing model, which includes a target feature extraction module, an image reconstruction module, and an image rendering module. The input terminal of the target feature extraction module is used to obtain the first image information; The output of the target feature extraction module is connected to the input of the image reconstruction module; The input end of the image rendering module is connected to the output end of the image reconstruction module; The output terminal of the image rendering module is used to output the target image information; The target feature extraction module is configured to perform multiple target feature extractions on the target processing object to obtain multiple target feature information; The image reconstruction module is configured to perform target processing on each of the target feature information to obtain target reconstruction feature information; The image rendering module is configured to render the target reconstructed feature information to obtain target image information.
5. The method according to claim 4, characterized in that, The process of extracting target features from the target object multiple times yields multiple target feature information, including: Obtain candidate image information; Based on the target processing object feature information corresponding to the target processing object of the image information to be processed, the candidate image information to be processed is detected to determine the candidate image information to be processed that includes the target processing object; Target features are extracted from each candidate image information that includes the target processing object to obtain multiple target feature information.
6. The method according to claim 5, characterized in that, Whether the candidate image information to be processed includes the target processing object is obtained based on the target processing model, which includes an image feature extraction module, a third encoding module, and a second decoding module. The input terminal of the image feature extraction module is used to receive the candidate image information to be processed; The output of the image feature extraction module is connected to the input of the third encoding module; The output of the third encoding module is connected to the input of the second decoding module; The output of the second decoding module is used to output the target result information; The image feature extraction module is configured to extract features from the candidate image information to obtain target image feature information; The third encoding module is configured to encode the target image feature information to obtain second image feature encoding information; The second decoding module is configured to perform decoding processing based on the second image feature encoding information and the target processing object feature information to obtain the target result information.
7. The method according to claim 5, characterized in that, The step of extracting target features from each candidate image information including the target processing object yields multiple target feature information, including: Target features are extracted from each candidate image information that includes the target processing object to obtain multiple first candidate feature information; Feature extraction processing is performed on the second candidate feature information in each of the first candidate feature information to obtain multiple target feature information.
8. The method according to claim 4, characterized in that, The step of processing the target feature information to obtain target reconstruction feature information includes: The target feature information is encoded to obtain multiple first reconstructed feature information; Weighting is applied to each of the first reconstructed feature information and each of the target feature information to obtain the target reconstructed feature information.
9. The method according to claim 4, characterized in that, The rendering process of the reconstructed feature information of the target to obtain target image information includes: Based on the target reconstructed feature information, the density feature information is determined; Determine the pose information of the target to be rendered; The target pose information and the density feature information are rendered to obtain target image information.
10. A system, characterized in that, The system includes: The acquisition module is used to acquire information about the image to be processed. The processing module is used to perform image segmentation processing on the image information to be processed to obtain first image information; The processing module is also used to perform image generation processing on the first image information to obtain target image information;
Optionally, the image segmentation processing of the image information to be processed to obtain the first image information is implemented by an image segmentation model; the image segmentation model includes a first encoding module, a second encoding module, and a first decoding module; The input terminal of the first encoding module is used to obtain image segmentation instruction information; The input terminal of the second encoding module is used to acquire the image information to be processed; The output terminals of the first encoding module and the second encoding module are respectively connected to the input terminal of the first decoding module; The first image information is output from the output terminal of the first decoding module; The first encoding module is configured to encode the image segmentation instruction information to obtain instruction encoding feature information; The second encoding module is configured to encode the image information to be processed to obtain first image feature encoding information; The first decoding module is configured to determine the target processing object of the image information to be processed based on the instruction encoding feature information and the first image feature encoding information;
Optionally, the processing module determines the target processing object of the image information to be processed based on the instruction encoding feature information and the first image feature encoding information, including: Based on the instruction encoding feature information and the first image feature encoding information, the confidence information of each image region in the image information to be processed is determined; Based on the confidence information corresponding to each of the image regions, the target object in the image region corresponding to the target confidence information in each of the confidence information is determined as the target processing object;
Optionally, the image generation process performed on the first image information to obtain the target image information is based on an image processing model, which includes a target feature extraction module, an image reconstruction module, and an image rendering module. The input terminal of the target feature extraction module is used to obtain the first image information; The output of the target feature extraction module is connected to the input of the image reconstruction module; The input end of the image rendering module is connected to the output end of the image reconstruction module; The output terminal of the image rendering module is used to output the target image information; The target feature extraction module is configured to perform multiple target feature extractions on the target processing object to obtain multiple target feature information; The image reconstruction module is configured to perform target processing on each of the target feature information to obtain target reconstruction feature information; The image rendering module is configured to render the target reconstructed feature information to obtain target image information;
Optionally, the processing module performs multiple target feature extractions on the target processing object to obtain multiple target feature information, including: Obtain candidate image information; Based on the target processing object feature information corresponding to the target processing object of the image information to be processed, the candidate image information to be processed is detected to determine the candidate image information to be processed that includes the target processing object; Target feature extraction is performed on each candidate image information that includes the target processing object to obtain multiple target feature information;
Optionally, whether the candidate image information to be processed includes the target processing object is obtained based on the target processing model, which includes an image feature extraction module, a third encoding module, and a second decoding module. The input terminal of the image feature extraction module is used to receive the candidate image information to be processed; The output of the image feature extraction module is connected to the input of the third encoding module; The output of the third encoding module is connected to the input of the second decoding module; The output of the second decoding module is used to output the target result information; The image feature extraction module is configured to extract features from the candidate image information to obtain target image feature information; The third encoding module is configured to encode the target image feature information to obtain second image feature encoding information; The second decoding module is configured to perform decoding processing based on the second image feature encoding information and the target processing object feature information to obtain the target result information;
Optionally, the processing module performs target feature extraction on each candidate image information including the target processing object to obtain multiple target feature information, including: Target features are extracted from each candidate image information that includes the target processing object to obtain multiple first candidate feature information; Feature extraction processing is performed on the second candidate feature information in each of the first candidate feature information to obtain multiple target feature information;
Optionally, the processing module performs target processing on each of the target feature information to obtain target reconstruction feature information, including: The target feature information is encoded to obtain multiple first reconstructed feature information; Weighting is applied to each of the first reconstructed feature information and each of the target feature information to obtain the target reconstructed feature information;
Optionally, the processing module performs rendering processing on the target reconstruction feature information to obtain target image information, including: Based on the target reconstructed feature information, the density feature information is determined; Determine the pose information of the target to be rendered; The target pose information and the density feature information are rendered to obtain target image information.
11. A terminal device, characterized in that, The terminal device includes a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the method according to any one of claims 1 to 9.
12. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a computer program that is executed by a processor to perform the steps of the method according to any one of claims 1 to 9.