Dynamic point cloud color attribute enhancement method and device for temporal consistency
By analyzing the temporal consistency of adjacent point cloud frames with a three-step cube spatiotemporal search and a convolutional point cloud long short-term memory network, the method addresses uneven color attributes of dynamic point clouds across multiple frames and thereby improves the video display effect.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: SHENZHEN INST OF ADVANCED TECH CHINESE ACAD OF SCI
- Filing Date: 2024-12-18
- Publication Date: 2026-06-19
AI Technical Summary
Existing technologies lack temporal consistency modeling in enhancing the color attributes of dynamic point clouds with compression distortion, resulting in color attributes jumping or having uneven transitions between multiple frames, which affects the video display effect.
By acquiring point cloud blocks from adjacent point cloud frames, and utilizing a three-step cube spatiotemporal search module, a single-frame feature extraction module, and a convolutional point cloud long short-term memory network, the temporal dependency characteristics between point cloud frames are analyzed, features are fused and adjusted, and the color attributes of the point cloud blocks are optimized.
The method achieves a smooth transition of point cloud color attributes between multiple frames, improving the video display effect and quality.
Abstract
Description
Technical Field
[0001] This application relates to the field of computer information technology, and more specifically, to a method and apparatus for enhancing the color attributes of dynamic point clouds with temporal consistency.
Background Technology
[0002] Existing technologies lack temporal consistency modeling when enhancing the color attributes of dynamic point clouds with compression distortion. They typically process only single-frame (static) point clouds and ignore the temporal correlation between frames. As a result, color attributes can jump or transition unevenly between frames, which degrades video display quality.
Summary of the Invention
[0003] The embodiments of this application provide a method and apparatus for enhancing the dynamic point cloud color attributes with temporal consistency. It can analyze point cloud data within a frame and analyze the temporal consistency of point cloud data in adjacent frames, thereby making the transition of color attributes between multiple frames smoother and improving the video display effect.
[0004] The technical solution is as follows:
[0005] In a first aspect, this application provides a method for enhancing the color attributes of dynamic point clouds with temporal consistency. The method includes: acquiring a first point cloud frame and a second point cloud frame that are temporally adjacent, and determining a first point cloud block within the first point cloud frame; based on the first point cloud block, determining a second point cloud block within the second point cloud frame corresponding to the first point cloud block; performing feature extraction based on the first and second point cloud blocks to determine a first high-dimensional feature and a second high-dimensional feature, wherein the feature extraction of the first and second point cloud blocks includes color feature extraction; inputting the first and second high-dimensional features into a convolutional point cloud long short-term memory network to analyze the temporal dependency characteristics between the first and second high-dimensional features, and obtaining output features; determining information to be fused based on the output features, and fusing the first point cloud block of the first point cloud frame with the information to be fused to obtain a processed first point cloud block, thereby determining the fused first point cloud frame.
[0006] Furthermore, determining the first point cloud block within the first point cloud frame includes: generating sampling points within the first point cloud frame based on the farthest point sampling algorithm, and generating the first point cloud block based on the k-nearest neighbor algorithm.
[0007] Furthermore, the second point cloud frame includes the point cloud frame immediately preceding the first point cloud frame and the point cloud frame two frames before it; the step of determining the second point cloud block corresponding to the first point cloud block within the second point cloud frame based on the first point cloud block includes: determining the search starting point of the three-dimensional search space based on the first point cloud block, and determining the set step size corresponding to the search starting point; performing a search based on the search starting point and the set step size to determine candidate points, and constructing a three-dimensional search space composed of the candidate points; performing similarity calculation between the first point cloud block and the point cloud blocks corresponding to the candidate points in the three-dimensional search space to determine the second point cloud block corresponding to the first point cloud block within the second point cloud frame.
[0008] Furthermore, the step of determining the search starting point of the three-dimensional search space based on the first point cloud block and determining the set step size corresponding to the search starting point includes: taking the center point of the first point cloud block as the search starting point of the three-dimensional search space and taking the first step size as the set step size; taking the candidate point of the point cloud block that matches the first point cloud block in the three-dimensional search space as the starting point and taking the second step size as the set step size, wherein the length of the second step size is shorter than the length of the first step size.
[0009] Furthermore, the step of extracting features based on the first point cloud block and the second point cloud block to determine the first high-dimensional feature and the second high-dimensional feature includes: obtaining the distance information and normal vectors corresponding to the first point cloud block and the second point cloud block; and extracting features based on the first point cloud block, the second point cloud block, the distance information, and the normal vectors to determine the first high-dimensional feature and the second high-dimensional feature.
[0010] Furthermore, the step of determining the information to be fused based on the output features includes: processing the first high-dimensional features and output features of the first point cloud block based on the fully connected layer to determine the information to be fused.
[0011] Furthermore, the method also includes: comparing the first point cloud frame with the fused first point cloud frame to determine the spatial error; comparing the first point cloud frame with the second point cloud frame to determine the temporal consistency; and determining the adjustment amount corresponding to the convolutional point cloud long short-term memory network based on the spatial error and temporal consistency, so as to adjust the convolutional point cloud long short-term memory network.
[0012] In a second aspect, this application provides a device for enhancing the color attributes of dynamic point clouds with temporal consistency, the device comprising: a point cloud frame acquisition module, used to acquire a first point cloud frame and a second point cloud frame that are temporally adjacent, and to determine a first point cloud block within the first point cloud frame; a point cloud block matching module, used to determine, based on the first point cloud block, a second point cloud block within the second point cloud frame corresponding to the first point cloud block; a high-dimensional feature extraction module, used to perform feature extraction based on the first and second point cloud blocks to determine a first high-dimensional feature and a second high-dimensional feature, wherein the feature extraction of the first and second point cloud blocks includes color feature extraction; an output feature acquisition module, used to input the first and second high-dimensional features into a convolutional point cloud long short-term memory network to analyze the temporal dependency characteristics between them and obtain output features; and a point cloud frame fusion module, used to determine the information to be fused based on the output features, and to fuse the first point cloud block of the first point cloud frame with the information to be fused to obtain the processed first point cloud block, thereby determining the fused first point cloud frame.
[0013] In a third aspect, this application provides a network device, including: a memory, a transceiver, and a processor; wherein the memory stores a computer program; the transceiver sends and receives data under the control of the processor; and the processor reads the computer program in the memory and executes the method described in the first aspect.
[0014] In a fourth aspect, this application provides a storage medium having a computer program stored thereon, which, when executed by a processor, implements the method described in the first aspect.
[0015] The beneficial effects of the technical solution provided in this application are:
[0016] The solution presented in this application can be applied to video processing scenarios. It can analyze point cloud data within a frame and analyze the temporal consistency of point cloud data between adjacent frames, thereby making the transition of color attributes smoother between multiple frames and improving video quality. Specifically, this solution can acquire a first point cloud frame to be processed and a second point cloud frame that is temporally related to the first point cloud frame. The second point cloud frame can contain multiple frames, and can be one or two frames before the first point cloud frame, or a frame after the first point cloud frame. This scheme divides a first point cloud frame into multiple first point cloud blocks. It then searches and analyzes the similarity between these blocks to identify related second point cloud blocks within a second point cloud frame. Single-frame feature extraction is performed on these first and second point cloud blocks to determine first and second high-dimensional features, including color feature extraction. These first and second high-dimensional features are then input into a convolutional point cloud long short-term memory network to analyze their temporal dependencies, yielding output features. Based on these output features, information to be fused is determined, and the first point cloud blocks from the first point cloud frame are fused with this information to obtain processed first point cloud blocks, thus defining the fused first point cloud frame. This scheme analyzes the temporal consistency between point cloud frames based on adjacent frames, thereby adjusting and optimizing the point cloud blocks within each frame, ultimately improving video quality.
Attached Figure Description
[0017] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments of this application will be briefly introduced below.
[0018] Figure 1 is a schematic diagram illustrating the steps of a method for enhancing the color attributes of dynamic point clouds with temporal consistency according to an embodiment of this application;
[0019] Figure 2 is a schematic diagram of the matching process between the first point cloud block and the second point cloud block according to an embodiment of this application;
[0020] Figure 3 is a flowchart of a method for enhancing the color attributes of dynamic point clouds with temporal consistency according to an embodiment of this application;
[0021] Figure 4 is a schematic structural diagram of a device for enhancing the color attributes of dynamic point clouds with temporal consistency according to an embodiment of this application;
[0022] Figure 5 is a structural block diagram of a network device according to an embodiment of this application;
[0023] Figure 6 is a structural block diagram of a user equipment according to an embodiment of this application.
Detailed Implementation
[0024] The embodiments of this application are described in detail below. Examples of these embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals identify the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and are only used to explain this application, and should not be construed as limiting this application.
[0025] Those skilled in the art will understand that, unless specifically stated otherwise, the singular forms “a,” “an,” and “the” used herein may also include the plural forms, while “a plurality” refers to two or more, and other quantifiers are understood similarly. It should be further understood that the word “comprising” as used in the specification of this application means the presence of the stated feature, integer, step, operation, element, and/or component, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when an element is said to be “connected” or “coupled” to another element, it can be directly connected or coupled to the other element, or intermediate elements may be present. Furthermore, “connected” or “coupled” as used herein can include wireless connection or wireless coupling. The term “and/or” as used herein describes the relationship between related objects and indicates that three relationships can exist; for example, A and/or B can represent: A alone, both A and B, and B alone. The character “/” generally indicates that the preceding and following related objects are in an “or” relationship.
[0026] The solution presented in this application can be applied to video processing scenarios. It can analyze point cloud data within a frame and analyze the temporal consistency of point cloud data between adjacent frames, thereby making the transition of color attributes smoother between multiple frames and improving video quality. Specifically, this solution can acquire a first point cloud frame to be processed and a second point cloud frame that is temporally related to the first point cloud frame. The second point cloud frame can contain multiple frames, and can be one or two frames before the first point cloud frame, or a frame after the first point cloud frame. This scheme divides the first point cloud frame into multiple first point cloud blocks. It then searches for and identifies multiple candidate point cloud blocks to analyze their similarity, determining the second point cloud blocks within the second point cloud frame that are related to the first point cloud blocks. Single-frame feature extraction is then performed based on the first and second point cloud blocks to determine first and second high-dimensional features, including color feature extraction. These first and second high-dimensional features are then input into a convolutional point cloud long short-term memory network to analyze their temporal dependencies, yielding output features. Based on these output features, information to be fused is determined, and the first point cloud blocks of the first point cloud frame are fused with this information to obtain the processed first point cloud block, thus determining the fused first point cloud frame. This scheme analyzes the temporal consistency between point cloud frames based on adjacent frames, thereby adjusting and optimizing the point cloud blocks within each frame, ultimately improving video quality.
[0027] This application proposes a temporally consistent method for enhancing the color attributes of dynamic point clouds. To address the distortion and temporal inconsistency caused by quantization during dynamic point cloud compression, it employs a three-step cube Spatial-Temporal Search (STS) module, a Single Frame Feature Extraction (SFFE) module (or single-frame feature extraction network), and a Convolutional Point Cloud Long Short-Term Memory (Conv-Point LSTM) network. The Conv-Point LSTM network can also be referred to as a three-dimensional long short-term memory network. The single-frame feature extraction network can be replaced with a deep-learning-based 3D neural network (such as PointNet or Point Transformer). The STS module achieves feature alignment by adaptively searching point cloud patches in the temporal domain. The SFFE module combines multi-head attention and graph convolution to extract latent color features from single-frame point clouds. The Conv-Point LSTM network captures spatiotemporal dependencies through convolution and LSTM mechanisms to enhance the consistency of color attributes and visual quality.
[0028] As shown in Figure 1, dt is the current point cloud frame, and dt-1 and dt-2 are the previous frame and the frame two frames before the current one, respectively. Taking the current point cloud block Pt to be enhanced as the reference and combining it with the point cloud blocks Pt-1 and Pt-2 of the previous two frames, the color attributes of the dynamic point cloud are progressively enhanced through the three-step cube spatiotemporal search (STS) module, the single-frame feature extraction (SFFE) module, and the convolutional point cloud long short-term memory (Conv-Point LSTM) network.
[0029] The main process of this scheme is as follows. A dynamic point cloud frame (distorted point cloud) is input and divided into multiple small blocks, each containing a fixed number of points. Using farthest point sampling (FPS), m sampling points are selected as center points, and point cloud blocks are generated around them using the k-nearest neighbor (KNN) algorithm, which completes the initial division of the point cloud. For the reference point cloud block, the STS module is used to match the corresponding point cloud block in the preceding frame in the time domain. First, a spatiotemporal search is performed starting from the center of the reference point cloud block to construct a three-dimensional search space (which can be a cube, a sphere, or another shape). This scheme uses a three-dimensional cube search space and constructs 9 point cloud blocks, one centered on the centroid of the cube and one on each of its vertices. The point cloud structural similarity metric (Point SSIM) score is calculated for each, the point cloud block with the highest score is selected, and the center of this block is used as the starting point for the next search. At the same time, the search step size is halved. This process is repeated 3 times to obtain the optimal matching point cloud block.
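The initial division described in this paragraph (FPS center selection followed by KNN block generation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: the function names, the brute-force neighbor search, the choice of index 0 as the first sample, and the demo values of m and k are all hypothetical.

```python
import math
import random

def farthest_point_sampling(points, m, start=0):
    """Pick m indices so that each new pick is the point farthest from those already chosen."""
    chosen = [start]
    # min_dist[i] = distance from point i to its nearest already-chosen center
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen

def knn_indices(points, center, k):
    """Indices of the k points nearest to `center` (brute force, assumed for brevity)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], center))
    return order[:k]

def split_into_blocks(points, m, k):
    """FPS picks m block centers; KNN gathers a fixed number k of points per block."""
    centers = farthest_point_sampling(points, m)
    return [knn_indices(points, points[c], k) for c in centers]

# Demo on a synthetic cloud; m=8 and k=32 are arbitrary illustration values.
rng = random.Random(0)
cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
blocks = split_into_blocks(cloud, m=8, k=32)
print(len(blocks), len(blocks[0]))  # 8 blocks of 32 point indices each
```

Because every block holds exactly k indices, neighboring blocks may overlap, and some points may belong to no block when m·k is small relative to the cloud size.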
[0030] The baseline point cloud patch and its matching preceding frame point cloud patches (patch set) are input into the SFFE module, where high-dimensional features are extracted using multi-head attention and graph convolution. During this process, normal vectors and distance maps are used as geometric inputs to further optimize the feature representation. The extracted high-dimensional features are then fed into the Conv-Point LSTM module, which combines feature information from the current frame and the previous two frames to model the spatiotemporal dependencies of the point cloud. The state update process is optimized through convolution operations and max pooling. The features output from the Conv-Point LSTM are fused through a fully connected layer, and the enhanced features are added to the original features of the baseline point cloud patch using a residual structure. The enhanced point cloud patches are then stitched together to reconstruct a complete point cloud frame (patch fusion), completing the color attribute enhancement.
[0031] Figure 2 shows the implementation process of the STS module. Point cloud frames are sampled to generate sampling points and point cloud blocks (such as a reference patch set). Then, in the baseline point cloud block Pt(S0), point S0(x0, y0, z0) is selected as the search starting point, and a cube centered on S0 containing 27 candidate points is constructed. Candidate point cloud blocks are generated using the KNN algorithm, similarity is calculated, and the best matching point S1(x1, y1, z1) is selected. The search range is then narrowed with S1 as the center, and the search process is repeated to determine candidate points and candidate point cloud blocks (candidate patch sets) from the previous point cloud frame for similarity matching. Finally, the optimal matching point cloud block Pt-1(S3) is obtained. The SFFE module uses two parallel attention modules to initially extract point cloud features, and then connects another attention module in series to further integrate local features. Geometric information (normal vectors and distance maps) is combined with the point cloud features to enrich the feature dimensions and strengthen the representation capability of the graph convolution. The Conv-Point LSTM module takes the point cloud block features of the current frame and historical frames as input, replaces element-wise multiplication with convolutional layer operations, and combines these with max pooling to model the point cloud block features in a unified way, outputting optimized spatiotemporal features.
[0032] Specifically, the processing steps of this scheme include: dividing the baseline point cloud into small blocks containing a fixed number of points; using a three-step cube spatiotemporal search (STS) module to find the optimal matching preceding point cloud block in the temporal domain; extracting high-dimensional features of each point cloud block through a single-frame feature extraction (SFFE) module, which uses a multi-head attention mechanism combined with graph convolution for feature extraction, and further optimizes local features through normal vectors and distance maps; inputting the extracted high-dimensional features into a convolutional point cloud long short-term memory network (Conv-Point LSTM) to capture the spatiotemporal dependency characteristics of the point cloud; fusing the features output by the Conv-Point LSTM, predicting the residual through a fully connected layer, and adding it to the original color of the baseline point cloud block to obtain the enhanced point cloud block; and stitching all the enhanced point cloud blocks together to restore the complete point cloud frame, thus completing the color attribute enhancement.
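The last two steps above (adding the predicted residual to the original color of each block, then stitching the blocks back into a frame) can be sketched as follows. The text does not say how points covered by several blocks, or by none, are handled. This sketch assumes that overlapping predictions are averaged, that uncovered points keep their input color, and that colors are normalized to [0, 1]; the function name `fuse_blocks` is hypothetical.

```python
def fuse_blocks(colors, blocks, residuals):
    """
    colors:    list of [r, g, b] for every point of the distorted frame, in [0, 1]
    blocks:    list of index lists, one per point cloud block
    residuals: residuals[b][j] is the predicted RGB residual for point blocks[b][j]
    """
    n = len(colors)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    hits = [0] * n
    for idx, res in zip(blocks, residuals):
        for i, r in zip(idx, res):
            for c in range(3):
                acc[i][c] += colors[i][c] + r[c]   # residual structure: original + residual
            hits[i] += 1
    fused = []
    for i in range(n):
        if hits[i] == 0:
            fused.append(list(colors[i]))          # assumption: uncovered point is left unchanged
        else:
            # assumption: average overlapping predictions, then clamp to the valid range
            fused.append([min(1.0, max(0.0, v / hits[i])) for v in acc[i]])
    return fused

# Point 1 is covered by two blocks, point 2 by none.
frame_colors = [[0.5, 0.5, 0.5], [0.2, 0.2, 0.2], [0.9, 0.9, 0.9]]
print(fuse_blocks(frame_colors, [[0, 1], [1]],
                  [[[0.1, 0.0, 0.0], [0.2, 0.0, 0.0]], [[0.0, 0.0, 0.0]]]))
```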
[0033] The three-step cube spatiotemporal search includes: generating sampling points using the Farthest Point Sampling (FPS) algorithm and generating point cloud blocks using the k-Nearest Neighbor (KNN) algorithm; constructing a three-dimensional cube search space starting from the center point of the reference point cloud block, gradually narrowing the search range, and selecting the optimal matching point cloud block through similarity calculation (Point SSIM). The steps for constructing the 3D cube search space include: taking the center point S0(x0, y0, z0) of the reference point cloud block as the starting point of the 3D cube search space; constructing the 3D cube search space with S0 as the center, where the candidate coordinates are calculated from unit offsets Δx, Δy, Δz ∈ {-1, 0, 1} and comprise the 8 vertices and the centroid, for a total of 9 candidate points; for each candidate point C0(x0+sΔx, y0+sΔy, z0+sΔz), where s represents the search step size, generating a candidate point cloud block containing n points using the k-nearest neighbor (KNN) algorithm; selecting, based on the point cloud similarity matching criterion, the center point of the current best point cloud block as the starting point for the next search, while halving the search step size s; and repeating the above process 3 times.
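The search loop in this paragraph can be sketched as below, using the 9-candidate reading (centroid plus 8 vertices). Two things are assumptions and not part of this application: the `similarity` function is a hypothetical stand-in based on mean color difference, since the Point SSIM formula is not given here, and all function names and the demo grid are illustrative.

```python
import math
from itertools import product

def knn_block(frame, center, k):
    """k points of `frame` nearest to `center`; each point is (x, y, z, r, g, b)."""
    return sorted(frame, key=lambda p: math.dist(p[:3], center))[:k]

def mean_color(block):
    return tuple(sum(p[c] for p in block) / len(block) for c in (3, 4, 5))

def similarity(ref_block, cand_block):
    """Hypothetical stand-in for Point SSIM: higher means more similar."""
    return -math.dist(mean_color(ref_block), mean_color(cand_block))

def three_step_search(ref_block, ref_center, prev_frame, step, k, rounds=3):
    """Coarse-to-fine cube search: 9 candidates per round, step halved each round."""
    # Centroid first, then the 8 cube vertices (offsets of -1 or +1 on every axis).
    offsets = [(0, 0, 0)] + list(product((-1, 1), repeat=3))
    center, best_block = ref_center, None
    for _ in range(rounds):
        best_score, best_center = -math.inf, center
        for dx, dy, dz in offsets:
            cand = (center[0] + step * dx, center[1] + step * dy, center[2] + step * dz)
            block = knn_block(prev_frame, cand, k)
            score = similarity(ref_block, block)
            if score > best_score:
                best_score, best_center, best_block = score, cand, block
        center = best_center      # best candidate becomes the next starting point
        step /= 2                 # narrow the search range
    return center, best_block

# Demo: a 5x5x5 grid whose color follows position, used as the previous frame.
grid = [(float(x), float(y), float(z), x / 4, y / 4, z / 4)
        for x in range(5) for y in range(5) for z in range(5)]
target = knn_block(grid, (3.0, 3.0, 3.0), 8)
found_center, found_block = three_step_search(target, (2.0, 2.0, 2.0), grid, step=1.0, k=8)
print(found_center, len(found_block))
```

With three rounds and halving, the matched center can move at most s + s/2 + s/4 = 1.75·s from the starting point along each axis, at a cost of 27 similarity evaluations in total.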
[0034] The steps for constructing spatiotemporal features include: a single-frame feature extraction network extracts local spatial features within a frame and uses point cloud geometric information to guide color feature extraction; a convolutional point cloud long short-term memory network models temporal dependencies, fusing features from the current frame and historical frames while preserving the spatial structure of the point cloud, thus modeling spatiotemporal features. The loss function used for optimization has two terms: a mean squared error (MSE) term L1 evaluates the spatial error in color attributes between the enhanced and original point clouds, and a temporal error term L2 evaluates the consistency of color changes in point clouds from adjacent frames. The final loss function is their weighted sum: L = L1 + λL2, where λ is the weight parameter. Based on this loss function, the scheme can adjust the parameters of the single-frame feature extraction module, the convolutional point cloud long short-term memory network, and the fully connected layers using the spatial error and the temporal consistency.
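The weighted sum L = L1 + λL2 can be illustrated with a small numeric sketch. L1 is the MSE named in the text. The exact form of L2 is not given, so this sketch assumes it is also an MSE, taken between the enhanced colors of corresponding points in adjacent frames. The point-to-point correspondence, the function names, and the default λ = 0.5 are all assumptions.

```python
def mse(a, b):
    """Mean squared error over two equally sized lists of RGB triples."""
    total = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return total / (3 * len(a))

def total_loss(enhanced_t, original_t, enhanced_prev, lam=0.5):
    l1 = mse(enhanced_t, original_t)      # spatial error: enhanced vs. original colors
    l2 = mse(enhanced_t, enhanced_prev)   # assumed temporal term: adjacent-frame color change
    return l1 + lam * l2

# One point per frame: L1 = 3/3 = 1, L2 = 4/3, so L = 1 + 0.5 * 4/3.
print(total_loss([(1.0, 1.0, 1.0)], [(0.0, 0.0, 0.0)], [(1.0, 1.0, 3.0)]))
```

A larger λ favors smooth color transitions between frames at the expense of per-frame fidelity, which is the trade-off the weight parameter controls.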
[0035] Based on the above embodiments, this application also provides a method for enhancing the color attributes of dynamic point clouds with temporal consistency. As shown in Figure 3, the method includes:
[0036] Step 102: Obtain a first point cloud frame and a second point cloud frame that are temporally adjacent, and determine the first point cloud block within the first point cloud frame.
Step 104: Based on the first point cloud block, determine the second point cloud block within the second point cloud frame that corresponds to the first point cloud block.
Step 106: Perform feature extraction based on the first and second point cloud blocks to determine the first high-dimensional feature and the second high-dimensional feature. Feature extraction for the first and second point cloud blocks includes color feature extraction.
Step 108: Input the first and second high-dimensional features into a convolutional point cloud long short-term memory network to analyze the temporal dependency between the first and second high-dimensional features, obtaining the output features.
Step 110: Determine the information to be fused based on the output features, and fuse the first point cloud block of the first point cloud frame with the information to be fused to obtain the processed first point cloud block, thereby determining the fused first point cloud frame.
Specifically, as an optional embodiment, determining the information to be fused based on the output features includes: processing the first high-dimensional feature and the output feature of the first point cloud block based on a fully connected layer to determine the information to be fused.
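The optional embodiment above, in which a fully connected layer processes the first high-dimensional feature together with the output feature, can be sketched per point as follows. How the two features are combined is not specified. This sketch assumes they are concatenated and mapped to a 3-channel color residual; the weights shown are placeholders, not trained parameters, and the function names are hypothetical.

```python
def fully_connected(x, weights, bias):
    """y = W x + b for one input vector; `weights` has one row per output channel."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def info_to_fuse(first_feature, output_feature, weights, bias):
    """Assumed design: concatenate both per-point features, then map them to an RGB residual."""
    return fully_connected(list(first_feature) + list(output_feature), weights, bias)

# Toy sizes: 2-dim single-frame feature, 1-dim Conv-Point LSTM output, 3 output channels.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
b = [0.0, 0.0, 0.5]
print(info_to_fuse([1.0, 2.0], [3.0], W, b))
```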
[0037] The solution presented in this application can be applied to video processing scenarios. It can analyze point cloud data within a frame and analyze the temporal consistency of point cloud data between adjacent frames, thereby making the transition of color attributes smoother between multiple frames and improving video quality. Specifically, this solution can acquire a first point cloud frame to be processed and a second point cloud frame that is temporally related to the first point cloud frame. The second point cloud frame can contain multiple frames, and can be one or two frames before the first point cloud frame, or a frame after the first point cloud frame. This scheme divides the first point cloud frame into multiple first point cloud blocks. It then searches for and identifies multiple candidate point cloud blocks to analyze their similarity, determining the second point cloud blocks within the second point cloud frame that are related to the first point cloud blocks. Single-frame feature extraction is then performed based on the first and second point cloud blocks to determine first and second high-dimensional features, including color feature extraction. These first and second high-dimensional features are then input into a convolutional point cloud long short-term memory network to analyze their temporal dependencies, yielding output features. Based on these output features, information to be fused is determined, and the first point cloud blocks of the first point cloud frame are fused with this information to obtain the processed first point cloud block, thus determining the fused first point cloud frame. This scheme analyzes the temporal consistency between point cloud frames based on adjacent frames, thereby adjusting and optimizing the point cloud blocks within each frame, ultimately improving video quality.
[0038] This scheme can generate sampling points based on point cloud frames and perform cluster analysis to determine multiple first point cloud blocks. Specifically, as an optional embodiment, determining the first point cloud blocks within the first point cloud frame includes: generating sampling points within the first point cloud frame based on the farthest point sampling algorithm, and generating first point cloud blocks based on the k-nearest neighbor algorithm. This scheme can perform a search based on the first point cloud blocks to construct a three-dimensional search space, and generate candidate point cloud blocks for the candidate points in the three-dimensional search space, so that the first point cloud blocks can be matched against the candidate point cloud blocks to determine the second point cloud blocks. Specifically, as an optional embodiment, the second point cloud frame includes the point cloud frame immediately preceding the first point cloud frame and the point cloud frame two frames before it; the step of determining the second point cloud block corresponding to the first point cloud block in the second point cloud frame based on the first point cloud block includes: determining the search starting point of the three-dimensional search space based on the first point cloud block, and determining the set step size corresponding to the search starting point; performing a search based on the search starting point and the set step size to determine candidate points, and constructing a three-dimensional search space composed of the candidate points; performing similarity calculation between the first point cloud block and the point cloud blocks corresponding to the candidate points in the three-dimensional search space to determine the second point cloud block corresponding to the first point cloud block in the second point cloud frame.
[0039] This scheme can iterate the search for candidate points multiple times. Specifically, as an optional embodiment, determining the starting point of the 3D search space based on the first point cloud block and determining the set step size corresponding to the starting point includes: using the center point of the first point cloud block as the starting point of the 3D search space and using the first step size as the set step size; using the candidate point of the point cloud block matching the first point cloud block in the 3D search space as the starting point and using the second step size as the set step size, where the length of the second step size is shorter than the length of the first step size. This scheme continuously updates the starting point and step size of the 3D search space. For example, it can search three times, with the starting point of the 3D search space each time being the best candidate point corresponding to the first point cloud block, and the set step size for each search being half of the set step size for the previous search.
[0040] This scheme can also incorporate additional dimensions of information into the single-frame features to optimize them. Specifically, as an optional embodiment, performing feature extraction based on the first and second point cloud blocks to determine the first and second high-dimensional features includes: obtaining the distance information and normal vectors corresponding to the first and second point cloud blocks; and performing feature extraction based on the first point cloud block, the second point cloud block, the distance information, and the normal vectors to determine the first and second high-dimensional features. When the single-frame feature extraction network extracts features from the first or second point cloud block, it can also combine the distance information (distance map) and normal vectors in its analysis to optimize the feature extraction results.
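One way to combine these inputs before feature extraction is sketched below. This is an assumption-laden illustration: the source does not define the distance information precisely, so the sketch takes it to be each point's Euclidean distance to the block center, and it assumes the normal vectors have already been estimated elsewhere. The function name and the 10-value row layout are hypothetical.

```python
import math


def assemble_point_inputs(xyz, rgb, normals, center):
    """Build one input row per point: relative position (3), color (3),
    distance to the block center (1), and normal vector (3)."""
    rows = []
    for p, c, n in zip(xyz, rgb, normals):
        rel = tuple(a - b for a, b in zip(p, center))
        # Assumed meaning of "distance information": distance to the center.
        dist = math.sqrt(sum(v * v for v in rel))
        rows.append(rel + tuple(c) + (dist,) + tuple(n))
    return rows
```

Each row could then be passed to a single-frame feature extraction network; the network itself is outside the scope of this sketch.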
[0041] This solution can analyze the spatial error and temporal consistency of the optimized point cloud frames to adjust the model in conjunction with the loss function. Specifically, as an optional embodiment, the method further includes: comparing the first point cloud frame with the fused first point cloud frame to determine the spatial error; comparing the first point cloud frame and the second point cloud frame to determine the temporal consistency; and determining the adjustment amount corresponding to the convolutional point cloud long short-term memory network based on the spatial error and temporal consistency, so as to adjust the convolutional point cloud long short-term memory network.
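A loss of this kind can be sketched as a weighted sum of the two terms. The source does not give the formula, so everything here is an assumption: both terms are mean squared color errors, the temporal term compares the fused first-frame colors with the colors of the matched second-frame block, and `temporal_weight` is a hypothetical hyperparameter.

```python
def mean_squared_error(a, b):
    """Mean squared difference over all color channels of paired points."""
    total, count = 0.0, 0
    for color_a, color_b in zip(a, b):
        for va, vb in zip(color_a, color_b):
            total += (va - vb) ** 2
            count += 1
    return total / count


def training_loss(first_frame_colors, fused_colors, matched_second_colors,
                  temporal_weight=0.5):
    """Assumed loss: spatial error plus weighted temporal consistency term."""
    # Spatial error: first point cloud frame vs. fused first point cloud frame.
    spatial_error = mean_squared_error(first_frame_colors, fused_colors)
    # Temporal term (assumption): fused colors vs. matched second-frame colors.
    temporal_term = mean_squared_error(fused_colors, matched_second_colors)
    return spatial_error + temporal_weight * temporal_term
```

The resulting scalar would serve as the quantity from which the adjustment amount for the network is derived, for example through gradient descent.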
[0042] Based on the above embodiments, this application also provides a device for enhancing the color attributes of dynamic point clouds with temporal consistency. As shown in Figure 4, the device includes:
[0043] The point cloud frame acquisition module 202 is used to acquire a first point cloud frame and a second point cloud frame that is adjacent in the time domain, and to determine the first point cloud block within the first point cloud frame.
[0044] The point cloud block matching module 204 is used to determine the second point cloud block in the second point cloud frame that corresponds to the first point cloud block, based on the first point cloud block.
[0045] The high-dimensional feature extraction module 206 is used to perform feature extraction based on the first point cloud block and the second point cloud block to determine the first high-dimensional feature and the second high-dimensional feature, where the feature extraction of the first point cloud block and the second point cloud block includes color feature extraction.
[0046] The output feature acquisition module 208 is used to input the first high-dimensional feature and the second high-dimensional feature into the convolutional point cloud long short-term memory network to analyze the temporal dependency between the first high-dimensional feature and the second high-dimensional feature and obtain the output features.
[0047] The point cloud frame fusion module 210 is used to determine the information to be fused based on the output features, and to fuse the first point cloud block of the first point cloud frame with the information to be fused to obtain the processed first point cloud block, so as to determine the fused first point cloud frame.
[0048] The implementation methods of this application are similar to those of the above embodiments. For specific implementation methods, please refer to the specific implementation methods of the above embodiments, which will not be repeated here.
[0049] The solution presented in this application can be applied to video processing scenarios. It can analyze point cloud data within a frame and analyze the temporal consistency of point cloud data between adjacent frames, thereby making the transition of color attributes smoother between multiple frames and improving video quality. Specifically, this solution can acquire a first point cloud frame to be processed and a second point cloud frame that is temporally related to the first point cloud frame. The second point cloud frame can contain multiple frames, and can be one or two frames before the first point cloud frame, or a frame after the first point cloud frame. This scheme divides the first point cloud frame into multiple first point cloud blocks. It then searches for and identifies multiple candidate point cloud blocks to analyze their similarity, determining the second point cloud blocks within the second point cloud frame that are related to the first point cloud blocks. Single-frame feature extraction is then performed based on the first and second point cloud blocks to determine first and second high-dimensional features, including color feature extraction. These first and second high-dimensional features are then input into a convolutional point cloud long short-term memory network to analyze their temporal dependencies, yielding output features. Based on these output features, information to be fused is determined, and the first point cloud blocks of the first point cloud frame are fused with this information to obtain the processed first point cloud block, thus determining the fused first point cloud frame. This scheme analyzes the temporal consistency between point cloud frames based on adjacent frames, thereby adjusting and optimizing the point cloud blocks within each frame, ultimately improving video quality.
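The data flow summarized above can be sketched as a composition of steps. This is only an outline of the ordering described in the text: the four callables are hypothetical stand-ins for the block matching, single-frame feature extraction, convolutional point cloud long short-term memory network, and fusion stages, none of which is specified here.

```python
def enhance_frame(first_blocks, match_block, extract_features,
                  temporal_network, fuse):
    """Apply the matching, extraction, temporal analysis, and fusion steps
    to every first point cloud block and return the processed blocks."""
    processed = []
    for block in first_blocks:
        # Second point cloud blocks matched in the adjacent frame(s).
        matched_blocks = match_block(block)
        first_feature = extract_features(block)
        second_features = [extract_features(m) for m in matched_blocks]
        # Assumed ordering: matched adjacent-frame features first, then the
        # feature of the frame being enhanced.
        output_feature = temporal_network(second_features + [first_feature])
        processed.append(fuse(block, first_feature, output_feature))
    return processed
```

Passing the stages as callables keeps the sketch independent of any particular network architecture, so each stage can be swapped or tested in isolation.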
[0050] It should be noted that the division of units and / or modules in the embodiments of this application is illustrative and only represents a logical functional division. In actual implementation, there may be other division methods. Furthermore, the functional units and / or modules in the various embodiments of this application can be integrated into one processing unit and / or module, or each unit and / or module can exist physically separately, or two or more units and / or modules can be integrated into one unit and / or module. The integrated units and / or modules described above can be implemented in hardware or as software functional units and / or modules.
[0051] If the integrated units and / or modules are implemented as software functional units and / or modules and sold or used as independent products, they can be stored in a processor-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to related technologies, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) or processor to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0052] Furthermore, the apparatus and method provided in the above embodiments are based on the same application concept. Since the method and the apparatus solve the problem on similar principles, their implementations can refer to each other, and repeated parts will not be described again.
[0053] Figure 5 shows a structural block diagram of a network device according to an exemplary embodiment.
[0054] As shown in Figure 5, the network device 1100 includes at least: a processor 1110, a memory 1120, and a transceiver 1130.
[0055] The transceiver 1130 is used to receive and send data under the control of the processor 1110.
[0056] In Figure 5, the bus architecture can include any number of interconnected buses and bridges, specifically linking together various circuits of one or more processors represented by processor 1110 and memory represented by memory 1120. The bus architecture can also link various other circuits such as peripheral devices, voltage regulators, and power management circuits, which are well known in the art and therefore will not be described further herein. The bus interface provides an interface. The transceiver 1130 can be multiple elements, including transmitters and receivers, providing units and / or modules for communicating with various other devices over transmission media, including wireless channels, wired channels, optical fibers, and other transmission media.
[0057] The processor 1110 is responsible for managing the bus architecture and general processing, and the memory 1120 can store the data used by the processor 1110 when performing operations.
[0058] Optionally, the processor 1110 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a complex programmable logic device (CPLD). The processor 1110 may also adopt a multi-core architecture. The processor 1110 and the memory 1120 may also be physically separated.
[0059] The processor 1110 calls the computer program stored in the memory 1120 and, according to the obtained executable instructions, executes any of the methods provided in the above embodiments of this application.
[0060] Figure 6 shows a structural block diagram of a user equipment according to an exemplary embodiment.
[0061] As shown in Figure 6, the user equipment 1300 includes at least: a processor 1310, a memory 1320, and a transceiver 1330.
[0062] The transceiver 1330 is used to receive and send data under the control of the processor 1310.
[0063] In Figure 6, the bus architecture can include any number of interconnected buses and bridges, specifically linking together various circuits of one or more processors represented by processor 1310 and memory represented by memory 1320. The bus architecture can also link various other circuits such as peripheral devices, voltage regulators, and power management circuits, which are well known in the art and therefore will not be described further herein. The bus interface provides an interface. The transceiver 1330 can be multiple elements, including transmitters and receivers, providing units and / or modules for communicating with various other devices over a transmission medium, including wireless channels, wired channels, optical fibers, etc. For different user equipment, the user interface 1340 can also be an interface capable of connecting external or internal devices, including but not limited to keypads, displays, speakers, microphones, joysticks, etc.
[0064] The processor 1310 is responsible for managing the bus architecture and general processing, and the memory 1320 can store the data used by the processor 1310 when performing operations.
[0065] Optionally, the processor 1310 can be a CPU (Central Processing Unit), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or a CPLD (Complex Programmable Logic Device). The processor 1310 can also adopt a multi-core architecture. The processor 1310 and the memory 1320 can also be physically separated.
[0066] The processor 1310 calls the computer program stored in the memory 1320 and, according to the obtained executable instructions, executes any of the methods provided in the above embodiments of this application.
[0067] It should be noted that the apparatus provided in this application embodiment can implement all the method steps implemented in the above method embodiment and can achieve the same technical effect. Here, the parts that are the same as those in the method embodiment and the beneficial effects will not be described in detail.
[0068] Furthermore, this application provides a storage medium storing a computer program which, when executed by a processor, implements the methods described in the above embodiments. The storage medium can be any available medium or data storage device accessible to the processor, including but not limited to magnetic storage (e.g., floppy disks, hard disks, magnetic tapes, magneto-optical disks (MO), etc.), optical storage (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor storage (e.g., ROMs, EPROMs, EEPROMs, non-volatile memory (NAND flash), solid-state drives (SSDs)).
[0069] This application provides a program product, such as an FPGA chip or a DSP chip, which includes executable instructions stored in a storage medium. A processor reads the executable instructions from the storage medium and executes them to implement the methods described in the above embodiments.
[0071] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.
[0072] This application is described with reference to flowcharts and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each flow and / or block of the flowcharts and / or block diagrams, and combinations of flows and / or blocks in the flowcharts and / or block diagrams, can be implemented by computer-executable instructions. These computer-executable instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.
[0073] These processor-executable instructions may also be stored in a processor-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the processor-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.
[0074] These processor-executable instructions can also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.
[0075] It should be understood that although the steps in the flowcharts of the accompanying figures are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the accompanying figures may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same time, but can be executed at different times, and their execution order is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the sub-steps or stages of other steps.
[0076] The above description is only a partial embodiment of this application. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of this application, and these improvements and modifications should also be included in the protection scope of this application.
Claims
1. A method for enhancing the color attributes of dynamic point clouds with temporal consistency, characterized in that the method includes: acquiring a first point cloud frame and a second point cloud frame that are adjacent in the time domain, and determining a first point cloud block within the first point cloud frame; based on the first point cloud block, determining a second point cloud block within the second point cloud frame that corresponds to the first point cloud block; performing feature extraction based on the first point cloud block and the second point cloud block to determine a first high-dimensional feature and a second high-dimensional feature, where the feature extraction of the first point cloud block and the second point cloud block includes color feature extraction; inputting the first high-dimensional feature and the second high-dimensional feature into a convolutional point cloud long short-term memory network to analyze the temporal dependency between the first high-dimensional feature and the second high-dimensional feature and obtain output features; and determining information to be fused based on the output features, and fusing the first point cloud block of the first point cloud frame with the information to be fused to obtain a processed first point cloud block, so as to determine a fused first point cloud frame.
2. The method as described in claim 1, characterized in that determining the first point cloud block within the first point cloud frame includes: generating sampling points within the first point cloud frame based on the farthest point sampling algorithm, and generating the first point cloud block based on the k-nearest neighbor algorithm.
3. The method as described in claim 1, characterized in that the second point cloud frame includes the point cloud frame immediately preceding the first point cloud frame and the point cloud frame two frames before it; and determining the second point cloud block corresponding to the first point cloud block within the second point cloud frame based on the first point cloud block includes: determining a search starting point of a three-dimensional search space based on the first point cloud block, and determining a set step size corresponding to the search starting point; searching from the search starting point with the set step size to determine candidate points, and constructing the three-dimensional search space composed of the candidate points; and performing similarity calculation based on the first point cloud block and the point cloud blocks corresponding to the candidate points in the three-dimensional search space to determine the second point cloud block in the second point cloud frame that corresponds to the first point cloud block.
4. The method as described in claim 3, characterized in that determining the search starting point of the three-dimensional search space based on the first point cloud block and determining the set step size corresponding to the search starting point includes: using the center point of the first point cloud block as the search starting point of the three-dimensional search space, and using a first step size as the set step size; and using the candidate point of the point cloud block that matches the first point cloud block in the three-dimensional search space as the search starting point, and using a second step size as the set step size, where the second step size is shorter than the first step size.
5. The method as described in claim 1, characterized in that performing feature extraction based on the first point cloud block and the second point cloud block to determine the first high-dimensional feature and the second high-dimensional feature includes: obtaining distance information and normal vectors corresponding to the first point cloud block and the second point cloud block; and performing feature extraction based on the first point cloud block, the second point cloud block, the distance information, and the normal vectors to determine the first high-dimensional feature and the second high-dimensional feature.
6. The method as described in claim 1, characterized in that determining the information to be fused based on the output features includes: processing the first high-dimensional feature of the first point cloud block and the output features based on a fully connected layer to determine the information to be fused.
7. The method as described in claim 1, characterized in that the method further includes: comparing the first point cloud frame with the fused first point cloud frame to determine a spatial error; comparing the first point cloud frame with the second point cloud frame to determine temporal consistency; and determining, based on the spatial error and the temporal consistency, an adjustment amount corresponding to the convolutional point cloud long short-term memory network, so as to adjust the convolutional point cloud long short-term memory network.
8. A device for enhancing the color attributes of dynamic point clouds with temporal consistency, characterized in that the device includes: a point cloud frame acquisition module, used to acquire a first point cloud frame and a second point cloud frame that are adjacent in the time domain, and to determine a first point cloud block within the first point cloud frame; a point cloud block matching module, used to determine, based on the first point cloud block, a second point cloud block in the second point cloud frame that corresponds to the first point cloud block; a high-dimensional feature extraction module, used to perform feature extraction based on the first point cloud block and the second point cloud block to determine a first high-dimensional feature and a second high-dimensional feature, where the feature extraction of the first point cloud block and the second point cloud block includes color feature extraction; an output feature acquisition module, used to input the first high-dimensional feature and the second high-dimensional feature into a convolutional point cloud long short-term memory network to analyze the temporal dependency between the first high-dimensional feature and the second high-dimensional feature and obtain output features; and a point cloud frame fusion module, used to determine information to be fused based on the output features, and to fuse the first point cloud block of the first point cloud frame with the information to be fused to obtain a processed first point cloud block, so as to determine a fused first point cloud frame.
9. A network device, characterized in that it includes: a memory, a transceiver, and a processor; wherein the memory is used to store a computer program; the transceiver is used to send and receive data under the control of the processor; and the processor is configured to read the computer program in the memory and execute the method as described in any one of claims 1-7.
10. A storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the method as described in any one of claims 1-7.