Method, electronic device and program product for training an encoder and processing data
By training the encoder and decoder and adjusting the loss of the invariant and variable parts, the problems of low efficiency and waste of resources in point cloud data processing are solved, and more efficient and accurate point cloud data processing is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- DELL PROD LP
- Filing Date
- 2022-04-22
- Publication Date
- 2026-06-12
AI Technical Summary
Existing technologies suffer from low efficiency, waste of resources, and insufficient accuracy when processing 3D point cloud data, especially when processing large-scale point cloud data. Traditional methods are limited by sparsity and the computational cost of 3D convolution.
By training the encoder, transforming the encoded data to determine multiple invariant and variable parts, adjusting the encoder parameters based on similarity loss and spatial loss, and combining the encoder with the decoder for training, the efficiency and accuracy of point cloud data processing are improved.
It improves the efficiency of point cloud data processing, saves time and computing resources, and enhances processing accuracy.
Description
Technical Field
[0001] Embodiments of this disclosure relate to the field of data processing, and more specifically, to methods, electronic devices, and program products for training encoders and processing data.
Background Technology
[0002] With the development of computer technology, people have begun to utilize computer vision technology to obtain information about target objects or environments. This process typically involves acquiring point cloud data of the object using various devices, and then analyzing the acquired point cloud data to obtain various desired information. Currently, most existing features for point clouds are handcrafted for specific tasks. Point cloud features usually encode certain statistical properties of the points. However, many problems still need to be solved in the analysis and processing of point cloud data.
Summary of the Invention
[0003] Embodiments of this disclosure provide a method, electronic device, and program product for training an encoder and processing data.
[0004] According to a first aspect of this disclosure, a method for training an encoder is provided. The method includes inputting sample point cloud data of an object into an encoder to obtain encoded data for the object. The method includes transforming the encoded data to determine a plurality of invariant parts and a plurality of variable parts for the object, wherein the invariant parts indicate invariant features of the object, and the variable parts indicate variable features of the object. The method further includes determining a similarity loss and a spatial loss for the sample point cloud data based on the plurality of invariant parts and the plurality of variable parts. The method further includes adjusting the parameters of the encoder based on the similarity loss and the spatial loss to obtain a trained encoder.
[0005] According to a second aspect of this disclosure, a method for processing data is provided. The method includes inputting point cloud data of an object into a trained encoder to obtain object-specific encoded data. The trained encoder is obtained by adjusting encoder parameters based on a similarity loss and a spatial loss obtained from sample point cloud data of a sample object. The method further includes transforming the encoded data to determine object-specific invariant and object-specific variable components, the invariant components indicating invariant features of the object and the variable components indicating variable features of the object.
[0006] According to a third aspect of this disclosure, an electronic device is provided. The electronic device includes at least one processor; and a memory coupled to the at least one processor and having instructions stored thereon, the instructions causing the device to perform actions when executed by the at least one processor, the actions including: inputting sample point cloud data of an object into an encoder to obtain encoded data for the object; determining a plurality of invariant portions and a plurality of variable portions of the object by transforming the encoded data, the invariant portions indicating invariant features of the object and the variable portions indicating variable features of the object; determining a similarity loss and a spatial loss for the sample point cloud data based on the plurality of invariant portions and the plurality of variable portions; and adjusting the parameters of the encoder based on the similarity loss and the spatial loss to obtain a trained encoder.
[0007] According to a fourth aspect of this disclosure, an electronic device is provided. The electronic device includes at least one processor; and a memory coupled to the at least one processor and having instructions stored thereon, the instructions causing the device to perform actions when executed by the at least one processor, the actions including: inputting point cloud data of an object into a trained encoder to obtain encoded data for the object, the trained encoder being obtained by adjusting encoder parameters based on similarity loss and spatial loss obtained from sample point cloud data for a sample object; and transforming the encoded data to determine an invariant portion and a variable portion for the object, the invariant portion indicating invariant features of the object and the variable portion indicating variable features of the object.
[0008] According to a fifth aspect of this disclosure, a computer program product is provided, which is tangibly stored on a non-volatile computer-readable medium and includes machine-executable instructions that, when executed, cause a machine to perform the steps of the methods in the first and second aspects of this disclosure.
Attached Figure Description
[0009] The above and other objects, features and advantages of this disclosure will become more apparent from the accompanying drawings, in which like reference numerals generally denote like parts.
[0010] Figure 1 shows a schematic diagram of an example environment 100 in which devices and/or methods according to embodiments of the present disclosure may be implemented;
[0011] Figure 2 shows a flowchart of a method 200 for training an encoder according to an embodiment of the present disclosure;
[0012] Figure 3 shows a schematic diagram of an example 300 of matrix decomposition according to an embodiment of the present disclosure;
[0013] Figure 4 shows a schematic diagram of an example process 400 for training an encoder according to an embodiment of the present disclosure;
[0014] Figure 5 shows a schematic diagram of an example 500 of using a decoder according to an embodiment of the present disclosure;
[0015] Figure 6 shows a flowchart of a method 600 for processing data according to an embodiment of the present disclosure;
[0016] Figure 7 shows a schematic diagram of an application example 700 of data according to an embodiment of the present disclosure;
[0017] Figure 8 shows a schematic block diagram of an example device 800 suitable for implementing embodiments of the present disclosure.
[0018] In the various figures, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Implementation
[0019] Embodiments of this disclosure will now be described in more detail with reference to the accompanying drawings. While some embodiments of this disclosure are shown in the drawings, it should be understood that this disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure can be understood more thoroughly and completely. It should be understood that the accompanying drawings and embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of protection of this disclosure.
[0020] In the description of embodiments of this disclosure, the term "comprising" and similar terms should be understood as open-ended inclusion, i.e., "including but not limited to". The term "based on" should be understood as "at least partially based on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first", "second", etc., may refer to different or the same objects. Other explicit and implicit definitions may also be included below.
[0021] As mentioned above, 3D point cloud data contains a wealth of information about the object. Traditional approaches employ various techniques to process and analyze 3D point cloud data. These methods are subject to several limitations. For example, volumetric convolutional neural networks (CNNs) were an early approach to applying convolutional neural networks to process point cloud data. However, due to data sparsity and the computational cost of 3D convolution, volumetric representation is limited by its resolution. Specialized methods for handling sparsity have also been proposed. However, these operations still focus on sparse volumes. Processing very large point clouds presents a challenge. Furthermore, this approach is time-consuming, wastes resources, and is not user-friendly.
[0022] To address at least the aforementioned and other potential problems, embodiments of this disclosure propose a method for training an encoder and processing data. In this method, a computing device inputs sample point cloud data of an object into an encoder to obtain encoded data for the object. By transforming the encoded data, multiple invariant parts and multiple variable parts of the object are determined. The computing device determines a similarity loss and a spatial loss for the sample point cloud data based on the multiple invariant parts and the multiple variable parts. The computing device then adjusts the encoder parameters based on the similarity loss and the spatial loss to obtain a trained encoder. The computing device can further utilize the trained encoder to process the point cloud data. This method improves the efficiency of point cloud data processing, saves time and computing resources, and improves accuracy.
[0023] The embodiments of this disclosure will now be described in further detail with reference to the accompanying drawings, wherein Figure 1 shows a schematic diagram of an example environment 100 in which embodiments of the present disclosure can be implemented.
[0024] As shown in Figure 1, the example environment 100 includes a computing device 104. The computing device 104 is used to receive point cloud data 102 for processing.
[0025] The computing device 104 includes, but is not limited to, personal computers, server computers, handheld or laptop devices, mobile devices (such as mobile phones, personal digital assistants (PDAs), media players, etc.), multiprocessor systems, consumer electronics, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
[0026] The point cloud data 102 in example environment 100 includes point cloud data for a target object. This point cloud data 102 is for a rigid target object, such as an airplane or a face, or it can be for other suitable objects.
[0027] Point cloud data 102 is input to encoder 106 in computing device 104 to obtain encoded data. In some embodiments, encoder 106 is a neural network model. In some embodiments, encoder 106 is any suitable machine model for encoding data. The above examples are merely for describing this disclosure and are not intended to specifically limit this disclosure.
[0028] The encoder 106 then inputs the encoded data into the transformation module 108. The transformation module performs matrix transformation on the encoded data to obtain the invariant part 110 and the variable part 112 for the target object.
[0029] In one example, when point cloud data of an aircraft is input, the invariant part can correspond to the shape and size of the aircraft, while the variable part can correspond to the coordinates or position information of various parts of the aircraft. This is because the shape of the aircraft remains constant in point cloud data obtained from any angle, while its position may differ between point cloud datasets. In some embodiments, the input point cloud data is point cloud data of a human face, from which the invariant and variable parts of the face are obtained. The invariant part can be the size of the facial features, while the variable part can be the expression. The variable part 112 can be applied to other objects or avatars, for example by combining an expression with another avatar from an avatar pool to generate an avatar with a new expression. The above examples are merely for describing this disclosure and not for limiting its specific scope.
[0030] In some embodiments, during training of encoder 106, point cloud data 102, serving as sample point cloud data, is input into the encoder to obtain a vector representation of the object. This representation is then input into transformation module 108 for matrix transformation to obtain an invariant part 110 and a variable part 112. The invariant part 110 is then used to determine a similarity loss, and the variable part 112 is used to determine a spatial loss. These two losses are then combined to adjust the encoder parameters for training.
[0031] In some embodiments, in addition to utilizing the loss described above, the invariant and variable parts are also input into the decoder to obtain reference point cloud data for the object. Then, the data loss between the reference point cloud data and the sample point cloud data is calculated. Then, the parameters of the encoder and decoder are adjusted by combining the similarity loss, spatial loss and data loss to achieve simultaneous training of the encoder and decoder.
[0032] This method improves the efficiency of point cloud data processing, saves time and computing resources, and enhances accuracy.
[0033] An example environment 100 in which embodiments of the present disclosure can be implemented has been described above in conjunction with Figure 1. A flowchart of a method 200 for training an encoder according to embodiments of the present disclosure is described below in conjunction with Figure 2. Method 200 can be performed on computing device 104 in Figure 1 or on any other suitable computing device.
[0034] At block 202, sample point cloud data of the object is input into the encoder to obtain encoded data for the object. For example, computing device 104 obtains point cloud data 102 as sample point cloud data.
[0035] In some embodiments, point cloud data 102 can be represented by the following equation (1):
[0036] $P = \{P_n \mid n = 1, \ldots, N\}$ (1)
[0037] where $P_n$ is the vector representation of each point in the point cloud data and N is a positive integer representing the number of points in the point cloud data. $P_n$ can include the point's coordinates (x, y, z) and other feature information, such as color, normal, etc.
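As a minimal sketch of equation (1), a point cloud can be held as an array with one row per point $P_n$. The array layout, the use of NumPy, and the choice of RGB color as the extra feature are assumptions made for illustration; the disclosure does not prescribe a storage format.

```python
import numpy as np

# Hypothetical toy point cloud: N points, each with (x, y, z) plus an RGB color.
N = 5
rng = np.random.default_rng(0)
xyz = rng.uniform(-1.0, 1.0, size=(N, 3))   # coordinates of each point
rgb = rng.uniform(0.0, 1.0, size=(N, 3))    # optional per-point feature information
P = np.hstack([xyz, rgb])                   # row n is the vector P_n

print(P.shape)
```

Each row of `P` then plays the role of one $P_n$, and the number of rows is N.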
[0038] At block 204, by transforming the encoded data, multiple invariant parts and multiple variable parts of the object are determined. The invariant parts indicate the invariant features of the object, and the variable parts indicate the variable features of the object. For example, if the point cloud data is for an airplane, the invariant parts can indicate the shape and size of the airplane, while the variable parts can indicate the coordinates and positions of various parts of the airplane.
[0039] In some embodiments, the computing device 104 performs matrix transformations on the encoded data to obtain multiple invariant parts and multiple variable parts. For example, Figure 3 illustrates an example 300 of matrix decomposition according to an embodiment of the present disclosure. Aircraft point cloud data 302 is passed through the encoder to generate a feature vector X 304, which is then subjected to matrix decomposition. The decomposition results in a variable part U 306 and an invariant part V 308.
[0040] In some embodiments, the feature vector X can be represented by the following equation (2):
[0041] $X = UV + E$ (2)
[0042] where X represents the feature matrix of the point cloud; V represents the object-invariant part, which can be considered a template factor that captures important information of the dataset, such as the shape and size of the object; and U represents the object-variable part, which can be considered an activation factor, such as the coordinate position of a point in different point cloud data. Here $U \in \mathbb{R}^{M \times k}$ and $V \in \mathbb{R}^{k \times d}$, where M and d represent the dimensions of the vector space, k is the decomposition factor with $k < d$, and E is the residual of the decomposition. Therefore, a transformation can be performed on X to obtain U and V.
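The decomposition in equation (2) can be sketched as follows. The disclosure does not state which factorization the transformation uses, so the truncated singular value decomposition below is a stand-in chosen for illustration, and the function name `factorize` is hypothetical.

```python
import numpy as np

def factorize(X, k):
    """Split X (M x d) into U (M x k) and V (k x d) so that X = U @ V + E."""
    # Assumption: a truncated SVD stands in for the unspecified transformation.
    left, sing, right_t = np.linalg.svd(X, full_matrices=False)
    U = left[:, :k] * sing[:k]   # variable part (activation factor)
    V = right_t[:k, :]           # invariant part (template factor)
    E = X - U @ V                # residual left out by the rank-k factors
    return U, V, E

rng = np.random.default_rng(0)
M, d, k = 8, 6, 3                # k < d, as required by the text
X = rng.normal(size=(M, d))      # stand-in for an encoder feature matrix
U, V, E = factorize(X, k)
print(U.shape, V.shape)
```

With this choice the residual E shrinks as k grows and vanishes when k reaches the rank of X; other factorizations (for example a learned one) would fit equation (2) equally well.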
[0043] Returning to Figure 2, at block 206, a similarity loss and a spatial loss for the sample point cloud data are determined based on the multiple invariant parts and the multiple variable parts. For example, computing device 104 uses invariant part 110 and variable part 112 to obtain the similarity loss and the spatial loss.
[0044] In some embodiments, computing device 104 determines the similarity loss based on a first invariant part and a second invariant part among multiple invariant parts. Computing device 104 also determines the spatial loss based on a first variable part and a second variable part among multiple variable parts. By using the above method, loss data can be obtained quickly.
[0045] In some embodiments, the similarity loss for the point cloud data of an object is calculated by the following equation (3):
[0046] $L_{sim} = \lVert V_1 - V_2 \rVert_F$ (3)
[0047] where $\lVert \cdot \rVert_F$ represents the Frobenius norm, and $V_1$ and $V_2$ are the invariant parts obtained from two point cloud data of an object.
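Equation (3) can be sketched directly with NumPy's Frobenius norm. The function name `similarity_loss` and the small example matrices are illustrative assumptions, not values from the disclosure.

```python
import numpy as np

def similarity_loss(V1, V2):
    """Frobenius norm of the difference between two invariant parts."""
    return float(np.linalg.norm(V1 - V2, ord="fro"))

V1 = np.array([[1.0, 2.0], [3.0, 4.0]])
V2 = np.array([[1.0, 0.0], [0.0, 4.0]])

# The difference is [[0, 2], [3, 0]], so the loss is sqrt(2**2 + 3**2) = sqrt(13).
print(similarity_loss(V1, V2))
```

The loss is zero exactly when the two invariant parts coincide, which matches the intent that two views of the same object should share one invariant part.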
[0048] In some embodiments, the spatial loss for the point cloud data of an object is calculated by the following equation (4):
[0049] $L_{spa}(P) = \mathrm{tr}(U^{T} W U)$ (4)
[0050] where tr() represents the trace of a matrix, U is the variable part of the feature vector, W is the weight matrix of the 3D point cloud set, and W(m,n) is the weight between two points m and n. W(m,n) is calculated by the following equation (5):
[0051] [Equation (5) is an image in the source and is not reproduced here.]
[0052] where σ represents a parameter for controlling the distance, exp() represents the exponential function, $P_m$ and $P_n$ respectively represent the vector representations of points m and n in the point cloud, and $\lVert \cdot \rVert_2^2$ represents the square of the 2-norm.
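The spatial loss of equation (4) can be sketched as below. The formula for W(m,n) is not reproduced in this text, so the Gaussian-style kernel `exp(-||P_m - P_n||^2 / sigma)` is an assumption built only from the listed ingredients (exp, σ, and the squared 2-norm); the exact placement of σ may differ in the original. The sketch also assumes U has one row per point, so that $U^T W U$ is well defined.

```python
import numpy as np

def weight_matrix(P, sigma):
    """Pairwise weights between points; the kernel form is an assumption."""
    diff = P[:, None, :] - P[None, :, :]     # shape (N, N, D)
    sq_dist = np.sum(diff ** 2, axis=-1)     # squared 2-norm for every pair (m, n)
    return np.exp(-sq_dist / sigma)          # assumed Gaussian-style kernel

def spatial_loss(U, W):
    """Equation (4): trace of U^T W U."""
    return float(np.trace(U.T @ W @ U))

rng = np.random.default_rng(0)
P = rng.normal(size=(4, 3))    # 4 points with (x, y, z)
U = rng.normal(size=(4, 2))    # assumed: one row of U per point, k = 2
W = weight_matrix(P, sigma=1.0)
loss = spatial_loss(U, W)
print(W.shape, loss)
```

Under this assumed kernel, nearby points receive weights close to 1 and distant points weights close to 0, with σ setting how quickly the weight decays with distance.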
[0053] In some embodiments, after obtaining multiple invariant parts and multiple variable parts of an object, the invariant parts and their corresponding variable parts can be input into a decoder to obtain reference point cloud data for the object. The decoder can be a neural network model or any suitable machine model. For example, the first invariant part and the first variable part are input into the decoder module to obtain reference point cloud data for the object. Then, the data loss between the reference point cloud data and the sample point cloud data is determined. At this point, the decoder can be combined to train the encoder and decoder as a whole. The data loss of the decoder is represented by the following equation (6):
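[0054] [Equation (6) is an image in the source and is not reproduced here.]
[0055] where the decoder output symbol in equation (6) denotes the output matrix of the decoder, N is the number of points in the point cloud, and the subscript F denotes the Frobenius norm. The above example is for illustrative purposes only and is not intended to further limit this disclosure.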
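Because equation (6) is not reproduced in this text, the sketch below does not claim to be that formula. It shows one plausible data loss, a per-point mean squared error between the sample point cloud and the decoder output, which is the kind of data error the description of Figure 5 mentions. The names `data_loss` and `P_hat` are hypothetical.

```python
import numpy as np

def data_loss(P, P_hat):
    """Mean over the N points of the squared distance between P_n and its reconstruction."""
    # Equivalent to the squared Frobenius norm of (P - P_hat) divided by N.
    return float(np.mean(np.sum((P - P_hat) ** 2, axis=1)))

P = np.zeros((3, 3))       # toy sample point cloud: 3 points at the origin
P_hat = np.ones((3, 3))    # toy decoder output: every point moved to (1, 1, 1)

# Each point is off by a squared distance of 3, so the mean is 3.0.
print(data_loss(P, P_hat))
```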
[0056] At block 208, the encoder parameters are adjusted based on the similarity loss and the spatial loss to obtain a trained encoder. For example, computing device 104 uses the similarity loss and the spatial loss to adjust encoder 106.
[0057] In some embodiments, the computing device 104 combines the similarity loss and the spatial loss to obtain a combined loss. The computing device 104 then determines whether the combined loss is greater than a first threshold loss.
[0058] If the combined loss is determined to be greater than the first threshold loss, the encoder parameters are adjusted. If the combined loss is less than or equal to the first threshold loss, the encoder parameters are no longer adjusted, and training ends.
[0059] In some embodiments, the computing device 104 can determine whether both the similarity loss and the spatial loss are less than their corresponding threshold losses. If both the similarity loss and the spatial loss are less than their corresponding threshold losses, the encoder parameters are no longer adjusted. Otherwise, the encoder parameters are adjusted.
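The two stopping rules described above can be sketched as small predicates. The text does not say how the similarity loss and the spatial loss are combined, so the plain sum used here is an assumption, and the function names are hypothetical.

```python
def keep_training_combined(sim_loss, spa_loss, threshold):
    """Continue adjusting the encoder while the combined loss exceeds the first threshold."""
    combined = sim_loss + spa_loss   # assumption: the losses are combined by summation
    return combined > threshold

def keep_training_individual(sim_loss, spa_loss, sim_threshold, spa_threshold):
    """Continue unless both losses are below their own thresholds."""
    both_small = sim_loss < sim_threshold and spa_loss < spa_threshold
    return not both_small

print(keep_training_combined(0.4, 0.3, threshold=0.5))        # combined loss above threshold
print(keep_training_individual(0.1, 0.1, 0.2, 0.2))           # both below their thresholds
```

A training loop would evaluate one of these predicates after each parameter update and stop as soon as it returns `False`.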
[0060] In some embodiments, as described above, the encoder is trained in conjunction with the decoder. In this case, the computing device 104 adjusts the parameters of the encoder and decoder based on similarity loss, spatial loss, and data loss. In some embodiments, the computing device combines the similarity loss, spatial loss, and data loss to obtain a total loss. It is then determined whether the total loss is greater than a second threshold loss. If the total loss is determined to be greater than the second threshold loss, the parameters of the encoder and decoder are adjusted. If the total loss is determined to be less than or equal to the second threshold loss, training is stopped.
[0061] For example, when the encoder and decoder are trained together, the total loss is given by equation (7) below:
[0062] [Equation (7) is an image in the source and is not reproduced here.]
[0063] where α and β are weights that control the contribution of the individual loss terms.
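Equation (7) is not reproduced in this text, so the weighted sum below is only one plausible reading: the data loss plus the similarity and spatial losses scaled by α and β. Which terms α and β actually multiply is an assumption, as is the function name `total_loss`.

```python
def total_loss(data_loss, sim_loss, spa_loss, alpha, beta):
    """Assumed form: data loss + alpha * similarity loss + beta * spatial loss."""
    return data_loss + alpha * sim_loss + beta * spa_loss

# Toy values: 1.0 + 0.5 * 2.0 + 0.1 * 3.0
print(total_loss(1.0, 2.0, 3.0, alpha=0.5, beta=0.1))
```

Setting α or β to zero switches off the corresponding term, which is a convenient way to check how much each loss contributes during joint training.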
[0064] In some embodiments, during the training of the encoder and decoder, the similarity loss, spatial loss, and data loss can be compared with their corresponding threshold losses, respectively. If the similarity loss, spatial loss, and data loss are all less than their respective threshold losses, training stops. If the above conditions are not met, the parameters of the encoder and decoder continue to be adjusted. The above examples are merely for describing this disclosure and are not intended to limit the specific scope of this disclosure. Those skilled in the art can set any suitable method to utilize the obtained losses to adjust the parameters of the encoder and / or decoder as needed.
[0065] The above methods improve the efficiency of point cloud data processing, save time and computing resources, and enhance accuracy.
[0066] The method for training the encoder has been described above in conjunction with Figures 2-3. An example process for training the encoder and decoder is described below in conjunction with Figure 4 and Figure 5.
[0067] As shown in Figure 4, two point cloud datasets, 402 and 404, are input into encoder 406 to obtain two corresponding feature vectors X1 408 and X2 410. These are then transformed to obtain the corresponding variable part u1 412 and invariant part v1 416, as well as variable part u2 414 and invariant part v2 420. A similarity loss 418 is then calculated using v1 and v2, and a spatial loss is calculated using u1 and u2. These similarity and spatial losses are then used to adjust the encoder.
[0068] Figure 5 further shows an example 500 in which the decoder and encoder are trained together. As shown in Figure 5, after the variable part u1 502 and the invariant part v1 504 are obtained, they are input into the decoder 506. The decoder 506 then generates reference point cloud data 508, and the data error, such as the mean square error, between the reference point cloud data 508 and the sample point cloud data 512 is calculated. The parameters of the encoder and decoder are then adjusted together using the similarity loss, spatial loss, and data error. The above example is merely for describing this disclosure and is not intended to limit it.
[0069] The application of the encoder is described below in conjunction with Figure 6, which shows a flowchart of a method 600 for processing data according to an embodiment of the present disclosure.
[0070] At block 602, the point cloud data of the object is input into a trained encoder to obtain encoded data for the object. The trained encoder is obtained by adjusting the encoder parameters based on similarity loss and spatial loss obtained from sample point cloud data of the sample object. For example, computing device 104 encodes the point cloud data of the object to be processed using the trained encoder 106.
[0071] At block 604, the invariant and variable parts of the object are determined by transforming the encoded data. The invariant parts indicate the invariant characteristics of the object, and the variable parts indicate the variable characteristics of the object. For example, computing device 104 performs matrix transformations on the encoded data to obtain the invariant and variable parts.
[0072] In some embodiments, computing device 104 may apply the variable part of an object to other objects. For example, as shown in Figure 7, at user equipment 702, an individual's point cloud data is processed using the trained encoder to obtain a variable part u1 for that individual. This u1 is then sent to cloud 704, where it is combined with the invariant part v1 of an avatar and applied to the decoder, resulting in an avatar that includes the individual's variable content. The above example is merely for describing this disclosure and is not intended to limit its specific scope.
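The recombination in this example can be sketched as follows. The disclosure only says that u1 and the avatar's v1 are applied to the decoder; multiplying them into a feature matrix first, in the spirit of equation (2) without the residual, is an assumption, and the linear "decoder" is a hypothetical placeholder for a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
M, k, d = 6, 2, 4

U_person = rng.normal(size=(M, k))   # variable part u1, computed on the user equipment
V_avatar = rng.normal(size=(k, d))   # invariant part v1 of the avatar, held in the cloud

# Assumed recombination: X = U V, ignoring the residual E.
X_avatar = U_person @ V_avatar

# Hypothetical decoder: a fixed linear map from feature space to (x, y, z).
decoder_weights = rng.normal(size=(d, 3))
avatar_points = X_avatar @ decoder_weights

print(X_avatar.shape, avatar_points.shape)
```

Only the small matrix `U_person` has to leave the user equipment in this arrangement, while the avatar's invariant part and the decoder stay in the cloud.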
[0073] The above methods improve the efficiency of point cloud data processing, save time and computing resources, and increase accuracy.
[0074] Figure 8 shows a schematic block diagram of an example device 800 that can be used to implement embodiments of the present disclosure. The computing device 104 in Figure 1 can be implemented using device 800. As shown, device 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processes according to computer program instructions stored in read-only memory (ROM) 802 or loaded from storage unit 808 into random access memory (RAM) 803. RAM 803 can also store various programs and data required for the operation of device 800. CPU 801, ROM 802, and RAM 803 are interconnected via bus 804. Input/output (I/O) interface 805 is also connected to bus 804.
[0075] Multiple components in device 800 are connected to I/O interface 805, including: input unit 806, such as a keyboard, mouse, etc.; output unit 807, such as various types of monitors, speakers, etc.; storage unit 808, such as a disk, optical disk, etc.; and communication unit 809, such as a network card, modem, wireless transceiver, etc. Communication unit 809 allows device 800 to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunications networks.
[0076] The various processes and procedures described above, such as methods 200 and 600, can be executed by processing unit 801. For example, in some embodiments, methods 200 and 600 can be implemented as computer software programs tangibly contained in a machine-readable medium, such as storage unit 808. In some embodiments, part or all of the computer program can be loaded and / or installed on device 800 via ROM 802 and / or communication unit 809. When the computer program is loaded into RAM 803 and executed by CPU 801, one or more actions of methods 200 and 600 described above can be performed.
[0077] This disclosure can be a method, apparatus, system, and / or computer program product. A computer program product may include a computer-readable storage medium having computer-readable program instructions loaded thereon for performing various aspects of this disclosure.
[0078] Computer-readable storage media can be tangible devices capable of holding and storing instructions for use by an instruction execution device. Computer-readable storage media can be, for example, but not limited to, electrical storage devices, magnetic storage devices, optical storage devices, electromagnetic storage devices, semiconductor storage devices, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile disc (DVD), memory sticks, floppy disks, mechanical encoding devices, such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. The computer-readable storage media used herein are not to be construed as transient signals themselves, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber optic cables), or electrical signals transmitted through wires.
[0079] The computer-readable program instructions described herein can be downloaded from computer-readable storage media to various computing / processing devices, or downloaded via a network, such as the Internet, local area network, wide area network, and / or wireless network, to an external computer or external storage device. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and / or edge servers. A network adapter card or network interface in each computing / processing device receives the computer-readable program instructions from the network and forwards them to the computer-readable storage media in the respective computing / processing device.
[0080] Computer program instructions used to perform the operations of this disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, etc., and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., via the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), is personalized by utilizing the state information of the computer-readable program instructions, and this electronic circuitry can execute the computer-readable program instructions to implement various aspects of this disclosure.
[0081] Various aspects of this disclosure are described herein with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this disclosure. It should be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer-readable program instructions.
[0082] These computer-readable program instructions can be provided to a processing unit of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that, when executed by the processing unit of the computer or other programmable data processing apparatus, they create means for implementing the functions / actions specified in one or more blocks of the flowchart and / or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium that causes a computer, programmable data processing apparatus, and / or other device to operate in a particular manner. Thus, the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing aspects of the functions / actions specified in one or more blocks of the flowchart and / or block diagram.
[0083] Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions / actions specified in one or more blocks of the flowchart and / or block diagram.
[0084] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of instructions containing one or more executable instructions for implementing a specified logical function. In some alternative implementations, the functions noted in the blocks may occur in a different order than that shown in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, or they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, may be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.
[0085] Various embodiments of this disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical applications, or their technical improvements over technologies found in the marketplace, or to enable others skilled in the art to understand the embodiments disclosed herein.
Claims
1. A method for training an encoder, comprising: inputting sample point cloud data of an object into an encoder to obtain encoded data for the object; determining, by transforming the encoded data, a plurality of invariant parts and a plurality of variable parts of the object, wherein the invariant parts indicate invariant features of the object and the variable parts indicate variable features of the object; determining a similarity loss and a spatial loss for the sample point cloud data based on the plurality of invariant parts and the plurality of variable parts; and adjusting parameters of the encoder based on the similarity loss and the spatial loss to obtain a trained encoder; wherein determining the similarity loss and the spatial loss comprises: determining the similarity loss based on a first invariant part and a second invariant part among the plurality of invariant parts; and determining the spatial loss based on a first variable part and a second variable part among the plurality of variable parts.
2. The method of claim 1, wherein determining the plurality of invariant parts and the plurality of variable parts comprises: performing a matrix transformation on the encoded data to obtain the plurality of invariant parts and the plurality of variable parts.
3. The method of claim 1, wherein adjusting the parameters of the encoder comprises: combining the similarity loss and the spatial loss to obtain a combined loss; and adjusting the parameters of the encoder if it is determined that the combined loss is greater than a first threshold loss.
4. The method according to claim 1, further comprising: inputting the first invariant part and the first variable part into a decoder to obtain reference point cloud data for the object; and determining a data loss between the reference point cloud data and the sample point cloud data.
5. The method according to claim 4, wherein adjusting the parameters of the encoder comprises: adjusting the parameters of the encoder and parameters of the decoder based on the similarity loss, the spatial loss, and the data loss.
6. The method of claim 5, wherein adjusting the parameters of the encoder and the parameters of the decoder based on the similarity loss, the spatial loss, and the data loss comprises: combining the similarity loss, the spatial loss, and the data loss to obtain a total loss; and adjusting the parameters of the encoder and the parameters of the decoder if it is determined that the total loss is greater than a second threshold loss.
7. A method for processing data, comprising: inputting point cloud data of an object into a trained encoder to obtain encoded data for the object, wherein the trained encoder is obtained by adjusting parameters of an encoder based on a similarity loss and a spatial loss obtained from sample point cloud data for a sample object; and determining, by transforming the encoded data, an invariant portion and a variable portion of the object, wherein the invariant portion indicates invariant features of the object and the variable portion indicates variable features of the object; wherein the similarity loss and the spatial loss are determined by the following operations: determining the similarity loss based on the invariant portion; and determining the spatial loss based on the variable portion.
8. The method of claim 7, wherein determining the invariant portion and the variable portion comprises: performing a matrix transformation on the encoded data to obtain the invariant portion and the variable portion.
9. The method according to claim 7, further comprising: applying the variable portion of the object to other objects.
10. An electronic device, comprising: at least one processor; and a memory coupled to the at least one processor and having instructions stored thereon, the instructions, when executed by the at least one processor, causing the device to perform actions including: inputting sample point cloud data of an object into an encoder to obtain encoded data for the object; determining, by transforming the encoded data, a plurality of invariant parts and a plurality of variable parts of the object, wherein the invariant parts indicate invariant features of the object and the variable parts indicate variable features of the object; determining a similarity loss and a spatial loss for the sample point cloud data based on the plurality of invariant parts and the plurality of variable parts; and adjusting parameters of the encoder based on the similarity loss and the spatial loss to obtain a trained encoder; wherein determining the similarity loss and the spatial loss comprises: determining the similarity loss based on a first invariant part and a second invariant part among the plurality of invariant parts; and determining the spatial loss based on a first variable part and a second variable part among the plurality of variable parts.
11. The electronic device of claim 10, wherein determining the plurality of invariant parts and the plurality of variable parts comprises: performing a matrix transformation on the encoded data to obtain the plurality of invariant parts and the plurality of variable parts.
12. The electronic device of claim 10, wherein adjusting the parameters of the encoder comprises: combining the similarity loss and the spatial loss to obtain a combined loss; and adjusting the parameters of the encoder if it is determined that the combined loss is greater than a first threshold loss.
13. The electronic device according to claim 10, wherein the actions further comprise: inputting the first invariant part and the first variable part into a decoder to obtain reference point cloud data for the object; and determining a data loss between the reference point cloud data and the sample point cloud data.
14. The electronic device of claim 13, wherein adjusting the parameters of the encoder comprises: adjusting the parameters of the encoder and parameters of the decoder based on the similarity loss, the spatial loss, and the data loss.
15. The electronic device of claim 14, wherein adjusting the parameters of the encoder and the parameters of the decoder based on the similarity loss, the spatial loss, and the data loss comprises: combining the similarity loss, the spatial loss, and the data loss to obtain a total loss; and adjusting the parameters of the encoder and the parameters of the decoder if it is determined that the total loss is greater than a second threshold loss.
16. An electronic device, comprising: at least one processor; and a memory coupled to the at least one processor and having instructions stored thereon, the instructions, when executed by the at least one processor, causing the device to perform actions including: inputting point cloud data of an object into a trained encoder to obtain encoded data for the object, wherein the trained encoder is obtained by adjusting parameters of an encoder based on a similarity loss and a spatial loss obtained from sample point cloud data for a sample object; and determining, by transforming the encoded data, an invariant portion and a variable portion of the object, wherein the invariant portion indicates invariant features of the object and the variable portion indicates variable features of the object; wherein the similarity loss and the spatial loss are determined by the following operations: determining the similarity loss based on the invariant portion; and determining the spatial loss based on the variable portion.
17. The electronic device of claim 16, wherein determining the invariant portion and the variable portion comprises: performing a matrix transformation on the encoded data to obtain the invariant portion and the variable portion.
18. The electronic device according to claim 16, wherein the actions further comprise: applying the variable portion of the object to other objects.
19. A computer program product tangibly stored on a non-volatile computer-readable medium and comprising machine-executable instructions that, when executed, cause a machine to perform the steps of the method according to any one of claims 1 to 9.
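For illustration only, the loss computation and threshold check recited in claims 1 to 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the claims do not fix the encoder architecture, the form of the matrix transformation, or the loss formulas. The toy encoder, the cosine-based similarity loss, the mean-squared spatial loss, and all function and variable names below are hypothetical choices made for the example.

```python
import math
import random


def encode(points, weights):
    """Toy encoder (assumption): per-point linear map, ReLU, max-pool over points."""
    dim = len(weights[0])
    feats = [
        [max(0.0, sum(p[i] * weights[i][j] for i in range(3))) for j in range(dim)]
        for p in points
    ]
    return [max(f[j] for f in feats) for j in range(dim)]


def split_parts(code, matrix, k):
    """Matrix transformation of the encoded data into k invariant and k variable parts."""
    out = [sum(m * c for m, c in zip(row, code)) for row in matrix]
    size = len(out) // (2 * k)
    parts = [out[n * size:(n + 1) * size] for n in range(2 * k)]
    return parts[:k], parts[k:]


def similarity_loss(a, b, eps=1e-8):
    """Assumed form: 1 - cosine similarity between two invariant parts."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm + eps)


def spatial_loss(a, b):
    """Assumed form: mean squared difference between two variable parts."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def combined_loss(points, weights, matrix, k):
    """Similarity loss on the first two invariant parts plus spatial loss on the
    first two variable parts (an unweighted sum is assumed here)."""
    invariant, variable = split_parts(encode(points, weights), matrix, k)
    return (similarity_loss(invariant[0], invariant[1])
            + spatial_loss(variable[0], variable[1]))


def needs_update(loss, threshold):
    """Claim 3: parameters are adjusted only while the loss exceeds the threshold."""
    return loss > threshold


# Hypothetical sample data: 64 random 3D points, an 8-dimensional code, k = 2 parts.
rng = random.Random(0)
points = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(64)]
weights = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(3)]
matrix = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]
loss = combined_loss(points, weights, matrix, k=2)
print("adjust encoder parameters:", needs_update(loss, threshold=0.1))
```

In an actual training loop, the parameter adjustment itself would typically be a gradient step taken by an automatic-differentiation framework whenever `needs_update` returns true. The sketch stops at the decision because the claims do not specify the optimizer.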