Devices and methods for pre-processing data for data-driven tasks
The data processing apparatus addresses the inefficiencies in utilizing symmetries by systematically defining representative data elements and combining regularization with augmentation, enhancing model performance and reducing computational costs in data-driven tasks.
Patent Information
- Authority / Receiving Office: WO
- Patent Type: Application
- Current Assignee / Owner: HUAWEI TECH CO LTD
- Filing Date: 2024-12-18
- Publication Date: 2026-06-25
[Abstract figure: EP2024086990]
Abstract
Description
[0001] Devices and methods for pre-processing data for data-driven tasks
[0002] TECHNICAL FIELD
[0003] The present invention relates to devices and methods for data processing. More specifically, the present invention relates to devices and methods for pre-processing data for data-driven tasks, in particular for artificial intelligence, AI, or machine learning, ML, tasks.
[0004] BACKGROUND
[0005] Data-driven tasks usually require a large amount of training data to learn the main features of a given problem. Shaping the input data in a pre-processing stage, based on a priori knowledge, can simplify the problem for the machine and simplify the learning process. For instance, in many applications the problem has intrinsic symmetries which do not affect the output. Introducing these symmetries to the machine may help it in the following ways: the problem at hand may be simplified for the machine; the required amount of training data may be reduced; and / or the generalizability of the model may be improved.
[0006] The following attempts have been made to utilize the intrinsic symmetries inherent in a data-driven task. First, the size of the dataset has been increased by employing data augmentation alone, which has its own drawbacks: it consumes the computational resources of the model, which must learn to estimate the symmetries, and the estimated symmetrical operators may not be accurate. Second, restrictions have been imposed on the model to account for the symmetries, which can make the trained model sub-optimal. Third, heuristic methods have been used, which are not applicable to general use cases.
[0007] SUMMARY
[0008] It is an object of the invention to provide improved devices and methods for pre-processing data for data-driven tasks, in particular for artificial intelligence, AI, or machine learning, ML, tasks.
[0009] The foregoing and other objects are achieved by the subject matter of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.
[0010] As used herein, an invertible operator $g(\cdot)$ is symmetrical with respect to the function $f(\cdot)$ if, for all inputs $x$, $f(x) = f(g(x))$.
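A quick numerical check makes this definition concrete. In the sketch below, which is an illustration rather than part of the application, f is assumed to be the distance of a 2-D point from the origin and g a rotation by 90 degrees; a translation serves as a counter-example. Sampling inputs can only falsify symmetry, never prove it, so is_symmetrical() is merely a sanity check under these assumptions.

```python
import math
import random

def f(x):
    """Hypothetical task function: distance of a 2-D point from the origin."""
    return math.hypot(x[0], x[1])

def rotate90(x):
    """Invertible operator g: rotation by 90 degrees counter-clockwise."""
    return (-x[1], x[0])

def translate(x):
    """Invertible operator that shifts a point by +1 along the x-axis."""
    return (x[0] + 1.0, x[1])

def is_symmetrical(g, f, samples, tol=1e-9):
    """Empirically check f(x) == f(g(x)) on the given sample inputs."""
    return all(abs(f(x) - f(g(x))) <= tol for x in samples)

random.seed(0)
samples = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1000)]
print(is_symmetrical(rotate90, f, samples))   # True
print(is_symmetrical(translate, f, samples))  # False
```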
[0011] As used herein, a mathematical group (or group) is a set of invertible operators which is closed under multiplication (i.e. composition), such that the multiplication is associative, has a neutral element, and each element of the group has an inverse for multiplication. The symmetries of a function $f$ form a group.
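To make the group axioms concrete, the following sketch (an illustration under stated assumptions: operators are represented as 2x2 integer matrices acting on grid points, and "multiplication" is matrix composition) checks closure, the neutral element and inverses for the four planar rotations by multiples of 90 degrees. Associativity holds automatically for matrix multiplication, so it is not checked.

```python
from itertools import product

def matmul(p, q):
    """Compose two operators given as 2x2 integer matrices: (p * q)(x) = p(q(x))."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

IDENTITY = ((1, 0), (0, 1))
R90 = ((0, -1), (1, 0))  # counter-clockwise rotation by 90 degrees

# The cyclic group C4 = {I, R, R^2, R^3}.
C4 = [IDENTITY]
for _ in range(3):
    C4.append(matmul(R90, C4[-1]))

def is_group(elements):
    """Check closure, neutral element and inverses for a finite set of matrices."""
    s = set(elements)
    closed = all(matmul(p, q) in s for p, q in product(s, s))
    has_identity = IDENTITY in s
    has_inverses = all(any(matmul(p, q) == IDENTITY for q in s) for p in s)
    return closed and has_identity and has_inverses

print(len(set(C4)), is_group(C4))  # 4 True
print(is_group([IDENTITY, R90]))  # False: R90 * R90 is missing
```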
[0012] As used herein, an action, such as an action of a group G on an input set X, is a morphism from the group G to the group of bijections from X to X.
[0013] As used herein, an orbit $O$ of an element $x \in X$ under the action of the group $G$ on $X$ is defined as the set of all possible destinations of the element $x$ under the group action, i.e.:
[0014] $O = Gx = \{g \cdot x : g \in G\}$
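The orbit can be enumerated without listing every group element: starting from x, repeatedly apply the generating operators until no new points appear. This sketch assumes a finite group, for which closure under the generators alone already yields the full orbit, because each inverse is a power of its operator; the operators chosen (90-degree rotation and reflection of integer grid points) are illustrative assumptions.

```python
from collections import deque

def rotate90(p):
    """Rotate a grid point by 90 degrees counter-clockwise."""
    return (-p[1], p[0])

def reflect_x(p):
    """Reflect a grid point across the x-axis."""
    return (p[0], -p[1])

def orbit(x, generators):
    """Return the orbit Gx of x under the group generated by `generators`.

    Breadth-first search: apply every generator to every newly reached point
    until no new points appear. Assumes the generated group is finite.
    """
    seen = {x}
    frontier = deque([x])
    while frontier:
        y = frontier.popleft()
        for g in generators:
            z = g(y)
            if z not in seen:
                seen.add(z)
                frontier.append(z)
    return seen

print(sorted(orbit((2, 1), [rotate90])))      # [(-2, -1), (-1, 2), (1, -2), (2, 1)]
print(len(orbit((2, 1), [rotate90, reflect_x])))  # 8
print(orbit((0, 0), [rotate90, reflect_x]))       # {(0, 0)}
```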
[0015] As used herein, the representative elements (or representatives) of an orbit $O$ are one or more elements of the orbit.
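One common way to fix a single representative per orbit is to pick the extreme element of the orbit in a fixed total order; this rule is an assumption here, since the application leaves the choice of representative open. Elements of the same orbit then map to the same representative, which is what lets a downstream model treat them as one input. The sketch uses the four rotations by multiples of 90 degrees and takes the lexicographically largest orbit element.

```python
def rotate90(p):
    """Rotate a 2-D integer point by 90 degrees counter-clockwise."""
    return (-p[1], p[0])

def c4_orbit(p):
    """Orbit of p under C4, the four rotations by multiples of 90 degrees."""
    images = [p]
    for _ in range(3):
        images.append(rotate90(images[-1]))
    return set(images)

def representative(p):
    """Hypothetical rule: the lexicographically largest element of the orbit."""
    return max(c4_orbit(p))

print(representative((-1, 2)))                            # (2, 1)
print(representative((1, -2)) == representative((2, 1)))  # True
```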
[0016] According to a first aspect a data processing apparatus is provided for pre-processing one or more data elements for a data-driven task. The data processing apparatus according to the first aspect is configured to obtain information indicative of a set, e.g. a list of symmetrical operators associated with the data-driven task. The set of symmetrical operators defines a mathematical group of symmetrical operators, wherein for each of the one or more data elements an orbit of the data element with respect to the mathematical group of symmetrical operators is defined by a plurality of image elements of the data element by the set of symmetrical operators.
[0017] Moreover, the data processing apparatus according to the first aspect is configured to determine for each orbit one or more of the plurality of image elements of the data element as representative data elements of the orbit and to output for each of the one or more data elements the one or more representative data elements.
[0018] As will be described in more detail in the following, the data processing apparatus according to the first aspect implements a systematic and efficient scheme for pre-processing input data in order to introduce the existing symmetries of a data-driven task to a machine implementing the data-driven task. According to embodiments disclosed herein, this is achieved by a combination of data regularization and data augmentation. Thus, the data processing apparatus according to the first aspect makes it possible, for instance, to efficiently utilize the intrinsic symmetries of a data-driven task without consuming significant computational resources, to provide a systematic scheme that takes advantage of both data regularization and data augmentation, to impose no constraint on the model implementing the data-driven task, and / or to support different types of applications.
[0019] In a further possible implementation form, in an inference phase of the data-driven task, the data processing apparatus according to the first aspect is configured to output for each of the one or more data elements a single representative data element of the one or more representative data elements.
[0020] In a further possible implementation form, in a training phase of the data-driven task, the data processing apparatus according to the first aspect is configured to output for each of the one or more data elements the one or more representative data elements.
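The two implementation forms above differ only in how many representatives are emitted per data element. A minimal sketch, assuming 2-D integer points, the group of the eight symmetries of the square, and a hypothetical rule under which every orbit element in the closed first quadrant counts as a representative (the application leaves the choice of representatives open); the phase names "training" and "inference" are likewise illustrative:

```python
from typing import List, Tuple

Point = Tuple[int, int]

def d4_orbit(p: Point) -> List[Point]:
    """Images of p under the eight symmetries of the square (4 rotations, 4 reflections)."""
    x, y = p
    return sorted({(x, y), (-y, x), (-x, -y), (y, -x),
                   (x, -y), (-y, -x), (-x, y), (y, x)})

def representatives(p: Point) -> List[Point]:
    """Hypothetical rule: every orbit element in the closed first quadrant.
    Each orbit contains at least one, namely (|x|, |y|)."""
    return [q for q in d4_orbit(p) if q[0] >= 0 and q[1] >= 0]

def preprocess(p: Point, phase: str) -> List[Point]:
    """Training: all representatives. Inference: a single one."""
    reps = representatives(p)
    if phase == "training":
        return reps
    if phase == "inference":
        return [max(reps)]  # any fixed, deterministic choice would do
    raise ValueError(f"unknown phase: {phase!r}")

print(preprocess((-1, 2), "training"))   # [(1, 2), (2, 1)]
print(preprocess((-1, 2), "inference"))  # [(2, 1)]
```

Choosing max() as the single inference-phase representative is arbitrary; what matters is that the same deterministic rule is applied to every input.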
[0021] In a further possible implementation form, the data processing apparatus is configured to generate the set of symmetrical operators associated with the data-driven task based on an input set, e.g. an input list of symmetrical operators associated with the data-driven task, wherein the input set of symmetrical operators comprises the set of symmetrical operators and / or a further set of one or more further symmetrical operators.
[0022] In a further possible implementation form, the further set of further symmetrical operators defines a further mathematical group of symmetrical operators, wherein for each of the one or more data elements the further mathematical group of symmetrical operators does not define a representative data element.
[0023] In a further possible implementation form, the data processing apparatus is configured to generate the set of symmetrical operators associated with the data-driven task based on the input set of symmetrical operators in an iterative manner.
[0024] In a further possible implementation form, for generating the set of symmetrical operators associated with the data-driven task based on the input set of symmetrical operators in an iterative manner, the data processing apparatus is configured to iteratively determine for each symmetrical operator of the input set of symmetrical operators whether or not it is possible to determine for each orbit one or more of the plurality of images of the data element as representative data elements of the orbit. In a further possible implementation form, in an inference phase of the data-driven task, the data processing apparatus is configured to output for each of the one or more data elements a single representative data element of the one or more representative data elements.
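The iterative determination can be pictured as a greedy loop over the input list: each operator is tentatively added to the operators used for representatives and kept there only if representatives can still be found for every orbit; otherwise it is set aside (for augmentation, as in the embodiment of figures 4a and 4b, where the two lists are called l1 and l2). The sketch is hypothetical throughout: the data are 2-D integer points, and "a representative can be found" is replaced by a crude stand-in, namely that the orbit of each sample point can be fully enumerated within a fixed budget. That stand-in rejects the translation even though a closed-form representative exists for translations (e.g. moving a chosen point to the origin), so it illustrates only the control flow, not a recommended criterion.

```python
from collections import deque

def rotate90(p):
    return (-p[1], p[0])

def reflect_x(p):
    return (p[0], -p[1])

def shift_x(p):
    """Translation by +1 along x: generates an infinite group on the integer grid."""
    return (p[0] + 1, p[1])

def orbit_is_enumerable(x, generators, budget=1000):
    """Stand-in criterion (assumption): can the orbit of x be fully enumerated
    within `budget` points? If so, a representative such as max(orbit) exists."""
    seen, frontier = {x}, deque([x])
    while frontier:
        y = frontier.popleft()
        for g in generators:
            z = g(y)
            if z not in seen:
                if len(seen) >= budget:
                    return False
                seen.add(z)
                frontier.append(z)
    return True

def split_operators(operators, samples):
    """Iteratively divide the input list into l2 (used for representatives)
    and l1 (left over for data augmentation)."""
    l1, l2 = [], []
    for op in operators:
        candidate = l2 + [op]
        if all(orbit_is_enumerable(x, candidate) for x in samples):
            l2 = candidate
        else:
            l1.append(op)
    return l1, l2

samples = [(2, 1), (0, 3), (-4, 5)]
l1, l2 = split_operators([rotate90, shift_x, reflect_x], samples)
print([g.__name__ for g in l1])  # ['shift_x']
print([g.__name__ for g in l2])  # ['rotate90', 'reflect_x']
```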
[0025] In a further possible implementation form, in a first processing stage the data processing apparatus is configured to determine for each of the one or more data elements a single representative data element based on the orbit of the data element defined by the group of symmetrical operators defined by the set of symmetrical operators and to output the single representative data element.
[0026] In a further possible implementation form, in a second processing stage the data processing apparatus is configured for each of the one or more data elements to augment the single representative data element provided by the first processing stage into one or more augmented representative data elements by mapping the single representative data element provided by the first processing stage to one or more data elements of the same orbit defined by the further mathematical group of symmetrical operators.
[0027] In a further possible implementation form, in a third processing stage the data processing apparatus is configured for each of the one or more data elements to determine for each of the one or more augmented representative data elements one or more further representative data elements based on a respective orbit defined by the group of symmetrical operators and to output the one or more further representative data elements.
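The three processing stages can be chained as follows. This is a sketch under hypothetical choices: the group of symmetrical operators is the four rotations of the plane by multiples of 90 degrees, with the lexicographically largest orbit element as the single representative, and the further group is represented by a few translations along the x-axis given as explicit offsets.

```python
from typing import Iterable, List, Tuple

Point = Tuple[int, int]

def rotate90(p: Point) -> Point:
    return (-p[1], p[0])

def rep_g2(p: Point) -> Point:
    """Stages 1 and 3: map p to a single representative of its orbit under
    the group of 90-degree rotations; chosen here as the largest element."""
    images = [p]
    for _ in range(3):
        images.append(rotate90(images[-1]))
    return max(images)

def augment_g1(p: Point, offsets: Iterable[int]) -> List[Point]:
    """Stage 2: transfer p to other orbits with elements of the further group
    (here, hypothetical translations along x by the given offsets)."""
    return [(p[0] + k, p[1]) for k in offsets]

def training_pipeline(x: Point, offsets: Iterable[int]) -> List[Point]:
    r = rep_g2(x)                          # first stage: single representative
    augmented = augment_g1(r, offsets)     # second stage: augmentation
    return [rep_g2(a) for a in augmented]  # third stage: re-map to representatives

print(training_pipeline((-1, 2), offsets=[0, 1, -3]))  # [(2, 1), (3, 1), (1, 1)]
```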
[0028] According to a second aspect a method is provided for pre-processing one or more data elements for a data-driven task. The method according to the second aspect comprises the steps of:
[0029] obtaining a set, e.g. a list of symmetrical operators associated with the data-driven task, wherein the set of symmetrical operators defines a mathematical group of symmetrical operators and wherein for each of the one or more data elements an orbit of the data element with respect to the mathematical group of symmetrical operators is defined by a plurality of image elements of the data element by the set of symmetrical operators;
[0030] determining for each orbit one or more of the plurality of image elements of the data element as the representative data elements of the orbit; and
[0031] outputting for each of the one or more data elements the one or more representative data elements.
[0032] In a further possible implementation form, in an inference phase of the data-driven task, the method according to the second aspect comprises outputting for each of the one or more data elements a single representative data element of the one or more representative data elements.
[0033] In a further possible implementation form, in a training phase of the data-driven task, the method according to the second aspect comprises outputting for each of the one or more data elements the one or more representative data elements.
[0034] In a further possible implementation form, the method according to the second aspect further comprises generating the set of symmetrical operators based on an input set, e.g. an input list of symmetrical operators associated with the data-driven task, wherein the input set of symmetrical operators comprises the set of symmetrical operators and / or a further set, e.g. a further list of one or more further symmetrical operators.
[0035] In a further possible implementation form, the further set of further symmetrical operators defines a further mathematical group of symmetrical operators, wherein for each of the one or more data elements the further mathematical group of symmetrical operators does not define a representative data element. In a further possible implementation form, the method according to the second aspect comprises generating the set of symmetrical operators associated with the data-driven task based on the input set of symmetrical operators in an iterative manner.
[0036] In a further possible implementation form, generating the set of symmetrical operators associated with the data-driven task based on the input set of symmetrical operators in an iterative manner comprises iteratively determining for each symmetrical operator of the input set of symmetrical operators whether or not it is possible to determine for each orbit one or more of the plurality of images of the data element as representative data elements of the orbit.
[0037] In a further possible implementation form, in an inference phase of the data-driven task, the method according to the second aspect comprises outputting for each of the one or more data elements a single representative data element of the one or more representative data elements.
[0038] In a further possible implementation form, in a first processing stage the method according to the second aspect comprises determining for each of the one or more data elements a single representative data element based on the orbit of the data element defined by the group of symmetrical operators defined by the set of symmetrical operators and outputting the single representative data element.
[0039] In a further possible implementation form, in a second processing stage the method comprises for each of the one or more data elements augmenting the single representative data element provided by the first processing stage into one or more augmented representative data elements by mapping the single representative data element provided by the first processing stage to one or more data elements of the same orbit defined by the further mathematical group of symmetrical operators.
[0040] In a further possible implementation form, in a third processing stage the method according to the second aspect comprises for each of the one or more data elements determining for each of the one or more augmented representative data elements one or more further representative data elements based on a respective orbit defined by the group of symmetrical operators and outputting the one or more further representative data elements.
[0041] The method according to the second aspect can be performed by the data processing apparatus according to the first aspect. Thus, further features of the method according to the second aspect result directly from the functionality of the data processing apparatus according to the first aspect, as well as its different implementation forms described above and below.
[0042] According to a third aspect a computer program product is provided, comprising program code which causes a computer or a processor to perform the method according to the second aspect, when the program code is executed by the computer or the processor.
[0043] Details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, drawings, and claims.
[0044] BRIEF DESCRIPTION OF THE DRAWINGS
[0045] In the following, embodiments of the present disclosure are described in more detail with reference to the attached figures and drawings, in which:
[0046] Fig. 1 shows a schematic diagram illustrating a data processing apparatus according to an embodiment for pre-processing input data for a data-driven task implemented by a functional module;
Fig. 2a shows a schematic diagram illustrating further details of a data processing apparatus according to an embodiment for pre-processing input data for a data-driven task;
[0047] Fig. 2b shows a flow diagram illustrating processing steps implemented by the data processing apparatus of figure 2a;
[0048] Fig. 3 shows a schematic diagram illustrating the mathematical concepts of group, orbit and representative element;
[0049] Fig. 4a shows a schematic diagram illustrating further details of a data processing apparatus according to a further embodiment for pre-processing input data for a data-driven task during a training phase;
[0050] Fig. 4b shows a flow diagram illustrating processing steps implemented by the data processing apparatus of figure 4a;
[0051] Fig. 5a shows a schematic diagram illustrating further details of a data processing apparatus according to a further embodiment for pre-processing input data for a data-driven task during an inference phase;
[0052] Fig. 5b shows a flow diagram illustrating processing steps implemented by the data processing apparatus of figure 5a;
[0053] Fig. 6a shows a schematic diagram illustrating a representative of a rotation operator;
[0054] Fig. 6b shows a schematic diagram illustrating a representative of a translation operator;
[0055] Fig. 6c shows a schematic diagram illustrating a representative of a scaling operator;
[0056] Fig. 6d shows a table summarizing properties of rotation, translation and scaling operators;
[0057] Fig. 6e shows a schematic diagram illustrating mapping to a representative for the example of radio map generation implemented by a data processing apparatus according to an embodiment; and
[0058] Fig. 7 shows a flow diagram illustrating steps of a data processing method according to an embodiment.
[0059] In the following, identical reference signs refer to identical or at least functionally equivalent features.
[0060] DETAILED DESCRIPTION OF THE EMBODIMENTS
[0061] In the following description, reference is made to the accompanying figures, which form part of the disclosure and which illustrate specific aspects of embodiments of the present disclosure or specific aspects in which embodiments of the present disclosure may be used. It is understood that embodiments of the present disclosure may be used in other aspects and may comprise structural or logical changes not depicted in the figures. The following detailed description, therefore, is not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims.
[0062] For instance, it is to be understood that a disclosure in connection with a described method may also hold true for a corresponding device or system configured to perform the method and vice versa. For example, if one or a plurality of specific method steps are described, a corresponding device may include one or a plurality of units, e.g. functional units, to perform the described one or plurality of method steps (e.g. one unit performing the one or plurality of steps, or a plurality of units each performing one or more of the plurality of steps), even if such one or more units are not explicitly described or illustrated in the figures. On the other hand, for example, if a specific apparatus is described based on one or a plurality of units, e.g. functional units, a corresponding method may include one step to perform the functionality of the one or plurality of units (e.g. one step performing the functionality of the one or plurality of units, or a plurality of steps each performing the functionality of one or more of the plurality of units), even if such one or plurality of steps are not explicitly described or illustrated in the figures. Further, it is understood that the features of the various exemplary embodiments and / or aspects described herein may be combined with each other, unless specifically noted otherwise.
[0063] Figure 1 shows a schematic diagram illustrating a data processing apparatus 110 according to an embodiment for pre-processing input data for a data-driven task implemented by a functional module 120. In an embodiment, the functional module 120 is configured to implement an AI or ML task as the data-driven task, for instance as a neural network 120. In an embodiment, the data processing apparatus 110 may be implemented as an application server or another type of computational device capable of data processing. In an embodiment, the functional module 120 may be implemented by the same application server or by a different computational device in communication with the data processing apparatus 110, for instance via a wired and / or wireless connection. As illustrated in figure 1 and described in more detail below, according to embodiments disclosed herein the data processing apparatus 110 is configured to use data regularization and / or data augmentation techniques for pre-processing the input data and to provide the pre-processed input data, i.e. the regularized input data, to the functional module 120, e.g. neural network 120, for generating output data.
[0064] The data processing apparatus 110 may comprise processing circuitry and a communication interface, in particular a wireless or wired communication interface enabling communication, for instance, with the functional module 120. The processing circuitry may be implemented in hardware and / or software and may comprise digital circuitry, or both analog and digital circuitry. Digital circuitry may comprise components such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or general-purpose processors. The data processing apparatus 110 may further comprise a memory configured to store executable program code which, when executed by the processing circuitry, causes the data processing apparatus 110 to perform the functions and methods described herein.
[0065] Likewise, the functional module 120 may comprise processing circuitry and a communication interface, in particular a wireless or wired communication interface enabling communication, for instance, with the data processing apparatus 110. The processing circuitry may be implemented in hardware and / or software and may comprise digital circuitry, or both analog and digital circuitry. Digital circuitry may comprise components such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or general-purpose processors. The functional module 120 may further comprise a memory configured to store executable program code which, when executed by the processing circuitry, causes the functional module 120 to perform the functions and methods described herein.
[0066] As will be described in the following with further reference to figures 2a and 2b, for pre-processing one or more data elements of the input data for the data-driven task implemented by the functional module 120, e.g. neural network 120, a control flow module 110a of the data processing apparatus 110 is configured to obtain information indicative of a set, e.g. a list $l$, of symmetrical operators associated with the data-driven task. The list $l$ of symmetrical operators defines a mathematical group of symmetrical operators, wherein for each of the one or more data elements an orbit of the data element with respect to the mathematical group of symmetrical operators is defined by a plurality of image elements of the data element by the set of symmetrical operators.
[0067] Figure 3 shows a schematic diagram illustrating the mathematical concepts of group, orbit and representative element mentioned above. As used herein, an invertible operator $g(\cdot)$ is symmetrical with respect to the function $f(\cdot)$ if, for all inputs $x$, $f(x) = f(g(x))$. A mathematical group (or group) is a set of invertible operators which is closed under multiplication (i.e. composition), such that the multiplication is associative, has a neutral element, and each element of the group has an inverse for multiplication. The symmetries of a function $f$ form a group. An action, such as an action of a group $G$ on an input set $X$, is a morphism from the group $G$ to the group of bijections from $X$ to $X$. An orbit $O$ of an element $x \in X$ under the action of the group $G$ on $X$ is defined as the set of all possible destinations of the element $x$ under the group action, i.e. $O = Gx = \{g \cdot x : g \in G\}$. The representative elements (or representatives) of an orbit $O$ are one or more elements of the orbit.
[0068] As further illustrated in figures 2a and 2b, a processing block 111a of the control flow module 110a of the data processing apparatus 110 is further configured to determine for each orbit one or more of the plurality of image elements of the data element as representative data elements of the orbit. Finally, a processing block 111b of a data flow module 110b of the data processing apparatus 110 is configured to output for each of the one or more data elements the one or more representative data elements to the functional module 120. More specifically, the processing block 111a of the control flow module 110a is configured to define orbits and one or more representative elements for each orbit based on the action of the group $G := \langle l \rangle$ on the input space $X$, i.e. the space of the one or more input data elements (see step 201 of figure 2b). The processing block 111a is further configured to provide the representative elements and the list $l$ to the data flow module 110b of the data processing apparatus 110 (see step 203 of figure 2b).
[0069] The processing block 111b of the data flow module 110b of the data processing apparatus 110 is configured to map to the representatives based on $G := \langle l \rangle$, i.e. the processing block 111b maps each of the one or more input data elements to one or more of the representative(s) provided by the control flow module 110a based on the symmetrical operators in the list $l$ (see step 205 of figure 2b). Moreover, the processing block 111b is configured to output for each of the one or more input data elements the one or more representative data elements to the functional module 120 (see step 207 of figure 2b).
[0070] Figures 4a and 4b show a further embodiment of the data processing apparatus 110 for pre-processing input data for a data-driven task during a training phase. In the embodiment shown in figures 4a and 4b, a processing block 113a of the control flow module 110a of the data processing apparatus 110 is configured to divide a set, e.g. a list $l$, of symmetrical operators associated with the data-driven task (which may be obtained from the desired behavior of the output, from theoretical analysis, or derived by an AI block) into two lists $l_1$ and $l_2$ (see step 401 of figure 4b). The groups $G_1$ and $G_2$ are the groups generated by $l_1$ and $l_2$, respectively. In an embodiment, the processing block 113a operates iteratively with the processing block 111a already described above, i.e. the block referred to as “Defining orbits and representative(s) for each orbit based on $G_2 := \langle l_2 \rangle$” (see step 403 of figure 4b). After convergence, the processing block 111a provides the list $l_2$ to the data flow module 110b of the data processing apparatus 110 (see step 405 of figure 4b). The convergence criterion in step 403 of figure 4b may be whether the processing block 111a, i.e. the processing block referred to as “Defining orbits and representative(s) for each orbit based on $G_2 := \langle l_2 \rangle$”, is able to find representative elements for all the orbits. Afterwards, the orbits and their representative elements are defined based on the action of $G_2$ on the input space, i.e. the space of the input data (see step 407 of figure 4b). The data augmentation, on the other hand, may be performed based on the action of the group $G_1$ on the input space (see step 409 of figure 4b). In an embodiment, it is recommended that $G_1 \cap G_2 = \{\mathrm{id}\}$ to avoid redundancy. As will be appreciated, the more operators the group $G_2$ contains, the lower the computational complexity.
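The recommendation that the intersection of $G_1$ and $G_2$ be trivial can be checked mechanically when both groups are finite. The sketch below assumes operators that permute the positions of a length-4 sequence: a cyclic shift stands in for $l_2$ and a reversal for $l_1$; both choices, and the permutation encoding, are hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple

Perm = Tuple[int, ...]  # applied to a sequence s as tuple(s[i] for i in perm)

def compose(p: Perm, q: Perm) -> Perm:
    """Permutation equal to applying q first, then p."""
    return tuple(q[i] for i in p)

def generate(generators: Iterable[Perm], n: int) -> FrozenSet[Perm]:
    """Closure of the generators under composition (finite, so inverses are included)."""
    identity = tuple(range(n))
    gens = list(generators)
    group, frontier = {identity}, [identity]
    while frontier:
        p = frontier.pop()
        for g in gens:
            r = compose(g, p)
            if r not in group:
                group.add(r)
                frontier.append(r)
    return frozenset(group)

n = 4
shift = (1, 2, 3, 0)    # hypothetical l2: cyclic shift of a length-4 sequence
reverse = (3, 2, 1, 0)  # hypothetical l1: reversal
G2 = generate([shift], n)
G1 = generate([reverse], n)
print(len(G2), len(G1))              # 4 2
print(G1 & G2 == {tuple(range(n))})  # True
```

Had $l_1$ instead contained the shift by two positions, (2, 3, 0, 1), the intersection would also contain that shift, and the corresponding augmented samples would simply be mapped back to the representatives they came from.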
[0071] As already mentioned above, the processing block 111a of the control flow module 110a of the data processing apparatus 110, i.e. the block referred to as “Defining orbits and representative(s) for each orbit based on $G_2 := \langle l_2 \rangle$”, is configured to define one or more elements in each orbit as the representative elements based on the action of $G_2$ on the input space (see step 407 of figure 4b). As further mentioned above, in the embodiment illustrated in figures 4a and 4b the processing block 111a operates iteratively with the processing block 113a, i.e. the processing block referred to as “Dividing the list of the operators into $l_1$ and $l_2$”. In an embodiment, the processing block 111a is configured to ensure that each orbit has at least one representative element. After convergence, it transmits the representatives and the list $l_2$ to the data flow module 110b. The processing block 111a may work iteratively with the processing block 113a to find the largest set of operators for $G_2$.
[0072] The processing block 111b of the data flow module 110b of the data processing apparatus 110 illustrated in figure 4a, i.e. the processing block referred to as “Map to one of the Reps, based on $G_2 := \langle l_2 \rangle$”, is configured to map the input data elements to one of the representatives provided by the processing block 111a based on the symmetrical operators in $l_2$ (see step 407 of figure 4b). Thus, in a first processing stage implemented by the processing block 111b of the data flow module 110b, a single representative data element is determined for each of the one or more data elements based on the orbit of the data element under the group $G_2$ of symmetrical operators defined by the list $l_2$ of symmetrical operators.
[0073] The processing block 113b of the data flow module 110b of the data processing apparatus 110 illustrated in figure 4a, i.e. the processing block referred to as “Data augmentation by transferring to other orbits based on $G_1$”, enlarges the training dataset by generating new data elements, transferring the given input to other orbits based on the action of the group $G_1$ (see step 409 of figure 4b). The transferred points nevertheless remain inside the orbit of the input under the full group $G := \langle l \rangle$. Thus, in a second processing stage implemented by the processing block 113b of the data flow module 110b, for each of the one or more data elements the single representative data element provided by the processing block 111b is augmented into one or more augmented representative data elements by mapping it to one or more data elements of the orbit defined by the further group $G_1$ of further symmetrical operators.
[0074] The processing blocks 115b of the data flow module 110b of the data processing apparatus 110 illustrated in figure 4a, i.e. the processing blocks referred to as “Map to one of the Reps, based on $G_2 := \langle l_2 \rangle$”, are implemented in the same way as the processing block 111b of figure 4a. In other words, each of the processing blocks 115b is configured to map the input data elements (previously processed by the blocks 111b and 113b) to one of the representatives provided by the processing block 111a based on the symmetrical operators in $l_2$ (see step 411 of figure 4b). Thus, in a third processing stage implemented by the plurality of processing blocks 115b of the data flow module 110b, for each of the one or more augmented representative data elements (provided by the processing block 113b) one or more further representative data elements are determined based on a respective orbit defined by the group $G_2$ of symmetrical operators. In step 413 of figure 4b, the output of the processing blocks 115b, i.e. the enlarged training dataset, is provided to the functional module 120.
[0075] Figures 5a and 5b show a further embodiment of the data processing apparatus 110 for pre-processing input data for a data-driven task in the inference phase. In this embodiment this control flow module 110a is identical to the control flow module 110a of the data processing apparatus illustrated in figures 4a and 4b. In other words, the processing block 113a of the control flow module 110a of the data processing apparatus 110 is configured to divide a set, e.g. list I of symmetrical operators associated with the data-driven task (which may be obtained by the desired behavior of the output, theoretical analysis, or derived by an Al block) into two lists and l2(see step 501 of figure 5b). The groups G, and G2are groups generated by and l2, respectively. In an embodiment, the processing block 113a operates iteratively with the processing block 111a already described above, i.e. the block referred to as “Defining orbits and representative(s) for each orbit based on G2:=< l2>” (see step 503 of figure 5b). After converging, the processing block Illa provides the list to the data flow module 110b of the data processing apparatus 110 (see step 505 of figure 4b). The convergence criteria in step 503 of figure 5b may be, whether the processing block Illa, i.e. the processing block referred to as “Defining orbits and representative(s) for each orbit based on G2:=< l2>” is able to find the representative elements for all the orbits.
[0076] In the embodiment shown in figures 5a and 5b, the processing block 111b of the data flow module 110b of the data processing apparatus 110, i.e. the processing block referred to as “Map to one of the Reps, based on G2 := <l2>”, is configured to map the input data elements to a single one (and not to one or more, as in the embodiments described above) of the representative(s) given by the block 111a, “Defining orbits and representative(s) for each orbit based on G2”, based on the symmetrical operators in l2.
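In the inference phase only a single representative per orbit is needed. A minimal Python sketch, again assuming cyclic shifts of an integer sequence as a hypothetical stand-in for G2 and the lexicographically smallest rotation as the chosen representative, shows the property this relies on: every element of an orbit is mapped to the same output, and the mapping is idempotent.

```python
from typing import Tuple

Element = Tuple[int, ...]


def to_single_representative(x: Element) -> Element:
    """Map x to one fixed representative of its orbit under cyclic shifts."""
    if not x:
        return x  # the empty sequence forms a trivial orbit on its own
    # All elements of the orbit of x under the cyclic-shift group.
    rotations = [x[i:] + x[:i] for i in range(len(x))]
    # Lexicographic minimum: the same choice for every member of the orbit.
    return min(rotations)


x = (2, 0, 1)
orbit_of_x = [x[i:] + x[:i] for i in range(len(x))]
print([to_single_representative(y) for y in orbit_of_x])  # [(0, 1, 2), (0, 1, 2), (0, 1, 2)]
```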
[0077] As will be appreciated, according to embodiments disclosed herein the data processing apparatus 110 employs a form of data regularization, which is based on defining representative(s) for sets of invariant inputs and which is combined with a data augmentation approach. According to embodiments disclosed herein, these two approaches are combined in such a way as to get the maximum benefit from each of them. The main advantages of the implemented form of data regularization over data augmentation may be summarized as follows. In data augmentation, the model needs to learn the effects of the symmetrical operators, which requires a longer training process. Data augmentation consumes learning resources to learn the symmetrical operators. The symmetrical operators learned through data augmentation may not be accurate. Data augmentation may need to enlarge the training dataset extensively, which increases the computational costs. On the other hand, data augmentation only needs to know the symmetrical operators to generate a larger training dataset, whereas the representative approach needs to define a space on which the group of detected symmetries acts and to identify representatives of the orbits, which is not always trivial.
[0078] In the following, with further reference to figures 6a-e, some examples of representatives for common symmetrical operators are described for an illustrative data-driven task, namely generating a radio map including a transmitter, Tx, and a receiver, Rx. By way of example, the common symmetrical operators comprise rotation, translation, and scaling operators, which are also symmetrical operators in a wide range of other telecommunication scenarios.
[0079] For generating the radio map of an environment, the output is invariant with respect to the rotation operator: if the environment map (including the Tx and Rx locations illustrated in figure 6a) is rotated, the measured channel impulse response, CIR, at the Tx location stays unchanged. As this feature has no impact on the desired output, it may be removed by considering one representative for each set of inputs that can be transformed into each other by rotation alone. This may be done by, firstly, defining one or more coordinate vector(s) b_t and, secondly, rotating the input map such that the angle between a vector v (between two specific points) and the coordinate vector(s) takes a fixed value. For a 2D map, a single vector v between two points (e.g., Tx and Rx) is enough; the map may then be rotated by R(θ) such that the angles between R(v) and the coordinate vector(s) b_t are always fixed (e.g., equal to 0). For a 3D map, two vectors v and w which are not collinear are enough; the map may then be rotated by R(θ) such that the angles between R(v) (resp. R(w)) and the coordinate vector(s) b_t are always fixed. For a multi-user case, the vectors v and w may be defined based on three of the users.
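For the 2D case, the rotation step can be sketched in a few lines of Python with NumPy. The sketch assumes that the coordinate vector b_t is the unit x-axis and that the fixed angle is 0, as in the example above; the function name and the one-point-per-row array layout are illustrative choices. Because only the direction of v matters, rotating about the origin is sufficient.

```python
import numpy as np


def rotate_to_reference(points: np.ndarray, tx: np.ndarray, rx: np.ndarray) -> np.ndarray:
    """Rotate 2D points (one per row) about the origin so that v = rx - tx
    makes an angle of 0 with the assumed reference vector b_t = (1, 0)."""
    v = rx - tx
    theta = -np.arctan2(v[1], v[0])  # rotating by theta cancels the angle of v
    r = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    # Row-vector form of a_new = R(theta) a, applied to every point a.
    return points @ r.T


tx = np.array([1.0, 1.0])
rx = np.array([1.0, 3.0])
pts = np.array([tx, rx, [0.0, 2.0]])
out = rotate_to_reference(pts, tx, rx)
print(out[1] - out[0])  # R(v) now lies on the positive x-axis
```

Any rotated copy of the same map yields the same output, which is what makes the output a representative of the rotation orbit.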
[0080] As will be appreciated, for a two-dimensional Cartesian coordinate system, the rotation function may be represented in the following way:
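[0081] $$R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
[0083] Therefore, the rotation of a point $a = \begin{bmatrix} x \\ y \end{bmatrix}$ is
$$a_{new} = R(\theta)\,a = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x\cos\theta - y\sin\theta \\ x\sin\theta + y\cos\theta \end{bmatrix}.$$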
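[0085] For a three-dimensional Cartesian coordinate system, the rotations about the x-, y- and z-axes may be represented as:
$$R_x(\theta) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}, \quad R_y(\theta) = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}, \quad R_z(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$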
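The elementary 3D rotations translate directly into NumPy. For the 3D variant of the rotation representative (two non-collinear vectors v and w), the sketch below builds the rotation from an orthonormal frame obtained by Gram-Schmidt orthogonalization; this construction is a standard technique chosen for illustration and is not prescribed by the description. It assumes the coordinate vectors b_t are the x- and y-axes: R(v) is aligned with the positive x-axis and R(w) lies in the xy-plane with a positive y-component.

```python
import numpy as np


def rot_x(t: float) -> np.ndarray:
    c, s = np.cos(t), np.sin(t)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_y(t: float) -> np.ndarray:
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def rot_z(t: float) -> np.ndarray:
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def canonical_rotation(v: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Rotation R with R @ v on the positive x-axis and R @ w in the xy-plane.

    The rows of R form a right-handed orthonormal frame built from v and w.
    """
    e1 = v / np.linalg.norm(v)
    u = w - np.dot(w, e1) * e1  # remove the component of w along v
    norm_u = np.linalg.norm(u)
    if norm_u < 1e-12:
        raise ValueError("v and w are collinear")
    e2 = u / norm_u
    e3 = np.cross(e1, e2)  # right-handed, so det(R) = +1
    return np.vstack([e1, e2, e3])


v = np.array([1.0, 2.0, 0.5])
w = np.array([-0.3, 1.0, 2.0])
q = rot_z(0.7) @ rot_y(-1.1) @ rot_x(0.4)  # an arbitrary rotation of the map
r1 = canonical_rotation(v, w)
r2 = canonical_rotation(q @ v, q @ w)
print(r1 @ v, r2 @ (q @ v))  # identical canonical vectors
```

The comparison with an arbitrary composition of R_x, R_y and R_z shows that rotated copies of the same pair (v, w) are mapped to identical canonical vectors.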
[0089] As illustrated in figure 6b, the translation operator is a symmetrical operator for radio map generation. To find a representative for the orbits based on the translation operator, the following procedure may be implemented. Firstly, one or more coordinate vector(s) b_t and the origin point of the affine plane are defined. Secondly, the location of a specific point is fixed (e.g., by moving the Tx to the origin point), and all the points are translated accordingly. As will be appreciated, for translating the points in the environment by the vector $v = \begin{bmatrix} x_v \\ y_v \end{bmatrix}$, it is sufficient to map the points to new points by the following operation:
$$a_{new} = a - v = \begin{bmatrix} x \\ y \end{bmatrix} - \begin{bmatrix} x_v \\ y_v \end{bmatrix},$$
where a is an arbitrary point in the environment.
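Translation to a representative is a single vector subtraction. In this NumPy sketch the anchor point is assumed to be the Tx location, so that v equals the Tx coordinates and Tx lands on the origin; the function name is hypothetical.

```python
import numpy as np


def translate_to_origin(points: np.ndarray, anchor: np.ndarray) -> np.ndarray:
    """Apply a_new = a - v to every point (one per row), with v = anchor."""
    # Broadcasting subtracts the same vector v from every row.
    return points - anchor


pts = np.array([[2.0, 3.0], [5.0, -1.0]])  # row 0: Tx, row 1: another point
out = translate_to_origin(pts, pts[0])
print(out)  # Tx is now at the origin
```

Shifting the whole map by any vector before the call leaves the result unchanged, because the anchor moves together with all other points.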
[0092] As illustrated in figure 6c, scaling, i.e. reducing or increasing the size according to a common scale factor, is a symmetrical operator for radio map generation if the absolute value of the channel impulse response is not required. A representative for the orbits based on the scaling operator may be defined in the following way. Firstly, one or more coordinate vector(s) b_t and units may be defined. Secondly, the distance between some specific points (e.g., Tx and Rx) is fixed, and all the points are scaled accordingly. For the second step, assume that the distance between the transmitter (Tx) and the receiver (Rx) is D and that in the representative case the distance between Tx and Rx is D_s; the input may then be mapped to the representative by
$$a_{new} = \frac{D_s}{D}\begin{bmatrix} x \\ y \end{bmatrix}.$$
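The scaling step multiplies every coordinate by D_s / D. In this NumPy sketch scaling is performed about the origin, which changes all pairwise distances by the same factor; the default D_s = 1 and the function name are assumptions for illustration.

```python
import numpy as np


def scale_to_reference(
    points: np.ndarray, tx: np.ndarray, rx: np.ndarray, d_s: float = 1.0
) -> np.ndarray:
    """Scale all points (one per row) by D_s / D so the Tx-Rx distance becomes d_s."""
    d = float(np.linalg.norm(rx - tx))
    if d == 0.0:
        raise ValueError("Tx and Rx coincide; the scale factor is undefined")
    return points * (d_s / d)


pts = np.array([[0.0, 0.0], [3.0, 4.0]])  # Tx and Rx, distance D = 5
out = scale_to_reference(pts, pts[0], pts[1], d_s=2.0)
print(out)  # Tx-Rx distance is now D_s = 2
```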
[0095] The table shown in figure 6d summarizes the above disclosure for the rotation, translation and scaling operators for a two-dimensional map.
[0096] Figure 6e shows a schematic diagram illustrating mapping to a representative for the example of radio map generation implemented by the data processing apparatus 110 according to an embodiment. The following operators for radio map generation, when the absolute value of the received signal strength is not necessary, are symmetrical: rotation, translation, reciprocity, flipping, and scaling. For this use-case the symmetrical operators may be divided by the data processing apparatus 110 (as described above in the context of figures 4a and 4b) into the following two lists:
[0097] l1 = {Flipping, Reciprocity}, and l2 = {Rotation, Translation, Scaling}
[0098] A representative for the orbits defined based on the symmetrical operators in the list l2 is obtained by fixing the Tx and the Rx at two specific points (e.g., Tx at $[-1, 0]^T$ and Rx at $[1, 0]^T$). In an embodiment, as illustrated in figure 6e, the data processing apparatus 110 may implement the following procedure for transforming a random map with random locations of the Tx and the Rx to a representative by using the symmetrical operators in l2:
[0099] 1. Translate the input map such that Tx is mapped to the origin.
[0100] 2. Rotate it such that the Rx is mapped to the positive part of the x-axis.
[0101] 3. Scale the map such that the distance between Tx and Rx becomes two.
[0102] 4. Translate the map by one unit in the negative direction of the x-axis.
In the data augmentation part, the data processing apparatus 110 may use the symmetrical operators in l1 to generate different copies of the training data.
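The four steps and the subsequent augmentation can be combined into the three-stage training pipeline of figures 4a and 4b. The following NumPy sketch assumes that the map is represented by the Tx and Rx positions plus an array of further 2D scene points (for example obstacle vertices), and it models flipping as a mirror reflection across the x-axis; the description does not fix the reflection axis, so this, like all function names here, is an assumption. Since the representative places Tx and Rx on the x-axis, reciprocity produces Tx at (1, 0) and Rx at (-1, 0), which the third stage maps back to a representative.

```python
import numpy as np


def canonicalize(tx, rx, scene):
    """Map a 2D radio-map input to the representative with Tx at (-1, 0) and Rx at (1, 0).

    tx, rx: arrays of shape (2,); scene: array of shape (n, 2) holding other map points.
    """
    tx, rx, scene = (np.asarray(a, dtype=float) for a in (tx, rx, scene))
    # Step 1: translate so that Tx is at the origin.
    rx, scene = rx - tx, scene - tx
    # Step 2: rotate so that Rx lies on the positive x-axis.
    theta = -np.arctan2(rx[1], rx[0])
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    rx, scene = rot @ rx, scene @ rot.T
    # Step 3: scale so that the Tx-Rx distance becomes two.
    dist = np.linalg.norm(rx)
    if dist == 0.0:
        raise ValueError("Tx and Rx coincide")
    rx, scene = rx * (2.0 / dist), scene * (2.0 / dist)
    # Step 4: translate one unit in the negative x direction (Tx was at the origin).
    shift = np.array([-1.0, 0.0])
    return shift.copy(), rx + shift, scene + shift


def flip(tx, rx, scene):
    """Hypothetical flipping operator: mirror the whole map across the x-axis."""
    m = np.array([1.0, -1.0])
    return tx * m, rx * m, scene * m


def reciprocity(tx, rx, scene):
    """Reciprocity operator: exchange the roles of transmitter and receiver."""
    return rx, tx, scene


def preprocess_for_training(tx, rx, scene):
    """Stage 1: l2 representative; stage 2: orbit under l1 = {flip, reciprocity};
    stage 3: map every augmented copy back to its l2 representative."""
    rep = canonicalize(tx, rx, scene)
    # flip and reciprocity commute, so these four maps form the group generated by l1.
    copies = [rep, flip(*rep), reciprocity(*rep), flip(*reciprocity(*rep))]
    return [canonicalize(*c) for c in copies]


tx, rx = np.array([3.0, 1.0]), np.array([3.0, 5.0])
scene = np.array([[5.0, 2.0]])
for t, r, sc in preprocess_for_training(tx, rx, scene):
    print(np.round(t, 6), np.round(r, 6), np.round(sc, 6))
```

For the example input, the four outputs share Tx at (-1, 0) and Rx at (1, 0); their scene points are the representative's scene mirrored across the x-axis, the y-axis, or both.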
[0105] Figure 7 shows a flow diagram illustrating a method 700 according to an embodiment for pre-processing one or more data elements for a data-driven task implemented, for instance, by the functional module 120 of figure 1. The method 700 comprises a step 701 of obtaining a set of symmetrical operators associated with the data-driven task, wherein the set of symmetrical operators defines a group of symmetrical operators and wherein for each of the one or more data elements an orbit of the data element with respect to the group of symmetrical operators is defined by a plurality of image elements of the data element by the set of symmetrical operators. Moreover, the method 700 comprises a step 703 of determining for each orbit one or more of the plurality of image elements of the element as the representative data elements of the orbit. The method 700 further comprises a step 705 of outputting for each of the one or more data elements the one or more representative data elements.
[0106] The method 700 can be performed by the data processing apparatus 110. Thus, further features of the method 700 result directly from the functionality of the data processing apparatus 110 as well as its different implementation forms and embodiments described above and below.
[0107] The person skilled in the art will understand that the "blocks" ("units") of the various figures (method and apparatus) represent or describe functionalities of embodiments of the present disclosure (rather than necessarily individual "units" in hardware or software) and thus describe equally functions or features of apparatus embodiments as well as method embodiments (unit = step).
[0108] In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described embodiment of an apparatus is merely exemplary. For example, the unit division is merely logical function division and may be another division in an actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented by using some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.
[0109] The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
[0110] In addition, functional units in the embodiments of the invention may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units are integrated into one unit.
Claims
1. A data processing apparatus (110) for pre-processing one or more data elements for a data-driven task, wherein the data processing apparatus (110) is configured to: obtain a set of symmetrical operators associated with the data-driven task, wherein the set of symmetrical operators defines a group of symmetrical operators and wherein for each of the one or more data elements an orbit of the data element with respect to the group of symmetrical operators is defined by a plurality of image elements of the data element by the set of symmetrical operators; determine for each orbit one or more of the plurality of image elements of the data element as representative data elements of the orbit; and output for each of the one or more data elements the one or more representative data elements.
2. The data processing apparatus (110) of claim 1, wherein, in an inference phase of the data-driven task, the data processing apparatus (110) is configured to output for each of the one or more data elements a single representative data element of the one or more representative data elements.
3. The data processing apparatus (110) of claim 1, wherein, in a training phase of the data-driven task, the data processing apparatus (110) is configured to output for each of the one or more data elements the one or more representative data elements.
4. The data processing apparatus (110) of any one of the preceding claims, wherein the data processing apparatus (110) is configured to generate the set of symmetrical operators associated with the data-driven task based on an input set of symmetrical operators, wherein the input set of symmetrical operators comprises the set of symmetrical operators and / or a further set of one or more further symmetrical operators.
5. The data processing apparatus (110) of claim 4, wherein the further set of one or more further symmetrical operators defines a further group of symmetrical operators and wherein for each of the one or more data elements the further group of symmetrical operators does not define a representative data element.
6. The data processing apparatus (110) of claim 4 or 5, wherein the data processing apparatus (110) is configured to generate the set of symmetrical operators associated with the data-driven task based on the input set of symmetrical operators in an iterative fashion.
7. The data processing apparatus (110) of claim 6, wherein, for generating the set of symmetrical operators associated with the data-driven task based on the input set of symmetrical operators in an iterative fashion, the data processing apparatus (110) is configured to iteratively determine for each symmetrical operator of the input set of symmetrical operators whether or not it is possible to determine for each orbit one or more of the plurality of image elements of the data element as one or more representative data elements of the orbit.
8. The data processing apparatus (110) of any one of claims 4 to 7, wherein, in an inference phase of the data-driven task, the data processing apparatus (110) is configured to output for each of the one or more data elements a single representative data element of the one or more representative data elements.
9. The data processing apparatus (110) of any one of claims 4 to 7, wherein in a first processing stage the data processing apparatus (110) is configured to determine for each of the one or more data elements a single representative data element based on the orbit of the data element defined by the group of symmetrical operators defined by the set of symmetrical operators and to output the single representative data element.
10. The data processing apparatus (110) of claim 9, wherein in a second processing stage the data processing apparatus (110) is configured for each of the one or more data elements to augment the single representative data element provided by the first processing stage into one or more augmented representative data elements by mapping the single representative data element provided by the first processing stage to one or more data elements of the orbit defined by the further group of further symmetrical operators.
11. The data processing apparatus (110) of claim 10, wherein in a third processing stage the data processing apparatus (110) is configured for each of the one or more data elements to determine for each of the one or more augmented representative data elements one or more further representative data elements based on a respective orbit defined by the group of symmetrical operators and to output the one or more further representative data elements.
12. A method (700) for pre-processing one or more data elements for a data-driven task, wherein the method (700) comprises: obtaining (701) a set of symmetrical operators associated with the data-driven task, wherein the set of symmetrical operators defines a group of symmetrical operators and wherein for each of the one or more data elements an orbit of the data element with respect to the group of symmetrical operators is defined by a plurality of image elements of the data element by the set of symmetrical operators; determining (703) for each orbit one or more of the plurality of image elements of the element as the representative data elements of the orbit; and outputting (705) for each of the one or more data elements the one or more representative data elements.
13. The method (700) of claim 12, wherein, in an inference phase of the data-driven task, the method (700) comprises outputting for each of the one or more data elements a single representative data element of the one or more representative data elements.
14. The method (700) of claim 12, wherein, in a training phase of the data-driven task, the method (700) comprises outputting for each of the one or more data elements the one or more representative data elements.
15. The method (700) of any one of claims 12 to 14, wherein the method (700) further comprises generating the set of symmetrical operators associated with the data-driven task based on an input set of symmetrical operators associated with the data-driven task, wherein the input set of symmetrical operators comprises the set of symmetrical operators and / or a further set of one or more further symmetrical operators.
16. The method (700) of claim 15, wherein the further set of symmetrical operators defines a further group of further symmetrical operators and wherein for each of the one or more data elements the further group of further symmetrical operators does not define a representative data element.
17. The method (700) of claim 15 or 16, wherein the method (700) comprises generating the set of symmetrical operators associated with the data-driven task based on the input set of symmetrical operators in an iterative fashion.
18. The method (700) of claim 17, wherein generating the set of symmetrical operators associated with the data-driven task based on the input set of symmetrical operators in an iterative fashion comprises iteratively determining for each symmetrical operator of the input set of symmetrical operators whether or not it is possible to determine for each orbit one or more of the plurality of image elements of the data element as representative data elements of the orbit.
19. The method (700) of any one of claims 15 to 18, wherein, in an inference phase of the data-driven task, the method (700) comprises outputting for each of the one or more data elements a single representative data element of the one or more representative data elements.
20. The method (700) of any one of claims 15 to 19, wherein in a first processing stage the method (700) comprises determining for each of the one or more data elements a single representative data element based on the orbit of the data element defined by the group of symmetrical operators defined by the set of symmetrical operators and outputting the single representative data element.
21. The method (700) of claim 20, wherein in a second processing stage the method (700) comprises for each of the one or more data elements augmenting the single representative data element provided by the first processing stage into one or more augmented representative data elements by mapping the single representative data element provided by the first processing stage to one or more data elements of the orbit defined by the further group of further symmetrical operators.
22. The method (700) of claim 21, wherein in a third processing stage the method (700) comprises for each of the one or more data elements determining for each of the one or more augmented representative data elements one or more further representative data elements based on a respective orbit defined by the group of symmetrical operators and outputting the one or more further representative data elements.
23. A computer program product comprising a computer-readable storage medium for storing program code which causes a computer or a processor to perform the method (700) of any one of claims 12 to 22 when the program code is executed by the computer or the processor.