Data processing method and related device
A data processing technology for attribute data, applied in the field of artificial intelligence, which addresses the inability to guarantee that sample data distributions are consistent and achieves a balanced data distribution.
Pending Publication Date: 2022-01-28
PING AN TECH (SHENZHEN) CO LTD
AI-Extracted Technical Summary
Problems solved by technology
However, the above method cannot guarantee that the sample data distr...
Method used
[0089] In the embodiment of the present application, when the attribute data distribution characteristics of the first sample group and the second sample group do not meet the equilibrium condition, the difficulty score of each kind of attribute data in the first sample group and the second sample group is determined, the movement influence score of each sample in the two groups is determined based on the difficulty scores, the K samples with the lowest movement influence scores in the two sample groups are exchanged to obtain two updated sample groups, and it is again judged whether the attribute data distribution characteristics of the two updated sample groups meet the equilibrium condition; if not,...
Abstract
The invention relates to the technical field of artificial intelligence and discloses a data processing method and a related device. The method comprises: obtaining a first sample group and a second sample group; when the attribute data distribution characteristics of the two sample groups do not meet an equilibrium condition, determining the difficulty score of each kind of attribute data in the two sample groups; determining a movement influence score for each sample in the two sample groups based on the difficulty scores; exchanging the K samples with the lowest movement influence scores in the two sample groups to obtain two updated sample groups; and continuing to judge whether the attribute data distribution characteristics of the two updated sample groups meet the equilibrium condition and, if not, continuing to update the two sample groups until the equilibrium condition is met, then outputting the two updated sample groups that meet it. The invention reduces the inconsistency between the data distributions of the two sample groups as far as possible, so that their data distributions are balanced.
Application Domain
Character and pattern recognition; Resources
Technology Topic
Sample group; Data processing
Example Embodiment
[0056] The present application will be described in detail below with reference to the accompanying drawings.
[0057] The terms used in the following embodiments are intended only to describe particular embodiments and are not intended to limit the present application. As used in the specification and the appended claims of this application, the singular expressions "a", "an", "the", "said", "the above" and "this" are intended to include plural expressions as well, unless the context clearly indicates otherwise.
[0058] In the present application, "at least one (item)" means one or more, "multiple" means two or more, and "at least two (items)" means two, three, or more than three. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may represent three cases: only A exists, only B exists, and both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following (items)" or a similar expression means any combination of these items. For example, at least one of A, B, or C may represent: A, B, C, "A and B", "A and C", "B and C", or "A, B and C".
[0059] The embodiments of the present application can acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
[0060] Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
[0061] The embodiments of the present application provide a data processing method and a related apparatus, which are described further below to explain the disclosure of the present application more clearly.
[0062] Referring to Figure 1, Figure 1 is a schematic flow diagram of a data processing method provided by an embodiment of the present application. As shown in Figure 1, the method may include steps 110 to 140.
[0063] Step 110: acquire a first sample group and a second sample group, where each group includes a plurality of samples and each sample includes multiple kinds of attribute data.
[0064] In one embodiment, the first sample group and the second sample group are acquired as follows:
[0065] First, a sample set is obtained and sampled with a preset sampling method, such as simple random sampling or stratified sampling, to obtain a plurality of samples. The sampled samples are then divided into two groups to obtain the first sample group and the second sample group. The data in the sample set is structured, that is, each sample in the sample set has multiple attributes. For example, in a marketing strategy test scenario, one sample may represent the information of one user, and its attributes may include the user's gender, age, and insurance period. Because both groups are drawn from the sample set, each group contains multiple samples from the sample set, and every sample in either group includes multiple kinds of attribute data. For example, the first sample group may be {user 1 (A attribute data 1, B attribute data 1), user 2 (A attribute data 2, B attribute data 2)}, and the second sample group may be {user 3 (A attribute data 3, B attribute data 3), user 4 (A attribute data 4, B attribute data 4)}. It will be appreciated that if each sample in a sample group corresponds to one row of data, each kind of attribute data corresponds to one column of data in the group; conversely, if each sample corresponds to one column, each kind of attribute data corresponds to one row.
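The acquisition step can be sketched as follows. This is a minimal illustration, not the claimed implementation: the helper name `sample_and_split`, the dictionary representation of a sample, and the 50/50 split are assumptions, and only simple random sampling is shown (stratified sampling would draw within each stratum instead).

```python
import random

# Hypothetical helper: simple random sampling followed by a 50/50 split.
# Stratified sampling (sampling within each stratum) could replace rng.sample.
def sample_and_split(sample_set, n, seed=None):
    """Draw n samples without replacement, then split them into two groups."""
    rng = random.Random(seed)
    drawn = rng.sample(sample_set, n)   # simple random sampling
    half = n // 2
    return drawn[:half], drawn[half:]   # first group, second group

# Toy sample set: each sample is one user with several attributes.
users = [{"id": i, "gender": i % 2, "age": 20 + i % 40} for i in range(100)]
group1, group2 = sample_and_split(users, 40, seed=0)
print(len(group1), len(group2))  # 20 20
```

Because the two groups are taken from one shuffled draw, they are disjoint by construction.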
[0066] Step 120: determine whether the attribute data distribution characteristics of the first sample group and those of the second sample group satisfy an equilibrium condition; if they do not, determine a difficulty score for each kind of attribute data in the first sample group and for each kind of attribute data in the second sample group, where the difficulty score is related to the degree of dispersion of the attribute data.
[0067] In one embodiment, if the distributions of certain attribute data in the first sample group and the second sample group differ greatly, the data distributions of the two sample groups can be considered unbalanced, and the accuracy and reliability of the results of a controlled experiment using the two groups cannot be guaranteed. The two sample groups therefore need balanced data distributions, that is, the distribution differences of their attribute data must meet the experimental requirements. Specifically, determining whether the attribute data distribution characteristics of the first sample group and the second sample group satisfy the equilibrium condition may include the following steps:
[0068] First, the distribution characteristics of each kind of attribute data in the first sample group and of each kind of attribute data in the second sample group are determined. In this embodiment, the distribution characteristic of one kind of attribute data refers to the value distribution of all attribute data representing the same attribute in the sample group. For example, suppose the first sample group includes 100 samples and gender is one kind of attribute data. If 30 samples are female (represented by the value 1) and 70 samples are male (represented by the value 0), the distribution characteristic of this attribute data in the first sample group may be: 1, 30%; 0, 70%. That is, the attribute takes two values, 1 and 0, whose shares are 30% and 70% respectively.
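A distribution characteristic as described here is simply the share of each value of one attribute within a group. A minimal sketch, assuming each sample is a dictionary keyed by attribute name (the function name `value_distribution` is hypothetical), reproducing the gender example:

```python
from collections import Counter

def value_distribution(group, attr):
    """Return {value: share} for one attribute over a sample group."""
    counts = Counter(sample[attr] for sample in group)
    total = len(group)
    return {value: count / total for value, count in counts.items()}

# Gender example: 30 samples with value 1, 70 samples with value 0.
group1 = [{"gender": 1}] * 30 + [{"gender": 0}] * 70
print(value_distribution(group1, "gender"))  # {1: 0.3, 0: 0.7}
```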
[0069] Next, if the distribution characteristics of the attribute data representing the same attribute in the first sample group and the second sample group satisfy a difference condition, the attribute represented by that attribute data is determined as a target attribute. The number of target attributes is then obtained, and the ratio of the number of target attributes to the number of kinds of attribute data included in the first sample group is calculated. In this embodiment, the difference condition can be set according to the controlled experiment: if the experiment is strict and requires the two groups' data distributions to be as consistent as possible, the difference condition is set relatively strictly. For example, the difference condition can be that the share of any value of the attribute data differs by more than 2% between the two groups. If the distributions of a certain kind of attribute data in the first and second sample groups are P, 20%; Q, 30%; R, 50% and P, 19%; Q, 32%; R, 49%, the differences between the corresponding value shares are 1%, -2%, and 1%; none exceeds 2%, so the distribution characteristics of this attribute data in the two groups do not satisfy the difference condition, and the distributions of this attribute data in the two groups can be considered consistent. If instead the distributions are P, 0%; Q, 44%; R, 56% and P, 3%; Q, 43%; R, 54%, the differences are -3%, 1%, and 2%; because the difference for P exceeds 2%, the distribution characteristics satisfy the difference condition, the distributions of this attribute data in the two groups are considered inconsistent, and the attribute represented by this attribute data is determined as a target attribute.
The ratio of the number of target attributes to the number of kinds of attribute data included in the first sample group indicates how inconsistent the attribute data distributions of the two sample groups are. This ratio is compared with a preset threshold, and whether the attribute data distribution characteristics of the first sample group and the second sample group satisfy the equilibrium condition is determined from the comparison result. The preset threshold can be set according to actual needs.
[0070] Specifically, when the ratio is less than the preset threshold, it is determined that the attribute data distribution characteristics of the first sample group and those of the second sample group satisfy the equilibrium condition; when the ratio is greater than the preset threshold, it is determined that they do not satisfy the equilibrium condition. When the ratio equals the preset threshold, the result may be determined as either satisfying or not satisfying the equilibrium condition, depending on how strict the experiment is.
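The equilibrium check of paragraphs [0069] and [0070] can be sketched as below. This is an illustrative sketch only: the function names, the use of integer percentage points (so that the 2% comparison is exact rather than subject to floating-point rounding), the threshold value 0.1, and the `tie_is_balanced` flag for the "ratio equals threshold" case are all assumptions.

```python
# Shares are in integer percentage points so the 2% comparison is exact.
def is_target_attribute(dist1, dist2, max_diff=2):
    """Difference condition: some value's share differs by more than max_diff."""
    values = set(dist1) | set(dist2)
    return any(abs(dist1.get(v, 0) - dist2.get(v, 0)) > max_diff for v in values)

def satisfies_equilibrium(dists1, dists2, threshold, max_diff=2, tie_is_balanced=False):
    """dists1/dists2 map attribute -> {value: share in %} for each group."""
    targets = [a for a in dists1 if is_target_attribute(dists1[a], dists2[a], max_diff)]
    ratio = len(targets) / len(dists1)   # share of attributes that are target attributes
    if ratio == threshold:
        return tie_is_balanced          # tie handling depends on experiment strictness
    return ratio < threshold

# The two worked examples from the text, as attributes "A" and "B".
dists1 = {"A": {"P": 20, "Q": 30, "R": 50}, "B": {"P": 0, "Q": 44, "R": 56}}
dists2 = {"A": {"P": 19, "Q": 32, "R": 49}, "B": {"P": 3, "Q": 43, "R": 54}}
print(is_target_attribute(dists1["A"], dists2["A"]))  # False
print(is_target_attribute(dists1["B"], dists2["B"]))  # True
print(satisfies_equilibrium(dists1, dists2, threshold=0.1))  # False (ratio 0.5)
```

With one of two attributes flagged, the ratio is 0.5, so any threshold below 0.5 reports the groups as unbalanced.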
[0071] Thus, when it is determined that the attribute data distribution characteristics of the first sample group and those of the second sample group do not satisfy the equilibrium condition, the two sample groups are processed so that their data distributions become consistent. Specifically, the difficulty score of each kind of attribute data in the first sample group and in the second sample group can be determined; the difficulty score is related to the degree of dispersion of the attribute data. One embodiment of the process of determining the difficulty score is described below:
[0072] (1) For each kind of attribute data in the first sample group, determine a plurality of data features of that attribute data, and combine the data features of each kind of attribute data into a vector to obtain a plurality of vectors, where each kind of attribute data in the first sample group corresponds to one vector. Optionally, the plurality of data features include several of the following: the variance of the value distribution of the attribute data, the number of unique values of the attribute data, the saturation of the attribute data (that is, the ratio of the number of non-blank values to the total number of values), the range of the value distribution of the attribute data, the standard deviation of the value distribution of the attribute data, and the interquartile range of the value distribution of the attribute data. It will be appreciated that if each kind of attribute data in the first sample group corresponds to one column, these data features refer to the variance of the column's value distribution, the column's saturation, and so on; if each kind of attribute data corresponds to one row, they refer to the variance of the row's value distribution, the row's saturation, and so on.
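A per-attribute feature vector built from the listed features might look like the sketch below. It assumes numeric attribute values with `None` marking a blank, population (rather than sample) variance and standard deviation, and the default "exclusive" quantile method of Python's `statistics.quantiles`; the text does not specify these choices, and the function name is hypothetical.

```python
import statistics

def feature_vector(column):
    """Feature vector for one attribute column; None marks a blank value."""
    values = [v for v in column if v is not None]
    saturation = len(values) / len(column)          # share of non-blank values
    q1, _, q3 = statistics.quantiles(values, n=4)   # needs at least 2 values
    return [
        statistics.pvariance(values),   # variance of the value distribution
        len(set(values)),               # number of unique values
        saturation,
        max(values) - min(values),      # range
        statistics.pstdev(values),      # standard deviation
        q3 - q1,                        # interquartile range
    ]

print(feature_vector([1, 2, 3, 4, None]))
```

Each attribute (column) yields one such vector, and these vectors are the input to the clustering step.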
[0073] (2) Cluster the plurality of vectors to obtain one or more clusters. The clustering can be performed with the k-means clustering algorithm. A clustering algorithm is an unsupervised machine learning method that automatically groups similar samples into categories, and k-means is a common and effective clustering algorithm. Clustering the plurality of vectors with k-means may include the following steps 1) to 4):
[0074] 1) Select k of the plurality of vectors as the initial center points, denoted C_1, C_2, ..., C_k. All vectors have the same feature dimensions, and k is an integer greater than or equal to 1.
[0075] 2) Calculate the distance from each vector to each of the k center points, and assign the vector to the cluster of the nearest center point. Each vector thus enters one cluster.
[0076] 3) For each cluster, calculate the mean of the coordinates of all vectors in the cluster to obtain new center points C_1, C_2, ..., C_k.
[0077] 4) Repeat steps 2) and 3) until the center points converge or the maximum number of iterations is reached, obtaining k clusters and the vectors contained in each cluster.
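Steps 1) to 4) correspond to standard k-means. A dependency-free sketch, assuming Euclidean distance and random initial centers (neither choice is fixed by the text; the function name is hypothetical):

```python
import math
import random

def kmeans(vectors, k, max_iter=100, seed=0):
    """Plain k-means over equal-length numeric vectors; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]   # 1) k initial center points
    labels = []
    for _ in range(max_iter):
        # 2) assign every vector to the cluster of its nearest center (Euclidean)
        labels = [min(range(k), key=lambda c: math.dist(v, centers[c])) for v in vectors]
        # 3) new center = coordinate-wise mean of the vectors in each cluster
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])            # keep an empty cluster's center
        # 4) stop when the centers no longer move (or max_iter is reached)
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

labels, centers = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
```

Since the features listed in [0072] live on very different scales (a count of unique values versus a saturation ratio), standardizing each feature before distance-based clustering is usually advisable; the text does not say whether this is done. A library implementation such as scikit-learn's `KMeans` could replace this sketch.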
[0078] (3) Determine the average degree of dispersion of each of the one or more clusters, and determine, based on these average degrees of dispersion, the difficulty score of each kind of attribute data in the first sample group, where the difficulty score of a kind of attribute data is positively correlated with the average degree of dispersion of the cluster to which its vector belongs. In one embodiment, the average degree of dispersion of a cluster is related to the degree of dispersion of the attribute data corresponding to the vectors it contains. Therefore, for each of the one or more clusters, the attribute data corresponding to the vectors in the cluster are determined, and the average degree of dispersion of the cluster is determined based on the data features of those attribute data. For example, the data features of the attribute data corresponding to each vector in a cluster can be weighted and averaged to obtain the cluster's average degree of dispersion. The data features used may include measures of the dispersion of the value distribution (such as the range, interquartile range, and standard deviation), the number of unique values, the variance of the value distribution, and the like. It will be appreciated that the larger a cluster's average degree of dispersion, the harder it is to make the distribution of the attribute data corresponding to its vectors consistent with the distribution of the same attribute data in the other sample group; the difficulty score of each kind of attribute data can therefore be determined from the average degree of dispersion of its cluster.
Specifically, the one or more clusters of the first sample group can be sorted by average degree of dispersion, and the difficulty score of each kind of attribute data in the first sample group is determined from the sorting result. The difficulty score indicates, to a certain extent, how difficult it is to balance the distribution of that attribute data between the two sample groups: the larger the difficulty score, the harder it is to balance the two groups' data distributions.
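One way to turn cluster dispersion into difficulty scores is sketched below. The text says only that clusters are sorted by average dispersion and scores follow from the ranking; the per-feature weights, the use of rank positions (1, 2, ...) as scores, and all names here are hypothetical choices made for illustration.

```python
def difficulty_scores(feature_vectors, labels, weights):
    """Difficulty score per attribute, from its cluster's average dispersion.

    feature_vectors: {attr: [dispersion features...]}
    labels: {attr: cluster id}, e.g. from k-means
    weights: hypothetical per-feature weights for the weighted average
    """
    def dispersion(vec):
        # weighted average of one attribute's dispersion features
        return sum(w * f for w, f in zip(weights, vec)) / sum(weights)

    per_cluster = {}
    for attr, cid in labels.items():
        per_cluster.setdefault(cid, []).append(dispersion(feature_vectors[attr]))
    avg = {cid: sum(ds) / len(ds) for cid, ds in per_cluster.items()}

    # Sort clusters by average dispersion: least dispersed -> score 1, and so on up.
    rank = {cid: r for r, cid in enumerate(sorted(avg, key=avg.get), start=1)}
    return {attr: rank[cid] for attr, cid in labels.items()}

features = {"gender": [0.25, 2], "region": [1.0, 5], "age": [120.0, 40]}
clusters = {"gender": 0, "region": 0, "age": 1}
print(difficulty_scores(features, clusters, weights=[1, 1]))
# {'gender': 1, 'region': 1, 'age': 2}
```

Attributes in the same cluster share a score, and the score rises with the cluster's dispersion, matching the positive correlation stated in the text.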
[0079] The difficulty score of each kind of attribute data in the second sample group is obtained by the same process as for the first sample group.
[0080] Step 130: determine a movement influence score for each sample in the first sample group based on the difficulty score of each kind of attribute data in the first sample group, and determine a movement influence score for each sample in the second sample group based on the difficulty score of each kind of attribute data in the second sample group, where the movement influence score of a sample represents the degree of overall influence that moving the sample has on the attribute data distribution characteristics of the sample group to which it belongs.
[0081] In one embodiment, after the difficulty score of each kind of attribute data in each sample group is obtained, the movement influence score of each sample in the first sample group and in the second sample group can be further determined based on the difficulty scores, and the two sample groups are processed based on the movement influence scores so that the attribute data in the two groups are distributed as uniformly as possible. In this embodiment, determining the movement influence score of each sample in the first sample group includes the following: for any sample in the first sample group, obtain the distribution characteristic value of each kind of attribute data in the sample, and determine the sample's movement influence score based on the difficulty score of each kind of attribute data in the first sample group and the distribution characteristic value of each kind of attribute data in the sample. The distribution characteristic value of a kind of attribute data in a sample may include the ratio of the number of times the sample's specific value of that attribute is repeated among that attribute's data in the first sample group to the total number of samples in the group, and the number of distinct values of that kind of attribute data. When each sample of the first sample group corresponds to one row and each kind of attribute data corresponds to one column, denote the difficulty score of column j as A_j (j being the column index of the first sample group) and the movement influence score of row i as P_i (i being the row index of the first sample group); the relationship between them can be expressed by equation (1):
[0082]
[0083] where P_i is the movement influence score of the i-th row, A_j is the difficulty score of the j-th column, K_j is the number of distinct values in the j-th column, and S_{i,j} is the ratio of the number of repetitions of the value in the i-th row, j-th column to the total number of samples in the j-th column.
[0084] Specifically, for each row of the first sample group, the movement influence score of the row is obtained by summing, over the columns, the difficulty score of each column combined with the distribution characteristic values of the row's data in that column. In general, the higher the ratio S_{i,j} of the number of occurrences of a row's value in a column to the total number of data in that column, and the fewer the distinct values in that column, the lower the row's movement influence score should be. The movement influence score thus characterizes how strongly moving that row (sample) affects the overall attribute data distribution characteristics of the first sample group.
[0085] It will be appreciated that the movement influence score of each sample in the second sample group is determined from the difficulty scores of the attribute data in the second sample group in the same way as for the first sample group. That is, the two sample groups are processed by similar steps to obtain each sample's movement influence score, and the details are not repeated here.
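The exact form of equation (1) is not reproduced in this text, so the sketch below uses a hypothetical per-cell term A_j * K_j * (1 - S_{i,j}). It was chosen only because it sums over columns and falls as S_{i,j} rises and as K_j falls, as described in [0084]; it should not be read as the patented formula. Samples are assumed to be dictionaries keyed by attribute name.

```python
from collections import Counter

def movement_scores(group, difficulty):
    """Movement influence score P_i for every sample (row) in one group.

    difficulty: {attr: A_j}. The per-cell term A_j * K_j * (1 - S_ij) is a
    HYPOTHETICAL stand-in for equation (1), chosen only so that P_i falls as
    S_ij rises and as K_j falls, matching the description.
    """
    n = len(group)
    counts = {a: Counter(s[a] for s in group) for a in difficulty}
    scores = []
    for sample in group:
        p = 0.0
        for a, a_j in difficulty.items():
            k_j = len(counts[a])                 # distinct values in column j
            s_ij = counts[a][sample[a]] / n      # share of this row's value in column j
            p += a_j * k_j * (1 - s_ij)
        scores.append(p)
    return scores

group = [{"gender": 1}, {"gender": 1}, {"gender": 1}, {"gender": 0}]
print(movement_scores(group, {"gender": 2}))  # [1.0, 1.0, 1.0, 3.0]
```

Under this stand-in, rows carrying a group's majority value get the lowest scores, so they are the first candidates for exchange.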
[0086] Step 140: exchange the K samples with the lowest movement influence scores in the first sample group with the K samples with the lowest movement influence scores in the second sample group to obtain an updated first sample group and an updated second sample group, K being a positive integer; then determine whether the attribute data distribution characteristics of the updated first sample group and those of the updated second sample group satisfy the equilibrium condition, and continue updating until they do, whereupon the updated first sample group and updated second sample group satisfying the equilibrium condition are output.
[0087] In one embodiment, after the movement influence score of each sample in the first sample group and in the second sample group is determined, the K samples with the lowest movement influence scores in the first sample group are exchanged with the K samples with the lowest movement influence scores in the second sample group, yielding the updated first sample group and the updated second sample group. Step 120 is then performed again to determine whether the attribute data distributions of the two updated sample groups satisfy the equilibrium condition. If not, the movement influence score of each sample in the two updated groups is determined again, the K samples with the lowest scores in each group are exchanged to complete another update, and it is determined whether an iteration stop condition is met; the iteration stop condition includes the equilibrium condition described above. If the attribute data distributions of the two updated sample groups (that is, the groups after the sample exchange) satisfy the equilibrium condition of step 120, the iteration stop condition is determined to be met, the iteration stops, and the updated first and second sample groups satisfying the equilibrium condition are output and can serve as the two sample groups of the controlled experiment. Optionally, the iteration stop condition may also include other conditions, for example, whether the number of iterations reaches an iteration threshold, which can be set according to actual conditions. If the number of iterations reaches the threshold, the iteration can be stopped even if the equilibrium condition is not satisfied, and the most recently updated first and second sample groups are output.
[0088] It can be understood that exchanging the K samples with the lowest movement influence scores in each of the two sample groups amounts to identifying, in each group, the samples whose movement has the least influence on the group's attribute data distribution characteristics, and swapping them. Such exchanges reduce the difference between the distributions of the two sample groups as far as possible, so that the two distributions become as equal as possible. K is a positive integer. K can be 1, that is, each update of the first and second sample groups exchanges only the single sample with the lowest movement influence score in each group. This improves the precision of each update: the most suitable sample is found for every exchange, which avoids exchanging too many samples at once and causing the distribution deviation of the updated groups to fail to converge, thereby improving the accuracy of data processing. K can also be an integer greater than 1, with its actual value determined according to the number of samples in the first and second sample groups, that is, each update exchanges multiple lowest-scoring samples between the two groups. For large sample groups, exchanging several samples at once increases how much each update changes a group's data distribution characteristics, so that the inconsistency between the two groups' data distributions is reduced faster and data processing efficiency improves.
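The iterative exchange of steps 120 to 140 reduces to the loop below. The scoring and balance functions are passed in as parameters, so any implementation of the difficulty and movement scores can be plugged in; the function names, the toy scoring rule, and the 0.1 tolerance in the usage example are hypothetical. `max_iter` plays the role of the optional iteration threshold, which also guards against the same samples being swapped back and forth.

```python
def balance_groups(g1, g2, score_fn, balanced, k=1, max_iter=100):
    """Swap the k lowest-scoring samples of each group until balanced or max_iter."""
    g1, g2 = list(g1), list(g2)
    for _ in range(max_iter):
        if balanced(g1, g2):                      # equilibrium condition (step 120)
            break
        s1, s2 = score_fn(g1), score_fn(g2)       # movement influence scores (step 130)
        low1 = sorted(range(len(g1)), key=s1.__getitem__)[:k]
        low2 = sorted(range(len(g2)), key=s2.__getitem__)[:k]
        for i, j in zip(low1, low2):              # exchange (step 140)
            g1[i], g2[j] = g2[j], g1[i]
    return g1, g2

def share_of(group, attr, value):
    return sum(s[attr] == value for s in group) / len(group)

# Toy usage (hypothetical): one binary attribute "g"; score = 1 - share of the
# row's value, so rows carrying the group's majority value score lowest.
def toy_scores(group):
    return [1 - share_of(group, "g", s["g"]) for s in group]

def toy_balanced(a, b):
    return abs(share_of(a, "g", 1) - share_of(b, "g", 1)) <= 0.1

g1 = [{"g": 1} for _ in range(4)]
g2 = [{"g": 0} for _ in range(4)]
new1, new2 = balance_groups(g1, g2, toy_scores, toy_balanced, k=1)
print([s["g"] for s in new1], [s["g"] for s in new2])  # [0, 0, 1, 1] [1, 1, 0, 0]
```

With K = 1, two exchanges bring both toy groups to a 50/50 split; a larger K would move more samples per pass at the risk of overshooting.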
[0089] In the embodiments of the present application, when the attribute data distribution characteristics of the first sample group and the second sample group do not satisfy the equilibrium condition, the difficulty score of each kind of attribute data in the two sample groups is determined, and the movement influence score of each sample in the two groups is determined based on the difficulty scores. The K samples with the lowest movement influence scores in the two sample groups are exchanged to obtain two updated sample groups, and it is determined again whether the attribute data distribution characteristics of the updated groups satisfy the equilibrium condition; if not, the two groups continue to be updated until their attribute data distribution characteristics satisfy it. By processing two sample groups that do not satisfy the equilibrium condition in this way, the samples whose movement least affects each group's distribution (that is, the samples with the lowest movement influence scores) are found and exchanged until the equilibrium condition is met, which reduces the inconsistency of the two groups' data distributions as far as possible and balances them.
[0090] Referring to Figure 2, Figure 2 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present application. As shown in Figure 2, the data processing apparatus may include:
[0091] an acquisition unit 10, configured to acquire a first sample group and a second sample group, where each group includes a plurality of samples and each sample includes multiple kinds of attribute data;
[0092] a first determination unit 11, configured to determine whether the attribute data distribution characteristics of the first sample group and those of the second sample group satisfy an equilibrium condition and, if they do not, determine the difficulty score of each kind of attribute data in the first sample group and in the second sample group, the difficulty score being related to the degree of dispersion of the attribute data;
[0093] a second determination unit 12, configured to determine the movement influence score of each sample in the first sample group based on the difficulty scores of the attribute data in the first sample group, and the movement influence score of each sample in the second sample group based on the difficulty scores of the attribute data in the second sample group, the movement influence score representing the degree of overall influence that moving the sample has on the attribute data distribution characteristics of the sample group to which it belongs;
[0094] an update unit 13, configured to exchange the K samples with the lowest movement influence scores in the first sample group with the K samples with the lowest movement influence scores in the second sample group to obtain an updated first sample group and an updated second sample group, K being a positive integer; and to determine whether the attribute data distribution characteristics of the updated first and second sample groups satisfy the equilibrium condition, until they do, and then output the updated first and second sample groups satisfying the equilibrium condition.
[0095] In a possible design, the first determination unit 11 is specifically configured to:
[0096] determine the distribution characteristics of each kind of attribute data in the first sample group and of each kind of attribute data in the second sample group;
[0097] if the distribution characteristics of the attribute data representing the same attribute in the first sample group and the second sample group satisfy the difference condition, determine the attribute represented by that attribute data as a target attribute, obtain the number of target attributes, and calculate the ratio of the number of target attributes to the number of kinds of attribute data included in the first sample group;
[0098] when the ratio is less than the preset threshold, determine that the attribute data distribution characteristics of the first sample group and those of the second sample group satisfy the equilibrium condition;
[0099] when the ratio is greater than the preset threshold, determine that the attribute data distribution characteristics of the first sample group and those of the second sample group do not satisfy the equilibrium condition.
[0100] In a possible design, the first determining unit 11 is specifically used:
[0101] For each attribute data in the first sample group, a plurality of data features of each of the attribute data are determined, and a plurality of data features of each attribute data are combined with a vector, and multiple vectors are obtained. Among them, one of the first sample groups correspond to one vector;
[0102] Clustering the plurality of vectors, obtaining one or more clusterings;
[0103] Determine the degree of average dispersion of each of the clusters in the one or more cluster clusters, and determine the difficulty of each of the first sample groups based on the degree of average discretization of each of the clusters. Easy score, the difficulty score is positively correlated with the degree of aggregation of the cluster of the cluster of the vectors corresponding to the attribute data.
[0104] In a possible design, the plurality of data features include several of the following: the variance of the attribute data, the number of distinct (non-repeated) values of the attribute data, the saturation of the attribute data, the range of the attribute data's value distribution, the standard deviation of the attribute data's value distribution, and the interquartile range of the attribute data's value distribution.
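A per-attribute feature vector built from the features listed in [0104] could look like the sketch below. Several definitions are assumptions, because the text only names the features: "saturation" is taken to be the share of non-missing values (missing values are `None`), "range" is max minus min, and the interquartile range uses `statistics.quantiles` with its default method. The name `feature_vector` is hypothetical.

```python
import statistics
from typing import List, Optional


def feature_vector(column: List[Optional[float]]) -> List[float]:
    """Build a feature vector for one attribute column (None marks a missing value)."""
    values = [v for v in column if v is not None]
    q1, _, q3 = statistics.quantiles(values, n=4)  # older Pythons need >= 2 values
    return [
        statistics.pvariance(values),   # variance
        float(len(set(values))),        # number of distinct values
        len(values) / len(column),      # saturation (assumed: non-missing share)
        max(values) - min(values),      # range of the value distribution
        statistics.pstdev(values),      # standard deviation
        q3 - q1,                        # interquartile range
    ]
```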
[0105] In a possible design, the first determining unit 11 is specifically configured to:
[0106] For each of the one or more clusters, determine the attribute data corresponding to the vectors included in the cluster, and determine the average degree of dispersion of the cluster based on the plurality of data features of that attribute data.
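One way to turn clusters into difficulty scores is sketched below, under assumptions: the average dispersion of a cluster is taken as the mean Euclidean distance of its member feature vectors from the cluster centroid, and each attribute's difficulty score is simply that value for the cluster it belongs to, which keeps the score positively correlated with the cluster's dispersion as [0103] requires. The labels may come from any clustering step; `average_dispersion` and `difficulty_scores` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence


def average_dispersion(members: List[Sequence[float]]) -> float:
    """Mean Euclidean distance of member vectors from their centroid (assumed measure)."""
    centroid = [sum(col) / len(members) for col in zip(*members)]
    return sum(math.dist(v, centroid) for v in members) / len(members)


def difficulty_scores(attrs: List[str], vectors: List[Sequence[float]],
                      labels: List[int]) -> Dict[str, float]:
    """Give every attribute the average dispersion of the cluster it was assigned to."""
    dispersion = {}
    for lab in set(labels):
        members = [v for v, member_label in zip(vectors, labels) if member_label == lab]
        dispersion[lab] = average_dispersion(members)
    return {a: dispersion[lab] for a, lab in zip(attrs, labels)}
```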
[0107] In a possible design, the second determining unit 12 is specifically configured to:
[0108] For any sample in the first sample group, acquire the distribution feature value of each type of attribute data in the sample, and determine the movement influence score of the sample based on the difficulty score of each type of attribute data in the first sample group and the distribution feature value of each type of attribute data in the sample.
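[0108] combines the difficulty scores with a per-sample "distribution feature value" but defines neither the value nor the combination. The sketch below assumes the distribution feature value is the absolute z-score of the sample's value within its group, and that the movement influence score is the difficulty-weighted sum of those values, so a sample near its group's center scores low and is cheap to move. `movement_influence` is a hypothetical name.

```python
import statistics
from typing import Dict, List

Sample = Dict[str, float]  # assumption: attribute name -> numeric value


def movement_influence(group: List[Sample], difficulty: Dict[str, float]) -> List[float]:
    """Score each sample: difficulty-weighted sum of absolute z-scores (assumed form)."""
    stats = {}
    for attr in difficulty:
        col = [s[attr] for s in group]
        stats[attr] = (statistics.mean(col), statistics.pstdev(col))
    scores = []
    for s in group:
        score = 0.0
        for attr, weight in difficulty.items():
            mean, std = stats[attr]
            # A constant column gives no information, so its z-score is treated as 0.
            z = abs(s[attr] - mean) / std if std > 0 else 0.0
            score += weight * z
        scores.append(score)
    return scores
```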
[0109] In a possible design, the acquisition unit 10 is specifically configured to:
[0110] Obtain a sample set and sample it using a preset sampling method to obtain a plurality of sampled samples, where the preset sampling method includes simple random sampling or stratified sampling;
[0111] Divide the plurality of sampled samples into two groups to obtain the first sample group and the second sample group.
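The acquisition step of [0110] and [0111] can be sketched with the standard library. Assumptions: the sample set is a list of dicts, stratified sampling draws the same fraction from each stratum defined by a hypothetical key function `stratum`, and the two groups are formed by shuffling and splitting in half, since the text does not say how the split is made. `draw_samples` and `split_in_two` are hypothetical names; a fixed seed keeps results reproducible.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Sample = Dict[str, float]


def draw_samples(sample_set: List[Sample], fraction: float, seed: int = 0,
                 stratum: Optional[Callable[[Sample], Hashable]] = None) -> List[Sample]:
    """Simple random sampling, or stratified sampling when a stratum key is given."""
    rng = random.Random(seed)
    if stratum is None:
        return rng.sample(sample_set, round(len(sample_set) * fraction))
    strata: Dict[Hashable, List[Sample]] = defaultdict(list)
    for s in sample_set:
        strata[stratum(s)].append(s)
    drawn: List[Sample] = []
    for members in strata.values():
        # Draw the same fraction from every stratum (at least one sample each).
        drawn.extend(rng.sample(members, max(1, round(len(members) * fraction))))
    return drawn


def split_in_two(samples: List[Sample], seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle the sampled samples and split them into a first and a second group."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]
```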
[0112] For the specific description of the apparatus embodiment shown in FIG. 2, refer to the description of the method embodiment shown in FIG. 1; details are not repeated here.
[0113] Refer to FIG. 3, which is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in FIG. 3, the electronic device 1000 may include: at least one processor 1001 (such as a CPU), at least one communication interface 1003, a memory 1004, and at least one communication bus 1002. The communication bus 1002 is used to implement connection and communication between these components. The communication interface 1003 may include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1004 may be a high-speed RAM or a non-volatile memory, such as at least one disk memory. Optionally, the memory 1004 may also be at least one storage device located remotely from the processor 1001. As shown in FIG. 3, the memory 1004, as a computer storage medium, may include an operating system, a network communication module, and program instructions.
[0114] In the electronic device 1000 shown in FIG. 3, the processor 1001 may be configured to load the program instructions stored in the memory 1004 and specifically perform the following operations:
[0115] Obtain a first sample group and a second sample group, where the first sample group and the second sample group each include a plurality of samples, and each of the plurality of samples includes multiple types of attribute data;
[0116] Determine whether the attribute data distribution characteristics of the first sample group and those of the second sample group satisfy an equalization condition; if they do not, determine the difficulty score of each type of attribute data in the first sample group and the difficulty score of each type of attribute data in the second sample group, where the difficulty score is related to the degree of dispersion of the attribute data;
[0117] Determine the movement influence score of each sample in the first sample group based on the difficulty score of each type of attribute data in the first sample group, and determine the movement influence score of each sample in the second sample group based on the difficulty score of each type of attribute data in the second sample group, where the movement influence score indicates the degree to which the sample influences the overall attribute data distribution characteristics of the sample group to which it belongs;
[0118] Exchange the K samples with the lowest movement influence scores between the first sample group and the second sample group to obtain an updated first sample group and an updated second sample group, where K is a positive integer; determine whether the attribute data distribution characteristics of the updated first sample group and those of the updated second sample group satisfy the equalization condition, and continue updating the two sample groups until they do; then output the updated first sample group and the updated second sample group that satisfy the equalization condition.
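The overall loop of [0115] to [0118] can be sketched as below. To keep the sketch self-contained, the equalization check and the movement influence scoring are passed in as functions; `rebalance`, `balanced`, `influence`, `k` and `max_rounds` are hypothetical names. "Exchange the K samples with the lowest movement influence" is interpreted here as taking the K lowest-scoring samples from each group and swapping them pairwise. The iteration cap is an added safeguard, since the text gives no convergence guarantee.

```python
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")  # a sample; its representation is left open


def rebalance(g1: List[S], g2: List[S], k: int,
              balanced: Callable[[List[S], List[S]], bool],
              influence: Callable[[List[S]], List[float]],
              max_rounds: int = 100) -> Tuple[List[S], List[S]]:
    """Swap the k lowest-influence samples between groups until they are balanced."""
    g1, g2 = list(g1), list(g2)
    rounds = 0
    while not balanced(g1, g2):
        if rounds == max_rounds:
            raise RuntimeError("equalization condition not met within max_rounds")
        # Indices of the k samples in each group whose move disturbs it least.
        s1, s2 = influence(g1), influence(g2)
        low1 = sorted(range(len(g1)), key=lambda i: s1[i])[:k]
        low2 = sorted(range(len(g2)), key=lambda i: s2[i])[:k]
        # Exchange them pairwise so both groups keep their sizes.
        for i, j in zip(low1, low2):
            g1[i], g2[j] = g2[j], g1[i]
        rounds += 1
    return g1, g2
```

For example, two groups of plain numbers can be balanced by mean with `balanced=lambda a, b: abs(sum(a) / len(a) - sum(b) / len(b)) <= 1.0` and `influence=lambda g: [abs(x - sum(g) / len(g)) for x in g]`.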
[0119] It should be noted that, for the specific implementation process, refer to the description of the method embodiment shown in FIG. 1; details are not repeated here.
[0120] For the specific execution steps, refer to the description of the foregoing embodiments; details are not repeated here.
[0121] An embodiment of the present application further provides a computer storage medium that can store a plurality of instructions, the instructions being suitable for being loaded by a processor to execute the method steps of the embodiment shown in FIG. 1. For the specific execution process, refer to the description of FIG. 1; details are not repeated here.
[0122] A person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments may be implemented by a computer program instructing related hardware. The program may be stored in a computer-readable storage medium, and when executed, may include the processes of the foregoing method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
Similar technology patents
A system and method for accelerate parallel operation of an ARM processor
Owner:XIAN UNIV OF POSTS & TELECOMM