Method and device for identifying potential users of 5G terminal replacement
By using two classification models and a base classifier voting method, the problem of poor identification of potential users for 5G terminal replacement was solved, and high-accuracy identification was achieved under the condition of imbalanced positive and negative samples.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA MOBILE INFORMATION TECHNOLOGY CO LTD
- Filing Date
- 2021-05-27
- Publication Date
- 2026-06-19
Figure CN115409527B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of data mining technology, and in particular to a method and apparatus for identifying potential users of 5G terminal replacement.
Background Technology
[0002] With the rollout of 5G, consumers are beginning to encounter 5G devices and data plans. Supporting precise 5G marketing is crucial for its promotion, and identifying potential 5G users and recommending compatible devices is key to this success.
[0003] Based on this, operators analyze user behavior characteristics and consumption trends using basic user information, subscription data, terminal usage, internet preferences, devices in their social circles, terminal replacement time, DPI (Deep Packet Inspection), and digital content data. They then leverage data mining techniques to identify potential 5G terminal replacement users, providing a reference for 5G terminal marketing, meeting customer needs, and improving customer satisfaction. Currently, operators often rely on market-defined rules to screen users for 5G terminals, which can lead to user selection bias.
[0004] Existing technologies include using KNN (K-Nearest Neighbor) algorithms and clustering algorithms to undersample imbalanced data, and then using decision tree algorithms to obtain the final model result after balancing the samples. While undersampling can indeed solve the problem of imbalanced positive and negative samples, the number of 5G users is currently small. If negative samples are undersampled to balance the positive and negative samples, the overall sample size will be small, resulting in poor identification of potential 5G users.
Summary of the Invention
[0005] This invention provides a method and apparatus for identifying potential users of 5G terminal replacement, in order to solve the defect of poor identification effect of potential users of 5G terminal replacement in the prior art, and to improve the identification effect of potential users of 5G terminal replacement.
[0006] This invention provides a method for identifying potential users of 5G terminal replacement, comprising:
[0007] Input the target user's data into the first classification model and output the first 5G terminal replacement potential user category identifier of the target user; input the data into the second classification model and output the probability that the target user belongs to the communication terminal replacement potential user.
[0008] Calculate the distance between the target user's data and the mean center of the positive sample dataset A1 of potential 5G terminal replacement users, adjust the probability according to the distance, and obtain the second 5G terminal replacement potential user category identifier of the target user according to the adjusted probability;
[0009] Based on the first and second 5G terminal replacement potential user category identifiers of the target user, obtain the third 5G terminal replacement potential user category identifier of the target user;
[0010] The first classification model is trained based on the training set A11 in A1 and the training set B11 in the negative sample dataset of potential users of 5G terminal replacement B1, and the ratio of B11 to A11 is greater than a first threshold.
[0011] The second classification model is obtained by training based on the positive sample dataset of potential users of communication terminal replacement and the negative sample dataset of potential users of communication terminal replacement.
[0012] According to a method for identifying potential 5G terminal replacement users provided by the present invention, the step of adjusting the probability based on the distance and obtaining a second 5G terminal replacement potential user category identifier of the target user based on the adjusted probability includes:
[0013] The test set A12 in A1 and the test set B12 in B1 are used as the test dataset D for potential users of 5G terminal replacement.
[0014] Calculate the distance between the data of each user sample in D and the mean center;
[0015] The distance to the target user is standardized based on the maximum and minimum distances among all user samples.
[0016] Calculate the product of the standardized distance and the probability, compare the product with a second threshold, and determine the category identifier of the potential user for the second 5G terminal replacement based on the comparison result.
[0017] According to the present invention, a method for identifying potential users of 5G terminal replacement is provided, which standardizes the distance corresponding to the target user based on the maximum and minimum values of the distances corresponding to all user samples using the following formula:
[0018]
[0019] Where max is the maximum value, min is the minimum value, d is the distance corresponding to the target user, and d* is the standardized distance.
[0020] According to a method for identifying potential users of 5G terminal replacement provided by the present invention, before comparing the product with a second threshold, the method further includes:
[0021] Input the data of each user sample in D into the second classification model, and output the probability that each user sample belongs to a potential user of communication terminal replacement;
[0022] Based on the maximum and minimum values, the distance corresponding to each user sample is standardized, and the product between the standardized distance corresponding to each user sample and the probability corresponding to each user sample is calculated.
[0023] The user samples are sorted in descending order of the product. The second 5G terminal replacement potential user category identifier of the user samples within the top preset proportion is set to the first preset identifier, and that of the remaining user samples is set to the second preset identifier; there are multiple preset proportions.
[0024] Based on the second 5G terminal replacement potential user category identifier of the user sample under each preset ratio, calculate the first classification evaluation index of the user sample, and determine the second threshold according to the preset ratio corresponding to the highest first classification evaluation index.
[0025] According to a method for identifying potential 5G terminal replacement users provided by the present invention, obtaining the third 5G terminal replacement potential user category identifier of the target user based on the first 5G terminal replacement potential user category identifier and the second 5G terminal replacement potential user category identifier of the target user includes:
[0026] Add the product of the first 5G terminal replacement potential user category identifier of the target user multiplied by the first weight and the product of the second 5G terminal replacement potential user category identifier of the target user multiplied by the second weight to obtain the probability that each target user belongs to the 5G terminal replacement potential user.
[0027] The probability of each target user belonging to a potential 5G terminal replacement user is compared with a third threshold, and the third 5G terminal replacement potential user category identifier of the target user is obtained based on the comparison result.
[0028] According to a method for identifying potential 5G terminal replacement users provided by the present invention, before adding the product of the first 5G terminal replacement potential user category identifier of the target user multiplied by the first weight and the product of the second 5G terminal replacement potential user category identifier of the target user multiplied by the second weight to obtain the probability that each target user belongs to the 5G terminal replacement potential user category, the method further includes:
[0029] Input the data of each user sample in D into the first classification model, and output the first 5G terminal replacement potential user category identifier of each user sample;
[0030] Based on the first 5G terminal replacement potential user category identifier of all user samples, calculate the second classification evaluation index of the user samples;
[0031] The first weight and the second weight are calculated based on the second classification evaluation index and the highest first classification evaluation index of the user sample.
[0032] According to a method for identifying potential users of 5G terminal replacement provided by the present invention, the first classification model includes multiple base classifiers, each of which is trained on A11 together with a preset number of negative samples extracted from B11.
[0033] The step of inputting the target user's data into the first classification model and outputting the target user's first 5G terminal replacement potential user category identifier includes:
[0034] The target user's data is input into each base classifier, and the initial 5G terminal replacement potential user category identifier of the target user is output.
[0035] Voting is performed based on the initial 5G terminal replacement potential user category identifiers of the target user output by all base classifiers to obtain the first 5G terminal replacement potential user category identifier of the target user.
[0036] The present invention also provides a device for identifying potential users of 5G terminal replacement, comprising:
[0037] The classification module is used to input the target user's data into the first classification model and output the first 5G terminal replacement potential user category identifier of the target user, and input the data into the second classification model and output the probability that the target user belongs to the communication terminal replacement potential user.
[0038] The calculation module is used to calculate the distance between the data of the target user and the mean center of the positive sample dataset A1 of potential 5G terminal replacement users, adjust the probability according to the distance, and obtain the second 5G terminal replacement potential user category identifier of the target user according to the adjusted probability.
[0039] The identification module is used to obtain the third 5G terminal replacement potential user category identifier of the target user based on the first 5G terminal replacement potential user category identifier and the second 5G terminal replacement potential user category identifier of the target user.
[0040] The first classification model is trained based on the training set A11 in A1 and the training set B11 in the negative sample dataset of potential users of 5G terminal replacement B1, and the ratio of B11 to A11 is greater than a first threshold.
[0041] The second classification model is obtained by training based on the positive sample dataset of potential users of communication terminal replacement and the negative sample dataset of potential users of communication terminal replacement.
[0042] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of any of the above-described 5G terminal replacement potential user identification methods.
[0043] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the 5G terminal replacement potential user identification method as described above.
[0044] The present invention provides a method and apparatus for identifying potential users of 5G terminal replacement. Under conditions of imbalanced positive and negative samples, the target user's data is classified by two classification models, one trained on the balanced sample dataset of potential users of communication terminal replacement and one trained on the sample dataset of potential users of 5G terminal replacement. The classification result of the communication terminal replacement model is corrected based on the distance between the target user's data and the data of the 5G terminal replacement user samples. The two classification results are then combined to identify potential users of 5G terminal replacement, thus ensuring the accuracy of identification even under conditions of imbalanced positive and negative samples.
Attached Figure Description
[0045] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0046] Figure 1 is the first flowchart of the 5G terminal replacement potential user identification method provided by the present invention;
[0047] Figure 2 is the second flowchart of the 5G terminal replacement potential user identification method provided by the present invention;
[0048] Figure 3 is a schematic diagram of the structure of the 5G terminal replacement potential user identification device provided by the present invention;
[0049] Figure 4 is a schematic diagram of the structure of the electronic device provided by the present invention.
Detailed Implementation
[0050] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.
[0051] The method for identifying potential users of 5G terminal replacement provided by the present invention is described below with reference to Figure 1. The method comprises: step 101, inputting the data of a target user into a first classification model, outputting a first 5G terminal replacement potential user category identifier of the target user, inputting the data into a second classification model, and outputting the probability that the target user belongs to a potential user of communication terminal replacement;
[0052] The target users are those who need to be identified as potential users for 5G terminal replacement. Optionally, the target user data includes seven categories: time dimension, identification, basic attributes, consumption behavior, terminal usage behavior, social circle characteristics, and internet access behavior preferences. This embodiment does not limit the data of the target users.
[0053] This embodiment is not limited to the types of the first classification model and the second classification model. The first classification model and the second classification model may be the same or different.
[0054] Optionally, the first classification model is a classification algorithm such as random forest or GBDT (Gradient Boosting Decision Tree), and the second classification model is a classification algorithm such as decision tree.
[0055] The first classification model is used to classify the target user's data to obtain the first category identifier of potential 5G terminal replacement users. The second classification model is then used to classify the target user's data to obtain the probability that the target user belongs to the category of potential communication terminal replacement users.
[0056] Optionally, the first 5G terminal replacement potential user category identifier is 1 or 0, where 1 represents a 5G terminal replacement potential user and 0 represents a non-5G terminal replacement potential user.
[0057] Step 102: Calculate the distance between the target user's data and the mean center of the positive sample dataset A1 of potential 5G terminal replacement users, adjust the probability according to the distance, and obtain the second 5G terminal replacement potential user category identifier of the target user according to the adjusted probability.
[0058] The mean center of the positive sample dataset A1 is calculated using the following formula:
[0059] $$\bar{x} = \frac{1}{m_3} \sum_{i=1}^{m_3} x_i$$
[0060] Where m3 is the sample size of A1, the sum is taken over the same feature field of all user samples x_i in dataset A1, and $\bar{x}$ is the n1-dimensional mean center vector.
[0061] Optionally, the Euclidean distance between the target user's data and the mean center can be calculated; this embodiment is not limited to this type of distance. A larger distance results in a smaller probability; a smaller distance results in a larger probability. This embodiment is not limited to specific methods for adjusting the probability.
[0062] If the adjusted probability is greater than the second threshold, the second 5G terminal replacement potential user category identifier of the target user is 1; otherwise, it is 0.
[0063] Step 103: Obtain the third 5G terminal replacement potential user category identifier of the target user based on the first 5G terminal replacement potential user category identifier and the second 5G terminal replacement potential user category identifier of the target user.
[0064] This embodiment is not limited to the method of obtaining the third 5G terminal replacement potential user category identifier of the target user based on the first 5G terminal replacement potential user category identifier and the second 5G terminal replacement potential user category identifier. A complete flowchart of user identification is shown in Figure 2.
[0065] The first classification model is obtained by training on training set A11 in A1 and training set B11 in negative sample dataset B1 of potential users of 5G terminal replacement, and the ratio between the number of B11 and the number of A11 is greater than a first threshold; the second classification model is obtained by training on positive sample dataset of potential users of communication terminal replacement and negative sample dataset of potential users of communication terminal replacement.
[0066] The first and second classification models are built using existing sample data of users in normal network service. A dataset is constructed in which X_i ∈ R^n is the feature set of the i-th user sample, n is the number of feature fields, and m is the number of user samples; y_i ∈ {0,1} represents the 5G terminal replacement potential user category identifier of the i-th user sample, and v_i ∈ {0,1} represents the actual communication terminal replacement potential user category identifier of the i-th user sample.
[0067] Data cleaning and invalid user removal are performed on the feature fields of the user samples. The data cleaning process includes the following steps:
[0068] Step 1: Handling missing values and null values. For each feature field, calculate the percentage of user samples whose value in that field is null among all user samples. Delete feature fields whose null percentage exceeds a certain threshold, which can be set to 70%. For retained feature fields, fill null values with 0 or the mean for continuous variables, and with "null" or the mode for discrete variables. Feature fields with a null percentage exceeding 30% but less than 70% are retained after verification; feature fields with a null percentage exceeding 70% are removed as appropriate.
[0069] Step 2, data type conversion. For example, convert the values of binary discrete variables with the value of yes or no to 1 or 0, which is a Boolean type; convert other discrete variables to string type.
[0070] Step 3, Outlier Handling. Identify and handle outliers in the feature fields, such as feature field values exceeding 3 standard deviations, and handle outliers by removing or imputing other values.
[0071] Step 4, create derived variables. Perform statistical analysis on the feature set of each user sample, adding k feature fields;
[0072] Step 5: Output the cleaned dataset S1. The number of user samples is m1, and the number of feature fields is n1 = n + k.
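The cleaning steps above can be illustrated with a minimal sketch for continuous feature fields. It assumes each user sample is a dict mapping a feature name to a float or `None` (missing); the function name `clean_features`, the choice of mean imputation, and replacing 3-standard-deviation outliers with the mean are illustrative assumptions, since the text leaves those choices open.

```python
from statistics import mean, pstdev


def clean_features(rows, drop_threshold=0.7):
    """Drop sparse fields, impute missing values, and tame outliers.

    rows: list of dicts, feature name -> float or None (None = missing).
    drop_threshold: fields with a larger missing share are deleted (70% in the text).
    """
    fields = list(rows[0].keys())
    # Keep only fields whose missing share does not exceed the threshold.
    kept = [
        f for f in fields
        if sum(1 for r in rows if r[f] is None) / len(rows) <= drop_threshold
    ]
    cleaned = [{f: r[f] for f in kept} for r in rows]
    for f in kept:
        present = [r[f] for r in cleaned if r[f] is not None]
        if not present:
            continue  # nothing to estimate from; leave the field untouched
        mu = mean(present)
        sigma = pstdev(present)
        for r in cleaned:
            if r[f] is None:
                r[f] = mu  # assumption: impute continuous fields with the mean
            elif sigma > 0 and abs(r[f] - mu) > 3 * sigma:
                r[f] = mu  # assumption: replace 3-sigma outliers with the mean
    return cleaned


rows = [
    {"a": 1.0, "b": None},
    {"a": None, "b": None},
    {"a": 3.0, "b": None},
    {"a": 2.0, "b": 5.0},
]
cleaned = clean_features(rows)
```

In this toy input, field `b` is missing in 75% of samples and is dropped, while the single gap in `a` is filled with the mean of the observed values.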
[0073] Next, invalid users are removed, which includes the following steps:
[0074] Step 1: Select the desired analysis field from the n1 feature fields, such as age;
[0075] Step 2: Segment the continuous fields in the analysis fields. For example, for the age field, determine the segment interval [0,18,25,30,35,40,50,60,100], separate the field values, and convert them into discrete fields for output.
[0076] Step 3: Analyze the distribution of 5G terminal users in the user sample across the discrete values of each discrete field:
[0077] 1. Select a discrete field to be analyzed, count the total number of users and the number of 5G terminal users at each discrete value, and calculate the proportion of 5G terminal users in the total number of users at each discrete value.
[0078] 2. Sort the discrete values by their proportion of 5G terminal users, and find the discrete values whose proportion is much smaller than the proportion of 5G terminal users in dataset S1, for example less than 1/5 of that proportion. Retain the distribution result of the 5G terminal user proportion for this discrete field.
[0079] Step 4: Remove invalid user samples.
[0080] 1. Based on the retained distribution results of the 5G terminal user proportion across the discrete fields, identify users who are extremely unlikely to take up 5G terminal replacement, for example users under 18 or over 60 years old, who account for only 0.1% of 5G terminal users, and remove this part of the user samples from dataset S1.
[0081] 2. Remove test cards, virtual cards, temporary or other abnormal tariff cards, IoT cards, wireless landlines, M2M, data cards, TD wireless landline user samples, and user samples with abnormal status.
[0082] 3. Remove user samples that changed devices frequently in the past 3 months. Because the IMEI changes each time the device changes, this is determined by checking whether the number of distinct IMEIs after deduplication exceeds a threshold.
[0083] Step 5: Output the dataset S2. The number of user samples is m2, and the number of feature fields is n1.
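The segmentation and proportion analysis in the steps above can be sketched as follows for the age field. This is only an illustration: the half-open interval convention, the helper names, and the representation of a sample as an `(age, is_5g)` pair are assumptions; the 1/5 ratio and the segment boundaries come from the text.

```python
from bisect import bisect_right
from collections import defaultdict

AGE_EDGES = [0, 18, 25, 30, 35, 40, 50, 60, 100]


def age_bin(age, edges=AGE_EDGES):
    """Index of the half-open interval [edges[i], edges[i+1]) containing age."""
    i = bisect_right(edges, age) - 1
    return min(max(i, 0), len(edges) - 2)  # clamp ages at or beyond the last edge


def low_share_bins(samples, ratio=0.2):
    """Bins whose 5G user share is below ratio * overall 5G share.

    samples: list of (age, is_5g) pairs, is_5g being 1 or 0.
    """
    total = defaultdict(int)
    positive = defaultdict(int)
    for age, is_5g in samples:
        b = age_bin(age)
        total[b] += 1
        positive[b] += is_5g
    overall = sum(positive.values()) / len(samples)
    return {b for b in total if positive[b] / total[b] < ratio * overall}


def remove_invalid(samples, ratio=0.2):
    """Drop samples that fall in a low-share bin."""
    bad = low_share_bins(samples, ratio)
    return [s for s in samples if age_bin(s[0]) not in bad]


samples = [(10, 0)] * 5 + [(28, 1), (28, 1), (28, 0), (28, 0), (28, 0)]
kept = remove_invalid(samples)
```

Here the under-18 bin has no 5G terminal users, so its share falls below one fifth of the overall share and its samples are removed.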
[0084] From dataset S2, filter the user samples with y_i = 1 to obtain the 5G terminal replacement positive sample dataset A1; filter the user samples with y_i = 0 to obtain the 5G terminal replacement negative sample dataset B1; filter the user samples with v_i = 1 to obtain the communication terminal replacement positive sample dataset A2; filter the user samples with v_i = 0 to obtain the communication terminal replacement negative sample dataset B2.
[0085] Divide each of the positive and negative sample datasets into K parts. Select one part from each of the positive and negative sample datasets as the test dataset, and use the remaining K-1 parts as the training dataset, to obtain the training sets A11, A21, B11 and B21 and the test sets A12, A22, B12 and B22.
[0086] Determine whether the ratio between the sample size of B11 and the sample size of A11 exceeds the first threshold. If so, the method in this embodiment is used for user identification; otherwise, an existing method is used for user identification.
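A minimal sketch of the label filtering, the K-part split, and the imbalance check described above. The sample layout `(features, y, v)`, the helper names, the shuffling seed, and the threshold value 5 are all hypothetical choices made for illustration.

```python
import random


def split_by_label(samples, label_index):
    """Split samples into (positives, negatives) on the 0/1 label at label_index."""
    pos = [s for s in samples if s[label_index] == 1]
    neg = [s for s in samples if s[label_index] == 0]
    return pos, neg


def holdout_fold(data, k, seed=0):
    """Shuffle, cut into k parts, and return (train, test) with one part as test."""
    items = list(data)
    random.Random(seed).shuffle(items)
    fold = len(items) // k
    return items[fold:], items[:fold]


def is_imbalanced(b11, a11, first_threshold):
    """True when the negative-to-positive training ratio exceeds the threshold."""
    return len(b11) / len(a11) > first_threshold


# Each sample: (features, y, v); y is the 5G label, v the general replacement label.
samples = [((i,), 1, 1) for i in range(10)] + [((i,), 0, 0) for i in range(90)]
a1, b1 = split_by_label(samples, label_index=1)
a11, a12 = holdout_fold(a1, k=5)
b11, b12 = holdout_fold(b1, k=5)
use_this_method = is_imbalanced(b11, a11, first_threshold=5)
```

With 10 positives and 90 negatives and K = 5, the training ratio is 72 to 8, so the imbalance branch is taken in this example.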
[0087] In this embodiment, under conditions of imbalanced positive and negative samples, the target user's data is classified by two classification models, one trained on the balanced sample dataset of potential users of communication terminal replacement and one trained on the sample dataset of potential users of 5G terminal replacement. The classification result of the communication terminal replacement model is corrected based on the distance between the target user's data and the data of the 5G terminal replacement user samples. The two classification results are then combined to identify whether the target user is a potential 5G terminal replacement user, thus ensuring the accuracy of identification even when positive and negative samples are imbalanced.
[0088] Based on the above embodiments, the step of adjusting the probability according to the distance and obtaining the second 5G terminal replacement potential user category identifier of the target user according to the adjusted probability in this embodiment includes: taking the test set A12 in A1 and the test set B12 in B1 as the 5G terminal replacement potential user test dataset D.
[0089] The test sets A12 and B12 are merged into a test dataset D, at which point the number of samples is m4, which is 1 / K of m2.
[0090] Calculate the distance between the data of each user sample in D and the mean center;
[0091] Optionally, the distance between the i-th user sample and the mean center can be calculated using the following formula:
[0092] $$d_i = \sqrt{\sum_{k=1}^{n_1} \left(x_{ik} - \bar{x}_k\right)^2}$$
[0093] Where x_ik represents the value of the k-th feature field of the i-th user sample in dataset D, and $\bar{x}_k$ represents the value of the k-th element of the mean center vector $\bar{x}$. The output is the distance result vector d, which has a dimension of m4.
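As a small illustration of the two quantities involved, the sketch below computes the mean center of a positive sample set and the Euclidean distance of each test sample to it. Euclidean distance is the optional choice named in the text; the function names and the list-of-lists data layout are assumptions.

```python
import math


def mean_center(a1):
    """Per-feature mean of the positive sample set A1 (list of equal-length rows)."""
    m3 = len(a1)
    n1 = len(a1[0])
    return [sum(row[k] for row in a1) / m3 for k in range(n1)]


def euclidean_distance(x, center):
    """Euclidean distance between one sample and the mean center."""
    return math.sqrt(sum((xk - ck) ** 2 for xk, ck in zip(x, center)))


def distances(d_set, center):
    """Distance result vector: one entry per sample in the test dataset D."""
    return [euclidean_distance(x, center) for x in d_set]


a1 = [[0.0, 0.0], [2.0, 4.0]]
center = mean_center(a1)
d = distances([[4.0, 6.0], [1.0, 2.0]], center)
```

In practice the feature fields would usually be put on comparable scales before computing distances, which the sketch leaves out.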
[0094] The distance to the target user is standardized based on the maximum and minimum distances among all user samples.
[0095] This embodiment is not limited to the method of standardization based on the maximum and minimum values. After standardization, the distance corresponding to the target user falls into the [0,1] interval.
[0096] Calculate the product of the standardized distance and the probability, compare the product with a second threshold, and determine the category identifier of the potential user for the second 5G terminal replacement based on the comparison result.
[0097] Multiply the standardized distance d* by the probability p′1, and compare the product with the second threshold to obtain the second 5G terminal replacement potential user category identifier.
[0098] Based on the above embodiments, this embodiment standardizes the distance to the target user using the following formula, based on the maximum and minimum values among the distances corresponding to all user samples:
[0099]
[0100] Where max is the maximum value, min is the minimum value, d is the distance corresponding to the target user, and d* is the standardized distance.
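The standardization and the product with the probability can be sketched as below. The formula itself is not reproduced in this text, so the sketch makes a loud assumption: it uses an inverted min-max scaling, (max − d) / (max − min), so that a smaller distance yields a larger standardized value, matching the statement that a larger distance should lower the probability. The function names and the clamping of out-of-range distances are also assumptions.

```python
def standardize(d, d_min, d_max, closer_is_larger=True):
    """Min-max scale a distance into [0, 1].

    Assumption: with closer_is_larger=True the scale is inverted, so the
    nearest sample maps to 1 and the farthest to 0.
    """
    if d_max == d_min:
        return 0.0  # degenerate range; arbitrary choice for this sketch
    scaled = (d - d_min) / (d_max - d_min)
    scaled = min(max(scaled, 0.0), 1.0)  # a new user may fall outside D's range
    return 1.0 - scaled if closer_is_larger else scaled


def second_identifier(d, p, d_min, d_max, second_threshold):
    """Return 1 when standardized distance times probability exceeds the threshold."""
    product = standardize(d, d_min, d_max) * p
    return 1 if product > second_threshold else 0


near = second_identifier(d=2.0, p=0.9, d_min=0.0, d_max=10.0, second_threshold=0.5)
far = second_identifier(d=9.0, p=0.9, d_min=0.0, d_max=10.0, second_threshold=0.5)
```

Both example users have the same model probability, but only the one close to the mean center of the 5G positive samples keeps a product above the threshold.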
[0101] Based on the above embodiments, in this embodiment, before comparing the product with the second threshold, the method further includes: inputting the data of each user sample in D into the second classification model, and outputting the probability that each user sample belongs to a potential user of communication terminal replacement;
[0102] Based on the maximum and minimum values, the distance corresponding to each user sample is standardized, and the product between the standardized distance corresponding to each user sample and the probability corresponding to each user sample is calculated.
[0103] The method for standardizing the distance for each user sample is the same as the method for standardizing the distance for the target user.
[0104] Optionally, for user samples whose probability is greater than a specified threshold, such as 0.5, the standardized distance can be used to further filter them by similarity to 5G terminal replacement users.
[0105] The user samples are sorted in descending order of the product. The second 5G terminal replacement potential user category identifier of the user samples within the top preset proportion is set to the first preset identifier, and that of the remaining user samples is set to the second preset identifier; there are multiple preset proportions.
[0106] Sort the products corresponding to the user samples, select the top g% of user samples, and set the second 5G terminal replacement potential user category identifier as the first preset identifier 1; otherwise, set it as the second preset identifier 0. g can be selected from multiple values from 1 to 100.
[0107] Based on the second 5G terminal replacement potential user category identifier of the user sample under each preset ratio, calculate the first classification evaluation index of the user sample, and determine the second threshold according to the preset ratio corresponding to the highest first classification evaluation index.
[0108] The final value of g is selected as the value that yields the highest first classification evaluation metric for the user samples. The second threshold is then determined based on the classification result obtained from the final value of g.
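The search over g can be sketched as follows. Two points are assumptions rather than statements from the text: the first classification evaluation index is taken to be the F1 score, and the returned cutoff is simply the lowest product inside the selected top g%, since the exact mapping from g to the second threshold is not spelled out.

```python
import math


def f1_score(y_true, y_pred):
    """F1 of the positive class, computed from 0/1 label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def choose_second_threshold(products, y_true, g_values):
    """Try each top-g% cut and keep the one with the best evaluation index.

    Returns (best_score, best_g, cutoff), where cutoff is the lowest product
    among the samples labeled 1 under best_g.
    """
    n = len(products)
    order = sorted(range(n), key=lambda i: products[i], reverse=True)
    best = (-1.0, None, None)
    for g in g_values:
        top = max(1, math.ceil(n * g / 100))
        labels = [0] * n
        for i in order[:top]:
            labels[i] = 1  # first preset identifier for the top g%
        score = f1_score(y_true, labels)
        if score > best[0]:
            best = (score, g, products[order[top - 1]])
    return best


products = [0.9, 0.8, 0.7, 0.2, 0.1, 0.05, 0.04, 0.03, 0.02, 0.01]
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
best_score, best_g, cutoff = choose_second_threshold(products, y_true, [10, 30, 50])
```

The best score found this way is also the "highest first classification evaluation index" that the later weighting step refers to.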
[0109] This embodiment improves the accuracy of the second threshold, thereby further enhancing the accuracy of identifying potential users for 5G terminal replacement.
[0110] Based on the above embodiments, the step of obtaining the third 5G terminal replacement potential user category identifier of the target user according to the first 5G terminal replacement potential user category identifier and the second 5G terminal replacement potential user category identifier of the target user includes: adding the product of the first 5G terminal replacement potential user category identifier of the target user multiplied by a first weight and the product of the second 5G terminal replacement potential user category identifier of the target user multiplied by a second weight to obtain the probability that each target user belongs to the 5G terminal replacement potential user category;
[0111] The probability that the target user is a potential user of 5G terminal replacement is calculated using the following formula:
[0112] p′3=w1×p′0+w2×p′2
[0113] Where p′3 represents the probability that the target user is a potential user for 5G terminal replacement, w1 is the first weight, w2 is the second weight, p′0 is the first 5G terminal replacement potential user category identifier of the target user (e.g., 1 or 0), and p′2 is the second 5G terminal replacement potential user category identifier of the target user (e.g., 1 or 0).
[0114] The probability of each target user belonging to a potential 5G terminal replacement user is compared with a third threshold, and the third 5G terminal replacement potential user category identifier of the target user is obtained based on the comparison result.
[0115] When p′3 is greater than the third threshold, the third 5G terminal replacement potential user category identifier of the target user is 1; otherwise, it is 0.
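The weighted combination p′3 = w1×p′0 + w2×p′2 and the comparison with the third threshold can be written as a short sketch. The function name and the example weights and threshold are illustrative assumptions.

```python
def third_identifier(p0, p2, w1, w2, third_threshold):
    """Combine the two 0/1 identifiers and compare with the third threshold.

    Returns (p3, identifier), where p3 = w1 * p0 + w2 * p2.
    """
    p3 = w1 * p0 + w2 * p2
    return p3, (1 if p3 > third_threshold else 0)


# Hypothetical weights and threshold: the first model outweighs the second.
only_first = third_identifier(p0=1, p2=0, w1=0.6, w2=0.4, third_threshold=0.5)
only_second = third_identifier(p0=0, p2=1, w1=0.6, w2=0.4, third_threshold=0.5)
both = third_identifier(p0=1, p2=1, w1=0.6, w2=0.4, third_threshold=0.5)
```

With these example values a positive vote from the first model alone is enough to label the user 1, while a positive vote from the second model alone is not.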
[0116] Based on the above embodiments, in this embodiment, before the step of adding the product of the first 5G terminal replacement potential user category identifier of the target user multiplied by the first weight and the product of the second 5G terminal replacement potential user category identifier of the target user multiplied by the second weight to obtain the probability that each target user belongs to the 5G terminal replacement potential user category, the method further includes: inputting the data of each user sample in D into the first classification model respectively, and outputting the first 5G terminal replacement potential user category identifier of each user sample;
[0117] Optionally, when the first classification model includes l base classifiers, the data of each user sample in D is input into the l base classifiers, and l classification results are output. A voting process is then performed on the l classification results to obtain the first 5G terminal replacement potential user category identifier for each user sample.
[0118] Based on the first 5G terminal replacement potential user category identifier of all user samples, calculate the second classification evaluation index of the user samples;
[0119] Optionally, the first 5G terminal replacement potential user category identifier of the user sample is compared with the actual 5G terminal replacement potential user category identifier, and the accuracy of the classification results is calculated.
[0120] The first weight and the second weight are calculated based on the second classification evaluation index and the highest first classification evaluation index of the user sample.
[0121] The first weight and the second weight are calculated using the following formula based on the second classification evaluation metric and the highest first classification evaluation metric of the user sample:
[0122]
[0123] w2 = 1 - w1;
[0124] Where F is the second classification evaluation index, F' is the highest first classification evaluation index, w1 is the first weight, and w2 is the second weight.
[0125] This embodiment uses classification performance evaluation metrics to determine the weight coefficients of multi-model classification, which avoids the problem of poor voting effect due to a small number of models and ensures the accuracy of the final result.
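When the second classification evaluation index is taken to be accuracy, as the optional step above suggests, it could be computed along the following lines. This is a sketch under that assumption; the function name is hypothetical, and other evaluation indices are not excluded by the description.

```python
from typing import Sequence


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of user samples whose predicted identifier matches the actual one."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    # Count the samples where the first category identifier equals the true label.
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)
```

The resulting value would play the role of F when the first and second weights are derived.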
[0126] Based on the above embodiments, the first classification model in this embodiment includes multiple base classifiers, each of which is obtained by training on A11 together with a preset number of negative samples extracted from B11;
[0127] Let the sample size of A1 be m3, the sample size of A12 be m4, and the sample size of A11 be m5. The training steps for the base classifiers are as follows:
[0128] Step 1: Let i = 1 and determine the number l of base classifiers to be established, where l is greater than 0; by default, l is set to the ratio of the sample size of B11 to the sample size of A11, taken as an integer and multiplied by 2;
[0129] Step 2: Extract the training set. Randomly select m5 negative samples from B11 and merge them with A11 to generate sub-training set Ci;
[0130] Step 3: Establish a base classifier using a classification algorithm such as a decision tree. Input sub-training set Ci to train the model and output the base classifier MDLi;
[0131] Step 4: Determine whether i is greater than l. If yes, output the l base classifiers; otherwise, let i = i + 1 and return to Step 2.
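Steps 1 to 4 can be sketched as the loop below. It is an illustrative outline, not the implementation of this embodiment: `fit` is a hypothetical callable standing in for whatever classification algorithm (for example a decision tree) is used, "taken as an integer" is assumed to mean floor division, and the lower bound of 1 on l is added here only to keep l greater than 0.

```python
import random
from typing import Any, Callable, List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label)


def default_classifier_count(a11_size: int, b11_size: int) -> int:
    # Assumption: integer part of the B11/A11 size ratio, multiplied by 2.
    return max(1, (b11_size // a11_size) * 2)


def train_base_classifiers(a11: Sequence[Sample], b11: Sequence[Sample],
                           fit: Callable[[List[Sample]], Any],
                           l: Optional[int] = None, seed: int = 0) -> List[Any]:
    """Train l base classifiers, each on A11 plus m5 random negatives from B11."""
    m5 = len(a11)
    if m5 == 0 or m5 > len(b11):
        raise ValueError("A11 must be non-empty and no larger than B11")
    if l is None:
        l = default_classifier_count(m5, len(b11))  # Step 1
    rng = random.Random(seed)
    classifiers = []
    for _ in range(l):
        negatives = rng.sample(list(b11), m5)   # Step 2: draw m5 negatives from B11
        c_i = list(a11) + negatives             # Step 2: merge with A11 to form C_i
        classifiers.append(fit(c_i))            # Step 3: train MDL_i on C_i
    return classifiers                          # Step 4: output the l classifiers
```

Each sub-training set is balanced (m5 positives and m5 negatives), while different random draws let the ensemble see a larger share of B11 than a single undersampled set would.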
[0132] The step of inputting the target user's data into the first classification model and outputting the target user's first 5G terminal replacement potential user category identifier includes: inputting the target user's data into each base classifier and outputting the target user's initial 5G terminal replacement potential user category identifier; and performing a voting process based on the target user's initial 5G terminal replacement potential user category identifiers output by all base classifiers to obtain the target user's first 5G terminal replacement potential user category identifier.
[0133] The target user's data is input into MDL1 to MDLl, which output l classification results. The number of occurrences of each classification result is counted, and the classification result with the most occurrences is used as the first 5G terminal replacement potential user category identifier of the target user.
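The voting process can be illustrated with a simple majority count. The function name is hypothetical, and the tie-breaking behavior shown is an assumption, since the description does not state how equal vote counts are resolved.

```python
from collections import Counter
from typing import Sequence


def majority_vote(results: Sequence[int]) -> int:
    """Return the classification result that occurs most often among the base classifiers."""
    if not results:
        raise ValueError("at least one classification result is required")
    counts = Counter(results)
    # most_common(1) yields the most frequent result; on a tie, Counter keeps
    # the result that was encountered first (an assumption, not from the source).
    return counts.most_common(1)[0][0]
```

Note that the default value of l is an even number, so ties are possible; a deployment would need to fix a tie-breaking rule explicitly.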
[0134] The following describes the 5G terminal replacement potential user identification device provided by the present invention. The 5G terminal replacement potential user identification device described below can be referred to in correspondence with the 5G terminal replacement potential user identification method described above.
[0135] As shown in Figure 3, the device includes a classification module 301, a calculation module 302, and an identification module 303, wherein:
[0136] The classification module 301 is used to input the target user's data into the first classification model, output the first 5G terminal replacement potential user category identifier of the target user, input the data into the second classification model, and output the probability that the target user belongs to the communication terminal replacement potential user.
[0137] The calculation module 302 is used to calculate the distance between the data of the target user and the mean center of the positive sample dataset A1 of potential 5G terminal replacement users, adjust the probability according to the distance, and obtain the second 5G terminal replacement potential user category identifier of the target user according to the adjusted probability.
[0138] The identification module 303 is used to obtain the third 5G terminal replacement potential user category identifier of the target user based on the first 5G terminal replacement potential user category identifier and the second 5G terminal replacement potential user category identifier of the target user.
[0139] The first classification model is trained based on the training set A11 in A1 and the training set B11 in the negative sample dataset of potential users of 5G terminal replacement B1, and the ratio of B11 to A11 is greater than a first threshold.
[0140] The second classification model is obtained by training based on the positive sample dataset of potential users of communication terminal replacement and the negative sample dataset of potential users of communication terminal replacement.
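The distance calculation performed by the calculation module can be sketched as follows. Euclidean distance is an assumption made for illustration, as this passage does not name the distance measure, and both function names are hypothetical.

```python
import math
from typing import List, Sequence


def mean_center(samples: Sequence[Sequence[float]]) -> List[float]:
    """Per-feature mean of the positive sample dataset A1."""
    n = len(samples)
    if n == 0:
        raise ValueError("A1 must contain at least one sample")
    dim = len(samples[0])
    return [sum(s[j] for s in samples) / n for j in range(dim)]


def distance_to_center(x: Sequence[float], center: Sequence[float]) -> float:
    # Assumption: Euclidean distance between the user's data and the mean center.
    return math.dist(x, center)
```

The distance obtained this way is what is subsequently standardized and multiplied by the probability output by the second classification model.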
[0141] This embodiment classifies the target user's data using two classification models: one trained on the 5G terminal replacement potential user dataset, in which positive and negative samples are imbalanced, and one trained on the balanced communication terminal replacement potential user dataset. The classification result of the communication terminal replacement model is corrected based on the distance between the target user's data and the mean center of the 5G terminal replacement positive samples. The two classification results are then combined to determine whether the target user is a potential 5G terminal replacement user, thus ensuring identification accuracy even when positive and negative samples are imbalanced.
[0142] Figure 4 is a schematic diagram of the physical structure of an electronic device. As shown in Figure 4, the electronic device may include a processor 410, a communications interface 420, a memory 430, and a communication bus 440, wherein the processor 410, communications interface 420, and memory 430 communicate with each other through the communication bus 440. The processor 410 can call logical instructions in the memory 430 to execute a 5G terminal replacement potential user identification method. This method includes inputting target user data into a first classification model and outputting a first 5G terminal replacement potential user category identifier for the target user; inputting the data into a second classification model and outputting the probability that the target user belongs to a communication terminal replacement potential user; calculating the distance between the target user's data and the mean center of the positive sample dataset A1 of 5G terminal replacement potential users; adjusting the probability according to the distance; obtaining a second 5G terminal replacement potential user category identifier for the target user based on the adjusted probability; and obtaining a third 5G terminal replacement potential user category identifier for the target user based on the first and second 5G terminal replacement potential user category identifiers.
[0143] Furthermore, the logical instructions in the aforementioned memory 430 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, essentially, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0144] On the other hand, the present invention also provides a computer program product, the computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions, wherein when the program instructions are executed by a computer, the computer is able to execute the 5G terminal replacement potential user identification method provided by the above methods, the method comprising: inputting the target user's data into a first classification model, outputting a first 5G terminal replacement potential user category identifier for the target user; inputting the data into a second classification model, outputting the probability that the target user belongs to a communication terminal replacement potential user; calculating the distance between the target user's data and the mean center of the 5G terminal replacement potential user positive sample dataset A1, adjusting the probability according to the distance, obtaining a second 5G terminal replacement potential user category identifier for the target user according to the adjusted probability; and obtaining a third 5G terminal replacement potential user category identifier for the target user according to the first 5G terminal replacement potential user category identifier and the second 5G terminal replacement potential user category identifier for the target user.
[0145] In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program thereon. When executed by a processor, the computer program implements the aforementioned method for identifying potential 5G terminal replacement users. The method includes: inputting data of a target user into a first classification model and outputting a first 5G terminal replacement potential user category identifier for the target user; inputting the data into a second classification model and outputting the probability that the target user belongs to a potential communication terminal replacement user; calculating the distance between the target user's data and the mean center of the positive sample dataset A1 of potential 5G terminal replacement users; adjusting the probability according to the distance; obtaining a second 5G terminal replacement potential user category identifier for the target user based on the adjusted probability; and obtaining a third 5G terminal replacement potential user category identifier for the target user based on the first and second 5G terminal replacement potential user category identifiers.
[0146] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0147] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0148] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A method for identifying potential users of 5G terminal replacement, characterized in that it comprises:
inputting the target user's data into a first classification model and outputting a first 5G terminal replacement potential user category identifier of the target user; inputting the data into a second classification model and outputting the probability that the target user belongs to the communication terminal replacement potential users;
calculating the distance between the target user's data and the mean center of the positive sample dataset A1 of potential 5G terminal replacement users, adjusting the probability according to the distance, and obtaining a second 5G terminal replacement potential user category identifier of the target user according to the adjusted probability;
obtaining a third 5G terminal replacement potential user category identifier of the target user based on the first and second 5G terminal replacement potential user category identifiers of the target user;
wherein the first classification model is trained based on the training set A11 in A1 and the training set B11 in the negative sample dataset B1 of potential users of 5G terminal replacement, and the ratio of the training set B11 to the training set A11 is greater than a first threshold;
the second classification model is trained based on a positive sample dataset of potential users of communication terminal replacement and a negative sample dataset of potential users of communication terminal replacement, and the ratio of the positive sample dataset to the negative sample dataset is 1;
the step of adjusting the probability based on the distance and obtaining the second 5G terminal replacement potential user category identifier of the target user based on the adjusted probability includes:
using the test set A12 in A1 and the test set B12 in B1 as the 5G terminal replacement potential user test dataset D;
calculating the distance between the data of each user sample in the 5G terminal replacement potential user test dataset D and the mean center;
standardizing the distance corresponding to the target user based on the maximum and minimum of the distances corresponding to all user samples;
calculating the product of the standardized distance and the probability, comparing the product with a second threshold, and determining the second 5G terminal replacement potential user category identifier based on the comparison result;
the step of obtaining the third 5G terminal replacement potential user category identifier of the target user based on the first 5G terminal replacement potential user category identifier and the second 5G terminal replacement potential user category identifier includes:
adding the product of the first 5G terminal replacement potential user category identifier of the target user multiplied by a first weight and the product of the second 5G terminal replacement potential user category identifier of the target user multiplied by a second weight to obtain the probability that each target user belongs to the 5G terminal replacement potential users;
comparing the probability that each target user belongs to the 5G terminal replacement potential users with a third threshold, and obtaining the third 5G terminal replacement potential user category identifier of the target user based on the comparison result.
2. The 5G terminal replacement potential user identification method of claim 1, wherein the distance corresponding to the target user is standardized, based on the maximum and minimum of the distances corresponding to all user samples, using the following formula: ; wherein max is the maximum value, min is the minimum value, d is the distance corresponding to the target user, is the standardized distance.
3. The 5G terminal replacement potential user identification method of claim 1, wherein, before comparing the product with the second threshold, the method further includes:
inputting the data of each user sample in the 5G terminal replacement potential user test dataset D into the second classification model, and outputting the probability that each user sample belongs to the communication terminal replacement potential users;
standardizing the distance corresponding to each user sample based on the maximum and minimum values, and calculating the product of the standardized distance corresponding to each user sample and the probability corresponding to each user sample;
sorting the user samples in descending order of the product, setting the second 5G terminal replacement potential user category identifier of the user samples in the top preset proportion to a first preset identifier, and setting the second 5G terminal replacement potential user category identifier of the remaining user samples to a second preset identifier; wherein there are multiple preset proportions;
calculating the first classification evaluation index of the user samples based on the second 5G terminal replacement potential user category identifier of the user samples under each preset proportion, and determining the second threshold according to the preset proportion corresponding to the highest first classification evaluation index.
4. The 5G terminal replacement potential user identification method of claim 1, wherein the step of adding the product of the first 5G terminal replacement potential user category identifier of the target user multiplied by the first weight and the product of the second 5G terminal replacement potential user category identifier of the target user multiplied by the second weight to obtain the probability that each target user belongs to the 5G terminal replacement potential users further includes:
inputting the data of each user sample in the 5G terminal replacement potential user test dataset D into the first classification model, and outputting the first 5G terminal replacement potential user category identifier of each user sample;
calculating the second classification evaluation index of the user samples based on the first 5G terminal replacement potential user category identifiers of all user samples;
calculating the first weight and the second weight based on the second classification evaluation index and the highest first classification evaluation index of the user samples.
5. The 5G terminal replacement potential user identification method of any one of claims 1 to 4, wherein the first classification model includes multiple base classifiers, each of which is trained on A11 together with a preset number of negative samples extracted from B11;
the step of inputting the target user's data into the first classification model and outputting the target user's first 5G terminal replacement potential user category identifier includes:
inputting the target user's data into each base classifier and outputting an initial 5G terminal replacement potential user category identifier of the target user;
performing voting based on the initial 5G terminal replacement potential user category identifiers of the target user output by all base classifiers to obtain the first 5G terminal replacement potential user category identifier of the target user.
6. A device for identifying potential users of 5G terminal replacement, characterized in that it comprises:
a classification module, used to input the target user's data into a first classification model and output a first 5G terminal replacement potential user category identifier of the target user, and to input the data into a second classification model and output the probability that the target user belongs to the communication terminal replacement potential users;
a calculation module, used to calculate the distance between the data of the target user and the mean center of the positive sample dataset A1 of potential 5G terminal replacement users, adjust the probability according to the distance, and obtain a second 5G terminal replacement potential user category identifier of the target user according to the adjusted probability;
an identification module, used to obtain a third 5G terminal replacement potential user category identifier of the target user based on the first 5G terminal replacement potential user category identifier and the second 5G terminal replacement potential user category identifier of the target user;
wherein the first classification model is trained based on the training set A11 in A1 and the training set B11 in the negative sample dataset B1 of potential users of 5G terminal replacement, and the ratio of B11 to A11 is greater than a first threshold;
the second classification model is trained based on a positive sample dataset of potential users of communication terminal replacement and a negative sample dataset of potential users of communication terminal replacement, and the ratio of the positive sample dataset to the negative sample dataset is 1;
the calculation module is further configured to use test set A12 in A1 and test set B12 in B1 as a 5G terminal replacement potential user test dataset D; calculate the distance between the data of each user sample in the test dataset D and the mean center; standardize the distance corresponding to the target user according to the maximum and minimum values of the distances corresponding to all user samples; calculate the product of the standardized distance and the probability, compare the product with a second threshold, and determine the second 5G terminal replacement potential user category identifier based on the comparison result;
the identification module is further configured to add the product of the first 5G terminal replacement potential user category identifier of the target user multiplied by a first weight and the product of the second 5G terminal replacement potential user category identifier of the target user multiplied by a second weight to obtain the probability that each target user belongs to the 5G terminal replacement potential users; compare the probability that each target user belongs to the 5G terminal replacement potential users with a third threshold, and obtain the third 5G terminal replacement potential user category identifier of the target user based on the comparison result.
7. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the steps of the 5G terminal replacement potential user identification method as described in any one of claims 1 to 5.
8. A non-transitory computer-readable storage medium having stored thereon a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the 5G terminal replacement potential user identification method as described in any one of claims 1 to 5.