A method and system for MR information backfilling based on big data
By extracting key information from S1-MME and S1UHTTP data and performing time-series processing on it, combined with the aggregation and flattening of MR data, efficient real-time backfilling of MR data is achieved. This solves the lag problem of MR data backfilling in existing technologies and meets the real-time and accuracy requirements of 5G networks.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHONGQING JIUYU BOHONG TECH CO LTD
- Filing Date
- 2023-03-16
- Publication Date
- 2026-06-26
AI Technical Summary
Existing technologies suffer from significant lag in MR data backfilling, with processing time increasing as the amount of data grows, failing to meet the real-time and accuracy requirements of 5G networks.
By acquiring S1-MME and S1UHTTP data from mobile terminals, key information data is extracted, time-series classification and normalization are performed, and the original MR data is aggregated and flattened. Finally, it is merged and backfilled with key information data, utilizing big data real-time processing technology.
It improves the real-time performance and accuracy of MR data backfilling, simplifies the processing flow, and meets the data support and requirements of 5G networks.
Figure CN116390149B_ABST
Description
Technical Field
[0001] This invention relates to the field of mobile communication technology, and specifically to a method and system for backfilling MR information based on big data.
Background Technology
[0002] With the development of 5G mobile internet and the basic popularization of 4G networks, O-domain data is experiencing explosive growth, and XDR and MR are the two main types of big data on the network side. XDR data mainly collects data from interfaces including S11, S1-MME, S1-U, SGS, S6A, S5 / S8, GB, IU_PS, GN, A, IU-CS, and C / D. Among these, S1-MME and S1-U data contain user information, service information, and call detail records (CDRs). MR data mainly includes three types of test reports: MRO, MRS, and MRE. MRO and MRE represent periodic measurement report sample data files, containing user location parameters and network coverage information. Currently, using big data analytics methods to clean, correlate, and perform algorithmic mining on large amounts of S1-MME, S1-U, and MR data can be practically applied for network problem discovery and analysis.
[0003] However, before MR data can be used, the data backfilling problem must be solved. With the rapid development of 5G, the requirements for the real-time performance and accuracy of data are even higher. Traditional backfilling methods correlate S1-MME, S1UHTTP, and MR data directly. This approach lags significantly, and its processing time increases with the size of the data, so it cannot meet the current support requirements of many network optimization business systems.
Summary of the Invention
[0004] To address the shortcomings of existing technologies, this invention provides a method and system for MR information backfilling based on big data.
[0005] In a first aspect, a method for backfilling MR information based on big data includes:
[0006] Acquire S1-MME data and S1UHTTP data from the mobile terminal, and extract key information data based on the S1-MME data and S1UHTTP data;
[0007] The key information data is classified and normalized based on time series to obtain a linear relationship between the key information data.
[0008] Obtain raw MR data from the mobile terminal, aggregate and classify the raw MR data, and flatten the aggregated and classified raw MR data to obtain flattened MR data.
[0009] The linear relationship between the flattened MR data and the key information data is fused to perform information backfilling on the MR data.
[0010] Furthermore, the step of acquiring the S1-MME data and S1UHTTP data of the mobile terminal, and extracting key information data based on the S1-MME data and S1UHTTP data, specifically involves:
[0011] Collect XDR data from the mobile terminal, and obtain S1-MME data and S1UHTTP data from the XDR data;
[0012] Extract key information data based on the S1-MME data and S1UHTTP data;
[0013] The key information data includes, but is not limited to, MME_UE_S1AP_ID data, ENB_UE_S1AP_ID data, MSISDN data, IMEI data, IMSI data, STARTTIME data, ENDTIME data, and ECI data.
[0014] Furthermore, the step of classifying and normalizing the key information data based on time series to obtain a linear relationship between the key information data specifically involves:
[0015] Obtain a preset time interval range, process the key information data within the time interval range for time attributes, and uniformly assign the time attribute values of the key information data.
[0016] Historical data is obtained from the associated information data after processing based on time attributes, and the ECI distribution of S1-MME data and S1UHTTP data is analyzed based on the historical data.
[0017] Based on the ECI distribution, the ECI classification sequence of S1-MME data and S1UHTTP data is calculated using a balanced allocation algorithm to obtain the ECI classification results of S1-MME data and S1UHTTP data.
[0018] Based on the ECI classification results, data imputation and automatic correction algorithms are used to process the S1-MME data and S1UHTTP data to obtain a time series arrangement.
[0019] The time series arrangement is normalized to obtain new data KEY for key information data, which reflects the linear relationship of key information data in the time series.
[0020] Further, the process of acquiring the raw MR data from the mobile terminal, aggregating and classifying the raw MR data, and then flattening the aggregated and classified raw MR data to obtain flattened MR data specifically involves:
[0021] Collect raw MR data from the mobile terminal and obtain the ECI association information of the raw MR data;
[0022] Based on the ECI association information, the original MR data is aggregated and classified to obtain MR classification data;
[0023] The MR classification data is flattened using a weighted dynamic algorithm to obtain flattened MR data, which is then stored in a queue to be processed.
[0024] Furthermore, the process of fusing the linear relationship between the flattened MR data and the key information data to backfill information into the MR data specifically involves:
[0025] Take the flattened MR data from the queue to be processed, and parse the flattened MR data to obtain the new data KEY of the flattened MR data;
[0026] Real-time acquisition of key information data in the time series arrangement;
[0027] Based on the new data KEY of the flattened MR data, a binary search sequential matching algorithm is used to match the KEY values of the flattened MR data with the acquired key information data, and the matching key information data is backfilled into the flattened MR data according to the matching results.
[0028] In a second aspect, a big data-based MR information backfilling system includes:
[0029] Key information extraction module: used to acquire S1-MME data and S1UHTTP data from the mobile terminal, and extract key information data based on the S1-MME data and S1UHTTP data;
[0030] Data classification and normalization module: used to classify and normalize the key information data based on time series to obtain the linear relationship of the key information data;
[0031] MR data acquisition module: used to acquire raw MR data from mobile terminals, aggregate and classify the raw MR data, and flatten the aggregated and classified raw MR data to obtain flattened MR data;
[0032] MR Information Backfilling Module: This module is used to fuse the linear relationship between the flattened MR data and the key information data to backfill the MR data with information.
[0033] Furthermore, the key information extraction module is specifically used for:
[0034] Collect XDR data from the mobile terminal, and obtain S1-MME data and S1UHTTP data from the XDR data;
[0035] Extract key information data based on the S1-MME data and S1UHTTP data;
[0036] The key information data includes, but is not limited to, MME_UE_S1AP_ID data, ENB_UE_S1AP_ID data, MSISDN data, IMEI data, IMSI data, STARTTIME data, ENDTIME data, and ECI data.
[0037] Furthermore, the data classification and normalization module is specifically used for:
[0038] Obtain a preset time interval range, process the key information data within the time interval range for time attributes, and uniformly assign the time attribute values of the key information data.
[0039] Historical data is obtained from the associated information data after processing based on time attributes, and the ECI distribution of S1-MME data and S1UHTTP data is analyzed based on the historical data.
[0040] Based on the ECI distribution, the ECI classification sequence of S1-MME data and S1UHTTP data is calculated using a balanced allocation algorithm to obtain the ECI classification results of S1-MME data and S1UHTTP data.
[0041] Based on the ECI classification results, data imputation and automatic correction algorithms are used to process the S1-MME data and S1UHTTP data to obtain a time series arrangement.
[0042] The time series arrangement is normalized to obtain new data KEY for key information data, which reflects the linear relationship of key information data in the time series.
[0043] Furthermore, the MR data acquisition module is specifically used for:
[0044] Collect raw MR data from the mobile terminal and obtain the ECI association information of the raw MR data;
[0045] Based on the ECI association information, the original MR data is aggregated and classified to obtain MR classification data;
[0046] The MR classification data is flattened using a weighted dynamic algorithm to obtain flattened MR data, which is then stored in a queue to be processed.
[0047] Furthermore, the MR information backfilling module is specifically used for:
[0048] Take the flattened MR data from the queue to be processed, and parse the flattened MR data to obtain the new data KEY of the flattened MR data;
[0049] Real-time acquisition of key information data in the time series arrangement;
[0050] Based on the new data KEY of the flattened MR data, a binary search sequential matching algorithm is used to match the KEY values of the flattened MR data with the acquired key information data, and the matching key information data is backfilled into the flattened MR data according to the matching results.
[0051] The beneficial effects of this invention are as follows: by extracting key information data from S1-MME and S1UHTTP data and performing time series sorting and normalization, the linear relationship of the key information data in the time series is obtained. Then, the differences in the original MR data are flattened. Finally, the flattened original MR data and the linear relationship of the key information data are fused to output the final structured MR backfill data. Using big data real-time processing technology, the problem of long association times across hundreds of millions of data points is solved. Moreover, the application of a linear time queue can significantly improve the backfill rate and accuracy, thereby enabling MR data to carry key information in a timely manner, reducing the association between hundreds of millions of data points, simplifying the backfill process, and meeting the support requirements of business systems.
Attached Figure Description
[0052] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the accompanying drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. In all the drawings, similar elements or parts are generally identified by similar reference numerals. In the drawings, the elements or parts are not necessarily drawn to scale.
[0053] Figure 1 is a flowchart of a big data-based MR information backfilling method provided in Embodiment 1 of the present invention;
[0054] Figure 2 is a block diagram of a big data-based MR information backfilling system provided in Embodiment 2 of the present invention.
Detailed Implementation
[0055] The embodiments of the technical solution of the present invention will now be described in detail with reference to the accompanying drawings. These embodiments are merely illustrative of the technical solution of the present invention and are therefore not intended to limit the scope of protection of the present invention.
[0056] It should be noted that, unless otherwise stated, the technical or scientific terms used in this application should have the ordinary meaning as understood by one of ordinary skill in the art to which this invention pertains.
[0057] Embodiment 1
[0058] As shown in Figure 1, a method for backfilling MR information based on big data includes:
[0059] S1: Obtain the S1-MME data and S1UHTTP data from the mobile terminal, and extract key information data based on the S1-MME data and S1UHTTP data;
[0060] Specifically, XDR data of mobile terminals is collected from communication base stations. The main acquisition interfaces of the XDR data include, but are not limited to, S11, S1-MME, S1-U, SGS, S6A, S5 / S8, GB, IU_PS, GN, A, IU-CS, C / D, etc.
[0061] S1-MME data and S1UHTTP data are obtained from the XDR data, and key information data is extracted from the S1-MME data and S1UHTTP data. Specifically, data such as MME_UE_S1AP_ID, ENB_UE_S1AP_ID, MSISDN, IMEI, IMSI, STARTTIME, ENDTIME, and ECI are extracted from the S1-MME data; and key information data such as MSISDN, IMEI, IMSI, STARTTIME, ENDTIME, and CI are extracted from the S1UHTTP data.
[0062] Preferably, the key information extraction of S1-MME data requires parsing different text files according to different data types to obtain the same information. The different data types include, but are not limited to, context release, management, PDN connection, PDN disconnection, paging and service request, UE-initiated bearer resource request, UE-initiated bearer resource modification, network-initiated EPS bearer context activation, network-initiated EPS bearer context deactivation, network-initiated EPS bearer context modification, handover, attach, detach, tracking area update, etc.
[0063] S2: Classify and normalize the key information data based on time series to obtain the linear relationship of the key information data;
[0064] Specifically, a preset time interval range is obtained. This time interval range can be set according to actual needs. In this embodiment, the interval is divided into minutes that are multiples of 5, with 5+X and 5-X as the time interval range values, where X is a multiple of 5. Based on the STARTTIME and ENDTIME data in the key information data, the key information data within the time interval range is processed for time attributes. The time attribute processing includes uniformly assigning values to the time attributes of key information data belonging to the same time interval range. For example, if X is set to 10s, then the time interval range is 5s to 10s, and the time attribute of key information data belonging to the 5s to 10s time interval range can be uniformly assigned the value 5s.
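The uniform assignment of time attributes can be pictured as mapping every record to the lower bound of the interval it falls in. The following is a minimal sketch only, assuming a fixed interval width of 5 seconds and records held as dictionaries with an integer STARTTIME in seconds; the function name `assign_time_bucket` and the field `TIME_ATTR` are hypothetical and do not come from the source.

```python
from typing import Dict, List


def assign_time_bucket(records: List[Dict], width_s: int = 5) -> List[Dict]:
    """Return copies of the records with one uniform TIME_ATTR per interval.

    Assumption: STARTTIME is an integer number of seconds and each record
    receives the lower bound of the interval that contains it.
    """
    out = []
    for rec in records:
        bucket_start = (rec["STARTTIME"] // width_s) * width_s
        out.append({**rec, "TIME_ATTR": bucket_start})
    return out


records = [
    {"IMSI": "a", "STARTTIME": 5},
    {"IMSI": "b", "STARTTIME": 7},
    {"IMSI": "c", "STARTTIME": 9},
    {"IMSI": "d", "STARTTIME": 12},
]
bucketed = assign_time_bucket(records)
print([r["TIME_ATTR"] for r in bucketed])  # [5, 5, 5, 10]
```

With this rule, the three records between 5 s and 10 s all receive the value 5 s, which mirrors the example given in the paragraph above.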
[0065] Furthermore, historical data is obtained based on the associated information data processed by time attributes. The historical data includes, but is not limited to, the data size, number of entries, and server of S1-MME data and S1UHTTP data. The ECI distribution of S1-MME data and S1UHTTP data is analyzed based on the historical data to generate prerequisites for subsequent ECI classification.
[0066] Based on the ECI distribution, a balanced allocation algorithm is used to calculate the ECI classification sequence for the S1-MME and S1UHTTP data, yielding the ECI classification results for the S1-MME and S1UHTTP data. The formula for the balanced allocation algorithm is as follows:
[0067] ① When M ≤ N:
[0068] $S_i = D_j \quad (i = 1, 2, \ldots, M;\ j = 1, 2, \ldots, M)$
[0069] ② When M > N:
[0070]
[0071]
[0072] In the formula, M represents the number of data items; N represents the number of sets into which the M data items are divided; D represents the set of the M data items, already sorted in descending order of data size; $S_i$ represents the N sets; and I represents the average amount of data that needs to be allocated to each of the N sets.
[0073] The balanced allocation algorithm process includes:
[0074] (a1) Initially, the data set D is sorted in descending order of data size, and $D' = D$, $S_i' = S_i$, where $D'$ represents the data not yet assigned and $S_i'$ represents the sets that are not yet finalized;
[0075] (a2) take Assigned sequentially Where len(S i ) represents sequence S i The number of elements, S i '=S i '-S i (S) i =S i (S i >I, i = 1, 2, ..., N);
[0076] (a3) If D' is empty, the allocation is complete and the algorithm terminates; otherwise, proceed to the next step.
[0077] (a4) Sort $D'$ in descending order of data size, sort $S_i'$ in descending order of $I - S_i$, and then return to step (a2).
[0078] The $S_i$ obtained from the balanced allocation algorithm is the final ECI classification sequence, i.e., the ECI classification results of the S1-MME data and the S1UHTTP data.
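The allocation formula for the case M > N is not reproduced in the text, so the following sketch does not implement it. Instead it swaps in a common greedy largest-first heuristic (sort by size in descending order, then always give the next item to the currently lightest set), which pursues the same goal of keeping each set close to the average I. The function name `balanced_allocate` and the sample ECI names and sizes are hypothetical.

```python
import heapq
from typing import Dict, List


def balanced_allocate(sizes: Dict[str, int], n_sets: int) -> List[List[str]]:
    """Split ECIs into groups with roughly equal total data size.

    Assumption: a greedy largest-first heuristic stands in for the
    source's allocation formula, which is not available in the text.
    """
    # D sorted in descending order of data size, as in step (a1).
    ordered = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)
    if len(ordered) <= n_sets:
        # Case M <= N: every data item forms its own set.
        return [[eci] for eci, _ in ordered]
    # Min-heap of (current total size, set index); lightest set on top.
    heap = [(0, i) for i in range(n_sets)]
    heapq.heapify(heap)
    groups: List[List[str]] = [[] for _ in range(n_sets)]
    for eci, size in ordered:
        total, idx = heapq.heappop(heap)
        groups[idx].append(eci)
        heapq.heappush(heap, (total + size, idx))
    return groups


eci_sizes = {"eci1": 90, "eci2": 60, "eci3": 50, "eci4": 40, "eci5": 10}
groups = balanced_allocate(eci_sizes, 2)
totals = [sum(eci_sizes[e] for e in g) for g in groups]
print(groups, totals)
```

For the sample sizes the average per set is 125, and the two groups end up with totals of 130 and 120. Each resulting group could then serve as one partition category.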
[0079] Preferably, based on the classification sequence results, the ECI sequence of each set $S_i$ is used as one partition category. For example, all ECI data in $S_1$ is used as one partition category, and so on. The ECI classification rules are used as data transmission rules to distribute the data as evenly as possible, thereby improving the query and writing efficiency of key information data.
[0080] Taking the ECI classification result $S_i$ as a data set, data imputation and automatic correction algorithms are used to process the S1-MME data and S1UHTTP data to obtain the time-sorted sequence $Q_t$. The time-sorted sequence $Q_t$ described in this embodiment refers to the rules for arrangement and the method of filling, not to the processing of the data itself.
[0081] The purpose of the data supplementation is to: appropriately select and discard multiple data points from the same time period, ultimately retaining only one data point; and when data is missing at a certain time, retrieve data from the previous time period as the data for that time period. The formula for the data supplementation is:
[0082] $Q_t = Q_{t-1}$
[0083] In the formula, $Q_t$ represents the data at time t.
[0084] The data supplementation workflow includes:
[0085] (b1) When there are multiple data points at a certain time (assume that time is t and the previous time is t-1; there are multiple data points at time t and only one data point at time t-1):
[0086] (b11) Take each data point at time t in turn and calculate its similarity with the data at time t-1 using the automatic correction algorithm. The results are denoted $G_k$ (k = 1, 2, 3, ..., j), where j is the number of data entries at time t;
[0087] (b12) The data corresponding to the maximum value in the sequence $G_k$ is taken as the final data at time t.
[0088] (b2) When data is missing at a certain time (assume that time is t and the previous time is t-1; the data at time t is missing and there is only one data point at time t-1): the data at time t-1 is carried over as the data at time t.
[0089] The automatic correction algorithm formula is as follows:
[0090]
[0091] In the formula, n represents the number of fields in a single data entry; $M_i$ represents the comparison result between the i-th field value of a data point at time t and the corresponding field value of the data at time t-1, where $M_i$ is 1 if they are the same and 0 otherwise; $G_k$ represents the similarity between a data point at time t and the data at time t-1.
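The supplementation rules (b1) and (b2) and the field-by-field similarity can be sketched together. This is an illustration under stated assumptions: the correction formula itself is not available in the text, so the similarity is assumed here to be the share of matching fields (the per-field results averaged over n), and the names `similarity`, `fill_sequence`, and the sample records are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, str]


def similarity(candidate: Record, previous: Record) -> float:
    """Share of fields whose values equal those of the record at t-1.

    Assumption: the per-field results (1 if equal, else 0) are averaged
    over the n fields; the source's formula is not shown in the text.
    """
    fields = list(previous.keys())
    matches = sum(1 for f in fields if candidate.get(f) == previous[f])
    return matches / len(fields)


def fill_sequence(slots: List[List[Record]]) -> List[Optional[Record]]:
    """Build a sequence with exactly one record per time slot."""
    result: List[Optional[Record]] = []
    for candidates in slots:
        prev = result[-1] if result else None
        if not candidates:
            # (b2) missing data: carry over the record from t-1.
            result.append(prev)
        elif len(candidates) == 1 or prev is None:
            result.append(candidates[0])
        else:
            # (b1) several records: keep the one most similar to t-1.
            result.append(max(candidates, key=lambda c: similarity(c, prev)))
    return result


slots = [
    [{"IMSI": "1", "ECI": "100", "IMEI": "x"}],
    [{"IMSI": "2", "ECI": "200", "IMEI": "y"},
     {"IMSI": "1", "ECI": "100", "IMEI": "z"}],
    [],
]
q = fill_sequence(slots)
print(q[1]["IMEI"], q[2] == q[1])  # z True
```

Whether the similarity is averaged or simply summed does not change which candidate has the maximum value, so the selection in (b12) is unaffected by that assumption.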
[0092] Furthermore, after the time series arrangement $Q_t$ is obtained, the sorted data is normalized using key information data such as STARTTIME, ENDTIME, MME_UE_S1AP_ID, ENB_UE_S1AP_ID, and ECI to obtain the new data KEY of the key information data. MSISDN, IMEI, and IMSI are then stored as the values under this KEY. The new data KEY reflects the linear relationship of the key information data over time. The normalization formula is as follows:
[0093]
[0094] The key information data such as STARTTIME, ENDTIME, MME_UE_S1AP_ID, ENB_UE_S1AP_ID, and ECI in the time series are calculated according to the above normalization formula, and the normalized result x' is used as the unique new data KEY.
[0095] S3: Obtain the raw MR data from the mobile terminal, aggregate and classify the raw MR data, and flatten the aggregated and classified raw MR data to obtain flattened MR data;
[0096] Specifically, raw MR data from mobile terminals is collected, preserving its structure and type. Association information is obtained from the ECI (E-UTRAN Cell Identifier) information carried in the filenames. Based on this association information, the raw MR data is packaged into smaller files for initial aggregation. Then, it is classified according to multiple dimensions, including different manufacturers, different data collection server addresses, MRE data, and MRO data, to obtain MR classification data.
[0097] Furthermore, the classified MR data is processed using a dynamic weighting algorithm to flatten the differences in size and type of the MR data, resulting in flattened MR data. This flattened MR data is then stored in a queue to be processed, which reduces the possibility of data skew in subsequent big data processing and speeds up the parsing of MR data.
[0098] Preferably, the dynamic weighting algorithm is a data transmission and delivery algorithm. Its result is idx, the index of the queue to be processed into which the flattened MR data is placed. The formula of the dynamic weighting algorithm is:
[0099] When max(seq) = min(seq):
[0100] idx = Rand(0, len(seq))
[0101] When max(seq) ≠ min(seq):
[0102] min_val = min(seq)
[0103] min_indices = seq.index(min_val)
[0104] rand_idx = Rand(0, len(min_indices))
[0105] idx = min_indices[rand_idx]
[0106] In the formula, seq is a sequence of values composed of several queues, that is, the number of MR classification data to be processed in each program in the existing data storage program; len(seq) is the number of elements contained in seq; Rand(0,len(seq)) is a randomly generated integer in the range [0,len(seq)); idx is the target result, the index of the queue to which the new task should be placed, that is, the newly generated MR classification data; max(seq) is the maximum value in seq; min(seq) is the minimum value in seq; seq.index(min_val) is the index of all elements in seq whose value is equal to min_val; min_indices[rand_idx] is the value at index rand_idx in min_indices.
[0107] The workflow of the dynamic weighting algorithm includes:
[0108] (c1) When a new task arrives, query the number of tasks in each task queue;
[0109] (c2) When the maximum and minimum number of tasks in all task queues are the same, the new task is randomly added to one of the queues.
[0110] (c3) When the maximum and minimum number of tasks in all task queues are different, select the part of the task queues corresponding to the minimum number of tasks, and randomly add the new task to one of the queues in this part of the task queues.
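Steps (c1) to (c3) translate almost directly into code. The sketch below is one possible rendering, assuming the queue lengths are available as a plain list of integers; the function name `pick_queue` and the optional `rng` parameter (added only to make runs reproducible) are hypothetical. Note that Python's built-in `list.index` returns only the first match, so the indices of all minimum elements are collected with a comprehension instead.

```python
import random
from typing import List, Optional


def pick_queue(seq: List[int], rng: Optional[random.Random] = None) -> int:
    """Return the index of the queue that should receive the new task.

    seq[i] is the number of pending tasks in queue i.
    """
    rng = rng or random.Random()
    if max(seq) == min(seq):
        # (c2) all queues are equally loaded: pick any queue at random.
        return rng.randrange(len(seq))
    # (c3) otherwise pick at random among the least-loaded queues.
    min_val = min(seq)
    min_indices = [i for i, v in enumerate(seq) if v == min_val]
    return min_indices[rng.randrange(len(min_indices))]


queue_lengths = [4, 2, 7, 2]
idx = pick_queue(queue_lengths, random.Random(0))
queue_lengths[idx] += 1  # the new task joins the chosen queue
print(idx, queue_lengths)
```

Because new work always goes to one of the shortest queues, queue lengths tend to stay close to one another, which is the flattening effect the text describes.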
[0111] Preferably, after the original MR data is differentially flattened, the flattened MR data is stored using the HDFS component in Hadoop for big data, and the full storage path is output for subsequent use.
[0112] The full storage path output for the flattened MR data (e.g., the data address path stored in the HDFS component of Hadoop) is treated as a new task: the dynamic weighting algorithm is applied to obtain the delivery result value, and the stored full path is then delivered to the queue indicated by this value.
[0113] S4: The linear relationship between the flattened MR data and the key information data is fused to backfill the MR data with information.
[0114] Specifically, the flattened MR data is retrieved from the queue to be processed, and the flattened MR data is parsed. Based on the MRE and MRO in the flattened MR data, the files are decompressed, XML files are read, and data is parsed according to the corresponding specifications to generate a new data KEY for the flattened MR data.
[0115] At the same time, key information data in the time series can be acquired in real time. The acquisition time interval can be set according to the actual operation. For example, it can be set to acquire key information data in the time series every 60 seconds.
[0116] Furthermore, based on the new data KEY of the flattened data, a binary search sequential matching algorithm is used to match the KEY values of the flattened data with the acquired key information data. If a match is successful, the matching key information data is backfilled into the MR data.
[0117] The formula for the binary search sequential matching algorithm is:
[0118]
[0119] When L ≤ R: when seq[mid] > item:
[0120] R = mid - 1
[0121] When seq[mid] < item:
[0122] L = mid + 1
[0123] When seq[mid] = item:
[0124] Y = mid
[0125] In the formula, L is the coordinate of the left endpoint of the search interval (i.e., the leftmost / frontmost data in the new data KEY of the key information data); R is the coordinate of the right endpoint of the search interval (i.e., the rightmost / backmost data in the new data KEY of the key information data); seq is the existing data sequence (i.e., the data length in the new data KEY of the key information data), which has been sorted in ascending order of data size; item is the new data (i.e., the new data KEY of the flattened MR data); Y is the final result (i.e., the result value found using the new data KEY of the flattened MR data). If the new data is found in the existing data sequence, it is the coordinate of the corresponding data in the existing data sequence; otherwise, it is empty.
[0126] The workflow of the binary search sequential matching algorithm includes: taking the new data KEY corresponding to the key information data of S1-MME data and the new data KEY corresponding to the key information data of S1UHTTP data as two sets of data, and performing the following operations on each set of data in sequence (if a search succeeds in one set of data, the process terminates and returns):
[0127] (d1) Data preprocessing: (d11) Existing data preprocessing: Sort the existing data in ascending order of time; (d12) New data preprocessing: Obtain the time of the new data.
[0128] (d2) Search for new data in existing data by time: Each time, take the middle data of the interval to be searched and compare it with the new data by time.
[0129] (d3) If a match is found, return the current index of the existing data.
[0130] (d4) If no match is found, if the time of the current intermediate data is greater than that of the new data, the right endpoint of the search interval is changed to the index of the current intermediate data - 1; if the time of the current intermediate data is less than that of the new data, the left endpoint of the search interval is changed to the index of the current intermediate data + 1.
[0131] (d5) If the left endpoint of the search interval is to the right of the right endpoint of the search interval, the algorithm terminates. Otherwise, go to step (d2).
[0132] This embodiment uses a big data real-time processing program for the flattened MR data. First, it parses the MRE and MRO files of the flattened MR data from different manufacturers and of different types to generate the new data KEY of the flattened MR data. Then, based on that KEY, it uses the aforementioned binary search sequential matching algorithm to match against the new data KEYs of the key information data. The first sequential matching is performed against the S1-MME data. If the first sequential matching fails, the stored S1UHTTP data is used for the second sequential matching. The matching key information data is then backfilled into the MR data, and the result value is output as new MR data. The new MR data thus carries the backfilled key information without requiring association across hundreds of millions of data points.
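The two-stage matching can be sketched as a standard binary search applied first to the S1-MME KEYs and then, only on failure, to the S1UHTTP KEYs. This is a minimal illustration rather than the embodiment's actual program: the midpoint rule (integer division of L + R by 2) is an assumption, since the corresponding formula is not shown in the text, and the names `binary_search`, `backfill`, and the sample KEY values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def binary_search(seq: List[float], item: float) -> Optional[int]:
    """Return the index Y of item in the ascending list seq, or None."""
    left, right = 0, len(seq) - 1
    while left <= right:
        mid = (left + right) // 2  # assumed midpoint rule
        if seq[mid] > item:
            right = mid - 1
        elif seq[mid] < item:
            left = mid + 1
        else:
            return mid
    return None


# Each table holds the sorted KEYs and the values stored under them.
KeyTable = Tuple[List[float], List[Dict[str, str]]]


def backfill(mr: Dict, tables: List[KeyTable]) -> Dict:
    """Try each KEY table in order and backfill from the first match."""
    for keys, values in tables:
        y = binary_search(keys, mr["KEY"])
        if y is not None:
            return {**mr, **values[y]}
    return mr  # no match: the MR record is returned unchanged


s1_mme = ([0.12, 0.35, 0.80], [{"IMSI": "A"}, {"IMSI": "B"}, {"IMSI": "C"}])
s1u_http = ([0.20, 0.55], [{"IMSI": "D"}, {"IMSI": "E"}])
first = backfill({"KEY": 0.35, "RSRP": -95}, [s1_mme, s1u_http])
second = backfill({"KEY": 0.55, "RSRP": -101}, [s1_mme, s1u_http])
print(first["IMSI"], second["IMSI"])  # B E
```

Each lookup costs on the order of log2 of the table size, so a single MR record is resolved in a few dozen comparisons even against very large KEY tables, instead of being joined against the full data set.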
[0133] Embodiment 2
[0134] As shown in Figure 2, a big data-based MR information backfilling system includes:
[0135] Key information extraction module: used to acquire S1-MME data and S1UHTTP data from the mobile terminal, and extract key information data based on the S1-MME data and S1UHTTP data;
[0136] Data classification and normalization module: used to classify and normalize the key information data based on time series to obtain the linear relationship of the key information data;
[0137] MR data acquisition module: used to acquire raw MR data from mobile terminals, aggregate and classify the raw MR data, and flatten the aggregated and classified raw MR data to obtain flattened MR data;
[0138] MR Information Backfilling Module: This module is used to fuse the linear relationship between the flattened MR data and the key information data to backfill the MR data with information.
[0139] Furthermore, the key information extraction module is specifically used for:
[0140] Collect XDR data from the mobile terminal, and obtain S1-MME data and S1UHTTP data from the XDR data;
[0141] Extract key information data based on the S1-MME data and S1UHTTP data;
[0142] The key information data includes, but is not limited to, MME_UE_S1AP_ID data, ENB_UE_S1AP_ID data, MSISDN data, IMEI data, IMSI data, STARTTIME data, ENDTIME data, and ECI data.
[0143] Furthermore, the data classification and normalization module is specifically used for:
[0144] Obtain a preset time interval range, process the key information data within the time interval range for time attributes, and uniformly assign the time attribute values of the key information data.
[0145] Historical data is obtained from the associated information data after processing based on time attributes, and the ECI distribution of S1-MME data and S1UHTTP data is analyzed based on the historical data.
[0146] Based on the ECI distribution, the ECI classification sequence of S1-MME data and S1UHTTP data is calculated using a balanced allocation algorithm to obtain the ECI classification results of S1-MME data and S1UHTTP data.
[0147] Based on the ECI classification results, data imputation and automatic correction algorithms are used to process the S1-MME data and S1UHTTP data to obtain a time series arrangement.
[0148] The time series arrangement is normalized to obtain new data KEY for key information data, which reflects the linear relationship of key information data in the time series.
[0149] Furthermore, the MR data acquisition module is specifically used for:
[0150] Collect raw MR data from the mobile terminal and obtain the ECI association information of the raw MR data;
[0151] Based on the ECI association information, the original MR data is aggregated and classified to obtain MR classification data;
[0152] The MR classification data is flattened using a weighted dynamic algorithm to obtain flattened MR data, which is then stored in a queue to be processed.
[0153] Furthermore, the MR information backfilling module is specifically used for:
[0154] Take the flattened MR data from the queue to be processed, and parse the flattened MR data to obtain the new data KEY of the flattened MR data;
[0155] Real-time acquisition of key information data in the time series arrangement;
[0156] Based on the new data KEY of the flattened MR data, a binary search sequential matching algorithm is used to match the KEY values of the flattened MR data with the acquired key information data, and the matching key information data is backfilled into the flattened MR data according to the matching results.
[0157] It should be noted that for a more detailed workflow of the big data-based MR information backfilling system, please refer to the aforementioned method implementation section, which will not be repeated here.
[0158] This invention extracts key information data from S1-MME and S1UHTTP data and performs time-series sorting and normalization to obtain the linear relationship of the key information data over time. It then flattens the differences in the raw MR data, and finally fuses the flattened MR data with the linear relationship of the key information data to output the final structured MR backfill data. By using big data real-time processing technology, it solves the problem of long association times when correlating hundreds of millions of records, and the use of linear time queues can significantly improve the backfill rate and accuracy. MR data can therefore carry key information in a timely manner, the number of associations among hundreds of millions of records is reduced, the backfill process is simplified, and the support requirements of business systems are met.
[0159] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features therein. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present invention, and they should all be covered within the scope of the claims and specification of the present invention.
Claims
1. A method for backfilling MR information based on big data, characterized in that it comprises: acquiring S1-MME data and S1UHTTP data from a mobile terminal, and extracting key information data based on the S1-MME data and S1UHTTP data; classifying and normalizing the key information data based on time series to obtain a linear relationship of the key information data; acquiring raw MR data from the mobile terminal, aggregating and classifying the raw MR data, and flattening the aggregated and classified raw MR data to obtain flattened MR data; and fusing the flattened MR data with the linear relationship of the key information data to backfill information into the MR data; wherein acquiring the S1-MME data and S1UHTTP data from the mobile terminal and extracting key information data based on the S1-MME data and S1UHTTP data specifically comprises: collecting XDR data from the mobile terminal, and obtaining the S1-MME data and S1UHTTP data from the XDR data; and extracting the key information data based on the S1-MME data and S1UHTTP data; the key information data includes, but is not limited to, MME_UE_S1AP_ID data, ENB_UE_S1AP_ID data, MSISDN data, IMEI data, IMSI data, STARTTIME data, ENDTIME data, and ECI data; and wherein classifying and normalizing the key information data based on time series to obtain the linear relationship of the key information data specifically comprises: obtaining a preset time interval range, processing the time attributes of the key information data within the time interval range, and uniformly assigning the time attribute values of the key information data; obtaining historical data from the key information data after time-attribute processing, and analyzing the ECI distribution of the S1-MME data and S1UHTTP data based on the historical data; based on the ECI distribution, calculating the ECI classification sequence of the S1-MME data and S1UHTTP data using a balanced allocation algorithm to obtain the ECI classification results of the S1-MME data and S1UHTTP data; based on the ECI classification results, processing the S1-MME data and S1UHTTP data using data imputation and automatic correction algorithms to obtain a time series arrangement; and normalizing the time series arrangement to obtain a new data KEY for the key information data, the new data KEY reflecting the linear relationship of the key information data in the time series.
2. The method for backfilling MR information based on big data according to claim 1, characterized in that acquiring raw MR data from the mobile terminal, aggregating and classifying the raw MR data, and flattening the aggregated and classified raw MR data to obtain flattened MR data specifically comprises: collecting raw MR data from the mobile terminal and obtaining the ECI association information of the raw MR data; based on the ECI association information, aggregating and classifying the raw MR data to obtain MR classification data; and flattening the MR classification data using a weighted dynamic algorithm to obtain flattened MR data, which is then stored in a queue to be processed.
3. The method for backfilling MR information based on big data according to claim 2, characterized in that fusing the flattened MR data with the linear relationship of the key information data to backfill information into the MR data specifically comprises: taking the flattened MR data from the queue to be processed, and parsing the flattened MR data to obtain the new data KEY of the flattened MR data; acquiring, in real time, the key information data in the time series arrangement; and, based on the new data KEY of the flattened MR data, matching the KEY values of the flattened MR data against the acquired key information data using a binary search sequential matching algorithm, and backfilling the matching key information data into the flattened MR data according to the matching results.
4. A big data-based MR information backfilling system, characterized in that it comprises: a key information extraction module, used to acquire S1-MME data and S1UHTTP data from a mobile terminal, and extract key information data based on the S1-MME data and S1UHTTP data; a data classification and normalization module, used to classify and normalize the key information data based on time series to obtain a linear relationship of the key information data; an MR data acquisition module, used to acquire raw MR data from the mobile terminal, aggregate and classify the raw MR data, and flatten the aggregated and classified raw MR data to obtain flattened MR data; and an MR information backfilling module, used to fuse the flattened MR data with the linear relationship of the key information data to backfill information into the MR data; wherein the key information extraction module is specifically used to: collect XDR data from the mobile terminal, and obtain the S1-MME data and S1UHTTP data from the XDR data; and extract the key information data based on the S1-MME data and S1UHTTP data; the key information data includes, but is not limited to, MME_UE_S1AP_ID data, ENB_UE_S1AP_ID data, MSISDN data, IMEI data, IMSI data, STARTTIME data, ENDTIME data, and ECI data; and wherein the data classification and normalization module is specifically used to: obtain a preset time interval range, process the time attributes of the key information data within the time interval range, and uniformly assign the time attribute values of the key information data; obtain historical data from the key information data after time-attribute processing, and analyze the ECI distribution of the S1-MME data and S1UHTTP data based on the historical data; based on the ECI distribution, calculate the ECI classification sequence of the S1-MME data and S1UHTTP data using a balanced allocation algorithm to obtain the ECI classification results of the S1-MME data and S1UHTTP data; based on the ECI classification results, process the S1-MME data and S1UHTTP data using data imputation and automatic correction algorithms to obtain a time series arrangement; and normalize the time series arrangement to obtain a new data KEY for the key information data, the new data KEY reflecting the linear relationship of the key information data in the time series.
5. The MR information backfilling system based on big data according to claim 4, characterized in that the MR data acquisition module is specifically used to: collect raw MR data from the mobile terminal and obtain the ECI association information of the raw MR data; based on the ECI association information, aggregate and classify the raw MR data to obtain MR classification data; and flatten the MR classification data using a weighted dynamic algorithm to obtain flattened MR data, which is then stored in a queue to be processed.
6. The MR information backfilling system based on big data according to claim 5, characterized in that the MR information backfilling module is specifically used to: take the flattened MR data from the queue to be processed, and parse the flattened MR data to obtain the new data KEY of the flattened MR data; acquire, in real time, the key information data in the time series arrangement; and, based on the new data KEY of the flattened MR data, match the KEY values of the flattened MR data against the acquired key information data using a binary search sequential matching algorithm, and backfill the matching key information data into the flattened MR data according to the matching results.