Multi-dimensional anomaly detection method and system for carrier-grade SMS billing system
By constructing a six-dimensional tensor structure and using dynamic feature scoring, the problem of low anomaly detection accuracy in carrier-grade SMS billing systems is solved, enabling timely identification and processing of complex billing anomalies and improving the accuracy and efficiency of the billing system.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHENZHEN YINGJIETONG INFORMATION TECHNOLOGY CO LTD
- Filing Date
- 2025-04-28
- Publication Date
- 2026-06-16
AI Technical Summary
Existing carrier-grade SMS billing systems suffer from low data processing efficiency and insufficient anomaly detection capabilities when faced with ever-increasing SMS traffic, leading to decreased billing accuracy and an inability to promptly identify complex billing anomalies.
A multi-dimensional anomaly detection method is adopted. By extracting a six-dimensional tensor structure, the actual billing result of the SMS event is matched with the tensor return result. Combined with feature extraction and feature deviation, a dynamic feature score is constructed to generate a total anomaly score, thereby improving the accuracy of anomaly detection.
It improves the accuracy of anomaly detection in the SMS billing system, enabling timely identification and handling of billing anomalies, thereby enhancing user experience and improving the operator's economic benefits.
[Figure CN120238835B_ABST: abstract drawing]
Description
Technical Field
[0001] This invention relates to the field of billing system detection technology, specifically to a multi-dimensional anomaly detection method and system for carrier-grade SMS billing systems.
Background Technology
[0002] Carrier-grade SMS billing systems are primarily responsible for accurately billing SMS services to ensure the accuracy of user consumption records. However, existing SMS billing systems often suffer from low data processing efficiency and insufficient anomaly detection capabilities when faced with the ever-increasing volume of SMS traffic. This leads to decreased billing accuracy and the potential for undetected anomalies, impacting user experience and the operator's economic benefits. Most existing anomaly detection methods focus on single-dimensional analysis, such as basic information about the sender and receiver, and lack comprehensive analysis and processing of multi-dimensional data, making it difficult to identify complex billing anomalies in a timely manner. Furthermore, traditional billing anomaly detection methods generally rely on static rules or simple statistical analysis, which are ill-suited to handle the ever-changing SMS service scenarios and network environments.
Summary of the Invention
[0003] This application provides a multi-dimensional anomaly detection method and system for carrier-grade SMS billing systems, which solves the technical problem of low accuracy in SMS billing anomaly detection in the prior art.
[0004] The first aspect of this application provides a multi-dimensional anomaly detection method for a carrier-grade SMS billing system, the method comprising:
[0005] After parsing the operator's billing file, a six-dimensional tensor structure is extracted. This six-dimensional tensor structure includes the sending operator, receiving operator, sending location, receiving location, SMS type, and user package type. When an SMS event arrives, the attribute six-tuple of the SMS event is extracted as a tensor index. The tensor unit corresponding to the six-dimensional tensor structure is accessed, and a matching calculation is performed between the actual billing result of the SMS event and the tensor return result to generate a matching calculation result. Using the SMS event as the extraction center, a time window is created, and SMS samples within the time window are called. The SMS samples are used to establish the feature time window weights of the SMS event. Feature extraction of the SMS event is performed, and the feature deviation of each feature is calculated using the feature extraction results. A dynamic feature score is constructed using the feature deviation and the feature time window weights. A total anomaly score is generated using the matching calculation results and the dynamic feature score.
[0006] A second aspect of this application provides a multi-dimensional anomaly detection system for a carrier-grade SMS billing system, the system comprising:
[0007] The data extraction module extracts a six-dimensional tensor structure after parsing the operator's billing file. This six-dimensional tensor structure includes the sending operator, receiving operator, sending location, receiving location, SMS type, and user package type. The matching module extracts the attribute six-tuple of the SMS event as a tensor index when the SMS event arrives, accesses the corresponding tensor unit of the six-dimensional tensor structure, performs a matching calculation between the actual billing result of the SMS event and the tensor return result, and generates a matching calculation result. The weight establishment module creates a time window using the SMS event as the extraction center, calls SMS samples within the time window, and uses the SMS samples to establish the feature time window weights of the SMS event. The calculation module performs feature extraction of the SMS event and calculates the feature deviation of each feature using the feature extraction results. The feature score construction module constructs a dynamic feature score using the feature deviation and the feature time window weights. The score generation module generates a total anomaly score using the matching calculation results and the dynamic feature score.
[0008] One or more technical solutions provided in this application have at least the following technical effects or advantages:
[0009] After parsing the operator's billing file, a six-dimensional tensor structure is extracted. This structure includes the sending operator, receiving operator, sending location, receiving location, SMS type, and user package type. When an SMS event arrives, the six-tuple of attributes of the SMS event is extracted as a tensor index. The corresponding tensor unit of the six-dimensional tensor structure is accessed, and a matching calculation is performed between the actual billing result of the SMS event and the tensor return result to generate a matching calculation result. Next, using the SMS event as the extraction center, a time window is created, and SMS samples within the time window are called. The SMS samples are used to establish the feature time window weights for the SMS event. Then, feature extraction of the SMS event is performed, and the feature deviation of each feature is calculated using the feature extraction results. Furthermore, a dynamic feature score is constructed using the feature deviation and feature time window weights. Finally, a total anomaly score is generated using the matching calculation results and the dynamic feature score. This solves the technical problem of low accuracy in SMS billing anomaly detection in existing technologies and achieves the technical effect of improving the accuracy of SMS billing anomaly detection.
Attached Figure Description
[0010] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0011] Figure 1 is a schematic diagram of the multi-dimensional anomaly detection method for a carrier-grade SMS billing system provided in an embodiment of this application;
[0012] Figure 2 is a schematic diagram of the structure of the multi-dimensional anomaly detection system for a carrier-grade SMS billing system provided in an embodiment of this application.
[0013] Reference numerals: data extraction module 11, matching module 12, weight establishment module 13, calculation module 14, feature score construction module 15, score generation module 16.
Detailed Implementation
[0014] This application solves the technical problem of low accuracy in SMS billing anomaly detection in the prior art by providing a multi-dimensional anomaly detection method and system for carrier-grade SMS billing systems.
[0015] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of this application without creative effort are within the scope of protection of this application.
[0016] It should be noted that the terms "comprising" and "having" are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to those steps or units that are explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to such processes, methods, systems, products, or devices.
[0017] Embodiment 1: As shown in Figure 1, this application provides a multi-dimensional anomaly detection method for a carrier-grade SMS billing system, wherein the method includes:
[0018] After parsing the operator's billing file, a six-dimensional tensor structure is extracted. The six-dimensional tensor structure includes the sending operator, the receiving operator, the sending location, the receiving location, the SMS type, and the user's package type.
[0019] Operators' billing files are usually stored in specific formats (such as CSV, XML, etc.) and contain a large amount of data related to SMS billing.
[0020] An appropriate parsing method is selected for each billing file format. For example, for CSV files, a delimiter-based parsing method (using a delimiter such as a comma) can split each line of the file into multiple fields; for XML files, an XML parsing library (such as Python's xml.etree.ElementTree module) can parse the file according to its hierarchical structure and extract the values of each node and attribute.
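The two parsing paths described above can be sketched as follows. This is a minimal illustration using Python's standard `csv` and `xml.etree.ElementTree` modules; the field names (`send_operator`, `fee`, and so on), the `<record>` element, and the sample data are assumptions made for the example and are not taken from any real billing file format.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical field names and sample data; a real billing file defines its own schema.
CSV_TEXT = (
    "send_operator,recv_operator,send_location,recv_location,sms_type,plan_type,fee\n"
    "A,B,Beijing,Shanghai,regular,PlanX,0.10\n"
)
XML_TEXT = (
    "<records><record>"
    "<send_operator>A</send_operator><recv_operator>B</recv_operator>"
    "<send_location>Beijing</send_location><recv_location>Shanghai</recv_location>"
    "<sms_type>regular</sms_type><plan_type>PlanX</plan_type><fee>0.10</fee>"
    "</record></records>"
)


def parse_csv(text):
    """Split each comma-delimited line into named fields."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def parse_xml(text):
    """Walk the XML hierarchy and collect the child nodes of each <record>."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in record}
            for record in root.iter("record")]


csv_records = parse_csv(CSV_TEXT)
xml_records = parse_xml(XML_TEXT)
```

Both parsers return the same list of field dictionaries, so later steps do not need to know which file format a record came from.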
[0021] After parsing the operator's billing file, the data fields in the billing file are extracted and preprocessed to obtain basic information related to SMS events. Through structured analysis of the data, a six-dimensional tensor structure is constructed, which includes the sending operator, receiving operator, sending location, receiving location, SMS type, and user plan type. Specifically, the sending operator and receiving operator are the operator identifiers of the SMS sender and recipient, reflecting the operator type of each party; the sending location and receiving location record where the SMS is sent from and where it is received, usually at province or city level or at finer geographic granularity, to identify the geographical distribution of SMS messages; the SMS type specifies the category of the SMS, such as regular SMS, MMS, or verification code SMS; and the user plan type refers to the billing plan selected by the user, which determines the SMS billing rules, fee standards, and any applicable preferential policies.
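One possible in-memory form of such a structure is a sparse mapping keyed by the six attribute values. The sketch below assumes that each cell stores the mean historical fee for its attribute combination; the field names, the sample records, and the choice of the mean are illustrative assumptions, not details given in this application.

```python
from collections import defaultdict

# The six dimensions named in the description; the key names are hypothetical.
DIMS = ("send_operator", "recv_operator", "send_location",
        "recv_location", "sms_type", "plan_type")


def build_tensor(records):
    """Map each six-tuple to the mean fee observed for that combination."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [fee total, record count]
    for rec in records:
        key = tuple(rec[d] for d in DIMS)
        sums[key][0] += float(rec["fee"])
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


base = dict(zip(DIMS, ("A", "B", "Beijing", "Shanghai", "regular", "PlanX")))
records = [
    {**base, "fee": "0.10"},
    {**base, "fee": "0.14"},
    {**base, "sms_type": "verification", "fee": "0.05"},
]
tensor = build_tensor(records)
```

A sparse mapping only stores attribute combinations that actually occur, which matters because most of the cells in a dense six-dimensional array would be empty.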
[0022] Furthermore, after parsing the operator's billing files, a six-dimensional tensor structure is extracted, including:
[0023] Based on the analysis results, the duration of the billing rules is evaluated and a general comparison is performed. The baseline tensor structure is determined using the results of the duration evaluation and the general comparison. After determining the baseline tensor structure, incremental learning of all analysis results relative to the baseline tensor structure is performed. A six-dimensional tensor structure is extracted based on the incremental learning results and the baseline tensor structure.
[0024] In this embodiment, the billing rule-related fields in the parsed results are evaluated for rule duration to analyze the stability and duration of each billing rule in historical data. Specifically, the system calculates the duration of each rule by statistically analyzing the start and end times of its appearance in the billing file (or the current time if the rule is still in use). Simultaneously, it analyzes the usage frequency and changes of the rule over different time periods by combining historical billing file data. For example, if a billing rule has been continuously used and its usage frequency is stable over the past year, it is considered to have a high duration evaluation score; conversely, if the rule changes frequently or has a short usage period, the score is lower.
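The application does not give a formula for the duration evaluation score. The sketch below is one hypothetical way to combine the two ingredients it names: how long a rule has been in use and how stable its usage frequency is. The function name, the one-year horizon, and the use of the coefficient of variation as a stability measure are all assumptions.

```python
from datetime import date
from statistics import mean, pstdev


def duration_score(first_seen, last_seen, monthly_usage, horizon_days=365):
    """Hypothetical score in [0, 1]: time coverage multiplied by usage stability."""
    span_days = max((last_seen - first_seen).days, 0)
    coverage = min(span_days / horizon_days, 1.0)
    avg = mean(monthly_usage)
    # Coefficient of variation: 0 when usage is perfectly steady.
    variation = pstdev(monthly_usage) / avg if avg > 0 else 1.0
    stability = max(0.0, 1.0 - variation)
    return coverage * stability


# A rule used steadily for a full year versus a short-lived, erratic rule.
stable = duration_score(date(2024, 1, 1), date(2025, 1, 1), [100] * 12)
short_lived = duration_score(date(2025, 3, 1), date(2025, 3, 31), [10, 200])
```

Under these assumptions the steadily used rule receives the maximum score, while the short-lived rule with fluctuating usage scores close to zero, which matches the qualitative behavior described above.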
[0025] The general comparison evaluation compares the current billing rules with industry-wide accepted standard billing rules to assess their compliance and universality. The system compares each parsed billing rule with the corresponding rule in a pre-set standard billing rule library, which contains standard billing rules for various SMS services in different scenarios. Based on the comparison results, a general comparison evaluation score is calculated. A higher score is awarded if the current rule is highly consistent with the standard rule; a lower score is awarded if there are significant differences.
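As an illustration of the comparison against a standard rule library, the sketch below scores a rule by the relative difference between its fee and the standard fee. The library contents, the lookup key, and the scoring formula are hypothetical; the application only states that closer agreement yields a higher score.

```python
# Hypothetical standard rule library: (SMS type, scenario) -> standard fee in yuan.
STANDARD_RULES = {
    ("regular", "domestic"): 0.10,
    ("verification", "domestic"): 0.05,
}


def general_score(sms_type, scenario, rule_fee, library=STANDARD_RULES):
    """Return 1.0 for an exact match, falling linearly to 0.0 as the fee diverges."""
    standard_fee = library.get((sms_type, scenario))
    if standard_fee is None or standard_fee <= 0:
        return 0.0  # no comparable standard rule
    relative_gap = abs(rule_fee - standard_fee) / standard_fee
    return max(0.0, 1.0 - relative_gap)


exact = general_score("regular", "domestic", 0.10)
close = general_score("regular", "domestic", 0.12)
far = general_score("regular", "domestic", 0.30)
missing = general_score("mms", "domestic", 0.30)
```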
[0026] After completing the above evaluations, a baseline tensor structure is determined based on the results of the rule duration evaluation and the general comparison evaluation. As a standard template, the baseline tensor structure represents the typical pattern of billing rules across various dimensions, providing basic reference values and expected behavior to ensure that subsequent billing calculations can be performed according to a unified standard.
[0027] Specifically, the system comprehensively ranks each billing rule based on its rule duration evaluation score and general comparison evaluation score. It selects combinations of billing rules with higher scores, better stability, and stronger compliance as the initial foundation for the baseline tensor structure. These billing rule combinations are then mapped into a six-dimensional tensor structure, determining the baseline value range for each dimension based on six dimensions: sender operator, receiver operator, sending location, receiving location, SMS type, and user package type. For example, under the sender operator dimension, operators with high usage frequency and stable rules are selected as the baseline value; under the SMS type dimension, common SMS service types are selected as the baseline value, and so on.
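The ranking and selection step can be sketched as a weighted sum of the two evaluation scores followed by a top-k cut. The equal weights, the value of `top_k`, and the rule records are assumptions chosen for the example.

```python
def select_baseline(rules, top_k=2, w_duration=0.5, w_general=0.5):
    """Rank rules by a weighted sum of both scores and keep the best top_k."""
    def combined(rule):
        return w_duration * rule["duration"] + w_general * rule["general"]

    ranked = sorted(rules, key=combined, reverse=True)
    # The baseline maps each selected six-tuple to its reference fee.
    return {rule["key"]: rule["fee"] for rule in ranked[:top_k]}


# Hypothetical scored rules; each key is a six-tuple of attribute values.
rules = [
    {"key": ("A", "B", "Beijing", "Shanghai", "regular", "PlanX"),
     "fee": 0.10, "duration": 0.9, "general": 0.8},
    {"key": ("A", "A", "Beijing", "Beijing", "regular", "PlanX"),
     "fee": 0.05, "duration": 0.7, "general": 1.0},
    {"key": ("A", "B", "Beijing", "Shanghai", "promo", "PlanY"),
     "fee": 0.30, "duration": 0.1, "general": 0.2},
]
baseline = select_baseline(rules)
```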
[0028] After determining the baseline tensor structure, the system performs incremental learning on all parsed results relative to the baseline tensor structure to improve and update the six-dimensional tensor structure. Specifically, the system compares the billing rule information of each parsed SMS record with the baseline tensor structure; records that are consistent with the baseline tensor structure are considered to conform to normal billing rules and are included in the normal dataset; records that differ from the baseline tensor structure are treated as anomalous data or potential new rule data for further analysis.
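A minimal sketch of this comparison step is shown below. It assumes the baseline is a mapping from six-tuples to reference fees and that a record is "consistent" when its fee matches the reference within a small tolerance; both the representation and the tolerance are assumptions.

```python
def partition(records, baseline, tolerance=1e-9):
    """Split records into those matching the baseline and those needing analysis."""
    normal, candidates = [], []
    for key, fee in records:
        expected = baseline.get(key)
        if expected is not None and abs(fee - expected) <= tolerance:
            normal.append((key, fee))
        else:
            # Either an unknown attribute combination or a fee mismatch.
            candidates.append((key, fee))
    return normal, candidates


key_known = ("A", "B", "Beijing", "Shanghai", "regular", "PlanX")
key_new = ("A", "B", "Beijing", "Shanghai", "promo", "PlanY")
baseline = {key_known: 0.10}
records = [(key_known, 0.10), (key_known, 0.50), (key_new, 0.08)]
normal, candidates = partition(records, baseline)
```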
[0029] In the incremental learning process, machine learning algorithms (such as clustering and classification algorithms) are used to process anomalous data and potential new rule data. Clustering algorithms group similar anomalous or new rule data together to form different data clusters. Then, classification algorithms are used to classify each data cluster, determining whether it belongs to a new billing rule or a genuine anomaly. Based on the results of incremental learning, the baseline tensor structure is updated and expanded to extract the final six-dimensional tensor structure.
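The application leaves the choice of clustering and classification algorithms open. The sketch below substitutes a deliberately simple stand-in: records are grouped by six-tuple and rounded fee, and a group is labeled a new rule when it has enough members, otherwise an anomaly. The grouping key and the `min_support` threshold are assumptions, and a production system would likely use a real clustering algorithm and a trained classifier instead.

```python
from collections import defaultdict


def group_candidates(candidates, precision=2):
    """Stand-in for clustering: group by six-tuple and fee rounded to `precision`."""
    clusters = defaultdict(list)
    for key, fee in candidates:
        clusters[(key, round(fee, precision))].append(fee)
    return dict(clusters)


def classify(clusters, min_support=3):
    """Stand-in for classification: frequent groups are treated as new rules."""
    return {cluster_id: ("new_rule" if len(fees) >= min_support else "anomaly")
            for cluster_id, fees in clusters.items()}


key = ("A", "B", "Beijing", "Shanghai", "promo", "PlanY")
candidates = [(key, 0.08), (key, 0.08), (key, 0.08), (key, 0.50)]
labels = classify(group_candidates(candidates))
```

Groups labeled as new rules would then be merged into the tensor structure, while groups labeled as anomalies would be marked for monitoring.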
[0030] For data clusters identified as having new billing rules, these new rules are combined and incorporated into a six-dimensional tensor structure by analyzing the corresponding dimensions such as the sending operator, receiving operator, sending location, receiving location, SMS type, and user package type, thus expanding the range of values for the tensor structure. For example, if a new SMS service type emerges in a certain region and its billing rules differ from the existing rules in the baseline tensor structure, then values for this new service type will be added to the SMS type dimension and the corresponding sending and receiving locations dimensions.
[0031] For data clusters identified as anomalous, their characteristics are further analyzed to determine the type and severity of the anomaly. These anomalies are marked in a six-dimensional tensor structure so that subsequent anomaly detection algorithms can focus on monitoring and analyzing them. For example, anomaly flags can be set in the tensor cells, or information such as the frequency and impact range of anomalies can be recorded.
[0032] When an SMS event arrives, the six-tuple of the SMS event's attributes is extracted as a tensor index. The tensor unit corresponding to the six-dimensional tensor structure is accessed, and the matching calculation between the actual billing result of the SMS event and the tensor return result is performed to generate the matching calculation result.
[0033] When an SMS event arrives at the operator's billing system, the event is parsed, its various attributes are extracted, and these attributes are organized into a six-tuple, including the sender's operator, the recipient's operator, the sending location, the receiving location, the SMS type, and the user's package type. These attributes represent the basic information of the SMS event and constitute its unique identifier.
[0034] After extracting the attribute six-tuple of the SMS event, the system uses it as an index to access the pre-constructed six-dimensional tensor structure. This six-dimensional tensor structure is a multi-dimensional data storage structure in which each dimension corresponds to one attribute of the SMS event, and each tensor cell stores the billing-related information for one attribute combination. The system locates the tensor cell for the SMS event level by level, using each attribute value in the six-tuple in turn. For example, it determines the index positions of the first two dimensions from the sending and receiving operators, and then determines the index positions of the remaining dimensions from the sending location, receiving location, SMS type, and user plan type, thus locating the target tensor cell. Once the corresponding tensor cell is accessed, the system extracts from it the billing return result for the SMS event, i.e., the corresponding expected billing data.
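The level-by-level lookup can be illustrated with nested dictionaries, one nesting level per dimension. This is a sketch under the assumption that cells hold a small record of expected billing values; the helper names and the cell layout are hypothetical.

```python
def insert_cell(tensor, six_tuple, cell):
    """Create the nested levels for a six-tuple and store the cell at the leaf."""
    node = tensor
    for value in six_tuple[:-1]:
        node = node.setdefault(value, {})
    node[six_tuple[-1]] = cell


def locate_cell(tensor, six_tuple):
    """Descend one dimension per attribute value; return None if any level is missing."""
    node = tensor
    for value in six_tuple:
        if not isinstance(node, dict) or value not in node:
            return None
        node = node[value]
    return node


tensor = {}
event_key = ("A", "B", "Beijing", "Shanghai", "regular", "PlanX")
insert_cell(tensor, event_key, {"fee": 0.12, "volume": 95})

expected = locate_cell(tensor, event_key)
unknown = locate_cell(tensor, ("A", "C", "Beijing", "Shanghai", "regular", "PlanX"))
```

Returning `None` for an unknown combination lets the caller decide whether a missing cell is itself a signal worth scoring.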
[0035] The system compares the actual billing information of the SMS event (such as SMS cost, number of messages sent, etc.) with the predicted billing value returned by the tensor unit, and generates a matching calculation result by calculating the difference between the two.
[0036] For example, the attributes of an SMS event include: sender operator A, receiver operator B, sending location Beijing, receiving location Shanghai, SMS type regular SMS, and user plan type Plan X. For this SMS event, the actual billing information shows an actual cost of 0.10 yuan and a sending volume of 100 messages. Based on historical data stored in the six-dimensional tensor structure, the predicted billing information returns a predicted cost of 0.12 yuan and a predicted sending volume of 95 messages. During the matching calculation, the system first calculates the cost difference: the actual cost minus the predicted cost is -0.02 yuan (0.10 yuan - 0.12 yuan), that is, the actual cost is 0.02 yuan below the prediction. Second, it calculates the sending volume difference: the actual sending volume minus the predicted sending volume is 5 messages (100 messages - 95 messages). Finally, the system generates the matching calculation result, showing a cost difference of -0.02 yuan and a sending volume difference of 5 messages.
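The arithmetic of this example can be reproduced with a short sketch. Differences are taken as actual minus predicted, so the cost difference comes out negative (the actual cost is below the prediction) with a magnitude of 0.02 yuan. The dictionary keys and the rounding to six decimal places are assumptions made for the illustration.

```python
def match(actual, predicted):
    """Return actual minus predicted for every billing field in the prediction."""
    return {name: round(actual[name] - predicted[name], 6) for name in predicted}


actual = {"fee": 0.10, "volume": 100}     # actual billing result of the SMS event
predicted = {"fee": 0.12, "volume": 95}   # values returned by the tensor cell
result = match(actual, predicted)
```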
[0037] Furthermore, the matching calculation of the actual billing result and the tensor return result of the SMS event is performed to generate the matching calculation result, including:
[0038] The historical deviation between the actual billing value of the SMS event and the tensor mapping value is retrieved; a lightweight regression model is used to learn the local residual trend of the historical deviation to generate a tensor residual prediction term; and the matching calculation result is corrected based on the tensor residual prediction term.
[0039] By analyzing the historical deviation between the actual billing results of SMS events and the tensor return results, a lightweight regression model is used to learn the local residual trend of the historical deviation, thereby generating a tensor residual prediction term. Based on this prediction term, the matching calculation results are corrected to improve the accuracy of the matching calculation.
[0040] Specifically, the actual billing results and tensor return results of SMS events are first acquired and matched, and the deviation between the actual billing result and the tensor mapping value is calculated for each SMS event using the formula: Deviation = Actual Billing Result - Tensor Mapping Value. Relevant features are then extracted from the historical deviation data, such as time features (hour, day of the week, month, etc.), SMS type features, and user features (such as user level and user region), and the extracted features are encoded and normalized to facilitate model training. A lightweight regression model is selected, such as linear regression, decision tree regression, or support vector regression (SVR). The historical deviation data is divided into a training set and a testing set; the selected regression model is trained on the training set, with its parameters adjusted to optimize performance, and is evaluated on the testing set using metrics such as mean squared error (MSE) and mean absolute error (MAE) to measure prediction accuracy. After training, the model is applied to new SMS events to yield tensor residual prediction terms, each of which is the predicted deviation between the actual billing result and the tensor mapping value. The tensor residual prediction term is then combined with the matching calculation result to correct it, using the formula: Corrected matching calculation result = Preliminary matching calculation result + Tensor residual prediction term. Finally, the corrected matching calculation result is checked to ensure it remains within a reasonable range; for example, the corrected billing amount cannot be negative.
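As a minimal sketch of this residual correction, the code below fits an ordinary least-squares line to historical deviations using a single hypothetical feature (hour of day), predicts the residual term for a new event, and applies the stated correction formula. The single feature, the sample data, and the bounds used in the range check are assumptions; a real model would use more features and a held-out test set.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def predict_residual(model, hour):
    """Tensor residual prediction term for an event at the given hour."""
    slope, intercept = model
    return slope * hour + intercept


def correct(preliminary, residual):
    """Corrected result = preliminary matching result + residual prediction term."""
    return preliminary + residual


def within_range(value, lower, upper):
    """Reasonableness check with caller-supplied (hypothetical) bounds."""
    return lower <= value <= upper


# Hypothetical history: deviation (actual - tensor value) observed at each hour.
hours = [0, 6, 12, 18]
deviations = [0.00, 0.01, 0.02, 0.03]
model = fit_line(hours, deviations)

residual = predict_residual(model, hour=12)
corrected = correct(-0.02, residual)
```

With this data the fitted line predicts a residual of about 0.02 yuan at hour 12, so a preliminary result of -0.02 yuan is corrected to approximately zero.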
[0041] Using the SMS event as the extraction center, a time window is created, and SMS samples within the time window are called. The SMS samples are then used to establish the feature time window weights of the SMS event.
[0042] In this embodiment, a time window is defined centered on the SMS event, covering a time range related to the current SMS event. The size of the time window can be dynamically adjusted according to actual needs; for example, the start and end times of the time window can be set based on the time period of SMS sending or the billing cycle.
[0043] The system retrieves SMS samples from the operator's SMS service database, based on a predefined time window. Using these samples, the system extracts features related to the SMS events, such as the number of SMS messages sent, the time period during which they were sent, user behavior patterns, and billing amounts. Through comprehensive analysis of these features, the system assigns a weight to each feature, reflecting its relative importance within the current time window.
[0044] For example, a 24-hour time window is created, and SMS samples are extracted within that 24-hour period. Each SMS sample includes information such as the sender's operator, the recipient's operator, the SMS type, the user's plan type, and the number of messages sent. Features of the SMS samples are extracted, such as the number of messages sent, the sending time, and the user's plan type. The number of messages sent within this time window ranges from 90 to 120, the sending time is mainly concentrated between 12 PM and 2 PM, and all SMS samples belong to Plan X. Based on these features, the system assigns a weight to each feature. For example, the number of messages sent has a high weight (0.4) because most SMS samples within the time window are concentrated around 100; the sending time has a weight of 0.35 because the peak period from noon to afternoon has a significant impact; and the plan type has a low weight (0.25).
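The example can be sketched in two parts: selecting the samples that fall inside a 24-hour window centered on the event, and turning per-feature importance values into weights that sum to 1. The application does not state how the importance values are obtained, so the raw scores below (8, 7, 5) are hypothetical numbers chosen only because they normalize to the weights quoted above.

```python
from datetime import datetime, timedelta


def samples_in_window(samples, center, half_width):
    """Keep the samples whose timestamp lies within center +/- half_width."""
    start, end = center - half_width, center + half_width
    return [s for s in samples if start <= s["time"] <= end]


def normalize(raw_scores):
    """Scale raw importance scores so that the weights sum to 1."""
    total = sum(raw_scores.values())
    return {name: score / total for name, score in raw_scores.items()}


event_time = datetime(2025, 5, 1, 12, 0)
samples = [
    {"time": datetime(2025, 5, 1, 13, 0), "volume": 100, "plan": "PlanX"},
    {"time": datetime(2025, 5, 1, 0, 0), "volume": 95, "plan": "PlanX"},
    {"time": datetime(2025, 5, 2, 6, 0), "volume": 120, "plan": "PlanX"},
]
window_samples = samples_in_window(samples, event_time, timedelta(hours=12))

# Hypothetical raw importance values for the three features in the example.
weights = normalize({"volume": 8, "send_time": 7, "plan_type": 5})
```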
[0045] Furthermore, the feature time window weights of the SMS events are established using the SMS samples, including:
[0046] Perform six-tuple dimension embedding encoding on historical SMS samples to establish a dense vector representation; combine the dense vector representations pairwise to construct cross-feature terms; input the cross-feature terms into an attention network to perform weighted influence learning on the contribution of cross-combinations to anomaly detection; generate a combined dynamic feature weighted tensor based on the weighted influence learning results; and compensate the feature time window weights based on the combined dynamic feature weighted tensor.
[0047] The six-tuple dimension information is extracted from historical SMS samples, including the sending operator, the receiving operator, the sending location, the receiving location, the SMS type, and the user plan type. For example, for one SMS, the sending operator is "China Mobile", the receiving operator is "China Unicom", the sending location is "Beijing", the receiving location is "Shanghai", the SMS type is "regular SMS", and the user plan type is "unlimited chat data plan".
[0048] Embedding techniques are used to transform discrete categorical variables in a six-tuple dimension into continuous vector representations. For each dimension, the size of the embedding vector is determined based on the number of categories. For example, for the sender operator dimension, if there are three different operators, the embedding vector dimension can be set to 2 or 3. The vectors obtained after embedding encoding of the six-tuple dimension are concatenated to obtain a dense vector representation of each SMS sample.
[0049] The system combines these dense vector representations pairwise to construct cross-feature terms. By combining all dense vectors in the six-tuple dimension, multiple cross-feature terms are generated to reflect the relationships between different features. For example, the combination of the sending operator and the receiving operator may reveal differences in their billing rules, while the combination of the sending location and the SMS type may reflect regional differences in charges.
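The embedding, concatenation, and pairwise combination steps can be sketched as follows. Embedding vectors are drawn at random here purely for illustration (a trained system would learn them), all six dimensions share one embedding size so that pairs can be combined element-wise, and the element-wise product is only one possible way to form a cross-feature term. These choices, along with the names used, are assumptions.

```python
import random
from itertools import combinations

DIM = 3  # hypothetical embedding size shared by all six dimensions


def make_embedding_table(categories, seed):
    """Assign each category a random vector (a stand-in for learned embeddings)."""
    rng = random.Random(seed)
    return {c: [rng.uniform(-1.0, 1.0) for _ in range(DIM)] for c in categories}


def cross_terms(vectors):
    """Element-wise product for every pair of dimension vectors."""
    return {(a, b): [x * y for x, y in zip(vectors[a], vectors[b])]
            for a, b in combinations(vectors, 2)}


sample = {
    "send_operator": "China Mobile", "recv_operator": "China Unicom",
    "send_location": "Beijing", "recv_location": "Shanghai",
    "sms_type": "regular SMS", "plan_type": "PlanX",
}
tables = {name: make_embedding_table([value], seed=i)
          for i, (name, value) in enumerate(sample.items())}
vectors = {name: tables[name][value] for name, value in sample.items()}

dense = [x for name in sample for x in vectors[name]]  # concatenated representation
crosses = cross_terms(vectors)
```

Six dimensions yield fifteen distinct pairs, so each SMS sample produces fifteen cross-feature terms under this scheme.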
[0050] The system inputs these cross-feature terms into an attention network to learn the weighted impact of cross-feature combinations on anomaly detection. The attention mechanism automatically assigns a weight to each cross-feature term based on its frequency of occurrence in historical data and its influence on anomaly detection. This weight reflects the importance of the cross-feature in anomaly detection. The system can thereby identify which feature combinations contribute significantly to the prediction of billing anomalies, strengthen their weights accordingly, and reduce attention to irrelevant feature combinations. After learning, the system generates a combined dynamic feature weighted tensor based on the weighted impact learning results. This tensor contains the weighted results of all cross-feature terms, dynamically reflecting the contribution of each cross-feature to anomaly detection within the current time window. Finally, the system compensates the feature time window weights based on the generated combined dynamic feature weighted tensor. By dynamically adjusting the weights within the time window, the system can optimize the influence of each feature based on the learned feature combinations and weights, making anomaly detection more accurate and efficient.
[0051] Preferably, an attention network is constructed to learn the weighted influence of cross-feature combinations on the contribution of anomaly detection. The attention network typically consists of multiple neural network layers, including an input layer, hidden layers, and an output layer. The input layer receives cross-feature terms as input, the hidden layers can employ fully connected layers, convolutional layers, recurrent neural network layers, etc., to extract high-level features from the cross-feature terms, and the output layer outputs the weights of each cross-feature term. Specifically, a training dataset is prepared, including cross-feature terms and corresponding anomaly detection labels. Anomaly detection labels can be generated manually or using rule-based methods. The attention network is trained using the training dataset, employing an appropriate loss function (such as the cross-entropy loss function) and optimization algorithm (such as stochastic gradient descent) to minimize the prediction error. During training, the attention network automatically adjusts the weights of each cross-feature term by learning the relationship between the cross-feature terms and the anomaly detection labels, giving higher weights to cross-feature terms that contribute significantly to anomaly detection.
[0052] Based on the weights of each cross-feature term output by the attention network, all cross-feature terms are weighted and aggregated. A weighted average approach can be used, multiplying each cross-feature term by its corresponding weight and summing the results to obtain a combined dynamic feature vector. This combined dynamic feature vector is then organized according to the order of the SMS samples to construct a combined dynamic feature weighted tensor. The tensor's dimensions can be set according to specific needs; for example, it can be a three-dimensional tensor, where the first dimension represents the index of the SMS sample, and the second and third dimensions represent the dimensions of the combined dynamic feature vector.
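The weighting and aggregation step can be sketched with a softmax over per-term scores followed by a weighted sum. In a trained attention network the scores would come from the network's output layer; here they are hypothetical numbers, and the softmax normalization is an assumption about how raw scores become weights.

```python
import math


def softmax(scores):
    """Convert raw scores to positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(terms, weights):
    """Weighted sum of cross-feature terms, giving one combined feature vector."""
    dim = len(terms[0])
    return [sum(w * term[i] for w, term in zip(weights, terms)) for i in range(dim)]


# Two hypothetical cross-feature terms for one SMS sample, with equal scores.
terms = [[1.0, 0.0], [0.0, 1.0]]
weights = softmax([0.0, 0.0])
combined = aggregate(terms, weights)

# Stacking one combined vector per SMS sample gives the weighted tensor.
weighted_tensor = [combined]
```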
[0053] When constructing feature time window weights, methods such as average weighting or time distance-based weighting can be used to calculate the initial feature time window weights. Then, element-wise multiplication or weighted summation is used to fuse the combined dynamic feature weighted tensor with the initial feature time window weights, thereby compensating the feature time window weights. For example, each element in the combined dynamic feature weighted tensor is multiplied by the corresponding element of the initial feature time window weights, and the products are combined to obtain the compensated feature time window weights.
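One way to read the compensation step is as an element-wise product of the initial weights and per-feature compensation factors, followed by renormalization so that the weights again sum to 1. The renormalization and the factor values below are assumptions; the application does not specify how the fused values are scaled.

```python
def compensate(initial_weights, factors):
    """Multiply each weight by its compensation factor, then renormalize to sum to 1."""
    products = {name: initial_weights[name] * factors[name] for name in initial_weights}
    total = sum(products.values())
    return {name: value / total for name, value in products.items()}


initial = {"volume": 0.4, "send_time": 0.35, "plan_type": 0.25}
# Hypothetical factors derived from the combined dynamic feature weighted tensor.
factors = {"volume": 1.5, "send_time": 1.0, "plan_type": 0.5}
compensated = compensate(initial, factors)
```

Features whose cross-combinations were found to matter more (factor above 1) gain weight, and the others lose weight proportionally.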
[0054] Furthermore, using the aforementioned SMS event as the extraction center, a time window is created, including:
[0055] Context-aware analysis is performed on the SMS event to establish context-aware constraints; the time window is reconstructed using the context-aware constraints, and the weights of time points within the time window are adjusted.
[0056] In this application embodiment, multi-dimensional scenario elements are extracted. Specifically, the historical behavior patterns of SMS senders and receivers are analyzed, including SMS sending frequency, sending time preferences (e.g., weekdays, weekends, daytime, nighttime), and frequently used contacts. For example, if a user typically communicates frequently with a specific contact between 8 PM and 10 PM, this time period can be considered an active scenario for that user's SMS communication. The business scenario to which the SMS belongs is identified, such as marketing promotion, customer service, and social interaction. Different types of business scenarios have different time patterns and characteristics. For example, marketing promotion SMS may be sent intensively during specific promotional activities, while customer service SMS may be sent immediately after a user encounters a problem. Users in different regions may have different SMS usage habits due to time zone differences, cultural habits, and other factors. For example, users in different time zones may communicate via SMS at different times.
[0057] Based on the extracted multi-dimensional contextual elements, data mining techniques such as cluster analysis and association rule mining are employed to identify common contextual patterns. For example, the contextual patterns "users socializing with friends on weekend evenings" and "users receiving customer service SMS messages during the daytime on weekdays" are identified. Based on the identified contextual patterns, context-aware constraints are established, including time range constraints, event sequence constraints, and associated event constraints. For example, for the contextual pattern "users socializing with friends on weekend evenings," the time range constraint could be 7 PM to 11 PM on weekend evenings; the event sequence constraint could be sending a greeting SMS first, followed by multiple interactive SMS exchanges; and the associated event constraint could be that this contextual pattern is typically associated with the user's social activities.
[0058] Based on the occurrence time of the SMS event, an initial time window is set. The size of the initial time window can be set according to business needs and experience, for example, one hour before and after the SMS event. Context-aware constraints are applied to the initial time window to restructure it. If the SMS event meets a certain context-aware constraint, the range and time point weights of the time window are adjusted according to that constraint. The initial time window is expanded or shrunk according to the time range constraint in the context-aware constraints. For example, if the SMS event belongs to the scenario pattern of "users socially interacting with friends on weekend evenings," the time window is adjusted to 7 pm to 11 pm on weekend evenings. The weights of each time point within the time window are adjusted according to the event sequence constraint and related event constraint in the context-aware constraints. For example, in the scenario pattern of "users socially interacting with friends on weekend evenings," the time point of sending a greeting SMS has a higher weight, while the time points of subsequent interactive SMS exchanges have relatively lower weights.
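The window reconstruction described above can be sketched as follows. This is a minimal illustration that models only the time range constraint; the `ContextConstraint` class, its fields, and the one-hour default half-width are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ContextConstraint:
    """Hypothetical time range constraint for one contextual pattern."""
    name: str
    weekdays: frozenset  # datetime.weekday() values, Monday == 0
    start_hour: int      # inclusive
    end_hour: int        # exclusive

    def matches(self, ts):
        return ts.weekday() in self.weekdays and self.start_hour <= ts.hour < self.end_hour


def build_window(event_time, constraints, half_width=timedelta(hours=1)):
    """Return (start, end) of the time window around an SMS event.

    If the event satisfies a constraint, the window is reshaped to that
    constraint's time range; otherwise the initial window is kept.
    """
    for constraint in constraints:
        if constraint.matches(event_time):
            day = event_time.replace(minute=0, second=0, microsecond=0)
            return (day.replace(hour=constraint.start_hour),
                    day.replace(hour=constraint.end_hour))
    return event_time - half_width, event_time + half_width


weekend_evening = ContextConstraint(
    "weekend_evening_social", frozenset({5, 6}), 19, 23
)
```

An event on a Saturday at 20:30 would yield the 19:00 to 23:00 window of that day, while a weekday morning event keeps the default window of one hour before and after.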
[0059] Preferably, a series of rules is formulated to adjust the weights of time points based on context-aware constraints. For example, if a time point belongs to an important event explicitly defined in the context-aware constraints, its weight is set to a higher value; if a time point is associated with a related event in the context-aware constraints, its weight is adjusted according to the degree of association. Alternatively, a large amount of historical SMS data is collected, and the context pattern and time point weight of each SMS event are labeled. A machine learning algorithm (such as a decision tree, neural network, or support vector machine) is trained on the labeled data to establish a time point weight prediction model. New SMS events are input into the trained model, and the model predicts the weight of each time point within the time window based on the context-aware characteristics of the SMS event.
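A rule-based variant of this weight adjustment might look like the sketch below. The category labels, the base weight of 1.0, and the "important" weight of 3.0 are illustrative assumptions; the application does not give concrete values.

```python
def time_point_weight(kind, association=0.0, base=1.0, important=3.0):
    """Rule-based weight for one time point inside the window.

    kind: "important" (event named in the constraint), "associated"
    (linked to a related event), or anything else (ordinary time point).
    association: degree of association in [0, 1], used for "associated".
    """
    if kind == "important":
        return important
    if kind == "associated":
        degree = min(max(association, 0.0), 1.0)  # clamp to [0, 1]
        return base + (important - base) * degree
    return base
```

With these defaults, a greeting SMS marked "important" gets weight 3.0, and a follow-up message with association degree 0.5 gets weight 2.0.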
[0060] Perform feature extraction on the SMS event, and calculate the feature deviation of each feature using the feature extraction results.
[0061] The system extracts relevant features from each SMS event, such as the sender's operator, the recipient's operator, the sending location, the receiving location, the SMS type, the user's package type, the sending time period, the billing amount, the SMS length, and the number of messages sent.
[0062] After feature extraction, the system calculates the feature deviation for each extracted feature. Feature deviation refers to the degree of difference between the value of a feature in the current SMS event and its historical data or expected standard value. Specifically, the system calculates the deviation for each feature based on historical data or benchmark values. For example, if the number of SMS messages sent within a certain time window is significantly higher than the average value in historical data, the system will calculate the deviation of the sending volume at that time point. The larger the deviation, the more likely it is that the sending volume for that event is abnormal. Similarly, if the billing amount for SMS messages is usually around 0.1 yuan, but the billing amount for a certain event is 0.2 yuan, the system will calculate the deviation of that billing amount to determine if there is a billing error or anomaly. If an SMS message is sent at night, while historical data is usually concentrated during the day, the system will also calculate the deviation for that sending period.
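The deviation calculation can be sketched as follows, assuming that a numeric feature is compared with its historical mean and that a categorical feature (such as the sending period) is scored by how rarely its value appeared in history. Both function names and the categorical rule are assumptions of this sketch.

```python
from collections import Counter
from statistics import mean


def numeric_deviation(value, history):
    """Difference between the current value and the historical mean.

    `history` must be non-empty.
    """
    return value - mean(history)


def categorical_deviation(value, history):
    """1 minus the historical frequency of `value` (0 = always seen, 1 = never seen).

    `history` must be non-empty.
    """
    counts = Counter(history)
    return 1.0 - counts[value] / len(history)
```

For instance, a night-time message from a user whose history is three daytime messages and one night-time message would receive a sending-period deviation of 0.75.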
[0063] A dynamic feature score is constructed using the feature deviation and the feature time window weight.
[0064] Feature deviation measures the degree to which a feature's value deviates from its historical average or expected value at the current moment. The greater the deviation, the more abnormal the feature's performance at the current moment. Feature time window weights are dynamically calculated based on SMS samples within the time window, reflecting the relative importance of a feature within the current time window.
[0065] A dynamic feature score is generated by weighting each feature's deviation with its feature time window weight. The score therefore reflects both the degree of abnormality of each feature and its importance in the current event.
[0066] For example, a certain SMS event has the following characteristics: the sender's operator is Operator A, the receiver's operator is Operator B, the sending location is Beijing, the receiving location is Shanghai, the SMS type is a regular SMS, and the user's plan type is Plan X. The actual billing cost for this SMS event is 0.15 yuan, and the number of messages sent is 150. Based on historical data, the system calculates the historical average number of messages sent as 100 and the historical average billing cost as 0.10 yuan. By comparing the actual billing information of the current event with historical data, the system calculates the deviation of the number of messages sent as 50 (150 - 100) and the deviation of the billing cost as 0.05 yuan (0.15 - 0.10). Next, the system adjusts the feature weights based on the time window. Assuming the current event occurs during peak hours (such as holidays or nighttime peak hours), the system assigns a higher time window weight (0.6) to the number of messages sent and a lower weight (0.4) to the billing cost. Then, the system calculates the dynamic feature score for each feature: the dynamic feature score for the number of messages sent is 50 * 0.6 = 30, and the dynamic feature score for the billing cost is 0.05 * 0.4 = 0.02.
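The arithmetic of this example can be reproduced with a short sketch. The feature names and the helper function are hypothetical; the numbers are those given in the paragraph above.

```python
def dynamic_feature_scores(values, baselines, weights):
    """Dynamic feature score = (current value - historical average) * time window weight."""
    return {
        name: (values[name] - baselines[name]) * weights[name]
        for name in values
    }


values = {"send_count": 150, "billing_cost": 0.15}     # current event
baselines = {"send_count": 100, "billing_cost": 0.10}  # historical averages
weights = {"send_count": 0.6, "billing_cost": 0.4}     # peak-hour window weights

scores = dynamic_feature_scores(values, baselines, weights)
# scores["send_count"] is 30 and scores["billing_cost"] is 0.02,
# up to floating-point rounding.
```

The two scores are on very different scales because the raw deviations carry different units (messages versus yuan). The application does not say whether deviations are normalized first; a deployment that wants the features to be comparable might standardize them before weighting.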
[0067] The overall anomaly score is generated using the matching calculation results and dynamic feature scores.
[0068] The system calculates a total anomaly score by weighting the dynamic feature score of each feature and the matching calculation result. For example, suppose the system calculates a dynamic feature score of 30 for the sending volume, a dynamic feature score of 0.02 for the billing cost, and a matching calculation result (i.e., the difference between the actual billing and the predicted billing) of 0.03 yuan. The system will obtain a total anomaly score by weighting and combining these scores, reflecting the overall degree of anomaly of the SMS event. The total anomaly score considers the anomaly impact of each feature, as well as the matching situation between the actual billing and the predicted billing, making the final anomaly score more comprehensive and accurate. The higher the value of the total anomaly score, the greater the degree of anomaly of the SMS event; if the total anomaly score exceeds the set threshold, the system will trigger an alarm or conduct a detailed investigation to promptly handle possible billing errors or other anomalies.
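A minimal sketch of this weighted combination follows. The combination weights (all 1.0 by default), the use of the absolute matching difference, and the alarm threshold are assumptions; the application only states that the scores are weighted and combined and then compared with a set threshold.

```python
def total_anomaly_score(feature_scores, match_result,
                        feature_weights=None, match_weight=1.0):
    """Weighted sum of the dynamic feature scores plus the weighted matching result.

    Features without an explicit weight use 1.0. The matching result
    enters by magnitude, so over- and under-billing both raise the score.
    """
    feature_weights = feature_weights or {}
    weighted = sum(
        feature_weights.get(name, 1.0) * score
        for name, score in feature_scores.items()
    )
    return weighted + match_weight * abs(match_result)


def is_alarm(score, threshold):
    """True when the total anomaly score exceeds the set threshold."""
    return score > threshold


total = total_anomaly_score(
    {"send_count": 30.0, "billing_cost": 0.02}, match_result=0.03
)
```

With unit weights the example values combine to roughly 30.05, which would trigger an alarm under a hypothetical threshold of 10.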
[0069] Furthermore, generating a total anomaly score using the matching calculation results and dynamic feature scoring includes:
[0070] Using the SMS event as the starting point for backtracking, a set of associated SMS messages is established; behavioral trajectories are extracted using the associated SMS messages to establish a behavioral trajectory sequence; joint trajectory development analysis of the SMS event is performed based on the behavioral trajectory sequence to generate trajectory-aware anomalies; and total anomaly score compensation is performed using the trajectory-aware anomalies.
[0071] In this embodiment, the system uses the current SMS event as the starting point for backtracking. It establishes a set of associated SMS messages by backtracking historical SMS samples related to the event. These associated SMS messages are used to extract behavioral trajectories. By analyzing features such as SMS sending patterns, billing details, and sending times, a behavioral trajectory sequence is constructed. This sequence reflects the evolution and trend of the SMS event in historical data. Based on the behavioral trajectory sequence, the system performs joint trajectory development analysis to assess the relationship between the current SMS event and historical trajectories, identify potential abnormal behaviors, and generate trajectory-aware anomalies. The system uses these generated anomalies to compensate for the overall anomaly score. If the trajectory-aware anomaly indicates a significant deviation between the current event's behavioral pattern and historical trajectories, the system adjusts the overall anomaly score based on this anomaly information, making the score more accurate and ensuring the accurate identification and handling of abnormal events.
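One possible reading of the trajectory step is sketched below. The application does not define how a trajectory-aware anomaly is computed, so this sketch substitutes a simple robust statistic: the distance of the current value from the trajectory median, scaled by the median absolute deviation. The additive compensation and the `gain` parameter are likewise assumptions.

```python
from statistics import median


def trajectory_anomaly(trajectory, current):
    """Robust distance of `current` from a behavioral trajectory.

    trajectory: non-empty list of (timestamp, value) pairs from the
    associated SMS set. The pairs are ordered by time to form the
    trajectory sequence.
    """
    values = [value for _, value in sorted(trajectory)]
    center = median(values)
    mad = median([abs(v - center) for v in values])
    scale = mad if mad > 0 else 1.0  # avoid division by zero on flat trajectories
    return abs(current - center) / scale


def compensate_total(total, anomaly, gain=0.5):
    """Additive compensation of the total anomaly score (assumed form)."""
    return total + gain * anomaly


history = [(1, 100), (2, 110), (3, 90), (4, 105), (5, 95)]  # (time, messages sent)
anomaly = trajectory_anomaly(history, current=150)
compensated = compensate_total(30.0, anomaly)
```

Here the trajectory median is 100 and the median absolute deviation is 5, so a current volume of 150 gives an anomaly of 10 and a compensated score of 35.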
[0072] Furthermore, establishing a set of associated SMS messages with the SMS event as the starting point for backtracking includes:
[0073] Perform SMS event analysis under six-dimensional attributes to establish an event semantic vector; use the event semantic vector to perform association matching at the tracing starting point to generate a first association result; obtain the spatiotemporal location data of the SMS event, and perform association matching at the tracing starting point based on the spatiotemporal location data to generate a second association result; use the first association result and the second association result to filter SMS messages and establish an associated SMS set.
[0074] Preferably, the system uses SMS events as the starting point for backtracking and performs correlation backtracking. By performing SMS event analysis under six attributes, it constructs event semantic vectors, transforming the six main attributes of SMS events (such as sender operator, receiver operator, sender location, receiver location, SMS type, and user package type) into high-dimensional vector representations to capture the basic information of the events and their inherent correlations. Then, the system uses these event semantic vectors to perform correlation matching at the backtracking starting point, generating a first correlation result that identifies historical SMS events most semantically similar to the current SMS event. Simultaneously, the system also acquires the spatiotemporal location data of the SMS events, such as sending time and geographical location, and performs correlation matching based on this information at the backtracking starting point to generate a second correlation result, revealing historical SMS events similar to the current event in terms of spatiotemporal characteristics. Finally, the system integrates the first and second correlation results and filters the SMS messages to establish a final set of correlated SMS messages. This set of correlated SMS messages reflects historical events similar to the current SMS event in terms of semantic and spatiotemporal characteristics.
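The two association results and the final filtering can be sketched as follows. The sketch assumes that semantic similarity is measured by cosine similarity between event semantic vectors, that spatiotemporal matching means "same city within a time gap", and that the two results are combined by intersection; the record fields and thresholds are hypothetical.

```python
import math
from datetime import datetime, timedelta


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def associated_set(event, history, min_sim=0.8, max_gap=timedelta(hours=24)):
    """IDs of historical SMS events that pass both association checks."""
    # First association result: semantically similar events.
    first = {h["id"] for h in history
             if cosine(event["vec"], h["vec"]) >= min_sim}
    # Second association result: events close in space and time.
    second = {h["id"] for h in history
              if h["city"] == event["city"]
              and abs(h["time"] - event["time"]) <= max_gap}
    return first & second
```

Using the intersection keeps only events that are similar on both criteria; a union or a weighted ranking would be equally compatible with the text.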
[0075] Furthermore, after generating a total anomaly score using the matching calculation results and dynamic feature scoring, the process includes:
[0076] Anomaly levels are matched based on the total anomaly score. An early warning scheme is established based on the anomaly level matching results and anomalies. The early warning scheme includes an early warning issuance scheme and a control response scheme. Account management is performed for the corresponding user based on the early warning scheme, and early warning notifications are sent out.
[0077] In this embodiment, after generating a total anomaly score using the matching calculation results and dynamic feature scoring, the system performs anomaly level matching based on the total anomaly score. According to the score, the system classifies SMS events into different anomaly levels, such as low risk, medium risk, and high risk, and sets corresponding countermeasures based on the anomaly level matching results. The system combines the anomaly level matching results and anomalies to formulate an early warning plan, including an early warning alert plan and a control response plan. The early warning alert plan defines how to send an alert to relevant personnel or the system when the anomaly level reaches a set threshold; the control response plan specifies the specific control measures to be taken when an anomaly occurs, such as suspending billing, reviewing billing rules, or implementing other intervention measures. According to the early warning plan, the system manages the corresponding user's account to ensure that anomalies are handled promptly. If an anomaly is related to a user's account, the system will take measures such as freezing the account or suspending service. In addition, the system will also provide early warning notifications, promptly informing relevant personnel or the system of the anomaly detection results to ensure that subsequent processing measures can be taken quickly.
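Anomaly level matching and the selection of an early warning plan can be sketched as a threshold lookup. The level boundaries, recipient names, and control actions below are hypothetical placeholders chosen to mirror the examples in the paragraph above.

```python
from bisect import bisect_right

LEVELS = ("low", "medium", "high")
BOUNDS = (10.0, 25.0)  # hypothetical score boundaries between levels

PLANS = {
    "low": {"notify": [], "control": "none"},
    "medium": {"notify": ["billing_ops"], "control": "review_billing_rules"},
    "high": {"notify": ["billing_ops", "risk_team"], "control": "suspend_billing"},
}


def match_level(score):
    """Map a total anomaly score to an anomaly level."""
    return LEVELS[bisect_right(BOUNDS, score)]


def warning_plan(score):
    """Early warning plan: who is notified and which control response applies."""
    level = match_level(score)
    return {"level": level, **PLANS[level]}
```

A score of 30.05 would map to the "high" level and select the plan that suspends billing and notifies both recipient groups.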
[0078] Furthermore, generating a total anomaly score using the matching calculation results and dynamic feature scoring also includes:
[0079] Set a dynamic density threshold and use the dynamic density threshold to evaluate the anomaly significance of SMS events; correct the total anomaly score based on the anomaly significance evaluation results.
[0080] In generating the overall anomaly score using matching calculation results and dynamic feature scoring, the system also sets a dynamic density threshold and uses this threshold to evaluate the anomaly significance of SMS events. The dynamic density threshold is dynamically calculated based on historical data and the characteristics of real-time SMS events, reflecting the density of event anomalies within a given time window. When the anomaly severity of an SMS event exceeds this threshold, the system considers it a significant anomaly and prioritizes its handling. Based on the anomaly significance evaluation results, the system corrects the overall anomaly score. If an SMS event has high anomaly significance, the system increases its anomaly score to better reflect the event's risk; if the significance is low, the anomaly score is reduced accordingly.
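The application does not give a formula for the dynamic density threshold. The sketch below assumes one simple possibility: the threshold is the mean of the recent anomaly scores in the time window plus `k` standard deviations, and the total score is scaled up or down depending on which side of the threshold it falls. The `boost` and `damp` factors are illustrative.

```python
from statistics import mean, pstdev


def dynamic_threshold(recent_scores, k=2.0):
    """Threshold recomputed from the scores observed in the current window.

    `recent_scores` must be non-empty.
    """
    return mean(recent_scores) + k * pstdev(recent_scores)


def correct_score(score, recent_scores, boost=1.2, damp=0.8, k=2.0):
    """Raise significant scores and lower insignificant ones."""
    threshold = dynamic_threshold(recent_scores, k)
    factor = boost if score > threshold else damp
    return score * factor
```

Because the threshold follows the recent score distribution, the same raw score can be significant in a quiet window and unremarkable in a window where many events are already anomalous.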
[0081] In summary, the embodiments of this application have at least the following technical effects:
[0082] After parsing the operator's billing file, a six-dimensional tensor structure is extracted. This structure includes the sending operator, receiving operator, sending location, receiving location, SMS type, and user package type. When an SMS event arrives, the six-tuple of attributes of the SMS event is extracted as a tensor index. The corresponding tensor unit of the six-dimensional tensor structure is accessed, and a matching calculation is performed between the actual billing result of the SMS event and the tensor return result to generate a matching calculation result. Next, using the SMS event as the extraction center, a time window is created, and SMS samples within the time window are called. The SMS samples are used to establish the feature time window weights for the SMS event. Then, feature extraction of the SMS event is performed, and the feature deviation of each feature is calculated using the feature extraction results. Furthermore, a dynamic feature score is constructed using the feature deviation and feature time window weights. Finally, a total anomaly score is generated using the matching calculation results and the dynamic feature score. This solves the technical problem of low accuracy in SMS billing anomaly detection in existing technologies and achieves the technical effect of improving the accuracy of SMS billing anomaly detection.
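The tensor indexing and matching calculation summarized above can be sketched with a sparse representation, in which the six-dimensional tensor is a mapping from attribute six-tuples to expected fees. The `SmsKey` type, the function name, and the 0.12 yuan rate are assumptions for illustration; the matching calculation is taken to be the difference between the actual and expected fee.

```python
from typing import NamedTuple, Optional


class SmsKey(NamedTuple):
    """Attribute six-tuple used as the tensor index."""
    send_operator: str
    recv_operator: str
    send_location: str
    recv_location: str
    sms_type: str
    package_type: str


def match_billing(tensor: dict, key: SmsKey, actual_fee: float) -> Optional[float]:
    """Actual fee minus the fee stored in the tensor unit; None if the unit is empty."""
    expected = tensor.get(key)
    if expected is None:
        return None  # no billing rule known for this combination
    return actual_fee - expected


key = SmsKey("Operator A", "Operator B", "Beijing", "Shanghai", "regular", "Plan X")
tensor = {key: 0.12}  # hypothetical expected fee in yuan
result = match_billing(tensor, key, actual_fee=0.15)
```

A dictionary is used because most operator, location, and package combinations never occur, so a dense six-dimensional array would be largely empty.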
[0083] Example 2: Based on the same inventive concept as the multi-dimensional anomaly detection method for the carrier-grade SMS billing system in the foregoing example, and as shown in Figure 2, this application provides a multi-dimensional anomaly detection system for carrier-grade SMS billing systems, wherein the system includes:
[0084] The data extraction module 11 is used to extract a six-dimensional tensor structure after parsing the operator's billing file. The six-dimensional tensor structure includes the sending operator, receiving operator, sending location, receiving location, SMS type, and user package type. The matching module 12 is used to extract the attribute six-tuple of the SMS event as a tensor index after the SMS event arrives, access the tensor unit corresponding to the six-dimensional tensor structure, perform matching calculations between the actual billing result of the SMS event and the tensor return result, and generate matching calculation results. The weight establishment module 13 is used to create a time window with the SMS event as the extraction center, call the SMS samples within the time window, and use the SMS samples to establish the feature time window weights of the SMS event. The calculation module 14 is used to perform feature extraction of the SMS event and calculate the feature deviation of each feature using the feature extraction results. The feature scoring construction module 15 is used to construct a dynamic feature score using the feature deviation and the feature time window weights. The score generation module 16 is used to generate a total anomaly score using the matching calculation results and the dynamic feature score.
[0085] Furthermore, the data extraction module 11 is used to perform the following method:
[0086] Based on the analysis results, the duration of the billing rules is evaluated and a general comparison is performed. The baseline tensor structure is determined using the results of the duration evaluation and the general comparison. After determining the baseline tensor structure, incremental learning of all analysis results relative to the baseline tensor structure is performed. A six-dimensional tensor structure is extracted based on the incremental learning results and the baseline tensor structure.
[0087] Furthermore, the weight establishment module 13 is used to perform the following method:
[0088] Perform six-tuple dimension embedding encoding on historical SMS samples to establish a dense vector representation; combine the dense vector representations pairwise to construct cross-feature terms; input the cross-feature terms into an attention network to perform weighted influence learning on the contribution of cross-combinations to anomaly detection; generate a combined dynamic feature weighted tensor based on the weighted influence learning results; and compensate the feature time window weights based on the combined dynamic feature weighted tensor.
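The embedding, pairwise crossing, and attention steps can be sketched in plain Python. This is an untrained toy: the embedding function is a deterministic pseudo-random stand-in for a learned embedding table, cross-feature terms are formed by element-wise products, and the attention network is reduced to a single query vector followed by a softmax. All names and dimensions are assumptions.

```python
import math
import random
from itertools import combinations

FIELDS = ("send_operator", "recv_operator", "send_location",
          "recv_location", "sms_type", "package_type")
DIM = 4  # hypothetical embedding size


def embed(field, value):
    """Stand-in for a learned embedding: a fixed pseudo-random dense vector."""
    rng = random.Random(f"{field}={value}")
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def cross_terms(sample):
    """Element-wise product of every pair of field embeddings (15 pairs for 6 fields)."""
    vectors = {field: embed(field, sample[field]) for field in FIELDS}
    return {
        (a, b): [x * y for x, y in zip(vectors[a], vectors[b])]
        for a, b in combinations(FIELDS, 2)
    }


def attention_weights(crosses, query):
    """Softmax over query-cross dot products: one weight per cross-feature term."""
    logits = {pair: sum(q * c for q, c in zip(query, vec))
              for pair, vec in crosses.items()}
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {pair: math.exp(value - peak) for pair, value in logits.items()}
    total = sum(exps.values())
    return {pair: e / total for pair, e in exps.items()}
```

In a real system the embeddings and the query would be learned from labeled anomalies; the resulting per-pair weights correspond to the combined dynamic feature weight tensor that is later fused with the feature time window weights.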
[0089] Furthermore, the weight establishment module 13 is used to perform the following method:
[0090] Context-aware analysis is performed on the SMS event to establish context-aware constraints; the time window is reconstructed using the context-aware constraints, and the weights of time points within the time window are adjusted.
[0091] Furthermore, the matching module 12 is used to perform the following method:
[0092] The historical deviation between the actual billing value of the SMS event and the tensor mapping value is retrieved; a lightweight regression model is used to learn the local residual trend of the historical deviation to generate a tensor residual prediction term; and the matching calculation result is corrected based on the tensor residual prediction term.
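The residual correction can be sketched with an ordinary least-squares line fitted to the recent deviations, used here as a stand-in for the "lightweight regression model". Predicting one step ahead and subtracting the prediction from the raw matching result are assumptions of this sketch.

```python
def fit_line(ys):
    """Least-squares slope and intercept of ys against 0, 1, 2, ..."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    if sxx == 0:
        return 0.0, mean_y  # a single point: flat line through it
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual_prediction(history):
    """Tensor residual prediction term: the fitted trend extended one step ahead."""
    if not history:
        return 0.0
    slope, intercept = fit_line(history)
    return intercept + slope * len(history)


def corrected_match(raw_match, history):
    """Remove the predicted systematic residual from the raw matching result."""
    return raw_match - residual_prediction(history)
```

If recent deviations drift upward as 0.01, 0.02, 0.03 yuan, the predicted residual is about 0.04, so a raw matching result of 0.05 is corrected to about 0.01, leaving only the part that the drift does not explain.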
[0093] Furthermore, the score generation module 16 is used to perform the following method:
[0094] Using the SMS event as the starting point for backtracking, a set of associated SMS messages is established; behavioral trajectories are extracted using the associated SMS messages to establish a behavioral trajectory sequence; joint trajectory development analysis of the SMS event is performed based on the behavioral trajectory sequence to generate trajectory-aware anomalies; and total anomaly score compensation is performed using the trajectory-aware anomalies.
[0095] Furthermore, the score generation module 16 is used to perform the following method:
[0096] Perform SMS event analysis under six-dimensional attributes to establish an event semantic vector; use the event semantic vector to perform association matching at the tracing starting point to generate a first association result; obtain the spatiotemporal location data of the SMS event, and perform association matching at the tracing starting point based on the spatiotemporal location data to generate a second association result; use the first association result and the second association result to filter SMS messages and establish an associated SMS set.
[0097] Furthermore, the score generation module 16 is used to perform the following method:
[0098] Anomaly levels are matched based on the total anomaly score. An early warning scheme is established based on the anomaly level matching results and anomalies. The early warning scheme includes an early warning issuance scheme and a control response scheme. Account management is performed for the corresponding user based on the early warning scheme, and early warning notifications are sent out.
[0099] Furthermore, the score generation module 16 is used to perform the following method:
[0100] Set a dynamic density threshold and use the dynamic density threshold to evaluate the anomaly significance of SMS events; correct the total anomaly score based on the anomaly significance evaluation results.
[0101] It should be noted that the order of the embodiments described above is merely for descriptive purposes and does not represent the superiority or inferiority of the embodiments. Furthermore, the above description focuses on specific embodiments of this specification. The processes depicted in the accompanying drawings do not necessarily require a specific or sequential order to achieve the desired results. In some implementations, multitasking and parallel processing are possible or may be advantageous.
[0102] The above description is only a preferred embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.
[0103] This specification and the accompanying drawings are merely illustrative examples of this application and are intended to cover any and all modifications, variations, combinations, or equivalents within the scope of this application. Clearly, those skilled in the art can make various alterations and variations to this application without departing from its scope. Therefore, if such modifications and variations fall within the scope of this application and its equivalents, this application intends to include them.
Claims
1. A multi-dimensional anomaly detection method for carrier-grade SMS billing systems, characterized in that the method includes: after parsing the operator's billing file, extracting a six-dimensional tensor structure, the six-dimensional tensor structure including the sending operator, the receiving operator, the sending location, the receiving location, the SMS type, and the user package type; when an SMS event arrives, extracting the attribute six-tuple of the SMS event as a tensor index, accessing the corresponding tensor unit of the six-dimensional tensor structure, and performing a matching calculation between the actual billing result of the SMS event and the tensor return result to generate a matching calculation result; creating a time window with the SMS event as the extraction center, calling the SMS samples within the time window, and using the SMS samples to establish the feature time window weights of the SMS event; performing feature extraction on the SMS event, and calculating the feature deviation of each feature using the feature extraction results; constructing a dynamic feature score using the feature deviation and the feature time window weights; and generating a total anomaly score using the matching calculation result and the dynamic feature score.
2. The multi-dimensional anomaly detection method for a carrier-grade SMS billing system as described in claim 1, characterized in that extracting a six-dimensional tensor structure after parsing the operator's billing file includes: based on the analysis results, evaluating the duration of the billing rules and performing a general comparison; determining the baseline tensor structure using the rule duration evaluation results and the general comparison results; and, after determining the baseline tensor structure, performing incremental learning of all analysis results relative to the baseline tensor structure, and extracting the six-dimensional tensor structure based on the incremental learning results and the baseline tensor structure.
3. The multi-dimensional anomaly detection method for a carrier-grade SMS billing system as described in claim 1, characterized in that establishing the feature time window weights of the SMS event using the SMS samples includes: performing six-tuple dimension embedding encoding on historical SMS samples to establish a dense vector representation; combining the dense vector representations pairwise to construct cross-feature terms; inputting the cross-feature terms into an attention network to perform weighted influence learning on the contribution of the cross-combinations to anomaly detection; generating a combined dynamic feature weight tensor based on the weighted influence learning results; and compensating the feature time window weights based on the combined dynamic feature weight tensor.
4. The multi-dimensional anomaly detection method for a carrier-grade SMS billing system as described in claim 1, characterized in that creating a time window with the SMS event as the extraction center includes: performing context-aware analysis on the SMS event to establish context-aware constraints; and reconstructing the time window using the context-aware constraints, and adjusting the weights of time points within the time window.
5. The multi-dimensional anomaly detection method for a carrier-grade SMS billing system as described in claim 1, characterized in that performing the matching calculation between the actual billing result of the SMS event and the tensor return result to generate a matching calculation result includes: retrieving the historical deviation between the actual billing value of the SMS event and the tensor mapping value; using a lightweight regression model to learn the local residual trend of the historical deviation and generate a tensor residual prediction term; and correcting the matching calculation result based on the tensor residual prediction term.
6. The multi-dimensional anomaly detection method for a carrier-grade SMS billing system as described in claim 1, characterized in that generating a total anomaly score using the matching calculation result and the dynamic feature score includes: establishing a set of associated SMS messages with the SMS event as the starting point for backtracking; extracting behavioral trajectories using the associated SMS messages to establish a behavioral trajectory sequence; performing joint trajectory development analysis on the SMS event based on the behavioral trajectory sequence to generate trajectory-aware anomalies; and performing total anomaly score compensation using the trajectory-aware anomalies.
7. The multi-dimensional anomaly detection method for a carrier-grade SMS billing system as described in claim 5, characterized in that establishing a set of associated SMS messages with the SMS event as the starting point for backtracking includes: performing SMS event analysis under six-dimensional attributes to establish an event semantic vector; using the event semantic vector to perform association matching at the backtracking starting point to generate a first association result; obtaining the spatiotemporal location data of the SMS event, and performing association matching at the backtracking starting point based on the spatiotemporal location data to generate a second association result; and filtering SMS messages using the first association result and the second association result to establish the associated SMS set.
8. The multi-dimensional anomaly detection method for a carrier-grade SMS billing system as described in claim 1, characterized in that, after generating the total anomaly score using the matching calculation result and the dynamic feature score, the method includes: performing anomaly level matching based on the total anomaly score, and establishing an early warning scheme based on the anomaly level matching results and anomalies, the early warning scheme including an early warning issuance scheme and a control response scheme; and performing account management for the corresponding user according to the early warning scheme, and sending out an early warning notification.
9. The multi-dimensional anomaly detection method for a carrier-grade SMS billing system as described in claim 1, characterized in that generating a total anomaly score using the matching calculation result and the dynamic feature score also includes: setting a dynamic density threshold and using the dynamic density threshold to evaluate the anomaly significance of SMS events; and correcting the total anomaly score based on the anomaly significance evaluation results.
10. A multi-dimensional anomaly detection system for carrier-grade SMS billing systems, characterized in that the system is used to implement the multi-dimensional anomaly detection method for a carrier-grade SMS billing system according to any one of claims 1-9, the system comprising: a data extraction module, used to extract a six-dimensional tensor structure after parsing the operator's billing file, the six-dimensional tensor structure including the sending operator, the receiving operator, the sending location, the receiving location, the SMS type, and the user package type; a matching module, used to extract the attribute six-tuple of the SMS event as a tensor index when the SMS event arrives, access the corresponding tensor unit of the six-dimensional tensor structure, and perform a matching calculation between the actual billing result of the SMS event and the tensor return result to generate a matching calculation result; a weight establishment module, used to create a time window with the SMS event as the extraction center, call the SMS samples within the time window, and use the SMS samples to establish the feature time window weights of the SMS event; a calculation module, used to perform feature extraction on the SMS event and calculate the feature deviation of each feature using the feature extraction results; a feature scoring construction module, used to construct a dynamic feature score using the feature deviation and the feature time window weights; and a score generation module, used to generate a total anomaly score using the matching calculation result and the dynamic feature score.