A method, device, medium and equipment for analyzing alarms based on keywords

By analyzing the frequency of keywords in the data stream and using a dynamic baseline model, the problem of false alarms and missed alarms in existing alarm systems at different time periods has been solved, achieving accurate keyword anomaly monitoring and alarms.

CN121509192B · Active · Publication Date: 2026-06-26 · HAPPY ELEMENTS TECH (BEIJING) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HAPPY ELEMENTS TECH (BEIJING) CO LTD
Filing Date
2025-11-14
Publication Date
2026-06-26


Abstract

The application relates to the technical field of data analysis, and particularly provides a keyword-based alarm analysis method, device, medium and equipment. The method can comprise the following steps: acquiring the occurrence frequency of a keyword in a preset time period in a data stream to be processed; acquiring an abnormal parameter of the keyword based on the average frequency of the keyword in a corresponding historical preset time period and a target model constructed in advance; the target model is a distribution model based on probability statistics; and whether to generate alarm information is determined based on the abnormal parameter and the occurrence frequency. Some embodiments of the application can realize accurate and automatic monitoring of keyword abnormalities, effectively reduce false positives and false negatives of alarms, and improve alarm accuracy.

Description

Technical Field

[0001] This application relates to the field of data monitoring and alarm technology, and more specifically, to a method, apparatus, medium, and device for alarm analysis based on keywords.

Background Technology

[0002] With the continuous development of internet platforms, the amount of data generated is also gradually increasing. Currently, in order to supervise their own data, various platforms usually set up alarm systems to monitor the data so that they can issue timely alerts when anomalies occur.

[0003] Currently, widely used alarm systems primarily monitor and issue alerts based on changes in the total number of issues. Specifically, alarm systems calculate the total number of issues within a certain time period (such as daily or hourly) and compare it with a preset threshold to determine whether an alarm should be triggered. However, setting fixed thresholds or simple percentage fluctuation thresholds makes it difficult to adapt to different time periods, ultimately leading to a high number of false alarms and missed alarms.

[0004] Therefore, how to provide a technical solution for accurate keyword-based alarm analysis has become a technical problem that needs to be solved.

Summary of the Invention

[0005] The purpose of some embodiments of this application is to provide a method, apparatus, medium and device for alarm based on keyword analysis. By establishing a dynamic baseline instead of a fixed threshold, the technical solutions of the embodiments of this application achieve accurate and automated monitoring of keyword anomalies, improve the accuracy and efficiency of alarms, and greatly reduce the probability of false alarms and missed alarms.

[0006] In a first aspect, some embodiments of this application provide a method for alarm based on keyword analysis, including: obtaining the frequency of occurrence of keywords in a data stream to be processed within a preset time period; obtaining abnormal parameters of the keywords based on the average frequency of the keywords in the corresponding historical preset time period and a pre-constructed target model; wherein the target model is a distribution model based on probability statistics; and determining whether to generate alarm information based on the abnormal parameters and the frequency of occurrence.

[0007] Some embodiments of this application obtain the frequency of occurrence of keywords in the data stream to be processed by statistical analysis, and then obtain abnormal parameters by combining the average frequency over a preset historical period and the target model; finally, the frequency of occurrence and abnormal parameters are used to determine whether to generate alarm information. Embodiments of this application can achieve accurate alarm analysis based on keyword statistics, improving the accuracy and efficiency of alarms, and greatly reducing the probability of false alarms and missed alarms.

[0008] In some embodiments, obtaining the abnormal parameters of the keyword based on the average frequency of the keyword in the corresponding historical preset time period and the pre-constructed target model includes: taking the output result of the target model not being less than the occurrence frequency as a constraint, taking the average frequency as the input of the target model, and determining the occurrence probability value of the final output of the target model; wherein the occurrence probability value is used as the abnormal parameter.

[0009] Some embodiments of this application set constraints on the target model, using the average frequency as input to the target model, to obtain abnormal parameters that meet the constraints, providing data support for whether to generate alarm information subsequently.

[0010] In some embodiments, determining whether to generate alarm information based on the abnormal parameter and the occurrence frequency includes: when the occurrence probability value is not less than the occurrence frequency, comparing the occurrence probability value with a preset significance parameter to determine abnormal candidate keywords; filtering the abnormal candidate keywords to determine whether to generate the alarm information.

[0011] Some embodiments of this application improve the accuracy of alarm information generation by comparing the occurrence probability value with a preset significance parameter to determine abnormal candidate keywords, and then filtering them to determine whether to generate alarm information.

[0012] In some embodiments, comparing the occurrence probability value with a preset significance parameter to determine the abnormal candidate keyword includes: if the occurrence probability value is less than the preset significance parameter, then the keyword is confirmed as the abnormal candidate keyword; if the occurrence probability value is greater than or equal to the preset significance parameter, then the keyword is not the abnormal candidate keyword.

[0013] Some embodiments of this application determine abnormal candidate keywords by comparing their occurrence probability values with preset significance parameters, providing an analytical basis for whether to generate alarm information subsequently.

[0014] In some embodiments, obtaining the abnormal parameters of the keyword based on the average frequency of the keyword in the corresponding historical preset time period and the pre-constructed target model includes: when the target model follows a Poisson distribution, determining the standard deviation through the relationship between the average frequency and the variance of the Poisson distribution; wherein the standard deviation is used as the abnormal parameter.

[0015] Some embodiments of this application determine the standard deviation by using the average frequency and a target model that follows a Poisson distribution, providing data support for whether to generate alarm information subsequently.

[0016] In some embodiments, determining whether to generate an alarm message based on the abnormal parameters and the occurrence frequency includes: calculating the standard deviation, the average frequency, and the occurrence frequency to obtain the standard deviation deviation multiple; comparing the standard deviation deviation multiple with a preset threshold to determine abnormal candidate keywords; and filtering the abnormal candidate keywords to determine whether to generate the alarm message.

[0017] Some embodiments of this application determine the deviation multiple of the standard deviation by standard deviation, average frequency and occurrence frequency, and compare it with a preset threshold to determine abnormal candidate keywords, and then filter them to determine whether to generate alarm information, thereby improving the accuracy of alarm information generation.

[0018] In some embodiments, the step of filtering the abnormal candidate keywords to determine whether to generate the alarm information includes: filtering the abnormal candidate keywords using a filter; if the abnormal candidate keywords belong to a first category of keywords, then confirming that the alarm information will not be generated; if the abnormal candidate keywords do not belong to the first category of keywords, then confirming that the alarm information will be generated.

[0019] Some embodiments of this application use filters to screen abnormal candidate keywords to determine whether to generate alarm information, thereby achieving accurate and efficient generation of alarm information.

[0020] In a second aspect, some embodiments of this application provide a device for alarm based on keyword analysis, comprising: an acquisition module for acquiring the frequency of occurrence of keywords in a data stream to be processed within a preset time period; a model processing module for acquiring abnormal parameters of the keywords based on the average frequency of the keywords in the corresponding historical preset time period and a pre-constructed target model; wherein the target model is a distribution model based on probability statistics; and an alarm module for determining whether to generate alarm information based on the abnormal parameters and the frequency of occurrence.

[0021] In a third aspect, some embodiments of this application provide a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, can implement the method described in any embodiment of the first aspect.

[0022] In a fourth aspect, some embodiments of this application provide an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, can implement the method described in any embodiment of the first aspect.

[0023] In a fifth aspect, some embodiments of this application provide a computer program product, the computer program product including a computer program, wherein the computer program, when executed by a processor, can implement the method described in any embodiment of the first aspect.

Attached Figure Description

[0024] To more clearly illustrate the technical solutions of some embodiments of this application, the accompanying drawings used in some embodiments of this application will be briefly described below. It should be understood that the following drawings only show some embodiments of this application and should not be regarded as a limitation of the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.

[0025] Figure 1 is a system diagram of keyword analysis-based alarming provided by some embodiments of this application;

[0026] Figure 2 is the first flowchart of a keyword analysis-based alarm method provided by some embodiments of this application;

[0027] Figure 3 is the second flowchart of a keyword analysis-based alarm method provided by some embodiments of this application;

[0028] Figure 4 is a schematic diagram of a time weight coefficient model provided by some embodiments of this application;

[0029] Figure 5 is a block diagram of a device for alarm based on keyword analysis provided by some embodiments of this application;

[0030] Figure 6 is a schematic diagram of an electronic device provided by some embodiments of this application.

Detailed Implementation

[0031] The technical solutions of some embodiments of this application will now be described with reference to the accompanying drawings.

[0032] It should be noted that similar reference numerals and letters in the following figures indicate similar items; therefore, once an item is defined in one figure, it does not need to be further defined and explained in subsequent figures. Furthermore, in the description of this application, terms such as "first," "second," etc., are used only to distinguish descriptions and should not be construed as indicating or implying relative importance.

[0033] Alarm systems widely used in related technologies primarily monitor and trigger alarms based on changes in the total number of issues. Specifically, the system calculates the total number of issues within a certain time period (e.g., daily or hourly) and compares it to preset thresholds to determine whether an alarm should be triggered. These thresholds typically include two sets of values: an upper limit and a lower limit. Each set of thresholds is composed of both a "proportion" and an "absolute number" to address alarm needs under different baseline conditions. For example, when the baseline number of issues is large (e.g., exceeding 100), the system mainly relies on the proportion value (e.g., setting a high threshold for exceeding the average by 20% and a low threshold for falling below the average by 20%) as the judgment criterion to accommodate fluctuations of different magnitudes. When the baseline number of issues is small (e.g., below 100), the absolute value is used (e.g., setting a high threshold for exceeding the average by 10 and a low threshold for falling below the average by 5) as the judgment standard to avoid triggering invalid alarms due to minor fluctuations in low baseline conditions. For example, the system will calculate both sets of values, and when the total number of issues exceeds the calculated upper limit or falls below the calculated lower limit, the system will issue an alarm notification.
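
The two-tier fixed-threshold scheme described in this paragraph can be sketched roughly as follows. This is only an illustration under assumptions: the function name and parameters are hypothetical, the default values are the examples quoted above (a switch point of 100, a 20% proportion, +10 / -5 absolute offsets), and the sketch selects one threshold set according to the baseline size, which is one possible reading of the text.

```python
def legacy_alarm(total: int, average: float,
                 switch_point: float = 100,
                 ratio: float = 0.20,
                 abs_upper: float = 10,
                 abs_lower: float = 5) -> bool:
    """Fixed-threshold check used by the related technology (hypothetical sketch).

    Large baselines use a proportional band around the average;
    small baselines use absolute offsets instead.
    """
    if average > switch_point:
        upper = average * (1 + ratio)
        lower = average * (1 - ratio)
    else:
        upper = average + abs_upper
        lower = average - abs_lower
    # Alarm when the total leaves the band on either side.
    return total > upper or total < lower
```

Because the band depends only on a single average, the same limits apply to busy and quiet hours alike, which is the weakness the following paragraph discusses.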

[0034] However, the above methods struggle to balance the trade-off between sensitivity and false alarm rate: setting fixed thresholds or simple percentage fluctuation thresholds fails to adapt to normal traffic fluctuations across different time periods (e.g., weekdays / weekends, daytime / nighttime). Overly strict thresholds lead to more false alarms, while overly lenient thresholds result in missed alarms, making it impossible to achieve a good balance between sensitivity and accuracy. Furthermore, these methods only focus on the total number of issues and cannot identify structural changes in the content of those issues. For example, even if the total number of issues is within the normal range, a surge in specific error keywords in a key business area can go unnoticed, so that valuable signals of potential "unknown threats" are drowned out and difficult to detect in a timely manner. Moreover, alarm thresholds heavily rely on manual setting and adjustment by operations personnel, which is not only labor-intensive but also lacks data-driven decision support, hindering intelligent operations and maintenance. In older systems, it is necessary to set minimum alarm counts, maximum alarm percentage thresholds, minimum alarm percentage thresholds, maximum absolute alarm thresholds (effective when the base number is high), and minimum absolute alarm thresholds (effective when the base number is low) for each alarm group.

[0035] In view of this, some embodiments of this application provide a keyword-based alarm method. This method involves statistically analyzing the frequency of keywords in the data stream to be processed within a preset time period, combining this with the average frequency over historical preset time periods and a target model to obtain anomaly parameters. Finally, the frequency and anomaly parameters are used to determine whether to generate an alarm message. This application's embodiments establish a dynamic baseline using a target model to replace the traditional fixed threshold method, achieving accurate and automated monitoring of keyword anomalies. Furthermore, the keyword statistics method enables accurate judgment of whether alarm messages should be generated for keywords in the data stream to be processed, improving alarm accuracy and efficiency, and significantly reducing the probability of false alarms and missed alarms.

[0036] The overall structure of a keyword analysis-based alarm system provided by some embodiments of this application is described below by way of example with reference to Figure 1.

[0037] As shown in Figure 1, some embodiments of this application provide a system diagram for keyword-based alarm analysis. This keyword-based alarm analysis system may include a terminal 100 and an alarm server 200. The terminal 100 can send a data stream to be processed generated by a certain system to the alarm system deployed on the alarm server 200; or the alarm system can read the data stream to be processed from the terminal 100. The alarm system obtains the frequency of occurrence of keywords in the data stream by statistical analysis; then, it determines abnormal parameters by combining the historical average frequency and a pre-deployed target model; finally, it determines whether to generate alarm information based on the frequency of occurrence and the abnormal parameters.

[0038] In some embodiments of this application, if terminal 100 can also deploy the alarm system of alarm server 200 and implement the corresponding alarm analysis function, then alarm server 200 may not be required. Specific configurations can be made according to the actual application scenario, and this application embodiment does not impose specific limitations here.

[0039] In some embodiments of this application, the terminal 100 can be a mobile terminal or a non-portable computer terminal, and the embodiments of this application are not specifically limited here.

[0040] The implementation process of keyword analysis-based alarming performed by the alarm server 200 in some embodiments of this application is described below by way of example with reference to Figure 2.

[0041] Please refer to Figure 2, which is a flowchart of a keyword-based alarm method provided by some embodiments of this application. The keyword-based alarm method may include:

[0042] S210, obtain the frequency of occurrence of keywords in the data stream to be processed within a preset time period.

[0043] For example, in a specific embodiment of this application, the data stream to be processed can be a real-time log stream, or other types of data such as work orders. For instance, the real-time log stream (as a specific example of the data stream to be processed) is automatically segmented into words and word frequencies are counted to obtain the occurrence frequency of each keyword. The alarm system of the alarm server 200 pre-stores a dynamic word segmentation table, which includes a specialized terminology table and a general terminology library. The specialized terminology table includes professional terms related to business systems, error types, core modules, etc.; the general terminology library is used to capture general error description terms. This word segmentation table allows for real-time word segmentation of the real-time log stream, transforming unstructured text data into a structured set of keywords. Then, the number of times each keyword appears in the real-time log stream within a preset time period is counted to obtain the occurrence frequency. For example, the alarm system uses "hours" as the basic time granularity and counts the occurrence frequency of keywords during the time period "Every Monday 14:00-15:00" (as a specific example of a preset time period).
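
As a rough illustration of step S210, the sketch below counts keyword occurrences inside one hourly window. It is a simplification under stated assumptions: the term table and function name are hypothetical, plain substring matching stands in for the dynamic word segmentation table, and each record contributes at most once per keyword.

```python
from collections import Counter
from datetime import datetime

# Hypothetical word segmentation table (specialized and general terms).
TERM_TABLE = ["timeout", "payment", "login failed", "crash"]

def keyword_frequencies(records, start, end, terms=TERM_TABLE):
    """Count how many records in [start, end) mention each term.

    records: iterable of (timestamp, text) pairs.
    Substring matching is an assumption; a real system would use a segmenter.
    """
    counts = Counter()
    for ts, text in records:
        if not (start <= ts < end):
            continue  # outside the preset time period
        lowered = text.lower()
        for term in terms:
            if term in lowered:
                counts[term] += 1  # at most once per record
    return counts
```

Calling `keyword_frequencies(records, datetime(2025, 1, 6, 14), datetime(2025, 1, 6, 15))` would return the per-keyword counts for a single 14:00-15:00 window.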

[0044] In other embodiments, the word segmentation table may also consist of three parts: a sensitive word segmentation table containing words that require key monitoring, such as those related to core business and critical modules; a general word segmentation table, divided into a specialized terminology table (related to business systems and error types) and a general vocabulary (general error description words); and a non-sensitive word segmentation table / weak-semantic word table, containing words with ambiguous meanings, irrelevant words, or words that cannot represent specific problems (e.g., "problem," "error," "check," "how," "will," etc.), used to filter noise in subsequent steps. Different word segmentation groups can correspond to different preset thresholds Z0; for example, sensitive word segmentation table Z0=2, general word segmentation table Z0=5, and non-sensitive word segmentation table / weak-semantic word table Z0=10. These preset thresholds can be used for subsequent screening of abnormal candidate keywords.
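
The mapping from word segmentation group to preset threshold Z0 could be held in a small lookup, sketched below. The Z0 values are the examples given in this paragraph; the group contents, the function names, and the rule that any word outside the sensitive and weak tables counts as general are assumptions made only for illustration.

```python
# Hypothetical group contents; only the Z0 values come from the text.
SENSITIVE_WORDS = {"payment", "login"}
WEAK_WORDS = {"problem", "error", "check", "how", "will"}
Z0_BY_GROUP = {"sensitive": 2, "general": 5, "weak": 10}

def group_of(keyword: str) -> str:
    """Return the word segmentation group a keyword belongs to."""
    if keyword in SENSITIVE_WORDS:
        return "sensitive"
    if keyword in WEAK_WORDS:
        return "weak"
    return "general"  # assumed default for all other keywords

def threshold_for(keyword: str) -> int:
    """Look up the preset threshold Z0 for a keyword."""
    return Z0_BY_GROUP[group_of(keyword)]
```

A lower Z0 for the sensitive group makes those keywords easier to flag, while the higher Z0 for weak words demands a much larger deviation.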

[0045] S220, based on the average frequency of the keyword in the corresponding historical preset time period and the pre-constructed target model, obtain the abnormal parameters of the keyword; wherein, the target model is a distribution model based on probability statistics.

[0046] For example, in a specific embodiment of this application, using "hours" as the basic time granularity, the average frequency (i.e., the expected value) of each keyword among multiple keywords within a preset period (e.g., the most recent week or a configurable historical period, such as the most recent four weeks) is dynamically calculated. For example, the average frequency of each keyword's historical occurrence during the preset period of "Monday 14:00-15:00" is calculated. Then, combined with a pre-built Poisson distribution model (as a specific example of the target model), the abnormal parameters of the keywords are obtained.

[0047] Specifically, the Poisson distribution model is constructed after analyzing historical log data. For example, by automatically segmenting and counting words according to a word segmentation table on the input historical log data, the historical average frequency of each keyword is calculated; a Poisson distribution model with parameter λ is then established based on the baseline expected value corresponding to this historical average frequency. Specifically, the historical average frequency is calculated as the average hourly frequency μ of the keyword over the entire historical period: μ = total historical occurrences of the keyword / (number of days in the historical period × 24). Combined with a time-weighted model, a weighted baseline expected value λ is generated for each keyword-time period pair: λ = μ × W_i, where W_i represents the weight coefficient corresponding to the hourly segment containing the keyword. Finally, a baseline model (e.g., a Poisson distribution model) is constructed using λ. This baseline model can be a lookup table with (keyword, hourly segment) as the key and λ as the value, or any other data structure that enables fast querying.
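
The baseline construction just described (μ as total occurrences divided by days × 24, then λ = μ × W per hourly segment, stored in a lookup table) might look like the following. The function name and argument shapes are hypothetical, and the weight coefficients are taken as a given input.

```python
def build_baseline(total_counts, days, hour_weights):
    """Build a lookup table {(keyword, hour_segment): lam}.

    total_counts: {keyword: total historical occurrences}
    days:         number of days in the historical period
    hour_weights: {hour_segment: weight coefficient W}
    """
    if days <= 0:
        raise ValueError("days must be positive")
    baseline = {}
    for keyword, total in total_counts.items():
        mu = total / (days * 24)  # average hourly frequency
        for segment, weight in hour_weights.items():
            baseline[(keyword, segment)] = mu * weight  # weighted expectation
    return baseline
```

A plain dictionary keyed by (keyword, hourly segment) gives constant-time queries, which matches the fast-lookup requirement stated above.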

[0048] S230, determine whether to generate alarm information based on the abnormal parameters and the frequency of occurrence.

[0049] For example, in a specific embodiment of this application, it is determined whether to generate alarm information based on the abnormal parameters and the frequency of occurrence of keywords obtained above.

[0050] The implementation process of S220~S230 is illustrated below.

[0051] In some embodiments of this application, S220 may include: taking the output of the target model as not less than the occurrence frequency as a constraint, using the average frequency as the input of the target model, and determining the occurrence probability value of the final output of the target model; wherein the occurrence probability value is used as the anomaly parameter.

[0052] For example, in a specific embodiment of this application, the condition that the output of the Poisson distribution model is not less than the occurrence frequency K (that is, X ≥ K) is used as the constraint, and λ = the average frequency is input into the Poisson distribution model, which outputs the probability value P = P(X ≥ K | λ) that satisfies this constraint.

[0053] In some embodiments of this application, S230 may include: S231, when the occurrence probability value is not less than the occurrence frequency, comparing the occurrence probability value with a preset significance parameter to determine abnormal candidate keywords; S232, filtering the abnormal candidate keywords to determine whether to generate the alarm information.

[0054] For example, in a specific embodiment of this application, the calculated P-value is compared with a preset significance level (as a specific example of a preset significance parameter, such as 0.05) to determine whether the keyword should be listed as an abnormal candidate keyword. Finally, the abnormal candidate keywords are filtered to determine whether to generate an alarm message.

[0055] In some embodiments of this application, S231 may include: if the occurrence probability value is less than the preset significance parameter, then the keyword is confirmed as the abnormal candidate keyword; if the occurrence probability value is greater than or equal to the preset significance parameter, then the keyword is not the abnormal candidate keyword.

[0056] For example, in a specific embodiment of this application, if the P-value is less than the significance level, the change in the frequency of the keyword is considered to be statistically significant rather than a random fluctuation, and thus it is marked as an abnormal candidate keyword; otherwise, it is not an abnormal candidate keyword.
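
The probability-based branch of S220 and S231 can be illustrated with the sketch below. It assumes that the occurrence probability value is the Poisson upper-tail probability P(X ≥ K) for mean λ, which is one reading of the constraint described above; the function names are hypothetical and 0.05 is the example significance level from the text.

```python
import math

def poisson_tail(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for x in range(1, k):
        term *= lam / x    # P(X = x) derived from P(X = x - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def is_abnormal_candidate(k: int, lam: float, alpha: float = 0.05) -> bool:
    """Flag the keyword when the tail probability is below the significance level."""
    return poisson_tail(k, lam) < alpha
```

With an expectation well below one occurrence per hour, even three occurrences give a tail probability far below 0.05, whereas two occurrences against an expectation near 1.4 do not.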

[0057] In some other embodiments of this application, S220 may include: when the target model follows a Poisson distribution, determining the standard deviation by the relationship between the average frequency and the variance of the Poisson distribution; wherein the standard deviation is used as the outlier parameter.

[0058] For example, in a specific embodiment of this application, when the target model follows a Poisson distribution, the variance equals the mean, that is, σ² = λ. From this, the standard deviation σ = √λ can be obtained. The standard deviation is used as the basis for subsequent calculations.

[0059] In some other embodiments of this application, S230 may include: S233, calculating the standard deviation, the average frequency, and the occurrence frequency to obtain the standard deviation deviation multiple; comparing the standard deviation deviation multiple with a preset threshold to determine abnormal candidate keywords; S234, filtering the abnormal candidate keywords to determine whether to generate the alarm information.

[0060] For example, in a specific embodiment of this application, the standard deviation deviation multiple Z (the number of standard deviations by which the occurrence frequency deviates from the average frequency) is calculated from the standard deviation σ, the average frequency λ and the occurrence frequency K using the following formula: Z = (K - λ) / σ. The keyword type is determined using the word segmentation table in order to set a preset threshold Z0. For example, if the keyword belongs to the general word segmentation table, Z0 = 5. After determining the preset threshold Z0, Z is compared with Z0. If Z is not less than Z0, the keyword is listed as an abnormal candidate keyword; otherwise, it is considered a normal keyword. Finally, the abnormal candidate keywords are filtered to determine whether to generate an alarm message.
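
A minimal sketch of S233, assuming the Poisson relationship in which the variance equals the mean so that the standard deviation is the square root of λ. The function names are hypothetical.

```python
import math

def deviation_multiple(k: float, lam: float) -> float:
    """Z = (K - lam) / sqrt(lam), using variance = mean for a Poisson model."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return (k - lam) / math.sqrt(lam)

def is_abnormal_candidate(k: float, lam: float, z0: float) -> bool:
    """A keyword is a candidate when Z is not less than the preset threshold Z0."""
    return deviation_multiple(k, lam) >= z0
```

Unlike the probability route, this check needs only one square root per keyword, at the cost of relying on a threshold chosen per word group.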

[0061] In some embodiments of this application, S232 or S234 may include: using a filter to screen the abnormal candidate keywords; if the abnormal candidate keywords belong to a first category of keywords, then confirm that the alarm information is not generated; if the abnormal candidate keywords do not belong to the first category of keywords, then confirm that the alarm information is generated.

[0062] For example, in a specific embodiment of this application, the alarm system uses pre-set filters or machine learning to dynamically identify a batch of words with ambiguous meanings, irrelevant information, or those that cannot characterize specific problems (such as "problem," "error," "check," etc.). These words are marked as weak words (as a specific example of the first category of keywords), while other words are non-weak words. Abnormal candidate keywords marked as weak words are filtered out and no longer trigger alarms; otherwise, alarm information is generated and relevant personnel are notified. This method can effectively reduce noise and improve the quality of alarm information. Subsequently, the direction of change (surge / decrease) and severity (characterized by Z-score) of the keyword list composed of non-weak words can be statistically analyzed to generate comprehensive alarm information.
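
The weak-word filtering and the assembly of comprehensive alarm information could be sketched as follows. The alarm structure, field names, and function name are hypothetical; the weak words are the examples named in this paragraph, and severity is represented by the absolute Z-score.

```python
WEAK_WORDS = {"problem", "error", "check"}  # examples named in the text

def build_alarm(candidates, weak_words=WEAK_WORDS):
    """Filter weak words and summarize the remaining candidates.

    candidates: {keyword: z_score}. Returns None when nothing survives filtering.
    """
    kept = {kw: z for kw, z in candidates.items() if kw not in weak_words}
    if not kept:
        return None  # only weak words were abnormal: no alarm
    ranked = sorted(kept.items(), key=lambda item: -abs(item[1]))
    return {
        "keywords": [
            {
                "keyword": kw,
                "direction": "surge" if z > 0 else "decrease",
                "severity": round(abs(z), 2),
            }
            for kw, z in ranked
        ]
    }
```

Sorting by absolute Z-score places the most severe keyword first, which is a presentation choice rather than something the text prescribes.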

[0063] It should be noted that the target model described above is based on a Poisson distribution model; in other embodiments, a normal distribution or a T-distribution can also be used for testing. When the historical frequency of a keyword is sufficiently large, its distribution can be approximated as a normal distribution according to the central limit theorem. Significance can also be determined by calculating the Z-value ((observed value - expected value) / standard deviation) and comparing it with the critical Z-value; control charts, such as C-charts (charts specifically for defect counting), can also be used to create a control chart for each keyword. When the current frequency exceeds the upper control limit (UCL) or lower control limit (LCL) calculated based on historical data, an alarm is triggered. Specific choices can be made based on the actual application scenario, and the embodiments in this application are not limited to this.
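
For the control-chart alternative, a c-chart conventionally places its limits three standard deviations from the mean count, with the standard deviation taken as the square root of the mean. The sketch below follows that convention; the 3-sigma width and the function names are assumptions, since the text only states that the limits are calculated from historical data.

```python
import math

def c_chart_limits(historical_counts):
    """Return (LCL, UCL) for a c-chart built from historical hourly counts."""
    if not historical_counts:
        raise ValueError("historical_counts must not be empty")
    c_bar = sum(historical_counts) / len(historical_counts)
    spread = 3 * math.sqrt(c_bar)          # conventional 3-sigma band
    return max(0.0, c_bar - spread), c_bar + spread

def out_of_control(current, historical_counts):
    """Trigger when the current count leaves the control band."""
    lcl, ucl = c_chart_limits(historical_counts)
    return current > ucl or current < lcl
```

For low-frequency keywords the lower limit is clamped to zero, so only surges can be detected in that case.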

[0064] The specific process of keyword analysis-based alarming in some embodiments of this application is described below by way of example with reference to Figure 3.

[0065] Please refer to Figure 3, which is a flowchart of a keyword-based alarm analysis method provided by some embodiments of this application.

[0066] As an example of an application scenario: An online service platform needs to monitor user question logs in real time to detect abnormal issues with specific functions or content. This scenario takes the user questions during the period from 13:00 to 14:00 as an example to demonstrate the complete process from raw data to triggering an alarm. Before executing the following method, the alarm system has established and maintained a word segmentation table group (including a weak word table, which contains words such as "how", "will"), and has constructed and continuously updated a dynamic baseline model at an hourly granularity (such as a Poisson distribution model). This model records the occurrence frequency of each keyword in the same historical period and integrates a time weight coefficient model.

[0067] The above process will be described below by way of example.

[0068] S310, Obtain the real-time log stream.

[0069] For example, the system received three user question texts between 13:00 and 14:00, which are respectively:

[0070] Log 1 (13:10): "The special effects weekly competition will freeze when clicked. It has been like this all morning. Please fix the weekly competition as soon as possible.";

[0071] Log 2 (13:36): "Why can't I enter the weekly competition? Urgent!"

[0072] Log 3 (13:50): "Why is the special effect gone? I can't participate in the weekly competition."

[0073] S320, Automatically segment the real-time log stream using the word segmentation table and count the word frequencies to obtain the occurrence frequencies of the keywords.

[0074] For example, using the word segmentation table, segment the log text, remove duplicates within a single log, and filter out weak words (such as "one", "already", "please", "how", "will", "urgent", "gone", "participate", "not"). Count the occurrence frequencies of all keywords after filtering in the current period, that is: weekly competition (K = 3), special effect (K = 2), click (K = 1), freeze (K = 1), morning (K = 1), as soon as possible (K = 1), fix (K = 1), can't enter (K = 1).
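
The counting in S320 can be reproduced with the sketch below. The per-log token lists are a hypothetical reconstruction of what the segmenter might output for the three logs (the original segmentation is not shown in the text); the weak words are those listed above. Duplicates are removed within each log before counting.

```python
from collections import Counter

WEAK_WORDS = {"one", "already", "please", "how", "will",
              "urgent", "gone", "participate", "not"}

# Hypothetical segmenter output for Log 1, Log 2 and Log 3.
SEGMENTED_LOGS = [
    ["special effect", "weekly competition", "will", "click", "freeze",
     "already", "one", "morning", "please", "as soon as possible", "fix",
     "weekly competition"],
    ["how", "can't enter", "weekly competition", "urgent"],
    ["special effect", "how", "gone", "not", "participate",
     "weekly competition"],
]

def count_keywords(segmented_logs, weak_words):
    """Deduplicate tokens within each log, drop weak words, then count."""
    counts = Counter()
    for tokens in segmented_logs:
        counts.update(set(tokens) - weak_words)
    return counts

frequencies = count_keywords(SEGMENTED_LOGS, WEAK_WORDS)
```

Because "weekly competition" appears twice in Log 1 but is deduplicated, it is counted once per log, giving K = 3 across the three logs.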

[0075] S330, Obtain the average frequency of the keyword in the corresponding historical preset period.

[0076] For example, query the historical baseline expectation value λ (i.e., the average frequency) of the above keywords from the baseline model. The construction of this baseline model includes a key time context correction, and its time weight coefficient model is shown in Figure 4. Figure 4 shows the weight coefficient W_i of the average data volume for each hour, calculated as W_i = (total historical data volume in this hourly period) / (average data volume per hour within the historical period). As shown in Figure 4, the weight coefficient W_14 for the period from 13:00 to 14:00 (corresponding to the 14th interval) is 1.6.
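One possible reading of the weight formula is sketched below: each hourly slot's historical total is divided by the mean of the totals over all slots, so that the weights average to 1. The function name `hourly_weights` and the volume figures are hypothetical; the figures were chosen only so that the 14th slot reproduces W_14 = 1.6, and they are not data from Figure 4.

```python
def hourly_weights(hourly_totals):
    """Return W_i = total_i / mean(totals) for each hourly slot."""
    if not hourly_totals:
        raise ValueError("hourly_totals must not be empty")
    mean_total = sum(hourly_totals) / len(hourly_totals)
    if mean_total == 0:
        raise ValueError("historical data volume is zero")
    return [total / mean_total for total in hourly_totals]


# Made-up historical totals for 24 hourly slots (illustration only).
totals = [140.0] * 24
totals[13] = 230.0  # 14th interval, i.e. 13:00 to 14:00

weights = hourly_weights(totals)
print(f"W_14 = {weights[13]:.2f}")
```

Under this reading, a slot with more traffic than average gets a weight above 1, which raises the expected keyword frequency for that slot and so reduces false alarms during busy hours.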

[0077] Obtain the average hourly frequency μ of each keyword throughout the historical period: μ_weekly competition = 0.179 times / hour, μ_special effect = 0.893 times / hour. Calculate the weighted historical baseline expectation value λ: λ_weekly competition = μ_weekly competition × W_14 = 0.179 × 1.6 ≈ 0.286; λ_special effect = μ_special effect × W_14 = 0.893 × 1.6 ≈ 1.43.
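The weighting step λ = μ × W can be reproduced with a few lines of code. The sketch below uses the values stated above; the function name `weighted_baseline` is a hypothetical helper introduced for illustration.

```python
def weighted_baseline(mu, weight):
    """Return the weighted baseline expectation: lambda = mu * W_i."""
    if mu < 0 or weight < 0:
        raise ValueError("mu and weight must be non-negative")
    return mu * weight


W_14 = 1.6  # weight coefficient for 13:00 to 14:00, from the text
mu = {"weekly competition": 0.179, "special effect": 0.893}

lam = {keyword: weighted_baseline(m, W_14) for keyword, m in mu.items()}
for keyword, value in lam.items():
    print(f"{keyword}: lambda = {value:.3f}")
```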

[0078] S340, calculate the standard deviation and the multiple of standard deviation deviation.

[0079] For example, for the keyword "weekly competition", the standard deviation is σ_weekly competition = √λ_weekly competition = √0.286 ≈ 0.535, and the multiple of standard deviation deviation is Z_weekly competition = (3 - 0.286) / 0.535 ≈ 5.07.

[0080] For the keyword "special effect", the standard deviation is σ_special effect = √λ_special effect = √1.43 ≈ 1.196, and the multiple of standard deviation deviation is Z_special effect = (2 - 1.43) / 1.196 ≈ 0.48.
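The S340 calculation relies on the Poisson property that the variance equals the mean, so the standard deviation is the square root of λ. A minimal sketch follows; the function names `poisson_std` and `z_score` are hypothetical, and the inputs are the rounded λ values from the text.

```python
import math


def poisson_std(lam):
    """Standard deviation of a Poisson distribution: sqrt(lambda)."""
    if lam <= 0:
        raise ValueError("lambda must be positive")
    return math.sqrt(lam)


def z_score(k, lam):
    """Deviation of the observed count k from lambda, in standard deviations."""
    return (k - lam) / poisson_std(lam)


z_weekly = z_score(3, 0.286)
z_effect = z_score(2, 1.43)
print(f"{z_weekly:.2f} {z_effect:.2f}")  # prints: 5.07 0.48
```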

[0081] S350, determine whether the multiple of standard deviation deviation is not less than the preset threshold. If so, execute S360; otherwise, end.

[0082] For example, compare the Z value with the preset threshold (set as Z0 = 5 in this embodiment).

[0083] S360, confirm that the keyword is an abnormal candidate keyword.

[0084] For example, Z_weekly competition ≈ 5.07 ≥ 5, so the deviation of the keyword "weekly competition" is determined to be statistically significant and the keyword is marked as an abnormal candidate keyword. Z_special effect ≈ 0.48 < 5, which does not reach the preset threshold and is regarded as normal fluctuation.
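The threshold comparison of S350 and S360 can be expressed compactly. The sketch below assumes the Z values have already been computed; `Z0` mirrors the preset threshold of this embodiment, and `abnormal_candidates` is a hypothetical name.

```python
Z0 = 5.0  # preset threshold used in this embodiment


def abnormal_candidates(z_values, threshold=Z0):
    """Return the keywords whose Z value is not less than the threshold."""
    return [keyword for keyword, z in z_values.items() if z >= threshold]


z_values = {"weekly competition": 5.07, "special effect": 0.48}
candidates = abnormal_candidates(z_values)
print(candidates)  # prints: ['weekly competition']
```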

[0085] S370, use the filter to confirm that the abnormal candidate keyword is not a weak keyword and generate an alarm message.

[0086] For example, the abnormal candidate "weekly competition" is compared again with the weak word table. "Weekly competition" is a term that explicitly refers to a specific business function and is not a weak word; therefore it passes the filter and an alarm message is generated. The alarm message includes the following:

[0087] Time slot: 13:00 - 14:00;

[0088] Abnormal signal: A statistically significant surge in the keyword "weekly competition".

[0089] Data support: The keyword has occurred 3 times so far, the expected value based on the historical baseline is 0.286 times, and the multiple of standard deviation deviation (Z value) is 5.07.

[0090] Recommendation: Please check the service modules related to "Weekly Competition" immediately.
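The filtering and message generation of S370 might look like the following sketch. The function name `build_alarm`, its parameters, and the exact message layout are assumptions made for illustration; the text only lists which fields the alarm message contains.

```python
from typing import Optional, Set


def build_alarm(keyword: str, count: int, lam: float, z: float,
                period: str, weak_words: Set[str]) -> Optional[str]:
    """Return an alarm message, or None if the keyword is a weak word."""
    if keyword in weak_words:
        return None  # weak words are filtered out and never raise an alarm
    return "\n".join([
        f"Time slot: {period}",
        f'Abnormal signal: statistically significant surge in "{keyword}".',
        f"Data support: {count} occurrences, baseline expectation "
        f"{lam:.3f}, Z-value {z:.2f}.",
        f'Recommendation: check the service modules related to "{keyword}".',
    ])


message = build_alarm("weekly competition", 3, 0.286, 5.07,
                      "13:00 - 14:00", weak_words={"how", "will"})
print(message)
```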

[0091] It should be noted that the above method can be a cyclical process, that is, performing alarm detection on different log streams in real time, or performing detection according to a set period. That is, if S350 is negative, it can return to S310 to perform alarm detection on the log stream in the next period; in another implementation, it can also end directly and restart from S310 in the next period; the embodiments of this application are not limited to this.

[0092] It is understood that, for the specific implementation of S310 to S370, reference can be made to the method embodiments provided above. To avoid repetition, detailed descriptions are omitted here.

[0093] As can be seen from the above embodiments, this application establishes a dynamic and accurate baseline for normal behavior through a mean model at hourly granularity, namely the Poisson distribution model. The Poisson distribution is naturally suited to describing the number of occurrences of random events per unit time, and its probability model automatically adapts to keywords with different base frequencies. For events with a small base, a more stringent significance level is required; for events with a large base, even small absolute changes may be captured. Furthermore, the p-value provides a quantifiable and objective statistical confidence level for "anomalies," thereby effectively distinguishing random noise from real signals. This application achieves higher detection accuracy and a lower false alarm rate without relying on heavy manual intervention, and is able to detect unknown threats.
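The p-value mentioned above can be read as the Poisson upper-tail probability P(X ≥ K) given the baseline λ, that is, the probability of seeing at least the observed count by chance. The sketch below computes it with the standard library only; the function name `poisson_tail` and the significance level `ALPHA = 0.01` are assumptions, since no specific significance parameter is given in this example.

```python
import math


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k < 0 or lam < 0:
        raise ValueError("k and lam must be non-negative")
    term = math.exp(-lam)  # P(X = 0)
    below = 0.0
    for i in range(k):
        below += term           # accumulate P(X = i)
        term *= lam / (i + 1)   # advance to P(X = i + 1)
    return max(0.0, 1.0 - below)


ALPHA = 0.01  # assumed significance parameter, not taken from the text

p_weekly = poisson_tail(3, 0.286)
p_effect = poisson_tail(2, 1.43)
print(p_weekly < ALPHA, p_effect < ALPHA)  # prints: True False
```

With these inputs the tail probability for "weekly competition" is on the order of 0.003, while that for "special effect" is around 0.4, so only the former would be flagged at the assumed level. For extremely small tails, subtracting from 1 loses precision, and a dedicated survival function would be preferable.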

[0094] Please refer to Figure 5, which illustrates the composition of a keyword analysis-based alarm device according to some embodiments of this application. It should be understood that this keyword analysis-based alarm device corresponds to the method embodiments described above and is capable of performing the various steps involved in the method embodiments. The specific functions of this keyword analysis-based alarm device can be found in the description above; detailed descriptions are omitted here to avoid repetition.

[0095] The keyword analysis-based alarm device of Figure 5 includes at least one software function module that can be stored in a memory or embedded in the keyword analysis-based alarm device in the form of software or firmware. The keyword analysis-based alarm device includes: an acquisition module 510, used to acquire the frequency of occurrence of keywords in the data stream to be processed within a preset time period; a model processing module 520, used to acquire the abnormal parameters of the keywords based on the average frequency of the keywords in the corresponding historical preset time period and a pre-built target model, wherein the target model is a distribution model based on probability statistics; and an alarm module 530, used to determine whether to generate alarm information based on the abnormal parameters and the frequency of occurrence.
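The three-module composition can be pictured as a small pipeline in which each module is a pluggable callable. The sketch below is purely illustrative: the class `KeywordAlarmDevice`, its field names, and the stub lambdas are hypothetical, and the acquisition stub simply returns the counts from the worked example instead of parsing a stream.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class KeywordAlarmDevice:
    acquire: Callable[[Iterable[str]], Dict[str, int]]  # acquisition module 510
    score: Callable[[str, int], float]                  # model processing module 520
    should_alarm: Callable[[str, int, float], bool]     # alarm module 530

    def run(self, stream: Iterable[str]) -> List[str]:
        """Return the keywords for which alarm information is generated."""
        alarms = []
        for keyword, count in self.acquire(stream).items():
            parameter = self.score(keyword, count)
            if self.should_alarm(keyword, count, parameter):
                alarms.append(keyword)
        return alarms


# Stub wiring with the values of the worked example (illustration only).
baseline = {"weekly competition": 0.286, "special effect": 1.43}
device = KeywordAlarmDevice(
    acquire=lambda stream: {"weekly competition": 3, "special effect": 2},
    score=lambda kw, k: (k - baseline[kw]) / baseline[kw] ** 0.5,
    should_alarm=lambda kw, k, z: z >= 5.0,
)
alarms = device.run([])
print(alarms)  # prints: ['weekly competition']
```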

[0096] Those skilled in the art will understand that, for the sake of convenience and brevity, the specific working process of the device described above corresponds to the process in the aforementioned method and will not be elaborated further here.

[0097] Some embodiments of this application also provide a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, can perform the operations of any of the methods provided in the above embodiments.

[0098] Some embodiments of this application also provide a computer program product, which includes a computer program, wherein the computer program, when executed by a processor, can perform the operations of any of the methods provided in the above embodiments.

[0099] As shown in Figure 6, some embodiments of this application provide an electronic device 600, which includes a memory 610, a processor 620, and a computer program stored in the memory 610 and executable on the processor 620. When the processor 620 reads the program from the memory 610 via a bus 630 and executes the program, it can implement the methods of any of the above embodiments.

[0100] Processor 620 can process digital signals and can include various computing architectures. For example, it can be a complex instruction set computer architecture, a reduced instruction set computer architecture, or an architecture that implements multiple instruction set combinations. In some examples, processor 620 can be a microprocessor.

[0101] The memory 610 can be used to store instructions executed by the processor 620 or data related to the execution of instructions. These instructions and / or data may include code for implementing some or all of the functions of one or more modules described in the embodiments of this application. The processor 620 of this disclosure embodiment can be used to execute the instructions in the memory 610 to implement the methods shown above. The memory 610 includes dynamic random access memory, static random access memory, flash memory, optical memory, or other memories well known to those skilled in the art.

[0102] Those skilled in the art will understand that Figure 6 is merely an example of an electronic device and does not constitute a limitation on the electronic device. The electronic device may include more or fewer components than shown, or a combination of certain components, or different components.

[0103] The above description is merely an embodiment of this application and is not intended to limit the scope of protection of this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of protection of this application. It should be noted that similar reference numerals and letters in the following figures indicate similar items; therefore, once an item is defined in one figure, it does not need to be further defined and explained in subsequent figures.

[0104] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

[0105] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

Claims

1. A method for alarm based on keyword analysis, characterized in that it comprises: obtaining the frequency of occurrence of keywords in the data stream to be processed within a preset time period; obtaining the abnormal parameters of the keyword based on the average frequency of the keyword in the corresponding historical preset time period and a pre-constructed target model, wherein the target model is a distribution model based on probability statistics; and determining, based on the abnormal parameters and the frequency of occurrence, whether to generate alarm information; wherein the step of obtaining the abnormal parameters of the keyword based on the average frequency of the keyword in the corresponding historical preset time period and the pre-constructed target model includes: using the constraint that the output of the target model is not less than the occurrence frequency, and taking the average frequency as the input of the target model, determining the occurrence probability value finally output by the target model, wherein the occurrence probability value is used as the abnormal parameter.

2. The method as described in claim 1, characterized in that the step of determining whether to generate alarm information based on the abnormal parameters and the frequency of occurrence includes: when the occurrence probability value is not less than the occurrence frequency, comparing the occurrence probability value with a preset significance parameter to determine abnormal candidate keywords; and filtering the abnormal candidate keywords to determine whether to generate the alarm information.

3. The method as described in claim 2, characterized in that the step of comparing the occurrence probability value with a preset significance parameter to determine abnormal candidate keywords includes: if the occurrence probability value is less than the preset significance parameter, confirming the keyword as the abnormal candidate keyword; and if the occurrence probability value is greater than or equal to the preset significance parameter, confirming that the keyword is not an abnormal candidate keyword.

4. The method as described in claim 1, characterized in that the step of obtaining the abnormal parameters of the keyword based on the average frequency of the keyword in the corresponding historical preset time period and the pre-constructed target model includes: when the target model follows a Poisson distribution, determining the standard deviation from the relationship between the average frequency and the variance of the Poisson distribution, wherein the standard deviation is used as the abnormal parameter.

5. The method as described in claim 4, characterized in that the step of determining whether to generate alarm information based on the abnormal parameters and the frequency of occurrence includes: calculating the standard deviation deviation factor from the standard deviation, the average frequency, and the frequency of occurrence; comparing the standard deviation deviation factor with a preset threshold to determine abnormal candidate keywords; and filtering the abnormal candidate keywords to determine whether to generate the alarm information.

6. The method according to any one of claims 2-3 and 5, characterized in that the step of filtering the abnormal candidate keywords to determine whether to generate the alarm information includes: filtering the abnormal candidate keywords; if an abnormal candidate keyword belongs to the first category of keywords, confirming that no alarm information will be generated; and if the abnormal candidate keyword does not belong to the first category of keywords, confirming that the alarm information will be generated.

7. A device for alarm based on keyword analysis, characterized in that it comprises: an acquisition module, used to acquire the frequency of occurrence of keywords in the data stream to be processed within a preset time period; a model processing module, used to obtain the abnormal parameters of the keyword based on the average frequency of the keyword in the corresponding historical preset time period and a pre-constructed target model, wherein the target model is a distribution model based on probability statistics; and an alarm module, used to determine whether to generate alarm information based on the abnormal parameters and the frequency of occurrence; wherein the model processing module is specifically used to determine the occurrence probability value finally output by the target model by taking the average frequency as the input of the target model, with the output of the target model being not less than the occurrence frequency as a constraint, wherein the occurrence probability value is used as the abnormal parameter.

8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, wherein the computer program, when executed by a processor, performs the method as described in any one of claims 1-6.

9. An electronic device, characterized in that the electronic device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, performs the method as described in any one of claims 1-6.