Data processing method, apparatus, device, medium, and product
By employing data processing methods based on format classification and pattern processing, combined with text similarity judgment, the problem of high workload and low accuracy in monitoring the interactive text content between the front-end and back-end has been solved, achieving efficient anomaly monitoring and improved user experience.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA MOBILE INFORMATION TECHNOLOGY CO LTD
- Filing Date
- 2022-12-15
- Publication Date
- 2026-06-19
AI Technical Summary
The existing text-based monitoring method for front-end and back-end interactions suffers from high workload, is prone to missed monitoring and false monitoring, and affects user satisfaction.
By acquiring the response type and response text of business interaction data, the format is classified and patterned to generate a first text list. The text similarity method is used to judge the similarity with the reference text list, and an alarm is issued when the similarity is lower than a preset value.
It enables effective monitoring of abnormal business interaction data, reduces development and maintenance workload, and improves the accuracy of monitoring results and user satisfaction.
Figure CN115905540B_ABST
Description
Technical Field
[0001] This application relates to the field of computer technology, and in particular to data processing methods, apparatus, devices, media and products.
Background Technology
[0002] Whether users encounter system errors and whether the returned results meet expectations during the use of Internet technology (IT) applications directly affects user satisfaction and can even lead to user churn. Therefore, IT application development and maintenance teams need to pay close attention to this.
[0003] Currently, user experience and satisfaction can be monitored by tracking the text content of front-end and back-end interactions. Common monitoring methods include developing standardized methods and manually setting keyword monitoring, but both involve a large workload.
[0004] Summary of the Application
[0005] This application provides a data processing method, apparatus, device, medium, and product that can reduce workload when monitoring text content of front-end and back-end interactions.
[0006] In a first aspect, embodiments of this application provide a data processing method, including:
[0007] obtaining business interaction data, as well as the response type and response body text of the business interaction data;
[0008] classifying the format of the response body text according to the response type to obtain the format type of the response body text;
[0009] performing pattern processing on the response body text of each format type to obtain a first text list, which is used to store the pattern-processed response body text;
[0010] determining the similarity between the first text list and a reference text list based on a text similarity method, where the text similarity method is related to the number of elements contained in the first text list, the elements are the pattern-processed response body texts, and the reference text list is used to store reference response body texts;
[0011] if the similarity is less than a preset similarity, issuing an alarm to notify the user that the business interaction data is abnormal.
[0012] In a second aspect, embodiments of this application provide a data processing apparatus, including:
[0013] The acquisition module is used to acquire business interaction data, as well as the response type and response text of the business interaction data;
[0014] The classification module is used to classify the format of the response body text according to the response type, and obtain the format type of the response body text;
[0015] The processing module is used to perform pattern processing on the response body text of each format type to obtain the first text list, which is used to store the pattern-processed response body text.
[0016] The determination module is used to determine the similarity between the first text list and the reference text list based on the text similarity method. The text similarity method is related to the number of elements contained in the first text list. The elements are the patterned response text. The reference text list is used to store the reference response text.
[0017] The alarm module is used to issue an alarm when the similarity is less than the preset similarity, so as to notify the user of abnormal business interaction data.
[0018] In a third aspect, embodiments of this application provide an electronic device, including:
[0019] a processor;
[0020] a memory for storing computer program instructions;
[0021] when the computer program instructions are executed by the processor, the method described in the first aspect is implemented.
[0022] In a fourth aspect, embodiments of this application provide a computer-readable storage medium having computer program instructions stored thereon, which, when executed by a processor, implement the method described in the first aspect.
[0023] In a fifth aspect, embodiments of this application provide a computer program product in which instructions, when executed by a processor of an electronic device, cause the electronic device to perform the method described in the first aspect.
[0024] The data processing method provided in this application involves acquiring business interaction data, as well as the response type and response text of the business interaction data; classifying the format of the response text according to the response type to obtain the format type of the response text; then performing pattern processing on the response text of each format type to obtain a first text list; and determining the similarity between the first text list and a reference text list according to a text similarity method, wherein the text similarity method is related to the number of response texts contained in the first text list. In other words, this application embodiment obtains the first text list through classification and pattern processing, and executes an alarm strategy based on the similarity between the first text list and the reference text list, achieving effective monitoring of abnormal business interaction data without the need for extensive development of standardized methods or manual setting of numerous keywords, thus reducing workload.
Description of the Drawings
[0025] To more clearly illustrate the technical solutions of the embodiments of this application, the drawings used in the embodiments of this application will be briefly introduced below. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0026] Figure 1 is a flowchart of a data processing method provided in an embodiment of this application;
[0027] Figure 2 is a flowchart of another data processing method provided in an embodiment of this application;
[0028] Figure 3 is a structural diagram of a data processing apparatus provided in an embodiment of this application;
[0029] Figure 4 is a structural diagram of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0030] The features and exemplary embodiments of various aspects of this application will now be described in detail. To make the objectives, technical solutions, and advantages of this application clearer, the application will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain this application and not to limit it. For those skilled in the art, this application can be implemented without some of these specific details. The following description of the embodiments is provided merely to give a better understanding of this application by illustrating examples of it.
[0031] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes said element.
[0032] As mentioned above, whether users encounter system errors and whether the returned results meet expectations during the use of Internet Technology (IT) applications directly affects user satisfaction and can even lead to user churn. Therefore, IT application development and maintenance teams need to pay close attention to this.
[0033] Early IT application systems based on a browser/server (B/S) architecture directly fed back Hypertext Transfer Protocol (HTTP) status codes to the front-end user. For example, a 500 status code indicated an internal server error. Monitoring backend system errors was very easy at that time; simply checking the HTTP status code was sufficient. However, this approach had significant drawbacks. For instance, it exposed backend anomalies to the user, resulting in a poor user experience. It also posed security risks and could not effectively handle business restrictions that were unrelated to system errors, such as certain service plans restricting access to specific services.
[0034] Therefore, modern B/S-based IT application systems uniformly return an HTTP status code of 200 to the front end and include the results returned by each system in the response body. Over the years, IT application systems developed by different vendors have come to use significantly different front-end and back-end technologies. For example, some dynamically generate HyperText Markup Language (HTML) on the back end for the front end, while others receive Extensible Markup Language (XML) or JavaScript Object Notation (JSON) content from the back end, which is then processed and displayed on the front end using JavaScript. This makes it extremely difficult to assess user experience by monitoring the text content of the front-end and back-end interactions.
[0035] In response to this situation, commonly used monitoring methods include:
[0036] (1) Standardize the backend response through development: For example, define various error codes, including specific error messages such as errormsg, and then monitor them in a standardized manner through code logs, log collection systems or other full log collection methods.
[0037] (2) Keyword judgment by setting the response content: It is divided into two categories: success keyword judgment and failure keyword judgment. The success keyword is set as the keyword that should be included in the returned content when the business query or processing is successful. If the actual returned content does not contain the keyword, the business operation is judged to have failed. The failure keyword is the opposite.
[0038] By developing a standardized approach to backend responses, relatively standardized and regulated monitoring can be achieved. However, the cost of modification is high, the coupling between development and monitoring is high, and the monitoring method needs to be constantly improved when upgrading or adding new features.
[0039] Using keyword-based judgments on response content requires manually defining keywords for each business function, resulting in a massive workload. Furthermore, this definition method rarely encompasses all keyword definitions. In real-world business environments, application system anomalies or failures are varied, and each application system update may necessitate changes to the monitored keywords, leading to a significant maintenance workload.
[0040] In summary, both standardized development and manual keyword monitoring methods involve a significant amount of development and maintenance work, and are prone to issues such as missed or incorrect monitoring, which can affect user satisfaction and even cause user churn.
[0041] Therefore, embodiments of this application provide a data processing method, apparatus, device, medium, and product that can reduce workload and decrease the occurrence of missed monitoring and false monitoring when monitoring text content of front-end and back-end interactions.
[0042] The data processing method provided in this application will be described below with reference to specific embodiments. Figure 1 is a flowchart illustrating a data processing method provided in an embodiment of this application. The method can be applied to electronic devices, including but not limited to mobile phones, tablets, laptops, and PDAs.
[0043] As shown in Figure 1, the data processing method may include the following steps:
[0044] S110. Obtain business interaction data, as well as the response type and response text of the business interaction data.
[0045] S120. Classify the format of the response body text according to the response type to obtain the format type of the response body text.
[0046] S130. Perform pattern processing on the response body text of each format type to obtain the first text list.
[0047] The first text list is used to store the response body text after pattern processing.
[0048] S140. Determine the similarity between the first text list and the reference text list based on the text similarity method.
[0049] The text similarity method is related to the number of elements contained in the first text list. The elements are the patterned response text, and the reference text list is used to store reference response text.
[0050] S150. If the similarity is less than the preset similarity, issue an alarm to alert the user that the business interaction data is abnormal.
[0051] The data processing method provided in this application involves acquiring business interaction data, as well as the response type and response text of the business interaction data; classifying the format of the response text according to the response type to obtain the format type of the response text; then performing pattern processing on the response text of each format type to obtain a first text list; and determining the similarity between the first text list and a reference text list according to a text similarity method, wherein the text similarity method is related to the number of response texts contained in the first text list. In other words, this application embodiment obtains the first text list through classification and pattern processing, and executes an alarm strategy based on the similarity between the first text list and the reference text list, achieving effective monitoring of abnormal business interaction data without the need for extensive development of standardized methods or manual setting of numerous keywords, thus reducing workload.
[0052] The above steps are explained in detail below:
[0053] In S110, business interaction data can be data containing business request and response content, where the response content is the actual business response content. For example, business interaction data can be obtained in the following way:
[0054] Obtain interaction data;
[0055] The interaction data is filtered based on the response status code to obtain the interaction data with the preset response status code. The interaction data with the preset response status code is then identified as the business interaction data.
[0056] For example, the full interaction data can be collected by obtaining network traffic through a switch or server, or it can be collected through logs. The interaction data can be based on the HTTP protocol or other protocols; this embodiment uses the HTTP protocol as an example. The interaction data can be, for example, interaction data between a server and a client.
[0057] The embodiments of this application calculate subsequent text similarity based on the full volume of real user-experience interaction data, which allows anomalies in the customer experience process to be identified more accurately.
[0058] Considering that the full-volume interaction data may contain content with various system-level response anomalies that is irrelevant to the actual business, this part of the content can be filtered out of the full-volume interaction data to obtain the response content related to the actual business.
[0059] For example, the interaction data can be filtered according to the response status code to obtain the interaction data whose response status code is the preset response status code, and this interaction data is determined as the business interaction data. The response status code can be obtained from the first line of the response text. The preset response status code can be a status code whose response content is related to the actual business. For example, the preset response status code can be 200; that is, the interaction data with a response status code other than 200 can be filtered out to obtain the business interaction data, so as to avoid the influence of error status codes on the judgment of the response content.
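The status-code filtering described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Interaction` container, its field names, and the function names are hypothetical, and the status line is assumed to follow the usual HTTP form such as "HTTP/1.1 200 OK".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interaction:
    """One captured request/response pair (hypothetical container)."""
    request_line: str
    raw_response: str  # full response text, status line first


def status_code(raw_response: str) -> Optional[int]:
    """Read the status code from the first line of the response text."""
    first_line = raw_response.split("\n", 1)[0].strip()
    parts = first_line.split()
    if len(parts) >= 2 and parts[1].isdigit():
        return int(parts[1])
    return None  # status line missing or malformed


def filter_business_interactions(
    interactions: List[Interaction], preset_status: int = 200
) -> List[Interaction]:
    """Keep only interactions whose status code equals the preset one."""
    return [i for i in interactions if status_code(i.raw_response) == preset_status]
```

Interactions whose status line cannot be parsed are dropped here; that choice is an assumption of this sketch.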
[0060] The response type is the basis on which the browser subsequently determines the content returned by the backend. For example, the response type can be obtained from the Content-Type field in the HTTP response header. For example, the response type can include at least one of text/plain, text/html, application/json, application/xml, and other types.
[0061] In S120, the format type of the response body text usually includes at least one of unformatted, HTML format, JSON format, and XML format.
[0062] For example, the format of the response body text can be classified according to the response type. If the response type is application/json, application/xml, or text/html, it can be determined that the corresponding format type of the response body text is JSON format, XML format, or HTML format, respectively. In actual application, the response type can be checked in the order application/json, application/xml, text/html. Of course, other orders can also be used, which is not limited in the embodiments of this application.
[0063] If the response type is text/plain or another type, the format type can be judged from the response body text itself. The judgment rules can be applied in the following order: if the response body text starts with { or [, it is in JSON format; if the response body text starts with <xml, it is in XML format; if the response body text starts with <, it is in HTML format; any remaining response body text is general text, that is, its format type is unformatted. Of course, the rules can also be executed in other orders, which is not limited in the embodiments of this application. No matter which order is executed, it does not affect the final judgment result.
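The two-stage classification in S120 can be sketched as follows. The function name and the returned labels are hypothetical. Three details are assumptions added for illustration rather than taken from the text: parameters such as "; charset=utf-8" are stripped from the Content-Type value, leading whitespace in the body is ignored, and the standard "<?xml" declaration prefix is accepted in addition to the "<xml" prefix named above.

```python
def classify_format(response_type: str, body: str) -> str:
    """Return 'json', 'xml', 'html', or 'unformatted' for a response body."""
    # Stage 1: decide from the response type when it is conclusive.
    media_type = response_type.split(";", 1)[0].strip().lower()
    by_type = {
        "application/json": "json",
        "application/xml": "xml",
        "text/html": "html",
    }
    if media_type in by_type:
        return by_type[media_type]

    # Stage 2: text/plain or any other type, so inspect the body prefix
    # in the order given in the description.
    text = body.lstrip()  # assumption: leading whitespace is not significant
    if text.startswith(("{", "[")):
        return "json"
    if text.startswith(("<xml", "<?xml")):  # "<?xml" is an added assumption
        return "xml"
    if text.startswith("<"):
        return "html"
    return "unformatted"
```

In this sketch the XML prefix must be tested before the generic "<" prefix, since every body that starts with "<xml" also starts with "<".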
[0064] In S130, the response body text is typical machine response content, meaning that response body text in the various formats may contain many control elements unrelated to user perception, such as page frames and formatting control tags. Taking a successful operation as an example, the HTML-format response body text can consist of the text "Operation successful" enclosed between a start tag and an end tag, where the start and end tags are control elements that are irrelevant to user perception.
[0065] The purpose of pattern processing is to remove these control elements, reduce their interference with text similarity judgment, and improve the accuracy of similarity results.
[0066] The first text list is used to store the pattern-processed response body text. All pattern-processed response body texts are stored in the same text list, namely the first text list.
[0067] For example, S130 above may include the following steps:
[0068] For unformatted response body text, extract the response body text and store it in the first text list;
[0069] For the HTML-formatted response body text, extract the content between the start and end tags and store it in the first text list;
[0070] For the response body text in JSON format, extract the value and store it in the first text list;
[0071] For the response body text in XML format, extract the attribute values and tag content, and store them in the first text list.
[0072] Different format types can be processed using different patterning methods. For example, for unformatted response body text, the response body text can be extracted directly and stored as a single element in the first text list.
[0073] For the response body text in HTML format, the innerHTML content of all standard HTML tags, i.e., the actual user-visible content, is selected. For example, for the text "Operation successful" enclosed between a start tag and an end tag, only "Operation successful" is extracted and stored as one element in the first text list. In this embodiment, the text content between a start tag and its end tag is treated as a single element. In practical applications, the HTML response body text can contain multiple elements.
[0074] The JSON format is in key-value form. In this embodiment of the application, for the response body text in JSON format, only the content of the value is extracted and stored in the first text list, with one value content corresponding to one element.
[0075] XML format includes attributes and content. For the response body text in XML format, only the attribute values and XML tag content can be extracted and stored in the first text list. For example, for <phone type="home">206-555-0144</phone>, "type" is an attribute, "home" is the value of the attribute, and 206-555-0144 is the content of the XML tag.
[0076] In this embodiment of the application, after classifying the format of the response body text, the response body text of different formats is subjected to pattern processing to remove control elements that are irrelevant to user perception, so as to obtain text containing only actual content, which provides a basis for subsequent similarity processing.
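The per-format pattern processing of S130 can be sketched with the Python standard library as follows. The function and class names are hypothetical. The handling of nested JSON (walking into objects and arrays and keeping only leaf values) and the stripping of surrounding whitespace are assumptions of this sketch, since the text does not specify them.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text found between tags; the tags themselves are dropped."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(text)


def _json_values(node):
    """Yield leaf values only; keys are treated as control elements."""
    if isinstance(node, dict):
        for value in node.values():
            yield from _json_values(value)
    elif isinstance(node, list):
        for item in node:
            yield from _json_values(item)
    else:
        yield str(node)


def pattern_process(body: str, format_type: str) -> list:
    """Return the first text list for one response body."""
    if format_type == "html":
        collector = _TextCollector()
        collector.feed(body)
        collector.close()
        return collector.texts
    if format_type == "json":
        return list(_json_values(json.loads(body)))
    if format_type == "xml":
        elements = []
        for node in ET.fromstring(body).iter():
            elements.extend(node.attrib.values())  # attribute values
            if node.text and node.text.strip():
                elements.append(node.text.strip())  # tag content
        return elements
    return [body]  # unformatted: the whole text is a single element
```

For instance, `pattern_process('<phone type="home">206-555-0144</phone>', "xml")` yields the two elements "home" and "206-555-0144", matching the XML example above.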
[0077] In S140, there are many commonly used text similarity methods, such as cosine similarity, edit distance, and Jaccard similarity coefficient. These methods generally require word segmentation, and then the root words are used to calculate the relevant similarity. Some algorithms also require semantic analysis to analyze the text similarity in natural language.
[0078] Considering that the response content in IT application systems is generated by code, this application embodiment adopts a text similarity method (also known as a machine composite text similarity method) that is related to the number of elements contained in the first text list. That is, the text similarity method used is different depending on the number of elements contained in the first text list. In this way, an appropriate text similarity method can be selected according to the number of elements in the first text list. In addition, this text similarity method combines the characteristics of machine response text, which can improve the accuracy of similarity results.
[0079] The reference text list is used to store reference response body texts. The reference response body texts represent the texts that respond normally to the corresponding business request. When the response body texts of subsequent requests for the same business request are compared with the reference response body texts, i.e., the normal response body texts, the lower the similarity, the less similar they are to the normal response body texts. Based on the similarity, it can be determined whether the response is abnormal.
[0080] For details on the process of obtaining the response text and the text similarity method, please refer to the following examples.
[0081] In S150, the smaller the similarity (between 0 and 1), the less similar it is to the normal response text. For example, when the similarity is less than a preset similarity, an alarm can be issued to indicate that the user's business interaction data is abnormal. For example, the preset similarity can be set to 0.5, and the specific value can be set according to actual needs. This application embodiment does not limit the specific value.
[0082] For example, when issuing an alarm, it can be based on a single instance of similarity being lower than a preset similarity, or it can be based on an average similarity being lower than a preset similarity over a period of time, such as 5 minutes.
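The two alarm triggers mentioned above, a single low similarity or a low average over a time window, can be sketched as follows. The function names and the `(timestamp_seconds, similarity)` sample layout are hypothetical; the 0.5 threshold and the 5-minute (300-second) window are the example values from the text.

```python
from statistics import mean


def single_alarm(similarity: float, preset_similarity: float = 0.5) -> bool:
    """Alarm as soon as one similarity falls below the preset value."""
    return similarity < preset_similarity


def window_average_alarm(samples, now, window_seconds=300, preset_similarity=0.5):
    """Alarm when the mean similarity inside the recent window is too low.

    `samples` is an iterable of (timestamp_seconds, similarity) pairs.
    """
    recent = [s for t, s in samples if now - window_seconds <= t <= now]
    if not recent:
        return False  # assumption: no data in the window means no alarm
    return mean(recent) < preset_similarity
```

The windowed form tolerates an isolated low value, at the cost of reacting later than the single-instance form.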
[0083] For example, when an alarm occurs, the request identifier and user identifier corresponding to the business interaction data can be displayed. This can effectively distinguish various unexpected faults in the user experience process, and at the same time, it can help maintenance personnel locate the cause of the alarm, make timely rectifications, and improve user satisfaction.
[0084] For example, when an alarm occurs, the alarm data can be stored in a unified alarm platform to facilitate subsequent alarm statistical analysis and improve system performance.
[0085] The request identifier can be obtained through a hash operation. For example, a hash operation can be performed based on the request path and parameters related to the business request to obtain a hash value. This hash value can then be used as the request identifier for the business request to uniquely identify it.
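A request identifier of this kind can be sketched as follows. The text does not name a hash function or say how the path and parameters are combined, so SHA-256, the sorted `key=value` canonical form, and the function name are all assumptions made for illustration.

```python
import hashlib


def request_identifier(path: str, business_params: dict) -> str:
    """Hash the request path plus business-related parameters."""
    # Sort the keys so the same request always yields the same identifier.
    canonical = path + "?" + "&".join(
        f"{key}={business_params[key]}" for key in sorted(business_params)
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Whether parameter values or only parameter names should enter the hash depends on which requests are meant to share one reference text list; this sketch includes the values.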
[0086] The user identifier can be selected from client_ip, x_forwarded_for, and account. For example, it can be selected in descending order of priority, such as account, x_forwarded_for, and client_ip; only one can be selected. Here, client_ip is the IP address of the client based on the Transmission Control Protocol (TCP); x_forwarded_for is the user's original IP address, which can be obtained from the HTTP request header; and account can be obtained from cookies or request parameters in the business request, depending on the application definition.
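The priority-based choice of user identifier can be sketched as follows. The function name is hypothetical, and the three arguments are assumed to have already been extracted from the cookie or request parameters, the HTTP request header, and the TCP connection, respectively.

```python
from typing import Optional


def user_identifier(
    account: Optional[str] = None,
    x_forwarded_for: Optional[str] = None,
    client_ip: Optional[str] = None,
) -> Optional[str]:
    """Return the first available identifier in descending priority."""
    for candidate in (account, x_forwarded_for, client_ip):
        if candidate:  # skips both None and empty strings
            return candidate
    return None
```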
[0087] The text similarity method of this application will be described below through specific embodiments. As shown in Figure 2, the data processing method may include the following steps:
[0088] S210. Obtain business interaction data, as well as the response type and response text of the business interaction data.
[0089] S220. Classify the format of the response body text according to the response type to obtain the format type of the response body text.
[0090] S230. Perform pattern processing on the response body text of each format type to obtain the first text list.
[0091] S240. Determine whether the number of elements contained in the first text list is greater than or equal to a preset threshold. If yes, execute S250; otherwise, execute S280.
[0092] S250. For each element in the first text list, determine whether there is an element in the reference text list that is the same as the element.
[0093] S260. Based on the determined results, determine the similarity between the element and the reference text list.
[0094] S270. The average similarity of each element is determined as the similarity between the first text list and the reference text list.
[0095] S280. For each character of the first element, determine whether there exists a character identical to the character in each element of the reference text list.
[0096] The first element is any element in the first text list.
[0097] S290. Based on the determined results, determine the first similarity between the first element and the reference text list.
[0098] S2100. The mean of the first similarity values corresponding to each first element in the first text list is determined as the similarity between the first text list and the reference text list.
[0099] S2110. If the similarity is less than the preset similarity, issue an alarm to notify the user that the business interaction data is abnormal.
[0100] Steps S210-S230 and S2110 are the same as steps S110-S130 and S150, respectively; see the descriptions of S110-S130 and S150 for details. The other steps are described in detail below:
[0101] In S240, taking the JSON-formatted response body text as an example, each element represents a certain attribute, such as a user profile query. The more elements there are, the more user attribute dimensions are included, which also means that different users will have more identical elements when querying. When there are many elements, there will be more characters appearing. If single-character comparison is used, even two completely dissimilar texts are more likely to find the same characters, thus affecting the final similarity result.
[0102] Based on the above considerations, for example, when the number of elements in the first text list is greater than or equal to a preset threshold, an element-based equality judgment method can be used, i.e., a comparison method based on single-element equality, to determine the similarity between the first text list and the reference text list. Since the response text is machine-generated text, not natural language, it has a certain degree of randomness. Therefore, the effective information of a single identical element is clearly deterministic; it is either identical or completely inconsistent, rarely resulting in string inclusion. This improves the accuracy of the similarity results. The preset threshold can be set according to actual needs, for example, it can be set to 5.
[0103] For example, when the number of elements in the first text list is less than a preset threshold, a character-level inclusion judgment method can be used, that is, the similarity between the first text list and the reference text list can be determined by comparing single characters.
[0104] The embodiments of this application employ an element-based equality judgment method or a character-level inclusion judgment method based on the number of elements in the first text list. This can effectively distinguish various unexpected faults in the user experience process without requiring extensive development or manual setting of a large number of keywords. This reduces workload and avoids phenomena such as missed monitoring or false monitoring caused by a large workload, thereby improving the accuracy of monitoring results.
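The choice made in S240 can be sketched as a simple dispatch on the number of elements. The function name and the returned labels are hypothetical; the default threshold of 5 is the example value given above.

```python
def select_similarity_method(first_text_list, preset_threshold: int = 5) -> str:
    """Pick the comparison strategy from the number of elements (S240)."""
    if len(first_text_list) >= preset_threshold:
        return "element_equality"      # continue with S250-S270
    return "character_inclusion"       # continue with S280-S2100
```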
[0105] In S250, when the number of elements in the first text list is greater than or equal to a preset threshold, an element-based equality judgment method can be used. Specifically, for each element in the first text list, it can be determined whether there is an element in the reference text list that is the same as (also called equal to) that element. Based on the result, the similarity between that element and the reference text list can be determined, and thus the similarity between the first text list and the reference text list can be obtained.
[0106] In S260, for each element in the first text list, if there is an element identical to that element in the reference text list, the similarity between that element and the reference text list can be denoted as $S_i = 1$; otherwise, $S_i = 0$. Here, $S_i$ is the similarity corresponding to the i-th element of the first text list.
[0107] For example, for the element "Operation successful" in the first text list, if the element "Operation successful" also exists in the reference text list, then the similarity between the element "Operation successful" in the first text list and the reference text list is recorded as 1; otherwise, it is recorded as 0. In this way, the similarity between each element and the reference text list can be obtained.
[0108] In S270, exemplarily, the average value AVG of the similarities corresponding to the elements, $\mathrm{AVG} = (S_1 + S_2 + S_3 + \dots + S_n) / n$, can be determined as the similarity between the first text list and the reference text list. Here, n is the number of elements in the first text list and, exemplarily, n is greater than or equal to 5.
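The element-based equality judgment of S250 to S270 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiments; the function name `element_equality_similarity` and the choice to reject an empty list are assumptions of the sketch.

```python
from typing import List


def element_equality_similarity(first: List[str], reference: List[str]) -> float:
    """Return AVG = (S_1 + ... + S_n) / n, where S_i is 1 if the i-th element
    of `first` appears verbatim in `reference`, and 0 otherwise."""
    if not first:
        # Assumption of this sketch: an empty first text list is an error.
        raise ValueError("first text list must not be empty")
    reference_set = set(reference)  # constant-time membership test per element
    scores = [1 if element in reference_set else 0 for element in first]
    return sum(scores) / len(scores)
```

With five elements of which three also appear in the reference text list, the sketch yields 3/5.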
[0109] In the embodiments of this application, when the number of elements in the first text list is greater than or equal to the preset threshold, an element-level equality judgment against the reference text list is adopted to determine the similarity between the first text list and the reference text list. This avoids the overly loose character-inclusion strategy when the response body contains many text elements, thus improving the accuracy of the similarity result.
[0110] For example, suppose the number of elements in the first text list is large and the list contains the element "Operation successful" (操作成功). If the comparison were made character by character, then for the character "操", a reference text list containing the element "体操" would yield a match, and the similarity of "操" would be recorded as 1, even though "Operation successful" and "体操" are not the same text. Adopting the element-level equality judgment described above when the number of elements is large avoids this loose character-inclusion strategy and improves the accuracy of the similarity result.
[0111] In S280, when the number of elements in the first text list is small, a character-level inclusion judgment method can be adopted. Exemplarily, for each character of each element, it can be determined whether there is a character in each element of the reference text list that is the same as this character. For example, for the element "你本次充值50.00元", which contains 11 characters in total, for each character, it can be searched in each element of the reference text list whether there is this character.
[0112] In S290, exemplarily, if this character exists in a certain element of the reference text list, the similarity of this character can be recorded as 1, otherwise it is recorded as 0. For example, for "你本次充值50.00元", for each character, it can be determined whether there is the same character in the first element of the reference text list. If it exists, the similarity is recorded as 1, otherwise the similarity is recorded as 0. Thus, the similarity between the element "你本次充值50.00元" and the first element of the reference text list can be obtained. Similarly, the similarity between the element "你本次充值50.00元" and other elements of the reference text list can be obtained.
[0113] Exemplarily, $S_1 = \max(S_{1\text{-}1}, S_{1\text{-}2}, \dots, S_{1\text{-}m})$, where m is the number of elements in the reference text list and $S_{1\text{-}m}$ represents the similarity between the first element of the first text list and the m-th element of the reference text list. In this way, $S_1, S_2, \dots, S_n$ can be obtained, where n is the number of elements contained in the first text list and n is less than 5.
[0114] In S2100, after the similarity of each element is determined, for example, the mean of the similarities of the elements can be determined as the similarity between the first text list and the reference text list.
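The character-level inclusion judgment of S280 to S2100 can be sketched as follows. The text does not state how the 0/1 character scores are combined into the similarity between one element and one reference element; this sketch assumes their mean. The function names are hypothetical.

```python
from typing import List


def char_inclusion_score(element: str, ref_element: str) -> float:
    """Fraction of characters of `element` that also occur in `ref_element`.
    Averaging the 0/1 character scores is an assumption of this sketch."""
    if not element:
        return 0.0
    ref_chars = set(ref_element)
    return sum(1 for ch in element if ch in ref_chars) / len(element)


def char_inclusion_similarity(first: List[str], reference: List[str]) -> float:
    """Mean over elements of S_i = max_j S_{i-j}, following S280 to S2100."""
    if not first:
        raise ValueError("first text list must not be empty")
    if not reference:
        return 0.0  # nothing to compare against
    per_element = [
        max(char_inclusion_score(element, ref) for ref in reference)
        for element in first
    ]
    return sum(per_element) / len(per_element)
```

For the element "你本次充值50.00元" compared with a reference element "你本次充值30.00元", ten of the eleven characters are found, so under the averaging assumption the element scores 10/11 even though the amounts differ.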
[0115] In this embodiment of the application, when the number of elements in the first text list is small, i.e., the amount of information is small, the character-level inclusion judgment can avoid misjudgments in which otherwise normal responses are treated as unequal merely because they contain personalized data, such as a user-specific amount.
[0116] This application embodiment collects interaction data during the user experience process, classifies the format of the response text according to the response type, and then performs pattern processing on different categories to remove control elements irrelevant to user perception. The similarity between the patterned response text and the reference text is determined by the machine composite text similarity method. This method takes into account the characteristics of machine response text, can adapt to response text of various formats and lengths, and can also adapt to a certain degree of change in text content. It does not require precise matching of keywords in the response text to judge abnormal business interaction data, making it more dynamic and accurate.
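The choice between the two judgment methods and the subsequent alarm decision can be sketched as follows. The threshold of 5 follows the exemplary values given above; the preset similarity of 0.8 and all names are hypothetical. The two similarity methods are passed in as callables so that the sketch stays independent of any particular implementation of them.

```python
from typing import Callable, List

SimilarityFn = Callable[[List[str], List[str]], float]


def is_abnormal(
    first: List[str],
    reference: List[str],
    element_method: SimilarityFn,
    char_method: SimilarityFn,
    element_threshold: int = 5,      # exemplary threshold from the text
    preset_similarity: float = 0.8,  # hypothetical alarm threshold
) -> bool:
    """Pick the similarity method by the number of elements in `first`,
    then report whether the similarity falls below the preset similarity."""
    if len(first) >= element_threshold:
        method = element_method   # many elements: strict element equality
    else:
        method = char_method      # few elements: character-level inclusion
    return method(first, reference) < preset_similarity
```

A caller would issue the alarm whenever `is_abnormal` returns `True`.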
[0117] The process of obtaining the reference response text is explained below through specific examples.
[0118] In some embodiments, prior to S120, the data processing method may further include the following steps:
[0119] Based on the first hash value, the response body text corresponding to the first hash value is retrieved from the first data structure. The first hash value is the hash value corresponding to the request identifier of the business interaction data. The first data structure is used to store the hash value and the response body text corresponding to the hash value.
[0120] If no response body text corresponding to the first hash value is found in the first data structure, the first response body text corresponding to the first hash value in the business interaction data is used as the reference response body text and stored in the reference text list.
[0121] If a response text corresponding to the first hash value is found in the first data structure, the similarity between the found response text and the first response text is determined.
[0122] Among the found response texts, those whose similarity to the first response text is greater than or equal to the similarity threshold are stored as reference response texts in the reference text list.
[0123] Specifically, the process for determining the first hash value can be found in the above embodiments, and will not be repeated here. The first data structure is used to store the hash value and the corresponding response body text. For example, it can be stored in key-value format. For instance, the hash value corresponding to the request identifier can be used as the key, and the corresponding response body text can be used as the value. The response body text stored in the first data structure is the normal response body text.
[0124] Once the hash value of the current business request is determined, the first data structure can be searched to determine whether there is a hash value in the first data structure that is the same as the first hash value. In this way, it can be determined whether there is a response body text in the first data structure that corresponds to the current business request.
[0125] If no hash value matching the first hash value is found in the first data structure, it indicates that this is the first occurrence of the current business request. In this case, the first response body text corresponding to the first hash value in the business interaction data, i.e., the response body text of the current business request, can be stored as a reference response body text in the reference text list. Simultaneously, the hash value of the current business request can be used as the key, and the response body text as the value, and saved to the first data structure for comparison by subsequent business requests with the same hash value. The first response body text is the response body text corresponding to the first hash value in the business interaction data.
[0126] If a hash value identical to the first hash value is found in the first data structure, it means that the current business request has already occurred before. Assuming that the response body text was abnormal when the business request first occurred due to system or other reasons, using the first response body text as a reference would affect the accuracy of subsequent similarity results.
[0127] To avoid using the first response text as a reference response text when an anomaly occurs, this application embodiment can determine the similarity between the response text of the found same business request and the response text of the current business request (i.e., the first response text). Response texts among the found response texts whose similarity to the first response text is greater than or equal to a similarity threshold are stored as reference response texts in a reference text list.
[0128] For example, if the current business request is the fourth occurrence, meaning it has already occurred three times before, we can calculate the similarity between the response body text of the first three business requests and the response body text of the fourth business request, and use the response body text with the highest similarity as the reference response body text.
[0129] The process for determining the similarity between two response texts can be found in the above embodiments, and will not be repeated here for the sake of brevity.
[0130] In this embodiment of the application, when the current business request appears for the first time, the corresponding response body text can be directly used as the reference response body text. If it is not the first time it appears, the similarity between the response body text corresponding to the current business request and the response body text corresponding to the previous appearance can be determined, and the response body text with the highest similarity can be used as the reference response body text. This can avoid the situation where the first response body text is used as the reference response body text, but happens to be abnormal. This can improve the accuracy of the reference response body text.
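The lookup of reference response body text described in the preceding paragraphs can be sketched as follows. The first data structure is modeled as a dictionary from hash value to a list of earlier response body texts; SHA-256, the threshold of 0.8, and all names are assumptions of the sketch, and the similarity function is supplied by the caller.

```python
import hashlib
from typing import Callable, Dict, List


def request_hash(request_id: str) -> str:
    """Hash of the request identifier; SHA-256 is an assumption of this sketch."""
    return hashlib.sha256(request_id.encode("utf-8")).hexdigest()


def select_references(
    store: Dict[str, List[str]],
    request_id: str,
    current_body: str,
    similarity: Callable[[str, str], float],
    threshold: float = 0.8,  # hypothetical similarity threshold
) -> List[str]:
    """Return the reference response body texts for the current request."""
    key = request_hash(request_id)
    history = store.get(key)
    if history is None:
        # First occurrence: the current body becomes the reference and is
        # saved for comparison by later requests with the same hash value.
        store[key] = [current_body]
        return [current_body]
    # Seen before: keep only earlier bodies close enough to the current one.
    return [body for body in history if similarity(body, current_body) >= threshold]
```

The text does not specify how later responses are added to the first data structure, so the sketch leaves the store unchanged when the hash value is already present.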
[0131] Based on the same inventive concept, this application also provides a data processing apparatus. The data processing apparatus provided in the embodiments of this application is described in detail below with reference to Figure 3.
[0132] Figure 3 is a structural diagram of a data processing apparatus provided in an embodiment of this application.
[0133] As shown in Figure 3, the data processing apparatus may include:
[0134] The acquisition module 310 is used to acquire business interaction data, as well as the response type and response text of the business interaction data;
[0135] The classification module 320 is used to classify the format of the response body text according to the response type, and obtain the format type of the response body text;
[0136] Processing module 330 is used to perform pattern processing on the response body text of each format type to obtain a first text list, which is used to store the pattern-processed response body text.
[0137] The determination module 340 is used to determine the similarity between the first text list and the reference text list according to the text similarity method. The text similarity method is related to the number of elements contained in the first text list. The elements are the response body text after pattern processing. The reference text list is used to store the reference response body text.
[0138] The alarm module 350 is used to issue an alarm when the similarity is less than the preset similarity, so as to notify the user of abnormal business interaction data.
[0139] The data processing apparatus provided in this application embodiment acquires business interaction data, as well as the response type and response text of the business interaction data; classifies the format of the response text according to the response type to obtain the format type of the response text; then performs pattern processing on the response text of each format type to obtain a first text list; and determines the similarity between the first text list and a reference text list according to a text similarity method, wherein the text similarity method is related to the number of response texts contained in the first text list. In other words, this application embodiment obtains the first text list through classification and pattern processing, and executes an alarm strategy based on the similarity between the first text list and the reference text list, thus achieving effective monitoring of abnormal business interaction data without the need for extensive development of standardized methods or manual setting of numerous keywords, thereby reducing workload.
[0140] In some embodiments, the format type includes at least one of Hypertext Markup Language (HTML) format, JavaScript Object Notation (JSON) format, Extensible Markup Language (XML) format, and no format.
[0141] In some embodiments, the processing module 330 is specifically used for:
[0142] For unformatted response body text, extract the response body text and store it in the first text list;
[0143] For the HTML-formatted response body text, extract the content between the start and end tags and store it in the first text list;
[0144] For the response body text in JSON format, extract the value and store it in the first text list;
[0145] For the response body text in XML format, extract the attribute values and tag content, and store them in the first text list.
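The four extraction rules above can be sketched with the Python standard library as follows. This is an illustration under stated assumptions: the format type is passed as a hypothetical lowercase string, non-string JSON scalars are converted with `str()`, XML tail text is ignored, and HTML `script` or `style` content is not filtered out.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import Any, List


class _TextCollector(HTMLParser):
    """Collects the non-empty text found between HTML start and end tags."""

    def __init__(self) -> None:
        super().__init__()
        self.texts: List[str] = []

    def handle_data(self, data: str) -> None:
        text = data.strip()
        if text:
            self.texts.append(text)


def _json_values(node: Any) -> List[str]:
    """Recursively collect scalar values (not keys) as strings."""
    if isinstance(node, dict):
        return [v for child in node.values() for v in _json_values(child)]
    if isinstance(node, list):
        return [v for child in node for v in _json_values(child)]
    return [str(node)]


def pattern_process(body: str, format_type: str) -> List[str]:
    """Return the first text list for one response body text."""
    if format_type == "html":
        collector = _TextCollector()
        collector.feed(body)
        collector.close()
        return collector.texts
    if format_type == "json":
        return _json_values(json.loads(body))
    if format_type == "xml":
        texts: List[str] = []
        for elem in ET.fromstring(body).iter():
            texts.extend(elem.attrib.values())       # attribute values
            if elem.text and elem.text.strip():
                texts.append(elem.text.strip())      # tag content
        return texts
    return [body]  # no format: keep the response body text as a single element
```

Each branch discards markup that the user does not see and keeps only the text that reaches the similarity comparison.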
[0146] In some embodiments, when the number of elements contained in the first text list is greater than or equal to a preset threshold, the text similarity method includes an element-based equality judgment method.
[0147] The determination module 340 is specifically used for:
[0148] For each element in the first text list, determine whether there exists an element in the reference text list that is the same as the element;
[0149] Based on the determined results, determine the similarity between the element and the list of reference texts;
[0150] The average similarity of each element is used as the similarity between the first text list and the reference text list.
[0151] In some embodiments, when the number of elements contained in the first text list is less than a preset threshold, the text similarity method includes a character-level inclusion determination method;
[0152] The determination module 340 is specifically used for:
[0153] For each character in the first element, determine whether there is a character identical to that character in each element of the reference text list, where the first element is any element in the first text list;
[0154] Based on the determined results, determine the first similarity between the first element and the reference text list;
[0155] The average of the first similarities corresponding to each first element in the first text list is determined as the similarity between the first text list and the reference text list.
[0156] In some embodiments, the data processing apparatus may further include:
[0157] The lookup module is used to search for the response body text corresponding to the first hash value in the first data structure before the classification module 320 classifies the response body text according to the response type and obtains the format type of the response body text. The first hash value is the hash value corresponding to the request identifier of the business interaction data. The first data structure is used to store the hash value and the response body text corresponding to the hash value.
[0158] The determination module 340 is also used for:
[0159] If no response body text corresponding to the first hash value is found in the first data structure, the first response body text corresponding to the first hash value in the business interaction data is used as the reference response body text and stored in the reference text list.
[0160] If a response text corresponding to the first hash value is found in the first data structure, the similarity between the found response text and the first response text is determined.
[0161] Among the found response texts, those whose similarity to the first response text is greater than or equal to the similarity threshold are stored as reference response texts in the reference text list.
[0162] In some embodiments, the acquisition module 310 is specifically used for:
[0163] Obtain interaction data;
[0164] The interaction data is filtered based on the response status code to obtain the interaction data with the preset response status code. The interaction data with the preset response status code is then identified as the business interaction data.
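The filtering by response status code can be sketched as follows. The record layout (a dictionary with a `status` field) and the preset code 200 are hypothetical; the text does not name the preset response status code.

```python
from typing import Any, Dict, FrozenSet, Iterable, List


def filter_business_interactions(
    records: Iterable[Dict[str, Any]],
    preset_codes: FrozenSet[int] = frozenset({200}),  # hypothetical preset
) -> List[Dict[str, Any]]:
    """Keep only the interaction records whose response status code is one
    of the preset codes; these are treated as business interaction data."""
    return [record for record in records if record.get("status") in preset_codes]
```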
[0165] Each module of the apparatus shown in Figure 3 implements the functions of the corresponding steps in Figures 1-2 and achieves the corresponding technical effects. For brevity, details are not repeated here.
[0166] Based on the same inventive concept, embodiments of this application also provide an electronic device, such as a mobile phone, tablet computer, laptop computer, or PDA. The electronic device provided in the embodiments of this application is described in detail below with reference to Figure 4.
[0167] As shown in Figure 4, the electronic device may include a processor 410 and a memory 420 for storing computer program instructions.
[0168] Processor 410 may include a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits that may be configured to implement the embodiments of this application.
[0169] Memory 420 may include mass storage for data or instructions. By way of example and not limitation, memory 420 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. In one instance, memory 420 may include removable or non-removable (or fixed) media, or memory 420 may be non-volatile solid-state memory. In one instance, memory 420 may be read-only memory (ROM). In one instance, the ROM may be a mask-programmed ROM, a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), an electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these.
[0170] The processor 410 reads and executes the computer program instructions stored in the memory 420 to implement the methods in the embodiments shown in Figure 1 and Figure 2 and to achieve the corresponding technical effects of those methods. For brevity, details are not repeated here.
[0171] In one example, the electronic device may also include a communication interface 430 and a bus 440. As shown in Figure 4, the processor 410, the memory 420, and the communication interface 430 are connected through the bus 440 and communicate with each other over it.
[0172] The communication interface 430 is mainly used to realize communication between various modules, devices and / or equipment in the embodiments of this application.
[0173] Bus 440 includes hardware, software, or both, that couples the components of the electronic device together. By way of example and not limitation, bus 440 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Extended Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or other suitable buses, or combinations of two or more of these. Where appropriate, bus 440 may include one or more buses. Although specific buses are described and illustrated in embodiments of this application, any suitable bus or interconnect is contemplated herein.
[0174] After acquiring business interaction data, as well as the response type and response text of the business interaction data, the electronic device can execute the data processing method in the embodiments of this application, thereby implementing the data processing methods described with reference to Figure 1 and Figure 2 and the data processing apparatus described with reference to Figure 3.
[0175] Furthermore, in conjunction with the data processing methods in the above embodiments, the embodiments of this application may provide a computer storage medium for implementing them. The computer storage medium stores computer program instructions; when these computer program instructions are executed by a processor, they implement any of the data processing methods in the above embodiments.
[0176] It should be clarified that this application is not limited to the specific configurations and processes described above and shown in the figures. For the sake of brevity, detailed descriptions of known methods are omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method process of this application is not limited to the specific steps described and shown. Those skilled in the art can make various changes, modifications, and additions, or change the order of steps, after understanding the spirit of this application.
[0177] The functional blocks shown in the above-described block diagram can be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, they can be, for example, electronic circuits, application-specific integrated circuits (ASICs), appropriate firmware, plug-ins, function cards, etc. When implemented in software, the elements of this application are programs or code segments used to perform the required tasks. Programs or code segments can be stored on a machine-readable medium or transmitted over a transmission medium or communication link via data signals carried on a carrier wave. "Machine-readable medium" can include any medium capable of storing or transmitting information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, etc. Code segments can be downloaded via computer networks such as the Internet, intranets, etc.
[0178] It should also be noted that the exemplary embodiments mentioned in this application describe methods or systems based on a series of steps or apparatus. However, this application is not limited to the order of the above steps; that is, the steps can be performed in the order mentioned in the embodiments, or in a different order, or several steps can be performed simultaneously.
[0179] The aspects of embodiments of this application have been described above with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It should be understood that each block in the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to create a machine such that these instructions, executable via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions / actions specified in one or more blocks of the flowchart illustrations and / or block diagrams. Such a processor can be, but is not limited to, a general-purpose processor, a special-purpose processor, a special application processor, or a field-programmable logic circuit. It is also understood that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can also be implemented by dedicated hardware performing the specified functions or actions, or can be implemented by a combination of dedicated hardware and computer instructions.
[0180] The above description is merely a specific implementation of this application. Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the systems, modules, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here. It should be understood that the protection scope of this application is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in this application, and these modifications or substitutions should all be covered within the protection scope of this application.
Claims
1. A data processing method, characterized by comprising: obtaining business interaction data, as well as a response type and response body text of the business interaction data; classifying the format of the response body text according to the response type to obtain a format type of the response body text; performing pattern processing on the response body text of each format type to obtain a first text list, wherein the first text list is used to store the pattern-processed response body text; determining a similarity between the first text list and a reference text list according to a text similarity method, wherein the text similarity method is related to the number of elements contained in the first text list, the elements are the pattern-processed response body text, the reference text list is used to store reference response body text, the text similarity method includes an element-based equality judgment method and a character-level inclusion judgment method, the element-based equality judgment method is used when the number is greater than or equal to a preset threshold, and the character-level inclusion judgment method is used when the number is less than the preset threshold; and issuing an alarm when the similarity is less than a preset similarity, to notify a user that the business interaction data is abnormal.
2. The method of claim 1, wherein the format type includes at least one of Hypertext Markup Language (HTML) format, JavaScript Object Notation (JSON) format, Extensible Markup Language (XML) format, and no format.
3. The method of claim 2, wherein performing pattern processing on the response body text of each format type to obtain a first text list comprises: for unformatted response body text, extracting the response body text and storing it in the first text list; for response body text in HTML format, extracting the content between the start tag and the end tag and storing it in the first text list; for response body text in JSON format, extracting the value content and storing it in the first text list; and for response body text in XML format, extracting the attribute values and tag content and storing them in the first text list.
4. The method according to claim 1, characterized in that, when the number of elements contained in the first text list is greater than or equal to the preset threshold, the text similarity method includes the element-based equality judgment method, and determining the similarity between the first text list and the reference text list according to the text similarity method comprises: for each element in the first text list, determining whether there exists an element in the reference text list that is identical to the element; determining, based on the determination result, the similarity between the element and the reference text list; and determining the average of the similarities of the elements as the similarity between the first text list and the reference text list.
5. The method of claim 1, wherein, when the number of elements contained in the first text list is less than the preset threshold, the text similarity method includes the character-level inclusion judgment method, and determining the similarity between the first text list and the reference text list according to the text similarity method comprises: for each character of a first element, determining whether there is a character identical to that character in each element of the reference text list, wherein the first element is any element in the first text list; determining, based on the determination result, a first similarity between the first element and the reference text list; and determining the average of the first similarities corresponding to each first element in the first text list as the similarity between the first text list and the reference text list.
6. The method of claim 1, wherein, before classifying the format of the response body text according to the response type to obtain the format type of the response body text, the method further comprises: searching a first data structure, based on a first hash value, for response body text corresponding to the first hash value, wherein the first hash value is the hash value corresponding to the request identifier of the business interaction data, and the first data structure is used to store hash values and the response body text corresponding to each hash value; if no response body text corresponding to the first hash value is found in the first data structure, using the first response body text corresponding to the first hash value in the business interaction data as reference response body text and storing it in the reference text list; if response body text corresponding to the first hash value is found in the first data structure, determining the similarity between the found response body text and the first response body text; and storing, as reference response body text in the reference text list, the found response body text whose similarity to the first response body text is greater than or equal to a similarity threshold.
7. The method according to any one of claims 1 to 6, characterized in that obtaining business interaction data comprises: obtaining interaction data; and filtering the interaction data according to the response status code of the interaction data to obtain the interaction data with a preset response status code, and determining the interaction data with the preset response status code as the business interaction data.
8. A data processing apparatus, characterized by comprising: an acquisition module, used to acquire business interaction data, as well as a response type and response body text of the business interaction data; a classification module, used to classify the format of the response body text according to the response type to obtain a format type of the response body text; a processing module, used to perform pattern processing on the response body text of each format type to obtain a first text list, wherein the first text list is used to store the pattern-processed response body text; a determination module, used to determine a similarity between the first text list and a reference text list according to a text similarity method, wherein the text similarity method is related to the number of elements contained in the first text list, the elements are the pattern-processed response body text, the reference text list is used to store reference response body text, the text similarity method includes an element-based equality judgment method and a character-level inclusion judgment method, the element-based equality judgment method is used when the number is greater than or equal to a preset threshold, and the character-level inclusion judgment method is used when the number is less than the preset threshold; and an alarm module, used to issue an alarm when the similarity is less than a preset similarity, to notify a user that the business interaction data is abnormal.
9. An electronic device, characterized by comprising: a processor; and a memory for storing computer program instructions; wherein, when the computer program instructions are executed by the processor, the method according to any one of claims 1-7 is implemented.
10. A computer-readable storage medium having computer program instructions stored thereon, characterized in that, when the computer program instructions are executed by a processor, the method according to any one of claims 1-7 is implemented.
11. A computer program product, characterized in that, when the instructions in the computer program product are executed by a processor of an electronic device, the electronic device performs the method according to any one of claims 1-7.