Method and device for identifying abnormal shipping data, electronic equipment and storage medium
By using multi-dimensional anomaly detection of waybill datasets and employing the COPOD algorithm or box plot algorithm to identify outliers in waybill information, the problem of incomplete waybill anomaly identification in existing technologies is solved, and rapid and automatic anomaly detection of waybill datasets is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SF TECH CO LTD
- Filing Date
- 2021-07-30
- Publication Date
- 2026-06-16
AI Technical Summary
Existing technologies are unable to automatically identify anomalies in all information dimensions of waybills simultaneously, resulting in incomplete and untimely anomaly identification.
By acquiring information from multiple target waybills in the waybill dataset, applying the COPOD algorithm or box plot algorithm to determine outliers in the waybill information, and outputting the outlier information of the outlier waybill data, multi-dimensional anomaly detection of the waybill dataset is achieved.
It enables the automatic and rapid identification of abnormal waybill data and their abnormal information dimensions from a large dataset of waybills, improving the comprehensiveness and timeliness of waybill anomaly identification.
[Figure CN115700673B_ABST: abstract drawing]
Description
Technical Field
[0001] This application relates to the field of logistics technology, and specifically to a method, apparatus, electronic device, and computer-readable storage medium for identifying abnormal waybill data.
Background Technology
[0002] With rapid economic development, the logistics industry is playing an increasingly important role in society. In the logistics field, waybill information is typically recorded across various dimensions, such as weight, quantity, sender and recipient addresses, order placement time, delivery time, and route information, to facilitate waybill information retrieval and management.
[0003] Current technologies for identifying abnormal waybill data primarily check for anomalies during a specific operation or information recording process. This approach mainly targets anomalies in a single dimension of the waybill information, so it cannot comprehensively and promptly reflect anomalies across all information dimensions. Current technologies therefore struggle to automatically and simultaneously identify anomalies in all information dimensions of a waybill.
Summary of the Invention
[0004] This application provides a method, apparatus, electronic device, and computer-readable storage medium for identifying abnormal waybill data, aiming to solve the problem in the prior art that it is difficult to automatically identify abnormalities in various information dimensions of waybills simultaneously.
[0005] In a first aspect, this application provides a method for identifying abnormal waybill data, the method comprising:
[0006] Based on the waybill information of each waybill data item under each target information dimension, obtaining a waybill dataset to be identified, wherein the waybill dataset to be identified includes multiple pieces of target waybill information Xij, 1≤i≤n, 1≤j≤d, where n represents the number of waybill data items included in the waybill dataset, d represents the number of information dimensions included in each waybill data item, and the target waybill information Xij represents the waybill information of the i-th waybill data item in the j-th dimension;
[0007] Based on the waybill dataset to be identified and a preset anomaly detection strategy, determining the outlier value of each piece of target waybill information Xij in the waybill dataset to be identified, wherein the outlier value of the target waybill information Xij in the j-th dimension is determined based on the multiple pieces of target waybill information X1j, X2j, ..., Xnj in the j-th dimension;
[0008] Based on the outlier value of each piece of target waybill information Xij in the waybill dataset to be identified, outputting anomaly information for the abnormal waybill data in the waybill dataset to be identified, wherein the abnormal waybill data refers to waybill data whose target waybill information in one or more information dimensions is abnormal, and the anomaly information reflects the abnormal information dimensions of the abnormal waybill data.
[0009] In a second aspect, this application provides a device for identifying abnormal waybill data, the device comprising:
[0010] An acquisition unit, used to obtain a waybill dataset to be identified based on the waybill information of each waybill data item under each target information dimension, wherein the waybill dataset to be identified includes multiple pieces of target waybill information Xij, 1≤i≤n, 1≤j≤d, where n represents the number of waybill data items included in the waybill dataset, d represents the number of information dimensions included in each waybill data item, and the target waybill information Xij represents the waybill information of the i-th waybill data item in the j-th dimension;
[0011] A detection unit, used to determine, based on the waybill dataset to be identified and a preset anomaly detection strategy, the outlier value of each piece of target waybill information Xij in the waybill dataset to be identified, wherein the outlier value of the target waybill information Xij in the j-th dimension is determined based on the multiple pieces of target waybill information X1j, X2j, ..., Xnj in the j-th dimension;
[0012] An output unit, used to output, based on the outlier value of each piece of target waybill information Xij in the waybill dataset to be identified, anomaly information for the abnormal waybill data in the waybill dataset to be identified, wherein the abnormal waybill data refers to waybill data whose target waybill information in one or more information dimensions is abnormal, and the anomaly information reflects the abnormal information dimensions of the abnormal waybill data.
[0013] In one possible implementation of this application, the preset anomaly detection strategy is the COPOD algorithm, and the detection unit is specifically used for:
[0014] Based on the dataset of waybills to be identified and the COPOD algorithm, determine the left-tail empirical coefficient of each target waybill information Xij;
[0015] Based on the dataset of waybills to be identified and the COPOD algorithm, determine the right-tail empirical coefficient of each target waybill information Xij;
[0016] Based on the data set of waybills to be identified, the left-tail empirical coefficient of each target waybill information Xij, and the right-tail empirical coefficient of each target waybill information Xij, determine the skewness correction coefficient of each target waybill information Xij.
[0017] The outliers of each target waybill information Xij are determined based on the left-tail empirical coefficient, the right-tail empirical coefficient, and the skewness correction coefficient.
[0018] In one possible implementation of this application, the detection unit is specifically used for:
[0019] Based on the data set of waybills to be identified, determine the distribution function of waybill information in the j-th dimension;
[0020] Based on the waybill information distribution function of the j-th dimension and the COPOD algorithm, determine the left-tail empirical function of the j-th dimension;
[0021] Based on the left-tail empirical function of the j-th dimension and the target waybill information Xij, determine the left-tail empirical coefficient of each target waybill information Xij.
[0022] In one possible implementation of this application, the detection unit is specifically used for:
[0023] Based on the waybill information distribution function of the j-th dimension and the COPOD algorithm, determine the right-tailed empirical function of the j-th dimension;
[0024] Based on the right-tailed empirical function of the j-th dimension and each target waybill information Xij, determine the right-tailed empirical coefficient of each target waybill information Xij.
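The text above names the left-tail and right-tail empirical functions and coefficients without giving formulas. Read against the published COPOD formulation (an assumption, since the specification itself does not state these expressions), they may be written as follows, where the negative logarithm turns a small tail probability into a large coefficient:

```latex
% Left-tail and right-tail empirical functions of the j-th dimension
\hat{F}_j(x) = \frac{1}{n}\sum_{i=1}^{n}\mathbb{1}\{X_{ij} \le x\},
\qquad
\hat{\bar{F}}_j(x) = \frac{1}{n}\sum_{i=1}^{n}\mathbb{1}\{X_{ij} \ge x\}

% Assumed form of the left-tail and right-tail empirical coefficients of X_{ij}
p^{l}_{ij} = -\log \hat{F}_j(X_{ij}),
\qquad
p^{r}_{ij} = -\log \hat{\bar{F}}_j(X_{ij})
```

Under this reading, a value that lies far in the lower tail of its dimension gets a large left-tail coefficient, and a value far in the upper tail gets a large right-tail coefficient.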
[0025] In one possible implementation of this application, the detection unit is specifically used for:
[0026] Based on the dataset of waybills to be identified and the COPOD algorithm, calculate the skewness value of each target waybill information Xij;
[0027] Based on the left-tail empirical coefficient, the right-tail empirical coefficient, and the skewness value of each target waybill information Xij, determine the skewness correction coefficient for each target waybill information Xij.
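A minimal sketch of the skewness step, assuming (as in the published COPOD formulation, not stated in this text) that skewness is computed once per dimension and that the correction simply selects the left-tail coefficients for a left-skewed dimension and the right-tail coefficients otherwise. The function names are hypothetical.

```python
def skewness(column):
    """Sample skewness m3 / m2**1.5 of one information dimension."""
    n = len(column)
    mean = sum(column) / n
    m2 = sum((x - mean) ** 2 for x in column) / n
    m3 = sum((x - mean) ** 3 for x in column) / n
    # A constant column has no spread; treat its skewness as zero.
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def skew_corrected(left, right, skew):
    """Assumed correction rule: use the left-tail coefficients when the
    dimension is left-skewed (skew < 0), otherwise the right-tail ones."""
    return list(left) if skew < 0 else list(right)
```

For a dimension such as weight, where a few very heavy parcels produce positive skewness, this rule would favour the right-tail coefficients.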
[0028] In one possible implementation of this application, the detection unit is specifically used for:
[0029] The maximum value among the left-tail empirical coefficient, the right-tail empirical coefficient, and the skewness correction coefficient for each target waybill information Xij is detected.
[0030] The maximum value is taken as the outlier value of each target waybill information Xij.
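The COPOD-based implementations above can be sketched end to end as follows. This is an illustrative reading rather than the applicant's implementation: it assumes each coefficient is the negative logarithm of the corresponding empirical tail probability and that skewness is computed per dimension. All function names are hypothetical, and the quadratic-time tail counts are chosen for clarity, not speed.

```python
import math


def tail_coefficients(column):
    """Return (left, right) coefficient lists for one dimension."""
    n = len(column)
    left = [-math.log(sum(1 for v in column if v <= x) / n) for x in column]
    right = [-math.log(sum(1 for v in column if v >= x) / n) for x in column]
    return left, right


def skewness(column):
    """Sample skewness m3 / m2**1.5 of one dimension (0.0 if constant)."""
    n = len(column)
    mean = sum(column) / n
    m2 = sum((x - mean) ** 2 for x in column) / n
    m3 = sum((x - mean) ** 3 for x in column) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def cell_outlier_values(X):
    """X is a list of n rows with d numeric values each.
    Returns an n x d matrix holding the outlier value of every X[i][j]."""
    n, d = len(X), len(X[0])
    scores = [[0.0] * d for _ in range(n)]
    for j in range(d):
        column = [row[j] for row in X]
        left, right = tail_coefficients(column)
        corrected = left if skewness(column) < 0 else right
        for i in range(n):
            # Maximum of the three coefficients, as in [0029]-[0030].
            scores[i][j] = max(left[i], right[i], corrected[i])
    return scores
```

Note that when the maximum is taken per value, the skewness-corrected coefficient always equals one of the two tail coefficients; it is kept here only to mirror the three inputs named in the text.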
[0031] In one possible implementation of this application, the output unit is specifically used for:
[0032] Based on the outlier values of each target waybill information Xij in the waybill dataset to be identified, detect the abnormal target waybill information Xij whose outlier values in the waybill dataset to be identified are greater than a preset threshold.
[0033] Output the abnormal information of abnormal waybill data in the data set to be identified, wherein the abnormal information includes the abnormal target waybill information Xij and / or the abnormal value of the abnormal target waybill information Xij, and the abnormal target waybill information Xij is used to indicate that there is an abnormality in the target waybill information of the j-th dimension in the data set to be identified.
[0034] In a third aspect, this application also provides an electronic device, which includes a processor and a memory, wherein the memory stores a computer program, and when the processor calls the computer program in the memory, it executes the steps in any of the methods for identifying abnormal waybill data provided in this application.
[0035] In a fourth aspect, this application also provides a computer-readable storage medium having a computer program stored thereon, the computer program being loaded by a processor to execute the steps in the method for identifying abnormal waybill data.
[0036] This application uses the multiple pieces of target waybill information X1j, X2j, ..., Xnj in the j-th dimension of the waybill dataset to be identified to determine the outlier value of the target waybill information Xij in the j-th dimension, thereby obtaining the outlier value of each piece of target waybill information Xij in the waybill dataset to be identified. Because the outlier value of each piece of target waybill information Xij reflects, to some extent, whether the target waybill information in each information dimension is abnormal, the anomaly information of the abnormal waybill data in the waybill dataset to be identified can be output based on these outlier values. Since the anomaly information reflects the abnormal information dimensions of the abnormal waybill data, the embodiments of this application can, on the one hand, automatically and quickly find the abnormal waybill data among the many waybill data items in a large waybill dataset and, on the other hand, automatically and quickly find which information dimensions of that abnormal waybill data are abnormal.
Attached Figure Description
[0037] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0038] Figure 1 is a schematic diagram of a scenario for the waybill data anomaly identification system provided in an embodiment of this application;
[0039] Figure 2 is a flowchart of a method for identifying abnormal waybill data provided in an embodiment of this application;
[0040] Figure 3 is a schematic flowchart of an embodiment of step 203 provided in this application;
[0041] Figure 4 is a schematic flowchart of an embodiment of step 202 provided in this application;
[0042] Figure 5 is a schematic flowchart of an embodiment of step 401 provided in this application;
[0043] Figure 6 is a schematic flowchart of an embodiment of step 402 provided in this application;
[0044] Figure 7 is a schematic diagram of an embodiment of the waybill data anomaly identification device provided in this application;
[0045] Figure 8 is a schematic diagram of an embodiment of the electronic device provided in this application.
Detailed Implementation
[0046] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0047] In the description of the embodiments of this application, it should be understood that the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Therefore, features defined with "first" and "second" may explicitly or implicitly include one or more of the stated features. In the description of the embodiments of this application, "multiple" means two or more, unless otherwise explicitly specified.
[0048] To enable any person skilled in the art to implement and use this application, the following description is provided. In this description, details are set forth for purposes of explanation. It should be understood that those skilled in the art will recognize that this application can be implemented without using these specific details. In other instances, well-known processes will not be described in detail to avoid obscuring the description of the embodiments of this application with unnecessary detail. Therefore, this application is not intended to be limited to the embodiments shown, but is consistent with the broadest scope of the principles and features disclosed in the embodiments of this application.
[0049] This application provides a method, apparatus, electronic device, and computer-readable storage medium for identifying waybill data anomalies. The waybill data anomaly identification apparatus can be integrated into an electronic device, which can be a server or a terminal, etc.
[0050] The execution subject of the method for identifying abnormal waybill data in this application embodiment can be the waybill data abnormality identification device provided in this application embodiment, or different types of electronic devices such as server equipment, physical host, or user equipment (UE) that integrate the waybill data abnormality identification device. The waybill data abnormality identification device can be implemented in hardware or software. The UE can specifically be a terminal device such as a smartphone, tablet computer, laptop computer, handheld computer, desktop computer, or personal digital assistant (PDA).
[0051] This electronic device can operate independently or in a cluster. By applying the waybill data anomaly identification method provided in this application, it can automatically and quickly identify abnormal waybill data from a large dataset of waybill data. Furthermore, it can automatically and quickly identify abnormal information dimensions in each information dimension of the abnormal waybill data, thus achieving automatic and simultaneous identification of anomalies in various information dimensions of waybills.
[0052] Figure 1 is a schematic diagram of a scenario for the waybill data anomaly identification system provided in an embodiment of this application. The waybill data anomaly identification system may include an electronic device 100, in which a waybill data anomaly identification device is integrated. For example, the electronic device can: obtain a waybill dataset to be identified based on the waybill information of each waybill data item under each target information dimension, wherein the waybill dataset to be identified includes multiple pieces of target waybill information Xij, 1≤i≤n, 1≤j≤d, where n represents the number of waybill data items included in the waybill dataset, d represents the number of information dimensions included in each waybill data item, and the target waybill information Xij represents the waybill information of the i-th waybill data item in the j-th dimension; determine, based on the waybill dataset to be identified and a preset anomaly detection strategy, the outlier value of each piece of target waybill information Xij in the waybill dataset to be identified, wherein the outlier value of the target waybill information Xij in the j-th dimension is determined based on the multiple pieces of target waybill information X1j, X2j, ..., Xnj in the j-th dimension; and output, based on the outlier value of each piece of target waybill information Xij in the waybill dataset to be identified, anomaly information for the abnormal waybill data in the waybill dataset to be identified. The abnormal waybill data refers to waybill data whose target waybill information in one or more information dimensions is abnormal, and the anomaly information reflects the abnormal information dimensions of the abnormal waybill data.
[0053] In addition, as shown in Figure 1, the waybill data anomaly identification system may also include a memory 200 for storing data, such as waybill information including the name, weight, quantity, sender and recipient addresses, order time, receipt time, and the time of each route change.
[0054] It should be noted that the scenario of the waybill data anomaly identification system shown in Figure 1 is merely an example. The waybill data anomaly identification system and scenario described in the embodiments of this application are intended to illustrate the technical solutions of the embodiments more clearly and do not limit the technical solutions provided by the embodiments of this application. As those skilled in the art will know, as waybill data anomaly identification systems evolve and new business scenarios emerge, the technical solutions provided by the embodiments of this application remain applicable to similar technical problems.
[0055] The following describes the method for identifying abnormal waybill data provided in the embodiments of this application. In the embodiments of this application, an electronic device is used as the execution subject. For the sake of simplicity and ease of description, the execution subject will be omitted in the subsequent method embodiments.
[0056] Refer to Figure 2, which is a flowchart of a method for identifying abnormal waybill data provided in an embodiment of this application. It should be noted that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order. The method for identifying abnormal waybill data includes steps 201 to 203, wherein:
[0057] 201. Based on the waybill information of each waybill data under each target information dimension, obtain the waybill dataset to be identified.
[0058] The waybill dataset to be identified includes multiple pieces of target waybill information Xij, 1≤i≤n, 1≤j≤d, where n represents the number of waybill data items included in the waybill dataset, d represents the number of information dimensions included in each waybill data item, and the target waybill information Xij represents the waybill information of the i-th waybill data item in the j-th dimension.
[0059] In the logistics industry, to facilitate waybill management, waybill information is typically recorded across various data dimensions. This information includes the waybill itself (name, weight, quantity, sender and recipient address, order time, delivery time, and dates of route changes), customer information (sender and recipient names, phone numbers, customer ratings, monthly settlement details, coupon usage information), and related barcode scanner operation and routing information.
[0060] In some embodiments, a waybill data item may specifically refer to a single waybill, in which case the waybill dataset to be identified includes the waybill information of n waybills in d dimensions, and the target waybill information Xij represents the waybill information of the i-th waybill in the j-th dimension. Specifically, the waybill information of each waybill data item under each target information dimension refers to the waybill information of each of multiple waybills under multiple different target information dimensions, from which the waybill dataset to be identified, X = {Xij}, is obtained: the waybill information of n waybills in d dimensions.
[0061] For example, the waybill dataset to be identified can be as shown in Table 1 below, in which one waybill data item refers to one specific waybill. The dataset includes the waybill information of four waybills: waybill 1, waybill 2, waybill 3, and waybill 4. Each waybill includes waybill information in multiple target information dimensions, such as information on the waybill itself, customer information, and routing information.
[0062] Table 1
[0063]
Waybill data / dimension | Dimension 1: Information on the waybill itself | Dimension 2: Customer information | Dimension 3: Routing information
Waybill 1 | X11 | X21 | X31
Waybill 2 | X12 | X22 | X32
Waybill 3 | X13 | X23 | X33
Waybill 4 | X14 | X24 | X34
[0064] By setting the waybill dataset to be identified as the waybill information of n waybills in d dimensions, the outlier value of each piece of target waybill information Xij determined in step 202 can be used to identify abnormal waybills among multiple waybills and to identify which waybill information of an abnormal waybill is abnormal.
[0065] In some embodiments, a waybill data item may also refer to one information profile of a specific waybill, with the target waybill information Xij representing the waybill information of the i-th information profile in the j-th dimension. The waybill dataset to be identified then includes the waybill information of n information profiles of a waybill in d dimensions. In this case, the waybill information of each waybill data item under each target information dimension is specifically the waybill information of each information profile of the waybill under multiple different target information dimensions, from which the waybill dataset to be identified, X = {Xij}, is obtained: the waybill information of n information profiles of a certain waybill in d dimensions.
[0066] Each information profile includes multiple waybill information for that waybill, and one waybill information under each information profile serves as a target information dimension for that information profile.
[0067] For example, the waybill dataset to be identified can be as shown in Table 2 below, in which one waybill data item refers to one information profile of a specific waybill. The dataset includes the waybill information in three information profiles: information on the waybill itself, customer information, and routing information. Each information profile includes multiple pieces of waybill information for that waybill, and each piece of waybill information under an information profile serves as one target information dimension of that profile. For example, the profile for the waybill itself includes three target information dimensions: name (X11), weight (X21), and quantity (X31).
[0068] Table 2
[0069]
Waybill data / dimension | Dimension 1 | Dimension 2 | Dimension 3
Profile 1: Information on the waybill itself | Name X11 | Weight X21 | Quantity X31
Profile 2: Customer information | Sender and recipient names X12 | Mobile phone number X22 | Customer rating X32
Profile 3: Routing information | X13 | X23 | X33
[0070] By setting the waybill dataset to be identified as the waybill information of n information profiles of a certain waybill in d dimensions, the outlier value of each piece of target waybill information Xij determined in step 202 can be used to identify the abnormal waybill information among the multiple pieces of waybill information of that waybill (for example, of an abnormal waybill).
[0071] Understandably, to facilitate subsequent data processing such as outlier detection, the target waybill information Xij is the waybill information of the i-th waybill data item in the j-th dimension after quantification or vectorization.
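Step 201 can be sketched as building an n x d numeric matrix from raw waybill records. The record layout, the field names, and the choice of integer codes for non-numeric values are assumptions made purely for illustration; the text above only says the information is quantified or vectorized.

```python
def build_dataset(waybills, dimensions):
    """Return an n x d matrix: row i is waybill data item i,
    column j is the j-th target information dimension."""
    encoders = {}  # one code table per non-numeric dimension
    matrix = []
    for record in waybills:
        row = []
        for name in dimensions:
            value = record[name]
            if isinstance(value, (int, float)):
                row.append(float(value))
            else:
                # Assumed quantification: integer code by first appearance.
                codes = encoders.setdefault(name, {})
                row.append(float(codes.setdefault(value, len(codes))))
        matrix.append(row)
    return matrix


# Hypothetical records; the field names are not taken from the application.
waybills = [
    {"weight": 1.5, "quantity": 2, "route": "SZ-BJ"},
    {"weight": 30, "quantity": 1, "route": "SZ-SH"},
    {"weight": 2.0, "quantity": 1, "route": "SZ-BJ"},
]
X = build_dataset(waybills, ["weight", "quantity", "route"])
```

Integer codes impose an arbitrary order on categories, so a production system would likely pick an encoding suited to each dimension; the point here is only that every Xij ends up as a number.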
[0072] 202. Based on the waybill dataset to be identified and the preset anomaly detection strategy, determine the outlier value of each piece of target waybill information Xij in the waybill dataset to be identified.
[0073] The outlier value of the target waybill information Xij in the j-th dimension is determined based on the multiple pieces of target waybill information X1j, X2j, ..., Xnj in the j-th dimension of the waybill dataset to be identified.
[0074] The preset anomaly detection strategy is used to detect outliers in each dimension (e.g., the j-th dimension) of the waybill data set to be identified.
[0075] In some embodiments, the preset anomaly detection strategy can be box-plot-based outlier detection. A box plot, also known as a box-and-whisker plot, is a statistical graph that displays the distribution of a set of data and is named for its box-like shape. It is widely used, notably in quality management, to show the characteristics of a data distribution and to compare the distributions of multiple sets of data. A box plot is drawn as follows: first, find the upper edge, lower edge, median, and two quartiles of the data; then, draw the box between the two quartiles; finally, connect the upper and lower edges to the box, with the median marked inside the box.
[0076] Quartiles, also known as quartile points, are the values that divide a dataset sorted from smallest to largest into four equal parts, each containing 25% of the data. The middle quartile is the median (hereinafter Q2). The quartiles commonly referred to are therefore the value at the 25th percentile (the lower quartile, hereinafter Q1) and the value at the 75th percentile (the upper quartile, hereinafter Q3).
[0077] In addition, the interquartile range (IQR) refers to the difference between the upper quartile and the lower quartile, i.e., IQR = Q3 - Q1.
[0078] Box plots provide a standard for identifying outliers: outliers are defined as values less than (Q1 - 1.5 × IQR) or greater than (Q3 + 1.5 × IQR).
[0079] In this case, step 202 may specifically include the following for each dimension j = 1, 2, ..., d: determining the upper quartile and lower quartile of the j-th dimension based on a preset quartile determination rule and the target waybill information in the j-th dimension (i.e., X1j, X2j, ..., Xnj); calculating the interquartile range of the j-th dimension from its upper and lower quartiles; determining the upper and lower outlier bounds of the j-th dimension from its interquartile range and the preset formulas for the upper and lower outlier bounds; and taking the target waybill information Xij in the j-th dimension that is less than the lower bound or greater than the upper bound as the outliers of the j-th dimension. Doing so for all d dimensions yields the outliers among the target waybill information Xij in the waybill dataset to be identified.
[0080] The preset formula for the upper outlier bound can be (Q3 + 1.5 × IQR), and the preset formula for the lower outlier bound can be (Q1 - 1.5 × IQR). These formulas are merely examples; in actual applications, other formulas based on the interquartile range can be used. For instance, the upper bound formula could be (Q3 + 1 × IQR), and the lower bound formula could be (Q1 - 1 × IQR).
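The box-plot strategy can be sketched as below. The multiplier `k` stands in for the preset bound formulas (1.5 by default, 1 for the alternative mentioned above), and the "inclusive" quantile method is just one possible choice of the preset quartile determination rule; both are assumptions of this sketch, as are the function names.

```python
import statistics


def boxplot_bounds(column, k=1.5):
    """Return (lower, upper) outlier bounds for one dimension."""
    q1, _q2, q3 = statistics.quantiles(column, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def boxplot_outliers(X, k=1.5):
    """X is a list of n rows with d numeric values each.
    Returns an n x d matrix of booleans, True where X[i][j] is an outlier."""
    d = len(X[0])
    flags = [[False] * d for _ in X]
    for j in range(d):
        lower, upper = boxplot_bounds([row[j] for row in X], k)
        for i, row in enumerate(X):
            flags[i][j] = row[j] < lower or row[j] > upper
    return flags


weights = [[1], [2], [3], [4], [100]]
flags = boxplot_outliers(weights)
```

With these five illustrative weights, Q1 = 2 and Q3 = 4, so the bounds are -1 and 7 and only the value 100 is flagged.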
[0081] For example, consider the waybill dataset to be identified in Table 1, which includes a first dimension (information on the waybill itself), a second dimension (customer information), and a third dimension (routing information). With box-plot-based outlier detection, the upper and lower quartiles of the first dimension are determined first. The interquartile range of the first dimension is then calculated from these quartiles. Next, the upper and lower bounds of the first dimension are calculated from its interquartile range and the preset upper bound formula (Q3 + 1.5 × IQR) and lower bound formula (Q1 - 1.5 × IQR). Finally, the target waybill information in the first dimension that is less than the lower bound or greater than the upper bound is taken as the outliers of the first dimension. The outliers of the target waybill information Xij in the second dimension and in the third dimension are obtained in the same way, which yields the outliers among the target waybill information Xij in the waybill dataset to be identified.
[0082] In some embodiments, the preset anomaly detection strategy can also be anomaly detection based on the COPOD algorithm. Anomaly detection based on the COPOD algorithm will be described in detail later; for simplicity, it will not be repeated here.
[0083] 203. Based on the outlier value of each piece of target waybill information Xij in the waybill dataset to be identified, output the anomaly information of the abnormal waybill data in the waybill dataset to be identified.
[0084] Abnormal waybill data refers to waybill data in the dataset of waybills to be identified where the target waybill information in one or more information dimensions is abnormal.
[0085] As shown in Figure 3, step 203 may specifically include the following steps 301 to 302:
[0086] 301. Based on the outlier value of each piece of target waybill information Xij in the waybill dataset to be identified, detect the abnormal target waybill information Xij whose outlier value is greater than a preset threshold.
[0087] Here, the abnormal target waybill information Xij refers to the target waybill information whose outlier value, as determined in step 202, is greater than the preset threshold.
[0088] The specific value of the preset threshold can be set according to the actual situation. In this embodiment, the specific value of the preset threshold is not limited.
[0089] In some embodiments, a preset threshold can be set separately for each information dimension, that is, a preset threshold aj can be set for the j-th dimension. For example, preset thresholds a1, a2, a3, ..., ad can be set for the 1st, 2nd, 3rd, ..., d-th dimensions, respectively. Then, based on the preset threshold of the j-th dimension (j = 1, 2, ..., d) and the outlier values of the target waybill information in the j-th dimension, the abnormal target waybill information Xij whose outlier value in the j-th dimension is greater than the preset threshold is determined. Doing this for each of the dimensions j = 1, 2, ..., d yields the abnormal target waybill information Xij in the waybill dataset to be identified whose outlier values exceed the preset thresholds.
[0090] In some embodiments, a preset threshold can be set for each of the d information dimensions. Based on the outlier values of the target waybill information in the j-th dimension and the preset threshold, the abnormal target waybill information Xij in the j-th dimension whose outlier value exceeds the preset threshold can be determined. Doing this for each of the dimensions j = 1, 2, ..., d yields the abnormal target waybill information Xij in the waybill dataset to be identified whose outlier values exceed the preset threshold.
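Step 301 with per-dimension thresholds can be sketched as below. The function name, the outlier values, and the thresholds are illustrative assumptions; the application does not prescribe concrete values.

```python
def abnormal_cells(outlier_values, thresholds):
    """Return (i, j) index pairs whose outlier value exceeds the threshold a_j.

    outlier_values[i][j] is the outlier value of waybill i in dimension j;
    thresholds[j] is the preset threshold a_j of dimension j.
    """
    return [
        (i, j)
        for i, row in enumerate(outlier_values)
        for j, value in enumerate(row)
        if value > thresholds[j]
    ]

# Hypothetical outlier values for 3 waybills x 2 dimensions.
outlier_values = [[0.2, 1.9], [2.5, 0.1], [0.3, 0.4]]
thresholds = [1.0, 1.5]  # a_1, a_2 (illustrative)
cells = abnormal_cells(outlier_values, thresholds)
print(cells)
```

Passing the same number d times in `thresholds` covers the case where one threshold is shared by all dimensions.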
[0091] 302. Output the abnormal information of the abnormal waybill data in the data set to be identified.
[0092] The abnormal information includes the abnormal target waybill information Xij in the abnormal waybill data and/or the outlier value of the abnormal target waybill information Xij, where the abnormal target waybill information Xij is used to indicate that the target waybill information of the j-th dimension in the waybill dataset to be identified is abnormal.
[0093] In some embodiments, waybill data with abnormal information in one or more information dimensions of the waybill data to be identified can be output as abnormal waybill data, so as to quickly and comprehensively reflect the abnormal waybill data in multiple waybill data.
[0094] Furthermore, to reflect more comprehensively the anomalies of waybill data in each information dimension, the anomaly information of the abnormal waybill data can be output together with the abnormal waybill data in the waybill dataset to be identified. The anomaly information of the abnormal waybill data can be one or more of the following: the abnormal target waybill information Xij in the abnormal waybill data, and the outlier value of the abnormal target waybill information Xij in the abnormal waybill data. This quickly and comprehensively reflects both the abnormal waybill data among multiple waybill data and the abnormality of the waybill information in each information dimension.
[0095] For example, for the four waybill data entries 1, 2, 3, and 4 in Table 1, if the outlier value of waybill 1 in the first dimension (waybill information itself) exceeds the preset threshold, waybill 1 is determined to be abnormal in the first dimension, and waybill 1 can be output as abnormal waybill data. Furthermore, the abnormal target waybill information X11 of waybill 1 in the first dimension can also be output as anomaly information of the abnormal waybill data. In addition, the outlier value of the abnormal target waybill information X11 of waybill 1 in the first dimension can be output as anomaly information of the abnormal waybill data.
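One possible shape for the output of step 302 is sketched below: for each abnormal waybill, the abnormal dimensions are listed together with the abnormal target waybill information and its outlier value. The report layout, the function name, the single shared threshold, and all numbers are assumptions made for illustration.

```python
def anomaly_report(values, outlier_values, threshold):
    """Map each abnormal waybill index to its abnormal dimensions.

    values[i][j]: target waybill information of waybill i in dimension j.
    outlier_values[i][j]: the corresponding outlier value.
    threshold: one preset threshold shared by all dimensions (assumed for brevity).
    """
    report = {}
    for i, row in enumerate(outlier_values):
        for j, score in enumerate(row):
            if score > threshold:
                report.setdefault(i, []).append(
                    {"dimension": j, "value": values[i][j], "outlier_value": score}
                )
    return report

# Hypothetical data: waybill 0 is abnormal in dimension 0 only.
values = [[30.0, 2], [1.1, 3], [0.9, 2]]
outlier_values = [[1.4, 0.3], [0.2, 0.6], [0.5, 0.3]]
report = anomaly_report(values, outlier_values, threshold=1.0)
print(report)
```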
[0096] As can be seen from the above, the embodiments of this application determine the outlier value of each target waybill information Xij in the j-th dimension based on the multiple target waybill information X1j, X2j, ..., Xnj in the j-th dimension of the waybill dataset to be identified, thereby obtaining the outlier values of the target waybill information Xij in the waybill dataset to be identified. Because the outlier value of each target waybill information Xij reflects, to some extent, whether the target waybill information is abnormal in each information dimension, the anomaly information of the abnormal waybill data in the waybill dataset to be identified can be output based on these outlier values. Since the anomaly information of the abnormal waybill data reflects the abnormal information dimensions of the abnormal waybill data, the embodiments of this application can, on the one hand, automatically and quickly find the abnormal waybill data among the multiple waybill data included in a large waybill dataset and, on the other hand, automatically and quickly find the abnormal information dimensions among the information dimensions of the abnormal waybill data.
[0097] The following takes outlier detection based on the COPOD algorithm as an example to illustrate how step 202 determines the outlier value of each target waybill information Xij in the waybill dataset to be identified.
[0098] COPOD stands for Copula-Based Outlier Detection. The Copula function is a statistical probability function used to model multidimensional cumulative distributions and can be used to effectively model the dependencies between multiple random variables.
[0099] The algorithmic framework for COPOD is as follows:
[0100] Inputs: input data X
[0101] Outputs: Outlier scores O(X)
[0102] 1: for each dimension d do
[0103] 2: Compute left tail ECDFs: (Formula 1)
[0104] 3: Compute right tail ECDFs: (Formula 2)
[0105] 4: Compute the skewness coefficient according to Formula (9).
[0106] 5: end for
[0107] 6: for each i in 1, ..., n do
[0108] 7: Compute empirical copula observations
[0109] 8: (Formula 3)
[0110] 9: (Formula 4)
[0111] 10: (Formula 5), which takes one value if b_d < 0 and another otherwise
[0112] 11: Calculate tail probabilities of X_i as follows:
[0113] 12: (Formula 6)
[0114] 13: (Formula 7)
[0115] 14: (Formula 8)
[0116] 15: Outlier Score O(x_i) = max{p_l, p_r, p_s}
[0117] 16: end for
[0118] 17: Return O(X) = [O(x_1), ..., O(x_d)]^T
[0119] Here, the input data is X = (X1i, X2i, ..., Xdi), i = 1, ..., n, where Xdn denotes the n-th data point in the d-th dimension of the input data.
[0120] In lines 2 and 3, formula (1) represents the left-tail empirical cumulative distribution function (ECDF) in the d-th dimension, and formula (2) represents the right-tail ECDF in the d-th dimension.
[0121] Formulas (3) and (4) in lines 8 and 9 set the left-tail and right-tail empirical copula observations, respectively.
[0122] Line 10 represents the calculation of the skewness-corrected empirical copula observations: when the skewness value b_d of the d-th dimension is less than 0, the skewness-corrected empirical copula observation equals the left-tail empirical copula observation; otherwise, it equals the right-tail empirical copula observation. The skewness values of the d dimensions constitute the skewness matrix b, where b = [b_1, ..., b_d]. The skewness values of all data points in the d-th dimension are the same. The skewness value b_d of the d-th dimension can be calculated using the following formula (9).
[0123] Formula (9)
[0124] In formula (9), x_i represents the value of the i-th sample in the d-th dimension, x̄ represents the average of all samples in the d-th dimension, b_i represents the skewness value of the i-th sample in the d-th dimension, and n represents the number of samples in the d-th dimension.
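The image of formula (9) is not reproduced in this text. As a hedged reference, the sample skewness used in the published COPOD algorithm is commonly written as follows; treating it as formula (9) of this application is an assumption, and the subscript d on b follows the pseudocode above rather than the symbol description.

```latex
b_d = \frac{\dfrac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^{3}}
           {\left[\dfrac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^{2}\right]^{3/2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```

A negative b_d indicates a longer left tail and a positive b_d a longer right tail, which is what the branch in line 10 of the pseudocode relies on.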
[0125] Lines 11 to 14 represent the calculation, according to formulas (6), (7), and (8) respectively, of the left-tail empirical coefficient p_l, the right-tail empirical coefficient p_r, and the skewness correction empirical coefficient p_s of data x_i. Here, formula (6) is the calculation formula for the left-tail empirical coefficient, formula (7) is the calculation formula for the right-tail empirical coefficient, and formula (8) is the calculation formula for the skewness correction empirical coefficient.
[0126] Line 15 represents taking the maximum of the left-tail empirical coefficient p_l, the right-tail empirical coefficient p_r, and the skewness correction empirical coefficient p_s of data x_i as the outlier score of x_i.
[0127] Line 17 represents outputting the matrix of outlier scores corresponding to X.
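The images of formulas (1) to (8) are likewise not reproduced in this text. For orientation only, the corresponding expressions of the published COPOD algorithm are commonly written as below. Mapping them onto the formula numbers of this application is an assumption based on the descriptions above, and the symbols \hat{F}_d, \bar{F}_d, \hat{U}, \hat{V}, and \hat{W} are borrowed from that publication rather than taken from this application.

```latex
\begin{align}
\hat{F}_d(x) &= \frac{1}{n}\sum_{i=1}^{n}\mathbb{I}\left(X_{d,i}\le x\right) \tag{1}\\
\bar{F}_d(x) &= \frac{1}{n}\sum_{i=1}^{n}\mathbb{I}\left(-X_{d,i}\le -x\right) \tag{2}\\
\hat{U}_{d,i} &= \hat{F}_d\left(x_{d,i}\right) \tag{3}\\
\hat{V}_{d,i} &= \bar{F}_d\left(x_{d,i}\right) \tag{4}\\
\hat{W}_{d,i} &= \begin{cases}\hat{U}_{d,i}, & b_d<0\\ \hat{V}_{d,i}, & \text{otherwise}\end{cases} \tag{5}\\
p_l &= -\sum_{j=1}^{d}\log \hat{U}_{j,i} \tag{6}\\
p_r &= -\sum_{j=1}^{d}\log \hat{V}_{j,i} \tag{7}\\
p_s &= -\sum_{j=1}^{d}\log \hat{W}_{j,i} \tag{8}
\end{align}
```

Under this reading, the per-information coefficients tabulated later in this application (one value for each Xij) would correspond to the individual terms, for example -log \hat{U}_{j,i}, rather than to the sums over dimensions; that interpretation is also an assumption.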
[0128] As shown in Figure 4, step 202 above may specifically include steps 401 to 404:
[0129] 401. Based on the waybill dataset to be identified and the COPOD algorithm, determine the left-tail empirical coefficient of each target waybill information Xij.
[0130] As shown in Figure 5, step 401 may specifically include steps 501 to 503:
[0131] 501. Based on the data set of waybills to be identified, determine the distribution function of waybill information in the j-th dimension.
[0132] Here, 1 ≤ j ≤ d. Specifically, the waybill information distribution function Fj(x) = P(Xi ≤ x) is calculated for each of the dimensions j = 1, 2, ..., d.
[0133] For example, based on the waybill dataset to be identified X = {Xij}, the waybill information distribution functions F1(x) = P(Xi ≤ x), F2(x) = P(Xi ≤ x), ..., Fd(x) = P(Xi ≤ x) of the dimensions j = 1, 2, ..., d can be calculated, respectively.
[0134] 502. Based on the waybill information distribution function of the j-th dimension and the COPOD algorithm, determine the left-tail empirical function of the j-th dimension.
[0135] The left-tail empirical function of the j-th dimension refers to the left-tail empirical cumulative distribution function of the j-th dimension, calculated by the COPOD algorithm based on the above formula (1) and the waybill information distribution function of the j-th dimension.
[0136] Specifically, the waybill dataset to be identified X = {Xij} can be used as input to the COPOD algorithm, which can then calculate the left-tail empirical function of the j-th dimension (j = 1, 2, ..., d) based on the above formula (1) and the waybill information distribution function of the j-th dimension (j = 1, 2, ..., d).
[0137] For example, based on the above formula (1) and the waybill information distribution function Fd(x) = P(Xi ≤ x) of the d-th dimension, the left-tail empirical function of the d-th dimension can be determined.
[0138] 503. Based on the left-tail empirical function of the j-th dimension and each target waybill information Xij, determine the left-tail empirical coefficient of each target waybill information Xij.
[0139] Specifically, using formulas (1), (3), and (6) of the COPOD algorithm, and based on the left-tail empirical functions of the dimensions j = 1, 2, ..., d and the target waybill information Xij in each of these dimensions (including the target waybill information X1j, X2j, ..., Xnj), the left-tail empirical coefficient of each target waybill information in the dimensions j = 1, 2, ..., d is calculated, thereby obtaining the left-tail empirical coefficients of the target waybill information Xij in the waybill dataset to be identified.
[0140] For example, for the waybill dataset to be identified in Table 1, which includes the first dimension (waybill information), the second dimension (customer information), and the third dimension (routing information), first, using the above formula (1) of the COPOD algorithm, the left-tail empirical function of the first dimension (waybill information), the left-tail empirical function of the second dimension (customer information), and the left-tail empirical function of the third dimension (routing information) can be determined.
[0141] Then, formulas (1), (3), and (6) of the COPOD algorithm are applied. Based on the left-tail empirical function of the first dimension (waybill information) and formulas (3) and (6), the left-tail empirical coefficients of the target waybill information X11, X12, X13, X14 in the first dimension can be determined, denoted p_l11, p_l12, p_l13, p_l14, respectively. Based on the left-tail empirical function of the second dimension (customer information) and formulas (3) and (6), the left-tail empirical coefficients of the target waybill information X21, X22, X23, X24 in the second dimension can be determined, denoted p_l21, p_l22, p_l23, p_l24, respectively. Based on the left-tail empirical function of the third dimension (routing information) and formulas (3) and (6), the left-tail empirical coefficients of the target waybill information X31, X32, X33, X34 in the third dimension can be determined, denoted p_l31, p_l32, p_l33, p_l34, respectively. This gives the left-tail empirical coefficients of each target waybill information Xij in the waybill dataset to be identified, as shown in Table 3 below.
[0142] Table 3
[0143]
| Waybill data / dimension | Dimension 1: Information on the waybill itself | Dimension 2: Customer Information | Dimension 3: Routing Information |
| --- | --- | --- | --- |
| Waybill 1 | p_l11 | p_l21 | p_l31 |
| Waybill 2 | p_l12 | p_l22 | p_l32 |
| Waybill 3 | p_l13 | p_l23 | p_l33 |
| Waybill 4 | p_l14 | p_l24 | p_l34 |
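Steps 501 to 503 can be sketched for one numeric dimension as follows. The sketch assumes that the left-tail empirical coefficient of a value is the negative logarithm of its left-tail empirical distribution value, as in the published COPOD algorithm; formulas (1), (3), and (6) are not reproduced in this text, so that form, the function name, and the sample weights are assumptions.

```python
import math

def left_tail_coefficients(column):
    """Left-tail coefficient of every value in one dimension.

    Assumed form: -log(F_left(x)), where F_left(x) is the fraction of values <= x.
    """
    n = len(column)
    coefficients = []
    for x in column:
        u = sum(1 for v in column if v <= x) / n  # left-tail ECDF evaluated at x
        coefficients.append(-math.log(u))
    return coefficients

# Hypothetical weights (kg) of four waybills in one dimension.
weights_kg = [1.0, 1.2, 0.9, 30.0]
p_left = left_tail_coefficients(weights_kg)
print(p_left)
```

Under this form, unusually small values receive the largest left-tail coefficients, and the largest value receives 0.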
[0144] 402. Based on the waybill dataset to be identified and the COPOD algorithm, determine the right-tail empirical coefficient of each target waybill information Xij.
[0145] As shown in Figure 6, step 402 may specifically include steps 601 to 602:
[0146] 601. Based on the waybill information distribution function of the j-th dimension and the COPOD algorithm, determine the right-tail empirical function of the j-th dimension.
[0147] The right-tail empirical function of the j-th dimension refers to the right-tail empirical cumulative distribution function of the j-th dimension, calculated by the COPOD algorithm based on the above formula (2) and the waybill information distribution function of the j-th dimension.
[0148] Specifically, the waybill dataset to be identified X = {Xij} can be used as input to the COPOD algorithm, which can then calculate the right-tail empirical function of the j-th dimension (j = 1, 2, ..., d) based on the above formula (2) and the waybill information distribution function of the j-th dimension (j = 1, 2, ..., d).
[0149] For example, based on the above formula (2) and the waybill information distribution function Fd(x) = P(Xi ≤ x) of the d-th dimension, the right-tail empirical function of the d-th dimension can be determined.
[0150] 602. Based on the right-tail empirical function of the j-th dimension and each target waybill information Xij, determine the right-tail empirical coefficient of each target waybill information Xij.
[0151] Specifically, using formulas (2), (4), and (7) of the COPOD algorithm, and based on the right-tail empirical functions of the dimensions j = 1, 2, ..., d and the target waybill information Xij in each of these dimensions (including the target waybill information X1j, X2j, ..., Xnj), the right-tail empirical coefficient of each target waybill information in the dimensions j = 1, 2, ..., d is calculated, thereby obtaining the right-tail empirical coefficients of the target waybill information Xij in the waybill dataset to be identified.
[0152] For example, for the waybill dataset to be identified in Table 1, which includes the first dimension (waybill information), the second dimension (customer information), and the third dimension (routing information), first, using the above formula (2) of the COPOD algorithm, the right-tail empirical function of the first dimension (waybill information), the right-tail empirical function of the second dimension (customer information), and the right-tail empirical function of the third dimension (routing information) can be determined.
[0153] Then, formulas (2), (4), and (7) of the COPOD algorithm are applied. Based on the right-tail empirical function of the first dimension (waybill information) and formulas (4) and (7), the right-tail empirical coefficients of the target waybill information X11, X12, X13, X14 in the first dimension can be determined, denoted p_r11, p_r12, p_r13, p_r14, respectively. Based on the right-tail empirical function of the second dimension (customer information) and formulas (4) and (7), the right-tail empirical coefficients of the target waybill information X21, X22, X23, X24 in the second dimension can be determined, denoted p_r21, p_r22, p_r23, p_r24, respectively. Based on the right-tail empirical function of the third dimension (routing information) and formulas (4) and (7), the right-tail empirical coefficients of the target waybill information X31, X32, X33, X34 in the third dimension can be determined, denoted p_r31, p_r32, p_r33, p_r34, respectively. This gives the right-tail empirical coefficients of each target waybill information Xij in the waybill dataset to be identified, as shown in Table 4 below.
[0154] Table 4
[0155]
| Waybill data / dimension | Dimension 1: Information on the waybill itself | Dimension 2: Customer Information | Dimension 3: Routing Information |
| --- | --- | --- | --- |
| Waybill 1 | p_r11 | p_r21 | p_r31 |
| Waybill 2 | p_r12 | p_r22 | p_r32 |
| Waybill 3 | p_r13 | p_r23 | p_r33 |
| Waybill 4 | p_r14 | p_r24 | p_r34 |
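Steps 601 to 602 can be sketched in the same way for one numeric dimension. As before, the negative-logarithm form is taken from the published COPOD algorithm and is an assumption here, because formulas (2), (4), and (7) are not reproduced in this text; the function name and sample weights are hypothetical.

```python
import math

def right_tail_coefficients(column):
    """Right-tail coefficient of every value in one dimension.

    Assumed form: -log(F_right(x)), where F_right(x) is the fraction of values >= x
    (equivalently, the left-tail ECDF of the negated data).
    """
    n = len(column)
    coefficients = []
    for x in column:
        v = sum(1 for other in column if other >= x) / n  # right-tail ECDF at x
        coefficients.append(-math.log(v))
    return coefficients

# Hypothetical weights (kg) of four waybills in one dimension.
weights_kg = [1.0, 1.2, 0.9, 30.0]
p_right = right_tail_coefficients(weights_kg)
print(p_right)
```

Here the 30.0 kg waybill receives the largest right-tail coefficient, while the smallest value receives 0.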
[0156] 403. Based on the waybill dataset to be identified, the left-tail empirical coefficient of each target waybill information Xij, and the right-tail empirical coefficient of each target waybill information Xij, determine the skewness correction coefficient of each target waybill information Xij.
[0157] Step 403 may specifically include steps c1 to c2:
[0158] c1. Based on the waybill dataset to be identified and the COPOD algorithm, calculate the skewness value of each target waybill information Xij.
[0159] Specifically, the skewness value of the target waybill information Xij in each of the dimensions j = 1, 2, ..., d (including the target waybill information X1j, X2j, ..., Xnj) can be calculated according to the above formula (9).
[0160] For example, the waybill dataset to be identified in Table 1 includes a first dimension (waybill information itself), a second dimension (customer information), and a third dimension (routing information). According to the above formula (9), the skewness values of the target waybill information X11, X12, X13, X14 in the first dimension can be calculated, denoted b_11, b_12, b_13, b_14, respectively.
[0161] According to the above formula (9), the skewness values of the target waybill information X21, X22, X23, X24 in the second dimension can be calculated, denoted b_21, b_22, b_23, b_24, respectively.
[0162] According to the above formula (9), the skewness values of the target waybill information X31, X32, X33, X34 in the third dimension can be calculated, denoted b_31, b_32, b_33, b_34, respectively.
[0163] This gives the skewness values of each target waybill information Xij in the waybill dataset to be identified, as shown in Table 5 below.
[0164] Table 5
[0165]
| Waybill data / dimension | Dimension 1: Information on the waybill itself | Dimension 2: Customer Information | Dimension 3: Routing Information |
| --- | --- | --- | --- |
| Waybill 1 | b_11 | b_21 | b_31 |
| Waybill 2 | b_12 | b_22 | b_32 |
| Waybill 3 | b_13 | b_23 | b_33 |
| Waybill 4 | b_14 | b_24 | b_34 |
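Step c1 can be sketched as below. The skewness expression is the sample skewness used in the published COPOD algorithm, which is assumed here to match formula (9); the function names and data are hypothetical. Because the skewness value is the same for all data points of a dimension, the sketch computes one value per dimension and copies it to every waybill to obtain a table shaped like Table 5.

```python
def skewness(column):
    """Sample skewness of one dimension (assumed form of formula (9)).

    Raises ZeroDivisionError if all values in the column are identical.
    """
    n = len(column)
    mean = sum(column) / n
    third_moment = sum((x - mean) ** 3 for x in column) / n
    variance = sum((x - mean) ** 2 for x in column) / (n - 1)
    return third_moment / variance ** 1.5

def skewness_table(data):
    """data[i][j]: waybill i, dimension j. Returns a table of the same shape."""
    d = len(data[0])
    per_dimension = [skewness([row[j] for row in data]) for j in range(d)]
    return [list(per_dimension) for _ in data]

# Hypothetical data: 4 waybills x 2 dimensions.
data = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [10.0, 4.0]]
table = skewness_table(data)
print(table[0])
```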
[0166] c2. Based on the left-tail empirical coefficient of each target waybill information Xij, the right-tail empirical coefficient of each target waybill information Xij, and the skewness value of each target waybill information Xij, determine the skewness correction coefficient of each target waybill information Xij.
[0167] Specifically, using formulas (3), (4), (5), and (8) of the COPOD algorithm, and based on the target waybill information Xij in each of the dimensions j = 1, 2, ..., d (including the target waybill information X1j, X2j, ..., Xnj) and the skewness values of the target waybill information Xij in these dimensions, the skewness correction coefficient of each target waybill information in the dimensions j = 1, 2, ..., d is calculated, thereby obtaining the skewness correction coefficients of the target waybill information Xij in the waybill dataset to be identified.
[0168] To facilitate understanding, the example from step c1 is continued. Using formulas (3), (4), (5), and (8) of the COPOD algorithm, and based on the target waybill information X11, X12, X13, X14 in the first dimension shown in Table 1 and the skewness values of the target waybill information X11, X12, X13, X14 in the first dimension shown in Table 5, the skewness correction coefficients of the target waybill information X11, X12, X13, X14 in the first dimension can be determined, denoted p_s11, p_s12, p_s13, p_s14, respectively.
[0169] Based on the target waybill information X21, X22, X23, X24 in the second dimension shown in Table 1 and the skewness values of the target waybill information X21, X22, X23, X24 in the second dimension shown in Table 5, the skewness correction coefficients of the target waybill information X21, X22, X23, X24 in the second dimension can be determined, denoted p_s21, p_s22, p_s23, p_s24, respectively.
[0170] Based on the target waybill information X31, X32, X33, X34 in the third dimension shown in Table 1 and the skewness values of the target waybill information X31, X32, X33, X34 in the third dimension shown in Table 5, the skewness correction coefficients of the target waybill information X31, X32, X33, X34 in the third dimension can be determined, denoted p_s31, p_s32, p_s33, p_s34, respectively. This gives the skewness correction coefficients of each target waybill information Xij in the waybill dataset to be identified, as shown in Table 6 below.
[0171] Table 6
[0172]
| Waybill data / dimension | Dimension 1: Information on the waybill itself | Dimension 2: Customer Information | Dimension 3: Routing Information |
| --- | --- | --- | --- |
| Waybill 1 | p_s11 | p_s21 | p_s31 |
| Waybill 2 | p_s12 | p_s22 | p_s32 |
| Waybill 3 | p_s13 | p_s23 | p_s33 |
| Waybill 4 | p_s14 | p_s24 | p_s34 |
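Step c2 can be sketched as a per-dimension selection. The rule used here (take the left-tail coefficients when the skewness value is negative, otherwise the right-tail coefficients) follows the branch in line 10 of the pseudocode as realized in the published COPOD algorithm; since formulas (5) and (8) are not reproduced in this text, it is an assumption, and the function name and all numbers are illustrative.

```python
def skew_corrected_coefficients(p_left, p_right, b):
    """Skewness correction coefficients of one dimension.

    p_left, p_right: left-tail and right-tail coefficients of the dimension's values.
    b: skewness value of the dimension (shared by all of its values).
    """
    # Assumed rule: negative skew -> use the left tail, otherwise the right tail.
    return list(p_left) if b < 0 else list(p_right)

# Illustrative coefficients for four waybills in one dimension.
p_left = [0.69, 0.29, 1.39, 0.0]
p_right = [0.29, 0.69, 0.0, 1.39]
b = 0.75  # hypothetical skewness value: a longer right tail
p_skew = skew_corrected_coefficients(p_left, p_right, b)
print(p_skew)
```

With a positive skewness value, the correction emphasizes unusually large values; with a negative one, unusually small values.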
[0173] 404. Based on the left-tail empirical coefficient, the right-tail empirical coefficient, and the skewness correction coefficient of each target waybill information Xij, determine the outlier value of each target waybill information Xij.
[0174] Step 404 may specifically include: detecting the maximum value among the left-tail empirical coefficient, the right-tail empirical coefficient, and the skewness correction coefficient of each target waybill information Xij, and taking that maximum value as the outlier value of each target waybill information Xij.
[0175] Specifically, the maximum value among the left-tail empirical coefficient, the right-tail empirical coefficient, and the skewness correction coefficient of the target waybill information Xij is taken as the outlier value of the target waybill information Xij.
[0176] For example, for the waybill dataset to be identified in Table 1, X = {Xij} = {X11, X12, X13, X14, X21, X22, X23, X24, X31, X32, X33, X34}, the left-tail empirical coefficients of each target waybill information Xij can be determined as shown in Table 3, the right-tail empirical coefficients as shown in Table 4, and the skewness correction coefficients as shown in Table 6. The maximum of the left-tail empirical coefficient p_l11, the right-tail empirical coefficient p_r11, and the skewness correction coefficient p_s11 of the target waybill information X11 (for example, p_s11) is then taken as the outlier value of X11; the maximum of the left-tail empirical coefficient p_l12, the right-tail empirical coefficient p_r12, and the skewness correction coefficient p_s12 of the target waybill information X12 (for example, p_s12) is taken as the outlier value of X12; ...; and the maximum of the left-tail empirical coefficient p_l34, the right-tail empirical coefficient p_r34, and the skewness correction coefficient p_s34 of the target waybill information X34 (for example, p_s34) is taken as the outlier value of X34. In this way, the outlier values of each target waybill information Xij are obtained, as shown in Table 7.
[0177] Table 7
[0178]
| Waybill data / dimension | Dimension 1: Information on the waybill itself | Dimension 2: Customer Information | Dimension 3: Routing Information |
| --- | --- | --- | --- |
| Waybill 1 | p_s11 | p_l21 | p_r31 |
| Waybill 2 | p_l12 | p_s22 | p_r32 |
| Waybill 3 | p_r13 | p_l23 | p_l33 |
| Waybill 4 | p_r14 | p_r24 | p_s34 |
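Steps 401 to 404 can be combined into one short sketch that produces an outlier value for every target waybill information, in the shape of Table 7. It is a sketch under assumptions, not the implementation of this application: the coefficient forms (negative logarithms of the tail distribution values) and the skewness expression follow the published COPOD algorithm, the data must be numeric with non-constant dimensions, and all names and values are hypothetical.

```python
import math

def ecdf_left(column, x):
    return sum(1 for v in column if v <= x) / len(column)

def ecdf_right(column, x):
    return sum(1 for v in column if v >= x) / len(column)

def skewness(column):
    n = len(column)
    mean = sum(column) / n
    third = sum((v - mean) ** 3 for v in column) / n
    variance = sum((v - mean) ** 2 for v in column) / (n - 1)
    return third / variance ** 1.5  # fails for a constant column

def cell_outlier_values(data):
    """data[i][j]: waybill i, dimension j. Returns outlier values of the same shape."""
    n, d = len(data), len(data[0])
    scores = [[0.0] * d for _ in range(n)]
    for j in range(d):
        column = [row[j] for row in data]
        b = skewness(column)
        for i, x in enumerate(column):
            p_l = -math.log(ecdf_left(column, x))   # step 401
            p_r = -math.log(ecdf_right(column, x))  # step 402
            p_s = p_l if b < 0 else p_r             # step 403 (assumed rule)
            scores[i][j] = max(p_l, p_r, p_s)       # step 404
    return scores

# Hypothetical data: 4 waybills x 2 dimensions (weight in kg, piece count).
data = [[1.0, 3], [1.2, 2], [0.9, 3], [30.0, 40]]
scores = cell_outlier_values(data)
print(scores)
```

Because the maximum is taken per value, both the smallest and the largest value of a dimension receive a high outlier value in this sketch; the choice of threshold in step 301 then decides which of them are reported.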
[0179] In this embodiment, the COPOD algorithm is used to perform calculations on the waybill dataset to be identified, which contains a large number of waybills. First, because the COPOD algorithm does not require any distance calculation between samples, its runtime overhead is low and it runs fast, so even low-performance machines can meet the operational requirements of this waybill data anomaly identification method. Second, because the COPOD algorithm does not require model parameter tuning and can be invoked directly as a function, its algorithmic complexity is low and it is relatively easy to execute, which facilitates anomaly identification on waybill datasets containing a large number of waybills. Third, compared with the mainstream anomaly detection algorithms LOF and Isolation Forest, the COPOD algorithm performs better and can detect anomalous data in the shortest time.
[0180] Therefore, by using the COPOD algorithm to identify anomalies in the waybill dataset and outputting the anomaly information of the abnormal waybill data in the dataset, waybill data can be monitored and identified more quickly and intelligently in the logistics field, and an early warning can be given for abnormal waybills and abnormal data.
[0181] To better implement the method for identifying waybill data anomalies in the embodiments of this application, the embodiments of this application further provide, on the basis of that method, a device for identifying waybill data anomalies. Figure 7 is a schematic diagram of an embodiment of the waybill data anomaly identification device in this application. The waybill data anomaly identification device 700 includes:
[0182] The acquisition unit 701 is used to acquire a waybill dataset to be identified based on the waybill information of each waybill data in each target information dimension, wherein the waybill dataset to be identified includes multiple target waybill information Xij, 1 ≤ i ≤ n, 1 ≤ j ≤ d, where n represents the number of waybill data items included in the waybill dataset to be identified, d represents the number of information dimensions included in each waybill data item, and the target waybill information Xij represents the information of the i-th waybill data in the j-th dimension;
[0183] The detection unit 702 is used to determine the outlier value of each target waybill information Xij in the waybill dataset to be identified based on the waybill dataset to be identified and a preset anomaly detection strategy, wherein the outlier value of the target waybill information Xij in the j-th dimension is determined based on the multiple target waybill information X1j, X2j, ..., Xnj in the j-th dimension;
[0184] The output unit 703 is used to output, based on the outlier values of each target waybill information Xij in the waybill dataset to be identified, the anomaly information of the abnormal waybill data in the waybill dataset to be identified, wherein the abnormal waybill data refers to waybill data in which the target waybill information in one or more information dimensions is abnormal, and the anomaly information is used to reflect the abnormal information dimensions of the abnormal waybill data.
[0185] In one possible implementation of this application, the preset anomaly detection strategy is the COPOD algorithm, and the detection unit 702 is specifically used for:
[0186] Based on the dataset of waybills to be identified and the COPOD algorithm, determine the left-tail empirical coefficient of each target waybill information Xij;
[0187] Based on the dataset of waybills to be identified and the COPOD algorithm, determine the right-tail empirical coefficient of each target waybill information Xij;
[0188] Based on the data set of waybills to be identified, the left-tail empirical coefficient of each target waybill information Xij, and the right-tail empirical coefficient of each target waybill information Xij, determine the skewness correction coefficient of each target waybill information Xij.
[0189] The outliers of each target waybill information Xij are determined based on the left-tail empirical coefficient, the right-tail empirical coefficient, and the skewness correction coefficient.
[0190] In one possible implementation of this application, the detection unit 702 is specifically used for:
[0191] Based on the data set of waybills to be identified, determine the distribution function of waybill information in the j-th dimension;
[0192] Based on the waybill information distribution function of the j-th dimension and the COPOD algorithm, determine the left-tail empirical function of the j-th dimension;
[0193] Based on the left-tail empirical function of the j-th dimension and the target waybill information Xij, determine the left-tail empirical coefficient of each target waybill information Xij.
[0194] In one possible implementation of this application, the detection unit 702 is specifically used for:
[0195] Based on the waybill information distribution function of the j-th dimension and the COPOD algorithm, determine the right-tailed empirical function of the j-th dimension;
[0196] Based on the right-tailed empirical function of the j-th dimension and each target waybill information Xij, determine the right-tailed empirical coefficient of each target waybill information Xij.
[0197] In one possible implementation of this application, the detection unit 702 is specifically used for:
[0198] Based on the dataset of waybills to be identified and the COPOD algorithm, calculate the skewness value of each target waybill information Xij;
[0199] Based on the left-tail empirical coefficient, the right-tail empirical coefficient, and the skewness value of each target waybill information Xij, determine the skewness correction coefficient for each target waybill information Xij.
[0200] In one possible implementation of this application, the detection unit 702 is specifically used for:
[0201] Detect the maximum value among the left-tail empirical coefficient, the right-tail empirical coefficient, and the skewness correction coefficient of each target waybill information Xij;
[0202] Take the maximum value as the outlier value of each target waybill information Xij.
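Taking the maximum of the three coefficients is an element-wise operation over the n×d table of target waybill information. A minimal sketch, assuming the three coefficient tables have already been computed and share the same shape, is shown below. The function name `outlier_values` and the numbers are hypothetical.

```python
import numpy as np

def outlier_values(left, right, corrected):
    """Element-wise maximum of the three coefficient tables (n rows, d columns)."""
    tables = [np.asarray(t, dtype=float) for t in (left, right, corrected)]
    return np.maximum.reduce(tables)

# Hypothetical coefficients for 2 waybills and 2 information dimensions.
left = [[0.1, 2.3], [0.7, 0.4]]
right = [[1.9, 0.1], [0.3, 0.9]]
corrected = [[1.9, 2.3], [0.3, 0.4]]
values = outlier_values(left, right, corrected)
```

Each entry of `values` is the outlier value of one target waybill information Xij, so the result keeps one value per waybill and per dimension.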
[0203] In one possible implementation of this application, the output unit 703 is specifically used for:
[0204] Based on the outlier value of each target waybill information Xij in the waybill dataset to be identified, detect the abnormal target waybill information Xij, that is, the target waybill information whose outlier value is greater than a preset threshold;
[0205] Output the abnormal information of the abnormal waybill data in the waybill dataset to be identified, wherein the abnormal information includes the abnormal target waybill information Xij and/or the outlier value of the abnormal target waybill information Xij, and the abnormal target waybill information Xij is used to indicate that the target waybill information of the j-th dimension in the waybill dataset to be identified is abnormal.
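The thresholding and output step can be sketched as follows. This is an illustrative sketch only: the function name `abnormal_information`, the dictionary layout of the output, the dimension names, and the threshold of 3.0 are all hypothetical choices, since this application does not fix an output format or a threshold value.

```python
import numpy as np

def abnormal_information(scores, threshold, dimensions):
    """List every (waybill, dimension) pair whose outlier value exceeds the threshold."""
    scores = np.asarray(scores, dtype=float)
    report = []
    # np.argwhere returns the (row, column) indices in row-major order.
    for i, j in np.argwhere(scores > threshold):
        report.append({
            "waybill": int(i),              # index i of the abnormal waybill data
            "dimension": dimensions[j],     # abnormal information dimension j
            "outlier_value": float(scores[i, j]),
        })
    return report

# Hypothetical outlier values for 3 waybills and 2 information dimensions.
scores = [[0.2, 0.5], [0.9, 4.6], [3.2, 0.1]]
report = abnormal_information(scores, threshold=3.0, dimensions=["weight", "quantity"])
```

With these sample values, the second waybill is reported as abnormal in the quantity dimension and the third as abnormal in the weight dimension, so the output identifies both the abnormal waybill data and the dimension in which it is abnormal.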
[0206] In practice, each of the above units can be implemented as an independent entity or can be arbitrarily combined to be implemented as the same or several entities. For the specific implementation of each of the above units, please refer to the previous method embodiments, which will not be repeated here.
[0207] Because the device for identifying abnormal waybill data can perform the steps of the method for identifying abnormal waybill data in any embodiment corresponding to Figures 1 to 6, it can achieve the beneficial effects of that method. For details, please refer to the preceding description, which will not be repeated here.
[0208] Furthermore, to better implement the method for identifying abnormal waybill data in the embodiments of this application, the embodiments of this application also provide, based on that method, an electronic device. Figure 8 shows a structural diagram of an electronic device according to an embodiment of this application. Specifically, the electronic device provided in this embodiment includes a processor 801, which, when executing a computer program stored in a memory 802, implements the steps of the method for identifying abnormal waybill data in any embodiment corresponding to Figures 1 to 6; alternatively, when executing the computer program stored in the memory 802, the processor 801 implements the functions of each unit in the embodiment corresponding to Figure 7.
[0209] For example, a computer program may be divided into one or more modules / units, one or more of which are stored in memory 802 and executed by processor 801 to complete the embodiments of this application. One or more modules / units may be a series of computer program instruction segments capable of performing a specific function, which describe the execution process of the computer program in a computer device.
[0210] The electronic device may include, but is not limited to, the processor 801 and the memory 802. Those skilled in the art will understand that the illustration is merely an example of an electronic device and does not constitute a limitation on it. The electronic device may include more or fewer components than illustrated, combine certain components, or use different components. For example, the electronic device may also include input / output devices, network access devices, buses, etc., with the processor 801, the memory 802, the input / output devices, and the network access devices connected via a bus.
[0211] The processor 801 can be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor can be a microprocessor or any conventional processor. The processor is the control center of the electronic device, connecting various parts of the electronic device through various interfaces and lines.
[0212] The memory 802 can be used to store computer programs and / or modules. The processor 801 implements various functions of the computer device by running or executing the computer programs and / or modules stored in the memory 802 and by calling data stored in the memory 802. The memory 802 may mainly include a program storage area and a data storage area. The program storage area may store the operating system, application programs required for at least one function (such as sound playback function, image playback function, etc.), etc.; the data storage area may store data created according to the use of the electronic device (such as audio data, video data, etc.). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as hard disk, RAM, plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, at least one disk storage device, flash memory device, or other volatile solid-state storage device.
[0213] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working process of the above-described device for identifying abnormal waybill data, the electronic device, and their corresponding units can be found in the description of the method for identifying abnormal waybill data in any embodiment corresponding to Figures 1 to 6, and will not be repeated here.
[0214] Those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be performed by instructions, or by instructions controlling related hardware. These instructions can be stored in a computer-readable storage medium and loaded and executed by a processor.
[0215] Therefore, embodiments of this application provide a computer-readable storage medium storing a plurality of instructions that can be loaded by a processor to execute the steps of the method for identifying abnormal waybill data in any embodiment corresponding to Figures 1 to 6. For the specific operations, please refer to the description of that method, which will not be repeated here.
[0216] The computer-readable storage medium may include: read-only memory (ROM), random access memory (RAM), disk or optical disk, etc.
[0217] Because the instructions stored in the computer-readable storage medium can execute the steps of the method for identifying abnormal waybill data in any embodiment corresponding to Figures 1 to 6, they can achieve the beneficial effects of that method. For details, please refer to the preceding description, which will not be repeated here.
[0218] The foregoing has provided a detailed description of a method, apparatus, electronic device, and computer-readable storage medium for identifying abnormal waybill data according to embodiments of this application. Specific examples have been used to illustrate the principles and implementation methods of this application. The descriptions of the embodiments above are only for the purpose of helping to understand the method and core ideas of this application. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of this application. Therefore, the content of this specification should not be construed as a limitation of this application.
Claims
1. A method for identifying abnormal waybill data, characterized in that the method includes:
obtaining a waybill dataset to be identified based on the waybill information of each waybill data under each target information dimension, wherein the waybill dataset to be identified includes multiple target waybill information Xij, 1≤i≤n, 1≤j≤d, where n represents the number of waybill data items included in the waybill dataset to be identified, d represents the number of information dimensions included in each waybill data item, d≥2, and the target waybill information Xij represents the information of the i-th waybill data in the j-th dimension;
determining, based on the waybill dataset to be identified and a preset anomaly detection strategy, the outlier value of each target waybill information Xij in the waybill dataset to be identified, wherein the outlier value of the target waybill information Xij in the j-th dimension is determined based on the multiple target waybill information X1j, X2j, ..., Xnj in the j-th dimension;
outputting, based on the outlier value of each target waybill information Xij in the waybill dataset to be identified, the abnormal information of the abnormal waybill data in the waybill dataset to be identified, wherein the abnormal waybill data refers to waybill data having abnormal target waybill information, and the abnormal information is used to reflect the abnormal information dimension of the abnormal waybill data;
wherein the preset anomaly detection strategy is the COPOD algorithm, and the step of determining, based on the waybill dataset to be identified and the preset anomaly detection strategy, the outlier value of each target waybill information Xij in the waybill dataset to be identified includes:
determining, based on the waybill dataset to be identified and the COPOD algorithm, the left-tail empirical coefficient of each target waybill information Xij;
determining, based on the waybill dataset to be identified and the COPOD algorithm, the right-tail empirical coefficient of each target waybill information Xij;
determining, based on the waybill dataset to be identified, the left-tail empirical coefficient of each target waybill information Xij, and the right-tail empirical coefficient of each target waybill information Xij, the skewness correction coefficient of each target waybill information Xij;
detecting the maximum value among the left-tail empirical coefficient, the right-tail empirical coefficient, and the skewness correction coefficient of each target waybill information Xij, and taking the maximum value as the outlier value of each target waybill information Xij.
2. The method for identifying abnormal waybill data according to claim 1, characterized in that the step of determining, based on the waybill dataset to be identified and the COPOD algorithm, the left-tail empirical coefficient of each target waybill information Xij includes:
determining, based on the waybill dataset to be identified, the waybill information distribution function of the j-th dimension;
determining, based on the waybill information distribution function of the j-th dimension and the COPOD algorithm, the left-tail empirical function of the j-th dimension;
determining, based on the left-tail empirical function of the j-th dimension and each target waybill information Xij, the left-tail empirical coefficient of each target waybill information Xij.
3. The method for identifying abnormal waybill data according to claim 2, characterized in that the step of determining, based on the waybill dataset to be identified and the COPOD algorithm, the right-tail empirical coefficient of each target waybill information Xij includes:
determining, based on the waybill information distribution function of the j-th dimension and the COPOD algorithm, the right-tail empirical function of the j-th dimension;
determining, based on the right-tail empirical function of the j-th dimension and each target waybill information Xij, the right-tail empirical coefficient of each target waybill information Xij.
4. The method for identifying abnormal waybill data according to claim 1, characterized in that the step of determining, based on the waybill dataset to be identified, the left-tail empirical coefficient of each target waybill information Xij, and the right-tail empirical coefficient of each target waybill information Xij, the skewness correction coefficient of each target waybill information Xij includes:
calculating, based on the waybill dataset to be identified and the COPOD algorithm, the skewness value of each target waybill information Xij;
determining, based on the left-tail empirical coefficient, the right-tail empirical coefficient, and the skewness value of each target waybill information Xij, the skewness correction coefficient of each target waybill information Xij.
5. The method for identifying abnormal waybill data according to any one of claims 1-4, characterized in that the step of outputting, based on the outlier value of each target waybill information Xij in the waybill dataset to be identified, the abnormal information of the abnormal waybill data in the waybill dataset to be identified includes:
detecting, based on the outlier value of each target waybill information Xij in the waybill dataset to be identified, the abnormal target waybill information Xij whose outlier value is greater than a preset threshold;
outputting the abnormal information of the abnormal waybill data in the waybill dataset to be identified, wherein the abnormal information includes the abnormal target waybill information Xij of the abnormal waybill data and / or the outlier value of the abnormal target waybill information Xij, and the abnormal target waybill information Xij is used to indicate that the target waybill information of the j-th dimension in the waybill dataset to be identified is abnormal.
6. A device for identifying abnormal waybill data, characterized in that the device for identifying abnormal waybill data includes:
an acquisition unit, used to obtain a waybill dataset to be identified based on the waybill information of each waybill data under each target information dimension, wherein the waybill dataset to be identified includes multiple target waybill information Xij, 1≤i≤n, 1≤j≤d, where n represents the number of waybill data items included in the waybill dataset to be identified, d represents the number of information dimensions included in each waybill data item, d≥2, and the target waybill information Xij represents the information of the i-th waybill data in the j-th dimension;
a detection unit, used to determine, based on the waybill dataset to be identified and a preset anomaly detection strategy, the outlier value of each target waybill information Xij in the waybill dataset to be identified, wherein the outlier value of the target waybill information Xij in the j-th dimension is determined based on the multiple target waybill information X1j, X2j, ..., Xnj in the j-th dimension;
an output unit, used to output, based on the outlier value of each target waybill information Xij in the waybill dataset to be identified, the abnormal information of the abnormal waybill data in the waybill dataset to be identified, wherein the abnormal waybill data refers to waybill data having abnormal target waybill information, and the abnormal information is used to reflect the abnormal information dimension of the abnormal waybill data;
wherein the preset anomaly detection strategy is the COPOD algorithm, and the detection unit is specifically used for:
determining, based on the waybill dataset to be identified and the COPOD algorithm, the left-tail empirical coefficient of each target waybill information Xij;
determining, based on the waybill dataset to be identified and the COPOD algorithm, the right-tail empirical coefficient of each target waybill information Xij;
determining, based on the waybill dataset to be identified, the left-tail empirical coefficient of each target waybill information Xij, and the right-tail empirical coefficient of each target waybill information Xij, the skewness correction coefficient of each target waybill information Xij;
detecting the maximum value among the left-tail empirical coefficient, the right-tail empirical coefficient, and the skewness correction coefficient of each target waybill information Xij, and taking the maximum value as the outlier value of each target waybill information Xij.
7. An electronic device, characterized in that it includes a processor and a memory, wherein the memory stores a computer program, and when the processor invokes the computer program in the memory, it executes the method for identifying abnormal waybill data according to any one of claims 1 to 5.
8. A computer-readable storage medium, characterized in that it stores a computer program, which is loaded by a processor to perform the steps of the method for identifying abnormal waybill data according to any one of claims 1 to 5.