An abnormal user identification method and device based on time series data
By converting user transaction data into a time sliding window matrix and decomposing it into a trend matrix, and combining convolutional neural networks and autoencoders for processing, the efficiency and accuracy problems of identifying abnormal users in existing technologies are solved, and efficient abnormal user identification is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- INDUSTRIAL AND COMMERCIAL BANK OF CHINA
- Filing Date
- 2022-07-22
- Publication Date
- 2026-06-12
AI Technical Summary
When identifying abnormal users, existing machine learning techniques suffer from several drawbacks. Traditional methods lose temporal information, while graph neural networks require a large amount of computation, resulting in poor identification performance and huge resource consumption.
By acquiring multidimensional transaction data from users over multiple set time periods, converting it into a time sliding window matrix and decomposing it into a trend matrix, and then using convolutional neural networks and autoencoders to process the data, the system identifies whether a user is normal or abnormal.
It achieves more accurate and efficient identification of abnormal users, reduces computing resource consumption, and improves identification accuracy.
Figure CN115170318B_ABST
Description
Technical Field
[0001] This invention relates to the financial field, and in particular to a method and apparatus for identifying abnormal users based on time series data.
Background Technology
[0002] With the rapid development of the internet and telecommunications industries, illegal and criminal activities are becoming increasingly serious. Meanwhile, with the development of big data technology, historical transaction data of customers is easily obtained, such as daily transaction amounts, number of transactions, inflows, outflows, and the number of inflows and outflows. However, labeling these customers with information, such as manually determining whether they are involved in illegal or criminal activities, requires a significant amount of manpower and expertise, making it extremely costly. Therefore, building a powerful learning system using machine learning technology is highly valuable for accurately identifying and predicting abnormal users involved in illegal and criminal activities.
[0003] However, current mainstream machine learning techniques still have shortcomings in identifying abnormal users, mainly in two respects. First, traditional machine learning techniques rely on feature engineering: attribute features are constructed from customer attribute data, and transaction data is then aggregated into behavioral features to characterize customers. However, aggregating transaction data (for example by summation, averaging, variance, or extrema) loses a large amount of temporal information (for example, a mean-type feature takes the same value for a sequence ordered from largest to smallest as for the same sequence ordered from smallest to largest), resulting in poor model performance. Second, graph neural network-based techniques can effectively take into account the relationships between customers, such as transaction relationships, but their computational cost is extremely high. This is especially so in the financial field, where the number of customers is huge and transactions between customers are often multiple and directed. Constructing a transaction graph between customers therefore often consumes enormous resources and places high demands on computational performance, which limits the applicability of these techniques.
[0004] Therefore, there is an urgent need for an abnormal user identification method based on time series data, which can identify abnormal users more accurately and efficiently.
Summary of the Invention
[0005] The purpose of this embodiment is to provide an abnormal user identification method and apparatus based on time series data, so as to identify abnormal users more accurately and efficiently.
[0006] To achieve the above objectives, this embodiment provides a method for identifying abnormal users based on time-series data, including:
[0007] Acquire multiple sets of multidimensional transaction data from users over several consecutive set time periods, with each set time period corresponding to a set of multidimensional transaction data;
[0008] The multiple sets of multidimensional transaction data are converted into matrix form to obtain a time sliding window matrix; each element in the time sliding window matrix represents a set of multidimensional transaction data.
[0009] The time sliding window matrix is decomposed to obtain a trend matrix; the trend matrix contains the data that conforms to the trend in the multiple sets of multidimensional transaction data.
[0010] Based on the trend matrix, a series of consecutive scores corresponding to the user in multiple consecutive set time periods are obtained;
[0011] The consecutive scores are compared with a set threshold, and the user is identified as a normal user or an abnormal user based on the comparison results.
[0012] Preferably, before decomposing the time sliding window matrix to obtain the trend matrix, the method further includes:
[0013] The time sliding window matrix is nonlinearly transformed using a two-dimensional convolutional neural network.
[0014] Preferably, obtaining the user's consecutive scores for multiple consecutive set time periods based on the trend matrix further includes:
[0015] The trend matrix is converted into a trend sequence, where each element of the trend sequence represents the data that conforms to the trend in one set of multidimensional transaction data.
[0016] The trend sequence is subjected to a nonlinear transformation using a one-dimensional convolutional neural network.
[0017] The trend sequence after nonlinear transformation is calculated to obtain multiple scores corresponding to the user in multiple set time periods.
[0018] Preferably, the step of calculating the trend sequence after nonlinear transformation to obtain multiple scores corresponding to the user in multiple set time periods further includes:
[0019] The following formula is used to calculate the user's scores for different time periods:
[0020]
[0021] Here, score_i denotes the user's score in the i-th set time period; the score is computed from the data that conforms to the trend in the multidimensional transaction data corresponding to the i-th set time period in the nonlinearly transformed trend sequence, and $\|\cdot\|_2$ denotes the L2 norm.
[0022] Preferably, the step of decomposing the time sliding window matrix to obtain the trend matrix further includes:
[0023] The trend matrix is obtained by decomposing the time sliding window matrix using an autoencoder.
[0024] Preferably, the step of decomposing the time sliding window matrix into a trend matrix using an autoencoder further includes:
[0025] The time sliding window matrix is decomposed into an initial trend matrix and an initial deviation matrix, wherein the initial deviation matrix contains the data that deviates from the trend in the multiple sets of multidimensional transaction data.
[0026] The initial trend matrix is encoded using the encoding function of an autoencoder to obtain the encoded initial trend matrix;
[0027] The encoded initial trend matrix is decoded using the decoding function of an autoencoder to obtain a reconstructed matrix;
[0028] The initial trend matrix, the reconstructed matrix, and the initial deviation matrix are trained using the objective function of the autoencoder to obtain the trend matrix.
[0029] Preferably, the step of converting the multiple sets of multidimensional transaction data into matrix form to obtain a time sliding window matrix further includes:
[0030] Construct time series of multiple sets of multidimensional transaction data, wherein each element of the time series represents a set of multidimensional transaction data;
[0031] A time sliding window matrix is constructed iteratively according to a set time window.
[0032] In another aspect, this embodiment provides an abnormal user identification device based on time series data, the device comprising:
[0033] The acquisition module is used to acquire multiple sets of multidimensional transaction data of the user within several consecutive set time periods, where each set time period corresponds to a set of multidimensional transaction data.
[0034] The conversion module is used to convert the multiple sets of multidimensional transaction data into matrix form to obtain a time sliding window matrix; each element in the time sliding window matrix represents a set of multidimensional transaction data.
[0035] The decomposition module is used to decompose the time sliding window matrix to obtain a trend matrix; the trend matrix contains the data that conforms to the trend in the multiple sets of multidimensional transaction data.
[0036] The determining module is used to obtain multiple consecutive scores corresponding to a user in multiple consecutive set time periods based on the trend matrix.
[0037] The identification module is used to compare the consecutive scores with a set threshold, and identify the user as a normal user or an abnormal user based on the comparison results.
[0038] In another aspect, embodiments of this document also provide a computer device, including a memory, a processor, and a computer program stored in the memory, wherein the computer program, when executed by the processor, performs instructions of any of the methods described above.
[0039] In another aspect, the embodiments herein also provide a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor of a computer device, performs instructions for any of the methods described above.
[0040] As can be seen from the technical solutions provided in the embodiments above, these embodiments acquire multiple sets of multidimensional transaction data from a user over several consecutive set time periods, convert these data into a matrix form to obtain a time sliding window matrix, and then decompose the time sliding window matrix to obtain a trend matrix containing only data that conforms to the trend. Based on the trend matrix, multiple consecutive scores corresponding to the user over several consecutive set time periods can be obtained. By comparing these multiple consecutive scores with a set threshold, the user can be identified as a normal user or an abnormal user. Because this application reasonably acquires multiple set time periods and the trend matrix contains only data that conforms to the trend, the abnormal user identification method based on this application can more accurately and efficiently identify abnormal users.
[0041] To make the above and other objects, features and advantages of this document more apparent and understandable, preferred embodiments are described below in detail with reference to the accompanying drawings.
Attached Figure Description
[0042] To illustrate the technical solutions in the embodiments herein or in the prior art more clearly, the accompanying drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the accompanying drawings described below show only some embodiments of this document. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0043] Figure 1 is a flowchart of an abnormal user identification method based on time series data provided in an embodiment herein.
[0044] Figure 2 is a flowchart of converting multiple sets of multidimensional transaction data into matrix form to obtain a time sliding window matrix, as provided in an embodiment herein.
[0045] Figure 3 is a schematic diagram of the process of decomposing a time sliding window matrix using an autoencoder to obtain a trend matrix, as provided in an embodiment herein.
[0046] Figure 4 is a flowchart of obtaining a user's scores within set time periods based on a trend matrix, as provided in an embodiment herein.
[0047] Figure 5 is a schematic diagram of the module structure of an abnormal user identification device based on time series data provided in an embodiment herein.
[0048] Figure 6 is a schematic diagram of the structure of a computer device provided in an embodiment herein.
[0049] Explanation of symbols in the attached drawings:
[0050] 100. Acquisition module;
[0051] 200. Conversion module;
[0052] 300. Decomposition module;
[0053] 400. Determination module;
[0054] 500. Identification module;
[0055] 602. Computer device;
[0056] 604. Processor;
[0057] 606. Memory;
[0058] 608. Drive mechanism;
[0059] 610. Input / output module;
[0060] 612. Input devices;
[0061] 614. Output devices;
[0062] 616. Presentation equipment;
[0063] 618. Graphical user interface;
[0064] 620. Network interface;
[0065] 622. Communication link;
[0066] 624. Communication bus.
Detailed Implementation
[0067] The technical solutions in the embodiments described below will be clearly and completely described with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments described herein, and not all of the embodiments. Based on the embodiments described herein, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this document.
[0068] Currently, mainstream machine learning techniques have shortcomings in identifying abnormal users, mainly in two respects. First, traditional machine learning techniques rely on feature engineering: attribute features are constructed from customer attribute data, and transaction data is then aggregated into behavioral features to characterize customers. However, aggregating transaction data (for example by summation, averaging, variance, or extrema) loses a large amount of temporal information (for example, a mean-type feature takes the same value for a sequence ordered from largest to smallest as for the same sequence ordered from smallest to largest), resulting in poor model performance. Second, graph neural network-based techniques can effectively take into account the relationships between customers, such as transaction relationships, but their computational cost is extremely high. This is especially so in the financial field, where the number of customers is vast and transactions between customers are often multiple and directed. Constructing a transaction graph between customers therefore often consumes enormous resources and places high demands on computational performance, which limits the applicability of these techniques.
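The loss of temporal information under aggregation can be illustrated with a small sketch. The daily amounts and the function name `aggregate_features` are hypothetical and serve only to show that order-insensitive aggregates cannot distinguish a rising sequence from a falling one.

```python
from statistics import mean, pvariance


def aggregate_features(values):
    """Order-insensitive aggregates of the kind used in feature engineering."""
    return {
        "sum": sum(values),
        "mean": mean(values),
        "variance": pvariance(values),
        "max": max(values),
        "min": min(values),
    }


# Hypothetical daily transaction amounts: one rising, one falling.
rising = [100, 200, 300, 400, 500]
falling = list(reversed(rising))

# The two sequences differ in order, yet every aggregate is identical.
same_features = aggregate_features(rising) == aggregate_features(falling)
```

Here `same_features` evaluates to true even though the two sequences describe opposite behavior over time.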
[0069] To address the aforementioned issues, embodiments herein provide a method for identifying abnormal users based on time series data. Figure 1 is a flowchart of an abnormal user identification method based on time series data provided in an embodiment herein. This specification provides the operational steps of the method as described in the embodiments or the flowchart, but more or fewer operational steps may be included on the basis of routine or non-inventive effort. The order of steps listed in the embodiments is merely one of many possible execution orders and does not represent the only execution order. When executed in an actual system or device product, the methods shown in the embodiments or accompanying drawings may be executed sequentially or in parallel.
[0070] It should be noted that the terms "first," "second," etc., used in the specification, claims, and accompanying drawings herein are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, apparatus, product, or device that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or devices.
[0071] Referring to Figure 1, embodiments herein provide a method for identifying abnormal users based on time series data, comprising:
[0072] S101: Obtain multiple sets of multidimensional transaction data from the user over several consecutive set time periods, where each set time period corresponds to a set of multidimensional transaction data;
[0073] S102: Convert the multiple sets of multidimensional transaction data into matrix form to obtain a time sliding window matrix; each element in the time sliding window matrix represents a set of multidimensional transaction data.
[0074] S103: Decompose the time sliding window matrix to obtain a trend matrix; the trend matrix contains the data that conforms to the trend in the multiple sets of multidimensional transaction data;
[0075] S104: Based on the trend matrix, obtain multiple consecutive scores corresponding to the user in multiple consecutive set time periods;
[0076] S105: Compare the consecutive scores with a set threshold, and identify the user as a normal user or an abnormal user based on the comparison results.
[0077] The set time period may be one day, one week, half a month, one month, etc. The multiple consecutive set time periods may accordingly be several consecutive days, several consecutive weeks, or several consecutive months. To balance the efficiency and accuracy of identifying abnormal users, a set time period of one day is preferred. Taking several consecutive days as an example, each day corresponds to one set of multidimensional transaction data, so that multiple consecutive sets of multidimensional transaction data are obtained. For a user, the multidimensional transaction data includes transaction amount, number of transactions, number of fund inflows, number of fund outflows, balance, etc., where transaction amount and number of transactions are each one dimension of the multidimensional transaction data.
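As an illustration of how one set of multidimensional transaction data per day might be assembled, the following is a minimal sketch. The record layout `(day_index, amount, direction)` and the function name `daily_features` are assumptions made for this example, and the balance dimension is omitted for brevity.

```python
def daily_features(transactions, num_days):
    """Aggregate raw transactions into one feature vector per day.

    transactions: iterable of (day_index, amount, direction) tuples, where
    direction is "in" or "out" (a hypothetical record layout).
    Returns a list of num_days vectors:
    [transaction amount, number of transactions, inflow count, outflow count].
    """
    features = [[0.0, 0, 0, 0] for _ in range(num_days)]
    for day, amount, direction in transactions:
        row = features[day]
        row[0] += amount  # total transaction amount for the day
        row[1] += 1       # number of transactions for the day
        if direction == "in":
            row[2] += 1   # number of fund inflows
        else:
            row[3] += 1   # number of fund outflows
    return features
```

Days without any transaction keep an all-zero vector, so the resulting series has exactly one element per set time period.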
[0078] Since a set of multidimensional transaction data includes many data dimensions, and users have multiple sets of multidimensional transaction data within multiple consecutive set time periods, in order to clearly and completely record multiple sets of multidimensional transaction data, this embodiment of the application records multiple sets of multidimensional transaction data in the form of a matrix, resulting in a time sliding window matrix.
[0079] Referring to Figure 2, the step of converting the multiple sets of multidimensional transaction data into matrix form to obtain a time sliding window matrix further includes:
[0080] S201: Construct time series of multiple sets of multidimensional transaction data, wherein each element of the time series represents a set of multidimensional transaction data;
[0081] S202: Construct a time sliding window matrix iteratively according to the set time window.
[0082] First, a time series is constructed from the multiple sets of multidimensional transaction data, in the form $T = \langle s_1, s_2, \ldots, s_C \rangle$, where $C$ is the number of consecutive set time periods, $s_i$ is the multidimensional transaction data corresponding to the $i$-th set time period, $s_i \in \mathbb{R}^D$, and $D$ is the dimension of the transaction data; the dimensions include transaction amount, number of transactions, number of fund inflows, number of fund outflows, balance, etc.
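[0083] The time series $T = \langle s_1, s_2, \ldots, s_C \rangle$ is transformed into a time sliding window matrix $M$, $M \in \mathbb{R}^{B \times K \times D}$, where $B$ is the set time window, $K$ is the number of columns in the time sliding window matrix, and $K = C - B + 1$. By iterating over the time series with the set time window, a time sliding window matrix $M$ with $K$ columns is obtained. The time sliding window matrix $M$ can represent the shape and pattern of the time series. For example, $M$ has the following form:
[0084] $$M = \begin{bmatrix} s_1 & s_2 & \cdots & s_K \\ s_2 & s_3 & \cdots & s_{K+1} \\ \vdots & \vdots & \ddots & \vdots \\ s_B & s_{B+1} & \cdots & s_C \end{bmatrix}$$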
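The construction of the time sliding window matrix can be sketched as follows. This is a minimal illustration that assumes the window advances by one set time period per column, so that row b, column k (zero-based) holds the element at position b + k of the time series; the function name `sliding_window_matrix` is hypothetical.

```python
def sliding_window_matrix(series, window):
    """Build a B x K matrix of D-dimensional elements from a series of length C.

    series: list of C vectors, one per set time period.
    window: the set time window B.
    Returns a nested list M with B rows and K = C - B + 1 columns,
    where M[b][k] is series[b + k] (assumed stride of one period).
    """
    length = len(series)
    if not 1 <= window <= length:
        raise ValueError("window must be between 1 and the series length")
    columns = length - window + 1
    return [[series[b + k] for k in range(columns)] for b in range(window)]
```

For example, with C = 5 and B = 3 the matrix has K = 3 columns, and each column is one window of three consecutive periods.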
[0085] Decomposing the time sliding window matrix yields a trend matrix, which has the same form as the time sliding window matrix. However, the trend matrix contains only the data that conforms to the trend in the multiple sets of multidimensional transaction data, and the trend-conforming data in each set of multidimensional transaction data is the data that conforms to the trend within one set time period. It is understood that a user's transaction volume may suddenly increase or decrease on a certain day or in a certain week. Such a sudden increase or decrease is only an occasional occurrence and does not constitute abnormal trading, but it does not conform to the trend, so the trend matrix in this application does not contain the transaction data of such sudden increases or decreases.
[0086] The trend matrix records data on users that conform to trends over multiple consecutive set time periods. Based on the trend matrix, multiple consecutive scores corresponding to users over multiple consecutive set time periods can be obtained. By comparing these multiple consecutive scores with set thresholds, users can be identified as normal or abnormal users based on the comparison results.
[0087] Specifically, since multiple consecutive set time periods correspond to multiple consecutive scores, if a certain number of consecutive scores are greater than a set threshold, the user is considered an abnormal user; otherwise, they are considered a normal user. For example, 10 consecutive set time periods correspond to 10 consecutive scores. If 5 out of 10 consecutive scores are greater than the set threshold, the user is considered an abnormal user. The exact number of consecutive scores can be determined based on the actual situation, but if a certain number of consecutive scores are greater than the set threshold, it indicates that there have been multiple abnormal transactions within the multiple consecutive set time periods.
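The decision rule described above can be sketched as follows. The function name `is_abnormal_user` and the parameter `min_exceed` are hypothetical, and the sketch assumes that the rule counts how many of the consecutive scores exceed the set threshold, which is one reading of the 5-out-of-10 example.

```python
def is_abnormal_user(scores, threshold, min_exceed):
    """Return True if at least min_exceed scores are greater than threshold.

    scores: consecutive scores, one per set time period.
    threshold: the set threshold.
    min_exceed: how many scores must exceed the threshold (chosen according
    to the actual situation; 5 out of 10 in the example above).
    """
    exceed_count = sum(1 for score in scores if score > threshold)
    return exceed_count >= min_exceed
```

A stricter variant that requires the exceeding scores to be adjacent in time would need a run-length check instead of a simple count.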
[0088] In this application, multiple sets of multidimensional transaction data from a user over several consecutive set time periods are acquired. These data are then converted into a matrix to obtain a time-sliding window matrix. This time-sliding window matrix is further decomposed to obtain a trend matrix containing only data conforming to the trend. Based on the trend matrix, multiple consecutive scores corresponding to the user over the various set time periods can be obtained. By comparing these scores with a set threshold, the user is identified as either a normal user or an abnormal user. Because this application reasonably acquires multiple set time periods and the trend matrix contains only data conforming to the trend, the abnormal user identification method based on this application can identify abnormal users more accurately and efficiently.
[0089] In this application, before decomposing the time sliding window matrix to obtain the trend matrix, the method further includes:
[0090] The time sliding window matrix is nonlinearly transformed using a two-dimensional convolutional neural network.
[0091] Specifically, the purpose of the nonlinear transformation is to smooth the data in the time sliding window matrix and remove noise. The time sliding window matrix after the nonlinear transformation is shown below:
[0092]
[0093] Here, the result is the nonlinearly transformed time sliding window matrix, which has the same dimensions as the time sliding window matrix M; the transformation function may be any nonlinear function, for example a ReLU function, a tanh function, or a sigmoid function; W1 is the first weight and b1 is the first bias.
[0094] Solve for the values of W1 and b1 using the following objective function:
[0095]
[0096] where $\|\cdot\|_2$ denotes the L2 norm.
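A nonlinear transformation that keeps the dimensions of the matrix unchanged can be illustrated with a single two-dimensional convolution followed by an activation. The sketch below is not the trained network of this application: it assumes one scalar value per matrix element (D = 1), a single odd-sized kernel with zero padding, and a fixed rather than learned weight and bias. The function name `conv2d_same` is hypothetical.

```python
import math


def conv2d_same(matrix, kernel, bias, activation=math.tanh):
    """Apply one zero-padded 2D convolution layer and an activation.

    matrix: B x K nested list of floats (assumes D = 1 per element).
    kernel: odd-sized nested list playing the role of the weight W1.
    bias: scalar playing the role of the bias b1.
    As in common deep learning libraries, the kernel is not flipped
    (cross-correlation). The output has the same shape as the input.
    """
    rows, cols = len(matrix), len(matrix[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    pad_r, pad_c = k_rows // 2, k_cols // 2
    output = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = bias
            for i in range(k_rows):
                for j in range(k_cols):
                    rr, cc = r + i - pad_r, c + j - pad_c
                    if 0 <= rr < rows and 0 <= cc < cols:  # zero padding
                        acc += kernel[i][j] * matrix[rr][cc]
            output[r][c] = activation(acc)
    return output
```

An averaging kernel gives the smoothing effect described above, while the activation supplies the nonlinearity.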
[0097] In this application, the step of decomposing the time sliding window matrix to obtain the trend matrix further includes:
[0098] The trend matrix is obtained by decomposing the time sliding window matrix using an autoencoder.
[0099] Specifically, referring to Figure 3, the step of using an autoencoder to decompose the time sliding window matrix to obtain the trend matrix further includes:
[0100] S301: Decompose the time sliding window matrix into an initial trend matrix and an initial deviation matrix, wherein the initial deviation matrix contains the data that deviates from the trend in the multiple sets of multidimensional transaction data;
[0101] S302: Encode the initial trend matrix using the encoding function of the autoencoder to obtain the encoded initial trend matrix;
[0102] S303: Decode the encoded initial trend matrix using the decoding function of the autoencoder to obtain the reconstructed matrix;
[0103] S304: The initial trend matrix, the reconstructed matrix, and the initial deviation matrix are trained using the objective function of the autoencoder to obtain the trend matrix.
[0104] The time sliding window matrix contains multiple sets of multi-dimensional transaction data, including data that conforms to the trend and data that deviates from the trend. Data that deviates from the trend represents occasional spikes or drops in trading volume. While these spikes or drops are not inherently abnormal, they can affect the accuracy of identifying anomalous users. To ensure accurate identification of anomalous users, the time sliding window matrix needs to be decomposed. The resulting trend matrix is used to identify whether a user is normal or anomalous, while the resulting deviation matrix is discarded.
[0105] To ensure the accuracy of the decomposition, the time sliding window matrix is first decomposed into an initial trend matrix and an initial deviation matrix. The initial trend matrix is then encoded and decoded by an autoencoder to obtain a reconstructed matrix. A highly accurate trend matrix is obtained by training with the initial trend matrix, the reconstructed matrix, and the initial deviation matrix. Since the time sliding window matrix is composed of a trend matrix and a deviation matrix, the deviation matrix is obtained at the same time as the trend matrix.
[0106] The autoencoder is based on a two-dimensional convolutional neural network and takes the following form:
[0107] The encoding function of the autoencoder is: $E_{AE}(L) = \sigma(W_e L + b_e)$;
[0108] The decoding function of the autoencoder is: $D_{AE}(E_{AE}(L)) = \sigma(W_d E_{AE}(L) + b_d)$;
[0109] The objective function of the autoencoder is:
[0110] $$\min_{L,S} \; \|L - D_{AE}(E_{AE}(L))\|_2 + \lambda_1 \|S\|_1$$
[0111] $$\text{s.t. } M = L + S$$
[0112] where $L$ is the initial trend matrix and $S$ is the initial deviation matrix; $\sigma$ may be any nonlinear transformation function, for example a ReLU function, a tanh function, or a sigmoid function; $W_e$ is the second weight and $b_e$ is the second bias; $E_{AE}(L)$ is the encoded initial trend matrix; $W_d$ is the third weight and $b_d$ is the third bias; $D_{AE}(E_{AE}(L))$ is the reconstructed matrix; $\lambda_1$ is a hyperparameter, which may take the value 0.1; $\|L - D_{AE}(E_{AE}(L))\|_2$ is the L2 norm of $L - D_{AE}(E_{AE}(L))$; and $\|S\|_1$ is the L1 norm of $S$.
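To make the roles of the reconstruction term and the deviation term concrete, the following sketch evaluates an objective of this kind for a given candidate trend matrix. It is an illustration under stated assumptions rather than the training procedure of this application: matrix elements are scalars (D = 1), the L2 norm of a matrix is taken to be the square root of the sum of squared entries, the deviation matrix is taken as the difference between the time sliding window matrix and the trend matrix, and `reconstruct` is a hypothetical stand-in for decoding the encoded trend matrix.

```python
import math


def l2_norm(matrix):
    """Square root of the sum of squared entries (assumed matrix L2 norm)."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def l1_norm(matrix):
    """Sum of absolute values of all entries."""
    return sum(abs(x) for row in matrix for x in row)


def subtract(a, b):
    """Element-wise difference of two equally shaped matrices."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def objective(window_matrix, trend, reconstruct, lam=0.1):
    """Reconstruction error of the trend plus lam times the L1 norm of the deviation.

    window_matrix: the time sliding window matrix.
    trend: a candidate trend matrix of the same shape.
    reconstruct: callable standing in for the decoder applied to the encoder.
    lam: the hyperparameter weighting the deviation term (0.1 in the text).
    """
    deviation = subtract(window_matrix, trend)
    residual = subtract(trend, reconstruct(trend))
    return l2_norm(residual) + lam * l1_norm(deviation)
```

The L1 term favors a sparse deviation matrix, which matches the idea that deviations from the trend are only occasional spikes or drops.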
[0113] Referring to Figure 4, in this application the step of obtaining multiple consecutive scores corresponding to the user in multiple consecutive set time periods based on the trend matrix further includes:
[0114] S401: Convert the trend matrix into a trend sequence, where each element of the trend sequence represents the data that conforms to the trend in one set of multidimensional transaction data;
[0115] S402: Perform a nonlinear transformation on the trend sequence using a one-dimensional convolutional neural network;
[0116] S403: Calculate the trend sequence after nonlinear transformation to obtain multiple scores corresponding to the user in multiple set time periods.
[0117] The trend matrix is converted into a trend sequence by reversing the sequence-to-matrix conversion described above. As before, the purpose of the nonlinear transformation is to smooth the data in the trend sequence and remove noise. The difference is that a one-dimensional convolutional neural network is used here because the trend sequence is a one-dimensional sequence, whereas a two-dimensional convolutional neural network was used earlier because the time sliding window matrix is a two-dimensional matrix.
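One way to reverse the sequence-to-matrix conversion is sketched below. The text does not specify how entries that correspond to the same time period are combined, so averaging them is an assumption of this sketch, as are scalar matrix elements (D = 1) and the function name `matrix_to_sequence`.

```python
def matrix_to_sequence(matrix):
    """Convert a B x K sliding window matrix back into a sequence of length B + K - 1.

    Entry (b, k) of the matrix corresponds to position b + k of the sequence.
    Entries that map to the same position are averaged (an assumption).
    """
    rows, cols = len(matrix), len(matrix[0])
    length = rows + cols - 1
    sums = [0.0] * length
    counts = [0] * length
    for b in range(rows):
        for k in range(cols):
            sums[b + k] += matrix[b][k]
            counts[b + k] += 1
    return [total / count for total, count in zip(sums, counts)]
```

When the matrix was built directly from a sequence, all entries for a given position are equal and the original sequence is recovered exactly.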
[0118] The trend sequence after nonlinear transformation is shown below:
[0119] $$T_L' = \sigma(W_2 T_L + b_2)$$
[0120] where $T_L'$ is the trend sequence after the nonlinear transformation, $T_L$ is the trend sequence, $\sigma$ may be any nonlinear transformation function, for example a ReLU function, a tanh function, or a sigmoid function, $W_2$ is the fourth weight, and $b_2$ is the fourth bias.
[0121] Solve for the values of W2 and b2 using the following objective function:
[0122]
[0123] s.t. $T = T_L + T_S$;
[0124] where $\|\cdot\|_2$ denotes the L2 norm, $\|T_S\|_1$ is the L1 norm of $T_S$, $T_S$ is the deviation sequence obtained by converting the deviation matrix, $\lambda_2$ is a hyperparameter, which may take the value 0.1, and $T$ is the time series.
[0125] Finally, the trend sequence after nonlinear transformation is obtained.
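The one-dimensional counterpart of the transformation can be illustrated in the same spirit. The sketch assumes a scalar trend sequence, a single odd-sized kernel with zero padding, and a fixed rather than learned weight and bias; the function name `conv1d_same` is hypothetical.

```python
import math


def conv1d_same(sequence, kernel, bias, activation=math.tanh):
    """Apply one zero-padded 1D convolution layer and an activation.

    sequence: list of floats (the trend sequence, assumed scalar-valued).
    kernel: odd-length list playing the role of the weight W2.
    bias: scalar playing the role of the bias b2.
    The kernel is not flipped (cross-correlation), and the output has the
    same length as the input.
    """
    length, k_size = len(sequence), len(kernel)
    pad = k_size // 2
    output = []
    for i in range(length):
        acc = bias
        for j in range(k_size):
            idx = i + j - pad
            if 0 <= idx < length:  # zero padding outside the sequence
                acc += kernel[j] * sequence[idx]
        output.append(activation(acc))
    return output
```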
[0126] The step of calculating the trend sequence after nonlinear transformation to obtain multiple scores corresponding to the user in multiple set time periods further includes:
[0127] The following formula is used to calculate the user's scores for different time periods:
[0128]
[0129] Here, score_i denotes the user's score in the i-th set time period; the score is computed from the data that conforms to the trend in the multidimensional transaction data corresponding to the i-th set time period in the nonlinearly transformed trend sequence, and $\|\cdot\|_2$ denotes the L2 norm.
[0130] The above formula is used to obtain multiple consecutive scores for a user within several set time periods. By comparing these multiple consecutive scores with set thresholds, the user is identified as a normal user or an abnormal user based on the comparison results.
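The scoring formula itself is not reproduced in this text, so the following sketch rests on one reading of the symbol definitions: each period's score is taken to be the L2 norm of that period's trend-conforming vector in the nonlinearly transformed trend sequence. The function name `period_scores` is hypothetical, and the actual formula may differ.

```python
import math


def period_scores(trend_sequence):
    """Return one score per set time period.

    trend_sequence: list of D-dimensional vectors, one per set time period.
    Each score is the L2 norm of the corresponding vector (an assumed reading
    of the scoring formula).
    """
    return [math.sqrt(sum(x * x for x in vector)) for vector in trend_sequence]
```

The resulting list can then be compared element by element with the set threshold.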
[0131] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties. Furthermore, the acquisition, storage, use, and processing of data in the technical solutions described in the embodiments of this application all comply with the relevant provisions of national laws and regulations.
[0132] Based on the above-described method for identifying abnormal users based on time-series data, this embodiment also provides an apparatus for identifying abnormal users based on time-series data. The apparatus may include a system (including a distributed system), software (application), module, component, server, client, etc., using the method described in this embodiment, combined with necessary hardware implementation. Based on the same innovative concept, the apparatuses in one or more embodiments provided in this embodiment are as described in the following embodiments. Since the implementation schemes and methods for solving the problem are similar, the implementation of the specific apparatus in this embodiment can refer to the implementation of the aforementioned method, and repeated details will not be elaborated further. As used below, the terms "unit" or "module" can refer to a combination of software and / or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, hardware implementation, or a combination of software and hardware, is also possible and contemplated.
[0133] Specifically, Figure 5 is a schematic diagram of the module structure of an embodiment of an abnormal user identification device based on time series data provided herein. Referring to Figure 5, the abnormal user identification device based on time series data provided in the embodiments herein includes: an acquisition module 100, a conversion module 200, a decomposition module 300, a determination module 400, and an identification module 500.
[0134] The acquisition module 100 is used to acquire multiple sets of multidimensional transaction data of the user in multiple consecutive set time periods, wherein each set time period corresponds to a set of multidimensional transaction data.
[0135] The conversion module 200 is used to convert the multiple sets of multidimensional transaction data into matrix form to obtain a time sliding window matrix; each element in the time sliding window matrix represents a set of multidimensional transaction data.
[0136] The decomposition module 300 is used to decompose the time sliding window matrix to obtain a trend matrix; the trend matrix contains the data, among the multiple sets of multidimensional transaction data, that conforms to the trend.
[0137] The determination module 400 is used to obtain, based on the trend matrix, multiple consecutive scores corresponding to the user in the multiple consecutive set time periods.
[0138] The identification module 500 is used to compare the consecutive multiple scores with a set threshold, and identify the user as a normal user or an abnormal user based on the comparison results.
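The flow of data through modules 100 to 500 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation described in this document: all function names are hypothetical, the per-window mean is a simple stand-in for the autoencoder-based decomposition, the L2-norm scoring is an assumed form, and the rule "abnormal if any score exceeds the threshold" is likewise an assumption, since the text only says the scores are compared with a set threshold.

```python
from math import sqrt
from typing import List, Sequence

Vector = List[float]   # one set of multidimensional transaction data (one time period)
Window = List[Vector]  # consecutive periods grouped by the sliding window


def to_sliding_window_matrix(series: Sequence[Vector], window: int) -> List[Window]:
    """Conversion module 200: build overlapping windows of consecutive periods.

    Element [i][j] of the result is the transaction vector of period i + j.
    """
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        [list(v) for v in series[i:i + window]]
        for i in range(len(series) - window + 1)
    ]


def mean_trend(matrix: List[Window]) -> List[Vector]:
    """Stand-in for decomposition module 300 (ASSUMPTION, not the autoencoder):
    the per-dimension mean of each window is used as that window's trend row."""
    trend = []
    for win in matrix:
        dims = len(win[0])
        trend.append([sum(v[d] for v in win) / len(win) for d in range(dims)])
    return trend


def l2_scores(trend: List[Vector]) -> List[float]:
    """Determination module 400 (ASSUMED scoring): L2 norm of each trend row."""
    return [sqrt(sum(x * x for x in row)) for row in trend]


def identify(scores: Sequence[float], threshold: float) -> str:
    """Identification module 500 (ASSUMED rule): abnormal if any score
    exceeds the set threshold, otherwise normal."""
    return "abnormal" if any(s > threshold for s in scores) else "normal"


def run_pipeline(series: Sequence[Vector], window: int, threshold: float) -> str:
    """Chain the modules: the caller supplies the acquired data (module 100)."""
    matrix = to_sliding_window_matrix(series, window)
    trend = mean_trend(matrix)
    scores = l2_scores(trend)
    return identify(scores, threshold)
```

For example, calling `run_pipeline` on five periods of two-dimensional data with a window of 3 produces three windows, three trend rows, and three scores, which are then compared with the threshold. Replacing `mean_trend` with a trained model would leave the other stages unchanged, which is the reason for keeping each module as a separate function.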
[0139] As shown in Figure 6, based on the above-described method for identifying abnormal users based on time-series data, an embodiment herein also provides a computer device 602 on which the above method runs. The computer device 602 may include one or more processors 604, such as one or more central processing units (CPUs) or graphics processing units (GPUs), each processing unit implementing one or more hardware threads. The computer device 602 may also include any memory 606 for storing any kind of information, such as code, settings, and data. In one specific embodiment, a computer program is stored in the memory 606 and can run on the processor 604; when run by the processor 604, the computer program executes instructions according to the above method. By way of non-limiting example, the memory 606 may include any type of RAM, any type of ROM, a flash memory device, a hard disk, an optical disk, etc. More generally, any memory can use any technology to store information. Further, any memory can provide volatile or non-volatile retention of information. Further, any memory can represent a fixed or removable component of the computer device 602. In one scenario, when the processor 604 executes associated instructions stored in any memory or combination of memories, the computer device 602 can perform any operation of the associated instructions. The computer device 602 also includes one or more drive mechanisms 608 for interacting with any memory, such as a hard disk drive or an optical disk drive.
[0140] Computer device 602 may also include an input / output module 610 (I / O) for receiving various inputs (via input device 612) and providing various outputs (via output device 614). A specific output mechanism may include a presentation device 616 and an associated graphical user interface 618 (GUI). In other embodiments, the input / output module 610 (I / O), input device 612, and output device 614 may be omitted, and the device may function solely as a computer device within a network. Computer device 602 may also include one or more network interfaces 620 for exchanging data with other devices via one or more communication links 622. One or more communication buses 624 couple the components described above together.
[0141] Communication link 622 can be implemented in any way, such as via a local area network, a wide area network (e.g., the Internet), a point-to-point connection, or any combination thereof. Communication link 622 may include any combination of hardwired links, wireless links, routers, gateway functions, name servers, etc., governed by any protocol or combination of protocols.
[0142] Corresponding to the methods shown in Figures 1 to 4, this embodiment also provides a computer-readable storage medium storing a computer program that, when executed by a processor, performs the steps of the above-described methods.
[0143] This embodiment also provides computer-readable instructions which, when executed by a processor, cause the processor to perform the methods shown in Figures 1 to 4.
[0144] It should be understood that in the various embodiments of this document, the sequence number of each process does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this document.
[0145] It should also be understood that, in the embodiments herein, the term "and / or" is merely a description of the relationship between associated objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, and B existing alone. Additionally, the character " / " in this document generally indicates that the preceding and following associated objects have an "or" relationship.
[0146] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this document.
[0147] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the systems, devices, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.
[0148] In the embodiments provided herein, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, or they may be electrical, mechanical, or other forms of connection.
[0149] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of the embodiments described herein, depending on actual needs.
[0150] Furthermore, the functional units in the various embodiments of this document can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0151] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this document, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this document. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0152] This document uses specific embodiments to illustrate the principles and implementation methods of this document. The descriptions of the embodiments above are only for the purpose of helping to understand the methods and core ideas of this document. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of this document. Therefore, the content of this specification should not be construed as a limitation of this document.
Claims
1. A method for identifying abnormal users based on time series data, characterized in that the method comprises:
acquiring multiple sets of multidimensional transaction data of a user over multiple consecutive set time periods, wherein each set time period corresponds to one set of multidimensional transaction data;
converting the multiple sets of multidimensional transaction data into matrix form to obtain a time sliding window matrix, wherein each element in the time sliding window matrix represents one set of multidimensional transaction data;
performing a nonlinear transformation on the time sliding window matrix using a two-dimensional convolutional neural network, so as to smooth the data in the time sliding window matrix and remove noise;
decomposing the nonlinearly transformed time sliding window matrix using an autoencoder to obtain a trend matrix, wherein the trend matrix contains the data, among the multiple sets of multidimensional transaction data, that conforms to the trend;
obtaining, based on the trend matrix, multiple consecutive scores corresponding to the user in the multiple consecutive set time periods; and
comparing the multiple consecutive scores with a set threshold, and identifying the user as a normal user or an abnormal user based on the comparison results;
wherein the step of decomposing the nonlinearly transformed time sliding window matrix using an autoencoder to obtain a trend matrix further comprises:
decomposing the nonlinearly transformed time sliding window matrix into an initial trend matrix and an initial deviation matrix, wherein the initial deviation matrix contains the data, among the multiple sets of multidimensional transaction data, that deviates from the trend;
encoding the initial trend matrix using the encoding function of the autoencoder to obtain an encoded initial trend matrix;
decoding the encoded initial trend matrix using the decoding function of the autoencoder to obtain a reconstructed matrix; and
training on the initial trend matrix, the reconstructed matrix, and the initial deviation matrix using the objective function of the autoencoder to obtain the trend matrix.
2. The abnormal user identification method based on time series data according to claim 1, characterized in that the step of obtaining, based on the trend matrix, multiple consecutive scores corresponding to the user in the multiple consecutive set time periods further comprises: converting the trend matrix into a trend sequence, wherein each element of the trend sequence represents the data that conforms to the trend in one set of multidimensional transaction data; performing a nonlinear transformation on the trend sequence using a one-dimensional convolutional neural network; and performing a calculation on the nonlinearly transformed trend sequence to obtain multiple scores corresponding to the user in the multiple set time periods.
3. The abnormal user identification method based on time series data according to claim 2, characterized in that the step of performing a calculation on the nonlinearly transformed trend sequence to obtain multiple scores corresponding to the user in the multiple set time periods further comprises: calculating the user's score for each set time period using a formula in which score_i represents the user's score within the i-th set time period, the operand is the data that conforms to the trend in the multidimensional transaction data corresponding to the i-th set time period in the nonlinearly transformed trend sequence, and the norm applied is the L2 norm.
4. The abnormal user identification method based on time series data according to claim 1, characterized in that, The step of converting the multiple sets of multidimensional transaction data into matrix form to obtain the time sliding window matrix further includes: Construct time series of multiple sets of multidimensional transaction data, wherein each element of the time series represents a set of multidimensional transaction data; A time sliding window matrix is constructed iteratively according to a set time window.
5. An abnormal user identification device based on time series data, characterized in that the device comprises:
an acquisition module, used to acquire multiple sets of multidimensional transaction data of a user over multiple consecutive set time periods, wherein each set time period corresponds to one set of multidimensional transaction data;
a conversion module, used to convert the multiple sets of multidimensional transaction data into matrix form to obtain a time sliding window matrix, wherein each element in the time sliding window matrix represents one set of multidimensional transaction data;
a nonlinear transformation module, used to perform a nonlinear transformation on the time sliding window matrix using a two-dimensional convolutional neural network, so as to smooth the data in the time sliding window matrix and remove noise;
a decomposition module, used to decompose the nonlinearly transformed time sliding window matrix using an autoencoder to obtain a trend matrix, wherein the trend matrix contains the data, among the multiple sets of multidimensional transaction data, that conforms to the trend;
a determination module, used to obtain, based on the trend matrix, multiple consecutive scores corresponding to the user in the multiple consecutive set time periods; and
an identification module, used to compare the multiple consecutive scores with a set threshold, and to identify the user as a normal user or an abnormal user based on the comparison results;
wherein the decomposition module is specifically used to:
decompose the nonlinearly transformed time sliding window matrix into an initial trend matrix and an initial deviation matrix, wherein the initial deviation matrix contains the data, among the multiple sets of multidimensional transaction data, that deviates from the trend;
encode the initial trend matrix using the encoding function of the autoencoder to obtain an encoded initial trend matrix;
decode the encoded initial trend matrix using the decoding function of the autoencoder to obtain a reconstructed matrix; and
train on the initial trend matrix, the reconstructed matrix, and the initial deviation matrix using the objective function of the autoencoder to obtain the trend matrix.
6. A computer device comprising a memory, a processor, and a computer program stored in the memory, characterized in that, When the computer program is run by the processor, it executes the instructions of the method according to any one of claims 1-4.
7. A computer-readable storage medium having a computer program stored thereon, characterized in that, When the computer program is run by the processor of the computer device, it executes the instructions of the method according to any one of claims 1-4.