A method for detecting illegitimate accounts on Ethereum based on heterogeneous graph transformation networks
By constructing an account-centric heterogeneous information network and using a graph transformation network to automatically mine meta-paths, the problem of low efficiency in detecting illegitimate accounts on the Ethereum platform is solved, achieving efficient and accurate illegitimate account detection.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: BEIJING INST OF TECH
- Filing Date: 2023-02-23
- Publication Date: 2026-06-30
AI Technical Summary
Existing technologies struggle to effectively detect illegitimate accounts on blockchains, especially on the Ethereum platform. Traditional algorithms are unable to learn high-order features of graph structures and have low detection efficiency.
Construct a heterogeneous information network centered on accounts, use graph transformation networks to automatically mine meta-paths, use graph convolutional neural networks to detect illegal accounts, and classify them using the correlation information of nodes such as accounts, transactions, blocks, smart contracts, and balances.
It achieves efficient and accurate detection of illegitimate accounts on the Ethereum platform, with a node classification accuracy of up to 95% and a single detection time of approximately one minute, outperforming traditional methods.
Figure CN116415960B_ABST
Description
Technical Field
[0001] This invention belongs to the field of blockchain and machine learning technology, specifically relating to an illegal account detection method based on a heterogeneous graph transformation network on Ethereum.
Background Technology
[0002] In recent years, blockchain security and regulation have attracted considerable attention. However, the lack of effective regulation in the blockchain market has led to various fraudulent techniques targeting blockchain, particularly in the financial investment sector, where scams luring investors with promises of high returns have emerged. Many investors, lacking understanding of blockchain technology and tempted by the appreciation of various cryptocurrencies, are easily misled by criminals, resulting in significant financial losses. Currently, due to the anonymity of scammers and certain characteristics of smart contracts, such as automatic execution, numerous fraudulent accounts exist on the blockchain, making the detection of illegitimate accounts particularly important. Current research on detecting illegitimate accounts on the blockchain utilizes deep learning and artificial intelligence.
[0003] Most existing research on blockchain fraud focuses on homogeneous nodes and their relationships, rather than the higher-order node information and links in heterogeneous networks. Most traditional algorithms use graph neural networks, with a fixed graph structure as the input matrix and manually specified meta-paths. This prevents the model from learning higher-order features of the graph structure, requiring more manual configuration. For example, existing research uses deep learning frameworks to detect account anomalies, using representation vectors as input to the network model for anomaly detection. Furthermore, some studies use graph structures to represent transaction network information, treating accounts as nodes and transactions between accounts as edges, but the graph structure only involves accounts. Given the massive number of accounts on platforms like Ethereum and the highly complex information involved, it is essential to develop a method for intelligently extracting useful information from these complex networks to identify fraudulent accounts.
Summary of the Invention
[0004] In view of this, the purpose of this invention is to provide a method for detecting illegitimate accounts on Ethereum based on a heterogeneous graph transformation network, which can quickly achieve the task of detecting illegitimate accounts on Ethereum.
[0005] A method for detecting illegitimate accounts on Ethereum based on a heterogeneous graph transformation network includes the following steps:
[0006] Step 1: Obtain legitimate and illegitimate accounts as training data; this includes obtaining activity information for each account, including relevant transactions, the block in which they are located, account balance, and smart contracts they participate in.
[0007] Step 2: Construct a heterogeneous information network centered on accounts, specifically:
[0008] Step 2.1: Treat accounts, transactions, blocks, smart contracts, and balances as nodes in the network, and add corresponding tags to each node;
[0009] Step 2.2: Based on the association information between different types of nodes, construct adjacency matrices respectively, thereby constructing the edge dataset $D_{\text{edges}}$. Specifically:
[0010] For the account-transaction adjacency matrix, if the current account participates in a transaction, the element at the corresponding position in the adjacency matrix is 1, otherwise it is 0, resulting in a sparse matrix with all values of 0 and 1.
[0011] For the transaction-block adjacency matrix, if the current transaction exists in a certain block, the value of the element at the corresponding position in the adjacency matrix is 1; otherwise, the value is 0.
[0012] Transpose the account-transaction adjacency matrix and the transaction-block adjacency matrix to obtain the transaction-account adjacency matrix and the block-transaction adjacency matrix, respectively. This yields the edge dataset $D_{\text{edges}}$, the set of edges between different entities formed by these four adjacency matrices;
[0013] Step 2.3: Construct the feature dataset $D_{\text{features}}$. Specifically:
[0014] Feature information of account nodes is collected, and an N×F feature matrix is used to store the account features, where N and F are the number of nodes in the heterogeneous network and the dimension of the account features, respectively. Account nodes are represented by account features, while the features of other non-account nodes are represented by the sum and average of the features of their associated account nodes. A composite feature matrix is constructed by concatenating the account node feature matrix and the non-account node feature matrix. Finally, the numerical features are scaled according to normalization and standardization operations to obtain the feature dataset $D_{\text{features}}$;
[0015] Step 2.4: Establish a heterogeneous information network based on the information obtained in steps 2.1-2.3;
[0016] Step 3: Based on the heterogeneous network obtained in Step 2, use the graph transformation layer to obtain potential association information between accounts and other types of nodes, and construct a new potential association information matrix;
[0017] Step 4: Based on the potential association information obtained from the graph transformation layer in Step 3, and the datasets constructed in Step 2, the graph convolutional neural network is used as a feature extractor to calculate the node embeddings, complete the classification task, and thus realize the detection of illegal accounts.
[0018] Preferably, in the training data, all nodes are divided into a training set $L_{\text{tr}}$, a validation set $L_{\text{val}}$, and a test set $L_{\text{test}}$ according to an account node ratio of 3:3:10.
[0019] Preferably, the legitimate account needs to be cross-referenced with the illegitimate account ID, and it needs to be ensured that any legitimate account obtained has not performed any documented illegitimate activities.
[0020] The present invention has the following beneficial effects:
[0021] This invention provides a method for detecting illegitimate accounts on Ethereum based on a heterogeneous graph transformation network. First, a network pattern of an account-centric heterogeneous information network describing account activity information is defined. Then, a graph transformation network is used to automatically mine meta-paths and calculate multi-hop connections. The resulting relation matrix is input into a convolutional neural network to obtain node embeddings, thereby detecting illegitimate accounts. This invention addresses the problem that existing account transaction networks only consider homogeneous nodes and their relationships, making it difficult to obtain high-order node information and connections. It provides a definition of an account-centric heterogeneous information network pattern that can be used to describe Ethereum account activity information, including account transactions, balances, and block information. Furthermore, this invention addresses the problem that existing Ponzi scheme account identification methods rely on manually specifying meta-paths and cannot automatically generate meta-paths to obtain new graph structures. It proposes using a graph transformation network to learn meta-paths to obtain a relation matrix, which is then used as input to a convolutional network, transforming the illegitimate account detection task into a node embedding classification task based on a graph neural network. Experimental results show that the proposed model achieves a node classification accuracy of up to 95%, outperforming other account detection schemes, with a single detection time of approximately one minute, demonstrating both high efficiency and accuracy.
Attached Figure Description
[0022] Figure 1 is a flowchart of the illegitimate account detection method based on a heterogeneous graph transformation network of the present invention. It shows the main steps of the proposed method: data acquisition, heterogeneous network construction, and graph neural network classification and detection.
[0023] Figure 2 is a diagram of the account-centric heterogeneous information network model. The network model in the diagram includes several types of nodes (accounts, transactions, smart contracts, blocks, and balances) and indicates the relationship patterns between these nodes.
[0024] Figure 3 is a schematic diagram of meta-path mining with the graph transformation network.
Detailed Implementation
[0025] The present invention will now be described in detail with reference to the accompanying drawings and embodiments.
[0026] To overcome the shortcomings of the Ethereum illegitimate account detection task, namely the difficulty in obtaining high-order information due to the homogeneity of the transaction network and the low efficiency of traditional machine learning classification algorithms, this invention proposes an illegitimate account detection method based on heterogeneous graph transformation networks. This method can make full use of heterogeneous networks to obtain high-order node information and links, and use graph transformation networks to obtain new graph structures, thereby improving classification efficiency and realizing the task of detecting illegitimate accounts on Ethereum.
[0027] To achieve the above objectives, this invention constructs a heterogeneous information network model centered on accounts. The node types in the network include accounts, transactions, blocks, smart contracts, and balances: Accounts primarily refer to external accounts on Ethereum; transactions are signed data packets sent from one account to another; blocks are on-chain data packets containing transactions and other data; smart contracts are code on the blockchain that is automatically executed when the terms in the contract are triggered; and balances refer to the account's balance information. The types of edges in the network model correspond to the relationships between different types of nodes.
[0028] The technical solution of the present invention is shown below, specifically comprising the following steps:
[0029] Step 1: Create training data labeled with normal accounts and abnormal accounts.
[0030] Step 1.1: Retrieve tagged account information from the Etherscam database, obtaining 800 account addresses marked as illegitimate.
[0031] Step 1.2: Using the functional tools developed by Sokolowska, select ordinary accounts that transacted within the block range 3800000 to 3805000. After filtering out duplicate accounts, 800 unique accounts are obtained. These accounts were active in mid-July 2017, a period of increased Ethereum network usage. All legitimate accounts must be cross-referenced against the illegitimate account IDs to ensure that no legitimate account obtained has any documented illegitimate activity.
[0032] Step 1.3: Pass all normal and illegal account addresses to Etherscan's API to obtain activity information for each account, including the corresponding transactions, the blocks they belong to, their account balances, and the smart contracts they participate in.
[0033] Step 2: Construct the account-centric heterogeneous information network G(o, r) shown in Figure 2, where o represents an object in the network and r represents the relationship between objects.
[0034] Step 2.1: Construct the label dataset $D_{\text{labels}}$. All nodes are divided into a training set $L_{\text{tr}}$, a validation set $L_{\text{val}}$, and a test set $L_{\text{test}}$ according to an account node ratio of 3:3:10, that is, $D_{\text{labels}} = \{L_{\text{tr}}, L_{\text{val}}, L_{\text{test}}\}$. The types of nodes in the network include accounts, transactions, blocks, smart contracts, and balances. Based on the account information and other activity information obtained in Step 1, nodes are labeled, with illegitimate account nodes labeled 0, legitimate account nodes labeled 1, transaction nodes labeled 2, and so on, resulting in the corresponding dataset $L = \{(x_i, y_i),\ y_i \in [0, \mathit{classnum}],\ i = 1, 2, \ldots, N\}$, where $x_i$ is a node in the heterogeneous network, $y_i$ is the label corresponding to $x_i$, classnum is the number of node types in the network, and N is the number of nodes in the network.
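The split in Step 2.1 can be illustrated with a minimal sketch. It assumes that the ratio 3:3:10 is ordered as training : validation : test (the order in which the three sets are listed), that accounts are identified by integer indices, and that the split is random; the function name `split_account_nodes` and the seed are hypothetical.

```python
import numpy as np

def split_account_nodes(account_ids, ratio=(3, 3, 10), seed=0):
    """Shuffle account node ids and split them into train/val/test by `ratio`.

    Assumption: `ratio` is ordered (train, val, test).
    """
    rng = np.random.default_rng(seed)
    ids = rng.permutation(np.asarray(account_ids))
    total = sum(ratio)
    n_train = len(ids) * ratio[0] // total
    n_val = len(ids) * ratio[1] // total
    return {
        "L_tr": ids[:n_train],
        "L_val": ids[n_train:n_train + n_val],
        "L_test": ids[n_train + n_val:],  # remainder goes to the test set
    }

# 800 illegitimate accounts (label 0) and 800 legitimate accounts (label 1),
# matching the counts given in Step 1.
account_labels = np.concatenate([np.zeros(800, dtype=int), np.ones(800, dtype=int)])
D_labels = split_account_nodes(np.arange(len(account_labels)))
```

With 1600 account nodes this gives 300 training, 300 validation, and 1000 test nodes.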
[0035] Step 2.2: Construct the edge dataset $D_{\text{edges}}$. Based on the association information between different types of nodes, adjacency matrices are constructed respectively. For the account-transaction adjacency matrix, if the current account i participates in transaction j, then the corresponding element $\text{Account-Transaction}_{i,j}$ is 1; otherwise it is 0, resulting in a sparse matrix Account-Transaction whose entries are all 0 or 1. Similarly, a transaction-block adjacency matrix is constructed.
[0036] Transpose the account-transaction adjacency matrix and the transaction-block adjacency matrix to obtain the transaction-account adjacency matrix and the block-transaction adjacency matrix, respectively. This yields the edge dataset $D_{\text{edges}}$, the set of edges between different entities formed by these four adjacency matrices.
[0037] Step 2.3: Construct the feature dataset $D_{\text{features}}$. Based on the 42 account features presented in the paper "Detection of illicit accounts over the Ethereum blockchain," account feature information was collected. An N×F feature matrix was used to store the account features, where N and F represent the number of nodes in the heterogeneous network and the dimension of the account features, respectively. Account nodes were represented using account features, while the features of other non-account nodes were represented by the sum and average of the features of their associated account nodes. A composite feature matrix was constructed by concatenating the account node feature matrix and the non-account node feature matrix. Finally, the numerical features were scaled using normalization and standardization operations.
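As an illustration of Step 2.2, the following minimal sketch builds the two base adjacency matrices from toy (row, column) pairs and obtains the other two by transposition. The helper `build_adjacency`, the toy pairs, and the dictionary keys are hypothetical; dense NumPy arrays are used for brevity, whereas a real network of this size would more plausibly use a sparse matrix format.

```python
import numpy as np

def build_adjacency(pairs, n_rows, n_cols):
    """Return a 0/1 matrix with a 1 at every (row, col) listed in `pairs`."""
    A = np.zeros((n_rows, n_cols), dtype=np.int8)
    for i, j in pairs:
        A[i, j] = 1
    return A

# Toy data: 3 accounts, 4 transactions, 2 blocks.
account_tx_pairs = [(0, 0), (1, 0), (1, 1), (2, 2), (0, 3)]  # account i takes part in tx j
tx_block_pairs = [(0, 0), (1, 0), (2, 1), (3, 1)]            # tx j is contained in block k

account_tx = build_adjacency(account_tx_pairs, 3, 4)
tx_block = build_adjacency(tx_block_pairs, 4, 2)

# The edge dataset: two base matrices plus their transposes.
D_edges = {
    "account-transaction": account_tx,
    "transaction-block": tx_block,
    "transaction-account": account_tx.T,
    "block-transaction": tx_block.T,
}
```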
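The feature construction in Step 2.3 can be sketched as follows. The text does not say how the sum and the average are combined, so this sketch assumes one of the two is chosen (the average by default) so that non-account features keep dimension F; it also assumes z-score standardization as the scaling step. The function names and toy values are hypothetical.

```python
import numpy as np

def aggregate_from_accounts(adj, X_acc, how="mean"):
    """Derive features of non-account nodes from their associated accounts.

    adj:   (n_other, n_accounts) 0/1 adjacency matrix
    X_acc: (n_accounts, F) account feature matrix
    """
    summed = adj @ X_acc
    if how == "sum":
        return summed
    degree = adj.sum(axis=1, keepdims=True)
    return summed / np.maximum(degree, 1)  # avoid division by zero for isolated nodes

def standardize(X):
    """Scale every feature column to zero mean and unit variance."""
    std = X.std(axis=0)
    return (X - X.mean(axis=0)) / np.where(std == 0, 1.0, std)

# Toy data: 3 accounts with F = 2 features, 4 transactions.
X_acc = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 20.0]])
tx_account = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)

X_tx = aggregate_from_accounts(tx_account, X_acc, how="mean")
D_features = standardize(np.vstack([X_acc, X_tx]))  # composite N x F matrix
```

Here the first transaction involves accounts 0 and 1, so its feature row before scaling is their average, `[2.0, 20.0]`.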
[0038] Step 3: Use a graph transformation layer to obtain meta-paths to capture relationships between nodes. Based on the heterogeneous network G(o,r) obtained in Step 2, use a graph transformation layer to obtain potential association information between accounts and other types of nodes, construct a new potential association information matrix, and learn a new graph structure. This step can be implemented using the method disclosed in "Ziniu Hu, Yuxiao Dong, Kuansan Wang, Yizhou Sun: Heterogeneous Graph Transformer. WWW 2020:2704-2710".
[0039] The specific expression is as follows:
[0040]
[0041] Here ConV denotes the convolution process, the adjacency matrix is that of s-hop path counts, the weights are normalized using the softmax function, and the weight parameter represents the multiplicity of the convolutional layer.
[0042] Specifically, it includes the following steps:
[0043] Step 3.1: Extract the meta-paths in the heterogeneous information network graph through the graph transformation network, and perform convolution on the adjacency matrix and weight matrix of different edge types in the heterogeneous information network graph in the first graph transformation layer;
[0044] Step 3.2: As shown in Figure 3, the output of the first graph transformation layer is used to generate a meta-path-based adjacency matrix through matrix multiplication;
[0045] Step 3.3: Stack multiple graph transformation layers. The input to the second and each subsequent graph transformation layer is the output of the previous layer together with the original edge-type adjacency matrices. The convolutional layers in these layers work in the same way as in the first graph transformation layer, calculating a new weight matrix for all edge types in each channel and generating meta-paths based on the adjacency matrix of each layer. Here $t_i \in \tau_{et}$, where $\tau_{et}$ denotes the set of edge types, and β denotes the weight of edge type $t_i$ in the i-th transformation layer;
[0046] Step 3.4: A meta-path is a path that connects edges of different types. Its adjacency matrix is generated by multiplying the convolved adjacency matrices of each edge type along the path;
[0047] Step 3.5: Obtain the importance score for each metapath based on the cumulative product of the weights of all edge types along the path.
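Steps 3.1 to 3.5 can be illustrated with a small sketch. It follows the usual graph-transformer-network formulation, in which each layer forms a softmax-weighted combination of the edge-type adjacency matrices and the layers are chained by matrix multiplication, for example $A_P = \left(\sum_{t_1 \in \tau_{et}} \beta^{(1)}_{t_1} A_{t_1}\right) \cdots \left(\sum_{t_l \in \tau_{et}} \beta^{(l)}_{t_l} A_{t_l}\right)$. The index placement on β, the single channel, the fixed (untrained) weights, and all function names are assumptions made for illustration, and every edge type is assumed to be stored as an N×N matrix over all nodes.

```python
import numpy as np

def softmax(w):
    e = np.exp(w - w.max())
    return e / e.sum()

def soft_select(A_stack, w):
    """1x1 convolution over edge types: softmax-weighted sum of adjacency matrices.

    A_stack: (T, N, N) array, one N x N adjacency matrix per edge type.
    w:       (T,) unnormalized weights for this layer.
    """
    return np.tensordot(softmax(w), A_stack, axes=1)

def metapath_adjacency(A_stack, layer_weights):
    """Chain the layers: multiply the selected adjacency matrices in order."""
    A_meta = soft_select(A_stack, layer_weights[0])
    for w in layer_weights[1:]:
        A_meta = A_meta @ soft_select(A_stack, w)
    return A_meta

def metapath_scores(layer_weights):
    """Score of meta-path (t_1, ..., t_l): product of the layer weights along it."""
    scores = softmax(layer_weights[0])
    for w in layer_weights[1:]:
        scores = np.multiply.outer(scores, softmax(w))
    return scores

# Toy graph: node 0 = account, node 1 = transaction, node 2 = block.
A_at = np.zeros((3, 3)); A_at[0, 1] = 1.0   # edge type 0: account -> transaction
A_tb = np.zeros((3, 3)); A_tb[1, 2] = 1.0   # edge type 1: transaction -> block
A_stack = np.stack([A_at, A_tb])

# Layer 1 favours edge type 0, layer 2 favours edge type 1.
layer_weights = [np.array([5.0, 0.0]), np.array([0.0, 5.0])]
A_meta = metapath_adjacency(A_stack, layer_weights)
scores = metapath_scores(layer_weights)
```

In this toy case the only two-hop path from the account to the block is account → transaction → block, so `A_meta[0, 2]` equals the importance score `scores[0, 1]` of that meta-path. In a trained model the weights would be learned rather than fixed.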
[0048] Step 4: Input the graph structure information of the learned metapath into the graph convolutional neural network to generate node embeddings, and perform classification and detection of account nodes. Based on the association information obtained from the graph transformation layer in Step 3, and the datasets constructed in Step 2, the graph convolutional neural network is used as a feature extractor to calculate the node embeddings, complete the classification task, and thus achieve the detection of illegal accounts.
[0049] The specific expression of the model is as follows:
[0050]
[0051] Here || denotes the concatenation operation, ConV denotes the number of convolution channels, the degree matrix is that of the adjacency matrix, $A^{(s)}$ is the adjacency matrix of the s-th channel of the tensor, $M_F$ is the feature matrix, and $M_W$ is a trainable weight matrix shared across channels.
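One common way to write a multi-channel graph convolution consistent with these definitions is $Z = \big\Vert_{s=1}^{C}\, \sigma\!\left(\tilde{D}_s^{-1} \tilde{A}^{(s)} M_F M_W\right)$, where $\tilde{A}^{(s)} = A^{(s)} + I$, $\tilde{D}_s$ is its degree matrix, and $C$ is the number of channels (called ConV in the text). The symbols $Z$, $\sigma$, $C$, the added self-loops, and the choice of ReLU are assumptions of this sketch, as are the function names and random toy inputs.

```python
import numpy as np

def gcn_channel(A, X, W):
    """One channel: ReLU(D^-1 (A + I) X W) with row-normalized adjacency."""
    A_tilde = A + np.eye(A.shape[0])           # add self-loops
    deg = A_tilde.sum(axis=1, keepdims=True)   # row degrees (diagonal of D)
    H = (A_tilde / deg) @ X @ W
    return np.maximum(H, 0.0)

def multi_channel_embeddings(A_channels, X, W):
    """Concatenate the per-channel outputs; W is shared across channels."""
    return np.concatenate([gcn_channel(A, X, W) for A in A_channels], axis=1)

rng = np.random.default_rng(0)
A_channels = rng.random((2, 5, 5))   # toy stand-in for two learned meta-path adjacency matrices
M_F = rng.random((5, 4))             # feature matrix: 5 nodes, 4 features
M_W = rng.normal(size=(4, 3))        # shared trainable weights (fixed here)

Z = multi_channel_embeddings(A_channels, M_F, M_W)   # node embeddings, shape (5, 6)
```

A classifier layer applied to the rows of `Z` that correspond to account nodes would then produce the legitimate or illegitimate prediction; that layer is not shown here.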
[0052] In summary, the above are merely preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. A method for detecting illegitimate accounts on Ethereum based on a heterogeneous graph transformation network, characterized in that it includes the following steps:
Step 1: Obtain legitimate and illegitimate accounts as training data; this includes obtaining activity information for each account, including relevant transactions, the block in which they are located, account balance, and smart contracts they participate in.
Step 2: Construct a heterogeneous information network centered on accounts, specifically:
Step 2.1: Treat accounts, transactions, blocks, smart contracts, and balances as nodes in the network, and add corresponding tags to each node;
Step 2.2: Based on the association information between different types of nodes, construct adjacency matrices respectively, thereby constructing the edge dataset $D_{\text{edges}}$. Specifically: For the account-transaction adjacency matrix, if the current account participates in a transaction, the element at the corresponding position in the adjacency matrix is 1, otherwise it is 0, resulting in a sparse matrix with all values of 0 and 1. For the transaction-block adjacency matrix, if the current transaction exists in a certain block, the value of the element at the corresponding position in the adjacency matrix is 1; otherwise, the value is 0. Transpose the account-transaction adjacency matrix and the transaction-block adjacency matrix to obtain the transaction-account adjacency matrix and the block-transaction adjacency matrix, respectively. This yields the edge dataset $D_{\text{edges}}$, the set of edges between different entities formed by these four adjacency matrices;
Step 2.3: Construct the feature dataset $D_{\text{features}}$. Specifically: Feature information of account nodes is collected, and an N×F feature matrix is used to store the account features, where N and F are the number of nodes in the heterogeneous network and the dimension of the account features, respectively. Account nodes are represented by account features, while the features of other non-account nodes are represented by the sum and average of the features of their associated account nodes. A composite feature matrix is constructed by concatenating the account node feature matrix and the non-account node feature matrix. Finally, the numerical features are scaled according to normalization and standardization operations to obtain the feature dataset $D_{\text{features}}$;
Step 2.4: Establish a heterogeneous information network based on the information obtained in steps 2.1-2.3;
Step 3: Based on the heterogeneous network obtained in Step 2, use the graph transformation layer to obtain potential association information between accounts and other types of nodes, and construct a new potential association information matrix;
Step 4: Based on the potential association information obtained from the graph transformation layer in Step 3, and the datasets constructed in Step 2, the graph convolutional neural network is used as a feature extractor to calculate the node embeddings, complete the classification task, and thus realize the detection of illegal accounts.
2. The method for detecting illegitimate accounts on Ethereum based on heterogeneous graph transformation networks as described in claim 1, characterized in that, in the training data, all nodes are divided into a training set $L_{\text{tr}}$, a validation set $L_{\text{val}}$, and a test set $L_{\text{test}}$ according to an account node ratio of 3:3:10.
3. The method for detecting illegitimate accounts on Ethereum based on heterogeneous graph transformation networks as described in claim 1, characterized in that, The legitimate account needs to be cross-referenced with the illegitimate account ID, and it must be ensured that any legitimate account obtained has not performed any documented illegitimate activities.