Social media bot detection method based on dynamic heterogeneous graph and related apparatus

By constructing a dynamic heterogeneous graph network and utilizing a relational graph transformer and a semantic attention network, the heterogeneity and dynamism issues in social media bot detection are addressed, resulting in more efficient detection of social media bot accounts.

CN115718831B · Active · Publication Date: 2026-06-12 · PEOPLE CN CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
PEOPLE CN CO LTD
Filing Date
2022-12-08
Publication Date
2026-06-12


Abstract

The application discloses a social media bot detection method based on a dynamic heterogeneous graph, and related devices. The method comprises: 1) social media modeling based on a dynamic heterogeneous information network; 2) modeling of social media influence heterogeneity based on a relational graph transformer; 3) cross-relation aggregation of node representations based on a semantic attention network; and 4) social media bot account detection and multi-task aggregation based on a graph neural network. The application discloses a first dynamic, heterogeneity-aware social media bot account detection model and algorithm, which obtains the best results on a benchmark dataset. By modeling the relationship and influence heterogeneity that is widespread in social media, the application enhances the robustness of the bot detection algorithm, obtains high-quality representations of social media elements, and can support various downstream tasks. It makes full use of the available information, detects bots effectively, and is suitable for practical application.

Description

Technical Field

[0001] This invention relates to the field of social network analysis, specifically to a method and apparatus for detecting social media bots based on dynamic heterogeneous graphs.

Background Technology

[0002] Social media is becoming an indispensable part of people's daily lives. Every day, tens of millions of users from around the world log onto various social media platforms to browse news websites, share their lives, or participate in discussions on certain topics. With the increasing popularity of online social media, a new phenomenon called social media bot accounts has emerged. Unlike real social media users managed by humans, social media bot accounts refer to users who are manipulated by automated programs or application programming interfaces (APIs) to automatically post content on social media. Operators of these accounts often use multiple bot accounts to achieve malicious purposes, and this behavior seriously threatens the clean ecosystem of social media. In the past decade, social media bot accounts have become increasingly active in election interference, spreading fake news, and disseminating extremist ideologies. Given the negative impact of malicious social media bot accounts on society, the need for effective social media bot account detection models is becoming increasingly urgent.

[0003] Early research on social media bot account detection largely relied on feature engineering and traditional machine learning classification algorithms, extracting features from social media text and user information and inputting them into classification algorithms. With the rise of deep learning, an increasing number of social media bot account detection algorithms utilize neural networks to improve model performance. Methods such as recurrent neural networks, self-supervised learning, and graph neural networks have all been used for bot account detection and have achieved initial success. However, these methods neither consider the intrinsic heterogeneity of social media network structures nor leverage this heterogeneity to distinguish subtle differences between novel social media bot accounts and real users. Furthermore, these methods do not account for the dynamic changes that social networks undergo over time.

Summary of the Invention

[0004] The purpose of this invention is to provide a social media bot detection method and related apparatus based on dynamic heterogeneous graphs, in order to solve two problems of existing methods: they fail to take into account the intrinsic heterogeneity of social media network structure and to use that heterogeneity to identify subtle differences between novel social media bot accounts and real users, and they do not take into account the dynamic changes of social networks over time.

[0005] To achieve the above objectives, the present invention adopts the following technical solution:

[0006] A social media bot detection method based on dynamic heterogeneous graphs includes:

[0007] Social media is modeled based on a heterogeneous information network: users, tweets, topics, and communities of social media are taken as network nodes, and the different types of interaction relationships between nodes are taken as heterogeneous edges in the network. Feature engineering and pre-trained language models are used to encode user information and social media text information, and timestamp information is concatenated to obtain the initial features of each node in the graph neural network.

[0008] A relational graph transformer is used to model the heterogeneity of social media relationships and influence. For each attention head, the query, key, and value under each relation are computed from the initial features of each node. The attention values between different nodes are computed from the query and key to model the heterogeneity of relationships. Dynamic dependency information is preserved through relative time encoding. The attention values and neighbor node values are aggregated to obtain the node representation under a specific relationship.

[0009] Node representations are aggregated across relations based on a semantic attention network, yielding the node representation after aggregation over the different relations, so as to preserve the diverse relationships brought about by the heterogeneity of social media;

[0010] The final node representation is obtained through several layers of graph neural network. Social media bot accounts are predicted for user-type nodes through the output layer and softmax layer. At the same time, the detection is optimized by using stance detection and community similarity measurement.

[0011] Furthermore, in modeling social media based on heterogeneous information networks, users, tweets, topics, and communities are treated as heterogeneous nodes v in the network, and different types of interactions between users, tweets, topics, and communities are treated as heterogeneous edges e in the network. The sets $R_V$ and $R_E$ denote the node types and relation types in the heterogeneous information network, respectively; mapping functions, including ψ, map nodes and edges to their corresponding types; and a time function t stamps each node with a timestamp. The social media dynamic heterogeneous graph network G is defined as follows:

[0012]

[0013] Furthermore, feature engineering is used to encode node metadata, and a pre-trained language model is used to encode user information and semantic data of social media text information. The resulting encoded representations are then concatenated with timestamp information to obtain the feature vector $x_i$ of each node i. A fully connected neural network transforms the feature vector $x_i$, and the result is used as the initial feature $x_i^{(0)}$ of the node in the graph neural network. The calculation formula is as follows:

[0014] $x_i^{(0)} = \sigma(W_I \cdot x_i + b_I)$

[0015] In the formula, $W_I$ and $b_I$ are learnable parameters of the model, and σ is the nonlinear activation function. Different types of nodes use different linear mapping functions.

[0016] Furthermore, in the process of modeling the heterogeneity of social media influence using relational graph transformers, a graph neural network structure containing transformers and operating in a heterogeneous information network is proposed. First, the query, key, and value corresponding to the c-th attention head under relation r and node i are calculated, as shown in the following formula:

[0017]

[0018] In the formula, q, k, v are the query, key, and value in the attention mechanism, (l) denotes the l-th layer of the graph neural network, and all W and b are learnable parameters of the model for different relations and attention heads;
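For orientation, one common way to write such a per-relation, per-head projection is sketched below. This is an assumed form rather than the formula of this document: the placement of the indices c, r and (l) and the parameter names $W_{q,c,r}^{(l)}$, $b_{q,c,r}^{(l)}$ and their key and value counterparts are illustrative.

```latex
\begin{aligned}
q_{c,r}^{(l)}(i) &= W_{q,c,r}^{(l)} \, x_i^{(l-1)} + b_{q,c,r}^{(l)} \\
k_{c,r}^{(l)}(i) &= W_{k,c,r}^{(l)} \, x_i^{(l-1)} + b_{k,c,r}^{(l)} \\
v_{c,r}^{(l)}(i) &= W_{v,c,r}^{(l)} \, x_i^{(l-1)} + b_{v,c,r}^{(l)}
\end{aligned}
```

Under this reading, each relation r and each head c owns its own affine map, which is what lets the same node look different depending on the relation through which it is viewed.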

[0019] Subsequently, the heterogeneity of relationships is modeled by calculating the attention values between different nodes, as shown below:

[0020]

[0021] In the formula, the left-hand side is the attention weight between nodes i and j; the pairing operator is the exponential dot product function; d is the hidden layer dimension of each attention head; and $N_r(i)$ is the set of neighbors of node i under relation r;

[0022] Next, the relative time encoding (RTE) is used to model the time difference relationship between the node's neighbors and the node itself, and this time difference is added to the original value of node j for time augmentation. The specific calculation formula is as follows:

[0023] $\Delta T(i,j) = |t_i - t_j|$

[0024]

[0025]

[0026] $\mathrm{RTE}(\Delta T(i,j)) = W_T \cdot \mathrm{Base}(\Delta T(i,j)) + B_T$

[0027]

[0028] In the formula, $t_i$ is the timestamp corresponding to node i, and $W_T$ and $B_T$ are the parameters of the linear transformation;

[0029] Next, using the following formula, messages are aggregated in the node's neighbors and attention heads to obtain the node representation under relation r:

[0030]

[0031] In the formula, the left-hand side is the hidden layer representation of node i under relation r at layer l, and C is the total number of attention heads.

[0032] Then, a gating mechanism is applied to the obtained result to smooth learning. The gate value is computed first, and the gate is then used to combine the learned node representation with the layer input. The specific calculation process is as follows:

[0033]

[0034]

[0035] In the formula, [·,·] denotes vector concatenation, $W_A$ and $b_A$ are learnable parameters of the model, ⊙ denotes the Hadamard product, and the output is the representation vector learned for node i under relation r at the l-th layer.

[0036] Furthermore, in the process of cross-relation aggregation node representation based on semantic attention network, the importance of each relation is first obtained from the perspective of all nodes globally. The importance of each relation is normalized using the softmax function, and the calculation method is shown in the following formula:

[0037]

[0038]

[0039] In the formula, the first quantity is the weight of relation r in the d-th attention head, and V is the set of nodes in the heterogeneous information network; the semantic attention vector belongs to the d-th attention head of the l-th layer; the two remaining symbols are learnable parameters of the semantic attention network; and the normalized quantity is the weight of relation r in the d-th attention head after normalization;

[0040] Subsequently, the node representations under different relation subgraphs are aggregated using the calculated relation weights. The aggregation process is as follows:

[0041]

[0042] In the formula, the left-hand side is the node representation output by the l-th layer, the aggregated terms are the outputs of the relational graph transformer, and D is the number of attention heads in the semantic attention network.

[0043] Furthermore, in the process of detecting social media bot accounts based on graph neural networks, each layer of the graph neural network in the model includes a relational graph transformer and a semantic attention network. After passing through L layers of graph neural networks, the final node representation $x^{(L)}$ is obtained. Account classification prediction is performed on social media user nodes, and the social bot detection task is further optimized by supervised tweet stance detection and unsupervised community similarity detection over tweet nodes, topic nodes, and community nodes.

[0044] Furthermore, firstly, an output layer and a softmax layer are used for social media bot account detection and classification. The model's prediction result for user i is calculated as follows:

[0045]

[0046] In the formula, the input is the final representation of each user node, and all W and b are learnable parameters of the model. This module is trained using supervised user annotations, and the loss function is as follows:

[0047]

[0048] In the formula, Y is the set of labeled social media users, and $y_i$ is the label of user i;

[0049] Then, considering that social bots may post tweets that contradict mainstream opinions and are emotionally charged and extreme, in order to provoke social network users on specific events and interfere with public opinion, the tweet nodes and topic event nodes in the graph are used for stance detection. For a specific stance c and tweet nodes $x_i$, $x_j$, linear transformations map the node representations into a stance-sensitive linear space, where $\alpha_c$ and $\beta_c$ are learnable linear mapping functions for stance c. The module is optimized using the following loss function:

[0050]

[0051] In the formula, $y_{i,j,c}$ is an indicator function: its value is 1 if the stance of nodes i and j is c, and 0 otherwise.

[0052] Next, based on the similarity of nodes of the same type within adjacent communities, community similarity is measured through contrastive learning. Positive samples are adjacent nodes of the same type, while negative samples are multi-hop nodes or nodes of a different type. By optimizing the loss function to pull positive samples closer and push negative samples away, unsupervised community similarity measurement is achieved. The loss function used is as follows:

[0053]

[0054] In the formula, $P_i$ and $N_i$ are the positive and negative sample sets for node i, respectively, Q is a hyperparameter, and σ(·) is the sigmoid function.

[0055] Ultimately, the loss function used in the social bot detection model is as follows:

[0056]

[0057] In the formula, λ1 and λ2 are the hyperparameters that control the stance detection module and the community similarity measurement module, θ denotes all trainable parameters of the model, and λ is a hyperparameter representing the weight of the regularization term.

[0058] Furthermore, a social media bot detection system based on dynamic heterogeneous graphs includes:

[0059] The social media modeling module is used to model social media based on heterogeneous information networks. It treats social media users, tweets, topics, and communities as network nodes, and the different types of interaction relationships between nodes as heterogeneous edges in the network. It uses feature engineering and pre-trained language models to encode user information and social media text information, and concatenates timestamp information to obtain the initial features of each node in the graph neural network.

[0060] The node computation module is used to obtain attention values ​​between different nodes through the initial features of each node to model the heterogeneity of relationships. It uses a relationship graph transformer to model the heterogeneity of social media relationships and influence, preserves dynamic dependency information through relative time encoding, and computes node representations.

[0061] The node representation module is used to aggregate node representations across relationships based on semantic attention networks, preserving relationships arising from the heterogeneity of social media.

[0062] The prediction module is used to obtain the final node representation through several layers of graph neural network. It predicts social media bot accounts for user type nodes through the output layer and softmax layer, and optimizes the detection by using stance detection and community similarity measurement.

[0063] Furthermore, a computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the social media bot detection method based on dynamic heterogeneous graphs.

[0064] Furthermore, a computer-readable storage medium stores a computer program that, when executed by a processor, implements the steps of the social media bot detection method based on dynamic heterogeneous graphs.

[0065] Compared with the prior art, the present invention has the following technical effects:

[0066] The purpose of this invention is to provide a social media bot detection method based on dynamic heterogeneous graphs. By constructing a dynamic heterogeneous information network of social media, this method utilizes a relational graph transformer to model the heterogeneity of relationships and influence in social media, and aggregates nodes across relationships using a semantic attention network to achieve social media bot account detection and classification. Furthermore, the social media bot detection task can be optimized through stance detection and community similarity measurement. This invention proposes that the social media bot account detection algorithm should model the widespread heterogeneity of relationships and influence in social media and consider the dynamic characteristics of social networks to enhance the robustness of the algorithm. Simultaneously, the heterogeneity-aware social media bot account detection framework based on a dynamic graph structure can learn high-quality social media user representations, giving this invention significant advantages over other social media bot detection methods.

Attached Figure Description

[0067] Figure 1 is a flowchart of the social bot detection method based on dynamic heterogeneous information networks of the present invention.

[0068] Figure 2 is a schematic diagram illustrating the heterogeneity of relationships and influence in social media as described in this invention.

Detailed Implementation

[0069] The embodiments of the present invention will be described in detail below with reference to the accompanying drawings and examples. It should be noted that the embodiments described herein are only for explaining the present invention and are not intended to limit the present invention. Furthermore, the technical features involved in the embodiments of the present invention can be combined with each other unless otherwise specified.

[0070] The specific implementation process of this invention includes social media modeling based on heterogeneous information networks, social media influence heterogeneity modeling based on relational graph transformers, cross-relational aggregation nodes based on semantic attention networks, and social media bot account detection based on graph neural networks.

[0071] The purpose of this invention is to provide a social media bot detection method based on dynamic heterogeneous graphs. By constructing a dynamic social media heterogeneous information network, it utilizes a relationship graph transformer to model the heterogeneity of relationships and influence on social media, and aggregates nodes across relationships based on a semantic attention network to achieve social media bot account detection and classification. At the same time, it can optimize the social bot detection task through stance detection and community similarity measurement.

[0072] First, social media is modeled based on heterogeneous information networks. Social media users, tweets, topics and communities are used as network nodes, and different types of interaction relationships between nodes are used as heterogeneous edges in the network. Feature engineering and pre-trained language models are used to encode user information and social media text information, and timestamp information is concatenated to calculate the initial features of each node in the graph neural network.

[0073] Then, a relational graph transformer is used to model the heterogeneity of social media relationships and influence, retaining dynamic dependency information through relative time encoding and computing node representations;

[0074] Next, based on the cross-relationship aggregation node representation of the semantic attention network, the diverse relationships brought about by the heterogeneity of social media are preserved;

[0075] Finally, the final node representation is obtained through several layers of graph neural network. Social media bot accounts are predicted for user-type nodes through the output layer and softmax layer. At the same time, the detection is further optimized by using stance detection and community similarity measurement.

[0076] Figure 1 is a flowchart of the social bot detection method based on dynamic heterogeneous information networks of the present invention.

[0077] 1. Social Media Modeling Based on Dynamic Heterogeneous Information Networks

[0078] This invention uses social media users, together with the tweets they post, the topics they participate in, and the communities they join, as heterogeneous nodes v in the network, and the different types of interactions between users, tweets, topics, and communities as heterogeneous edges e in the network. The sets $R_V$ and $R_E$ denote the node types and relation types in the heterogeneous information network, respectively; mapping functions, including ψ, map nodes and edges to their corresponding types; and a time function t marks each node with a timestamp. The social media dynamic heterogeneous graph network G proposed in this invention is defined as follows:

[0079]
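The graph described above can be pictured with a small data structure. The following is a minimal sketch, assuming a plain in-memory representation; the class name `DynamicHeteroGraph`, its fields, and the relation names `follows` and `posts` are hypothetical and are not taken from this document, which names the node types but not the relation types.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Node types named in the text; relation types are not enumerated there.
NODE_TYPES = {"user", "tweet", "topic", "community"}


@dataclass
class DynamicHeteroGraph:
    """Typed nodes, typed edges, and one timestamp per node."""
    node_type: dict = field(default_factory=dict)  # node id -> node type
    node_time: dict = field(default_factory=dict)  # node id -> timestamp (time function t)
    edges: list = field(default_factory=list)      # (src, dst, relation) triples

    def add_node(self, node_id, ntype, timestamp):
        if ntype not in NODE_TYPES:
            raise ValueError(f"unknown node type: {ntype}")
        self.node_type[node_id] = ntype
        self.node_time[node_id] = timestamp

    def add_edge(self, src, dst, relation):
        if src not in self.node_type or dst not in self.node_type:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((src, dst, relation))

    def neighbors_by_relation(self, node_id):
        """Return {relation: [neighbor ids]} for edges leaving node_id."""
        out = defaultdict(list)
        for src, dst, rel in self.edges:
            if src == node_id:
                out[rel].append(dst)
        return dict(out)


g = DynamicHeteroGraph()
g.add_node("u1", "user", 100.0)
g.add_node("u2", "user", 140.0)
g.add_node("t1", "tweet", 150.0)
g.add_edge("u1", "u2", "follows")  # hypothetical relation name
g.add_edge("u1", "t1", "posts")    # hypothetical relation name
print(g.neighbors_by_relation("u1"))
```

Grouping neighbors by relation is the access pattern the later relation-specific attention step needs, which is why the sketch exposes it directly.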

[0080] Subsequently, this invention utilizes feature engineering to encode node metadata and employs a pre-trained language model to encode semantic data such as user information and social media text information. The resulting encoded representations are then concatenated with timestamp information to obtain the feature vector $x_i$ of each node i. A fully connected neural network transforms the feature vector $x_i$, and the result is used as the initial feature $x_i^{(0)}$ of the node in the graph neural network. The calculation formula is as follows:

[0081] $x_i^{(0)} = \sigma(W_I \cdot x_i + b_I)$

[0082] In the formula, $W_I$ and $b_I$ are learnable parameters of the model, and σ is the nonlinear activation function. Different types of nodes use different linear mapping functions.
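The initial feature transformation can be illustrated as follows. This is a minimal sketch, assuming a leaky ReLU as the activation σ and small random weights; the raw feature sizes, the hidden size, and the helper names `make_linear` and `initial_feature` are hypothetical. The point it shows is that each node type has its own linear map, so raw features of different lengths land in one shared hidden space.

```python
import random


def leaky_relu(x, slope=0.01):
    """Nonlinear activation; the specific choice is an assumption."""
    return x if x > 0 else slope * x


def make_linear(in_dim, out_dim, rng):
    """Randomly initialised weight matrix W_I and zero bias b_I."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [0.0] * out_dim
    return W, b


def initial_feature(x, ntype, params):
    """Compute sigma(W_I . x + b_I) with the parameters of the node's type."""
    W, b = params[ntype]
    pre = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [leaky_relu(p) for p in pre]


rng = random.Random(0)
RAW_DIMS = {"user": 6, "tweet": 4}  # hypothetical raw feature sizes per node type
HIDDEN = 3                          # hypothetical shared hidden size
params = {t: make_linear(d, HIDDEN, rng) for t, d in RAW_DIMS.items()}

# Each raw vector stands for metadata, text embedding and timestamp, concatenated.
x_user = [0.5, 1.0, 0.0, 2.0, 0.1, 0.3]
x_tweet = [0.2, 0.7, 0.1, 1.5]
h_user = initial_feature(x_user, "user", params)
h_tweet = initial_feature(x_tweet, "tweet", params)
print(len(h_user), len(h_tweet))
```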

[0083] 2. Heterogeneous Modeling of Social Media Influence Based on Relationship Graph Transformer

[0084] This invention proposes a graph neural network structure that includes transformers and operates in a heterogeneous information network to model the heterogeneity of relationships and influence in social media, while also considering the relational dependencies of dynamic graphs.

[0085] This invention first calculates the query, key, and value corresponding to the c-th attention head under relation r and node i, using the following formula:

[0086]

[0087] In the formula, q, k, v are the query, key, and value in the attention mechanism, (l) denotes the l-th layer of the graph neural network, and all W and b are learnable parameters of the model for different relations and attention heads.

[0088] Subsequently, this invention models the heterogeneity of relationships by calculating the attention values between different nodes, as shown below:

[0089]

[0090] In the formula, the left-hand side is the attention weight between nodes i and j; the pairing operator is the exponential dot product function; d is the hidden layer dimension of each attention head; and $N_r(i)$ is the set of neighbors of node i under relation r.
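The attention step can be sketched in a few lines. The code below assumes the usual scaled form, in which the exponential dot product of a query and a key is exp(q·k/√d) and the weights are normalized over the neighbors of node i under one relation; that exact scaling is an assumption, and the function name `attention_weights` and the node ids are hypothetical.

```python
import math


def attention_weights(q_i, keys):
    """Normalized exponential dot products between q_i and each neighbor key.

    keys maps neighbor id -> key vector, for one relation and one attention head.
    """
    d = len(q_i)
    logits = {j: sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d)
              for j, k_j in keys.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {j: math.exp(s - m) for j, s in logits.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}


q_i = [1.0, 0.0]
keys = {"u2": [1.0, 0.0], "u3": [0.0, 1.0]}  # neighbors of node i under one relation
alpha = attention_weights(q_i, keys)
print(alpha)
```

Because the normalization runs only over neighbors under a single relation, the same pair of nodes can receive different weights under different relations.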

[0091] Subsequently, to preserve the dependencies generated by nodes in different time periods, this invention utilizes Relative Time Encoding (RTE) to model the time difference relationship between node neighbors and the node itself, and then adds this time difference to the original value of node j to perform time augmentation. The specific calculation formula is as follows:

[0092] $\Delta T(i,j) = |t_i - t_j|$

[0093]

[0094]

[0095] $\mathrm{RTE}(\Delta T(i,j)) = W_T \cdot \mathrm{Base}(\Delta T(i,j)) + B_T$

[0096]

[0097] In the formula, $t_i$ is the timestamp corresponding to node i, and $W_T$ and $B_T$ are the parameters of the linear transformation;
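A possible reading of the relative time encoding is sketched below. The definition of Base is not legible in this text, so the sketch assumes a sinusoidal basis of the kind used for Transformer positional encodings; the function names and the identity choice for $W_T$ are hypothetical and serve only to make the example easy to follow.

```python
import math


def base_encoding(delta_t, dim):
    """Sinusoidal basis of a time gap (assumed form): sin on even, cos on odd indices."""
    enc = []
    for p in range(dim):
        freq = 1.0 / (10000 ** ((p - p % 2) / dim))
        angle = delta_t * freq
        enc.append(math.sin(angle) if p % 2 == 0 else math.cos(angle))
    return enc


def rte(delta_t, W_T, B_T):
    """RTE(dT) = W_T . Base(dT) + B_T, a linear map of the basis."""
    base = base_encoding(delta_t, len(W_T[0]))
    return [sum(w * b for w, b in zip(row, base)) + bias
            for row, bias in zip(W_T, B_T)]


def time_augment(v_j, t_i, t_j, W_T, B_T):
    """Add the encoding of |t_i - t_j| to the value vector of neighbor j."""
    delta = abs(t_i - t_j)
    return [v + e for v, e in zip(v_j, rte(delta, W_T, B_T))]


DIM = 4
W_T = [[1.0 if r == c else 0.0 for c in range(DIM)] for r in range(DIM)]  # identity, for illustration
B_T = [0.0] * DIM
v_j = [1.0, 1.0, 1.0, 1.0]
print(time_augment(v_j, t_i=5.0, t_j=5.0, W_T=W_T, B_T=B_T))
```

Since only the absolute gap enters the encoding, swapping the two timestamps leaves the augmented value unchanged.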

[0098] Next, the present invention uses the following formula to aggregate messages in the node's neighbors and attention heads to obtain the node representation under relation r:

[0099]

[0100] In the formula, the left-hand side is the hidden layer representation of node i under relation r at layer l, and C is the total number of attention heads.

[0101] Then, the present invention applies a gating mechanism to the obtained result to smooth learning. The gate value is computed first, and the gate is then used to combine the learned node representation with the layer input. The specific calculation process is as follows:

[0102]

[0103]

[0104] In the formula, [·,·] denotes vector concatenation, $W_A$ and $b_A$ are learnable parameters of the model, ⊙ denotes the Hadamard product, and the output is the representation vector learned for node i under relation r at the l-th layer.
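The head aggregation and the gate can be sketched together. The exact gate formula is not legible in this text, so the code assumes a common form: the gate is a sigmoid of an affine map of the concatenated vectors, and the output mixes tanh of the aggregated message with the layer input. Averaging over heads and the function names are likewise assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def aggregate_heads(alphas, values):
    """Average over heads of the attention-weighted sum of neighbor values.

    alphas: one {neighbor: weight} dict per head.
    values: one {neighbor: value vector} dict per head.
    """
    C = len(alphas)
    dim = len(next(iter(values[0].values())))
    out = [0.0] * dim
    for alpha_c, val_c in zip(alphas, values):
        for j, a in alpha_c.items():
            for p in range(dim):
                out[p] += a * val_c[j][p] / C
    return out


def gated_update(u, x_prev, W_A, b_A):
    """z = sigmoid(W_A [u, x_prev] + b_A); output = tanh(u) * z + x_prev * (1 - z)."""
    concat = u + x_prev  # list concatenation plays the role of [.,.]
    z = [sigmoid(sum(w * c for w, c in zip(row, concat)) + b)
         for row, b in zip(W_A, b_A)]
    return [math.tanh(ui) * zi + xi * (1.0 - zi)
            for ui, xi, zi in zip(u, x_prev, z)]


alphas = [{"a": 1.0}, {"a": 0.5, "b": 0.5}]
values = [{"a": [2.0, 0.0]}, {"a": [0.0, 2.0], "b": [0.0, 4.0]}]
u = aggregate_heads(alphas, values)

W_A = [[0.0] * 4 for _ in range(2)]  # zero weights give a gate of exactly 0.5
b_A = [0.0, 0.0]
print(u, gated_update([0.0, 0.0], [2.0, 4.0], W_A, b_A))
```

With the gate at 0.5 and a zero message, the output is half of the layer input, which shows how the gate interpolates between the new message and the previous representation.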

[0105] Based on this relational graph transformer architecture, node representations can be obtained that model the heterogeneity of relationships and influence in social media.

[0106] Relationship heterogeneity and influence heterogeneity in social media are illustrated in Figure 2.

[0107] 3. Cross-relational aggregation nodes based on semantic attention networks

[0108] This invention first obtains the importance weights of each relationship from the perspective of all nodes globally. The importance of each relation is normalized using the softmax function, and the calculation method is shown in the following formula:

[0109]

[0110]

[0111] In the formula, the first quantity is the weight of relation r in the d-th attention head, and V is the set of nodes in the heterogeneous information network. The semantic attention vector belongs to the d-th attention head of the l-th layer. The two remaining symbols are learnable parameters of the semantic attention network. The normalized quantity is the weight of relation r in the d-th attention head after normalization.
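The relation weighting can be illustrated with a short sketch. The formula itself is not legible in this text, so the code assumes a form common in semantic attention networks: each node representation under a relation is passed through an affine map and tanh, scored against the semantic attention vector, averaged over all nodes, and the per-relation scores are then normalized with softmax. All names and numeric values are hypothetical.

```python
import math


def relation_importance(h_nodes, q, W, b):
    """Average over nodes of q . tanh(W h + b) for one relation and one head."""
    total = 0.0
    for h in h_nodes:
        proj = [math.tanh(sum(w * x for w, x in zip(row, h)) + bi)
                for row, bi in zip(W, b)]
        total += sum(qi * pi for qi, pi in zip(q, proj))
    return total / len(h_nodes)


def softmax_over_relations(w):
    """Normalize the per-relation importance scores so that they sum to 1."""
    m = max(w.values())
    exps = {r: math.exp(v - m) for r, v in w.items()}
    s = sum(exps.values())
    return {r: e / s for r, e in exps.items()}


q = [1.0, 1.0]                # semantic attention vector (illustrative values)
W = [[1.0, 0.0], [0.0, 1.0]]  # learnable projection (illustrative values)
b = [0.0, 0.0]
h_follows = [[1.0, 0.0], [0.5, 0.5]]   # node representations under one relation
h_posts = [[0.0, 0.0], [0.1, -0.1]]    # node representations under another relation
w = {"follows": relation_importance(h_follows, q, W, b),
     "posts": relation_importance(h_posts, q, W, b)}
beta = softmax_over_relations(w)
print(beta)
```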

[0112] Subsequently, this invention aggregates node representations under different relation subgraphs using the calculated relation weights. The aggregation process is shown in the following equation:

[0113]

[0114] In the formula, the left-hand side is the node representation output by the l-th layer, the aggregated terms are the outputs of the relational graph transformer, and D is the number of attention heads in the semantic attention network.
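The cross-relation aggregation then reduces to a weighted sum. The sketch below assumes that the layer output is the average over the D semantic attention heads of the relation-weighted sum of per-relation representations; the function name `aggregate_relations` and the relation names are hypothetical.

```python
def aggregate_relations(betas, h_by_relation):
    """Average over heads of sum_r beta[r] * h[r] for a single node.

    betas: one {relation: normalized weight} dict per semantic attention head.
    h_by_relation: {relation: representation of the node under that relation}.
    """
    D = len(betas)
    dim = len(next(iter(h_by_relation.values())))
    out = [0.0] * dim
    for beta_d in betas:
        for rel, weight in beta_d.items():
            for p in range(dim):
                out[p] += weight * h_by_relation[rel][p] / D
    return out


betas = [{"follows": 0.5, "posts": 0.5}, {"follows": 1.0, "posts": 0.0}]
h_i = {"follows": [2.0, 0.0], "posts": [0.0, 2.0]}
x_i = aggregate_relations(betas, h_i)
print(x_i)
```

Here the second head ignores the `posts` relation entirely, so the result leans toward the `follows` representation.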

[0115] Therefore, this invention realizes the aggregation of node information across users and across relational subgraphs, preserves the diverse relationships brought about by the heterogeneity of social media, and achieves dynamic processing of the intrinsic heterogeneity of social media.

[0116] 4. Social Media Bot Account Detection Based on Graph Neural Networks

[0117] Each layer of the graph neural network in the model proposed in this invention includes a relational graph transformer and a semantic attention network. After passing through L layers of graph neural networks, the final node representation $x^{(L)}$ is obtained. Subsequently, this invention performs account classification prediction on social media user nodes, and further optimizes the social bot detection task by performing supervised tweet stance detection and unsupervised community similarity detection on tweet nodes, topic nodes, and community nodes.

[0118] This invention utilizes an output layer and a softmax layer for social media bot account detection and classification. The prediction result for user i is calculated using the following formula:

[0119]

[0120] In the formula, the input is the final representation of each social media user node, and all W and b are learnable parameters of the model. This module is trained with supervised user annotations, and the loss function is as follows:

[0121]

[0122] In the formula, Y is the set of labeled social media users, and $y_i$ is the label of user i.
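The classification step can be sketched as a single output layer followed by softmax and a cross-entropy loss. This is a simplified illustration: the number of output layers, the class order (index 0 for human, index 1 for bot), and all weights are assumptions, and the loss form is the standard negative log-likelihood rather than a formula recovered from this text.

```python
import math


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def predict(x, W, b):
    """Output layer followed by softmax over two classes (assumed: human, bot)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return softmax(logits)


def cross_entropy(probs, labels):
    """Sum of negative log-likelihoods over the labeled users."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels))


W = [[1.0, 0.0], [0.0, 1.0]]      # hypothetical output weights
b = [0.0, 0.0]
users = [[2.0, 0.0], [0.0, 2.0]]  # final user representations (illustrative)
labels = [0, 1]
probs = [predict(x, W, b) for x in users]
loss = cross_entropy(probs, labels)
print(probs, loss)
```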

[0123] Furthermore, considering that social bots may publish tweets that deviate from the mainstream and are emotionally charged and extreme, in order to manipulate public opinion and incite social network users on specific events, the tweet nodes and topic event nodes in the graph are used for stance detection. For a specific stance c and tweet nodes $x_i$, $x_j$, linear transformations map the node representations into a stance-sensitive linear space, where $\alpha_c$ and $\beta_c$ are learnable linear mapping functions for stance c. The module is optimized using the following loss function:

[0124]

[0125] In the formula, $y_{i,j,c}$ is an indicator function: its value is 1 if the stance of nodes i and j is c, and 0 otherwise.

[0126] Furthermore, social bots follow each other to create confusion, so as to spread large amounts of the same harmful information effectively or to evade general feature-based monitoring models; at the same time, users also tend to follow users similar to themselves. Therefore, based on the similarity of nodes of the same type within adjacent communities, community similarity is measured through contrastive learning. Positive samples are adjacent nodes of the same type, while negative samples are multi-hop nodes or nodes of a different type. By optimizing the loss function to pull positive samples closer and push negative samples away, unsupervised community similarity measurement is achieved. The loss function used is as follows:

[0127]

[0128] In the formula, $P_i$ and $N_i$ are the positive and negative sample sets for node i, respectively, Q is a hyperparameter, and σ(·) is the sigmoid function.
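A contrastive objective of this kind can be sketched as follows. The loss formula is not legible in this text, so the code assumes a negative-sampling form built from the ingredients named above: a log-sigmoid of the dot product for positive samples and a log-sigmoid of the negated dot product for negative samples, with Q weighting the negative term. The function names and vectors are hypothetical.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def community_loss(z_i, positives, negatives, Q):
    """Pull positives toward z_i and push negatives away (assumed loss form)."""
    pos = -sum(log_sigmoid(dot(z_i, z_p)) for z_p in positives)
    neg = -Q * sum(log_sigmoid(-dot(z_i, z_n)) for z_n in negatives)
    return pos + neg


z_i = [1.0, 0.0]
aligned = [[1.0, 0.0]]    # same direction as z_i
opposed = [[-1.0, 0.0]]   # opposite direction
good = community_loss(z_i, positives=aligned, negatives=opposed, Q=1.0)
bad = community_loss(z_i, positives=opposed, negatives=aligned, Q=1.0)
print(good, bad)
```

The loss is lower when positives are aligned with the node and negatives point away from it, which is the behavior the paragraph above describes.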

[0129] Ultimately, the loss function used in the social bot detection model is as follows:

[0130]

[0131] In the formula, λ1 and λ2 are the hyperparameters that control the stance detection module and the community similarity measurement module, θ denotes all trainable parameters of the model, and λ is a hyperparameter representing the weight of the regularization term.
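The way the three objectives combine can be shown with a one-line computation. The sketch assumes a weighted sum of the bot classification loss, the stance loss scaled by λ1, the community loss scaled by λ2, and an L2 penalty on the trainable parameters scaled by λ; the L2 form of the regularizer and the function name `total_loss` are assumptions.

```python
def total_loss(l_bot, l_stance, l_community, theta, lam1, lam2, lam):
    """Weighted multi-task loss with an assumed L2 regularization term."""
    reg = sum(w * w for w in theta)
    return l_bot + lam1 * l_stance + lam2 * l_community + lam * reg


# Illustrative numbers only; none of them come from this document.
loss = total_loss(l_bot=1.0, l_stance=2.0, l_community=4.0,
                  theta=[1.0, 2.0], lam1=0.5, lam2=0.25, lam=0.1)
print(loss)
```

Setting λ1 or λ2 to zero switches off the corresponding auxiliary task, which makes the contribution of each module easy to check in isolation.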

[0132] The hyperparameter settings of the graph neural network model used in this invention are shown in Table 3.

[0133]

[0134] In another embodiment of the present invention, a social media bot detection system based on dynamic heterogeneous graphs is provided, which can be used to implement the above-mentioned social media bot detection method based on dynamic heterogeneous graphs. Specifically, the system includes:

[0135] The social media modeling module is used to model social media based on heterogeneous information networks. It treats social media users, tweets, topics, and communities as network nodes, and the different types of interaction relationships between nodes as heterogeneous edges in the network. It uses feature engineering and pre-trained language models to encode user information and social media text information, and concatenates timestamp information to obtain the initial features of each node in the graph neural network.

[0136] The node computation module is used to obtain attention values ​​between different nodes through the initial features of each node to model the heterogeneity of relationships. It uses a relationship graph transformer to model the heterogeneity of social media relationships and influence, preserves dynamic dependency information through relative time encoding, and computes node representations.

[0137] The node representation module is used to aggregate node representations across relationships based on semantic attention networks, preserving relationships arising from the heterogeneity of social media.
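As an illustration of cross-relation aggregation, the sketch below scores each relation globally by averaging, over all nodes, the dot product between a semantic attention vector `q` and a tanh-transformed node representation, normalizes the scores with softmax, and then forms a weighted sum per node. A single attention head is used, every relation is assumed to hold at least one node representation, and the names `relation_weights` and `aggregate_across_relations` as well as the tanh scoring form are assumptions.

```python
import math

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relation_weights(reps, q, W, b):
    """Softmax-normalized importance of each relation.

    reps -- dict: relation -> {node id -> representation under that relation}
    q    -- semantic attention vector
    W, b -- parameters of the scoring transformation (learned in practice)
    """
    scores = {}
    for rel, node_reps in reps.items():
        total = 0.0
        for h in node_reps.values():
            hidden = [math.tanh(z + bk) for z, bk in zip(matvec(W, h), b)]
            total += sum(qk * hk for qk, hk in zip(q, hidden))
        # Average over nodes gives a global view of the relation.
        scores[rel] = total / len(node_reps)
    m = max(scores.values())
    exps = {rel: math.exp(s - m) for rel, s in scores.items()}
    z = sum(exps.values())
    return {rel: e / z for rel, e in exps.items()}

def aggregate_across_relations(reps, weights):
    """Weighted sum of each node's per-relation representations."""
    out = {}
    for rel, node_reps in reps.items():
        for node, h in node_reps.items():
            acc = out.setdefault(node, [0.0] * len(h))
            for k, v in enumerate(h):
                acc[k] += weights[rel] * v
    return out
```

Since the weights are shared by all nodes, a relation that is informative across the whole graph contributes more to every node's aggregated representation.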

[0138] The prediction module is used to obtain the final node representation through several layers of graph neural network. It predicts social media bot accounts for user type nodes through the output layer and softmax layer, and optimizes the detection by using stance detection and community similarity measurement.
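The output stage of this module can be sketched as a linear output layer followed by softmax over two classes. The function names, the convention that class index 1 means "bot", and the use of cross-entropy for the supervised loss are assumptions for illustration; the weights `W` and `b` would be learned in practice.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]

def predict_bot(h_user, W, b):
    """Classify one user node from its final representation.

    h_user -- final representation of the user node
    W, b   -- output-layer weights (2 rows) and biases (2 entries)
    Returns (label, probabilities); index 1 is assumed to mean "bot".
    """
    logits = [sum(w * x for w, x in zip(row, h_user)) + bk
              for row, bk in zip(W, b)]
    probs = softmax(logits)
    label = "bot" if probs[1] > probs[0] else "human"
    return label, probs

def cross_entropy(probs, label_index):
    """Supervised loss for one labeled user (assumed loss form)."""
    return -math.log(probs[label_index])
```

Only user-type nodes are passed to this stage; tweet, topic, and community nodes feed the auxiliary stance detection and community similarity objectives instead.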

[0139] The module division in this embodiment of the invention is illustrative and represents only one logical functional division. In actual implementation, other division methods may be used. Furthermore, the functional modules in the various embodiments of the invention can be integrated into a single processor, exist as separate physical entities, or be integrated into a single module. The integrated modules described above can be implemented in hardware or as software functional modules.

[0140] In another embodiment of the present invention, a computer device is provided, comprising a processor and a memory. The memory stores a computer program, which includes program instructions. The processor executes the program instructions stored in the computer storage medium. The processor may be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. It is the computing and control core of the terminal, suitable for implementing one or more instructions, specifically suitable for loading and executing one or more instructions in the computer storage medium to achieve a corresponding method flow or corresponding function. The processor described in this embodiment of the present invention can be used for the operation of a social media robot detection method based on dynamic heterogeneous graphs.

[0141] In another embodiment of the present invention, a storage medium is provided, specifically a computer-readable storage medium (Memory), which is a memory device in a computer device used to store programs and data. It is understood that the computer-readable storage medium here can include both the built-in storage medium in the computer device and extended storage media supported by the computer device. The computer-readable storage medium provides storage space that stores the operating system of the terminal. Furthermore, the storage space also stores one or more instructions suitable for loading and execution by a processor, which can be one or more computer programs (including program code). It should be noted that the computer-readable storage medium here can be high-speed RAM or non-volatile memory, such as at least one disk storage device. The processor can load and execute one or more instructions stored in the computer-readable storage medium to implement the corresponding steps of the social media bot detection method based on dynamic heterogeneous graphs in the above embodiments.

[0142] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0143] This invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0144] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means, which implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0145] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0146] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the specific implementation of the present invention. Any modifications or equivalent substitutions that do not depart from the spirit and scope of the present invention should be covered within the scope of protection of the claims of the present invention.

Claims

1. A social media bot detection method based on dynamic heterogeneous graphs, characterized in that it comprises: modeling social media based on a heterogeneous information network, taking the users, tweets, topics, and communities of social media as network nodes and the different types of interaction relationships between nodes as heterogeneous edges in the network, encoding user information and social media text information using feature engineering and pre-trained language models, and concatenating timestamp information to obtain the initial features of each node in the graph neural network; modeling the heterogeneity of social media relationships and influence using a relational graph transformer, wherein, for each attention head, the query, key, and value under different relationships and node pairs are calculated from the initial features of each node, the attention values between different nodes are calculated from the query and key to model the heterogeneity of relationships, dynamic dependency information is preserved through relative time encoding, and the attention values and neighbor node values are aggregated to obtain the node representation under a specific relationship; aggregating the node representations across relations based on a semantic attention network to obtain node representations that combine the different relations, so as to preserve the relations brought about by the heterogeneity of social media; and obtaining the final node representation through several layers of graph neural network, predicting social media bot accounts for user-type nodes through an output layer and a softmax layer, and at the same time optimizing the detection using stance detection and community similarity measurement.

2. The social media bot detection method based on dynamic heterogeneous graphs according to claim 1, characterized in that, in modeling social media based on the heterogeneous information network, users, tweets, topics, and communities are treated as heterogeneous nodes of the network, and the different types of interactions among users, tweets, topics, and communities are treated as heterogeneous edges of the network; two sets represent the node types and relation types of the heterogeneous information network, respectively; two mapping functions map nodes and edges to their corresponding types, respectively; and a time function assigns a timestamp to each node, thereby forming the dynamic heterogeneous graph network of social media, which is defined as follows:

3. The social media bot detection method based on dynamic heterogeneous graphs according to claim 2, characterized in that feature engineering is used to encode node metadata, and a pre-trained language model is used to encode the semantic data of user information and social media text information; the resulting encoded representations are concatenated with timestamp information to obtain the feature vector of each node; a fully connected neural network transforms the feature vector, and the result is used as the initial feature of the node in the graph neural network. The calculation formula is as follows: In the formula, the parameters are learnable by the model, the activation is a non-linear activation function, and nodes of different types use different linear mapping functions.

4. The social media bot detection method based on dynamic heterogeneous graphs according to claim 1, characterized in that, in modeling the heterogeneity of social media influence using the relational graph transformer, a graph neural network structure that contains transformers and operates on the heterogeneous information network is used; first, for each attention head, relation, and node, the corresponding query, key, and value are calculated as follows: In the formula, the three results are the query, key, and value of the attention mechanism, the layer index denotes the layer of the graph neural network, and all remaining parameters are learnable parameters of the model specific to each relation and attention head; subsequently, the attention between different nodes is calculated to model the heterogeneity of relationships, as follows: In the formula, the result is the attention weight between two nodes, computed with an exponential dot-product function and the hidden-layer dimension of each attention head over the set of neighbors of the node under the given relation; next, relative time encoding (RTE) is used to model the time difference between a node's neighbor and the node itself, and the encoded time difference is added to the original value of node j for temporal augmentation, with the specific calculation formula as follows: In the formula, the timestamp term is the timestamp corresponding to node i, and the remaining parameters are those of the linear transformation; next, using the following formula, messages are aggregated over the node's neighbors and attention heads to obtain the node representation under the relation: In the formula, the result is the hidden-layer representation of the node under the relation at the given layer, and the head count is the total number of attention heads; then, a gating mechanism is used to smooth the learned result: the value of the gate is calculated first, and the gate is then applied to the learned node representation and the input, with the specific calculation process as follows: In the formula, the concatenation operator denotes vector concatenation, the parameters are learnable by the model, the product operator denotes the Hadamard product, and the result is the vector learned for the node under the relation at the given layer.

5. The social media bot detection method based on dynamic heterogeneous graphs according to claim 4, characterized in that, in aggregating node representations across relations based on the semantic attention network, the importance of each relation is first obtained from the global perspective of all nodes and is then normalized using the softmax function, as shown in the following formula: In the formula, the unnormalized weight is the weight of a relation in a given attention head; the node set is the set of nodes in the heterogeneous information network; the semantic attention vector is that of the given attention head in the given layer; the remaining parameters are learnable parameters of the semantic attention network; and the normalized weight is the weight of the relation in the given attention head after normalization; subsequently, the calculated relation weights are used to aggregate the node representations under the different relation subgraphs, as follows: In the formula, the result is the representation of node i at the given layer, the input is the output of the relational graph transformer, and the head count is the number of attention heads in the semantic attention network.

6. The social media bot detection method based on dynamic heterogeneous graphs according to claim 5, characterized in that, in detecting social media bot accounts based on graph neural networks, each layer of the graph neural network in the model includes a relational graph transformer and a semantic attention network, and the final node representation is obtained after L layers of the graph neural network; account classification prediction is performed on social media user nodes, and the social bot detection task is further optimized by supervised tweet stance detection and unsupervised community similarity measurement on tweet nodes, topic nodes, and community nodes.

7. The social media bot detection method based on dynamic heterogeneous graphs according to claim 6, characterized in that, first, social media bot accounts are detected and classified using an output layer and a softmax layer, and the prediction result for a user is calculated as follows: In the formula, the input is the final representation of user node i, and all remaining parameters are learnable parameters of the model; this module is trained with supervision from user labels, and the loss function is as follows: In the formula, Y represents the set of labeled social media users, and each labeled user has a corresponding label; then, considering that social bots may post tweets that deviate from the mainstream and are emotionally charged and extreme in order to manipulate public opinion and incite social network users on specific events, the tweet nodes and topic event nodes in the graph are used for stance detection; for a specific stance and a tweet node, linear transformations map the node representations into a stance-sensitive linear space, each transformation being a learnable linear mapping function for that stance, and the following loss function is optimized: In the formula, the indicator function takes the value 1 if the node holds the given stance and 0 otherwise; next, based on the similarity of same-type nodes within adjacent communities, community similarity is measured through contrastive learning, wherein positive samples are adjacent nodes of the same type and negative samples are multi-hop nodes or nodes of a different type, and unsupervised community similarity measurement is achieved by optimizing the loss function to pull positive samples closer and push negative samples away; the loss function used is as follows: In the formula, P_i and N_i are the positive sample set and the negative sample set of node i, Q is a hyperparameter, and σ(·) is the sigmoid function; finally, the loss function used in the social bot detection model is as follows: In the formula, λ1 and λ2 are hyperparameters controlling the stance detection module and the community similarity measurement module, θ denotes all trainable parameters of the model, and λ is a hyperparameter representing the weight of the regularization term.

8. A social media bot detection system based on dynamic heterogeneous graphs, characterized in that it comprises: a social media modeling module, used to model social media based on a heterogeneous information network, which treats social media users, tweets, topics, and communities as network nodes and the different types of interaction relationships between nodes as heterogeneous edges in the network, uses feature engineering and pre-trained language models to encode user information and social media text information, and concatenates timestamp information to obtain the initial features of each node in the graph neural network; a node computation module, used to obtain attention values between different nodes from the initial features of each node in order to model the heterogeneity of relationships, which uses a relational graph transformer to model the heterogeneity of social media relationships and influence, preserves dynamic dependency information through relative time encoding, and computes node representations; a node representation module, used to aggregate node representations across relationships based on a semantic attention network, preserving the relationships arising from the heterogeneity of social media; and a prediction module, used to obtain the final node representation through several layers of graph neural network, which predicts social media bot accounts for user-type nodes through an output layer and a softmax layer and optimizes the detection using stance detection and community similarity measurement.

9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the steps of the social media bot detection method based on dynamic heterogeneous graphs as described in any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the social media bot detection method based on dynamic heterogeneous graphs as described in any one of claims 1 to 7.