A social robot group detection method based on high-density subgraph detection
By constructing a relation tensor for a social network and applying a high-density subgraph mining method, the invention addresses the difficulty existing techniques have in detecting collaborative social robot groups, enabling efficient identification of such groups and observation of their abnormal behavior.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIHANG UNIV
- Filing Date
- 2023-08-24
- Publication Date
- 2026-06-26
AI Technical Summary
Existing social bot detection methods struggle to effectively detect collaborative social bot groups, especially in social networks. A single detection method may fail to distinguish between bots and real users, and existing methods are ineffective in detecting complex collaborative behaviors.
The method constructs a relation tensor for the social network, calculates suspicion scores from user interaction data, and mines high-density subgraphs to detect social bot groups. A tensor construction module and a high-density subgraph mining module are designed to identify abnormal collaborative behavior.
It can effectively detect dense social robot groups in social networks, observe abnormal collaborative behavior in the short term, and improve the accuracy and efficiency of detection.
Figure CN117194809B_ABST
Description
Technical Field
[0001] This invention relates to the field of social network technology, and in particular to a method for detecting social robot groups based on high-density subgraph detection.
Background Technology
[0002] Social bots are automated social media accounts managed by software. Some bots can play a positive role, such as automatically replying and providing emergency assistance, but others are used for malicious purposes, such as spreading fake news or rumors. Therefore, it is essential for social media platform systems to detect the presence of social bots on their social networks to reduce the likelihood of the platform being attacked or suffering financial losses.
[0003] With the continuous improvement of automation technology and deception methods, social bots have become increasingly adept at mimicking the behavior of real users, making the difference between them and human operators increasingly subtle. Furthermore, to reduce the risk of detection, these bots tend to engage in malicious behavior in a collaborative manner and mimic interactions between normal users (such as liking, commenting, and sharing). This makes it possible for a single detection method to fail to detect these social bots and mistakenly identify them as genuine users. Existing methods for detecting social bots mainly include feature engineering-based methods, machine learning-based methods, and graph-based methods. Feature engineering-based methods transform raw data into more representative and discriminative feature vectors by mining and extracting key features from the data, thereby identifying anomalous users. However, this method relies heavily on domain knowledge and requires extensive data preprocessing. Machine learning-based methods can address the issues of scalability and high overhead, but they are heavily dependent on high-quality labeled data. Graph-based methods model the structure of social networks and can measure the connectivity and influence of social media accounts. These methods employ graph theory algorithms and can not only detect social bots but also predict victims and identify trustworthy users. However, these methods may have high overhead and may not be practical for real-world networks, performing well only on relatively small or synthetic datasets. It should be noted that the methods mentioned above often fail to achieve ideal results when detecting cooperative groups of robots, because these robots may exhibit complex cooperative behaviors, making detection more difficult.
[0004] To address the problem that existing research cannot detect collaborative social robot groups, this invention proposes to construct a relation tensor for social networks, fully utilize the interaction relationships in social networks, and use a high-density subgraph mining method to find high-density subgraphs with high suspicion, thereby identifying social robot groups.
Summary of the Invention
[0005] To address this, this invention first proposes a social robot group detection method based on high-density subgraph detection. The method involves first inputting the social network graph content into a tensor construction module, transforming the social relationship network into a relation tensor, extracting text content features to calculate repetition rate and similarity, then calculating a suspicion score, and finally calculating high-density subgraph information. This yields a tensor constructed based on the social network, which is then input into a high-density subgraph mining module. The high-density subgraph mining module extracts the source user groups from the high-density subgraph, ultimately detecting highly dense social robot groups within the social network and observing abnormal collaborative behavior among these groups over a short period.
[0006] The social network graph content includes user account information, like relationship data, and comment and repost data containing text content. Each user is represented by a unique identifier, and the like, repost, and comment relationships can be represented by directed edges and weights.
[0007] The tensor construction module is specifically configured as follows: the input is the content of a social network graph, and the social network is transformed into a relation tensor $R$, where $R(A_1, A_2, \ldots, A_n, X)$ denotes a relation with $n$ dimensional attributes $A_1, A_2, \ldots, A_n$ and a non-negative metric attribute $X$. The relation tensor $R$ has six dimensional attributes (source, target, date, like, comment, share) and a non-negative metric attribute (score), representing the source user, target user, date, number of likes, number of comments, number of shares, and suspicion score, respectively. For each directed social relationship in the social network, a tuple is added to $R$ to record the relationship between the source user and the target user, and the time and number of operations for each type of relationship are accumulated. Then, the suspicion score (Score) of the tuple is calculated.
[0008] The suspicion score uses three weights. The user weight $w_u$ is determined by the completeness of the user's account information: the less complete the account's profile, the higher the weight. The interaction weight $w_r$ is derived from the frequency of interaction between the source user and the target user, and is obtained as a weighted sum of the numbers of likes, comments, and shares. The text weight $w_t$ is derived from the comments and reposts posted by the source user: a pre-trained natural language processing model is used to calculate text similarity and repetition rate as the weight. The final suspicion score is obtained by multiplying these three weights together, as shown in the following formula:
[0009] $\text{Score} = w_u \times w_r \times w_t$
[0010] The user sets the relevant parameters for the high-density subgraph mining module, including the number of high-density subgraphs to be found, $k$, the subgraph density coefficient, $\lambda$, and the size range of the high-density subgraphs, $S_{\min}$ and $S_{\max}$. The parameter $k$ determines the number of high-density subgraphs found by the module, and hence the number of social robot groups finally output. The subgraph density coefficient $\lambda$ controls the density of the mined subgraphs, which makes it possible to observe cooperative behaviors among the detected robot groups.
[0011] In the high-density subgraph mining module, the size range specifies the upper and lower bounds on the size of a high-density subgraph, ensuring that the module finds subgraphs with neither too many nor too few nodes. The subgraph density is required to be no less than a lower bound $\rho_{th}(\lambda)$ related to $\lambda$. The formula is as follows, where $m$ is the number of source users in the subgraph and $n$ is the number of target users in the subgraph.
[0012]
[0013] The high-density subgraph mining module takes as input a relation tensor R constructed by the tensor construction module and outputs a social robot community. Specifically:
[0014] Let $R(A_1, A_2, \ldots, A_n, X)$ be a relation with $n$ dimensional attributes $A_1, A_2, \ldots, A_n$ and a non-negative metric attribute $X$. The relation $R$ can be represented as an $n$-way tensor, and a subgraph of $R$ is denoted $B$. For a subgraph $B$ of $R$, its mass is denoted $M_B$ and its size is denoted $S_B$. In $B$, the range of values of $A_n$ is denoted $B_n$, and one of its values is denoted $a_n$, with $a_n \in B_n$. The subgraph of $B$ in which the $n$-th dimension $A_n$ takes the value $a_n$ is denoted $B(a_n)$.
[0015] The high-density subgraph mining module comprises two loops. The outer loop executes k times, where k is a hyperparameter defined as the number of times the outer loop of the high-density subgraph mining module is executed. Each iteration yields a high-density subgraph mined from the social network. The inner loop stops executing when the subgraph size falls below a certain threshold. In each iteration, a portion of the subgraphs are deleted from the tensor, thereby increasing the suspicion of the remaining subgraphs. If the remaining subgraphs meet the constraints, they are saved to a snapshot list.
[0016] The outer loop includes the following steps. First, copy the relation tensor $R$ to obtain $B$. Then, build and maintain a min-heap $H_n$ for each dimension of $B$. This min-heap maintains the values in $B_n$, sorted by the key $M_{B(a_n)}$, so the top of $H_n$ is the attribute value $a_n$ that minimizes $M_{B(a_n)}$. Then the inner loop is executed. After the inner loop finishes, the outer loop calculates the suspicion level $\rho_{susp}(B', R)$ of each high-density subgraph $B'$ in the snapshot list and obtains the high-density subgraph $B^*$ with the highest suspicion level. It then deletes the subgraph $B^*$ from $R$. This completes one cycle; the process then restarts, until $k$ cycles have been executed.
[0017] The inner loop includes the following steps. First, calculate the size $S_B$ of $B$, the subgraph density $\rho_B$, and the lower bound $\rho_{th}(\lambda)$ on the subgraph density constrained by $\lambda$, and compare them: if $S_B \le S_{\max}$ and $\rho_B > \rho_{th}(\lambda)$, a snapshot of $B$ is saved to the snapshot list; otherwise it is not saved. Then, take the top values of the min-heaps $H_n$, denoted $a'_1, a'_2, \ldots, a'_n$. For each $a'_i$, calculate the suspicion $\rho_{susp}(B \setminus B(a'_i), R)$ of the subgraph that remains after removing from $B$ the tuples containing this attribute value, and find and record the $a'_i$ that maximizes the suspicion. Next, delete from $B$ the subgraph containing that attribute value, and delete the value from the min-heap $H_n$. This completes one pass of the inner loop. Now recalculate the size $S_B$ of $B$. If $S_B \ge S_{\min}$, the loop restarts. Otherwise, the current subgraph is smaller than the preset lower bound and no further high-density subgraph can be mined from it, so the loop ends.
[0018] The technical effects to be achieved by this invention are as follows:
[0019] This invention utilizes a designed high-density subgraph mining algorithm to detect highly dense social bot groups in social networks and observe abnormal collaborative behavior among these groups over short periods. The method proposed in this invention has the following characteristics:
[0020] 1. This invention designs a framework for social robot group detection that includes two sub-modules. It makes full use of user interaction behavior on social media, calculates suspicion scores based on user social behavior data, constructs a relation tensor, and then mines high-density subgraphs with high suspicion that may have abnormal behavior patterns, such as close collaboration between robots, to discover social robot groups.
[0021] 2. This invention designs a framework for social robot group detection that includes two modules. The tensor construction module constructs relationship tensors from social networks and designs a suspiciousness index related to social robot group collaboration. The high-density subgraph mining module, based on the suspiciousness index and combined with social relationship data between users, iteratively filters high-density subgraphs with suspicious interactive behaviors to detect social robot groups with abnormal collaborative behavior in the social network in a short period of time.
Attached Figure Description
[0022] Figure 1: Architecture of the social robot group detection method based on high-density subgraph detection.
Detailed Implementation
[0023] The following are preferred embodiments of the present invention, which are described in conjunction with the accompanying drawings. However, the present invention is not limited to these embodiments.
[0024] This invention proposes a method for detecting social robot groups based on high-density subgraph detection.
[0025] To reduce the risk of detection, social bots tend to engage in malicious behavior in a collaborative manner and mimic interactions between normal users (such as liking, commenting, and sharing). This results in them leaving more traces of automation and coordinated synchronization, forming a high-density subgraph structure within social networks.
[0026] The two modules proposed in this invention will be illustrated with examples below.
[0027] The input to the tensor construction module is a social network, including user account information, like relationship data, and comment and repost data containing text content. Each user can be represented by a unique identifier, and like, repost, and comment relationships can be represented using directed edges, weights, etc. The output is a tensor constructed based on the social network. In this invention, tensors are designed to represent relationship data in a social network, including interactive information such as following, liking, reposting, and commenting between users. By organizing this relationship data into tensor form, computation and processing can be more convenient.
[0028] The tensor construction module accepts input data including user account information, like relationship data, and comment and repost data containing text content. Each user can be represented by a unique identifier, and the like, repost, and comment relationships can be represented using directed edges, weights, etc.
[0029] Next, this module transforms the social network into a relation tensor R. The relation tensor R is a multidimensional data structure used to store all relation information within the social network; the dimensions of R are shown in Table 1.
[0030] Table 1 Explanation of the dimensions of tensor construction
[0031]
[0032] Transform the social network into a relation tensor $R$. A relation tensor $R$ is a multidimensional data structure used to store all relation information within the social network, where $R(A_1, A_2, \ldots, A_n, X)$ denotes a relation with $n$ dimensional attributes $A_1, A_2, \ldots, A_n$ and a non-negative metric attribute $X$. The relation tensor $R$ designed in this invention has six dimensional attributes (source, target, date, like, comment, and share) and a non-negative metric attribute (score), representing the source user, target user, date, number of likes, number of comments, number of shares, and suspicion score, respectively. For each directed social relationship in the social network, a tuple is added to $R$ to record the relationship between the source user and the target user, and the time and number of operations for each type of relationship are accumulated.
[0033] For a directed social relationship, add a tuple t(Source, Target, Date, 0, 0, 0, 0) to R, where the four numeric fields are the like count, comment count, share count, and score. Source is the identifier of the user who initiated the relationship, Target is the identifier of the user the relationship points to, and Date is the date corresponding to the data. Then increment the value of the corresponding social relationship dimension of t by 1. For example, if the relationship is "Alice liked Bob's post on May 29th", the corresponding tuple is t("Alice", "Bob", "May 29th", 1, 0, 0, 0). If R already contains a tuple for the same source user and target user on the same day, simply increment the value of the corresponding social relationship dimension of that tuple by 1. After this process, the relation tensor R is initially constructed.
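The accumulation step in [0033] can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_relation_tensor`, the event format `(source, target, date, kind)`, and the choice to store the tensor sparsely as a dictionary keyed by `(source, target, date)` are all assumptions made for the example.

```python
from collections import defaultdict

# Relation types that have a counter in each tuple (names follow the description).
COUNTERS = ("like", "comment", "share")


def build_relation_tensor(events):
    """Accumulate (source, target, date, kind) events into relation tuples.

    Returns a dict mapping (source, target, date) to the like/comment/share
    counts plus a score field that starts at 0 and is filled in later.
    """
    tensor = defaultdict(lambda: {"like": 0, "comment": 0, "share": 0, "score": 0.0})
    for source, target, date, kind in events:
        if kind not in COUNTERS:
            raise ValueError(f"unknown relation type: {kind!r}")
        # A new key creates the all-zero tuple; an existing key is incremented.
        tensor[(source, target, date)][kind] += 1
    return dict(tensor)


events = [
    ("Alice", "Bob", "May 29th", "like"),
    ("Alice", "Bob", "May 29th", "comment"),
    ("Alice", "Bob", "May 29th", "like"),
    ("Carol", "Bob", "May 30th", "share"),
]
R = build_relation_tensor(events)
```

Here the three interactions from Alice to Bob on the same day collapse into a single tuple whose counters are incremented, which mirrors the "already exists in R" case described above.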
[0034] Next, we need to calculate the Score for each tuple in R. The Score represents the degree of suspicion regarding the relationship between the source user and the target user. The following is one method for calculating the Score:
[0035] ① Calculate the source user weight $w_u$. The incompleteness of the source user's account information, calculated as 100 minus the completeness of the profile, is used as the base weight of a user. The closer the weight is to 100, the more the user behaves like a social bot, and therefore the higher the suspicion score when suspicious social bot groups are subsequently measured.
[0036] ② Calculate the interaction weight $w_r$. Assign a weight to likes, comments, and shares, then combine the frequencies of these three interactions as a weighted sum. For example, the base weight for a comment is 4, for a share it is 2, and for both likes and follows it is 1.
[0037] ③ Calculate the text weight $w_t$. Using existing natural language processing models, obtain the text content of the source user's comments and reposts, calculate the repetition rate and similarity of the text, and derive the text weight (e.g., 1 to 100). The higher the value, the more repetitive and monotonous the text content is, and the more it resembles content posted by a social bot.
[0038] ④ Multiply the weights obtained from ①②③ together to get the tuple's suspicion score. That is, $\text{Score} = w_u \times w_r \times w_t$.
[0039] The score calculated by the above process is stored in the Score dimension of the tensor, thus completing the transformation from social network to relation tensor.
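A small worked example of steps ① to ④ is given below as a hedged sketch. The function names are hypothetical, profile completeness is assumed to be a percentage in [0, 100], the base weights 4/2/1 are the example values from [0036], and the text weight is taken as an input because the description leaves the choice of natural language processing model open.

```python
def user_weight(profile_completeness):
    """w_u: 100 minus profile completeness (assumed to be a percentage, 0-100)."""
    if not 0 <= profile_completeness <= 100:
        raise ValueError("profile completeness must be in [0, 100]")
    return 100 - profile_completeness


def interaction_weight(likes, comments, shares, w_like=1, w_comment=4, w_share=2):
    """w_r: weighted sum of interaction counts, using the example base weights."""
    return w_like * likes + w_comment * comments + w_share * shares


def suspicion_score(profile_completeness, likes, comments, shares, text_weight):
    """Score = w_u * w_r * w_t, with w_t (text_weight) supplied by the caller."""
    w_u = user_weight(profile_completeness)
    w_r = interaction_weight(likes, comments, shares)
    return w_u * w_r * text_weight


# Profile 20% complete -> w_u = 80; 3 likes, 2 comments, 1 share -> w_r = 13.
score = suspicion_score(profile_completeness=20, likes=3, comments=2, shares=1, text_weight=50)
print(score)  # 80 * 13 * 50 = 52000
```

Because the three weights are multiplied, a tuple only receives a high score when the profile is sparse, the interaction is frequent, and the text is repetitive at the same time.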
[0040] Next, the tensor construction module allows users to configure parameters related to the high-density subgraph mining module. These parameters include the number of high-density subgraphs to be found, $k$, the subgraph density coefficient, $\lambda$, and the size range of the high-density subgraphs, $S_{\min}$ and $S_{\max}$. These parameters can be adjusted according to actual needs.
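One way to hold these four parameters is a small validated record, sketched below. The class name `MiningParams` and its field names are hypothetical; the only constraints checked are those stated in the description (λ lies in [0, 1], and the size range has a lower and an upper bound) plus the assumption that at least one subgraph is requested.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiningParams:
    k: int        # number of high-density subgraphs to find
    lam: float    # subgraph density coefficient (lambda)
    s_min: int    # lower bound on subgraph size
    s_max: int    # upper bound on subgraph size

    def __post_init__(self):
        if self.k < 1:
            raise ValueError("k must be at least 1")
        if not 0.0 <= self.lam <= 1.0:
            raise ValueError("lambda must lie in [0, 1]")
        if not 0 <= self.s_min <= self.s_max:
            raise ValueError("size range must satisfy 0 <= s_min <= s_max")


params = MiningParams(k=3, lam=0.5, s_min=2, s_max=50)
```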
[0041] Subgraph density is an indicator used to measure the tightness of connections between nodes in a subgraph. The subgraph density coefficient λ defined here ensures that the subgraphs obtained by the mining algorithm are not too sparse. The subgraph density is calculated while the algorithm runs. If the subgraph density is not less than a theoretical value related to λ, the density of the subgraph is considered appropriate and the subgraph can be retained.
[0042] At this point, all steps of the tensor construction module have been completed. Next is the high-density subgraph mining module.
[0043] In the high-density subgraph mining module, an algorithm is executed iteratively, each time removing a portion of the subgraph. First, the loop count is initialized to 0, and a snapshot list, used to store subgraphs that meet the constraints during the mining process, is initialized to empty. Then, the original relation tensor $R$ is copied to obtain a replica $B$. To improve the algorithm's efficiency, a min-heap $H_n$ is built and maintained for each dimension of $B$. Each min-heap is ordered by $M_{B(a_n)}$, with smaller values first.
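The per-dimension min-heaps can be sketched with Python's standard `heapq` module. This is an illustration under stated assumptions: the helper name `build_min_heaps` is hypothetical, tuples are given as `(attribute_values, score)` pairs, and the heap key for an attribute value is taken to be the mass of the tuples containing it, computed here as the sum of their scores.

```python
import heapq
from collections import defaultdict


def build_min_heaps(tuples, n_dims):
    """Return one min-heap per dimension, keyed by the mass of each attribute value.

    `tuples` is a list of (attrs, score) pairs, where attrs holds n_dims
    attribute values. The mass of a value is assumed to be the sum of the
    scores of the tuples that contain it.
    """
    masses = [defaultdict(float) for _ in range(n_dims)]
    for attrs, score in tuples:
        for dim, value in enumerate(attrs):
            masses[dim][value] += score
    heaps = []
    for dim in range(n_dims):
        heap = [(mass, value) for value, mass in masses[dim].items()]
        heapq.heapify(heap)  # smallest mass ends up at heap[0]
        heaps.append(heap)
    return heaps


B = [
    (("u1", "t1", "d1"), 5.0),
    (("u1", "t2", "d1"), 3.0),
    (("u2", "t1", "d2"), 1.0),
]
heaps = build_min_heaps(B, n_dims=3)
tops = [heap[0] for heap in heaps]
print(tops)  # [(1.0, 'u2'), (3.0, 't2'), (1.0, 'd2')]
```

The top of each heap is the candidate value whose removal takes the least mass out of the subgraph, which is why only these tops need to be examined in each pass.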
[0044] Next, we proceed to iteratively mine high-density subgraphs. First, we check whether the size of the current subgraph $B$ is less than the previously set upper bound $S_{\max}$ and whether its density $\rho_B$ is greater than the theoretical density $\rho_{th}(\lambda)$ constrained by $\lambda$. If both conditions are met, the current subgraph is saved to the snapshot list. The density of a subgraph is defined as the number of edges in the subgraph divided by the number of nodes in the subgraph. In subgraph $B$, let the size of the value range of the Source dimension be $m = |R_{Source}|$, the size of the value range of the Target dimension be $n = |R_{Target}|$, and the number of edges (i.e., the number of tuples) in the subgraph be $e$. Then we have the following formula:
[0045]
[0046]
[0047] Here $\lambda$ is a real number in $[0, 1]$. The larger $\lambda$ is, the stricter the theoretical subgraph density constraint, and the denser the resulting high-density subgraph.
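The snapshot condition can be illustrated with the sketch below. It rests on several assumptions that are not confirmed by the text: the node count is taken as m + n (distinct source users plus distinct target users), the size of a subgraph is taken as the number of distinct values summed over the source, target, and date dimensions, and the threshold `rho_th` is passed in as a precomputed number because the formula for the λ-dependent bound is not reproduced here. The function names are hypothetical.

```python
def density_and_size(tuples):
    """Return (density, size) for a list of (source, target, date) tuples.

    Density is edges / nodes with nodes = m + n; size counts the distinct
    values over all three dimensions. Both readings are assumptions.
    """
    sources = {src for src, _tgt, _date in tuples}
    targets = {tgt for _src, tgt, _date in tuples}
    dates = {date for _src, _tgt, date in tuples}
    m, n, e = len(sources), len(targets), len(tuples)
    nodes = m + n
    density = e / nodes if nodes else 0.0
    return density, m + n + len(dates)


def passes_constraints(tuples, s_max, rho_th):
    """Snapshot test: size within the upper bound and density above the threshold."""
    density, size = density_and_size(tuples)
    return size <= s_max and density > rho_th


# Two sources each interacting with two targets on one day: e = 4, m = n = 2.
block = [(s, t, "d1") for s in ("u1", "u2") for t in ("t1", "t2")]
print(density_and_size(block))  # (1.0, 5)
```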
[0048] Then, the top values of the min-heaps $H_n$ of $B$ in each dimension are taken, denoted $a'_1, a'_2, \ldots, a'_n$. For each $a'_i$, calculate the suspicion $\rho_{susp}(B \setminus B(a'_i), R)$ of the subgraph that remains after removing from $B$ the tuples containing this attribute value, and find and record the $a'_i$ that maximizes the suspicion. The formula for the suspiciousness of subgraph $B$ in relation $R$ is as follows:
[0049]
[0050] The above traversal finds the attribute value whose deletion most increases the suspicion level of the remaining subgraph. Remove the tuples containing that attribute value from $B$. Next, calculate the size $S_B$ of $B$. If $S_B \ge S_{\min}$, the high-density subgraph mining process continues. Otherwise, the mining process ends.
[0051] After the above mining process, the snapshot list contains a series of high-density subgraphs whose sizes and densities are within the specified limits. Next, for each subgraph $B'$ in the snapshot list, its suspiciousness $\rho_{susp}(B', R)$ is calculated, and the high-density subgraph $B^*$ with the highest suspicion level is recorded. This subgraph is then deleted from $R$, i.e., $R = R \setminus B^*$. This completes one round of the iterative algorithm, which has found one high-density subgraph $B^*$. The loop count is incremented by 1. If the loop count is still less than $k$, the next round of the algorithm is executed.
[0052] After executing the k-round iterative algorithm, the high-density subgraph mining module has obtained a total of k high-density subgraphs. The Source dimension of each high-density subgraph forms a source user group, which is the group of social bots with high suspicion. Therefore, by extracting the set of source user groups from these high-density subgraphs, the social bot group that this module is looking for is obtained.
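The two nested loops can be put together in a compact sketch. It is a simplified illustration rather than the patented algorithm: only the source, target, and date dimensions are used; masses are recomputed on every pass instead of being kept in min-heaps; the suspiciousness is replaced by a stand-in (mass divided by size); density is edges divided by m + n; size is the number of distinct values over the three dimensions; and `rho_th` is a precomputed threshold. All function names are hypothetical.

```python
def size_of(block):
    """Size: distinct values summed over the three dimensions (assumption)."""
    return sum(len({key[d] for key in block}) for d in range(3))


def suspicion(block):
    """Stand-in suspiciousness: total score (mass) divided by size (assumption)."""
    size = size_of(block)
    return sum(block.values()) / size if size else 0.0


def density(block):
    """Edges divided by nodes, with nodes = distinct sources + distinct targets."""
    nodes = len({k[0] for k in block}) + len({k[1] for k in block})
    return len(block) / nodes if nodes else 0.0


def mine_one(R, s_min, s_max, rho_th):
    """Inner loop: peel B, snapshot valid states, return the most suspicious one."""
    B, snapshots = dict(R), []
    while B:
        if size_of(B) <= s_max and density(B) > rho_th:
            snapshots.append(dict(B))
        best_rest, best_susp = None, -1.0
        for d in range(3):
            masses = {}
            for key, score in B.items():
                masses[key[d]] = masses.get(key[d], 0.0) + score
            # Lowest-mass value in this dimension (what the heap top would be).
            value = min(masses, key=lambda v: (masses[v], v))
            rest = {k: s for k, s in B.items() if k[d] != value}
            if suspicion(rest) > best_susp:
                best_rest, best_susp = rest, suspicion(rest)
        B = best_rest
        if size_of(B) < s_min:
            break
    return max(snapshots, key=suspicion) if snapshots else {}


def mine(R, k, s_min, s_max, rho_th):
    """Outer loop: up to k rounds; each removes the found subgraph from R."""
    R, groups = dict(R), []
    for _ in range(k):
        best = mine_one(R, s_min, s_max, rho_th)
        if not best:
            break  # nothing satisfies the constraints any more (sketch choice)
        groups.append({key[0] for key in best})  # source user group
        for key in best:
            del R[key]
    return groups


# Three bot-like accounts hitting the same two targets on one day, plus two
# unrelated low-score interactions.
R = {(b, t, "d1"): 10.0 for b in ("b1", "b2", "b3") for t in ("t1", "t2")}
R[("n1", "t3", "d2")] = 1.0
R[("n2", "t4", "d3")] = 1.0
groups = mine(R, k=2, s_min=2, s_max=6, rho_th=1.0)
print(groups)
```

In this toy input the only block that satisfies both constraints is the one formed by `b1`, `b2`, and `b3`, so the second round finds nothing and the sketch stops early instead of running all k rounds.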
Claims
1. A method for detecting social robot groups based on high-density subgraph detection, characterized in that: first, the social network graph content is input into a tensor construction module, which converts the social relationship network into a relation tensor, extracts text content features to calculate the repetition rate and similarity, then calculates the suspicion score, and finally calculates the high-density subgraph information, yielding a relation tensor constructed from the social network; the relation tensor is then input into a high-density subgraph mining module, which obtains the source user groups from the high-density subgraphs, ultimately detecting high-density social robot groups in the social network and observing and outputting the robots that exhibit abnormal collaborative behavior within these groups over a short period; combined with the social relationship data between users, high-density subgraphs with suspicious interaction behavior are iteratively filtered to obtain highly suspicious social robot groups;
the social network graph content includes user account information, like relationship data, and comment and repost data containing text content; each user is represented by a unique identifier, and the like, repost, and comment relationships are represented by directed edges and weights;
the tensor construction module is specifically configured as follows: the input is the content of a social network graph, and the social network is transformed into a relation tensor $R$, where $R(A_1, A_2, \ldots, A_n, X)$ denotes a relation with $n$ dimensional attributes $A_1, A_2, \ldots, A_n$ and a non-negative metric attribute $X$; the relation tensor $R$ has six dimensional attributes (source, target, date, like, comment, and share) and a non-negative metric attribute (score), representing the source user, target user, date, number of likes, number of comments, number of shares, and suspicion score, respectively; for each directed social relationship in the social network, a tuple is added to $R$ to record the relationship between the source user and the target user, and the time and number of operations for each type of relationship are accumulated; the suspicion score of the tuple is then calculated;
the suspicion score uses three weights: the user weight $w_u$ is determined by the completeness of the user's account information, such that the less complete the account's profile, the higher the weight; the interaction weight $w_r$ is derived from the frequency of interaction between the source user and the target user, and is obtained as a weighted sum of the numbers of likes, comments, and shares; the text weight $w_t$ is derived from the comments and reposts posted by the source user, a pre-trained natural language processing model being used to calculate text similarity and repetition rate as the weight; the final suspicion score is obtained by multiplying these three weights together, as shown in the following formula: $\text{Score} = w_u \times w_r \times w_t$;
the relevant parameters for the high-density subgraph mining module are set, including the number of high-density subgraphs to be found, $k$, the subgraph density coefficient, $\lambda$, and the size range of the high-density subgraphs, $S_{\min}$ and $S_{\max}$; in the high-density subgraph mining module, the size range specifies the upper and lower bounds on the size of a high-density subgraph, and the subgraph density is required to be no less than a lower bound $\rho_{th}(\lambda)$ related to $\lambda$, given by a formula in which $m$ is the number of source users in the subgraph and $n$ is the number of target users in the subgraph;
the high-density subgraph mining module takes as input the relation tensor $R$ constructed by the tensor construction module and outputs the social robot groups, specifically: let $R(A_1, A_2, \ldots, A_n, X)$ be a relation with $n$ dimensional attributes $A_1, A_2, \ldots, A_n$ and a non-negative metric attribute $X$; the relation $R$ can be represented as an $n$-way tensor, and a subgraph of $R$ is denoted $B$; for a subgraph $B$ of $R$, its mass is denoted $M_B$ and its size is denoted $S_B$; in $B$, the range of values of $A_n$ is denoted $B_n$, and one of its values is denoted $a_n$, with $a_n \in B_n$; the subgraph of $B$ in which the $n$-th dimension $A_n$ takes the value $a_n$ is denoted $B(a_n)$;
the high-density subgraph mining module comprises two nested loops;
the outer loop includes the following steps: first, the relation tensor $R$ is copied to obtain $B$; then a min-heap $H_n$ is built and maintained for each dimension of $B$, the min-heap maintaining the values in $B_n$ sorted by the key $M_{B(a_n)}$, so that the top of $H_n$ is the attribute value $a_n$ that minimizes $M_{B(a_n)}$; the inner loop is then executed; after the inner loop finishes, the outer loop calculates the suspiciousness $\rho_{susp}(B', R)$ of each high-density subgraph $B'$ in the snapshot list, obtains the high-density subgraph $B^*$ with the highest suspicion level, and deletes the subgraph $B^*$ from $R$, which completes one cycle; the process then restarts, until $k$ cycles have been executed;
the inner loop includes the following steps: first, the size $S_B$ of $B$, the subgraph density $\rho_B$, and the lower bound $\rho_{th}(\lambda)$ on the subgraph density constrained by $\lambda$ are calculated and compared; if $S_B \le S_{\max}$ and $\rho_B > \rho_{th}(\lambda)$, a snapshot of $B$ is saved to the snapshot list, otherwise it is not saved; the top values of the min-heaps $H_n$ are then taken, denoted $a'_1, a'_2, \ldots, a'_n$; for each $a'_i$, the suspiciousness $\rho_{susp}(B \setminus B(a'_i), R)$ of the subgraph that remains after removing from $B$ the tuples containing this attribute value is calculated, and the $a'_i$ that maximizes the suspicion level is found and recorded; the subgraph containing that attribute value is then deleted from $B$, and the value is deleted from the min-heap $H_n$, which completes one pass of the inner loop; the size $S_B$ of $B$ is then recalculated; if $S_B \ge S_{\min}$, the loop restarts; otherwise, the current subgraph is smaller than the preset lower bound and no further high-density subgraph can be mined from it, so the loop ends.