A method and system for discovering social network communities based on interactive behavior
By extracting interactive behavior features and calculating behavior weights from social networks, and combining them with friend relationships, a modularity maximization algorithm is used to segment communities. This solves the problem of existing technologies failing to effectively utilize interactive behavior and improves the efficiency and accuracy of community discovery.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING UNIV OF TECH
- Filing Date
- 2022-07-22
- Publication Date
- 2026-06-30
Figure CN115239509B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of computer technology, and more specifically to a method and system for discovering social network communities based on interactive behavior.
Background Technology
[0002] The development of Web 2.0 technology has led to the rapid rise of online social media. Online social networks such as Twitter, Facebook, Weibo, and Instagram have attracted a large number of users. People use social media to obtain information, make new friends, share their thoughts, and participate in entertainment and shopping activities. According to the DIGITAL 2021: GLOBAL OVERVIEW REPORT, there are 4.2 billion social media users worldwide, an increase of 490 million over the preceding 12 months, representing year-on-year growth of over 13%. This growth has accelerated significantly since the COVID-19 pandemic. These figures show that social media has become an increasingly important part of people's lives. Meanwhile, researchers from numerous fields, including computer science, physics, sociology, and economics, have turned their attention to social media research.
[0003] Among numerous research areas, identifying communities in networks has become a key focus. Communities allow us to discover groups of interacting individuals and the relationships between them. In social networks, a community is a group of users who share similar interests, consume similar content, or interact in various ways. The task of identifying communities in networks is called community discovery. Community discovery is crucial because it allows us to gain a deeper understanding of networks and potentially drives many practical applications, such as predicting unobserved or future connections between users, recommending friends that social network users might be interested in, and deploying targeted recommendations for specific user groups to achieve greater impact.
[0004] Over the past few decades, numerous community detection algorithms have been proposed, based on different ideas and technologies. From the perspective of the information used for segmentation, methods for community segmentation or detection in social networks can be mainly divided into three categories:
[0005] 1) Based on network structure, this category mainly uses the topological structure between network nodes, i.e., the friendship relationships between users. This type of method performs well in terms of algorithm complexity and result validity, and has already been put into practical use in many scenarios.
[0006] 2) User attribute-based segmentation: In social networks, user attribute information such as gender, age, region, and interests is relatively easy to obtain. This attribute information forms a second dimension in the social network representation, in addition to structural information. Some algorithms ignore the network's topological structure information and use node attribute information for community discovery.
[0007] 3) Community discovery methods combining structural and attribute information. Structural and attribute-based methods only utilize the network's organizational and node attributes, respectively, without using all the information within the network. Combining structural and attribute information enriches the knowledge of social network users and more clearly demonstrates the reasons for community formation.
[0008] None of the three categories of methods mentioned above focuses on user interactions, which provide a wealth of connection information.
Summary of the Invention
[0009] To address the aforementioned technical problems in existing technologies, this invention provides a method and system for discovering social network communities based on interactive behavior, which discovers communities based on users' friend relationships and interactive behaviors.
[0010] This invention discloses a method for discovering social network communities based on interactive behavior. The method includes: extracting behavioral features based on the interactive behavior between users in the social network; obtaining behavioral weights between users based on the behavioral features; obtaining edge weights between users based on the quantified value of the friendship relationship between users and the behavioral weights; obtaining the modularity of the community based on the edge weights between users; and dividing the social network communities based on the maximum modularity algorithm.
[0011] Preferably, the method for obtaining the behavior weight between two users includes:
[0012] Based on the number of interactions and the interval between interactions between the first user and the second user, the probability of interaction behavior between the first user and the second user is obtained;
[0013] Based on the number of interactions between the first user and the second user, and the number of interactions between the first user and the third user, the interaction divergence of the first user relative to the second user is obtained;
[0014] Based on the probability of the interaction behavior and the interaction divergence, the interaction index is obtained;
[0015] Behavioral weights are obtained using logistic regression and the interaction index.
[0016] Preferably, the modularity maximization algorithm includes:
[0017] Try adding a node or first community to one or more second communities of neighboring nodes;
[0018] Determine whether the modularity benefit of the second community increases or exceeds the first threshold;
[0019] If so, merge the node or the first community into the second community with the highest modularity benefit;
[0020] If not, maintain the original community structure.
[0021] Preferably, the interactive behavior includes any of the following actions or a combination thereof:
[0022] Browse, reply, forward, mention (@), like, dislike, search, favorite, click on shared hyperlinks, and quote;
[0023] The term "friendship" includes adding friends or following someone.
[0024] This invention also provides a system for implementing the above-described social network community discovery method, comprising a relationship building module, a behavioral feature extraction module, a weight fusion module, and a community discovery module.
[0025] The relationship building module is used to obtain the friend relationships between users and quantify the friend relationships;
[0026] The behavior feature extraction module is used to extract behavior features based on the interaction behavior between users in the social network, and to obtain the behavior weights between users based on the behavior features.
[0027] The weight fusion module is used to fuse the quantitative value of the friendship relationship and the behavioral weight to obtain the edge weight between users;
[0028] The community discovery module is used to obtain the modularity of a community based on the edge weights between the users; and to divide the social network communities based on the maximum modularity algorithm.
[0029] Preferably, the system further includes a data acquisition module, which is used to acquire friend relationships and interaction behaviors in the social network.
[0030] In this invention, introducing behavioral information about social network users effectively improves the quality of community segmentation. The interaction index represents features such as the number of user interactions and the intervals between them, preserving multiple features of user interaction behavior without increasing data dimensionality. Merging the basic connection graph and the behavioral connection graph into a single weighted network graph preserves network information while avoiding the complexity of community segmentation on a heterogeneous network, thereby ensuring the efficiency of the community segmentation algorithm.
[0031] Compared with existing technologies, the beneficial effects of this invention are as follows: user interaction behavior plays an important supporting role in the formation of communities; by introducing user interaction behavior and combining it with the social network topology formed by user friendship relationships, community discovery in social networks is achieved through a modularity maximization algorithm, which helps to improve the efficiency of existing community discovery technologies.
Attached Figure Description
[0032] Figure 1 is a flowchart of the social network community discovery method based on interactive behavior according to the present invention;
[0033] Figure 2 is a flowchart of Example 1;
[0034] Figure 3 is the system logic block diagram of Example 2.
Detailed Implementation
[0035] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0036] The present invention will now be described in further detail with reference to the accompanying drawings:
[0037] As shown in Figure 1, a social network community discovery method based on interactive behavior includes:
[0038] Step 101: Extract behavioral features based on user interactions in social networks.
[0039] It should be noted that user interactions and friendships must be obtained through legal means, such as with the user's authorization. These interactions include any of the following actions or combinations thereof: mentioning, liking, reposting, forwarding, clicking on shared hyperlinks, and chatting.
[0040] Step 102: Obtain the behavioral weights between users based on the behavioral characteristics. The behavioral weights are the quantification of the behavioral characteristics.
[0041] Step 103: Obtain the edge weights between users based on the quantified values of friend relationships and behavioral weights. For example, combine the quantified values of friend relationships and behavioral weights. Users can be represented by nodes; friend relationships between users can be represented by edges between nodes, for example, the friend relationship between the first user u and the second user v is represented by the edge (u, v). A relationship connection graph and a feature connection graph can be constructed based on friend relationships and behavioral features to represent the connections between nodes.
[0042] Step 104: Obtain the modularity of the community based on the edge weights between the users.
[0043] Step 105: Divide social network communities based on the maximum modularity algorithm.
[0044] Inter-user interactions play a crucial supporting role in the formation of communities. By introducing inter-user interactions and combining them with the social network topology formed by user friendships, and using a modularity maximization algorithm to discover communities in social networks, we can enhance our understanding of community structures and improve the efficiency of existing community discovery technologies.
[0045] Example 1
[0046] As shown in Figure 2, in step 102, the method for obtaining the behavioral weight between two users includes:
[0047] Step 201: Behavioral Feature Quantification. Behavioral feature quantification includes:
[0048] Step 301: Based on the number of interactions and the interaction intervals between the first user and the second user, obtain the probability of interaction behavior between the first user and the second user. The behavior probability is expressed as the probability that the time interval of the interaction behavior falls within a certain threshold range.
[0049] Wherein, the behavior probability $M_n$ is expressed as:
[0050] $M_n = CDF_n - CDF_{n-1}$ (41)
[0051] $CDF_n = \dfrac{|\{\Delta t_k \mid \Delta t_k \le T_n\}|}{|\Delta t|}$ (4)
[0052] Here, $CDF_n$ is the value of the cumulative distribution function at the threshold $T_n$; $|\Delta t|$ is the number of interaction intervals between the users, i.e., the number of interactions minus 1; $|\{\Delta t_k \mid \Delta t_k \le T_n\}|$ is the number of interaction intervals that do not exceed the threshold; k is the sequence number of the interaction interval, and n is the sequence number of the threshold. The set of intervals can be written as $\{\Delta t_1, \Delta t_2, ..., \Delta t_k\}$. The thresholds are defined as follows: there is a time interval between when the first user performs an action (e.g., posting a Weibo message) and when the second user responds (e.g., likes or replies). This interval is usually measured in seconds. The set of thresholds can be written as $\{T_1, T_2, ..., T_n\}$, and the interaction intervals are classified by threshold, so the interval times can be regarded as data distributed over the ranges delimited by the thresholds. For example, the minimum possible value of $T_1$ is 1 second, and $T_n$ is the maximum range (in seconds) covered by the statistics. On Twitter, for instance, intervals can be counted against values such as 30 s, 600 s, 3600 s, and 86400 s; these selected values are the thresholds. If the statistics are to cover up to 2 days, then the maximum threshold $T_n$ is 172800 s. $CDF_0$ is defined as zero.
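The calculation in step 301 can be illustrated with a minimal Python sketch. It assumes the intervals are already available in seconds and that the cumulative distribution function is the empirical fraction of intervals not exceeding each threshold; the function name `behavior_probabilities` and the sample values are hypothetical and are not part of the original disclosure.

```python
from bisect import bisect_right


def behavior_probabilities(intervals, thresholds):
    """Return [M_1, ..., M_n], where M_n = CDF_n - CDF_{n-1} and CDF_0 = 0.

    intervals  -- interaction intervals in seconds (the set of delta-t values)
    thresholds -- threshold values T_1..T_n in seconds
    """
    if not intervals:
        # No intervals observed: every behavior probability is zero.
        return [0.0] * len(thresholds)
    ordered = sorted(intervals)
    total = len(ordered)  # |delta-t|, the number of interaction intervals
    probabilities = []
    previous_cdf = 0.0  # CDF_0 is defined as zero
    for threshold in sorted(thresholds):
        # Empirical CDF: share of intervals that do not exceed the threshold.
        cdf = bisect_right(ordered, threshold) / total
        probabilities.append(cdf - previous_cdf)
        previous_cdf = cdf
    return probabilities


# Hypothetical sample: five intervals, thresholds from the Twitter example.
sample_intervals = [12, 45, 500, 4000, 90000]
sample_thresholds = [30, 600, 3600, 86400, 172800]
sample_m = behavior_probabilities(sample_intervals, sample_thresholds)
```

When the largest threshold covers every observed interval, the values $M_1, ..., M_n$ sum to 1, so they can be read as the share of interactions falling into each threshold range.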
[0053] Step 302: Based on the number of interactions between the first user and the second user, and the number of interactions between the first user and the third user, obtain the interaction divergence of the first user relative to the second user. Social network users often interact with multiple users; if two users have a close relationship, they will interact more frequently, while interacting with other users will be less frequent.
[0054] The interaction divergence $DI_{uv}$ is expressed as:
[0055]
[0056] Where $|N_u|$ represents the number of interactions between the first user u and the third user within a specific time period, $|Seq_{uv}|$ represents the number of interactions between the first user u and the second user v, log denotes the common logarithm, and the third user is any user other than the first and second users.
[0057] Formula 5 can be rewritten as:
[0058]
[0059] Step 303: Obtain the interaction index based on the interaction behavior probability and interaction divergence.
[0060] The interaction index $BI_{uv}$ is expressed as:
[0061]
[0062] Where $\beta_n$ is the coefficient of the behavior probability; the set of coefficients can be written as $\{\beta_1, \beta_2, ..., \beta_n\}$. The coefficient differs for behavior probabilities in different threshold ranges, and empirical values can be used.
[0063] Step 202: Logistic Regression: Obtain behavioral weights based on the logistic regression method and the interaction index.
[0064] Behavioral weights are represented as follows:
[0065] $W_{behavior}(u, v) = W_{behavior}\langle u,v\rangle + W_{behavior}\langle v,u\rangle$ (8)
[0066]
[0067] Correspondingly,
[0068] Where x represents the interaction index $BI_{uv}$, and $\theta^T$ represents the transpose of θ; the parameter θ is learned using the maximum likelihood method of logistic regression. The maximum likelihood method is existing technology and will not be described further in this application. $W_{base}\langle u,v\rangle = 1$ indicates that there is a friend relationship between the first user u and the second user v in the social network, and $W_{base}\langle u,v\rangle = 0$ indicates that there is no friend relationship. $W_{behavior}\langle u,v\rangle$ is the behavioral weight in the direction from the first user to the second user, and $W_{behavior}\langle v,u\rangle$ is the behavioral weight in the direction from the second user to the first user; that is, the behavioral weight is the sum of the behavioral weights in both directions.
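The bidirectional sum in formula (8) can be illustrated with a minimal Python sketch. Formula (7) is not legible in this text, so the sketch assumes the standard logistic (sigmoid) form applied to $\theta^T x$ with $x = (1, BI)$; the function names and the parameter values in the usage line are hypothetical, and in practice θ would be learned by maximum likelihood as described above.

```python
import math


def directional_behavior_weight(interaction_index, theta):
    """Directional weight for one direction, e.g. from u to v.

    Assumption: standard logistic function of theta^T x with x = (1, BI),
    so theta = (bias, slope). The patent's exact formula (7) may differ.
    """
    z = theta[0] + theta[1] * interaction_index
    return 1.0 / (1.0 + math.exp(-z))


def behavior_weight(bi_uv, bi_vu, theta):
    """Formula (8): sum of the behavioral weights in both directions."""
    return (directional_behavior_weight(bi_uv, theta)
            + directional_behavior_weight(bi_vu, theta))


# Hypothetical parameters and interaction indices, for illustration only.
example_weight = behavior_weight(bi_uv=0.8, bi_vu=0.3, theta=(0.0, 1.0))
```

Under this assumption each directional weight lies strictly between 0 and 1, so the combined behavioral weight lies between 0 and 2 and is symmetric in u and v.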
[0069] In step 103, the quantified value of the friendship relationship between users can be expressed as:
[0070]
[0071] Where $W_{base}(u, v)$ is the quantified value of the friendship relationship between the first user u and the second user v, a is a constant, and E represents the set of friendship relationships in the social network. $u \to v \in E$ means that the first user u and the second user v have a friendship relationship and the edge between them belongs to the edge set E; $u \to v \notin E$ means that the edge between the first user u and the second user v does not belong to the edge set E.
[0072] For example, in Weibo and Twitter, following is unidirectional; for user u, the directional quantized value $W_{base}\langle u,v\rangle$ can be expressed as:
[0073]
[0074] If two users follow each other or add each other as friends, for example through "Add Friends" on Facebook, a two-way relationship is established at the same time, which can be expressed as:
[0075] $W_{base}(u, v) = W_{base}\langle u,v\rangle + W_{base}\langle v,u\rangle$ (32)
[0076] Or it can be expressed as:
[0077]
[0078] In step 103, the edge weights between users are represented as follows:
[0079] $W(u, v) = (1-\alpha) W_{base}(u, v) + \alpha W_{behavior}(u, v)$ (9)
[0080] Where $W(u, v)$ represents the edge weight between the first user u and the second user v; $W_{base}(u, v)$ represents the quantized value of the friendship relationship between the first user u and the second user v, i.e., the bidirectional quantized value of the edge (u, v); α is a hyperparameter; and $W_{behavior}(u, v)$ represents the behavioral weight.
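The fusion in formula (9), together with the bidirectional friendship value in formula (32), can be illustrated with a minimal Python sketch. It assumes that the directional friendship value equals the constant a when the directed edge is in E and 0 otherwise; the function names, the `follows` set, and the numeric values are hypothetical.

```python
def base_weight(u, v, follows, a=1.0):
    """Formula (32): bidirectional friendship value for the edge (u, v).

    follows -- set of directed pairs (x, y) meaning "x follows y" (the set E)
    Assumption: each direction contributes the constant a if the directed
    edge is in E, and 0 otherwise.
    """
    forward = a if (u, v) in follows else 0.0
    backward = a if (v, u) in follows else 0.0
    return forward + backward


def edge_weight(w_base, w_behavior, alpha):
    """Formula (9): W = (1 - alpha) * W_base + alpha * W_behavior."""
    return (1.0 - alpha) * w_base + alpha * w_behavior


# Hypothetical data: u and v follow each other, u follows w one way.
follows = {("u", "v"), ("v", "u"), ("u", "w")}
w_uv = edge_weight(base_weight("u", "v", follows), w_behavior=1.0, alpha=0.5)
w_uw = edge_weight(base_weight("u", "w", follows), w_behavior=0.0, alpha=0.5)
```

With α = 0 the edge weight reduces to the friendship value alone, and with α = 1 it reduces to the behavioral weight alone, so α controls how much the interaction behavior contributes to each edge.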
[0081] In step 104, the modularity Q can be expressed as:
[0082]
[0083]
[0084] $k_u = \sum_v W(u, v)$ (12)
[0085] $k_v = \sum_u W(u, v)$ (13)
[0086] Where $c_u = c_v$ indicates that the first user u and the second user v are in the same community; the probability term in the formula represents the probability that the edge (u, v) exists in a random social network; m represents the number of edges in social network A; $k_u$ represents the sum of the weights of all edges connected to the first user u, i.e., the degree of node u; and $k_v$ represents the sum of the weights of all edges connected to the second user v, i.e., the degree of node v. However, the modularity Q is not limited to this form; it can also include quantified values of network structure and user attributes.
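The modularity calculation in step 104 can be illustrated with a minimal Python sketch. Formulas (10) and (11) are not legible in this text, so the sketch assumes the widely used weighted modularity, in which each community contributes its internal weight share minus the squared share of its total degree, and in which 2m is taken as the sum of all weighted degrees. The function name and the sample graph are hypothetical.

```python
from collections import defaultdict


def weighted_modularity(edges, community):
    """Weighted modularity Q of a partition (assumed standard form).

    edges     -- dict {(u, v): W(u, v)}, each undirected edge listed once, u != v
    community -- dict mapping every node to its community label
    """
    degree = defaultdict(float)  # k_u: sum of weights of edges at u, formula (12)
    for (u, v), weight in edges.items():
        degree[u] += weight
        degree[v] += weight
    two_m = sum(degree.values())
    if two_m == 0:
        return 0.0
    internal = defaultdict(float)  # twice the edge weight inside each community
    total = defaultdict(float)     # sum of degrees of each community's nodes
    for (u, v), weight in edges.items():
        if community[u] == community[v]:
            internal[community[u]] += 2.0 * weight
    for node, k in degree.items():
        total[community[node]] += k
    return sum(internal[c] / two_m - (total[c] / two_m) ** 2 for c in total)


# Hypothetical graph: two triangles joined by a single bridge edge.
sample_edges = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0,
                ("d", "e"): 1.0, ("e", "f"): 1.0, ("d", "f"): 1.0,
                ("c", "d"): 1.0}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}
q_split = weighted_modularity(sample_edges, split)
q_merged = weighted_modularity(sample_edges, merged)
```

In this sample the partition into two triangles scores higher than placing all nodes in one community, which is the comparison a modularity maximization algorithm relies on.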
[0087] In step 105, the algorithm for maximizing modularity includes the Louvain algorithm, which is highly efficient and suitable for community partitioning in large-scale networks. The Louvain algorithm's community partitioning process mainly includes the following steps:
[0088] (1) Treat each node in the network as its own community. Then, for each community, consider its neighboring nodes and try adding the first community to the second community in which a neighboring node is located, calculating the modularity gain of each such move. Finally, join the community of the neighboring node that yields the largest modularity gain. If the modularity gain is not positive, the community structure remains unchanged.
[0089] (2) Fold (merge) each community formed in step (1) into a single node, and repeat step (1) until the community affiliation of each node no longer changes. After two communities are folded into single nodes, the sum of the weights of the edges between their nodes is used as the weight of the new edge between the two folded nodes.
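The two steps above can be illustrated with a compact Python sketch of a Louvain-style procedure. It is a simplified illustration under stated assumptions, not the patented implementation: the graph is a symmetric adjacency dictionary, the gain of a move is scored with the usual quantity $k_{u,c} - \Sigma_{tot}(c) \cdot k_u / 2m$ (proportional to the modularity gain), and all function names and the sample graph are hypothetical.

```python
from collections import defaultdict


def local_moving(adj):
    """Step (1): move each node to the neighboring community with the best gain."""
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    two_m = sum(degree.values())
    community = {u: u for u in adj}  # every node starts as its own community
    if two_m == 0:
        return community, False
    total = dict(degree)             # total degree of each community
    improved, moved_any = True, False
    while improved:
        improved = False
        for u in adj:
            home = community[u]
            total[home] -= degree[u]  # take u out of its community
            links = defaultdict(float)  # weight from u to each community
            for v, w in adj[u].items():
                if v != u:
                    links[community[v]] += w
            best = home
            best_gain = links[home] - total[home] * degree[u] / two_m
            for c, k_uc in links.items():
                gain = k_uc - total[c] * degree[u] / two_m
                if gain > best_gain + 1e-12:  # move only on a strict improvement
                    best, best_gain = c, gain
            total[best] += degree[u]
            if best != home:
                community[u] = best
                improved = moved_any = True
    return community, moved_any


def aggregate(adj, community):
    """Step (2): fold each community into one node, summing edge weights."""
    folded = defaultdict(lambda: defaultdict(float))
    for u, nbrs in adj.items():
        row = folded[community[u]]
        for v, w in nbrs.items():
            row[community[v]] += w
    return {c: dict(row) for c, row in folded.items()}


def louvain(adj):
    """Repeat steps (1) and (2) until no node changes community."""
    membership = {u: u for u in adj}
    while True:
        community, moved = local_moving(adj)
        if not moved:
            return membership
        membership = {u: community[c] for u, c in membership.items()}
        adj = aggregate(adj, community)


# Hypothetical graph: two triangles joined by a single bridge edge (c, d).
pairs = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"),
         ("d", "f"), ("c", "d")]
graph = defaultdict(dict)
for x, y in pairs:
    graph[x][y] = graph[y][x] = 1.0
result = louvain(dict(graph))
```

In the folded graph, the weight inside a community is kept as a self-loop entry so that node degrees are preserved between levels; the edge weights $W(u, v)$ from step 103 would take the place of the unit weights used here.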
[0090] Example 2
[0091] This embodiment discloses a system for implementing the above-described social network community discovery method. As shown in Figure 3, it includes a relationship building module 1, a behavioral feature extraction module 2, a weight fusion module 3, and a community discovery module 4.
[0092] The relationship building module 1 is used to obtain the friend relationships between users and quantify the friend relationships;
[0093] The behavior feature extraction module 2 is used to extract behavior features based on the interaction behavior between users in the social network, and to obtain the behavior weights between users based on the behavior features;
[0094] The weight fusion module 3 is used to fuse the quantitative value of friend relationships and behavioral weights to obtain the edge weights between users;
[0095] Community discovery module 4 is used to obtain the modularity of the community based on the edge weights between the users; and to divide the social network communities based on the maximum modularity algorithm.
[0096] The system also includes a data acquisition module 5, which is used to acquire friend relationships and interaction behaviors in the social network, namely, social network user friend relationships and social network user behavior data. Among them, social network user friend relationships refer to the long-term connections between users formed by actions such as following and adding friends, which are called basic connections. User behavior data includes browsing, replying, forwarding, mentioning (@), liking, disliking, searching, collecting, clicking on shared hyperlinks, and referencing, etc. These operations form a temporary connection between users, which we call a behavioral connection. Each user action has a corresponding timestamp, which is used to extract user behavioral features. This application quantifies and fuses behavioral features and friend relationships into edge weights, and uses the edge weights as parameters for calculating modularity, but the quantization and fusion methods are not limited to this.
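How timestamps might be turned into interaction intervals can be illustrated with a minimal Python sketch. It assumes that the intervals are the gaps between consecutive interactions of one user pair, which matches the statement that the number of intervals equals the number of interactions minus 1; the function name and the sample timestamps are hypothetical.

```python
from datetime import datetime


def interaction_intervals(timestamps):
    """Return the gaps, in seconds, between consecutive interactions.

    timestamps -- datetime objects for the interactions of one user pair
    Assumption: intervals are differences between successive timestamps,
    so n interactions yield n - 1 intervals.
    """
    ordered = sorted(timestamps)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical interaction log for one pair of users.
log = [
    datetime(2022, 7, 1, 12, 0, 0),
    datetime(2022, 7, 1, 12, 0, 30),
    datetime(2022, 7, 1, 12, 10, 0),
]
gaps = interaction_intervals(log)
```

The resulting list of intervals is the kind of input from which the behavior probability in step 301 is computed.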
[0097] The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. A method for discovering social network communities based on interactive behavior, characterized in that the method includes: extracting behavioral features based on the interactive behavior between users in the social network, wherein the interactive behavior includes any of the following actions or a combination thereof: browsing, replying, forwarding, mentioning, liking, disliking, searching, saving, clicking on shared hyperlinks, and quoting; obtaining the behavioral weights between users based on the behavioral features; obtaining the edge weights between users by combining the quantified value of the friend relationship between users with the behavioral weights, wherein the friend relationship includes adding friends or following, and the edge weight between users is expressed as: $W(u, v) = (1-\alpha) W_{base}(u, v) + \alpha W_{behavior}(u, v)$ (9), where $W(u, v)$ is the edge weight between the first user u and the second user v, $W_{base}(u, v)$ is the quantified value of the friend relationship between the first user u and the second user v, i.e., the quantified value of the edge (u, v), α is a hyperparameter, and $W_{behavior}(u, v)$ is the behavioral weight; obtaining the modularity of the community based on the edge weights between the users, wherein the modularity Q is expressed by formulas (10) and (11) together with $k_u = \sum_v W(u, v)$ (12), where $c_u = c_v$ indicates that the first user u and the second user v are in the same community, the probability term represents the probability that the edge (u, v) exists in a random community network, m represents the number of edges in social network A, and $k_u$ represents the sum of the weights of the edges connected to the first user u; and dividing the social network communities based on the maximum modularity algorithm, wherein the maximum modularity algorithm includes: trying to add a node or a first community to one or more second communities of neighboring nodes; determining whether the modularity benefit of the second community increases or exceeds a first threshold; if so, merging the node or the first community into the second community with the highest modularity benefit; if not, maintaining the original community structure.
2. The social network community discovery method according to claim 1, characterized in that, Methods for obtaining behavioral weights between two users include: Based on the number of interactions and the interval between interactions between the first user and the second user, the probability of interaction behavior between the first user and the second user is obtained; Based on the number of interactions between the first user and the second user, and the number of interactions between the first user and the third user, the interaction divergence of the first user relative to the second user is obtained; Based on the probability of the interaction behavior and the interaction divergence, the interaction index is obtained; Behavioral weights are obtained using logistic regression and the interaction index.
3. The social network community discovery method according to claim 2, characterized in that the probability of the interaction behavior $M_n$ is expressed as: $M_n = CDF_n - CDF_{n-1}$ (41) and $CDF_n = \dfrac{|\{\Delta t_k \mid \Delta t_k \le T_n\}|}{|\Delta t|}$ (4), where $CDF_n$ is the cumulative distribution function at the threshold value $T_n$, $|\Delta t|$ is the number of user interaction intervals, $|\{\Delta t_k \mid \Delta t_k \le T_n\}|$ is the number of interaction intervals that do not exceed the threshold, and k is the sequence number of the interaction interval; the interaction divergence $DI_{uv}$ is expressed by formula (5), where $|N_u|$ is the number of interactions between the first user u and third users within a specific time period, and $|Seq_{uv}|$ is the number of interactions between the first user u and the second user v; the interaction index $BI_{uv}$ is expressed by formula (6), where $\beta_n$ is the coefficient of the behavior probability; the behavioral weight is expressed as: $W_{behavior}(u, v) = W_{behavior}\langle u,v\rangle + W_{behavior}\langle v,u\rangle$ (8) together with formula (7), where x is the interaction index $BI_{uv}$, $\theta^T$ is the transpose of θ, the parameter θ is learned using the maximum likelihood method of logistic regression, $W_{base}\langle u,v\rangle = 1$ indicates that the first user u and the second user v are friends in the social network, $W_{base}\langle u,v\rangle = 0$ indicates that there is no friendship between them, and $W_{behavior}\langle u,v\rangle$ represents the behavioral weight in the direction from the first user to the second user.
4. The social network community discovery method according to claim 3, characterized in that the quantified value of the friendship relationship between users is expressed by formula (3), where $W_{base}(u, v)$ is the quantified value of the friendship between the first user u and the second user v, a is a constant, E represents the set of friend relationships in the social network, and $u \to v \in E$ indicates that the first user u and the second user v are friends and the edge between them belongs to the edge set E.
5. A system for implementing the social network community discovery method as described in any one of claims 1-4, characterized in that, It includes a relationship building module, a behavioral feature extraction module, a weight fusion module, and a community discovery module. The relationship building module is used to obtain the friend relationships between users and quantify the friend relationships; The behavior feature extraction module is used to extract behavior features based on the interaction behavior between users in the social network, and to obtain the behavior weights between users based on the behavior features. The weight fusion module is used to fuse the quantitative value of the friendship relationship and the behavioral weight to obtain the edge weight between users; The community discovery module is used to obtain the modularity of the community based on the edge weights between the users; And based on the maximum modularity algorithm, social network communities are segmented.
6. The system according to claim 5, characterized in that, It also includes a data acquisition module, which is used to acquire friend relationships and interaction behaviors in social networks.