A flexi-clique mining method and a mining system based on power threshold adaptive growth
By optimizing initial seed selection and growth control through a multi-seed hybrid strategy and a power-threshold adaptive growth algorithm, the invention addresses the low efficiency of cohesive subgraph discovery in existing techniques and achieves efficient cohesive subgraph discovery in multi-scale networks, improving search efficiency and solution quality.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- DONGHUA UNIV
- Filing Date
- 2026-04-13
- Publication Date
- 2026-06-19
AI Technical Summary
Existing methods for discovering cohesive subgraphs struggle to satisfy structural compactness and computational scalability at the same time in large-scale networks. The existing Flexi-clique model suffers from low search efficiency and rigid repair strategies, making it unsuitable for multi-scale network structures.
A multi-seed hybrid strategy is used to select the initial seeds, combined with a power-threshold adaptive growth algorithm that includes strict and soft growth stages. Subgraphs are optimized through mini-batch deletion and node backfilling to achieve adaptive cohesive subgraph discovery.
The method significantly improves search efficiency and solution quality, efficiently discovers cohesive subgraphs in multi-scale networks, reduces memory usage and runtime, and is highly adaptable and stable.
Figure CN122240887A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of graph data mining technology. It concerns a method for discovering cohesive subgraphs in complex networks, and in particular a Flexi-clique mining method and system based on power-threshold adaptive growth. It can be applied to scenarios that require dense subgraph discovery on large-scale graph data, such as social network analysis, bioinformatics, and recommendation systems.
Background Technology
[0002] With the rapid development of the internet and data acquisition technologies, a large amount of complex graph-structured data has been generated in scenarios such as social networks, knowledge graphs, and communication and transaction networks. How to automatically identify densely structured and frequently interacting groups of nodes in large-scale networks is one of the fundamental problems in tasks such as community discovery, anomaly detection, relationship mining, and recommendation. To address this need, cohesive subgraph discovery has formed a relatively systematic research framework. However, challenges remain regarding efficiency and stability when dealing with ultra-large-scale graphs and scenarios requiring a balance between "structural density" and "computational scalability." Existing cohesive subgraph discovery techniques can generally be categorized into the following three types:
[0003] The traditional clique model requires that any two nodes in a subgraph be directly connected, guaranteeing the highest cohesion through complete connectivity. When discovering a cohesive subgraph, it is only necessary to find the largest fully connected subgraph in the original graph. However, this method relies on strict connectivity requirements, making it difficult to find sizable instances in real-world large-scale sparse networks and limiting its support for complex graph structures.
[0004] The k-plex-based relaxation model allows each node to have at most k missing connections, and thus discovers larger subgraphs by relaxing the connectivity constraint. In cohesive subgraph discovery, the allowed number of missing connections k is set to balance subgraph size against cohesion. However, the degree constraint of this model grows linearly with the subgraph size, so it is too lenient for small subgraphs and too strict for large ones, making it difficult to adapt flexibly to multi-scale network structures.
[0005] Models based on other relaxation concepts, including s-clique, s-club, and quasi-clique, relax the clique requirement in different ways, such as distance constraints, diameter restrictions, and density thresholds. When discovering cohesive subgraphs, an appropriate degree of relaxation is selected according to the specific application requirements. However, these models often suffer from problems such as parameter sensitivity, high computational complexity, or difficulty in capturing specific structures.
[0006] However, most existing cohesive subgraph discovery methods do not adequately account for the flexibility that the constraints require, and so fail to meet discovery needs in multi-scale network structures. For example, in social networks it is necessary to discover small close-knit groups and large community structures at the same time; in bioinformatics, when analyzing protein-protein interaction networks, such flexibility helps identify functional modules at different scales, effectively supporting complex network analysis. Although the Flexi-clique model has been proposed and addresses constraint flexibility to some extent by introducing a sublinear constraint, its solution algorithm, NPA, relies heavily on a traditional peeling strategy. This results in low efficiency when the network is large or structurally complex, and it suffers from limitations such as local-optimum traps and rigid repair strategies. Therefore, an efficient cohesive subgraph discovery method that satisfies the sublinear constraint is of great significance.
Summary of the Invention
[0007] The purpose of this invention is to provide a Flexi-clique mining method and system based on a power threshold growth algorithm that satisfies sublinear constraints, in order to solve the problems of weak search guidance, rigid repair strategies, and difficulty in adapting to multi-scale network structures in the prior art.
[0008] A Flexi-clique mining method based on power-threshold adaptive growth includes the following steps:
[0009] S1: Obtain graph data and generate an initial seed set;
[0010] S2: Perform parallel seed exploration and adaptive growth;
[0011] S3: Perform refined repair and backfill optimization on the growth results;
[0012] S4: Merge all seed exploration results and output the optimal Flexi-clique.
[0013] Preferably, the specific process of step S1 of the present invention is as follows: first, the system loads the input graph data, parses the node set V and edge set E, and constructs an in-memory representation of the graph structure; then, it calculates the basic topological features of each node, including the degree deg(v) and the core number core(v); it selects the initial seeds using a multi-seed hybrid strategy: it configures the seed quantity parameter topk and sets the degree weight α and the core number weight β, with α+β=1; it calculates the comprehensive score score(v) = α×normalized(deg(v)) + β×normalized(core(v)) for each node v, where normalized() is the normalization function; after sorting by score in descending order, it selects the top topk nodes as the initial seed set.
[0014] Preferably, the specific process of step S2 of the present invention is as follows: each seed independently starts a power-threshold growth algorithm; during initialization, the current subgraph H is set to the seed node and the power-threshold parameter τ, τ∈(0,1), is set; the growth process adopts an adaptive two-stage mechanism: the strict growth stage requires the number of connections between a new node v and H to be ≥ θ(|H|+1); if there is no growth for N consecutive rounds, the system switches to the soft growth stage and relaxes the constraint to a connection number ≥ θ(|H|+1) - Δ, where Δ is an adaptive relaxation factor; growth iterations continue until no new node satisfies the condition; during this process, the subgraph size |H| is monitored in real time and the threshold θ(|H|) = ⌊|H|^τ⌋ is dynamically recalculated.
[0015] Preferably, the specific process of step S3 of the present invention is as follows: First, calculate the defect degree of each node in the current H: defect(v) = max(0, θ - deg_H(v)); adopt a mini-batch deletion strategy, and remove the K nodes with the highest defect degree each time; after deletion, update H and |H|, and recalculate θ; then, check the set of deleted nodes, calculate the number of connections between each node u and the current H, and if the number of connections is ≥ θ(|H|+1), then add it back to H; after repair, re-verify whether H satisfies the constraints, and if it does, growth can be triggered again.
[0016] Preferably, the specific process of step S4 of the present invention is as follows: collect all candidate Flexi-clique subgraphs obtained from seed exploration; the merging strategy is based on multi-index sorting: prioritize the solution with the largest subgraph size |H|; if the sizes are the same, compare the average degree of nodes or edge density within the subgraph; the final output subgraph needs to be verified for connectivity and reviewed for degree constraints; the output includes the optimal subgraph node set, size and related statistical information.
[0017] A mining system implementing the mining method of the present invention includes the following modules:
[0018] Data preprocessing module: responsible for selecting high-quality initial seed nodes from the original graph data, using a multi-seed hybrid strategy that combines topological features such as node degree and core number to generate an initial seed set;
[0019] Growth control module: based on the power-threshold growth algorithm, it implements an adaptive two-stage growth mechanism with two modes, strict growth and soft growth, and dynamically adjusts the constraint that a node must satisfy to be added;
[0020] Repair module: Optimizes the subgraph generated during the growth process through a small-batch deletion strategy and node backfilling mechanism, thereby improving the quality and stability of the solution;
[0021] Results aggregation module: Collects all candidate subgraphs obtained from seed exploration and finally outputs a subgraph including the optimal subgraph node set and size statistics.
[0022] The technical solution of this invention has the following advantages compared with the prior art:
[0023] 1) Multi-seed guidance enhances global exploration capability
[0024] By starting searches from different regions of the network through several seed-selection methods such as "degree / core number / hybrid", dependence on a single initial subgraph is reduced and the probability of getting trapped in local optima is lowered. On multiple real datasets and under different τ settings, the overall running time is significantly shorter than that of NPA: usually several to tens of times faster, and in some scenarios hundreds of times faster, depending on the dataset and τ.
[0025] 2) Adaptive growth mechanism enhances multi-scale adaptability and stability
[0026] A two-stage growth strategy of "strict growth + soft growth switching" is adopted: when growth is smooth, strict constraints are maintained to ensure the feasibility and quality of the solution; when stagnation occurs, the constraints are relaxed appropriately to overcome local bottlenecks, so as to better adapt to different τ intervals and network structures with mixed sizes; experiments show that under the premise that the solution quality is basically consistent with NPA, slightly larger solutions can be obtained in some scenarios.
[0027] 3) Refined repair and backfilling reduce the risk of excessive deletion and optimize resource usage.
[0028] A "small-batch deletion + backfill" repair strategy is introduced to avoid the structural damage and drastic fluctuations in solution size caused by a one-time full deletion. At the same time, through fine-grained control of candidate management and the repair process, the memory increment can be significantly reduced on most datasets, and the stability and executability of large-graph experiments are improved.
Attached Figure Description
[0029] Figure 1 is a schematic diagram of the overall algorithm flow of the present invention.
[0030] Figure 2 shows the graph data of one embodiment of the present invention.
Detailed Implementation
[0031] To make the objectives, technical solutions, and advantages of this invention clearer, the following detailed description of the invention is provided in conjunction with specific embodiments.
[0032] This invention provides a Flexi-clique mining method based on power-threshold adaptive growth, the method comprising:
[0033] 1. A multi-seed hybrid guidance strategy is employed to enhance global exploration capability;
[0034] 2. An adaptive two-stage growth mechanism is designed to address the sublinear constraint characteristics of Flexi-clique;
[0035] 3. A refined repair and backfill strategy is adopted to address the dynamic changes in the subgraph structure.
[0036] This invention designs an efficient power-threshold growth algorithm (PTG) to support flexible cohesive subgraph discovery. By optimizing seed selection, growth control, and repair mechanisms, it significantly improves search efficiency and solution quality. This invention can meet the needs of multi-scale subgraph mining in complex dynamic networks, such as community detection in social networks and functional module identification in bioinformatics. Compared with existing methods, this invention has significant advantages in time efficiency and memory usage, and is highly applicable and stable.
[0037] Here, a Flexi-clique is a connected subgraph H that satisfies a sublinear degree constraint: the degree of each node within H must be at least θ(|H|) = ⌊|H|^τ⌋, where |H| is the number of nodes in H and τ is a user-defined parameter. The degree threshold is thus adjusted by a power function, so that the constraint strength grows sublinearly as the subgraph size increases. A multi-scale network structure is a network containing dense regions of varying sizes, which requires the algorithm to capture both the strict connectivity of small subgraphs and the looser structure of large subgraphs.
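The sublinear behaviour of the threshold can be illustrated with a short sketch. It assumes the threshold is θ(n) = ⌊n^τ⌋, which agrees with the values θ(1)=1, θ(2)=1, θ(4)=2 and θ(5)=3 used for τ=0.7 in the worked example of this description. For contrast it also computes a k-plex style requirement of n − k neighbours, a common convention that is an assumption here; the function names are illustrative and do not come from the patent text.

```python
import math


def theta(size: int, tau: float) -> int:
    """Assumed power threshold: floor(size ** tau)."""
    return math.floor(size ** tau)


def kplex_threshold(size: int, k: int) -> int:
    """Assumed k-plex requirement: at least size - k neighbours inside the subgraph."""
    return max(0, size - k)


def satisfies_degree_constraint(adj, nodes, tau):
    """Check that every node has at least theta(|nodes|) neighbours inside `nodes`.

    `adj` maps each node to the set of its neighbours.
    Connectivity is not checked here.
    """
    nodes = set(nodes)
    need = theta(len(nodes), tau)
    return all(len(adj[v] & nodes) >= need for v in nodes)
```

For τ=0.7, a subgraph of 100 nodes needs a minimum internal degree of 25 under the power threshold, whereas a 2-plex of the same size would need 98, which shows how much more slowly the power threshold tightens as the subgraph grows.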
[0038] The multi-seed hybrid guidance strategy refers to a method for dynamically selecting initial seeds based on topological features such as node degree and core number. Specifically, it includes a degree-first strategy (prioritizing nodes with high degree), a core-first strategy (prioritizing nodes with high core number), and a hybrid strategy (balancing degree and core number). The seed selection process is optimized through an adaptive weight adjustment mechanism to keep the search from getting trapped in local optima.
[0039] The adaptive two-stage growth mechanism refers to a dynamic adjustment method that combines strict growth and soft growth. The strict growth stage requires a newly added node to fully satisfy the current degree constraint; when growth stagnates, the mechanism automatically switches to the soft growth stage, which allows a moderate relaxation of the constraint (such as lowering the degree threshold to θ(|H|+1) - Δ), where the relaxation factor Δ is calculated dynamically based on the number of consecutive failures:
[0040]
[0041] The refined repair and backfilling mechanism refers to a method that optimizes subgraph quality through mini-batch deletion and node reintroduction. Mini-batch deletion removes only the K nodes with the highest defect degrees at a time, which reduces damage to the subgraph structure; the backfill mechanism sets a budget B, monitors deleted nodes, and re-adds them to the subgraph when their connection conditions improve, thereby improving the stability of the solution.
[0042] A mining system based on the power-threshold adaptive growth Flexi-clique mining method includes the following modules:
[0043] Data preprocessing module: responsible for selecting high-quality initial seed nodes from the original graph data, using a multi-seed hybrid strategy that combines topological features such as node degree and core number to generate an initial seed set;
[0044] Growth control module: based on the power-threshold growth algorithm, it implements an adaptive two-stage growth mechanism with two modes, strict growth and soft growth, and dynamically adjusts the constraint that a node must satisfy to be added;
[0045] Repair module: Optimizes the subgraph generated during the growth process through a small-batch deletion strategy and node backfilling mechanism, thereby improving the quality and stability of the solution;
[0046] Results aggregation module: Collects all candidate subgraphs obtained from seed exploration and finally outputs a subgraph including the optimal subgraph node set and size statistics.
[0047] As shown in Figure 1, a Flexi-clique mining method based on power-threshold adaptive growth includes the following steps:
[0048] S1: Obtain graph data and generate an initial seed set.
[0049] The specific process is as follows: This step is executed by the data preprocessing module. First, the system loads the input graph data, parses the node set V and edge set E, and constructs an in-memory representation of the graph structure. Next, it calculates the basic topological features of each node, including the degree deg(v) (the number of neighbors) and the core number core(v) (obtained through the k-core decomposition algorithm). Based on these features, a multi-seed hybrid strategy is used to select the initial seeds: configure the seed quantity parameter topk, set the degree weight α and core number weight β (usually α+β=1), and calculate the comprehensive score score(v) = α×normalized(deg(v)) + β×normalized(core(v)) for each node v, where normalized() is the normalization function. After sorting by score in descending order, the top topk nodes are selected as the initial seed set. To ensure diversity, an alternating selection strategy is used to balance the proportion of high-degree and high-core nodes. Taking the 10-node graph given in Figure 2 as an example, the system loads a node set V={a,b,...,j} and 15 edges. The node degrees are calculated as follows: deg(f)=4, deg(h)=4, deg(a)=3, deg(g)=3, deg(i)=3, etc.; the core numbers obtained through k-core decomposition are: core(f)=3, core(h)=3, core(g)=3, etc. With α=0.5, β=0.5, and topk=5, after calculating the comprehensive score, the seed sequence S=[f, h, a, g, i] is selected. This seed set covers different dense regions in the graph, laying the foundation for parallel exploration.
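The scoring in step S1 can be sketched as follows. This is a minimal illustration and not the patented implementation: the text does not specify normalized(), so division by the maximum value is assumed; ties are broken by node name; the alternating selection used for diversity is omitted; and the function names and the toy graph are hypothetical (the toy graph is not the graph of Figure 2).

```python
def core_numbers(adj):
    """Core number of every node, by repeatedly removing a minimum-degree node."""
    deg = {v: len(ns) for v, ns in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=lambda x: (deg[x], x))
        k = max(k, deg[v])          # the core level never decreases
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                deg[u] -= 1
    return core


def select_seeds(adj, topk, alpha=0.5, beta=0.5):
    """Rank nodes by alpha * norm(deg) + beta * norm(core) and keep the top `topk`."""
    if not adj:
        return []
    deg = {v: len(ns) for v, ns in adj.items()}
    core = core_numbers(adj)
    max_deg = max(deg.values()) or 1    # assumed normalization: divide by the maximum
    max_core = max(core.values()) or 1
    score = {v: alpha * deg[v] / max_deg + beta * core[v] / max_core for v in adj}
    return sorted(adj, key=lambda v: (-score[v], v))[:topk]


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
seeds = select_seeds(toy, topk=2)
```

On this toy graph node c has both the highest degree and the highest core number, so it is ranked first; a and b tie on score and the name-based tie-break puts a ahead of b.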
[0050] S2: Perform parallel seed exploration and adaptive growth.
[0051] The specific process is as follows: This step is executed by the growth control module. Each seed independently starts a PTG instance. During initialization, the current subgraph H is set to the seed node and the power-threshold parameter τ is set (usually τ∈(0,1)). The growth process adopts an adaptive two-stage mechanism: the strict growth stage requires the number of connections between a new node v and H to be ≥ θ(|H|+1). If there is no growth for N consecutive rounds (e.g., N=3), the process switches to the soft growth stage and relaxes the constraint to a connection number ≥ θ(|H|+1) - Δ, where Δ is an adaptive relaxation factor. Growth iterations continue until no new node satisfies the condition. During this process, the subgraph size |H| is monitored in real time and the threshold θ(|H|) = ⌊|H|^τ⌋ is dynamically recalculated. Taking seed f as an example, initially H={f}, τ=0.7, |H|=1, θ(1)=1. In the strict growth stage, the neighbors e, g, h, i of f are checked; each has a connection number of 1 with H, and 1 ≥ θ(2)=1. All of them are added, so H becomes {e,f,g,h,i} and |H|=5. Since growth succeeded, soft growth is not triggered. Verifying the node degrees shows that the degrees of e, g, and i within H (1, 2, and 2) are below θ(5)=3 and do not meet the constraint, so the process moves on to the repair stage.
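A minimal sketch of the two-stage growth loop is given below, under stated assumptions. The threshold is taken to be θ(n) = ⌊n^τ⌋, consistent with the values in the example above. The text does not give the formula for Δ, so a fixed `delta` parameter stands in for the adaptive relaxation factor. All nodes that qualify in a round are added together, as in the example with seed f. The names `grow`, `patience` (playing the role of N) and the toy graph are hypothetical.

```python
import math


def theta(size: int, tau: float) -> int:
    """Assumed power threshold: floor(size ** tau)."""
    return math.floor(size ** tau)


def grow(adj, seed, tau, patience=3, delta=1):
    """Grow a subgraph from `seed`; `adj` maps each node to its neighbour set."""
    H = {seed}
    stalled = 0
    while True:
        need = theta(len(H) + 1, tau)        # strict requirement
        soft = stalled >= patience
        if soft:
            need = max(1, need - delta)      # relaxed requirement (delta is assumed fixed)
        frontier = {u for v in H for u in adj[v]} - H
        added = {u for u in frontier if len(adj[u] & H) >= need}
        if added:
            H |= added
            stalled = 0                      # back to strict growth
        elif soft:
            break                            # nothing qualifies even when relaxed
        else:
            stalled += 1
    return H


# Hypothetical toy graph: a 4-clique a,b,c,d with a pendant node e attached to d.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "e"},
    "e": {"d"},
}
strict_result = grow(toy, "a", tau=0.7, delta=1)
relaxed_result = grow(toy, "a", tau=0.7, delta=2)
```

With `delta=1` the pendant node e never qualifies and growth stops at the 4-clique. With `delta=2` the soft stage admits e even though its degree inside the subgraph is below θ(5)=3, which is the kind of violation the subsequent repair step is meant to resolve.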
[0052] S3: Perform refined repair and backfill optimization on the growth results.
[0053] The specific process is as follows: The repair module first calculates the defect degree of each node in the current H: defect(v) = max(0, θ - deg_H(v)). A mini-batch deletion strategy is adopted, removing the K nodes with the highest defect degrees each time (K is configurable, e.g., K=1). After deletion, H and |H| are updated, and θ is recalculated. Subsequently, the set of deleted nodes is checked, and the number of connections between each deleted node u and the current H is calculated. If the number of connections is ≥ θ(|H|+1), the node is added back to H (backfilling). After repair, H is re-verified against the constraint; if it is satisfied, growth can be triggered again. For example, with H={e,f,g,h,i} and θ(5)=3, the defect degrees are defect(e)=2, defect(g)=1, defect(i)=1. The node with the highest defect degree, e, is deleted, and H is updated to {f,g,h,i} with |H|=4. All remaining nodes have a degree ≥ θ(4)=2, which satisfies the constraint. The deleted node e is then checked: its connection number with H is 1 < θ(5)=3, so it is not backfilled. Growth is then resumed, and the boundary node j is found to have a connection number of 3 with H, and 3 ≥ θ(5)=3. Node j is added, giving H={f,g,h,i,j}, |H|=5, θ(5)=3, and verification succeeds.
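The repair step can be sketched as below. This is an illustrative reading of the description, not the patented implementation. It assumes θ(n) = ⌊n^τ⌋. The full edge list of Figure 2 is not given in the text, so the six-node fragment used here is hypothetical, with edges chosen so that the degrees match the numbers in the example. The optional `budget` argument corresponds to the backfill budget B mentioned in the description; how B is actually applied is an assumption.

```python
import math


def theta(size: int, tau: float) -> int:
    """Assumed power threshold: floor(size ** tau)."""
    return math.floor(size ** tau)


def repair(adj, nodes, tau, batch=1, budget=None):
    """Mini-batch deletion followed by backfilling.

    Returns (H, deleted, feasible): the repaired node set, the nodes deleted
    along the way in order, and whether H meets the degree constraint.
    """
    H = set(nodes)
    deleted = []
    while H:
        need = theta(len(H), tau)            # recalculated after every batch
        defect = {v: max(0, need - len(adj[v] & H)) for v in H}
        worst = sorted((v for v in H if defect[v] > 0),
                       key=lambda v: (-defect[v], v))[:batch]
        if not worst:
            break
        H.difference_update(worst)
        deleted.extend(worst)
    refilled = 0
    for u in deleted:                        # backfill pass over deleted nodes
        if budget is not None and refilled >= budget:
            break
        if len(adj[u] & H) >= theta(len(H) + 1, tau):
            H.add(u)
            refilled += 1
    feasible = bool(H) and all(len(adj[v] & H) >= theta(len(H), tau) for v in H)
    return H, deleted, feasible


# Hypothetical fragment consistent with the degrees quoted in the example.
frag = {
    "e": {"f"},
    "f": {"e", "g", "h", "i"},
    "g": {"f", "h", "j"},
    "h": {"f", "g", "i", "j"},
    "i": {"f", "h", "j"},
    "j": {"g", "h", "i"},
}
H, deleted, feasible = repair(frag, {"e", "f", "g", "h", "i"}, tau=0.7, batch=1)
```

On this fragment the call deletes e, leaves H = {f, g, h, i}, and does not backfill e, matching the steps in the example. Node j then has three neighbours in H, which meets θ(5)=3, so a resumed growth round would add it.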
[0054] S4: Merge all seed exploration results and output the optimal Flexi-clique.
[0055] The specific process is as follows: The results aggregation module collects all candidate Flexi-clique subgraphs obtained from seed exploration. The merging strategy is based on multi-metric sorting: priority is given to the solution with the largest subgraph size |H|; if the sizes are the same, the average node degree or the edge density within the subgraph is compared. The final output subgraph must pass connectivity verification (checked by BFS / DFS) and a degree constraint review (the degree of each node within the subgraph must be at least θ(|H|)). The output includes the optimal subgraph node set, its size, and related statistics. For example, after all seeds have been explored, seeds f and h both yield the subgraph {f,g,h,i,j} with a size of 5, while seeds a, g, and i each yield subgraphs with a size of 4. The merging module selects the largest subgraph {f,g,h,i,j} as the output. Verification shows that this subgraph is connected and that each node has a degree ≥ 3 (θ(5)=3), making it a valid Flexi-clique.
[0056] Through the above steps, this embodiment demonstrates the complete execution flow of the PTG algorithm on a specific network, reflecting the combined effect of multi-seed hybrid guidance, adaptive growth, and fine-grained repair, and providing an efficient and reliable solution for Flexi-clique mining.
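The verification and merging in step S4 can be sketched as follows, assuming θ(n) = ⌊n^τ⌋ and using average internal degree as the tie-break (the text allows either average degree or edge density). The function names are illustrative, and the six-node fragment is hypothetical: the full edge list of Figure 2 is not given, so the edges are chosen to agree with the degrees quoted in the example.

```python
import math
from collections import deque


def theta(size: int, tau: float) -> int:
    """Assumed power threshold: floor(size ** tau)."""
    return math.floor(size ** tau)


def is_connected(adj, nodes):
    """Breadth-first search restricted to `nodes`."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for u in adj[v] & nodes:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return seen == nodes


def is_flexi_clique(adj, nodes, tau):
    """Connected, and every node has at least theta(|nodes|) neighbours inside."""
    nodes = set(nodes)
    need = theta(len(nodes), tau)
    return is_connected(adj, nodes) and all(len(adj[v] & nodes) >= need for v in nodes)


def pick_best(adj, candidates, tau):
    """Largest valid candidate; ties broken by average internal degree."""
    valid = [set(c) for c in candidates if is_flexi_clique(adj, c, tau)]
    if not valid:
        return None

    def avg_degree(nodes):
        return sum(len(adj[v] & nodes) for v in nodes) / len(nodes)

    return max(valid, key=lambda nodes: (len(nodes), avg_degree(nodes)))


# Hypothetical fragment consistent with the degrees quoted in the example.
frag = {
    "e": {"f"},
    "f": {"e", "g", "h", "i"},
    "g": {"f", "h", "j"},
    "h": {"f", "g", "i", "j"},
    "i": {"f", "h", "j"},
    "j": {"g", "h", "i"},
}
candidates = [{"e", "f", "g", "h", "i"}, {"f", "g", "h", "i"}, {"f", "g", "h", "i", "j"}]
best = pick_best(frag, candidates, tau=0.7)
```

The unrepaired candidate {e,f,g,h,i} is rejected by the degree review, and of the two valid candidates the five-node set {f,g,h,i,j} is selected because it is larger.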
[0057] Obviously, the above description of the embodiments of the Power Threshold Growth (PTG) algorithm and its Flexi-clique mining method of the present invention is merely an example to clearly illustrate the specific implementation of the present invention, and is not intended to limit the scope of protection of the present invention. For those skilled in the art, based on the above embodiments, various changes and adjustments can still be derived for different graph structure data, parameter configurations (such as τ values, seed selection strategies, growth patterns, repair and backfilling mechanisms, etc.) or application scenarios, and these changes and adjustments all fall within the scope covered by the spirit and principles of the technical solution of the present invention.
Claims
1. A Flexi-clique mining method based on power-threshold adaptive growth, characterized in that it includes the following steps: S1: Obtain graph data and generate an initial seed set; S2: Perform parallel seed exploration and adaptive growth; S3: Perform refined repair and backfill optimization on the growth results; S4: Merge all seed exploration results and output the optimal Flexi-clique.
2. The mining method according to claim 1, characterized in that the specific process of step S1 is as follows: first, the system loads the input graph data, parses the node set V and edge set E, and constructs an in-memory representation of the graph structure; then, it calculates the basic topological features of each node, including the degree deg(v) and the core number core(v); it selects the initial seeds using a multi-seed hybrid strategy: it configures the seed quantity parameter topk and sets the degree weight α and the core number weight β, with α+β=1; it calculates the comprehensive score score(v) = α×normalized(deg(v)) + β×normalized(core(v)) for each node v, where normalized() is the normalization function; after sorting by score in descending order, it selects the top topk nodes as the initial seed set.
3. The mining method according to claim 2, characterized in that the specific process of step S2 is as follows: each seed independently starts a power-threshold growth algorithm; during initialization, the current subgraph H is set to the seed node and the power-threshold parameter τ, τ∈(0,1), is set; the growth process employs an adaptive two-stage mechanism: the strict growth stage requires the number of connections between a new node v and H to be ≥ θ(|H|+1); if there is no growth for N consecutive rounds, the process switches to the soft growth stage and relaxes the constraint to a connection number ≥ θ(|H|+1) - Δ, where Δ is an adaptive relaxation factor; growth iterations continue until no new node satisfies the condition; during the process, the subgraph size |H| is monitored in real time and the threshold θ(|H|) = ⌊|H|^τ⌋ is dynamically recalculated.
4. The mining method according to claim 3, characterized in that the specific process of step S3 is as follows: first, calculate the defect degree of each node in the current H: defect(v) = max(0, θ - deg_H(v)); adopt a mini-batch deletion strategy, and remove the K nodes with the highest defect degrees each time; after deletion, update H and |H|, and recalculate θ; then, check the set of deleted nodes, calculate the number of connections between each deleted node u and the current H, and if the number of connections is ≥ θ(|H|+1), add the node back to H; after repair, re-verify whether H meets the constraint, and if it does, growth can be triggered again.
5. The mining method according to claim 4, characterized in that the specific process of step S4 is as follows: collect all candidate Flexi-clique subgraphs obtained from seed exploration; the merging strategy is based on multi-metric sorting: prioritize the solution with the largest subgraph size |H|; if the sizes are the same, compare the average node degree or the edge density within the subgraph; the final output subgraph must be verified for connectivity and reviewed against the degree constraint; the output includes the optimal subgraph node set, its size, and related statistical information.
6. A mining system for implementing the mining method according to any one of claims 1-5, characterized in that it includes the following modules: Data preprocessing module: responsible for selecting high-quality initial seed nodes from the original graph data, using a multi-seed hybrid strategy that combines topological features such as node degree and core number to generate an initial seed set; Growth control module: based on the power-threshold growth algorithm, it implements an adaptive two-stage growth mechanism with two modes, strict growth and soft growth, and dynamically adjusts the constraint that a node must satisfy to be added; Repair module: optimizes the subgraphs generated during the growth process through a small-batch deletion strategy and a node backfilling mechanism; Results aggregation module: collects all candidate subgraphs obtained from seed exploration and finally outputs the optimal subgraph node set and size statistics.