A method and device for predicting online promoters

By acquiring and analyzing users' dissemination data, a hypergraph prediction model was constructed, solving the problem of identifying online promoters in the early stages of public opinion dissemination, achieving higher prediction accuracy and resource utilization, and meeting real-time requirements.

CN122243481A · Pending · Publication Date: 2026-06-19 · BEIJING UNIV OF POSTS & TELECOMM


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: BEIJING UNIV OF POSTS & TELECOMM
Filing Date: 2026-02-10
Publication Date: 2026-06-19

Patent Timeline

10 Feb 2026: Application
19 Jun 2026: Publication (CN122243481A)

IPC: G06Q10/44; G06F18/213

AI Tagging

Application Domain

Instruments


AI Technical Summary

Technical Problem

Existing technologies struggle to accurately identify potential online promoters in the early stages of public opinion dissemination, especially when users disguise their behavior, which makes early risk signals hard to capture. This results in inaccurate predictions, low resource utilization, and an inability to meet real-time requirements.

Method used

Users' propagation behavior data, propagation content data, and propagation structure data are acquired, and behavioral, content, and structural features are extracted from them. The propagation stage is determined from these features, and a hypergraph prediction model is constructed and used for risk prediction only during non-stationary stages. Combined with a compression-activation mechanism based on propagation deviation, this improves the accuracy and lead time of prediction.

Benefits of technology

It improves the accuracy and stability of predicting online promoters, reduces invalid calculations, improves resource utilization, meets real-time requirements, and can identify potential online promoters in advance.


Abstract

This application provides a method and apparatus for predicting online promoters, comprising: acquiring user propagation behavior data, propagation content data, and propagation structure data; extracting features from the propagation behavior data, propagation content data, and propagation structure data respectively to obtain behavioral features, content features, and structural features; determining the propagation stage based on the behavioral features, content features, and structural features; and when the propagation stage is a non-stationary stage, constructing a hypergraph prediction model and using the hypergraph prediction model based on the behavioral features, content features, and structural features to determine the user's risk prediction result. This application can improve the accuracy, stability, and lead time of predicting online promoters, and improve resource utilization.


Description

Technical Field

[0001] This application relates to the field of artificial intelligence technology, and in particular to a method and apparatus for predicting online promoters.

Background Technology

[0002] With the rapid development of social media and online information dissemination technologies, online public opinion analysis has become a key technology for cyberspace security governance. Its core lies in modeling and analyzing the relationship between user behavior and information diffusion during the dissemination of online public opinion. By characterizing users' structural positions, behavioral patterns, and evolutionary characteristics within the dissemination network, it identifies the key users who significantly influence the direction of public opinion, that is, the online promoters. The key to predicting online promoters lies in anticipating the risk that a user will evolve into an abnormal opinion leader before that user's influence is fully manifested. In the early stages of public opinion dissemination in particular, potential online promoters often disguise their behavior by engaging in routine interactions with a large number of ordinary users, so that signs of abnormal dissemination are diluted by the large volume of normal neighbor behavior and early risk signals are difficult to capture accurately.

Summary of the Invention

[0003] In view of this, the purpose of this application is to provide a method and apparatus for predicting online promoters.

[0004] To achieve the above objectives, embodiments of this application provide a method for predicting online promoters, including:

[0005] acquiring a user's propagation behavior data, propagation content data, and propagation structure data; extracting features from the propagation behavior data, propagation content data, and propagation structure data respectively to obtain behavioral features, content features, and structural features; determining the propagation stage based on the behavioral features, content features, and structural features; and, when the propagation stage is a non-stationary stage, constructing a hypergraph prediction model and using the hypergraph prediction model to determine the user's risk prediction result based on the behavioral features, content features, and structural features.

[0006] Optionally, determining the propagation stage based on the behavioral features, content features, and structural features includes: calculating a semantic change intensity based on the content features of adjacent prediction time windows; determining a propagation behavior fluctuation intensity based on the behavioral features and structural features of adjacent prediction time windows; calculating a significance metric for the current prediction time window based on the semantic change intensity and the propagation behavior fluctuation intensity; and determining the propagation stage by comparing the significance metric with a dynamic significance threshold constructed from the significance metric.

[0007] Optionally, determining the propagation behavior fluctuation intensity based on the behavioral features and structural features of adjacent prediction time windows includes: calculating propagation state statistics based on the behavioral features and structural features of adjacent prediction time windows; and calculating the propagation behavior fluctuation intensity based on the propagation state statistics of adjacent prediction time windows.

[0008] Optionally, determining the propagation stage by comparing the significance metric with the dynamic significance threshold includes: calculating the median absolute deviation (MAD) of the significance metric; determining the dynamic significance threshold based on the median absolute deviation and the median of the significance sequence, where the significance sequence consists of the significance metrics of multiple prediction time windows; if the significance metric is greater than or equal to the dynamic significance threshold, the propagation stage is a non-stationary stage; and if the significance metric is less than the dynamic significance threshold, the propagation stage is a stationary stage.

[0009] Optionally, constructing the hypergraph prediction model includes: constructing a hypergraph structure in which the users participating in propagation are the user nodes, and each set of user nodes that jointly participate in the same propagation event or propagation cascade forms a hyperedge.

[0010] Optionally, using the hypergraph prediction model to determine the user's risk prediction result based on the behavioral features, content features, and structural features includes: taking the behavioral features, content features, and structural features as the initial state of the user nodes and iteratively passing messages on the hypergraph structure; during message passing, calculating the propagation deviation of each hyperedge based on its cross-community propagation ratio, propagation scale parameter, and diffusion depth parameter; determining the hyperedge activation gating coefficient of each hyperedge based on its propagation deviation and a dynamic deviation threshold constructed from the propagation deviations; and, based on the hyperedge activation gating coefficient of each hyperedge, determining whether the information of that hyperedge is compressed or activated during message passing.

[0011] Optionally, determining the hyperedge activation gating coefficient based on the propagation deviation of each hyperedge and the dynamic deviation threshold includes: calculating the median absolute deviation (MAD) of the propagation deviations of the hyperedges; determining the dynamic deviation threshold based on the median absolute deviation and the median of the propagation deviation set, where the propagation deviation set is the set of propagation deviations of all hyperedges within the prediction time window; and calculating the hyperedge activation gating coefficient of each hyperedge based on its propagation deviation and the dynamic deviation threshold.
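The gating step in paragraphs [0010] and [0011] can be illustrated with a minimal Python sketch. The source text does not state how the gating coefficient is computed from the deviation and the threshold, so the sigmoid form below, the threshold form `median + sensitivity * MAD`, and the names `deviation_threshold`, `gating_coefficients`, and `sensitivity` are all assumptions made for illustration only.

```python
import math
from statistics import median


def deviation_threshold(deviations, sensitivity=1.0):
    """Dynamic deviation threshold: median + sensitivity * MAD (assumed form)."""
    m = median(deviations)
    mad = median([abs(x - m) for x in deviations])
    return m + sensitivity * mad


def gating_coefficients(deviations, sensitivity=1.0):
    """One coefficient in (0, 1) per hyperedge.

    Assumption: a sigmoid of (deviation - threshold). Hyperedges whose
    deviation is above the threshold get a coefficient above 0.5 (activated);
    the others fall below 0.5 (compressed).
    """
    theta = deviation_threshold(deviations, sensitivity)
    return [1.0 / (1.0 + math.exp(-(d - theta))) for d in deviations]


# Propagation deviations of five hyperedges in one prediction time window.
coeffs = gating_coefficients([0.1, 0.2, 0.3, 0.4, 5.0])
```

Because the threshold is built from the median and the MAD, a single extreme hyperedge (the last one here) barely moves the threshold, which is the usual reason for preferring these statistics over the mean and standard deviation.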

[0012] Optionally, the message passing process on the hypergraph structure includes a user-node-to-hyperedge aggregation stage and a hyperedge-to-user-node back-transmission stage. In the aggregation stage, the aggregation of user nodes into hyperedges is given by formula (14) [formula missing], in which a learnable transformation or nonlinear mapping is applied to the representation of each user node v at message passing layer k-1 to produce the aggregated message representation of hyperedge e at layer k, and V(e) is the set of user nodes associated with hyperedge e. In the back-transmission stage, compression or activation modulation is performed using the hyperedge activation gating coefficient according to formula (15) [formula missing], where E(v) is the set of hyperedges containing user node v, a_e is the hyperedge activation gating coefficient of hyperedge e, each hyperedge e also carries a weight, and the result is the intermediate message received by user node v during message passing at layer k. The user node is then updated according to formula (16) [formula missing], where W^(k) is the linear transformation parameter matrix of the user node update stage at layer k, applied together with a nonlinear activation function and a fusion function.
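One message passing layer of the kind described in paragraph [0012] might look like the following Python sketch. Formulas (14) to (16) are not legible in the source, so every concrete choice here is an assumption: the learnable mapping is taken as the identity, aggregation is a mean over member nodes, fusion is element-wise addition, and the activation is ReLU. The function name `message_passing_layer` and its argument layout are hypothetical.

```python
def message_passing_layer(h, hyperedges, gate, weight, W):
    """One layer of gated hypergraph message passing (illustrative only).

    h:          dict node -> list of floats (state at layer k-1)
    hyperedges: dict edge id -> list of member nodes, i.e. V(e)
    gate:       dict edge id -> activation gating coefficient a_e
    weight:     dict edge id -> hyperedge weight
    W:          square matrix (list of rows) for the update stage
    """
    dim = len(W)

    # Aggregation stage: mean of member node states per hyperedge (assumed).
    m_edge = {
        e: [sum(h[v][i] for v in members) / len(members) for i in range(dim)]
        for e, members in hyperedges.items()
    }

    # Back-transmission stage: each node sums the messages of its hyperedges,
    # scaled by gate * weight, so low-gate hyperedges are compressed.
    m_node = {v: [0.0] * dim for v in h}
    for e, members in hyperedges.items():
        scale = gate[e] * weight[e]
        for v in members:
            for i in range(dim):
                m_node[v][i] += scale * m_edge[e][i]

    # Update stage: ReLU(W @ (h + m)), with addition as the fusion (assumed).
    new_h = {}
    for v in h:
        fused = [h[v][i] + m_node[v][i] for i in range(dim)]
        new_h[v] = [
            max(0.0, sum(W[r][c] * fused[c] for c in range(dim)))
            for r in range(dim)
        ]
    return new_h


states = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
edges = {"e1": ["a", "b"], "e2": ["b", "c"]}
updated = message_passing_layer(
    states, edges,
    gate={"e1": 1.0, "e2": 0.0},      # e2 is fully compressed
    weight={"e1": 1.0, "e2": 1.0},
    W=[[1.0, 0.0], [0.0, 1.0]],
)
```

In this toy run, node `c` belongs only to the compressed hyperedge `e2`, so its state is unchanged, while `a` and `b` both absorb the message of `e1`.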

[0013] Optionally, the comprehensive objective loss function of the hypergraph prediction model is: [formula missing]

[0014] where the formula involves a weighting factor; c_v, the prior risk weight of the information manipulation index of user node v; y_v, the true risk label of user node v; the risk prediction result output by the model; s_vj, the similarity between user nodes v and j; a temperature coefficient; v+, a positive sample; and a set of negative samples.
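The symbols listed in paragraph [0014] suggest a supervised term weighted by the prior risk weight plus a contrastive term with a temperature. Since the formula itself is missing from the source, the sketch below is only one plausible reading: a weighted binary cross-entropy added to an InfoNCE-style contrastive term. The function names and the parameters `lam` and `temperature` are hypothetical.

```python
import math


def weighted_bce(y_true, y_pred, prior_weight, eps=1e-12):
    """Binary cross-entropy where each node's term is scaled by c_v."""
    total = 0.0
    for y, p, c in zip(y_true, y_pred, prior_weight):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -c * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def contrastive_term(sim_pos, sim_negs, temperature=0.5):
    """InfoNCE-style term for one anchor: positive vs. a set of negatives."""
    pos = math.exp(sim_pos / temperature)
    neg = sum(math.exp(s / temperature) for s in sim_negs)
    return -math.log(pos / (pos + neg))


def total_loss(y_true, y_pred, prior_weight, sim_pos, sim_negs,
               lam=0.5, temperature=0.5):
    """Supervised term plus lam times the contrastive term (assumed form)."""
    return (weighted_bce(y_true, y_pred, prior_weight)
            + lam * contrastive_term(sim_pos, sim_negs, temperature))
```

With this reading, a larger prior risk weight makes misclassifying that node more costly, and the contrastive term pulls a node toward its positive sample and away from its negatives.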

[0015] This application also provides a device for predicting online promoters, including: an acquisition module, used to acquire a user's propagation behavior data, propagation content data, and propagation structure data; a feature extraction module, used to extract features from the propagation behavior data, propagation content data, and propagation structure data respectively to obtain behavioral features, content features, and structural features; a triggering module, used to determine the propagation stage based on the behavioral features and content features; and a prediction module, used to construct a hypergraph prediction model when the propagation stage is a non-stationary stage and to use the hypergraph prediction model to determine the user's risk prediction result based on the behavioral features, content features, and structural features.

[0016] As can be seen from the above, the method and apparatus for predicting online promoters provided in this application acquire a user's propagation behavior data, propagation content data, and propagation structure data, and extract features from these data to obtain behavioral features, content features, and structural features. Based on these features, the propagation stage is determined. When the propagation stage is non-stationary, a hypergraph prediction model is constructed and used to determine the user's risk prediction result from the behavioral, content, and structural features. This application can improve the accuracy, stability, and lead time of online promoter prediction and increase resource utilization.

Attached Figure Description

[0017] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0018] Figure 1 is a schematic flowchart of the method according to an embodiment of this application; Figure 2 is a schematic diagram of the hypergraph structure according to an embodiment of this application; Figure 3 is a schematic diagram comparing the prediction lead time of the method according to an embodiment of this application with that of other methods; Figure 4 is a block diagram of the device structure according to an embodiment of this application; Figure 5 is a block diagram of the electronic device structure according to an embodiment of this application.

Detailed Implementation

[0019] To make the objectives, technical solutions, and advantages of this disclosure clearer, the following detailed description is provided in conjunction with specific embodiments and the accompanying drawings.

[0020] It should be noted that, unless otherwise defined, the technical or scientific terms used in the embodiments of this application should have the ordinary meaning understood by one of ordinary skill in the art to which this disclosure pertains. The terms "first," "second," and similar terms used in the embodiments of this application do not indicate any order, quantity, or importance, but are merely used to distinguish different components. Terms such as "comprising" or "including" mean that the element or object preceding the word encompasses the elements or objects listed after the word and their equivalents, without excluding other elements or objects. Terms such as "connected" or "linked" are not limited to physical or mechanical connections, but can include electrical connections, whether direct or indirect. Terms such as "upper," "lower," "left," and "right" are only used to indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may also change accordingly.

[0021] As described in the background section, accurately predicting online promoters is difficult in the early stages of public opinion dissemination. Most methods model only pairwise relationships between users and therefore fail to capture the high-order collaborative behavior of multiple users participating in the same cascade or propagation event, which makes it difficult to characterize group manipulation patterns structurally. Some methods assume relatively stable user behavior patterns and do not fully consider how user influence and the propagation structure evolve over time during online public opinion dissemination. Other methods rely on static influence indicators, centrality measures, or the assumption that all neighboring nodes are aggregated equally. Because deviations in the propagation structure often precede the manifestation of large-scale influence, such methods struggle to capture these early risk signals, which limits the lead time and stability of their predictions. Furthermore, predicting continuously throughout the entire public opinion dissemination cycle generates a large amount of invalid output during the stable phase, which significantly increases computational overhead, lowers resource utilization, and makes real-time requirements difficult to meet.

[0022] In view of this, embodiments of this application provide a method for predicting online promoters. A propagation stage determination strategy driven by robust statistics decides when prediction is triggered, so that risk prediction for potential online promoters is performed only during critical risk stages. This can improve the accuracy and stability of the prediction results, improve resource utilization, and meet real-time requirements. Once prediction is triggered, a hypergraph structure oriented towards propagation cascades and cross-community diffusion events is constructed to model high-order collaborative behaviors structurally. During message passing in the hypergraph prediction model, a compression-activation mechanism based on propagation deviation is adopted, which reduces the dilution of and interference with potential abnormal signals by the large volume of normal neighbor behavior, and improves both the accuracy and the lead time of prediction.

[0023] The technical solution of this application will be further described in detail below through specific embodiments.

[0024] As shown in Figure 1, this application provides a method for predicting online promoters, including: S101: acquiring a user's propagation behavior data, propagation content data, and propagation structure data. In this embodiment, during the dissemination of public opinion, user behavior and propagation relationships change dynamically over time and exhibit different activity patterns and influence characteristics in different time periods. Specifically, behavior changes in phases across time windows; public opinion dissemination has activity peaks and troughs; and the behavioral changes and peaks of propagation influence of different users do not coincide. Based on these characteristics, the risk that a user evolves into a potential online promoter and exerts an abnormal guiding effect on public opinion dissemination can be predicted from the user behavior and propagation structure information within a prediction time window.

[0025] Therefore, during public opinion dissemination, the first step is to acquire data on user behavior and propagation relationships, including propagation behavior data, propagation content data, and propagation structure data. Propagation behavior data includes the number of times a user forwards, comments on, and quotes content, together with the times at which these forwards, comments, and quotes occur. Propagation content data includes the text of the forwarded, posted, and quoted content. Propagation structure data includes the user's out-degree, betweenness centrality, and hierarchical position in the propagation cascade within the public opinion dissemination network.

[0026] S102: extracting features from the propagation behavior data, propagation content data, and propagation structure data respectively to obtain behavioral features, content features, and structural features. In this embodiment, behavioral features are extracted from the acquired propagation behavior data. The behavioral features describe the user's activity level and propagation behavior pattern within the prediction time window. They are obtained by counting the user's forwards, comments, and quotes and the distribution of interaction times within the prediction time window, and then normalizing these statistics within the prediction time window.
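As a rough illustration of the behavioral feature extraction in paragraph [0026], the Python sketch below normalizes raw per-user counts within one prediction time window. The text does not name a normalization method, so min-max scaling across the users of the window is an assumption, as are the function name `normalize_behavior` and the feature keys.

```python
def normalize_behavior(counts_by_user):
    """Min-max normalize each raw count across the users of one window.

    counts_by_user: dict mapping user id -> dict of raw counts,
    e.g. {"forward": 10, "comment": 0, "cite": 2}.
    """
    features = sorted({k for counts in counts_by_user.values() for k in counts})
    lo = {f: min(c.get(f, 0) for c in counts_by_user.values()) for f in features}
    hi = {f: max(c.get(f, 0) for c in counts_by_user.values()) for f in features}
    normalized = {}
    for user, counts in counts_by_user.items():
        normalized[user] = {
            # A feature that is constant across users carries no signal: use 0.0.
            f: 0.0 if hi[f] == lo[f] else (counts.get(f, 0) - lo[f]) / (hi[f] - lo[f])
            for f in features
        }
    return normalized


window_counts = {
    "u1": {"forward": 10, "comment": 0, "cite": 2},
    "u2": {"forward": 0, "comment": 4, "cite": 2},
}
behavior_features = normalize_behavior(window_counts)
```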

[0027] Content features are extracted from the acquired dissemination content data. These features describe the semantic attributes and emotional tendencies of information involved by users during the dissemination process. Semantic representation learning can be performed on multiple text contents published, forwarded, and cited by users within the prediction time window to obtain corresponding text vector representations. The text vectors are then aggregated within the prediction time window to form the overall semantic feature representation (i.e., content features) of the user within that prediction time window, thus avoiding the impact of noise from individual texts on feature modeling.

[0028] For example, by using sentiment dictionary matching, sentiment classification models, or sentiment regression models, the sentiment polarity and intensity contained in the text content posted or forwarded by users can be quantitatively calculated to obtain the corresponding sentiment intensity score. For multiple texts from the same user within the prediction time window, the sentiment intensity scores of each text can be aggregated, such as by taking the average, weighted average, or maximum value, to obtain the overall sentiment intensity representation of the user within the current prediction time window, and normalization processing can be performed to eliminate the influence of differences in the number of texts from different users.
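The aggregation options named in paragraph [0028] (average, weighted average, maximum) can be sketched as follows. The per-text sentiment intensity scores are taken as given inputs, since the scoring model is outside this sketch; the function name `aggregate_sentiment` and its `mode` argument are hypothetical.

```python
def aggregate_sentiment(scores, mode="mean", weights=None):
    """Combine per-text sentiment intensity scores for one user and window."""
    if not scores:
        return 0.0  # assumption: a user with no text gets a neutral score
    if mode == "mean":
        return sum(scores) / len(scores)
    if mode == "weighted":
        if weights is None or len(weights) != len(scores):
            raise ValueError("weights must match scores in length")
        total = sum(weights)
        if total == 0:
            raise ValueError("weights must not sum to zero")
        return sum(s * w for s, w in zip(scores, weights)) / total
    if mode == "max":
        return max(scores)
    raise ValueError(f"unknown mode: {mode}")


user_scores = [0.2, 0.4, 0.9]
overall = aggregate_sentiment(user_scores, mode="weighted", weights=[1, 1, 2])
```

Averaging already removes the dependence on how many texts a user posted, which is the purpose of the normalization mentioned in the paragraph; the maximum instead highlights a single strongly emotional text.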

[0029] Structural features are extracted from the acquired propagation structure data. These features describe the user's positional relationship in the public opinion propagation network and its propagation connections with neighboring nodes. They can be calculated based on propagation structure attributes such as the user's out-degree, betweenness centrality, and hierarchical position in the propagation cascade.

[0030] In some embodiments, the process of online public opinion propagation is modeled, and within a given time window the process can be represented as a weighted directed graph $G_s = (V_s, E_s)$, where $V_s$ is the set of user nodes that participated in the dissemination of public opinion within this time window and $E_s$ is the set of propagation relationships between users. Propagation relationships (i.e., propagation edges) include the information propagation relationships formed between users through forwarding, commenting, or quoting, and can be supplemented with information such as timestamps and propagation intensity.
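A weighted directed propagation graph of this kind, together with two of the structural attributes mentioned earlier (out-degree and hierarchical position in the cascade), can be sketched in plain Python. The edge direction (information flows from source to target), the use of breadth-first depth as the hierarchical position, and the names `build_graph` and `cascade_levels` are assumptions for illustration.

```python
from collections import deque


def build_graph(edges):
    """Adjacency map from (source, target, weight) propagation edges.

    Repeated interactions between the same pair accumulate their weights.
    """
    adj = {}
    for source, target, weight in edges:
        adj.setdefault(source, {})
        adj.setdefault(target, {})
        adj[source][target] = adj[source].get(target, 0.0) + weight
    return adj


def out_degree(adj, node):
    """Number of distinct users that received content from this node."""
    return len(adj[node])


def cascade_levels(adj, root):
    """Breadth-first depth of every user reachable from the cascade root."""
    levels = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in levels:
                levels[w] = levels[u] + 1
                queue.append(w)
    return levels


graph = build_graph([("u1", "u2", 1.0), ("u1", "u3", 2.0), ("u2", "u4", 1.0)])
levels = cascade_levels(graph, "u1")
```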

[0031] The set of users requiring risk prediction is represented as a subgraph $G_u = (V_u, E_u)$, where $V_u$ is the set of user nodes and $E_u$ is the set of propagation relationships between users. For each user node $v$ in $V_u$, the current propagation state is represented by its node features. At time $t_1$, the node features consist of $A_v(t_1)$, the behavioral features at time $t_1$; $C_v(t_1)$, the content features at time $t_1$; and $S_v(t_1)$, the structural features at time $t_1$. Within the prediction time window $t = [t_a, t_e]$, the node features are represented analogously [expression missing].

[0032] S103: Determine the dissemination stage based on behavioral characteristics, content characteristics, and structural characteristics; In this embodiment, during the dissemination of public opinion, the semantic changes and fluctuations in users' dissemination behavior typically exhibit alternating characteristics of stable and abrupt phases. To avoid triggering a large amount of invalid computation during stable phases and to improve the responsiveness of the prediction model during critical risk phases, the dissemination stage of the public opinion network is determined based on user behavioral and content characteristics before prediction, and then the need for risk prediction is determined according to the dissemination stage.

[0033] In some embodiments, determining the propagation stage based on the behavioral features, content features, and structural features includes: calculating the semantic change intensity based on the content features of adjacent prediction time windows; determining the propagation behavior fluctuation intensity based on the behavioral features and structural features of adjacent prediction time windows; calculating the significance metric for the current prediction time window based on the semantic change intensity and the propagation behavior fluctuation intensity; and determining the propagation stage by comparing the significance metric with the dynamic significance threshold constructed from the significance metric.

[0034] Specifically, within the prediction time window $t$, the text content that a user posts, forwards, and quotes is collected, where $n$ is the number of texts. Each text is encoded using a pre-trained semantic representation model to obtain the corresponding text semantic vector $z_i$, and the text semantic vectors are aggregated to obtain the content features. The semantic change intensity of the same user across adjacent prediction time windows is defined by formula (1) [formula missing], where $e_t$ is the content feature of the current prediction time window $t$ and $e_{t-1}$ is the content feature of the previous prediction time window $t-1$.
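The two operations in paragraph [0034], aggregating text vectors into a content feature and comparing content features of adjacent windows, can be sketched as below. Formula (1) is not legible in the source, so mean pooling as the aggregation and cosine distance as the change measure are both assumptions; a Euclidean distance would be an equally plausible reading. The function names are hypothetical.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length text vectors into one content feature."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: treat an empty window as "no change"
    return 1.0 - dot / (norm_u * norm_v)


previous_window = mean_pool([[1.0, 0.0], [1.0, 0.0]])
current_window = mean_pool([[1.0, 0.0], [0.0, 1.0]])
semantic_change = cosine_distance(current_window, previous_window)
```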

[0035] In some embodiments, the intensity of propagation behavior fluctuations is determined based on the behavioral and structural characteristics of adjacent prediction time windows, including: Based on the behavioral and structural characteristics of adjacent prediction time windows, the propagation state statistics are calculated. The intensity of propagation behavior fluctuations is calculated based on the propagation status statistics of adjacent prediction time windows.

[0036] In this embodiment, within two adjacent prediction time windows, propagation state statistics are calculated from factors such as the number of forwards, the number of comments, the change in propagation scale, and the degree of diffusion. Specifically, within the prediction time window, the number of user forwards is counted to obtain the window-level forwarding intensity $N_{\text{forward}}(t)$, and the number of user comments is counted to obtain the window-level comment intensity $N_{\text{comment}}(t)$. Based on the propagation cascade structure, the change in the number of users participating in the propagation cascade relative to the previous prediction time window is counted to obtain the cascade scale increment, which characterizes the expansion trend of the propagation scale. Based on the user forwarding relationship network, the user nodes are divided into multiple communities, and the proportion of users who engage in cross-community propagation within the prediction time window is counted to obtain the cross-community diffusion ratio $R_{\text{cross}}(t)$, which reflects the degree to which public opinion diffuses across different communities.
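The cross-community diffusion ratio described in paragraph [0036] can be sketched as follows. The community assignment is taken as a given input, since the text does not name a community detection algorithm. Counting a user as cross-community when that user forwards content from an author in another community is an interpretation, and the names `cross_community_ratio`, `forward_edges`, and `community` are hypothetical.

```python
def cross_community_ratio(forward_edges, community):
    """Share of forwarding users with at least one cross-community forward.

    forward_edges: iterable of (forwarder, original_author) pairs in the window.
    community:     dict mapping user id -> community id.
    """
    forwarders = set()
    crossing = set()
    for forwarder, author in forward_edges:
        forwarders.add(forwarder)
        if community[forwarder] != community[author]:
            crossing.add(forwarder)
    if not forwarders:
        return 0.0
    return len(crossing) / len(forwarders)


communities = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1}
ratio = cross_community_ratio([("a", "b"), ("c", "b"), ("d", "e")], communities)
```

In this toy window only user `c` forwards across a community boundary, so one of the three forwarding users counts as cross-community.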

[0037] The window-level forwarding intensity, window-level comment intensity, cascade scale increment, and cross-community diffusion ratio are normalized, and the propagation state statistics are calculated by weighted summation:

$$S(t) = w_1\,\mathrm{Norm}\big(N_{\text{forward}}(t)\big) + w_2\,\mathrm{Norm}\big(N_{\text{comment}}(t)\big) + w_3\,\mathrm{Norm}\big(\Delta N_{\text{cascade}}(t)\big) + w_4\,\mathrm{Norm}\big(R_{\text{cross}}(t)\big) \quad (2)$$

where $S(t)$ is the propagation state statistic within prediction time window $t$, used to describe the overall activity level and diffusion characteristics of public opinion propagation within that window; $\Delta N_{\text{cascade}}(t)$ is the cascade scale increment; $w_1, w_2, w_3, w_4$ are weighting coefficients used to adjust the influence of the corresponding propagation behavior characteristics on the propagation state statistic; and $\mathrm{Norm}(\cdot)$ is the normalization function.
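The weighted summation of normalized window statistics in paragraph [0037] can be illustrated with a short Python sketch. Min-max scaling against fixed bounds stands in for the unspecified normalization function, and the equal weights, the bounds, and the dictionary keys are made-up values for illustration.

```python
def min_max(value, low, high):
    """Scale value into [0, 1] given known bounds; constant ranges map to 0."""
    if high == low:
        return 0.0
    return (value - low) / (high - low)


def propagation_state(stats, bounds, weights):
    """Weighted sum of normalized window-level statistics."""
    return sum(
        weights[name] * min_max(stats[name], *bounds[name])
        for name in weights
    )


stats = {"forward": 50, "comment": 20, "cascade_increment": 5, "cross_ratio": 0.5}
bounds = {"forward": (0, 100), "comment": (0, 40),
          "cascade_increment": (0, 10), "cross_ratio": (0.0, 1.0)}
weights = {"forward": 0.25, "comment": 0.25,
           "cascade_increment": 0.25, "cross_ratio": 0.25}
state = propagation_state(stats, bounds, weights)
```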

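[0038] Based on the propagation state statistic of the current prediction time window, $S(t)$, and the propagation state statistic of the previous prediction time window, $S(t-1)$, the propagation behavior fluctuation intensity is calculated according to formula (3) [formula missing]. The significance metric $d(t)$ for the current prediction time window is obtained by fusing the semantic change intensity and the propagation behavior fluctuation intensity with weighting coefficients:

$$d(t) = \alpha\,\Delta_{\text{sem}}(t) + \beta\,\Delta_{\text{beh}}(t) \quad (4)$$

where $\Delta_{\text{sem}}(t)$ is the semantic change intensity, $\Delta_{\text{beh}}(t)$ is the propagation behavior fluctuation intensity, and $\alpha$ and $\beta$ are the weighting coefficients [constraints on the coefficients missing].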
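A minimal sketch of the two quantities in paragraph [0038] follows. Formula (3) and the constraints on the weighting coefficients are not legible in the source, so the absolute difference between adjacent windows and a convex combination (weights summing to 1) are assumptions; the function names and the `alpha` parameter are hypothetical.

```python
def fluctuation_intensity(state_current, state_previous):
    """Assumed form: absolute change of the propagation state statistic."""
    return abs(state_current - state_previous)


def significance(semantic_change, behavior_fluctuation, alpha=0.5):
    """Convex combination of the two intensities (assumed constraint)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * semantic_change + (1.0 - alpha) * behavior_fluctuation


fluct = fluctuation_intensity(0.8, 0.5)
d_t = significance(semantic_change=0.4, behavior_fluctuation=fluct, alpha=0.5)
```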

[0039] In some embodiments, determining the propagation stage by comparing the significance metric with the dynamic significance threshold constructed from the significance metric includes: calculating the median absolute deviation (MAD) of the significance metric; determining the dynamic significance threshold based on the median absolute deviation and the median of the significance sequence, where the significance sequence consists of the significance metrics of multiple prediction time windows; if the significance metric is greater than or equal to the dynamic significance threshold, the propagation stage is a non-stationary stage; and if the significance metric is less than the dynamic significance threshold, the propagation stage is a stationary stage.

[0040] Specifically, based on the significance metric, the Median Absolute Deviation (MAD) is calculated and used for robust significance determination:

$$\mathrm{MAD} = \mathrm{median}\big(\,|d - \mathrm{median}(d)|\,\big) \quad (5)$$

where $d$ is the significance sequence consisting of the significance metrics of multiple prediction time windows, and $\mathrm{median}(d)$ is the median of the significance sequence, used as a robust statistical benchmark.

[0041] The dynamic significance threshold is determined based on the median absolute deviation and the median of the significance sequence, and is expressed as:

$$\theta(t) = \mathrm{median}(d) + \lambda \cdot \mathrm{MAD} \quad (6)$$

where $\lambda$ is the gating sensitivity coefficient.

[0042] The saliency gating signal $g(t)$ is defined as:

$$g(t) = \begin{cases} 1, & d(t) \ge \theta(t) \\ 0, & d(t) < \theta(t) \end{cases} \quad (7)$$

According to formula (7), if the significance metric $d(t)$ of prediction time window $t$ is greater than or equal to the dynamic significance threshold $\theta(t)$, the saliency gating signal is $g(t) = 1$. This indicates that the state of public opinion dissemination within the prediction time window has changed significantly and that the dissemination is in a non-stationary stage with sudden or fluctuating events, so subsequent hypergraph structure modeling and risk prediction are required. If the significance metric $d(t)$ is less than the dynamic significance threshold $\theta(t)$, the saliency gating signal is $g(t) = 0$. This indicates that the dissemination is in a stationary stage and risk prediction is unnecessary; a suppression flag is output to block subsequent hypergraph structure modeling and risk prediction. This triggering mechanism identifies the propagation stage of the network from the dynamic changes in user behavior and content, and adaptively controls when the hypergraph prediction model is invoked. Hypergraph modeling and risk prediction are performed only during the non-stationary stages in which public opinion dissemination evolves critically, which improves resource utilization and prediction accuracy and avoids the wasted resources and ineffective interference of predicting across all stages.
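The MAD-based trigger described in paragraphs [0040] to [0042] can be sketched with the Python standard library. The threshold form `median + sensitivity * MAD` follows the description of the threshold as being built from the median and the MAD with a gating sensitivity coefficient; the default value of `sensitivity`, the choice to pass the significance sequence in as `history`, and the function names are assumptions.

```python
from statistics import median


def mad(values):
    """Median absolute deviation of a sequence of significance metrics."""
    m = median(values)
    return median([abs(x - m) for x in values])


def dynamic_threshold(history, sensitivity=3.0):
    """Robust threshold: median of the sequence plus a multiple of its MAD."""
    return median(history) + sensitivity * mad(history)


def saliency_gate(d_t, history, sensitivity=3.0):
    """Return 1 (non-stationary, run prediction) or 0 (stationary, skip)."""
    return 1 if d_t >= dynamic_threshold(history, sensitivity) else 0


significance_sequence = [1, 2, 3, 4, 100]
threshold = dynamic_threshold(significance_sequence)   # median 3, MAD 1
run_prediction = saliency_gate(100, significance_sequence)
skip_prediction = saliency_gate(4, significance_sequence)
```

A caller would build the hypergraph and run the prediction model only when the gate returns 1, which is how the method avoids computation during stationary stages. Note that the outlier value 100 in the sequence hardly affects the threshold, which would not be the case with a mean and standard deviation.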

[0043] S104: When the propagation phase is non-stationary, construct a hypergraph prediction model and use the hypergraph prediction model based on behavioral features, content features, and structural features to determine the risk prediction results for users.

[0044] In this embodiment, when it is determined that the public opinion dissemination network is in a non-stationary phase, a hypergraph prediction model is constructed. The hypergraph prediction model is used to predict whether a user is a potential online promoter based on the user's behavioral characteristics, content characteristics, and structural characteristics.

[0045] In some embodiments, constructing a hypergraph prediction model includes: constructing a hypergraph structure with users participating in the propagation as user nodes and a set of user nodes jointly participating in the propagation in the same propagation event or propagation cascade process as hyperedges.

[0046] In this embodiment, users participating in propagation are treated as user nodes, and the collaborative behavior of multiple users around the same propagation event or propagation cascade process is mapped as hyperedges, constructing a hypergraph structure. This hypergraph structure can reflect the group-based collaborative behavior patterns of users in higher-order propagation relationships. Specifically, when multiple users participate in the same propagation cascade, cross-community diffusion event, or propagation path segment, each user node is associated through the same hyperedge, forming a higher-order structural relationship of node-propagation unit-node, i.e., a propagation unit. One hyperedge corresponds to one propagation unit, and different propagation units can overlap by sharing user nodes, thereby characterizing the composite participation behavior of users in multiple propagation events. Through hypergraph modeling, the collaborative behavior patterns formed by multiple users around the same propagation unit can be explicitly expressed, providing a unified structural representation basis for subsequent hypergraph message transmission.

[0047] In some implementations, as shown in Figure 2, the hypergraph structure is denoted as $H_t = (V_t, E_t)$, where $V_t$ is the set of user nodes within prediction time window $t$ and $E_t$ is the set of hyperedges, each hyperedge corresponding to one type of propagation unit (including propagation cascades, cross-community diffusion events, propagation path segments, or cascade subtrees). For any hyperedge $e \in E_t$, the set of user nodes associated with it is denoted as $V(e)$. The hypergraph incidence matrix $\mathbf{B}_t$ is defined as:

$$\mathbf{B}_t(v, e) = \begin{cases} 1, & v \in V(e) \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

Meanwhile, to emphasize the importance of each propagation unit, the weight of each hyperedge is set based on factors such as cascade scale, diffusion depth, and cross-community proportion. A weight matrix is constructed from the hyperedge weights, which completes the construction of the hypergraph structure.
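As an illustration of the incidence matrix of formula (8), the following Python sketch builds a 0/1 matrix from a mapping of propagation units to their participating users. The input layout (`units` as a dictionary) and the function name `build_incidence` are assumptions made for this example.

```python
def build_incidence(units):
    """Build a user-by-hyperedge incidence matrix.

    units: dict mapping a propagation unit (hyperedge) id to the
           iterable of user ids that participate in it.
    Returns (users, edges, matrix), where matrix[i][j] is 1 when
    users[i] belongs to edges[j] and 0 otherwise.
    """
    users = sorted({u for members in units.values() for u in members})
    edges = sorted(units)
    row = {u: i for i, u in enumerate(users)}
    col = {e: j for j, e in enumerate(edges)}
    matrix = [[0] * len(edges) for _ in users]
    for e, members in units.items():
        for u in members:
            matrix[row[u]][col[e]] = 1
    return users, edges, matrix
```

A user who appears in several propagation units gets several 1 entries in its row, which is how overlapping hyperedges express composite participation.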

[0048] In this embodiment, when the propagation network is determined to be in a non-stationary phase, propagation units are dynamically extracted based on the propagation cascades, cross-community diffusion events, and propagation path fragments corresponding to the current propagation phase. A hypergraph structure is adaptively constructed, thereby achieving an integrated process of prediction triggering, propagation unit extraction, and hypergraph modeling. Compared to static hypergraph modeling, this provides a hypergraph structure that better aligns with the evolution mechanism of public opinion, offering a more accurate structural foundation for subsequent risk prediction based on the hypergraph prediction model.

[0049] In some embodiments, a hypergraph prediction model is used to determine the user's risk prediction result based on behavioral features, content features, and structural features, including: Using behavioral features, content features, and structural features as the initial state of user nodes, messages are iteratively passed on the hypergraph structure. During the message transmission process, the corresponding propagation deviation is calculated based on the cross-community propagation ratio, propagation scale parameter, and diffusion depth parameter of each hyperedge. The corresponding hyperedge activation gating coefficient is determined based on the degree of propagation deviation of each hyperedge and the dynamic deviation threshold constructed based on the degree of propagation deviation. Based on the hyperedge activation gating coefficient of each hyperedge, determine the information compression or activation modulation method of each hyperedge during message transmission.

[0050] In this embodiment, after constructing the hypergraph structure, the user's behavioral features, content features, and structural features are input into the hypergraph prediction model based on the hypergraph structure. The hypergraph prediction model is then used to predict the risk of whether a user is a potential network promoter. The hypergraph structure is used to characterize the high-order collaborative relationships formed by multiple users around the same propagation unit. As the structural input of the hypergraph prediction model, it limits the connection relationship and information transmission range between user nodes and propagation units. The hypergraph prediction model is a parameterized message passing model constructed under the constraints of the hypergraph structure. It iteratively aggregates and transmits information between user nodes and the hyperedges of propagation units to achieve layer-by-layer updates of user node representations.

[0051] Based on the constraints of the aforementioned hypergraph structure, the hypergraph prediction model introduces learnable parameters, nonlinear mapping functions, and hyperedge-level gating modulation mechanisms to model and modulate the information in the propagation unit, forming a user risk prediction framework based on the propagation unit.

[0052] In the hypergraph structure, the initial state of a user node is defined by its behavioral features, content features, and structural features. During the subsequent $k$-th layer of message passing, the state of user node $v$ is denoted as $h_v^{(k)}$ and is iteratively updated along the propagation-unit-level structure of user node → hyperedge → user node.

[0053] To alleviate the excessive smoothing of user node representations during multi-layer propagation and improve the ability to distinguish spoofed propagation behavior, a hyperedge-level nonlinear structure activation operator based on propagation deviation is introduced during message transmission. The propagation deviation is used as a nonlinear activation control parameter for the hyperedge-level information flow, and nonlinear information compression or activation modulation is applied to the propagation unit information. This can adaptively suppress the interference of low-deviation, low-risk propagation units on user node representations during message transmission, while enhancing the structural influence of high-deviation propagation units.

[0054] Specifically, the hyperedge-level nonlinear structure activation operator based on propagation deviation is defined as:

$$\tilde{h}_e^{(k)} = a_e \cdot h_e^{(k)} \tag{9}$$

where $h_e^{(k)}$ is the hyperedge representation obtained by user node → hyperedge aggregation at layer $k$, and $a_e$ is the hyperedge activation gating coefficient driven by the propagation deviation $PDI(e)$, used for nonlinear information compression or activation modulation of the hyperedge information flow. The symbol $e$ denotes both a propagation unit and its corresponding hyperedge.

[0055] The propagation deviation $PDI(e)$ represents the degree to which propagation unit $e$ deviates, structurally and behaviorally, from the normal information diffusion pattern. It is calculated as a weighted combination of the cross-community propagation ratio, propagation scale parameter, and diffusion depth parameter within the propagation unit (hyperedge):

$$PDI(e) = \omega_1 N_{ccr}(e) + \omega_2 N_{outDegree}(e) + \omega_3 N_{Depth}(e) \tag{10}$$

where $N_{ccr}(e)$ is the cross-community propagation ratio of propagation unit $e$, which measures the extent to which information breaks through the original community boundaries during dissemination. It is obtained by counting the proportion of forwarding relationships in propagation unit $e$ that involve cross-community propagation, where cross-community propagation refers to a forwarding relationship whose source user node and target user node belong to different user communities. Specifically, all forwarding relationships within propagation unit $e$ are taken as the statistical population, the number of forwarding relationships whose source and target user nodes belong to different communities is counted, and the ratio of this number to the total number of forwarding relationships within propagation unit $e$, after normalization, is used as the cross-community propagation ratio. $N_{outDegree}(e)$ is the propagation scale parameter of propagation unit $e$, which characterizes the propagation activity of the propagation unit within the current prediction time window. It is obtained by counting metrics of propagation unit $e$ such as the number of user nodes participating in the propagation, the number of forwarding edges, or the cumulative number of forwards, and then normalizing. $N_{Depth}(e)$ is the diffusion depth parameter of propagation unit $e$, which reflects the longitudinal diffusion capability of information along the propagation path. It can be measured, in the forwarding cascade structure corresponding to propagation unit $e$, as the maximum path length or maximum number of levels formed by downward propagation with the initial publishing node as the root node, and is then normalized. $\omega_1$, $\omega_2$, and $\omega_3$ are the corresponding weighting coefficients.
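The three ingredients of formula (10) can be sketched in Python as follows. The representation of forwarding relationships as `(source, target)` pairs, the helper names, and the default weights are assumptions for illustration; the scale and depth inputs to `pdi` are assumed to be already normalized to the range 0 to 1.

```python
def cross_community_ratio(forwards, community):
    """Share of forwarding relationships whose endpoints are in different communities.

    forwards:  list of (source_user, target_user) pairs inside one propagation unit
    community: dict mapping user id to community id
    """
    if not forwards:
        return 0.0
    crossing = sum(1 for s, t in forwards if community[s] != community[t])
    return crossing / len(forwards)


def cascade_depth(forwards, root):
    """Number of levels below the root, assuming the cascade is a tree."""
    children = {}
    for s, t in forwards:
        children.setdefault(s, []).append(t)
    depth, frontier, seen = 0, [root], {root}
    while frontier:
        nxt = []
        for node in frontier:
            for child in children.get(node, []):
                if child not in seen:
                    seen.add(child)
                    nxt.append(child)
        if nxt:
            depth += 1
        frontier = nxt
    return depth


def pdi(n_ccr, n_scale, n_depth, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of normalized indicators (weights are illustrative)."""
    w1, w2, w3 = weights
    return w1 * n_ccr + w2 * n_scale + w3 * n_depth
```

The raw depth returned by `cascade_depth` would still need to be normalized, for example by dividing by the largest depth observed in the window, before it is passed to `pdi`.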

[0056] In some embodiments, determining the corresponding hyperedge activation gating coefficient based on the propagation deviation of each hyperedge and a dynamic deviation threshold constructed from the propagation deviations includes: calculating the median absolute deviation of the propagation deviations of the hyperedges; constructing the dynamic deviation threshold based on the median absolute deviation and the median of the propagation deviation set, where the propagation deviation set is the set of propagation deviations of all hyperedges within the prediction time window; and calculating the hyperedge activation gating coefficient of each hyperedge based on its propagation deviation and the dynamic deviation threshold.

[0057] The median absolute deviation of the propagation deviations is calculated as:

$$\mathrm{MAD}_{PDI} = \operatorname{median}\left(\left|PDI - \operatorname{median}(PDI)\right|\right) \tag{11}$$

where $PDI$ is the set of propagation deviations of all propagation units within the prediction time window, i.e., $PDI = \{PDI(e) \mid e \in E_t\}$, and $\operatorname{median}(PDI)$ is the median of the propagation deviations across all propagation units, which serves as a robust statistical benchmark for the level of propagation deviation.

[0058] Based on the median absolute deviation and the median of the propagation deviation set, the corresponding dynamic deviation threshold $\theta_p$ is constructed as:

$$\theta_p = \operatorname{median}(PDI) + k_p \cdot \mathrm{MAD}_{PDI} \tag{12}$$

where $k_p$ is the gating sensitivity coefficient, used to adjust how strongly the dynamic deviation threshold responds to changes in the propagation deviation. Specifically, $k_p$ controls how far the dynamic deviation threshold is raised above the median in proportion to the overall dispersion of the propagation deviation distribution. When $k_p$ is small, the dynamic deviation threshold is closer to the median level of propagation deviation, so more propagation units are identified as high-deviation propagation units and the sensitivity of hyperedge activation increases. When $k_p$ is large, the dynamic deviation threshold rises significantly with the degree of dispersion, so hyperedge activation is triggered only for propagation units whose propagation deviation is significantly higher than the overall level, which helps suppress interference from noisy propagation units. By introducing the gating sensitivity coefficient $k_p$, the hyperedge activation gating strategy can be adaptively adjusted to the differences in the propagation deviation distribution across public opinion scenarios, ensuring sensitivity to high-risk propagation units while improving the stability and robustness of the overall gating mechanism.

[0059] The hyperedge activation gating coefficient $a_e$ is calculated from the propagation deviation and the corresponding dynamic deviation threshold $\theta_p$ as:

$$a_e = \mathrm{Sigmoid}\left(\beta \left(PDI(e) - \theta_p\right)\right) \tag{13}$$

where $\mathrm{Sigmoid}(\cdot)$ is the Sigmoid function and $\beta$ is the temperature coefficient, used to adjust the sensitivity and rate of change of the hyperedge activation function. It controls the steepness of the activation curve of the propagation deviation $PDI(e)$ relative to the dynamic deviation threshold. When the temperature coefficient is large, the hyperedge activation function is more sensitive to changes in propagation deviation, so propagation units with high propagation deviation are quickly activated and participate in information transmission once they exceed the dynamic deviation threshold. When the temperature coefficient is small, the response curve of the hyperedge activation function is flatter, which helps suppress false gate activations caused by slight fluctuations of the propagation deviation near the dynamic deviation threshold. By introducing the temperature coefficient, the structural influence of key propagation units can be amplified when the propagation deviation is significant, while the stability of the model is maintained when the propagation deviation is close to the dynamic deviation threshold, thereby improving the ability to distinguish spoofed propagation behavior.
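The chain from propagation deviations to gating coefficients in formulas (11) to (13) can be sketched in Python as follows. The function name `gate_coefficients`, the parameter name `beta` for the temperature coefficient, and the default values are assumptions for this example; the sketch assumes the threshold is the median plus the sensitivity coefficient times the MAD, and that the temperature multiplies the deviation from the threshold, which matches the described behavior that a larger temperature gives a steeper response.

```python
import math
from statistics import median


def gate_coefficients(pdi_by_edge, k_p=1.0, beta=5.0):
    """Map each hyperedge to a gating coefficient in (0, 1).

    pdi_by_edge: dict mapping hyperedge id to its propagation deviation
    k_p:         gating sensitivity coefficient (illustrative default)
    beta:        temperature coefficient (illustrative default)
    """
    values = list(pdi_by_edge.values())
    med = median(values)
    mad = median(abs(x - med) for x in values)   # formula (11)
    theta = med + k_p * mad                      # formula (12)
    return {
        e: 1.0 / (1.0 + math.exp(-beta * (x - theta)))  # formula (13)
        for e, x in pdi_by_edge.items()
    }
```

A hyperedge whose deviation equals the threshold receives a coefficient of 0.5; coefficients approach 1 above the threshold and 0 below it, faster for larger `beta`.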

[0060] As shown in Figure 2, in some embodiments, the message passing process in the hypergraph prediction model employs a two-stage aggregation mechanism of user node → hyperedge → user node to explicitly characterize the high-order collaborative behavior of multiple users around the same propagation unit. Specifically, the user node → hyperedge aggregation stage aggregates the representations of the multiple user nodes participating in the same propagation unit to generate the hyperedge representation of that propagation unit, characterizing the overall structural and behavioral features of the propagation unit at the current layer. The hyperedge → user node backhaul stage redistributes the compressed or activation-modulated information of the propagation unit to the associated user nodes, so that each user node representation perceives the structural deviations and risk signals carried by the different propagation units in which it participates. By alternating user node → hyperedge aggregation and hyperedge → user node backhaul, the user node representation gradually integrates individual behavioral characteristics with propagation-unit-level structural characteristics over multiple layers of propagation, achieving high-order modeling of the collaborative behavior of network promoter groups.

[0061] In some embodiments, in the user node → hyperedge aggregation stage of the $k$-th layer of message passing, the representation of user node $v$ entering the layer is denoted as $h_v^{(k-1)}$, and the aggregated hyperedge representation is defined by formula (14), in which a learnable transformation or nonlinear mapping is applied to the user node representations and the result is the aggregated representation of the $k$-th layer of message passing.

[0062] During the hyperedge → user node backhaul stage, the hyperedge activation gating coefficient $a_e$ is introduced to perform nonlinear compression or activation modulation according to formula (15), where $E(v)$ is the set of hyperedges containing user node $v$, and $m_v^{(k)}$ is the intermediate message received by user node $v$ during the $k$-th layer computation, representing the result of aggregating node association information based on the hypergraph structure.

[0063] The user node representation is updated as:

$$h_v^{(k)} = \sigma\left(W^{(k)} \, \psi\left(h_v^{(k-1)}, m_v^{(k)}\right)\right) \tag{16}$$

where $W^{(k)}$ is the linear transformation parameter matrix of the $k$-th layer user node representation update stage, used to map and recombine the features of the previous-layer user node representation $h_v^{(k-1)}$ and the intermediate aggregated message $m_v^{(k)}$ it received, achieving layer-by-layer updates and enhancement of user node representations within the hypergraph structure; $\sigma$ is a nonlinear activation function; and $\psi$ is a fusion function for the user node representation and the aggregated message, which can be implemented, for example, by concatenating the features and then applying a linear transformation.
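One round of user node → hyperedge → user node message passing can be sketched in plain Python as below. Several choices here are assumptions made only to keep the example concrete, since the corresponding formulas are not fully specified in the text: the aggregation is a mean over member nodes with an identity mapping, the backhaul is a gated sum over the hyperedges of each user, the fusion is concatenation followed by the matrix `W`, and the activation is `tanh`. The function name `hypergraph_layer` is likewise hypothetical.

```python
import math


def hypergraph_layer(h, members, gates, W):
    """One user node -> hyperedge -> user node round.

    h:       dict user -> list[float] of dimension d (previous-layer states)
    members: dict hyperedge -> list of users in that propagation unit
    gates:   dict hyperedge -> gating coefficient in (0, 1)
    W:       list of rows, each of length 2 * d (applied to [h_v, m_v])
    """
    d = len(next(iter(h.values())))
    # Stage 1: mean-aggregate member states into a hyperedge representation.
    h_e = {
        e: [sum(h[v][i] for v in vs) / len(vs) for i in range(d)]
        for e, vs in members.items()
    }
    # Stage 2: send each gated hyperedge representation back to its members.
    m = {v: [0.0] * d for v in h}
    for e, vs in members.items():
        for v in vs:
            for i in range(d):
                m[v][i] += gates[e] * h_e[e][i]
    # Update: concatenate state and message, apply W, then tanh.
    out = {}
    for v in h:
        x = h[v] + m[v]
        out[v] = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]
    return out
```

Because each hyperedge message is scaled by its gate before it reaches a user, low-deviation propagation units contribute little to the update while high-deviation units pass through almost unchanged.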

[0064] During the hyperedge → user node backhaul stage, the hyperedge-level nonlinear structure activation operator based on propagation deviation applies nonlinear suppression to low-deviation propagation units, reducing their homogenizing pull on user node representations and suppressing the excessive smoothing of user node representations in multi-layer propagation. At the same time, it activates and amplifies propagation units with high propagation deviation, so that early structural deviation signals are not submerged by normal propagation units. This effectively counters camouflage behavior in which abnormal propagation signals are masked by a large number of low-deviation, normal propagation units.

[0065] For each user node $v$, given its final user node representation, the risk prediction head of the hypergraph prediction model outputs the risk prediction result according to formula (17). In some embodiments, when the propagation is in a non-stationary phase (saliency gating signal equal to 1), the hypergraph prediction model outputs the risk prediction results for all users in the user set $V_u$. The users can further be sorted from high to low by risk prediction result (predicted risk probability) to form a risk user list, which provides early warning and intervention decision support for potential network pushers in subsequent prediction time windows.

[0066] In some embodiments, after user node representation learning is completed through message passing, to further improve the hypergraph prediction model's ability to predict potential online promoters in advance and to enhance its ability to distinguish spoofing behavior, an Information Manipulation Index (IMI) is introduced during the model training phase as a user-level risk prior and gradient adjustment basis. A cost-sensitive loss function is constructed, and a contrastive learning mechanism combining hard negative sample mining is used to adjust the model's optimization objective in a targeted manner. During training, by imposing a higher misjudgment cost on user nodes with high IMI, the distinction constraint between samples with similar behavioral representations but significantly different risks is strengthened, shifting the model's optimization objective from simply improving recognition accuracy to maximizing lead-time for early warning capabilities. Lead-time represents the relative advance time by which the model makes risk predictions about potential online promoters before the spread of public opinion enters a significant outbreak or clear manipulation stage, characterizing the model's ability to identify high-risk users in the early stages of dissemination.

[0067] In some embodiments, the information manipulation index is calculated by weighting the content features and structural features of the user node's historical propagation behavior:

$$IMI(v) = \alpha_1 N_{out}(v) + \alpha_2 N_{bc}(v) + \alpha_3 N_{emo}(v) \tag{18}$$

where $IMI(v)$ is the information manipulation index of user node $v$, representing the user's potential manipulation tendency in historical public opinion dissemination; as a risk prior in the training phase, it is introduced into the cost-sensitive loss and the hard negative sample screening process. $N_{out}(v)$ is the out-degree of the user node, reflecting the range of its propagation influence; $N_{bc}(v)$ is the betweenness centrality of the user node, measuring its structural mediating role; and $N_{emo}(v)$ is the emotional intensity in the user's content features. All features are normalized, and $\alpha_1$, $\alpha_2$, and $\alpha_3$ are the weighting coefficients.

[0068] The risk prior weight $c_v$ is constructed from the information manipulation index according to formula (19), which includes a constant greater than 0 that adjusts how strongly the information manipulation index $IMI(v)$ is amplified in the risk prior weight. When this constant is larger, the risk prior weight of users with a high information manipulation index increases more markedly, so misjudging them incurs a higher cost during training; when this constant is smaller, the adjustment of the risk prior weight is more gradual, which helps balance model stability and risk sensitivity.

[0069] In binary classification training, the cost-sensitive loss is defined by formula (20), where $y_v$ is the true risk label of user node $v$, indicating whether the user is a network promoter and serving as the supervisory signal during model training.

[0070] By incorporating the information manipulation index as a cost coefficient into the classification loss term, users with high information manipulation indices can receive a greater gradient penalty when they are missed, thereby generating a larger parameter update magnitude during backpropagation and guiding the model to pay more attention to potentially high-risk users.
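The way the information manipulation index enters the loss as a cost coefficient can be sketched in Python as follows. The exact forms of formulas (19) and (20) are not recoverable from the text, so this example assumes a linear weight `1 + gamma * IMI(v)` and a cost-weighted binary cross-entropy averaged over users; the function names, `gamma`, and the default weights are all hypothetical.

```python
import math


def imi(out_degree, betweenness, emotion, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of normalized features (weights are illustrative)."""
    a1, a2, a3 = weights
    return a1 * out_degree + a2 * betweenness + a3 * emotion


def risk_prior_weight(imi_value, gamma=1.0):
    """Assumed form of the risk prior weight: 1 + gamma * IMI(v), gamma > 0."""
    return 1.0 + gamma * imi_value


def cost_sensitive_bce(y_true, y_prob, costs, eps=1e-12):
    """Binary cross-entropy in which each user's term is scaled by its cost."""
    total = 0.0
    for y, p, c in zip(y_true, y_prob, costs):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total += -c * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)
```

Under these assumptions, a missed high-IMI promoter contributes a proportionally larger loss term, and therefore a larger gradient, than a missed low-IMI user with the same predicted probability.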

[0071] In some embodiments, contrastive learning is also introduced during the training phase to enhance the ability to distinguish users that appear similar through spoofing but carry different risks. By comparing the similarity between the representation vectors of different user nodes, discriminative modeling of user behavior representations in the representation space is achieved. Specifically, the similarity between two users is calculated according to formula (21), where $z_v$ and $z_j$ are the user node representation vectors of user nodes $v$ and $j$, and $s_{vj}$ is the similarity between user nodes $v$ and $j$.

[0072] Based on the similarity of user nodes, an InfoNCE-type loss function is constructed:

$$\mathcal{L}_{con} = -\log \frac{\exp\left(s_{vv^+}/\tau\right)}{\exp\left(s_{vv^+}/\tau\right) + \sum_{j \in \mathcal{N}(v)} \exp\left(s_{vj}/\tau\right)} \tag{22}$$

where the positive sample $v^+$ is a user node that exhibits a propagation behavior pattern or structural participation similar to that of user node $v$ within the prediction time window. It can be drawn from users who participate in the same or similar propagation units and have similar propagation activity characteristics, and it expresses the consistency constraint on user representations under normal or similar propagation patterns. The negative sample set $\mathcal{N}(v)$ consists of user nodes that are similar to user node $v$ in the representation space but differ significantly in risk characteristics or manipulation tendency. By introducing the negative sample constraint, the model avoids judging solely on representational similarity, which enhances its ability to distinguish users who appear similar but carry different risks. The temperature parameter $\tau$ adjusts the smoothness of the similarity distribution in the contrastive loss function; by scaling the similarities, it affects the model's sensitivity to the difference between positive and negative samples, achieving a balance between discriminative ability and training stability.
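A per-anchor InfoNCE-type loss can be sketched in Python as follows. The use of cosine similarity is an assumption, since the similarity formula (21) is not recoverable from the text; the function names and the default temperature `tau=0.1` are also illustrative. Representation vectors are assumed to be non-zero.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE loss for one anchor, one positive, and a list of negatives."""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))
```

A negative that lies close to the anchor in representation space raises the loss far more than a distant one, which is why selecting hard negatives matters for this objective.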

[0073] In some embodiments, during the screening of hard negative samples, the candidate user nodes related to user node $v$ are ranked by similarity, and the candidate user nodes whose similarity is higher than a similarity threshold, or a predetermined number of top-ranked candidate user nodes, are selected as the set of user nodes participating in the hard negative sample screening. A difference constraint on the information manipulation index is then applied to the selected set: from these highly similar candidate users, the samples whose information manipulation index difference exceeds a threshold are selected as hard negative samples, enhancing the model's ability to distinguish spoofed users. The negative sample set is defined by formula (23), in which the selected set of user nodes is determined by the similarity $s_{vj}$ between user node $v$ and candidate user node $j$ and characterizes the subset of users closest to user node $v$ in the behavioral representation space; $M$ is the similarity threshold; and $IMI(v)$ and $IMI(j)$ are the information manipulation indices of user nodes $v$ and $j$.
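The two-step screening described above can be sketched in Python as follows. This example simplifies the selection by applying both the top-ranked cut and the similarity threshold together, whereas the text allows either one; the function name `hard_negatives`, the parameter names, and the default values are assumptions for illustration.

```python
def hard_negatives(v, sims, imi, top_k=10, sim_threshold=0.8, imi_gap=0.3):
    """Select hard negative samples for user v.

    sims: dict candidate user -> similarity with v
    imi:  dict user -> information manipulation index (must include v)
    Keeps candidates that are among the top_k most similar, reach the
    similarity threshold, and differ from v in IMI by more than imi_gap.
    """
    ranked = sorted(sims, key=sims.get, reverse=True)[:top_k]
    return [
        j for j in ranked
        if sims[j] >= sim_threshold and abs(imi[v] - imi[j]) > imi_gap
    ]
```

The returned users look like `v` in representation space but have a clearly different manipulation index, which is the property the contrastive loss relies on.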

[0074] The overall objective loss function of the hypergraph prediction model is:

$$\mathcal{L} = \mathcal{L}_{cls} + \lambda \, \mathcal{L}_{con} \tag{24}$$

where $\mathcal{L}_{cls}$ is the cost-sensitive loss of formula (20), $\mathcal{L}_{con}$ is the contrastive loss of formula (22), and $\lambda$ is the weighting coefficient. The cost-sensitive contrastive learning mechanism guided by the information manipulation index makes the model more sensitive to high-risk users during training and counters spoofing behavior through hard negative samples, thereby improving its predictive ability and stability.

[0075] In some embodiments, the proposed method is compared with existing random methods and methods based on propagation structure centers (in_degree), as shown in Table 1.

[0076] Table 1. Performance comparison of different methods on the network pusher prediction task.

[0077] In terms of overall predictive performance metrics, the method in this application outperforms the comparative methods in both AUROC and AUPRC. Specifically, the AUROC values of the random method and the method based on propagation structure centers (in_degree) are close to 0.5, indicating that their overall ability to distinguish between potential online promoters and ordinary users is close to the level of random discrimination. However, the AUROC value of the method in this application is improved to 0.6149, indicating that it has a stronger risk ranking ability at the overall user level and can more effectively distinguish between potential online promoters and ordinary users. At the same time, the improvement in the AUPRC metric of the method in this application is more significant, indicating that in actual public opinion scenarios where the sample proportion of potential online promoters is relatively low, the model's ability to identify and rank high-risk users is significantly enhanced.

[0078] Regarding Top-K prediction metrics, the method in this application achieves a significant improvement in Recall@20, indicating that under the constraint of fixed early warning resources, it can cover more real high-risk users in advance. Meanwhile, Precision@20 remains at a reasonable level while ensuring a significant improvement in risk coverage, demonstrating that the model does not sacrifice prediction accuracy for improved recall. These results indicate that the method in this application has higher practical application value in network pusher prediction scenarios where early warning and risk coverage are the core objectives.
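For reference, Top-K metrics of the kind discussed above are commonly computed as in the following Python sketch. The function name `precision_recall_at_k` and the input layout are assumptions; precision is divided by `k` following the usual convention, and the sketch does not reproduce any figures from Table 1.

```python
def precision_recall_at_k(scores, positives, k=20):
    """Precision@K and Recall@K for a ranked risk list.

    scores:    dict user -> predicted risk probability
    positives: set of users that are true network pushers
    k:         size of the early-warning list
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for u in ranked if u in positives)
    precision = hits / k
    recall = hits / len(positives) if positives else 0.0
    return precision, recall
```

Recall@K answers how many of the true high-risk users fit into a fixed warning budget, while Precision@K answers how much of that budget is spent on true high-risk users.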

[0079] As shown in Figure 3, in terms of prediction timeliness, this application can provide early warning of potential online promoters in the early stages of public opinion dissemination. Comparative results show that, under strict cross-time-window prediction settings, this application can cover a large proportion of genuine promoter users within a relatively short lead time. A comparison of lead-time distributions shows that this application can identify more than half of the potential online promoters approximately 20 days in advance, whereas in-degree methods typically require the propagation structure to be fully formed before relevant users are gradually identified. Therefore, this application overcomes the limitation of traditional structure-centrality methods, which can only identify users after the fact, and achieves genuinely early prediction.

[0080] Specifically, by introducing structural constraints and nonlinear activation mechanisms based on the Propagation Deviation Index (PDI) during the propagation structure modeling and node representation learning process, this application can effectively suppress the structural masking of abnormal propagation signals by a large number of normal propagation units in the early stage of public opinion propagation. This allows the model to pay more attention to key propagation units and key neighbor nodes with abnormal propagation indication significance during information aggregation and representation update, thereby significantly improving the sensitivity and accuracy of identifying potential network pusher risk characteristics and alleviating the problem of excessive smoothing of node representations during multi-layer propagation modeling.

[0081] Meanwhile, by introducing a prediction triggering and judgment mechanism based on median absolute deviation (MAD), it is possible to avoid continuously executing ineffective prediction analysis during periods of stable or low-risk public opinion. This allows the risk prediction process to adaptively focus on key stages of public opinion evolution, enabling on-demand risk assessment and computing power scheduling, thereby improving the timeliness, stability, and overall computational efficiency of the prediction results.

[0082] Furthermore, the Information Manipulation Index (IMI) is introduced as a priori risk constraint and cost-sensitive adjustment factor during the risk prediction and model training stages. This enhances the ability to characterize users with long-term manipulation tendencies but short-term disguised behavior. It also makes the model pay more attention to the cost of missing high-risk users during the optimization process, thereby promoting the shift of prediction objectives from ex-post identification to early warning orientation, which is more in line with the risk evolution law in real public opinion dissemination scenarios.

[0083] In terms of practical application, the method of this application can identify potential high-risk users in the early stage of public opinion dissemination, providing a longer time window for public opinion risk assessment and intervention. This enables relevant management departments or platforms to take targeted measures before public opinion spreads further, reducing the likelihood that public opinion risks expand. The method therefore has significant practical value and potential for wider adoption.

[0084] It should be noted that the method in this embodiment can be executed by a single device, such as a computer or server. The method can also be applied in a distributed scenario, where multiple devices cooperate to complete the task. In such a distributed scenario, one of these devices may execute only one or more steps of the method in this embodiment, and the multiple devices will interact with each other to complete the method described.

[0085] It should be noted that the above description describes specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recorded in the claims may be performed in a different order than that shown in the embodiments and still achieve the desired results. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

[0086] As shown in Figure 4, this application provides a network pusher prediction device, including: an acquisition module, used to acquire user dissemination behavior data, dissemination content data, and dissemination structure data; a feature extraction module, used to extract features from the dissemination behavior data, dissemination content data, and dissemination structure data respectively, to obtain behavioral features, content features, and structural features; a triggering module, used to determine the propagation stage based on the behavioral features, content features, and structural features; and a prediction module, used to construct a hypergraph prediction model when the propagation phase is non-stationary and to use the hypergraph prediction model to determine the user's risk prediction result based on the behavioral features, content features, and structural features.

[0087] For ease of description, the above device is described as divided into modules by function. Of course, when implementing the embodiments of this application, the functions of the modules can be implemented in one or more pieces of software and/or hardware.

[0088] The apparatus described above is used to implement the corresponding methods in the foregoing embodiments and has the beneficial effects of the corresponding method embodiments, which will not be repeated here.

[0089] Figure 5 illustrates a more specific hardware structure of an electronic device provided by this embodiment. The device may include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. The processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively connected to one another within the device via the bus 1050.

[0090] The processor 1010 can be implemented using a general-purpose CPU (Central Processing Unit), microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided in the embodiments of this specification.

[0091] The memory 1020 can be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), static storage device, dynamic storage device, etc. The memory 1020 can store the operating system and other applications. When the technical solutions provided in the embodiments of this specification are implemented by software or firmware, the relevant program code is stored in the memory 1020 and is called and executed by the processor 1010.

[0092] The input / output interface 1030 is used to connect input / output modules to realize information input and output. Input / output modules can be configured as components within the device (not shown in the figure) or externally connected to the device to provide corresponding functions. Input devices may include keyboards, mice, touchscreens, microphones, various sensors, etc., while output devices may include displays, speakers, vibrators, indicator lights, etc.

[0093] The communication interface 1040 is used to connect a communication module (not shown in the figure) to enable communication between this device and other devices. The communication module can communicate via wired means (such as USB or Ethernet cable) or wireless means (such as a mobile network, Wi-Fi, or Bluetooth).

[0094] Bus 1050 includes a pathway for transmitting information between various components of the device, such as processor 1010, memory 1020, input / output interface 1030, and communication interface 1040.

[0095] It should be noted that although the above-described device only shows the processor 1010, memory 1020, input / output interface 1030, communication interface 1040, and bus 1050, in specific implementations, the device may also include other components necessary for normal operation. Furthermore, those skilled in the art will understand that the above-described device may only include the components necessary for implementing the embodiments of this specification, and not necessarily all the components shown in the figures.

[0096] The electronic devices described above are used to implement the corresponding methods in the foregoing embodiments and have the beneficial effects of the corresponding method embodiments, which will not be repeated here.

[0097] The computer-readable medium of this embodiment includes permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

[0098] Those skilled in the art should understand that the discussion of any of the above embodiments is merely exemplary and is not intended to imply that the scope of this disclosure (including the claims) is limited to these examples. Within the framework of this disclosure, the technical features of the above embodiments or of different embodiments can also be combined, the steps can be implemented in any order, and there are many other variations of the different aspects of the embodiments of this application as described above, which are not described in detail for the sake of brevity.

[0099] Additionally, to simplify the description and discussion, and to avoid obscuring the embodiments of this application, the well-known power / ground connections to integrated circuit (IC) chips and other components may or may not be shown in the provided drawings. Furthermore, the apparatus may be shown in block diagram form to avoid obscuring the embodiments of this application, and this also takes into account the fact that the details of the implementation of these block diagram apparatuses are highly dependent on the platform on which the embodiments of this application will be implemented (i.e., these details should be fully understood by those skilled in the art). While specific details (e.g., circuits) have been set forth to describe exemplary embodiments of this disclosure, it will be apparent to those skilled in the art that the embodiments of this application can be implemented without these specific details or with variations thereof. Therefore, these descriptions should be considered illustrative rather than restrictive.

[0100] Although this disclosure has been described in conjunction with specific embodiments thereof, many substitutions, modifications, and variations of these embodiments will be apparent to those skilled in the art from the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may be used with the embodiments discussed.

[0101] The embodiments of this application are intended to cover all such substitutions, modifications, and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the embodiments of this application should be included within the protection scope of this disclosure.

Claims

1. A method for predicting online promoters, characterized in that it comprises:
acquiring a user's dissemination behavior data, dissemination content data, and dissemination structure data;
extracting features from the dissemination behavior data, dissemination content data, and dissemination structure data respectively, to obtain behavioral features, content features, and structural features;
determining a propagation stage based on the behavioral features, content features, and structural features; and
when the propagation stage is a non-stationary phase, constructing a hypergraph prediction model, and determining the user's risk prediction result based on the behavioral features, content features, and structural features.

2. The method according to claim 1, characterized in that determining the propagation stage based on the behavioral features, content features, and structural features comprises:
calculating a semantic change intensity based on the content features of adjacent prediction time windows;
determining a propagation behavior fluctuation intensity based on the behavioral features and structural features of adjacent prediction time windows;
calculating a significance metric for the current prediction time window based on the semantic change intensity and the propagation behavior fluctuation intensity; and
determining the propagation stage based on the relationship between the significance metric and a dynamic significance threshold constructed based on the significance metric.

3. The method according to claim 2, characterized in that determining the propagation behavior fluctuation intensity based on the behavioral features and structural features of adjacent prediction time windows comprises:
calculating propagation state statistics based on the behavioral features and structural features of adjacent prediction time windows; and
calculating the propagation behavior fluctuation intensity based on the propagation state statistics of adjacent prediction time windows.

4. The method according to claim 2 or 3, characterized in that determining the propagation stage based on the relationship between the significance metric and the dynamic significance threshold constructed based on the significance metric comprises:
calculating a median absolute deviation based on the significance metric;
determining the dynamic significance threshold based on the median absolute deviation and the median of a significance sequence, wherein the significance sequence consists of the significance metrics corresponding to multiple prediction time windows;
if the significance metric is greater than or equal to the dynamic significance threshold, the propagation stage is a non-stationary phase; and
if the significance metric is less than the dynamic significance threshold, the propagation stage is a stationary phase.

5. The method according to claim 1, characterized in that constructing the hypergraph prediction model comprises:
constructing a hypergraph structure in which the users participating in the propagation are user nodes, and each set of user nodes that jointly participate in the same propagation event or propagation cascade process is a hyperedge.

6. The method according to claim 5, characterized in that determining, by the hypergraph prediction model, the user's risk prediction result based on the behavioral features, content features, and structural features comprises:
using the behavioral features, content features, and structural features as the initial states of the user nodes, and iteratively passing messages on the hypergraph structure;
during the message passing process, calculating the propagation deviation degree of each hyperedge based on the cross-community propagation ratio, propagation scale parameter, and diffusion depth parameter of that hyperedge;
determining the hyperedge activation gating coefficient of each hyperedge based on the propagation deviation degree of that hyperedge and a dynamic deviation threshold constructed based on the propagation deviation degree; and
determining, based on the hyperedge activation gating coefficient of each hyperedge, whether the information of that hyperedge is compressed or activation-modulated during message passing.

7. The method according to claim 6, characterized in that determining the hyperedge activation gating coefficient of each hyperedge based on the propagation deviation degree of that hyperedge and the dynamic deviation threshold constructed based on the propagation deviation degree comprises:
calculating a median absolute deviation based on the propagation deviation degree of each hyperedge;
determining the dynamic deviation threshold based on the median absolute deviation and the median of a propagation deviation set, wherein the propagation deviation set is the set of propagation deviation degrees of all hyperedges within the prediction time window; and
calculating the hyperedge activation gating coefficient of each hyperedge based on the propagation deviation degree of that hyperedge and the dynamic deviation threshold.

8. The method according to claim 7, characterized in that the message passing process on the hypergraph structure comprises a user-node-to-hyperedge aggregation stage and a hyperedge-to-user-node return stage;
in the user-node-to-hyperedge aggregation stage, the aggregation is expressed by formula (14), in which: the mapping applied is a learnable transformation or nonlinear mapping; the input is the representation of user node v at message-passing layer k-1; the output is the aggregated message representation at message-passing layer k; and V(e) is the set of user nodes associated with hyperedge e;
in the hyperedge-to-user-node return stage, compression or activation modulation is performed using the hyperedge activation gating coefficient, as expressed by formula (15), in which: E(v) is the set of hyperedges containing user node v; a_e is the hyperedge activation gating coefficient of hyperedge e; and the remaining symbols denote the weight of hyperedge e and the intermediate message received by user node v during message passing at layer k;
the user node is updated according to formula (16), in which: W^(k) is the linear transformation parameter matrix of the user node update stage at layer k; and the remaining symbols denote a nonlinear activation function and a fusion function.

9. The method according to claim 1, characterized in that the overall objective loss function of the hypergraph prediction model is defined in terms of the following quantities: a weighting factor; c_v, the prior risk weight, based on the information manipulation index, of user node v; y_v, the true risk label of user node v; the risk prediction result output by the model; s_vj, the similarity between user nodes v and j; a temperature coefficient; v+, a positive sample; and the set of negative samples.

10. A device for predicting online promoters, characterized in that it comprises:
an acquisition module, used to acquire a user's dissemination behavior data, dissemination content data, and dissemination structure data;
a feature extraction module, used to extract features from the dissemination behavior data, dissemination content data, and dissemination structure data respectively, to obtain behavioral features, content features, and structural features;
a triggering module, used to determine a propagation stage based on the behavioral features and content features; and
a prediction module, used to construct a hypergraph prediction model when the propagation stage is a non-stationary phase, and to use the hypergraph prediction model to determine the user's risk prediction result based on the behavioral features, content features, and structural features.
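
For illustration only, the hypergraph construction and gated message passing recited in the claims above can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce formulas (14) to (16): the mean aggregator, the sigmoid gate applied to the difference between the propagation deviation degree and a `median + scale * MAD` threshold, the additive fusion, the `tanh` activation, and the omission of learnable parameters are all assumptions, as are the function and parameter names. The propagation deviation degree of each hyperedge is taken as a given input.

```python
import math
from statistics import median


def build_hyperedges(events):
    """Map each propagation event to the set of users that took part in it."""
    return {eid: sorted(set(users)) for eid, users in events.items()}


def gating_coefficients(deviation, scale=3.0):
    """Assumed gate: sigmoid(deviation - threshold), threshold = median + scale * MAD."""
    values = list(deviation.values())
    med = median(values)
    mad = median(abs(d - med) for d in values)
    threshold = med + scale * mad
    return {e: 1.0 / (1.0 + math.exp(-(d - threshold))) for e, d in deviation.items()}


def message_passing_layer(h, hyperedges, gates, weights=None):
    """One layer of node -> hyperedge -> node message passing.

    h: dict user -> feature vector (list of floats).
    gates: dict hyperedge -> gating coefficient; values near 0 compress a
    hyperedge's message, values near 1 let it pass.
    """
    dim = len(next(iter(h.values())))
    # Aggregation stage: each hyperedge takes the mean of its member nodes.
    edge_msg = {
        e: [sum(h[v][i] for v in members) / len(members) for i in range(dim)]
        for e, members in hyperedges.items()
    }
    # Return stage: each node sums the gated, weighted messages of its hyperedges.
    node_msg = {v: [0.0] * dim for v in h}
    for e, members in hyperedges.items():
        w = 1.0 if weights is None else weights[e]
        for v in members:
            for i in range(dim):
                node_msg[v][i] += gates[e] * w * edge_msg[e][i]
    # Update stage: additive fusion followed by tanh (no learnable matrix here).
    return {v: [math.tanh(h[v][i] + node_msg[v][i]) for i in range(dim)] for v in h}


if __name__ == "__main__":
    edges = build_hyperedges({"e1": ["a", "b"], "e2": ["b", "c", "c"]})
    feats = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    gates = gating_coefficients({"e1": 0.9, "e2": 0.1})
    print(message_passing_layer(feats, edges, gates))
```

In this sketch a hyperedge whose deviation sits well above the robust threshold receives a gate close to 1 and dominates the messages returned to its member nodes, while hyperedges near or below the threshold are attenuated. A trained model would replace the identity mappings with learned transformations.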