Cross-platform malicious user identification method based on two-way adaptive cross attention

By acquiring behavioral and textual data from code collaboration platforms and social networking platforms, and using a dual-path adaptive cross-attention method to generate user-level fusion feature vectors, the problem of lag in malicious user detection in existing technologies is solved, enabling timely and accurate identification of malicious users.

CN122241679A · Pending · Publication Date: 2026-06-19 · WUHAN UNIV OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
WUHAN UNIV OF TECH
Filing Date
2026-01-23
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing methods for detecting malicious users cannot provide early warnings in the early stages of the attack chain, and therefore cannot meet the security requirement of early identification of malicious users.

Method used

By acquiring behavioral and textual data of target users on code collaboration platforms and social networking platforms, a dual-path adaptive cross-attention method is used for cross-platform fusion to generate user-level fused feature vectors, and a classifier is used to identify malicious users.

Benefits of technology

It enables timely and accurate identification of malicious users, provides early warnings before the emergence of substantial attack products such as malicious code, and significantly enhances the ability to identify users with spoofing and sparse behavior.


Figure CN122241679A_ABST

Abstract

This invention provides a cross-platform malicious user identification method based on dual-path adaptive cross-attention, relating to the field of information security technology. The method includes the following steps: acquiring behavioral data of a target user on a code collaboration platform and textual data of a target user on a social networking platform, wherein the target user is determined through publicly available associations between the code collaboration platform and the social networking platform; performing time-based alignment and encoding on the behavioral and textual data to obtain behavioral feature sequences and textual semantic sequences; using the behavioral feature sequences as queries and the textual semantic sequences as keys and values, fusing the data through two cross-attention paths—near-window and global—and dynamically adjusting the fusion weights based on the amount of valid information retrieved in the near-window path to obtain a user-level fused feature vector; inputting the user-level fused feature vector into a classifier for training and recognition, and outputting a judgment result indicating whether the target user is a malicious user.

Description

Technical Field

[0001] This invention relates to the field of information security technology, and in particular to a cross-platform malicious user identification method based on dual-path adaptive cross-attention.

Background Technology

[0002] With the rapid development of the open-source software ecosystem, open-source collaboration platforms such as GitHub have attracted a large number of developers, but they have also bred malicious users. These malicious users may upload malicious code, launch supply chain attacks, or publish false information, posing a serious threat to the security of the software supply chain.

[0003] Existing malicious user detection methods mainly target existing malicious repositories, malicious code, and other visible outputs, and make judgments through rule matching, static code analysis, or signature features.

[0004] However, these methods suffer from a lag: detection begins only after malicious content has been generated, so they cannot provide early warnings in the early stages of the attack chain, such as planning, discussion, and tool preparation. This makes them ineffective against advanced persistent threats characterized by stealth and long-term planning, and they fail to meet the security requirement of early identification of malicious users.

Summary of the Invention

[0005] The purpose of this invention is to provide a cross-platform malicious user identification method based on dual-path adaptive cross-attention, so as to solve the problem of lag in existing malicious user detection methods mentioned in the background art.

[0006] To achieve the above objectives, the present invention provides the following technical solution: a cross-platform malicious user identification method based on dual-path adaptive cross-attention, comprising the following steps: acquiring behavioral data of a target user on a code collaboration platform and text data of a target user on a social networking platform, wherein the target user is determined through the publicly disclosed association between the code collaboration platform and the social networking platform; performing time-based alignment and encoding processing on the behavioral data and the text data to obtain a behavioral feature sequence and a text semantic sequence; using the behavioral feature sequence as a query and the text semantic sequence as a key and value, fusing the data through two cross-attention paths, namely a near window and a global path, and dynamically adjusting the fusion weights based on the amount of effective information retrieved in the near window path, to obtain a user-level fusion feature vector; inputting the user-level fusion feature vector into a classifier for training and recognition, and outputting a judgment result on whether the target user is a malicious user.

[0007] Optionally, the step of aligning the behavioral data and the text data with a time reference specifically includes: using the time span of the behavioral data as a reference interval, linearly discretizing the timeline into several consecutive time steps; mapping each behavioral event in the behavioral data to the corresponding time step according to its occurrence timestamp; mapping each piece of text content in the text data to the corresponding time step according to its publication timestamp; and using a mask to identify cases where no behavioral event or text content is observed at any time step.

[0008] Optionally, the encoding process specifically includes: extracting four types of feature sequences of the behavioral event: event type, normalized time interval, associated repository information, and user context profile; mapping the four types of feature sequences to the same hidden space through a linear projection layer; inputting the projected features into a multi-layer encoder, where each encoder layer sequentially performs multi-head self-attention calculation within each type of sequence, cross-attention fusion across the four types of features at the same time step, and residual connection and layer normalization processing; concatenating the four final output features along the feature dimension and linearly mapping them to output the behavioral feature sequence; using a pre-trained language model to semantically encode the text content mapped to each time step, and pooling multiple text content vectors within the same time step to output the text semantic sequence.

[0009] Optionally, the step of using the behavioral feature sequence as the query and the text semantic sequence as the key and value, fusing them through two cross-attention paths (near window and global), and dynamically adjusting the fusion weights based on the amount of valid information retrieved in the near window path, specifically includes: projecting the behavioral feature sequence and the text semantic sequence onto a common feature space through linear layers and superimposing sinusoidal positional encodings to generate query vectors, key vectors, and value vectors; constructing near window attention paths and global attention paths based on valid bitmasks; calculating the attention weights of the query vector and key vector in the near window attention paths and global attention paths respectively, and generating corresponding path output vectors; for each time step, calculating adaptive weights based on the number of valid candidates in the near window path, and weighting and fusing the two output vectors to obtain the cross-modal fusion feature representation for each time step; and aggregating the fusion feature representations of all time steps along the time dimension to generate the user-level fusion feature vector.

[0010] Optionally, the steps of constructing the near-window attention path and the global attention path specifically include: constructing a visibility mask matrix for the near-window attention path, such that for each time step i in the behavioral feature sequence, only key-value pairs in the text sequence that are located within the local time window and have valid masks are visible; constructing a visibility mask matrix for the global attention path, such that for each time step i in the behavioral feature sequence, all key-value pairs in the text sequence with valid masks are visible; and calculating attention weights based on the visibility mask matrices in the two paths, performing the calculation only at the visible positions of the matrix.

[0011] Optionally, before the weighted fusion, the highest attention score calculated by the near window attention path and the global attention path at the current time step i is determined; if the highest attention score is lower than a preset threshold, the fusion is abandoned and the fusion output of that time step is reverted to the original behavioral feature sequence.
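The threshold check described in this paragraph can be sketched as follows. This is a minimal illustration only: the function name, the argument names, and the choice to compare the largest per-path attention score against the threshold are assumptions, and the sketch further assumes that the fused vector and the behavioral feature vector have the same dimension.

```python
from typing import List, Sequence


def fuse_or_fallback(
    behavior_vec: Sequence[float],
    fused_vec: Sequence[float],
    near_scores: Sequence[float],
    global_scores: Sequence[float],
    threshold: float,
) -> List[float]:
    """Return the fused vector, or fall back to the behavioral features.

    near_scores / global_scores hold the attention scores computed by the
    two paths at the current time step (empty if nothing was visible).
    """
    # Highest attention score over both paths (hypothetical definition).
    top = max(max(near_scores, default=0.0), max(global_scores, default=0.0))
    if top < threshold:
        # Evidence too weak: abandon fusion for this time step.
        return list(behavior_vec)
    return list(fused_vec)
```

For example, with a threshold of 0.3, a time step whose best score is 0.2 keeps its original behavioral features, while one whose best score is 0.6 uses the fused representation.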

[0012] Optionally, the step of inputting the user-level fused feature vector into the classifier for training and recognition specifically includes: concatenating the user-level fused feature vector with the static and statistical features of the target user extracted from the code collaboration platform to form the final feature vector; using a gradient boosting decision tree model as the classifier, training the classifier on the training set, and optimizing the model hyperparameters and classification decision threshold through the validation set; inputting the final feature vector of the target user to be identified into the trained classifier to obtain the probability that it belongs to a malicious user, and outputting the final discrimination result by comparing it with the decision threshold.

[0013] In another aspect, the present invention also provides a cross-platform malicious user identification system based on dual-path adaptive cross-attention, comprising: an acquisition module for acquiring behavioral data of a target user on a code collaboration platform and text data of a target user on a social network platform, wherein the target user is determined through the publicly disclosed association between the code collaboration platform and the social network platform; an alignment and encoding module for performing time-based alignment and encoding processing on the behavioral data and the text data to obtain a behavioral feature sequence and a text semantic sequence; a feature fusion module for using the behavioral feature sequence as a query and the text semantic sequence as a key and value, fusing them through two cross-attention paths, namely a near window and a global path, and dynamically adjusting the fusion weights according to the amount of effective information retrieved in the near window path, to obtain a user-level fused feature vector; and an identification module for inputting the user-level fused feature vector into a classifier for training and identification, and outputting a judgment result on whether the target user is a malicious user.

[0014] In another aspect, the present invention also provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the above-described cross-platform malicious user identification method based on dual-path adaptive cross-attention.

[0015] In another aspect, the present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the above-described cross-platform malicious user identification method based on dual-path adaptive cross-attention.

[0016] Compared with the prior art, the beneficial effects of the present invention are: This application leverages publicly available inter-platform relationships to obtain the correlation between the behavior and text data of the same target user on code collaboration platforms and social networking platforms. By aligning with a unified time benchmark, heterogeneous and asynchronous data from both platforms are mapped onto the same timeline. Further encoding processing transforms the original behavioral events and text content into machine-understandable and computable high-dimensional feature sequences, eliminating data heterogeneity and extracting and preserving deep spatiotemporal semantic information. Through two parallel attention paths, near-window and global, intelligent and dynamic association between behavioral sequences and text semantic sequences is achieved. The near-window path captures local temporal correlations, while the global path addresses information sparsity and long-range dependencies. The fusion weights are adaptively adjusted based on the effective information content of the near-window path, effectively filtering out the semantic evidence most relevant to the behavior and suppressing irrelevant noise. This results in a highly condensed, semantically consistent user-level fusion feature vector that accurately reflects user behavior. Ultimately, by leveraging the classifier's powerful discriminative capabilities, the system achieves automated and high-precision identification of malicious users, significantly enhancing the ability to identify users exhibiting spoofing and sparse behavior. This enables early warnings before the emergence of substantial attack artifacts such as malicious code, thus achieving more timely and accurate identification of malicious users.

Brief Description of the Drawings

[0017] Figure 1 is a schematic diagram of the method steps of the present invention.

[0018] Figure 2 is a schematic diagram of the system structure of the present invention.

[0019] In the diagram: 10 - Acquisition module, 20 - Alignment encoding module, 30 - Feature fusion module, 40 - Recognition module.

Detailed Implementation

[0020] The present invention will now be clearly and completely described in conjunction with the accompanying drawings of the embodiments thereof. Obviously, the described embodiments are only a part of the embodiments of the present invention, and not all of them.

[0021] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be used interchangeably where appropriate for the embodiments of this application described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0022] Those skilled in the art will understand that, unless explicitly stated otherwise, the singular forms “a,” “an,” and “the” used herein may also include the plural forms. It should be further understood that the term “comprising” as used in the specification of this application means the presence of features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when an element is said to be “connected” or “coupled” to another element, it can be directly connected or coupled to the other element, or there may be intermediate elements. Furthermore, “connected” or “coupled” as used herein can include wireless connections or wireless coupling. The term “and/or” as used herein includes all or any units and all combinations of one or more associated listed items.

[0023] It will be understood by those skilled in the art that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this application pertains. It should also be understood that terms such as those defined in general dictionaries should be understood to have the same meaning as in the context of the prior art, and should not be interpreted in an idealized or overly formal sense unless specifically defined as herein.

[0024] It should be understood that the sequence number and size of each step in this embodiment do not imply the order of execution. The execution order of each process is determined by its function and internal logic, and should not constitute any limitation on the implementation process of this application embodiment.

[0025] It should be noted that, unless otherwise specified, the embodiments and features described in this application can be combined with each other. This application will now be described in detail with reference to the accompanying drawings and embodiments.

[0026] Please refer to Figure 1. This invention discloses a cross-platform malicious user identification method based on dual-path adaptive cross-attention, comprising the following steps: acquiring behavioral data of a target user on a code collaboration platform and text data of a target user on a social networking platform, wherein the target user is determined through the publicly disclosed association between the code collaboration platform and the social networking platform; performing time-based alignment and encoding processing on the behavioral data and the text data to obtain a behavioral feature sequence and a text semantic sequence; using the behavioral feature sequence as a query and the text semantic sequence as a key and value, fusing the data through two cross-attention paths (near window and global), and dynamically adjusting the fusion weights based on the amount of effective information retrieved in the near window path to obtain a user-level fusion feature vector; inputting the user-level fusion feature vector into a classifier for training and recognition, and outputting a judgment result on whether the target user is a malicious user.

[0027] Specifically, the code collaboration platform mentioned in this application includes GitHub, and the social networking platforms include Twitter and Weibo; the publicly disclosed associations include the association information publicly disclosed by platform users, and the account binding association information publicly disclosed by the code collaboration platform. To achieve cross-source fusion between GitHub and Twitter / X, this application obtains the target user's personal profile through the public API of the code collaboration platform GitHub, and parses out the publicly associated Twitter / X accounts. Subsequently, the application crawls the user's behavioral event stream on GitHub and the text content of the tweets posted on Twitter through the API interfaces of the two platforms respectively. Next, using the time span of the GitHub behavioral data as the main benchmark, the time axis is unified to UTC time and linearly discretized into a time step sequence of fixed length L. The data from the two platforms are mapped to the corresponding time steps according to the timestamp, and time steps without data are masked. Then, a Transformer-based behavioral sequence encoder is used to encode the GitHub behavioral sequence to obtain a behavioral feature sequence, and the tweet text is encoded and pooled using the tweet domain pre-trained model BERTweet to obtain a text semantic sequence. Furthermore, using the behavioral feature sequence as the query and the text semantic sequence as the key and value, a dual-path adaptive cross-attention fusion is performed to generate a user-level fused feature vector. Finally, this vector is concatenated with the user's static features and input into a CatBoost classifier for training and recognition, outputting the malicious user identification result.

[0028] This application leverages publicly available inter-platform relationships to obtain the correlation between the behavior and text data of the same target user on code collaboration platforms and social networking platforms. By aligning with a unified time benchmark, heterogeneous and asynchronous data from both platforms are mapped onto the same timeline. Further encoding processing transforms the original behavioral events and text content into machine-understandable and computable high-dimensional feature sequences, eliminating data heterogeneity and extracting and preserving deep spatiotemporal semantic information. Through two parallel attention paths—near-window and global—intelligent and dynamic association between behavioral sequences and text semantic sequences is achieved. The near-window path captures local temporal correlations, while the global path addresses information sparsity and long-range dependencies. The fusion weights are adaptively adjusted based on the effective information content of the near-window path, effectively filtering out the semantic evidence most relevant to the behavior and suppressing irrelevant noise. This results in a highly condensed, semantically consistent user-level fusion feature vector that accurately reflects user behavior. Ultimately, by leveraging the classifier's powerful discriminative capabilities, the system achieves automated and high-precision identification of malicious users, significantly enhancing the ability to identify users exhibiting spoofing and sparse behavior. This enables early warnings before the emergence of substantial attack artifacts such as malicious code, thus achieving more timely and accurate identification of malicious users.

[0029] In some embodiments, the step of aligning the behavioral data and the text data with a time reference specifically includes: using the time span of the behavioral data as a reference interval, linearly discretizing the timeline into several consecutive time steps; mapping each behavioral event in the behavioral data to the corresponding time step according to its occurrence timestamp; mapping each piece of text content in the text data to the corresponding time step according to its publication timestamp; and using a mask to identify cases where no behavioral event or text content is observed at any time step.

[0030] Specifically, the user identifier, tweet identifier, timestamp, and text content fields are read from the cleaned Parquet file, and the records are stably sorted in ascending order by user identifier and timestamp.
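The stable ascending sort by user identifier and timestamp can be illustrated with a short sketch. The field names `user_id`, `tweet_id`, `timestamp`, and `text` are hypothetical stand-ins for the columns of the cleaned file, and reading the Parquet file itself is omitted; Python's built-in `sorted` is stable, so records with equal keys keep their input order.

```python
from typing import Dict, List


def sort_tweets(rows: List[Dict]) -> List[Dict]:
    """Stable ascending sort by (user_id, timestamp).

    Rows with identical keys keep their original relative order,
    because Python's sorted() is guaranteed to be stable.
    """
    return sorted(rows, key=lambda r: (r["user_id"], r["timestamp"]))


rows = [
    {"user_id": "b", "tweet_id": 1, "timestamp": 2, "text": "x"},
    {"user_id": "a", "tweet_id": 2, "timestamp": 5, "text": "y"},
    {"user_id": "a", "tweet_id": 3, "timestamp": 5, "text": "z"},
    {"user_id": "a", "tweet_id": 4, "timestamp": 1, "text": "w"},
]
ordered_ids = [r["tweet_id"] for r in sort_tweets(rows)]
```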

[0031] During the text encoding stage, the pre-trained BERTweet model (the `vinai/bertweet-base` version optimized for tweets) and its accompanying tokenizer are used, with BERTweet's built-in `normalizeTweet` function enabled for normalization. The tokenizer then converts the text into tokens, which are brought to a uniform fixed length: sequences that are too short are padded on the right to the specified length, and sequences that are too long are truncated from the right. Finally, an appropriate batch size is set for batch processing to improve computational efficiency.

[0032] The model performs forward computation in eval mode with gradient computation disabled, preferentially using the hidden state of the first and last tokens of the sequence as the semantic vector for each tweet. All vectors are then stacked and written out as a float32 matrix; at the same time, an index table that pairs each row with its original timestamp is saved as a separate Parquet file, maintaining row-level alignment with the vector file.

[0033] To ensure consistency of user identifiers across platforms, the Twitter numerical identifiers are first mapped to GitHub login names before the sequence is constructed, and the tweets are grouped and aggregated accordingly. Then, based on the generated alignment window $[t_{\text{start}}, t_{\text{end}}]$ and the fixed length $L$, the tweet timestamps $t$ of each user $u$ are linearly bucketed. This involves calculating the relative position and mapping it to a bucket index, then clipping the index to $[0, L-1]$ to ensure that all tweets fall into a specific time bucket, achieving a one-to-one correspondence with GitHub in the time dimension. The bucket index is defined as:

$$b(t) = \min\!\left(L-1,\ \max\!\left(0,\ \left\lfloor \frac{t - t_{\text{start}}}{t_{\text{end}} - t_{\text{start}}} \cdot L \right\rfloor\right)\right)$$

In the formula: $b(\cdot)$ is the bucket indexing function, $t$ is the timestamp, $t_{\text{start}}$ is the start time of the observation window, $t_{\text{end}}$ is the end time of the observation window, and $L$ is the fixed sequence length.
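The linear bucketing step can be sketched as follows. This is a minimal illustration, assuming the bucket index is the floor of the relative position multiplied by L and then clipped; the function and parameter names are hypothetical.

```python
import math


def bucket_index(t: float, t_start: float, t_end: float, L: int) -> int:
    """Map a timestamp to a bucket index in [0, L-1] by linear discretization."""
    if t_end <= t_start:
        raise ValueError("t_end must be greater than t_start")
    rel = (t - t_start) / (t_end - t_start)  # relative position in the window
    idx = math.floor(rel * L)                # linear bucket index (assumed form)
    return max(0, min(L - 1, idx))           # clip so every tweet lands in a bucket
```

With a window of 0 to 100 and L = 10, a timestamp of 55 falls into bucket 5, a timestamp exactly at the window end falls into the last bucket (9), and timestamps outside the window are clipped to the nearest edge bucket.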

[0034] Then, the vectors of multiple tweets from the same user within the same time bucket are mean-pooled to generate a time series $H_t$ of length $L$ with a $d$-dimensional vector at each step. Time steps without tweets are padded with a fixed placeholder value of 1.0, and the corresponding entries of the mask vector are set to mask=True so that they are automatically masked during subsequent cross-attention and normalization calculations. The final result is a text sequence representation that is aligned one-to-one in the time dimension with the GitHub-side sequence.
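The per-bucket mean pooling with placeholder padding can be sketched as follows. The function name and arguments are hypothetical; the sketch follows the convention stated above, where mask=True marks a time step that contains no tweets and holds only the 1.0 placeholder.

```python
from typing import List, Sequence, Tuple


def pool_by_bucket(
    vectors: Sequence[Sequence[float]],
    buckets: Sequence[int],
    L: int,
    dim: int,
    pad_value: float = 1.0,
) -> Tuple[List[List[float]], List[bool]]:
    """Mean-pool tweet vectors per time bucket; pad and mask empty buckets."""
    sums = [[0.0] * dim for _ in range(L)]
    counts = [0] * L
    for vec, b in zip(vectors, buckets):
        counts[b] += 1
        for k in range(dim):
            sums[b][k] += vec[k]

    series: List[List[float]] = []
    mask: List[bool] = []
    for b in range(L):
        if counts[b] == 0:
            series.append([pad_value] * dim)  # fixed placeholder for empty steps
            mask.append(True)                 # True = padded, to be masked later
        else:
            series.append([s / counts[b] for s in sums[b]])
            mask.append(False)
    return series, mask


series, mask = pool_by_bucket(
    vectors=[[1.0, 3.0], [3.0, 5.0], [10.0, 10.0]],
    buckets=[0, 0, 2],
    L=3,
    dim=2,
)
```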

[0035] This application linearly discretizes the timeline into continuous time steps of fixed length and aligns data from two platforms, establishing a unified and accurate time reference for subsequent temporal feature analysis and cross-attention computation. By using a mask to identify the time steps of unobserved data, the interference of padding noise on model computation is effectively shielded, ensuring the accuracy and robustness of subsequent sequence processing and attention fusion, and laying a solid foundation for the effective alignment and fusion of cross-modal information.

[0036] In some embodiments, the encoding process specifically includes: extracting four types of feature sequences for the behavioral event: event type, normalized time interval, associated repository information, and user context profile; mapping the four types of feature sequences to the same hidden space through a linear projection layer; inputting the projected features into a multi-layer encoder, where each encoder sequentially performs multi-head self-attention calculation within each type of sequence, cross-attention fusion across the four types of features at the same time step, and residual connection and layer normalization processing; concatenating and linearly mapping the four output features along the feature dimension to output the behavioral feature sequence; using a pre-trained language model, semantically encoding the text content mapped to each time step, and pooling multiple text content vectors within the same time step to output the text semantic sequence.

[0037] Specifically, the encoding portion of Trans-BL is used to model user behavior on GitHub and construct behavioral features.

[0038] The event sequence (event_sequence), the normalized time interval sequence (time_sequences), the repository interaction sequence (repository_sequence), and the context profile (event_context_features) are preprocessed separately, and the step size of the four sequences is unified to L.

[0039] The four sequences are each linearly projected into a unified latent space. A fixed sinusoidal positional encoding is superimposed to preserve the global temporal order.
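A fixed sinusoidal positional encoding of the kind mentioned here can be sketched as follows. The sketch uses the standard Transformer formulation (sine on even dimensions, cosine on odd dimensions, base 10000); whether the filing uses exactly this variant is an assumption, and the function name is hypothetical.

```python
import math
from typing import List


def sinusoidal_positional_encoding(L: int, d_model: int) -> List[List[float]]:
    """Return an L x d_model table of fixed sinusoidal positional encodings."""
    pe = [[0.0] * d_model for _ in range(L)]
    for pos in range(L):
        for i in range(0, d_model, 2):
            # i is the even dimension index, so i / d_model equals 2k / d_model.
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def add_positional_encoding(seq: List[List[float]]) -> List[List[float]]:
    """Superimpose the encoding on a projected sequence of shape L x d_model."""
    pe = sinusoidal_positional_encoding(len(seq), len(seq[0]))
    return [[x + p for x, p in zip(row, pe_row)] for row, pe_row in zip(seq, pe)]
```

Because the table depends only on the position and the dimension, the same encoding can be added to each of the four projected sequences without any learned parameters.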

[0040] Then, two encoder layers are stacked. Each layer first applies temporal multi-head self-attention to the four sequences to characterize the long-term dependencies within each modality. Then, cross-feature attention is used to interactively fuse time, event, repository, and context information at the same time step. Both layers are equipped with residual connections, layer normalization, and a feedforward network (FFN), and dropout is applied to suppress overfitting.

[0041] The mask is propagated between layers, so that only valid time steps participate in attention and normalization calculations and the 1.0 padding noise is shielded. Finally, the four representations are concatenated along the feature dimension and linearly mapped back to the unified hidden dimension, yielding a behavioral sequence representation $H_g$ that evolves over time.

[0042] This implementation borrows the encoding idea of Trans-BL and removes the BiLSTM on the decoding side, so that the GitHub-side representation can be stably output as an offline feature extractor and saved as github_sequences_Hg.npy. At the same time, it exports a valid bitmask gh_mask consistent with time_sequence for the subsequent cross-attention fusion stage.

[0043] This application extracts multi-dimensional features from behavioral events, including event type, time interval, repository information, and contextual profiles, and performs parallel encoding and cross-feature fusion. This enables a deep, multi-perspective characterization of complex patterns in user behavior, enhancing the richness and discriminative power of behavioral sequence representations. Simultaneously, by utilizing a pre-trained language model and pooling operations to process text, it effectively captures the semantic intent of users at different time steps. This encoding method allows behavioral and textual features to be fully expressed in a high-dimensional space, significantly improving the accuracy of the association between behavioral and textual features.

[0044] In some embodiments, the step of using the behavioral feature sequence as a query and the text semantic sequence as a key and value, fusing them through two cross-attention paths (near-window and global), and dynamically adjusting the fusion weights based on the amount of valid information retrieved in the near-window path, specifically includes: projecting the behavioral feature sequence and the text semantic sequence onto a common feature space through linear layers and superimposing sinusoidal positional encodings to generate query vectors, key vectors, and value vectors; constructing near-window attention paths and global attention paths based on valid bitmasks; calculating the attention weights of the query vector and key vector in the near-window attention paths and global attention paths respectively, and generating corresponding path output vectors; for each time step, calculating adaptive weights based on the number of valid candidates in the near-window path, and weighting and fusing the two output vectors to obtain a cross-modal fusion feature representation for each time step; and aggregating the fusion feature representations of all time steps along the time dimension to generate the user-level fusion feature vector.

[0045] This application provides a comparable basis for cross-modal attention computation by projecting the features of both modalities onto a common space and superimposing positional encodings. Based on the valid bitmasks, two paths are constructed and attention is computed on each, enabling directional and selective information retrieval from behavioral sequences to text sequences. Through adaptive weighted fusion based on the number of valid candidates in the near-window path, followed by aggregation along the time dimension, the method balances local relevance against global contextual information and generates a user-level feature vector that both retains details and reflects the overall behavioral pattern. This provides a highly condensed and information-rich discriminative basis for the final classification.
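The adaptive weighting and temporal aggregation can be sketched as follows. The text above does not give the weighting formula, so the rule used here (near-window weight grows linearly with the number of valid candidates and saturates at a reference count `n_ref`) is purely an assumption for illustration, as is the use of mean pooling over valid time steps; all names are hypothetical.

```python
from typing import List, Sequence


def fuse_paths(
    near_out: Sequence[Sequence[float]],
    global_out: Sequence[Sequence[float]],
    near_counts: Sequence[int],
    n_ref: int = 3,
) -> List[List[float]]:
    """Blend the two path outputs per time step.

    alpha = min(1, n / n_ref) is a hypothetical rule: the more valid
    near-window candidates, the more the near-window path is trusted.
    """
    fused = []
    for z_near, z_glob, n in zip(near_out, global_out, near_counts):
        alpha = min(1.0, n / n_ref)
        fused.append([alpha * a + (1.0 - alpha) * b for a, b in zip(z_near, z_glob)])
    return fused


def pool_over_time(fused: Sequence[Sequence[float]], valid: Sequence[bool]) -> List[float]:
    """Mean-pool fused representations over valid time steps (assumed pooling)."""
    rows = [row for row, ok in zip(fused, valid) if ok]
    if not rows:
        return [0.0] * len(fused[0])
    return [sum(col) / len(rows) for col in zip(*rows)]


fused = fuse_paths([[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]], [0, 3])
user_vec = pool_over_time(fused, [True, True])
```

In this toy case the first time step has no near-window candidates and relies entirely on the global path, while the second has enough candidates to rely entirely on the near-window path.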

[0046] In some embodiments, the steps of constructing the near-window attention path and the global attention path specifically include: constructing a visibility mask matrix for the near-window attention path, such that for each time step i in the behavior sequence, only key-value pairs in the text sequence that are located within the local time window and have valid masks are visible; constructing a visibility mask matrix for the global attention path, such that for each time step i in the behavior sequence, all key-value pairs in the text sequence with valid masks are visible; and calculating attention weights based on the visibility mask matrices in the two paths, performing the calculation only at the visible positions of each matrix.

[0047] Specifically, an adaptive cross-attention fusion mechanism with behavior as the query is used to obtain a feature vector that combines the GitHub behavioral features with the tweet features.

[0048] The GitHub behavior sequence $X^{b}$ serves as the query Q, and the text sequence $X^{t}$ of the same user on the same time scale serves as the key K and value V for cross-attention fusion.

[0049] First, the behavior sequence $X^{b}$ and the text sequence $X^{t}$ are each mapped to a common space via linear projection, and a fixed sinusoidal positional encoding is superimposed: $Q = X^{b} W_{Q} + P^{b}$; $K = X^{t} W_{K} + P^{t}$; $V = X^{t} W_{V} + P^{t}$. In the formula: $Q$ is the query vector, $K$ is the key vector, and $V$ is the value vector; $X^{b}$ is the behavior sequence and $X^{t}$ is the text sequence; $T_{b}$ is the length of the behavior sequence and $T_{t}$ is the length of the text sequence; $W_{Q}$, $W_{K}$, and $W_{V}$ are the projection weight matrices; $P^{b}$ and $P^{t}$ are the sinusoidal positional encodings.

[0050] The valid bitmasks obtained by alignment are defined as $m^{b} \in \{0,1\}^{T_{b}}$ and $m^{t} \in \{0,1\}^{T_{t}}$, in which an entry equal to 1 marks a valid position; only valid positions participate in the calculation.
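The projection and positional-encoding step can be illustrated with a minimal pure-Python sketch. It assumes the standard sinusoidal encoding (sine on even channels, cosine on odd channels, base 10000), and it assumes that the value vectors receive the same positional encoding as the key vectors; the function names `sinusoidal_encoding`, `project`, and `make_qkv` are hypothetical and are not taken from the application.

```python
import math

def sinusoidal_encoding(length, dim):
    """Fixed sinusoidal positional encoding (assumed standard form, base 10000)."""
    table = []
    for pos in range(length):
        row = []
        for k in range(dim):
            # Channels 2i and 2i+1 share one frequency.
            angle = pos / (10000 ** ((k - k % 2) / dim))
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def project(seq, weight):
    """Linear projection: seq is (length x d_in), weight is (d_in x d_out)."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*weight)]
            for row in seq]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def make_qkv(behavior, text, w_q, w_k, w_v):
    """Q from the behavior sequence, K and V from the text sequence."""
    d = len(w_q[0])
    q = add(project(behavior, w_q), sinusoidal_encoding(len(behavior), d))
    pe_text = sinusoidal_encoding(len(text), d)
    k = add(project(text, w_k), pe_text)
    v = add(project(text, w_v), pe_text)  # assumption: V also carries the encoding
    return q, k, v

# Toy example: identity projections on 2-dimensional features.
identity = [[1.0, 0.0], [0.0, 1.0]]
q, k, v = make_qkv([[1, 0], [0, 1]], [[2, 2], [3, 3], [4, 4]],
                   identity, identity, identity)
```

In a trained model the three weight matrices would be learned; the identity matrices here only keep the toy example easy to check by hand.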

[0051] The query vector $Q$, key vector $K$, and value vector $V$ are split into $H$ heads, each of dimension $d_{h}$. Based on the valid bitmasks $m^{b}$ and $m^{t}$ obtained by alignment, two attention paths are constructed, giving a visibility mask matrix $M$ for each path. The near-window path at each time step $t_{i}$ only allows attention to valid tweets within the time window $[\,i - w_{b},\ i + w_{f}\,]$: $M^{\mathrm{near}}_{ij} = m^{b}_{i}\, m^{t}_{j}\, \mathbb{1}[\, i - w_{b} \le j \le i + w_{f} \,]$. In the formula: $M^{\mathrm{near}}$ is the visibility mask matrix for the near-window path, $w_{f}$ is the forward window size, $w_{b}$ is the backward window size, and $m^{b}$, $m^{t}$ are the valid bitmasks.

[0052] The global path has no window and opens attention to all valid tweets: $M^{\mathrm{glob}}_{ij} = m^{b}_{i}\, m^{t}_{j}$. In the formula: $M^{\mathrm{glob}}$ is the visibility mask matrix for the global path, and $m^{b}$, $m^{t}$ are the valid bitmasks.
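The construction of the two visibility masks can be sketched as follows. This is an illustrative sketch only: it assumes that the behavior and text sequences share one time-step grid (so index j in the text sequence is the same time step as index j in the behavior sequence), and that a position is visible only when both the behavior step and the text step are valid. The name `build_visibility_masks` and the parameters `w_back` and `w_fwd` are hypothetical.

```python
def build_visibility_masks(valid_b, valid_t, w_back, w_fwd):
    """Return (near, glob): 0/1 matrices of shape len(valid_b) x len(valid_t).

    near[i][j] is 1 when both positions are valid and j lies in
    [i - w_back, i + w_fwd]; glob[i][j] is 1 when both positions are valid.
    """
    near, glob = [], []
    for i, mb in enumerate(valid_b):
        near_row, glob_row = [], []
        for j, mt in enumerate(valid_t):
            both_valid = 1 if (mb and mt) else 0
            in_window = (i - w_back) <= j <= (i + w_fwd)
            glob_row.append(both_valid)
            near_row.append(both_valid if in_window else 0)
        near.append(near_row)
        glob.append(glob_row)
    return near, glob

# Four time steps; behavior step 2 and text step 1 are empty (mask 0).
near, glob = build_visibility_masks([1, 1, 0, 1], [1, 0, 1, 1],
                                    w_back=1, w_fwd=0)
```

With `w_back=1` and `w_fwd=0`, each behavior step may only look at text from the same step or the step just before it, whereas the global mask exposes every valid text step.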

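[0053] With $H$ heads, for each path $p \in \{\mathrm{near}, \mathrm{glob}\}$, the weighted score $S^{p,h}$ and the attention weight $A^{p,h}$ of the h-th head are: $S^{p,h}_{ij} = \dfrac{Q^{h}_{i} \cdot K^{h}_{j}}{\sqrt{d_{h}}}$; $A^{p,h}_{ij} = \dfrac{M^{p}_{ij} \exp\big(S^{p,h}_{ij}\big)}{\sum_{j'} M^{p}_{ij'} \exp\big(S^{p,h}_{ij'}\big)}$. Multi-head attention then gives the outputs of the two paths, $O^{\mathrm{near}}_{i}$ and $O^{\mathrm{glob}}_{i}$, together with their attention scores: $O^{p}_{i} = \mathrm{Concat}_{h=1}^{H}\Big(\sum_{j} A^{p,h}_{ij} V^{h}_{j}\Big)$. The number of near-window hits $n_{i}$ is counted, $n_{i} = \sum_{j} M^{\mathrm{near}}_{ij}$, and the adaptive weight $\alpha_{i}$ is calculated from $n_{i}$ and the scaling constant $\kappa$. Soft fusion is then performed according to $\alpha_{i}$: $Z_{i} = \alpha_{i}\, O^{\mathrm{near}}_{i} + (1 - \alpha_{i})\, O^{\mathrm{glob}}_{i}$. In the formula: $S^{p,h}$ is the weighted score of the h-th head; $A^{p,h}$ is the attention weight of the h-th head; $Q^{h}$, $K^{h}$, and $V^{h}$ are the query vector, key vector, and value vector of the h-th head; $d_{h}$ is the feature dimension of each attention head; $O^{\mathrm{near}}_{i}$ is the output vector of the near-window path at position i; $O^{\mathrm{glob}}_{i}$ is the output vector of the global path at position i; $n_{i}$ is the number of valid candidates for the near-window path at position i; $\alpha_{i}$ is the adaptive fusion weight; $\kappa$ is a scaling constant; and $Z_{i}$ is the fusion vector.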
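The dual-path computation for a single time step can be sketched as below, using one attention head for brevity. The application states only that the adaptive weight depends on the number of valid near-window candidates and a scaling constant, so the saturating form `n / (n + kappa)` used here is purely an assumption for illustration, as are the names `masked_attention` and `dual_path_fuse`.

```python
import math

def masked_attention(query, keys, values, visible):
    """Scaled dot-product attention of one query over the visible keys only.

    Returns a zero vector when no key is visible.
    """
    idx = [j for j, flag in enumerate(visible) if flag]
    if not idx:
        return [0.0] * len(values[0])
    scale = math.sqrt(len(query))
    scores = [sum(a * b for a, b in zip(query, keys[j])) / scale for j in idx]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [sum(e / total * values[j][c] for e, j in zip(exps, idx))
            for c in range(len(values[0]))]

def dual_path_fuse(query, keys, values, near_row, glob_row, kappa=2.0):
    """Fuse the near-window and global outputs for one behavior time step."""
    o_near = masked_attention(query, keys, values, near_row)
    o_glob = masked_attention(query, keys, values, glob_row)
    n_hits = sum(near_row)              # valid candidates in the near window
    alpha = n_hits / (n_hits + kappa)   # ASSUMED weight form, not from the source
    fused = [alpha * a + (1 - alpha) * b for a, b in zip(o_near, o_glob)]
    return fused, alpha, n_hits

query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]

# No tweet falls in the near window: the weight is 0 and the global path decides.
fused_sparse, alpha_sparse, hits_sparse = dual_path_fuse(
    query, keys, values, near_row=[0, 0, 0], glob_row=[1, 1, 1])
```

Under this assumed form, the weight grows toward 1 as more valid tweets fall inside the window, so a behavior step with rich local context leans on the near-window path and a sparse step leans on the global path.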

[0054] This application precisely controls the information retrieval scope of each attention path by constructing two visibility mask matrices: a near-window one and a global one. The near-window path focuses on capturing semantic cues that are adjacent to the current behavior and may have a direct causal relationship, while the global path is responsible for retrieving potential associations from all historical information. This allows the model to simultaneously consider the immediate context of the behavior and long-term interest patterns, thereby gaining a more comprehensive and in-depth understanding of the complex relationship between user behavior and semantic expression.

[0055] In some embodiments, before the weighted fusion, the highest attention score calculated by the near window attention path and the global attention path at the current time step i is determined; if the highest attention score is lower than a preset threshold, the fusion is abandoned and the fusion output of that time step is reverted to the original behavioral feature sequence.

[0056] Specifically, if the maximum attention of both paths is below the threshold τ at a certain step, the output of that step reverts to the original behavioral feature to avoid noise injection. The fusion results are then robustly integrated through residual connections, layer normalization, and a feedforward network. Next, at each time step, the original GitHub behavioral feature is concatenated with the fused context along the feature dimension and linearly mapped, forming the enhanced behavior sequence. A masked average over the valid positions along the GitHub time dimension then yields the user-level fused feature vector, which is concatenated with static features and input into a classifier for discrimination.
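The fallback rule and the masked averaging can be illustrated with a short sketch. It assumes that the behavioral feature and the fused context have the same dimension, and it omits the residual connection, layer normalization, feedforward network, and the concatenate-and-project step; the names `fallback_or_fuse` and `masked_mean` are hypothetical.

```python
def fallback_or_fuse(behavior_vec, fused_vec, max_near, max_glob, tau):
    """Keep the fused vector only if at least one path reaches the threshold tau."""
    if max(max_near, max_glob) < tau:
        return list(behavior_vec)   # both paths too weak: revert to behavior only
    return list(fused_vec)

def masked_mean(sequence, valid):
    """Average the rows whose mask bit is 1 (zero vector if none are valid)."""
    rows = [row for row, m in zip(sequence, valid) if m]
    if not rows:
        return [0.0] * len(sequence[0])
    return [sum(col) / len(rows) for col in zip(*rows)]

# Step 0 has weak attention on both paths and falls back; step 1 keeps the fusion.
step0 = fallback_or_fuse([1.0, 2.0], [9.0, 9.0], max_near=0.1, max_glob=0.2, tau=0.3)
step1 = fallback_or_fuse([3.0, 4.0], [5.0, 6.0], max_near=0.1, max_glob=0.5, tau=0.3)

# The third time step is padding (mask 0) and is excluded from the average.
user_vector = masked_mean([step0, step1, [100.0, 100.0]], [1, 1, 0])
```

Averaging only over valid steps keeps padded or empty time steps from diluting the user-level vector of users with sparse activity.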

[0057] This application sets an attention score threshold and introduces a fallback mechanism. When the system finds that the current behavior and text information lack sufficient evidence of correlation, it can automatically abandon unreliable fusion results and fall back to relying only on more credible behavioral features for judgment. This effectively avoids the risk of misjudgment caused by hard fusion under conditions of insufficient information or noise interference, and significantly improves the decision reliability and overall robustness of the system in edge scenarios.

[0058] In some embodiments, the step of inputting the user-level fused feature vector into a classifier for training and recognition specifically includes: concatenating the user-level fused feature vector with the static and statistical features of the target user extracted from the code collaboration platform to form a final feature vector; using a gradient boosting decision tree model as a classifier, training the classifier on the training set, and optimizing the model hyperparameters and classification decision threshold through a validation set; inputting the final feature vector of the target user to be identified into the trained classifier to obtain the probability that it belongs to a malicious user, and outputting the final discrimination result by comparing it with the decision threshold.

[0059] Specifically, after completing the cross-modal sequence fusion, the fused features are then concatenated with the static and statistical features from the GitHub side, and a decision tree-based discriminator is trained to complete the final malicious user identification.

[0060] First, the user profile feature table, the submission-time statistical features, and the repository interaction features are merged with the fusion vector by an inner join on the user identifier user_id. Then, a stratified random split is performed at a ratio of training set : test set = 7:3. Finally, 20% of the training portion is held out as a validation subset.
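The data preparation described here can be sketched with the Python standard library alone. The sketch below assumes each feature table is a dictionary keyed by user_id and that labels are 0 (benign) or 1 (malicious); the helper names `inner_join` and `stratified_split`, the seeds, and the toy data are hypothetical.

```python
import random

def inner_join(*tables):
    """Keep only user_ids present in every table and merge their feature dicts."""
    shared = set(tables[0]).intersection(*tables[1:])
    return {uid: {k: v for t in tables for k, v in t[uid].items()}
            for uid in sorted(shared)}

def stratified_split(user_ids, labels, holdout_ratio, seed=0):
    """Split ids into (kept, held_out), preserving each label's share in both parts."""
    rng = random.Random(seed)
    by_label = {}
    for uid, y in zip(user_ids, labels):
        by_label.setdefault(y, []).append(uid)
    kept, held_out = [], []
    for y in sorted(by_label):
        ids = by_label[y][:]
        rng.shuffle(ids)
        n_hold = round(len(ids) * holdout_ratio)
        held_out.extend(ids[:n_hold])
        kept.extend(ids[n_hold:])
    return kept, held_out

# Toy data: 100 users, the first 20 labelled malicious (1), the rest benign (0).
ids = [f"u{n}" for n in range(100)]
label_of = {uid: (1 if n < 20 else 0) for n, uid in enumerate(ids)}

# 7:3 train/test split, then 20% of the training part as the validation subset.
train_full, test = stratified_split(ids, [label_of[u] for u in ids], 0.3)
fit, val = stratified_split(train_full, [label_of[u] for u in train_full], 0.2, seed=1)
```

Stratifying both splits keeps the share of malicious users similar across the fitting, validation, and test parts, which matters when the positive class is a small minority.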

[0061] The discriminator employs a CatBoost binary classification model. To balance expressive power and generalization ability, an Optuna-based TPE sampler is introduced to jointly search the key structural and regularization parameters of the model. Tree depth, learning rate, L2 leaf regularization, random perturbation strength, bagging temperature, sample and feature subsampling ratios, minimum number of samples per leaf node, and number of iterations are used as search variables; the optimal combination is determined automatically within the training-validation framework, and an early stopping strategy is employed to select a stable solution. After the hyperparameters are determined, the model is refitted on the training and validation sets to obtain the final model. The posterior probabilities output by the model on the validation subset are used to select a decision threshold automatically according to a preset comprehensive discrimination criterion. If the constraints of that criterion cannot be met, the selection falls back to an F1-based alternative threshold. The selected threshold is then used for discrete discrimination on the test subset. Ultimately, a unified final discrimination is achieved over static profiles, temporal statistics, and cross-modal fusion signals, generating application-oriented malicious user identification results.
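The threshold selection with an F1 fallback can be sketched as follows. The application does not spell out its "preset comprehensive discrimination criterion", so this sketch assumes a hypothetical one: among thresholds whose validation precision reaches a floor, choose the one with the highest recall; if no threshold satisfies the floor, choose the threshold with the best F1. The function names and the `min_precision` parameter are illustrative only.

```python
def precision_recall_f1(labels, probs, threshold):
    """Precision, recall, and F1 when predicting malicious for probs >= threshold."""
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(labels, probs) if p < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def select_threshold(labels, probs, min_precision):
    """Return (threshold, mode) chosen on the validation subset."""
    scored = [(t,) + precision_recall_f1(labels, probs, t)
              for t in sorted(set(probs))]
    feasible = [s for s in scored if s[1] >= min_precision]
    if feasible:
        # ASSUMED criterion: highest recall, ties broken by higher F1.
        best = max(feasible, key=lambda s: (s[2], s[3]))
        return best[0], "constrained"
    best = max(scored, key=lambda s: s[3])  # fallback: best F1
    return best[0], "f1_fallback"

val_labels = [0, 0, 1, 1, 1]
val_probs = [0.1, 0.6, 0.4, 0.7, 0.9]
chosen = select_threshold(val_labels, val_probs, min_precision=0.9)
```

Only observed probabilities are tried as candidate thresholds, since the confusion matrix can change only at those values.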

[0062] This application achieves joint modeling of dynamic user behavior patterns and static attribute features by concatenating the dynamic temporal fusion features with the static user profile features. It fully leverages the complementarity of information from different sources and of different natures, ensuring that the final judgment is based not only on what the user "did" and "said," but also on the background information of "who the user is," thus forming a more complete and well-rounded user profile and further improving the accuracy of malicious user identification and the confidence level of decision-making.

[0063] Referring to Figure 2, in another aspect, the present invention also provides a cross-platform malicious user identification system based on dual-path adaptive cross-attention, comprising: an acquisition module for acquiring behavioral data of a target user on a code collaboration platform and text data of the target user on a social network platform, wherein the target user is determined through the publicly disclosed association between the code collaboration platform and the social network platform; an alignment and encoding module for performing time-based alignment and encoding processing on the behavioral data and the text data to obtain a behavioral feature sequence and a text semantic sequence; a feature fusion module for using the behavioral feature sequence as a query and the text semantic sequence as a key and value, fusing them through two cross-attention paths, namely a near-window path and a global path, and dynamically adjusting the fusion weights according to the amount of valid information retrieved in the near-window path, to obtain a user-level fused feature vector; and an identification module for inputting the user-level fused feature vector into a classifier for training and identification, and outputting a judgment result on whether the target user is a malicious user.

[0064] To verify the effectiveness of this invention in cross-platform malicious user identification tasks, a comparative experiment was conducted using Trans-BL as the baseline method, and the results were evaluated under the same data partitioning and feature configuration. In addition, an ablation experiment was designed to examine the necessity and contribution of the proposed "near-window + global" dual-path cross-attention fusion and of the cross-modal concatenation. The results and key intermediate representations are shown in Tables 1 and 2 below.

Table 1: Comparison results between the present invention and Trans-BL.

[0065]

[0066] As shown in Table 1, the present invention significantly improves recall (+0.2056) while keeping precision essentially unchanged, resulting in overall gains in F1 (+0.0787) and AUC (+0.0683). This indicates that the cross-modal sequence alignment and dual-path cross-attention fusion can recall malicious samples more fully without significantly sacrificing discrimination accuracy.

[0067] Table 2: Comparison results of ablation experiments.

[0068]

[0069] As shown in Table 2, the overall performance is low when using only GitHub behavior sequence representation. After introducing cross-modal fusion, the F1 and AUC are improved to 0.7396 and 0.8986 respectively, which are 0.2009 and 0.0947 higher than "behavior sequence only", proving the effective supplementary role of semantic signals on the Twitter side.

[0070] Both "global path only" and "near-window path only" are significantly inferior to this solution: this solution improves F1 / AUC by 0.1376 / 0.0914 compared to "global path only" and by 0.1291 / 0.1160 compared to "near-window path only". This result is consistent with the original design intention, that is, the near-window path is better at capturing causal relationships with temporal proximity, while the global path is responsible for long-term, cross-period semantic alignment. The adaptive weighting of the two is significantly better than either single path.
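The ablation scores implied by the reported gains can be recovered with simple arithmetic. The sketch below assumes that the F1 of 0.7396 and AUC of 0.8986 reported for the cross-modal fusion are the scores of the full solution, and subtracts each stated improvement; the variant labels are descriptive names chosen here, not the row labels of Table 2.

```python
# Scores of the full solution, as reported in the text (assumed to be the full model).
full = {"F1": 0.7396, "AUC": 0.8986}

# Improvements of the full solution over each ablated variant, as reported.
gains = {
    "behavior sequence only": {"F1": 0.2009, "AUC": 0.0947},
    "global path only": {"F1": 0.1376, "AUC": 0.0914},
    "near-window path only": {"F1": 0.1291, "AUC": 0.1160},
}

# Implied score of each variant = full score minus the reported improvement.
implied = {name: {m: round(full[m] - g[m], 4) for m in full}
           for name, g in gains.items()}

for name, scores in implied.items():
    print(f"{name}: F1={scores['F1']:.4f}, AUC={scores['AUC']:.4f}")
```

These derived values follow only from the differences stated in the text and should be checked against the original Table 2 where it is available.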

[0071] In another aspect, the present invention also provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the above-described cross-platform malicious user identification method based on dual-path adaptive cross-attention.

[0072] In another aspect, the present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the above-described cross-platform malicious user identification method based on dual-path adaptive cross-attention.

[0073] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0074] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When the computer program is executed, it can include the processes of the embodiments of the above methods. Any references to memory, storage, database, or other media used in the embodiments provided by this invention can include non-volatile and / or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

[0075] The above are merely embodiments of the present invention and do not limit the patent scope of the present invention. Any equivalent modifications made based on the content of the present invention's specification and drawings, or direct or indirect applications in related technical fields, are similarly included within the patent protection scope of the present invention.

Claims

1. A cross-platform malicious user identification method based on dual-path adaptive cross-attention, characterized in that the method includes: acquiring behavioral data of a target user on a code collaboration platform and text data of the target user on a social networking platform, wherein the target user is determined through the publicly disclosed association between the code collaboration platform and the social networking platform; performing time-based alignment and encoding on the behavioral data and the text data to obtain a behavioral feature sequence and a text semantic sequence; using the behavioral feature sequence as the query and the text semantic sequence as the key and value, fusing them through two cross-attention paths, namely a near-window path and a global path, and dynamically adjusting the fusion weights based on the amount of valid information retrieved in the near-window path to obtain a user-level fused feature vector; and inputting the user-level fused feature vector into a classifier for training and recognition, and outputting a judgment result indicating whether the target user is a malicious user.

2. The cross-platform malicious user identification method based on dual-path adaptive cross-attention as described in claim 1, characterized in that the step of performing time-based alignment on the behavioral data and the text data specifically includes: using the time span of the behavioral data as a reference interval, linearly discretizing the time axis into several consecutive time steps; mapping each behavioral event in the behavioral data to the corresponding time step based on its occurrence timestamp; mapping each piece of text content in the text data to the corresponding time step based on its publication timestamp; and using a mask to mark any time step at which no behavioral event or text content is observed.

3. The cross-platform malicious user identification method based on dual-path adaptive cross-attention as described in claim 2, characterized in that the encoding process specifically includes: extracting four types of feature sequences from the behavioral events: event type, normalized time interval, associated repository information, and user context profile; mapping each of the four types of feature sequences to the same hidden space through a linear projection layer; inputting the projected features into a multi-layer encoder, wherein each encoder layer sequentially performs temporal multi-head self-attention calculation within each type of sequence, cross-attention fusion across the four types of features at the same time step, and residual connection and layer normalization processing; concatenating the four final output features along the feature dimension and linearly mapping them to output the behavioral feature sequence; and using a pre-trained language model to perform semantic encoding on the text content mapped to each time step, and pooling the multiple text content vectors within the same time step to output the text semantic sequence.

4. The cross-platform malicious user identification method based on dual-path adaptive cross-attention as described in claim 1, characterized in that the step of using the behavioral feature sequence as a query and the text semantic sequence as a key and value, fusing them through two cross-attention paths, namely a near-window path and a global path, and dynamically adjusting the fusion weights based on the amount of valid information retrieved in the near-window path specifically includes: projecting the behavioral feature sequence and the text semantic sequence onto a common feature space through linear layers and superimposing sinusoidal positional encodings to generate query vectors, key vectors, and value vectors; constructing a near-window attention path and a global attention path based on the valid bitmasks; calculating the attention weights between the query vectors and the key vectors in the near-window attention path and the global attention path respectively, and generating the corresponding path output vectors; for each time step, calculating an adaptive weight based on the number of valid candidates in the near-window path, and weighting and fusing the two output vectors to obtain the cross-modal fused feature representation for that time step; and aggregating the fused feature representations of all time steps along the time dimension to generate the user-level fused feature vector.

5. The cross-platform malicious user identification method based on dual-path adaptive cross-attention as described in claim 4, characterized in that the steps of constructing the near-window attention path and the global attention path specifically include: constructing a visibility mask matrix for the near-window attention path, such that for each time step i in the behavior sequence, only key-value pairs in the text sequence that are within the local time window and whose masks are valid are visible; constructing a visibility mask matrix for the global attention path, such that for each time step i in the behavior sequence, all key-value pairs in the text sequence whose masks are valid are visible; and calculating attention weights based on the visibility mask matrices in the two paths, performing the calculation only at the visible positions of each matrix.

6. The cross-platform malicious user identification method based on dual-path adaptive cross-attention as described in claim 5, characterized in that, before the weighted fusion, the highest attention score calculated by the near-window attention path and the global attention path at the current time step i is determined; and if the highest attention score is lower than a preset threshold, the fusion is abandoned and the fusion output at that time step is reverted to the original behavioral feature sequence.

7. The cross-platform malicious user identification method based on dual-path adaptive cross-attention as described in claim 1, characterized in that the step of inputting the user-level fused feature vector into the classifier for training and recognition specifically includes: concatenating the user-level fused feature vector with the static and statistical features of the target user extracted from the code collaboration platform to form a final feature vector; using a gradient boosting decision tree model as the classifier, training the classifier on the training set, and optimizing the model hyperparameters and the classification decision threshold using the validation set; and inputting the final feature vector of the target user to be identified into the trained classifier to obtain the probability that the target user is a malicious user, and outputting the final discrimination result by comparing the probability with the decision threshold.

8. A cross-platform malicious user identification system based on dual-path adaptive cross-attention, characterized in that the system includes: an acquisition module for acquiring behavioral data of a target user on a code collaboration platform and text data of the target user on a social network platform, wherein the target user is determined through the publicly disclosed association between the code collaboration platform and the social network platform; an alignment and encoding module for performing time-based alignment and encoding on the behavioral data and the text data to obtain a behavioral feature sequence and a text semantic sequence; a feature fusion module for using the behavioral feature sequence as a query and the text semantic sequence as a key and value, fusing them through two cross-attention paths, namely a near-window path and a global path, and dynamically adjusting the fusion weights according to the amount of valid information retrieved in the near-window path to obtain a user-level fused feature vector; and an identification module for inputting the user-level fused feature vector into a classifier for training and identification, and outputting a judgment result on whether the target user is a malicious user.

9. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, when the processor executes the computer program, it implements the steps of the cross-platform malicious user identification method based on dual-path adaptive cross-attention as described in any one of claims 1 to 7.

10. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the cross-platform malicious user identification method based on dual-path adaptive cross-attention as described in any one of claims 1 to 7.